BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
[AUTHORS]
Suvodip Dey, Maunendra Sankar Desarkar
[ABSTRACT]
The standard language modeling (LM) loss by itself has been shown to be
inadequate for effective dialogue modeling. As a result, various training
approaches, such as auxiliary loss functions and leveraging human feedback, are
being adopted to enrich open-domain dialogue systems. One such auxiliary loss
function is Bag-of-Words (BoW) loss, defined as the cross-entropy loss for
predicting all the words/tokens of the next utterance. In this work, we propose
a novel auxiliary loss named Bag-of-Keywords (BoK) loss to capture the central
thought of the response through keyword prediction and leverage it to enhance
the generation of meaningful and interpretable responses in open-domain
dialogue systems. BoK loss upgrades the BoW loss by predicting only the
keywords or critical words/tokens of the next utterance, intending to estimate
the core idea rather than the entire response. We incorporate BoK loss in both
encoder-decoder (T5) and decoder-only (DialoGPT) architectures and train the
models to minimize the weighted sum of BoK and LM (BoK-LM) loss. We perform our
experiments on two popular open-domain dialogue datasets, DailyDialog and
Persona-Chat. We show that the inclusion of BoK loss improves the dialogue
generation of backbone models while also enabling post-hoc interpretability. We
also study the effectiveness of BoK-LM loss as a reference-free metric and
observe comparable performance to the state-of-the-art metrics on various
dialogue evaluation datasets.
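A minimal sketch of how a weighted BoK-LM objective could be computed, assuming plain Python lists of logits in place of model outputs. The function names, the additive form `lm + weight * bok`, and the default weight are illustrative assumptions; the abstract states only that a weighted sum of the BoK and LM losses is minimized. Keyword extraction is not shown: `keyword_ids` is taken as given.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def lm_loss(step_logits, response_ids):
    """Token-level cross-entropy, averaged over the response positions."""
    nll = [-log_softmax(logits)[tok] for logits, tok in zip(step_logits, response_ids)]
    return sum(nll) / len(nll)


def bag_loss(bag_logits, target_ids):
    """Order-free cross-entropy: one vocabulary distribution scores every target."""
    log_probs = log_softmax(bag_logits)
    return -sum(log_probs[t] for t in target_ids) / len(target_ids)


def bok_lm_loss(step_logits, bag_logits, response_ids, keyword_ids, weight=0.5):
    """LM loss plus a weighted bag loss restricted to keyword tokens.

    The additive form and the default weight are assumptions for illustration.
    """
    return lm_loss(step_logits, response_ids) + weight * bag_loss(bag_logits, keyword_ids)


# Toy vocabulary of 4 tokens; the response is [2, 0, 3] and token 3 is its only keyword.
uniform = [0.0, 0.0, 0.0, 0.0]
response = [2, 0, 3]
bow = bag_loss(uniform, response)  # BoW: every response token is a target
bok = bag_loss(uniform, [3])       # BoK: only the keyword token is a target
total = bok_lm_loss([uniform] * 3, uniform, response, [3], weight=0.5)
```

The only difference between the BoW and BoK terms in this sketch is the target set passed to `bag_loss`: all response tokens for BoW, keyword tokens only for BoK.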
[COMMENTS]
Accepted at SIGDIAL 2024
[LINK]
http://arxiv.org/abs/2501.10328v1
[DATE]
2025-01-18 01:57:49+08:00
[CATEGORIES]
cs.CL
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
[AUTHORS]
Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang
[ABSTRACT]
Spoken language understanding (SLU) is a structure prediction task in the
field of speech. Recently, many works on SLU that treat it as a
sequence-to-sequence task have achieved great success. However, this method is
not suitable for simultaneous speech recognition and understanding. In this
paper, we propose a joint speech recognition and structure learning framework
(JSRSL), an end-to-end SLU model based on span, which can accurately transcribe
speech and extract structured content simultaneously. We conduct experiments on
named entity recognition and intent classification using the Chinese dataset
AISHELL-NER and the English dataset SLURP. The results show that our proposed
method not only outperforms the traditional sequence-to-sequence method in both
transcription and extraction capabilities but also achieves state-of-the-art
performance on the two datasets.
[COMMENTS]
5 pages, 2 figures, accepted by ICASSP 2025
[LINK]
http://arxiv.org/abs/2501.07329v2
[DATE]
2025-01-18 01:53:27+08:00
[CATEGORIES]
cs.CL
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
[AUTHORS]
Pit Neitemeier, Björn Deiseroth, Constantin Eichenberg, Lukas Balles
[ABSTRACT]
Tokenization is a fundamental step in natural language processing, breaking
text into units that computational models can process. While learned subword
tokenizers have become the de-facto standard, they present challenges such as
large vocabularies, limited adaptability to new domains or languages, and
sensitivity to spelling errors and variations. To overcome these limitations,
we investigate a hierarchical architecture for autoregressive language
modelling that combines character-level and word-level processing. It employs a
lightweight character-level encoder to convert character sequences into word
embeddings, which are then processed by a word-level backbone model and decoded
back into characters via a compact character-level decoder. This method retains
the sequence compression benefits of word-level tokenization without relying on
a rigid, predefined vocabulary. We demonstrate, at scales up to 7 billion
parameters, that hierarchical transformers match the downstream task
performance of subword-tokenizer-based models while exhibiting significantly
greater robustness to input perturbations. Additionally, during continued
pretraining on an out-of-domain language, our model trains almost twice as
fast, achieves superior performance on the target language, and retains more of
its previously learned knowledge. Hierarchical transformers pave the way for
NLP systems that are more robust, flexible, and generalizable across languages
and domains.
[LINK]
http://arxiv.org/abs/2501.10322v1
[DATE]
2025-01-18 01:51:53+08:00
[CATEGORIES]
cs.CL
cs.LG
Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation
[AUTHORS]
Chris Samarinas, Alexander Krubner, Alireza Salemi, Youngwoo Kim, Hamed Zamani
[ABSTRACT]
This paper presents ICAT, an evaluation framework for measuring coverage of
diverse factual information in long-form text generation. ICAT breaks down a
long output text into a list of atomic claims and not only verifies each claim
through retrieval from a (reliable) knowledge source, but also computes the
alignment between the atomic factual claims and various aspects expected to be
presented in the output. We study three implementations of the ICAT framework,
each with a different assumption on the availability of aspects and alignment
method. By adopting data from the diversification task in the TREC Web Track
and the ClueWeb corpus, we evaluate the ICAT framework. We demonstrate strong
correlation with human judgments and provide comprehensive evaluation across
multiple state-of-the-art LLMs. Our framework further offers interpretable and
fine-grained analysis of diversity and coverage. Its modular design allows for
easy adaptation to different domains and datasets, making it a valuable tool
for evaluating the qualitative aspects of long-form responses produced by LLMs.
[LINK]
http://arxiv.org/abs/2501.03545v2
[DATE]
2025-01-18 01:47:24+08:00
[CATEGORIES]
cs.CL
Natural Language Processing of Privacy Policies: A Survey
[AUTHORS]
Andrick Adhikari, Sanchari Das, Rinku Dewri
[ABSTRACT]
Natural Language Processing (NLP) is an essential subset of artificial
intelligence. It has become effective in several domains, such as healthcare,
finance, and media, to identify perceptions, opinions, and misuse, among
others. Privacy is no exception, and initiatives have been taken to address the
challenges of usable privacy notifications to users with the help of NLP. To
this end, we conduct a literature review by analyzing 109 papers at the
intersection of NLP and privacy policies. First, we provide a brief
introduction to privacy policies and discuss various facets of associated
problems, which necessitate the application of NLP to elevate the current state
of privacy notices and disclosures to users. Subsequently, we a) provide an
overview of the implementation and effectiveness of NLP approaches for better
privacy policy communication; b) identify the methodologies that can be further
enhanced to provide robust privacy policies; and c) identify the gaps in the
current state-of-the-art research. Our systematic analysis reveals that several
research papers focus on annotating and classifying privacy texts for analysis
but do not adequately address other aspects of NLP applications, such as
summarization. More specifically, ample research opportunities exist in this
domain, covering aspects such as corpus generation, summarization vectors,
contextualized word embedding, identification of privacy-relevant statement
categories, fine-grained classification, and domain-specific model tuning.
[COMMENTS]
27 pages
[LINK]
http://arxiv.org/abs/2501.10319v1
[DATE]
2025-01-18 01:47:15+08:00
[CATEGORIES]
cs.CL
Improved Paraphrase Generation via Controllable Latent Diffusion
[AUTHORS]
Wei Zou, Ziyuan Zhuang, Xiang Geng, Shujian Huang, Jia Liu, Jiajun Chen
[ABSTRACT]
Paraphrase generation strives to generate high-quality and diverse
expressions of a given text, a domain where diffusion models excel. Though SOTA
diffusion generation reconciles generation quality and diversity, textual
diffusion suffers from a truncation issue that hinders efficiency and quality
control. In this work, we propose the Latent Diffusion Paraphraser (LDP), a
novel paraphrase generation method that models a
controllable diffusion process given a learned latent space. LDP achieves
superior generation efficiency compared to its diffusion counterparts. It can
facilitate only input segments to ensure paraphrase semantics, improving the
results without external features. Experiments show that LDP better reconciles
paraphrase generation quality and diversity than baselines. Further analysis
shows that our method is also helpful to other similar text generations and
domain adaptations.
[COMMENTS]
The article has been accepted by Frontiers of Computer Science (FCS)
[LINK]
http://arxiv.org/abs/2404.08938v2
[DATE]
2025-01-18 01:05:41+08:00
[CATEGORIES]
cs.CL
Computational Protein Science in the Era of Large Language Models (LLMs)
[AUTHORS]
Wenqi Fan, Yi Zhou, Shijie Wang, Yuyao Yan, Hui Liu, Qian Zhao, Le Song, Qing Li
[ABSTRACT]
Considering the significance of proteins, computational protein science has
always been a critical scientific field, dedicated to revealing knowledge and
developing applications within the protein sequence-structure-function
paradigm. In the last few decades, Artificial Intelligence (AI) has made
significant impacts in computational protein science, leading to notable
successes in specific protein modeling tasks. However, those previous AI models
still meet limitations, such as the difficulty in comprehending the semantics
of protein sequences, and the inability to generalize across a wide range of
protein modeling tasks. Recently, LLMs have emerged as a milestone in AI due to
their unprecedented language processing and generalization capability. They can
promote comprehensive progress in fields rather than solving individual tasks.
As a result, researchers have actively introduced LLM techniques in
computational protein science, developing protein Language Models (pLMs) that
skillfully grasp the foundational knowledge of proteins and can be effectively
generalized to solve a diversity of sequence-structure-function reasoning
problems. While witnessing prosperous developments, it’s necessary to present a
systematic overview of computational protein science empowered by LLM
techniques. First, we summarize existing pLMs into categories based on their
mastered protein knowledge, i.e., underlying sequence patterns, explicit
structural and functional information, and external scientific languages.
Second, we introduce the utilization and adaptation of pLMs, highlighting their
remarkable achievements in promoting protein structure prediction, protein
function prediction, and protein design studies. Then, we describe the
practical application of pLMs in antibody design, enzyme design, and drug
discovery. Finally, we specifically discuss the promising future directions in
this fast-growing field.
[LINK]
http://arxiv.org/abs/2501.10282v1
[DATE]
2025-01-18 00:21:18+08:00
[CATEGORIES]
cs.CL
Credit Risk Identification in Supply Chains Using Generative Adversarial Networks
[AUTHORS]
Zizhou Zhang, Xinshi Li, Yu Cheng, Zhenrui Chen, Qianying Liu
[ABSTRACT]
Credit risk management within supply chains has emerged as a critical
research area due to its significant implications for operational stability and
financial sustainability. The intricate interdependencies among supply chain
participants mean that credit risks can propagate across networks, with impacts
varying by industry. This study explores the application of Generative
Adversarial Networks (GANs) to enhance credit risk identification in supply
chains. GANs enable the generation of synthetic credit risk scenarios,
addressing challenges related to data scarcity and imbalanced datasets. By
leveraging GAN-generated data, the model improves predictive accuracy while
effectively capturing dynamic and temporal dependencies in supply chain data.
The research focuses on three representative industries, manufacturing (steel),
distribution (pharmaceuticals), and services (e-commerce), to assess
industry-specific credit risk contagion. Experimental results demonstrate that
the GAN-based model outperforms traditional methods, including logistic
regression, decision trees, and neural networks, achieving superior accuracy,
recall, and F1 scores. The findings underscore the potential of GANs in
proactive risk management, offering robust tools for mitigating financial
disruptions in supply chains. Future research could expand the model by
incorporating external market factors and supplier relationships to further
enhance predictive capabilities. Keywords: Generative Adversarial Networks
(GANs); Supply Chain Risk; Credit Risk Identification; Machine Learning; Data
Augmentation
[COMMENTS]
The paper will be published and indexed by IEEE at 2025 8th
International Conference on Advanced Algorithms and Control Engineering
(ICAACE 2025)
[LINK]
http://arxiv.org/abs/2501.10348v1
[DATE]
2025-01-18 02:42:46+08:00
[CATEGORIES]
cs.LG
On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
[AUTHORS]
Zichang Ge, Changyu Chen, Arunesh Sinha, Pradeep Varakantham
[ABSTRACT]
In real-world sequential decision making tasks like autonomous driving,
robotics, and healthcare, learning from observed state-action trajectories is
critical for tasks like imitation, classification, and clustering. For example,
self-driving cars must replicate human driving behaviors, while robots and
healthcare systems benefit from modeling decision sequences, whether or not
they come from expert data. Existing trajectory encoding methods often focus on
specific tasks or rely on reward signals, limiting their ability to generalize
across domains and tasks. Inspired by the success of embedding models like CLIP
and BERT in static domains, we propose a novel method for embedding
state-action trajectories into a latent space that captures the skills and
competencies in the dynamic underlying decision-making processes. This method
operates without the need for reward labels, enabling better generalization
across diverse domains and tasks. Our contributions are threefold: (1) We
introduce a trajectory embedding approach that captures multiple abilities from
state-action data. (2) The learned embeddings exhibit strong representational
power across downstream tasks, including imitation, classification, clustering,
and regression. (3) The embeddings demonstrate unique properties, such as
controlling agent behaviors in IQ-Learn and an additive structure in the latent
space. Experimental results confirm that our method outperforms traditional
approaches, offering more flexible and powerful trajectory representations for
various applications. Our code is available at
https://github.com/Erasmo1015/vte.
[COMMENTS]
AAMAS 2025
[LINK]
http://arxiv.org/abs/2501.09327v2
[DATE]
2025-01-18 02:30:04+08:00
[CATEGORIES]
cs.LG
Stochastic gradient descent for streaming linear and rectified linear systems with adversarial corruptions
[AUTHORS]
Halyun Jeong, Deanna Needell, Elizaveta Rebrova
[ABSTRACT]
We propose SGD-exp, a stochastic gradient descent approach for linear and
ReLU regressions under Massart noise (adversarial semi-random corruption model)
for the fully streaming setting. We show novel nearly linear convergence
guarantees of SGD-exp to the true parameter with up to $50\%$ Massart
corruption rate, and with any corruption rate in the case of symmetric
oblivious corruptions. This is the first convergence guarantee result for
robust ReLU regression in the streaming setting, and it shows the improved
convergence rate over previous robust methods for $L_1$ linear regression due
to a choice of an exponentially decaying step size, known for its efficiency in
practice. Our analysis is based on the drift analysis of a discrete stochastic
process, which could also be interesting on its own.
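A minimal sketch of streaming SGD with an exponentially decaying step size, offered only to illustrate the step-size schedule mentioned above. It is not the authors' SGD-exp: the absolute-loss subgradient update, the constants `eta0` and `decay`, and the simple random corruption model (a stand-in for the Massart model) are all assumptions made for this example.

```python
import math
import random


def sgd_exp_decay(stream, dim, eta0=1.0, decay=0.99):
    """One pass of subgradient SGD on |<a, x> - b| with step size eta0 * decay**t."""
    x = [0.0] * dim
    eta = eta0
    for a, b in stream:
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        sign = (residual > 0) - (residual < 0)  # subgradient of the absolute value
        x = [xi - eta * sign * ai for xi, ai in zip(x, a)]
        eta *= decay  # exponential decay of the step size
    return x


def corrupted_stream(x_true, steps, corruption_rate, rng):
    """Yield (a, b) pairs; with probability corruption_rate, b is replaced by noise."""
    for _ in range(steps):
        a = [rng.gauss(0.0, 1.0) for _ in x_true]
        b = sum(ai * xi for ai, xi in zip(a, x_true))
        if rng.random() < corruption_rate:
            b = rng.gauss(0.0, 10.0)  # hypothetical corruption, unrelated to x_true
        yield a, b


rng = random.Random(0)
x_true = [1.0, -2.0]
x_hat = sgd_exp_decay(corrupted_stream(x_true, 2000, 0.2, rng), dim=2)
initial_error = math.dist([0.0, 0.0], x_true)
final_error = math.dist(x_hat, x_true)
```

Each sample is seen once and then discarded, which is what makes the loop a streaming method; the sign-based update bounds the influence of any single corrupted measurement by the current step size.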
[COMMENTS]
Submitted to a journal
[LINK]
http://arxiv.org/abs/2403.01204v2
[DATE]
2025-01-18 02:15:40+08:00
[CATEGORIES]
cs.LG
New Fashion Products Performance Forecasting: A Survey on Evolutions, Models and Emerging Trends
[AUTHORS]
Andrea Avogaro, Luigi Capogrosso, Andrea Toaiari, Franco Fummi, Marco Cristani
[ABSTRACT]
The fast fashion industry’s insatiable demand for new styles and rapid
production cycles has led to a significant environmental burden.
Overproduction, excessive waste, and harmful chemicals have contributed to the
negative environmental impact of the industry. To mitigate these issues, a
paradigm shift that prioritizes sustainability and efficiency is urgently
needed. Integrating learning-based predictive analytics into the fashion
industry represents a significant opportunity to address environmental
challenges and drive sustainable practices. By forecasting fashion trends and
optimizing production, brands can reduce their ecological footprint while
remaining competitive in a rapidly changing market. However, one of the key
challenges in forecasting fashion sales is the dynamic nature of consumer
preferences. Fashion is acyclical, with trends constantly evolving and
resurfacing. In addition, cultural changes and unexpected events can disrupt
established patterns. This problem is also known as New Fashion Products
Performance Forecasting (NFPPF), and it has recently gained more and more
interest in the global research landscape. Given its multidisciplinary nature,
the field of NFPPF has been approached from many different angles. This
comprehensive survey aims to provide an up-to-date overview that focuses on
learning-based NFPPF strategies. The survey is based on the Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow,
allowing for a systematic and complete literature review. In particular, we
propose the first taxonomy that covers the learning panorama for NFPPF,
examining in detail the different methodologies used to increase the amount of
multimodal information, as well as the state-of-the-art available datasets.
Finally, we discuss the challenges and future directions.
[COMMENTS]
Accepted at the Springer Nature Computer Science journal
[LINK]
http://arxiv.org/abs/2501.10324v1
[DATE]
2025-01-18 01:56:27+08:00
[CATEGORIES]
cs.LG
Towards Human-Guided, Data-Centric LLM Co-Pilots
[AUTHORS]
Evgeny Saveliev, Jiashuo Liu, Nabeel Seedat, Anders Boyd, Mihaela van der Schaar
[ABSTRACT]
Machine learning (ML) has the potential to revolutionize healthcare, but its
adoption is often hindered by the disconnect between the needs of domain
experts and translating these needs into robust and valid ML tools. Despite
recent advances in LLM-based co-pilots to democratize ML for non-technical
domain experts, these systems remain predominantly focused on model-centric
aspects while overlooking critical data-centric challenges. This limitation is
problematic in complex real-world settings where raw data often contains
complex issues, such as missing values, label noise, and domain-specific
nuances requiring tailored handling. To address this, we introduce CliMB-DC, a
human-guided, data-centric framework for LLM co-pilots that combines advanced
data-centric tools with LLM-driven reasoning to enable robust, context-aware
data processing. At its core, CliMB-DC introduces a novel, multi-agent
reasoning system that combines a strategic coordinator for dynamic planning and
adaptation with a specialized worker agent for precise execution. Domain
expertise is then systematically incorporated to guide the reasoning process
using a human-in-the-loop approach. To guide development, we formalize a
taxonomy of key data-centric challenges that co-pilots must address.
Thereafter, to address the dimensions of the taxonomy, we integrate
state-of-the-art data-centric tools into an extensible, open-source
architecture, facilitating the addition of new tools from the research
community. Empirically, using real-world healthcare datasets we demonstrate
CliMB-DC’s ability to transform uncurated datasets into ML-ready formats,
significantly outperforming existing co-pilot baselines for handling
data-centric challenges. CliMB-DC promises to empower domain experts from
diverse domains – healthcare, finance, social sciences and more – to actively
participate in driving real-world impact using ML.
[COMMENTS]
Saveliev, Liu & Seedat contributed equally
[LINK]
http://arxiv.org/abs/2501.10321v1
[DATE]
2025-01-18 01:51:22+08:00
[CATEGORIES]
cs.LG
Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark
[AUTHORS]
Shenhao Wang, Baichuan Mo, Yunhan Zheng, Stephane Hess, Jinhua Zhao
[ABSTRACT]
Numerous studies have compared machine learning (ML) and discrete choice
models (DCMs) in predicting travel demand. However, these studies often lack
generalizability as they compare models deterministically without considering
contextual variations. To address this limitation, our study develops an
empirical benchmark by designing a tournament model, thus efficiently
summarizing a large number of experiments, quantifying the randomness in model
comparisons, and using formal statistical tests to differentiate between the
model and contextual effects. This benchmark study compares two large-scale
data sources: a database compiled from a literature review summarizing 136
experiments from 35 studies, and our own experiment data, encompassing a total
of 6,970 experiments from 105 models and 12 model families. This benchmark
study yields two key findings. Firstly, many ML models, particularly the
ensemble methods and deep learning, statistically outperform the DCM family
(i.e., multinomial, nested, and mixed logit models). However, this study also
highlights the crucial role of the contextual factors (i.e., data sources,
inputs and choice categories), which can explain models’ predictive performance
more effectively than the differences in model types alone. Model performance
varies significantly with data sources, improving with larger sample sizes and
lower dimensional alternative sets. After controlling for all the model and
contextual factors, significant randomness still remains, implying inherent
uncertainty in such model comparisons. Overall, we suggest that future
researchers shift more focus from context-specific model comparisons towards
examining model transferability across contexts and characterizing the inherent
uncertainty in ML, thus creating more robust and generalizable next-generation
travel demand models.
[LINK]
http://arxiv.org/abs/2102.01130v2
[DATE]
2025-01-18 01:04:07+08:00
[CATEGORIES]
cs.LG
STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead
[AUTHORS]
Mohammad Mahdavian, Payam Nikdel, Mahdi TaherAhmadi, Mo Chen
[ABSTRACT]
In this paper, we develop a neural network model to predict future human
motion from an observed human motion history. We propose a non-autoregressive
transformer architecture to leverage its parallel nature for easier training
and fast, accurate predictions at test time. The proposed architecture divides
human motion prediction into two parts: 1) the human trajectory, which is the
3D position of the hip joint over time, and 2) the human pose, which is the 3D
positions of all other joints over time with respect to a fixed hip joint. We propose to
make the two predictions simultaneously, as the shared representation can
improve the model performance. Therefore, the model consists of two sets of
encoders and decoders. First, a multi-head attention module applied to encoder
outputs improves human trajectory. Second, another multi-head self-attention
module applied to encoder outputs concatenated with decoder outputs facilitates
learning of temporal dependencies. Our model is well-suited for robotic
applications in terms of test accuracy and speed, and compares favorably with
respect to state-of-the-art methods. We demonstrate the real-world
applicability of our work via the Robot Follow-Ahead task, a challenging yet
practical case study for our proposed model.
[LINK]
http://arxiv.org/abs/2209.07600v4
[DATE]
2025-01-18 00:52:03+08:00
[CATEGORIES]
cs.LG
The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI
[AUTHORS]
Christopher Burger, Charles Walter, Thai Le
[ABSTRACT]
Recent work has investigated the vulnerability of local surrogate methods to
adversarial perturbations on a machine learning (ML) model’s inputs, where the
explanation is manipulated while the meaning and structure of the original
input remain similar under the complex model. Although weaknesses across many
methods have been shown to exist, the reasons why remain little
explored. Central to the concept of adversarial attacks on explainable AI (XAI)
is the similarity measure used to calculate how one explanation differs from
another. A poor choice of similarity measure can lead to erroneous conclusions
on the efficacy of an XAI method. Too sensitive a measure results in
exaggerated vulnerability, while too coarse a measure understates its weakness. We
investigate a variety of similarity measures designed for text-based ranked
lists, including Kendall’s Tau, Spearman’s Footrule, and Rank-biased Overlap to
determine how substantial changes in the type of measure or threshold of
success affect the conclusions generated from common adversarial attack
processes. Certain measures are found to be overly sensitive, resulting in
erroneous estimates of stability.
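A minimal sketch of two of the measures named above, Kendall's Tau and Spearman's Footrule, assuming both ranked lists contain the same items with no ties. The feature lists are invented for illustration; Rank-biased Overlap and the attack procedure itself are not shown.

```python
from itertools import combinations


def positions(ranking):
    """Map each item to its 0-based rank."""
    return {item: index for index, item in enumerate(ranking)}


def kendall_tau(rank_a, rank_b):
    """Kendall's Tau in [-1, 1]: 1 minus twice the share of discordant pairs."""
    pos_a, pos_b = positions(rank_a), positions(rank_b)
    pairs = list(combinations(rank_a, 2))
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return 1.0 - 2.0 * discordant / len(pairs)


def spearman_footrule(rank_a, rank_b):
    """Spearman's Footrule: total absolute displacement of items between rankings."""
    pos_a, pos_b = positions(rank_a), positions(rank_b)
    return sum(abs(pos_a[item] - pos_b[item]) for item in rank_a)


# Hypothetical ranked feature lists from two explanations of similar inputs.
original = ["plot", "acting", "boring", "ending"]
swapped_top = ["acting", "plot", "boring", "ending"]
reversed_list = original[::-1]

tau_swap = kendall_tau(original, swapped_top)
footrule_swap = spearman_footrule(original, swapped_top)
tau_reversed = kendall_tau(original, reversed_list)
footrule_reversed = spearman_footrule(original, reversed_list)
```

On these four-item lists, swapping only the top two features already lowers Kendall's Tau from 1 to about 0.67, which shows how strongly the verdict on an attack's success can depend on the chosen measure and threshold.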
[COMMENTS]
11 pages, 8 Tables (Minor edits for clarity and grammar)
[LINK]
http://arxiv.org/abs/2406.15839v2
[DATE]
2025-01-18 00:49:25+08:00
[CATEGORIES]
cs.LG
Generalized Multi-hop Traffic Pressure for Heterogeneous Traffic Perimeter Control
[AUTHORS]
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov, Scott Sanner, Baher Abdulhai
[ABSTRACT]
Perimeter control (PC) prevents loss of traffic network capacity due to
congestion in urban areas. Homogeneous PC allows all access points to a
protected region to have identical permitted inflow. However, homogeneous PC
performs poorly when the congestion in the protected region is heterogeneous
(e.g., imbalanced demand) since the homogeneous PC does not consider specific
traffic conditions around each perimeter intersection. When the protected
region has spatially heterogeneous congestion, one needs to modulate the
perimeter inflow rate to be higher near low-density regions and vice versa for
high-density regions. A naive approach is to leverage 1-hop traffic pressure
to measure traffic conditions around perimeter intersections, but such a metric is
too spatially myopic for PC. To address this issue, we formulate multi-hop
downstream pressure grounded on Markov chain theory, which “looks deeper”
into the protected region beyond perimeter intersections. In addition, we
formulate a two-stage hierarchical control scheme that can leverage this novel
multi-hop pressure to redistribute the total permitted inflow provided by a
pre-trained deep reinforcement learning homogeneous control policy.
Experimental results show that our heterogeneous PC approaches leveraging
multi-hop pressure significantly outperform homogeneous PC in scenarios where
the origin-destination flows are highly imbalanced with high spatial
heterogeneity. Moreover, our approach is shown to be robust against turning
ratio uncertainties by a sensitivity analysis.
[COMMENTS]
11 pages main body, 13 figures, journal paper
[LINK]
http://arxiv.org/abs/2409.00753v2
[DATE]
2025-01-18 00:37:23+08:00
[CATEGORIES]
cs.LG
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
[AUTHORS]
Ishank Juneja, Carlee Joe-Wong, Osman Yağan
[ABSTRACT]
Multi-armed bandits (MAB) are commonly used in sequential online
decision-making when the reward of each decision is an unknown random variable.
In practice, however, the typical goal of maximizing total reward may be less
important than minimizing the total cost of the decisions taken, subject to a
reward constraint. For example, we may seek to make decisions that have at
least the reward of a reference “default” decision, with as low a cost as
possible. This problem was recently introduced in the Multi-Armed Bandits with
Cost Subsidy (MAB-CS) framework. MAB-CS is broadly applicable to problem
domains where a primary metric (cost) is constrained by a secondary metric
(reward), and the rewards are unknown. In our work, we address variants of
MAB-CS including ones with reward constrained by the reward of a known
reference arm or by the subsidized best reward. We introduce the
Pairwise-Elimination (PE) algorithm for the known reference arm variant and
generalize PE to PE-CS for the subsidized best reward variant. Our
instance-dependent analysis of PE and PE-CS reveals that both algorithms have
an order-wise logarithmic upper bound on Cost and Quality Regret, making our
policies the first with such a guarantee. Moreover, by comparing our upper and
lower bound results we establish that PE is order-optimal for all known
reference arm problem instances. Finally, experiments are conducted using the
MovieLens 25M and Goodreads datasets for both PE and PE-CS revealing the
effectiveness of PE and the superior balance between performance and
reliability offered by PE-CS compared to baselines from the literature.
[LINK]
http://arxiv.org/abs/2501.10290v1
[DATE]
2025-01-18 00:34:45+08:00
[CATEGORIES]
cs.LG
Counterfactual Uncertainty Quantification of Factual Estimand of Efficacy from Before-and-After Treatment Repeated Measures Randomized Controlled Trials
[AUTHORS]
Xingya Wang, Yang Han, Yushi Liu, Szu-Yu Tang, Jason C. Hsu
[ABSTRACT]
The ideal estimand for comparing treatment $Rx$ with a control $C$ is the
$\textit{counterfactual}$ efficacy $Rx:C$, the expected differential outcome
between $Rx$ and $C$ if each patient were given $\textit{both}$. One hundred
years ago, Neyman (1923a) proved unbiased $\textit{point estimation}$ of
counterfactual efficacy from designed $\textit{factual}$ experiments is
achievable. But he left open the question of how much smaller the
counterfactual variance of this estimate might be than the factual variance.
This article shows $\textit{counterfactual}$ uncertainty
quantification (CUQ), quantifying uncertainty for factual point estimates but
in a counterfactual setting, is achievable for Randomized Controlled Trials
(RCTs) with Before-and-After treatment Repeated Measures which are common in
many therapeutic areas. We achieve CUQ whose variability is typically smaller
than factual UQ by creating a new statistical modeling principle called ETZ.
We urge caution in using predictors with measurement error, which violates a
standard regression assumption and can cause $\textit{attenuation}$ in
estimating treatment effects. Fortunately, we prove that, for traditional
medicine in general, and for targeted therapy with efficacy defined as averaged
over the population, counterfactual point estimation is unbiased. However, for
both Real Human and Digital Twins approaches, predicting treatment effect in
$\textit{subgroups}$ may have attenuation bias.
[LINK]
http://arxiv.org/abs/2411.09635v3
[DATE]
2025-01-18 00:11:42+08:00
[CATEGORIES]
cs.LG
Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities
[AUTHORS]
Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li
[ABSTRACT]
Language has long been conceived as an essential tool for human reasoning.
The breakthrough of Large Language Models (LLMs) has sparked significant
research interest in leveraging these models to tackle complex reasoning tasks.
Researchers have moved beyond simple autoregressive token generation by
introducing the concept of “thought” – a sequence of tokens representing
intermediate steps in the reasoning process. This innovative paradigm enables
LLMs to mimic complex human reasoning processes, such as tree search and
reflective thinking. Recently, an emerging trend of learning to reason has
applied reinforcement learning (RL) to train LLMs to master reasoning
processes. This approach enables the automatic generation of high-quality
reasoning trajectories through trial-and-error search algorithms, significantly
expanding LLMs’ reasoning capacity by providing substantially more training
data. Furthermore, recent studies demonstrate that encouraging LLMs to “think”
with more tokens during test-time inference can further significantly boost
reasoning accuracy. Therefore, train-time and test-time scaling combine to
show a new research frontier – a path toward Large Reasoning Models. The
introduction of OpenAI’s o1 series marks a significant milestone in this
research direction. In this survey, we present a comprehensive review of recent
progress in LLM reasoning. We begin by introducing the foundational background
of LLMs and then explore the key technical components driving the development
of large reasoning models, with a focus on automated data construction,
learning-to-reason techniques, and test-time scaling. We also analyze popular
open-source projects for building large reasoning models, and conclude with open
challenges and future research directions.
[COMMENTS]
36 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.09686v2
[DATE]
2025-01-17 23:24:53+08:00
[CATEGORIES]
cs.CL
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
[AUTHORS]
Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin
[ABSTRACT]
The kNN-CTC model has proven to be effective for monolingual automatic speech
recognition (ASR). However, its direct application to multilingual scenarios
like code-switching presents challenges. Although there is potential for
performance improvement, a kNN-CTC model utilizing a single bilingual datastore
can inadvertently introduce undesirable noise from the alternative language. To
address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR)
framework that employs dual monolingual datastores and a gated datastore
selection mechanism to reduce noise interference. Our method selects the
appropriate datastore for decoding each frame, ensuring the injection of
language-specific information into the ASR process. We apply this framework to
cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive
experiments demonstrate the remarkable effectiveness of our gated datastore
mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR.
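
A minimal sketch of per-frame gated datastore selection may make the idea concrete. Everything below is an assumption for illustration, not the authors' implementation: the gate is given as a per-frame probability that the frame is Chinese, the gate is hard-thresholded at 0.5, each datastore is a list of (key vector, label) pairs, and the kNN and CTC distributions are mixed with a hypothetical weight `lam`.

```python
import math
from collections import Counter

def knn_distribution(query, datastore, k=2):
    """Label distribution over the k nearest keys (Euclidean distance)."""
    nearest = sorted(datastore, key=lambda kv: math.dist(query, kv[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: c / len(nearest) for label, c in counts.items()}

def gated_knn_decode(frames, gate_probs, zh_store, en_store, ctc_probs, lam=0.5):
    """Pick one monolingual datastore per frame, then mix kNN and CTC scores."""
    output = []
    for query, p_zh, ctc in zip(frames, gate_probs, ctc_probs):
        # Hard gate (assumed): use the Chinese store when p_zh >= 0.5.
        store = zh_store if p_zh >= 0.5 else en_store
        knn = knn_distribution(query, store)
        labels = set(ctc) | set(knn)
        mixed = {l: lam * knn.get(l, 0.0) + (1 - lam) * ctc.get(l, 0.0)
                 for l in labels}
        output.append(max(mixed, key=mixed.get))
    return output

# Toy data: two-dimensional "frame features" and placeholder token labels.
zh_store = [((0.0, 0.0), "zh_a"), ((0.1, 0.0), "zh_a"), ((1.0, 1.0), "zh_b")]
en_store = [((5.0, 5.0), "en_a"), ((5.1, 5.0), "en_a"), ((6.0, 6.0), "en_b")]
frames = [(0.05, 0.0), (5.05, 5.0)]
gate_probs = [0.9, 0.1]
ctc_probs = [{"zh_a": 0.4, "en_a": 0.6}, {"en_a": 0.5, "zh_a": 0.5}]
decoded = gated_knn_decode(frames, gate_probs, zh_store, en_store, ctc_probs)
```

In this toy run the first frame is routed to the Chinese store and the second to the English store, so neither frame retrieves neighbors from the other language.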
[COMMENTS]
Accepted by ICASSP 2025
[LINK]
http://arxiv.org/abs/2406.03814v5
[DATE]
2025-01-17 23:02:08+08:00
[CATEGORIES]
cs.CL
Optimal Quantization for Matrix Multiplication
[AUTHORS]
Or Ordentlich, Yury Polyanskiy
[ABSTRACT]
Recent work in the machine learning community has proposed multiple methods for
performing lossy compression (quantization) of large matrices. This
quantization is important for accelerating matrix multiplication (the main
component of large language models), which is often bottlenecked by the speed
of loading these matrices from memory. Unlike classical vector quantization and
rate-distortion theory, the goal of these new compression algorithms is to be
able to approximate not the matrices themselves, but their matrix product.
Specifically, given a pair of real matrices $A,B$ an encoder (compressor) is
applied to each of them independently producing descriptions with $R$ bits per
entry. These representations subsequently are used by the decoder to estimate
matrix product $A^\top B$. In this work, we provide a non-asymptotic lower
bound on the mean squared error of this approximation (as a function of rate
$R$) for the case of matrices $A,B$ with iid Gaussian entries. Algorithmically,
we construct a universal quantizer based on nested lattices with an explicit
guarantee of approximation error for any (non-random) pair of matrices $A$, $B$
in terms of only Frobenius norms $\|\bar{A}\|_F, \|\bar{B}\|_F$ and
$\|\bar{A}^\top \bar{B}\|_F$, where $\bar{A},\bar{B}$ are versions of $A,B$
with zero-centered columns, respectively. For iid Gaussian matrices our
quantizer achieves the lower bound and is, thus, asymptotically optimal. A
practical low-complexity version of our quantizer achieves performance quite
close to optimal. In addition, we derive the rate-distortion function for matrix
multiplication of iid Gaussian matrices, which exhibits an interesting
phase-transition at $R\approx 0.906$ bit/entry.
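
To illustrate the problem setup only, the sketch below quantizes $A$ and $B$ independently at $R$ bits per entry and estimates $A^\top B$ from the quantized copies. It assumes a plain uniform scalar quantizer on a fixed range $[-4, 4]$, which is a stand-in and not the nested-lattice quantizer of the paper; all function names are hypothetical.

```python
import random

def quantize(matrix, bits, lo=-4.0, hi=4.0):
    """Uniform scalar quantizer: map each entry to one of 2**bits cell midpoints."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    def q(x):
        idx = min(levels - 1, max(0, int((x - lo) / step)))  # clip to range
        return lo + (idx + 0.5) * step
    return [[q(x) for x in row] for row in matrix]

def transpose_product(a, b):
    """Compute A^T B for A (n x p) and B (n x m), both given as lists of rows."""
    n, p, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) for j in range(m)]
            for i in range(p)]

def mse(x, y):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(u - v) ** 2 for rx, ry in zip(x, y) for u, v in zip(rx, ry)]
    return sum(diffs) / len(diffs)

rng = random.Random(0)
A = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(64)]
B = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(64)]
exact = transpose_product(A, B)

# Error of the product estimate as a function of the rate R (bits per entry).
errors = {R: mse(exact, transpose_product(quantize(A, R), quantize(B, R)))
          for R in (2, 4, 8)}
```

The error of the product estimate shrinks as $R$ grows; the paper's contribution is a lower bound on this error and a quantizer that attains it, neither of which this scalar baseline attempts.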
[LINK]
http://arxiv.org/abs/2410.13780v2
[DATE]
2025-01-17 22:26:37+08:00
[CATEGORIES]
cs.CL
cs.LG
DPCL-Diff: The Temporal Knowledge Graph Reasoning Based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning
[AUTHORS]
Yukun Cao, Lisheng Wang, Luobin Huang
[ABSTRACT]
Temporal knowledge graph (TKG) reasoning that infers future missing facts is
an essential and challenging task. Predicting future events typically relies on
closely related historical facts, yielding more accurate results for repetitive
or periodic events. However, for future events with sparse historical
interactions, the effectiveness of this method, which focuses on leveraging
high-frequency historical information, diminishes. Recently, the capabilities
of diffusion models in image generation have opened new opportunities for TKG
reasoning. Therefore, we propose a graph node diffusion model with dual-domain
periodic contrastive learning (DPCL-Diff). Graph node diffusion model (GNDiff)
introduces noise into sparsely related events to simulate new events,
generating high-quality data that better conforms to the actual distribution.
This generative mechanism significantly enhances the model’s ability to reason
about new events. Additionally, the dual-domain periodic contrastive learning
(DPCL) maps periodic and non-periodic event entities to Poincaré and
Euclidean spaces, leveraging their characteristics to distinguish similar
periodic events effectively. Experimental results on four public datasets
demonstrate that DPCL-Diff significantly outperforms state-of-the-art TKG
models in event prediction, demonstrating our approach’s effectiveness. This
study also investigates the combined effectiveness of GNDiff and DPCL in TKG
tasks.
[COMMENTS]
11 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.01477v2
[DATE]
2025-01-17 22:10:15+08:00
[CATEGORIES]
cs.LG
cs.CL
Jailbreaking as a Reward Misspecification Problem
[AUTHORS]
Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong
[ABSTRACT]
The widespread adoption of large language models (LLMs) has raised concerns
about their safety and reliability, particularly regarding their vulnerability
to adversarial attacks. In this paper, we propose a novel perspective that
attributes this vulnerability to reward misspecification during the alignment
process. This misspecification occurs when the reward function fails to
accurately capture the intended behavior, leading to misaligned model outputs.
We introduce a metric ReGap to quantify the extent of reward misspecification
and demonstrate its effectiveness and robustness in detecting harmful backdoor
prompts. Building upon these insights, we present ReMiss, a system for
automated red teaming that generates adversarial prompts in a
reward-misspecified space. ReMiss achieves state-of-the-art attack success
rates on the AdvBench benchmark against various target aligned LLMs while
preserving the human readability of the generated prompts. Furthermore, these
attacks on open-source models demonstrate high transferability to closed-source
models like GPT-4o and out-of-distribution tasks from HarmBench. Detailed
analysis highlights the unique advantages of the proposed reward
misspecification objective compared to previous methods, offering new insights
for improving LLM safety and robustness.
[LINK]
http://arxiv.org/abs/2406.14393v4
[DATE]
2025-01-17 21:56:50+08:00
[CATEGORIES]
cs.LG
cs.CL
Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence
[AUTHORS]
Philipp Kuehn, Dilara Nadermahmoodi, Markus Bayer, Christian Reuter
[COMMENTS]
6 pages, 1 figure, 3 tables
[LINK]
http://arxiv.org/abs/2304.11960v3
[DATE]
2025-01-17 21:34:49+08:00
[CATEGORIES]
cs.CL
cs.LG
How Redundant Is the Transformer Stack in Speech Representation Models?
[AUTHORS]
Teresa Dorszewski, Albert Kjøller Jacobsen, Lenka Tětková, Lars Kai Hansen
[ABSTRACT]
Self-supervised speech representation models, particularly those leveraging
transformer architectures, have demonstrated remarkable performance across
various tasks such as speech recognition, speaker identification, and emotion
detection. Recent studies on transformer models revealed a high redundancy
between layers and the potential for significant pruning, which we will
investigate here for transformer-based speech representation models. We perform
a detailed analysis of layer similarity in speech representation models using
three similarity metrics: cosine similarity, centered kernel alignment, and
mutual nearest-neighbor alignment. Our findings reveal a block-like structure
of high similarity, suggesting two main processing steps and significant
redundancy of layers. We demonstrate the effectiveness of pruning
transformer-based speech representation models without the need for
post-training, achieving up to 40% reduction in transformer layers while
maintaining over 95% of the model’s predictive capacity. Furthermore, we employ
a knowledge distillation method to substitute the entire transformer stack with
mimicking layers, reducing the network size 95-98% and the inference time by up
to 94%. This substantial decrease in computational load occurs without
considerable performance loss, suggesting that the transformer stack is almost
completely redundant for downstream applications of speech representation
models.
[COMMENTS]
To appear at ICASSP 2025 (excluding appendix)
[LINK]
http://arxiv.org/abs/2409.16302v2
[DATE]
2025-01-17 20:27:40+08:00
[CATEGORIES]
cs.CL
cs.LG
Dual Debiasing: Remove Stereotypes and Keep Factual Gender for Fair Language Modeling and Translation
[AUTHORS]
Tomasz Limisiewicz, David Mareček, Tomáš Musil
[ABSTRACT]
Mitigation of biases, such as language models’ reliance on gender
stereotypes, is a crucial endeavor required for the creation of reliable and
useful language technology. The crucial aspect of debiasing is to ensure that
the models preserve their versatile capabilities, including their ability to
solve language tasks and equitably represent various genders. To address this
issue, we introduce a streamlined Dual Debiasing Algorithm through Model
Adaptation (2DAMA). Novel Dual Debiasing enables robust reduction of
stereotypical bias while preserving desired factual gender information encoded
by language models. We show that 2DAMA effectively reduces gender bias in
English and is one of the first approaches facilitating the mitigation of
stereotypical tendencies in translation. The proposed method’s key advantage is
the preservation of factual gender cues, which are useful in a wide range of
natural language processing tasks.
[LINK]
http://arxiv.org/abs/2501.10150v1
[DATE]
2025-01-17 20:23:30+08:00
[CATEGORIES]
cs.CL
Piece of Table: A Divide-and-Conquer Approach for Selecting Sub-Tables in Table Question Answering
[AUTHORS]
Wonjin Lee, Kyumin Kim, Sungjae Lee, Jihun Lee, Kwang In Kim
[ABSTRACT]
Applying language models (LMs) to tables is challenging due to the inherent
structural differences between two-dimensional tables and one-dimensional text
for which the LMs were originally designed. Furthermore, when applying
linearized tables to LMs, the maximum token lengths often imposed in
self-attention calculations make it difficult to comprehensively understand the
context spread across large tables. To address these challenges, we present
PieTa (Piece of Table), a new framework for sub-table-based question answering
(QA). PieTa operates through an iterative process of dividing tables into
smaller windows, using LMs to select relevant cells within each window, and
merging these cells into a sub-table. This multi-resolution approach captures
dependencies across multiple rows and columns while avoiding the limitations
caused by long context inputs. Instantiated as a simple iterative sub-table
union algorithm, PieTa demonstrates improved performance over previous
sub-table-based QA approaches.
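
The divide, select, and merge steps described above can be sketched as a single pass. This is a hedged illustration: `select_cells` is a hypothetical keyword-matching stand-in for the LM-based selector, the window size is arbitrary, and merging is assumed to keep the union of the selected cells' rows and columns.

```python
def windows(n_rows, n_cols, h, w):
    """Yield (row_indices, col_indices) tiles covering an n_rows x n_cols table."""
    for r in range(0, n_rows, h):
        for c in range(0, n_cols, w):
            yield range(r, min(r + h, n_rows)), range(c, min(c + w, n_cols))

def select_cells(table, rows, cols, question):
    """Hypothetical selector: keep cells whose text occurs in the question."""
    q = question.lower()
    return {(r, c) for r in rows for c in cols if str(table[r][c]).lower() in q}

def sub_table(table, question, h=2, w=2):
    """One pass: select cells per window, then keep the union of rows and columns."""
    picked = set()
    for rows, cols in windows(len(table), len(table[0]), h, w):
        picked |= select_cells(table, rows, cols, question)
    keep_rows = sorted({r for r, _ in picked})
    keep_cols = sorted({c for _, c in picked})
    return [[table[r][c] for c in keep_cols] for r in keep_rows]

table = [
    ["city", "country", "population", "area"],
    ["Paris", "France", 2.1, 105],
    ["Rome", "Italy", 2.8, 1285],
    ["Oslo", "Norway", 0.7, 454],
]
result = sub_table(table, "What is the population of Rome?")
```

Here the header cell "population" and the row cell "Rome" are picked in different windows, and their union yields a 2 x 2 sub-table that contains the answer cell. An iterative variant would call `sub_table` again on its own output until the table stops shrinking.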
[LINK]
http://arxiv.org/abs/2412.07629v3
[DATE]
2025-01-17 19:37:04+08:00
[CATEGORIES]
cs.CL
Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO
[AUTHORS]
Umer Butt, Stalin Veranasi, Günter Neumann
[ABSTRACT]
As the Information Retrieval (IR) field increasingly recognizes the
importance of inclusivity, addressing the needs of low-resource languages
remains a significant challenge. This paper introduces the first large-scale
Urdu IR dataset, created by translating the MS MARCO dataset through machine
translation. We establish baseline results through zero-shot learning for IR in
Urdu and subsequently apply the mMARCO multilingual IR methodology to this
newly translated dataset. Our findings demonstrate that the fine-tuned model
(Urdu-mT5-mMARCO) achieves a Mean Reciprocal Rank (MRR@10) of 0.247 and a
Recall@10 of 0.439, representing significant improvements over zero-shot
results and showing the potential for expanding IR access for Urdu speakers. By
bridging access gaps for speakers of low-resource languages, this work not only
advances multilingual IR research but also emphasizes the ethical and societal
importance of inclusive IR technologies. This work provides valuable insights
into the challenges and solutions for improving language representation and
lays the groundwork for future research, especially in South Asian languages,
which can benefit from the adaptable methods used in this study.
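
For readers unfamiliar with the two reported metrics, the following sketch computes MRR@10 and Recall@10 on toy data. The function names and the query/document identifiers are hypothetical, and the exact recall convention used in the paper's evaluation scripts may differ from the per-query fraction assumed here.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean over queries of 1/rank of the first relevant document in the top k."""
    total = 0.0
    for qid, ranked in rankings.items():
        for rank, doc in enumerate(ranked[:k], start=1):
            if doc in relevant[qid]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)

def recall_at_k(rankings, relevant, k=10):
    """Mean over queries of the fraction of relevant documents found in the top k."""
    total = 0.0
    for qid, ranked in rankings.items():
        hits = len(set(ranked[:k]) & relevant[qid])
        total += hits / len(relevant[qid])
    return total / len(rankings)

rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"], "q3": ["d5", "d6"]}
relevant = {"q1": {"d1"}, "q2": {"d2", "d8"}, "q3": {"d0"}}

mrr = mrr_at_k(rankings, relevant)        # (1/2 + 1 + 0) / 3
recall = recall_at_k(rankings, relevant)  # (1 + 1/2 + 0) / 3
```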
[COMMENTS]
7 pages, ECIR 2025, conference camera-ready version
[LINK]
http://arxiv.org/abs/2412.12997v2
[DATE]
2025-01-17 18:02:38+08:00
[CATEGORIES]
cs.CL
Author-Specific Linguistic Patterns Unveiled: A Deep Learning Study on Word Class Distributions
[AUTHORS]
Patrick Krauss, Achim Schilling
[ABSTRACT]
Deep learning methods have been increasingly applied to computational
linguistics to uncover patterns in text data. This study investigates
author-specific word class distributions using part-of-speech (POS) tagging and
bigram analysis. By leveraging deep neural networks, we classify literary
authors based on POS tag vectors and bigram frequency matrices derived from
their works. We employ fully connected and convolutional neural network
architectures to explore the efficacy of unigram and bigram-based
representations. Our results demonstrate that while unigram features achieve
moderate classification accuracy, bigram-based models significantly improve
performance, suggesting that sequential word class patterns are more
distinctive of authorial style. Multi-dimensional scaling (MDS) visualizations
reveal meaningful clustering of authors’ works, supporting the hypothesis that
stylistic nuances can be captured through computational methods. These findings
highlight the potential of deep learning and linguistic feature analysis for
author profiling and literary studies.
[LINK]
http://arxiv.org/abs/2501.10072v1
[DATE]
2025-01-17 17:43:49+08:00
[CATEGORIES]
cs.CL
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
[AUTHORS]
Thibaut Thonet, Jos Rozen, Laurent Besacier
[COMMENTS]
Published in COLING 2025
[LINK]
http://arxiv.org/abs/2403.20262v3
[DATE]
2025-01-17 17:32:54+08:00
[CATEGORIES]
cs.CL
cs.LG
Structured Packing in LLM Training Improves Long Context Utilization
[AUTHORS]
Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś
[ABSTRACT]
Recent advancements in long-context large language models have attracted
significant attention, yet their practical applications often suffer from
suboptimal context utilization. This study investigates structuring training
data to enhance semantic interdependence, demonstrating that this approach
effectively improves context utilization. To this end, we introduce the
Structured Packing for Long Context (SPLiCe) method, which utilizes retrieval
to collate mutually relevant documents into long and coherent training
examples. We validate SPLiCe empirically across models of varying sizes – 3B,
7B, and 13B – achieving improved performance in long-context tasks, such as
Qasper and HotpotQA. Remarkably, even brief fine-tuning with SPLiCe is
sufficient to realize these benefits. Additionally, SPLiCe effectively
mitigates the lost-in-middle phenomenon often observed in large models. Our
comprehensive analysis of SPLiCe explores its design choices and reveals
intriguing transfer effects; for instance, training on programming code
enhances performance on natural language tasks.
[COMMENTS]
AAAI’25
[LINK]
http://arxiv.org/abs/2312.17296v8
[DATE]
2025-01-17 17:28:45+08:00
[CATEGORIES]
cs.CL
OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning
[AUTHORS]
Jinyuan Feng, Zhiqiang Pu, Tianyi Hu, Dongmin Li, Xiaolin Ai, Huimu Wang
[ABSTRACT]
Building mixture-of-experts (MoE) architecture for Low-rank adaptation (LoRA)
is emerging as a potential direction in parameter-efficient fine-tuning (PEFT)
for its modular design and remarkable performance. However, simply stacking the
number of experts cannot guarantee significant improvement. In this work, we
first conduct qualitative analysis to indicate that experts collapse to similar
representations in vanilla MoE, limiting the capacity of modular design and
computational efficiency. Furthermore, our analysis reveals that the performance
of previous MoE variants may be limited by a lack of diversity among experts.
Motivated by these findings, we propose Orthogonal Mixture-of-Experts (OMoE), a
resource-efficient MoE variant that trains experts in an orthogonal manner to
promote diversity. In OMoE, a Gram-Schmidt process is leveraged to enforce that
the experts’ representations lie within the Stiefel manifold. By applying
orthogonal constraints directly to the architecture, OMoE keeps the learning
objective unchanged, without compromising optimality. Our method is simple and
alleviates memory bottlenecks, as it requires fewer experts than vanilla
MoE models. Experiments on diverse commonsense reasoning benchmarks demonstrate
that OMoE can consistently achieve stable and efficient performance improvement
when compared with the state-of-the-art methods while significantly reducing
the number of required experts.
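
As a reminder of what the Gram-Schmidt step produces, the sketch below orthonormalizes a small set of vectors standing in for expert representations. It is a generic (modified) Gram-Schmidt routine under the assumption that each representation is a plain vector; how OMoE applies the process inside LoRA experts is not specified in the abstract.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning the input vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)                      # component of w along b
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:                             # skip linearly dependent inputs
            basis.append([wi / norm for wi in w])
    return basis

# Three toy "expert representations" that overlap heavily before orthogonalization.
experts = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ortho = gram_schmidt(experts)
```

The rows of `ortho` have unit norm and are mutually orthogonal, which is the defining property of a point on the Stiefel manifold.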
[LINK]
http://arxiv.org/abs/2501.10062v1
[DATE]
2025-01-17 17:27:08+08:00
[CATEGORIES]
cs.LG
cs.CL
Can linguists better understand DNA?
[AUTHORS]
Wang Liang
[ABSTRACT]
Multilingual transfer ability, which reflects how well models fine-tuned on
one source language can be applied to other languages, has been well studied in
multilingual pre-trained models. However, the existence of such capability
transfer between natural language and gene sequences/languages remains
underexplored. This study addresses this gap by drawing inspiration from the
sentence-pair classification task used for evaluating sentence similarity in
natural language. We constructed two analogous tasks: DNA-pair
classification (DNA sequence similarity) and DNA-protein-pair
classification (gene coding determination). These tasks were designed to
validate the transferability of capabilities from natural language to gene
sequences. Even a small-scale pre-trained model like GPT-2-small, which was
pre-trained on English, achieved an accuracy of 78% on the DNA-pair
classification task after being fine-tuned on English sentence-pair
classification data (XTREME PAWS-X). While training a BERT model on multilingual
text, the precision reached 89%. On the more complex DNA-protein-pair
classification task, however, the model’s output was barely distinguishable
from random output. Experimental validation has confirmed that the transfer of
capabilities from natural language to biological language is unequivocally
present. Building on this foundation, we have also investigated the impact of
model parameter scale and pre-training on this capability transfer. We provide
recommendations for facilitating the transfer of capabilities from natural
language to genetic language, as well as new approaches for conducting
biological research based on this capability. This study offers an intriguing
new perspective on exploring the relationship between natural language and
genetic language.
[COMMENTS]
20 pages, 8 figures
[LINK]
http://arxiv.org/abs/2412.07678v3
[DATE]
2025-01-17 16:54:50+08:00
[CATEGORIES]
cs.CL
RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation
[AUTHORS]
Yuefan Cao, Chengyue Gong, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
[ABSTRACT]
Text-to-video generation models have made impressive progress, but they still
struggle with generating videos with complex features. This limitation often
arises from the inability of the text encoder to produce accurate embeddings,
which hinders the video generation model. In this work, we propose a novel
approach to overcome this challenge by selecting the optimal text embedding
through interpolation in the embedding space. We demonstrate that this method
enables the video generation model to produce the desired videos. Additionally,
we introduce a simple algorithm using perpendicular foot embeddings and cosine
similarity to identify the optimal interpolation embedding. Our findings
highlight the importance of accurate text embeddings and offer a pathway for
improving text-to-video generation performance.
[LINK]
http://arxiv.org/abs/2501.09982v1
[DATE]
2025-01-17 14:46:10+08:00
[CATEGORIES]
cs.CL
cs.LG
LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration
[AUTHORS]
Yukun Cao, Zengyi Gao, Zhiyang Li, Xike Xie, Kevin Zhou, Jianliang Xu
[LINK]
http://arxiv.org/abs/2411.05844v2
[DATE]
2025-01-17 13:33:54+08:00
[CATEGORIES]
cs.CL
A Survey on Multi-Turn Interaction Capabilities of Large Language Models
[AUTHORS]
Chen Zhang, Xinyi Dai, Yaxiong Wu, Qu Yang, Yasheng Wang, Ruiming Tang, Yong Liu
[ABSTRACT]
Multi-turn interaction in the dialogue system research refers to a system’s
ability to maintain context across multiple dialogue turns, enabling it to
generate coherent and contextually relevant responses. Recent advancements in
large language models (LLMs) have significantly expanded the scope of
multi-turn interaction, moving beyond chatbots to enable more dynamic agentic
interactions with users or environments. In this paper, we provide a focused
review of the multi-turn capabilities of LLMs, which are critical for a wide
range of downstream applications, including conversational search and
recommendation, consultation services, and interactive tutoring. This survey
explores four key aspects: (1) the core model capabilities that contribute to
effective multi-turn interaction, (2) how multi-turn interaction is evaluated
in current practice, (3) the general algorithms used to enhance multi-turn
interaction, and (4) potential future directions for research in this field.
[COMMENTS]
Draft Version, 14 pages, Ongoing refinement over time
[LINK]
http://arxiv.org/abs/2501.09959v1
[DATE]
2025-01-17 13:21:49+08:00
[CATEGORIES]
cs.CL
FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs
[AUTHORS]
Zengyi Gao, Yukun Cao, Hairu Wang, Ao Ke, Yuan Feng, Xike Xie, S Kevin Zhou
[ABSTRACT]
To mitigate the hallucination and knowledge deficiency in large language
models (LLMs), Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG)
has shown promising potential by utilizing KGs as an external resource to enhance
LLM reasoning. However, existing KG-RAG approaches struggle with a trade-off
between flexibility and retrieval quality. Modular methods prioritize
flexibility by avoiding the use of KG-fine-tuned models during retrieval,
leading to fixed retrieval strategies and suboptimal retrieval
quality. Conversely, coupled methods embed KG information within models to
improve retrieval quality, but at the expense of flexibility. In this paper, we
propose a novel flexible modular KG-RAG framework, termed FRAG, which
synergizes the advantages of both approaches. FRAG estimates the hop range of
reasoning paths based solely on the query and classifies it as either simple or
complex. To match the complexity of the query, tailored pipelines are applied to
ensure efficient and accurate reasoning path retrieval, thus fostering the
final reasoning process. By using the query text instead of the KG to infer the
structural information of reasoning paths and employing adaptable retrieval
strategies, FRAG improves retrieval quality while maintaining
flexibility. Moreover, FRAG does not require extra LLM fine-tuning or calls,
significantly boosting efficiency and conserving resources. Extensive
experiments show that FRAG achieves state-of-the-art performance with high
efficiency and low resource consumption.
[LINK]
http://arxiv.org/abs/2501.09957v1
[DATE]
2025-01-17 13:19:14+08:00
[CATEGORIES]
cs.CL
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
[AUTHORS]
Chen Cai, Zheng Wang, Jianjun Gao, Wenyang Liu, Ye Lu, Runzhong Zhang, Kim-Hui Yap
[ABSTRACT]
In recent years, the rapid increase in online video content has underscored
the limitations of static Video Question Answering (VideoQA) models trained on
fixed datasets, as they struggle to adapt to new questions or tasks posed by
newly available content. In this paper, we explore the novel challenge of
VideoQA within a continual learning framework, and empirically identify a
critical issue: fine-tuning a large language model (LLM) for a sequence of
tasks often results in catastrophic forgetting. To address this, we propose
Collaborative Prompting (ColPro), which integrates specific question constraint
prompting, knowledge acquisition prompting, and visual temporal awareness
prompting. These prompts aim to capture textual question context, visual
content, and video temporal dynamics in VideoQA, a perspective underexplored in
prior research. Experimental results on the NExT-QA and DramaQA datasets show
that ColPro achieves superior performance compared to existing approaches,
achieving 55.14% accuracy on NExT-QA and 71.24% accuracy on DramaQA,
highlighting its practical relevance and effectiveness.
[COMMENTS]
Accepted by main EMNLP 2024
[LINK]
http://arxiv.org/abs/2410.00771v2
[DATE]
2025-01-17 12:47:11+08:00
[CATEGORIES]
cs.CL
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
[AUTHORS]
Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu
[ABSTRACT]
Large language models exhibit superior capabilities in processing and
understanding language, yet their applications in educational contexts remain
underexplored. Learnersourcing enhances learning by engaging students in
creating their own educational content. When learnersourcing multiple-choice
questions, creating explanations for the solution of a question is a crucial
step; it helps other students understand the solution and promotes a deeper
understanding of related concepts. However, it is often difficult for students
to craft effective solution explanations, due to limited subject understanding.
To help scaffold the task of automated explanation generation, we present and
evaluate a framework called “ILearner-LLM” that iteratively enhances the
generated explanations for the given questions with large language models.
Comprising an explanation generation model and an explanation evaluation model,
the framework generates high-quality student-aligned explanations by
iteratively feeding the quality rating score from the evaluation model back
into the instruction prompt of the explanation generation model. Experimental
results demonstrate the effectiveness of our ILearner-LLM on LLaMA2-13B and
GPT-4 to generate higher quality explanations that are closer to those written
by students on five PeerWise datasets. Our findings represent a promising path
to enrich the learnersourcing experience for students and to enhance the
capabilities of large language models for educational applications.
[COMMENTS]
The short version (v4) has been accepted as a non-archival workshop
paper at AGI@ICLR 2024, and the full version has been accepted by the main
track of AAAI/EAAI 2025
[LINK]
http://arxiv.org/abs/2309.10444v5
[DATE]
2025-01-17 12:45:45+08:00
[CATEGORIES]
cs.CL
Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning
[AUTHORS]
Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu
[COMMENTS]
The short version (v3) was accepted for oral presentation at the
first LLM@IJCAI 2023 non-archival symposium, and the full version was
accepted by ICONIP 2024
[LINK]
http://arxiv.org/abs/2310.09430v5
[DATE]
2025-01-17 12:39:38+08:00
[CATEGORIES]
cs.CL
GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
[AUTHORS]
Zhisheng Tang, Mayank Kejriwal
[ABSTRACT]
Spatial reasoning, an important faculty of human cognition with many
practical applications, is one of the core commonsense skills that is not
purely language-based and, for satisfying (as opposed to optimal) solutions,
requires some minimum degree of planning. Existing benchmarks of Commonsense
Spatial Reasoning (CSR) tend to evaluate how Large Language Models (LLMs)
interpret text-based spatial $\textit{descriptions}$ rather than directly
evaluate a plan produced by the LLM in response to a $\textit{specific}$
spatial reasoning problem. In this paper, we construct a large-scale benchmark
called GRASP, which consists of 16,000 grid-based environments where the agent
is tasked with an energy collection problem. These environments include 100
grid instances instantiated using each of the 160 different grid settings,
involving five different energy distributions, two modes of agent starting
position, and two distinct obstacle configurations, as well as three kinds of
agent constraints. Using GRASP, we compare classic baseline approaches, such as
random walk and greedy search methods, with advanced LLMs like GPT-3.5-Turbo,
GPT-4o, and GPT-o1-mini. The experimental results indicate that even these
advanced LLMs struggle to consistently achieve satisfactory solutions.
[LINK]
http://arxiv.org/abs/2407.01892v2
[DATE]
2025-01-17 12:29:47+08:00
[CATEGORIES]
cs.CL
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
[AUTHORS]
Xindi Tong, Yujin Zhu, Shijian Fan, Liang Xu
[ABSTRACT]
Long text summarization, which is becoming essential for efficiently processing
large volumes of information, remains challenging for Large Language Models
(LLMs) such as the GPT and LLaMA families because of the scarcity of open-source
training datasets and the high demand for handling contextual details. To
address the issue, we design a novel zero-shot transfer learning framework,
abbreviated as T3, to iteratively train a baseline LLM on an assistant task
for the target task, where the former should have richer data resources and
share structural or semantic similarity with the latter. In practice, T3 is
applied to the long text summarization task by utilizing question
answering as the assistant task, and we further validate its effectiveness on the
BBC summary, NarraSum, FairytaleQA, and NLQuAD datasets, with up to nearly 14%
improvement in ROUGE, 35% improvement in BLEU, and 16% improvement in Factscore
compared to three baseline LLMs, demonstrating its potential for more
assistant-target task combinations.
[LINK]
http://arxiv.org/abs/2409.17640v2
[DATE]
2025-01-17 12:26:44+08:00
[CATEGORIES]
cs.CL
Indigenous Languages Spoken in Argentina: A Survey of NLP and Speech Resources
[AUTHORS]
Belu Ticona, Fernando Carranza, Viviana Cotik
[COMMENTS]
Accepted to COLING Main 2025
[LINK]
http://arxiv.org/abs/2501.09943v1
[DATE]
2025-01-17 11:47:19+08:00
[CATEGORIES]
cs.CL
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
[AUTHORS]
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
[ABSTRACT]
Large Language Models (LLMs) have demonstrated promising capabilities as
automatic evaluators in assessing the quality of generated natural language.
However, LLMs still exhibit biases in evaluation and often struggle to generate
coherent evaluations that align with human assessments. In this work, we first
conduct a systematic study of the misalignment between LLM evaluators and human
evaluation, revealing that existing calibration methods aimed at mitigating
biases of LLMs are insufficient for effectively aligning LLM evaluators.
Inspired by the use of preference data in RLHF, we formulate the evaluation as
a ranking problem and introduce Pairwise-preference Search (PAIRS), an
uncertainty-guided search-based rank aggregation method that employs LLMs to
conduct pairwise comparisons locally and efficiently ranks candidate texts
globally. PAIRS achieves state-of-the-art performance on representative
evaluation tasks in long-form generations and demonstrates significant
improvements over direct scoring. Furthermore, we provide insights into the
role of pairwise preference in quantifying the transitivity of LLMs and
demonstrate how PAIRS benefits from calibration using debiased pairwise
evaluations.
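
The idea of turning local pairwise judgments into a global ranking can be illustrated with a minimal sketch. This is not the PAIRS algorithm itself (it omits the uncertainty-guided search); it is a plain merge sort whose comparator stands in for an LLM judge. The names `rank_by_pairwise` and `toy_judge` are hypothetical.

```python
from typing import Callable, List


def rank_by_pairwise(items: List[str], prefer: Callable[[str, str], bool]) -> List[str]:
    """Rank items best-first using only pairwise comparisons (merge sort).

    prefer(a, b) returns True when a should be ranked above b.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = rank_by_pairwise(items[:mid], prefer)
    right = rank_by_pairwise(items[mid:], prefer)
    merged: List[str] = []
    i = j = 0
    # Merge the two ranked halves with one comparison per step.
    while i < len(left) and j < len(right):
        if prefer(left[i], right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def toy_judge(a: str, b: str) -> bool:
    # Hypothetical stand-in for an LLM comparison: prefer the longer text.
    return len(a) >= len(b)


ranked = rank_by_pairwise(["bb", "a", "dddd", "ccc"], toy_judge)
print(ranked)
```

A sort-based aggregation needs on the order of n log n comparisons rather than all n(n-1)/2 pairs, which is one reason search-based ranking can be cheaper than exhaustive pairwise evaluation.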
[COMMENTS]
This paper has been accepted by COLM 2024
[LINK]
http://arxiv.org/abs/2403.16950v5
[DATE]
2025-01-17 11:43:53+08:00
[CATEGORIES]
cs.CL
cs.LG
Passage Segmentation of Documents for Extractive Question Answering
[AUTHORS]
Zuhong Liu, Charles-Elie Simon, Fabien Caspani
[ABSTRACT]
Retrieval-Augmented Generation (RAG) has proven effective in open-domain
question answering. However, the chunking process, which is essential to this
pipeline, often receives insufficient attention relative to retrieval and
synthesis components. This study emphasizes the critical role of chunking in
improving the performance of both dense passage retrieval and the end-to-end
RAG pipeline. We then introduce the Logits-Guided Multi-Granular Chunker
(LGMGC), a novel framework that splits long documents into contextualized,
self-contained chunks of varied granularity. Our experimental results,
evaluated on two benchmark datasets, demonstrate that LGMGC not only improves
the retrieval step but also outperforms existing chunking methods when
integrated into a RAG pipeline.
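
As a rough illustration of chunking at several granularities, the sketch below groups sentences into chunks of different sizes. It is an assumption-laden toy: it uses a simple regex sentence splitter and fixed group sizes, and it does not implement the logits-guided boundary selection that LGMGC describes. The function names are hypothetical.

```python
import re
from typing import Dict, List, Sequence


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def multi_granular_chunks(text: str, sizes: Sequence[int] = (1, 2)) -> Dict[int, List[str]]:
    """Return, for each group size n, the text cut into chunks of n sentences."""
    sentences = split_sentences(text)
    return {
        n: [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]
        for n in sizes
    }


doc = "RAG retrieves passages. Chunking decides their boundaries. Granularity matters."
chunks = multi_granular_chunks(doc)
print(chunks[1])
print(chunks[2])
```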
[LINK]
http://arxiv.org/abs/2501.09940v1
[DATE]
2025-01-17 11:42:18+08:00
[CATEGORIES]
cs.CL
NL2KQL: From Natural Language to Kusto Query
[AUTHORS]
Xinye Tang, Amir H. Abdi, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing
[ABSTRACT]
Data is growing rapidly in volume and complexity. Proficiency in database
query languages is pivotal for crafting effective queries. As coding assistants
become more prevalent, there is significant opportunity to enhance database
query languages. The Kusto Query Language (KQL) is a widely used query language
for large semi-structured data such as logs, telemetries, and time-series for
big data analytics platforms. This paper introduces NL2KQL, an innovative
framework that uses large language models (LLMs) to convert natural language
queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several
key components: the Schema Refiner, which narrows down the schema to its most
pertinent elements; the Few-shot Selector, which dynamically selects relevant
examples from a few-shot dataset; and the Query Refiner, which repairs syntactic
and semantic errors in KQL queries. Additionally, this study outlines a method
for generating large datasets of synthetic NLQ-KQL pairs that are valid within
a specific database context. To validate NL2KQL’s performance, we utilize an
array of online (based on query execution) and offline (based on query parsing)
metrics. Through ablation studies, the significance of each framework component
is examined, and the datasets used for benchmarking are made publicly
available. This work is the first of its kind and is compared with available
baselines to demonstrate its effectiveness.
[LINK]
http://arxiv.org/abs/2404.02933v4
[DATE]
2025-01-17 11:19:16+08:00
[CATEGORIES]
cs.CL
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
[AUTHORS]
Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim
[COMMENTS]
EMNLP 2024 (Findings)
[LINK]
http://arxiv.org/abs/2407.10960v4
[DATE]
2025-01-17 11:09:24+08:00
[CATEGORIES]
cs.LG
cs.CL
Steering Large Language Models with Feature Guided Activation Additions
[AUTHORS]
Samuel Soo, Wesley Teng, Chandrasekaran Balaganesh
[ABSTRACT]
Effective and reliable control over large language model (LLM) behavior is a
significant challenge. While activation steering methods, which add steering
vectors to a model’s hidden states, are a promising approach, existing
techniques often lack precision and interpretability in how they influence
model outputs. We introduce Feature Guided Activation Additions (FGAA), a novel
activation steering method that leverages insights from Contrastive Activation
Addition (CAA) and Sparse Autoencoder-Targeted Steering (SAE-TS). By operating
in the latent space of a Sparse Autoencoder (SAE) and employing optimization
techniques to select desired SAE features, FGAA constructs precise steering
vectors that provide better steering effects while maintaining coherence of
steered model outputs. In this regard, evaluations on Gemma-2-2B and Gemma-2-9B
models across various steering tasks demonstrate that FGAA outperforms existing
steering methods of CAA, SAE decoder steering, and SAE-TS. Our results also
highlight important trade-offs between steering scale and general model
capabilities that are consistent across all tested steering methods.
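
The basic mechanics of activation steering (adding a scaled vector to a hidden state) can be sketched as follows. The example builds the vector as a mean difference between activations from contrastive examples, in the spirit of CAA; it does not implement FGAA's SAE-based feature selection. All names and the toy two-dimensional activations are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    # Column-wise mean over a list of activation vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    # Mean activation on "desired" examples minus mean on "undesired" ones.
    mp, mn = mean_vector(positive), mean_vector(negative)
    return [p - q for p, q in zip(mp, mn)]


def steer(hidden: Vector, vector: Vector, scale: float) -> Vector:
    # Add the scaled steering vector to a hidden state.
    return [h + scale * v for h, v in zip(hidden, vector)]


pos = [[1.0, 0.0], [3.0, 2.0]]
neg = [[0.0, 0.0], [0.0, 2.0]]
v = steering_vector(pos, neg)
steered = steer([0.5, 0.5], v, scale=2.0)
print(v, steered)
```

The `scale` argument corresponds to the steering scale whose trade-off against general model capabilities the abstract mentions.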
[COMMENTS]
7 maintext pages, 14 appendix pages
[LINK]
http://arxiv.org/abs/2501.09929v1
[DATE]
2025-01-17 10:55:23+08:00
[CATEGORIES]
cs.LG
cs.CL
Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs
[AUTHORS]
Reham Omar, Omij Mangukiya, Essam Mansour
[ABSTRACT]
Dialogue benchmarks are crucial in training and evaluating chatbots engaging
in domain-specific conversations. Knowledge graphs (KGs) represent semantically
rich and well-organized data spanning various domains, such as DBLP, DBpedia,
and YAGO. Traditionally, dialogue benchmarks have been manually created from
documents, neglecting the potential of KGs in automating this process. Some
question-answering benchmarks are automatically generated using extensive
preprocessing from KGs, but they do not support dialogue generation. This paper
introduces Chatty-Gen, a novel multi-stage retrieval-augmented generation
platform for automatically generating high-quality dialogue benchmarks tailored
to a specific domain using a KG. Chatty-Gen decomposes the generation process
into manageable stages and uses assertion rules for automatic validation
between stages. Our approach enables control over intermediate results to
prevent time-consuming restarts due to hallucinations. It also reduces reliance
on costly and more powerful commercial LLMs. Chatty-Gen eliminates upfront
processing of the entire KG using efficient query-based retrieval to find
representative subgraphs based on the dialogue context. Our experiments with
several real and large KGs demonstrate that Chatty-Gen significantly
outperforms state-of-the-art systems and ensures consistent model and system
performance across multiple LLMs of diverse capabilities, such as GPT-4o,
Gemini 1.5, Llama 3, and Mistral.
[COMMENTS]
The paper is published in SIGMOD 2025
[LINK]
http://arxiv.org/abs/2501.09928v1
[DATE]
2025-01-17 10:48:29+08:00
[CATEGORIES]
cs.CL
LADDER: Language Driven Slice Discovery and Error Rectification
[AUTHORS]
Shantanu Ghosh, Rayan Syed, Chenyu Wang, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich
[ABSTRACT]
Error slice discovery is crucial to diagnose and mitigate model errors.
Current clustering or discrete attribute-based slice discovery methods face key
limitations: 1) clustering results in incoherent slices, while assigning
discrete attributes to slices leads to incomplete coverage of error patterns
due to missing or insufficient attributes; 2) these methods lack complex
reasoning, preventing them from fully explaining model biases; 3) they fail to
integrate \textit{domain knowledge}, limiting their usage in specialized fields,
e.g., radiology. We propose LADDER (Language-Driven
Discovery and Error Rectification) to
address these limitations by: (1) leveraging the flexibility of natural language
to address incompleteness, and (2) employing an LLM’s latent \textit{domain knowledge}
and advanced reasoning to analyze sentences and derive testable hypotheses
directly, identifying biased attributes and forming coherent error slices without
clustering. Existing mitigation methods typically address only the
worst-performing group, often amplifying errors in other subgroups. In
contrast, LADDER generates pseudo attributes from the discovered hypotheses to
mitigate errors across all biases without explicit attribute annotations or
prior knowledge of bias. Rigorous evaluations on 6 datasets spanning natural
and medical images – comparing 200+ classifiers with diverse architectures,
pretraining strategies, and LLMs – show that LADDER consistently outperforms
existing baselines in discovering and mitigating biases.
[LINK]
http://arxiv.org/abs/2408.07832v8
[DATE]
2025-01-17 10:18:00+08:00
[CATEGORIES]
cs.CL
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
[AUTHORS]
Ziyao Zhang, Yanlin Wang, Chong Wang, Jiachi Chen, Zibin Zheng
[ABSTRACT]
Code generation aims to automatically generate code from input requirements,
significantly enhancing development efficiency. Recent large language model
(LLM)-based approaches have shown promising results and revolutionized the code
generation task. Despite the promising performance, LLMs often generate
content with hallucinations, especially in code generation scenarios
that require handling complex contextual dependencies in the practical
development process. Although a previous study has analyzed hallucinations in
LLM-powered code generation, it is limited to standalone function
generation. In this paper, we conduct an empirical study of the
phenomena, mechanism, and mitigation of LLM hallucinations within more
practical and complex development contexts in the repository-level generation
scenario. First, we manually examine the code generation results from six
mainstream LLMs to establish a hallucination taxonomy of LLM-generated code.
Next, we elaborate on the phenomenon of hallucinations and analyze their
distribution across different models. We then analyze the causes of hallucinations
and identify four potential factors contributing to hallucinations. Finally, we
propose an RAG-based mitigation method, which demonstrates consistent
effectiveness in all studied LLMs. The replication package including code,
data, and experimental results is available at
https://github.com/DeepSoftwareAnalytics/LLMCodingHallucination
[COMMENTS]
Accepted by ISSTA 2025
[LINK]
http://arxiv.org/abs/2409.20550v2
[DATE]
2025-01-17 09:44:44+08:00
[CATEGORIES]
cs.CL
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
[AUTHORS]
Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2402.18540v2
[DATE]
2025-01-17 09:43:21+08:00
[CATEGORIES]
cs.LG
cs.CL
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs
[AUTHORS]
Nada Saadi, Tathagata Raha, Clément Christophe, Marco AF Pimentel, Ronnie Rajan, Praveen K Kanithi
[ABSTRACT]
This paper investigates the challenges of developing large language models
(LLMs) proficient in both multilingual understanding and medical knowledge. We
demonstrate that simply translating medical data does not guarantee strong
performance on clinical tasks in the target language. Our experiments reveal
that the optimal language mix in training data varies significantly across
different medical tasks. We find that larger models with carefully calibrated
language ratios achieve superior performance on native-language clinical tasks.
Furthermore, our results suggest that relying solely on fine-tuning may not be
the most effective approach for incorporating new language knowledge into LLMs.
Instead, data and computationally intensive pretraining methods may still be
necessary to achieve optimal performance in multilingual medical settings.
These findings provide valuable guidance for building effective and inclusive
medical AI systems for diverse linguistic communities.
[LINK]
http://arxiv.org/abs/2501.09825v1
[DATE]
2025-01-17 04:24:56+08:00
[CATEGORIES]
cs.CL
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
[AUTHORS]
Maxwell J. Yin, Dingyi Jiang, Yongbing Chen, Boyu Wang, Charles Ling
[ABSTRACT]
Chain-of-Thought (CoT) reasoning in smaller language models is a challenging
natural language processing problem yet highly desirable in many real-life
applications. Existing CoT knowledge distillation methods often suffer from
overly conservative memorization in smaller LLMs, leading to low generalization
confidence. As fully preserving the CoT ability of the teacher model is impossible,
we hypothesize that adversarial CoT fine-tuning is crucial for developing
smaller LLMs with robust CoT generalization. To this end, we propose
\textit{PRompt-Assisted Domain-Adversarial fine-tuning} (PRADA), a principled
fine-tuning framework that integrates diverse CoT domains. Specifically, PRADA
pioneers two CoT improvements in smaller LLMs: (1) Recovering the
domain-invariant feature insight, which is typically lost during distillation, with
domain-adversarial fine-tuning; (2) Enhancing the domain adaptability of CoT
prompt engineering by employing domain-adversarial approaches. We theoretically
demonstrate the effectiveness of our approach and empirically show that it
significantly outperforms the state of the art in a wide range of tasks.
Moreover, our empirical findings reveal that the smaller LLM, when leveraging
PRADA, aligns closely with domain knowledge, thereby improving the
explainability of our approach.
[LINK]
http://arxiv.org/abs/2501.09804v1
[DATE]
2025-01-17 03:23:11+08:00
[CATEGORIES]
cs.LG
cs.CL
Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems
[AUTHORS]
Soham Roy, Mitul Goswami, Nisharg Nargund, Suneeta Mohanty, Prasant Kumar Pattnaik
[ABSTRACT]
This study introduces a system leveraging Large Language Models (LLMs) to
extract text and enhance user interaction with PDF documents via a
conversational interface. Utilizing Retrieval-Augmented Generation (RAG), the
system provides informative responses to user inquiries while highlighting
relevant passages within the PDF. Upon user upload, the system processes the
PDF, employing sentence embeddings to create a document-specific vector store.
This vector store enables efficient retrieval of pertinent sections in response
to user queries. The LLM then engages in a conversational exchange, using the
retrieved information to extract text and generate comprehensive, contextually
aware answers. While our approach demonstrates competitive ROUGE values
compared to existing state-of-the-art techniques for text extraction and
summarization, we acknowledge that further qualitative evaluation is necessary
to fully assess its effectiveness in real-world applications. The proposed
system thus offers a valuable tool for researchers, students, and anyone seeking
to efficiently extract knowledge and gain insights from documents through an
intuitive question-answering interface.
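
The retrieval step of such a pipeline (embed passages, store the vectors, return the passages closest to the query) can be sketched with a toy example. The bag-of-words `embed` function below is an assumed stand-in for the sentence embeddings the abstract mentions, and every name here is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (a real system would use sentence embeddings).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "the invoice total is due in march",
    "the contract covers software licensing",
    "shipping takes five days",
]
top = retrieve("when is the invoice due", passages)
print(top)
```

The retrieved passages would then be placed in the LLM prompt as context for answering the user's question.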
[LINK]
http://arxiv.org/abs/2501.09801v1
[DATE]
2025-01-17 03:12:25+08:00
[CATEGORIES]
cs.CL
Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API
[AUTHORS]
Andrey Labunets, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, Earlence Fernandes
[ABSTRACT]
We surface a new threat to closed-weight Large Language Models (LLMs) that
enables an attacker to compute optimization-based prompt injections.
Specifically, we characterize how an attacker can leverage the loss-like
information returned from the remote fine-tuning interface to guide the search
for adversarial prompts. The fine-tuning interface is hosted by an LLM vendor
and allows developers to fine-tune LLMs for their tasks, thus providing
utility, but also exposes enough information for an attacker to compute
adversarial prompts. Through an experimental analysis, we characterize the
loss-like values returned by the Gemini fine-tuning API and demonstrate that
they provide a useful signal for discrete optimization of adversarial prompts
using a greedy search algorithm. Using the PurpleLlama prompt injection
benchmark, we demonstrate attack success rates between 65% and 82% on Google’s
Gemini family of LLMs. These attacks exploit the classic utility-security
tradeoff - the fine-tuning interface provides a useful feature for developers
but also exposes the LLMs to powerful attacks.
[LINK]
http://arxiv.org/abs/2501.09798v1
[DATE]
2025-01-17 03:01:25+08:00
[CATEGORIES]
cs.CL
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
[AUTHORS]
Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Ningyu Zhang, Jiang Yong, Pengjun Xie, Fei Huang, Huajun Chen
[ABSTRACT]
Machine writing with large language models often relies on
retrieval-augmented generation. However, these approaches remain confined
within the boundaries of the model’s predefined scope, limiting the generation
of content with rich information. Specifically, vanilla-retrieved information
tends to lack depth and utility and to suffer from redundancy, which negatively
impacts the quality of generated articles, leading to shallow, repetitive, and
unoriginal outputs. To address these issues, we propose OmniThink, a machine
writing framework that emulates the human-like process of iterative expansion
and reflection. The core idea behind OmniThink is to simulate the cognitive
behavior of learners as they progressively deepen their knowledge of the
topics. Experimental results demonstrate that OmniThink improves the knowledge
density of generated articles without compromising metrics such as coherence
and depth. Human evaluations and expert feedback further highlight the
potential of OmniThink to address real-world challenges in the generation of
long-form articles.
[LINK]
http://arxiv.org/abs/2501.09751v1
[DATE]
2025-01-17 02:58:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text
[AUTHORS]
Jihed Ncib
[ABSTRACT]
This study conducts a systematic assessment of the capabilities of 12 machine
learning models and model variations in detecting economic ideology. As an
evaluation benchmark, I use manifesto data spanning six elections in the United
Kingdom and pre-annotated by expert and crowd coders. The analysis assesses the
performance of several generative, fine-tuned, and zero-shot models at the
granular and aggregate levels. The results show that generative models such as
GPT-4o and Gemini 1.5 Flash consistently outperform other models against all
benchmarks. However, they pose issues of accessibility and resource
availability. Fine-tuning yielded competitive performance and offers a reliable
alternative through domain-specific optimization. But its dependency on
training data severely limits scalability. Zero-shot models consistently face
difficulties with identifying signals of economic ideology, often resulting in
negative associations with human coding. Using general knowledge for the
domain-specific task of ideology scaling proved to be unreliable. Other key
findings include considerable within-party variation, fine-tuning benefiting
from larger training data, and zero-shot’s sensitivity to prompt content. The
assessments include the strengths and limitations of each model and derive
best practices for automated analyses of political content.
[LINK]
http://arxiv.org/abs/2501.09719v1
[DATE]
2025-01-17 02:06:22+08:00
[CATEGORIES]
cs.CL
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
[AUTHORS]
Jonathan Katzy, Razvan Mihai Popescu, Arie van Deursen, Maliheh Izadi
[ABSTRACT]
The recent rise in the popularity of large language models has spurred the
development of extensive code datasets needed to train them. This has left
limited code available for collection and use in the downstream investigation
of specific behaviors, or evaluation of large language models without suffering
from data contamination. To address this problem, we release The Heap, a large
multilingual dataset covering 57 programming languages that has been
deduplicated with respect to other open datasets of code, enabling researchers
to conduct fair evaluations of large language models without significant data
cleaning overhead.
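
Deduplicating one corpus against another can be sketched with exact-match fingerprints. This is only an illustration under assumed choices (whitespace normalization and SHA-256 hashing); the abstract does not state which deduplication method The Heap uses, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, List


def fingerprint(code: str) -> str:
    # Collapse all whitespace so formatting differences do not hide duplicates.
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_known(candidates: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Keep candidate files whose fingerprint is absent from the reference corpus.

    Duplicates within the candidate set are also dropped.
    """
    seen = {fingerprint(r) for r in reference}
    kept: List[str] = []
    for code in candidates:
        fp = fingerprint(code)
        if fp not in seen:
            seen.add(fp)
            kept.append(code)
    return kept


reference = ["def f(x):\n    return x + 1"]
candidates = ["def f(x):  return x + 1", "print('hello')", "print('hello')"]
clean = remove_known(candidates, reference)
print(clean)
```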
[COMMENTS]
Pre-Print. Accepted to FORGE 2025 Dataset Track
[LINK]
http://arxiv.org/abs/2501.09653v1
[DATE]
2025-01-17 00:48:41+08:00
[CATEGORIES]
cs.CL
CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding
[AUTHORS]
Johannes Kirmayr, Lukas Stappen, Phillip Schneider, Florian Matthes, Elisabeth André
[ABSTRACT]
In today’s assistant landscape, personalisation enhances interactions,
fosters long-term relationships, and deepens engagement. However, many systems
struggle with retaining user preferences, leading to repetitive user requests
and disengagement. Furthermore, the unregulated and opaque extraction of user
preferences in industry applications raises significant concerns about privacy
and trust, especially in regions with stringent regulations like Europe. In
response to these challenges, we propose a long-term memory system for voice
assistants, structured around predefined categories. This approach leverages
Large Language Models to efficiently extract, store, and retrieve preferences
within these categories, ensuring both personalisation and transparency. We
also introduce a synthetic multi-turn, multi-session conversation dataset
(CarMem), grounded in real industry data, tailored to an in-car voice assistant
setting. Benchmarked on the dataset, our system achieves an F1-score of .78 to
.95 in preference extraction, depending on category granularity. Our
maintenance strategy reduces redundant preferences by 95% and contradictory
ones by 92%, while the accuracy of optimal retrieval is at .87. Collectively,
the results demonstrate the system’s suitability for industrial applications.
[COMMENTS]
Accepted for presentation at the International Conference on
Computational Linguistics (COLING 2025)
[LINK]
http://arxiv.org/abs/2501.09645v1
[DATE]
2025-01-17 00:37:33+08:00
[CATEGORIES]
cs.CL
Aligning Brain Activity with Advanced Transformer Models: Exploring the Role of Punctuation in Semantic Processing
[AUTHORS]
Zenon Lamprou, Frank Polick, Yashar Moshfeghi
[ABSTRACT]
This research examines the congruence between neural activity and advanced
transformer models, emphasizing the semantic significance of punctuation in
text understanding. Utilizing an innovative approach originally proposed by
Toneva and Wehbe, we evaluate four advanced transformer models (RoBERTa,
DistilBERT, ALBERT, and ELECTRA) against neural activity data. Our findings
indicate that RoBERTa exhibits the closest alignment with neural activity,
surpassing BERT in accuracy. Furthermore, we investigate the impact of
punctuation removal on model performance and neural alignment, revealing that
BERT’s accuracy improves in the absence of punctuation. This study contributes
to the comprehension of how neural networks represent language and the
influence of punctuation on semantic processing within the human brain.
[LINK]
http://arxiv.org/abs/2501.06278v2
[DATE]
2025-01-17 00:19:24+08:00
[CATEGORIES]
cs.CL
cs.LG
Logarithmic Regret for Nonlinear Control
[AUTHORS]
James Wang, Bruce D. Lee, Ingvar Ziemann, Nikolai Matni
[ABSTRACT]
We address the problem of learning to control an unknown nonlinear dynamical
system through sequential interactions. Motivated by high-stakes applications
in which mistakes can be catastrophic, such as robotics and healthcare, we
study situations where it is possible for fast sequential learning to occur.
Fast sequential learning is characterized by the ability of the learning agent
to incur logarithmic regret relative to a fully-informed baseline. We
demonstrate that fast sequential learning is achievable in a diverse class of
continuous control problems where the system dynamics depend smoothly on
unknown parameters, provided the optimal control policy is persistently
exciting. Additionally, we derive a regret bound which grows with the square
root of the number of interactions for cases where the optimal policy is not
persistently exciting. Our results provide the first regret bounds for
controlling nonlinear dynamical systems depending nonlinearly on unknown
parameters. We validate the trends our theory predicts in simulation on a
simple dynamical system.
[LINK]
http://arxiv.org/abs/2501.10261v1
[DATE]
2025-01-17 23:42:42+08:00
[CATEGORIES]
cs.LG
DADA: Dual Averaging with Distance Adaptation
[AUTHORS]
Mohammad Moshtaghifar, Anton Rodomanov, Daniil Vankov, Sebastian Stich
[ABSTRACT]
We present a novel universal gradient method for solving convex optimization
problems. Our algorithm – Dual Averaging with Distance Adaptation (DADA) – is
based on the classical scheme of dual averaging and dynamically adjusts its
coefficients based on observed gradients and the distance between iterates and
the starting point, eliminating the need for problem-specific parameters. DADA
is a universal algorithm that simultaneously works for a broad spectrum of
problem classes, provided the local growth of the objective function around its
minimizer can be bounded. Particular examples of such problem classes are
nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth
functions, functions with high-order Lipschitz derivative,
quasi-self-concordant functions, and $(L_0,L_1)$-smooth functions. Crucially,
DADA is applicable to both unconstrained and constrained problems, even when
the domain is unbounded, without requiring prior knowledge of the number of
iterations or desired accuracy.
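
For orientation, the classical dual averaging step that the abstract builds on is commonly written in the following textbook form. The symbols are assumptions introduced here, not taken from the paper: $g_i$ is a (sub)gradient at iterate $x_i$, $a_i$ are step weights, $\beta_k$ is a regularization coefficient, $Q$ is the feasible set, and $x_0$ is the starting point. How DADA chooses its coefficients is not specified in the abstract.

```latex
x_{k+1} = \operatorname*{arg\,min}_{x \in Q}
  \Bigl\{ \sum_{i=0}^{k} a_i \langle g_i, x \rangle
          + \frac{\beta_k}{2} \lVert x - x_0 \rVert^2 \Bigr\}
```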
[LINK]
http://arxiv.org/abs/2501.10258v1
[DATE]
2025-01-17 23:40:03+08:00
[CATEGORIES]
cs.LG
Large Language Model is Secretly a Protein Sequence Optimizer
[AUTHORS]
Yinkai Wang, Jiaxing He, Yuanqi Du, Xiaohui Chen, Jianan Canal Li, Li-Ping Liu, Xiaolin Xu, Soha Hassoun
[ABSTRACT]
We consider the protein sequence engineering problem, which aims to find
protein sequences with high fitness levels, starting from a given wild-type
sequence. Directed evolution, an iterative process that generates variants and
selects among them via experimental feedback, has been a dominant paradigm in
this field. We demonstrate that large language models (LLMs), despite being trained on
massive texts, are secretly protein sequence optimizers. With a directed
evolutionary method, an LLM can perform protein engineering through Pareto and
experiment-budget constrained optimization, demonstrating success on both
synthetic and experimental fitness landscapes.
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2501.09274v2
[DATE]
2025-01-17 23:22:00+08:00
[CATEGORIES]
cs.LG
Bridging Diversity and Uncertainty in Active learning with Self-Supervised Pre-Training
[AUTHORS]
Paul Doucet, Benjamin Estermann, Till Aczel, Roger Wattenhofer
[ABSTRACT]
This study addresses the integration of diversity-based and uncertainty-based
sampling strategies in active learning, particularly within the context of
self-supervised pre-trained models. We introduce a straightforward heuristic
called TCM that mitigates the cold start problem while maintaining strong
performance across various data levels. By initially applying TypiClust for
diversity sampling and subsequently transitioning to uncertainty sampling with
Margin, our approach effectively combines the strengths of both strategies. Our
experiments demonstrate that TCM consistently outperforms existing methods
across various datasets in both low and high data regimes.
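
The uncertainty half of this strategy, margin sampling, can be sketched briefly: the margin of a sample is the gap between its two highest predicted class probabilities, and the samples with the smallest margins are queried first. The sketch covers only this step, not TypiClust or the switch between the two strategies, and the function names are hypothetical.

```python
from typing import List


def margin(probs: List[float]) -> float:
    # Gap between the two most likely classes; small gap = uncertain prediction.
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_by_margin(prob_rows: List[List[float]], budget: int) -> List[int]:
    """Return indices of the `budget` samples with the smallest margins."""
    order = sorted(range(len(prob_rows)), key=lambda i: margin(prob_rows[i]))
    return order[:budget]


probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # very uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
]
picked = select_by_margin(probs, budget=2)
print(picked)
```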
[COMMENTS]
Accepted at ICLR 2024 Workshop on Practical Machine Learning for Low
Resource Settings (PML4LRS)
[LINK]
http://arxiv.org/abs/2403.03728v2
[DATE]
2025-01-17 23:15:15+08:00
[CATEGORIES]
cs.LG
Over-the-Air Multi-Sensor Inference with Neural Networks Using Memristor-Based Analog Computing
[AUTHORS]
Busra Tegin, Muhammad Atif Ali, Tolga M Duman
[ABSTRACT]
Deep neural networks provide reliable solutions for many classification and
regression tasks; however, their application in real-time wireless systems with
simple sensor networks is limited due to high energy consumption and
significant bandwidth needs. This study proposes a multi-sensor wireless
inference system with memristor-based analog computing. Given the sensors’
limited computational capabilities, the features from the network’s front end
are transmitted to a central device where an $L_p$-norm inspired approximation
of the maximum operation is employed to achieve transformation-invariant
features, enabling efficient over-the-air transmission. We also introduce a
trainable over-the-air sensor fusion method based on an $L_p$-norm inspired
combining function that customizes sensor fusion to match the network and
sensor distribution characteristics, enhancing adaptability. To address the
energy constraints of sensors, we utilize memristors, known for their
energy-efficient in-memory computing, enabling analog-domain computations that
reduce energy use and computational overhead in edge computing. This dual
approach of memristors and $L_p$-norm inspired sensor fusion fosters
energy-efficient computational and transmission paradigms and serves as a
practical energy-efficient solution with minimal performance loss.
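
Why an $L_p$ norm can stand in for the maximum operation is easy to check numerically: for nonnegative magnitudes, $(\sum_i |x_i|^p)^{1/p}$ approaches $\max_i |x_i|$ as $p$ grows. The sketch below is a generic illustration of that fact, not the paper's combining function; `lp_max` is a hypothetical name.

```python
from typing import Sequence


def lp_max(values: Sequence[float], p: float) -> float:
    """L_p norm of the values, which tends to max(|v|) as p increases."""
    m = max(abs(v) for v in values)
    if m == 0:
        return 0.0
    # Factor out the largest magnitude to avoid overflow for large p.
    return m * sum((abs(v) / m) ** p for v in values) ** (1.0 / p)


features = [1.0, 2.0, 3.0]
for p in (1, 2, 10, 50):
    print(p, lp_max(features, p))
```

With `p = 1` the result is the plain sum of magnitudes, and as `p` increases it moves toward the largest magnitude, so `p` controls how closely the sum-based combination mimics a maximum.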
[COMMENTS]
34 pages
[LINK]
http://arxiv.org/abs/2501.10245v1
[DATE]
2025-01-17 23:14:58+08:00
[CATEGORIES]
cs.LG
Challenges and recommendations for Electronic Health Records data extraction and preparation for dynamic prediction modelling in hospitalized patients – a practical guide
[AUTHORS]
Elena Albu, Shan Gao, Pieter Stijnen, Frank E. Rademakers, Bas C T van Bussel, Taya Collyer, Tina Hernandez-Boussard, Laure Wynants, Ben Van Calster
[ABSTRACT]
Dynamic predictive modeling using electronic health record (EHR) data has
gained significant attention in recent years. The reliability and
trustworthiness of such models depend heavily on the quality of the underlying
data, which is largely determined by the stages preceding the model
development: data extraction from EHR systems and data preparation. We list
over forty challenges encountered during these stages and provide actionable
recommendations for addressing them. These challenges are organized into four
categories: cohort definition, outcome definition, feature engineering, and
data cleaning. This list is designed to serve as a practical guide for data
extraction engineers and researchers, supporting better practices and improving
the quality and real-world applicability of dynamic prediction models in
clinical settings.
[LINK]
http://arxiv.org/abs/2501.10240v1
[DATE]
2025-01-17 23:09:57+08:00
[CATEGORIES]
cs.LG
Counterfactual Explanations for k-means and Gaussian Clustering
[AUTHORS]
Georgios Vardakas, Antonia Karra, Evaggelia Pitoura, Aristidis Likas
[ABSTRACT]
Counterfactuals have been recognized as an effective approach to explain
classifier decisions. Nevertheless, they have not yet been considered in the
context of clustering. In this work, we propose the use of counterfactuals to
explain clustering solutions. First, we present a general definition for
counterfactuals for model-based clustering that includes plausibility and
feasibility constraints. Then we consider the counterfactual generation problem
for k-means and Gaussian clustering assuming Euclidean distance. Our approach
takes as input the factual, the target cluster, a binary mask indicating
actionable or immutable features and a plausibility factor specifying how far
from the cluster boundary the counterfactual should be placed. In the k-means
clustering case, analytical mathematical formulas are presented for computing
the optimal solution, while in the Gaussian clustering case (assuming full,
diagonal, or spherical covariances) our method requires the numerical solution
of a nonlinear equation with a single parameter only. We demonstrate the
advantages of our approach through illustrative examples and quantitative
experimental comparisons.
[LINK]
http://arxiv.org/abs/2501.10234v1
[DATE]
2025-01-17 22:56:20+08:00
[CATEGORIES]
cs.LG
Amortized Bayesian Mixture Models
[AUTHORS]
Šimon Kucharský, Paul Christian Bürkner
[ABSTRACT]
Finite mixtures are a broad class of models useful in scenarios where
observed data is generated by multiple distinct processes but without explicit
information about the responsible process for each data point. Estimating
Bayesian mixture models is computationally challenging due to issues such as
high-dimensional posterior inference and label switching. Furthermore,
traditional methods such as MCMC are applicable only if the likelihoods for
each mixture component are analytically tractable.
Amortized Bayesian Inference (ABI) is a simulation-based framework for
estimating Bayesian models using generative neural networks. This allows the
fitting of models without explicit likelihoods, and provides fast inference.
ABI is therefore an attractive framework for estimating mixture models. This
paper introduces a novel extension of ABI tailored to mixture models. We
factorize the posterior into a distribution of the parameters and a
distribution of (categorical) mixture indicators, which allows us to use a
combination of generative neural networks for parameter inference, and
classification networks for mixture membership identification. The proposed
framework accommodates both independent and dependent mixture models, enabling
filtering and smoothing. We validate and demonstrate our approach through
synthetic and real-world datasets.
[COMMENTS]
34 pages, 17 figures
[LINK]
http://arxiv.org/abs/2501.10229v1
[DATE]
2025-01-17 22:51:03+08:00
[CATEGORIES]
cs.LG
Modelling Activity Scheduling Behaviour with Deep Generative Machine Learning
[AUTHORS]
Fred Shone, Tim Hillel
[ABSTRACT]
We model human activity scheduling behaviour using a deep generative machine
learning approach. Activity schedules, which represent the activities and
associated travel behaviours of individuals, are a core component of many
applied models in the transport, energy and epidemiology domains. Our
data-driven approach learns human preferences and scheduling logic without the
need for complex interacting combinations of sub-models and custom rules. This
makes our approach significantly faster and simpler to operate than existing
approaches. We find activity schedule data combines aspects of both continuous
image data and also discrete text data, requiring novel approaches. We
additionally contribute a novel schedule representation and comprehensive
evaluation framework for generated schedules. Evaluation shows our approach is
able to rapidly generate large, diverse and realistic synthetic samples of
activity schedules.
[LINK]
http://arxiv.org/abs/2501.10221v1
[DATE]
2025-01-17 22:37:54+08:00
[CATEGORIES]
cs.LG
The Relevance of AWS Chronos: An Evaluation of Standard Methods for Time Series Forecasting with Limited Tuning
[AUTHORS]
Matthew Baron, Alex Karpinski
[ABSTRACT]
We present a systematic comparison of Chronos, a transformer-based time series
forecasting framework, against traditional approaches including ARIMA and
Prophet. We evaluate these models across multiple time horizons and user
categories, with a focus on the impact of historical context length. Our
analysis reveals that while Chronos demonstrates superior performance for
longer-term predictions and maintains accuracy with increased context,
traditional models show significant degradation as context length increases. We
find that prediction quality varies systematically between user classes,
suggesting that underlying behavior patterns always influence model
performance. This study provides a case for deploying Chronos in real-world
applications where limited model tuning is feasible, especially in scenarios
requiring longer prediction.
[LINK]
http://arxiv.org/abs/2501.10216v1
[DATE]
2025-01-17 22:23:54+08:00
[CATEGORIES]
cs.LG
Hypercone Assisted Contour Generation for Out-of-Distribution Detection
[AUTHORS]
Annita Vapsi, Andrés Muñoz, Nancy Thomas, Keshav Ramani, Daniel Borrajo
[LINK]
http://arxiv.org/abs/2501.10209v1
[DATE]
2025-01-17 22:08:32+08:00
[CATEGORIES]
cs.LG
Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach
[AUTHORS]
Nicolas Atienza, Christophe Labreuche, Johanne Cohen, Michele Sebag
[ABSTRACT]
This paper introduces a novel method, Sample-efficient Probabilistic
Detection using Extreme Value Theory (SPADE), which transforms a classifier
into an abstaining classifier, offering provable protection against
out-of-distribution and adversarial samples. The approach is based on a
Generalized Extreme Value (GEV) model of the training distribution in the
classifier’s latent space, enabling the formal characterization of OOD samples.
Interestingly, under mild assumptions, the GEV model also allows for formally
characterizing adversarial samples. The abstaining classifier, which rejects
samples based on their assessment by the GEV model, provably avoids OOD and
adversarial samples. The empirical validation of the approach, conducted on
various neural architectures (ResNet, VGG, and Vision Transformer) and medium
and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet), demonstrates its
frugality, stability, and efficiency compared to the state of the art.
[COMMENTS]
under review
[LINK]
http://arxiv.org/abs/2501.10202v1
[DATE]
2025-01-17 21:51:14+08:00
[CATEGORIES]
cs.LG
Contributions to the Decision Theoretic Foundations of Machine Learning and Robust Statistics under Weakly Structured Information
[AUTHORS]
Christoph Jansen
[COMMENTS]
Habilitation Thesis
[LINK]
http://arxiv.org/abs/2501.10195v1
[DATE]
2025-01-17 21:39:51+08:00
[CATEGORIES]
cs.LG
Surrogate-based multiscale analysis of experiments on thermoplastic composites under off-axis loading
[AUTHORS]
M. A. Maia, I. B. C. M. Rocha, D. Kovačević, F. P. van der Meer
[ABSTRACT]
In this paper, we present a surrogate-based multiscale approach to model
constant strain-rate and creep experiments on unidirectional thermoplastic
composites under off-axis loading. In previous contributions, these experiments
were modeled through a single-scale micromechanical simulation under the
assumption of macroscopic homogeneity. Although efficient and accurate in many
scenarios, simulations with low off-axis angles showed significant
discrepancies with the experiments. It was hypothesized that the mismatch was
caused by macroscopic inhomogeneity, which would require a multiscale approach
to capture it. However, full-field multiscale simulations remain
computationally prohibitive. To address this issue, we replace the micromodel
with a Physically Recurrent Neural Network (PRNN), a surrogate model that
combines data-driven components with embedded constitutive models to capture
history-dependent behavior naturally. The explainability of the latent space of
this network is also explored in a transfer learning strategy that requires no
re-training. With the surrogate-based simulations, we confirm the hypothesis
raised on the inhomogeneity of the macroscopic strain field and gain insights
into the influence of adjustment of the experimental setup with oblique
end-tabs. Results from the surrogate-based multiscale approach show better
agreement with experiments than the single-scale micromechanical approach over
a wide range of settings, although with limited accuracy on the creep
experiments, where macroscopic test effects were implicitly taken into account
in the material properties calibration.
[COMMENTS]
21 pages, 31 figures
[LINK]
http://arxiv.org/abs/2501.10193v1
[DATE]
2025-01-17 21:39:10+08:00
[CATEGORIES]
cs.LG
Improved learning rates in multi-unit uniform price auctions
[AUTHORS]
Marius Potfer, Dorian Baudry, Hugo Richard, Vianney Perchet, Cheng Wan
[ABSTRACT]
Motivated by the strategic participation of electricity producers in the
electricity day-ahead market, we study the problem of online learning in
repeated multi-unit uniform price auctions focusing on the adversarial opposing
bid setting. The main contribution of this paper is the introduction of a new
modeling of the bid space. Indeed, we prove that a learning algorithm
leveraging the structure of this problem achieves a regret of
$\tilde{O}(K^{4/3}T^{2/3})$ under bandit feedback, improving over the bound of
$\tilde{O}(K^{7/4}T^{3/4})$ previously obtained in the literature. This
improved regret rate is tight up to logarithmic terms. Inspired by electricity
reserve markets, we further introduce a different feedback model under which
all winning bids are revealed. This feedback interpolates between the
full-information and bandit scenarios depending on the auctions’ results. We
prove that, under this feedback, the algorithm that we propose achieves regret
$\tilde{O}(K^{5/2}\sqrt{T})$.
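
To make the size of the improvement concrete, the stated bounds can be compared directly. This is only a back-of-the-envelope comparison of upper bounds: constants and the logarithmic factors hidden in $\tilde{O}$ are ignored.

```latex
% Ratio of the new bandit bound to the previous one:
\[
\frac{K^{4/3}\,T^{2/3}}{K^{7/4}\,T^{3/4}}
  = K^{\frac{4}{3}-\frac{7}{4}}\;T^{\frac{2}{3}-\frac{3}{4}}
  = K^{-5/12}\,T^{-1/12},
\]
% so the new bound is smaller by a factor of K^{5/12} T^{1/12}.
%
% The bound under the winning-bids feedback is below the bandit bound when
\[
K^{5/2}\sqrt{T} \le K^{4/3}\,T^{2/3}
  \iff K^{7/6} \le T^{1/6}
  \iff T \ge K^{7}.
\]
```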
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.10181v1
[DATE]
2025-01-17 21:26:12+08:00
[CATEGORIES]
cs.LG
Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization
[AUTHORS]
Valentio Iverson, Stephen Vavasis
[ABSTRACT]
Parameter estimation is a fundamental challenge in machine learning, crucial
for tasks such as neural network weight fitting and Bayesian inference. This
paper focuses on the complexity of estimating translation $\boldsymbol{\mu} \in
\mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters for a
distribution of the form $\frac{1}{\sigma^l} f_0 \left( \frac{\boldsymbol{x} -
\boldsymbol{\mu}}{\sigma} \right)$, where $f_0$ is a known density in
$\mathbb{R}^l$ given $n$ samples. We highlight that while the problem is
NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain
$\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within
$\text{poly} \left( \frac{1}{\varepsilon} \right)$ time using the Wasserstein
distance.
[LINK]
http://arxiv.org/abs/2501.10172v1
[DATE]
2025-01-17 21:07:52+08:00
[CATEGORIES]
cs.LG
Convex Physics Informed Neural Networks for the Monge-Ampère Optimal Transport Problem
[AUTHORS]
Alexandre Caboussat, Anna Peruso
[ABSTRACT]
Optimal transportation of raw material from suppliers to customers is an
issue arising in logistics that is addressed here with a continuous model
relying on optimal transport theory. A physics informed neural network method is
advocated here for the solution of the corresponding generalized Monge-Ampère
equation. Convex neural networks are advocated to enforce the convexity of the
solution to the Monge-Ampère equation and obtain a suitable approximation of
the optimal transport map. A particular focus is set on the enforcement of
transport boundary conditions in the loss function. Numerical experiments
illustrate the solution to the optimal transport problem in several
configurations, and sensitivity analyses are performed.
[COMMENTS]
17 pages, 14 figures. Submitted to Engineering Computations on 26
September 2024
[LINK]
http://arxiv.org/abs/2501.10162v1
[DATE]
2025-01-17 20:51:25+08:00
[CATEGORIES]
cs.LG
Region-wise stacking ensembles for estimating brain-age using MRI
[AUTHORS]
Georgios Antonopoulos, Shammi More, Simon B. Eickhoff, Federico Raimondo, Kaustubh R. Patil
[ABSTRACT]
Predictive modeling using structural magnetic resonance imaging (MRI) data is
a prominent approach to study brain-aging. Machine learning algorithms and
feature extraction methods have been employed to improve predictions and
explore healthy and accelerated aging, e.g., in neurodegenerative and psychiatric
disorders. The high-dimensional MRI data pose challenges to building
generalizable and interpretable models as well as for data privacy. Common
practices are resampling or averaging voxels within predefined parcels, which
reduces anatomical specificity and biological interpretability as voxels within
a region may differently relate to aging. Effectively, naive fusion by
averaging can result in information loss and reduced accuracy. We present a
conceptually novel two-level stacking ensemble (SE) approach. The first level
comprises regional models for predicting individuals’ age based on voxel-wise
information, fused by a second-level model yielding final predictions. Eight
data fusion scenarios were explored using as input Gray matter volume (GMV)
estimates from four datasets covering the adult lifespan. Performance, measured
using mean absolute error (MAE), R2, correlation and prediction bias, showed
that SE outperformed the region-wise averages. The best performance was
obtained when first-level regional predictions were obtained as out-of-sample
predictions on the application site with second-level models trained on
independent and site-specific data (MAE=4.75 vs baseline regional mean GMV
MAE=5.68). Performance improved as more datasets were used for training.
First-level predictions showed improved and more robust aging signal providing
new biological insights and enhanced data privacy. Overall, the SE improves
accuracy compared to the baseline while preserving or enhancing data privacy.
[COMMENTS]
version1
[LINK]
http://arxiv.org/abs/2501.10153v1
[DATE]
2025-01-17 20:24:28+08:00
[CATEGORIES]
cs.LG
Enhancing UAV Path Planning Efficiency Through Accelerated Learning
[AUTHORS]
Joseanne Viana, Boris Galkin, Lester Ho, Holger Claussen
[ABSTRACT]
Unmanned Aerial Vehicles (UAVs) are increasingly essential in various fields
such as surveillance, reconnaissance, and telecommunications. This study aims
to develop a learning algorithm for the path planning of UAV wireless
communication relays, which can reduce storage requirements and accelerate Deep
Reinforcement Learning (DRL) convergence. Assuming the system possesses terrain
maps of the area and can estimate user locations using localization algorithms
or direct GPS reporting, it can input these parameters into the learning
algorithms to achieve optimized path planning performance. However, higher
resolution terrain maps are necessary to extract topological information such
as terrain height, object distances, and signal blockages. This requirement
increases memory and storage demands on UAVs while also lengthening convergence
times in DRL algorithms. Similarly, defining the telecommunication coverage map
in UAV wireless communication relays using these terrain maps and user position
estimations demands higher memory and storage utilization for the learning path
planning algorithms. Our approach reduces path planning training time by
applying a dimensionality reduction technique based on Principal Component
Analysis (PCA), sample combination, Prioritized Experience Replay (PER), and
the combination of Mean Squared Error (MSE) and Mean Absolute Error (MAE) loss
calculations in the coverage map estimates, thereby enhancing a Twin Delayed
Deep Deterministic Policy Gradient (TD3) algorithm. The proposed solution
reduces the convergence episodes needed for basic training by approximately
four times compared to the traditional TD3.
[COMMENTS]
This paper was accepted in https://camad2024.ieee-camad.org/
conference but it is not available from the conference yet
[LINK]
http://arxiv.org/abs/2501.10141v1
[DATE]
2025-01-17 20:05:24+08:00
[CATEGORIES]
cs.LG
Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores
[AUTHORS]
Jivat Neet Kaur, Michael I. Jordan, Ahmed Alaa
[ABSTRACT]
Standard conformal prediction offers a marginal guarantee on coverage, but
for prediction sets to be truly useful, they should ideally ensure coverage
conditional on each test point. Unfortunately, it is impossible to achieve
exact, distribution-free conditional coverage in finite samples. In this work,
we propose an alternative conformal prediction algorithm that targets coverage
where it matters most: in instances where a classifier is overconfident in its
incorrect predictions. We start by dissecting miscoverage events in
marginally-valid conformal prediction, and show that miscoverage rates vary
based on the classifier’s confidence and its deviation from the Bayes optimal
classifier. Motivated by this insight, we develop a variant of conformal
prediction that targets coverage conditional on a reduced set of two variables:
the classifier’s confidence in a prediction and a nonparametric trust score
that measures its deviation from the Bayes classifier. Empirical evaluation on
multiple image datasets shows that our method generally improves conditional
coverage properties compared to standard conformal prediction, including
class-conditional coverage, coverage over arbitrary subgroups, and coverage
over demographic groups.
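
For context, the "standard conformal prediction" baseline with its marginal guarantee can be sketched as split conformal prediction for classification. The sketch below is that textbook baseline, not the trust-score variant proposed in the paper. The score $1 - p(\text{label})$, the function names, and the calibration numbers are illustrative assumptions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration data: probability the model gave to the true label.
cal_true_probs = [0.95, 0.80, 0.60, 0.90, 0.30, 0.85, 0.70, 0.99, 0.55]
cal_scores = [1.0 - p for p in cal_true_probs]

q_hat = conformal_threshold(cal_scores, alpha=0.25)
print(prediction_set([0.65, 0.30, 0.05], q_hat))
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least $1 - \alpha$ on average over test points, which is the marginal guarantee the abstract contrasts with conditional coverage.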
[LINK]
http://arxiv.org/abs/2501.10139v1
[DATE]
2025-01-17 20:01:56+08:00
[CATEGORIES]
cs.LG
Visual Exploration of Stopword Probabilities in Topic Models
[AUTHORS]
Shuangjiang Xue, Pierre Le Bras, David A. Robb, Mike J. Chantler, Stefano Padilla
[ABSTRACT]
Stopword removal is a critical stage in many Machine Learning methods but
often receives little consideration; it interferes with the model
visualizations and disrupts user confidence. Inappropriately chosen or hastily
omitted stopwords not only lead to suboptimal performance but also
significantly affect the quality of models, thus reducing the willingness of
practitioners and stakeholders to rely on the output visualizations. This paper
proposes a novel extraction method that provides a corpus-specific
probabilistic estimation of stopword likelihood and an interactive
visualization system to support their analysis. We evaluated our approach and
interface using real-world data, a commonly used Machine Learning method (Topic
Modelling), and a comprehensive qualitative experiment probing user confidence.
The results of our work show that our system increases user confidence in the
credibility of topic models by (1) returning reasonable probabilities, (2)
generating an appropriate and representative extension of common stopword
lists, and (3) providing an adjustable threshold for estimating and analyzing
stopwords visually. Finally, we discuss insights, recommendations, and best
practices to support practitioners while improving the output of Machine
Learning methods and topic model visualizations with robust stopword analysis
and removal.
[LINK]
http://arxiv.org/abs/2501.10137v1
[DATE]
2025-01-17 19:59:56+08:00
[CATEGORIES]
cs.LG
Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
[AUTHORS]
Gongxu Luo, Haoyue Dai, Boyang Sun, Loka Li, Biwei Huang, Petar Stojanov, Kun Zhang
[ABSTRACT]
Gene Regulatory Network Inference (GRNI) aims to identify causal
relationships among genes using gene expression data, providing insights into
regulatory mechanisms. A significant yet often overlooked challenge is
selection bias, a process where only cells meeting specific criteria, such as
gene expression thresholds, survive or are observed, distorting the true joint
distribution of genes and thus biasing GRNI results. Furthermore, gene
expression is influenced by latent confounders, such as non-coding RNAs, which
add complexity to GRNI. To address these challenges, we propose GISL (Gene
Regulatory Network Inference in the presence of Selection bias and Latent
confounders), a novel algorithm to infer true regulatory relationships in the
presence of selection and confounding issues. Leveraging data obtained via
multiple gene perturbation experiments, we show that the true regulatory
relationships, as well as selection processes and latent confounders can be
partially identified without strong parametric models and under mild graphical
assumptions. Experimental results on both synthetic and real-world single-cell
gene expression datasets demonstrate the superiority of GISL over existing
methods.
[LINK]
http://arxiv.org/abs/2501.10124v1
[DATE]
2025-01-17 19:27:58+08:00
[CATEGORIES]
cs.LG
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics
[AUTHORS]
Chenhao Li, Andreas Krause, Marco Hutter
[ABSTRACT]
Learning robust and generalizable world models is crucial for enabling
efficient and scalable robotic control in real-world environments. In this
work, we introduce a novel framework for learning world models that accurately
capture complex, partially observable, and stochastic dynamics. The proposed
method employs a dual-autoregressive mechanism and self-supervised training to
achieve reliable long-horizon predictions without relying on domain-specific
inductive biases, ensuring adaptability across diverse robotic tasks. We
further propose a policy optimization framework that leverages world models for
efficient training in imagined environments and seamless deployment in
real-world systems. Through extensive experiments, our approach consistently
outperforms state-of-the-art methods, demonstrating superior autoregressive
prediction accuracy, robustness to noise, and generalization across
manipulation and locomotion tasks. Notably, policies trained with our method
are successfully deployed on ANYmal D hardware in a zero-shot transfer,
achieving robust performance with minimal sim-to-real performance loss. This
work advances model-based reinforcement learning by addressing the challenges
of long-horizon prediction, error accumulation, and sim-to-real transfer. By
providing a scalable and robust framework, the introduced methods pave the way
for adaptive and efficient robotic systems in real-world applications.
[LINK]
http://arxiv.org/abs/2501.10100v1
[DATE]
2025-01-17 18:39:09+08:00
[CATEGORIES]
cs.LG
Multi-stage Deep Learning Artifact Reduction for Parallel-beam Computed Tomography
[AUTHORS]
Jiayang Shi, Daniel M. Pelt, K. Joost Batenburg
[ABSTRACT]
Computed Tomography (CT) using synchrotron radiation is a powerful technique
that, compared to lab-CT techniques, boasts high spatial and temporal
resolution while also providing access to a range of contrast-formation
mechanisms. The acquired projection data is typically processed by a
computational pipeline composed of multiple stages. Artifacts introduced during
data acquisition can propagate through the pipeline, and degrade image quality
in the reconstructed images. Recently, deep learning has shown significant
promise in enhancing image quality for images representing scientific data.
This success has driven increasing adoption of deep learning techniques in CT
imaging. Various approaches have been proposed to incorporate deep learning
into computational pipelines, but each has limitations in addressing artifacts
effectively and efficiently in synchrotron CT, either in properly addressing
the specific artifacts, or in computational efficiency.
Recognizing these challenges, we introduce a novel method that incorporates
separate deep learning models at each stage of the tomography pipeline
(projection, sinogram, and reconstruction) to address specific artifacts
locally in a data-driven way. Our approach includes bypass connections that
feed both the outputs from previous stages and raw data to subsequent stages,
minimizing the risk of error propagation. Extensive evaluations on both
simulated and real-world datasets illustrate that our approach effectively
reduces artifacts and outperforms comparison methods.
[LINK]
http://arxiv.org/abs/2309.00494v2
[DATE]
2025-01-17 18:31:13+08:00
[CATEGORIES]
cs.LG
A recursive Bayesian neural network for constitutive modeling of sands under monotonic loading
[AUTHORS]
Toiba Noor, Soban Nasir Lone, G. V. Ramana, Rajdip Nayek
[ABSTRACT]
In geotechnical engineering, constitutive models play a crucial role in
describing soil behavior under varying loading conditions. Data-driven deep
learning (DL) models offer a promising alternative for developing predictive
constitutive models. When prediction is the primary focus, quantifying the
predictive uncertainty of a trained DL model and communicating this uncertainty
to end users is crucial for informed decision-making.
This study proposes a recursive Bayesian neural network (rBNN) framework,
which builds upon recursive feedforward neural networks (rFFNNs) by introducing
generalized Bayesian inference for uncertainty quantification. A significant
contribution of this work is the incorporation of a sliding window approach in
rFFNNs, allowing the models to effectively capture temporal dependencies across
load steps. The rBNN extends this framework by treating model parameters as
random variables, with their posterior distributions inferred using generalized
variational inference.
The proposed framework is validated on two datasets: (i) a numerically
simulated consolidated drained (CD) triaxial dataset employing a hardening soil
model and (ii) an experimental dataset comprising 28 CD triaxial tests on
Baskarp sand. Comparative analyses with LSTM, Bi-LSTM, and GRU models
demonstrate that the deterministic rFFNN achieves superior predictive accuracy,
attributed to its transparent structure and sliding window design. While the
rBNN marginally trails in accuracy for the experimental case, it provides
robust confidence intervals, addressing data sparsity and measurement noise in
experimental conditions. The study underscores the trade-offs between
deterministic and probabilistic approaches and the potential of rBNNs for
uncertainty-aware constitutive modeling.
[LINK]
http://arxiv.org/abs/2501.10088v1
[DATE]
2025-01-17 18:15:03+08:00
[CATEGORIES]
cs.LG
Two-level Solar Irradiance Clustering with Season Identification: A Comparative Analysis
[AUTHORS]
Roshni Agrawal, Sivakumar Subramanian, Venkataramana Runkana
[ABSTRACT]
Solar irradiance clustering can enhance solar power capacity planning and
help improve forecasting models by identifying similar irradiance patterns
influenced by seasonal and weather changes. In this study, we adopt an
efficient two-level clustering approach to automatically identify seasons using
the clear sky irradiance in the first level and subsequently to identify daily
cloud level as clear, cloudy and partly cloudy within each season in the second
level. In the second level of clustering, three methods are compared, namely,
Daily Irradiance Index (DII or $\beta$), Euclidean Distance (ED), and Dynamic
Time Warping (DTW) distance. The DII is computed as the ratio of time integral
of measured irradiance to time integral of the clear sky irradiance. The
identified clusters were compared quantitatively using established clustering
metrics and qualitatively by comparing the mean irradiance profiles. The
results clearly establish the superiority of the $\beta$-based clustering
approach, setting a new benchmark for solar irradiance clustering
studies. Moreover, $\beta$-based clustering remains effective even for annual
data unlike the time-series methods which suffer significant performance
degradation. Interestingly, contrary to expectations, ED-based clustering
outperforms the more compute-intensive DTW distance-based clustering. The
method has been rigorously validated using data from two distinct US locations,
demonstrating robust scalability for larger datasets and potential
applicability for other locations.
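
The Daily Irradiance Index defined above (time integral of measured irradiance divided by time integral of clear sky irradiance) can be computed as in the following minimal sketch. The trapezoidal rule, the function names, and the sample day are assumptions made for illustration. The abstract does not specify the integration scheme.

```python
def trapezoid_integral(times, values):
    """Integrate values over times with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def daily_irradiance_index(times, measured, clear_sky):
    """Ratio of integrated measured irradiance to integrated clear sky irradiance."""
    clear_total = trapezoid_integral(times, clear_sky)
    if clear_total <= 0.0:
        raise ValueError("clear sky irradiance must integrate to a positive value")
    return trapezoid_integral(times, measured) / clear_total


# Hypothetical day: hours since midnight and irradiance in W/m^2.
hours = [6, 9, 12, 15, 18]
clear_sky = [0, 600, 900, 600, 0]
measured = [0, 300, 450, 300, 0]

beta = daily_irradiance_index(hours, measured, clear_sky)
print(beta)
```

A value near 1 indicates a day close to clear sky conditions, while lower values indicate more cloud cover. Because each day collapses to a single scalar, clustering on it avoids comparing full time series.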
[COMMENTS]
30 pages, 9 figures, 6 tables
[LINK]
http://arxiv.org/abs/2501.10084v1
[DATE]
2025-01-17 18:05:11+08:00
[CATEGORIES]
cs.LG
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
[AUTHORS]
David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickael Chen, Slim Essid, Gaël Richard
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2407.15580v3
[DATE]
2025-01-17 18:03:39+08:00
[CATEGORIES]
cs.LG
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
[AUTHORS]
Zhengzhong Liu, Bowen Tan, Hongyi Wang, Willie Neiswanger, Tianhua Tao, Haonan Li, Fajri Koto, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Liqun Ma, Liping Tang, Nikhil Ranjan, Yonghao Zhuang, Guowei He, Renxi Wang, Mingkai Deng, Robin Algayres, Yuanzhi Li, Zhiqiang Shen, Preslav Nakov, Eric Xing
[ABSTRACT]
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree
OPEN SOURCE approach to the largest and most powerful models under project
LLM360. While open-source LLMs continue to advance, the answer to “How are the
largest LLMs trained?” remains unclear within the community. The implementation
details for such high-capacity models are often protected due to business
considerations associated with their high cost. This lack of transparency
prevents LLM researchers from leveraging valuable insights from prior
experience, e.g., “What are the best practices for addressing loss spikes?” The
LLM360 K2 project addresses this gap by providing full transparency and access
to resources accumulated during the training of LLMs at the largest scale. This
report highlights key elements of the K2 project, including our first model, K2
DIAMOND, a 65 billion-parameter LLM that surpasses LLaMA-65B and rivals
LLaMA2-70B, while requiring fewer FLOPs and tokens. We detail the
implementation steps and present a longitudinal analysis of K2 DIAMOND’s
capabilities throughout its training process. We also outline ongoing projects
such as TXT360, setting the stage for future models in the series. By offering
previously unavailable resources, the K2 project also resonates with the
360-degree OPEN SOURCE principles of transparency, reproducibility, and
accessibility, which we believe are vital in the era of resource-intensive AI
research.
[LINK]
http://arxiv.org/abs/2501.07124v3
[DATE]
2025-01-17 17:39:17+08:00
[CATEGORIES]
cs.LG
One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
[AUTHORS]
Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi
[ABSTRACT]
Current image tokenization methods require a large number of tokens to
capture the information contained within images. Although the amount of
information varies across images, most image tokenizers only support
fixed-length tokenization, leading to inefficiency in token allocation. In this
study, we introduce One-D-Piece, a discrete image tokenizer designed for
variable-length tokenization, achieving a quality-controllable mechanism. To
enable variable compression rate, we introduce a simple but effective
regularization mechanism named “Tail Token Drop” into discrete one-dimensional
image tokenizers. This method encourages critical information to concentrate at
the head of the token sequence, enabling support of variadic tokenization,
while preserving state-of-the-art reconstruction quality. We evaluate our
tokenizer across multiple reconstruction quality metrics and find that it
delivers significantly better perceptual quality than existing
quality-controllable compression methods, including JPEG and WebP, at smaller
byte sizes. Furthermore, we assess our tokenizer on various downstream computer
vision tasks, including image classification, object detection, semantic
segmentation, and depth estimation, confirming its adaptability to numerous
applications compared to other variable-rate methods. Our approach demonstrates
the versatility of variable-length discrete image tokenization, establishing a
new paradigm in both compression efficiency and reconstruction performance.
Finally, we validate the effectiveness of tail token drop via detailed analysis
of tokenizers.
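
A minimal sketch of the Tail Token Drop idea described above: during training, the tail of a one-dimensional token sequence is randomly truncated, so the information needed for reconstruction has to sit near the head. The function name `tail_token_drop`, the `min_keep` parameter, and the uniform sampling of the kept length are assumptions for illustration, not the authors' implementation.

```python
import random

def tail_token_drop(tokens, min_keep=1, rng=random):
    """Keep a random-length prefix of `tokens` and drop the tail.

    Assumption: the kept length is sampled uniformly from
    [min_keep, len(tokens)]; the paper may use another distribution.
    """
    if not 1 <= min_keep <= len(tokens):
        raise ValueError("min_keep must be between 1 and len(tokens)")
    keep = rng.randint(min_keep, len(tokens))  # inclusive on both ends
    return tokens[:keep]

# Toy usage: a 16-token sequence truncated to a random prefix.
sequence = list(range(16))
prefix = tail_token_drop(sequence, min_keep=4, rng=random.Random(0))
```

At inference time the same prefix property would let a caller pick the token count (and thus the quality level) per image.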
[COMMENTS]
Our Project Page:
https://turingmotors.github.io/one-d-piece-tokenizer
[LINK]
http://arxiv.org/abs/2501.10064v1
[DATE]
2025-01-17 17:29:33+08:00
[CATEGORIES]
cs.LG
Accelerating Large Language Models through Partially Linear Feed-Forward Network
[AUTHORS]
Gansen Hu, Zhaoguo Wang, Jinglin Wei, Wei Huang, Haibo Chen
[ABSTRACT]
Large language models (LLMs) demonstrate remarkable capabilities but face
deployment challenges due to their massive parameter counts. While existing
compression techniques like pruning can reduce model size, they lead to
significant accuracy degradation under high compression ratios. We present a
novel perspective inspired by constant folding in compiler optimization. Our
approach enables parameter reduction by treating activation functions in LLMs
as linear functions.
However, recent LLMs use complex non-linear activations like GELU that
prevent direct application of this technique. We propose TARDIS, which enables
optimization of LLMs with non-linear activations by partially approximating
them with linear functions in frequently occurring input ranges. For outlier
inputs, TARDIS employs an online predictor to dynamically fall back to original
computations.
Our experiments demonstrate that TARDIS achieves 80% parameter reduction in
feed-forward networks, while significantly outperforming state-of-the-art
pruning methods Wanda and RIA with up to 65% higher accuracy. In practical
deployments for a 7B model, TARDIS achieves 1.6x end-to-end inference speedup
when integrated with the vLLM serving system, and 1.4x speedup with the widely
adopted HuggingFace implementation, while incurring only a 10.9% accuracy
trade-off.
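
The partial linear approximation can be sketched as follows, under stated assumptions: GELU is replaced by a least-squares line on a hypothetical frequently occurring input range, and inputs outside that range fall back to the exact function. The range `[LO, HI]`, the helper names, and the simple range check are illustrative only; the abstract describes an online predictor for the fallback. The linear piece is what allows folding, since `W2 (a * W1 x + b)` collapses to a single matrix product plus a constant.

```python
import math

def gelu(x):
    """Exact GELU using the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_line(lo, hi, n=101):
    """Least-squares line a*x + b fitted to GELU on n points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [gelu(x) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def partial_linear_gelu(x, lo, hi, a, b):
    """Linear surrogate inside [lo, hi]; exact GELU for outlier inputs."""
    return a * x + b if lo <= x <= hi else gelu(x)

LO, HI = 0.5, 2.0        # hypothetical "hot" input range
A, B = fit_line(LO, HI)  # slope and intercept of the surrogate
```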
[LINK]
http://arxiv.org/abs/2501.10054v1
[DATE]
2025-01-17 17:20:56+08:00
[CATEGORIES]
cs.LG
Tracking student skills real-time through a continuous-variable dynamic Bayesian network
[AUTHORS]
Hildo Bijl
[ABSTRACT]
The field of Knowledge Tracing is focused on predicting the success rate of a
student for a given skill. Modern methods like Deep Knowledge Tracing provide
accurate estimates given enough data, but being based on neural networks they
struggle to explain how these estimates are formed. More classical methods like
Dynamic Bayesian Networks can do this, but they cannot give data on the
accuracy of their estimates and often struggle to incorporate new observations
in real-time due to their high computational load.
This paper presents a novel method, Performance Distribution Tracing (PDT),
in which the distribution of the success rate is traced live. It uses a Dynamic
Bayesian Network with continuous random variables as nodes. By tracing the
success rate distribution, there is always data available on the accuracy of
any success rate estimation. In addition, it makes it possible to combine data
from similar/related skills to come up with a more informed estimate of success
rates. This makes it possible to predict exercise success rates, providing both
explainability and an accuracy indication, even when an exercise requires a
combination of different skills to solve. And through the use of the beta
distribution functions as conjugate priors, all distributions are available in
analytical form, allowing efficient online updates upon new observations.
Experiments have shown that the resulting estimates generally feel sufficiently
accurate to end-users such that they accept recommendations based on them.
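
For intuition on why beta conjugate priors give analytical online updates, here is the textbook Beta-Bernoulli update for a single skill. This is a simplified sketch, not the PDT method itself, which works with a full Dynamic Bayesian Network; the function names are illustrative.

```python
def update_beta(alpha, beta, success):
    """Posterior Beta parameters after one Bernoulli observation."""
    return (alpha + 1, beta) if success else (alpha, beta + 1)

def beta_mean(alpha, beta):
    """Expected success rate under Beta(alpha, beta)."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Variance of the success rate; smaller means a more certain estimate."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

# Start from a uniform prior Beta(1, 1) and observe four exercise attempts.
alpha, beta = 1, 1
for outcome in [True, True, False, True]:
    alpha, beta = update_beta(alpha, beta, outcome)
```

After these four observations the posterior is Beta(4, 2), so both the estimate (the mean) and an accuracy indication (the variance) are available in closed form.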
[LINK]
http://arxiv.org/abs/2501.10050v1
[DATE]
2025-01-17 17:13:49+08:00
[CATEGORIES]
cs.LG
PandaSkill – Player Performance and Skill Rating in Esports: Application to League of Legends
[AUTHORS]
Maxime De Bois, Flora Parmentier, Raphaël Puget, Matthew Tanti, Jordan Peltier
[ABSTRACT]
To take the esports scene to the next level, we introduce PandaSkill, a
framework for assessing player performance and skill rating. Traditional rating
systems like Elo and TrueSkill often overlook individual contributions and face
challenges in professional esports due to limited game data and fragmented
competitive scenes. PandaSkill leverages machine learning to estimate in-game
player performance from individual player statistics. Each in-game role is
modeled independently, ensuring a fair comparison between them. Then, using
these performance scores, PandaSkill updates the player skill ratings using the
Bayesian framework OpenSkill in a free-for-all setting. In this setting, skill
ratings are updated solely based on performance scores rather than game
outcomes, highlighting individual contributions. To address the challenge of
isolated rating pools that hinder cross-regional comparisons, PandaSkill
introduces a dual-rating system that combines players’ regional ratings with a
meta-rating representing each region’s overall skill level. Applying PandaSkill
to five years of professional League of Legends matches worldwide, we show that
our method produces skill ratings that better predict game outcomes and align
more closely with expert opinions compared to existing methods.
[LINK]
http://arxiv.org/abs/2501.10049v1
[DATE]
2025-01-17 17:10:34+08:00
[CATEGORIES]
cs.LG
Virtual Nodes Improve Long-term Traffic Prediction
[AUTHORS]
Xiaoyang Cao, Dingyi Zhuang, Jinhua Zhao, Shenhao Wang
[ABSTRACT]
Effective traffic prediction is a cornerstone of intelligent transportation
systems, enabling precise forecasts of traffic flow, speed, and congestion.
While traditional spatio-temporal graph neural networks (ST-GNNs) have achieved
notable success in short-term traffic forecasting, their performance in
long-term predictions remains limited. This challenge arises from the
over-squashing problem, where bottlenecks and limited receptive fields restrict
information flow and hinder the modeling of global dependencies. To address
these challenges, this study introduces a novel framework that incorporates
virtual nodes, which are additional nodes added to the graph and connected to
existing nodes, in order to aggregate information across the entire graph
within a single GNN layer. Our proposed model incorporates virtual nodes by
constructing a semi-adaptive adjacency matrix. This matrix integrates
distance-based and adaptive adjacency matrices, allowing the model to leverage
geographical information while also learning task-specific features from data.
Experimental results demonstrate that the inclusion of virtual nodes
significantly enhances long-term prediction accuracy while also improving
layer-wise sensitivity to mitigate the over-squashing problem. Virtual nodes
also offer enhanced explainability by focusing on key intersections and
high-traffic areas, as shown by the visualization of their adjacency matrix
weights on road network heat maps. Our advanced approach enhances the
understanding and management of urban traffic systems, making it particularly
well-suited for real-world applications.
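
A toy illustration of why a virtual node shortens information paths: on a chain graph, the two endpoints are many hops apart, but once a virtual node is connected to every node, any pair is at most two hops apart. This sketch uses a plain binary adjacency matrix and hypothetical helper names; it does not reproduce the semi-adaptive adjacency matrix from the abstract.

```python
from collections import deque

def add_virtual_node(adj):
    """Return a new adjacency matrix with one extra node linked to all others."""
    n = len(adj)
    augmented = [row + [1] for row in adj]  # every node -> virtual node
    augmented.append([1] * n + [0])         # virtual node -> every node
    return augmented

def hops(adj, src, dst):
    """Breadth-first search hop count from src to dst (None if unreachable)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Chain of 6 nodes: 0 - 1 - 2 - 3 - 4 - 5
chain = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
augmented = add_virtual_node(chain)
```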
[LINK]
http://arxiv.org/abs/2501.10048v1
[DATE]
2025-01-17 17:09:01+08:00
[CATEGORIES]
cs.LG
Mitigating analytical variability in fMRI results with style transfer
[AUTHORS]
Elodie Germani, Camille Maumet, Elisa Fromont
[ABSTRACT]
We propose a novel approach to improve the reproducibility of neuroimaging
results by converting statistic maps across different functional MRI pipelines.
We make the assumption that pipelines used to compute fMRI statistic maps can
be considered as a style component and we propose to use different generative
models, including Generative Adversarial Networks (GAN) and Diffusion Models
(DM), to convert statistic maps across different pipelines. We explore the
performance of multiple GAN frameworks, and design a new DM framework for
unsupervised multi-domain style transfer. We constrain the generation of 3D fMRI
statistic maps using the latent space of an auxiliary classifier that
distinguishes statistic maps from different pipelines and extend traditional
sampling techniques used in DM to improve the transition performance. Our
experiments demonstrate that our proposed methods are successful: pipelines can
indeed be transferred as a style component, providing an important source of
data augmentation for future medical studies.
[LINK]
http://arxiv.org/abs/2404.03703v3
[DATE]
2025-01-17 17:03:57+08:00
[CATEGORIES]
cs.LG
Accelerating lensed quasars discovery and modeling with physics-informed variational autoencoders
[AUTHORS]
Irham T. Andika, Stefan Schuldt, Sherry H. Suyu, Satadru Bag, Raoul Cañameras, Alejandra Melo, Claudio Grillo, James H. H. Chan
[ABSTRACT]
Strongly lensed quasars provide valuable insights into the rate of cosmic
expansion, the distribution of dark matter in foreground deflectors, and the
characteristics of quasar hosts. However, detecting them in astronomical images
is difficult due to the prevalence of non-lensing objects. To address this
challenge, we developed a generative deep learning model called VariLens, built
upon a physics-informed variational autoencoder. This model seamlessly
integrates three essential modules: image reconstruction, object
classification, and lens modeling, offering a fast and comprehensive approach
to strong lens analysis. VariLens is capable of rapidly determining both (1)
the probability that an object is a lens system and (2) key parameters of a
singular isothermal ellipsoid (SIE) mass model – including the Einstein radius
($\theta_\mathrm{E}$), lens center, and ellipticity – in just milliseconds
using a single CPU. A direct comparison of VariLens estimates with traditional
lens modeling for 20 known lensed quasars within the Subaru Hyper Suprime-Cam
(HSC) footprint shows good agreement, with both results consistent within
$2\sigma$ for systems with $\theta_\mathrm{E}<3$ arcsecs. To identify new
lensed quasar candidates, we begin with an initial sample of approximately 80
million sources, combining HSC data with multiwavelength information from
various surveys. After applying a photometric preselection aimed at locating
$z>1.5$ sources, the number of candidates is reduced to 710,966. Subsequently,
VariLens highlights 13,831 sources, each showing a high likelihood of being a
lens. A visual assessment of these objects results in 42 promising candidates
that await spectroscopic confirmation. These results underscore the potential
of automated deep learning pipelines to efficiently detect and model strong
lenses in large datasets.
[COMMENTS]
Submitted to the Astronomy & Astrophysics journal and updated to
reflect the revised version. The paper consists of 17 main pages, 14 figures,
and 5 tables. We welcome feedback and comments from readers!
[LINK]
http://arxiv.org/abs/2412.12709v2
[DATE]
2025-01-17 17:03:17+08:00
[CATEGORIES]
cs.LG
VECT-GAN: A variationally encoded generative model for overcoming data scarcity in pharmaceutical science
[AUTHORS]
Youssef Abdalla, Marrisa Taub, Eleanor Hilton, Priya Akkaraju, Alexander Milanovic, Mine Orlu, Abdul W. Basit, Michael T Cook, Tapabrata Chakraborti, David Shorthouse
[ABSTRACT]
Data scarcity in pharmaceutical research has led to reliance on
labour-intensive trial-and-error approaches for development rather than
data-driven methods. While Machine Learning offers a solution, existing
datasets are often small and noisy, limiting their utility. To address this, we
developed a Variationally Encoded Conditional Tabular Generative Adversarial
Network (VECT-GAN), a novel generative model specifically designed for
augmenting small, noisy datasets. We introduce a pipeline where data is
augmented before regression model development and demonstrate that this
consistently and significantly improves performance over other state-of-the-art
tabular generative models. We apply this pipeline across six pharmaceutical
datasets, and highlight its real-world applicability by developing novel
polymers with medically desirable mucoadhesive properties, which we made and
experimentally characterised. Additionally, we pre-train the model on the
ChEMBL database of drug-like molecules, leveraging knowledge distillation to
enhance its generalisability, making it readily available for use on
pharmaceutical datasets containing small molecules, an extremely common
pharmaceutical task. We demonstrate the power of synthetic data for
regularising small tabular datasets, highlighting its potential to become
standard practice in pharmaceutical model development, and make our method,
including VECT-GAN pre-trained on ChEMBL available as a pip package.
[COMMENTS]
30 pages, 6 primary figures, 3 supplementary figures
[LINK]
http://arxiv.org/abs/2501.08995v2
[DATE]
2025-01-17 16:58:48+08:00
[CATEGORIES]
cs.LG
IterL2Norm: Fast Iterative L2-Normalization
[AUTHORS]
ChangMin Ye, Yonguk Sim, Youngchae Kim, SeongMin Jin, Doo Seok Jeong
[ABSTRACT]
Transformer-based large language models are memory-bound models whose
operation is based on a large amount of data that is marginally reused. Thus,
the data movement between a host and accelerator likely dictates the total
wall-clock time. Layer normalization is one of the key workloads in the
transformer model, following each multi-head attention and feed-forward
network block. To reduce data movement, layer normalization needs to be
performed on the same chip as the matrix-matrix multiplication engine. To this
end, we introduce an iterative L2-normalization method for 1D input
(IterL2Norm), ensuring fast convergence to the steady-state solution within
five iteration steps and high precision, outperforming the fast inverse square
root algorithm in six out of nine cases for FP32 and five out of nine for
BFloat16 across the embedding lengths used in the OPT models. Implemented in
32/28nm CMOS, the IterL2Norm macro normalizes $d$-dimensional vectors, where
$64 \leq d \leq 1024$, with a latency of 116-227 cycles at 100MHz/1.05V.
[COMMENTS]
Design, Automation & Test in Europe Conference 2025
[LINK]
http://arxiv.org/abs/2412.04778v2
[DATE]
2025-01-17 16:58:17+08:00
[CATEGORIES]
cs.LG
Geometric Median (GM) Matching for Robust Data Pruning
[AUTHORS]
Anish Acharya, Inderjit S Dhillon, Sujay Sanghavi
[ABSTRACT]
Large-scale data collections in the wild are invariably noisy. Thus,
developing data pruning strategies that remain robust even in the presence of
corruption is critical in practice. In this work, we propose Geometric Median
(GM) Matching – a herding-style greedy algorithm that yields a $k$-subset
such that the mean of the subset approximates the geometric median of the
(potentially) noisy dataset. Theoretically, we show that GM Matching enjoys
an improved $\mathcal{O}(1/k)$ scaling over the $\mathcal{O}(1/\sqrt{k})$
scaling of uniform sampling, while achieving the optimal breakdown point of
1/2 even under arbitrary corruption. Extensive experiments across several
popular deep learning benchmarks indicate that GM Matching consistently
improves over the prior state of the art; the gains become more pronounced at
high rates of corruption and aggressive pruning rates, making GM Matching a
strong baseline for future research in robust data pruning.
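
To make the robustness of the geometric median concrete, the sketch below computes it with the classical Weiszfeld iteration and compares it to the mean on a small point set with one gross outlier. This illustrates only the target statistic; the greedy GM Matching subset selection is not reproduced here, and the function name and iteration count are illustrative choices.

```python
import math

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iteration: re-weight points by inverse distance to the estimate."""
    dim = len(points[0])
    # Start from the arithmetic mean.
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            w = 1.0 / max(math.dist(p, est), eps)  # eps guards against division by zero
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        est = [value / den for value in num]
    return est

# Four clean points on a unit square plus one corrupted point far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
mean = [sum(p[d] for p in points) / len(points) for d in range(2)]
median = geometric_median(points)
```

The mean is dragged to (20.4, 20.4) by the single outlier, while the geometric median stays inside the unit square.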
[LINK]
http://arxiv.org/abs/2406.17188v2
[DATE]
2025-01-17 16:38:45+08:00
[CATEGORIES]
cs.LG
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
[AUTHORS]
Freek Holvoet, Katrien Antonio, Roel Henckaerts
[ABSTRACT]
Insurers usually turn to generalized linear models for modeling claim
frequency and severity data. Due to their success in other fields, machine
learning techniques are gaining popularity within the actuarial toolbox. Our
paper contributes to the literature on frequency-severity insurance pricing
with machine learning via deep learning structures. We present a benchmark
study on four insurance data sets with frequency and severity targets in the
presence of multiple types of input features. We compare in detail the
performance of: a generalized linear model on binned input data, a
gradient-boosted tree model, a feed-forward neural network (FFNN), and the
combined actuarial neural network (CANN). The CANNs combine a baseline
prediction established with a GLM and GBM, respectively, with a neural network
correction. We explain the data preprocessing steps with specific focus on the
multiple types of input features typically present in tabular insurance data
sets, such as postal codes, numeric and categorical covariates. Autoencoders
are used to embed the categorical variables into the neural network, and we
explore their potential advantages in a frequency-severity setting. Model
performance is evaluated not only on out-of-sample deviance but also using
statistical and calibration performance criteria and managerial tools to get
more nuanced insights. Finally, we construct global surrogate models for the
neural nets’ frequency and severity models. These surrogates enable the
translation of the essential insights captured by the FFNNs or CANNs to GLMs.
As such, a technical tariff table results that can easily be deployed in
practice.
[LINK]
http://arxiv.org/abs/2310.12671v4
[DATE]
2025-01-17 16:14:18+08:00
[CATEGORIES]
cs.LG
Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise
[AUTHORS]
Viveck R. Cadambe, Ateet Devulapalli, Haewon Jeong, Flavio P. Calmon
[ABSTRACT]
We consider the problem of private distributed multi-party multiplication. It
is well-established that Shamir secret-sharing coding strategies can enable
perfect information-theoretic privacy in distributed computation via the
celebrated algorithm of Ben-Or, Goldwasser and Wigderson (the “BGW algorithm”).
However, perfect privacy and accuracy require an honest majority, that is, $N
\geq 2t+1$ compute nodes are required to ensure privacy against any $t$
colluding adversarial nodes. By allowing for some controlled amount of
information leakage and approximate multiplication instead of exact
multiplication, we study coding schemes for the setting where the number of
honest nodes can be a minority, that is, $N < 2t+1$. We develop a tight
characterization of the privacy-accuracy trade-off for cases where $N < 2t+1$ by
measuring information leakage using differential privacy instead of perfect
privacy, and using the mean squared error metric for accuracy. A novel
technical aspect is an intricately layered noise distribution that merges ideas
from differential privacy and Shamir secret-sharing at different layers.
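
The $N \geq 2t+1$ requirement for exact multiplication can be seen in a small Shamir secret-sharing sketch: each input is hidden in a degree-$t$ polynomial, so the pointwise product of shares lies on a degree-$2t$ polynomial and needs $2t+1$ points to interpolate. The prime modulus, function names, and toy inputs are assumptions for illustration; this is neither the full BGW protocol (no degree reduction) nor the differentially private scheme of the paper.

```python
import random

P = 2_147_483_647  # prime modulus 2^31 - 1, chosen for illustration

def share(secret, t, n, rng):
    """Split `secret` into n shares using a random degree-t polynomial mod P."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [
        (x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
        for x in range(1, n + 1)
    ]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the field of integers mod P."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

rng = random.Random(0)
t, n = 1, 3                       # n = 2t + 1 compute nodes
a_shares = share(6, t, n, rng)
b_shares = share(7, t, n, rng)
# Each node multiplies its two shares locally.
product_shares = [(x, ya * yb % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
```

Here `reconstruct(product_shares)` recovers 42 from all three product shares, whereas any `t + 1 = 2` shares are already enough to recover a single input.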
[COMMENTS]
Extended version of papers presented in IEEE ISIT 2022, IEEE ISIT
2023 and TPDP 2023
[LINK]
http://arxiv.org/abs/2309.16105v2
[DATE]
2025-01-17 16:02:37+08:00
[CATEGORIES]
cs.LG
Adaptive Spatiotemporal Augmentation for Improving Dynamic Graph Learning
[AUTHORS]
Xu Chu, Hanlin Xue, Bingce Wang, Xiaoyang Liu, Weiping Li, Tong Mo, Tuoyu Feng, Zhijie Tan
[ABSTRACT]
Dynamic graph augmentation is used to improve the performance of dynamic
GNNs. Most methods assume temporal locality, meaning that recent edges are more
influential than earlier edges. However, for temporal changes in edges caused
by random noise, overemphasizing recent edges while neglecting earlier ones may
lead to the model capturing noise. To address this issue, we propose STAA
(SpatioTemporal Activity-Aware Random Walk Diffusion). STAA identifies nodes
likely to have noisy edges in spatiotemporal dimensions. Spatially, it analyzes
critical topological positions through graph wavelet coefficients. Temporally,
it analyzes edge evolution through graph wavelet coefficient change rates.
Then, random walks are used to reduce the weights of noisy edges, deriving a
diffusion matrix containing spatiotemporal information as an augmented
adjacency matrix for dynamic GNN learning. Experiments on multiple datasets
show that STAA outperforms other dynamic graph augmentation methods in node
classification and link prediction tasks.
[COMMENTS]
2025 IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP 2025)
[LINK]
http://arxiv.org/abs/2501.10010v1
[DATE]
2025-01-17 15:48:18+08:00
[CATEGORIES]
cs.LG
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning
[AUTHORS]
Jiapeng Zhu, Zichen Ding, Jianxiang Yu, Jiaqi Tan, Xiang Li, Weining Qian
[ABSTRACT]
The advent of the “pre-train, prompt” paradigm has recently extended its
generalization ability and data efficiency to graph representation learning,
following its achievements in Natural Language Processing (NLP). Initial graph
prompt tuning approaches tailored specialized prompting functions for Graph
Neural Network (GNN) models pre-trained with specific strategies, such as edge
prediction, thus limiting their applicability. In contrast, another pioneering
line of research has explored universal prompting via adding prompts to the
input graph’s feature space, thereby removing the reliance on specific
pre-training strategies. However, the necessity to add feature prompts to all
nodes remains an open question. Motivated by findings from prompt tuning
research in the NLP domain, which suggest that highly capable pre-trained
models need less conditioning signal to achieve desired behaviors, we advocate
for strategically incorporating necessary and lightweight feature prompts to
certain graph nodes to enhance downstream task performance. This introduces a
combinatorial optimization problem, requiring a policy to decide 1) which nodes
to prompt and 2) what specific feature prompts to attach. We then address the
problem by framing the prompt incorporation process as a sequential
decision-making problem and propose our method, RELIEF, which employs
Reinforcement Learning (RL) to optimize it. At each step, the RL agent selects
a node (discrete action) and determines the prompt content (continuous action),
aiming to maximize cumulative performance gain. Extensive experiments on graph
and node-level tasks with various pre-training strategies in few-shot scenarios
demonstrate that our RELIEF outperforms fine-tuning and other prompt-based
approaches in classification performance and data efficiency. The code is
available at https://github.com/JasonZhujp/RELIEF.
[COMMENTS]
Accepted by SIGKDD 2025 (camera-ready version). Due to the space
limitation, please refer to the V2 version for more details
[LINK]
http://arxiv.org/abs/2408.03195v3
[DATE]
2025-01-17 15:29:14+08:00
[CATEGORIES]
cs.LG
Elucidating the Design Space of Dataset Condensation
[AUTHORS]
Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen
[ABSTRACT]
Dataset condensation, a concept within data-centric learning, efficiently
transfers critical attributes from an original dataset to a synthetic version,
maintaining both diversity and realism. This approach significantly improves
model training efficiency and is adaptable across multiple application areas.
Previous methods in dataset condensation have faced challenges: some incur high
computational costs which limit scalability to larger datasets (e.g., MTT,
DREAM, and TESLA), while others are restricted to less optimal design spaces,
which could hinder potential improvements, especially in smaller datasets
(e.g., SRe2L, G-VBSM, and RDED). To address these limitations, we propose a
comprehensive design framework that includes specific, effective strategies
like implementing soft category-aware matching and adjusting the learning rate
schedule. These strategies are grounded in empirical evidence and theoretical
backing. Our resulting approach, Elucidate Dataset Condensation (EDC),
establishes a benchmark for both small and large-scale dataset condensation. In
our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on
ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a
compression ratio of 0.78%. This performance exceeds those of SRe2L, G-VBSM,
and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively.
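
The quoted compression ratio can be checked with a short calculation, assuming the standard ImageNet-1k training set of 1,281,167 images across 1,000 classes (figures not stated in the abstract itself).

```python
IPC = 10                  # images per class in the condensed dataset
NUM_CLASSES = 1000        # ImageNet-1k classes
TRAIN_IMAGES = 1_281_167  # assumed size of the ImageNet-1k training set

condensed_images = IPC * NUM_CLASSES
ratio = condensed_images / TRAIN_IMAGES
print(f"{condensed_images} images, compression ratio {ratio:.2%}")
# prints "10000 images, compression ratio 0.78%"
```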
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2404.13733v4
[DATE]
2025-01-17 15:15:16+08:00
[CATEGORIES]
cs.LG
Harnessing small projectors and multiple views for efficient vision pretraining
[AUTHORS]
Kumar Krishna Agrawal, Arna Ghosh, Shagun Sodhani, Adam Oberman, Blake Richards
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2312.10725v2
[DATE]
2025-01-17 15:01:43+08:00
[CATEGORIES]
cs.LG
Deep Plug-and-Play HIO Approach for Phase Retrieval
[AUTHORS]
Cagatay Isil, Figen S. Oktem
[ABSTRACT]
In the phase retrieval problem, the aim is the recovery of an unknown image
from intensity-only measurements such as Fourier intensity. Although there are
several solution approaches, solving this problem is challenging due to its
nonlinear and ill-posed nature. Recently, learning-based approaches have
emerged as powerful alternatives to the analytical methods for several inverse
problems. In the context of phase retrieval, a novel plug-and-play approach
that exploits learning-based prior and efficient update steps has been
presented at the Computational Optical Sensing and Imaging topical meeting,
with demonstrated state-of-the-art performance. The key idea was to incorporate
learning-based prior to the Gerchberg-Saxton type algorithms through
plug-and-play regularization. In this paper, we present the mathematical
development of the method including the derivation of its analytical update
steps based on half-quadratic splitting and comparatively evaluate its
performance through extensive simulations on a large test dataset. The results
show the effectiveness of the method in terms of image quality,
computational efficiency, and robustness to initialization and noise.
[COMMENTS]
16 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.18967v2
[DATE]
2025-01-17 14:44:38+08:00
[CATEGORIES]
cs.LG
Aneumo: A Large-Scale Comprehensive Synthetic Dataset of Aneurysm Hemodynamics
[AUTHORS]
Xigui Li, Yuanye Zhou, Feiyang Xiao, Xin Guo, Yichi Zhang, Chen Jiang, Jianchao Ge, Xiansheng Wang, Qimeng Wang, Taiwei Zhang, Chensen Lin, Yuan Cheng, Yuan Qi
[ABSTRACT]
Intracranial aneurysm (IA) is a common cerebrovascular disease that is
usually asymptomatic but may cause severe subarachnoid hemorrhage (SAH) if
ruptured. Although clinical practice is usually based on individual factors and
morphological features of the aneurysm, its pathophysiology and hemodynamic
mechanisms remain controversial. To address the limitations of current
research, this study constructed a comprehensive hemodynamic dataset of
intracranial aneurysms. The dataset is based on 466 real aneurysm models, and
10,000 synthetic models were generated by resection and deformation operations,
including 466 aneurysm-free models and 9,534 deformed aneurysm models. The
dataset also provides medical image-like segmentation mask files to support
insightful analysis. In addition, the dataset contains hemodynamic data
measured at eight steady-state flow rates (0.001 to 0.004 kg/s), including
critical parameters such as flow velocity, pressure, and wall shear stress,
providing a valuable resource for investigating aneurysm pathogenesis and
clinical prediction. This dataset will help advance the understanding of the
pathologic features and hemodynamic mechanisms of intracranial aneurysms and
support in-depth research in related fields. Dataset hosted at
https://github.com/Xigui-Li/Aneumo.
[LINK]
http://arxiv.org/abs/2501.09980v1
[DATE]
2025-01-17 14:43:03+08:00
[CATEGORIES]
cs.LG
Tree-structured Markov random fields with Poisson marginal distributions
[AUTHORS]
Benjamin Côté, Hélène Cossette, Etienne Marceau
[ABSTRACT]
A new family of tree-structured Markov random fields for a vector of discrete
counting random variables is introduced. According to the characteristics of
the family, the marginal distributions of the Markov random fields are all
Poisson with the same mean, and are untied from the strength or structure of
their built-in dependence. This key feature is uncommon for Markov random
fields and most convenient for application purposes. The specific properties
of this new family confer a straightforward sampling procedure and analytic
expressions for the joint probability mass function and the joint probability
generating function of the vector of counting random variables, thus granting
computational methods that scale well to vectors of high dimension. We study
the distribution of the sum of random variables constituting a Markov random
field from the proposed family, analyze a random variable’s individual
contribution to that sum through expected allocations, and establish stochastic
orderings to assess a wide understanding of their behavior.
[COMMENTS]
27 pages, 10 figures
[LINK]
http://arxiv.org/abs/2408.13649v2
[DATE]
2025-01-17 14:38:23+08:00
[CATEGORIES]
cs.LG
TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance
[AUTHORS]
Waris Gill, Ali Anwar, Muhammad Ali Gulzar
[ABSTRACT]
In Federated Learning, clients train models on local data and send updates to
a central server, which aggregates them into a global model using a fusion
algorithm. This collaborative yet privacy-preserving training comes at a cost.
FL developers face significant challenges in attributing global model
predictions to specific clients. Localizing responsible clients is a crucial
step towards (a) excluding clients primarily responsible for incorrect
predictions and (b) encouraging clients who contributed high-quality models to
continue participating in the future. Existing ML debugging approaches are
inherently inapplicable as they are designed for single-model, centralized
training.
We introduce TraceFL, a fine-grained neuron provenance capturing mechanism
that identifies clients responsible for a global model’s prediction by tracking
the flow of information from individual clients to the global model. Since
inference on different inputs activates a different set of neurons of the
global model, TraceFL dynamically quantifies the significance of the global
model’s neurons in a given prediction, identifying the most crucial neurons in
the global model. It then maps them to the corresponding neurons in every
participating client to determine each client’s contribution, ultimately
localizing the responsible client. We evaluate TraceFL on six datasets,
including two real-world medical imaging datasets, and four neural networks,
including advanced models such as GPT. TraceFL achieves 99% accuracy in
localizing the responsible client in FL tasks spanning both image and text
classification tasks. At a time when state-of-the-art ML debugging approaches
are mostly domain-specific (e.g., image classification only), TraceFL is the
first technique to enable highly accurate automated reasoning across a wide
range of FL applications.
[COMMENTS]
Accepted at 2025 IEEE/ACM 47th International Conference on Software
Engineering (ICSE)
[LINK]
http://arxiv.org/abs/2312.13632v4
[DATE]
2025-01-17 14:09:13+08:00
[CATEGORIES]
cs.LG
Learning Dynamical Systems by Leveraging Data from Similar Systems
[AUTHORS]
Lei Xin, Lintao Ye, George Chiu, Shreyas Sundaram
[ABSTRACT]
We consider the problem of learning the dynamics of a linear system when one
has access to data generated by an auxiliary system that shares similar (but
not identical) dynamics, in addition to data from the true system. We use a
weighted least squares approach, and provide finite sample error bounds of the
learned model as a function of the number of samples and various system
parameters from the two systems as well as the weight assigned to the auxiliary
data. We show that the auxiliary data can help to reduce the intrinsic system
identification error due to noise, at the price of adding a portion of error
that is due to the differences between the two system models. We further
provide a data-dependent bound that is computable when some prior knowledge
about the systems, such as upper bounds on noise levels and model difference,
is available. This bound can also be used to determine the weight that should
be assigned to the auxiliary data during the model training stage.
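As a rough illustration of the weighted least squares idea described above, the sketch below treats only the scalar case x[t+1] = a * x[t] + noise. The function name `weighted_ls_scalar`, the weight `q`, and the toy trajectories are assumptions made for this example; they are not the authors' notation, estimator, or data.

```python
def weighted_ls_scalar(true_traj, aux_traj, q):
    """Estimate a in x[t+1] = a * x[t] + noise from two trajectories.

    Minimizes
        sum_true (x[t+1] - a x[t])^2 + q * sum_aux (x[t+1] - a x[t])^2,
    where q >= 0 is the weight on the auxiliary data (hypothetical name).
    """
    num = 0.0
    den = 0.0
    for traj, weight in ((true_traj, 1.0), (aux_traj, q)):
        # Consecutive pairs (x[t], x[t+1]) are the regression samples.
        for x_now, x_next in zip(traj, traj[1:]):
            num += weight * x_now * x_next
            den += weight * x_now * x_now
    if den == 0.0:
        raise ValueError("no informative samples")
    return num / den


# Noise-free toy data: true system a = 0.5, auxiliary system a = 0.6.
true_traj = [1.0, 0.5, 0.25]
aux_traj = [1.0, 0.6, 0.36]
a_true_only = weighted_ls_scalar(true_traj, aux_traj, q=0.0)
a_mixed = weighted_ls_scalar(true_traj, aux_traj, q=1.0)
```

With `q = 0` only the true-system data is used. A larger `q` pulls the estimate toward the auxiliary system, which is the kind of trade-off between noise reduction and model-difference error that the abstract describes.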
[COMMENTS]
15 pages, 9 figures
[LINK]
http://arxiv.org/abs/2302.04344v3
[DATE]
2025-01-17 13:21:29+08:00
[CATEGORIES]
cs.LG
AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations
[AUTHORS]
Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna
[ABSTRACT]
Design space exploration (DSE) plays a crucial role in enabling custom
hardware architectures, particularly for emerging applications like AI, where
optimized and specialized designs are essential. With the growing complexity of
deep neural networks (DNNs) and the introduction of advanced foundational
models (FMs), the design space for DNN accelerators is expanding at an
exponential rate. Additionally, this space is highly non-uniform and
non-convex, making it increasingly difficult to navigate and optimize.
Traditional DSE techniques rely on search-based methods, which involve
iterative sampling of the design space to find the optimal solution. However,
this process is both time-consuming and often fails to converge to the global
optima for such design spaces. Recently, AIrchitect v1, the first attempt to
address the limitations of search-based techniques, transformed DSE into a
constant-time classification problem using recommendation networks. In this
work, we propose AIrchitect v2, a more accurate and generalizable
learning-based DSE technique applicable to large-scale design spaces that
overcomes the shortcomings of earlier approaches. Specifically, we devise an
encoder-decoder transformer model that (a) encodes the complex design space
into a uniform intermediate representation using contrastive learning and (b)
leverages a novel unified representation blending the advantages of
classification and regression to effectively explore the large DSE space
without sacrificing accuracy. Experimental results evaluated on 10^5 real DNN
workloads demonstrate that, on average, AIrchitect v2 outperforms existing
techniques by 15% in identifying optimal design points. Furthermore, to
demonstrate the generalizability of our method, we evaluate performance on
unseen model workloads (LLMs) and attain a 1.7x improvement in inference
latency on the identified hardware architecture.
[COMMENTS]
Accepted to DATE 2025
[LINK]
http://arxiv.org/abs/2501.09954v1
[DATE]
2025-01-17 12:57:42+08:00
[CATEGORIES]
cs.LG
The Spatial Complexity of Optical Computing and How to Reduce It
[AUTHORS]
Yandong Li, Francesco Monticone
[ABSTRACT]
Similar to algorithms, which consume time and memory to run, hardware
requires resources to function. For devices processing physical waves,
implementing operations needs sufficient “space,” as dictated by wave physics.
How much space is needed to perform a certain function is a fundamental
question in optics, with recent research addressing it for given mathematical
operations, but not for more general computing tasks, e.g., classification.
Inspired by computational complexity theory, we study the “spatial complexity”
of optical computing systems in terms of scaling laws - specifically, how their
physical dimensions must scale as the dimension of the mathematical operation
increases - and propose a new paradigm for designing optical computing systems:
space-efficient neuromorphic optics, based on structural sparsity constraints
and neural pruning methods motivated by wave physics (notably, the concept of
“overlapping nonlocality”). On two mainstream platforms, free-space optics and
on-chip integrated photonics, our methods demonstrate substantial size
reductions (to 1%-10% the size of conventional designs) with minimal compromise
on performance. Our theoretical and computational results reveal a trend of
diminishing returns on accuracy as structure dimensions increase, providing a
new perspective for interpreting and approaching the ultimate limits of optical
computing - a balanced trade-off between device size and accuracy.
[LINK]
http://arxiv.org/abs/2411.10435v2
[DATE]
2025-01-17 12:53:18+08:00
[CATEGORIES]
cs.LG
Consistent estimation of generative model representations in the data kernel perspective space
[AUTHORS]
Aranyak Acharyya, Michael W. Trosset, Carey E. Priebe, Hayden S. Helm
[ABSTRACT]
Generative models, such as large language models and text-to-image diffusion
models, produce relevant information when presented with a query. Different
models may produce different information when presented with the same query. As the
landscape of generative models evolves, it is important to develop techniques
to study and analyze differences in model behaviour. In this paper we present
novel theoretical results for embedding-based representations of generative
models in the context of a set of queries. In particular, we establish
sufficient conditions for the consistent estimation of the model embeddings in
situations where the query set and the number of models grow.
[LINK]
http://arxiv.org/abs/2409.17308v2
[DATE]
2025-01-17 12:30:17+08:00
[CATEGORIES]
cs.LG
MultiPruner: Balanced Structure Removal in Foundation Models
[AUTHORS]
J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
[ABSTRACT]
Recently, state-of-the-art approaches for pruning large pre-trained models
(LPMs) have demonstrated that the training-free removal of non-critical
residual blocks in Transformers is viable for reducing model size, achieving
results that outperform previous training-free pruning approaches. Motivated by
these findings, we extend BlockPruner (Zhong et al., 2024) and propose
MultiPruner, a pruning approach that surpasses recent training-free pruning
methods by adopting a multidimensional, iterative, fine-grained pruning
strategy. In MultiPruner, multidimensional pruning reinstates the structural
balance in block-pruned models by sequentially compressing along three
dimensions: i) residual blocks, ii) channels of multilayer perceptrons (MLP),
and iii) attention heads. This solution enhances zero-shot accuracy on
downstream tasks compared to other techniques while improving model compression
ratios, producing compressed models with fewer computing and memory
requirements. Extensive experiments demonstrate the advantages of the proposed
method across various large pre-trained models. The code and pruning
configurations are available at
https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
[LINK]
http://arxiv.org/abs/2501.09949v1
[DATE]
2025-01-17 12:24:31+08:00
[CATEGORIES]
cs.LG
Enhancing User Interest based on Stream Clustering and Memory Networks in Large-Scale Recommender Systems
[AUTHORS]
Peng Liu, Nian Wang, Cong Xu, Ming Zhao, Bin Wang, Yi Ren
[ABSTRACT]
Recommender Systems (RSs) provide personalized recommendations based on user
interest and are widely used across various platforms. However, many users
have sparse interest because they lack consumption behaviors, which leads to
poor recommendation results for them. This problem is widespread in
large-scale RSs and is particularly difficult to address. To solve it, we
propose a novel solution named User Interest Enhancement (UIE), which enhances
user interest, including user profile and user history behavior sequences,
using the enhancement vectors and a personalized enhancement vector generated
from different perspectives with stream clustering and memory networks. UIE
not only remarkably improves model performance on users with sparse interest
but also significantly enhances model performance on other users. UIE is an
end-to-end solution that is easy to implement on top of a ranking model.
Moreover, we extend our solution and apply similar methods to long-tail items,
which also yields excellent improvement. Furthermore, we conduct extensive
offline and online experiments in a large-scale industrial RS. The results
demonstrate that our model outperforms other models remarkably, especially for
users with sparse interest. To date, UIE has been fully deployed in multiple
large-scale RSs and has achieved remarkable improvements.
[LINK]
http://arxiv.org/abs/2405.13238v4
[DATE]
2025-01-17 11:44:29+08:00
[CATEGORIES]
cs.LG
HEART: Achieving Timely Multi-Model Training for Vehicle-Edge-Cloud-Integrated Hierarchical Federated Learning
[AUTHORS]
Xiaohong Yang, Minghui Liwang, Xianbin Wang, Zhipeng Cheng, Seyyedali Hosseinalipour, Huaiyu Dai, Zhenzhen Jiao
[ABSTRACT]
The rapid growth of AI-enabled Internet of Vehicles (IoV) calls for efficient
machine learning (ML) solutions that can handle high vehicular mobility and
decentralized data. This has motivated the emergence of Hierarchical Federated
Learning over vehicle-edge-cloud architectures (VEC-HFL). Nevertheless, one
aspect which is underexplored in the literature on VEC-HFL is that vehicles
often need to execute multiple ML tasks simultaneously, where this multi-model
training environment introduces crucial challenges. First, improper aggregation
rules can lead to model obsolescence and prolonged training times. Second,
vehicular mobility may result in inefficient data utilization by preventing the
vehicles from returning their models to the network edge. Third, achieving a
balanced resource allocation across diverse tasks becomes paramount, as it
strongly affects the effectiveness of collaborative training.
We take one of the first steps towards addressing these challenges by
proposing a framework for multi-model training in dynamic VEC-HFL with the goal
of minimizing global training latency while ensuring balanced training across
various tasks, a problem that turns out to be NP-hard. To facilitate timely
model training, we introduce a hybrid synchronous-asynchronous aggregation
rule. Building on this, we present a novel method called Hybrid Evolutionary
And gReedy allocaTion (HEART). The framework operates in two stages: first, it
achieves balanced task scheduling through a hybrid heuristic approach that
combines improved Particle Swarm Optimization (PSO) and Genetic Algorithms
(GA); second, it employs a low-complexity greedy algorithm to determine the
training priority of assigned tasks on vehicles. Experiments on real-world
datasets demonstrate the superiority of HEART over existing methods.
[COMMENTS]
14 pages, 6 figures
[LINK]
http://arxiv.org/abs/2501.09934v1
[DATE]
2025-01-17 11:15:03+08:00
[CATEGORIES]
cs.LG
Spatial Clustering of Citizen Science Data Improves Downstream Species Distribution Models
[AUTHORS]
Nahian Ahmed, Mark Roth, Tyler A. Hallman, W. Douglas Robinson, Rebecca A. Hutchinson
[ABSTRACT]
Citizen science biodiversity data present great opportunities for ecology and
conservation across vast spatial and temporal scales. However, the
opportunistic nature of these data lacks the sampling structure required by
modeling methodologies that address a pervasive challenge in ecological data
collection: imperfect detection, i.e., the likelihood of under-observing
species on field surveys. Occupancy modeling is an example of an approach that
accounts for imperfect detection by explicitly modeling the observation process
separately from the biological process of habitat selection. This produces
species distribution models that speak to the pattern of the species on a
landscape after accounting for imperfect detection in the data, rather than the
pattern of species observations corrupted by errors. To achieve this benefit,
occupancy models require multiple surveys of a site across which the site’s
status (i.e., occupied or not) is assumed constant. Since citizen science data
are not collected under the required repeated-visit protocol, observations may
be grouped into sites post hoc. Existing approaches for constructing sites
discard some observations and/or consider only geographic distance and not
environmental similarity. In this study, we compare ten approaches for site
construction in terms of their impact on downstream species distribution models
for 31 bird species in Oregon, using observations recorded in the eBird
database. We find that occupancy models built on sites constructed by spatial
clustering algorithms perform better than existing alternatives.
[COMMENTS]
AAAI 2025
[LINK]
http://arxiv.org/abs/2412.15559v3
[DATE]
2025-01-17 10:50:03+08:00
[CATEGORIES]
cs.LG
Study on a Fast Solver for Combined Field Integral Equations of 3D Conducting Bodies Based on Graph Neural Networks
[AUTHORS]
Tao Shan, Xin Zhang, Di Wu
[ABSTRACT]
In this paper, we present a graph neural networks (GNNs)-based fast solver
(GraphSolver) for solving combined field integral equations (CFIEs) of 3D
conducting bodies. Rao-Wilton-Glisson (RWG) basis functions are employed to
discretely and accurately represent the geometry of 3D conducting bodies. A
concise and informative graph representation is then constructed by treating
each RWG function as a node in the graph, enabling the flow of current between
nodes. With the transformed graphs, GraphSolver is developed to directly
predict real and imaginary parts of the x, y and z components of the surface
current densities at each node (RWG function). Numerical results demonstrate
the efficacy of GraphSolver in solving CFIEs for 3D conducting bodies with
varying levels of geometric complexity, including basic 3D targets,
missile-shaped targets, and airplane-shaped targets.
[COMMENTS]
10 pages, 11 figures
[LINK]
http://arxiv.org/abs/2501.09923v1
[DATE]
2025-01-17 10:40:04+08:00
[CATEGORIES]
cs.LG
Decoupled Sequence and Structure Generation for Realistic Antibody Design
[AUTHORS]
Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park
[ABSTRACT]
Recently, deep learning has made rapid progress in antibody design, which
plays a key role in the advancement of therapeutics. A dominant paradigm is to
train a model to jointly generate the antibody sequence and the structure as a
candidate. However, the joint generation requires the model to generate both
the discrete amino acid categories and the continuous 3D coordinates; this
limits the space of possible architectures and may lead to suboptimal
performance. In response, we propose an antibody sequence-structure decoupling
(ASSD) framework, which separates sequence generation and structure prediction.
Although our approach is simple, our idea allows the use of powerful neural
architectures and demonstrates notable performance improvements. We also find
that the widely used non-autoregressive generators promote sequences with
overly repeating tokens. Such sequences are both out-of-distribution and prone
to undesirable developability properties that can trigger harmful immune
responses in patients. To resolve this, we introduce a composition-based
objective that allows an efficient trade-off between high performance and low
token repetition. ASSD shows improved performance in various antibody design
experiments, while the composition-based objective successfully mitigates token
repetition of non-autoregressive models.
[COMMENTS]
22 pages, 6 figures
[LINK]
http://arxiv.org/abs/2402.05982v3
[DATE]
2025-01-17 10:28:17+08:00
[CATEGORIES]
cs.LG
Bayesian Adaptive Calibration and Optimal Design
[AUTHORS]
Rafael Oliveira, Dino Sejdinovic, David Howard, Edwin V. Bonilla
[ABSTRACT]
The process of calibrating computer models of natural phenomena is essential
for applications in the physical sciences, where plenty of domain knowledge can
be embedded into simulations and then calibrated against real observations.
Current machine learning approaches, however, mostly rely on rerunning
simulations over a fixed set of designs available in the observed data,
potentially neglecting informative correlations across the design space and
requiring a large amount of simulations. Instead, we consider the calibration
process from the perspective of Bayesian adaptive experimental design and
propose a data-efficient algorithm to run maximally informative simulations
within a batch-sequential process. At each round, the algorithm jointly
estimates the parameters of the posterior distribution and optimal designs by
maximising a variational lower bound of the expected information gain. The
simulator is modelled as a sample from a Gaussian process, which allows us to
correlate simulations and observed data with the unknown calibration
parameters. We show the benefits of our method when compared to related
approaches across synthetic and real-data problems.
[COMMENTS]
NeurIPS 2024 final revision
[LINK]
http://arxiv.org/abs/2405.14440v3
[DATE]
2025-01-17 09:49:21+08:00
[CATEGORIES]
cs.LG
SBAMDT: Bayesian Additive Decision Trees with Adaptive Soft Semi-multivariate Split Rules
[AUTHORS]
Stamatina Lamprinakou, Huiyan Sang, Bledar A. Konomi, Ligang Lu
[ABSTRACT]
Bayesian Additive Regression Trees [BART, Chipman et al., 2010] have gained
significant popularity due to their remarkable predictive performance and
ability to quantify uncertainty. However, standard decision tree models rely on
recursive data splits at each decision node, using deterministic decision rules
based on a single univariate feature. This approach limits their ability to
effectively capture complex decision boundaries, particularly in scenarios
involving multiple features, such as spatial domains, or when transitions are
either sharp or smoothly varying. In this paper, we introduce a novel
probabilistic additive decision tree model that employs a soft split rule. This
method enables highly flexible splits that leverage both univariate and
multivariate features, while also respecting the geometric properties of the
feature domain. Notably, the probabilistic split rule adapts dynamically across
decision nodes, allowing the model to account for varying levels of smoothness
in the regression function. We demonstrate the utility of the proposed model
through comparisons with existing tree-based models on synthetic datasets and a
New York City education dataset.
[LINK]
http://arxiv.org/abs/2501.09900v1
[DATE]
2025-01-17 09:13:44+08:00
[CATEGORIES]
cs.LG
A Systematic Study of Multi-Agent Deep Reinforcement Learning for Safe and Robust Autonomous Highway Ramp Entry
[AUTHORS]
Larry Schester, Luis E. Ortiz
[ABSTRACT]
Vehicles today can drive themselves on highways and driverless robotaxis
operate in major cities, with more sophisticated levels of autonomous driving
expected to be available and become more common in the future. Yet, technically
speaking, so-called “Level 5” (L5) operation, corresponding to full autonomy,
has not been achieved. For that to happen, functions such as fully autonomous
highway ramp entry must be available and provide provably safe and reliably
robust behavior to enable full autonomy. We present a systematic study of a
highway ramp function that controls the vehicle's forward-moving actions to
minimize collisions with the stream of highway traffic into which a merging
(ego) vehicle enters. We take a game-theoretic multi-agent (MA) approach to
this problem and study the use of controllers based on deep reinforcement
learning (DRL). The virtual environment of the MA DRL uses self-play with
simulated data where merging vehicles safely learn to control longitudinal
position during a taper-type merge. The work presented in this paper extends
existing work by studying the interaction of more than two vehicles (agents)
and does so by systematically expanding the road scene with additional traffic
and ego vehicles. While previous work on the two-vehicle setting established
that collision-free controllers are theoretically impossible in fully
decentralized, non-coordinated environments, we empirically show that
controllers learned using our approach are nearly ideal when measured against
idealized optimal controllers.
[COMMENTS]
9 pages, 9 figures; added support ack
[LINK]
http://arxiv.org/abs/2411.14593v2
[DATE]
2025-01-17 09:00:13+08:00
[CATEGORIES]
cs.LG
Sparse Binary Representation Learning for Knowledge Tracing
[AUTHORS]
Yahya Badran, Christine Preisach
[ABSTRACT]
Knowledge tracing (KT) models aim to predict students’ future performance
based on their historical interactions. Most existing KT models rely
exclusively on human-defined knowledge concepts (KCs) associated with
exercises. As a result, the effectiveness of these models is highly dependent
on the quality and completeness of the predefined KCs. Human errors in labeling
and the cost of covering all potential underlying KCs can limit model
performance.
In this paper, we propose a KT model, Sparse Binary Representation KT
(SBRKT), that generates new KC labels, referred to as auxiliary KCs, which can
augment the predefined KCs to address the limitations of relying solely on
human-defined KCs. These are learned through a binary vector representation,
where each bit indicates the presence (one) or absence (zero) of an auxiliary
KC. The resulting discrete representation allows these auxiliary KCs to be
utilized in training any KT model that incorporates KCs. Unlike pre-trained
dense embeddings, which are limited to models designed to accept such vectors,
our discrete representations are compatible with both classical models, such as
Bayesian Knowledge Tracing (BKT), and modern deep learning approaches.
To generate this discrete representation, SBRKT employs a binarization method
that learns a sparse representation, fully trainable via stochastic gradient
descent. Additionally, SBRKT incorporates a recurrent neural network (RNN) to
capture temporal dynamics and predict future student responses by effectively
combining the auxiliary and predefined KCs. Experimental results demonstrate
that SBRKT outperforms the tested baselines on several datasets and achieves
competitive performance on others. Furthermore, incorporating the learned
auxiliary KCs consistently enhances the performance of BKT across all tested
datasets.
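To make the binary representation above concrete, here is a minimal sketch of how a bit vector could be decoded into auxiliary KC labels and merged with predefined KCs. The helper names `auxiliary_kcs` and `augment_kcs`, the `aux_` label prefix, and the example exercise are all hypothetical; the sketch shows only the decoding step, not how SBRKT learns the bits.

```python
def auxiliary_kcs(bits, prefix="aux"):
    """Turn a binary vector into auxiliary KC labels, one per bit set to 1."""
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
    return [f"{prefix}_{i}" for i, b in enumerate(bits) if b == 1]


def augment_kcs(predefined, bits):
    """Combine human-defined KCs with auxiliary KCs decoded from the bit vector."""
    return list(predefined) + auxiliary_kcs(bits)


# One exercise with a single human-defined KC and a 5-bit auxiliary vector.
exercise_kcs = augment_kcs(["fractions"], [0, 1, 0, 0, 1])
```

Because the result is an ordinary list of discrete labels, it can in principle be fed to any model that consumes KC identifiers, which is the compatibility argument the abstract makes for classical models such as BKT.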
[LINK]
http://arxiv.org/abs/2501.09893v1
[DATE]
2025-01-17 08:45:10+08:00
[CATEGORIES]
cs.LG
A Complete Characterization of Learnability for Stochastic Noisy Bandits
[AUTHORS]
Steve Hanneke, Kun Wang
[ABSTRACT]
We study the stochastic noisy bandit problem with an unknown reward function
$f^*$ in a known function class $\mathcal{F}$. Formally, a model $M$ maps arms
$\pi$ to a probability distribution $M(\pi)$ of reward. A model class
$\mathcal{M}$ is a collection of models. For each model $M$, define its mean
reward function $f^M(\pi)=\mathbb{E}_{r \sim M(\pi)}[r]$. In the bandit
learning problem, we proceed in rounds, pulling one arm $\pi$ each round and
observing a reward sampled from $M(\pi)$. With knowledge of $\mathcal{M}$,
supposing that the true model $M\in \mathcal{M}$, the objective is to identify
an arm $\hat{\pi}$ of near-maximal mean reward $f^M(\hat{\pi})$ with high
probability in a bounded number of rounds. If this is possible, then the model
class is said to be learnable.
Importantly, a result of \cite{hanneke2023bandit} shows there exist model
classes for which learnability is undecidable. However, the model class they
consider features deterministic rewards, and they raise the question of whether
learnability is decidable for classes containing sufficiently noisy models. For
the first time, we answer this question in the positive by giving a complete
characterization of learnability for model classes with arbitrary noise. In
addition, we describe the full spectrum of possible optimal query
complexities. Further, we prove adaptivity is sometimes necessary to achieve
the optimal query complexity. Last, we revisit an important complexity measure
for interactive decision making, the Decision-Estimation-Coefficient
\citep{foster2021statistical,foster2023tight}, and propose a new variant of the
DEC which also characterizes learnability in this setting.
[LINK]
http://arxiv.org/abs/2410.09597v2
[DATE]
2025-01-17 08:25:18+08:00
[CATEGORIES]
cs.LG
Geometry-Preserving Encoder/Decoder in Latent Generative Models
[AUTHORS]
Wonjun Lee, Riley C. W. O’Neill, Dongmian Zou, Jeff Calder, Gilad Lerman
[ABSTRACT]
Generative modeling aims to generate new data samples that resemble a given
dataset, with diffusion models recently becoming the most popular generative
model. One of the main challenges of diffusion models is solving the problem in
the input space, which tends to be very high-dimensional. Recently, solving
diffusion models in the latent space through an encoder that maps from the data
space to a lower-dimensional latent space has been considered to make the
training process more efficient and has shown state-of-the-art results. The
variational autoencoder (VAE) is the most commonly used encoder/decoder
framework in this domain, known for its ability to learn latent representations
and generate data samples. In this paper, we introduce a novel encoder/decoder
framework with theoretical properties distinct from those of the VAE,
specifically designed to preserve the geometric structure of the data
distribution. We demonstrate the significant advantages of this
geometry-preserving encoder in the training process of both the encoder and
decoder. Additionally, we provide theoretical results proving convergence of
the training process, including convergence guarantees for encoder training,
and results showing faster convergence of decoder training when using the
geometry-preserving encoder.
[COMMENTS]
41 pages
[LINK]
http://arxiv.org/abs/2501.09876v1
[DATE]
2025-01-17 07:14:34+08:00
[CATEGORIES]
cs.LG
Preference-based Pure Exploration
[AUTHORS]
Apurv Shukla, Debabrota Basu
[ABSTRACT]
We study the preference-based pure exploration problem for bandits with
vector-valued rewards. The rewards are ordered using a (given) preference cone
$\mathcal{C}$ and our goal is to identify the set of Pareto optimal arms.
First, to quantify the impact of preferences, we derive a novel lower bound on
sample complexity for identifying the most preferred policy with a confidence
level $1-\delta$. Our lower bound elicits the role played by the geometry of
the preference cone and punctuates the difference in hardness compared to
existing best-arm identification variants of the problem. We further explicate
this geometry when the rewards follow Gaussian distributions. We then provide a
convex relaxation of the lower bound and leverage it to design the
Preference-based Track and Stop (PreTS) algorithm that identifies the most
preferred policy. Finally, we show that the sample complexity of PreTS is
asymptotically tight by deriving a new concentration inequality for
vector-valued rewards.
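The notion of Pareto optimal arms under a preference cone can be sketched for the simplified case where the mean reward vectors are known exactly. The code below assumes a polyhedral cone written as {x : <w, x> >= 0 for every row w}; this representation, the function names, and the toy means are assumptions for illustration. It does not implement PreTS or any sampling procedure.

```python
def in_cone(vec, cone_rows):
    """True if vec lies in the polyhedral cone {x : <w, x> >= 0 for each row w}."""
    return all(sum(w_k * v_k for w_k, v_k in zip(w, vec)) >= 0 for w in cone_rows)


def pareto_optimal_arms(means, cone_rows):
    """Indices of arms whose mean vector is not dominated by another arm.

    Arm j dominates arm i when means[j] - means[i] is a nonzero vector in the cone.
    """
    optimal = []
    for i, mu_i in enumerate(means):
        dominated = False
        for j, mu_j in enumerate(means):
            diff = [b - a for a, b in zip(mu_i, mu_j)]
            if j != i and any(d != 0 for d in diff) and in_cone(diff, cone_rows):
                dominated = True
                break
        if not dominated:
            optimal.append(i)
    return optimal


orthant = [[1, 0], [0, 1]]   # nonnegative orthant: the usual componentwise order
means = [[1, 3], [2, 2], [1, 1], [3, 1]]
front = pareto_optimal_arms(means, orthant)
```

With the orthant cone, `front` is `[0, 1, 3]`, since only arm 2 is dominated. Enlarging the cone makes more pairs of arms comparable and can shrink the Pareto set, which hints at why the geometry of the cone affects how hard the identification problem is.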
[LINK]
http://arxiv.org/abs/2412.02988v2
[DATE]
2025-01-17 06:16:11+08:00
[CATEGORIES]
cs.LG
Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random
[AUTHORS]
Gautam Chandrasekaran, Vasilis Kontonis, Konstantinos Stavropoulos, Kevin Tian
[ABSTRACT]
We study the problem of PAC learning $\gamma$-margin halfspaces with Massart
noise. We propose a simple proper learning algorithm, the Perspectron, that has
sample complexity $\widetilde{O}((\epsilon\gamma)^{-2})$ and achieves
classification error at most $\eta+\epsilon$ where $\eta$ is the Massart noise
rate. Prior works [DGT19,CKMY20] came with worse sample complexity guarantees
(in both $\epsilon$ and $\gamma$) or could only handle random classification
noise [DDK+23,KIT+23] – a much milder noise assumption. We also show that our
results extend to the more challenging setting of learning generalized linear
models with a known link function under Massart noise, achieving a similar
sample complexity to the halfspace case. This significantly improves upon the
prior state-of-the-art in this setting due to [CKMY20], who introduced this
model.
[COMMENTS]
Appeared in NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.09851v1
[DATE]
2025-01-17 05:46:53+08:00
[CATEGORIES]
cs.LG
Coded Deep Learning: Framework and Algorithm
[AUTHORS]
En-hui Yang, Shayan Mohajer Hamidi
[ABSTRACT]
The success of deep learning (DL) is often achieved with large models and
high complexity during both training and post-training inferences, hindering
training in resource-limited settings. To alleviate these issues, this paper
introduces a new framework dubbed “coded deep learning” (CDL), which
integrates information-theoretic coding concepts into the inner workings of DL,
to significantly compress model weights and activations, reduce computational
complexity at both training and post-training inference stages, and enable
efficient model/data parallelism. Specifically, within CDL, (i) we first
propose a novel probabilistic method for quantizing both model weights and
activations, and its soft differentiable variant which offers an analytic
formula for gradient calculation during training; (ii) both the forward and
backward passes during training are executed over quantized weights and
activations, eliminating most floating-point operations and reducing training
complexity; (iii) during training, both weights and activations are entropy
constrained so that they are compressible in an information-theoretic sense
throughout training, thus reducing communication costs in model/data
parallelism; and (iv) the trained model in CDL is by default in a quantized
format with compressible quantized weights, reducing post-training inference
and storage complexity. Additionally, a variant of CDL, namely relaxed CDL
(R-CDL), is presented to further improve the trade-off between validation
accuracy and compression; it requires full precision in training but keeps the
other advantageous features of CDL intact. Extensive empirical results show that CDL
and R-CDL outperform the state-of-the-art algorithms in DNN compression in the
literature.
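For readers unfamiliar with probabilistic quantization in general, the sketch below shows stochastic rounding, a standard unbiased quantizer. It is a generic stand-in and not the quantization method proposed in CDL; the function name, the step size, and the sample count are assumptions chosen for this example.

```python
import math
import random


def stochastic_round(value, step, rng):
    """Quantize value to a multiple of step.

    Rounds up with probability equal to the fractional position of value
    between its two neighbouring grid points, so the result is unbiased
    in expectation.
    """
    scaled = value / step
    lower = math.floor(scaled)
    prob_up = scaled - lower
    level = lower + (1 if rng.random() < prob_up else 0)
    return level * step


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [stochastic_round(0.3, 0.25, rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Each sample is either 0.25 or 0.5, while the average over many samples stays close to the original value 0.3. Low-precision values that remain accurate on average are the general motivation for quantizing weights and activations probabilistically rather than by deterministic rounding.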
[LINK]
http://arxiv.org/abs/2501.09849v1
[DATE]
2025-01-17 05:33:47+08:00
[CATEGORIES]
cs.LG
Intelligent Icing Detection Model of Wind Turbine Blades Based on SCADA data
[AUTHORS]
Wenqian Jiang, Junyang Jin
[ABSTRACT]
Diagnosing ice accretion on wind turbine blades has long been a hard problem
in condition monitoring of wind farms. Existing methods focus on mechanism
analysis of the icing process and deviation degree analysis of feature
engineering. However, neural networks have not yet been studied in depth in
this field. Supervisory control and data acquisition
(SCADA) makes it possible to train networks by continuously providing not
only operation parameters and performance parameters of wind turbines but also
environmental parameters and operation modes. This paper explores the
possibility of using convolutional neural networks (CNNs), generative
adversarial networks (GANs) and domain adaptation learning to establish
intelligent diagnosis frameworks under different training scenarios.
Specifically, PGANC and PGANT are proposed for sufficient and insufficient
labeled data from the target wind turbine, respectively. The basic idea is a
two-stage training scheme with parallel GANs, which aim to capture
intrinsic features of normal and icing samples, followed by a classification
CNN or a domain adaptation module, depending on the training case. Model
validation on SCADA data from three wind turbines shows that two-stage
training can effectively improve model performance. Moreover, when there is
not enough labeled data for a target turbine, which is extremely common in
real industrial practice, adding domain adaptation learning improves the
performance of the trained model. Overall, our proposed intelligent diagnosis
frameworks achieve more accurate detection on the same wind turbine and
better generalization to a new wind turbine, compared with other machine
learning models and conventional CNNs.
[COMMENTS]
10 pages, 6 figures
[LINK]
http://arxiv.org/abs/2101.07914v2
[DATE]
2025-01-17 05:18:48+08:00
[CATEGORIES]
cs.LG
Multi-hop Upstream Anticipatory Traffic Signal Control with Deep Reinforcement Learning
[AUTHORS]
Xiaocan Li, Xiaoyu Wang, Ilia Smirnov, Scott Sanner, Baher Abdulhai
[ABSTRACT]
Coordination in traffic signal control is crucial for managing congestion in
urban networks. Existing pressure-based control methods focus only on immediate
upstream links, leading to suboptimal green time allocation and increased
network delays. However, effective signal control inherently requires
coordination across a broader spatial scope, as the effect of upstream traffic
should influence signal control decisions at downstream intersections,
impacting a large area in the traffic network. Although agent communication
using neural network-based feature extraction can implicitly enhance spatial
awareness, it significantly increases the learning complexity, adding an
additional layer of difficulty to the challenging task of control in deep
reinforcement learning. To address the issue of learning complexity and myopic
traffic pressure definition, our work introduces a novel concept based on
Markov chain theory, namely multi-hop upstream pressure, which
generalizes the conventional pressure to account for traffic conditions beyond
the immediate upstream links. This farsighted and compact metric informs the
deep reinforcement learning agent to preemptively clear the multi-hop upstream
queues, guiding the agent to optimize signal timings with a broader spatial
awareness. Simulations on synthetic and realistic (Toronto) scenarios
demonstrate controllers utilizing multi-hop upstream pressure significantly
reduce overall network delay by prioritizing traffic movements based on a
broader understanding of upstream congestion.
[COMMENTS]
5 tables, 11 figures
[LINK]
http://arxiv.org/abs/2411.07271v2
[DATE]
2025-01-17 05:09:57+08:00
[CATEGORIES]
cs.LG
Model Alignment Search
[AUTHORS]
Satchel Grant
[ABSTRACT]
When can we say that two neural systems are the same? The answer to this
question is goal-dependent, and it is often addressed through correlative
methods such as Representational Similarity Analysis (RSA) and Centered Kernel
Alignment (CKA). What do we miss when we forgo causal explorations, and how can
we target specific types of similarity? In this work, we introduce Model
Alignment Search (MAS), a method for causally exploring distributed
representational similarity. The method learns invertible linear
transformations that align a subspace between two distributed networks’
representations where causal information can be freely interchanged. We first
show that the method can be used to transfer specific causal variables, such as
the number of items in a counting task, between networks with different
training seeds. We then explore open questions in number cognition by comparing
different types of numeric representations in models trained on structurally
different numeric tasks. We then explore differences between MAS vs preexisting
causal similarity methods, and lastly, we introduce a counterfactual latent
auxiliary loss function that helps shape causally relevant alignments even in
cases where we do not have causal access to one of the two models for training.
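For context on the correlative baselines named above, here is a minimal sketch of linear CKA, one common form of Centered Kernel Alignment. It is not the MAS method itself; the function names are hypothetical, and representations are assumed to be lists of rows (one row per stimulus, one column per unit).

```python
import math

def center_columns(m):
    """Subtract each column's mean so every unit has zero mean across stimuli."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]

def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    x, y = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(y, x)
    denominator = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return numerator / denominator

# Hypothetical toy representations: y is x rotated by 90 degrees and scaled by 3.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
y = [[-3.0 * r[1], 3.0 * r[0]] for r in x]
print(linear_cka(x, y))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so the rotated and scaled copy scores as fully similar. A score like this is purely correlational and says nothing about whether information can be causally interchanged between the two networks.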
[LINK]
http://arxiv.org/abs/2501.06164v2
[DATE]
2025-01-17 05:07:04+08:00
[CATEGORIES]
cs.LG
pFedWN: A Personalized Federated Learning Framework for D2D Wireless Networks with Heterogeneous Data
[AUTHORS]
Zhou Ni, Masoud Ghazikor, Morteza Hashemi
[ABSTRACT]
Traditional Federated Learning (FL) approaches often struggle with data
heterogeneity across clients, leading to suboptimal model performance for
individual clients. To address this issue, Personalized Federated Learning
(PFL) emerges as a solution to the challenges posed by non-independent and
identically distributed (non-IID) and unbalanced data across clients.
Furthermore, in most existing decentralized machine learning works, a perfect
communication channel is considered for model parameter transmission between
clients and servers. However, decentralized PFL over wireless links introduces
new challenges, such as resource allocation and interference management. To
overcome these challenges, we formulate a joint optimization problem that
incorporates the underlying device-to-device (D2D) wireless channel conditions
into a server-free PFL approach. The proposed method, dubbed pFedWN, optimizes
the learning performance for each client while accounting for the variability
in D2D wireless channels. To tackle the formulated problem, we divide it into
two sub-problems: PFL neighbor selection and PFL weight assignment. The PFL
neighbor selection is addressed through channel-aware neighbor selection within
unlicensed spectrum bands such as ISM bands. Next, to assign PFL weights, we
utilize the Expectation-Maximization (EM) method to evaluate the similarity
between clients’ data and obtain optimal weight distribution among the chosen
PFL neighbors. Empirical results show that pFedWN provides efficient and
personalized learning performance with non-IID and unbalanced datasets.
Furthermore, it outperforms the existing FL and PFL methods in terms of
learning efficacy and robustness, particularly under dynamic and unpredictable
wireless channel conditions.
[COMMENTS]
16 pages, 9 figures, 3 tables, submitted to Transactions on
Networking
[LINK]
http://arxiv.org/abs/2501.09822v1
[DATE]
2025-01-17 04:16:49+08:00
[CATEGORIES]
cs.LG
BN-Pool: a Bayesian Nonparametric Approach to Graph Pooling
[AUTHORS]
Daniele Castellana, Filippo Maria Bianchi
[ABSTRACT]
We introduce BN-Pool, the first clustering-based pooling method for Graph
Neural Networks (GNNs) that adaptively determines the number of supernodes in a
coarsened graph. By leveraging a Bayesian non-parametric framework, BN-Pool
employs a generative model capable of partitioning graph nodes into an
unbounded number of clusters. During training, we learn the node-to-cluster
assignments by combining the supervised loss of the downstream task with an
unsupervised auxiliary term, which encourages the reconstruction of the
original graph topology while penalizing unnecessary proliferation of clusters.
This adaptive strategy allows BN-Pool to automatically discover an optimal
coarsening level, offering enhanced flexibility and removing the need to
specify sensitive pooling ratios. We show that BN-Pool achieves superior
performance across diverse benchmarks.
[LINK]
http://arxiv.org/abs/2501.09821v1
[DATE]
2025-01-17 04:15:12+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks for Travel Distance Estimation and Route Recommendation Under Probabilistic Hazards
[AUTHORS]
Tong Liu, Hadi Meidani
[COMMENTS]
17 pages, 11 figures
[LINK]
http://arxiv.org/abs/2501.09803v1
[DATE]
2025-01-17 03:22:50+08:00
[CATEGORIES]
cs.LG
Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists
[AUTHORS]
Joachim Baumann, Celestine Mendler-Dünner
[ABSTRACT]
We investigate algorithmic collective action in transformer-based recommender
systems. Our use case is a music streaming platform where a collective of fans
aims to promote the visibility of an underrepresented artist by strategically
placing one of their songs in the existing playlists they control. We introduce
two easily implementable strategies to select the position at which to insert
the song with the goal to boost recommendations at test time. The strategies
exploit statistical properties of the learner by targeting discontinuities in
the recommendations, and leveraging the long-tail nature of song distributions.
We evaluate the efficacy of our strategies using a publicly available
recommender system model released by a major music streaming platform. Our
findings reveal that through strategic placement even small collectives
(controlling less than 0.01\% of the training data) can achieve up to
$40\times$ more test time recommendations than an average song with the same
number of training set occurrences. Focusing on the externalities of the
strategy, we find that the recommendations of other songs are largely
preserved, and the newly gained recommendations are distributed across various
artists. Together, our findings demonstrate how carefully designed collective
action strategies can be effective while not necessarily being adversarial.
[COMMENTS]
Published at NeurIPS 2024, camera-ready updates
[LINK]
http://arxiv.org/abs/2404.04269v2
[DATE]
2025-01-17 02:59:53+08:00
[CATEGORIES]
cs.LG
FAST: Efficient Action Tokenization for Vision-Language-Action Models
[AUTHORS]
Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine
[ABSTRACT]
Autoregressive sequence models, such as Transformer-based vision-language
action (VLA) policies, can be tremendously effective for capturing complex and
generalizable robotic behaviors. However, such models require us to choose a
tokenization of our continuous action signals, which determines how the
discrete symbols predicted by the model map to continuous robot actions. We
find that current approaches for robot action tokenization, based on simple
per-dimension, per-timestep binning schemes, typically perform poorly when
learning dexterous skills from high-frequency robot data. To address this
challenge, we propose a new compression-based tokenization scheme for robot
actions, based on the discrete cosine transform. Our tokenization approach,
Frequency-space Action Sequence Tokenization (FAST), enables us to train
autoregressive VLAs for highly dexterous and high-frequency tasks where
standard discretization methods fail completely. Based on FAST, we release
FAST+, a universal robot action tokenizer, trained on 1M real robot action
trajectories. It can be used as a black-box tokenizer for a wide range of robot
action sequences, with diverse action spaces and control frequencies. Finally,
we show that, when combined with the pi0 VLA, our method can scale to training
on 10k hours of robot data and match the performance of diffusion VLAs, while
reducing training time by up to 5x.
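The compression idea can be illustrated with a minimal sketch, assuming a one-dimensional action chunk and a simple scale-and-round quantizer. The abstract names only the discrete cosine transform; the quantizer, the `scale` parameter, and the function names are illustrative assumptions, not the released FAST tokenizer.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient k of an n-point signal.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct(signal):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(signal)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                           for i, x in enumerate(signal))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def tokenize(actions, scale=10.0):
    """Map a chunk of continuous actions to integer frequency-space tokens."""
    return [round(c * scale) for c in dct(actions)]

def detokenize(tokens, scale=10.0):
    """Approximately recover the continuous actions from integer tokens."""
    return idct([t / scale for t in tokens])

# A constant action chunk collapses to a single non-zero token.
print(tokenize([1.0] * 8))
```

Smooth, high-frequency-sampled trajectories concentrate their energy in a few low-frequency coefficients, so most tokens round to zero. Per-timestep binning, by contrast, spends one token per step even when consecutive actions barely change.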
[COMMENTS]
Website: https://www.pi.website/research/fast
[LINK]
http://arxiv.org/abs/2501.09747v1
[DATE]
2025-01-17 02:57:04+08:00
[CATEGORIES]
cs.LG
Using Machine Learning to Discover Parsimonious and Physically-Interpretable Representations of Catchment-Scale Rainfall-Runoff Dynamics
[AUTHORS]
Yuan-Heng Wang, Hoshin V. Gupta
[ABSTRACT]
Despite the excellent real-world predictive performance of modern machine
learning (ML) methods, many scientists remain hesitant to discard traditional
physical-conceptual (PC) approaches due mainly to their relative
interpretability, which contributes to credibility during decision-making. In
this context, a currently underexplored aspect of ML is how to develop
minimally-optimal representations that can facilitate better insight regarding
system functioning. Regardless of how this is achieved, it is arguably true
that parsimonious representations better support the advancement of scientific
understanding. Our own view is that ML-based modeling of geoscientific systems
should be based in the use of computational units that are fundamentally
interpretable by design.
This paper continues our exploration of how the strengths of ML can be
exploited in the service of better understanding via scientific investigation.
Here, we use the Mass Conserving Perceptron (MCP) as the fundamental
computational unit in a generic network architecture consisting of nodes
arranged in series and parallel to explore several generic and important issues
related to the use of observational data for constructing input-state-output
models of dynamical systems. In the context of lumped catchment modeling, we
show that physical interpretability and excellent predictive performance can
both be achieved using a relatively parsimonious distributed-state
multiple-flow-path network with context-dependent gating and information
sharing across the nodes, suggesting that MCP-based modeling can play a
significant role in application of ML to geoscientific investigation.
[COMMENTS]
74 Pages, 4 Tables, 13 Figures, 11 Tables and 11 Figures in
Supplementary Materials
[LINK]
http://arxiv.org/abs/2412.04845v2
[DATE]
2025-01-17 02:48:36+08:00
[CATEGORIES]
cs.LG
Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions
[AUTHORS]
Coralia Cartis, Zhen Shao, Edward Tansley
[ABSTRACT]
We propose and analyze random subspace variants of the second-order Adaptive
Regularization using Cubics (ARC) algorithm. These methods iteratively restrict
the search space to some random subspace of the parameters, constructing and
minimizing a local model only within this subspace. Thus, our variants only
require access to (small-dimensional) projections of first- and second-order
problem derivatives and calculate a reduced step inexpensively. Under suitable
assumptions, the ensuing methods maintain the optimal first-order, and
second-order, global rates of convergence of (full-dimensional) cubic
regularization, while showing improved scalability both theoretically and
numerically, particularly when applied to low-rank functions. When applied to
the latter, our adaptive variant naturally adapts the subspace size to the true
rank of the function, without knowing it a priori.
[LINK]
http://arxiv.org/abs/2501.09734v1
[DATE]
2025-01-17 02:37:59+08:00
[CATEGORIES]
cs.LG
Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI
[AUTHORS]
Wenlong Ji, Lihua Lei, Tijana Zrnic
[ABSTRACT]
We establish a formal connection between the decades-old surrogate outcome
model in biostatistics and economics and the emerging field of
prediction-powered inference (PPI). The connection treats predictions from
pre-trained models, prevalent in the age of AI, as cost-effective surrogates
for expensive outcomes. Building on the surrogate outcomes literature, we
develop recalibrated prediction-powered inference, a more efficient approach to
statistical inference than existing PPI proposals. Our method departs from the
existing proposals by using flexible machine learning techniques to learn the
optimal “imputed loss” through a step we call recalibration. Importantly, the
method always improves upon the estimator that relies solely on the data with
available true outcomes, even when the optimal imputed loss is estimated
imperfectly, and it achieves the smallest asymptotic variance among PPI
estimators if the estimate is consistent. Computationally, our optimization
objective is convex whenever the loss function that defines the target
parameter is convex. We further analyze the benefits of recalibration, both
theoretically and numerically, in several common scenarios where machine
learning predictions systematically deviate from the outcome of interest. We
demonstrate significant gains in effective sample size over existing PPI
proposals via three applications leveraging state-of-the-art machine
learning/AI models.
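As background, the basic prediction-powered estimate of a mean from the earlier PPI literature can be sketched as follows. This is not the recalibrated estimator proposed in the paper; the function names and toy numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def ppi_mean(labeled_y, labeled_pred, unlabeled_pred):
    """Basic PPI mean estimate.

    Averages model predictions over the large unlabeled sample, then adds a
    rectifier: the average prediction error measured on the labeled sample.
    """
    rectifier = mean([y - f for y, f in zip(labeled_y, labeled_pred)])
    return mean(unlabeled_pred) + rectifier

# Hypothetical toy data: the model over-predicts by 0.5 on the labeled points.
labeled_y = [1.0, 2.0, 3.0]
labeled_pred = [1.5, 2.5, 3.5]
unlabeled_pred = [2.5, 3.5, 4.5, 5.5]
print(ppi_mean(labeled_y, labeled_pred, unlabeled_pred))
```

The rectifier removes the systematic bias of the predictions, which is what lets predictions act as a surrogate for the expensive outcome without biasing the estimate.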
[LINK]
http://arxiv.org/abs/2501.09731v1
[DATE]
2025-01-17 02:30:33+08:00
[CATEGORIES]
cs.LG
Generating particle physics Lagrangians with transformers
[AUTHORS]
Yong Sheng Koay, Rikard Enberg, Stefano Moretti, Eliel Camargo-Molina
[ABSTRACT]
In physics, Lagrangians provide a systematic way to describe laws governing
physical systems. In the context of particle physics, they encode the
interactions and behavior of the fundamental building blocks of our universe.
By treating Lagrangians as complex, rule-based constructs similar to linguistic
expressions, we trained a transformer model – proven to be effective in
natural language tasks – to predict the Lagrangian corresponding to a given
list of particles. We report on the transformer’s performance in constructing
Lagrangians respecting the Standard Model $\mathrm{SU}(3)\times
\mathrm{SU}(2)\times \mathrm{U}(1)$ gauge symmetries. The resulting model is
shown to achieve high accuracies (over 90\%) with Lagrangians up to six matter
fields, with the capacity to generalize beyond the training distribution,
albeit within architectural constraints. We show through an analysis of input
embeddings that the model has internalized concepts such as group
representations and conjugation operations as it learned to generate
Lagrangians. We make the model and training datasets available to the
community. An interactive demonstration can be found at:
https://huggingface.co/spaces/JoseEliel/generate-lagrangians.
[COMMENTS]
32 pages, 11 figures, 18 tables
[LINK]
http://arxiv.org/abs/2501.09729v1
[DATE]
2025-01-17 02:25:50+08:00
[CATEGORIES]
cs.LG
Practical Continual Forgetting for Pre-trained Vision Models
[AUTHORS]
Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang
[ABSTRACT]
For privacy and security concerns, the need to erase unwanted information
from pre-trained vision models is becoming evident nowadays. In real-world
scenarios, erasure requests originate at any time from both users and model
owners, and these requests usually form a sequence. Therefore, under such a
setting, selective information is expected to be continuously removed from a
pre-trained model while maintaining the rest. We define this problem as
continual forgetting and identify three key challenges. (i) For unwanted
knowledge, efficient and effective deleting is crucial. (ii) For remaining
knowledge, the impact brought by the forgetting procedure should be minimal.
(iii) In real-world scenarios, the training samples may be scarce or partially
missing during the process of forgetting. To address them, we first propose
Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we introduce LoRA
modules to fine-tune the FFN layers in Transformer blocks for each forgetting
task independently, and towards (ii), a simple group sparse regularization is
adopted, enabling automatic selection of specific LoRA groups and zeroing out
the others. To further extend GS-LoRA to more practical scenarios, we
incorporate prototype information as additional supervision and introduce a
more practical approach, GS-LoRA++. For each forgotten class, we move the
logits away from its original prototype. For the remaining classes, we pull the
logits closer to their respective prototypes. We conduct extensive experiments
on face recognition, object detection and image classification and demonstrate
that our method manages to forget specific classes with minimal impact on other
classes. Codes have been released on https://github.com/bjzhb666/GS-LoRA.
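How a group sparse regularizer can zero out whole groups is sketched below with the standard group lasso penalty and its proximal (group soft-thresholding) step. The abstract does not say how the penalty is optimized, so the proximal step and all names here are assumptions for illustration; each "group" stands in for the flattened weights of one LoRA module.

```python
import math

def l2_norm(group):
    return math.sqrt(sum(w * w for w in group))

def group_sparse_penalty(groups):
    """Group lasso penalty: the sum of per-group L2 norms."""
    return sum(l2_norm(g) for g in groups)

def group_soft_threshold(group, threshold):
    """Proximal step for the penalty: shrink the whole group toward zero,
    and set it exactly to zero once its norm falls below the threshold."""
    norm = l2_norm(group)
    if norm <= threshold:
        return [0.0 for _ in group]
    shrink = 1.0 - threshold / norm
    return [w * shrink for w in group]

groups = [[3.0, 4.0], [0.1, 0.2]]   # hypothetical per-module weights
print(group_sparse_penalty(groups))
print([group_soft_threshold(g, 2.5) for g in groups])
```

Because the norm is taken per group rather than per weight, a group is either kept (shrunk as a unit) or removed entirely, which matches the described behavior of selecting some LoRA groups and zeroing out the others.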
[LINK]
http://arxiv.org/abs/2501.09705v1
[DATE]
2025-01-17 01:57:53+08:00
[CATEGORIES]
cs.LG
A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise
[AUTHORS]
Ilias Diakonikolas, Nikos Zarifis
[ABSTRACT]
We study the problem of PAC learning $\gamma$-margin halfspaces in the
presence of Massart noise. Without computational considerations, the sample
complexity of this learning problem is known to be
$\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. Prior computationally efficient
algorithms for the problem incur sample complexity $\tilde{O}(1/(\gamma^4
\epsilon^3))$ and achieve 0-1 error of $\eta+\epsilon$, where $\eta<1/2$ is the
upper bound on the noise rate. Recent work gave evidence of an
information-computation tradeoff, suggesting that a quadratic dependence on
$1/\epsilon$ is required for computationally efficient algorithms. Our main
result is a computationally efficient learner with sample complexity
$\widetilde{\Theta}(1/(\gamma^2 \epsilon^2))$, nearly matching this lower
bound. In addition, our algorithm is simple and practical, relying on online
SGD on a carefully selected sequence of convex losses.
[LINK]
http://arxiv.org/abs/2501.09691v1
[DATE]
2025-01-17 01:44:18+08:00
[CATEGORIES]
cs.LG
U-Fair: Uncertainty-based Multimodal Multitask Learning for Fairer Depression Detection
[AUTHORS]
Jiaee Cheong, Aditya Bangar, Sinan Kalkan, Hatice Gunes
[ABSTRACT]
Machine learning bias in mental health is becoming an increasingly pertinent
challenge. Despite promising efforts indicating that multitask approaches often
work better than unitask approaches, little work has investigated the impact of
multitask learning on performance and fairness in depression detection or
leveraged it to achieve fairer prediction outcomes. In this work,
we undertake a systematic investigation of using a multitask approach to
improve performance and fairness for depression detection. We propose a novel
gender-based task-reweighting method using uncertainty grounded in how the
PHQ-8 questionnaire is structured. Our results indicate that, although a
multitask approach improves performance and fairness compared to a unitask
approach, the results are not always consistent and we see evidence of negative
transfer and a reduction in the Pareto frontier, which is concerning given the
high-stakes healthcare setting. Our proposed approach of gender-based
reweighting with uncertainty improves performance and fairness and alleviates
both challenges to a certain extent. Our findings on each PHQ-8 subitem task
difficulty are also in agreement with the largest study conducted on the PHQ-8
subitem discrimination capacity, thus providing the very first tangible
evidence linking ML findings with large-scale empirical population studies
conducted on the PHQ-8.
[COMMENTS]
To appear at the Proceedings of Machine Learning Research 259, 1-14,
2024 as part of the Machine Learning for Health (ML4H) Symposium 2024
[LINK]
http://arxiv.org/abs/2501.09687v1
[DATE]
2025-01-17 01:39:25+08:00
[CATEGORIES]
cs.LG
Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review
[AUTHORS]
Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani
[ABSTRACT]
This tutorial provides an in-depth guide on inference-time guidance and
alignment methods for optimizing downstream reward functions in diffusion
models. While diffusion models are renowned for their generative modeling
capabilities, practical applications in fields such as biology often require
sample generation that maximizes specific metrics (e.g., stability, affinity in
proteins, closeness to target structures). In these scenarios, diffusion models
can be adapted not only to generate realistic samples but also to explicitly
maximize desired measures at inference time without fine-tuning. This tutorial
explores the foundational aspects of such inference-time algorithms. We review
these methods from a unified perspective, demonstrating that current techniques
– such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling,
and classifier guidance – aim to approximate soft optimal denoising processes
(a.k.a. policies in RL) that combine pre-trained denoising processes with value
functions serving as look-ahead functions that predict from intermediate states
to terminal rewards. Within this framework, we present several novel algorithms
not yet covered in the literature. Furthermore, we discuss (1) fine-tuning
methods combined with inference-time techniques, (2) inference-time algorithms
based on search algorithms such as Monte Carlo tree search, which have received
limited attention in current research, and (3) connections between
inference-time algorithms in language models and diffusion models. The code of
this tutorial on protein design is available at
https://github.com/masa-ue/AlignInversePro
[COMMENTS]
We plan to add more content/codes. Please let us know if there are
any comments
[LINK]
http://arxiv.org/abs/2501.09685v1
[DATE]
2025-01-17 01:37:35+08:00
[CATEGORIES]
cs.LG
Fokker-Planck to Callan-Symanzik: evolution of weight matrices under training
[AUTHORS]
Wei Bu, Uri Kol, Ziming Liu
[ABSTRACT]
The dynamical evolution of a neural network during training has been an
incredibly fascinating subject of study. First-principles derivation of the
generic evolution of variables in statistical physics systems has proved useful
when used to describe training dynamics conceptually, which in practice means
numerically solving equations such as the Fokker-Planck equation. Simulating entire
networks inevitably runs into the curse of dimensionality. In this paper, we
utilize Fokker-Planck to simulate the probability density evolution of
individual weight matrices in the bottleneck layers of a simple
2-bottleneck-layered auto-encoder and compare the theoretical evolutions
against the empirical ones by examining the output data distributions. We also
derive physically relevant partial differential equations such as
Callan-Symanzik and Kardar-Parisi-Zhang equations from the dynamical equation
we have.
[COMMENTS]
8 pages, 9 figures
[LINK]
http://arxiv.org/abs/2501.09659v1
[DATE]
2025-01-17 00:54:40+08:00
[CATEGORIES]
cs.LG
A Survey of Research in Large Language Models for Electronic Design Automation
[AUTHORS]
Jingyu Pan, Guanglei Zhou, Chen-Chia Chang, Isaac Jacobson, Jiang Hu, Yiran Chen
[ABSTRACT]
Within the rapidly evolving domain of Electronic Design Automation (EDA),
Large Language Models (LLMs) have emerged as transformative technologies,
offering unprecedented capabilities for optimizing and automating various
aspects of electronic design. This survey provides a comprehensive exploration
of LLM applications in EDA, focusing on advancements in model architectures,
the implications of varying model sizes, and innovative customization
techniques that enable tailored analytical insights. By examining the
intersection of LLM capabilities and EDA requirements, the paper highlights the
significant impact these models have on extracting nuanced understandings from
complex datasets. Furthermore, it addresses the challenges and opportunities in
integrating LLMs into EDA workflows, paving the way for future research and
application in this dynamic field. Through this detailed analysis, the survey
aims to offer valuable insights to professionals in the EDA industry, AI
researchers, and anyone interested in the convergence of advanced AI
technologies and electronic design.
[COMMENTS]
21 pages, 2 figures, 3 tables, accepted by TODAES
[LINK]
http://arxiv.org/abs/2501.09655v1
[DATE]
2025-01-17 00:51:59+08:00
[CATEGORIES]
cs.LG
A Comparative Study on Multi-task Uncertainty Quantification in Semantic Segmentation and Monocular Depth Estimation
[AUTHORS]
Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich
[ABSTRACT]
Deep neural networks excel in perception tasks such as semantic segmentation
and monocular depth estimation, making them indispensable in safety-critical
applications like autonomous driving and industrial inspection. However, they
often suffer from overconfidence and poor explainability, especially for
out-of-domain data. While uncertainty quantification has emerged as a promising
solution to these challenges, multi-task settings have yet to be explored. In
an effort to shed light on this, we evaluate Monte Carlo Dropout, Deep
Sub-Ensembles, and Deep Ensembles for joint semantic segmentation and monocular
depth estimation. Thereby, we reveal that Deep Ensembles stand out as the
preferred choice, particularly in out-of-domain scenarios, and show the
potential benefit of multi-task learning with regard to the uncertainty quality
in comparison to solving both tasks separately. Additionally, we highlight the
impact of employing different uncertainty thresholds to classify pixels as
certain or uncertain, with the median uncertainty emerging as a robust default.
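The median-threshold idea can be sketched as follows, assuming per-pixel predictive entropy of the ensemble-averaged class probabilities as the uncertainty measure. The abstract does not name the measure, so that choice and the function names are assumptions.

```python
import math
import statistics

def predictive_entropy(member_probs):
    """Entropy of the ensemble-averaged class distribution for one pixel.

    member_probs holds one class-probability list per ensemble member.
    """
    n = len(member_probs)
    num_classes = len(member_probs[0])
    mean_probs = [sum(p[c] for p in member_probs) / n for c in range(num_classes)]
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)

def split_by_median(uncertainties):
    """Flag a pixel as uncertain when its uncertainty exceeds the median."""
    threshold = statistics.median(uncertainties)
    return [u > threshold for u in uncertainties]

# Three hypothetical pixels scored by a two-member ensemble.
pixels = [
    [[1.0, 0.0], [1.0, 0.0]],   # members agree and are confident
    [[0.9, 0.1], [0.7, 0.3]],   # mild disagreement
    [[1.0, 0.0], [0.0, 1.0]],   # members fully disagree
]
scores = [predictive_entropy(p) for p in pixels]
print(split_by_median(scores))
```

A median threshold adapts to each image's own uncertainty distribution, so it needs no dataset-specific tuning.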
[COMMENTS]
This manuscript is an extended version of a previously published
conference paper and is currently in review for a journal
[LINK]
http://arxiv.org/abs/2405.17097v2
[DATE]
2025-01-17 00:27:33+08:00
[CATEGORIES]
cs.LG
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework
[AUTHORS]
Yushen Lin, Ruichen Zhang, Wenqi Huang, Kaidi Wang, Zhiguo Ding, Daniel K. C. So, Dusit Niyato
[ABSTRACT]
In this work, we develop a specialized dataset aimed at enhancing the
evaluation and fine-tuning of large language models (LLMs) specifically for
wireless communication applications. The dataset includes a diverse set of
multi-hop questions, including true/false and multiple-choice types, spanning
varying difficulty levels from easy to hard. By utilizing advanced language
models for entity extraction and question generation, rigorous data curation
processes are employed to maintain high quality and relevance. Additionally, we
introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a
detailed theoretical analysis and justification for its use in quantifying the
information content of training data with 2.24\% and 1.31\% performance boosts
for different models compared to baselines, respectively. To demonstrate the
effectiveness of the fine-tuned models with the proposed methodologies on
practical tasks, we also consider different tasks, including summarizing
optimization problems from technical papers and solving the mathematical
problems related to non-orthogonal multiple access (NOMA), which are generated
by using the proposed multi-agent framework. Simulation results show
significant performance gain in summarization tasks with 20.9\% in the ROUGE-L
metrics. We also study the scaling laws of fine-tuning LLMs and the challenges
LLMs face in the field of wireless communications, offering insights into their
adaptation to wireless communication tasks. This dataset and fine-tuning
methodology aim to enhance the training and evaluation of LLMs, contributing to
advancements in LLMs for wireless communication research and applications.
[COMMENTS]
13 pages, 13 figures, journal
[LINK]
http://arxiv.org/abs/2501.09631v1
[DATE]
2025-01-17 00:19:53+08:00
[CATEGORIES]
cs.LG
Flexible task abstractions emerge in linear networks with fast and bounded units
[AUTHORS]
Kai Sandbrink, Jan P. Bauer, Alexandra M. Proca, Andrew M. Saxe, Christopher Summerfield, Ali Hummos
[ABSTRACT]
Animals survive in dynamic environments changing at arbitrary timescales, but
such data distribution shifts are a challenge to neural networks. To adapt to
change, neural systems may change a large number of parameters, which is a slow
process involving forgetting past information. In contrast, animals leverage
distribution changes to segment their stream of experience into tasks and
associate them with internal task abstractions. Animals can then respond flexibly
by selecting the appropriate task abstraction. However, how such flexible task
abstractions may arise in neural systems remains unknown. Here, we analyze a
linear gated network where the weights and gates are jointly optimized via
gradient descent, but with neuron-like constraints on the gates including a
faster timescale, nonnegativity, and bounded activity. We observe that the
weights self-organize into modules specialized for tasks or sub-tasks
encountered, while the gating layer forms unique representations that switch the
appropriate weight modules (task abstractions). We analytically reduce the
learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast
adapting gates drive weight specialization by protecting previous knowledge,
while weight specialization in turn increases the update rate of the gating
layer. Task switching in the gating layer accelerates as a function of
curriculum block size and task training, mirroring key findings in cognitive
neuroscience. We show that the discovered task abstractions support
generalization through both task and subtask composition, and we extend our
findings to a non-linear network switching between two tasks. Overall, our work
offers a theory of cognitive flexibility in animals as arising from joint
gradient descent on synaptic and neural gating in a neural network
architecture.
[LINK]
http://arxiv.org/abs/2411.03840v2
[DATE]
2025-01-17 00:12:29+08:00
[CATEGORIES]
cs.LG
Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML
[AUTHORS]
Tehila Dahan, Kfir Y. Levy
[ABSTRACT]
We address the challenges of Byzantine-robust training in asynchronous
distributed machine learning systems, aiming to enhance efficiency amid massive
parallelization and heterogeneous computing resources. Asynchronous systems,
marked by independently operating workers and intermittent updates, uniquely
struggle with maintaining integrity against Byzantine failures, which encompass
malicious or erroneous actions that disrupt learning. The inherent delays in
such settings not only introduce additional bias to the system but also obscure
the disruptions caused by Byzantine faults. To tackle these issues, we adapt
the Byzantine framework to asynchronous dynamics by introducing a novel
weighted robust aggregation framework. This allows for the extension of robust
aggregators and a recent meta-aggregator to their weighted versions, mitigating
the effects of delayed updates. By further incorporating a recent
variance-reduction technique, we achieve an optimal convergence rate for the
first time in an asynchronous Byzantine environment. Our methodology is
rigorously validated through empirical and theoretical analysis, demonstrating
its effectiveness in enhancing fault tolerance and optimizing performance in
asynchronous ML systems.
[LINK]
http://arxiv.org/abs/2501.09621v1
[DATE]
2025-01-17 00:00:52+08:00
[CATEGORIES]
cs.LG
ReFactor GNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective
[AUTHORS]
Yihong Chen, Pushkar Mishra, Luca Franceschi, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel
[ABSTRACT]
Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring
success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph
Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node
features and generalise to unseen nodes in inductive settings. Our work bridges
the gap between FMs and GNNs by proposing ReFactor GNNs. This new architecture
draws upon both modelling paradigms, which previously were largely thought of
as disjoint. Concretely, using a message-passing formalism, we show how FMs can
be cast as GNNs by reformulating the gradient descent procedure as
message-passing operations, which forms the basis of our ReFactor GNNs. Across
a multitude of well-established KGC benchmarks, our ReFactor GNNs achieve
comparable transductive performance to FMs, and state-of-the-art inductive
performance while using an order of magnitude fewer parameters.
[COMMENTS]
36th Conference on Neural Information Processing Systems (NeurIPS
2022)
[LINK]
http://arxiv.org/abs/2207.09980v4
[DATE]
2025-01-16 23:56:56+08:00
[CATEGORIES]
cs.LG
cs.CL
PolInterviews – A Dataset of German Politician Public Broadcast Interviews
[AUTHORS]
Lukas Birkenmaier, Laureen Sieber, Felix Bergstein
[ABSTRACT]
This paper presents a novel dataset of public broadcast interviews featuring
high-ranking German politicians. The interviews were sourced from YouTube,
transcribed, processed for speaker identification, and stored in a tidy and
open format. The dataset comprises 99 interviews with 33 different German
politicians across five major interview formats, containing a total of 28,146
sentences. As the first of its kind, this dataset offers valuable opportunities
for research on various aspects of political communication in (German)
political contexts, such as agenda-setting, interviewer dynamics, or
politicians’ self-presentation.
[LINK]
http://arxiv.org/abs/2501.04484v2
[DATE]
2025-01-16 23:49:07+08:00
[CATEGORIES]
cs.CL
From Scarcity to Capability: Empowering Fake News Detection in Low-Resource Languages with LLMs
[AUTHORS]
Hrithik Majumdar Shibu, Shrestha Datta, Md. Sumon Miah, Nasrullah Sami, Mahruba Sharmin Chowdhury, Md. Saiful Islam
[ABSTRACT]
The rapid spread of fake news presents a significant global challenge,
particularly in low-resource languages like Bangla, which lack adequate
datasets and detection tools. Although manual fact-checking is accurate, it is
too expensive and slow to prevent the dissemination of fake news. Addressing this
gap, we introduce BanFakeNews-2.0, a robust dataset to enhance Bangla fake news
detection. This version includes 11,700 additional, meticulously curated fake
news articles validated from credible sources, creating a proportional dataset
of 47,000 authentic and 13,000 fake news items across 13 categories. In
addition, we created a manually curated independent test set of 460 fake and
540 authentic news items for rigorous evaluation. We invest effort in
collecting fake news from credible sources and manually verifying it while
preserving its linguistic richness. We develop a benchmark system utilizing
transformer-based architectures, including fine-tuned Bidirectional Encoder
Representations from Transformers variants (F1 of 87%) and Large Language Models
with Quantized Low-Rank Approximation (F1 of 89%), that significantly outperforms
traditional methods. BanFakeNews-2.0 offers a valuable resource to advance
research and application in fake news detection for low-resourced languages. We
publicly release our dataset and model on GitHub to foster research in this
direction.
[LINK]
http://arxiv.org/abs/2501.09604v1
[DATE]
2025-01-16 23:24:41+08:00
[CATEGORIES]
cs.CL
Augmenting a Large Language Model with a Combination of Text and Visual Data for Conversational Visualization of Global Geospatial Data
[AUTHORS]
Omar Mena, Alexandre Kouyoumdjian, Lonni Besançon, Michael Gleicher, Ivan Viola, Anders Ynnerman
[ABSTRACT]
We present a method for augmenting a Large Language Model (LLM) with a
combination of text and visual data to enable accurate question answering in
visualization of scientific data, making conversational visualization possible.
LLMs struggle with tasks like visual data interaction, as they lack contextual
visual information. We address this problem by merging a text description of a
visualization and dataset with snapshots of the visualization. We extract their
essential features into a structured text file, highly compact, yet descriptive
enough to appropriately augment the LLM with contextual information, without
any fine-tuning. This approach can be applied to any visualization that is
already rendered in its final form, as long as it is associated with some textual
description.
[LINK]
http://arxiv.org/abs/2501.09521v1
[DATE]
2025-01-16 21:16:37+08:00
[CATEGORIES]
cs.CL
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing
[AUTHORS]
Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li
[ABSTRACT]
Large Language Models (LLMs) have been widely used in code completion, and
researchers are focusing on scaling up LLMs to improve their accuracy. However,
larger LLMs have lower inference efficiency, affecting developers’ experience
and productivity. In this paper, we propose a lightweight and effective LLM for
code completion named aiXcoder-7B. Compared to existing LLMs, aiXcoder-7B
achieves higher code completion accuracy while being smaller in scale (i.e., 7
billion parameters). We attribute the superiority of aiXcoder-7B to three key
factors: (1) Multi-objective training. We employ three training objectives, one
of which is our proposed Structured Fill-In-the-Middle (SFIM). SFIM considers
the syntax structures in code and effectively improves the performance of LLMs
for code. (2) Diverse data sampling strategies. They consider inter-file
relationships and enhance the capability of LLMs in understanding cross-file
contexts. (3) Extensive high-quality data. We establish a rigorous data
collection pipeline and consume a total of 1.2 trillion unique tokens for
training aiXcoder-7B. This vast volume of data enables aiXcoder-7B to learn a
broad distribution of code. We evaluate aiXcoder-7B in five popular code
completion benchmarks and a new benchmark collected by this paper. The results
show that aiXcoder-7B outperforms the latest six LLMs with similar sizes and
even surpasses four larger LLMs (e.g., StarCoder2-15B and CodeLlama-34B),
positioning aiXcoder-7B as a lightweight and effective LLM for academia and
industry. Finally, we summarize three valuable insights for helping
practitioners train the next generations of LLMs for code. aiXcoder-7B has been
open-sourced and gained significant attention. As of January 2025, aiXcoder-7B
has received 2,226 GitHub Stars.
[COMMENTS]
(1) Accepted by the 47th International Conference on Software
Engineering (ICSE 2025). (2) aiXcoder-7B is available at
https://github.com/aixcoder-plugin/aiXcoder-7B
[LINK]
http://arxiv.org/abs/2410.13187v3
[DATE]
2025-01-16 20:46:53+08:00
[CATEGORIES]
cs.CL
AudioBERT: Audio Knowledge Augmented Language Model
[AUTHORS]
Hyunjong Ok, Suho Yoo, Jaeho Lee
[ABSTRACT]
Recent studies have identified that language models, pretrained on text-only
datasets, often lack elementary visual knowledge, e.g., colors of
everyday objects. Motivated by this observation, we ask whether a similar
shortcoming exists in terms of auditory knowledge. To answer this
question, we construct a new dataset called AuditoryBench, which consists of
two novel tasks for evaluating auditory knowledge. Based on our analysis using
the benchmark, we find that language models also suffer from a severe lack of
auditory knowledge. To address this limitation, we propose AudioBERT, a novel
method to augment the auditory knowledge of BERT through a retrieval-based
approach. First, we detect auditory knowledge spans in prompts to query our
retrieval model efficiently. Then, we inject audio knowledge into BERT and
switch on low-rank adaptation for effective adaptation when audio knowledge is
required. Our experiments demonstrate that AudioBERT is quite effective,
achieving superior performance on the AuditoryBench. The dataset and code are
available at https://github.com/HJ-Ok/AudioBERT.
[COMMENTS]
5 pages, 3 figures, ICASSP 2025
[LINK]
http://arxiv.org/abs/2409.08199v2
[DATE]
2025-01-16 20:17:18+08:00
[CATEGORIES]
cs.CL
Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement
[AUTHORS]
Nicolas Floquet, Joseph Le Roux, Nadi Tomeh, Thierry Charnois
[ABSTRACT]
We propose a novel architecture for graph-based dependency parsing that
explicitly constructs vectors, from which both arcs and labels are scored. Our
method addresses key limitations of the standard two-pipeline approach by
unifying arc scoring and labeling into a single network, reducing scalability
issues caused by the information bottleneck and lack of parameter sharing.
Additionally, our architecture overcomes limited arc interactions with
transformer layers to efficiently simulate higher-order dependencies.
Experiments on PTB and UD show that our model outperforms state-of-the-art
parsers in both accuracy and efficiency.
[LINK]
http://arxiv.org/abs/2501.09451v1
[DATE]
2025-01-16 18:26:17+08:00
[CATEGORIES]
cs.CL
Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards
[AUTHORS]
Omar Erak, Nouf Alabbasi, Omar Alhussein, Ismail Lotfi, Amr Hussein, Sami Muhaidat, Merouane Debbah
[ABSTRACT]
Recent studies show that large language models (LLMs) struggle with technical
standards in telecommunications. We propose a fine-tuned retrieval-augmented
generation (RAG) system based on the Phi-2 small language model (SLM) to serve
as an oracle for communication networks. Our developed system leverages
forward-looking semantic chunking to adaptively determine parsing breakpoints
based on embedding similarity, enabling effective processing of diverse
document formats. To handle the challenge of multiple similar contexts in
technical standards, we employ a re-ranking algorithm to prioritize the most
relevant retrieved chunks. Recognizing the limitations of Phi-2’s small context
window, we implement a recent technique, namely SelfExtend, to expand the
context window during inference, which not only boosts performance but also
accommodates a wider range of user queries and design requirements from
customers to specialized technicians. For fine-tuning, we utilize the low-rank
adaptation (LoRA) technique to enhance computational efficiency during training
and enable effective fine-tuning on small datasets. Our comprehensive
experiments demonstrate substantial improvements over existing
question-answering approaches in the telecom domain, achieving performance that
exceeds larger language models such as GPT-4 (which is about 880 times larger
in size). This work presents a novel approach to leveraging SLMs for
communication networks, offering a balance of efficiency and performance. This
work can serve as a foundation towards agentic language models for networks.
[COMMENTS]
submitted to Proc. IEEE Globecom
[LINK]
http://arxiv.org/abs/2408.11775v2
[DATE]
2025-01-16 18:20:03+08:00
[CATEGORIES]
cs.CL
Solving the unsolvable: Translating case law in Hong Kong
[AUTHORS]
King-kui Sin, Xi Xuan, Chunyu Kit, Clara Ho-yan Chan, Honic Ho-kin Ip
[ABSTRACT]
This paper addresses the challenges of translating case law under Hong Kong’s
bilingual legal system. It highlights the initial success of translating all
written statutes into Chinese before the 1997 handover, a task mandated by the
Basic Law. The effort involved significant collaboration among legal,
linguistic, and translation experts, resulting in a comprehensive and
culturally appropriate bilingual legal system. However, translating case law
remains a significant challenge due to the sheer volume and continuous growth
of judicial decisions. The paper critiques the government’s and judiciary’s
sporadic and uncoordinated efforts to translate case law, contrasting them with
the thorough approach previously taken for statute translation. Although the
government acknowledges the importance of legal bilingualism, it lacks a
sustainable strategy for translating case law. The Judiciary’s position that
translating all judgments is unnecessary, unrealistic, and not cost-effective is
analyzed and critiqued for its impact on legal transparency and public trust. A
proposed solution involves leveraging machine translation technology through a
human-machine interactive translation platform, which undergoes two major
transitions. Initially based on a neural model, the platform transitions to
using a large language model for improved translation accuracy. Furthermore, it
evolves from a single-agent system to a multi-agent system, incorporating
Translator, Annotator, and Proofreader agents. This multi-agent approach,
supported by a grant, aims to facilitate efficient, high-quality translation of
judicial judgments by integrating advanced artificial intelligence and
continuous feedback mechanisms, thus better meeting the needs of a bilingual
legal system.
[LINK]
http://arxiv.org/abs/2501.09444v1
[DATE]
2025-01-16 18:17:58+08:00
[CATEGORIES]
cs.CL
cs.LG
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
[AUTHORS]
Robert Friel, Masha Belyi, Atindriyo Sanyal
[ABSTRACT]
Retrieval-Augmented Generation (RAG) has become a standard architectural
pattern for incorporating domain-specific knowledge into user-facing chat
applications powered by Large Language Models (LLMs). RAG systems are
characterized by (1) a document retriever that queries a domain-specific corpus
for context information relevant to an input query, and (2) an LLM that
generates a response based on the provided query and context. However,
comprehensive evaluation of RAG systems remains a challenge due to the lack of
unified evaluation criteria and annotated datasets. In response, we introduce
RAGBench: the first comprehensive, large-scale RAG benchmark dataset of 100k
examples. It covers five unique industry-specific domains and various RAG task
types. RAGBench examples are sourced from industry corpora such as user
manuals, making it particularly relevant for industry applications. Further, we
formalize the TRACe evaluation framework: a set of explainable and actionable
RAG evaluation metrics applicable across all RAG domains. We release the
labeled dataset at https://huggingface.co/datasets/rungalileo/ragbench.
RAGBench explainable labels facilitate holistic evaluation of RAG systems,
enabling actionable feedback for continuous improvement of production
applications. Through extensive benchmarking, we find that LLM-based RAG
evaluation methods struggle to compete with a finetuned RoBERTa model on the
RAG evaluation task. We identify areas where existing approaches fall short and
propose the adoption of RAGBench with TRACe towards advancing the state of RAG
evaluation systems.
[LINK]
http://arxiv.org/abs/2407.11005v2
[DATE]
2025-01-16 18:05:17+08:00
[CATEGORIES]
cs.CL
AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling
[AUTHORS]
Ancheng Xu, Di Yang, Renhao Li, Jingwei Zhu, Minghuan Tan, Min Yang, Wanxin Qiu, Mingchen Ma, Haihong Wu, Bingyu Li, Feng Sha, Chengming Li, Xiping Hu, Qiang Qu, Derek F. Wong, Ruifeng Xu
[ABSTRACT]
Traditional in-person psychological counseling remains primarily niche, often
chosen by individuals with psychological issues, while online automated
counseling offers a potential solution for those hesitant to seek help due to
feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and
widely used approach in psychological counseling. The advent of large language
models (LLMs) and agent technology enables automatic CBT diagnosis and
treatment. However, current LLM-based CBT systems use agents with a fixed
structure, which limits their self-optimization capabilities, or provide hollow,
unhelpful suggestions due to redundant response patterns. In this work, we
utilize Quora-like and YiXinLi single-round consultation models to build a
general agent framework that generates high-quality responses for single-turn
psychological consultation scenarios. We use a bilingual dataset to evaluate
the quality of single-response consultations generated by each framework. Then,
we incorporate dynamic routing and supervisory mechanisms inspired by real
psychological counseling to construct a CBT-oriented autonomous multi-agent
framework, demonstrating its general applicability. Experimental results
indicate that AutoCBT can provide higher-quality automated psychological
counseling services.
[LINK]
http://arxiv.org/abs/2501.09426v1
[DATE]
2025-01-16 17:57:12+08:00
[CATEGORIES]
cs.CL
Vision-Language Models Do Not Understand Negation
[AUTHORS]
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip Torr, Yoon Kim, Marzyeh Ghassemi
[COMMENTS]
Project page: https://negbench.github.io
[LINK]
http://arxiv.org/abs/2501.09425v1
[DATE]
2025-01-16 17:55:42+08:00
[CATEGORIES]
cs.CL
mGeNTE: A Multilingual Resource for Gender-Neutral Language and Translation
[AUTHORS]
Beatrice Savoldi, Eleonora Cupin, Manjinder Thind, Anne Lauscher, Luisa Bentivogli
[ABSTRACT]
Gender-neutral language reflects societal and linguistic shifts towards
greater inclusivity by avoiding the implication that one gender is the norm
over others. This is particularly relevant for grammatical gender languages,
which heavily encode the gender of terms for human referents and over-rely on
masculine forms, even when gender is unspecified or irrelevant. Language
technologies are known to mirror these inequalities, being affected by a male
bias and perpetuating stereotypical associations when translating into
languages with extensive gendered morphology. In such cases, gender-neutral
language can help avoid undue binary assumptions. However, despite its
importance for creating fairer multi- and cross-lingual technologies, inclusive
language research remains scarce and insufficiently supported in current
resources. To address this gap, we present the multilingual mGeNTE dataset.
Derived from the bilingual GeNTE (Piergentili et al., 2023), mGeNTE extends the
original corpus to include the English-Italian/German/Spanish language pairs.
Since each language pair is English-aligned with gendered and neutral sentences
in the target languages, mGeNTE enables research in both automatic
Gender-Neutral Translation (GNT) and language modelling for three grammatical
gender languages.
[LINK]
http://arxiv.org/abs/2501.09409v1
[DATE]
2025-01-16 17:35:15+08:00
[CATEGORIES]
cs.CL
Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval
[AUTHORS]
Jesus Lovon, Martin Mouysset, Jo Oleiwan, Jose G. Moreno, Christine Damase-Michel, Lynda Tamine
[ABSTRACT]
Electronic Health Record (EHR) tables pose unique challenges, among which is
the presence of hidden contextual dependencies between medical features with a
high level of data dimensionality and sparsity. This study presents the first
investigation into the abilities of LLMs to comprehend EHRs for patient data
extraction and retrieval. We conduct extensive experiments using the MIMICSQL
dataset to explore the impact of prompt structure, instruction, context, and
demonstration on the task performance of two backbone LLMs, Llama2 and
Meditron. Through quantitative and qualitative analyses, our findings show
that optimal feature selection and serialization methods can enhance task
performance by up to 26.79% compared to naive approaches. Similarly, in-context
learning setups with relevant example selection improve data extraction
performance by 5.95%. Based on our study findings, we propose guidelines that
we believe would help the design of LLM-based models to support health search.
[COMMENTS]
To be published as full paper in the Proceedings of the European
Conference on Information Retrieval (ECIR) 2025. Preprint
[LINK]
http://arxiv.org/abs/2501.09384v1
[DATE]
2025-01-16 16:52:50+08:00
[CATEGORIES]
cs.CL
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
[AUTHORS]
Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu
[ABSTRACT]
Speech encompasses a wealth of information, including but not limited to
content, paralinguistic, and environmental information. This comprehensive
nature of speech significantly impacts communication and is crucial for
human-computer interaction. Chat-Oriented Large Language Models (LLMs), known
for their general-purpose assistance capabilities, have evolved to handle
multi-modal inputs, including speech. Although these models can be adept at
recognizing and analyzing speech, they often fall short of generating
appropriate responses. We argue that this is due to the lack of principles on
task definition and model development, which requires open-source datasets and
metrics suitable for model evaluation. To bridge the gap, we present SD-Eval, a
benchmark dataset aimed at multidimensional evaluation of spoken dialogue
understanding and generation. SD-Eval focuses on paralinguistic and
environmental information and includes 7,303 utterances, amounting to 8.76
hours of speech data. The data is aggregated from eight public datasets,
representing four perspectives: emotion, accent, age, and background sound. To
assess the SD-Eval benchmark dataset, we implement three different models and
construct a training set following a process similar to that of SD-Eval. The
training set contains 1,052.72 hours of speech data and 724.4k utterances. We
also conduct a comprehensive evaluation using objective evaluation methods
(e.g., BLEU and ROUGE), subjective evaluations, and LLM-based metrics for the
generated responses. Models conditioned with paralinguistic and environmental
information outperform their counterparts in both objective and subjective
measures. Moreover, experiments demonstrate that LLM-based metrics show a
higher correlation with human evaluation compared to traditional metrics. We
open-source SD-Eval at https://github.com/amphionspace/SD-Eval.
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.13340v2
[DATE]
2025-01-16 16:34:36+08:00
[CATEGORIES]
cs.CL
ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset
[AUTHORS]
Fen Wang, Bomiao Wang, Xueli Shu, Zhen Liu, Zekai Shao, Chao Liu, Siming Chen
[ABSTRACT]
An effective chart summary can significantly reduce the time and effort decision
makers spend interpreting charts, enabling precise and efficient communication
of data insights. Previous studies have faced challenges in generating accurate
and semantically rich summaries of time-series data charts. In this paper, we
identify summary elements and common hallucination types in the generation of
time-series chart summaries, which serve as our guidelines for automatic
generation. We introduce ChartInsighter, which automatically generates chart
summaries of time-series data, effectively reducing hallucinations in chart
summary generation. Specifically, we assign multiple agents to generate the
initial chart summary and collaborate iteratively, during which they invoke
external data analysis modules to extract insights and compile them into a
coherent summary. Additionally, we implement a self-consistency test method to
validate and correct our summary. We create a high-quality benchmark of charts
and summaries, with hallucination types annotated on a sentence-by-sentence
basis, facilitating the evaluation of the effectiveness of reducing
hallucinations. Our evaluations using our benchmark show that our method
surpasses state-of-the-art models and achieves the lowest summary hallucination
rate, effectively reducing various hallucinations and improving summary
quality. The benchmark is available at
https://github.com/wangfen01/ChartInsighter.
[LINK]
http://arxiv.org/abs/2501.09349v1
[DATE]
2025-01-16 16:03:32+08:00
[CATEGORIES]
cs.CL
Discriminative Representation learning via Attention-Enhanced Contrastive Learning for Short Text Clustering
[AUTHORS]
Zhihao Yao
[ABSTRACT]
Contrastive learning has gained significant attention in short text
clustering, yet it has an inherent drawback of mistakenly identifying samples
from the same category as negatives and then separating them in the feature
space (false negative separation), which hinders the generation of superior
representations. To generate more discriminative representations for efficient
clustering, we propose a novel short text clustering method, called
Discriminative Representation learning via Attention-Enhanced
Contrastive Learning for Short Text Clustering
(AECL). AECL consists of two modules: the
pseudo-label generation module and the contrastive learning module. Both
modules build a sample-level attention mechanism to capture similarity
relationships between samples and aggregate cross-sample features to generate
consistent representations. Then, the former module uses the more
discriminative consistent representation to produce reliable supervision
information to assist clustering, while the latter module explores similarity
relationships and consistent representations to optimize the construction of
positive samples for similarity-guided contrastive learning, effectively
addressing the false negative separation issue. Experimental results
demonstrate that the proposed AECL outperforms state-of-the-art
methods. If the paper is accepted, we will open-source the code.
[LINK]
http://arxiv.org/abs/2501.03584v2
[DATE]
2025-01-16 15:56:42+08:00
[CATEGORIES]
cs.LG
cs.CL
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
[AUTHORS]
Yijia Shao, Vinay Samuel, Yucheng Jiang, John Yang, Diyi Yang
[ABSTRACT]
Recent advancements in language models (LMs) have sparked growing interest in
developing LM agents. While fully autonomous agents could excel in many
scenarios, numerous use cases inherently require them to collaborate with
humans due to humans’ latent preferences, domain expertise, or need for
control. To facilitate the study of human-agent collaboration, we present
Collaborative Gym (Co-Gym), a general framework enabling asynchronous,
tripartite interaction among agents, humans, and task environments. We
instantiate Co-Gym with three representative tasks in both simulated and
real-world conditions, and propose an evaluation framework that assesses both
the collaboration outcomes and processes. Our findings reveal that
collaborative agents consistently outperform their fully autonomous
counterparts in task performance within those delivered cases, achieving win
rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related
Work when evaluated by real users. However, our study also highlights
significant challenges in developing collaborative agents, requiring
advancements in core aspects of intelligence – communication capabilities,
situational awareness, and balancing autonomy and human control.
[COMMENTS]
Preprint. Work in progress
[LINK]
http://arxiv.org/abs/2412.15701v2
[DATE]
2025-01-16 15:01:37+08:00
[CATEGORIES]
cs.CL
Algorithm for Semantic Network Generation from Texts of Low Resource Languages Such as Kiswahili
[AUTHORS]
Barack Wamkaya Wanjawa, Lawrence Muchemi, Evans Miriti
[ABSTRACT]
Processing low-resource languages, such as Kiswahili, using machine learning
is difficult due to lack of adequate training data. However, such low-resource
languages are still important for human communication and are already in daily
use, and users need practical machine processing tasks such as summarization,
disambiguation and even question answering (QA). One method of processing such
languages, while bypassing the need for training data, is the use of semantic
networks. Some low-resource languages, such as Kiswahili, have a
subject-verb-object (SVO) structure, and semantic networks are similarly built
from subject-predicate-object triples, hence SVO part-of-speech tags can map into
a semantic network triple. An algorithm to process raw natural language text
and map it into a semantic network is therefore necessary and desirable in
structuring low-resource language texts. This algorithm was tested on the
Kiswahili QA task, achieving up to 78.6% exact match.
[COMMENTS]
18 pages, 3 figures, published in Open Journal for Information
Technology
[LINK]
http://arxiv.org/abs/2501.09326v1
[DATE]
2025-01-16 14:51:32+08:00
[CATEGORIES]
cs.CL
MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish
[AUTHORS]
Xin Huang, Tarun Kumar Vangani, Minh Duc Pham, Xunlong Zou, Bin Wang, Zhengyuan Liu, Ai Ti Aw
[ABSTRACT]
Multilingual large language models (MLLMs) have shown impressive capabilities
across a variety of languages. However, efficacy can differ greatly between
different language families, especially for those with limited linguistic
resources. This report presents MERaLiON-TextLLM, a series of open-source
language models specifically tailored to improve understanding and generation
in Chinese, Indonesian, Malay, and Singlish. The initially released model is
built on Llama-3-8B-Base and refined through a meticulously crafted process of
continued pre-training and weight merging. Our approach achieves performance
improvements across benchmarks in these languages, exceeding the capabilities
of the official Llama-3 models. We provide the model checkpoints as a resource
to support further research and development in cross-lingual language
understanding.
[LINK]
http://arxiv.org/abs/2501.08335v2
[DATE]
2025-01-16 14:16:43+08:00
[CATEGORIES]
cs.CL
Shape-Based Single Object Classification Using Ensemble Method Classifiers
[AUTHORS]
Nur Shazwani Kamarudin, Mokhairi Makhtar, Syadiah Nor Wan Shamsuddin, Syed Abdullah Fadzli
[ABSTRACT]
Nowadays, more and more images are available. Annotation and retrieval of the
images pose classification problems, where each class is defined as the group
of database images labelled with a common semantic label. Various systems have
been proposed for content-based retrieval, as well as for image classification
and indexing. In this paper, a hierarchical classification framework has been
proposed for bridging the semantic gap effectively and achieving multi-category
image classification. A well-known pre-processing and post-processing method
was used and applied to three problems: image segmentation, object
identification, and image classification. The method was applied to classify
single object images from Amazon and Google datasets. The classification was
tested for four different classifiers: BayesNetwork (BN), Random Forest (RF),
Bagging, and Vote. The estimated classification accuracies ranged from 20% to
99% (using 10-fold cross validation). The Bagging classifier presents the best
performance, followed by the Random Forest classifier.
[LINK]
http://arxiv.org/abs/2501.09311v1
[DATE]
2025-01-16 13:58:32+08:00
[CATEGORIES]
cs.CL
A Study of In-Context-Learning-Based Text-to-SQL Errors
[AUTHORS]
Jiawei Shen, Chengcheng Wan, Ruoyi Qiao, Jiazhen Zou, Hang Xu, Yuchen Shao, Yueling Zhang, Weikai Miao, Geguang Pu
[ABSTRACT]
Large language models (LLMs) have been adopted to perform text-to-SQL tasks,
utilizing their in-context learning (ICL) capability to translate natural
language questions into structured query language (SQL). However, such a
technique faces correctness problems and requires efficient repairing
solutions. In this paper, we conduct the first comprehensive study of
text-to-SQL errors. Our study covers four representative ICL-based techniques,
five basic repairing methods, two benchmarks, and two LLM settings. We find
that text-to-SQL errors are widespread and summarize 29 error types in 7
categories. We also find that existing repairing attempts have limited
correctness improvement at the cost of high computational overhead with many
mis-repairs. Based on the findings, we propose MapleRepair, a novel text-to-SQL
error detection and repairing framework. The evaluation demonstrates that
MapleRepair outperforms existing solutions by repairing 13.8% more queries with
negligible mis-repairs and 67.4% less overhead.
[LINK]
http://arxiv.org/abs/2501.09310v1
[DATE]
2025-01-16 13:54:59+08:00
[CATEGORIES]
cs.CL
To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation
[AUTHORS]
Kaustubh D. Dhole
[ABSTRACT]
Retrieval-Augmented Generation equips large language models with the
capability to retrieve external knowledge, thereby mitigating hallucinations by
incorporating information beyond the model’s intrinsic abilities. However, most
prior works have focused on invoking retrieval deterministically, which makes
it unsuitable for tasks such as long-form question answering. Instead,
dynamically performing retrieval by invoking it only when the underlying LLM
lacks the required knowledge can be more efficient. In this context, we delve
deeper into the question, “To Retrieve or Not to Retrieve?” by exploring
multiple uncertainty detection methods. We evaluate these methods for the task
of long-form question answering, employing dynamic retrieval, and present our
comparisons. Our findings suggest that uncertainty detection metrics, such as
Degree Matrix Jaccard and Eccentricity, can reduce the number of retrieval
calls by almost half, with only a slight reduction in question-answering
accuracy.
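The retrieve-only-when-uncertain idea can be sketched as below. This is a
minimal illustration under stated assumptions, not the paper's method: the
names `token_jaccard`, `disagreement`, and `should_retrieve`, and the 0.5
threshold, are hypothetical, and a mean pairwise Jaccard disagreement between
sampled answers stands in for the Degree Matrix Jaccard and Eccentricity
metrics named in the abstract.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def disagreement(samples: list[str]) -> float:
    """Mean pairwise (1 - Jaccard) over sampled answers; 0.0 means identical."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - token_jaccard(a, b) for a, b in pairs) / len(pairs)


def should_retrieve(samples: list[str], threshold: float = 0.5) -> bool:
    """Trigger retrieval only when the sampled answers disagree strongly.

    The threshold is an assumed value that would need tuning per task.
    """
    return disagreement(samples) > threshold


# Hypothetical sampled answers from the same prompt.
consistent = ["paris is the capital", "paris is the capital", "the capital is paris"]
scattered = ["paris", "lyon maybe", "it is marseille"]

print(should_retrieve(consistent), should_retrieve(scattered))
```

In this sketch, retrieval is skipped when the model's samples agree with each
other and invoked when they diverge, which is the mechanism by which the
number of retrieval calls can drop.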
[LINK]
http://arxiv.org/abs/2501.09292v1
[DATE]
2025-01-16 12:56:33+08:00
[CATEGORIES]
cs.CL
CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics
[AUTHORS]
Kai Yin, Chengkai Liu, Ali Mostafavi, Xia Hu
[ABSTRACT]
In the field of crisis/disaster informatics, social media is increasingly
being used for improving situational awareness to inform response and relief
efforts. Efficient and accurate text classification tools have been a focal
area of investigation in crisis informatics. However, current methods mostly
rely on single-label text classification models, which fail to capture
different insights embedded in dynamic and multifaceted disaster-related social
media data. This study introduces a novel approach to disaster text
classification by enhancing a pre-trained Large Language Model (LLM) through
instruction fine-tuning targeted for multi-label classification of
disaster-related tweets. Our methodology involves creating a comprehensive
instruction dataset from disaster-related tweets, which is then used to
fine-tune an open-source LLM, thereby embedding it with disaster-specific
knowledge. This fine-tuned model can classify multiple aspects of
disaster-related information simultaneously, such as the type of event,
informativeness, and involvement of human aid, significantly improving the
utility of social media data for situational awareness in disasters. The
results demonstrate that this approach enhances the categorization of critical
information from social media posts, thereby facilitating a more effective
deployment for situational awareness during emergencies. This research paves
the way for more advanced, adaptable, and robust disaster management tools,
leveraging the capabilities of LLMs to improve real-time situational awareness
and response strategies in disaster scenarios.
[COMMENTS]
Relevant source code and data are available:
https://github.com/KaiYin97/CrsisLLM
[LINK]
http://arxiv.org/abs/2406.15477v2
[DATE]
2025-01-16 11:26:36+08:00
[CATEGORIES]
cs.CL
A General Framework for Inference-time Scaling and Steering of Diffusion Models
[AUTHORS]
Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath
[ABSTRACT]
Diffusion models produce impressive results in modalities ranging from images
and video to protein design and text. However, generating samples with
user-specified properties remains a challenge. Recent research proposes
fine-tuning models to maximize rewards that capture desired properties, but
these methods require expensive training and are prone to mode collapse. In
this work, we propose Feynman-Kac (FK) steering, an inference-time framework
for steering diffusion models with reward functions. FK steering works by
sampling a system of multiple interacting diffusion processes, called
particles, and resampling particles at intermediate steps based on scores
computed using functions called potentials. Potentials are defined using
rewards for intermediate states and are selected such that a high value
indicates that the particle will yield a high-reward sample. We explore various
choices of potentials, intermediate rewards, and samplers. We evaluate FK
steering on text-to-image and text diffusion models. For steering text-to-image
models with a human preference reward, we find that FK steering a 0.8B
parameter model outperforms a 2.6B parameter fine-tuned model on prompt
fidelity, with faster sampling and no training. For steering text diffusion
models with rewards for text quality and specific text attributes, we find that
FK steering generates lower perplexity, more linguistically acceptable outputs
and enables gradient-free control of attributes like toxicity. Our results
demonstrate that inference-time scaling and steering of diffusion models, even
with off-the-shelf rewards, can provide significant sample quality gains and
controllability benefits. Code is available at
https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
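The particle-resampling loop described above can be illustrated with a toy
sketch. Everything here is an assumption made for illustration: particles are
plain floats, the "sampler step" is a random move rather than a diffusion
denoising step, the reward targets the value 3.0, and the potential is an
exponentiated reward (one of several possible choices). It is not the
authors' implementation.

```python
import math
import random


def reward(x: float) -> float:
    """Toy reward: higher when the particle is closer to the target value 3.0."""
    return -abs(x - 3.0)


def steer(num_particles: int = 64, steps: int = 20, resample_every: int = 5,
          lam: float = 2.0, seed: int = 0) -> list[float]:
    """Run several interacting particles and resample them by potential."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for t in range(1, steps + 1):
        # Stand-in for one step of the sampler: a small random move.
        particles = [x + rng.gauss(0.0, 0.5) for x in particles]
        if t % resample_every == 0:
            # Potential: exponentiated intermediate reward (assumed form).
            weights = [math.exp(lam * reward(x)) for x in particles]
            # Resample with replacement in proportion to the potentials.
            particles = rng.choices(particles, weights=weights, k=num_particles)
    return particles


def mean_reward(particles: list[float]) -> float:
    return sum(reward(x) for x in particles) / len(particles)


steered = steer()
unsteered = steer(lam=0.0)  # uniform weights: resampling carries no preference
print(mean_reward(steered), mean_reward(unsteered))
```

Setting `lam=0.0` makes all potentials equal, so comparing the two runs shows
the effect of reward-weighted resampling alone, with no training involved.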
[LINK]
http://arxiv.org/abs/2501.06848v3
[DATE]
2025-01-16 11:18:14+08:00
[CATEGORIES]
cs.LG
cs.CL
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
[AUTHORS]
Jinlong He, Pengfei Li, Gang Liu, Genrong He, Zhaolin Chen, Shenjun Zhong
[ABSTRACT]
Multimodal large language models (MLLMs) represent an evolutionary expansion
in the capabilities of traditional large language models, enabling them to
tackle challenges that surpass the scope of purely text-based applications. They
leverage the knowledge previously encoded within these language models,
thereby enhancing their applicability and functionality in the realm of
multimodal contexts. Recent works investigate the adaptation of MLLMs as a
universal solution to address medical multi-modal problems as a generative
task. In this paper, we propose a parameter efficient framework for fine-tuning
MLLMs, specifically validated on medical visual question answering (Med-VQA)
and medical report generation (MRG) tasks, using public benchmark datasets. We
also introduce an evaluation metric using the 5-point Likert scale and its
weighted average value to measure the quality of the generated reports for MRG
tasks, where the scale ratings are labelled by both humans manually and the
GPT-4 model. We further assess the consistency of performance metrics across
traditional measures, GPT-4, and human ratings for both VQA and MRG tasks. The
results indicate that semantic similarity assessments using GPT-4 align closely
with human annotators and provide greater stability, yet they reveal a
discrepancy when compared to conventional lexical similarity measurements. This
questions the reliability of lexical similarity metrics for evaluating the
performance of generative models in Med-VQA and report generation tasks.
Besides, our fine-tuned model significantly outperforms GPT-4v. This indicates
that without additional fine-tuning, multi-modal models like GPT-4v do not
perform effectively on medical imaging tasks. The code will be available here:
https://github.com/jinlHe/PeFoMed.
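A weighted average over 5-point Likert ratings can be computed as sketched
below. The abstract does not give the exact weighting, so this assumes the
common form in which each rating value is weighted by how many reports
received it; the function name and the example tallies are hypothetical.

```python
def likert_weighted_average(counts: dict[int, int]) -> float:
    """Weighted mean rating, where counts[r] is how many reports received rating r."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings provided")
    return sum(rating * n for rating, n in counts.items()) / total


# Hypothetical tallies for 10 generated reports on a 1-5 scale.
example_counts = {1: 0, 2: 1, 3: 2, 4: 4, 5: 3}
score = likert_weighted_average(example_counts)
print(round(score, 2))
```

The same function can be applied separately to human and GPT-4 ratings to
compare the two sets of scores on a common scale.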
[COMMENTS]
12 pages, 8 figures, 12 tables
[LINK]
http://arxiv.org/abs/2401.02797v3
[DATE]
2025-01-16 10:31:20+08:00
[CATEGORIES]
cs.CL
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
[AUTHORS]
Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh
[COMMENTS]
Accepted to NeurIPS 2024 Datasets & Benchmark Track
[LINK]
http://arxiv.org/abs/2406.09948v2
[DATE]
2025-01-16 09:41:48+08:00
[CATEGORIES]
cs.CL
Foundations of Large Language Models
[AUTHORS]
Tong Xiao, Jingbo Zhu
[ABSTRACT]
This is a book about large language models. As indicated by the title, it
primarily focuses on foundational concepts rather than comprehensive coverage
of all cutting-edge technologies. The book is structured into four main
chapters, each exploring a key area: pre-training, generative models, prompting
techniques, and alignment methods. It is intended for college students,
professionals, and practitioners in natural language processing and related
fields, and can serve as a reference for anyone interested in large language
models.
[LINK]
http://arxiv.org/abs/2501.09223v1
[DATE]
2025-01-16 09:03:56+08:00
[CATEGORIES]
cs.CL
cs.LG
FineMedLM-o1: Enhancing the Medical Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training
[AUTHORS]
Hongzhou Yu, Tianhao Cheng, Ying Cheng, Rui Feng
[ABSTRACT]
Recent advancements in large language models (LLMs) have shown promise in
medical applications such as disease diagnosis and treatment planning. However,
most existing medical LLMs struggle with the advanced reasoning required for
complex clinical scenarios, such as differential diagnosis or personalized
treatment suggestions. We proposed FineMedLM-o1, which leverages high-quality
synthetic medical data and long-form reasoning data for Supervised Fine-Tuning
(SFT) and Direct Preference Optimization (DPO), enabling advanced dialogue and
deep reasoning capabilities. Additionally, we introduced Test-Time Training
(TTT) in the medical domain for the first time, facilitating domain adaptation
and ensuring reliable, accurate reasoning. Experimental results demonstrate
that FineMedLM-o1 achieves a 23% average performance improvement over prior
models on key medical benchmarks. Furthermore, the introduction of TTT provides
an additional 14% performance boost, highlighting its effectiveness in
enhancing medical reasoning capabilities. To support this process, we also
proposed a novel method for synthesizing medical dialogue. Compared to other
open-source datasets, our dataset stands out as superior in both quality and
complexity. The project and data will be released on GitHub.
[LINK]
http://arxiv.org/abs/2501.09213v1
[DATE]
2025-01-16 08:19:19+08:00
[CATEGORIES]
cs.CL
Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets
[AUTHORS]
Bryan E. Tuck, Rakesh M. Verma
[ABSTRACT]
The rapid development of large language models (LLMs) has significantly
improved the generation of fluent and convincing text, raising concerns about
their potential misuse on social media platforms. We present a comprehensive
methodology for creating nine Twitter datasets to examine the generative
capabilities of four prominent LLMs: Llama 3, Mistral, Qwen2, and GPT4o. These
datasets encompass four censored and five uncensored model configurations,
including 7B and 8B parameter base-instruction models of the three open-source
LLMs. Additionally, we perform a data quality analysis to assess the
characteristics of textual outputs from human, “censored,” and “uncensored”
models, employing semantic meaning, lexical richness, structural patterns,
content characteristics, and detector performance metrics to identify
differences and similarities. Our evaluation demonstrates that “uncensored”
models significantly undermine the effectiveness of automated detection
methods. This study addresses a critical gap by exploring smaller open-source
models and the ramifications of “uncensoring,” providing valuable insights into
how domain adaptation and content moderation strategies influence both the
detectability and structural characteristics of machine-generated text.
[LINK]
http://arxiv.org/abs/2406.17967v3
[DATE]
2025-01-16 06:20:15+08:00
[CATEGORIES]
cs.CL
Evaluating GenAI for Simplifying Texts for Education: Improving Accuracy and Consistency for Enhanced Readability
[AUTHORS]
Stephanie L. Day, Jacapo Cirica, Steven R. Clapp, Veronika Penkova, Amy E. Giroux, Abbey Banta, Catherine Bordeau, Poojitha Mutteneni, Ben D. Sawyer
[ABSTRACT]
Generative artificial intelligence (GenAI) holds great promise as a tool to
support personalized learning. Teachers need tools to efficiently and
effectively enhance content readability of educational texts so that they are
matched to individual students' reading levels, while retaining key details.
Large Language Models (LLMs) show potential to fill this need, but previous
research notes multiple shortcomings in current approaches. In this study, we
introduced a generalized approach and metrics for the systematic evaluation of
the accuracy and consistency with which LLMs, prompting techniques, and a novel
multi-agent architecture simplify sixty informational reading passages,
reducing each from the twelfth grade level down to the eighth, sixth, and
fourth grade levels. We calculated the degree to which each LLM and prompting
technique accurately achieved the targeted grade level for each passage,
percentage change in word count, and consistency in maintaining keywords and
key phrases (semantic similarity). One-sample t-tests and multiple regression
models revealed significant differences in the best performing LLM and prompt
technique for each of the four metrics. Both LLMs and prompting techniques
demonstrated variable utility in grade level accuracy and consistency of
keywords and key phrases when attempting to level content down to the fourth
grade reading level. These results demonstrate the promise of the application
of LLMs for efficient and precise automated text simplification, the
shortcomings of current models and prompting methods in attaining an ideal
balance across various evaluation criteria, and a generalizable method to
evaluate future systems.
[COMMENTS]
64 pages, 9 tables, 6 figures, and supplemental materials
[LINK]
http://arxiv.org/abs/2501.09158v1
[DATE]
2025-01-16 05:19:01+08:00
[CATEGORIES]
cs.CL
PASS: Presentation Automation for Slide Generation and Speech
[AUTHORS]
Tushar Aggarwal, Aarohi Bhand
[ABSTRACT]
In today’s fast-paced world, effective presentations have become an essential
tool for communication in both online and offline meetings. The crafting of a
compelling presentation requires significant time and effort, from gathering
key insights to designing slides that convey information clearly and concisely.
However, despite the wealth of resources available, people often find
themselves manually extracting crucial points, analyzing data, and organizing
content in a way that ensures clarity and impact. Furthermore, a successful
presentation goes beyond just the slides; it demands rehearsal and the ability
to weave a captivating narrative to fully engage the audience. Although there
has been some exploration of automating document-to-slide generation, existing
research is largely centered on converting research papers. In addition,
automation of the delivery of these presentations has yet to be addressed. We
introduce PASS, a pipeline used to generate slides from general Word documents,
going beyond just research papers, which also automates the oral delivery of
the generated slides. PASS analyzes user documents to create a dynamic,
engaging presentation with an AI-generated voice. Additionally, we developed an
LLM-based evaluation metric to assess our pipeline across three critical
dimensions of presentations: relevance, coherence, and redundancy. The data and
codes are available at https://github.com/AggarwalTushar/PASS.
[LINK]
http://arxiv.org/abs/2501.06497v2
[DATE]
2025-01-16 04:43:44+08:00
[CATEGORIES]
cs.CL
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
[AUTHORS]
Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei
[ABSTRACT]
Large Language Models (LLMs) have revolutionized artificial intelligence (AI)
by enabling human-like text generation and natural language understanding.
However, their reliance on static training data limits their ability to respond
to dynamic, real-time queries, resulting in outdated or inaccurate outputs.
Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs
by integrating real-time data retrieval to provide contextually relevant and
up-to-date responses. Despite its promise, traditional RAG systems are
constrained by static workflows and lack the adaptability required for
multistep reasoning and complex task management.
Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these
limitations by embedding autonomous AI agents into the RAG pipeline. These
agents leverage agentic design patterns (reflection, planning, tool use, and
multiagent collaboration) to dynamically manage retrieval strategies,
iteratively refine contextual understanding, and adapt workflows to meet
complex task requirements. This integration enables Agentic RAG systems to
deliver unparalleled flexibility, scalability, and context awareness across
diverse applications.
This survey provides a comprehensive exploration of Agentic RAG, beginning
with its foundational principles and the evolution of RAG paradigms. It
presents a detailed taxonomy of Agentic RAG architectures, highlights key
applications in industries such as healthcare, finance, and education, and
examines practical implementation strategies. Additionally, it addresses
challenges in scaling these systems, ensuring ethical decision making, and
optimizing performance for real-world applications, while providing detailed
insights into frameworks and tools for implementing Agentic RAG.
[LINK]
http://arxiv.org/abs/2501.09136v1
[DATE]
2025-01-16 04:40:25+08:00
[CATEGORIES]
cs.CL
Augmenting Human-Annotated Training Data with Large Language Model Generation and Distillation in Open-Response Assessment
[AUTHORS]
Conrad Borchers, Danielle R. Thomas, Jionghao Lin, Ralph Abboud, Kenneth R. Koedinger
[ABSTRACT]
Large Language Models (LLMs) like GPT-4o can help automate text
classification tasks at low cost and scale. However, there are major concerns
about the validity and reliability of LLM outputs. By contrast, human coding is
generally more reliable but expensive to procure at scale. In this study, we
propose a hybrid solution to leverage the strengths of both. We combine
human-coded data and synthetic LLM-produced data to fine-tune a classical
machine learning classifier, distilling both into a smaller BERT model. We
evaluate our method on a human-coded test set as a validity measure for LLM
output quality. In three experiments, we systematically vary LLM-generated
samples’ size, variety, and consistency, informed by best practices in LLM
tuning. Our findings indicate that augmenting datasets with synthetic samples
improves classifier performance, with optimal results achieved at an 80%
synthetic to 20% human-coded data ratio. Lower temperature settings of 0.3,
corresponding to less variability in LLM generations, produced more stable
improvements but also limited model learning from augmented samples. In
contrast, higher temperature settings (0.7 and above) introduced greater
variability in performance estimates and, at times, lower performance. Hence,
LLMs may produce more uniform output that classifiers overfit to earlier or
produce more diverse output that runs the risk of deteriorating model
performance through information irrelevant to the prediction task. Filtering
out inconsistent synthetic samples did not enhance performance. We conclude
that integrating human and LLM-generated data to improve text classification
models in assessment offers a scalable solution that leverages both the
accuracy of human coding and the variety of LLM outputs.
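The 80% synthetic to 20% human-coded ratio implies four synthetic samples per
human-coded sample. The sketch below works through that arithmetic and builds
a mixed training set; the function names, the seed, and the sampling strategy
are hypothetical and are not taken from the paper.

```python
import random


def synthetic_needed(num_human: int, synthetic_share: float = 0.8) -> int:
    """Number of synthetic samples so that they form `synthetic_share` of the mix."""
    if not 0.0 <= synthetic_share < 1.0:
        raise ValueError("synthetic_share must be in [0, 1)")
    return round(num_human * synthetic_share / (1.0 - synthetic_share))


def mix(human: list, synthetic: list, synthetic_share: float = 0.8,
        seed: int = 0) -> list:
    """Combine all human-coded samples with a random subset of synthetic ones."""
    k = synthetic_needed(len(human), synthetic_share)
    if k > len(synthetic):
        raise ValueError("not enough synthetic samples for the requested share")
    rng = random.Random(seed)
    combined = list(human) + rng.sample(list(synthetic), k)
    rng.shuffle(combined)
    return combined


# Hypothetical data: 5 human-coded and 40 LLM-generated samples.
human = [("human", i) for i in range(5)]
synthetic = [("synthetic", i) for i in range(40)]
train = mix(human, synthetic)
print(len(train), sum(1 for src, _ in train if src == "synthetic"))
```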
[COMMENTS]
Manuscript accepted to the Second Workshop on Generative AI for
Learning Analytics (GenAI-LA) at LAK25
[LINK]
http://arxiv.org/abs/2501.09126v1
[DATE]
2025-01-16 04:13:46+08:00
[CATEGORIES]
cs.CL
cs.LG
SteLLA: A Structured Grading System Using LLMs with RAG
[AUTHORS]
Hefei Qiu, Brian White, Ashley Ding, Reinaldo Costa, Ali Hachem, Wei Ding, Ping Chen
[ABSTRACT]
Large Language Models (LLMs) have shown strong general capabilities in many
applications. However, how to make them reliable tools for some specific tasks
such as automated short answer grading (ASAG) remains a challenge. We present
SteLLA (Structured Grading System Using LLMs with RAG), in which a) a
Retrieval-Augmented Generation (RAG) approach is used to empower LLMs specifically on the
ASAG task by extracting structured information from the highly relevant and
reliable external knowledge based on the instructor-provided reference answer
and rubric, b) an LLM performs a structured and question-answering-based
evaluation of student answers to provide analytical grades and feedback. A
real-world dataset that contains students’ answers in an exam was collected
from a college-level Biology course. Experiments show that our proposed system
can achieve substantial agreement with the human grader while providing
break-down grades and feedback on all the knowledge points examined in the
problem. A qualitative and error analysis of the feedback generated by GPT4
shows that GPT4 is good at capturing facts but may be prone to inferring too
much from the given text in the grading task, which provides
insights into the usage of LLMs in the ASAG system.
[LINK]
http://arxiv.org/abs/2501.09092v1
[DATE]
2025-01-16 03:24:48+08:00
[CATEGORIES]
cs.CL
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
[AUTHORS]
Ruixiang Jiang, Changwen Chen
[ABSTRACT]
We present the first study on how Multimodal LLMs’ (MLLMs) reasoning ability
can be elicited to evaluate the aesthetics of artworks. To facilitate this
investigation, we construct MM-StyleBench, a novel high-quality dataset for
benchmarking artistic stylization. We then develop a principled method for
human preference modeling and perform a systematic correlation analysis between
MLLMs’ responses and human preference. Our experiments reveal an inherent
hallucination issue of MLLMs in art evaluation, associated with response
subjectivity. ArtCoT is proposed, demonstrating that art-specific task
decomposition and the use of concrete language boost MLLMs’ reasoning ability
for aesthetics. Our findings offer valuable insights into MLLMs for art and can
benefit a wide range of downstream applications, such as style transfer and
artistic image generation. Code available at
https://github.com/songrise/MLLM4Art.
[COMMENTS]
WIP, Homepage https://github.com/songrise/MLLM4Art
[LINK]
http://arxiv.org/abs/2501.09012v1
[DATE]
2025-01-16 02:56:22+08:00
[CATEGORIES]
cs.CL
Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases
[AUTHORS]
Mercy Asiedu, Nenad Tomasev, Chintan Ghate, Tiya Tiyasirichokchai, Awa Dieng, Oluwatosin Akande, Geoffrey Siwo, Steve Adudans, Sylvanus Aitkins, Odianosen Ehiakhamen, Eric Ndombi, Katherine Heller
[ABSTRACT]
While large language models (LLMs) have shown promise for medical question
answering, there is limited work focused on tropical and infectious
disease-specific exploration. We build on an open-source tropical and infectious
diseases (TRINDs) dataset, expanding it to include demographic and semantic
clinical and consumer augmentations yielding 11000+ prompts. We evaluate LLM
performance on these, comparing generalist and medical LLMs, as well as LLM
outcomes to human experts. We demonstrate, through systematic experimentation,
the benefit of contextual information such as demographics, location, gender,
and risk factors for optimal LLM response. Finally, we develop a prototype of
TRINDs-LM, a research tool that provides a playground to navigate how context
impacts LLM outputs for health.
[COMMENTS]
Accepted at 2 NeurIPS 2024 workshops: Generative AI for Health
Workshop and Workshop on Advancements In Medical Foundation Models:
Explainability, Robustness, Security, and Beyond
[LINK]
http://arxiv.org/abs/2409.09201v3
[DATE]
2025-01-16 02:52:52+08:00
[CATEGORIES]
cs.CL
Decompose-ToM: Enhancing Theory of Mind Reasoning in Large Language Models through Simulation and Task Decomposition
[AUTHORS]
Sneheel Sarangi, Maha Elgarf, Hanan Salam
[ABSTRACT]
Theory of Mind (ToM) is the ability to understand and reflect on the mental
states of others. Although this capability is crucial for human interaction,
testing on Large Language Models (LLMs) reveals that they possess only a
rudimentary understanding of it. Although the most capable closed-source LLMs
have come close to human performance on some ToM tasks, they still perform
poorly on complex variations of the task that involve more structured
reasoning. In this work, we utilize the concept of “pretend-play”, or
“Simulation Theory” from cognitive psychology to propose “Decompose-ToM”:
an LLM-based inference algorithm that improves model performance on complex ToM
tasks. We recursively simulate user perspectives and decompose the ToM task
into a simpler set of functions: subject identification, question-reframing,
world model updating, and knowledge availability. We test the algorithm on
higher-order ToM tasks and a task testing for ToM capabilities in a
conversational setting, demonstrating that our approach shows significant
improvement across models compared to baseline methods while requiring minimal
prompt tuning across tasks and no additional model training.
[COMMENTS]
Accepted to COLING 2025
[LINK]
http://arxiv.org/abs/2501.09056v1
[DATE]
2025-01-16 02:44:01+08:00
[CATEGORIES]
cs.CL
Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
[AUTHORS]
Shaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien
[ABSTRACT]
As Large Language Models (LLMs) and generative AI become increasingly
widespread, concerns about content safety have grown in parallel. Currently,
there is a clear lack of high-quality, human-annotated datasets that address
the full spectrum of LLM-related safety risks and are usable for commercial
applications. To bridge this gap, we propose a comprehensive and adaptable
taxonomy for categorizing safety risks, structured into 12 top-level hazard
categories with an extension to 9 fine-grained subcategories. This taxonomy is
designed to meet the diverse requirements of downstream users, offering more
granular and flexible tools for managing various risk types. Using a hybrid
data generation pipeline that combines human annotations with a multi-LLM
“jury” system to assess the safety of responses, we obtain Aegis 2.0, a
carefully curated collection of 34,248 samples of human-LLM interactions,
annotated according to our proposed taxonomy. To validate its effectiveness, we
demonstrate that several lightweight models, trained using parameter-efficient
techniques on Aegis 2.0, achieve performance competitive with leading safety
models fully fine-tuned on much larger, non-commercial datasets. In addition,
we introduce a novel training blend that combines safety with topic following
data. This approach enhances the adaptability of guard models, enabling them to
generalize to new risk categories defined during inference. We plan to
open-source Aegis 2.0 data and models to the research community to aid in the
safety guardrailing of LLMs.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2404.05993
[LINK]
http://arxiv.org/abs/2501.09004v1
[DATE]
2025-01-16 02:37:08+08:00
[CATEGORIES]
cs.CL
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
[AUTHORS]
Wenlu Fan, Yuqi Zhu, Chenyang Wang, Bin Wang, Wentao Xu
[ABSTRACT]
Large Language Models (LLMs) demonstrate remarkable capabilities in text
generation, yet their emotional consistency and semantic coherence in social
media contexts remain insufficiently understood. This study investigates how
LLMs handle emotional content and maintain semantic relationships through
continuation and response tasks using two open-source models: Gemma and Llama.
By analyzing climate change discussions from Twitter and Reddit, we examine
emotional transitions, intensity patterns, and semantic similarity between
human-authored and LLM-generated content. Our findings reveal that while both
models maintain high semantic coherence, they exhibit distinct emotional
patterns: Gemma shows a tendency toward negative emotion amplification,
particularly anger, while maintaining certain positive emotions like optimism.
Llama demonstrates superior emotional preservation across a broader spectrum of
affects. Both models systematically generate responses with attenuated
emotional intensity compared to human-authored content and show a bias toward
positive emotions in response tasks. Additionally, both models maintain strong
semantic similarity with original texts, though performance varies between
continuation and response tasks. These findings provide insights into LLMs’
emotional and semantic processing capabilities, with implications for their
deployment in social media contexts and human-AI interaction design.
[LINK]
http://arxiv.org/abs/2501.08102v2
[DATE]
2025-01-16 02:10:00+08:00
[CATEGORIES]
cs.CL
Applying General Turn-taking Models to Conversational Human-Robot Interaction
[AUTHORS]
Gabriel Skantze, Bahar Irfan
[ABSTRACT]
Turn-taking is a fundamental aspect of conversation, but current Human-Robot
Interaction (HRI) systems often rely on simplistic, silence-based models,
leading to unnatural pauses and interruptions. This paper investigates, for the
first time, the application of general turn-taking models, specifically TurnGPT
and Voice Activity Projection (VAP), to improve conversational dynamics in HRI.
These models are trained on human-human dialogue data using self-supervised
learning objectives, without requiring domain-specific fine-tuning. We propose
methods for using these models in tandem to predict when a robot should begin
preparing responses, take turns, and handle potential interruptions. We
evaluated the proposed system in a within-subject study against a traditional
baseline system, using the Furhat robot with 39 adults in a conversational
setting, in combination with a large language model for autonomous response
generation. The results show that participants significantly prefer the
proposed system, and it significantly reduces response delays and
interruptions.
[COMMENTS]
Accepted at HRI 2025 (the IEEE/ACM International Conference on
Human-Robot Interaction)
[LINK]
http://arxiv.org/abs/2501.08946v1
[DATE]
2025-01-16 00:49:22+08:00
[CATEGORIES]
cs.CL
Disentangling Exploration of Large Language Models by Optimal Exploitation
[AUTHORS]
Tim Grams, Patrick Betz, Christian Bartelt
[ABSTRACT]
Exploration is a crucial skill for self-improvement and open-ended
problem-solving. However, it remains uncertain whether large language models
can effectively explore the state-space. Existing evaluations predominantly
focus on the trade-off between exploration and exploitation, often assessed in
multi-armed bandit problems. In contrast, this work isolates exploration as the
sole objective, tasking the agent with delivering information that enhances
future returns. For the evaluation, we propose to decompose missing rewards
into exploration and exploitation components by measuring the optimal
achievable return for the states already explored. Our experiments with various
LLMs reveal that most models struggle to sufficiently explore the state-space
and that weak exploration is insufficient. We observe a positive correlation
between model size and exploration performance, with larger models
demonstrating superior capabilities. Furthermore, we show that our
decomposition provides insights into differences in behaviors driven by agent
instructions during prompt engineering, offering a valuable tool for refining
LLM performance in exploratory tasks.
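
The decomposition described above can be read as splitting the gap to the optimal return into two parts. The following is a minimal sketch under that reading, not the authors' implementation; the function name, its arguments, and the example numbers are hypothetical.

```python
def decompose_missing_reward(optimal_return, best_return_on_explored, agent_return):
    """Split the missing reward into two gaps (illustrative reading only).

    exploration gap:  return that is out of reach because the required
                      states were never explored
    exploitation gap: return the explored states would have allowed,
                      but the agent did not collect
    """
    exploration_gap = optimal_return - best_return_on_explored
    exploitation_gap = best_return_on_explored - agent_return
    return exploration_gap, exploitation_gap


# Hypothetical numbers: optimum 10.0, best achievable on explored states 7.0,
# return actually obtained by the agent 5.5.
gaps = decompose_missing_reward(10.0, 7.0, 5.5)
```

By construction the two gaps sum to the total missing reward, so a low score can be attributed to weak exploration, weak exploitation, or both.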
[LINK]
http://arxiv.org/abs/2501.08925v1
[DATE]
2025-01-16 00:30:29+08:00
[CATEGORIES]
cs.LG
cs.CL
GenAI Content Detection Task 3: Cross-Domain Machine-Generated Text Detection Challenge
[AUTHORS]
Liam Dugan, Andrew Zhu, Firoj Alam, Preslav Nakov, Marianna Apidianaki, Chris Callison-Burch
[COMMENTS]
COLING 2025
[LINK]
http://arxiv.org/abs/2501.08913v1
[DATE]
2025-01-16 00:21:09+08:00
[CATEGORIES]
cs.CL
cs.LG
Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto
[AUTHORS]
Elizaveta Tennant, Stephen Hailes, Mirco Musolesi
[ABSTRACT]
Increasing interest in ensuring the safety of next-generation Artificial
Intelligence (AI) systems calls for novel approaches to embedding morality into
autonomous agents. This goal differs qualitatively from traditional
task-specific AI methodologies. In this paper, we provide a systematization of
existing approaches to the problem of introducing morality in machines -
modelled as a continuum. Our analysis suggests that popular techniques lie at
the extremes of this continuum - either being fully hard-coded into top-down,
explicit rules, or entirely learned in a bottom-up, implicit fashion with no
direct statement of any moral principle (this includes learning from human
feedback, as applied to the training and finetuning of large language models,
or LLMs). Given the relative strengths and weaknesses of each type of
methodology, we argue that more hybrid solutions are needed to create adaptable
and robust, yet controllable and interpretable agentic systems. To that end,
this paper discusses both the ethical foundations (including deontology,
consequentialism and virtue ethics) and implementations of morally aligned AI
systems.
We present a series of case studies that rely on intrinsic rewards, moral
constraints or textual instructions, applied to either pure-Reinforcement
Learning or LLM-based agents. By analysing these diverse implementations under
one framework, we compare their relative strengths and shortcomings in
developing morally aligned AI systems. We then discuss strategies for
evaluating the effectiveness of moral learning agents. Finally, we present open
research questions and implications for the future of AI safety and ethics
which are emerging from this hybrid framework.
[LINK]
http://arxiv.org/abs/2312.01818v3
[DATE]
2025-01-16 23:58:24+08:00
[CATEGORIES]
cs.LG
Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit
[AUTHORS]
Seok-Jin Kim, Min-hwan Oh
[ABSTRACT]
We study the performance guarantees of exploration-free greedy algorithms for
the linear contextual bandit problem. We introduce a novel condition, named the
Local Anti-Concentration (LAC) condition, which enables a greedy
bandit algorithm to achieve provable efficiency. We show that the LAC condition
is satisfied by a broad class of distributions, including Gaussian,
exponential, uniform, Cauchy, and Student’s $t$ distributions, along with other
exponential family distributions and their truncated variants. This
significantly expands the class of distributions under which greedy algorithms
can perform efficiently. Under our proposed LAC condition, we prove that the
cumulative expected regret of the greedy algorithm for the linear contextual
bandit is bounded by $O(\operatorname{poly} \log T)$. Our results establish the
widest range of distributions known to date that allow a sublinear regret bound
for greedy algorithms, further achieving a sharp poly-logarithmic regret.
[COMMENTS]
NeurIPS2024
[LINK]
http://arxiv.org/abs/2411.12878v2
[DATE]
2025-01-16 23:46:14+08:00
[CATEGORIES]
cs.LG
ARMAX identification of low rank graphical models
[AUTHORS]
Wenqi Cao, Aming Li
[ABSTRACT]
In large-scale systems, complex internal relationships are often present.
Such interconnected systems can be effectively described by low rank stochastic
processes. When identifying a predictive model of low rank processes from
sampling data, the rank-deficient property of spectral densities is often
obscured by the inevitable measurement noise in practice. However, existing low
rank identification approaches often do not take noise into explicit
consideration, leading to non-negligible inaccuracies even under weak noise. In
this paper, we address the identification issue of low rank processes under
measurement noise. We find that the noisy measurement model admits a sparse
plus low rank structure in latent-variable graphical models. Specifically, we
first decompose the problem into a maximum entropy covariance extension
problem, and a low rank graphical estimation problem based on an autoregressive
moving-average with exogenous input (ARMAX) model. To identify the ARMAX low
rank graphical models, we propose an estimation approach based on maximum
likelihood. The identifiability and consistency of this approach are proven
under certain conditions. Simulation results confirm the reliable performance
of the entire algorithm in both the parameter estimation and noisy data
filtering.
[LINK]
http://arxiv.org/abs/2501.09616v1
[DATE]
2025-01-16 23:43:32+08:00
[CATEGORIES]
cs.LG
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
[AUTHORS]
Siddharth Aravindan, Dixant Mittal, Wee Sun Lee
[ABSTRACT]
Posterior Sampling for Reinforcement Learning (PSRL) is a well-known
algorithm that augments model-based reinforcement learning (MBRL) algorithms
with Thompson sampling. PSRL maintains posterior distributions of the
environment transition dynamics and the reward function, which are intractable
for tasks with high-dimensional state and action spaces. Recent works show that
dropout, used in conjunction with neural networks, induces variational
distributions that can approximate these posteriors. In this paper, we propose
Event-based Variational Distributions for Exploration (EVaDE), which are
variational distributions that are useful for MBRL, especially when the
underlying domain is object-based. We leverage the general domain knowledge of
object-based domains to design three types of event-based convolutional layers
to direct exploration. These layers rely on Gaussian dropouts and are inserted
between the layers of the deep neural network model to help facilitate
variational Thompson sampling. We empirically show the effectiveness of
EVaDE-equipped Simulated Policy Learning (EVaDE-SimPLe) on the 100K Atari game
suite.
[LINK]
http://arxiv.org/abs/2501.09611v1
[DATE]
2025-01-16 23:35:48+08:00
[CATEGORIES]
cs.LG
Adversarial-Ensemble Kolmogorov Arnold Networks for Enhancing Indoor Wi-Fi Positioning: A Defensive Approach Against Spoofing and Signal Manipulation Attacks
[AUTHORS]
Mitul Goswami, Romit Chatterjee, Somnath Mahato, Prasant Kumar Pattnaik
[ABSTRACT]
The research presents a study on enhancing the robustness of Wi-Fi-based
indoor positioning systems against adversarial attacks. The goal is to improve
the positioning accuracy and resilience of these systems under two attack
scenarios: Wi-Fi Spoofing and Signal Strength Manipulation. Three models are
developed and evaluated: a baseline model (M_Base), an adversarially trained
robust model (M_Rob), and an ensemble model (M_Ens). All models utilize a
Kolmogorov-Arnold Network (KAN) architecture. The robust model is trained with
adversarially perturbed data, while the ensemble model combines predictions
from both the base and robust models. Experimental results show that the robust
model reduces positioning error by approximately 10% compared to the baseline,
achieving 2.03 meters error under Wi-Fi spoofing and 2.00 meters under signal
strength manipulation. The ensemble model further outperforms with errors of
2.01 meters and 1.975 meters for the respective attack types. This analysis
highlights the effectiveness of adversarial training techniques in mitigating
attack impacts. The findings underscore the importance of considering
adversarial scenarios in developing indoor positioning systems, as improved
resilience can significantly enhance the accuracy and reliability of such
systems in mission-critical environments.
[LINK]
http://arxiv.org/abs/2501.09609v1
[DATE]
2025-01-16 23:34:00+08:00
[CATEGORIES]
cs.LG
Higher-Order Topological Directionality and Directed Simplicial Neural Networks
[AUTHORS]
Manuel Lecha, Andrea Cavallo, Francesca Dominici, Elvin Isufi, Claudio Battiloro
[ABSTRACT]
Topological Deep Learning (TDL) has emerged as a paradigm to process and
learn from signals defined on higher-order combinatorial topological spaces,
such as simplicial or cell complexes. Although many complex systems have an
asymmetric relational structure, most TDL models forcibly symmetrize these
relationships. In this paper, we first introduce a novel notion of higher-order
directionality and we then design Directed Simplicial Neural Networks
(Dir-SNNs) based on it. Dir-SNNs are message-passing networks operating on
directed simplicial complexes able to leverage directed and possibly asymmetric
interactions among the simplices. To our knowledge, this is the first TDL model
using a notion of higher-order directionality. We theoretically and empirically
prove that Dir-SNNs are more expressive than their directed graph counterpart
in distinguishing isomorphic directed graphs. Experiments on a synthetic source
localization task demonstrate that Dir-SNNs outperform undirected SNNs when the
underlying complex is directed, and perform comparably when the underlying
complex is undirected.
[COMMENTS]
7 pages, 8 figures, 1 table
[LINK]
http://arxiv.org/abs/2409.08389v3
[DATE]
2025-01-16 23:32:33+08:00
[CATEGORIES]
cs.LG
Reducing the Sensitivity of Neural Physics Simulators to Mesh Topology via Pretraining
[AUTHORS]
Nathan Vaska, Justin Goodwin, Robin Walters, Rajmonda S. Caceres
[ABSTRACT]
Meshes are used to represent complex objects in high fidelity physics
simulators across a variety of domains, such as radar sensing and aerodynamics.
There is growing interest in using neural networks to accelerate physics
simulations, and also a growing body of work on applying neural networks
directly to irregular mesh data. Since multiple mesh topologies can represent
the same object, mesh augmentation is typically required to handle topological
variation when training neural networks. Due to the sensitivity of physics
simulators to small changes in mesh shape, it is challenging to use these
augmentations when training neural network-based physics simulators. In this
work, we show that variations in mesh topology can significantly reduce the
performance of neural network simulators. We evaluate whether pretraining can
be used to address this issue, and find that employing an established
autoencoder pretraining technique with graph embedding models reduces the
sensitivity of neural network simulators to variations in mesh topology.
Finally, we highlight future research directions that may further reduce neural
simulator sensitivity to mesh topology.
[COMMENTS]
5 pages, 3 figures
[LINK]
http://arxiv.org/abs/2501.09597v1
[DATE]
2025-01-16 23:21:18+08:00
[CATEGORIES]
cs.LG
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
[AUTHORS]
Pratyush Dhingra, Janardhan Rao Doppa, Partha Pratim Pande
[ABSTRACT]
Transformer architectures have become the standard neural network model for
various machine learning applications including natural language processing and
computer vision. However, the compute and memory requirements introduced by
transformer models make them challenging to adopt for edge applications.
Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is
a common task to enhance the model’s predictive performance on specific
tasks/applications. Existing transformer accelerators are oblivious to
complexities introduced by fine-tuning. In this paper, we propose the design of
a three-dimensional (3D) heterogeneous architecture referred to as Atleus that
incorporates heterogeneous computing resources specifically optimized to
accelerate transformer models for the dual purposes of fine-tuning and
inference. Specifically, Atleus utilizes non-volatile memory and systolic array
for accelerating transformer computational kernels using an integrated 3D
platform. Moreover, we design a suitable NoC to achieve high performance and
energy efficiency. Finally, Atleus adopts an effective quantization scheme to
support model compression. Experimental results demonstrate that Atleus
outperforms the existing state of the art by up to 56x and 64.5x in terms of
performance and energy efficiency, respectively.
[COMMENTS]
Accepted for Publication in IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD)
[LINK]
http://arxiv.org/abs/2501.09588v1
[DATE]
2025-01-16 23:11:33+08:00
[CATEGORIES]
cs.LG
Hybrid additive modeling with partial dependence for supervised regression and dynamical systems forecasting
[AUTHORS]
Yann Claes, Vân Anh Huynh-Thu, Pierre Geurts
[ABSTRACT]
Learning processes by exploiting restricted domain knowledge is an important
task across a plethora of scientific areas, with more and more hybrid training
methods additively combining data-driven and model-based approaches. Although
the obtained models are more accurate than purely data-driven models, the
optimization process usually comes with sensitive regularization constraints.
Furthermore, while such hybrid methods have been tested in various scientific
applications, they have been mostly tested on dynamical systems, with only
limited study about the influence of each model component on global performance
and parameter identification. In this work, we introduce a new hybrid training
approach based on partial dependence, which removes the need for intricate
regularization. Moreover, we assess the performance of hybrid modeling against
traditional machine learning methods on standard regression problems. We
compare, on both synthetic and real regression problems, several approaches for
training such hybrid models. We focus on hybrid methods that additively combine
a parametric term with a machine learning term and investigate model-agnostic
training procedures. Therefore, experiments are carried out with different
types of machine learning models, including tree-based models and artificial
neural networks. We also extend our partial dependence optimization process for
dynamical systems forecasting and compare it to existing schemes.
[COMMENTS]
Extended version of the paper entitled “Knowledge-Guided Additive
Modeling for Supervised Regression”
(https://link.springer.com/chapter/10.1007/978-3-031-45275-8_5), accepted for
publication in the Machine Learning journal. The extension includes new
experiments in the static setting, along with a dedicated section on the
application of our method to the problem of dynamical systems forecasting
[LINK]
http://arxiv.org/abs/2307.02229v2
[DATE]
2025-01-16 23:00:38+08:00
[CATEGORIES]
cs.LG
Sequential PatchCore: Anomaly Detection for Surface Inspection using Synthetic Impurities
[AUTHORS]
Runzhou Mao, Juraj Fulir, Christoph Garth, Petra Gospodnetić
[ABSTRACT]
The appearance of surface impurities (e.g., water stains, fingerprints,
stickers) is an often-mentioned issue that causes degradation of automated
visual inspection systems. At the same time, synthetic data generation
techniques for visual surface inspection have focused primarily on generating
perfect examples and defects, disregarding impurities. This study highlights
the importance of considering impurities when generating synthetic data. We
introduce a procedural method to include photorealistic water stains in
synthetic data. The synthetic datasets are generated to correspond to real
datasets and are further used to train an anomaly detection model and
investigate the influence of water stains. The high-resolution images used for
surface inspection lead to memory bottlenecks during anomaly detection
training. To address this, we introduce Sequential PatchCore - a method to
build coresets sequentially and make training on large images using
consumer-grade hardware tractable. This allows us to perform transfer learning
using coresets pre-trained on different dataset versions. Our results show the
benefits of using synthetic data for pre-training an explicit coreset anomaly
model and the extended performance benefits of finetuning the coreset using
real data. We observed how the impurities and labelling ambiguity lower the
model performance and have additionally reported the defect-wise recall to
provide an industrially relevant perspective on model performance.
[LINK]
http://arxiv.org/abs/2501.09579v1
[DATE]
2025-01-16 22:56:41+08:00
[CATEGORIES]
cs.LG
Towards Spectral Convergence of Locally Linear Embedding on Manifolds with Boundary
[AUTHORS]
Andrew Lyons
[ABSTRACT]
We study the eigenvalues and eigenfunctions of a differential operator that
governs the asymptotic behavior of the unsupervised learning algorithm known as
Locally Linear Embedding when a large data set is sampled from an interval or
disc. In particular, the differential operator is of second order, mixed-type,
and degenerates near the boundary. We show that a natural regularity condition
on the eigenfunctions imposes a consistent boundary condition and use the
Frobenius method to estimate pointwise behavior. We then determine the limiting
sequence of eigenvalues analytically and compare them to numerical predictions.
Finally, we propose a variational framework for determining eigenvalues on
other compact manifolds.
[COMMENTS]
26 pages, 7 figures; the author welcomes all comments
[LINK]
http://arxiv.org/abs/2501.09572v1
[DATE]
2025-01-16 22:45:53+08:00
[CATEGORIES]
cs.LG
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
[AUTHORS]
Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
[COMMENTS]
NeurIPS 2024; v3 substantially improves the presentation and further
illustrates the role of $\kappa$ in function approximation
[LINK]
http://arxiv.org/abs/2405.17061v3
[DATE]
2025-01-16 22:45:52+08:00
[CATEGORIES]
cs.LG
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks
[AUTHORS]
Bao Gia Doan, Afshar Shamsi, Xiao-Yu Guo, Arash Mohammadi, Hamid Alinejad-Rokny, Dino Sejdinovic, Damien Teney, Damith C. Ranasinghe, Ehsan Abbasnejad
[ABSTRACT]
Computational complexity of Bayesian learning is impeding its adoption in
practical, large-scale tasks. Despite demonstrations of significant merits such
as improved robustness and resilience to unseen or out-of-distribution inputs
over their non-Bayesian counterparts, their practical use has faded to near
insignificance. In this study, we introduce an innovative framework to mitigate
the computational burden of Bayesian neural networks (BNNs). Our approach
follows the principle of Bayesian techniques based on deep ensembles, but
significantly reduces their cost via multiple low-rank perturbations of
parameters arising from a pre-trained neural network. Both the vanilla version
of ensembles and more sophisticated schemes such as Bayesian learning with
Stein Variational Gradient Descent (SVGD), previously deemed impractical for
large models, can be seamlessly implemented within the proposed framework,
called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a
dramatic reduction in the number of trainable parameters required to
approximate a Bayesian posterior; and ii) it not only maintains, but in some
instances, surpasses the performance of conventional Bayesian learning methods
and non-Bayesian baselines. Our results with large-scale tasks such as
ImageNet, CAMELYON17, DomainNet, VQA with CLIP, LLaVA demonstrate the
effectiveness and versatility of Bella in building highly scalable and
practical Bayesian deep models for real-world applications.
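
To give a feel for why low-rank perturbations reduce the number of trainable parameters, here is a back-of-the-envelope sketch. It assumes each ensemble member perturbs a single d_out x d_in weight matrix with two thin rank-r factors; the layer size, rank, and member count are hypothetical and not taken from the paper.

```python
def full_ensemble_params(d_out, d_in, members):
    """Trainable parameters if every member trains its own full weight matrix."""
    return members * d_out * d_in


def low_rank_ensemble_params(d_out, d_in, rank, members):
    """Trainable parameters if every member trains only two thin factors,
    of shapes (d_out x rank) and (rank x d_in), on top of shared weights."""
    return members * rank * (d_out + d_in)


# Hypothetical setting: one 1024 x 1024 layer, 5 members, rank 4.
full = full_ensemble_params(1024, 1024, 5)
low = low_rank_ensemble_params(1024, 1024, 4, 5)
ratio = full / low
```

In this toy setting the low-rank variant trains 128 times fewer parameters for the layer; the actual savings reported for Bella depend on the models and ranks used in the paper.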
[COMMENTS]
This paper is accepted in AAAI’2025
[LINK]
http://arxiv.org/abs/2407.20891v4
[DATE]
2025-01-16 22:45:36+08:00
[CATEGORIES]
cs.LG
MatrixNet: Learning over symmetry groups using learned group representations
[AUTHORS]
Lucas Laird, Circe Hsu, Asilata Bapat, Robin Walters
[ABSTRACT]
Group theory has been used in machine learning to provide a theoretically
grounded approach for incorporating known symmetry transformations in tasks
from robotics to protein modeling. In these applications, equivariant neural
networks use known symmetry groups with predefined representations to learn
over geometric input data. We propose MatrixNet, a neural network architecture
that learns matrix representations of group element inputs instead of using
predefined representations. MatrixNet achieves higher sample efficiency and
generalization over several standard baselines in prediction tasks over
several finite groups and the Artin braid group. We also show that MatrixNet
respects group relations allowing generalization to group elements of greater
word length than in the training set.
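
To illustrate what "respecting group relations" means for a matrix representation, the sketch below checks the braid relation s1 s2 s1 = s2 s1 s2 on fixed 3 x 3 permutation matrices. This is only an illustration with hand-written matrices; MatrixNet learns its representations, and the helper names here are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def represent(word, generators, size):
    """Map a word in the generators (e.g. [1, 2, 1]) to a matrix product."""
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for g in word:
        result = matmul(result, generators[g])
    return result


# Permutation matrices for swapping strands (1, 2) and (2, 3).
GENERATORS = {
    1: [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
    2: [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
}

lhs = represent([1, 2, 1], GENERATORS, 3)
rhs = represent([2, 1, 2], GENERATORS, 3)
relation_holds = lhs == rhs
```

Because the relation holds at the level of matrices, any longer word built from these generators is mapped consistently, which is the property that allows generalization to word lengths not seen in training.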
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.09571v1
[DATE]
2025-01-16 22:45:12+08:00
[CATEGORIES]
cs.LG
Latent Space Characterization of Autoencoder Variants
[AUTHORS]
Anika Shrivastava, Renu Rameshan, Samar Agnihotri
[ABSTRACT]
Understanding the latent spaces learned by deep learning models is crucial in
exploring how they represent and generate complex data. Autoencoders (AEs) have
played a key role in the area of representation learning, with numerous
regularization techniques and training principles developed not only to enhance
their ability to learn compact and robust representations, but also to reveal
how different architectures influence the structure and smoothness of the
lower-dimensional non-linear manifold. We strive to characterize the structure
of the latent spaces learned by different autoencoders including convolutional
autoencoders (CAEs), denoising autoencoders (DAEs), and variational
autoencoders (VAEs) and how they change with the perturbations in the input. By
characterizing the matrix manifolds corresponding to the latent spaces, we
provide an explanation for the well-known observation that the latent spaces of
CAE and DAE form non-smooth manifolds, while that of VAE forms a smooth
manifold. We also map the points of the matrix manifold to a Hilbert space
using distance preserving transforms and provide an alternate view in terms of
the subspaces generated in the Hilbert space as a function of the distortion in
the input. The results show that the latent manifolds of CAE and DAE are
stratified with each stratum being a smooth product manifold, while the
manifold of VAE is a smooth product manifold of two symmetric positive definite
matrices and a symmetric positive semi-definite matrix.
[COMMENTS]
9 pages, 6 figures, and 1 table
[LINK]
http://arxiv.org/abs/2412.04755v2
[DATE]
2025-01-16 22:44:39+08:00
[CATEGORIES]
cs.LG
FSDEM: Feature Selection Dynamic Evaluation Metric
[AUTHORS]
Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek
[ABSTRACT]
Expressive evaluation metrics are indispensable for informative experiments
in all areas, and while several metrics are established in some areas, in
others, such as feature selection, only indirect or otherwise limited
evaluation metrics are found. In this paper, we propose a novel evaluation
metric to address several problems of its predecessors and allow for flexible
and reliable evaluation of feature selection algorithms. The proposed metric is
a dynamic metric with two properties that can be used to evaluate both the
performance and the stability of a feature selection algorithm. We conduct
several empirical experiments to illustrate the use of the proposed metric in
the successful evaluation of feature selection algorithms. We also provide a
comparison and analysis to show the different aspects involved in the
evaluation of the feature selection algorithms. The results indicate that the
proposed metric is successful in carrying out the evaluation task for feature
selection algorithms.
This paper is an extended version of a paper published at SISAP 2024.
[COMMENTS]
Short version of this paper is published at 17th International
Conference on Similarity Search and Applications, SISAP 2024
[LINK]
http://arxiv.org/abs/2408.14234v3
[DATE]
2025-01-16 22:29:47+08:00
[CATEGORIES]
cs.LG
Overshoot: Taking advantage of future gradients in momentum-based stochastic optimization
[AUTHORS]
Jakub Kopal, Michal Gregor, Santiago de Leon-Martinez, Jakub Simko
[ABSTRACT]
Overshoot is a novel, momentum-based stochastic gradient descent optimization
method designed to enhance performance beyond standard and Nesterov’s momentum.
In conventional momentum methods, gradients from previous steps are aggregated
with the gradient at current model weights before taking a step and updating
the model. Rather than calculating gradient at the current model weights,
Overshoot calculates the gradient at model weights shifted in the direction of
the current momentum. This sacrifices the immediate benefit of using the
gradient w.r.t. the exact model weights now, in favor of evaluating it at a point
that will likely be more relevant for future updates. We show that
incorporating this principle into momentum-based optimizers (SGD with momentum
and Adam) results in faster convergence (saving on average at least 15% of
steps). Overshoot consistently outperforms both standard and Nesterov’s
momentum across a wide range of tasks and integrates into popular
momentum-based optimizers with zero memory and small computational overhead.
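
The idea of evaluating the gradient at weights shifted along the current momentum can be sketched on a one-dimensional toy problem. The update rule below is an assumption made for illustration (the abstract does not give the exact formula), and the `overshoot` factor and all default values are hypothetical.

```python
def overshoot_sgd(grad_fn, w, lr=0.1, momentum=0.9, overshoot=2.0, steps=200):
    """SGD with momentum where the gradient is taken at a look-ahead point.

    Assumed rule: the look-ahead point is w + overshoot * momentum * velocity.
    With overshoot = 0 this reduces to classical momentum.
    """
    velocity = 0.0
    for _ in range(steps):
        # Shift the weights in the direction of the current momentum.
        lookahead = w + overshoot * momentum * velocity
        g = grad_fn(lookahead)
        # Standard momentum update, using the look-ahead gradient.
        velocity = momentum * velocity - lr * g
        w = w + velocity
    return w


# Toy objective f(w) = 0.5 * w**2, whose gradient is simply w.
w_final = overshoot_sgd(lambda w: w, w=5.0)
```

On this toy quadratic the iterate converges to the minimum at zero; the sketch makes no claim about the convergence speedups reported in the paper.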
[LINK]
http://arxiv.org/abs/2501.09556v1
[DATE]
2025-01-16 22:18:10+08:00
[CATEGORIES]
cs.LG
Intra-day Solar and Power Forecast for Optimization of Intraday Market Participation
[AUTHORS]
Nelson Salazar-Peña, Adolfo Palma-Vergara, Mateo Montes, María Alejandra Vargas-Torres, Adriana Salinas, Andrés Velasco, Alejandra Tabares, Andrés González-Mancera
[ABSTRACT]
The prediction of solar irradiance enhances reliability in photovoltaic (PV)
solar plant generation and grid integration. In Colombia, PV plants face
penalties if energy production deviates beyond governmental thresholds from
intraday market offers. This research employs Long Short-Term Memory (LSTM) and
Bidirectional-LSTM (Bi-LSTM) models, utilizing meteorological data from a PV
plant in El Paso, Cesar, Colombia, to predict solar irradiance with a 6-hour
horizon and 10-minute resolution. While Bi-LSTM showed superior performance,
the LSTM model achieved comparable results with significantly reduced training
time (6 hours versus 18 hours), making it computationally advantageous. The
LSTM predictions were averaged to create an hourly resolution model, evaluated
using Mean Absolute Error, Root-Mean-Square Error, Normalized Root-Mean-Square
Error, and Mean Absolute Percentage Error metrics. Comparison with the Global
Forecast System (GFS) revealed similar performance, with both models
effectively capturing daily solar irradiance patterns. The forecast model
integrates with an Object-Oriented power production model, enabling accurate
energy offers in the intraday market while minimizing penalty costs.
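
The hourly averaging and the four error metrics named above can be sketched as follows. This is a minimal example with made-up values; normalizing the RMSE by the range of the observations is an assumption, since the abstract does not say which normalization is used.

```python
import math


def to_hourly(ten_minute_values):
    """Average each complete group of six 10-minute values into one hourly value."""
    return [sum(ten_minute_values[i:i + 6]) / 6
            for i in range(0, len(ten_minute_values) - 5, 6)]


def error_metrics(actual, predicted):
    """Return MAE, RMSE, NRMSE and MAPE for two equally long sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: normalize by the range of the observed values.
    nrmse = rmse / (max(actual) - min(actual))
    # MAPE is undefined when an observed value is zero (e.g. night-time irradiance).
    mape = 100 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    return {"mae": mae, "rmse": rmse, "nrmse": nrmse, "mape": mape}


# Made-up 10-minute predictions (W/m^2) covering two hours, and hourly observations.
predicted_10min = [100, 110, 120, 130, 140, 150, 200, 200, 200, 200, 200, 200]
hourly_prediction = to_hourly(predicted_10min)
metrics = error_metrics([100.0, 200.0], hourly_prediction)
```

Any incomplete trailing hour is dropped by `to_hourly`, and zero-valued observations would need to be filtered out before computing MAPE.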
[COMMENTS]
20 pages, 37 figures, 9 tables
[LINK]
http://arxiv.org/abs/2501.09551v1
[DATE]
2025-01-16 22:12:03+08:00
[CATEGORIES]
cs.LG
STROOBnet Optimization via GPU-Accelerated Proximal Recurrence Strategies
[AUTHORS]
Ted Edward Holmberg, Mahdi Abdelguerfi, Elias Ioup
[ABSTRACT]
Spatiotemporal networks’ observational capabilities are crucial for accurate
data gathering and informed decisions across multiple sectors. This study
focuses on the Spatiotemporal Ranged Observer-Observable Bipartite Network
(STROOBnet), linking observational nodes (e.g., surveillance cameras) to events
within defined geographical regions, enabling efficient monitoring. Using data
from Real-Time Crime Camera (RTCC) systems and Calls for Service (CFS) in New
Orleans, where RTCC combats rising crime amidst reduced police presence, we
address the network’s initial observational imbalances. Aiming for uniform
observational efficacy, we propose the Proximal Recurrence approach. It
outperformed traditional clustering methods like k-means and DBSCAN by offering
holistic event frequency and spatial consideration, enhancing observational
coverage.
[COMMENTS]
10 pages, 17 figures, 2023 IEEE International Conference on Big Data
(BigData)
[LINK]
http://arxiv.org/abs/2404.14388v3
[DATE]
2025-01-16 22:02:26+08:00
[CATEGORIES]
cs.LG
A Consolidated Volatility Prediction with Back Propagation Neural Network and Genetic Algorithm
[AUTHORS]
Zong Ke, Jingyu Xu, Zizhou Zhang, Yu Cheng, Wenjun Wu
[ABSTRACT]
This paper provides a unique approach with AI algorithms to predict the
volatility of emerging stock markets. Traditionally, stock volatility is derived
from historical volatility, Monte Carlo simulation, and implied volatility. In
this paper, the writer designs a consolidated model with a back-propagation
neural network and a genetic algorithm to predict the future volatility of
emerging stock markets and finds that the results are quite accurate with low errors.
[COMMENTS]
6 pages, 7 figures, 1 table, The paper will be published by IEEE on
conference: 2024 3rd International Conference on Image Processing, Computer
Vision and Machine Learning (ICICML 2024) (V2)
[LINK]
http://arxiv.org/abs/2412.07223v3
[DATE]
2025-01-16 21:53:47+08:00
[CATEGORIES]
cs.LG
MOGNET: A Mux-residual quantized Network leveraging Online-Generated weights
[AUTHORS]
Van Thien Nguyen, William Guicquero, Gilles Sicard
[ABSTRACT]
This paper presents a compact model architecture called MOGNET, compatible
with resource-limited hardware. MOGNET uses a streamlined Convolutional
factorization block based on a combination of 2 point-wise (1x1) convolutions
with a group-wise convolution in-between. To further limit the overall model
size and reduce the on-chip required memory, the second point-wise
convolution’s parameters are on-line generated by a Cellular Automaton
structure. In addition, MOGNET enables the use of low-precision weights and
activations, by taking advantage of a Multiplexer mechanism with a proper
Bitshift rescaling for integrating residual paths without increasing the
hardware-related complexity. To efficiently train this model we also introduce
a novel weight ternarization method favoring the balance between quantized
levels. Experimental results show that given a tiny memory budget (sub-2Mb),
MOGNET can achieve higher accuracy with a clear gap up to 1% at a similar or
even lower model size compared to recent state-of-the-art methods.
[COMMENTS]
Published at IEEE AICAS 2022
[LINK]
http://arxiv.org/abs/2501.09531v1
[DATE]
2025-01-16 21:30:20+08:00
[CATEGORIES]
cs.LG
Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation
[AUTHORS]
Hanrong Zhang, Yifei Yao, Zixuan Wang, Jiayuan Su, Mengxuan Li, Peng Peng, Hongwei Wang
[ABSTRACT]
Class-incremental fault diagnosis requires a model to adapt to new fault
classes while retaining previous knowledge. However, limited research exists
for imbalanced and long-tailed data. Extracting discriminative features from
few-shot fault data is challenging, and adding new fault classes often demands
costly model retraining. Moreover, incremental training of existing methods
risks catastrophic forgetting, and severe class imbalance can bias the model’s
decisions toward normal classes. To tackle these issues, we introduce a
Supervised Contrastive knowledge distiLlation for class Incremental Fault
Diagnosis (SCLIFD) framework proposing supervised contrastive knowledge
distillation for improved representation learning capability and less
forgetting, a novel prioritized exemplar selection method for sample replay to
alleviate catastrophic forgetting, and the Random Forest Classifier to address
the class imbalance. Extensive experimentation on simulated and real-world
industrial datasets across various imbalance ratios demonstrates the
superiority of SCLIFD over existing approaches. Our code can be found at
https://github.com/Zhang-Henry/SCLIFD_TII.
[LINK]
http://arxiv.org/abs/2501.09525v1
[DATE]
2025-01-16 21:20:29+08:00
[CATEGORIES]
cs.LG
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
[AUTHORS]
Anke Tang, Enneng Yang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao
[ABSTRACT]
Deep model merging represents an emerging research direction that combines
multiple fine-tuned models to harness their specialized capabilities across
different tasks and domains. Current model merging techniques focus on merging
all available models simultaneously, with weight interpolation-based methods
being the predominant approaches. However, these conventional approaches are
not well-suited for scenarios where models become available sequentially, and
they often suffer from high memory requirements and potential interference
between tasks. In this study, we propose a training-free projection-based
continual merging method that processes models sequentially through orthogonal
projections of weight matrices and adaptive scaling mechanisms. Our method
operates by projecting new parameter updates onto subspaces orthogonal to
existing merged parameter updates while using an adaptive scaling mechanism to
maintain stable parameter distances, enabling efficient sequential integration
of task-specific knowledge. Our approach maintains constant memory complexity
with respect to the number of models, minimizes interference between tasks through
orthogonal projections, and retains the performance of previously merged models
through adaptive task vector scaling. Extensive experiments on CLIP-ViT models
demonstrate that our method achieves a 5-8% average accuracy improvement while
maintaining robust performance in different task orderings.
[LINK]
http://arxiv.org/abs/2501.09522v1
[DATE]
2025-01-16 21:17:24+08:00
[CATEGORIES]
cs.LG
Multi-Head Self-Attending Neural Tucker Factorization
[AUTHORS]
Yikai Hou, Peng Tang
[ABSTRACT]
Quality-of-service (QoS) data exhibit dynamic temporal patterns that are
crucial for accurately predicting missing values. These patterns arise from the
evolving interactions between users and services, making it essential to
capture the temporal dynamics inherent in such data for improved prediction
performance. As the size and complexity of QoS datasets increase, existing
models struggle to provide accurate predictions, highlighting the need for more
flexible and dynamic methods to better capture the underlying patterns in
large-scale QoS data. To address this issue, we introduce a neural
network-based tensor factorization approach tailored for learning
spatiotemporal representations of high-dimensional and incomplete (HDI)
tensors, namely the Multi-head Self-attending Neural Tucker Factorization
(MSNTucF). The model is elaborately designed for modeling intricate nonlinear
spatiotemporal feature interaction patterns hidden in real world data with a
two-fold idea. It first employs a neural network structure to generalize the
traditional framework of Tucker factorization and then proposes to leverage a
multi-head self-attending module to enforce nonlinear latent interaction
learning. In empirical studies on two dynamic QoS datasets from real
applications, the proposed MSNTucF model demonstrates superior performance
compared to state-of-the-art benchmark models in estimating missing
observations. This highlights its ability to learn non-linear spatiotemporal
representations of HDI tensors.
[LINK]
http://arxiv.org/abs/2501.09776v1
[DATE]
2025-01-16 21:04:15+08:00
[CATEGORIES]
cs.LG
Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel
[AUTHORS]
Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis
[ABSTRACT]
Gaussian processes (GPs) stand as crucial tools in machine learning and
signal processing, with their effectiveness hinging on kernel design and
hyper-parameter optimization. This paper presents a novel GP linear multiple
kernel (LMK) and a generic sparsity-aware distributed learning framework to
optimize the hyper-parameters. The newly proposed grid spectral mixture product
(GSMP) kernel is tailored for multi-dimensional data, effectively reducing the
number of hyper-parameters while maintaining good approximation capability. We
further demonstrate that the associated hyper-parameter optimization of this
kernel yields sparse solutions. To exploit the inherent sparsity of the
solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL)
framework. The framework incorporates a quantized alternating direction method
of multipliers (ADMM) scheme for collaborative learning among multiple agents,
where the local optimization problem is solved using a distributed successive
convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale
hyper-parameter optimization for the proposed kernel, simultaneously ensuring
data privacy and minimizing communication costs. Theoretical analysis
establishes convergence guarantees for the learning framework, while
experiments on diverse datasets demonstrate the superior prediction performance
and efficiency of our proposed methods.
[LINK]
http://arxiv.org/abs/2309.08201v3
[DATE]
2025-01-16 20:33:37+08:00
[CATEGORIES]
cs.LG
AALF: Almost Always Linear Forecasting
[AUTHORS]
Matthias Jakobs, Thomas Liebig
[ABSTRACT]
Recent work on time-series forecasting increasingly leverages the high
predictive power of Deep Learning models. With this increase in model
complexity, however, comes a lack of understanding of the underlying model
decision process, which is problematic for high-stakes application scenarios.
At the same time, simple, interpretable forecasting methods such as ARIMA still
perform very well, sometimes on par with Deep Learning approaches. We argue
that simple models are good enough most of the time, and that forecasting
performance could be improved by choosing a Deep Learning method only for a few,
important predictions, increasing the overall interpretability of the
forecasting process. In this context, we propose a novel online model selection
framework which learns to identify these predictions. An extensive empirical
study on various real-world datasets shows that our selection methodology
performs comparably to state-of-the-art online model selection methods in most
cases while being significantly more interpretable. We find that almost always
choosing a simple autoregressive linear model for forecasting results in
competitive performance, suggesting that the need for opaque black-box models
in time-series forecasting might be smaller than recent works would suggest.
[LINK]
http://arxiv.org/abs/2409.10142v2
[DATE]
2025-01-16 20:29:28+08:00
[CATEGORIES]
cs.LG
Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces
[AUTHORS]
Viktor Stein, Sebastian Neumayer, Nicolaj Rux, Gabriele Steidl
[ABSTRACT]
Commonly used $f$-divergences of measures, e.g., the Kullback-Leibler
divergence, are subject to limitations regarding the support of the involved
measures. A remedy is regularizing the $f$-divergence by a squared maximum mean
discrepancy (MMD) associated with a characteristic kernel $K$. We use the
kernel mean embedding to show that this regularization can be rewritten as the
Moreau envelope of some function on the associated reproducing kernel Hilbert
space. Then, we exploit well-known results on Moreau envelopes in Hilbert
spaces to analyze the MMD-regularized $f$-divergences, particularly their
gradients. Subsequently, we use our findings to analyze Wasserstein gradient
flows of MMD-regularized $f$-divergences. We provide proof-of-concept
numerical examples for flows starting from empirical measures. Here, we cover
$f$-divergences with infinite and finite recession constants. Lastly, we extend
our results to the tight variational formulation of $f$-divergences and
numerically compare the resulting flows.
[COMMENTS]
56 pages, 14 figures, 3 tables. Comments welcome! NEW: Incorporated
Reviewers’ suggestions, added FISTA and tight formulation
[LINK]
http://arxiv.org/abs/2402.04613v3
[DATE]
2025-01-16 20:05:26+08:00
[CATEGORIES]
cs.LG
Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay
[AUTHORS]
Baiyu Peng, Aude Billard
[ABSTRACT]
Planning for a wide range of real-world tasks requires knowing and specifying
all constraints. However, instances exist where these constraints are either
unknown or challenging to specify accurately. A possible solution is to infer
the unknown constraints from expert demonstration. The majority of prior works
limit themselves to learning simple linear constraints, or require strong
knowledge of the true constraint parameterization or environmental model. To
mitigate these problems, this paper presents a positive-unlabeled (PU) learning
approach to infer a continuous, arbitrary and possibly nonlinear, constraint
from demonstration. From a PU learning perspective, we treat all data in
demonstrations as positive (feasible) data, and learn a (sub)-optimal policy to
generate high-reward-winning but potentially infeasible trajectories, which
serve as unlabeled data containing both feasible and infeasible states. Under
an assumption on data distribution, a feasible-infeasible classifier (i.e.,
constraint model) is learned from the two datasets through a postprocessing PU
learning technique. The entire method employs an iterative framework
alternating between updating the policy, which generates and selects
higher-reward policies, and updating the constraint model. Additionally, a
memory buffer is introduced to record and reuse samples from previous
iterations to prevent forgetting. The effectiveness of the proposed method is
validated in two MuJoCo environments, successfully inferring continuous
nonlinear constraints and outperforming a baseline method in terms of
constraint accuracy and policy safety.
[LINK]
http://arxiv.org/abs/2407.16485v3
[DATE]
2025-01-16 19:59:02+08:00
[CATEGORIES]
cs.LG
Utilizing AI Language Models to Identify Prognostic Factors for Coronary Artery Disease: A Study in Mashhad Residents
[AUTHORS]
Bami Zahra, Behnampour Nasser, Doosti Hassan, Ghayour Mobarhan Majid
[ABSTRACT]
Background: Understanding the risk factors for cardiovascular artery disease,
the leading global cause of mortality, is crucial for influencing its
etiology, prevalence, and treatment. This study aims to evaluate prognostic
markers for coronary artery disease in Mashhad using Naive Bayes, REP Tree,
J48, CART, and CHAID algorithms. Methods:
Using data from the 2009 MASHAD STUDY, prognostic factors for coronary artery
disease were determined with Naive Bayes, REP Tree, J48, CART, CHAID, and
Random Forest algorithms using R 3.5.3 and WEKA 3.9.4. Model efficiency was
compared by sensitivity, specificity, and accuracy. Cases were patients with
coronary artery disease; each had three controls (940 in total). Results:
Prognostic factors for coronary artery disease in Mashhad residents varied by
algorithm. CHAID identified age, myocardial infarction history, and
hypertension. CART included depression score and physical activity. REP added
education level and anxiety score. NB included diabetes and family history. J48
highlighted father’s heart disease and weight loss. CHAID had the highest
accuracy (0.80).
Conclusion:
Key prognostic factors for coronary artery disease in CART and CHAID models
include age, myocardial infarction history, hypertension, depression score,
physical activity, and BMI. NB, REP Tree, and J48 identified numerous factors.
CHAID had the highest accuracy, sensitivity, and specificity. CART offers
simpler interpretation, aiding physician and paramedic model selection based on
specific. Keywords: RF, Naive Bayes, REP, J48 algorithms, Coronary Artery
Disease (CAD).
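The three comparison measures named in the Methods (sensitivity, specificity, and accuracy) follow directly from the confusion matrix of a binary classifier. The sketch below is illustrative only: the function name `binary_metrics` and the 0/1 label encoding (1 for a case, 0 for a control) are assumptions, and the study itself used R and WEKA rather than Python.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 0/1 labels.

    Assumed encoding: 1 = case (disease present), 0 = control.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return {
        "sensitivity": tp / (tp + fn),   # share of cases detected
        "specificity": tn / (tn + fp),   # share of controls cleared
        "accuracy": (tp + tn) / len(pairs),
    }
```

With one case for every three controls, as in the study design, a model that labels everyone a control would already reach 0.75 accuracy, which is why sensitivity and specificity are reported alongside it.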
[LINK]
http://arxiv.org/abs/2501.09480v1
[DATE]
2025-01-16 19:32:03+08:00
[CATEGORIES]
cs.LG
Diffusion Models in Vision: A Survey
[AUTHORS]
Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
[ABSTRACT]
Denoising diffusion models represent a recent emerging topic in computer
vision, demonstrating remarkable results in the area of generative modeling. A
diffusion model is a deep generative model that is based on two stages, a
forward diffusion stage and a reverse diffusion stage. In the forward diffusion
stage, the input data is gradually perturbed over several steps by adding
Gaussian noise. In the reverse stage, a model is tasked with recovering the
original input data by learning to gradually reverse the diffusion process,
step by step. Diffusion models are widely appreciated for the quality and
diversity of the generated samples, despite their known computational burdens,
i.e., low speeds due to the high number of steps involved during sampling. In
this survey, we provide a comprehensive review of articles on denoising
diffusion models applied in vision, comprising both theoretical and practical
contributions in the field. First, we identify and present three generic
diffusion modeling frameworks, which are based on denoising diffusion
probabilistic models, noise conditioned score networks, and stochastic
differential equations. We further discuss the relations between diffusion
models and other deep generative models, including variational auto-encoders,
generative adversarial networks, energy-based models, autoregressive models and
normalizing flows. Then, we introduce a multi-perspective categorization of
diffusion models applied in computer vision. Finally, we illustrate the current
limitations of diffusion models and envision some interesting directions for
future research.
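The forward diffusion stage described above has a well-known closed form in denoising diffusion probabilistic models: a sample at step t can be drawn directly as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where eps is standard Gaussian noise and alpha_bar_t is the cumulative product of (1 - beta_s). The sketch below illustrates that formula only; the function names and the linear schedule from 1e-4 to 0.02 are assumptions chosen for illustration and are not taken from the survey.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas):
    """Cumulative products of (1 - beta_t), one value per step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from x_0 in one shot using the closed-form forward process."""
    a = alpha_bars[t]
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

# Usage: noise a toy 3-dimensional "image" at the final step.
alpha_bars = alpha_bar(linear_betas(1000))
x_t = forward_diffuse([0.5, -0.5, 1.0], 999, alpha_bars, random.Random(0))
```

At the final step alpha_bar is close to zero, so x_t is almost pure Gaussian noise; the reverse stage is what a trained model must learn, and it is not shown here.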
[COMMENTS]
Accepted in IEEE Transactions on Pattern Analysis and Machine
Intelligence. 25 pages, 3 figures
[LINK]
http://arxiv.org/abs/2209.04747v6
[DATE]
2025-01-16 19:17:04+08:00
[CATEGORIES]
cs.LG
On the uncertainty principle of neural networks
[AUTHORS]
Jun-Jie Zhang, Dong-Xiao Zhang, Jian-Nan Chen, Long-Gang Pang, Deyu Meng
[ABSTRACT]
In this study, we explore the inherent trade-off between accuracy and
robustness in neural networks, drawing an analogy to the uncertainty principle
in quantum mechanics. We propose that neural networks are subject to an
uncertainty relation, which manifests as a fundamental limitation in their
ability to simultaneously achieve high accuracy and robustness against
adversarial attacks. Through mathematical proofs and empirical evidence, we
demonstrate that this trade-off is a natural consequence of the sharp
boundaries formed between different class concepts during training. Our
findings reveal that the complementarity principle, a cornerstone of quantum
physics, applies to neural networks, imposing fundamental limits on their
capabilities in simultaneous learning of conjugate features. Meanwhile, our
work suggests that achieving human-level intelligence through a single network
architecture or massive datasets alone may be inherently limited. Our work
provides new insights into the theoretical foundations of neural network
vulnerability and opens up avenues for designing more robust neural network
architectures.
[COMMENTS]
8 pages, 5 figures
[LINK]
http://arxiv.org/abs/2205.01493v4
[DATE]
2025-01-16 19:16:40+08:00
[CATEGORIES]
cs.LG
WindsorML: High-Fidelity Computational Fluid Dynamics Dataset For Automotive Aerodynamics
[AUTHORS]
Neil Ashton, Jordan B. Angel, Aditya S. Ghate, Gaetan K. W. Kenway, Man Long Wong, Cetin Kiris, Astrid Walle, Danielle C. Maddix, Gary Page
[ABSTRACT]
This paper presents a new open-source high-fidelity dataset for Machine
Learning (ML) containing 355 geometric variants of the Windsor body, to help
the development and testing of ML surrogate models for external automotive
aerodynamics. Each Computational Fluid Dynamics (CFD) simulation was run as a
GPU-native high-fidelity Wall-Modeled Large-Eddy Simulation (WMLES) using a
Cartesian immersed-boundary method with more than 280M cells to ensure the
greatest possible accuracy. The dataset contains geometry variants that
exhibit a wide range of flow characteristics that are representative of those
observed on road-cars. The dataset itself contains the 3D time-averaged volume
& boundary data as well as the geometry and force & moment coefficients. This
paper discusses the validation of the underlying CFD methods as well as
contents and structure of the dataset. To the authors' knowledge, this
represents the first large-scale, high-fidelity CFD dataset for the Windsor
body with a permissive open-source license (CC-BY-SA).
[LINK]
http://arxiv.org/abs/2407.19320v4
[DATE]
2025-01-16 19:11:30+08:00
[CATEGORIES]
cs.LG
Predicting Air Temperature from Volumetric Urban Morphology with Machine Learning
[AUTHORS]
Berk Kıvılcım, Patrick Erik Bradley
[ABSTRACT]
In this study, we first introduce a method that converts CityGML data into
voxels efficiently and quickly at high resolution for large-scale datasets such
as cities, at the cost of some building detail. This overcomes the limitations
of previous voxelization methodologies, which have been computationally
intensive and inefficient at transforming large-scale urban areas into
high-resolution voxel representations. The voxelized 3D city data from multiple
cities and the corresponding air temperature data are used to develop a machine
learning model. Before model training, Gaussian blurring is applied to the
input data to account for spatial relationships; as a result, the correlation
between air temperature and volumetric building morphology also increases.
After model training, the prediction results are evaluated not only with Mean
Square Error (MSE) but also with image similarity metrics such as Structural
Similarity Index Measure (SSIM) and Learned Perceptual Image Patch Similarity
(LPIPS), which are able to detect and consider spatial relations during the
evaluation process. The trained model is capable of predicting the spatial
distribution of air temperature using the building volume information of the
corresponding pixel as input. By doing so, this
research aims to assist urban planners in incorporating environmental
parameters into their planning strategies, thereby facilitating more
sustainable and inhabitable urban environments.
[COMMENTS]
30 pages, 8 figures, 2 tables
[LINK]
http://arxiv.org/abs/2501.09469v1
[DATE]
2025-01-16 19:10:38+08:00
[CATEGORIES]
cs.LG
Pruning for Sparse Diffusion Models based on Gradient Flow
[AUTHORS]
Ben Wan, Tianyi Zheng, Zhaoyu Chen, Yuxiao Wang, Jia Wang
[ABSTRACT]
Diffusion Models (DMs) have impressive capabilities among generative models,
but are limited by slow inference speeds and high computational costs.
Previous works utilize one-shot structure pruning to derive lightweight DMs
from pre-trained ones, but this approach often leads to a significant drop in
generation quality and may result in the removal of crucial weights. Thus, we
propose an iterative pruning method based on gradient flow, including the
gradient flow pruning process and the gradient flow pruning criterion. We
employ a progressive soft pruning strategy to maintain the continuity of the
mask matrix and guide it along the gradient flow of the energy function based
on the pruning criterion in sparse space, thereby avoiding the sudden
information loss typically caused by one-shot pruning. The gradient-flow-based
criterion prunes parameters whose removal increases the gradient norm of the
loss function and can enable fast convergence for a pruned model in the
iterative pruning stage. Our extensive experiments on widely used datasets demonstrate
that our method achieves superior performance in efficiency and consistency
with pre-trained models.
[COMMENTS]
5 pages, 1 figure, accepted by ICASSP2025
[LINK]
http://arxiv.org/abs/2501.09464v1
[DATE]
2025-01-16 18:55:05+08:00
[CATEGORIES]
cs.LG
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
[AUTHORS]
Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist
[ABSTRACT]
Reinforcement Learning (RL) has been used to finetune Large Language Models
(LLMs) using a reward model trained from preference data, to better align with
human judgment. The recently introduced direct alignment methods, which are
often simpler, more stable, and computationally lighter, can more directly
achieve this. However, these approaches cannot optimize arbitrary rewards, and
the preference-based ones are not the only rewards of interest for LLMs (e.g.,
unit tests for code generation or textual entailment for summarization, among
others). RL-finetuning is usually done with a variation of policy gradient,
which calls for on-policy or near-on-policy samples, requiring costly
generations. We introduce Contrastive Policy Gradient, or CoPG, a simple and
mathematically principled new RL algorithm that can estimate the optimal policy
even from off-policy data. It can be seen as an off-policy policy gradient
approach that does not rely on importance sampling techniques and highlights the
importance of using (the right) state baseline. We show this approach to
generalize the direct alignment method IPO (identity preference optimization)
and classic policy gradient. We experiment with the proposed CoPG on a toy
bandit problem to illustrate its properties, as well as for finetuning LLMs on
a summarization task, using a learned reward function considered as ground
truth for the purpose of the experiments.
[COMMENTS]
EMNLP 2024
[LINK]
http://arxiv.org/abs/2406.19185v2
[DATE]
2025-01-16 18:54:59+08:00
[CATEGORIES]
cs.LG
Dataset-Free Weight-Initialization on Restricted Boltzmann Machine
[AUTHORS]
Muneki Yasuda, Ryosuke Maeno, Chako Takahashi
[ABSTRACT]
In feed-forward neural networks, dataset-free weight-initialization methods
such as LeCun, Xavier (or Glorot), and He initializations have been developed.
These methods randomly determine the initial values of weight parameters based
on specific distributions (e.g., Gaussian or uniform distributions) without
using training datasets. To the best of the authors’ knowledge, such a
dataset-free weight-initialization method is yet to be developed for restricted
Boltzmann machines (RBMs), which are probabilistic neural networks consisting
of two layers. In this study, we derive a dataset-free weight-initialization
method for Bernoulli–Bernoulli RBMs based on statistical mechanical analysis.
In the proposed weight-initialization method, the weight parameters are drawn
from a Gaussian distribution with zero mean. The standard deviation of the
Gaussian distribution is optimized based on our hypothesis that a standard
deviation providing a larger layer correlation (LC) between the two layers
improves the learning efficiency. The expression of the LC is derived based on
a statistical mechanical analysis. The optimal value of the standard deviation
corresponds to the maximum point of the LC. The proposed weight-initialization
method is identical to Xavier initialization in a specific case (i.e., when the
sizes of the two layers are the same, the random variables of the layers are
$\{-1,1\}$-binary, and all bias parameters are zero). The validity of the
proposed weight-initialization method is demonstrated in numerical experiments
using toy and real-world datasets.
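For comparison, standard Xavier (Glorot) normal initialization draws each weight from a zero-mean Gaussian with standard deviation sqrt(2 / (fan_in + fan_out)). The sketch below applies that textbook formula to an RBM-shaped weight matrix; it is not the paper's optimized standard deviation, and the function name `xavier_normal` and its arguments are assumptions for illustration.

```python
import math
import random

def xavier_normal(n_visible, n_hidden, rng):
    """Zero-mean Gaussian weights with the Xavier/Glorot normal std.

    Returns an n_visible x n_hidden matrix as a list of rows.
    This is the baseline formula only, not the LC-maximizing std
    derived in the paper.
    """
    std = math.sqrt(2.0 / (n_visible + n_hidden))
    return [[rng.gauss(0.0, std) for _ in range(n_hidden)]
            for _ in range(n_visible)]

# Usage: weights for a small RBM with equal layer sizes.
weights = xavier_normal(64, 64, random.Random(0))
```

When the two layers have the same size n, the standard deviation reduces to sqrt(1 / n), which is the equal-size setting the abstract singles out.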
[LINK]
http://arxiv.org/abs/2409.07708v3
[DATE]
2025-01-16 18:46:57+08:00
[CATEGORIES]
cs.LG
ERGNN: Spectral Graph Neural Network With Explicitly-Optimized Rational Graph Filters
[AUTHORS]
Guoming Li, Jian Yang, Shangsong Liang
[ABSTRACT]
Approximation-based spectral graph neural networks, which construct graph
filters with function approximation, have shown substantial performance in
graph learning tasks. Despite their great success, existing works primarily
employ polynomial approximation to construct the filters, whereas another
superior option, namely rational approximation, remains underexplored. Although a
handful of prior works have attempted to deploy the rational approximation,
their implementations often involve intensive computational demands or still
resort to polynomial approximations, hindering the full potential of the rational
graph filters. To address these issues, this paper introduces ERGNN, a novel
spectral GNN with an explicitly-optimized rational filter. ERGNN adopts a unique
two-step framework that sequentially applies the numerator filter and the
denominator filter to the input signals, thus streamlining the model paradigm
while enabling explicit optimization of both numerator and denominator of the
rational filter. Extensive experiments validate the superiority of ERGNN over
state-of-the-art methods, establishing it as a practical solution for deploying
rational-based GNNs.
[COMMENTS]
Accepted in 2025 IEEE International Conference on Acoustics, Speech,
and Signal Processing, ICASSP 2025
[LINK]
http://arxiv.org/abs/2412.19106v2
[DATE]
2025-01-16 18:29:53+08:00
[CATEGORIES]
cs.LG
An Adaptive Collocation Point Strategy For Physics Informed Neural Networks via the QR Discrete Empirical Interpolation Method
[AUTHORS]
Adrian Celaya, David Fuentes, Beatrice Riviere
[ABSTRACT]
Physics-informed neural networks (PINNs) have gained significant attention
for solving forward and inverse problems related to partial differential
equations (PDEs). While advancements in loss functions and network
architectures have improved PINN accuracy, the impact of collocation point
sampling on their performance remains underexplored. Fixed sampling methods,
such as uniform random sampling and equispaced grids, can fail to capture
critical regions with high solution gradients, limiting their effectiveness for
complex PDEs. Adaptive methods, inspired by adaptive mesh refinement from
traditional numerical methods, address this by dynamically updating collocation
points during training but may overlook residual dynamics between updates,
potentially losing valuable information. To overcome this limitation, we
propose an adaptive collocation point selection strategy utilizing the QR
Discrete Empirical Interpolation Method (QR-DEIM), a reduced-order modeling
technique for efficiently approximating nonlinear functions. Our results on
benchmark PDEs, including the wave, Allen-Cahn, and Burgers’ equations,
demonstrate that our QR-DEIM-based approach improves PINN accuracy compared to
existing methods, offering a promising direction for adaptive collocation point
strategies.
[COMMENTS]
Submitted to ICML 2025. Under review
[LINK]
http://arxiv.org/abs/2501.07700v2
[DATE]
2025-01-16 18:02:59+08:00
[CATEGORIES]
cs.LG
ADAGE: A generic two-layer framework for adaptive agent based modelling
[AUTHORS]
Benjamin Patrick Evans, Sihan Zeng, Sumitra Ganesh, Leo Ardon
[ABSTRACT]
Agent-based models (ABMs) are valuable for modelling complex, potentially
out-of-equilibria scenarios. However, ABMs have long suffered from the Lucas
critique, stating that agent behaviour should adapt to environmental changes.
Furthermore, the environment itself often adapts to these behavioural changes,
creating a complex bi-level adaptation problem. Recent progress integrating
multi-agent reinforcement learning into ABMs introduces adaptive agent
behaviour, beginning to address the first part of this critique; however, the
approaches are still relatively ad hoc, lacking a general formulation, and
furthermore, do not tackle the second aspect of simultaneously adapting
environmental level characteristics in addition to the agent behaviours. In
this work, we develop a generic two-layer framework for ADaptive AGEnt based
modelling (ADAGE) for addressing these problems. This framework formalises the
bi-level problem as a Stackelberg game with conditional behavioural policies,
providing a consolidated framework for adaptive agent-based modelling based on
solving a coupled set of non-linear equations. We demonstrate how this generic
approach encapsulates several common (previously viewed as distinct) ABM tasks,
such as policy design, calibration, scenario generation, and robust behavioural
learning under one unified framework. We provide example simulations on
multiple complex economic and financial environments, showing the strength of
the novel framework under these canonical settings, addressing long-standing
critiques of traditional ABMs.
[COMMENTS]
Accepted at the 2025 International Conference on Autonomous Agents
and Multiagent Systems (AAMAS)
[LINK]
http://arxiv.org/abs/2501.09429v1
[DATE]
2025-01-16 17:58:24+08:00
[CATEGORIES]
cs.LG
Dynamic Neural Style Transfer for Artistic Image Generation using VGG19
[AUTHORS]
Kapil Kashyap, Mehak Garg, Sean Fargose, Sindhu Nair
[LINK]
http://arxiv.org/abs/2501.09420v1
[DATE]
2025-01-16 17:47:18+08:00
[CATEGORIES]
cs.LG
FASP: Fast and Accurate Structured Pruning of Large Language Models
[AUTHORS]
Hanyu Hu, Pengxiang Zhao, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan
[ABSTRACT]
The rapid increase in the size of large language models (LLMs) has
significantly escalated their computational and memory demands, posing
challenges for efficient deployment, especially on resource-constrained
devices. Structured pruning has emerged as an effective model compression
method that can reduce these demands while preserving performance. In this
paper, we introduce FASP (Fast and Accurate Structured Pruning), a novel
structured pruning framework for LLMs that emphasizes both speed and accuracy.
FASP employs a distinctive pruning structure that interlinks sequential layers,
allowing for the removal of columns in one layer while simultaneously
eliminating corresponding rows in the preceding layer without incurring
additional performance loss. The pruning metric, inspired by Wanda, is
computationally efficient and effectively selects components to prune.
Additionally, we propose a restoration mechanism that enhances model fidelity
by adjusting the remaining weights post-pruning. We evaluate FASP on the OPT
and LLaMA model families, demonstrating superior performance in terms of
perplexity and accuracy on downstream tasks compared to state-of-the-art
methods. Our approach achieves significant speed-ups, pruning models such as
OPT-125M in 17 seconds and LLaMA-30B in 15 minutes on a single NVIDIA RTX 4090
GPU, making it a highly practical solution for optimizing LLMs.
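The coupled pruning structure described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the two-layer block `y = W2 · relu(W1 · x)`, the function names, and the Wanda-style score (weight magnitude times input-activation norm, summed per column) are assumptions made for illustration, and the restoration step is omitted. The sketch only shows why dropping a column of one layer allows the matching row of the preceding layer to be dropped with no further change in output.

```python
import numpy as np

def wanda_style_column_scores(W2, H):
    """Assumed Wanda-style score: |weight| * activation norm, summed per column.

    W2: (out_dim, hidden_dim) weights of the second layer.
    H:  (n_samples, hidden_dim) activations that feed the second layer.
    """
    act_norm = np.linalg.norm(H, axis=0)           # one norm per hidden unit
    return (np.abs(W2) * act_norm).sum(axis=0)     # one score per column of W2

def prune_coupled(W1, W2, X, n_prune):
    """Drop the n_prune lowest-scoring columns of W2 and the matching rows of W1."""
    H = np.maximum(X @ W1.T, 0.0)                  # hidden activations (ReLU)
    scores = wanda_style_column_scores(W2, H)
    keep = np.sort(np.argsort(scores)[n_prune:])   # indices of surviving hidden units
    return W1[keep, :], W2[:, keep], keep

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))     # hypothetical calibration inputs
W1 = rng.normal(size=(16, 8))    # first layer: 8 -> 16
W2 = rng.normal(size=(4, 16))    # second layer: 16 -> 4

W1_small, W2_small, keep = prune_coupled(W1, W2, X, n_prune=6)

# Reference: zero out the pruned columns of W2 but leave W1 untouched.
W2_masked = np.zeros_like(W2)
W2_masked[:, keep] = W2[:, keep]
y_masked = np.maximum(X @ W1.T, 0.0) @ W2_masked.T

# Smaller network: pruned columns of W2 and matching rows of W1 removed.
y_small = np.maximum(X @ W1_small.T, 0.0) @ W2_small.T
```

Here `y_small` matches `y_masked`, so removing the rows of `W1` costs nothing beyond the error already introduced by removing the columns of `W2`.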
[LINK]
http://arxiv.org/abs/2501.09412v1
[DATE]
2025-01-16 17:38:39+08:00
[CATEGORIES]
cs.LG
MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models
[AUTHORS]
Lyudong Jin, Yanning Zhang, Yanhan Li, Shurong Wang, Howard H. Yang, Jian Wu, Meng Zhang
[ABSTRACT]
Large language models (LLMs) have demonstrated remarkable capabilities across
a wide range of natural language processing tasks. Exploiting the heterogeneous
capabilities of edge LLMs is crucial for diverse emerging applications, as it
enables greater cost-effectiveness and reduced latency. In this work, we
introduce Mixture-of-Edge-Experts (MoE$^2$), a novel collaborative
inference framework for edge LLMs. We formulate the joint gating and expert
selection problem to optimize inference performance under energy and latency
constraints. Unlike conventional MoE problems, LLM expert selection is
significantly more challenging due to the combinatorial nature and the
heterogeneity of edge LLMs across various attributes. To this end, we propose a
two-level expert selection mechanism through which we uncover an
optimality-preserving property of gating parameters across expert selections.
This property enables the decomposition of the training and selection
processes, significantly reducing complexity. Furthermore, we leverage the
objective’s monotonicity and design a discrete monotonic optimization algorithm
for optimal expert selection. We implement edge servers with NVIDIA Jetson AGX
Orins and NVIDIA RTX 4090 GPUs, and perform extensive experiments. Our results
validate the performance improvements across various LLMs and show that our
MoE$^2$ method can achieve optimal trade-offs among different delay and energy
budgets and outperform baselines under various system resource constraints.
[COMMENTS]
Submitted to IEEE/ACM Transactions on Networking
[LINK]
http://arxiv.org/abs/2501.09410v1
[DATE]
2025-01-16 17:36:32+08:00
[CATEGORIES]
cs.LG
PISCO: Self-Supervised k-Space Regularization for Improved Neural Implicit k-Space Representations of Dynamic MRI
[AUTHORS]
Veronika Spieker, Hannah Eichhorn, Wenqi Huang, Jonathan K. Stelter, Tabita Catalan, Rickmer F. Braren, Daniel Rueckert, Francisco Sahli Costabal, Kerstin Hammernik, Dimitrios C. Karampinos, Claudia Prieto, Julia A. Schnabel
[ABSTRACT]
Neural implicit k-space representations (NIK) have shown promising results
for dynamic magnetic resonance imaging (MRI) at high temporal resolutions. Yet,
reducing acquisition time, and thereby available training data, results in
severe performance drops due to overfitting. To address this, we introduce a
novel self-supervised k-space loss function $\mathcal{L}_\mathrm{PISCO}$,
applicable for regularization of NIK-based reconstructions. The proposed loss
function is based on the concept of parallel imaging-inspired self-consistency
(PISCO), enforcing a consistent global k-space neighborhood relationship
without requiring additional data. Quantitative and qualitative evaluations on
static and dynamic MR reconstructions show that integrating PISCO significantly
improves NIK representations. Particularly for high acceleration factors
(R$\geq$54), NIK with PISCO achieves superior spatio-temporal reconstruction
quality compared to state-of-the-art methods. Furthermore, an extensive
analysis of the loss assumptions and stability shows PISCO’s potential as a
versatile self-supervised k-space loss function for further applications and
architectures. Code is available at:
https://github.com/compai-lab/2025-pisco-spieker
[LINK]
http://arxiv.org/abs/2501.09403v1
[DATE]
2025-01-16 17:18:59+08:00
[CATEGORIES]
cs.LG
Fast Searching of Extreme Operating Conditions for Relay Protection Setting Calculation Based on Graph Neural Network and Reinforcement Learning
[AUTHORS]
Yan Li, Jingyu Wang, Jiankang Zhang, Huaiqiang Li, Longfei Ren, Yinhong Li, Dongyuan Shi, Xianzhong Duan
[ABSTRACT]
Searching for the Extreme Operating Conditions (EOCs) is one of the core
problems of power system relay protection setting calculation. The current
methods based on brute-force search, heuristic algorithms, and mathematical
programming can hardly meet the requirements of today’s power systems in terms
of computation speed due to the drastic changes in operating conditions induced
by renewables and power electronics. This paper proposes an EOC fast search
method, named Graph Dueling Double Deep Q Network (Graph D3QN), which combines
graph neural networks and deep reinforcement learning to address this challenge.
First, the EOC search problem is modeled as a Markov decision process, where
the information of the underlying power system is extracted using graph neural
networks, so that the EOC of the system can be found via deep reinforcement
learning. Then, a two-stage Guided Learning and Free Exploration (GLFE)
training framework is constructed to accelerate the convergence speed of
reinforcement learning. Finally, the proposed Graph D3QN method is validated
through case studies of searching for the maximum fault current for relay protection
setting calculation on the IEEE 39-bus and 118-bus systems. The experimental
results demonstrate that Graph D3QN can reduce the computation time by a factor
of 10 to 1000 while guaranteeing the accuracy of the selected EOCs.
[COMMENTS]
10 pages, 9 figures
[LINK]
http://arxiv.org/abs/2501.09399v1
[DATE]
2025-01-16 17:11:48+08:00
[CATEGORIES]
cs.LG
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
[AUTHORS]
Abdullah Akgül, Manuel Haußmann, Melih Kandemir
[ABSTRACT]
Current approaches to model-based offline reinforcement learning often
incorporate uncertainty-based reward penalization to address the distributional
shift problem. These approaches, commonly known as pessimistic value iteration,
use Monte Carlo sampling to estimate the Bellman target to perform temporal
difference-based policy evaluation. We find that the randomness caused by
this sampling step significantly delays convergence. We present a theoretical
result demonstrating the strong dependency of suboptimality on the number of
Monte Carlo samples taken per Bellman target calculation. Our main contribution
is a deterministic approximation to the Bellman target that uses progressive
moment matching, a method developed originally for deterministic variational
inference. The resulting algorithm, which we call Moment Matching Offline
Model-Based Policy Optimization (MOMBO), propagates the uncertainty of the next
state through a nonlinear Q-network in a deterministic fashion by approximating
the distributions of hidden layer activations by a normal distribution. We show
that it is possible to provide tighter guarantees for the suboptimality of
MOMBO than the existing Monte Carlo sampling approaches. We also observe MOMBO
to converge faster than these approaches in a large set of benchmark tasks.
[LINK]
http://arxiv.org/abs/2406.04088v3
[DATE]
2025-01-16 17:07:51+08:00
[CATEGORIES]
cs.LG
Disentangled Interleaving Variational Encoding
[AUTHORS]
Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan
[ABSTRACT]
Conflicting objectives present a considerable challenge in interleaving
multi-task learning, necessitating meticulous design and balance
to ensure effective learning of a representative latent data space across all
tasks without mutual negative impact. Drawing inspiration from the concept of
marginal and conditional probability distributions in probability theory, we
design a principled and well-founded approach to disentangle the original input
into marginal and conditional probability distributions in the latent space of
a variational autoencoder. Our proposed model, Deep Disentangled Interleaving
Variational Encoding (DeepDIVE), learns disentangled features from the original
input to form clusters in the embedding space and unifies these features via
the cross-attention mechanism in the fusion stage. We theoretically prove that
combining the objectives for reconstruction and forecasting fully captures the
lower bound and mathematically derive a loss function for disentanglement using
Naïve Bayes. Under the assumption that the prior is a mixture of log-concave
distributions, we also establish that the Kullback-Leibler divergence between
the prior and the posterior is upper bounded by a function minimized by the
minimizer of the cross entropy loss, informing our adoption of radial basis
functions (RBF) and cross entropy with interleaving training for DeepDIVE to
provide a justified basis for convergence. Experiments on two public datasets
show that DeepDIVE disentangles the original input and yields forecast
accuracies better than the original VAE and comparable to existing
state-of-the-art baselines.
[LINK]
http://arxiv.org/abs/2501.08710v2
[DATE]
2025-01-16 17:07:00+08:00
[CATEGORIES]
cs.LG
Quantum-Enhanced Transformers for Robust Acoustic Scene Classification in IoT Environments
[AUTHORS]
Minh K. Quan, Mayuri Wijayasundara, Sujeeva Setunge, Pubudu N. Pathirana
[ABSTRACT]
The proliferation of Internet of Things (IoT) devices equipped with acoustic
sensors necessitates robust acoustic scene classification (ASC) capabilities,
even in noisy and data-limited environments. Traditional machine learning
methods often struggle to generalize effectively under such conditions. To
address this, we introduce Q-ASC, a novel Quantum-Inspired Acoustic Scene
Classifier that leverages the power of quantum-inspired transformers. By
integrating quantum concepts like superposition and entanglement, Q-ASC
achieves superior feature learning and enhanced noise resilience compared to
classical models. Furthermore, we introduce a Quantum Variational Autoencoder
(QVAE) based data augmentation technique to mitigate the challenge of limited
labeled data in IoT deployments. Extensive evaluations on the Tampere
University of Technology (TUT) Acoustic Scenes 2016 benchmark dataset
demonstrate that Q-ASC achieves remarkable accuracy between 68.3% and 88.5%
under challenging conditions, outperforming state-of-the-art methods by over 5%
in the best case. This research paves the way for deploying intelligent
acoustic sensing in IoT networks, with potential applications in smart homes,
industrial monitoring, and environmental surveillance, even in adverse acoustic
environments.
[COMMENTS]
5 pages, 4 figures
[LINK]
http://arxiv.org/abs/2501.09394v1
[DATE]
2025-01-16 17:06:10+08:00
[CATEGORIES]
cs.LG
PeFLL: Personalized Federated Learning by Learning to Learn
[AUTHORS]
Jonathan Scott, Hossein Zakerinia, Christoph H. Lampert
[ABSTRACT]
We present PeFLL, a new personalized federated learning algorithm that
improves over the state-of-the-art in three aspects: 1) it produces more
accurate models, especially in the low-data regime, and not only for clients
present during its training phase, but also for any that may emerge in the
future; 2) it reduces the amount of on-client computation and client-server
communication by providing future clients with ready-to-use personalized models
that require no additional finetuning or optimization; 3) it comes with
theoretical guarantees that establish generalization from the observed clients
to future ones. At the core of PeFLL lies a learning-to-learn approach that
jointly trains an embedding network and a hypernetwork. The embedding network
is used to represent clients in a latent descriptor space in a way that
reflects their similarity to each other. The hypernetwork takes as input such
descriptors and outputs the parameters of fully personalized client models. In
combination, both networks constitute a learning algorithm that achieves
state-of-the-art performance in several personalized federated learning
benchmarks.
[LINK]
http://arxiv.org/abs/2306.05515v4
[DATE]
2025-01-16 16:53:23+08:00
[CATEGORIES]
cs.LG
Simplified and Generalized Masked Diffusion for Discrete Data
[AUTHORS]
Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias
[ABSTRACT]
Masked (or absorbing) diffusion is actively explored as an alternative to
autoregressive models for generative modeling of discrete data. However,
existing work in this area has been hindered by unnecessarily complex model
formulations and unclear relationships between different perspectives, leading
to suboptimal parameterization, training objectives, and ad hoc adjustments to
counteract these issues. In this work, we aim to provide a simple and general
framework that unlocks the full potential of masked diffusion models. We show
that the continuous-time variational objective of masked diffusion models is a
simple weighted integral of cross-entropy losses. Our framework also enables
training generalized masked diffusion models with state-dependent masking
schedules. When evaluated by perplexity, our models trained on OpenWebText
surpass prior diffusion language models at GPT-2 scale and demonstrate superior
performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our
models vastly outperform previous discrete diffusion models on pixel-level
image modeling, achieving 2.75 (CIFAR-10) and 3.40 (ImageNet 64x64) bits per
dimension that are better than autoregressive models of similar sizes. Our code
is available at https://github.com/google-deepmind/md4.
[COMMENTS]
NeurIPS 2024. Code is available at:
https://github.com/google-deepmind/md4
[LINK]
http://arxiv.org/abs/2406.04329v4
[DATE]
2025-01-16 16:46:16+08:00
[CATEGORIES]
cs.LG
Hidden Markov Neural Networks
[AUTHORS]
Lorenzo Rimella, Nick Whiteley
[ABSTRACT]
We define an evolving-in-time Bayesian neural network called a Hidden Markov
Neural Network, which addresses a crucial challenge in time-series
forecasting and continual learning: striking a balance between adapting to new
data and appropriately forgetting outdated information. This is achieved by
modelling the weights of a neural network as the hidden states of a Hidden
Markov model, with the observed process defined by the available data. A
filtering algorithm is employed to learn a variational approximation of the
evolving-in-time posterior distribution over the weights. By leveraging a
sequential variant of Bayes by Backprop, enriched with a stronger
regularization technique called variational DropConnect, Hidden Markov Neural
Networks achieve robust regularization and scalable inference. Experiments on
MNIST, dynamic classification tasks, and next-frame forecasting in videos
demonstrate that Hidden Markov Neural Networks provide strong predictive
performance while enabling effective uncertainty quantification.
[LINK]
http://arxiv.org/abs/2004.06963v3
[DATE]
2025-01-16 16:32:50+08:00
[CATEGORIES]
cs.LG
Learning to Assist Humans without Inferring Rewards
[AUTHORS]
Vivek Myers, Evan Ellis, Sergey Levine, Benjamin Eysenbach, Anca Dragan
[COMMENTS]
Conference on Neural Information Processing Systems (NeurIPS), 2024
[LINK]
http://arxiv.org/abs/2411.02623v3
[DATE]
2025-01-16 16:18:01+08:00
[CATEGORIES]
cs.LG
PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning
[AUTHORS]
Xianghu Yue, Yiming Chen, Xueyi Zhang, Xiaoxue Gao, Mengling Feng, Mingrui Lao, Huiping Zhuang, Haizhou Li
[ABSTRACT]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal
data, such as audio-visual and image-text pairs, thereby enabling models to
learn continuously across a sequence of tasks while mitigating forgetting.
While existing studies primarily focus on the integration and utilization of
multi-modal information for MMCIL, a critical challenge remains: the issue of
missing modalities during incremental learning phases. This oversight can
exacerbate severe forgetting and significantly impair model performance. To
bridge this gap, we propose PAL, a novel exemplar-free framework tailored to
MMCIL under missing-modality scenarios. Concretely, we devise modality-specific
prompts to compensate for missing information, helping the model
maintain a holistic representation of the data. On this foundation, we
reformulate the MMCIL problem into a Recursive Least-Squares task, delivering
an analytical linear solution. Building upon these, PAL not only alleviates the
inherent under-fitting limitation in analytic learning but also preserves the
holistic representation of missing-modality data, achieving superior
performance with less forgetting across various multi-modal incremental
scenarios. Extensive experiments demonstrate that PAL significantly outperforms
competitive methods across various datasets, including UPMC-Food101 and
N24News, showcasing its robustness towards modality absence and its
anti-forgetting ability to maintain high incremental accuracy.
[LINK]
http://arxiv.org/abs/2501.09352v1
[DATE]
2025-01-16 16:04:04+08:00
[CATEGORIES]
cs.LG
Rational Tuning of LLM Cascades via Probabilistic Modeling
[AUTHORS]
Michael J. Zellinger, Matt Thomson
[ABSTRACT]
Understanding the reliability of large language models (LLMs) has recently
garnered significant attention. Given LLMs’ propensity to hallucinate, as well
as their high sensitivity to prompt design, it is already challenging to
predict the performance of an individual LLM. However, the problem becomes more
complex for compound LLM systems such as cascades, where in addition to each
model’s standalone performance, we must understand how the error rates of
different models interact. In this paper, we present a probabilistic model for
the joint performance distribution of a sequence of LLMs, which enables a
framework for rationally tuning the confidence thresholds of an LLM cascade
using continuous optimization. Compared to selecting confidence thresholds
using grid search, our parametric Markov-copula model significantly improves
runtime scaling with respect to the length of the cascade and the desired
resolution of the cost-error curve, turning them from intractable into
low-order polynomial. In addition, the optimal thresholds computed using our
continuous optimization-based algorithm increasingly outperform those found via
grid search as cascade length grows, improving the area under the cost-error
curve by 1.9% on average for cascades consisting of at least three models.
Overall, our Markov-copula model provides a rational basis for tuning LLM
cascade performance and points to the potential of probabilistic methods in
analyzing LLM systems.
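For readers unfamiliar with the setup, the following Python sketch shows how confidence thresholds act in a cascade: each model answers only if its confidence clears its threshold, otherwise the query is deferred to the next model. The model stubs, names, and threshold values are hypothetical, and the sketch does not include the paper's Markov-copula model or its continuous optimization; it only shows the quantity being tuned.

```python
from typing import Callable, List, Tuple

# A model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]

def run_cascade(models: List[Model], thresholds: List[float], query: str) -> Tuple[str, int]:
    """Return (answer, index of the model that answered).

    thresholds[i] is the confidence model i needs in order to answer;
    the last model has no threshold and always answers.
    """
    if len(thresholds) != len(models) - 1:
        raise ValueError("need one threshold per model except the last")
    for i, model in enumerate(models):
        answer, confidence = model(query)
        is_last = i == len(models) - 1
        if is_last or confidence >= thresholds[i]:
            return answer, i
    raise ValueError("cascade must contain at least one model")

# Hypothetical stand-ins for a small, medium, and large LLM.
def small(query: str) -> Tuple[str, float]:
    return "small-answer", 0.4

def medium(query: str) -> Tuple[str, float]:
    return "medium-answer", 0.8

def large(query: str) -> Tuple[str, float]:
    return "large-answer", 0.9

cascade = [small, medium, large]
result_default = run_cascade(cascade, [0.6, 0.7], "example query")
result_lenient = run_cascade(cascade, [0.3, 0.7], "example query")
result_strict = run_cascade(cascade, [0.9, 0.9], "example query")
```

Raising a threshold sends more queries to later, costlier models, while lowering it accepts more answers from earlier ones. That cost-error trade-off is what threshold tuning navigates.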
[LINK]
http://arxiv.org/abs/2501.09345v1
[DATE]
2025-01-16 15:58:33+08:00
[CATEGORIES]
cs.LG
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements
[AUTHORS]
Xueyan Li, Xinyan Chen, Yazhe Niu, Shuai Hu, Yu Liu
[ABSTRACT]
In the field of psychology, traditional assessment methods, such as
standardized scales, are frequently critiqued for their static nature, lack of
personalization, and reduced participant engagement, while comprehensive
counseling evaluations are often inaccessible. The complexity of quantifying
psychological traits further limits these methods. Despite advances with large
language models (LLMs), many still depend on single-round Question-and-Answer
interactions. To bridge this gap, we introduce PsyDI, a personalized and
progressively in-depth chatbot designed for psychological measurements,
exemplified by its application in the Myers-Briggs Type Indicator (MBTI)
framework. PsyDI leverages user-related multi-modal information and engages in
customized, multi-turn interactions to provide personalized, easily accessible
measurements, while ensuring precise MBTI type determination. To address the
challenge of unquantifiable psychological traits, we introduce a novel training
paradigm that involves learning the ranking of proxy variables associated with
these traits, culminating in a robust score model for MBTI measurements. The
score model enables PsyDI to conduct comprehensive and precise measurements
through multi-turn interactions within a unified estimation context. Through
various experiments, we validate the efficacy of both the score model and the
PsyDI pipeline, demonstrating its potential to serve as a general framework for
psychological measurements. Furthermore, the online deployment of PsyDI has
garnered substantial user engagement, with over 3,000 visits, resulting in the
collection of numerous multi-turn dialogues annotated with MBTI types, which
facilitates further research. The source code for the training and web service
components is publicly available as a part of OpenDILab at:
https://github.com/opendilab/PsyDI
[COMMENTS]
29 pages, 15 figures
[LINK]
http://arxiv.org/abs/2408.03337v4
[DATE]
2025-01-16 15:40:27+08:00
[CATEGORIES]
cs.LG
Mitigating Overfitting in Graph Neural Networks via Feature and Hyperplane Perturbation
[AUTHORS]
Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim
[ABSTRACT]
Graph neural networks (GNNs) are commonly used in semi-supervised settings.
Previous research has primarily focused on finding appropriate graph filters
(e.g. aggregation methods) to perform well on both homophilic and heterophilic
graphs. While these methods are effective, they can still suffer from the
sparsity of node features, where the initial data contain few non-zero
elements. This can lead to overfitting in certain dimensions in the first
projection matrix, as training samples may not cover the entire range of graph
filters (hyperplanes). To address this, we propose a novel data augmentation
strategy. Specifically, by flipping both the initial features and hyperplane,
we create additional space for training, which leads to more precise updates of
the learnable parameters and improved robustness for unseen features during
inference. To the best of our knowledge, this is the first attempt to mitigate
the overfitting caused by the initial features. Extensive experiments on
real-world datasets show that our proposed technique increases node
classification accuracy by up to 46.5% in relative terms.
[LINK]
http://arxiv.org/abs/2211.15081v8
[DATE]
2025-01-16 15:34:31+08:00
[CATEGORIES]
cs.LG
Estimating shared subspace with AJIVE: the power and limitation of multiple data matrices
[AUTHORS]
Yuepeng Yang, Cong Ma
[ABSTRACT]
Integrative data analysis often requires disentangling joint and individual
variations across multiple datasets, a challenge commonly addressed by the
Joint and Individual Variation Explained (JIVE) model. While numerous methods
have been developed to estimate the shared subspace under JIVE, the theoretical
understanding of their performance remains limited, particularly in the context
of multiple matrices and varying levels of subspace misalignment. This paper
bridges this gap by providing a systematic analysis of shared subspace
estimation in multi-matrix settings.
We focus on the Angle-based Joint and Individual Variation Explained (AJIVE)
method, a two-stage spectral approach, and establish new performance guarantees
that uncover its strengths and limitations. Specifically, we show that in high
signal-to-noise ratio (SNR) regimes, AJIVE’s estimation error decreases with
the number of matrices, demonstrating the power of multi-matrix integration.
Conversely, in low-SNR settings, AJIVE exhibits a non-diminishing error,
highlighting fundamental limitations. To complement these results, we derive
minimax lower bounds, showing that AJIVE achieves optimal rates in high-SNR
regimes. Furthermore, we analyze an oracle-aided spectral estimator to
demonstrate that the non-diminishing error in low-SNR scenarios is a
fundamental barrier. Extensive numerical experiments corroborate our
theoretical findings, providing insights into the interplay between SNR, matrix
count, and subspace misalignment.
[LINK]
http://arxiv.org/abs/2501.09336v1
[DATE]
2025-01-16 15:23:26+08:00
[CATEGORIES]
cs.LG
Identifying Information from Observations with Uncertainty and Novelty
[AUTHORS]
Derek S. Prijatelj, Timothy J. Ireland, Walter J. Scheirer
[ABSTRACT]
A machine learning task from observations must encounter and process
uncertainty and novelty, especially when it is expected to maintain performance
when observing new information and to choose the best fitting hypothesis to the
currently observed information. In this context, some key questions arise: what
is information, how much information did the observations provide, how much
information is required to identify the data-generating process, how many
observations remain to get that information, and how does a predictor determine
that it has observed novel information? This paper strengthens existing answers
to these questions by formalizing the notion of “identifiable information” that
arises from the language used to express the relationship between distinct
states. Model identifiability and sample complexity are defined via computation
of an indicator function over a set of hypotheses. Their properties and
asymptotic statistics are described for data-generating processes ranging from
deterministic processes to ergodic stationary stochastic processes. This
connects the notion of identifying information in finite steps with asymptotic
statistics and PAC-learning. The indicator function’s computation naturally
formalizes novel information and its identification from observations with
respect to a hypothesis set. We also prove that the sample complexity
distribution of computable PAC-Bayes learners is determined by its moments in
terms of the prior probability distribution over a fixed finite hypothesis set.
[COMMENTS]
43 pages, 1 figure, 1 table, and 2 inline algorithms. Submitted to
JMLR Jan. 6, 2025
[LINK]
http://arxiv.org/abs/2501.09331v1
[DATE]
2025-01-16 15:02:05+08:00
[CATEGORIES]
cs.LG
Enhanced SPS Velocity-adaptive Scheme: Access Fairness in 5G NR V2I Networks
[AUTHORS]
Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang
[ABSTRACT]
Vehicle-to-Infrastructure (V2I) technology enables information exchange
between vehicles and road infrastructure. Specifically, when a vehicle
approaches a roadside unit (RSU), it can exchange information with the RSU to
obtain accurate data that assists in driving. With the release of the 3rd
Generation Partnership Project (3GPP) Release 16, which includes the 5G New
Radio (NR) Vehicle-to-Everything (V2X) standards, vehicles typically adopt
mode-2 communication using sensing-based semi-persistent scheduling (SPS) for
resource allocation. In this approach, vehicles identify candidate resources
within a selection window and exclude ineligible resources based on information
from a sensing window. However, vehicles often drive at different speeds,
resulting in varying amounts of data transmission with RSUs as they pass by,
which leads to unfair access. Therefore, it is essential to design an access
scheme that accounts for different vehicle speeds to achieve fair access across
the network. This paper formulates an optimization problem for vehicular
networks and proposes a multi-objective optimization scheme to address it by
adjusting the selection window in the SPS mechanism of 5G NR V2I mode-2.
Simulation results demonstrate the effectiveness of the proposed scheme.
[COMMENTS]
This paper has been submitted to an IEEE journal. The source code has
been released at:
https://github.com/qiongwu86/Enhanced-SPS-Velocity-adaptiveScheme-Access-Fariness-in-5G-NR-V2I-Networks
[LINK]
http://arxiv.org/abs/2501.08037v2
[DATE]
2025-01-16 14:44:29+08:00
[CATEGORIES]
cs.LG
Cooperative Decentralized Backdoor Attacks on Vertical Federated Learning
[AUTHORS]
Seohyun Lee, Wenzhi Fang, Anindya Bijoy Das, Seyyedali Hosseinalipour, David J. Love, Christopher G. Brinton
[ABSTRACT]
Federated learning (FL) is vulnerable to backdoor attacks, where adversaries
alter model behavior on target classification labels by embedding triggers into
data samples. While these attacks have received considerable attention in
horizontal FL, they are less understood for vertical FL (VFL), where devices
hold different features of the samples, and only the server holds the labels.
In this work, we propose a novel backdoor attack on VFL which (i) does not rely
on gradient information from the server and (ii) considers potential collusion
among multiple adversaries for sample selection and trigger embedding. Our
label inference model augments variational autoencoders with metric learning,
which adversaries can train locally. A consensus process over the adversary
graph topology determines which datapoints to poison. We further propose
methods for trigger splitting across the adversaries, with an intensity-based
implantation scheme skewing the server towards the trigger. Our convergence
analysis reveals the impact of backdoor perturbations on VFL indicated by a
stationarity gap for the trained model, which we verify empirically as well. We
conduct experiments comparing our attack with recent backdoor VFL approaches,
finding that ours obtains significantly higher success rates for the same main
task performance despite not using server information. Additionally, our
results verify the impact of collusion on attack performance.
[COMMENTS]
This paper is currently under review in the IEEE/ACM Transactions on
Networking Special Issue on AI and Networking
[LINK]
http://arxiv.org/abs/2501.09320v1
[DATE]
2025-01-16 14:22:35+08:00
[CATEGORIES]
cs.LG
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
[AUTHORS]
Divyansh Srivastava, Ge Yan, Tsui-Wei Weng
[COMMENTS]
Appeared at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2408.01432v3
[DATE]
2025-01-16 13:42:28+08:00
[CATEGORIES]
cs.LG
Cost-aware Bayesian Optimization via the Pandora’s Box Gittins Index
[AUTHORS]
Qian Xie, Raul Astudillo, Peter I. Frazier, Ziv Scully, Alexander Terenin
[ABSTRACT]
Bayesian optimization is a technique for efficiently optimizing unknown
functions in a black-box manner. To handle practical settings where gathering
data requires use of finite resources, it is desirable to explicitly
incorporate function evaluation costs into Bayesian optimization policies. To
understand how to do so, we develop a previously unexplored connection between
cost-aware Bayesian optimization and the Pandora’s Box problem, a decision
problem from economics. The Pandora’s Box problem admits a Bayesian-optimal
solution based on an expression called the Gittins index, which can be
reinterpreted as an acquisition function. We study the use of this acquisition
function for cost-aware Bayesian optimization, and demonstrate empirically that
it performs well, particularly in medium-high dimensions. We further show that
this performance carries over to classical Bayesian optimization without
explicit evaluation costs. Our work constitutes a first step towards
integrating techniques from Gittins index theory into Bayesian optimization.
[LINK]
http://arxiv.org/abs/2406.20062v3
[DATE]
2025-01-16 13:21:05+08:00
[CATEGORIES]
cs.LG
Physics-informed deep learning for infectious disease forecasting
[AUTHORS]
Ying Qian, Éric Marty, Avranil Basu, Eamon B. O’Dea, Xianqiao Wang, Spencer Fox, Pejman Rohani, John M. Drake, He Li
[ABSTRACT]
Accurate forecasting of contagious illnesses has become increasingly
important to public health policymaking, and better prediction could prevent
the loss of millions of lives. To better prepare for future pandemics, it is
essential to improve forecasting methods and capabilities. In this work, we
propose a new infectious disease forecasting model based on physics-informed
neural networks (PINNs), an emerging area of scientific machine learning. The
proposed PINN model incorporates dynamical systems representations of disease
transmission into the loss function, thereby assimilating epidemiological
theory and data using neural networks (NNs). Our approach is designed to
prevent model overfitting, which often occurs when training deep learning
models with observation data alone. In addition, we employ a
sub-network to account for mobility, vaccination, and other covariates that
influence the transmission rate, a key parameter in the compartment model. To
demonstrate the capability of the proposed model, we examine the performance of
the model using state-level COVID-19 data in California. Our simulation results
show that predictions of the PINN model on the number of cases, deaths, and
hospitalizations are consistent with existing benchmarks. In particular, the
PINN model outperforms the basic NN model and naive baseline forecast. We also
show that the performance of the PINN model is comparable to a sophisticated
Gaussian infection state space with time dependence (GISST) forecasting model
that integrates the compartment model with a data observation model and a
regression model for inferring parameters in the compartment model.
Nonetheless, the PINN model offers a simpler structure and is easier to
implement. Our results show that the proposed forecaster could potentially
serve as a new computational tool to enhance the current capacity of infectious
disease forecasting.
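
To make the idea of placing a compartment model inside the loss function concrete, here is a minimal sketch that assumes a plain SIR model and finite-difference derivatives. In an actual PINN the trajectory would be the network output and the derivatives would come from automatic differentiation; the compartment model, the weighting, and all names below are assumptions for illustration, not the authors' implementation.

```python
def sir_rhs(s, i, r, beta, gamma):
    """Right-hand side of the SIR model, with states as population fractions."""
    new_infections = beta * s * i
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def euler_sir(s, i, r, beta, gamma, dt, steps):
    """Forward-Euler SIR trajectory, used here only to create toy data."""
    trajectory = [(s, i, r)]
    for _ in range(steps):
        ds, di, dr = sir_rhs(s, i, r, beta, gamma)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((s, i, r))
    return trajectory


def physics_residual(trajectory, dt, beta, gamma):
    """Mean squared gap between finite-difference derivatives and the SIR dynamics."""
    total = 0.0
    for (s0, i0, r0), (s1, i1, r1) in zip(trajectory, trajectory[1:]):
        ds, di, dr = sir_rhs(s0, i0, r0, beta, gamma)
        total += ((s1 - s0) / dt - ds) ** 2
        total += ((i1 - i0) / dt - di) ** 2
        total += ((r1 - r0) / dt - dr) ** 2
    return total / (len(trajectory) - 1)


def data_loss(trajectory, observed_infected):
    """Mean squared error between predicted and observed infected fractions."""
    errors = [(state[1] - obs) ** 2 for state, obs in zip(trajectory, observed_infected)]
    return sum(errors) / len(errors)


def pinn_loss(trajectory, observed_infected, dt, beta, gamma, physics_weight=1.0):
    """Data misfit plus a weighted penalty for violating the compartment model."""
    return data_loss(trajectory, observed_infected) + physics_weight * physics_residual(
        trajectory, dt, beta, gamma
    )
```

A trajectory that fits the observations but contradicts the dynamics is penalized through the second term, which is the mechanism the abstract credits with limiting overfitting.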
[LINK]
http://arxiv.org/abs/2501.09298v1
[DATE]
2025-01-16 13:07:05+08:00
[CATEGORIES]
cs.LG
The surprising efficiency of temporal difference learning for rare event prediction
[AUTHORS]
Xiaoou Cheng, Jonathan Weare
[ABSTRACT]
We quantify the efficiency of temporal difference (TD) learning over the
direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement
learning, with an emphasis on estimation of quantities related to rare events.
Policy evaluation is complicated in the rare event setting by the long
timescale of the event and by the need for relative accuracy in
estimates of very small values. Specifically, we focus on least-squares TD
(LSTD) prediction for finite state Markov chains, and show that LSTD can
achieve relative accuracy far more efficiently than MC. We prove a central
limit theorem for the LSTD estimator and upper bound the relative
asymptotic variance by simple quantities characterizing the connectivity of
states relative to the transition probabilities between them. Using this bound,
we show that, even when both the timescale of the rare event and the relative
accuracy of the MC estimator are exponentially large in the number of states,
LSTD maintains a fixed level of relative accuracy with a total number of
observed transitions of the Markov chain that is only polynomially large
in the number of states.
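
The following sketch shows one way to see why TD-style estimation can use data more efficiently than Monte Carlo for rare events. It assumes one-hot (tabular) features, in which case the LSTD solution coincides with solving the hitting-probability equations under the empirical transition matrix; here that linear system is solved by repeated sweeps rather than a matrix inverse. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def lstd_hitting_probability(transitions, target, avoid, sweeps=1000):
    """Estimate P(reach `target` before `avoid`) for every state seen in the data.

    `transitions` is a list of observed (state, next_state) pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for state, next_state in transitions:
        counts[state][next_state] += 1

    states = set(counts) | {t for row in counts.values() for t in row}
    u = {s: (1.0 if s in target else 0.0) for s in states}

    # Gauss-Seidel sweeps on u(s) = sum_t P_hat(s, t) * u(t), boundary values fixed.
    for _ in range(sweeps):
        for s in states:
            if s in target or s in avoid or s not in counts:
                continue
            n = sum(counts[s].values())
            u[s] = sum(c * u[t] for t, c in counts[s].items()) / n
    return u
```

For example, with the four observed transitions `(1, 0)`, `(1, 2)`, `(2, 1)`, `(2, 3)`, target `{3}` and avoided set `{0}`, the estimate at state 1 is 1/3 even though no single recorded path goes from state 1 all the way to state 3. The estimator stitches separate transitions together, whereas a Monte Carlo estimate needs complete successful trajectories.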
[COMMENTS]
Final camera-ready version published at NeurIPS 2024. Correct an
assumption statement and typos, and change/add a few sentences from the last
version
[LINK]
http://arxiv.org/abs/2405.17638v3
[DATE]
2025-01-16 12:11:29+08:00
[CATEGORIES]
cs.LG
Model-Based Transfer Learning for Contextual Reinforcement Learning
[AUTHORS]
Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu
[ABSTRACT]
Deep reinforcement learning (RL) is a powerful approach to complex decision
making. However, one issue that limits its practical application is its
brittleness, sometimes failing to train in the presence of small changes in the
environment. Motivated by the success of zero-shot transfer, where pre-trained
models perform well on related tasks, we consider the problem of selecting a
good set of training tasks to maximize generalization performance across a
range of tasks. Given the high cost of training, it is critical to select
training tasks strategically, but it is not well understood how to do so. We hence
introduce Model-Based Transfer Learning (MBTL), which layers on top of existing
RL methods to effectively solve contextual RL problems. MBTL models the
generalization performance in two parts: 1) the performance set point, modeled
using Gaussian processes, and 2) performance loss (generalization gap), modeled
as a linear function of contextual similarity. MBTL combines these two pieces
of information within a Bayesian optimization (BO) framework to strategically
select training tasks. We show theoretically that the method exhibits sublinear
regret in the number of training tasks and discuss conditions to further
tighten regret bounds. We experimentally validate our methods using urban
traffic and standard continuous control benchmarks. The experimental results
suggest that MBTL can achieve up to 43x improved sample efficiency compared
with canonical independent training and multi-task training. Further
experiments demonstrate the efficacy of BO and the insensitivity to the
underlying RL algorithm and hyperparameters. This work lays the foundations for
investigating explicit modeling of generalization, thereby enabling principled
yet effective methods for contextual RL.
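
A simplified sketch of the two-part performance model described above: performance on a target context is taken to be the training set point minus a penalty that grows linearly with the distance between contexts, and the next training task is the one with the largest estimated marginal gain. The scalar contexts, the fixed set-point function (a Gaussian process in the abstract), the zero baseline before any training, and the greedy rule are all simplifying assumptions made here, not the authors' algorithm.

```python
def generalization(train_ctx, target_ctx, set_point, slope):
    """Estimated performance on target_ctx of a policy trained on train_ctx."""
    return set_point(train_ctx) - slope * abs(train_ctx - target_ctx)


def select_next_task(candidates, targets, chosen, set_point, slope):
    """Pick the candidate training context with the largest estimated marginal gain."""

    def best_so_far(target):
        # Best estimated performance on `target` from already-trained contexts
        # (assumed to be 0.0 when nothing has been trained yet).
        return max(
            (generalization(c, target, set_point, slope) for c in chosen),
            default=0.0,
        )

    def marginal_gain(candidate):
        return sum(
            max(generalization(candidate, t, set_point, slope) - best_so_far(t), 0.0)
            for t in targets
        )

    remaining = [c for c in candidates if c not in chosen]
    return max(remaining, key=marginal_gain)
```

With contexts 0 through 10, a constant set point of 1.0, and slope 0.1, the first selected task is the central context 5, since it minimizes the total distance to all targets.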
[COMMENTS]
38th Conference on Neural Information Processing Systems (NeurIPS
2024)
[LINK]
http://arxiv.org/abs/2408.04498v3
[DATE]
2025-01-16 11:35:07+08:00
[CATEGORIES]
cs.LG
Statistical Efficiency of Distributional Temporal Difference Learning and Freedman’s Inequality in Hilbert Spaces
[AUTHORS]
Yang Peng, Liangyu Zhang, Zhihua Zhang
[ABSTRACT]
Distributional reinforcement learning (DRL) has achieved empirical success in
various domains. One core task in DRL is distributional policy evaluation,
which involves estimating the return distribution $\eta^\pi$ for a given policy
$\pi$. Distributional temporal difference learning has been accordingly
proposed, which extends the classic temporal difference learning (TD) in RL. In
this paper, we focus on the non-asymptotic statistical rates of distributional
TD. To facilitate theoretical analysis, we propose non-parametric
distributional TD (NTD). For a $\gamma$-discounted infinite-horizon tabular
Markov decision process, we show that for NTD with a generative model, we need
$\tilde{O}(\varepsilon^{-2}\mu_{\min}^{-1}(1-\gamma)^{-3})$ interactions with
the environment to achieve an $\varepsilon$-optimal estimator with high
probability, when the estimation error is measured by the $1$-Wasserstein distance. This
sample complexity bound is minimax optimal up to logarithmic factors. In
addition, we revisit categorical distributional TD (CTD), showing that the same
non-asymptotic convergence bounds hold for CTD in the case of the
$1$-Wasserstein distance. We also extend our analysis to the more general
setting where the data generating process is Markovian. In the Markovian
setting, we propose variance-reduced variants of NTD and CTD, and show that
both can achieve a $\tilde{O}(\varepsilon^{-2}
\mu_{\pi,\min}^{-1}(1-\gamma)^{-3}+t_{mix}\mu_{\pi,\min}^{-1}(1-\gamma)^{-1})$
sample complexity bound in the case of the $1$-Wasserstein distance, which
matches the state-of-the-art statistical results for classic policy evaluation.
To achieve the sharp statistical rates, we establish a novel Freedman’s
inequality in Hilbert spaces. This new Freedman’s inequality would be of
independent interest for statistical analysis of various infinite-dimensional
online learning problems.
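
For readers unfamiliar with the error measure used above, the sketch below computes the $1$-Wasserstein distance between two categorical return distributions that share the same sorted support, using the standard identity that in one dimension it equals the integral of the absolute difference of the two CDFs. The function name and the toy distributions are illustrative assumptions.

```python
def wasserstein1(p, q, atoms):
    """1-Wasserstein distance between categorical distributions p and q.

    `atoms` is the shared, increasingly sorted support; p and q are the
    probability vectors over those atoms.
    """
    cdf_gap = 0.0
    total = 0.0
    for k in range(len(atoms) - 1):
        cdf_gap += p[k] - q[k]  # difference of the two CDFs on [atoms[k], atoms[k+1])
        total += abs(cdf_gap) * (atoms[k + 1] - atoms[k])
    return total
```

For instance, shifting all mass one atom to the right on the grid `[0, 1, 2]`, from `[0.5, 0.5, 0.0]` to `[0.0, 0.5, 0.5]`, gives a distance of 1.0.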
[LINK]
http://arxiv.org/abs/2403.05811v4
[DATE]
2025-01-16 11:31:46+08:00
[CATEGORIES]
cs.LG
On the convergence of noisy Bayesian Optimization with Expected Improvement
[AUTHORS]
Jingyi Wang, Haowei Wang, Cosmin G. Petra, Nai-Yuan Chiang
[ABSTRACT]
Expected improvement (EI) is one of the most widely-used acquisition
functions in Bayesian optimization (BO). Despite its proven success in
applications for decades, important open questions remain on the theoretical
convergence behaviors and rates for EI. In this paper, we contribute to the
convergence theories of EI in three novel and critical areas. First, we consider
objective functions that are under the Gaussian process (GP) prior assumption,
whereas existing works mostly focus on functions in the reproducing kernel
Hilbert space (RKHS). Second, we establish the first asymptotic error bound and
its corresponding rate for GP-EI with noisy observations under the GP prior
assumption. Third, by investigating the exploration and exploitation of the
non-convex EI function, we prove improved error bounds for both the noise-free
and noisy cases. The improved noiseless bound is extended to the RKHS
assumption as well.
[LINK]
http://arxiv.org/abs/2501.09262v1
[DATE]
2025-01-16 11:11:50+08:00
[CATEGORIES]
cs.LG
Smoothness Really Matters: A Simple Yet Effective Approach for Unsupervised Graph Domain Adaptation
[AUTHORS]
Wei Chen, Guo Ye, Yakun Wang, Zhao Zhang, Libang Zhang, Daixin Wang, Zhiqiang Zhang, Fuzhen Zhuang
[ABSTRACT]
Unsupervised Graph Domain Adaptation (UGDA) seeks to bridge distribution
shifts between domains by transferring knowledge from labeled source graphs to
given unlabeled target graphs. Existing UGDA methods primarily focus on
aligning features in the latent space learned by graph neural networks (GNNs)
across domains, often overlooking structural shifts, resulting in limited
effectiveness when addressing structurally complex transfer scenarios. Given
the sensitivity of GNNs to local structural features, even slight discrepancies
between source and target graphs could lead to significant shifts in node
embeddings, thereby reducing the effectiveness of knowledge transfer. To
address this issue, we introduce a novel approach for UGDA called Target-Domain
Structural Smoothing (TDSS). TDSS is a simple and effective method designed to
perform structural smoothing directly on the target graph, thereby mitigating
structural distribution shifts and ensuring the consistency of node
representations. Specifically, by integrating smoothing techniques with
neighborhood sampling, TDSS maintains the structural coherence of the target
graph while mitigating the risk of over-smoothing. Our theoretical analysis
shows that TDSS effectively reduces target risk by improving model smoothness.
Empirical results on three real-world datasets demonstrate that TDSS
outperforms recent state-of-the-art baselines, achieving significant
improvements across six transfer scenarios. The code is available at
https://github.com/cwei01/TDSS.
[COMMENTS]
11 pages, Accepted by AAAI 2025
[LINK]
http://arxiv.org/abs/2412.11654v3
[DATE]
2025-01-16 11:04:10+08:00
[CATEGORIES]
cs.LG
Clone-Robust AI Alignment
[AUTHORS]
Ariel D. Procaccia, Benjamin Schiffer, Shirley Zhang
[ABSTRACT]
A key challenge in training Large Language Models (LLMs) is properly aligning
them with human preferences. Reinforcement Learning with Human Feedback (RLHF)
uses pairwise comparisons from human annotators to train reward functions and
has emerged as a popular alignment method. However, input datasets in RLHF are
not necessarily balanced in the types of questions and answers that are
included. Therefore, we want RLHF algorithms to perform well even when the set
of alternatives is not uniformly distributed. Drawing on insights from social
choice theory, we introduce robustness to approximate clones, a desirable
property of RLHF algorithms which requires that adding near-duplicate
alternatives does not significantly change the learned reward function. We
first demonstrate that the standard RLHF algorithm based on regularized maximum
likelihood estimation (MLE) fails to satisfy this property. We then propose the
weighted MLE, a new RLHF algorithm that modifies the standard regularized MLE
by weighting alternatives based on their similarity to other alternatives. This
new algorithm guarantees robustness to approximate clones while preserving
desirable theoretical properties.
[LINK]
http://arxiv.org/abs/2501.09254v1
[DATE]
2025-01-16 10:43:44+08:00
[CATEGORIES]
cs.LG
Mono-Forward: Backpropagation-Free Algorithm for Efficient Neural Network Training Harnessing Local Errors
[AUTHORS]
James Gong, Bruce Li, Waleed Abdulla
[ABSTRACT]
Backpropagation is the standard method for achieving state-of-the-art
accuracy in neural network training, but it often imposes high memory costs and
lacks biological plausibility. In this paper, we introduce the Mono-Forward
algorithm, a purely local layerwise learning method inspired by Hinton’s
Forward-Forward framework. Unlike backpropagation, Mono-Forward optimizes each
layer solely with locally available information, eliminating the reliance on
global error signals. We evaluated Mono-Forward on multi-layer perceptrons and
convolutional neural networks across multiple benchmarks, including MNIST,
Fashion-MNIST, CIFAR-10, and CIFAR-100. The test results show that Mono-Forward
consistently matches or surpasses the accuracy of backpropagation across all
tasks, with significantly reduced and more even memory usage, better
parallelizability, and a comparable convergence rate.
[COMMENTS]
12 pages
[LINK]
http://arxiv.org/abs/2501.09238v1
[DATE]
2025-01-16 09:50:34+08:00
[CATEGORIES]
cs.LG
Gameplay Filters: Robust Zero-Shot Safety through Adversarial Imagination
[AUTHORS]
Duy P. Nguyen, Kai-Chieh Hsu, Wenhao Yu, Jie Tan, Jaime F. Fisac
[ABSTRACT]
Despite the impressive recent advances in learning-based robot control,
ensuring robustness to out-of-distribution conditions remains an open
challenge. Safety filters can, in principle, keep arbitrary control policies
from incurring catastrophic failures by overriding unsafe actions, but existing
solutions for complex (e.g., legged) robot dynamics do not span the full motion
envelope and instead rely on local, reduced-order models. These filters tend to
overly restrict agility and can still fail when perturbed away from nominal
conditions. This paper presents the gameplay filter, a new class of predictive
safety filter that continually plays out hypothetical matches between its
simulation-trained safety strategy and a virtual adversary co-trained to invoke
worst-case events and sim-to-real error, and precludes actions that would cause
failures down the line. We demonstrate the scalability and robustness of the
approach with a first-of-its-kind full-order safety filter for (36-D)
quadrupedal dynamics. Physical experiments on two different quadruped platforms
demonstrate the superior zero-shot effectiveness of the gameplay filter under
large perturbations such as tugging and unmodeled terrain. Experiment videos
and open-source software are available online:
https://saferobotics.org/research/gameplay-filter
[LINK]
http://arxiv.org/abs/2405.00846v4
[DATE]
2025-01-16 09:49:35+08:00
[CATEGORIES]
cs.LG
Enhancing Graph Self-Supervised Learning with Graph Interplay
[AUTHORS]
Xinjian Zhao, Wei Pang, Xiangru Jian, Yaoyao Xu, Chaolong Ying, Tianshu Yu
[ABSTRACT]
Graph self-supervised learning (GSSL) has emerged as a compelling framework
for extracting informative representations from graph-structured data without
extensive reliance on labeled inputs. In this study, we introduce Graph
Interplay (GIP), an innovative and versatile approach that significantly
enhances the performance of various existing GSSL methods equipped with it. To
this end, GIP advocates direct graph-level communications by introducing random
inter-graph edges within standard batches. Despite GIP’s simplicity, we further
show theoretically that GIP essentially performs a principled manifold
separation by combining inter-graph message passing and GSSL, bringing about
more structured embedding manifolds and thus benefiting a series of downstream
tasks. Our empirical study demonstrates that GIP surpasses the performance of
prevailing GSSL methods across multiple benchmarks by significant margins,
highlighting its potential as a breakthrough approach. Besides, GIP can be
readily integrated into a series of GSSL methods and consistently offers
additional performance gain. This advancement not only amplifies the capability
of GSSL but also potentially sets the stage for a novel graph learning paradigm
in a broader sense.
[COMMENTS]
Due to potential implicit data leakage in our experimental setup,
where the pretraining dataset was ordered by default labels, we withdraw this
manuscript for further self-examination and rigorous validation
[LINK]
http://arxiv.org/abs/2410.04061v3
[DATE]
2025-01-16 09:18:40+08:00
[CATEGORIES]
cs.LG
CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM
[AUTHORS]
Minkyu Jeon, Rishwanth Raghu, Miro Astore, Geoffrey Woollard, Ryan Feathers, Alkin Kaz, Sonya M. Hanson, Pilar Cossio, Ellen D. Zhong
[ABSTRACT]
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining
high-resolution 3D biomolecular structures from imaging data. Its unique
ability to capture structural variability has spurred the development of
heterogeneous reconstruction algorithms that can infer distributions of 3D
structures from noisy, unlabeled imaging data. Despite the growing number of
advanced methods, progress in the field is hindered by the lack of standardized
benchmarks with ground truth information and reliable validation metrics. Here,
we introduce CryoBench, a suite of datasets, metrics, and benchmarks for
heterogeneous reconstruction in cryo-EM. CryoBench includes five datasets
representing different sources of heterogeneity and degrees of difficulty.
These include conformational heterogeneity generated from designed motions of
antibody complexes or sampled from a molecular dynamics simulation, as well as
compositional heterogeneity from mixtures of ribosome assembly states or 100
common complexes present in cells. We then analyze state-of-the-art
heterogeneous reconstruction tools, including neural and non-neural methods,
assess their sensitivity to noise, and propose new metrics for quantitative
evaluation. We hope that CryoBench will be a foundational resource for
accelerating algorithmic development and evaluation in the cryo-EM and machine
learning communities. Project page: https://cryobench.cs.princeton.edu.
[COMMENTS]
Accepted by NeurIPS 2024 (Spotlight)
[LINK]
http://arxiv.org/abs/2408.05526v2
[DATE]
2025-01-16 08:54:04+08:00
[CATEGORIES]
cs.LG
An efficient likelihood-free Bayesian inference method based on sequential neural posterior estimation
[AUTHORS]
Yifei Xiong, Xiliang Yang, Sanguo Zhang, Zhijian He
[ABSTRACT]
Sequential neural posterior estimation (SNPE) techniques have been recently
proposed for dealing with simulation-based models with intractable likelihoods.
Unlike approximate Bayesian computation, SNPE techniques learn the posterior
from sequential simulation using neural network-based conditional density
estimators by minimizing a specific loss function. The SNPE method proposed by
Lueckmann et al. (2017) used a calibration kernel to boost the sample weights
around the observed data, resulting in a concentrated loss function. However,
the use of calibration kernels may increase the variances of both the empirical
loss and its gradient, making the training inefficient. To improve the
stability of SNPE, this paper proposes to use an adaptive calibration kernel
and several variance reduction techniques. The proposed method greatly speeds
up the process of training and provides a better approximation of the posterior
than the original SNPE method and some existing competitors as confirmed by
numerical experiments. We also managed to demonstrate the superiority of the
proposed method for a high-dimensional model with a real-world dataset.
[COMMENTS]
28 pages, 9 figures
[LINK]
http://arxiv.org/abs/2311.12530v4
[DATE]
2025-01-16 08:53:15+08:00
[CATEGORIES]
cs.LG
Leveraging Scale-aware Representations for improved Concept-Representation Alignment in ViTs
[AUTHORS]
Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
[ABSTRACT]
Vision Transformers (ViTs) are increasingly being adopted in various
sensitive vision applications, such as medical diagnosis and facial
recognition. To improve the interpretability of such models, many approaches
attempt to forward-align them with carefully annotated, abstract,
human-understandable semantic entities called concepts. Concepts provide global rationales to the model
predictions and can be quickly understood/intervened on by domain experts. Most
current research focuses on designing model-agnostic, plug-and-play generic
concept-based explainability modules that do not incorporate the inner workings
of foundation models (e.g., inductive biases, scale invariance, etc.) during
training. To alleviate this issue for ViTs, in this paper, we propose a novel
Concept Representation Alignment Module (CRAM) which learns both scale and
position-aware representations from multi-scale feature pyramids and patch
representations respectively. CRAM further aligns these representations with
concept annotations through an attention matrix. The proposed CRAM module
improves the predictive performance of ViT architectures and also provides
accurate and robust concept explanations, as demonstrated on five datasets:
three widely used benchmarks (CUB, Pascal APY, Concept-MNIST) and two
real-world datasets (AWA2, KITS).
[LINK]
http://arxiv.org/abs/2501.09221v1
[DATE]
2025-01-16 08:45:05+08:00
[CATEGORIES]
cs.LG
Predicting Long-Term Student Outcomes from Short-Term EdTech Log Data
[AUTHORS]
Ge Gao, Amelia Leon, Andrea Jetten, Jasmine Turner, Husni Almoubayyed, Stephen Fancsali, Emma Brunskill
[ABSTRACT]
Educational stakeholders are often particularly interested in sparse, delayed
student outcomes, like end-of-year statewide exams. The rare occurrence of such
assessments makes it harder to identify students likely to fail them, and
slows researchers and educators in assessing the effectiveness of particular
educational tools. Prior work has
primarily focused on using logs from students’ full usage (e.g. year-long) of an
educational product to predict outcomes, or considered predictive accuracy
using a few minutes to predict outcomes after a short (e.g. 1 hour) session. In
contrast, we investigate whether machine learning predictors using students’ logs
during their first few hours of usage can provide useful predictive insight
into those students’ end-of-school year external assessment. We do this on
three diverse datasets: from students in Uganda using a literacy game product,
and from students in the US using two mathematics intelligent tutoring systems.
We consider various measures of the accuracy of the resulting predictors,
including their ability to identify students at different parts along the
assessment performance distribution. Our findings suggest that short-term log
usage data, from 2-5 hours, can be used to provide valuable signal about
students’ long-term external performance.
[COMMENTS]
Accepted to the 15th International Learning Analytics and Knowledge
Conference (LAK2025)
[LINK]
http://arxiv.org/abs/2412.15473v2
[DATE]
2025-01-16 07:11:07+08:00
[CATEGORIES]
cs.LG
A Misclassification Network-Based Method for Comparative Genomic Analysis
[AUTHORS]
Wan He, Tina Eliassi-Rad, Samuel V. Scarpino
[ABSTRACT]
Classifying genome sequences based on metadata has been an active area of
research in comparative genomics for decades with many important applications
across the life sciences. Established methods for classifying genomes can be
broadly grouped into sequence alignment-based and alignment-free models.
Conventional alignment-based models rely on genome similarity measures
calculated based on local sequence alignments or consistent ordering among
sequences. However, such methods are computationally expensive when dealing
with large ensembles of even moderately sized genomes. In contrast,
alignment-free (AF) approaches measure genome similarity based on summary
statistics in an unsupervised setting and are efficient enough to analyze large
datasets. However, both alignment-based and AF methods typically assume fixed
scoring rubrics that lack the flexibility to assign varying importance to
different parts of the sequences based on prior knowledge. In this study, we
integrate AI and network science approaches to develop a comparative genomic
analysis framework that addresses these limitations. Our approach, termed the
Genome Misclassification Network Analysis (GMNA), simultaneously leverages
misclassified instances, a learned scoring rubric, and label information to
classify genomes based on associated metadata and better understand potential
drivers of misclassification. We evaluate the utility of the GMNA using Naive
Bayes and convolutional neural network models, supplemented by additional
experiments with transformer-based models, to construct SARS-CoV-2 sampling
location classifiers using over 500,000 viral genome sequences and study the
resulting network of misclassifications. We demonstrate the global health
potential of the GMNA by leveraging the SARS-CoV-2 genome misclassification
networks to investigate the role human mobility played in structuring
geographic clustering of SARS-CoV-2.
[LINK]
http://arxiv.org/abs/2412.07051v3
[DATE]
2025-01-16 06:50:44+08:00
[CATEGORIES]
cs.LG
Testing Noise Assumptions of Learning Algorithms
[AUTHORS]
Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
[ABSTRACT]
We pose a fundamental question in computational learning theory: can we
efficiently test whether a training set satisfies the assumptions of a given
noise model? This question has remained unaddressed despite decades of research
on learning in the presence of noise. In this work, we show that this task is
tractable and present the first efficient algorithm to test various noise
assumptions on the training data.
To model this question, we extend the recently proposed testable learning
framework of Rubinfeld and Vasilyan (2023) and require a learner to run an
associated test that satisfies the following two conditions: (1) whenever the
test accepts, the learner outputs a classifier along with a certificate of
optimality, and (2) the test must pass for any dataset drawn according to a
specified modeling assumption on both the marginal distribution and the noise
model. We then consider the problem of learning halfspaces over Gaussian
marginals with Massart noise (where each label can be flipped with probability
less than $1/2$ depending on the input features), and give a fully-polynomial
time testable learning algorithm.
We also show a separation between the classical setting of learning in the
presence of structured noise and testable learning. In fact, for the simple
case of random classification noise (where each label is flipped with fixed
probability $\eta = 1/2$), we show that testable learning requires
super-polynomial time while classical learning is trivial.
[LINK]
http://arxiv.org/abs/2501.09189v1
[DATE]
2025-01-16 06:33:55+08:00
[CATEGORIES]
cs.LG
Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection
[AUTHORS]
Qisen Cheng, Shuhui Qu, Janghwan Lee
[ABSTRACT]
Unsupervised visual defect detection is critical in industrial applications,
requiring a representation space that captures normal data features while
detecting deviations. Achieving a balance between expressiveness and
compactness is challenging; an overly expressive space risks inefficiency and
mode collapse, impairing detection accuracy. We propose a novel approach using
an enhanced VQ-VAE framework optimized for unsupervised defect detection. Our
model introduces a patch-aware dynamic code assignment scheme, enabling
context-sensitive code allocation to optimize spatial representation. This
strategy enhances normal-defect distinction and improves detection accuracy
during inference. Experiments on MVTecAD, BTAD, and MTSD datasets show our
method achieves state-of-the-art performance.
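
As background for the codebook idea, the sketch below shows the standard vector-quantization lookup used in VQ-VAE-style models: each patch feature is assigned to its nearest codebook entry. It does not implement the patch-aware dynamic assignment scheme proposed in the paper, and using the quantization error as a per-patch defect score is an assumption made here for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patch_vectors, codebook):
    """Index of the nearest codebook entry for each patch feature vector."""
    indices = []
    for vector in patch_vectors:
        distances = [squared_distance(vector, code) for code in codebook]
        indices.append(min(range(len(codebook)), key=distances.__getitem__))
    return indices


def quantization_errors(patch_vectors, codebook):
    """Squared distance from each patch to its assigned code.

    If the codebook was fit on defect-free data only, a large error suggests
    the patch is poorly explained by normal patterns (an assumed scoring rule).
    """
    indices = quantize(patch_vectors, codebook)
    return [
        squared_distance(vector, codebook[k])
        for vector, k in zip(patch_vectors, indices)
    ]
```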
[COMMENTS]
7 pages, Accepted to 36th IEEE ICTAI 2024
[LINK]
http://arxiv.org/abs/2501.09187v1
[DATE]
2025-01-16 06:26:26+08:00
[CATEGORIES]
cs.LG
Enhancing Graph Representation Learning with Localized Topological Features
[AUTHORS]
Zuoyu Yan, Qi Zhao, Ze Ye, Tengfei Ma, Liangcai Gao, Zhi Tang, Yusu Wang, Chao Chen
[ABSTRACT]
Representation learning on graphs is a fundamental problem that can be
crucial in various tasks. Graph neural networks, the dominant approach for
graph representation learning, are limited in their representation power.
Therefore, it can be beneficial to explicitly extract and incorporate
high-order topological and geometric information into these models. In this
paper, we propose a principled approach to extract the rich connectivity
information of graphs based on the theory of persistent homology. Our method
utilizes the topological features to enhance the representation learning of
graph neural networks and achieve state-of-the-art performance on various node
classification and link prediction benchmarks. We also explore the option of
end-to-end learning of the topological features, i.e., treating topological
computation as a differentiable operator during learning. Our theoretical
analysis and empirical study provide insights and potential guidelines for
employing topological features in graph learning tasks.
[COMMENTS]
Accepted in JMLR 2025
[LINK]
http://arxiv.org/abs/2501.09178v1
[DATE]
2025-01-16 06:12:27+08:00
[CATEGORIES]
cs.LG
Attention is All You Need Until You Need Retention
[AUTHORS]
M. Murat Yaslioglu
[ABSTRACT]
This work introduces a novel Retention Layer mechanism for Transformer based
architectures, addressing their inherent lack of intrinsic retention
capabilities. Unlike human cognition, which can encode and dynamically recall
symbolic templates, Generative Pretrained Transformers rely solely on fixed
pretrained weights and ephemeral context windows, limiting their adaptability.
The proposed Retention Layer incorporates a persistent memory module capable of
real time data population, dynamic recall, and guided output generation. This
enhancement allows models to store, update, and reuse observed patterns across
sessions, enabling incremental learning and bridging the gap between static
pretraining and dynamic, context sensitive adaptation. The Retention Layer
design parallels social learning processes, encompassing attention, retention,
reproduction, and motivation stages. Technically, it integrates a memory
attention mechanism and episodic buffers to manage memory scalability, mitigate
overfitting, and ensure efficient recall. Applications span adaptive personal
assistants, real time fraud detection, autonomous robotics, content moderation,
and healthcare diagnostics. In each domain, the retention mechanism enables
systems to learn incrementally, personalize outputs, and respond to evolving
real world challenges effectively. By emulating key aspects of human learning,
this retention enhanced architecture fosters a more fluid and responsive AI
paradigm, paving the way for dynamic, session aware models that extend the
capabilities of traditional Transformers into domains requiring continual
adaptation.
[LINK]
http://arxiv.org/abs/2501.09166v1
[DATE]
2025-01-16 05:33:53+08:00
[CATEGORIES]
cs.LG
Towards Understanding Extrapolation: a Causal Lens
[AUTHORS]
Lingjing Kong, Guangyi Chen, Petar Stojanov, Haoxuan Li, Eric P. Xing, Kun Zhang
[ABSTRACT]
Canonical work handling distribution shifts typically necessitates an entire
target distribution that lands inside the training distribution. However,
practical scenarios often involve only a handful of target samples, potentially
lying outside the training support, which requires the capability of
extrapolation. In this work, we aim to provide a theoretical understanding of
when extrapolation is possible and offer principled methods to achieve it
without requiring an on-support target distribution. To this end, we formulate
the extrapolation problem with a latent-variable model that embodies the
minimal change principle in causal mechanisms. Under this formulation, we cast
the extrapolation problem into a latent-variable identification problem. We
provide realistic conditions on shift properties and the estimation objectives
that lead to identification even when only one off-support target sample is
available, tackling the most challenging scenarios. Our theory reveals the
intricate interplay between the underlying manifold’s smoothness and the shift
properties. We showcase how our theoretical results inform the design of
practical adaptation algorithms. Through experiments on both synthetic and
real-world data, we validate our theoretical findings and their practical
implications.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.09163v1
[DATE]
2025-01-16 05:29:29+08:00
[CATEGORIES]
cs.LG
FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction
[AUTHORS]
Alex Morehead, Jianlin Cheng
[ABSTRACT]
Powerful generative AI models of protein-ligand structure have recently been
proposed, but few of these methods support both flexible protein-ligand docking
and affinity estimation. Of those that do, none can directly model multiple
binding ligands concurrently or have been rigorously benchmarked on
pharmacologically relevant drug targets, hindering their widespread adoption in
drug discovery efforts. In this work, we propose FlowDock, the first deep
geometric generative model based on conditional flow matching that learns to
directly map unbound (apo) structures to their bound (holo) counterparts for an
arbitrary number of binding ligands. Furthermore, FlowDock provides predicted
structural confidence scores and binding affinity values with each of its
generated protein-ligand complex structures, enabling fast virtual screening of
new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark
dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind
docking success rate using unbound (apo) protein input structures and without
any information derived from multiple sequence alignments, and for the
challenging new DockGen-E dataset, FlowDock outperforms single-sequence
AlphaFold 3 and matches single-sequence Chai-1 for binding pocket
generalization. Additionally, in the ligand category of the 16th community-wide
Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock
ranked among the top-5 methods for pharmacological binding affinity estimation
across 140 protein-ligand complexes, demonstrating the efficacy of its learned
representations in virtual screening. Source code, data, and pre-trained models
are available at https://github.com/BioinfoMachineLearning/FlowDock.
[COMMENTS]
10 pages, 2 tables, 2 algorithms, 7 figures. Code, data, pre-trained
models, and baseline method predictions are available at
https://github.com/BioinfoMachineLearning/FlowDock
[LINK]
http://arxiv.org/abs/2412.10966v2
[DATE]
2025-01-16 05:20:03+08:00
[CATEGORIES]
cs.LG
Towards Federated Multi-Armed Bandit Learning for Content Dissemination using Swarm of UAVs
[AUTHORS]
Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas
[ABSTRACT]
This paper introduces an Unmanned Aerial Vehicle (UAV)-enabled content management
architecture that is suitable for critical content access in communities of
users that are communication-isolated during diverse types of disaster
scenarios. The proposed architecture leverages a hybrid network of stationary
anchor UAVs and mobile Micro-UAVs for ubiquitous content dissemination. The
anchor UAVs are equipped with both vertical and lateral communication links,
and they serve local users, while the mobile micro-ferrying UAVs extend
coverage across communities with increased mobility. The focus is on developing
a content dissemination system that dynamically learns optimal caching policies
to maximize content availability. The core innovation is an adaptive content
dissemination framework based on distributed Federated Multi-Armed Bandit
learning. The goal is to optimize UAV content caching decisions based on
geo-temporal content popularity and user demand variations. A Selective Caching
Algorithm is also introduced to reduce redundant content replication by
incorporating inter-UAV information sharing. This method strategically
preserves the uniqueness in user preferences while amalgamating the
intelligence across a distributed learning system. This approach improves the
learning algorithm’s ability to adapt to diverse user preferences. Functional
verification and performance evaluation confirm the proposed architecture’s
utility across different network sizes, UAV swarms, and content popularity
patterns.
[COMMENTS]
25 pages, 11 figures, 1 table, 4 algorithms, journal
[LINK]
http://arxiv.org/abs/2501.09146v1
[DATE]
2025-01-16 04:55:13+08:00
[CATEGORIES]
cs.LG
Key-Exchange Convolutional Auto-Encoder for Data Augmentation in Early Knee Osteoarthritis Detection
[AUTHORS]
Zhe Wang, Aladine Chetouani, Mohamed Jarraya, Yung Hsin Chen, Yuhua Ru, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane
[ABSTRACT]
Knee Osteoarthritis (KOA) is a common musculoskeletal condition that
significantly affects mobility and quality of life, particularly in elderly
populations. However, training deep learning models for early KOA
classification is often hampered by the limited availability of annotated
medical datasets, owing to the high costs and labour-intensive nature of data
labelling. Traditional data augmentation techniques, while useful, rely on
simple transformations and fail to introduce sufficient diversity into the
dataset. To address these challenges, we propose the Key-Exchange Convolutional
Auto-Encoder (KECAE) as an innovative Artificial Intelligence (AI)-based data
augmentation strategy for early KOA classification. Our model employs a
convolutional autoencoder with a novel key-exchange mechanism that generates
synthetic images by selectively exchanging key pathological features between
X-ray images, which not only diversifies the dataset but also ensures the
clinical validity of the augmented data. A hybrid loss function is introduced
to supervise feature learning and reconstruction, integrating multiple
components, including reconstruction, supervision, and feature separation
losses. Experimental results demonstrate that the KECAE-generated data
significantly improve the performance of KOA classification models, with
accuracy gains of up to 1.98% across various standard and state-of-the-art
architectures. Furthermore, a clinical validation study involving expert
radiologists confirms the anatomical plausibility and diagnostic realism of the
synthetic outputs. These findings highlight the potential of KECAE as a robust
tool for augmenting medical datasets in early KOA detection.
[LINK]
http://arxiv.org/abs/2302.13336v2
[DATE]
2025-01-16 04:50:17+08:00
[CATEGORIES]
cs.LG
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
[AUTHORS]
Pierfrancesco Beneventano, Blake Woodworth
[ABSTRACT]
We study the gradient descent (GD) dynamics of a depth-2 linear neural
network with a single input and output. We show that GD converges at an
explicit linear rate to a global minimum of the training loss, even with a
large stepsize – about $2/\textrm{sharpness}$. It still converges for even
larger stepsizes, but may do so very slowly. We also characterize the solution
to which GD converges, which has lower norm and sharpness than the gradient
flow solution. Our analysis reveals a trade-off between the speed of
convergence and the magnitude of implicit regularization. This sheds light on
the benefits of training at the “Edge of Stability”, which induces additional
regularization by delaying convergence and may have implications for training
more complex models.
[COMMENTS]
23 pages, 3 figures
[LINK]
http://arxiv.org/abs/2501.09137v1
[DATE]
2025-01-16 04:43:36+08:00
[CATEGORIES]
cs.LG
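As a rough illustration of the gradient descent entry above, the sketch below runs GD on the scalar loss L(a, b) = 0.5 (ab - y)^2, the simplest depth-2 linear network with one input and one output. The initial values, target, and step sizes are assumptions chosen for illustration and are not taken from the paper; the small step size merely stands in for gradient flow. At a global minimum of this toy loss the sharpness (largest Hessian eigenvalue) equals a^2 + b^2, so a lower value also means a lower-norm solution.

```python
def loss(a, b, y):
    """Squared loss of the scalar depth-2 linear model f = a * b."""
    return 0.5 * (a * b - y) ** 2


def gd_depth2(a, b, y, lr, steps):
    """Run plain gradient descent on loss(a, b, y) and return the final (a, b)."""
    for _ in range(steps):
        r = a * b - y  # residual
        # Simultaneous update: both gradients use the old (a, b).
        a, b = a - lr * r * b, b - lr * r * a
    return a, b


def sharpness_at_min(a, b):
    """Largest Hessian eigenvalue of the loss at a global minimum (a*b = y)."""
    return a * a + b * b


if __name__ == "__main__":
    # Hypothetical setup: unbalanced initialization, target y = 1.
    a0, b0, y = 2.0, 0.1, 1.0
    a_small, b_small = gd_depth2(a0, b0, y, lr=0.01, steps=5000)  # proxy for gradient flow
    a_large, b_large = gd_depth2(a0, b0, y, lr=0.3, steps=200)    # larger step size
    print("small step:", loss(a_small, b_small, y), sharpness_at_min(a_small, b_small))
    print("large step:", loss(a_large, b_large, y), sharpness_at_min(a_large, b_large))
```

In this toy setting each GD step multiplies the imbalance a^2 - b^2 by (1 - lr^2 r^2), where r = ab - y, which is why the larger step size ends at a more balanced, flatter minimum than the small one.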
Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis
[AUTHORS]
Jiajin Li, Linglingzhi Zhu, Anthony Man-Cho So
[ABSTRACT]
Nonconvex-nonconcave minimax optimization has gained widespread interest over
the last decade. However, most existing works focus on variants of gradient
descent-ascent (GDA) algorithms, which are only applicable to smooth
nonconvex-concave settings. To address this limitation, we propose a novel
algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which
can effectively handle a broad range of structured nonsmooth
nonconvex-nonconcave minimax problems. Specifically, we consider the setting
where the primal function has a nonsmooth composite structure and the dual
function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\theta
\in [0,1)$. We introduce a novel convergence analysis framework for smoothed
PLDA, the key components of which are our newly developed nonsmooth primal
error bound and dual error bound. Using this framework, we show that smoothed
PLDA can find both $\epsilon$-game-stationary points and
$\epsilon$-optimization-stationary points of the problems of interest in
$\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when
$\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration
complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the
effectiveness and wide applicability of our analysis framework, we show that
certain max-structured problems possess the KL property with exponent
$\theta=0$ under mild assumptions. As a by-product, we establish
algorithm-independent quantitative relationships among various stationarity
concepts, which may be of independent interest.
[COMMENTS]
Accepted for publication in Mathematical Programming
[LINK]
http://arxiv.org/abs/2209.10825v4
[DATE]
2025-01-16 04:43:18+08:00
[CATEGORIES]
cs.LG
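As a worked reading of the iteration bound in the entry above, substituting values of $\theta$ into the stated exponent shows where the optimal rate applies. The value $\theta = 3/4$ is an arbitrary example, not one discussed in the abstract.

```latex
\[
\theta \in [0, \tfrac{1}{2}] \;\Rightarrow\; \max\{2\theta, 1\} = 1
\;\Rightarrow\; \mathcal{O}(\epsilon^{-2}),
\qquad
\theta = \tfrac{3}{4} \;\Rightarrow\; \max\{2\theta, 1\} = \tfrac{3}{2}
\;\Rightarrow\; \mathcal{O}(\epsilon^{-3}).
\]
```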
Deep Self-Supervised Disturbance Mapping with the OPERA Sentinel-1 Radiometric Terrain Corrected SAR Backscatter Product
[AUTHORS]
Harris Hardiman-Mostow, Charles Marshak, Alexander L. Handwerger
[ABSTRACT]
Mapping land surface disturbances supports disaster response, resource and
ecosystem management, and climate adaptation efforts. Synthetic aperture radar
(SAR) is an invaluable tool for disturbance mapping, providing consistent
time-series images of the ground regardless of weather or illumination
conditions. Despite SAR’s potential for disturbance mapping, processing SAR
data to an analysis-ready format requires expertise and significant compute
resources, particularly for large-scale global analysis. In October 2023,
NASA’s Observational Products for End-Users from Remote Sensing Analysis
(OPERA) project released the near-global Radiometric Terrain Corrected SAR
backscatter from Sentinel-1 (RTC-S1) dataset, providing publicly available,
analysis-ready SAR imagery. In this work, we utilize this new dataset to
systematically analyze land surface disturbances. As labeling SAR data is often
prohibitively time-consuming, we train a self-supervised vision transformer,
which requires no labels to train, on OPERA RTC-S1 data to estimate a
per-pixel distribution from the set of baseline imagery and assess disturbances
when there is significant deviation from the modeled distribution. To test our
model’s capability and generality, we evaluate three different natural
disasters, which represent high-intensity, abrupt disturbances, from three
different regions of the world. Across events, our approach yields high-quality
delineations: F1 scores exceeding 0.6 and Areas Under the Precision-Recall
Curve exceeding 0.65, consistently outperforming existing SAR disturbance
methods. Our findings suggest that a self-supervised vision transformer is
well-suited for global disturbance mapping and can be a valuable tool for
operational, near-global disturbance monitoring, particularly when labeled data
does not exist.
[COMMENTS]
19 pages, 18 figures, 5 tables. Preprint. Submitted to JSTARS
[LINK]
http://arxiv.org/abs/2501.09129v1
[DATE]
2025-01-16 04:24:18+08:00
[CATEGORIES]
cs.LG
Multi-Class Traffic Assignment using Multi-View Heterogeneous Graph Attention Networks
[AUTHORS]
Tong Liu, Hadi Meidani
[ABSTRACT]
Solving the traffic assignment problem for large networks is computationally
challenging when conventional optimization-based methods are used. In our
research, we develop an innovative surrogate model for traffic assignment
when multi-class vehicles are involved. We do so by employing heterogeneous
graph neural networks which use a multiple-view graph attention mechanism
tailored to different vehicle classes, along with additional links connecting
origin-destination pairs. We also integrate the node-based flow conservation
law into the loss function. As a result, our model adheres to flow conservation
while delivering highly accurate predictions for link flows and utilization
ratios. Through numerical experiments conducted on urban transportation
networks, we demonstrate that our model surpasses traditional neural network
approaches in convergence speed and predictive accuracy in both user
equilibrium and system optimal versions of traffic assignment.
[COMMENTS]
16 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.09117v1
[DATE]
2025-01-16 03:53:14+08:00
[CATEGORIES]
cs.LG
Towards Scalable and Stable Parallelization of Nonlinear RNNs
[AUTHORS]
Xavier Gonzalez, Andrew Warrington, Jimmy T. H. Smith, Scott W. Linderman
[ABSTRACT]
Transformers and linear state space models can be evaluated in parallel on
modern hardware, but evaluating nonlinear RNNs appears to be an inherently
sequential problem. Recently, however, Lim et al. ’24 developed an approach
called DEER, which evaluates nonlinear RNNs in parallel by posing the states as
the solution to a fixed-point problem. They derived a parallel form of Newton’s
method to solve the fixed-point problem and achieved significant speedups over
sequential evaluation. However, the computational complexity of DEER is cubic
in the state size, and the algorithm can suffer from numerical instability. We
address these limitations with two novel contributions. To reduce the
computational complexity, we apply quasi-Newton approximations and show they
converge comparably to Newton, use less memory, and are faster. To stabilize
DEER, we leverage a connection between the Levenberg-Marquardt algorithm and
Kalman smoothing, which we call ELK. This connection allows us to stabilize
Newton’s method while using efficient parallelized Kalman smoothing algorithms
to retain performance. Through several experiments, we show that these
innovations allow for parallel evaluation of nonlinear RNNs at larger scales
and with greater stability.
[COMMENTS]
33 pages, 9 figures, NeurIPS 2024
[LINK]
http://arxiv.org/abs/2407.19115v3
[DATE]
2025-01-16 03:18:35+08:00
[CATEGORIES]
cs.LG
The Artificial Scientist – in-transit Machine Learning of Plasma Simulations
[AUTHORS]
Jeffrey Kelling, Vicente Bolea, Michael Bussmann, Ankush Checkervarty, Alexander Debus, Jan Ebert, Greg Eisenhauer, Vineeth Gutta, Stefan Kesselheim, Scott Klasky, Richard Pausch, Norbert Podhorszki, Franz Poschel, David Rogers, Jeyhun Rustamov, Steve Schmerler, Ulrich Schramm, Klaus Steiniger, Rene Widera, Anna Willmann, Sunita Chandrasekaran
[ABSTRACT]
Increasing HPC cluster sizes and large-scale simulations that produce
petabytes of data per run create massive IO and storage challenges for
analysis. Deep learning-based techniques, in particular, make use of these
amounts of domain data to extract patterns that help build scientific
understanding. Here, we demonstrate a streaming workflow in which simulation
data is streamed directly to a machine-learning (ML) framework, circumventing
the file system bottleneck. Data is transformed in transit, asynchronously to
the simulation and the training of the model. With the presented workflow, data
operations can be performed in common and easy-to-use programming languages,
freeing the application user from adapting the application output routines. As
a proof-of-concept we consider a GPU accelerated particle-in-cell (PIConGPU)
simulation of the Kelvin-Helmholtz instability (KHI). We employ experience
replay to avoid catastrophic forgetting in learning from this non-steady
process in a continual manner. We detail challenges addressed while porting and
scaling to the Frontier exascale system.
[COMMENTS]
12 pages, 9 figures
[LINK]
http://arxiv.org/abs/2501.03383v2
[DATE]
2025-01-16 03:16:18+08:00
[CATEGORIES]
cs.LG
Inferring Transition Dynamics from Value Functions
[AUTHORS]
Jacob Adamczyk
[ABSTRACT]
In reinforcement learning, the value function is typically trained to solve
the Bellman equation, which connects the current value to future values. This
temporal dependency hints that the value function may contain implicit
information about the environment’s transition dynamics. By rearranging the
Bellman equation, we show that a converged value function encodes a model of
the underlying dynamics of the environment. We build on this insight to propose
a simple method for inferring dynamics models directly from the value function,
potentially mitigating the need for explicit model learning. Furthermore, we
explore the challenges of next-state identifiability, discussing conditions
under which the inferred dynamics model is well-defined. Our work provides a
theoretical foundation for leveraging value functions in dynamics modeling and
opens a new avenue for bridging model-free and model-based reinforcement
learning.
[COMMENTS]
Accepted at the AAAI-25 8th Workshop on Generalization in Planning
[LINK]
http://arxiv.org/abs/2501.09081v1
[DATE]
2025-01-16 03:00:47+08:00
[CATEGORIES]
cs.LG
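As a rough illustration of the value-function entry above: for a fixed policy with deterministic transitions, the Bellman equation V(s) = r(s) + gamma * V(s') rearranges to V(s') = (V(s) - r(s)) / gamma, so the next state can be read off whenever candidate states have distinct values. The 5-state ring, rewards, and discount below are hypothetical, and the nearest-value lookup is an assumption made for this sketch rather than the paper's method.

```python
GAMMA = 0.9
NEXT_STATE = [1, 2, 3, 4, 0]          # hypothetical deterministic ring: s -> s + 1 (mod 5)
REWARD = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical reward received on leaving state s


def evaluate_policy(next_state, reward, gamma, sweeps=2000):
    """Iterate V(s) <- r(s) + gamma * V(next(s)) until (numerically) converged."""
    v = [0.0] * len(reward)
    for _ in range(sweeps):
        v = [reward[s] + gamma * v[next_state[s]] for s in range(len(reward))]
    return v


def infer_next_states(v, reward, gamma):
    """Recover each state's successor from the value function alone."""
    inferred = []
    for s in range(len(v)):
        # Rearranged Bellman equation gives the value the successor must have.
        target = (v[s] - reward[s]) / gamma
        # Pick the state whose value is closest to that target.
        inferred.append(min(range(len(v)), key=lambda c: abs(v[c] - target)))
    return inferred


if __name__ == "__main__":
    values = evaluate_policy(NEXT_STATE, REWARD, GAMMA)
    print(infer_next_states(values, REWARD, GAMMA))
```

If two states share the same value, the lookup becomes ambiguous, which is one simple instance of the next-state identifiability issue the abstract mentions.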
Average-Reward Reinforcement Learning with Entropy Regularization
[AUTHORS]
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
[ABSTRACT]
The average-reward formulation of reinforcement learning (RL) has drawn
increased interest in recent years due to its ability to solve
temporally-extended problems without discounting. Independently, RL algorithms
have benefited from entropy-regularization: an approach used to make the
optimal policy stochastic, thereby more robust to noise. Despite the distinct
benefits of the two approaches, the combination of entropy regularization with
an average-reward objective is not well-studied in the literature and there has
been limited development of algorithms for this setting. To address this gap in
the field, we develop algorithms for solving entropy-regularized average-reward
RL problems with function approximation. We experimentally validate our method,
comparing it with existing algorithms on standard benchmarks for RL.
[COMMENTS]
Accepted at the AAAI-25 Eighth Workshop on Bridging the Gap Between
AI Planning and Reinforcement Learning (PRL)
[LINK]
http://arxiv.org/abs/2501.09080v1
[DATE]
2025-01-16 03:00:46+08:00
[CATEGORIES]
cs.LG
EVAL: EigenVector-based Average-reward Learning
[AUTHORS]
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
[ABSTRACT]
In reinforcement learning, two objective functions have been developed
extensively in the literature: discounted and averaged rewards. The
generalization to an entropy-regularized setting has led to improved robustness
and exploration for both of these objectives. Recently, the entropy-regularized
average-reward problem was addressed using tools from large deviation theory in
the tabular setting. This method has the advantage of linearity, providing
access to both the optimal policy and average reward-rate through properties of
a single matrix. In this paper, we extend that framework to more general
settings by developing approaches based on function approximation by neural
networks. This formulation reveals new theoretical insights into the
relationship between different objectives used in RL. Additionally, we combine
our algorithm with a posterior policy iteration scheme, showing how our
approach can also solve the average-reward RL problem without
entropy-regularization. Using classic control benchmarks, we experimentally
find that our method compares favorably with other algorithms in terms of
stability and rate of convergence.
[COMMENTS]
Accepted at the AAAI-25 8th Workshop on Generalization in Planning.
arXiv admin note: text overlap with arXiv:2501.09080
[LINK]
http://arxiv.org/abs/2501.09770v1
[DATE]
2025-01-16 03:00:45+08:00
[CATEGORIES]
cs.LG
Generative diffusion model with inverse renormalization group flows
[AUTHORS]
Kanta Masuki, Yuto Ashida
[ABSTRACT]
Diffusion models represent a class of generative models that produce data by
denoising a sample corrupted by white noise. Despite the success of diffusion
models in computer vision, audio synthesis, and point cloud generation, so far
they overlook inherent multiscale structures in data and have a slow generation
process due to many iteration steps. In physics, the renormalization group
offers a fundamental framework for linking different scales and giving an
accurate coarse-grained model. Here we introduce a renormalization group-based
diffusion model that leverages the multiscale nature of data distributions to
realize high-quality data generation. In the spirit of renormalization
group procedures, we define a flow equation that progressively erases data
information from fine-scale details to coarse-grained structures. Through
reversing the renormalization group flows, our model is able to generate
high-quality samples in a coarse-to-fine manner. We validate the versatility of
the model through applications to protein structure prediction and image
generation. Our model consistently outperforms conventional diffusion models
across standard evaluation metrics, enhancing sample quality and/or
accelerating sampling speed by an order of magnitude. The proposed method
alleviates the need for data-dependent tuning of hyperparameters in the
generative diffusion models, showing promise for systematically increasing
sample efficiency based on the concept of the renormalization group.
[COMMENTS]
9+21 pages, 4+11 figures. The code and trained models are available
at https://github.com/kantamasuki/RGDM
[LINK]
http://arxiv.org/abs/2501.09064v1
[DATE]
2025-01-16 03:00:01+08:00
[CATEGORIES]
cs.LG
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
[AUTHORS]
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
[ABSTRACT]
The foundation model (FM) paradigm is transforming Machine Learning Force
Fields (MLFFs), leveraging general-purpose representations and scalable
training to perform a variety of computational chemistry tasks. Although MLFF
FMs have begun to close the accuracy gap relative to first-principles methods,
there is still a strong need for faster inference speed. Additionally, while
research is increasingly focused on general-purpose models which transfer
across chemical space, practitioners typically only study a small subset of
systems at a given time. This underscores the need for fast, specialized MLFFs
relevant to specific downstream applications, which preserve test-time physical
soundness while maintaining train-time scalability. In this work, we introduce
a method for transferring general-purpose representations from MLFF foundation
models to smaller, faster MLFFs specialized to specific regions of chemical
space. We formulate our approach as a knowledge distillation procedure, where
the smaller “student” MLFF is trained to match the Hessians of the energy
predictions of the “teacher” foundation model. Our specialized MLFFs can be up
to 20$\times$ faster than the original foundation model, while retaining, and
in some cases exceeding, its performance and that of undistilled models. We
also show that distilling from a teacher model with a direct force
parameterization into a student model trained with conservative forces (i.e.,
computed as derivatives of the potential energy) successfully leverages the
representations from the large-scale teacher for improved accuracy, while
maintaining energy conservation during test-time molecular dynamics
simulations. More broadly, our work suggests a new paradigm for MLFF
development, in which foundation models are released along with smaller,
specialized simulation “engines” for common chemical subsets.
[COMMENTS]
Under Review at ICLR 2025
[LINK]
http://arxiv.org/abs/2501.09009v1
[DATE]
2025-01-16 02:50:52+08:00
[CATEGORIES]
cs.LG
Improving Stability Estimates in Adversarial Explainable AI through Alternate Search Methods
[AUTHORS]
Christopher Burger, Charles Walter
[ABSTRACT]
Advances in the effectiveness of machine learning models have come at the
cost of enormous complexity resulting in a poor understanding of how they
function. Local surrogate methods have been used to approximate the workings of
these complex models, but recent work has revealed their vulnerability to
adversarial attacks where the explanation produced is appreciably different
while the meaning and structure of the complex model’s output remain similar.
This prior work has focused on the existence of these weaknesses but not on
their magnitude. Here we explore using an alternate search method with the goal
of finding minimum viable perturbations, the fewest perturbations necessary to
achieve a fixed similarity value between the original and altered text’s
explanation. Intuitively, a method that requires fewer perturbations to expose
a given level of instability is inferior to one which requires more. This
nuance allows for superior comparisons of the stability of explainability
methods.
[COMMENTS]
9 pages, 3 figures, 5 tables. arXiv admin note: text overlap with
arXiv:2406.15839
[LINK]
http://arxiv.org/abs/2501.09006v1
[DATE]
2025-01-16 02:45:05+08:00
[CATEGORIES]
cs.LG
Delay Sensitive Hierarchical Federated Learning with Stochastic Local Updates
[AUTHORS]
Abdulmoneam Ali, Ahmed Arafa
[COMMENTS]
To appear in the IEEE Transactions on Cognitive Communications and
Networking
[LINK]
http://arxiv.org/abs/2302.04851v2
[DATE]
2025-01-16 02:45:04+08:00
[CATEGORIES]
cs.LG
Reward Machines for Deep RL in Noisy and Uncertain Environments
[AUTHORS]
Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith
[ABSTRACT]
Reward Machines provide an automaton-inspired structure for specifying
instructions, safety constraints, and other temporally extended reward-worthy
behaviour. By exposing the underlying structure of a reward function, they
enable the decomposition of an RL task, leading to impressive gains in sample
efficiency. Although Reward Machines and similar formal specifications have a
rich history of application towards sequential decision-making problems, they
critically rely on a ground-truth interpretation of the domain-specific
vocabulary that forms the building blocks of the reward function. Such
ground-truth interpretations are elusive in the real world due in part to
partial observability and noisy sensing. In this work, we explore the use of
Reward Machines for Deep RL in noisy and uncertain environments. We
characterize this problem as a POMDP and propose a suite of RL algorithms that
exploit task structure under uncertain interpretation of the domain-specific
vocabulary. Through theory and experiments, we expose pitfalls in naive
approaches to this problem while simultaneously demonstrating how task
structure can be successfully leveraged under noisy interpretations of the
vocabulary.
[LINK]
http://arxiv.org/abs/2406.00120v4
[DATE]
2025-01-16 02:30:12+08:00
[CATEGORIES]
cs.LG
CrystalGRW: Generative Modeling of Crystal Structures with Targeted Properties via Geodesic Random Walks
[AUTHORS]
Krit Tangsongcharoen, Teerachote Pakornchote, Chayanon Atthapak, Natthaphon Choomphon-anomakhun, Annop Ektarawong, Björn Alling, Christopher Sutton, Thiti Bovornratanaraks, Thiparat Chotibut
[ABSTRACT]
Determining whether a candidate crystalline material is thermodynamically
stable depends on identifying its true ground-state structure, a central
challenge in computational materials science. We introduce CrystalGRW, a
diffusion-based generative model on Riemannian manifolds that proposes novel
crystal configurations and can predict stable phases validated by density
functional theory. The crystal properties, such as fractional coordinates,
atomic types, and lattice matrices, are represented on suitable Riemannian
manifolds, ensuring that new predictions generated through the diffusion
process preserve the periodicity of crystal structures. We incorporate an
equivariant graph neural network to also account for rotational and
translational symmetries during the generation process. CrystalGRW demonstrates
the ability to generate realistic crystal structures that are close to their
ground states with accuracy comparable to existing models, while also enabling
conditional control, such as specifying a desired crystallographic point group.
These features help accelerate materials discovery and inverse design by
offering stable, symmetry-consistent crystal candidates for experimental
validation.
[COMMENTS]
10+12 pages, 10 figures
[LINK]
http://arxiv.org/abs/2501.08998v1
[DATE]
2025-01-16 02:26:35+08:00
[CATEGORIES]
cs.LG
Optimal Federated Learning for Functional Mean Estimation under Heterogeneous Privacy Constraints
[AUTHORS]
Tony Cai, Abhinav Chakraborty, Lasse Vuursteen
[ABSTRACT]
Federated learning (FL) is a distributed machine learning technique designed
to preserve data privacy and security, and it has gained significant importance
due to its broad range of applications. This paper addresses the problem of
optimal functional mean estimation from discretely sampled data in a federated
setting.
We consider a heterogeneous framework where the number of individuals,
measurements per individual, and privacy parameters vary across one or more
servers, under both common and independent design settings. In the common
design setting, the same design points are measured for each individual,
whereas in the independent design, each individual has their own random
collection of design points. Within this framework, we establish minimax upper
and lower bounds for the estimation error of the underlying mean function,
highlighting the nuanced differences between common and independent designs
under distributed privacy constraints.
We propose algorithms that achieve the optimal trade-off between privacy and
accuracy and provide optimality results that quantify the fundamental limits of
private functional mean estimation across diverse distributed settings. These
results characterize the cost of privacy and offer practical insights into the
potential for privacy-preserving statistical analysis in federated
environments.
[COMMENTS]
54 pages: 25 page article and 29 pages of appendix
[LINK]
http://arxiv.org/abs/2412.18992v2
[DATE]
2025-01-16 02:07:15+08:00
[CATEGORIES]
cs.LG
Debiasing Synthetic Data Generated by Deep Generative Models
[AUTHORS]
Alexander Decruyenaere, Heidelinde Dehaene, Paloma Rabaey, Christiaan Polet, Johan Decruyenaere, Thomas Demeester, Stijn Vansteelandt
[ABSTRACT]
While synthetic data hold great promise for privacy protection, their
statistical analysis poses significant challenges that necessitate innovative
solutions. The use of deep generative models (DGMs) for synthetic data
generation is known to induce considerable bias and imprecision into synthetic
data analyses, compromising their inferential utility as opposed to original
data analyses. This bias and uncertainty can be substantial enough to impede
statistical convergence rates, even in seemingly straightforward analyses like
mean calculation. The standard errors of such estimators then exhibit slower
shrinkage with sample size than the typical $1/\sqrt{n}$ rate. This
complicates fundamental calculations like p-values and confidence intervals,
with no straightforward remedy currently available. In response to these
challenges, we propose a new strategy that targets synthetic data created by
DGMs for specific data analyses. Drawing insights from debiased and targeted
machine learning, our approach accounts for biases, enhances convergence rates,
and facilitates the calculation of estimators with easily approximated large
sample variances. We exemplify our proposal through a simulation study on toy
data and two case studies on real-world data, highlighting the importance of
tailoring DGMs for targeted data analysis. This debiasing strategy contributes
to advancing the reliability and applicability of synthetic data in statistical
inference.
[COMMENTS]
Accepted for the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024), joint first authors
[LINK]
http://arxiv.org/abs/2411.04216v2
[DATE]
2025-01-16 01:47:22+08:00
[CATEGORIES]
cs.LG
Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
[AUTHORS]
Ilia Shumailov, Daniel Ramage, Sarah Meiklejohn, Peter Kairouz, Florian Hartmann, Borja Balle, Eugene Bagdasarian
[ABSTRACT]
We often interact with untrusted parties. Prioritization of privacy can limit
the effectiveness of these interactions, as achieving certain goals
necessitates sharing private data. Traditionally, addressing this challenge has
involved either seeking trusted intermediaries or constructing cryptographic
protocols that restrict how much data is revealed, such as multi-party
computations or zero-knowledge proofs. While significant advances have been
made in scaling cryptographic approaches, they remain limited in terms of the
size and complexity of applications they can be used for. In this paper, we
argue that capable machine learning models can fulfill the role of a trusted
third party, thus enabling secure computations for applications that were
previously infeasible. In particular, we describe Trusted Capable Model
Environments (TCMEs) as an alternative approach for scaling secure computation,
where capable machine learning model(s) interact under input/output
constraints, with explicit information flow control and explicit statelessness.
This approach aims to achieve a balance between privacy and computational
efficiency, enabling private inference where classical cryptographic solutions
are currently infeasible. We describe a number of use cases that are enabled by
TCME, and show that even some simple classic cryptographic problems can already
be solved with TCME. Finally, we outline current limitations and discuss the
path forward in implementing them.
[LINK]
http://arxiv.org/abs/2501.08970v1
[DATE]
2025-01-16 01:28:53+08:00
[CATEGORIES]
cs.LG
A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series
[AUTHORS]
Lucas Correia, Jan-Christoph Goos, Thomas Bäck, Anna V. Kononova
[ABSTRACT]
Benchmarking anomaly detection approaches for multivariate time series is
challenging due to the lack of high-quality datasets. Current publicly
available datasets are too small, not diverse and feature trivial anomalies,
which hinders measurable progress in this research area. We propose a solution:
a diverse, extensive, and non-trivial dataset generated via state-of-the-art
simulation tools that reflects realistic behaviour of an automotive powertrain,
including its multivariate, dynamic and variable-state properties. To cater for
both unsupervised and semi-supervised anomaly detection settings, as well as
time series generation and forecasting, we make different versions of the
dataset available, where training and test subsets are offered in contaminated
and clean versions, depending on the task. We also provide baseline results
from a small selection of approaches based on deterministic and variational
autoencoders, as well as a non-parametric approach. As expected, the baseline
experimentation shows that the approaches trained on the semi-supervised
version of the dataset outperform their unsupervised counterparts, highlighting
a need for approaches more robust to contaminated training data.
[COMMENTS]
Submitted to the IEEE Transactions on Reliability journal
[LINK]
http://arxiv.org/abs/2411.13951v3
[DATE]
2025-01-16 01:16:22+08:00
[CATEGORIES]
cs.LG
Kolmogorov-Arnold Networks for Time Series Granger Causality Inference
[AUTHORS]
Meiliang Liu, Yunfang Xu, Zijin Li, Zhengye Si, Xiaoxiao Yang, Xinyue Yang, Zhiwen Zhao
[ABSTRACT]
We introduce Granger Causality Kolmogorov-Arnold Networks (GCKAN), an
innovative architecture that extends the recently proposed Kolmogorov-Arnold
Networks (KAN) to the domain of causal inference. By extracting base weights
from KAN layers and incorporating the sparsity-inducing penalty along with
ridge regularization, GCKAN infers the Granger causality from time series while
enabling automatic time lag selection. Additionally, we propose an algorithm
leveraging time-reversed Granger causality to enhance inference accuracy. The
algorithm compares prediction and sparsity-inducing losses derived from the
original and time-reversed series, automatically selecting the causal
relationship with the higher score or integrating the results to mitigate
spurious connectivities. Comprehensive experiments conducted on Lorenz-96, gene
regulatory networks, fMRI BOLD signals, and VAR datasets demonstrate that the
proposed model achieves competitive performance to state-of-the-art methods in
inferring Granger causality from nonlinear, high-dimensional, and
limited-sample time series.
[LINK]
http://arxiv.org/abs/2501.08958v1
[DATE]
2025-01-16 01:09:07+08:00
[CATEGORIES]
cs.LG
PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization
[AUTHORS]
Yao Ni, Shan Zhang, Piotr Koniusz
[ABSTRACT]
Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained
transformers to downstream tasks. However, the optimization of task
performance often comes at the cost of generalizability in fine-tuned models.
To address this issue, we theoretically connect smaller weight gradient norms
during training and larger datasets to the improvements in model
generalization. Motivated by this connection, we propose reducing gradient
norms for enhanced generalization and aligning the fine-tuned model with the
pre-trained counterpart to retain knowledge from large-scale pre-training data.
Yet, naive alignment does not guarantee gradient reduction and can potentially
cause gradient explosion, complicating efforts to manage gradients. To address
such an issue, we propose PACE, marrying generalization of PArameter-efficient
fine-tuning with Consistency rEgularization. We perturb features learned from
the adapter with multiplicative noise and ensure the fine-tuned model
remains consistent for the same sample under different perturbations. Theoretical
analysis shows that PACE not only implicitly regularizes gradients for enhanced
generalization, but also implicitly aligns the fine-tuned and pre-trained
models to retain knowledge. Experimental evidence supports our theories. PACE
surpasses existing PEFT methods in visual adaptation tasks (VTAB-1k, FGVC,
few-shot learning, domain adaptation) showcasing its potential for
resource-efficient fine-tuning. It also improves LoRA in text classification
(GLUE) and mathematical reasoning (GSM-8K). The code is available at
https://github.com/MaxwellYaoNi/PACE
[COMMENTS]
Accepted by NeurIPS 2024 as a spotlight
[LINK]
http://arxiv.org/abs/2409.17137v4
[DATE]
2025-01-16 00:56:26+08:00
[CATEGORIES]
cs.LG
Computing Approximated Fixpoints via Dampened Mann Iteration
[AUTHORS]
Paolo Baldan, Sebastian Gurke, Barbara König, Tommaso Padoan, Florian Wittbold
[ABSTRACT]
Fixpoints are ubiquitous in computer science and when dealing with
quantitative semantics and verification one is commonly led to consider least
fixpoints of (higher-dimensional) functions over the nonnegative reals. We show
how to approximate the least fixpoint of such functions, focusing on the case
in which they are not known precisely, but represented by a sequence of
approximating functions that converge to them. We concentrate on monotone and
non-expansive functions, for which uniqueness of fixpoints is not guaranteed
and standard fixpoint iteration schemes might get stuck at a fixpoint that is
not the least. Our main contribution is the identification of an iteration
scheme, a variation of Mann iteration with a dampening factor, which, under
suitable conditions, is shown to guarantee convergence to the least fixpoint of
the function of interest. We then argue that these results are relevant in the
context of model-based reinforcement learning for Markov decision processes
(MDPs), showing that the proposed iteration scheme instantiates to MDPs and
allows us to derive convergence to the optimal expected return. More generally,
we show that our results can be used to iterate to the least fixpoint almost
surely for systems where the function of interest can be approximated with
given probabilistic error bounds, as it happens for probabilistic systems, such
as simple stochastic games, that can be explored via sampling.
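
The problem the abstract describes (plain iteration getting stuck at a non-least fixpoint) can be illustrated with a minimal sketch. The update rule below, including the multiplicative factor `(1 - beta_n)` and the schedule `beta_n = 1 / (n + 2)`, is an assumption chosen for illustration and is not taken from the paper; `dampened_mann` and `plain_iteration` are hypothetical names.

```python
def plain_iteration(f, x0, steps):
    """Standard fixpoint iteration x_{n+1} = f(x_n)."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x


def dampened_mann(f, x0, steps,
                  alpha=lambda n: 1.0,
                  beta=lambda n: 1.0 / (n + 2)):
    """Illustrative dampened Mann step (assumed form, not the paper's):
    x_{n+1} = (1 - beta_n) * ((1 - alpha_n) * x_n + alpha_n * f(x_n)).
    """
    x = x0
    for n in range(steps):
        a, b = alpha(n), beta(n)
        mann_step = (1.0 - a) * x + a * f(x)  # classical Mann averaging
        x = (1.0 - b) * mann_step             # assumed dampening toward 0
    return x


# The identity on the nonnegative reals is monotone and non-expansive,
# and every point is a fixpoint; the least fixpoint is 0.
identity = lambda x: x
stuck = plain_iteration(identity, 1.0, 1000)   # never leaves the start point
damped = dampened_mann(identity, 1.0, 1000)    # shrinks like 1 / (n + 1)

# With a unique fixpoint (x = 2), vanishing dampening still approaches it.
affine = lambda x: 0.5 * x + 1.0
near_two = dampened_mann(affine, 0.0, 10000)
```

Because `beta_n` tends to zero while its sum diverges, the dampening keeps pulling the iterate down without preventing convergence in the limit; the conditions under which such a scheme provably reaches the least fixpoint are the subject of the paper and are not reproduced here.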
[LINK]
http://arxiv.org/abs/2501.08950v1
[DATE]
2025-01-16 00:52:21+08:00
[CATEGORIES]
cs.LG
Supervised Kernel Thinning
[AUTHORS]
Albert Gong, Kyuseong Choi, Raaz Dwivedi
[ABSTRACT]
The kernel thinning algorithm of Dwivedi & Mackey (2024) provides a
better-than-i.i.d. compression of a generic set of points. By generating
high-fidelity coresets of size significantly smaller than the input points, KT
is known to speed up unsupervised tasks like Monte Carlo integration,
uncertainty quantification, and non-parametric hypothesis testing, with minimal
loss in statistical accuracy. In this work, we generalize the KT algorithm to
speed up supervised learning problems involving kernel methods. Specifically,
we combine two classical algorithms–Nadaraya-Watson (NW) regression or kernel
smoothing, and kernel ridge regression (KRR)–with KT to provide a quadratic
speed-up in both training and inference times. We show how distribution
compression with KT in each setting reduces to constructing an appropriate
kernel, and introduce the Kernel-Thinned NW and Kernel-Thinned KRR estimators.
We prove that KT-based regression estimators enjoy significantly superior
computational efficiency over the full-data estimators and improved statistical
efficiency over i.i.d. subsampling of the training data. En route, we also
provide a novel multiplicative error guarantee for compressing with KT. We
validate our design choices with both simulations and real data experiments.
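
For readers unfamiliar with the Nadaraya-Watson estimator mentioned above, here is a minimal one-dimensional sketch with a Gaussian kernel. The function names and the bandwidth default are assumptions for illustration; the kernel thinning step itself is not shown.

```python
import math


def gaussian_kernel(x, xi, bandwidth):
    """Gaussian kernel weight between a query x and a training point xi."""
    return math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)


def nadaraya_watson(x, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, evaluated at the query point x."""
    weights = [gaussian_kernel(x, xi, bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Two training points placed symmetrically around the query get equal
# weight, so the estimate is the plain average of their labels.
estimate = nadaraya_watson(0.0, [-1.0, 1.0], [2.0, 4.0])
```

Each query touches every training point, so the cost per prediction grows linearly with the training set size. Replacing `xs` and `ys` with a smaller, well-chosen subset is what makes a compression scheme attractive in this setting.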
[COMMENTS]
Published at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.13749v2
[DATE]
2025-01-16 00:50:11+08:00
[CATEGORIES]
cs.LG
Integrating Multi-Physics Simulations and Machine Learning to Define the Spatter Mechanism and Process Window in Laser Powder Bed Fusion
[AUTHORS]
Olabode T. Ajenifujah, Francis Ogoke, Florian Wirth, Jack Beuth, Amir Barati Farimani
[ABSTRACT]
Laser powder bed fusion (LPBF) has shown promise for a wide range of
applications due to its ability to fabricate freeform geometries and generate a
controlled microstructure. However, components generated by LPBF still possess
sub-optimal mechanical properties due to the defects that are created during
laser-material interactions. In this work, we investigate the mechanism of spatter
formation, using a high-fidelity modelling tool that was built to simulate the
multi-physics phenomena in LPBF. The modelling tool has the capability to
capture the 3D resolution of the meltpool and the spatter behavior. To
understand spatter behavior and formation, we reveal its properties at ejection
and evaluate its variation from the meltpool, the source where it is formed.
The dataset collected from the spatter and the meltpool consists of 50% spatter
and 50% meltpool samples, with features that include position components,
velocity components, velocity magnitude, temperature, density and pressure. The
relationship between the spatter and the meltpool was evaluated via
correlation analysis and machine learning (ML) algorithms for classification
tasks. Upon screening different ML algorithms on the dataset, a high accuracy
was observed for all the ML models, with ExtraTrees having the highest at 96%
and KNN having the lowest at 94%.
[LINK]
http://arxiv.org/abs/2405.07823v2
[DATE]
2025-01-16 00:29:38+08:00
[CATEGORIES]
cs.LG
Projection Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning
[AUTHORS]
Xinchen Han, Hossam Afifi, Michel Marot
[ABSTRACT]
Offline Reinforcement Learning (RL) faces a critical challenge of
extrapolation errors caused by out-of-distribution (OOD) actions. The Implicit
Q-Learning (IQL) algorithm employs expectile regression to achieve in-sample
learning, effectively mitigating the risks associated with OOD actions.
However, the fixed hyperparameter in policy evaluation and the density-based
policy improvement method limit its overall efficiency. In this paper, we propose
Proj-IQL, a projective IQL algorithm enhanced with a support constraint. In
the policy evaluation phase, Proj-IQL generalizes the one-step approach to a
multi-step approach through vector projection, while maintaining the in-sample
learning and expectile regression framework. In the policy improvement phase,
Proj-IQL introduces a support constraint that is more aligned with the policy
evaluation approach. Furthermore, we theoretically demonstrate that Proj-IQL
guarantees monotonic policy improvement and enjoys a progressively more
rigorous criterion for superior actions. Empirical results demonstrate that
Proj-IQL achieves state-of-the-art performance on D4RL benchmarks, especially
in challenging navigation domains.
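
The expectile regression that IQL builds on can be sketched in a few lines. The asymmetric squared loss below is the standard expectile loss; the helper `expectile` and its fixed-point solver are hypothetical additions for illustration, and nothing here reproduces Proj-IQL's projection step or support constraint.

```python
def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    In IQL-style policy evaluation, diff would play the role of
    Q(s, a) - V(s) (an assumption about usage, not shown here).
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def expectile(samples, tau, iters=100):
    """Tau-expectile of a sample via iteratively reweighted averaging."""
    m = sum(samples) / len(samples)  # tau = 0.5 recovers the plain mean
    for _ in range(iters):
        weights = [tau if x > m else 1.0 - tau for x in samples]
        m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return m


# Positive residuals are penalized more than negative ones when tau > 0.5,
# which pushes the fitted value toward the upper end of the sample.
loss_above = expectile_loss(1.0, 0.7)
loss_below = expectile_loss(-1.0, 0.7)
upper = expectile([0.0, 1.0], 0.9)
```

Raising `tau` toward 1 makes the fitted value approach the largest in-sample target, which is why a fixed `tau` acts as a hyperparameter trading off conservatism against optimism.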
[LINK]
http://arxiv.org/abs/2501.08907v1
[DATE]
2025-01-16 00:17:02+08:00
[CATEGORIES]
cs.LG
Multi-View Transformers for Airway-To-Lung Ratio Inference on Cardiac CT Scans: The C4R Study
[AUTHORS]
Sneha N. Naik, Elsa D. Angelini, Eric A. Hoffman, Elizabeth C. Oelsner, R. Graham Barr, Benjamin M. Smith, Andrew F. Laine
[ABSTRACT]
The ratio of airway tree lumen to lung size (ALR), assessed at full
inspiration on high resolution full-lung computed tomography (CT), is a major
risk factor for chronic obstructive pulmonary disease (COPD). There is growing
interest in inferring ALR from cardiac CT images, which are widely available in
epidemiological cohorts, to investigate the relationship of ALR to severe
COVID-19 and post-acute sequelae of SARS-CoV-2 infection (PASC). Previously,
cardiac scans included approximately 2/3 of the total lung volume with 5-6x
greater slice thickness than high-resolution (HR) full-lung (FL) CT. In this
study, we present a novel attention-based Multi-view Swin Transformer to infer
FL ALR values from segmented cardiac CT scans. For the supervised training we
exploit paired full-lung and cardiac CTs acquired in the Multi-Ethnic Study of
Atherosclerosis (MESA). Our network significantly outperforms a proxy direct
ALR inference on segmented cardiac CT scans and achieves accuracy and
reproducibility comparable with a scan-rescan reproducibility of the FL ALR
ground-truth.
[COMMENTS]
Accepted to appear in Proceedings of International Symposium on
Biomedical Imaging (ISBI), 2025
[LINK]
http://arxiv.org/abs/2501.08902v1
[DATE]
2025-01-16 00:11:24+08:00
[CATEGORIES]
cs.LG
Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data
[AUTHORS]
David Holzmüller, Léo Grinsztajn, Ingo Steinwart
[COMMENTS]
NeurIPS 2024. Changes in v3: mention bug in XGBoost results, mention
original name of he+5 method. Code is available at
github.com/dholzmueller/pytabkit
[LINK]
http://arxiv.org/abs/2407.04491v3
[DATE]
2025-01-16 00:02:08+08:00
[CATEGORIES]
cs.LG
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind
[AUTHORS]
Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Saki Mizuno, Keita Suzuki, Ryo Masumura, Hiroaki Sugiyama, Kuniko Saito
[ABSTRACT]
Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in
three aspects: 1) they assess a limited range of mental states such as beliefs,
2) false beliefs are not comprehensively explored, and 3) the diverse
personality traits of characters are overlooked. To address these challenges,
we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over
conversations. ToMATO is generated via LLM-LLM conversations featuring
information asymmetry. By employing a prompting method that requires
role-playing LLMs to verbalize their thoughts before each utterance, we capture
both first- and second-order mental states across five categories: belief,
intention, desire, emotion, and knowledge. These verbalized thoughts serve as
answers to questions designed to assess the mental states of characters within
conversations. Furthermore, the information asymmetry introduced by hiding
thoughts from others induces the generation of false beliefs about various
mental states. Assigning distinct personality traits to LLMs further
diversifies both utterances and thoughts. ToMATO consists of 5.4k questions,
753 conversations, and 15 personality trait patterns. Our analysis shows that
this dataset construction approach frequently generates false beliefs due to
the information asymmetry between role-playing LLMs, and effectively reflects
diverse personalities. We evaluate nine LLMs on ToMATO and find that even
GPT-4o mini lags behind human performance, especially in understanding false
beliefs, and lacks robustness to various personality traits.
[COMMENTS]
Accepted by AAAI 2025
[LINK]
http://arxiv.org/abs/2501.08838v1
[DATE]
2025-01-15 22:47:02+08:00
[CATEGORIES]
cs.CL
Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition
[AUTHORS]
Yuming Yang, Wantong Zhao, Caishuang Huang, Junjie Ye, Xiao Wang, Huiyuan Zheng, Yang Nan, Yuran Wang, Xueying Xu, Kaixin Huang, Yunke Zhang, Tao Gui, Qi Zhang, Xuanjing Huang
[COMMENTS]
Accepted at COLING 2025. Camera-ready version updated. Project page:
https://github.com/UmeanNever/B2NER
[LINK]
http://arxiv.org/abs/2406.11192v2
[DATE]
2025-01-15 22:38:01+08:00
[CATEGORIES]
cs.CL
SAIF: A Comprehensive Framework for Evaluating the Risks of Generative AI in the Public Sector
[AUTHORS]
Kyeongryul Lee, Heehyeon Kim, Joyce Jiyoung Whang
[ABSTRACT]
The rapid adoption of generative AI in the public sector, encompassing
diverse applications ranging from automated public assistance to welfare
services and immigration processes, highlights its transformative potential
while underscoring the pressing need for thorough risk assessments. Despite its
growing presence, evaluations of risks associated with AI-driven systems in the
public sector remain insufficiently explored. Building upon an established
taxonomy of AI risks derived from diverse government policies and corporate
guidelines, we investigate the critical risks posed by generative AI in the
public sector while extending the scope to account for its multimodal
capabilities. In addition, we propose a Systematic dAta generatIon Framework
for evaluating the risks of generative AI (SAIF). SAIF involves four key
stages: breaking down risks, designing scenarios, applying jailbreak methods,
and exploring prompt types. It ensures the systematic and consistent generation
of prompt data, facilitating a comprehensive evaluation while providing a solid
foundation for mitigating the risks. Furthermore, SAIF is designed to
accommodate emerging jailbreak methods and evolving prompt types, thereby
enabling effective responses to unforeseen risk scenarios. We believe that this
study can play a crucial role in fostering the safe and responsible integration
of generative AI into the public sector.
[COMMENTS]
6 pages, 2 figures, 1 table. AI for Public Missions (AIPM) Workshop
at the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)
[LINK]
http://arxiv.org/abs/2501.08814v1
[DATE]
2025-01-15 22:12:38+08:00
[CATEGORIES]
cs.CL
Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
[AUTHORS]
Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang
[ABSTRACT]
Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of
the most intuitive yet challenging embodied AI tasks. Agents are tasked to
navigate towards a target goal by executing a set of low-level actions,
following a series of natural language instructions. All VLN-CE methods in the
literature assume that language instructions are exact. However, in practice,
instructions given by humans can contain errors when describing a spatial
environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do
not address this scenario, making the state-of-the-art methods in VLN-CE
fragile in the presence of erroneous instructions from human users. For the
first time, we propose a novel benchmark dataset that introduces various types
of instruction errors considering potential human causes. This benchmark
provides valuable insight into the robustness of VLN systems in continuous
environments. We observe a noticeable performance drop (up to -25%) in Success
Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark.
Moreover, we formally define the task of Instruction Error Detection and
Localization, and establish an evaluation protocol on top of our benchmark
dataset. We also propose an effective method, based on a cross-modal
transformer architecture, that achieves the best performance in error detection
and localization, compared to baselines. Surprisingly, our proposed method has
revealed errors in the validation set of the two commonly used datasets for
VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in
other tasks. Code and dataset available at
https://intelligolabs.github.io/R2RIE-CE
[COMMENTS]
3 figures, 8 pages. Accepted at IROS’24
[LINK]
http://arxiv.org/abs/2403.10700v2
[DATE]
2025-01-15 20:45:24+08:00
[CATEGORIES]
cs.CL
How to Build an AI Tutor That Can Adapt to Any Course Using Knowledge Graph-Enhanced Retrieval-Augmented Generation (KG-RAG)
[AUTHORS]
Chenxi Dong, Yimin Yuan, Kan Chen, Shupei Cheng, Chujie Wen
[ABSTRACT]
This paper introduces a novel framework for adaptable AI tutors using
Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG). This approach
addresses the critical challenges of information hallucination and limited
course-specific adaptation prevalent in Large Language Model (LLM)-based
tutoring systems. By integrating Knowledge Graphs (KGs) with RAG, we provide a
structured representation of course concepts and their interrelationships,
grounding the AI tutor’s responses in relevant, validated material. We leverage
Qwen2.5, a powerful and cost-effective LLM, within our KG-RAG framework. A user
study (n=50) demonstrated positive student feedback regarding answer relevance,
ease of use, and overall satisfaction. This KG-RAG framework offers a promising
pathway towards personalized learning experiences and broader access to
high-quality education.
[COMMENTS]
6 pages, 5 figures
[LINK]
http://arxiv.org/abs/2311.17696v5
[DATE]
2025-01-15 19:12:26+08:00
[CATEGORIES]
cs.CL
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
[AUTHORS]
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
[ABSTRACT]
Transformers with linear attention (i.e., linear transformers) and
state-space models have recently been suggested as a viable linear-time
alternative to transformers with softmax attention. However, these models still
underperform transformers especially on tasks that require in-context
retrieval. While more expressive variants of linear transformers which replace
the additive update in linear transformers with the delta rule (DeltaNet) have
been found to be more effective at associative recall, existing algorithms for
training such models do not parallelize over sequence length and are thus
inefficient to train on modern hardware. This work describes a
hardware-efficient algorithm for training linear transformers with the delta
rule, which exploits a memory-efficient representation for computing products
of Householder matrices. This algorithm allows us to scale up DeltaNet to
standard language modeling settings. We train a 1.3B model for 100B tokens and
find that it outperforms recent linear-time baselines such as Mamba and GLA in
terms of perplexity and zero-shot performance on downstream tasks. We also
experiment with two hybrid models which combine DeltaNet layers with (1)
sliding-window attention layers every other layer or (2) two global attention
layers, and find that these hybrids outperform strong transformer baselines.
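
As a point of reference, the delta rule update can be written as a sequential recurrence over a memory matrix `S`: `S_t = S_{t-1} - beta_t * (S_{t-1} k_t - v_t) k_t^T`. The sketch below implements only this token-by-token form in plain Python; the helper names are hypothetical, and the hardware-efficient parallel algorithm the abstract describes is not shown.

```python
def read(S, q):
    """Retrieve from the memory matrix S (d_v x d_k) with query q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


def delta_rule_step(S, k, v, beta):
    """One sequential delta rule update.

    The memory is corrected by the prediction error (S k - v) along the
    key direction, instead of simply adding the outer product v k^T.
    """
    pred = read(S, k)
    return [
        [S[i][j] - beta * (pred[i] - v[i]) * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


# Writing twice to the same unit-norm key with beta = 1 overwrites the old
# value; a purely additive update would return the sum of both values.
S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S = delta_rule_step(S, key, [3.0, 4.0], beta=1.0)
first_read = read(S, key)
S = delta_rule_step(S, key, [5.0, 6.0], beta=1.0)
second_read = read(S, key)
```

Each step depends on the previous state, which is why a naive implementation runs strictly in sequence order and does not parallelize over sequence length.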
[COMMENTS]
Final camera ready
[LINK]
http://arxiv.org/abs/2406.06484v6
[DATE]
2025-01-15 18:41:40+08:00
[CATEGORIES]
cs.LG
cs.CL
Deep Learning-Based Feature Fusion for Emotion Analysis and Suicide Risk Differentiation in Chinese Psychological Support Hotlines
[AUTHORS]
Han Wang, Jianqiang Li, Qing Zhao, Zhonglong Chen, Changwei Song, Jing Tang, Yuning Huang, Wei Zhai, Yongsheng Tong, Guanghui Fu
[ABSTRACT]
Mental health is a critical global public health issue, and psychological
support hotlines play a pivotal role in providing mental health assistance and
identifying suicide risks at an early stage. However, the emotional expressions
conveyed during these calls remain underexplored in current research. This
study introduces a method that combines pitch acoustic features with deep
learning-based features to analyze and understand emotions expressed during
hotline interactions. Using data from China’s largest psychological support
hotline, our method achieved an F1-score of 79.13% for negative binary emotion
classification. Additionally, the proposed approach was validated on an open
dataset for multi-class emotion classification, where it demonstrated better
performance compared to the state-of-the-art methods. To explore its clinical
relevance, we applied the model to analyze the frequency of negative emotions
and the rate of emotional change in the conversation, comparing 46 subjects
with suicidal behavior to those without. While the suicidal group exhibited
more frequent emotional changes than the non-suicidal group, the difference was
not statistically significant. Importantly, our findings suggest that emotional
fluctuation intensity and frequency could serve as novel features for
psychological assessment scales and suicide risk prediction. The proposed method
provides valuable insights into emotional dynamics and has the potential to
advance early intervention and improve suicide prevention strategies through
integration with clinical tools and assessments. The source code is publicly
available at https://github.com/Sco-field/Speechemotionrecognition/tree/main.
[LINK]
http://arxiv.org/abs/2501.08696v1
[DATE]
2025-01-15 18:09:38+08:00
[CATEGORIES]
cs.CL
Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching
[AUTHORS]
Chuangtao Ma, Sriom Chakrabarti, Arijit Khan, Bálint Molnár
[ABSTRACT]
Traditional similarity-based schema matching methods are incapable of
resolving semantic ambiguities and conflicts in domain-specific complex mapping
scenarios due to missing commonsense and domain-specific knowledge. The
hallucination problem of large language models (LLMs) also makes it challenging
for LLM-based schema matching to address the above issues. Therefore, we
propose a Knowledge Graph-based Retrieval-Augmented Generation model for Schema
Matching, referred to as KG-RAG4SM. In particular, KG-RAG4SM introduces
novel vector-based, graph traversal-based, and query-based graph retrievals, as
well as a hybrid approach and ranking schemes that identify the most relevant
subgraphs from external large knowledge graphs (KGs). We showcase that KG-based
retrieval-augmented LLMs are capable of generating more accurate results for
complex matching cases without any re-training. Our experimental results show
that KG-RAG4SM outperforms the LLM-based state-of-the-art (SOTA) methods (e.g.,
Jellyfish-8B) by 35.89% and 30.50% in terms of precision and F1 score on the
MIMIC dataset, respectively; KG-RAG4SM with GPT-4o-mini outperforms the
pre-trained language model (PLM)-based SOTA methods (e.g., SMAT) by 69.20% and
21.97% in terms of precision and F1 score on the Synthea dataset, respectively.
The results also demonstrate that our approach is more efficient in end-to-end
schema matching, and scales to retrieve from large KGs. Our case studies on a
dataset from a real-world schema matching scenario show that the
hallucination problem of LLMs for schema matching is well mitigated by our
solution.
[COMMENTS]
Under Review
[LINK]
http://arxiv.org/abs/2501.08686v1
[DATE]
2025-01-15 17:32:37+08:00
[CATEGORIES]
cs.CL
MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities
[AUTHORS]
Savya Khosla, Kushal Kafle, Simon Jenni, Handong Zhao, John Collomosse, Jing Shi
[ABSTRACT]
While originally designed for unidirectional generative modeling,
decoder-only large language models (LLMs) are increasingly being adapted for
bidirectional modeling. However, unidirectional and bidirectional models are
typically trained separately with distinct objectives (generation and
representation learning, respectively). This separation overlooks the
opportunity for developing a more versatile language model and for these
objectives to complement each other. In this work, we introduce MAGNET, an
adaptation of decoder-only LLMs that enhances their ability to generate robust
representations and infill missing text spans, while preserving their knowledge
and text generation capabilities. MAGNET employs three self-supervised training
objectives and introduces an attention mechanism that combines bidirectional
and causal attention, enabling unified training across all objectives. Our
results demonstrate that LLMs adapted with MAGNET (1) surpass strong text
encoders on token-level and sentence-level representation learning tasks, (2)
generate contextually appropriate text infills by leveraging future context,
(3) retain the ability for open-ended text generation without exhibiting
repetition problems, and (4) preserve the knowledge gained by the LLM during
pretraining.
[LINK]
http://arxiv.org/abs/2501.08648v1
[DATE]
2025-01-15 16:24:03+08:00
[CATEGORIES]
cs.CL
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
[AUTHORS]
Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2402.16705v2
[DATE]
2025-01-15 16:20:19+08:00
[CATEGORIES]
cs.CL
cs.LG
Mitigating Knowledge Conflicts in Language Model-Driven Question Answering
[AUTHORS]
Han Cao, Zhaoyang Zhang, Xiangtian Li, Chufan Wu, Hansong Zhang, Wenqing Zhang
[ABSTRACT]
In the context of knowledge-driven seq-to-seq generation tasks, such as
document-based question answering and document summarization systems, two
fundamental knowledge sources play crucial roles: the inherent knowledge
embedded within model parameters and the external knowledge obtained through
context. Recent studies revealed a significant challenge: when there exists a
misalignment between the model’s inherent knowledge and the ground truth
answers in training data, the system may exhibit problematic behaviors during
inference, such as ignoring input context, or generating unfaithful content.
Our investigation proposes a strategy to minimize hallucination by building
explicit connection between source inputs and generated outputs. We
specifically target a common hallucination pattern in question answering,
examining how the correspondence between entities and their contexts during
model training influences the system’s performance at inference time.
[COMMENTS]
revised version, more figures
[LINK]
http://arxiv.org/abs/2411.11344v3
[DATE]
2025-01-15 15:46:15+08:00
[CATEGORIES]
cs.CL
TANQ: An open domain dataset of table answered questions
[AUTHORS]
Mubashara Akhtar, Chenxi Pang, Andreea Marzoca, Yasemin Altun, Julian Martin Eisenschlos
[ABSTRACT]
Language models, potentially augmented with tool usage such as retrieval, are
becoming the go-to means of answering questions. Understanding and answering
questions in real-world settings often requires retrieving information from
different sources, processing and aggregating data to extract insights, and
presenting complex findings in the form of structured artifacts such as novel
tables, charts, or infographics. In this paper, we introduce TANQ, the first
open domain question answering dataset where the answers require building
tables from information across multiple sources. We release the full source
attribution for every cell in the resulting table and benchmark
state-of-the-art language models in open, oracle, and closed book setups. Our
best-performing baseline, GPT4, reaches an overall F1 score of 29.1, lagging
behind human performance by 19.7 points. We analyse baselines’ performance
across different dataset attributes such as different skills required for this
task, including multi-hop reasoning, math operations, and unit conversions. We
further discuss common failures in model-generated answers, suggesting that
TANQ is a complex task with many challenges ahead.
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2405.07765v2
[DATE]
2025-01-15 15:29:20+08:00
[CATEGORIES]
cs.CL
ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair
[AUTHORS]
Hong-Viet Tran, Minh-Quy Nguyen, Van-Vinh Nguyen
[ABSTRACT]
This paper presents the results of the VLSP 2022-2023 Machine Translation
Shared Tasks, focusing on Vietnamese-Chinese and Vietnamese-Lao machine
translation. The tasks were organized as part of the 9th and 10th annual
workshops on Vietnamese Language and Speech Processing (VLSP 2022, VLSP 2023). The
objective of the shared task was to build machine translation systems,
specifically targeting Vietnamese-Chinese and Vietnamese-Lao translation
(corresponding to 4 translation directions). The submissions were evaluated on
1,000 pairs for testing (news and general domains) using established metrics
like BLEU [11] and SacreBLEU [12]. Additionally, system outputs were also
evaluated with human judgment provided by experts in Chinese and Lao languages.
These human assessments played a crucial role in ranking the performance of the
machine translation models, ensuring a more comprehensive evaluation.
[LINK]
http://arxiv.org/abs/2501.08621v1
[DATE]
2025-01-15 14:40:26+08:00
[CATEGORIES]
cs.CL
Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models
[AUTHORS]
Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller
[ABSTRACT]
All natural languages are structured hierarchically. In humans, this
structural restriction is neurologically coded: when two grammars are presented
with identical vocabularies, brain areas responsible for language processing
are only sensitive to hierarchical grammars. Using large language models
(LLMs), we investigate whether such functionally distinct hierarchical
processing regions can arise solely from exposure to large-scale language
distributions. We generate inputs using English, Italian, Japanese, or nonce
words, varying the underlying grammars to conform to either hierarchical or
linear/positional rules. Using these grammars, we first observe that language
models show distinct behaviors on hierarchical versus linearly structured
inputs. Then, we find that the components responsible for processing
hierarchical grammars are distinct from those that process linear grammars; we
causally verify this in ablation experiments. Finally, we observe that
hierarchy-selective components are also active on nonce grammars; this suggests
that hierarchy sensitivity is not tied to meaning, nor to in-distribution inputs.
[LINK]
http://arxiv.org/abs/2501.08618v1
[DATE]
2025-01-15 14:34:34+08:00
[CATEGORIES]
cs.CL
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
[AUTHORS]
Kaiqu Liang, Haimin Hu, Ryan Liu, Thomas L. Griffiths, Jaime Fernández Fisac
[ABSTRACT]
Generative AI systems like foundation models (FMs) must align well with human
values to ensure their behavior is helpful and trustworthy. While Reinforcement
Learning from Human Feedback (RLHF) has shown promise for optimizing model
performance using human judgments, existing RLHF pipelines predominantly rely
on immediate feedback, which can fail to accurately reflect the downstream
impact of an interaction on users’ utility. We demonstrate that feedback based
on evaluators’ foresight estimates of downstream consequences systematically
induces Goodhart’s Law dynamics, incentivizing misaligned behaviors like
sycophancy and deception and ultimately degrading user outcomes. To alleviate
this, we propose decoupling evaluation from prediction by refocusing RLHF on
hindsight feedback. Our theoretical analysis reveals that conditioning
evaluator feedback on downstream observations mitigates misalignment and
improves expected human utility, even when these observations are simulated by
the AI system itself. To leverage this insight in a practical alignment
algorithm, we introduce Reinforcement Learning from Hindsight Simulation
(RLHS), which first simulates plausible consequences and then elicits feedback
to assess what behaviors were genuinely beneficial in hindsight. We apply RLHS
to two widely-employed online and offline preference optimization methods –
Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) –
and show empirically that misalignment is significantly reduced with both
methods. Through an online human user study, we show that RLHS consistently
outperforms RLHF in helping users achieve their goals and earns higher
satisfaction ratings, despite being trained solely with simulated hindsight
feedback. These results underscore the importance of focusing on long-term
consequences, even simulated ones, to mitigate misalignment in RLHF.
[LINK]
http://arxiv.org/abs/2501.08617v1
[DATE]
2025-01-15 14:33:15+08:00
[CATEGORIES]
cs.LG
cs.CL
Noise-powered Multi-modal Knowledge Graph Representation Framework
[AUTHORS]
Zhuo Chen, Yin Fang, Yichi Zhang, Lingbing Guo, Jiaoyan Chen, Jeff Z. Pan, Huajun Chen, Wen Zhang
[ABSTRACT]
The rise of Multi-modal Pre-training highlights the necessity for a unified
Multi-Modal Knowledge Graph (MMKG) representation learning framework. Such a
framework is essential for embedding structured knowledge into multi-modal
Large Language Models effectively, alleviating issues like knowledge
misconceptions and multi-modal hallucinations. In this work, we explore the
efficacy of models in accurately embedding entities within MMKGs through two
pivotal tasks: Multi-modal Knowledge Graph Completion (MKGC) and Multi-modal
Entity Alignment (MMEA). Building on this foundation, we propose a novel SNAG
method that utilizes a Transformer-based architecture equipped with
modality-level noise masking to robustly integrate multi-modal entity features
in KGs. By incorporating specific training objectives for both MKGC and MMEA,
our approach achieves SOTA performance across a total of ten datasets,
demonstrating its versatility. Moreover, SNAG can not only function as a
standalone model but also enhance other existing methods, providing stable
performance improvements. Code and data are available at
https://github.com/zjukg/SNAG.
[COMMENTS]
COLING 2025 Accepted, Repo is available at
https://github.com/zjukg/SNAG
[LINK]
http://arxiv.org/abs/2403.06832v4
[DATE]
2025-01-15 14:30:19+08:00
[CATEGORIES]
cs.CL
Assessing the Alignment of FOL Closeness Metrics with Human Judgement
[AUTHORS]
Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi
[ABSTRACT]
The recent successful paradigm of solving logical reasoning problems with
tool-augmented large language models (LLMs) leverages translation of natural
language statements into First-Order Logic (FOL) and external theorem provers.
However, the correctness of FOL statements, comprising operators and text
predicates, often goes unverified due to the lack of a reliable evaluation
metric for comparing generated and ground-truth FOLs. In this paper, we present
a comprehensive study of the sensitivity of existing metrics and their alignment
with human judgement on FOL evaluation. Using ground-truth FOLs, we carefully
designed various perturbations on the ground-truth to assess metric
sensitivity. We sample FOL translation candidates for natural language
statements and measure the ranking alignment between automatic metrics and
human annotators. Our empirical findings highlight oversensitivity in the
n-gram metric BLEU for text perturbations, the semantic graph metric Smatch++
for structural perturbations, and the FOL metric for operator perturbations. We also
observe a closer alignment between BertScore and human judgement. Additionally,
we show that combining metrics enhances both alignment and sensitivity compared
to using individual metrics.
[COMMENTS]
Code: https://github.com/RamyaKeerthy/AlignmentFOL
[LINK]
http://arxiv.org/abs/2501.08613v1
[DATE]
2025-01-15 14:22:35+08:00
[CATEGORIES]
cs.CL
Comparative Analysis of Listwise Reranking with Large Language Models in Limited-Resource Language Contexts
[AUTHORS]
Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du, Yiyi Tao, Yixian Shen, Hang Zhang
[ABSTRACT]
Large Language Models (LLMs) have demonstrated significant effectiveness
across various NLP tasks, including text ranking. This study assesses the
performance of LLMs in listwise reranking for
limited-resource African languages. We compare proprietary models RankGPT3.5,
Rank4o-mini, RankGPTo1-mini and RankClaude-sonnet in cross-lingual contexts.
Results indicate that these LLMs significantly outperform traditional baseline
methods such as BM25-DT in most evaluation metrics, particularly in nDCG@10 and
MRR@100. These findings highlight the potential of LLMs in enhancing reranking
tasks for low-resource languages and offer insights into cost-effective
solutions.
[LINK]
http://arxiv.org/abs/2412.20061v2
[DATE]
2025-01-15 14:15:13+08:00
[CATEGORIES]
cs.CL
Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning
[AUTHORS]
Julian Perry, Surasakdi Siripong, Thanakorn Phonchai
[ABSTRACT]
Large Vision-Language Models (LVLMs) have demonstrated impressive
capabilities in multimodal tasks, but their performance is often constrained by
the lack of external knowledge integration, limiting their ability to handle
knowledge-intensive tasks such as visual question answering and reasoning. To
address this challenge, we propose a novel method, Adaptive Knowledge-Guided
Pretraining for Large Vision-Language Models (AKGP-LVLM), which dynamically
incorporates structured and unstructured knowledge into LVLMs during
pretraining and fine-tuning. Our approach employs a knowledge encoder to
represent external knowledge, a retrieval mechanism to select task-relevant
information, and a dynamic adaptor to align multimodal and knowledge
representations effectively. We evaluate our method on four benchmark datasets,
demonstrating significant performance improvements over state-of-the-art
models. Furthermore, human evaluations highlight the superior correctness and
relevance of our model’s outputs. Extensive analyses confirm the robustness,
efficiency, and scalability of AKGP-LVLM, making it a compelling solution for
real-world knowledge-intensive tasks.
[LINK]
http://arxiv.org/abs/2501.08597v1
[DATE]
2025-01-15 13:45:04+08:00
[CATEGORIES]
cs.CL
LoRS: Efficient Low-Rank Adaptation for Sparse Large Language Model
[AUTHORS]
Yuxuan Hu, Jing Zhang, Xiaodong Chen, Zhe Zhao, Cuiping Li, Hong Chen
[ABSTRACT]
Existing low-rank adaptation (LoRA) methods face challenges on sparse large
language models (LLMs) due to the inability to maintain sparsity. Recent works
introduced methods that maintain sparsity by augmenting LoRA techniques with
additional masking mechanisms. Despite these successes, such approaches suffer
from increased memory and computation overhead, which reduces the efficiency of
LoRA methods. In response to this limitation, we introduce LoRS, an innovative
method designed to achieve both memory and computation efficiency when
fine-tuning sparse LLMs. To mitigate the substantial memory and computation
demands associated with preserving sparsity, our approach incorporates
strategies of weight recompute and computational graph rearrangement. In
addition, we also improve the effectiveness of LoRS through better adapter
initialization. These innovations lead to a notable reduction in memory and
computation consumption during the fine-tuning phase, all while achieving
performance levels that outperform existing LoRA approaches.
[COMMENTS]
12 pages, 4 figures
[LINK]
http://arxiv.org/abs/2501.08582v1
[DATE]
2025-01-15 13:07:06+08:00
[CATEGORIES]
cs.CL
What Limits LLM-based Human Simulation: LLMs or Our Design?
[AUTHORS]
Qian Wang, Jiaying Wu, Zhenheng Tang, Bingqiao Luo, Nuo Chen, Wei Chen, Bingsheng He
[ABSTRACT]
We argue that advancing LLM-based human simulation requires addressing both
LLM’s inherent limitations and simulation framework design challenges. Recent
studies have revealed significant gaps between LLM-based human simulations and
real-world observations, highlighting these dual challenges. To address these
gaps, we present a comprehensive analysis of LLM limitations and our design
issues, proposing targeted solutions for both aspects. Furthermore, we explore
future directions that address both challenges simultaneously, particularly in
data collection, LLM generation, and evaluation. To support further research in
this field, we provide a curated collection of LLM-based human simulation
resources (https://github.com/Persdre/llm-human-simulation).
[LINK]
http://arxiv.org/abs/2501.08579v1
[DATE]
2025-01-15 12:59:49+08:00
[CATEGORIES]
cs.CL
Boosting Tool Use of Large Language Models via Iterative Reinforced Fine-Tuning
[AUTHORS]
Yirong Zeng, Xiao Ding, Yuxian Wang, Weiwen Liu, Wu Ning, Yutai Hou, Xu Huang, Bing Qin, Ting Liu
[ABSTRACT]
Augmenting large language models (LLMs) with external tools is a promising
approach to enhance their capabilities. Effectively leveraging this potential
for complex tasks hinges crucially on improving their ability to use tools.
Synthesizing tool use data by simulating the real world is an effective
approach. Nevertheless, our investigation reveals that training gains
significantly decay as the scale of these data increases. The primary factor is
the model’s poor performance (a.k.a. deficiency) in complex scenarios, which
hinders learning from data using SFT. Driven by this objective, we propose an
iterative reinforced fine-tuning strategy to continually guide the model to
alleviate it. Specifically, we first identify deficiency-related data based on
feedback from the policy model, then perform a Monte Carlo Tree Search to
collect fine-grained preference pairs to pinpoint deficiencies. Subsequently,
we update the policy model using preference optimization to align with ground
truth and misalign with deficiencies. This process can be iterated. Moreover,
before the iteration, we propose an easy-to-hard warm-up SFT strategy to
facilitate learning from challenging data. The experiments demonstrate that our
models surpass models of the same parameter scale, outperforming many larger
open-source and closed-source models. Additionally, they achieve notable
training gains in complex tool use scenarios.
[LINK]
http://arxiv.org/abs/2501.09766v1
[DATE]
2025-01-15 12:52:34+08:00
[CATEGORIES]
cs.CL
cs.LG
Do Large Language Models Mirror Cognitive Language Processing?
[AUTHORS]
Yuqi Ren, Renren Jin, Tongxuan Zhang, Deyi Xiong
[ABSTRACT]
Large Language Models (LLMs) have demonstrated remarkable abilities in text
comprehension and logical reasoning, indicating that the text representations
learned by LLMs can facilitate their language processing capabilities. In
neuroscience, brain cognitive processing signals are typically utilized to
study human language processing. Therefore, it is natural to ask how well the
text embeddings from LLMs align with the brain cognitive processing signals,
and how training strategies affect the LLM-brain alignment. In this paper, we
employ Representational Similarity Analysis (RSA) to measure the alignment
between 23 mainstream LLMs and fMRI signals of the brain to evaluate how
effectively LLMs simulate cognitive language processing. We empirically
investigate the impact of various factors (e.g., pre-training data size, model
scaling, alignment training, and prompts) on such LLM-brain alignment.
Experimental results indicate that pre-training data size and model scaling are
positively correlated with LLM-brain similarity, and alignment training can
significantly improve LLM-brain similarity. Explicit prompts contribute to the
consistency of LLMs with brain cognitive language processing, while nonsensical
noisy prompts may attenuate such alignment. Additionally, the performance of a
wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated
with the LLM-brain similarity.
[LINK]
http://arxiv.org/abs/2402.18023v3
[DATE]
2025-01-15 12:47:36+08:00
[CATEGORIES]
cs.CL
Information Entropy Invariance: Enhancing Length Extrapolation in Attention Mechanisms
[AUTHORS]
Kewei Li, Yanwen Kong, Yiping Xu, Lan Huang, Ruochi Zhang, Fengfeng Zhou
[ABSTRACT]
Improving the length extrapolation capabilities of Large Language Models
(LLMs) remains a critical challenge in natural language processing. Many recent
efforts have focused on modifying the scaled dot-product attention mechanism,
and often introduce scaled temperatures without rigorous theoretical
justification. To fill this gap, we introduce a novel approach based on
information entropy invariance. We propose two new scaled temperatures to
enhance length extrapolation. First, a training-free method InfoScale is
designed for dot-product attention, and preserves focus on original tokens
during length extrapolation by ensuring information entropy remains consistent.
Second, we theoretically analyze the impact of scaling (CosScale) on cosine
attention. Experimental data demonstrates that combining InfoScale and CosScale
achieves state-of-the-art performance on the GAU-α model with a context
window extended to 64 times the training length, and outperforms seven existing
methods. Our analysis reveals that significantly increasing CosScale
approximates windowed attention, and highlights the significance of attention
score dilution as a key challenge in long-range context handling. The code and
data are available at https://github.com/HT-NEKO/InfoScale.
[LINK]
http://arxiv.org/abs/2501.08570v1
[DATE]
2025-01-15 12:32:41+08:00
[CATEGORIES]
cs.CL
Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing
[AUTHORS]
Fan Yuan, Xiaoyuan Fang, Rong Quan, Jing Li, Wei Bi, Xiaogang Xu, Piji Li
[ABSTRACT]
Visual Commonsense Reasoning, which is regarded as one challenging task to
pursue advanced visual scene comprehension, has been used to diagnose the
reasoning ability of AI systems. However, reliable reasoning requires a good
grasp of the scene’s details. Existing work fails to effectively exploit the
real-world object relationship information present within the scene, and
instead overly relies on knowledge from training memory. Based on these
observations, we propose a novel scene-graph-enhanced visual commonsense
reasoning generation method named G2, which first utilizes
the image patches and LLMs to construct a location-free scene graph, and then
answers and explains based on the scene graph’s information. We also propose
automatic scene graph filtering and selection strategies to absorb valuable
scene graph information during training. Extensive experiments are conducted on
the tasks and datasets of scene graph constructing and visual commonsense
answering and explaining, respectively. Experimental results and ablation
analysis demonstrate the effectiveness of our proposed framework.
[LINK]
http://arxiv.org/abs/2501.09041v1
[DATE]
2025-01-15 12:00:36+08:00
[CATEGORIES]
cs.CL
Natural Language Outlines for Code: Literate Programming in the LLM Era
[AUTHORS]
Kensen Shi, Deniz Altınbüken, Saswat Anand, Mihai Christodorescu, Katja Grünwedel, Alexa Koenings, Sai Naidu, Anurag Pathak, Marc Rasi, Fredde Ribeiro, Brandon Ruffin, Siddhant Sanyam, Maxim Tabachnyk, Sara Toth, Roy Tu, Tobias Welp, Pengcheng Yin, Manzil Zaheer, Satish Chandra, Charles Sutton
[ABSTRACT]
We propose using natural language outlines as a novel modality and
interaction surface for providing AI assistance to developers throughout the
software development process. An NL outline for a code function comprises
multiple statements written in concise prose, which partition the code and
summarize its main ideas in the style of literate programming. Crucially, we
find that modern LLMs can generate accurate and high-quality NL outlines in
practice. Moreover, NL outlines enable a bidirectional sync between code and
NL, allowing changes in one to be automatically reflected in the other. We
discuss many use cases for NL outlines: they can accelerate understanding and
navigation of code and diffs, simplify code maintenance, augment code search,
steer code generation, and more. We then propose and compare multiple LLM
prompting techniques for generating outlines and ask professional developers to
judge outline quality. Finally, we present two case studies applying NL
outlines toward code review and malware detection.
[LINK]
http://arxiv.org/abs/2408.04820v2
[DATE]
2025-01-15 11:43:22+08:00
[CATEGORIES]
cs.CL
cs.LG
Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
[AUTHORS]
Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng
[ABSTRACT]
Large Language Models (LLMs) excel in various natural language processing
tasks but struggle with hallucination issues. Existing solutions have
considered utilizing LLMs’ inherent reasoning abilities to alleviate
hallucination, such as self-correction and diverse sampling methods. However,
these methods often overtrust LLMs’ initial answers due to inherent biases. The
key to alleviating this issue lies in overriding LLMs’ inherent biases for
answer inspection. To this end, we propose a CounterFactual Multi-Agent Debate
(CFMAD) framework. CFMAD presets the stances of LLMs to override their inherent
biases by compelling LLMs to generate justifications for a predetermined
answer’s correctness. The LLMs with different predetermined stances are engaged
with a skeptical critic for counterfactual debate on the rationality of
generated justifications. Finally, the debate process is evaluated by a
third-party judge to determine the final answer. Extensive experiments on four
datasets of three tasks demonstrate the superiority of CFMAD over existing
methods.
[COMMENTS]
accepted by COLING 2025
[LINK]
http://arxiv.org/abs/2406.11514v2
[DATE]
2025-01-15 11:20:24+08:00
[CATEGORIES]
cs.CL
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model
[AUTHORS]
Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang
[ABSTRACT]
As retrieval-augmented generation prevails in large language models,
embedding models are becoming increasingly crucial. Despite the growing number
of general embedding models, prior work often overlooks the critical role of
training data quality. In this work, we introduce KaLM-Embedding, a general
multilingual embedding model that leverages a large quantity of cleaner, more
diverse, and domain-specific training data. Our model has been trained with key
techniques proven to enhance performance: (1) persona-based synthetic data to
create diversified examples distilled from LLMs, (2) ranking consistency
filtering to remove less informative samples, and (3) semi-homogeneous task
batch sampling to improve training efficacy. Departing from traditional
BERT-like architectures, we adopt Qwen2-0.5B as the pre-trained model,
facilitating the adaptation of auto-regressive language models for general
embedding tasks. Extensive evaluations on the MTEB benchmark across multiple
languages show that our model outperforms others of comparable size, setting a
new standard for multilingual embedding models with <1B parameters.
[COMMENTS]
Technical Report. 23 pages, 6 figures, 10 tables
[LINK]
http://arxiv.org/abs/2501.01028v4
[DATE]
2025-01-15 11:02:22+08:00
[CATEGORIES]
cs.CL
Knowledge prompt chaining for semantic modeling
[AUTHORS]
Ning Pei Ding, Jingge Du, Zaiwen Feng
[ABSTRACT]
The task of building semantics for structured data such as CSV, JSON, and XML
files is highly relevant in the knowledge representation field. Even though we
have a vast amount of structured data on the internet, mapping them to domain
ontologies to build semantics for them is still very challenging as it requires
the construction model to understand and learn graph-structured knowledge.
Otherwise, the task will require human beings’ effort and cost. In this paper,
we propose a novel automatic semantic modeling framework: Knowledge Prompt
Chaining. It can serialize the graph-structured knowledge and inject it into
the LLMs properly in a Prompt Chaining architecture. Through this knowledge
injection and prompt chaining, the model in our framework can learn the
structure information and latent space of the graph and generate the semantic
labels and semantic graphs following the chains’ instruction naturally. Based
on experimental results, our method achieves better performance than existing
leading techniques, despite using reduced structured input data.
[LINK]
http://arxiv.org/abs/2501.08540v1
[DATE]
2025-01-15 11:00:57+08:00
[CATEGORIES]
cs.CL
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
[AUTHORS]
Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen
[ABSTRACT]
Large language models excel at interpreting complex natural language
instructions, enabling them to perform a wide range of tasks. In the life
sciences, single-cell RNA sequencing (scRNA-seq) data serves as the “language
of cellular biology”, capturing intricate gene expression patterns at the
single-cell level. However, interacting with this “language” through
conventional tools is often inefficient and unintuitive, posing challenges for
researchers. To address these limitations, we present InstructCell, a
multi-modal AI copilot that leverages natural language as a medium for more
direct and flexible single-cell analysis. We construct a comprehensive
multi-modal instruction dataset that pairs text-based instructions with
scRNA-seq profiles from diverse tissues and species. Building on this, we
develop a multi-modal cell language architecture capable of simultaneously
interpreting and processing both modalities. InstructCell empowers researchers
to accomplish critical tasks, such as cell type annotation, conditional
pseudo-cell generation, and drug sensitivity prediction, using straightforward
natural language commands. Extensive evaluations demonstrate that InstructCell
consistently meets or exceeds the performance of existing single-cell
foundation models, while adapting to diverse experimental conditions. More
importantly, InstructCell provides an accessible and intuitive tool for
exploring complex single-cell data, lowering technical barriers and enabling
deeper biological insights.
[COMMENTS]
37 pages; 13 figures; Code: https://github.com/zjunlp/Instructcell,
Models: https://huggingface.co/zjunlp/Instructcell-chat,
https://huggingface.co/zjunlp/InstructCell-instruct
[LINK]
http://arxiv.org/abs/2501.08187v2
[DATE]
2025-01-15 10:59:32+08:00
[CATEGORIES]
cs.CL
cs.LG
Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers
[AUTHORS]
Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu
[ABSTRACT]
Transformers have demonstrated impressive capabilities across various tasks,
yet their performance on compositional problems remains a subject of debate. In
this study, we investigate the internal mechanisms underlying Transformers’
behavior in compositional tasks. We find that complexity control strategies
significantly influence whether the model learns primitive-level rules that
generalize out-of-distribution (reasoning-based solutions) or relies solely on
memorized mappings (memory-based solutions). By applying masking strategies to
the model’s information circuits and employing multiple complexity metrics, we
reveal distinct internal working mechanisms associated with different solution
types. Further analysis reveals that reasoning-based solutions exhibit a lower
complexity bias, which aligns with the well-studied neuron condensation
phenomenon. This lower complexity bias is hypothesized to be the key factor
enabling these solutions to learn reasoning rules. We validate these
conclusions across multiple real-world datasets, including image generation and
natural language processing tasks, confirming the broad applicability of our
findings.
[COMMENTS]
Mistakenly submitted as a replacement to 2405.05409v4
[LINK]
http://arxiv.org/abs/2501.08537v1
[DATE]
2025-01-15 10:54:52+08:00
[CATEGORIES]
cs.CL
cs.LG
Understanding Emergent Abilities of Language Models from the Loss Perspective
[AUTHORS]
Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang
[ABSTRACT]
Recent studies have put into question the belief that emergent abilities in
language models are exclusive to large models. This skepticism arises from two
observations: 1) smaller models can also exhibit high performance on emergent
abilities and 2) there is doubt about the discontinuous metrics used to measure
these abilities. In this paper, we propose to study emergent abilities through the
lens of pre-training loss, instead of model size or training compute. We
demonstrate that Transformer models with the same pre-training loss, but
different model and data sizes, achieve the same performance on various
downstream tasks, with a fixed data corpus, tokenization, and model
architecture. We also discover that a model exhibits emergent abilities on
certain tasks – regardless of the continuity of metrics – when its
pre-training loss falls below a specific threshold. Before reaching this
threshold, its performance remains at the level of random guessing. This
inspires us to redefine emergent abilities as those that manifest in models
with lower pre-training losses, highlighting that these abilities cannot be
predicted by merely extrapolating the performance trends of models with higher
pre-training losses.
[COMMENTS]
23 pages, 8 figures. Accepted in NeurIPS 2024
[LINK]
http://arxiv.org/abs/2403.15796v3
[DATE]
2025-01-15 10:48:59+08:00
[CATEGORIES]
cs.CL
cs.LG
Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation
[AUTHORS]
Jiaxin Guo, Yuanchang Luo, Daimeng Wei, Ling Zhang, Zongyao Li, Hengchao Shang, Zhiqiang Rao, Shaojun Li, Jinlong Yang, Zhanglin Wu, Hao Yang
[ABSTRACT]
The field of artificial intelligence has witnessed significant advancements
in natural language processing, largely attributed to the capabilities of Large
Language Models (LLMs). These models form the backbone of Agents designed to
address long-context dependencies, particularly in Document-level Machine
Translation (DocMT). DocMT presents unique challenges, with quality,
consistency, and fluency being the key metrics for evaluation. Existing
approaches, such as Doc2Doc and Doc2Sent, either omit sentences or compromise
fluency. This paper introduces Doc-Guided Sent2Sent++, an Agent that employs an
incremental sentence-level forced decoding strategy to ensure every
sentence is translated while enhancing the fluency of adjacent sentences. Our
Agent leverages a Doc-Guided Memory, focusing solely on the summary and its
translation, which we find to be an efficient approach to maintaining
consistency. Through extensive testing across multiple languages and domains,
we demonstrate that Sent2Sent++ outperforms other methods in terms of quality,
consistency, and fluency. The results indicate that our approach has achieved
significant improvements in metrics such as s-COMET, d-COMET, LTCR-$1_f$, and
document-level perplexity (d-ppl). The contributions of this paper include a
detailed analysis of current DocMT research, the introduction of the
Sent2Sent++ decoding method, the Doc-Guided Memory mechanism, and validation of
its effectiveness across languages and domains.
[LINK]
http://arxiv.org/abs/2501.08523v1
[DATE]
2025-01-15 10:25:35+08:00
[CATEGORIES]
cs.CL
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
[AUTHORS]
Beyazit Yalcinkaya, Niklas Lauffer, Marcell Vazquez-Chanlatte, Sanjit A. Seshia
[ABSTRACT]
Goal-conditioned reinforcement learning is a powerful way to control an AI
agent’s behavior at runtime. That said, popular goal representations, e.g.,
target states or natural language, are either limited to Markovian tasks or
rely on ambiguous task semantics. We propose representing temporal goals using
compositions of deterministic finite automata (cDFAs) and use cDFAs to guide RL
agents. cDFAs balance the need for formal temporal semantics with ease of
interpretation: if one can understand a flow chart, one can understand a cDFA.
On the other hand, cDFAs form a countably infinite concept class with Boolean
semantics, and subtle changes to the automaton can result in very different
tasks, making them difficult to condition agent behavior on. To address this,
we observe that all paths through a DFA correspond to a series of reach-avoid
tasks and propose pre-training graph neural network embeddings on “reach-avoid
derived” DFAs. Through empirical evaluation, we demonstrate that the proposed
pre-training method enables zero-shot generalization to various cDFA task
classes and accelerated policy specialization without the myopic suboptimality
of hierarchical methods.
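The observation that a path through a DFA corresponds to a series of reach-avoid tasks can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the authors' construction: the transition-table encoding, the function name `reach_avoid_sequence`, the convention that missing transitions are self-loops, and the toy key/door/lava automaton are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# (state, symbol) -> next state; missing entries are treated as self-loops
# (an assumption of this sketch, not something stated in the abstract).
Transitions = Dict[Tuple[str, str], str]


def reach_avoid_sequence(
    delta: Transitions, alphabet: Set[str], path: List[str]
) -> List[Tuple[Set[str], Set[str]]]:
    """Turn a state path into a list of (reach, avoid) symbol sets.

    For each step q -> q_next along the path:
      reach = symbols that move the automaton from q to q_next
      avoid = symbols that move it anywhere other than q or q_next
    """
    tasks = []
    for q, q_next in zip(path, path[1:]):
        reach = {a for a in alphabet if delta.get((q, a), q) == q_next}
        avoid = {a for a in alphabet if delta.get((q, a), q) not in (q, q_next)}
        tasks.append((reach, avoid))
    return tasks


# Hypothetical toy task: pick up a key, then open a door, never touch lava.
alphabet = {"key", "door", "lava"}
delta = {
    ("q0", "key"): "q1",
    ("q0", "lava"): "fail",
    ("q1", "door"): "acc",
    ("q1", "lava"): "fail",
}
tasks = reach_avoid_sequence(delta, alphabet, ["q0", "q1", "acc"])
```

Here the accepting path `q0 -> q1 -> acc` decomposes into two steps: reach `key` while avoiding `lava`, then reach `door` while avoiding `lava`.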
[LINK]
http://arxiv.org/abs/2411.00205v2
[DATE]
2025-01-15 09:46:25+08:00
[CATEGORIES]
cs.LG
cs.CL
2 OLMo 2 Furious
[AUTHORS]
Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, Lester James V. Miranda, Jacob Morrison, Tyler Murray, Crystal Nam, Valentina Pyatkin, Aman Rangapur, Michael Schmitz, Sam Skjonsberg, David Wadden, Christopher Wilhelm, Michael Wilson, Luke Zettlemoyer, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
[ABSTRACT]
We present OLMo 2, the next generation of our fully open language models.
OLMo 2 includes dense autoregressive models with improved architecture and
training recipe, pretraining data mixtures, and instruction tuning recipes. Our
modified model architecture and training recipe achieve both better training
stability and improved per-token efficiency. Our updated pretraining data
mixture introduces a new, specialized data mix called Dolmino Mix 1124, which
significantly improves model capabilities across many downstream task
benchmarks when introduced via late-stage curriculum training (i.e. specialized
data during the annealing phase of pretraining). Finally, we incorporate best
practices from Tülu 3 to develop OLMo 2-Instruct, focusing on permissive data
and extending our final-stage reinforcement learning with verifiable rewards
(RLVR). Our OLMo 2 base models sit at the Pareto frontier of performance to
compute, often matching or outperforming open-weight only models like Llama 3.1
and Qwen 2.5 while using fewer FLOPs and with fully transparent training data,
code, and recipe. Our fully open OLMo 2-Instruct models are competitive with or
surpass open-weight only models of comparable size, including Qwen 2.5,
Llama 3.1 and Gemma 2. We release all OLMo 2 artifacts openly – models at 7B
and 13B scales, both pretrained and post-trained, including their full training
data, training code and recipes, training logs and thousands of intermediate
checkpoints. The final instruction model is available on the Ai2 Playground as
a free research demo.
[COMMENTS]
Model demo available at playground.allenai.org
[LINK]
http://arxiv.org/abs/2501.00656v2
[DATE]
2025-01-15 09:44:16+08:00
[CATEGORIES]
cs.CL
cs.LG
Adapting Whisper for Regional Dialects: Enhancing Public Services for Vulnerable Populations in the United Kingdom
[AUTHORS]
Melissa Torgbi, Andrew Clayman, Jordan J. Speight, Harish Tayyar Madabushi
[ABSTRACT]
We collect novel data in the public service domain to evaluate the capability
of the state-of-the-art automatic speech recognition (ASR) models in capturing
regional differences in accents in the United Kingdom (UK), specifically
focusing on two accents from Scotland with distinct dialects. This study
addresses real-world problems where biased ASR models can lead to
miscommunication in public services, disadvantaging individuals with regional
accents, particularly those in vulnerable populations. We first examine the
out-of-the-box performance of the Whisper large-v3 model on a baseline dataset
and our data. We then explore the impact of fine-tuning Whisper on the
performance in the two UK regions and investigate the effectiveness of existing
model evaluation techniques for our real-world application through manual
inspection of model errors. We observe that the Whisper model has a higher word
error rate (WER) on our test datasets than on the baseline data, and that
fine-tuning on a given dataset improves performance on the test dataset with the
same domain and accent. The fine-tuned models also appear to show improved
performance when applied to test data from outside the region they were trained
on, suggesting that fine-tuned models may be transferable within parts of the
UK. Our manual analysis of model outputs reveals the benefits and drawbacks of
using WER as an evaluation metric and fine-tuning to adapt to regional
dialects.
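For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution and one deletion
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because every substitution, deletion, and insertion counts equally, the score does not distinguish a harmless spelling variant from an error that changes the meaning of an utterance.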
[LINK]
http://arxiv.org/abs/2501.08502v1
[DATE]
2025-01-15 08:39:21+08:00
[CATEGORIES]
cs.CL
Automated Review Generation Method Based on Large Language Models
[AUTHORS]
Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Changying Du, Zhi-Jian Zhao, Jinlong Gong
[ABSTRACT]
Literature research, vital for scientific work, faces the challenge of
surging information volumes exceeding researchers’ processing capabilities. We
present an automated review generation method based on large language models
(LLMs) to overcome efficiency bottlenecks and reduce cognitive load. Our
statistically validated evaluation framework demonstrates that the generated
reviews match or exceed manual quality, offering broad applicability across
research fields without requiring users’ domain knowledge. Applied to propane
dehydrogenation (PDH) catalysts, our method swiftly analyzed 343 articles,
averaging seconds per article per LLM account, producing comprehensive reviews
spanning 35 topics, with extended analysis of 1041 articles providing insights
into catalysts’ properties. Through multi-layered quality control, we
effectively mitigated LLMs’ hallucinations, with expert verification confirming
accuracy and citation integrity while demonstrating hallucination risks reduced
to below 0.5% with 95% confidence. A released Windows application enables
one-click review generation, enhancing research productivity and literature
recommendation efficiency while setting the stage for broader scientific
explorations.
[COMMENTS]
21 pages, 5 figures, 1 table. Code:
https://github.com/TJU-ECAT-AI/AutomaticReviewGeneration Data:
https://github.com/TJU-ECAT-AI/AutomaticReviewGenerationData This research
has been invited for a Short Oral presentation at the 18th ICC -
International Congress on Catalysis, taking place in Lyon, France from July
14-19, 2024
[LINK]
http://arxiv.org/abs/2407.20906v4
[DATE]
2025-01-15 08:10:57+08:00
[CATEGORIES]
cs.CL
Quantifying the Importance of Data Alignment in Downstream Model Performance
[AUTHORS]
Krrish Chawla, Aryan Sahai, Mario DePavia, Sudharsan Sundar, Brando Miranda
[ABSTRACT]
Contrary to the conventional emphasis on dataset size, we explore the role of
data alignment – an often overlooked aspect of data quality – in training
capable Large Language Models (LLMs). To do so, we use the Task2Vec-based
alignment coefficient, a quantitative measure of the similarity between two
datasets, to quantify the impact of alignment between training data and
evaluation data on downstream performance. In particular, we conduct controlled
interventional experiments for two settings: 1. the impact of
increased alignment coefficients between various pre-training (pt) datasets and
evaluation datasets, and 2. the impact of increased alignment coefficients
between domain-specific fine-tuning (ft) data and domain-specific evaluation data.
The domain-specific task we explore is Autoformalization – the machine
translation task between natural language and code for formal verification. In
both settings, we find a strong, predictable negative correlation between the
alignment coefficient of a model’s training and evaluation data and the model’s
loss/perplexity on the respective downstream task. These findings suggest a
re-evaluation of LLM training approaches, demonstrating the relevance of data
alignment compared to data quantity, especially in specialized downstream tasks
such as Autoformalization.
[LINK]
http://arxiv.org/abs/2501.08496v1
[DATE]
2025-01-15 07:59:23+08:00
[CATEGORIES]
cs.CL
cs.LG
The Theater Stage as Laboratory: Review of Real-Time Comedy LLM Systems for Live Performance
[AUTHORS]
Piotr Wojciech Mirowski, Boyd Branch, Kory Wallace Mathewson
[ABSTRACT]
In this position paper, we review the eclectic recent history of academic and
artistic works involving computational systems for humor generation, and focus
specifically on live performance. We make the case that AI comedy should be
evaluated in live conditions, in front of audiences sharing either physical or
online spaces, and under real-time constraints. We further suggest that
improvised comedy is therefore the perfect substrate for deploying and
assessing computational humor systems. Using examples of successful AI-infused
shows, we demonstrate that live performance raises three sets of challenges for
computational humor generation: 1) questions around robotic embodiment,
anthropomorphism and competition between humans and machines, 2) questions
around comedic timing and the nature of audience interaction, and 3) questions
about the human interpretation of seemingly absurd AI-generated humor. We argue
that these questions impact the choice of methodologies for evaluating
computational humor, as any such method needs to work around the constraints of
live audiences and performance spaces. These interrogations also highlight
different types of collaborative relationships between human comedians and AI
tools.
[COMMENTS]
8 pages, 1st Workshop on Computational Humor (CHum), COLING 2025
[LINK]
http://arxiv.org/abs/2501.08474v1
[DATE]
2025-01-15 06:38:55+08:00
[CATEGORIES]
cs.CL
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time
[AUTHORS]
Mihai Masala, Marius Leordeanu
[ABSTRACT]
In the current era of Machine Learning, Transformers have become the de facto
approach across a variety of domains, such as computer vision and natural
language processing. Transformer-based solutions are the backbone of current
state-of-the-art methods for language generation, image and video
classification, segmentation, action and object recognition, among many others.
Interestingly enough, while these state-of-the-art methods produce impressive
results in their respective domains, the problem of understanding the
relationship between vision and language is still beyond our reach. In this
work, we propose a common ground between vision and language based on events in
space and time in an explainable and programmatic way, to connect
learning-based vision and language state-of-the-art models and provide a
solution to the long-standing problem of describing videos in natural language.
We validate that our algorithmic approach is able to generate coherent, rich
and relevant textual descriptions on videos collected from a variety of
datasets, using both standard metrics (e.g. BLEU, ROUGE) and the modern
LLM-as-a-Jury approach.
[LINK]
http://arxiv.org/abs/2501.08460v1
[DATE]
2025-01-15 06:09:06+08:00
[CATEGORIES]
cs.CL
Jochre 3 and the Yiddish OCR corpus
[AUTHORS]
Assaf Urieli, Amber Clooney, Michelle Sigiel, Grisha Leyfer
[ABSTRACT]
We describe the construction of a publicly available Yiddish OCR Corpus, and
describe and evaluate the open source OCR tool suite Jochre 3, including an
Alto editor for corpus annotation, OCR software for Alto OCR layer generation,
and a customizable OCR search engine. The current version of the Yiddish OCR
corpus contains 658 pages, 186K tokens and 840K glyphs. The Jochre 3 OCR tool
uses various fine-tuned YOLOv8 models for top-down page layout analysis, and a
custom CNN network for glyph recognition. It attains a CER of 1.5% on our test
corpus, far out-performing all other existing public models for Yiddish. We
analyzed the full 660M word Yiddish Book Center with Jochre 3 OCR, and the new
OCR is searchable through the Yiddish Book Center OCR search engine.
[COMMENTS]
10 pages, 4 figures
[LINK]
http://arxiv.org/abs/2501.08442v1
[DATE]
2025-01-15 05:21:39+08:00
[CATEGORIES]
cs.CL
Religious Bias Landscape in Language and Text-to-Image Models: Analysis, Detection, and Debiasing Strategies
[AUTHORS]
Ajwad Abrar, Nafisa Tabassum Oeshy, Mohsinul Kabir, Sophia Ananiadou
[ABSTRACT]
Note: This paper includes examples of potentially offensive content related
to religious bias, presented solely for academic purposes. The widespread
adoption of language models highlights the need for critical examinations of
their inherent biases, particularly concerning religion. This study
systematically investigates religious bias in both language models and
text-to-image generation models, analyzing both open-source and closed-source
systems. We construct approximately 400 unique, naturally occurring prompts to
probe language models for religious bias across diverse tasks, including mask
filling, prompt completion, and image generation. Our experiments reveal
concerning instances of underlying stereotypes and biases associated
disproportionately with certain religions. Additionally, we explore
cross-domain biases, examining how religious bias intersects with demographic
factors such as gender, age, and nationality. This study further evaluates the
effectiveness of targeted debiasing techniques by employing corrective prompts
designed to mitigate the identified biases. Our findings demonstrate that
language models continue to exhibit significant biases in both text and image
generation tasks, emphasizing the urgent need to develop fairer language models
to achieve global acceptability.
[LINK]
http://arxiv.org/abs/2501.08441v1
[DATE]
2025-01-15 05:10:08+08:00
[CATEGORIES]
cs.CL
A Multi-way Parallel Named Entity Annotated Corpus for English, Tamil and Sinhala
[AUTHORS]
Surangika Ranathunga, Asanka Ranasinghea, Janaka Shamala, Ayodya Dandeniyaa, Rashmi Galappaththia, Malithi Samaraweeraa
[ABSTRACT]
This paper presents a multi-way parallel English-Tamil-Sinhala corpus
annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource
languages. Using pre-trained multilingual Language Models (mLMs), we establish
new benchmark Named Entity Recognition (NER) results on this dataset for
Sinhala and Tamil. We also carry out a detailed investigation on the NER
capabilities of different types of mLMs. Finally, we demonstrate the utility of
our NER system on a low-resource Neural Machine Translation (NMT) task. Our
dataset is publicly released: https://github.com/suralk/multiNER.
[LINK]
http://arxiv.org/abs/2412.02056v2
[DATE]
2025-01-15 05:02:56+08:00
[CATEGORIES]
cs.CL
Ensemble of Large Language Models for Curated Labeling and Rating of Free-text Data
[AUTHORS]
Jiaxing Qiu, Dongliang Guo, Papini Natalie, Peace Noelle, Levinson Cheri, Teague R. Henry
[ABSTRACT]
Free-text responses are commonly collected in psychological studies,
providing rich qualitative insights that quantitative measures may not capture.
Labeling curated topics of research interest in free-text data by multiple
trained human coders is typically labor-intensive and time-consuming. Though
large language models (LLMs) excel in language processing, LLM-assisted
labeling techniques relying on closed-source LLMs cannot be directly applied to
free-text data without explicit consent for external use.
In this study, we propose a framework of assembling locally-deployable LLMs
to enhance the labeling of predetermined topics in free-text data under privacy
constraints. Analogous to annotation by multiple human raters, this framework
leverages the heterogeneity of diverse open-source LLMs. The ensemble approach
seeks a balance between the agreement and disagreement across LLMs, guided by a
relevancy scoring methodology that utilizes embedding distances between topic
descriptions and LLMs’ reasoning. We evaluated the ensemble approach using both
publicly accessible Reddit data from eating disorder related forums, and
free-text responses from eating disorder patients, both complemented by human
annotations.
We found that: (1) there is heterogeneity in the performance of labeling
among same-sized LLMs, with some showing low sensitivity but high precision,
while others exhibit high sensitivity but low precision. (2) Compared to
individual LLMs, the ensemble of LLMs achieved the highest accuracy and optimal
precision-sensitivity trade-off in predicting human annotations. (3) The
relevancy scores across LLMs showed greater agreement than dichotomous labels,
indicating that the relevancy scoring method effectively mitigates the
heterogeneity in LLMs’ labeling.
[LINK]
http://arxiv.org/abs/2501.08413v1
[DATE]
2025-01-15 04:08:16+08:00
[CATEGORIES]
cs.CL
OptiChat: Bridging Optimization Models and Practitioners with Large Language Models
[AUTHORS]
Hao Chen, Gonzalo Esteban Constante-Flores, Krishna Sri Ipsit Mantri, Sai Madhukiran Kompalli, Akshdeep Singh Ahluwalia, Can Li
[ABSTRACT]
Optimization models have been applied to solve a wide variety of
decision-making problems. These models are usually developed by optimization
experts but are used by practitioners without optimization expertise in various
application domains. As a result, practitioners often struggle to interact with
and draw useful conclusions from optimization models independently. To fill
this gap, we introduce OptiChat, a natural language dialogue system designed to
help practitioners interpret model formulation, diagnose infeasibility, analyze
sensitivity, retrieve information, evaluate modifications, and provide
counterfactual explanations. By augmenting large language models (LLMs) with
functional calls and code generation tailored for optimization models, we
enable seamless interaction and minimize the risk of hallucinations in
OptiChat. We develop a new dataset to evaluate OptiChat’s performance in
explaining optimization models. Experiments demonstrate that OptiChat
effectively bridges the gap between optimization models and practitioners,
delivering autonomous, accurate, and instant responses.
[LINK]
http://arxiv.org/abs/2501.08406v1
[DATE]
2025-01-15 03:53:58+08:00
[CATEGORIES]
cs.CL
cs.LG
PokerBench: Training Large Language Models to become Professional Poker Players
[AUTHORS]
Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli
[ABSTRACT]
We introduce PokerBench - a benchmark for evaluating the poker-playing
abilities of large language models (LLMs). As LLMs excel in traditional NLP
tasks, their application to complex, strategic games like poker poses a new
challenge. Poker, an incomplete information game, demands a multitude of skills
such as mathematics, reasoning, planning, strategy, and a deep understanding of
game theory and human psychology. This makes Poker the ideal next frontier for
large language models. PokerBench consists of a comprehensive compilation of
11,000 most important scenarios, split between pre-flop and post-flop play,
developed in collaboration with trained poker players. We evaluate prominent
models including GPT-4, ChatGPT 3.5, and various Llama and Gemma series models,
finding that all state-of-the-art LLMs underperform in playing optimal poker.
However, after fine-tuning, these models show marked improvements. We validate
PokerBench by having models with different scores compete with each other,
demonstrating that higher scores on PokerBench lead to higher win rates in
actual poker games. Through gameplay between our fine-tuned model and GPT-4, we
also identify limitations of simple supervised fine-tuning for learning optimal
playing strategy, suggesting the need for more advanced methodologies for
effectively training language models to excel in games. PokerBench thus
presents a unique benchmark for a quick and reliable evaluation of the
poker-playing ability of LLMs as well as a comprehensive benchmark to study the
progress of LLMs in complex game-playing scenarios. The dataset and code will
be made available at: https://github.com/pokerllm/pokerbench.
[COMMENTS]
AAAI 2025
[LINK]
http://arxiv.org/abs/2501.08328v1
[DATE]
2025-01-15 02:59:03+08:00
[CATEGORIES]
cs.CL
Multigenre AI-powered Story Composition
[AUTHORS]
Edirlei Soares de Lima, Margot M. E. Neggers, Antonio L. Furtado
[ABSTRACT]
This paper shows how to construct genre patterns, whose purpose is to guide
interactive story composition in a way that enforces thematic consistency. To
start the discussion we argue, based on previous seminal works, for the
existence of five fundamental genres, namely comedy, romance (in the sense of
epic plots, flourishing since the twelfth century), tragedy, satire, and
mystery. To construct the patterns, a simple two-phase process is employed:
first retrieving examples that match our genre characterizations, and then
applying a form of most specific generalization to the groups of examples in
order to find their commonalities. In both phases, AI agents are instrumental,
with our PatternTeller prototype being called to operate the story composition
process, offering the opportunity to generate stories from a given premise of
the user, to be developed under the guidance of the chosen pattern and trying
to accommodate the user’s suggestions along the composition stages.
[COMMENTS]
Added publication details to references that were published after the
submission of the previous version (references [18] and [19])
[LINK]
http://arxiv.org/abs/2405.06685v2
[DATE]
2025-01-15 02:58:42+08:00
[CATEGORIES]
cs.CL
MiniMax-01: Scaling Foundation Models with Lightning Attention
[AUTHORS]
MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu
[ABSTRACT]
We introduce the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01,
which are comparable to top-tier models while offering superior capabilities in
processing longer contexts. The core lies in lightning attention and its
efficient scaling. To maximize computational capacity, we integrate it with
Mixture of Experts (MoE), creating a model with 32 experts and 456 billion
total parameters, of which 45.9 billion are activated for each token. We
develop an optimized parallel strategy and highly efficient
computation-communication overlap techniques for MoE and lightning attention.
This approach enables us to conduct efficient training and inference on models
with hundreds of billions of parameters across contexts spanning millions of
tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens
during training and extrapolate to 4 million tokens during inference at an
affordable cost. Our vision-language model, MiniMax-VL-01, is built through
continued training with 512 billion vision-language tokens. Experiments on both
standard and in-house benchmarks show that our models match the performance of
state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering a 20-32
times longer context window. We publicly release MiniMax-01 at
https://github.com/MiniMax-AI.
[COMMENTS]
A technical report from MiniMax. The authors are listed in
alphabetical order. We open-sourced our MiniMax-01 at
https://github.com/MiniMax-AI
[LINK]
http://arxiv.org/abs/2501.08313v1
[DATE]
2025-01-15 02:50:05+08:00
[CATEGORIES]
cs.CL
Everybody Likes to Sleep: A Computer-Assisted Comparison of Object Naming Data from 30 Languages
[AUTHORS]
Alžběta Kučerová, Johann-Mattis List
[ABSTRACT]
Object naming - the act of identifying an object with a word or a phrase - is
a fundamental skill in interpersonal communication, relevant to many
disciplines, such as psycholinguistics, cognitive linguistics, or language and
vision research. Object naming datasets, which consist of concept lists with
picture pairings, are used to gain insights into how humans access and select
names for objects in their surroundings and to study the cognitive processes
involved in converting visual stimuli into semantic concepts. Unfortunately,
object naming datasets often lack transparency and have a highly idiosyncratic
structure. Our study tries to make current object naming data transparent and
comparable by using a multilingual, computer-assisted approach that links
individual items of object naming lists to unified concepts. Our current sample
links 17 object naming datasets that cover 30 languages from 10 different
language families. We illustrate how the comparative dataset can be explored by
searching for concepts that recur across the majority of datasets and comparing
the conceptual spaces of covered object naming datasets with classical basic
vocabulary lists from historical linguistics and linguistic typology. Our
findings can serve as a basis for enhancing cross-linguistic object naming
research and as a guideline for future studies dealing with object naming
tasks.
[COMMENTS]
To appear in the Proceedings of the Global WordNet Conference 2025
[LINK]
http://arxiv.org/abs/2501.08312v1
[DATE]
2025-01-15 02:50:00+08:00
[CATEGORIES]
cs.CL
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
[AUTHORS]
Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi
[ABSTRACT]
Despite their impressive ability to generate high-quality and fluent text,
generative large language models (LLMs) also produce hallucinations: statements
that are misaligned with established world knowledge or provided input context.
However, measuring hallucination can be challenging, as having humans verify
model generations on-the-fly is both expensive and time-consuming. In this
work, we release HALoGEN, a comprehensive hallucination benchmark consisting
of: (1) 10,923 prompts for generative models spanning nine domains including
programming, scientific attribution, and summarization, and (2) automatic
high-precision verifiers for each use case that decompose LLM generations into
atomic units, and verify each unit against a high-quality knowledge source. We
use this framework to evaluate ~150,000 generations from 14 language models,
finding that even the best-performing models are riddled with hallucinations
(sometimes up to 86% of generated atomic facts depending on the domain). We
further define a novel error classification for LLM hallucinations based on
whether they likely stem from incorrect recollection of training data (Type A
errors), or incorrect knowledge in training data (Type B errors), or are
fabrication (Type C errors). We hope our framework provides a foundation to
enable the principled study of why generative models hallucinate, and advances
the development of trustworthy large language models.
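The verification idea described above (decompose a generation into atomic units, check each unit against a knowledge source, report the unsupported fraction) can be sketched as follows. This is a hedged illustration rather than the HALoGEN implementation: the function name `hallucination_rate`, the callable verifier, and the toy knowledge set are all hypothetical, and the decomposition step is assumed to have already happened.

```python
from typing import Callable, Iterable


def hallucination_rate(
    atomic_units: Iterable[str], is_supported: Callable[[str], bool]
) -> float:
    """Fraction of atomic units that the verifier cannot support."""
    units = list(atomic_units)
    if not units:
        return 0.0  # nothing was generated, so nothing can be unsupported
    unsupported = sum(1 for unit in units if not is_supported(unit))
    return unsupported / len(units)


# Toy knowledge source: names of functions assumed to exist (illustrative).
known_functions = {"json.loads", "json.dumps", "math.sqrt"}

# Hypothetical atomic units extracted from a model's generated code;
# "json.fast_parse" is a made-up name standing in for a hallucination.
units = ["json.loads", "math.sqrt", "json.fast_parse"]

rate = hallucination_rate(units, lambda unit: unit in known_functions)
```

In this toy case one of three units is unsupported, so the rate is one third. A real verifier would replace the set lookup with a domain-specific check against a high-quality knowledge source.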
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2501.08292v1
[DATE]
2025-01-15 02:13:08+08:00
[CATEGORIES]
cs.CL
Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing
[AUTHORS]
Pulkit Arora, Akbar Karimi, Lucie Flek
[ABSTRACT]
Large Language Models (LLMs) have shown impressive performance in various NLP
tasks. However, there are concerns about their reliability in different domains
of linguistic variations. Many works have proposed robustness evaluation
measures for local adversarial attacks, but we need globally robust models
unbiased to different language styles. We take a broader approach to explore a
wider range of variations across sociodemographic dimensions to perform
structured reliability tests on the reasoning capacity of language models. We
extend the SocialIQA dataset to create diverse paraphrased sets conditioned on
sociodemographic styles. The assessment aims to provide a deeper understanding
of LLMs in (a) their capability of generating demographic paraphrases with
engineered prompts and (b) their reasoning capabilities in real-world, complex
language scenarios. We also explore measures such as perplexity,
explainability, and ATOMIC performance of paraphrases for fine-grained
reliability analysis of LLMs on these sets. We find that demographic-specific
paraphrasing significantly impacts the performance of language models,
indicating that the subtleties of language variations remain a significant
challenge. The code and dataset will be made available for reproducibility and
future research.
[LINK]
http://arxiv.org/abs/2501.08276v1
[DATE]
2025-01-15 01:50:06+08:00
[CATEGORIES]
cs.CL
Comparative Analysis of Efficient Adapter-Based Fine-Tuning of State-of-the-Art Transformer Models
[AUTHORS]
Saad Mashkoor Siddiqui, Mohammad Ali Sheikh, Muhammad Aleem, Kajol R Singh
[ABSTRACT]
In this work, we investigate the efficacy of various adapter architectures on
supervised binary classification tasks from the SuperGLUE benchmark as well as
a supervised multi-class news category classification task from Kaggle.
Specifically, we compare classification performance and time complexity of
three transformer models, namely DistilBERT, ELECTRA, and BART, using
conventional fine-tuning as well as nine state-of-the-art (SoTA) adapter
architectures. Our analysis reveals performance differences across adapter
architectures, highlighting their ability to achieve comparable or better
performance relative to fine-tuning at a fraction of the training time. Similar
results are observed on the new classification task, further supporting our
findings and demonstrating adapters as efficient and flexible alternatives to
fine-tuning. This study provides valuable insights and guidelines for selecting
and implementing adapters in diverse natural language processing (NLP)
applications.
[LINK]
http://arxiv.org/abs/2501.08271v1
[DATE]
2025-01-15 01:37:40+08:00
[CATEGORIES]
cs.CL
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
[AUTHORS]
Han He, Qianchu Liu, Lei Xu, Chaitanya Shivade, Yi Zhang, Sundararajan Srinivasan, Katrin Kirchhoff
[COMMENTS]
Accepted to AAAI-2025
[LINK]
http://arxiv.org/abs/2410.02748v3
[DATE]
2025-01-15 01:20:04+08:00
[CATEGORIES]
cs.CL
cs.LG
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
[AUTHORS]
Eva Sánchez Salido, Roser Morante, Julio Gonzalo, Guillermo Marco, Jorge Carrillo-de-Albornoz, Laura Plaza, Enrique Amigó, Andrés Fernández, Alejandro Benito-Santos, Adrián Ghajari Espinosa, Victor Fresno
[ABSTRACT]
In this article we present UNED-ACCESS 2024, a bilingual dataset that
consists of 1003 multiple-choice questions of university entrance level exams
in Spanish and English. Questions are originally formulated in Spanish and
translated manually into English, and have never been publicly released. A
selection of current open-source and proprietary models is evaluated in a
uniform zero-shot experimental setting both on the UNED-ACCESS 2024 dataset and
on an equivalent subset of MMLU questions. Results show that (i) reasoning
questions are challenging for models, (ii) smaller models perform worse than
larger models and degrade faster in Spanish than in English and (iii) the
performance gap between languages is negligible for the best models and grows
up to 37% for smaller models. Model ranking on UNED-ACCESS 2024 is almost
identical in English and Spanish, and also has a high correlation (0.98
Pearson) with ranking on MMLU, suggesting that a small dataset is sufficiently
diverse and representative to measure performance by discipline.
[LINK]
http://arxiv.org/abs/2409.12746v2
[DATE]
2025-01-15 00:41:28+08:00
[CATEGORIES]
cs.CL
Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
[AUTHORS]
Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
[ABSTRACT]
Recent advancements in long-context language models (LCLMs) promise to
transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With
their expanded context windows, LCLMs can process entire knowledge bases and
perform retrieval and reasoning directly – a capability we define as
In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like
LOFT often overestimate LCLM performance by providing overly simplified
contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs
in more realistic scenarios by including confounding passages retrieved with
strong retrievers. We then propose three methods to enhance LCLM performance:
(1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which
uses attention heads to filter and de-noise long contexts during decoding, and
(3) joint retrieval head training alongside the generation head. Our evaluation
of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with
our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on
LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised
fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks
despite being a much smaller model.
[LINK]
http://arxiv.org/abs/2501.08248v1
[DATE]
2025-01-15 00:38:33+08:00
[CATEGORIES]
cs.CL
cs.LG
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
[AUTHORS]
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su
[ABSTRACT]
In order to thrive in hostile and ever-changing natural environments,
mammalian brains evolved to store large amounts of knowledge about the world
and continually integrate new information while avoiding catastrophic
forgetting. Despite the impressive accomplishments, large language models
(LLMs), even with retrieval-augmented generation (RAG), still struggle to
efficiently and effectively integrate a large amount of new experiences after
pre-training. In this work, we introduce HippoRAG, a novel retrieval framework
inspired by the hippocampal indexing theory of human long-term memory to enable
deeper and more efficient knowledge integration over new experiences. HippoRAG
synergistically orchestrates LLMs, knowledge graphs, and the Personalized
PageRank algorithm to mimic the different roles of neocortex and hippocampus in
human memory. We compare HippoRAG with existing RAG methods on multi-hop
question answering and show that our method outperforms the state-of-the-art
methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves
comparable or better performance than iterative retrieval like IRCoT while
being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into
IRCoT brings further substantial gains. Finally, we show that our method can
tackle new types of scenarios that are out of reach of existing methods. Code
and data are available at https://github.com/OSU-NLP-Group/HippoRAG.
[COMMENTS]
NeurIPS 2024. Code and data:
https://github.com/OSU-NLP-Group/HippoRAG
[LINK]
http://arxiv.org/abs/2405.14831v3
[DATE]
2025-01-15 00:17:49+08:00
[CATEGORIES]
cs.CL
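The HippoRAG abstract above names the Personalized PageRank algorithm as one of its components. The following is a minimal sketch of that general algorithm by power iteration, not the authors' implementation: the toy graph, the node names, the seed weights, and the function name `personalized_pagerank` are all hypothetical, and the damping value 0.85 is just a common default.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of its out-neighbours
           (every neighbour must also be a key of the dict).
    seeds: dict mapping seed nodes to non-negative restart weights.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    # Restart distribution: probability mass concentrated on the seed nodes.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread this node's mass evenly over its out-neighbours.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical toy graph: two components, seeds only in the first one.
toy_graph = {
    "university": ["professor", "state"],
    "professor": ["university", "disease"],
    "disease": ["professor"],
    "state": ["university"],
    "city": ["country"],
    "country": ["city"],
}
toy_seeds = {"university": 1.0, "disease": 1.0}
scores = personalized_pagerank(toy_graph, toy_seeds)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:12s} {score:.4f}")
```

In this sketch, nodes that are reachable from several seeds accumulate more score than nodes reachable from one, and nodes in a component without any seed receive no score at all.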
A Two-Stage Pretraining-Finetuning Framework for Treatment Effect Estimation with Unmeasured Confounding
[AUTHORS]
Chuan Zhou, Yaxuan Li, Chunyuan Zheng, Haiteng Zhang, Min Zhang, Haoxuan Li, Mingming Gong
[ABSTRACT]
Estimating the conditional average treatment effect (CATE) from observational
data plays a crucial role in areas such as e-commerce, healthcare, and
economics. Existing studies mainly rely on the strong ignorability assumption
that there are no unmeasured confounders, whose presence cannot be tested from
observational data and can invalidate any causal conclusion. In contrast, data
collected from randomized controlled trials (RCT) do not suffer from
confounding, but are usually limited by a small sample size. In this paper, we
propose a two-stage pretraining-finetuning (TSPF) framework using both
large-scale observational data and small-scale RCT data to estimate the CATE in
the presence of unmeasured confounding. In the first stage, a foundational
representation of covariates is trained to estimate counterfactual outcomes
through large-scale observational data. In the second stage, we propose to
train an augmented representation of the covariates, which is concatenated to
the foundational representation obtained in the first stage to adjust for the
unmeasured confounding. To avoid overfitting caused by the small-scale RCT data
in the second stage, we further propose a partial parameter initialization
approach, rather than training a separate network. The superiority of our
approach is validated on two public datasets with extensive experiments. The
code is available at https://github.com/zhouchuanCN/KDD25-TSPF.
[COMMENTS]
KDD 25 Research Track
[LINK]
http://arxiv.org/abs/2501.08888v1
[DATE]
2025-01-15 23:58:16+08:00
[CATEGORIES]
cs.LG
Improved Compression Bounds for Scenario Decision Making
[AUTHORS]
Guillaume O. Berger, Raphaël M. Jungers
[ABSTRACT]
Scenario decision making offers a flexible way of making decisions in an
uncertain environment while obtaining probabilistic guarantees on the risk of
failure of the decision. The idea of this approach is to draw samples of the
uncertainty and make a decision based on the samples, called “scenarios”. The
probabilistic guarantees take the form of a bound on the probability of
sampling a set of scenarios that will lead to a decision whose risk of failure
is above a given maximum tolerance. This bound can be expressed as a function
of the number of sampled scenarios, the maximum tolerated risk, and some
intrinsic property of the problem called the “compression size”. Several such
bounds have been proposed in the literature under various assumptions on the
problem. We propose new bounds that improve upon the existing ones without
requiring stronger assumptions on the problem.
[LINK]
http://arxiv.org/abs/2501.08884v1
[DATE]
2025-01-15 23:53:34+08:00
[CATEGORIES]
cs.LG
Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
[AUTHORS]
Keisuke Kamo, Hideaki Iiduka
[ABSTRACT]
Stochastic gradient descent with momentum (SGDM), which is defined by adding
a momentum term to SGD, has been well studied in both theory and practice.
Theoretical results have shown that the settings of the learning
rate and momentum weight affect the convergence of SGDM. Meanwhile, practical
results have shown that the performance of SGDM strongly depends on the
setting of the batch size. In this paper, we focus on mini-batch SGDM with constant
learning rate and constant momentum weight, which is frequently used to train
deep neural networks in practice. The contribution of this paper is showing
theoretically that using a constant batch size does not always minimize the
expectation of the full gradient norm of the empirical loss in training a deep
neural network, whereas using an increasing batch size definitely minimizes it,
that is, increasing batch size improves convergence of mini-batch SGDM. We also
provide numerical results supporting our analyses, indicating specifically that
mini-batch SGDM with an increasing batch size converges to stationary points
faster than with a constant batch size. Python implementations of the
optimizers used in the numerical experiments are available at
https://anonymous.4open.science/r/momentum-increasing-batch-size-888C/.
[COMMENTS]
22 pages
[LINK]
http://arxiv.org/abs/2501.08883v1
[DATE]
2025-01-15 23:53:27+08:00
[CATEGORIES]
cs.LG
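To make the setting of the SGDM abstract above concrete, here is a minimal sketch of mini-batch SGD with momentum, a constant learning rate, a constant momentum weight, and a batch size that grows during training. It is not the authors' code: the toy objective (mean of 0.5 * (w - x_i)^2), the doubling schedule, the heavy-ball form of the update, and all names and hyperparameter values are assumptions chosen for illustration.

```python
import random


def sgdm_increasing_batch(data, steps=400, lr=0.05, momentum=0.9,
                          initial_batch=4, double_every=50, seed=0):
    """Minimise f(w) = mean_i 0.5 * (w - x_i)^2 with mini-batch SGDM.

    The batch size doubles every `double_every` steps (an assumed schedule),
    capped at the dataset size.
    """
    rng = random.Random(seed)
    w, velocity = 0.0, 0.0
    batch_size = initial_batch
    for step in range(steps):
        if step > 0 and step % double_every == 0:
            batch_size = min(2 * batch_size, len(data))
        # Sample a mini-batch with replacement.
        batch = rng.choices(data, k=batch_size)
        # Stochastic gradient of the toy objective on this batch.
        grad = w - sum(batch) / batch_size
        # Heavy-ball momentum with constant learning rate and momentum weight.
        velocity = momentum * velocity + grad
        w -= lr * velocity
    return w, batch_size


data_rng = random.Random(1)
data = [data_rng.gauss(3.0, 1.0) for _ in range(512)]
w_final, final_batch = sgdm_increasing_batch(data)
target = sum(data) / len(data)  # exact minimiser of the toy objective
print(f"final batch size: {final_batch}")
print(f"|w - minimiser| = {abs(w_final - target):.4f}")
```

As the batch grows, the variance of the stochastic gradient shrinks, so the iterate settles closer to the minimiser than it would with the initial batch size held fixed; this toy run only illustrates that intuition and does not reproduce the paper's analysis.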
Incrementally Learning Multiple Diverse Data Domains via Multi-Source Dynamic Expansion Model
[AUTHORS]
Runqing Wu, Fei Ye, Qihe Liu, Guoxi Huang, Jinyu Guo, Rongyao Hu
[ABSTRACT]
Continual Learning seeks to develop a model capable of incrementally
assimilating new information while retaining prior knowledge. However, current
research predominantly addresses a straightforward learning context, wherein
all data samples originate from a singular data domain. This paper shifts focus
to a more complex and realistic learning environment, characterized by data
samples sourced from multiple distinct domains. We tackle this intricate
learning challenge by introducing a novel methodology, termed the Multi-Source
Dynamic Expansion Model (MSDEM), which leverages various pre-trained models as
backbones and progressively establishes new experts based on them to adapt to
emerging tasks. Additionally, we propose an innovative dynamic expandable
attention mechanism designed to selectively harness knowledge from multiple
backbones, thereby accelerating the new task learning. Moreover, we introduce a
dynamic graph weight router that strategically reuses all previously acquired
parameters and representations for new task learning, maximizing the positive
knowledge transfer effect, which further improves generalization performance.
We conduct a comprehensive series of experiments, and the empirical findings
indicate that our proposed approach achieves state-of-the-art performance.
[COMMENTS]
10 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.08878v1
[DATE]
2025-01-15 23:49:46+08:00
[CATEGORIES]
cs.LG
Ensemble sampling for linear bandits: small ensembles suffice
[AUTHORS]
David Janz, Alexander E. Litvak, Csaba Szepesvári
[ABSTRACT]
We provide the first useful and rigorous analysis of ensemble sampling for
the stochastic linear bandit setting. In particular, we show that, under
standard assumptions, for a $d$-dimensional stochastic linear bandit with an
interaction horizon $T$, ensemble sampling with an ensemble of size of order $d
\log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is
the first result in any structured setting not to require the size of the
ensemble to scale linearly with $T$ – which defeats the purpose of ensemble
sampling – while obtaining near $\sqrt{T}$ order regret. Our result is
also the first to allow for infinite action sets.
[LINK]
http://arxiv.org/abs/2311.08376v4
[DATE]
2025-01-15 23:41:09+08:00
[CATEGORIES]
cs.LG
Inferring stochastic low-rank recurrent neural networks from neural data
[AUTHORS]
Matthijs Pals, A Erdem Sağtekin, Felix Pei, Manuel Gloeckler, Jakob H Macke
[ABSTRACT]
A central aim in computational neuroscience is to relate the activity of
large populations of neurons to an underlying dynamical system. Models of these
neural dynamics should ideally be both interpretable and fit the observed data
well. Low-rank recurrent neural networks (RNNs) exhibit such interpretability
by having tractable dynamics. However, it is unclear how to best fit low-rank
RNNs to data consisting of noisy observations of an underlying stochastic
system. Here, we propose to fit stochastic low-rank RNNs with variational
sequential Monte Carlo methods. We validate our method on several datasets
consisting of both continuous and spiking neural data, where we obtain lower
dimensional latent dynamics than current state-of-the-art methods.
Additionally, for low-rank models with piecewise linear nonlinearities, we show
how to efficiently identify all fixed points in polynomial rather than
exponential cost in the number of units, making analysis of the inferred
dynamics tractable for large RNNs. Our method both elucidates the dynamical
systems underlying experimental recordings and provides a generative model
whose trajectories match observed variability.
[LINK]
http://arxiv.org/abs/2406.16749v4
[DATE]
2025-01-15 23:40:12+08:00
[CATEGORIES]
cs.LG
Taming the Long Tail in Human Mobility Prediction
[AUTHORS]
Xiaohang Xu, Renhe Jiang, Chuang Yang, Zipei Fan, Kaoru Sezaki
[ABSTRACT]
With the popularity of location-based services, human mobility prediction
plays a key role in enhancing personalized navigation, optimizing
recommendation systems, and facilitating urban mobility and planning. This
involves predicting a user’s next POI (point-of-interest) visit using their
past visit history. However, the uneven distribution of visitations over time
and space, namely the long-tail problem in spatial distribution, makes it
difficult for AI models to predict those POIs that are less visited by humans.
In light of this issue, we propose the Long-Tail Adjusted Next POI Prediction
(LoTNext) framework for mobility prediction, combining a Long-Tailed Graph
Adjustment module to reduce the impact of the long-tailed nodes in the user-POI
interaction graph and a novel Long-Tailed Loss Adjustment module to adjust loss
by logit score and sample weight adjustment strategy. Also, we employ the
auxiliary prediction task to enhance generalization and accuracy. Our
experiments with two real-world trajectory datasets demonstrate that LoTNext
significantly surpasses existing state-of-the-art works.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.14970v4
[DATE]
2025-01-15 23:35:22+08:00
[CATEGORIES]
cs.LG
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning
[AUTHORS]
Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Joschka Boedecker
[ABSTRACT]
Visual Reinforcement Learning (RL) methods often require extensive amounts of
data. As opposed to model-free RL, model-based RL (MBRL) offers a potential
solution with efficient data utilization through planning. Additionally, RL
lacks generalization capabilities for real-world tasks. Prior work has shown
that incorporating pre-trained visual representations (PVRs) enhances sample
efficiency and generalization. While PVRs have been extensively studied in the
context of model-free RL, their potential in MBRL remains largely unexplored.
In this paper, we benchmark a set of PVRs on challenging control tasks in a
model-based RL setting. We investigate the data efficiency, generalization
capabilities, and the impact of different properties of PVRs on the performance
of model-based agents. Our results, perhaps surprisingly, reveal that for MBRL
current PVRs are not more sample efficient than learning representations from
scratch, and that they do not generalize better to out-of-distribution (OOD)
settings. To explain this, we analyze the quality of the trained dynamics
model. Furthermore, we show that data diversity and network architecture are
the most important contributors to OOD generalization performance.
[COMMENTS]
Published at the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024). Project page: https://schneimo.com/pvr4mbrl/
[LINK]
http://arxiv.org/abs/2411.10175v2
[DATE]
2025-01-15 23:24:32+08:00
[CATEGORIES]
cs.LG
ARMOR: Shielding Unlearnable Examples against Data Augmentation
[AUTHORS]
Xueluan Gong, Yuji Wang, Yanjiao Chen, Haocheng Dong, Yiming Li, Mengyuan Sun, Shuaike Li, Qian Wang, Chen Chen
[ABSTRACT]
Private data, when published online, may be collected by unauthorized parties
to train deep neural networks (DNNs). To protect privacy, defensive noises can
be added to original samples to degrade their learnability by DNNs. Recently,
unlearnable examples have been proposed to minimize the training loss such that the
model learns almost nothing. However, raw data are often pre-processed before
being used for training, which may restore the private information of protected
data. In this paper, we reveal the data privacy violation induced by data
augmentation, a commonly used data pre-processing technique to improve model
generalization capability; to the best of our knowledge, this is the first
such finding. We demonstrate that data augmentation can significantly raise the
accuracy of the model trained on unlearnable examples from 21.3% to 66.1%. To
address this issue, we propose a defense framework, dubbed ARMOR, to protect
data privacy from potential breaches of data augmentation. To overcome the
difficulty of having no access to the model training process, we design a
non-local module-assisted surrogate model that better captures the effect of
data augmentation. In addition, we design a surrogate augmentation selection
strategy that maximizes distribution alignment between augmented and
non-augmented samples, to choose the optimal augmentation strategy for each
class. We also use a dynamic step size adjustment algorithm to enhance the
defensive noise generation process. Extensive experiments are conducted on 4
datasets and 5 data augmentation methods to verify the performance of ARMOR.
Comparisons with 6 state-of-the-art defense methods have demonstrated that
ARMOR can preserve the unlearnability of protected private data under data
augmentation. ARMOR reduces the test accuracy of the model trained on augmented
protected samples by as much as 60% more than baselines.
[LINK]
http://arxiv.org/abs/2501.08862v1
[DATE]
2025-01-15 23:22:57+08:00
[CATEGORIES]
cs.LG
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions
[AUTHORS]
Easton K. Huch, Jieru Shi, Madeline R. Abbott, Jessica R. Golbus, Alexander Moreno, Walter H. Dempsey
[ABSTRACT]
Mobile health leverages personalized and contextually tailored interventions
optimized through bandit and reinforcement learning algorithms. In practice,
however, challenges such as participant heterogeneity, nonstationarity, and
nonlinear relationships hinder algorithm performance. We propose RoME, a Robust
Mixed-Effects contextual bandit algorithm that simultaneously addresses these
challenges via (1) modeling the differential reward with user- and
time-specific random effects, (2) network cohesion penalties, and (3) debiased
machine learning for flexible estimation of baseline rewards. We establish a
high-probability regret bound that depends solely on the dimension of the
differential-reward model, enabling us to achieve robust regret bounds even
when the baseline reward is highly complex. We demonstrate the superior
performance of the RoME algorithm in a simulation and two off-policy evaluation
studies.
[LINK]
http://arxiv.org/abs/2312.06403v4
[DATE]
2025-01-15 23:21:46+08:00
[CATEGORIES]
cs.LG
Improved Algorithms for Contextual Dynamic Pricing
[AUTHORS]
Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet
[ABSTRACT]
In contextual dynamic pricing, a seller sequentially prices goods based on
contextual information. Buyers will purchase products only if the prices are
below their valuations. The goal of the seller is to design a pricing strategy
that collects as much revenue as possible. We focus on two different valuation
models. The first assumes that valuations linearly depend on the context and
are further distorted by noise. Under minor regularity assumptions, our
algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$,
improving the existing results. The second model removes the linearity
assumption, requiring only that the expected buyer valuation is
$\beta$-Hölder in the context. For this model, our algorithm obtains a regret
$\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the
context space.
[LINK]
http://arxiv.org/abs/2406.11316v2
[DATE]
2025-01-15 23:07:59+08:00
[CATEGORIES]
cs.LG
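As a worked piece of arithmetic for the two regret rates quoted in the contextual dynamic pricing abstract above, the sketch below evaluates the Hölder-model exponent for a few context dimensions and compares it with the 2/3 exponent of the linear model. It assumes the exponent is the ratio (d + 2β)/(d + 3β); the function name and the sample values of d and β are hypothetical.

```python
from fractions import Fraction

# Exponent of T in the linear-valuation regret bound quoted in the abstract.
LINEAR_MODEL_EXPONENT = Fraction(2, 3)


def holder_regret_exponent(d, beta):
    """Exponent of T, read as the ratio (d + 2*beta) / (d + 3*beta)."""
    d, beta = Fraction(d), Fraction(beta)
    return (d + 2 * beta) / (d + 3 * beta)


for d in (1, 2, 5, 20):
    exponent = holder_regret_exponent(d, 1)
    print(f"d = {d:2d}, beta = 1: exponent {exponent} "
          f"(about {float(exponent):.3f})")
```

Under this reading, the exponent is strictly between 2/3 and 1 for any d > 0 and β > 0, since (d + 2β)/(d + 3β) > 2/3 reduces to d > 0. It moves toward 1 as the context dimension d grows.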
PRIMO: Private Regression in Multiple Outcomes
[AUTHORS]
Seth Neel
[ABSTRACT]
We introduce a new private regression setting we call Private Regression in
Multiple Outcomes (PRIMO), inspired by the common situation where a data
analyst wants to perform a set of $l$ regressions while preserving privacy,
where the features $X$ are shared across all $l$ regressions, and each
regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively
applying existing private linear regression techniques $l$ times leads to a
$\sqrt{l}$ multiplicative increase in error over the standard linear regression
setting. We apply a variety of techniques including sufficient statistics
perturbation (SSP) and geometric projection-based methods to develop scalable
algorithms that outperform this baseline across a range of parameter regimes.
In particular, we obtain no dependence on $l$ in the asymptotic error when $l$ is
sufficiently large. Empirically, on the task of genomic risk prediction with
multiple phenotypes we find that even for values of $l$ far smaller than the
theory would predict, our projection-based method improves the accuracy
relative to the variant that doesn’t use the projection.
[LINK]
http://arxiv.org/abs/2303.04195v2
[DATE]
2025-01-15 23:06:56+08:00
[CATEGORIES]
cs.LG
Graph Counterfactual Explainable AI via Latent Space Traversal
[AUTHORS]
Andreas Abildtrup Hansen, Paraskevas Pegios, Anna Calissano, Aasa Feragen
[ABSTRACT]
Explaining the predictions of a deep neural network is a nontrivial task, yet
high-quality explanations for predictions are often a prerequisite for
practitioners to trust these models. Counterfactual explanations aim to explain
predictions by finding the “nearest” in-distribution alternative input whose
prediction changes in a pre-specified way. However, it remains an open question
how to define this nearest alternative input, whose solution depends on both
the domain (e.g. images, graphs, tabular data, etc.) and the specific
application considered. For graphs, this problem is complicated i) by their
discrete nature, as opposed to the continuous nature of state-of-the-art graph
classifiers; and ii) by the node permutation group acting on the graphs. We
propose a method to generate counterfactual explanations for any differentiable
black-box graph classifier, utilizing a case-specific permutation equivariant
graph variational autoencoder. We generate counterfactual explanations in a
continuous fashion by traversing the latent space of the autoencoder across the
classification boundary of the classifier, allowing for seamless integration of
discrete graph structure and continuous graph attributes. We empirically
validate the approach on three graph datasets, showing that our model is
consistently high-performing and more robust than the baselines.
[COMMENTS]
Published at Northern Lights Deep Learning Conference 2025
[LINK]
http://arxiv.org/abs/2501.08850v1
[DATE]
2025-01-15 23:04:10+08:00
[CATEGORIES]
cs.LG
RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning
[AUTHORS]
Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova, Albert López-Brescó, Pere Barlet-Ros, Albert Cabellos-Aparicio
[ABSTRACT]
Network simulation is pivotal in network modeling, assisting with tasks
ranging from capacity planning to performance estimation. Traditional
approaches such as Discrete Event Simulation (DES) face limitations in terms of
computational cost and accuracy. This paper introduces RouteNet-Gauss, a novel
integration of a testbed network with a Machine Learning (ML) model to address
these challenges. By using the testbed as a hardware accelerator,
RouteNet-Gauss generates training datasets rapidly and simulates network
scenarios with high fidelity to real-world conditions. Experimental results
show that RouteNet-Gauss significantly reduces prediction errors by up to 95%
and achieves a 488x speedup in inference time compared to state-of-the-art
DES-based methods. RouteNet-Gauss’s modular architecture is dynamically
constructed based on the specific characteristics of the network scenario, such
as topology and routing. This enables it to understand and generalize to
different network configurations beyond those seen during training, including
networks up to 10x larger. Additionally, it supports Temporal Aggregated
Performance Estimation (TAPE), providing configurable temporal granularity and
maintaining high accuracy in flow performance metrics. This approach shows
promise in improving both simulation efficiency and accuracy, offering a
valuable tool for network operators.
[COMMENTS]
13 pages, 11 figures
[LINK]
http://arxiv.org/abs/2501.08848v1
[DATE]
2025-01-15 23:00:11+08:00
[CATEGORIES]
cs.LG
A Closer Look at the Learnability of Out-of-Distribution (OOD) Detection
[AUTHORS]
Konstantin Garov, Kamalika Chaudhuri
[ABSTRACT]
Machine learning algorithms often encounter different or
“out-of-distribution” (OOD) data at deployment time, and OOD detection is
frequently employed to detect these examples. While it works reasonably well in
practice, existing theoretical results on OOD detection are highly pessimistic.
In this work, we take a closer look at this problem, and make a distinction
between uniform and non-uniform learnability, following PAC learning theory. We
characterize under what conditions OOD detection is uniformly and non-uniformly
learnable, and we show that in several cases, non-uniform learnability turns a
number of negative results into positive ones. In all cases where OOD detection is
learnable, we provide concrete learning algorithms and a sample-complexity
analysis.
[LINK]
http://arxiv.org/abs/2501.08821v1
[DATE]
2025-01-15 22:19:03+08:00
[CATEGORIES]
cs.LG
IDEA: Image Description Enhanced CLIP-Adapter
[AUTHORS]
Zhipeng Ye, Feng Jiang, Qiufeng Wang, Kaizhu Huang, Jiaqi Huang
[ABSTRACT]
CLIP (Contrastive Language-Image Pre-training) has attained great success in
pattern recognition and computer vision. Transferring CLIP to downstream tasks
(e.g. zero- or few-shot classification) is a hot topic in multimodal learning.
However, current studies primarily focus on either prompt learning for text or
adapter tuning for vision, without fully exploiting the complementary
information and correlations among image-text pairs. In this paper, we propose
an Image Description Enhanced CLIP-Adapter (IDEA) method to adapt CLIP to
few-shot image classification tasks. This method captures fine-grained features
by leveraging both visual features and textual descriptions of images. IDEA is
a training-free method for CLIP, and it is comparable to or even exceeds
state-of-the-art models on multiple tasks. Furthermore, we introduce
Trainable-IDEA (T-IDEA), which extends IDEA by adding two lightweight learnable
components (i.e., a projector and a learnable latent space), further enhancing
the model’s performance and achieving SOTA results on 11 datasets. As one
important contribution, we employ the Llama model and design a comprehensive
pipeline to generate textual descriptions for images of 11 datasets, resulting
in a total of 1,637,795 image-text pairs, named “IMD-11”. Our code and data are
released at https://github.com/FourierAI/IDEA.
[LINK]
http://arxiv.org/abs/2501.08816v1
[DATE]
2025-01-15 22:12:59+08:00
[CATEGORIES]
cs.LG
Volterra Accentuated Non-Linear Dynamical Admittance (VANYA) to model Deforestation: An Exemplification from the Amazon Rainforest
[AUTHORS]
Karthik R., Ramamoorthy A
[ABSTRACT]
Intelligent automation supports us against cyclones, droughts, and seismic
events with recent technology advancements. Algorithmic learning has advanced
fields like neuroscience, genetics, and human-computer interaction. Time-series
data boosts progress. Challenges persist in adopting these approaches in
traditional fields. Neural networks face comprehension and bias issues. AI’s
expansion across scientific areas is due to adaptable descriptors and
combinatorial argumentation. This article focuses on modeling Forest loss using
the VANYA Model, incorporating Prey Predator Dynamics. VANYA predicts forest
cover, demonstrated on Amazon Rainforest data against other forecasters like
Long Short-Term Memory, N-BEATS, RCN.
[COMMENTS]
The experimental data used in this article has given wrong practical
interpretation. The data has to be updated to improve this
[LINK]
http://arxiv.org/abs/2308.06471v2
[DATE]
2025-01-15 22:12:04+08:00
[CATEGORIES]
cs.LG
Learning Optimal Tax Design in Nonatomic Congestion Games
[AUTHORS]
Qiwen Cui, Maryam Fazel, Simon S. Du
[ABSTRACT]
In multiplayer games, self-interested behavior among the players can harm the
social welfare. Tax mechanisms are a common method to alleviate this issue and
induce socially optimal behavior. In this work, we take the initial step of
learning the optimal tax that can maximize social welfare with limited feedback
in congestion games. We propose a new type of feedback named equilibrium
feedback, where the tax designer can only observe the Nash equilibrium after
deploying a tax plan. Existing algorithms are not applicable due to the
exponentially large tax function space, nonexistence of the gradient, and
nonconvexity of the objective. To tackle these challenges, we design a
computationally efficient algorithm that leverages several novel components:
(1) a piece-wise linear tax to approximate the optimal tax; (2) extra linear
terms to guarantee a strongly convex potential function; (3) an efficient
subroutine to find the exploratory tax that can provide critical information
about the game. The algorithm can find an $\epsilon$-optimal tax with $O(\beta
F^2/\epsilon)$ sample complexity, where $\beta$ is the smoothness of the cost
function and $F$ is the number of facilities.
[COMMENTS]
23 pages. Accepted by Conference on Neural Information Processing
Systems (NeurIPS) 2024
[LINK]
http://arxiv.org/abs/2402.07437v2
[DATE]
2025-01-15 22:02:51+08:00
[CATEGORIES]
cs.LG
Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks
[AUTHORS]
Shuang Cui, Yi Li, Jiangmeng Li, Xiongxin Tang, Bing Su, Fanjiang Xu, Hui Xiong
[ABSTRACT]
Single image defocus deblurring (SIDD) aims to restore an all-in-focus image
from a defocused one. Distribution shifts in defocused images generally lead to
performance degradation of existing methods during out-of-distribution
inferences. In this work, we gauge the intrinsic reason behind the performance
degradation, which is identified as the heterogeneity of lens-specific point
spread functions. Empirical evidence supports this finding, motivating us to
employ a continual test-time adaptation (CTTA) paradigm for SIDD. However,
traditional CTTA methods, which primarily rely on entropy minimization, cannot
sufficiently explore task-dependent information for pixel-level regression
tasks like SIDD. To address this issue, we propose a novel Siamese
networks-based continual test-time adaptation framework, which adapts source
models to continuously changing target domains only requiring unlabeled target
data in an online manner. To further mitigate semantically erroneous textures
introduced by source SIDD models under severe degradation, we revisit the
learning paradigm through a structural causal model and propose Causal Siamese
networks (CauSiam). Our method leverages large-scale pre-trained
vision-language models to derive discriminative universal semantic priors and
integrates these priors into Siamese networks, ensuring causal identifiability
between blurry inputs and restored images. Extensive experiments demonstrate
that CauSiam effectively improves the generalization performance of existing
SIDD methods in continuously changing domains.
[LINK]
http://arxiv.org/abs/2501.09052v1
[DATE]
2025-01-15 21:42:39+08:00
[CATEGORIES]
cs.LG
Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning
[AUTHORS]
Marvin Alles, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl
[ABSTRACT]
In offline reinforcement learning, a policy is learned using a static dataset
in the absence of costly feedback from the environment. In contrast to the
online setting, only using static datasets poses additional challenges, such as
policies generating out-of-distribution samples. Model-based offline
reinforcement learning methods try to overcome these by learning a model of the
underlying dynamics of the environment and using it to guide policy search.
This is beneficial, but with limited datasets, errors in the model and value
overestimation among out-of-distribution states can worsen performance.
Current model-based methods apply some notion of conservatism to the Bellman
update, often implemented using uncertainty estimation derived from model
ensembles. In this paper, we propose Constrained Latent Action Policies (C-LAP)
which learns a generative model of the joint distribution of observations and
actions. We cast policy learning as a constrained objective to always stay
within the support of the latent action distribution, and use the generative
capabilities of the model to impose an implicit constraint on the generated
actions, thereby eliminating the need to use additional uncertainty penalties
on the Bellman update and significantly decreasing the number of gradient steps
required to learn a policy. We empirically evaluate C-LAP on the D4RL and
V-D4RL benchmarks, and show that C-LAP is competitive with state-of-the-art
methods and outperforms them in particular on datasets with visual observations.
[COMMENTS]
38th Conference on Neural Information Processing Systems (NeurIPS
2024)
[LINK]
http://arxiv.org/abs/2411.04562v2
[DATE]
2025-01-15 21:24:49+08:00
[CATEGORIES]
cs.LG
Deep learning for temporal super-resolution 4D Flow MRI
[AUTHORS]
Pia Callmer, Mia Bonini, Edward Ferdian, David Nordsletten, Daniel Giese, Alistair A. Young, Alexander Fyrdahl, David Marlevi
[ABSTRACT]
4D Flow Magnetic Resonance Imaging (4D Flow MRI) is a non-invasive technique
for volumetric, time-resolved blood flow quantification. However, apparent
trade-offs between acquisition time, image noise, and resolution limit clinical
applicability. In particular, in regions of highly transient flow, coarse
temporal resolution can hinder accurate capture of physiologically relevant
flow variations. To overcome these issues, post-processing techniques using
deep learning have shown promising results to enhance resolution post-scan
using so-called super-resolution networks. However, while super-resolution has
been focusing on spatial upsampling, temporal super-resolution remains largely
unexplored. The aim of this study was therefore to implement and evaluate a
residual network for temporal super-resolution 4D Flow MRI. To achieve this, an
existing spatial network (4DFlowNet) was re-designed for temporal upsampling,
adapting input dimensions, and optimizing internal layer structures. Training
and testing were performed using synthetic 4D Flow MRI data originating from
patient-specific in-silico models, as well as using in-vivo datasets. Overall,
excellent performance was achieved with input velocities effectively denoised
and temporally upsampled, with a mean absolute error (MAE) of 1.0 cm/s in an
unseen in-silico setting, outperforming deterministic alternatives (linear
interpolation MAE = 2.3 cm/s, sinc interpolation MAE = 2.6 cm/s). Further, the
network synthesized high-resolution temporal information from unseen
low-resolution in-vivo data, with strong correlation observed at peak flow
frames. As such, our results highlight the potential of utilizing data-driven
neural networks for temporal super-resolution 4D Flow MRI, enabling
high-frame-rate flow quantification without extending acquisition times beyond
clinically acceptable limits.
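
As a rough illustration of the deterministic baseline mentioned in the abstract above, the following sketch upsamples a velocity time series by linear interpolation and scores it with the mean absolute error (MAE). The function names and the sample values are hypothetical and are not taken from the paper; the learned network itself is not reproduced here.

```python
def upsample_linear(frames, factor):
    """Insert (factor - 1) linearly interpolated samples between consecutive frames."""
    out = []
    for a, b in zip(frames, frames[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(frames[-1])  # keep the final original frame
    return out


def mean_absolute_error(pred, ref):
    """Average absolute difference between two equally long sequences."""
    if len(pred) != len(ref):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


# Hypothetical velocities (cm/s) at a coarse temporal resolution.
coarse = [0.0, 2.0, 4.0]
fine = upsample_linear(coarse, factor=2)
print(fine)
```

A learned temporal super-resolution model would replace `upsample_linear`, while the same MAE function could be used to compare both against a high-resolution reference.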
[COMMENTS]
12 pages, 8 figures
[LINK]
http://arxiv.org/abs/2501.08780v1
[DATE]
2025-01-15 21:01:47+08:00
[CATEGORIES]
cs.LG
Nesterov Acceleration for Ensemble Kalman Inversion and Variants
[AUTHORS]
Sydney Vernon, Eviatar Bach, Oliver R. A. Dunbar
[ABSTRACT]
Ensemble Kalman inversion (EKI) is a derivative-free, particle-based
optimization method for solving inverse problems. It can be shown that EKI
approximates a gradient flow, which allows the application of methods for
accelerating gradient descent. Here, we show that Nesterov acceleration is
effective in speeding up the reduction of the EKI cost function on a variety of
inverse problems. We also implement Nesterov acceleration for two EKI variants,
unscented Kalman inversion and ensemble transform Kalman inversion. Our
specific implementation takes the form of a particle-level nudge that is
demonstrably simple to couple in a black-box fashion with any existing EKI
variant algorithm, comes with no additional computational expense, and requires
no additional tuning hyperparameters. This work shows a pathway for future
research to translate advances in gradient-based optimization into advances in
gradient-free Kalman optimization.
[LINK]
http://arxiv.org/abs/2501.08779v1
[DATE]
2025-01-15 21:01:34+08:00
[CATEGORIES]
cs.LG
Networked Agents in the Dark: Team Value Learning under Partial Observability
[AUTHORS]
Guilherme S. Varela, Alberto Sardinha, Francisco S. Melo
[ABSTRACT]
We propose a novel cooperative multi-agent reinforcement learning (MARL)
approach for networked agents. In contrast to previous methods that rely on
complete state information or joint observations, our agents must learn how to
reach shared objectives under partial observability. During training, they
collect individual rewards and approximate a team value function through local
communication, resulting in cooperative behavior. To describe our problem, we
introduce the networked dynamic partially observable Markov game framework,
where agents communicate over a switching topology communication network. Our
distributed method, DNA-MARL, uses a consensus mechanism for local
communication and gradient descent for local computation. DNA-MARL increases
the range of the possible applications of networked agents, being well-suited
for real-world domains that impose privacy constraints and where messages may not reach
their recipients. We evaluate DNA-MARL across benchmark MARL scenarios. Our
results highlight the superior performance of DNA-MARL over previous methods.
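
The consensus mechanism mentioned in the abstract above can be pictured with a generic neighbor-averaging step. This is a textbook consensus sketch under assumed names (`consensus_step`, `neighbors`) and a fixed three-agent line graph; it is not the DNA-MARL update itself, which operates over a switching topology.

```python
def consensus_step(values, neighbors):
    """Each agent replaces its value with the average over itself and its neighbors."""
    updated = []
    for i, own in enumerate(values):
        group = [own] + [values[j] for j in neighbors[i]]
        updated.append(sum(group) / len(group))
    return updated


# Hypothetical line graph 0 - 1 - 2: agent 1 talks to both ends.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
estimates = [0.0, 3.0, 6.0]  # local estimates before communication

for _ in range(50):
    estimates = consensus_step(estimates, neighbors)
print(estimates)
```

With only local exchanges, the agents' estimates approach a common value, which is the property a team value function approximation would rely on.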
[COMMENTS]
18 pages, 7 figures, 5 tables. Accepted as supplemental material at
Proceedings of the 24th International Conference on Autonomous Agents and
Multiagent Systems (AAMAS 2025), Detroit, Michigan, USA, May 19 - 23, 2025,
IFAAMAS
[LINK]
http://arxiv.org/abs/2501.08778v1
[DATE]
2025-01-15 21:01:32+08:00
[CATEGORIES]
cs.LG
Metric Space Magnitude for Evaluating the Diversity of Latent Representations
[AUTHORS]
Katharina Limbeck, Rayna Andreeva, Rik Sarkar, Bastian Rieck
[ABSTRACT]
The magnitude of a metric space is a novel invariant that provides a measure
of the ‘effective size’ of a space across multiple scales, while also capturing
numerous geometrical properties, such as curvature, density, or entropy. We
develop a family of magnitude-based measures of the intrinsic diversity of
latent representations, formalising a novel notion of dissimilarity between
magnitude functions of finite metric spaces. Our measures are provably stable
under perturbations of the data, can be efficiently calculated, and enable a
rigorous multi-scale characterisation and comparison of latent representations.
We show their utility and superior performance across different domains and
tasks, including (i) the automated estimation of diversity, (ii) the detection
of mode collapse, and (iii) the evaluation of generative models for text,
image, and graph data.
[COMMENTS]
Accepted at the 38th Conference on Neural Information Processing
Systems (NeurIPS) 2024. The code for computing magnitude is available at
https://github.com/aidos-lab/magnipy
[LINK]
http://arxiv.org/abs/2311.16054v5
[DATE]
2025-01-15 20:57:47+08:00
[CATEGORIES]
cs.LG
Leveraging LLM Agents for Translating Network Configurations
[AUTHORS]
Yunze Wei, Xiaohui Xie, Yiwei Zuo, Tianshuo Hu, Xinyi Chen, Kaiwen Chi, Yong Cui
[ABSTRACT]
Configuration translation is a critical and frequent task in network
operations. When a network device is damaged or outdated, administrators need
to replace it to maintain service continuity. The replacement devices may
originate from different vendors, necessitating configuration translation to
ensure seamless network operation. However, translating configurations manually
is a labor-intensive and error-prone process. In this paper, we propose an
intent-based framework for translating network configuration with Large
Language Model (LLM) Agents. The core of our approach is an Intent-based
Retrieval Augmented Generation (IRAG) module that systematically splits a
configuration file into fragments, extracts intents, and generates accurate
translations. We also design a two-stage verification method to validate the
syntactic and semantic correctness of the translated configurations. We implement
and evaluate the proposed method on real-world network configurations.
Experimental results show that our method achieves 97.74% syntax correctness,
outperforming state-of-the-art methods in translation accuracy.
[LINK]
http://arxiv.org/abs/2501.08760v1
[DATE]
2025-01-15 20:25:56+08:00
[CATEGORIES]
cs.LG
Maximizing Uncertainty for Federated learning via Bayesian Optimisation-based Model Poisoning
[AUTHORS]
Marios Aristodemou, Xiaolan Liu, Yuan Wang, Konstantinos G. Kyriakopoulos, Sangarapillai Lambotharan, Qingsong Wei
[ABSTRACT]
As we transition from Narrow Artificial Intelligence towards Artificial Super
Intelligence, users are increasingly concerned about their privacy and the
trustworthiness of machine learning (ML) technology. A common denominator for
the metrics of trustworthiness is the quantification of uncertainty inherent in
deep learning (DL) algorithms, and specifically in the model parameters, input data, and model
predictions. One of the common approaches to address privacy-related issues in
DL is to adopt distributed learning such as federated learning (FL), where
private raw data is not shared among users. Despite the privacy-preserving
mechanisms in FL, it still faces challenges in trustworthiness. Specifically,
malicious users can, during training, systematically craft malicious model
parameters to compromise the model's predictive and generative capabilities,
resulting in high uncertainty about their reliability. To demonstrate malicious
behaviour, we propose a novel model poisoning attack method named Delphi which
aims to maximise the uncertainty of the global model output. We achieve this by
taking advantage of the relationship between the uncertainty and the model
parameters of the first hidden layer of the local model. Delphi employs two
types of optimisation, Bayesian Optimisation and Least Squares Trust Region,
to search for the optimal poisoned model parameters; the resulting variants are
named Delphi-BO and Delphi-LSTR. We quantify the uncertainty using the KL Divergence to minimise
the distance of the predictive probability distribution towards an uncertain
distribution of model output. Furthermore, we establish a mathematical proof
for the attack effectiveness demonstrated in FL. Numerical results demonstrate
that Delphi-BO induces a higher amount of uncertainty than Delphi-LSTR,
highlighting the vulnerability of FL systems to model poisoning attacks.
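
To make the KL-based uncertainty measure in the abstract above concrete, the sketch below computes the KL divergence between a predictive distribution and a uniform distribution. Treating the "uncertain distribution" as uniform, and the direction KL(p || uniform), are assumptions made for illustration; the abstract does not specify either choice.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equally long lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_to_uniform(p):
    """Distance of a predictive distribution from the uniform one (assumed target)."""
    n = len(p)
    return kl_divergence(p, [1.0 / n] * n)


confident = [0.97, 0.01, 0.01, 0.01]   # hypothetical clean-model output
uncertain = [0.25, 0.25, 0.25, 0.25]   # hypothetical poisoned-model output

print(kl_to_uniform(confident), kl_to_uniform(uncertain))
```

Under these assumptions, an attack that maximises output uncertainty would push this quantity towards zero.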
[COMMENTS]
14 pages
[LINK]
http://arxiv.org/abs/2501.08002v2
[DATE]
2025-01-15 19:52:29+08:00
[CATEGORIES]
cs.LG
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
[AUTHORS]
Paul Garnier, Vincent Lannelongue, Jonathan Viquerat, Elie Hachem
[ABSTRACT]
We introduce a novel masked pre-training technique for graph neural networks
(GNNs) applied to computational fluid dynamics (CFD) problems. By randomly
masking up to 40% of input mesh nodes during pre-training, we force the model
to learn robust representations of complex fluid dynamics. We pair this masking
strategy with an asymmetric encoder-decoder architecture and gated multi-layer
perceptrons to further enhance performance. The proposed method achieves
state-of-the-art results on seven CFD datasets, including a new challenging
dataset of 3D intracranial aneurysm simulations with over 250,000 nodes per
mesh. Moreover, it significantly improves model performance and training
efficiency across such a diverse range of fluid simulation tasks. We demonstrate
improvements of up to 60% in long-term prediction accuracy compared to
previous best models, while maintaining similar computational costs. Notably,
our approach enables effective pre-training on multiple datasets
simultaneously, significantly reducing the time and data required to achieve
high performance on new tasks. Through extensive ablation studies, we provide
insights into the optimal masking ratio, architectural choices, and training
strategies.
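
The random node masking described in the abstract above can be sketched as follows. The helper name `mask_nodes`, the uniform sampling scheme, and the mesh size are assumptions for illustration; the paper's actual masking procedure and how masked nodes are handled by the encoder are not given here.

```python
import random


def mask_nodes(num_nodes, mask_ratio, seed=None):
    """Randomly split node indices into visible and masked sets."""
    rng = random.Random(seed)
    num_masked = int(num_nodes * mask_ratio)
    masked = set(rng.sample(range(num_nodes), num_masked))
    visible = [i for i in range(num_nodes) if i not in masked]
    return visible, sorted(masked)


# Hypothetical mesh with 10 nodes, masking 40% of them.
visible, masked = mask_nodes(num_nodes=10, mask_ratio=0.4, seed=0)
print("visible:", visible)
print("masked: ", masked)
```

In a masked pre-training setup, the encoder would only see the visible nodes, and the model would be trained to reconstruct quantities at the masked ones.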
[LINK]
http://arxiv.org/abs/2501.08738v1
[DATE]
2025-01-15 19:34:56+08:00
[CATEGORIES]
cs.LG
Anthropomorphic Features for On-Line Signatures
[AUTHORS]
Moises Diaz, Miguel A. Ferrer, Jose J. Quintana
[ABSTRACT]
Many features have been proposed in on-line signature verification.
Generally, these features rely on the position of the on-line signature samples
and their dynamic properties, as recorded by a tablet. This paper proposes a
novel feature space to efficiently describe on-line signatures. Since producing
a signature requires a skeletal arm system and its associated muscles, the new
feature space is based on characterizing the movement of the shoulder, the
elbow and the wrist joints when signing. As this motion is not directly
obtained from a digital tablet, the new features are calculated by means of a
virtual skeletal arm (VSA) model, which simulates the architecture of a real
arm and forearm. Specifically, the VSA motion is described by its 3D joint
position and its joint angles. These anthropomorphic features are worked out
from both pen position and orientation through the VSA forward and direct
kinematic model. The anthropomorphic features’ robustness is proved by
achieving state-of-the-art performance with several verifiers and multiple
benchmarks on third-party signature databases, which were collected with
different devices and in different languages and scripts.
[LINK]
http://arxiv.org/abs/2501.09048v1
[DATE]
2025-01-15 19:28:36+08:00
[CATEGORIES]
cs.LG
Applying the maximum entropy principle to neural networks enhances multi-species distribution models
[AUTHORS]
Maxime Ryckewaert, Diego Marcos, Christophe Botella, Maximilien Servajean, Pierre Bonnet, Alexis Joly
[ABSTRACT]
The rapid expansion of citizen science initiatives has led to a significant
growth of biodiversity databases, and particularly presence-only (PO)
observations. PO data are invaluable for understanding species distributions
and their dynamics, but their use in a Species Distribution Model (SDM) is
curtailed by sampling biases and the lack of information on absences. Poisson
point processes are widely used for SDMs, with Maxent being one of the most
popular methods. Maxent maximises the entropy of a probability distribution
across sites as a function of predefined transformations of variables, called
features. In contrast, neural networks and deep learning have emerged as a
promising technique for automatic feature extraction from complex input
variables. Arbitrarily complex transformations of input variables can be
learned from the data efficiently through backpropagation and stochastic
gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses
neural networks to automatically learn shared features among species, using the
maximum entropy principle. To do so, it employs a normalised Poisson loss where
for each species, presence probabilities across sites are modelled by a neural
network. We evaluate DeepMaxent on a benchmark dataset known for its spatial
sampling biases, using PO data for calibration and presence-absence (PA) data
for validation across six regions with different biological groups and
covariates. Our results indicate that DeepMaxent performs better than Maxent
and other leading SDMs across all regions and taxonomic groups. The method
performs particularly well in regions of uneven sampling, demonstrating
substantial potential to increase SDM performance. In particular, our approach
yields more accurate predictions than traditional single-species models, which
opens up new possibilities for methodological enhancement.
[COMMENTS]
Submitted to Methods in Ecology and Evolution
[LINK]
http://arxiv.org/abs/2412.19217v2
[DATE]
2025-01-15 19:21:16+08:00
[CATEGORIES]
cs.LG
A Closer Look at Deep Learning Methods on Tabular Datasets
[AUTHORS]
Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan
[ABSTRACT]
Tabular data is prevalent across diverse domains in machine learning. While
classical methods like tree-based models have long been effective, Deep Neural
Network (DNN)-based methods have recently demonstrated promising performance.
However, the diverse characteristics of methods and the inherent heterogeneity
of tabular datasets make understanding and interpreting tabular methods both
challenging and prone to unstable observations. In this paper, we conduct
in-depth evaluations and comprehensive analyses of tabular methods, with a
particular focus on DNN-based models, using a benchmark of over 300 tabular
datasets spanning a wide range of task types, sizes, and domains. First, we
perform an extensive comparison of 32 state-of-the-art deep and tree-based
methods, evaluating their average performance across multiple criteria.
Although method ranks vary across datasets, we empirically find that
top-performing methods tend to concentrate within a small subset of tabular
models, regardless of the criteria used. Next, we investigate whether the
training dynamics of deep tabular models can be predicted based on dataset
properties. This approach not only offers insights into the behavior of deep
tabular methods but also identifies a core set of “meta-features” that reflect
dataset heterogeneity. The other subset includes datasets where method ranks
are consistent with the overall benchmark, acting as a reliable probe for
further tabular analysis.
[LINK]
http://arxiv.org/abs/2407.00956v3
[DATE]
2025-01-15 19:19:30+08:00
[CATEGORIES]
cs.LG
MambaLRP: Explaining Selective State Space Sequence Models
[AUTHORS]
Farnoush Rezaei Jafari, Grégoire Montavon, Klaus-Robert Müller, Oliver Eberle
[ABSTRACT]
Recent sequence modeling approaches using selective state space sequence
models, referred to as Mamba models, have seen a surge of interest. These
models allow efficient processing of long sequences in linear time and are
rapidly being adopted in a wide range of applications such as language
modeling, demonstrating promising performance. To foster their reliable use in
real-world scenarios, it is crucial to augment their transparency. Our work
bridges this critical gap by bringing explainability, particularly Layer-wise
Relevance Propagation (LRP), to the Mamba architecture. Guided by the axiom of
relevance conservation, we identify specific components in the Mamba
architecture that cause unfaithful explanations. To remedy this issue, we
propose MambaLRP, a novel algorithm within the LRP framework, which ensures a
more stable and reliable relevance propagation through these components. Our
proposed method is theoretically sound and excels in achieving state-of-the-art
explanation performance across a diverse range of models and datasets.
Moreover, MambaLRP facilitates a deeper inspection of Mamba architectures,
uncovering various biases and evaluating their significance. It also enables
the analysis of previous speculations regarding the long-range capabilities of
Mamba models.
[LINK]
http://arxiv.org/abs/2406.07592v3
[DATE]
2025-01-15 19:18:10+08:00
[CATEGORIES]
cs.LG
GRAPPA – A Hybrid Graph Neural Network for Predicting Pure Component Vapor Pressures
[AUTHORS]
Marco Hoffmann, Hans Hasse, Fabian Jirasek
[ABSTRACT]
Although the pure component vapor pressure is one of the most important
properties for designing chemical processes, no broadly applicable,
sufficiently accurate, and open-source prediction method has been available. To
overcome this, we have developed GRAPPA - a hybrid graph neural network for
predicting vapor pressures of pure components. GRAPPA enables the prediction of
the vapor pressure curve of basically any organic molecule, requiring only the
molecular structure as input. The new model consists of three parts: a graph
attention network for the message passing step, a pooling function that
captures long-range interactions, and a prediction head that yields the
component-specific parameters of the Antoine equation, from which the vapor
pressure can readily and consistently be calculated for any temperature. We
have trained and evaluated GRAPPA on experimental vapor pressure data of almost
25,000 pure components. We found excellent prediction accuracy for unseen
components, outperforming state-of-the-art group contribution methods and other
machine learning approaches in applicability and accuracy. The trained model
and its code are fully disclosed, and GRAPPA is directly applicable via the
interactive website ml-prop.mv.rptu.de.
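
The last step described in the abstract above, computing a vapor pressure from Antoine parameters at a given temperature, can be sketched as below. The common form log10(p) = A - B / (C + T) and the units are assumptions, since the abstract does not state which variant GRAPPA uses; the water parameters are commonly tabulated values used only as a stand-in for network-predicted ones.

```python
def antoine_pressure(temperature, a, b, c):
    """Vapor pressure from the Antoine equation: log10(p) = A - B / (C + T)."""
    return 10.0 ** (a - b / (c + temperature))


# Commonly tabulated Antoine parameters for water (T in deg C, p in mmHg,
# roughly valid between 1 and 100 deg C). In a model like GRAPPA these
# would instead come from the prediction head for the molecule of interest.
WATER = {"a": 8.07131, "b": 1730.63, "c": 233.426}

for t in (25.0, 50.0, 100.0):
    p = antoine_pressure(t, **WATER)
    print(f"T = {t:6.1f} degC -> p = {p:8.2f} mmHg")
```

Because the three parameters define a closed-form curve, a single prediction per component yields a pressure at any temperature within the equation's range of validity.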
[COMMENTS]
38 pages, 12 figures
[LINK]
http://arxiv.org/abs/2501.08729v1
[DATE]
2025-01-15 19:11:38+08:00
[CATEGORIES]
cs.LG
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
[AUTHORS]
Zerui Tao, Yuhta Takida, Naoki Murata, Qibin Zhao, Yuki Mitsufuji
[ABSTRACT]
Parameter-Efficient Fine-Tuning (PEFT) of text-to-image models has become an
increasingly popular technique with many applications. Among the various PEFT
methods, Low-Rank Adaptation (LoRA) and its variants have gained significant
attention due to their effectiveness, enabling users to fine-tune models with
limited computational resources. However, the approximation gap between the
low-rank assumption and desired fine-tuning weights prevents the simultaneous
acquisition of ultra-parameter-efficiency and better performance. To reduce
this gap and further improve the power of LoRA, we propose a new PEFT method
that combines two classes of adaptations, namely, transform and residual
adaptations. Specifically, we first apply a full-rank and dense transform to the
pre-trained weight. This learnable transform is expected to align the
pre-trained weight as closely as possible to the desired weight, thereby
reducing the rank of the residual weight. Then, the residual part can be
effectively approximated by more compact and parameter-efficient structures,
with a smaller approximation error. To achieve ultra-parameter-efficiency in
practice, we design highly flexible and effective tensor decompositions for
both the transform and residual adaptations. Additionally, popular PEFT methods
such as DoRA can be summarized under this transform plus residual adaptation
scheme. Experiments are conducted on fine-tuning Stable Diffusion models in
subject-driven and controllable generation. The results show that our
method can achieve better performance and parameter efficiency than
LoRA and several baselines.
[LINK]
http://arxiv.org/abs/2501.08727v1
[DATE]
2025-01-15 19:10:37+08:00
[CATEGORIES]
cs.LG
Sparse Low-Ranked Self-Attention Transformer for Remaining Useful Lifetime Prediction of Optical Fiber Amplifiers
[AUTHORS]
Dominic Schneider, Lutz Rapp
[ABSTRACT]
Optical fiber amplifiers are key elements in present optical networks.
Failures of these components cause substantial loss of income for the
network operator, as the communication traffic over an affected link is
interrupted. Applying remaining useful lifetime (RUL) prediction to optical
fiber amplifiers in the context of predictive maintenance (PdM) allows upcoming
system failures to be predicted at an early stage, so that network outages can
be minimized through targeted maintenance planning, ensuring reliability and
safety. Optical fiber amplifiers are complex systems that work under various
operating conditions, which makes correct forecasting a difficult task.
Increased monitoring capabilities of systems result in datasets that
facilitate the application of data-driven RUL prediction methods. Deep learning
models in particular have shown good performance, but generalization based on
comparatively small datasets for RUL prediction is difficult. In this paper, we
propose Sparse Low-ranked self-Attention Transformer (SLAT) as a novel RUL
prediction method. SLAT is based on an encoder-decoder architecture, wherein
two parallel working encoders extract features for sensors and time steps. By
utilizing the self-attention mechanism, long-term dependencies can be learned
from long sequences. The implementation of sparsity in the attention matrix and
a low-rank parametrization reduce overfitting and increase generalization.
Experimental application to optical fiber amplifiers exemplified on EDFA, as
well as a reference dataset from turbofan engines, shows that SLAT outperforms
the state-of-the-art methods.
[COMMENTS]
9 pages, 7 figures
[LINK]
http://arxiv.org/abs/2409.14378v3
[DATE]
2025-01-15 19:07:35+08:00
[CATEGORIES]
cs.LG
$\texttt{InfoHier}$: Hierarchical Information Extraction via Encoding and Embedding
[AUTHORS]
Tianru Zhang, Li Ju, Prashant Singh, Salman Toor
[ABSTRACT]
Analyzing large-scale datasets, especially involving complex and
high-dimensional data like images, is particularly challenging. While
self-supervised learning (SSL) has proven effective for learning
representations from unlabelled data, it typically focuses on flat,
non-hierarchical structures, missing the multi-level relationships present in
many real-world datasets. Hierarchical clustering (HC) can uncover these
relationships by organizing data into a tree-like structure, but it often
relies on rigid similarity metrics that struggle to capture the complexity of
diverse data types. To address these limitations, we envision $\texttt{InfoHier}$, a
framework that combines SSL with HC to jointly learn robust latent
representations and hierarchical structures. This approach leverages SSL to
provide adaptive representations, enhancing HC’s ability to capture complex
patterns. Simultaneously, it integrates HC loss to refine SSL training,
resulting in representations that are more attuned to the underlying
information hierarchy. $\texttt{InfoHier}$ has the potential to improve the
expressiveness and performance of both clustering and representation learning,
offering significant benefits for data analysis, management, and information
retrieval.
[COMMENTS]
10 pages, 4 figures
[LINK]
http://arxiv.org/abs/2501.08717v1
[DATE]
2025-01-15 18:58:32+08:00
[CATEGORIES]
cs.LG
Self-supervised Transformation Learning for Equivariant Representations
[AUTHORS]
Jaemyung Yu, Jaehyun Choi, Dong-Jae Lee, HyeongGwon Hong, Junmo Kim
[COMMENTS]
38th Conference on Neural Information Processing Systems (NeurIPS
2024)
[LINK]
http://arxiv.org/abs/2501.08712v1
[DATE]
2025-01-15 18:54:21+08:00
[CATEGORIES]
cs.LG
FADE: Towards Fairness-aware Augmentation for Domain Generalization via Classifier-Guided Score-based Diffusion Models
[AUTHORS]
Yujie Lin, Dong Li, Chen Zhao, Minglai Shao, Guihong Wan
[ABSTRACT]
Fairness-aware domain generalization (FairDG) has emerged as a critical
challenge for deploying trustworthy AI systems, particularly in scenarios
involving distribution shifts. Traditional methods for addressing fairness have
failed in domain generalization due to their lack of consideration for
distribution shifts. Although disentanglement has been used to tackle FairDG,
it is limited by its strong assumptions. To overcome these limitations, we
propose Fairness-aware Classifier-Guided Score-based Diffusion Models (FADE) as
a novel approach to effectively address the FairDG issue. Specifically, we
first pre-train a score-based diffusion model (SDM) and two classifiers to
equip the model with strong generalization capabilities across different
domains. Then, we guide the SDM using these pre-trained classifiers to
effectively eliminate sensitive information from the generated data. Finally,
the generated fair data is used to train downstream classifiers, ensuring
robust performance under new data distributions. Extensive experiments on three
real-world datasets demonstrate that FADE not only enhances fairness but also
improves accuracy in the presence of distribution shifts. Additionally, FADE
outperforms existing methods in achieving the best accuracy-fairness
trade-offs.
[LINK]
http://arxiv.org/abs/2406.09495v3
[DATE]
2025-01-15 18:47:05+08:00
[CATEGORIES]
cs.LG
Relational Reasoning Networks
[AUTHORS]
Giuseppe Marra, Michelangelo Diligenti, Francesco Giannini
[ABSTRACT]
Neuro-symbolic methods integrate neural architectures, knowledge
representation and reasoning. However, they have struggled both with
handling the intrinsic uncertainty of the observations and with scaling to
real-world applications. This paper presents Relational Reasoning Networks
(R2N), a novel end-to-end model that performs relational reasoning in the
latent space of a deep learner architecture, where the representations of
constants, ground atoms and their manipulations are learned in an integrated
fashion. Unlike flat architectures like Knowledge Graph Embedders, which can
only represent relations between entities, R2Ns define an additional
computational structure, accounting for higher-level relations among the ground
atoms. The considered relations can be explicitly known, like the ones defined
by logic formulas, or defined as unconstrained correlations among groups of
ground atoms. R2Ns can be applied to purely symbolic tasks or as a
neuro-symbolic platform to integrate learning and reasoning in heterogeneous
problems with both symbolic and feature-based represented entities. The
proposed model overcomes the limitations of previous neuro-symbolic methods
that have been either limited in terms of scalability or expressivity. The
proposed methodology is shown to achieve state-of-the-art results in different
experimental settings.
[LINK]
http://arxiv.org/abs/2106.00393v4
[DATE]
2025-01-15 18:33:52+08:00
[CATEGORIES]
cs.LG
Extended convexity and smoothness and their applications in deep learning
[AUTHORS]
Binchuan Qi, Wei Gong, Li Li
[ABSTRACT]
This paper introduces an optimization framework aimed at providing a
theoretical foundation for a class of composite optimization problems,
particularly those encountered in deep learning. In this framework, we
introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness to
generalize the existing concepts of Lipschitz smoothness and strong convexity.
Furthermore, we analyze and establish the convergence of both gradient descent
and stochastic gradient descent methods for objective functions that are
$\mathcal{H}(\Phi)$-smooth. We prove that the optimal convergence rates of
these methods depend solely on the homogeneous degree of $\Phi$. Based on these
findings, we construct two types of non-convex and non-smooth optimization
problems: deterministic composite and stochastic composite optimization
problems, which encompass the majority of optimization problems in deep
learning. To address these problems, we develop the gradient structure control
(GSC) algorithm and prove that it can locate approximate global optima. This marks a
significant departure from traditional non-convex analysis frameworks, which
typically settle for stationary points. Therefore, with the introduction of
$\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, along with
the GSC algorithm, the non-convex optimization mechanisms in deep learning can
be theoretically explained and supported. Finally, the effectiveness of the
proposed framework is substantiated through empirical experimentation.
[LINK]
http://arxiv.org/abs/2410.05807v2
[DATE]
2025-01-15 17:53:49+08:00
[CATEGORIES]
cs.LG
Learning Hemodynamic Scalar Fields on Coronary Artery Meshes: A Benchmark of Geometric Deep Learning Models
[AUTHORS]
Guido Nannini, Julian Suk, Patryk Rygiel, Simone Saitta, Luca Mariani, Riccardo Maranga, Andrea Baggiano, Gianluca Pontone, Alberto Redaelli
[ABSTRACT]
Coronary artery disease, caused by the narrowing of coronary vessels due to
atherosclerosis, is the leading cause of death worldwide. The diagnostic gold
standard, fractional flow reserve (FFR), measures the trans-stenotic pressure
ratio during maximal vasodilation but is invasive and costly. This has driven
the development of virtual FFR (vFFR) using computational fluid dynamics (CFD)
to simulate coronary flow. Geometric deep learning algorithms have shown
promise for learning features on meshes, including cardiovascular research
applications. This study empirically analyzes various backends for predicting
vFFR fields in coronary arteries as CFD surrogates, comparing six backends for
learning hemodynamics on meshes using CFD solutions as ground truth.
The study has two parts: i) Using 1,500 synthetic left coronary artery
bifurcations, models were trained to predict pressure-related fields for vFFR
reconstruction, comparing different learning variables. ii) Using 427
patient-specific CFD simulations, experiments were repeated focusing on the
best-performing learning variable from the synthetic dataset.
Most backends performed well on the synthetic dataset, especially when
predicting pressure drop over the manifold. Transformer-based backends
outperformed others when predicting pressure and vFFR fields and were the only
models achieving strong performance on patient-specific data, excelling in both
average per-point error and vFFR accuracy in stenotic lesions.
These results suggest geometric deep learning backends can effectively
replace CFD for simple geometries, while transformer-based networks are
superior for complex, heterogeneous datasets. Pressure drop was identified as
the optimal network output for learning pressure-related fields.
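As a rough illustration of how a predicted pressure-drop field could be turned into a vFFR field, the sketch below assumes the common definition of FFR as distal pressure divided by inlet (aortic) pressure. The function name, the inlet pressure, and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

def vffr_from_pressure_drop(pressure_drop_mmhg, inlet_pressure_mmhg):
    """Convert a per-vertex pressure drop (relative to the inlet) into vFFR.

    Assumes vFFR = p_distal / p_inlet with p_distal = p_inlet - pressure_drop.
    """
    drop = np.asarray(pressure_drop_mmhg, dtype=float)
    return (inlet_pressure_mmhg - drop) / inlet_pressure_mmhg

# Hypothetical pressure drops (mmHg) at four mesh vertices along a vessel.
drops = np.array([0.0, 5.0, 20.0, 30.0])
vffr = vffr_from_pressure_drop(drops, inlet_pressure_mmhg=100.0)
```

Under this assumption, a network that outputs pressure drop needs only the inlet pressure to recover vFFR at every vertex.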
[LINK]
http://arxiv.org/abs/2501.09046v1
[DATE]
2025-01-15 17:52:40+08:00
[CATEGORIES]
cs.LG
Diffusion-based Unsupervised Audio-visual Speech Enhancement
[AUTHORS]
Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda
[ABSTRACT]
This paper proposes a new unsupervised audio-visual speech enhancement (AVSE)
approach that combines a diffusion-based audio-visual speech generative model
with a non-negative matrix factorization (NMF) noise model. First, the
diffusion model is pre-trained on clean speech conditioned on corresponding
video data to simulate the speech generative distribution. This pre-trained
model is then paired with the NMF-based noise model to estimate clean speech
iteratively. Specifically, a diffusion-based posterior sampling approach is
implemented within the reverse diffusion process, where after each iteration, a
speech estimate is obtained and used to update the noise parameters.
Experimental results confirm that the proposed AVSE approach not only
outperforms its audio-only counterpart but also generalizes better than a
recent supervised-generative AVSE method. Additionally, the new inference
algorithm offers a better balance between inference speed and performance
compared to the previous diffusion-based method. Code and demo available at:
https://jeaneudesayilo.github.io/fast_UdiffSE
[LINK]
http://arxiv.org/abs/2410.05301v2
[DATE]
2025-01-15 17:42:42+08:00
[CATEGORIES]
cs.LG
Interpreting Equivariant Representations
[AUTHORS]
Andreas Abildtrup Hansen, Anna Calissano, Aasa Feragen
[ABSTRACT]
Latent representations are used extensively for downstream tasks, such as
visualization, interpolation or feature extraction of deep learning models.
Invariant and equivariant neural networks are powerful and well-established
models for enforcing inductive biases. In this paper, we demonstrate that the
inductive bias imposed by an equivariant model must also be taken into
account when using latent representations. We show how not accounting for the
inductive biases leads to decreased performance on downstream tasks, and vice
versa, how accounting for inductive biases can be done effectively by using an
invariant projection of the latent representations. We propose principles for
how to choose such a projection, and show the impact of using these principles
in two common examples: First, we study a permutation equivariant variational
auto-encoder trained for molecule graph generation; here we show that invariant
projections can be designed that incur no loss of information in the resulting
invariant representation. Next, we study a rotation-equivariant representation
used for image classification. Here, we illustrate how random invariant
projections can be used to obtain an invariant representation with a high
degree of retained information. In both cases, the analysis of invariant latent
representations proves superior to their equivariant counterparts. Finally, we
illustrate that the phenomena documented here for equivariant neural networks
have counterparts in standard neural networks where invariance is encouraged
via augmentation. Thus, while these ambiguities may be known by experienced
developers of equivariant models, we make both the knowledge as well as
effective tools to handle the ambiguities available to the broader community.
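A toy example may help make the idea of a lossless invariant projection concrete. The sketch below assumes the simplest possible setting, a latent vector whose entries are permuted by the group action, and uses sorting as the projection. It is an illustrative assumption, not the projection constructed in the paper for graph latents.

```python
import numpy as np

def sort_projection(z):
    """Map a latent vector to a canonical representative of its permutation orbit."""
    return np.sort(np.asarray(z, dtype=float))

rng = np.random.default_rng(0)
z = rng.normal(size=5)
z_permuted = z[rng.permutation(5)]  # same latent, different ordering

# Both orderings project to the same invariant representation.
same_orbit = np.array_equal(sort_projection(z), sort_projection(z_permuted))
```

In this toy setting two latents share a projection exactly when they differ by a permutation, so nothing is lost except the arbitrary ordering.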
[COMMENTS]
This paper was updated to reflect the version accepted to ICML 2024
[LINK]
http://arxiv.org/abs/2401.12588v2
[DATE]
2025-01-15 17:30:18+08:00
[CATEGORIES]
cs.LG
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
[AUTHORS]
Junghyun Lee, Se-Young Yun, Kwang-Sung Jun
[ABSTRACT]
We present a unified likelihood ratio-based confidence sequence (CS) for any
(self-concordant) generalized linear model (GLM) that is guaranteed to be
convex and numerically tight. We show that this is on par with or improves upon
known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In
particular, for the first time, our CS for Bernoulli has a
$\mathrm{poly}(S)$-free radius where $S$ is the norm of the unknown parameter.
Our first technical novelty is its derivation, which utilizes a time-uniform
PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a
rather unpopular choice for deriving CSs. As a direct application of our new
CS, we propose a simple and natural optimistic algorithm called OFUGLB,
applicable to any generalized linear bandits (GLB; Filippi et al. (2010)). Our
analysis shows that the celebrated optimistic approach simultaneously attains
state-of-the-art regrets for various self-concordant (not necessarily bounded)
GLBs, and even $\mathrm{poly}(S)$-free for bounded GLBs, including logistic
bandits. The regret analysis, our second technical novelty, follows from
combining our new CS with a new proof technique that completely avoids the
previously widely used self-concordant control lemma (Faury et al., 2020, Lemma
9). Numerically, OFUGLB outperforms or is on par with prior algorithms for
logistic bandits.
[COMMENTS]
39 pages, 2 figures, 2 tables; Accepted to the 38th Conference on
Neural Information Processing Systems (NeurIPS 2024) (ver3: minor revisions,
code refactoring; ver2: major revision, including new experiments,
reorganization, fixing typos in the proofs of ver1, etc)
[LINK]
http://arxiv.org/abs/2407.13977v3
[DATE]
2025-01-15 17:25:02+08:00
[CATEGORIES]
cs.LG
SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks
[AUTHORS]
Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib
[ABSTRACT]
Graph Neural Networks (GNNs) have gained traction across different domains
such as transportation, bio-informatics, language processing, and computer
vision. However, there is a noticeable absence of research on applying GNNs to
supply chain networks. Supply chain networks are inherently graph-like in
structure, making them prime candidates for applying GNN methodologies. This
opens up a world of possibilities for optimizing, predicting, and solving even
the most complex supply chain problems. A major setback in this approach lies
in the absence of real-world benchmark datasets to facilitate the research and
resolution of supply chain problems using GNNs. To address the issue, we
present a real-world benchmark dataset for temporal tasks, obtained from one of
the leading FMCG companies in Bangladesh, focusing on supply chain planning for
production purposes. The dataset includes temporal data as node features to
enable sales predictions, production planning, and the identification of
factory issues. By utilizing this dataset, researchers can employ GNNs to
address numerous supply chain problems, thereby advancing the field of supply
chain analytics and planning. Source: https://github.com/CIOL-SUST/SupplyGraph
[COMMENTS]
Accepted to 4th workshop on Graphs and more Complex structures for
Learning and Reasoning, colocated with AAAI 2024
[LINK]
http://arxiv.org/abs/2401.15299v3
[DATE]
2025-01-15 17:23:55+08:00
[CATEGORIES]
cs.LG
Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity
[AUTHORS]
Yicheng Li, Qian Lin
[ABSTRACT]
This paper introduces a diagonal adaptive kernel model that dynamically
learns kernel eigenvalues and output coefficients simultaneously during
training. Unlike fixed-kernel methods tied to the neural tangent kernel theory,
the diagonal adaptive kernel model adapts to the structure of the truth
function, significantly improving generalization over fixed-kernel methods,
especially when the initial kernel is misaligned with the target. Moreover, we
show that the adaptivity comes from learning the right eigenvalues during
training, showing a feature learning behavior. By extending to deeper
parameterization, we further show how extra depth enhances adaptability and
generalization. This study combines the insights from feature learning and
implicit regularization and provides new perspective into the adaptivity and
generalization potential of neural networks beyond the kernel regime.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2409.00894
[LINK]
http://arxiv.org/abs/2501.08679v1
[DATE]
2025-01-15 17:20:02+08:00
[CATEGORIES]
cs.LG
Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework
[AUTHORS]
Zhongchao Yi, Zhengyang Zhou, Qihe Huang, Yanjiang Chen, Liheng Yu, Xu Wang, Yang Wang
[ABSTRACT]
Spatiotemporal learning has become a pivotal technique to enable urban
intelligence. Traditional spatiotemporal models mostly focus on a specific task
by assuming the same distribution between training and testing sets. However,
given that urban systems are usually dynamic and multi-sourced with imbalanced
data distributions, current task-specific models fail to generalize to
new urban conditions and adapt to new domains without explicitly modeling
interdependencies across various dimensions and types of urban data. To this
end, we argue that it is essential to propose a Continuous Multi-task
Spatio-Temporal learning framework (CMuST) to empower collective urban
intelligence, which reforms the urban spatiotemporal learning from
single-domain to cooperatively multi-dimensional and multi-task learning.
Specifically, CMuST proposes a new multi-dimensional spatiotemporal interaction
network (MSTI) to allow cross-interactions between context and main
observations as well as self-interactions within spatial and temporal aspects
to be exposed, which is also the core for capturing task-level commonality and
personalization. To ensure continuous task learning, a novel Rolling Adaptation
training scheme (RoAda) is devised, which not only preserves task uniqueness by
constructing data summarization-driven task prompts, but also harnesses
correlated patterns among tasks by iterative model behavior modeling. We
further establish a benchmark of three cities for multi-task spatiotemporal
learning, and empirically demonstrate the superiority of CMuST via extensive
evaluations on these datasets. Impressive improvements over existing SOTA
methods are achieved on both few-shot streaming data and new domain tasks.
Code is available at https://github.com/DILab-USTCSZ/CMuST.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.10524v2
[DATE]
2025-01-15 17:17:01+08:00
[CATEGORIES]
cs.LG
Investigating Parameter-Efficiency of Hybrid QuGANs Based on Geometric Properties of Generated Sea Route Graphs
[AUTHORS]
Tobias Rohe, Florian Burger, Michael Kölle, Sebastian Wölckert, Maximilian Zorn, Claudia Linnhoff-Popien
[ABSTRACT]
The demand for artificially generated data for the development, training and
testing of new algorithms is omnipresent. Quantum computing (QC) offers
the hope that its inherent probabilistic functionality can be utilised in this
field of generative artificial intelligence. In this study, we use
quantum-classical hybrid generative adversarial networks (QuGANs) to
artificially generate graphs of shipping routes. We create a training dataset
based on real shipping data and investigate to what extent QuGANs are able to
learn and reproduce inherent distributions and geometric features of this data.
We compare hybrid QuGANs with classical Generative Adversarial Networks (GANs),
with a special focus on their parameter efficiency. Our results indicate that
QuGANs are indeed able to quickly learn and represent underlying geometric
properties and distributions, although they seem to have difficulties in
introducing variance into the sampled data. Compared to classical GANs of
greater size, measured in the number of parameters used, some QuGANs show
similar result quality. Our reference to concrete use cases, such as the
generation of shipping data, provides an illustrative example and demonstrates
the potential and diversity in which QC can be used.
[LINK]
http://arxiv.org/abs/2501.08678v1
[DATE]
2025-01-15 17:08:05+08:00
[CATEGORIES]
cs.LG
SPEQ: Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning
[AUTHORS]
Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov
[ABSTRACT]
A key challenge in Deep Reinforcement Learning is sample efficiency,
especially in real-world applications where collecting environment interactions
is expensive or risky. Recent off-policy algorithms improve sample efficiency
by increasing the Update-To-Data (UTD) ratio and performing more gradient
updates per environment interaction. While this improves sample efficiency, it
significantly increases computational cost due to the higher number of gradient
updates required. In this paper we propose a sample-efficient method to improve
computational efficiency by separating training into distinct learning phases
in order to exploit gradient updates more effectively. Our approach builds on
top of the Dropout Q-Functions (DroQ) algorithm and alternates between an
online, low UTD ratio training phase, and an offline stabilization phase.
During the stabilization phase, we fine-tune the Q-functions without collecting
new environment interactions. This process improves the effectiveness of the
replay buffer and reduces computational overhead. Our experimental results on
continuous control problems show that our method achieves results comparable to
state-of-the-art, high UTD ratio algorithms while requiring 56\% fewer gradient
updates and 50\% less training time than DroQ. Our approach offers an effective
and computationally economical solution while maintaining the same sample
efficiency as the more costly, high UTD ratio state-of-the-art.
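The alternating schedule described above can be sketched as a simple update counter. All names and numbers below (phase length, updates per phase, and the baseline UTD ratio) are hypothetical placeholders chosen for illustration, not the settings reported in the paper.

```python
def count_updates(total_env_steps, online_utd, stabilization_every, stabilization_updates):
    """Count Q-function gradient updates under an alternating schedule."""
    updates = 0
    for step in range(1, total_env_steps + 1):
        # Online phase: collect one transition, then do `online_utd` updates.
        updates += online_utd
        # Offline stabilization phase: extra updates on the replay buffer,
        # with no new environment interaction.
        if step % stabilization_every == 0:
            updates += stabilization_updates
    return updates

# Hypothetical settings for both schedules.
alternating = count_updates(
    10_000, online_utd=1, stabilization_every=1_000, stabilization_updates=5_000
)
constant_high_utd = 10_000 * 20  # assumed baseline: 20 updates per environment step
```

With these placeholder values the alternating schedule uses the same number of environment steps but fewer gradient updates than the constant high-UTD baseline, which is the kind of trade-off the abstract describes.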
[LINK]
http://arxiv.org/abs/2501.08669v1
[DATE]
2025-01-15 17:04:19+08:00
[CATEGORIES]
cs.LG
Fully Distributed, Flexible Compositional Visual Representations via Soft Tensor Products
[AUTHORS]
Bethia Sun, Maurice Pagnucco, Yang Song
[ABSTRACT]
Since the inception of the classicalist vs. connectionist debate, it has been
argued that the ability to systematically combine symbol-like entities into
compositional representations is crucial for human intelligence. In
connectionist systems, the field of disentanglement has gained prominence for
its ability to produce explicitly compositional representations; however, it
relies on a fundamentally symbolic, concatenative representation of
compositional structure that clashes with the continuous, distributed
foundations of deep learning. To resolve this tension, we extend Smolensky’s
Tensor Product Representation (TPR) and introduce Soft TPR, a representational
form that encodes compositional structure in an inherently distributed,
flexible manner, along with Soft TPR Autoencoder, a theoretically-principled
architecture designed specifically to learn Soft TPRs. Comprehensive
evaluations in the visual representation learning domain demonstrate that the
Soft TPR framework consistently outperforms conventional disentanglement
alternatives – achieving state-of-the-art disentanglement, boosting
representation learner convergence, and delivering superior sample efficiency
and low-sample regime performance in downstream tasks. These findings highlight
the promise of a distributed and flexible approach to representing
compositional structure by potentially enhancing alignment with the core
principles of deep learning over the conventional symbolic approach.
[COMMENTS]
Accepted to NeurIPS 2024. 10 pages + supplementary
[LINK]
http://arxiv.org/abs/2412.04671v2
[DATE]
2025-01-15 17:01:09+08:00
[CATEGORIES]
cs.LG
Product of Gaussian Mixture Diffusion Model for non-linear MRI Inversion
[AUTHORS]
Laurenz Nagler, Martin Zach, Thomas Pock
[ABSTRACT]
Diffusion models have recently shown remarkable results in magnetic resonance
imaging reconstruction. However, the employed networks typically are black-box
estimators of the (smoothed) prior score with tens of millions of parameters,
restricting interpretability and increasing reconstruction time. Furthermore,
parallel imaging reconstruction algorithms either rely on off-line coil
sensitivity estimation, which is prone to misalignment and restricts sampling
trajectories, or perform per-coil reconstruction, making the computational cost
proportional to the number of coils. To overcome this, we jointly reconstruct
the image and the coil sensitivities using the lightweight,
parameter-efficient, and interpretable product of Gaussian mixture diffusion
model as an image prior and a classical smoothness prior on the coil
sensitivities. The proposed method delivers promising results while allowing
for fast inference and demonstrating robustness to contrast out-of-distribution
data and sampling trajectories, comparable to classical variational penalties
such as total variation. Finally, the probabilistic formulation allows the
calculation of the posterior expectation and pixel-wise variance.
[LINK]
http://arxiv.org/abs/2501.08662v1
[DATE]
2025-01-15 16:57:41+08:00
[CATEGORIES]
cs.LG
Fine-grained Spatio-temporal Event Prediction with Self-adaptive Anchor Graph
[AUTHORS]
Wang-Tao Zhou, Zhao Kang, Sicong Liu, Lizong Zhang, Ling Tian
[ABSTRACT]
Event prediction tasks often handle spatio-temporal data distributed in a
large spatial area. Different regions in the area exhibit different
characteristics while having latent correlations. This spatial heterogeneity
and these correlations greatly affect the spatio-temporal distributions of event
occurrences, an effect that state-of-the-art models have not addressed. Learning
spatial dependencies of events in a continuous space is challenging due to its
fine granularity and a lack of prior knowledge. In this work, we propose a
novel Graph Spatio-Temporal Point Process (GSTPP) model for fine-grained event
prediction. It adopts an encoder-decoder architecture that jointly models the
state dynamics of spatially localized regions using neural Ordinary
Differential Equations (ODEs). The state evolution is built on the foundation
of a novel Self-Adaptive Anchor Graph (SAAG) that captures spatial
dependencies. By adaptively localizing the anchor nodes in the space and
jointly constructing the correlation edges between them, the SAAG enhances the
model’s ability to learn complex spatial event patterns. Extensive
experimental results show that the proposed GSTPP model greatly improves the
accuracy of fine-grained event prediction over existing spatio-temporal event
prediction approaches.
[COMMENTS]
Accepted to SIAM International Conference on Data Mining 2025
(SDM’25)
[LINK]
http://arxiv.org/abs/2501.08653v1
[DATE]
2025-01-15 16:38:07+08:00
[CATEGORIES]
cs.LG
Joint Learning of Depth and Appearance for Portrait Image Animation
[AUTHORS]
Xinya Ji, Gaspard Zoss, Prashanth Chandran, Lingchen Yang, Xun Cao, Barbara Solenthaler, Derek Bradley
[ABSTRACT]
2D portrait animation has experienced significant advancements in recent
years. Much research has utilized the prior knowledge embedded in large
generative diffusion models to enhance high-quality image manipulation.
However, most methods only focus on generating RGB images as output, and the
co-generation of consistent visual plus 3D output remains largely
under-explored. In our work, we propose to jointly learn the visual appearance
and depth in a diffusion-based portrait image generator. Our
method embraces the end-to-end diffusion paradigm and introduces a new
architecture suitable for learning this conditional joint distribution,
consisting of a reference network and a channel-expanded diffusion backbone.
Once trained, our framework can be efficiently adapted to various downstream
applications, such as facial depth-to-image and image-to-depth generation,
portrait relighting, and audio-driven talking head animation with consistent 3D
output.
[LINK]
http://arxiv.org/abs/2501.08649v1
[DATE]
2025-01-15 16:24:35+08:00
[CATEGORIES]
cs.LG
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
[AUTHORS]
Xiaochuan Gong, Jie Hao, Mingrui Liu
[ABSTRACT]
This paper investigates a class of stochastic bilevel optimization problems
where the upper-level function is nonconvex with potentially unbounded
smoothness and the lower-level problem is strongly convex. These problems have
significant applications in sequential data learning, such as text
classification using recurrent neural networks. The unbounded smoothness is
characterized by the smoothness constant of the upper-level function scaling
linearly with the gradient norm, lacking a uniform upper bound. Existing
state-of-the-art algorithms require $\widetilde{O}(1/\epsilon^4)$ oracle calls
of stochastic gradient or Hessian/Jacobian-vector product to find an
$\epsilon$-stationary point. However, it remains unclear if we can further
improve the convergence rate when the assumptions for the function in the
population level also hold for each random realization almost surely. To
address this issue, we propose a new Accelerated Bilevel Optimization algorithm
named AccBO. The algorithm updates the upper-level variable by normalized
stochastic gradient descent with recursive momentum and the lower-level
variable by the stochastic Nesterov accelerated gradient descent algorithm with
averaging. We prove that our algorithm achieves an oracle complexity of
$\widetilde{O}(1/\epsilon^3)$ to find an $\epsilon$-stationary point, when the
lower-level stochastic gradient’s variance is $O(\epsilon)$. Our proof relies
on a novel lemma characterizing the dynamics of stochastic Nesterov accelerated
gradient descent algorithm under distribution drift with high probability for
the lower-level variable, which is of independent interest and also plays a
crucial role in analyzing the hypergradient estimation error over time.
Experimental results on various tasks confirm that our proposed algorithm
achieves the predicted theoretical acceleration and significantly outperforms
baselines in bilevel optimization.
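For readers unfamiliar with the upper-level update named above, normalized stochastic gradient descent with recursive momentum is commonly written in the generic single-level form below. The notation ($g$, $\alpha_t$, $\eta_t$) is chosen here for illustration; in the bilevel setting $g$ would be a hypergradient estimate, and the paper's exact update may differ.

```latex
% Generic single-level form; g(x; \xi) denotes a stochastic gradient estimate.
\begin{aligned}
m_t &= g(x_t;\xi_t) + (1-\alpha_t)\bigl(m_{t-1} - g(x_{t-1};\xi_t)\bigr), \\
x_{t+1} &= x_t - \eta_t \,\frac{m_t}{\lVert m_t \rVert}.
\end{aligned}
```

The same sample $\xi_t$ is evaluated at both $x_t$ and $x_{t-1}$, which is why the assumptions must hold for each random realization rather than only in expectation.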
[COMMENTS]
Accepted by NeurIPS 2024. The code is available at
https://github.com/MingruiLiu-ML-Lab/Accelerated-Bilevel-Optimization-Unbounded-Smoothness
[LINK]
http://arxiv.org/abs/2409.19212v5
[DATE]
2025-01-15 16:18:27+08:00
[CATEGORIES]
cs.LG
MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
[AUTHORS]
Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui
[ABSTRACT]
Nowadays, Large Language Models (LLMs) have been trained using extended
context lengths to foster more creative applications. However, long context
training poses great challenges considering the constraint of GPU memory. It
not only leads to substantial activation memory consumption during training,
but also incurs considerable memory fragmentation. To facilitate long context
training, existing frameworks have adopted strategies such as recomputation and
various forms of parallelism. Nevertheless, these techniques rely on redundant
computation or extensive communication, resulting in low Model FLOPS
Utilization (MFU). In this paper, we propose MEMO, a novel LLM training
framework designed for fine-grained activation memory management. Given the
quadratic scaling of computation and linear scaling of memory with sequence
lengths when using FlashAttention, we offload memory-consuming activations to
CPU memory after each layer’s forward pass and fetch them during the backward
pass. To maximize the swapping of activations without hindering computation,
and to avoid exhausting limited CPU memory, we implement a token-wise
activation recomputation and swapping mechanism. Furthermore, we tackle the
memory fragmentation issue by employing a bi-level Mixed Integer Programming
(MIP) approach, optimizing memory reuse across transformer layers. Empirical
results demonstrate that MEMO achieves an average of 1.97x and 1.80x MFU
compared to Megatron-LM and DeepSpeed, respectively. This improvement is
attributed to MEMO’s ability to minimize memory fragmentation, reduce
recomputation and intensive communication, and circumvent the delays associated
with the memory reorganization process due to fragmentation. By leveraging
fine-grained activation memory management, MEMO facilitates efficient training
of a 7B LLM with a sequence length of 1 million on just 8 A800 GPUs, achieving an MFU
of 52.30%.
[LINK]
http://arxiv.org/abs/2407.12117v3
[DATE]
2025-01-15 16:03:55+08:00
[CATEGORIES]
cs.LG
Dynamic Localisation of Spatial-Temporal Graph Neural Network
[AUTHORS]
Wenying Duan, Shujun Guo, Wei Huang, Hong Rao, Xiaoxi He
[ABSTRACT]
Spatial-temporal data, fundamental to many intelligent applications, reveals
dependencies indicating causal links between present measurements at specific
locations and historical data at the same or other locations. Within this
context, adaptive spatial-temporal graph neural networks (ASTGNNs) have emerged
as valuable tools for modelling these dependencies, especially through a
data-driven approach rather than pre-defined spatial graphs. While this
approach offers higher accuracy, it presents increased computational demands.
Addressing this challenge, this paper delves into the concept of localisation
within ASTGNNs, introducing an innovative perspective that spatial dependencies
should evolve dynamically over time. We introduce DynAGS, a
localised ASTGNN framework aimed at maximising efficiency and accuracy in
distributed deployment. This framework integrates dynamic localisation,
time-evolving spatial graphs, and personalised localisation, all orchestrated
around the Dynamic Graph Generator, a lightweight central module leveraging
cross attention. The central module can integrate historical information in a
node-independent manner to enhance the feature representation of nodes at the
current moment. This improved feature representation is then used to generate a
dynamic sparse graph without the need for costly data exchanges, and it
supports personalised localisation. Performance assessments across two core
ASTGNN architectures and nine real-world datasets from various applications
reveal that DynAGS outshines current benchmarks, underscoring that the
dynamic modelling of spatial dependencies can drastically improve model
expressibility, flexibility, and system efficiency, especially in distributed
settings.
[COMMENTS]
This paper was accepted by KDD’25
[LINK]
http://arxiv.org/abs/2501.04239v3
[DATE]
2025-01-15 15:59:39+08:00
[CATEGORIES]
cs.LG
OminiControl: Minimal and Universal Control for Diffusion Transformer
[AUTHORS]
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang
[ABSTRACT]
In this paper, we introduce OminiControl, a highly versatile and
parameter-efficient framework that integrates image conditions into pre-trained
Diffusion Transformer (DiT) models. At its core, OminiControl leverages a
parameter reuse mechanism, enabling the DiT to encode image conditions using
itself as a powerful backbone and process them with its flexible multi-modal
attention processors. Unlike existing methods, which rely heavily on additional
encoder modules with complex architectures, OminiControl (1) effectively and
efficiently incorporates injected image conditions with only ~0.1% additional
parameters, and (2) addresses a wide range of image conditioning tasks in a
unified manner, including subject-driven generation and spatially-aligned
conditions such as edges, depth, and more. Remarkably, these capabilities are
achieved by training on images generated by the DiT itself, which is
particularly beneficial for subject-driven generation. Extensive evaluations
demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted
models in both subject-driven and spatially-aligned conditional generation.
Additionally, we release our training dataset, Subjects200K, a diverse
collection of over 200,000 identity-consistent images, along with an efficient
data synthesis pipeline to advance research in subject-consistent generation.
[LINK]
http://arxiv.org/abs/2411.15098v4
[DATE]
2025-01-15 15:30:29+08:00
[CATEGORIES]
cs.LG
Transformer-based Multivariate Time Series Anomaly Localization
[AUTHORS]
Charalampos Shimillas, Kleanthis Malialis, Konstantinos Fokianos, Marios M. Polycarpou
[ABSTRACT]
With the growing complexity of Cyber-Physical Systems (CPS) and the
integration of Internet of Things (IoT), the use of sensors for online
monitoring generates large volumes of multivariate time series (MTS) data.
Consequently, the need for robust anomaly diagnosis in MTS is paramount to
maintaining system reliability and safety. While significant advancements have
been made in anomaly detection, localization remains a largely underexplored
area, though crucial for intelligent decision-making. This paper introduces a
novel transformer-based model for unsupervised anomaly diagnosis in MTS, with a
focus on improving localization performance, through an in-depth analysis of
the self-attention mechanism’s learning behavior under both normal and
anomalous conditions. We formulate the anomaly localization problem as a
three-stage process: time-step, window, and segment-based. This leads to the
development of the Space-Time Anomaly Score (STAS), a new metric inspired by
the connection between transformer latent representations and space-time
statistical models. STAS is designed to capture individual anomaly behaviors
and inter-series dependencies, delivering enhanced localization performance.
Additionally, the Statistical Feature Anomaly Score (SFAS) complements STAS by
analyzing statistical features around anomalies, with their combination helping
to reduce false alarms. Experiments on real-world and synthetic datasets
illustrate the model’s superiority over state-of-the-art methods in both
detection and localization tasks.
[LINK]
http://arxiv.org/abs/2501.08628v1
[DATE]
2025-01-15 15:18:51+08:00
[CATEGORIES]
cs.LG
Diffusion Models as Network Optimizers: Explorations and Analysis
[AUTHORS]
Ruihuai Liang, Bo Yang, Pengyu Chen, Xianjin Li, Yifan Xue, Zhiwen Yu, Xuelin Cao, Yan Zhang, Mérouane Debbah, H. Vincent Poor, Chau Yuen
[ABSTRACT]
Network optimization is a fundamental challenge in Internet of Things (IoT)
networks, whose complex features often make these problems difficult to
solve. Recently, generative diffusion models (GDMs) have
emerged as a promising new approach to network optimization, with the potential
to directly address these optimization problems. However, the application of
GDMs in this field is still in its early stages, and there is a noticeable lack
of theoretical research and empirical findings. In this study, we first explore
the intrinsic characteristics of generative models. Next, we provide a concise
theoretical proof and intuitive demonstration of the advantages of generative
models over discriminative models in network optimization. Based on this
exploration, we implement GDMs as optimizers aimed at learning high-quality
solution distributions for given inputs, sampling from these distributions
during inference to approximate or achieve optimal solutions. Specifically, we
utilize denoising diffusion probabilistic models (DDPMs) and employ a
classifier-free guidance mechanism to manage conditional guidance based on
input parameters. We conduct extensive experiments across three challenging
network optimization problems. By investigating various model configurations
and the principles of GDMs as optimizers, we demonstrate the ability to
overcome prediction errors and validate the convergence of generated solutions
to optimal solutions. We provide code and data at
https://github.com/qiyu3816/DiffSG.
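As a rough illustration of the classifier-free guidance mechanism named in the abstract above, the sketch below blends an unconditional and a conditional noise prediction. The function name `cfg_combine`, the weight parameterization, and the toy vectors are assumptions made for illustration; they are not taken from the authors' DiffSG code.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_weight):
    """Blend two noise predictions element-wise.

    Assumed parameterization: eps = eps_uncond + w * (eps_cond - eps_uncond).
    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates further toward the condition.
    """
    return [u + guidance_weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values, not model outputs).
eps_u = [0.0, 1.0, -2.0]
eps_c = [1.0, 1.0, 0.0]
guided = cfg_combine(eps_u, eps_c, 2.0)
```

Other write-ups use the equivalent form (1 + w) * eps_cond - w * eps_uncond, which shifts the meaning of the weight by one.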
[LINK]
http://arxiv.org/abs/2411.00453v4
[DATE]
2025-01-15 15:18:43+08:00
[CATEGORIES]
cs.LG
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network
[AUTHORS]
Zijian Zhao, Tingwei Chen, Zhijie Cai, Xiaoyang Li, Hang Li, Qimei Chen, Guangxu Zhu
[ABSTRACT]
In recent years, Wi-Fi sensing has garnered significant attention due to its
numerous benefits, such as privacy protection, low cost, and penetration
ability. Extensive research has been conducted in this field, focusing on areas
such as gesture recognition, people identification, and fall detection.
However, many data-driven methods encounter challenges related to domain shift,
where the model fails to perform well in environments different from the
training data. One major factor contributing to this issue is the limited
availability of Wi-Fi sensing datasets, which makes models learn excessive
irrelevant information and over-fit to the training set. Unfortunately,
collecting large-scale Wi-Fi sensing datasets across diverse scenarios is a
challenging task. To address this problem, we propose CrossFi, a siamese
network-based approach that excels in both in-domain and cross-domain
scenarios, including few-shot and zero-shot settings, and even works in a
few-shot new-class scenario where the testing set contains new categories. The core
component of CrossFi is a sample-similarity calculation network called CSi-Net,
which improves the structure of the siamese network by using an attention
mechanism to capture similarity information, instead of simply calculating the
distance or cosine similarity. Based on it, we develop an extra Weight-Net that
can generate a template for each class, so that our CrossFi can work in
different scenarios. Experimental results demonstrate that our CrossFi achieves
state-of-the-art performance across various scenarios. In the gesture
recognition task, CrossFi achieves an accuracy of 98.17% in the in-domain
scenario, 91.72% in the one-shot cross-domain scenario, 64.81% in the zero-shot
cross-domain scenario, and 84.75% in the one-shot new-class scenario. The code for our model is publicly
available at https://github.com/RS2002/CrossFi.
[LINK]
http://arxiv.org/abs/2408.10919v3
[DATE]
2025-01-15 15:17:58+08:00
[CATEGORIES]
cs.LG
A Learning Algorithm That Attains the Human Optimum in a Repeated Human-Machine Interaction Game
[AUTHORS]
Jason T. Isa, Lillian J. Ratliff, Samuel A. Burden
[ABSTRACT]
When humans interact with learning-based control systems, a common goal is to
minimize a cost function known only to the human. For instance, an exoskeleton
may adapt its assistance in an effort to minimize the human’s metabolic
cost-of-transport. Conventional approaches to synthesizing the learning
algorithm solve an inverse problem to infer the human’s cost. However, these
problems can be ill-posed, hard to solve, or sensitive to problem data. Here we
show a game-theoretic learning algorithm that works solely by observing human
actions to find the cost minimum, avoiding the need to solve an inverse
problem. We evaluate the performance of our algorithm in an extensive set of
human subjects experiments, demonstrating consistent convergence to the minimum
of a prescribed human cost function in scalar and multidimensional
instantiations of the game. We conclude by outlining future directions for
theoretical and empirical extensions of our results.
[LINK]
http://arxiv.org/abs/2501.08626v1
[DATE]
2025-01-15 15:07:48+08:00
[CATEGORIES]
cs.LG
CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion
[AUTHORS]
Yuan Wang, Bin Xhu, Yanbin Hao, Chong-Wah Ngo, Yi Tan, Xiang Wang
[ABSTRACT]
Recent advancements in text-to-image generation models have excelled in
creating diverse and realistic images. This success extends to food imagery,
where various conditional inputs like cooking styles, ingredients, and recipes
are utilized. However, a yet-unexplored challenge is generating a sequence of
procedural images based on cooking steps from a recipe. This could enhance the
cooking experience with visual guidance and possibly lead to an intelligent
cooking simulation system. To fill this gap, we introduce a novel task called
cooking procedural image generation. This task is inherently
demanding, as it strives to create photo-realistic images that align with
cooking steps while preserving sequential consistency. To collectively tackle
these challenges, we present CookingDiffusion, a novel approach that
leverages Stable Diffusion and three innovative Memory Nets to model procedural
prompts. These prompts encompass text prompts (representing cooking steps),
image prompts (corresponding to cooking images), and multi-modal prompts
(mixing cooking steps and images), ensuring the consistent generation of
cooking procedural images. To validate the effectiveness of our approach, we
preprocess the YouCookII dataset, establishing a new benchmark. Our
experimental results demonstrate that our model excels at generating
high-quality cooking procedural images with remarkable consistency across
sequential cooking steps, as measured by both the FID and the proposed Average
Procedure Consistency metrics. Furthermore, CookingDiffusion demonstrates the
ability to manipulate ingredients and cooking methods in a recipe. We will make
our code, models, and dataset publicly accessible.
[LINK]
http://arxiv.org/abs/2501.09042v1
[DATE]
2025-01-15 14:58:53+08:00
[CATEGORIES]
cs.LG
CT-PatchTST: Channel-Time Patch Time-Series Transformer for Long-Term Renewable Energy Forecasting
[AUTHORS]
Menghao Huo, Kuan Lu, Yuxiao Li, Qiang Zhu
[ABSTRACT]
Accurately predicting renewable energy output is crucial for the efficient
integration of solar and wind power into modern energy systems. This study
develops and evaluates an advanced deep learning model, Channel-Time Patch
Time-Series Transformer (CT-PatchTST), to forecast the power output of
photovoltaic and wind energy systems using annual offshore wind power, onshore
wind power, and solar power generation data from Denmark. While the original
Patch Time-Series Transformer (PatchTST) model employs a channel-independent
(CI) approach, it tends to overlook inter-channel relationships during
training, potentially leading to a loss of critical information. To address
this limitation and further leverage the benefits of increased data granularity
brought by CI, we propose CT-PatchTST. This enhanced model improves the
processing of inter-channel information while maintaining the advantages of the
channel-independent approach. The predictive performance of CT-PatchTST is
rigorously analyzed, demonstrating its ability to provide precise and reliable
energy forecasts. This work contributes to improving the predictability of
renewable energy systems, supporting their broader adoption and integration
into energy grids.
[LINK]
http://arxiv.org/abs/2501.08620v1
[DATE]
2025-01-15 14:35:39+08:00
[CATEGORIES]
cs.LG
Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model
[AUTHORS]
Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang
[ABSTRACT]
Diffusion-based zero-shot image restoration and enhancement models have
achieved great success in various tasks of image restoration and enhancement.
However, directly applying them to video restoration and enhancement results in
severe temporal flickering artifacts. In this paper, we propose the first
framework for zero-shot video restoration and enhancement based on the
pre-trained image diffusion model. By replacing the spatial self-attention
layer with the proposed short-long-range (SLR) temporal attention layer, the
pre-trained image diffusion model can take advantage of the temporal
correlation between frames. We further propose temporal consistency guidance,
spatial-temporal noise sharing, and an early stopping sampling strategy to
improve temporally consistent sampling. Our method is a plug-and-play module
that can be inserted into any diffusion-based image restoration or enhancement
methods to further improve their performance. Experimental results demonstrate
the superiority of our proposed method. Our code is available at
https://github.com/cao-cong/ZVRD.
[COMMENTS]
Accepted by AAAI 2025
[LINK]
http://arxiv.org/abs/2407.01960v2
[DATE]
2025-01-15 14:06:31+08:00
[CATEGORIES]
cs.LG
Machine unlearning through fine-grained model parameters perturbation
[AUTHORS]
Zhiwei Zuo, Zhuo Tang, Kenli Li, Anwitaman Datta
[ABSTRACT]
Machine unlearning techniques, which involve retracting data records and
reducing the influence of said data on trained models, help with the user privacy
protection objective but incur significant computational costs. Weight
perturbation-based unlearning is a general approach, but it typically involves
globally modifying the parameters. We propose fine-grained inexact machine
unlearning strategies that perturb only Top-K or Random-k parameters, addressing
the privacy needs while keeping the computational costs tractable.
In order to demonstrate the efficacy of our strategies, we also tackle the
challenge of evaluating the effectiveness of machine unlearning by considering
the model’s generalization performance across both unlearning and remaining
data. To better assess the unlearning effect and model generalization, we
propose novel metrics, namely, the forgetting rate and memory retention rate.
However, for inexact machine unlearning, current metrics are inadequate in
quantifying the degree of forgetting that occurs after unlearning strategies
are applied. To address this, we introduce SPD-GAN, which subtly perturbs the
distribution of data targeted for unlearning. Then, we evaluate the degree of
unlearning by measuring the performance difference of the models on the
perturbed unlearning data before and after the unlearning process. By
implementing these innovative techniques and metrics, we achieve
computationally efficacious privacy protection in machine learning applications
without significant sacrifice of model performance. Furthermore, this approach
provides a novel method for evaluating the degree of unlearning.
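The contrast between global and fine-grained perturbation described in the abstract above can be sketched as follows. This is only a minimal illustration: the abstract does not say how parameters are ranked or how the noise is drawn, so the per-parameter `scores`, the Gaussian noise, and all function names here are hypothetical.

```python
import random


def top_k_indices(scores, k):
    """Indices of the k largest scores (the ranking criterion is an assumption)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def random_k_indices(n, k, rng):
    """k distinct indices drawn uniformly at random."""
    return rng.sample(range(n), k)


def perturb_selected(params, indices, sigma, rng):
    """Add Gaussian noise to the selected parameters only; others stay untouched."""
    out = list(params)
    for i in indices:
        out[i] += rng.gauss(0.0, sigma)
    return out


rng = random.Random(0)
params = [0.5, -1.2, 0.3, 2.0, -0.7]
scores = [0.1, 0.9, 0.05, 0.6, 0.2]  # hypothetical sensitivity to the forgotten data

top_idx = top_k_indices(scores, 2)
params_top = perturb_selected(params, top_idx, 0.01, rng)

rand_idx = random_k_indices(len(params), 2, rng)
params_rand = perturb_selected(params, rand_idx, 0.01, rng)
```

Only k entries are touched in either variant, which is where the cost saving over a global update would come from.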
[LINK]
http://arxiv.org/abs/2401.04385v4
[DATE]
2025-01-15 14:00:17+08:00
[CATEGORIES]
cs.LG
Clarify Confused Nodes via Separated Learning
[AUTHORS]
Jiajun Zhou, Shengbo Gong, Xuanze Chen, Chenxuan Xie, Shanqing Yu, Qi Xuan, Xiaoniu Yang
[ABSTRACT]
Graph neural networks (GNNs) have achieved remarkable advances in
graph-oriented tasks. However, real-world graphs invariably contain a certain
proportion of heterophilous nodes, challenging the homophily assumption of
traditional GNNs and hindering their performance. Most existing studies
continue to design generic models with shared weights between heterophilous and
homophilous nodes. Despite the incorporation of high-order messages or
multi-channel architectures, these efforts often fall short. A minority of
studies attempt to train different node groups separately but suffer from
inappropriate separation metrics and low efficiency. In this paper, we first
propose a new metric, termed Neighborhood Confusion (NC), to facilitate a more
reliable separation of nodes. We observe that node groups with different levels
of NC values exhibit certain differences in intra-group accuracy and visualized
embeddings. These pave the way for Neighborhood Confusion-guided Graph
Convolutional Network (NCGCN), in which nodes are grouped by their NC values
and accept intra-group weight sharing and message passing. Extensive
experiments on both homophilous and heterophilous benchmarks demonstrate that
our framework can effectively separate nodes and yield significant performance
improvement compared to the latest methods. The source code will be available
at https://github.com/GISec-Team/NCGNN.
[COMMENTS]
Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence
[LINK]
http://arxiv.org/abs/2306.02285v5
[DATE]
2025-01-15 13:53:54+08:00
[CATEGORIES]
cs.LG
STORM: A Spatio-Temporal Factor Model Based on Dual Vector Quantized Variational Autoencoders for Financial Trading
[AUTHORS]
Yilei Zhao, Wentao Zhang, Tingran Yang, Yong Jiang, Fei Huang, Wei Yang Bryan Lim
[ABSTRACT]
In financial trading, factor models are widely used to price assets and
capture excess returns from mispricing. Recently, we have witnessed the rise of
variational autoencoder-based latent factor models, which learn latent factors
self-adaptively. While these models focus on modeling overall market
conditions, they often fail to effectively capture the temporal patterns of
individual stocks. Additionally, representing multiple factors as single values
simplifies the model but limits its ability to capture complex relationships
and dependencies. As a result, the learned factors are of low quality and lack
diversity, reducing their effectiveness and robustness across different trading
periods. To address these issues, we propose a Spatio-Temporal factOR Model
based on dual vector quantized variational autoencoders, named STORM, which
extracts features of stocks from temporal and spatial perspectives, then fuses
and aligns these features at the fine-grained and semantic level, and
represents the factors as multi-dimensional embeddings. The discrete codebooks
cluster similar factor embeddings, ensuring orthogonality and diversity, which
helps distinguish between different factors and enables factor selection in
financial trading. To show the performance of the proposed factor model, we
apply it to two downstream experiments: portfolio management on two stock
datasets and individual trading tasks on six specific stocks. The extensive
experiments demonstrate STORM’s flexibility in adapting to downstream tasks and
superior performance over baseline models.
[LINK]
http://arxiv.org/abs/2412.09468v2
[DATE]
2025-01-15 13:25:35+08:00
[CATEGORIES]
cs.LG
Molecular Graph Contrastive Learning with Line Graph
[AUTHORS]
Xueyuan Chen, Shangzhe Li, Ruomei Liu, Bowen Shi, Jiaheng Liu, Junran Wu, Ke Xu
[ABSTRACT]
Graph contrastive learning (GCL) has emerged in response to label scarcity in
molecular property prediction and drug design. Leading contrastive learning
works use two kinds of view generators: random or learnable data corruption,
and domain knowledge incorporation. While effective, these two approaches
respectively alter molecular semantics and limit generalization capability.
To this end, we relate the LinE graph with MOlecular graph coNtrastive
learning and propose a novel method termed LEMON. Specifically, by contrasting
the given graph with the corresponding line graph, the graph encoder can freely
encode the molecular semantics without omission. Furthermore, we present a new
patch with edge attribute fusion and two local contrastive losses to enhance
information transmission and tackle hard negative samples. Compared with
state-of-the-art
(SOTA) methods for view generation, superior performance on molecular property
prediction suggests the effectiveness of our proposed framework.
[LINK]
http://arxiv.org/abs/2501.08589v1
[DATE]
2025-01-15 13:17:38+08:00
[CATEGORIES]
cs.LG
Normalize Then Propagate: Efficient Homophilous Regularization for Few-shot Semi-Supervised Node Classification
[AUTHORS]
Baoming Zhang, MingCai Chen, Jianqing Song, Shuangjie Li, Jie Zhang, Chongjun Wang
[ABSTRACT]
Graph Neural Networks (GNNs) have demonstrated remarkable ability in
semi-supervised node classification. However, most existing GNNs rely heavily
on a large amount of labeled data for training, which is labor-intensive and
requires extensive domain knowledge. In this paper, we first analyze the
restrictions on GNN generalization from the perspective of supervision signals
in the context of few-shot semi-supervised node classification. To address
these challenges, we propose a novel algorithm named NormProp, which utilizes
the homophily assumption of unlabeled nodes to generate additional supervision
signals, thereby enhancing the generalization against label scarcity. The key
idea is to efficiently capture both the class information and the consistency
of aggregation during message passing, via decoupling the direction and
Euclidean norm of node representations. Moreover, we conduct a theoretical
analysis to determine the upper bound of the Euclidean norm, and then propose
homophilous regularization to constrain the consistency of unlabeled nodes.
Extensive experiments demonstrate that NormProp achieves state-of-the-art
performance under low-label rate scenarios with low computational complexity.
[COMMENTS]
Accepted by AAAI 2025
[LINK]
http://arxiv.org/abs/2501.08581v1
[DATE]
2025-01-15 13:01:14+08:00
[CATEGORIES]
cs.LG
Dual Cone Gradient Descent for Training Physics-Informed Neural Networks
[AUTHORS]
Youngsik Hwang, Dong-Young Lim
[ABSTRACT]
Physics-informed neural networks (PINNs) have emerged as a prominent approach
for solving partial differential equations (PDEs) by minimizing a combined loss
function that incorporates both boundary loss and PDE residual loss. Despite
their remarkable empirical performance in various scientific computing tasks,
PINNs often fail to generate reasonable solutions, and such pathological
behaviors remain difficult to explain and resolve. In this paper, we identify
that PINNs can be adversely trained when gradients of each loss function
exhibit a significant imbalance in their magnitudes and present a negative
inner product value. To address these issues, we propose a novel optimization
framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of
the updated gradient to ensure it falls within a dual cone region. This region
is defined as a set of vectors where the inner products with both the gradients
of the PDE residual loss and the boundary loss are non-negative. Theoretically,
we analyze the convergence properties of DCGD algorithms in a non-convex
setting. On a variety of benchmark equations, we demonstrate that DCGD
outperforms other optimization algorithms in terms of various evaluation
metrics. In particular, DCGD achieves superior predictive accuracy and enhances
the stability of training for failure modes of PINNs and complex PDEs, compared
to existing optimally tuned models. Moreover, DCGD can be further improved by
combining it with popular strategies for PINNs, including learning rate
annealing and the Neural Tangent Kernel (NTK).
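The dual cone condition described in the abstract above can be made concrete with a small sketch. It tests whether a direction has non-negative inner products with both loss gradients, and builds one direction that always satisfies the condition: the sum of the two normalized gradients. That construction is a simple stand-in chosen for illustration, not the DCGD update rule itself, and the function names and toy gradients are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_dual_cone(g, g_pde, g_bc, tol=1e-12):
    """True if g has non-negative inner product with both loss gradients."""
    return dot(g, g_pde) >= -tol and dot(g, g_bc) >= -tol


def bisector_direction(g_pde, g_bc):
    """Sum of the two unit gradients.

    Its inner product with either unit gradient is 1 + cos(theta) >= 0,
    so it always lies in the dual cone. Illustrative choice only.
    """
    n_pde = math.sqrt(dot(g_pde, g_pde))
    n_bc = math.sqrt(dot(g_bc, g_bc))
    if n_pde == 0.0 or n_bc == 0.0:
        raise ValueError("both gradients must be non-zero")
    return [a / n_pde + b / n_bc for a, b in zip(g_pde, g_bc)]


# Toy conflicting gradients with imbalanced magnitudes (hypothetical values).
g_pde = [1.0, 0.0]
g_bc = [-3.0, 4.0]

naive = [a + b for a, b in zip(g_pde, g_bc)]  # plain sum of the two gradients
safe = bisector_direction(g_pde, g_bc)

naive_ok = in_dual_cone(naive, g_pde, g_bc)
safe_ok = in_dual_cone(safe, g_pde, g_bc)
```

In this toy case the plain sum is dominated by the larger gradient and points against the smaller one, which is the kind of imbalance and conflict the abstract describes.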
[COMMENTS]
The Thirty-eighth Annual Conference on Neural Information Processing
Systems, 2024
[LINK]
http://arxiv.org/abs/2409.18426v2
[DATE]
2025-01-15 12:59:43+08:00
[CATEGORIES]
cs.LG
Conformal-in-the-Loop for Learning with Imbalanced Noisy Data
[AUTHORS]
John Brandon Graham-Knight, Jamil Fayyad, Nourhan Bayasi, Patricia Lasserre, Homayoun Najjaran
[ABSTRACT]
Class imbalance and label noise are pervasive in large-scale datasets, yet
much of machine learning research assumes well-labeled, balanced data, which
rarely reflects real-world conditions. Existing approaches typically address
either label noise or class imbalance in isolation, leading to suboptimal
results when both issues coexist. In this work, we propose
Conformal-in-the-Loop (CitL), a novel training framework that addresses both
challenges with a conformal prediction-based approach. CitL evaluates sample
uncertainty to adjust weights and prune unreliable examples, enhancing model
resilience and accuracy with minimal computational cost. Our extensive
experiments include a detailed analysis showing how CitL effectively emphasizes
impactful data in noisy, imbalanced datasets. Our results show that CitL
consistently boosts model performance, achieving up to a 6.1% increase in
classification accuracy and a 5.0 mIoU improvement in segmentation. Our code is
publicly available: CitL.
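For readers unfamiliar with conformal prediction, the sketch below shows the standard split-conformal recipe that approaches like the one above build on: calibrate a score threshold, then form a prediction set per sample. Treating the set size as a per-sample uncertainty signal is an assumption made here for illustration; the abstract does not specify how CitL turns uncertainty into weights or pruning decisions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration scores (score = 1 - p_true)."""
    n = len(cal_scores)
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)  # clip to the largest score
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, qhat):
    """All classes whose nonconformity score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration scores and class probabilities.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9]
qhat = conformal_threshold(cal_scores, alpha=0.25)

confident = prediction_set([0.90, 0.07, 0.03], qhat)  # one class: low uncertainty
ambiguous = prediction_set([0.45, 0.42, 0.13], qhat)  # two classes: higher uncertainty
```

A training loop could, for example, down-weight or drop samples whose sets are large, though that policy is a guess at the general idea rather than the published method.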
[COMMENTS]
Under Review
[LINK]
http://arxiv.org/abs/2411.02281v2
[DATE]
2025-01-15 12:51:48+08:00
[CATEGORIES]
cs.LG
ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches
[AUTHORS]
Maura Pintor, Daniele Angioni, Angelo Sotgiu, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli
[ABSTRACT]
Adversarial patches are optimized contiguous pixel blocks in an input image
that cause a machine-learning model to misclassify it. However, their
optimization is computationally demanding, and requires careful hyperparameter
tuning, potentially leading to suboptimal robustness evaluations. To overcome
these issues, we propose ImageNet-Patch, a dataset to benchmark
machine-learning models against adversarial patches. It consists of a set of
patches, optimized to generalize across different models, and readily
applicable to ImageNet data after preprocessing them with affine
transformations. This process enables an approximate yet faster robustness
evaluation, leveraging the transferability of adversarial perturbations. We
showcase the usefulness of this dataset by testing the effectiveness of the
computed patches against 127 models. We conclude by discussing how our dataset
could be used as a benchmark for robustness, and how our methodology can be
generalized to other domains. We open source our dataset and evaluation code at
https://github.com/pralab/ImageNet-Patch.
[COMMENTS]
Published in Pattern Recognition. DOI:
https://doi.org/10.1016/j.patcog.2022.109064
[LINK]
http://arxiv.org/abs/2203.04412v2
[DATE]
2025-01-15 12:46:30+08:00
[CATEGORIES]
cs.LG
DNMDR: Dynamic Networks and Multi-view Drug Representations for Safe Medication Recommendation
[AUTHORS]
Guanlin Liu, Xiaomei Yu, Zihao Liu, Xue Li, Xingxu Fan, Xiangwei Zheng
[ABSTRACT]
Medication Recommendation (MR) is a promising research topic that drives
diverse applications in the healthcare and clinical domains. However, existing
methods mainly rely on sequential modeling and static graphs for representation
learning, which ignore the dynamic correlations in diverse medical events of a
patient’s temporal visits, leading to insufficient global structural
exploration on nodes. Additionally, mitigating drug-drug interactions (DDIs) is
another issue determining the utility of the MR systems. To address the
challenges mentioned above, this paper proposes a novel MR method with the
integration of dynamic networks and multi-view drug representations (DNMDR).
Specifically, weighted snapshot sequences for dynamic heterogeneous networks
are constructed based on discrete visits in temporal EHRs, and all the dynamic
networks are jointly trained to gain both structural correlations in diverse
medical events and temporal dependency in historical health conditions, for
achieving comprehensive patient representations with both semantic features and
structural relationships. Moreover, by combining drug co-occurrences and
adverse drug-drug interactions (DDIs) from the internal view of drug molecule
structure and the interactive view of drug pairs, safe drug representations are
obtained for high-quality medication combination recommendation.
Finally, extensive experiments on real-world datasets are conducted for
performance evaluation, and the experimental results demonstrate that the
proposed DNMDR method outperforms the state-of-the-art baseline models by a
large margin on various metrics such as PRAUC, Jaccard, and DDI rate.
[LINK]
http://arxiv.org/abs/2501.08572v1
[DATE]
2025-01-15 12:36:55+08:00
[CATEGORIES]
cs.LG
EdgeSight: Enabling Modeless and Cost-Efficient Inference at the Edge
[AUTHORS]
ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu
[ABSTRACT]
Traditional ML inference is evolving toward modeless inference, which
abstracts the complexity of model selection from users, allowing the system to
automatically choose the most appropriate model for each request based on
accuracy and resource requirements. While prior studies have focused on
modeless inference within data centers, this paper tackles the pressing need
for cost-efficient modeless inference at the edge – particularly within its
unique constraints of limited device memory, volatile network conditions, and
restricted power consumption.
To overcome these challenges, we propose EdgeSight, a system that provides
cost-efficient EdgeSight serving for diverse DNNs at the edge. EdgeSight
employs an edge-data center (edge-DC) architecture, utilizing confidence
scaling to reduce the number of model options while meeting diverse accuracy
requirements. Additionally, it supports lossy inference in volatile network
environments. Our experimental results show that EdgeSight outperforms existing
systems by up to 1.6x in P99 latency for modeless services. Furthermore, our
FPGA prototype demonstrates similar performance at certain accuracy levels,
with a power consumption reduction of up to 3.34x.
[COMMENTS]
12 pages
[LINK]
http://arxiv.org/abs/2405.19213v2
[DATE]
2025-01-15 12:17:38+08:00
[CATEGORIES]
cs.LG
Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications
[AUTHORS]
Jin Chen, Jin Zhang, Xu Huang, Yi Yang, Defu Lian, Enhong Chen
[ABSTRACT]
The softmax function is a cornerstone of multi-class classification, integral
to a wide range of machine learning applications, from large-scale retrieval
and ranking models to advanced large language models. However, its
computational cost grows linearly with the number of classes, which becomes
prohibitively expensive in scenarios with millions or even billions of classes.
The sampled softmax, which relies on self-normalized importance sampling, has
emerged as a powerful alternative, significantly reducing computational
complexity. Yet, its estimator remains unbiased only when the sampling
distribution matches the true softmax distribution. To improve both
approximation accuracy and sampling efficiency, we propose the MIDX Sampler, a
novel adaptive sampling strategy based on an inverted multi-index approach.
Concretely, we decompose the softmax probability into several multinomial
probabilities, each associated with a specific set of codewords and the last
associated with the residual score of queries, thus reducing time complexity to
the number of codewords instead of the number of classes. To further boost
efficiency, we replace the query-specific residual probability with a simple
uniform distribution, simplifying the computation while retaining high
performance. Our method is backed by rigorous theoretical analysis, addressing
key concerns such as sampling bias, gradient bias, convergence rates, and
generalization error bounds. The results demonstrate that a smaller divergence
from the ideal softmax distribution leads to faster convergence and improved
generalization. Extensive experiments on large-scale language models,
sequential recommenders, and extreme multi-class classification tasks confirm
that the MIDX Sampler delivers superior effectiveness and efficiency compared
to existing approaches.
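The generic sampled softmax idea referred to in the abstract above can be sketched as follows: the loss is computed over the target plus a few sampled classes, with each logit corrected by the log of its sampling probability. This is a textbook-style illustration under assumed names and toy values, not the MIDX Sampler or its inverted multi-index decomposition.

```python
import math


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_softmax_loss(logits, target):
    """Exact cross-entropy; cost grows linearly with the number of classes."""
    return log_sum_exp(logits) - logits[target]


def sampled_softmax_loss(logits, target, sampled, q):
    """Cross-entropy over the target plus sampled classes.

    Each logit is corrected by -log q[j], where q is the proposal
    distribution the negatives were drawn from.
    """
    candidates = [target] + [j for j in sampled if j != target]
    corrected = [logits[j] - math.log(q[j]) for j in candidates]
    return log_sum_exp(corrected) - corrected[0]


logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical scores for 4 classes
uniform_q = [0.25] * 4

exact = full_softmax_loss(logits, target=0)
# Sanity check: with a uniform proposal and every other class sampled,
# the correction is a constant shift and the exact loss is recovered.
approx_all = sampled_softmax_loss(logits, 0, [1, 2, 3], uniform_q)
# With fewer negatives the value is only an approximation.
approx_few = sampled_softmax_loss(logits, 0, [1], uniform_q)
```

The quality of the approximation depends on how close the proposal is to the true softmax distribution, which is the gap an adaptive sampler aims to narrow.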
[COMMENTS]
40 pages
[LINK]
http://arxiv.org/abs/2501.08563v1
[DATE]
2025-01-15 12:09:21+08:00
[CATEGORIES]
cs.LG
MIAFEx: An Attention-based Feature Extraction Method for Medical Image Classification
[AUTHORS]
Oscar Ramos-Soto, Jorge Ramos-Frutos, Ezequiel Perez-Zarate, Diego Oliva, Sandra E. Balderas-Mata
[ABSTRACT]
Feature extraction techniques are crucial in medical image classification;
however, classical feature extractors in addition to traditional machine
learning classifiers often exhibit significant limitations in providing
sufficient discriminative information for complex image sets. While
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown
promise in feature extraction, they are prone to overfitting due to the
inherent characteristics of medical imaging data, including small sample sizes
or high intra-class variance. In this work, the Medical Image Attention-based
Feature Extractor (MIAFEx) is proposed, a novel method that employs a learnable
refinement mechanism to enhance the classification token within the Transformer
encoder architecture. This mechanism adjusts the token based on learned
weights, improving the extraction of salient features and enhancing the model’s
adaptability to the challenges presented by medical imaging data. The quality
of MIAFEx output features is compared against classical feature extractors using
traditional and hybrid classifiers. Also, the performance of these features is
compared against modern CNN and ViT models in classification tasks,
demonstrating its superiority in accuracy and robustness across multiple
complex medical imaging classification datasets. This advantage is particularly
pronounced in scenarios with limited training data, where traditional and
modern models often struggle to generalize effectively. The source code of this
proposal can be found at
https://github.com/Oscar-RamosS/Medical-Image-Attention-based-Feature-Extractor-MIAFEx
[COMMENTS]
In preparation for Journal Submission
[LINK]
http://arxiv.org/abs/2501.08562v1
[DATE]
2025-01-15 12:07:06+08:00
[CATEGORIES]
cs.LG
ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital Twins
[AUTHORS]
Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song
[ABSTRACT]
In this paper, we propose an Adaptive Neuro-Symbolic Learning Framework for
digital twin technology called “ANSR-DT.” Our approach combines pattern
recognition algorithms with reinforcement learning and symbolic reasoning to
enable real-time learning and adaptive intelligence. This integration enhances
the understanding of the environment and promotes continuous learning, leading
to better and more effective decision-making in real-time for applications that
require human-machine collaboration. We evaluated the ANSR-DT
framework for its ability to learn and adapt to dynamic patterns, observing
significant improvements in decision accuracy, reliability, and
interpretability when compared to existing state-of-the-art methods. However,
challenges still exist in extracting and integrating symbolic rules in complex
environments, which limits the full potential of our framework in heterogeneous
settings. Moreover, our ongoing research aims to address this issue in the
future by ensuring seamless integration of neural models at large. In addition,
our open-source implementation promotes reproducibility and encourages future
research to build on our foundational work.
[LINK]
http://arxiv.org/abs/2501.08561v1
[DATE]
2025-01-15 12:04:57+08:00
[CATEGORIES]
cs.LG
Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal
[AUTHORS]
Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao
[ABSTRACT]
Artificial neural networks, especially recent diffusion-based models, have
shown remarkable superiority in gaming, control, and QA systems, where the
training tasks’ datasets are usually static. However, in real-world
applications, such as robotic control with reinforcement learning (RL), the tasks
are changing, and new tasks arise in a sequential order. This situation poses
the new challenge of plasticity-stability trade-off for training an agent who
can adapt to task changes and retain acquired knowledge. In view of this, we
propose a rehearsal-based continual diffusion model, called Continual Diffuser
(CoD), to endow the diffuser with the capabilities of quick adaptation
(plasticity) and lasting retention (stability). Specifically, we first
construct an offline benchmark that contains 90 tasks from multiple domains.
Then, we train the CoD on each task with sequential modeling and conditional
generation for making decisions. Next, we preserve a small portion of previous
datasets as the rehearsal buffer and replay it to retain the acquired
knowledge. Extensive experiments on a series of tasks show CoD can achieve a
promising plasticity-stability trade-off and outperform existing
diffusion-based methods and other representative baselines on most tasks.
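The rehearsal step described above can be illustrated with a minimal sketch. It assumes each task's dataset is a plain list of samples and that a fixed fraction of each finished task is kept; the function names and the sampling rule are hypothetical and are not taken from the CoD implementation.

```python
import random

def build_rehearsal_buffer(dataset, fraction, rng):
    """Keep a small random portion of a finished task's dataset (assumed rule)."""
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)

def training_data_for_task(current_dataset, buffers):
    """Mix the current task's data with all stored rehearsal buffers."""
    mixed = list(current_dataset)
    for buf in buffers:
        mixed.extend(buf)
    return mixed

rng = random.Random(0)
old_task = list(range(100))             # stand-in for a previous task's samples
buffer = build_rehearsal_buffer(old_task, fraction=0.1, rng=rng)
new_task = ["a", "b", "c", "d", "e"]    # stand-in for the current task's samples
mixed = training_data_for_task(new_task, [buffer])
```

Replaying the stored samples alongside new data is what trades a little extra training cost for retention of earlier tasks.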
[COMMENTS]
This work has been submitted to the IEEE for possible publication
[LINK]
http://arxiv.org/abs/2409.02512v2
[DATE]
2025-01-15 11:23:39+08:00
[CATEGORIES]
cs.LG
Reinforcement Learning-Enhanced Procedural Generation for Dynamic Narrative-Driven AR Experiences
[AUTHORS]
Aniruddha Srinivas Joshi
[ABSTRACT]
Procedural Content Generation (PCG) is widely used to create scalable and
diverse environments in games. However, existing methods, such as the Wave
Function Collapse (WFC) algorithm, are often limited to static scenarios and
lack the adaptability required for dynamic, narrative-driven applications,
particularly in augmented reality (AR) games. This paper presents a
reinforcement learning-enhanced WFC framework designed for mobile AR
environments. By integrating environment-specific rules and dynamic tile weight
adjustments informed by reinforcement learning (RL), the proposed method
generates maps that are both contextually coherent and responsive to gameplay
needs. Comparative evaluations and user studies demonstrate that the framework
achieves superior map quality and delivers immersive experiences, making it
well-suited for narrative-driven AR games. Additionally, the method holds
promise for broader applications in education, simulation training, and
immersive extended reality (XR) experiences, where dynamic and adaptive
environments are critical.
[COMMENTS]
Number of pages: 13, Number of figures: 4. Accepted for presentation
at GRAPP 2025 - 20th International Conference on Computer Graphics Theory and
Applications (for additional details on the conference visit
https://grapp.scitevents.org). Disclaimer: This preprint may differ from the
final version published in the conference proceedings
[LINK]
http://arxiv.org/abs/2501.08552v1
[DATE]
2025-01-15 11:23:06+08:00
[CATEGORIES]
cs.LG
A Theory of Optimistically Universal Online Learnability for General Concept Classes
[AUTHORS]
Steve Hanneke, Hongao Wang
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.08551v1
[DATE]
2025-01-15 11:20:16+08:00
[CATEGORIES]
cs.LG
OMEGA: A Low-Latency GNN Serving System for Large Graphs
[AUTHORS]
Geon-Woo Kim, Donghyun Kim, Jeongyoon Moon, Henry Liu, Tarannum Khan, Anand Iyer, Daehyeok Kim, Aditya Akella
[ABSTRACT]
Graph Neural Networks (GNNs) have been widely adopted for their ability to
compute expressive node representations in graph datasets. However, serving
GNNs on large graphs is challenging due to the high communication, computation,
and memory overheads of constructing and executing computation graphs, which
represent information flow across large neighborhoods. Existing approximation
techniques in training can mitigate the overheads but, in serving, still lead
to high latency and/or accuracy loss. To this end, we propose OMEGA, a system
that enables low-latency GNN serving for large graphs with minimal accuracy
loss through two key ideas. First, OMEGA employs selective recomputation of
precomputed embeddings, which allows for reusing precomputed computation
subgraphs while selectively recomputing a small fraction to minimize accuracy
loss. Second, we develop computation graph parallelism, which reduces
communication overhead by parallelizing the creation and execution of
computation graphs across machines. Our evaluation with large graph datasets
and GNN models shows that OMEGA significantly outperforms state-of-the-art
techniques.
[LINK]
http://arxiv.org/abs/2501.08547v1
[DATE]
2025-01-15 11:14:18+08:00
[CATEGORIES]
cs.LG
Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach
[AUTHORS]
Xinye Qu, Longxiao Liu, Wenjie Huang
[ABSTRACT]
In this paper, we propose a novel reinforcement learning algorithm for
inventory management of newly launched products with no or limited historical
demand information. The algorithm follows the classic Dyna-$Q$ structure,
balancing the model-based and model-free approaches, while accelerating the
training process of Dyna-$Q$ and mitigating the model discrepancy generated by
the model-based feedback. Warm-start information from the demand data of
existing similar products can be incorporated into the algorithm to further
stabilize the early-stage training and reduce the variance of the estimated
optimal policy. Our approach is validated through a case study of bakery
inventory management with real data. The adjusted Dyna-$Q$ shows up to a 23.7%
reduction in average daily cost compared with $Q$-learning, and up to a 77.5%
reduction in training time within the same horizon compared with classic
Dyna-$Q$. With the warm-start information incorporated, the adjusted Dyna-$Q$
has the lowest total cost, lowest variance in total cost, and relatively low
shortage percentages among all the algorithms over a 30-day test period.
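For readers unfamiliar with the Dyna-$Q$ structure mentioned above, the following is a minimal tabular sketch of classic Dyna-$Q$, not the adjusted, warm-started variant proposed in the paper. The state and action encodings, step sizes, and function names are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_q_step(Q, model, s, a, r, s_next, actions, n_planning=5, rng=random):
    """Classic Dyna-Q: learn from the real transition, then replay the model."""
    q_update(Q, s, a, r, s_next, actions)      # model-free update
    model[(s, a)] = (r, s_next)                # learn a deterministic model
    for _ in range(n_planning):                # model-based planning updates
        (ps, pa), (pr, pnext) = rng.choice(list(model.items()))
        q_update(Q, ps, pa, pr, pnext, actions)

Q = defaultdict(float)
model = {}
actions = [0, 1]
dyna_q_step(Q, model, s=0, a=1, r=1.0, s_next=0, actions=actions,
            rng=random.Random(0))
```

The planning loop reuses stored transitions, which is where the model-based part can speed up learning and also where model discrepancy can enter.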
[COMMENTS]
7 pages, 2 figures
[LINK]
http://arxiv.org/abs/2501.08109v2
[DATE]
2025-01-15 10:48:33+08:00
[CATEGORIES]
cs.LG
Unconditional stability of a recurrent neural circuit implementing divisive normalization
[AUTHORS]
Shivang Rawat, David J. Heeger, Stefano Martiniani
[ABSTRACT]
Stability in recurrent neural models poses a significant challenge,
particularly in developing biologically plausible neurodynamical models that
can be seamlessly trained. Traditional cortical circuit models are notoriously
difficult to train due to expansive nonlinearities in the dynamical system,
leading to an optimization problem with nonlinear stability constraints that
are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in
tasks involving sequential data but lack biological plausibility and
interpretability. In this work, we address these challenges by linking dynamic
divisive normalization (DN) to the stability of ORGaNICs, a biologically
plausible recurrent cortical circuit model that dynamically achieves DN and
that has been shown to simulate a wide range of neurophysiological phenomena.
By using the indirect method of Lyapunov, we prove the remarkable property of
unconditional local stability for an arbitrary-dimensional ORGaNICs circuit
when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a
system of coupled damped harmonic oscillators, which enables us to derive the
circuit’s energy function, providing a normative principle of what the circuit,
and individual neurons, aim to accomplish. Further, for a generic recurrent
weight matrix, we prove the stability of the 2D model and demonstrate
empirically that stability holds in higher dimensions. Finally, we show that
ORGaNICs can be trained by backpropagation through time without gradient
clipping/scaling, thanks to its intrinsic stability property and adaptive time
constants, which address the problems of exploding, vanishing, and oscillating
gradients. By evaluating the model’s performance on RNN benchmarks, we find
that ORGaNICs outperform alternative neurodynamical models on static image
classification tasks and perform comparably to LSTMs on sequential tasks.
[LINK]
http://arxiv.org/abs/2409.18946v3
[DATE]
2025-01-15 10:42:42+08:00
[CATEGORIES]
cs.LG
Investigating the Effect of Network Pruning on Performance and Interpretability
[AUTHORS]
Jonathan von Rad, Florian Seuffert
[ABSTRACT]
Deep Neural Networks (DNNs) are often over-parameterized for their tasks and
can be compressed quite drastically by removing weights, a process called
pruning. We investigate the impact of different pruning techniques on the
classification performance and interpretability of GoogLeNet. We systematically
apply unstructured and structured pruning, as well as connection sparsity
(pruning of input weights) methods to the network and analyze the outcomes
regarding the network’s performance on the validation set of ImageNet. We also
compare different retraining strategies, such as iterative pruning and one-shot
pruning. We find that with sufficient retraining epochs, the performance of the
networks can approximate the performance of the default GoogLeNet - and even
surpass it in some cases. To assess interpretability, we employ the Mechanistic
Interpretability Score (MIS) developed by Zimmermann et al. Our experiments
reveal that there is no significant relationship between interpretability and
pruning rate when using MIS as a measure. Additionally, we observe that
networks with extremely low accuracy can still achieve high MIS scores,
suggesting that the MIS may not always align with intuitive notions of
interpretability, such as understanding the basis of correct decisions.
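As a rough illustration of unstructured pruning, the sketch below zeroes the smallest-magnitude fraction of a flat weight list. Magnitude is one common pruning criterion and is assumed here; the abstract does not state which criterion the authors use, and the function name is hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

pruned = magnitude_prune([0.5, -0.1, 2.0, 0.05], sparsity=0.5)
```

Structured pruning would instead remove whole units such as filters or channels, which this flat-list sketch does not model.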
[COMMENTS]
4 pages, 6 figures
[LINK]
http://arxiv.org/abs/2409.19727v2
[DATE]
2025-01-15 10:29:14+08:00
[CATEGORIES]
cs.LG
Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
[AUTHORS]
Luke Snow, Vikram Krishnamurthy
[ABSTRACT]
This paper provides a finite-sample analysis of a passive stochastic gradient
Langevin dynamics (PSGLD) algorithm. This algorithm is designed to achieve
adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate
the cost function of a forward learner performing a stochastic gradient
algorithm (e.g., policy gradient reinforcement learning) by observing their
estimates in real-time. The PSGLD algorithm is considered passive because it
incorporates noisy gradients provided by an external stochastic gradient
algorithm (forward learner), over which it has no control. The PSGLD algorithm
acts as a randomized sampler to achieve adaptive IRL by reconstructing the
forward learner’s cost function nonparametrically from the stationary measure
of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample)
performance; we provide explicit bounds on the 2-Wasserstein distance between
the PSGLD algorithm’s sample measure and the stationary measure encoding the cost
function, and provide guarantees for a kernel density estimation scheme which
reconstructs the cost function from empirical samples. Our analysis uses tools
from the study of Markov diffusion operators. The derived bounds have both
practical and theoretical significance. They provide finite-time guarantees for
an adaptive IRL mechanism, and substantially generalize the analytical
framework of a line of research in passive stochastic gradient algorithms.
[LINK]
http://arxiv.org/abs/2304.09123v3
[DATE]
2025-01-15 10:19:34+08:00
[CATEGORIES]
cs.LG
Mitigating Domain Shift in Federated Learning via Intra- and Inter-Domain Prototypes
[AUTHORS]
Huy Q. Le, Ye Lin Tun, Yu Qiao, Minh N. H. Nguyen, Keon Oh Kim, Choong Seon Hong
[ABSTRACT]
Federated Learning (FL) has emerged as a decentralized machine learning
technique, allowing clients to train a global model collaboratively without
sharing private data. However, most FL studies ignore the crucial challenge of
heterogeneous domains where each client has a distinct feature distribution,
which is common in real-world scenarios. Prototype learning, which leverages
the mean feature vectors within the same classes, has become a prominent
solution for federated learning under domain skew. However, existing federated
prototype learning methods only consider inter-domain prototypes on the server
and overlook intra-domain characteristics. In this work, we introduce a novel
federated prototype learning method, namely I$^2$PFL, which incorporates
Intra-domain and Inter-domain Prototypes, to
mitigate domain shifts and learn a generalized global model across multiple
domains in federated learning. To construct intra-domain prototypes, we propose
feature alignment with MixUp-based augmented prototypes to capture the
diversity of local domains and enhance the generalization of local features.
Additionally, we introduce a reweighting mechanism for inter-domain prototypes
to generate generalized prototypes to provide inter-domain knowledge and reduce
domain skew across multiple clients. Extensive experiments on the Digits,
Office-10, and PACS datasets illustrate the superior performance of our method
compared to other baselines.
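The two building blocks named above, class prototypes (mean feature vectors per class) and MixUp interpolation, can be sketched as follows. This is an illustrative sketch on plain Python lists; how I$^2$PFL combines them, and its reweighting mechanism, are not shown, and the function names are hypothetical.

```python
from collections import defaultdict

def class_prototypes(features, labels):
    """Return the mean feature vector for each class label."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def mixup(x_a, x_b, lam):
    """Standard MixUp interpolation: lam * x_a + (1 - lam) * x_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

protos = class_prototypes([[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]], [0, 0, 1])
mixed = mixup([0.0, 0.0], [2.0, 4.0], lam=0.25)
```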
[COMMENTS]
13 pages, 9 figures, 10 tables
[LINK]
http://arxiv.org/abs/2501.08521v1
[DATE]
2025-01-15 10:17:38+08:00
[CATEGORIES]
cs.LG
Learning Hyperplane Tree: A Piecewise Linear and Fully Interpretable Decision-making Framework
[AUTHORS]
Hongyi Li, Jun Xu, William Ward Armstrong
[ABSTRACT]
This paper introduces a novel tree-based model, Learning Hyperplane Tree
(LHT), which outperforms state-of-the-art (SOTA) tree models for classification
tasks on several public datasets. The structure of LHT is simple and efficient:
it partitions the data using several hyperplanes to progressively distinguish
between target and non-target class samples. Although the separation is not
perfect at each stage, LHT effectively improves the distinction through
successive partitions. During testing, a sample is classified by evaluating the
hyperplanes defined in the branching blocks and traversing down the tree until
it reaches the corresponding leaf block. The class of the test sample is then
determined using the piecewise linear membership function defined in the leaf
blocks, which is derived through least-squares fitting and fuzzy logic. LHT is
highly transparent and interpretable: at each branching block, the contribution
of each feature to the classification can be clearly observed.
[LINK]
http://arxiv.org/abs/2501.08515v1
[DATE]
2025-01-15 09:59:24+08:00
[CATEGORIES]
cs.LG
Learning Cross-Domain Representations for Transferable Drug Perturbations on Single-Cell Transcriptional Responses
[AUTHORS]
Hui Liu, Shikai Jin
[ABSTRACT]
Phenotypic drug discovery has attracted widespread attention because of its
potential to identify bioactive molecules. Transcriptomic profiling provides a
comprehensive reflection of phenotypic changes in cellular responses to
external perturbations. In this paper, we propose XTransferCDR, a novel
generative framework designed for feature decoupling and transferable
representation learning across domains. Given a pair of perturbed expression
profiles, our approach decouples the perturbation representations from basal
states through domain separation encoders and then cross-transfers them in the
latent space. The transferred representations are then used to reconstruct the
corresponding perturbed expression profiles via a shared decoder. This
cross-transfer constraint effectively promotes the learning of transferable
drug perturbation representations. We conducted extensive evaluations of our
model on multiple datasets, including single-cell transcriptional responses to
drugs and single- and combinatorial genetic perturbations. The experimental
results show that XTransferCDR achieved better performance than current
state-of-the-art methods, showcasing its potential to advance phenotypic drug
discovery.
[COMMENTS]
Accepted by The 39th Annual AAAI Conference on Artificial Intelligence
(AAAI 2025)
[LINK]
http://arxiv.org/abs/2412.19228v2
[DATE]
2025-01-15 09:16:30+08:00
[CATEGORIES]
cs.LG
Score-based 3D molecule generation with neural fields
[AUTHORS]
Matthieu Kirchmeyer, Pedro O. Pinheiro, Saeed Saremi
[ABSTRACT]
We introduce a new representation for 3D molecules based on their continuous
atomic density fields. Using this representation, we propose a new model based
on walk-jump sampling for unconditional 3D molecule generation in the
continuous space using neural fields. Our model, FuncMol, encodes molecular
fields into latent codes using a conditional neural field, samples noisy codes
from a Gaussian-smoothed distribution with Langevin MCMC (walk), denoises these
samples in a single step (jump), and finally decodes them into molecular
fields. FuncMol performs all-atom generation of 3D molecules without
assumptions on the molecular structure and scales well with the size of
molecules, unlike most approaches. Our method achieves competitive results on
drug-like molecules and easily scales to macro-cyclic peptides, with at least
one order of magnitude faster sampling. The code is available at
https://github.com/prescient-design/funcmol.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2501.08508v1
[DATE]
2025-01-15 09:10:59+08:00
[CATEGORIES]
cs.LG
SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization
[AUTHORS]
Waqwoya Abebe, Sadegh Jafari, Sixing Yu, Akash Dutta, Jan Strube, Nathan R. Tallent, Luanzheng Guo, Pablo Munoz, Ali Jannesari
[ABSTRACT]
Neural Architecture Search (NAS) is a powerful approach to automating the
design of efficient neural architectures. In contrast to traditional NAS
methods, recently proposed one-shot NAS methods prove to be more efficient in
performing NAS. One-shot NAS works by generating a singular weight-sharing
supernetwork that acts as a search space (container) of subnetworks. Despite
its achievements, designing the one-shot search space remains a major
challenge. In this work we propose a search space design strategy for Vision
Transformer (ViT)-based architectures. In particular, we convert the Segment
Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our
approach involves automating the search space design via layer-wise structured
pruning and parameter prioritization. While the structured pruning applies
probabilistic removal of certain transformer layers, parameter prioritization
performs weight reordering and slicing of MLP-blocks in the remaining layers.
We train supernetworks on several datasets using the sandwich rule. For
deployment, we enhance subnetwork discovery by utilizing a program autotuner to
identify efficient subnetworks within the search space. The resulting
subnetworks are 30-70% smaller in size compared to the original pre-trained SAM
ViT-B, yet outperform the pretrained model. Our work introduces a new and
effective method for ViT NAS search-space design.
[LINK]
http://arxiv.org/abs/2501.08504v1
[DATE]
2025-01-15 08:54:12+08:00
[CATEGORIES]
cs.LG
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
[AUTHORS]
Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo
[ABSTRACT]
Efficient tokenization of videos remains a challenge in training vision
models that can process long videos. One promising direction is to develop a
tokenizer that can encode long video clips, as it would enable the tokenizer to
leverage the temporal coherence of videos better for tokenization. However,
training existing tokenizers on long videos often incurs a huge training cost
as they are trained to reconstruct all the frames at once. In this paper, we
introduce CoordTok, a video tokenizer that learns a mapping from
coordinate-based representations to the corresponding patches of input videos,
inspired by recent advances in 3D generative models. In particular, CoordTok
encodes a video into factorized triplane representations and reconstructs
patches that correspond to randomly sampled $(x,y,t)$ coordinates. This allows
for training large tokenizer models directly on long videos without requiring
excessive training resources. Our experiments show that CoordTok can
drastically reduce the number of tokens for encoding long video clips. For
instance, CoordTok can encode a 128-frame video with 128$\times$128 resolution
into 1280 tokens, while baselines need 6144 or 8192 tokens to achieve similar
reconstruction quality. We further show that this efficient video tokenization
enables memory-efficient training of a diffusion transformer that can generate
128 frames at once.
[COMMENTS]
Code is available on the project webpage:
https://huiwon-jang.github.io/coordtok/
[LINK]
http://arxiv.org/abs/2411.14762v3
[DATE]
2025-01-15 08:53:38+08:00
[CATEGORIES]
cs.LG
Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks
[AUTHORS]
Zhiwei Gao, George Em Karniadakis
[ABSTRACT]
Uncertainty quantification (UQ) plays a pivotal role in scientific machine
learning, especially when surrogate models are used to approximate complex
systems. Although multilayer perceptrons (MLPs) are commonly employed as
surrogates, they often suffer from overfitting due to their large number of
parameters. Kolmogorov-Arnold networks (KANs) offer an alternative solution
with fewer parameters. However, gradient-based inference methods, such as
Hamiltonian Monte Carlo (HMC), may result in computational inefficiency when
applied to KANs, especially for large-scale datasets, due to the high cost of
back-propagation. To address these challenges, we propose a novel approach,
combining the dropout Tikhonov ensemble Kalman inversion (DTEKI) with Chebyshev
KANs. This gradient-free method effectively mitigates overfitting and enhances
numerical stability. Additionally, we incorporate the active subspace method to
reduce the parameter-space dimensionality, allowing us to improve the accuracy
of predictions and obtain more reliable uncertainty estimates. Extensive
experiments demonstrate the efficacy of our approach in various test cases,
including scenarios with large datasets and high noise levels. Our results show
that the new method achieves comparable or better accuracy, along with much
higher efficiency, stability, and scalability, compared to HMC.
Moreover, by leveraging the low-dimensional parameter subspace, our method
preserves prediction accuracy while further reducing the computational cost
substantially.
[LINK]
http://arxiv.org/abs/2501.08501v1
[DATE]
2025-01-15 08:38:13+08:00
[CATEGORIES]
cs.LG
A Unifying Information-theoretic Perspective on Evaluating Generative Models
[AUTHORS]
Alexis Fox, Samarth Swarup, Abhijin Adiga
[ABSTRACT]
Considering the difficulty of interpreting generative model output, there is
significant current research focused on determining meaningful evaluation
metrics. Several recent approaches utilize “precision” and “recall,” borrowed
from the classification domain, to individually quantify the output fidelity
(realism) and output diversity (representation of the real data variation),
respectively. With the increase in metric proposals, there is a need for a
unifying perspective, allowing for easier comparison and clearer explanation of
their benefits and drawbacks. To this end, we unify a class of
kth-nearest-neighbors (kNN)-based metrics under an information-theoretic lens
using approaches from kNN density estimation. Additionally, we propose a
tri-dimensional metric composed of Precision Cross-Entropy (PCE), Recall
Cross-Entropy (RCE), and Recall Entropy (RE), which separately measure fidelity
and two distinct aspects of diversity, inter- and intra-class. Our
domain-agnostic metric, derived from the information-theoretic concepts of
entropy and cross-entropy, can be dissected for both sample- and mode-level
analysis. Our detailed experimental results demonstrate the sensitivity of our
metric components to their respective qualities and reveal undesirable
behaviors of other metrics.
[LINK]
http://arxiv.org/abs/2412.14340v2
[DATE]
2025-01-15 08:02:00+08:00
[CATEGORIES]
cs.LG
High-dimensional learning of narrow neural networks
[AUTHORS]
Hugo Cui
[ABSTRACT]
Recent years have been marked by the fast-paced diversification and
increasing ubiquity of machine learning applications. Yet, a firm theoretical
understanding of the surprising efficiency of neural networks to learn from
high-dimensional data still proves largely elusive. In this endeavour, analyses
inspired by statistical physics have proven instrumental, enabling the tight
asymptotic characterization of the learning of neural networks in high
dimensions, for a broad class of solvable models. This manuscript reviews the
tools and ideas underlying recent progress in this line of work. We introduce a
generic model – the sequence multi-index model – which encompasses numerous
previously studied models as special instances. This unified framework covers a
broad class of machine learning architectures with a finite number of hidden
units, including multi-layer perceptrons, autoencoders, attention mechanisms;
and tasks, including (un)supervised learning, denoising, contrastive learning,
in the limit of large data dimension, and comparably large number of samples.
We explicate in full detail the analysis of the learning of sequence
multi-index models, using statistical physics techniques such as the replica
method and approximate message-passing algorithms. This manuscript thus
provides a unified presentation of analyses reported in several previous works,
and a detailed overview of central techniques in the field of statistical
physics of machine learning. This review should be a useful primer for machine
learning theoreticians curious about statistical physics approaches; it should
also be of value to statistical physicists interested in the transfer of such
ideas to the study of neural networks.
[LINK]
http://arxiv.org/abs/2409.13904v2
[DATE]
2025-01-15 07:31:03+08:00
[CATEGORIES]
cs.LG
Expressive Text-to-Image Generation with Rich Text
[AUTHORS]
Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
[ABSTRACT]
Plain text has become a prevalent interface for text-to-image synthesis.
However, its limited customization options hinder users from accurately
describing desired outputs. For example, plain text makes it hard to specify
continuous quantities, such as the precise RGB color value or importance of
each word. Furthermore, creating detailed text prompts for complex scenes is
tedious for humans to write and challenging for text encoders to interpret. To
address these challenges, we propose using a rich-text editor supporting
formats such as font style, size, color, and footnote. We extract each word’s
attributes from rich text to enable local style control, explicit token
reweighting, precise color rendering, and detailed region synthesis. We achieve
these capabilities through a region-based diffusion process. We first obtain
each word’s region based on attention maps of a diffusion process using plain
text. For each region, we enforce its text attributes by creating
region-specific detailed prompts and applying region-specific guidance, and
maintain its fidelity against plain-text generation through region-based
injections. We present various examples of image generation from rich text and
demonstrate that our method outperforms strong baselines with quantitative
evaluations.
[COMMENTS]
Project webpage: https://rich-text-to-image.github.io/
[LINK]
http://arxiv.org/abs/2304.06720v4
[DATE]
2025-01-15 06:30:10+08:00
[CATEGORIES]
cs.LG
Time series forecasting for multidimensional telemetry data using GAN and BiLSTM in a Digital Twin
[AUTHORS]
Joao Carmo de Almeida Neto, Claudio Miceli de Farias, Leandro Santiago de Araujo, Leopoldo Andre Dutra Lusquino Filho
[ABSTRACT]
The research related to digital twins has been increasing in recent years.
Besides the mirroring of the physical world into the digital, there is a need
to provide services related to the data collected and transferred to the
virtual world. One of these services is forecasting the future behavior of the
physical part, which could lead to applications such as preventing harmful
events or designing improvements to get better performance. One strategy used
to predict a system’s operation is the use of time series models like ARIMA or
LSTM, and improvements were implemented using these algorithms. Recently, deep
learning techniques based on generative models such as Generative Adversarial
Networks (GANs) have been proposed to create time series and the use of LSTM
has gained more relevance in time series forecasting, but both have limitations
that restrict the forecasting results. Another issue found in the literature is
the challenge of handling multivariate environments/applications in time series
generation. Therefore, new methods need to be studied in order to fill these
gaps and, consequently, provide better resources for creating useful digital
twins. This proposal studies the integration of a BiLSTM
layer with a time series obtained by a GAN in order to improve the forecasting
accuracy of all the features provided by the dataset and, consequently,
improve behaviour prediction.
[LINK]
http://arxiv.org/abs/2501.08464v1
[DATE]
2025-01-15 06:20:55+08:00
[CATEGORIES]
cs.LG
Neural Network Emulator for Atmospheric Chemical ODE
[AUTHORS]
Zhi-Song Liu, Petri Clusius, Michael Boy
[ABSTRACT]
Modeling atmospheric chemistry is complex and computationally intense. Given
the recent success of deep neural networks in digital signal processing, we
propose a Neural Network Emulator for fast chemical concentration modeling. We
consider atmospheric chemistry as a time-dependent Ordinary Differential
Equation. To extract the hidden correlations between initial states and future
time evolution, we propose ChemNNE, an attention-based Neural Network Emulator
(NNE) that can model the atmospheric chemistry as a neural ODE process. To
efficiently simulate the chemical changes, we propose the sinusoidal time
embedding to estimate the oscillating tendency over time. More importantly, we
use the Fourier neural operator to model the ODE process for efficient
computation. We also propose three physics-informed losses to supervise the
training optimization. To evaluate our model, we propose a large-scale chemical
dataset that can be used for neural network training and evaluation. The
extensive experiments show that our approach achieves state-of-the-art
performance in modeling accuracy and computational speed.
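A sinusoidal time embedding of the kind mentioned above can be sketched as follows. This uses the common Transformer-style formulation with interleaved sine and cosine terms; whether ChemNNE uses exactly this form, and the `max_period` value, are assumptions.

```python
import math

def sinusoidal_time_embedding(t, dim, max_period=10000.0):
    """Map a scalar time t to `dim` interleaved sin/cos features."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    emb = []
    for i in range(half):
        freq = max_period ** (-i / half)   # geometrically decreasing frequency
        emb.append(math.sin(t * freq))
        emb.append(math.cos(t * freq))
    return emb

e0 = sinusoidal_time_embedding(0.0, dim=8)
e1 = sinusoidal_time_embedding(3.7, dim=8)
```

Each sin/cos pair oscillates at a different frequency, which gives a network a simple basis for representing periodic behavior over time.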
[COMMENTS]
25 pages, 8 figures
[LINK]
http://arxiv.org/abs/2408.01829v3
[DATE]
2025-01-15 06:10:21+08:00
[CATEGORIES]
cs.LG
MassSpecGym: A benchmark for the discovery and identification of molecules
[AUTHORS]
Roman Bushuiev, Anton Bushuiev, Niek F. de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop, Marcus Ludwig, Nils A. Haupt, Apurva Kalia, Corinna Brungs, Robin Schmid, Russell Greiner, Bo Wang, David S. Wishart, Li-Ping Liu, Juho Rousu, Wout Bittremieux, Hannes Rost, Tytus D. Mak, Soha Hassoun, Florian Huber, Justin J. J. van der Hooft, Michael A. Stravs, Sebastian Böcker, Josef Sivic, Tomáš Pluskal
[ABSTRACT]
The discovery and identification of molecules in biological and environmental
samples is crucial for advancing biomedical and chemical sciences. Tandem mass
spectrometry (MS/MS) is the leading technique for high-throughput elucidation
of molecular structures. However, decoding a molecular structure from its mass
spectrum is exceptionally challenging, even when performed by human experts. As
a result, the vast majority of acquired MS/MS spectra remain uninterpreted,
thereby limiting our understanding of the underlying (bio)chemical processes.
Despite decades of progress in machine learning applications for predicting
molecular structures from MS/MS spectra, the development of new methods is
severely hindered by the lack of standard datasets and evaluation protocols. To
address this problem, we propose MassSpecGym – the first comprehensive
benchmark for the discovery and identification of molecules from MS/MS data.
Our benchmark comprises the largest publicly available collection of
high-quality labeled MS/MS spectra and defines three MS/MS annotation
challenges: de novo molecular structure generation, molecule
retrieval, and spectrum simulation. It includes new evaluation metrics and a
generalization-demanding data split, therefore standardizing the MS/MS
annotation tasks and rendering the problem accessible to the broad machine
learning community. MassSpecGym is publicly available at
https://github.com/pluskal-lab/MassSpecGym.
[LINK]
http://arxiv.org/abs/2410.23326v2
[DATE]
2025-01-15 06:08:40+08:00
[CATEGORIES]
cs.LG
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
[AUTHORS]
Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu
[ABSTRACT]
We present Vchitect-2.0, a parallel transformer architecture designed to
scale up video diffusion models for large-scale text-to-video generation. The
overall Vchitect-2.0 system has several key designs. (1) By introducing a novel
Multimodal Diffusion Block, our approach achieves consistent alignment between
text descriptions and generated video frames, while maintaining temporal
coherence across sequences. (2) To overcome memory and computational
bottlenecks, we propose a Memory-efficient Training framework that incorporates
hybrid parallelism and other memory reduction techniques, enabling efficient
training of long video sequences on distributed systems. (3) Additionally, our
enhanced data processing pipeline ensures the creation of Vchitect T2V
DataVerse, a high-quality million-scale training dataset through rigorous
annotation and aesthetic evaluation. Extensive benchmarking demonstrates that
Vchitect-2.0 outperforms existing methods in video quality, training
efficiency, and scalability, serving as a suitable base for high-fidelity video
generation.
[LINK]
http://arxiv.org/abs/2501.08453v1
[DATE]
2025-01-15 05:53:11+08:00
[CATEGORIES]
cs.LG
Statistical Properties of Deep Neural Networks with Dependent Data
[AUTHORS]
Chad Brown
[ABSTRACT]
This paper establishes statistical properties of deep neural network (DNN)
estimators under dependent data. Two general results for nonparametric sieve
estimators directly applicable to DNN estimators are given. The first
establishes rates for convergence in probability under nonstationary data. The
second provides non-asymptotic probability bounds on $\mathcal{L}^{2}$-errors
under stationary $\beta$-mixing data. I apply these results to DNN estimators
in both regression and classification contexts imposing only a standard
Hölder smoothness assumption. The DNN architectures considered are common in
applications, featuring fully connected feedforward networks with any
continuous piecewise linear activation function, unbounded weights, and a width
and depth that grows with sample size. The framework provided also offers
potential for research into other DNN architectures and time-series
applications.
[COMMENTS]
86 pages, 2 figures, removed partially linear model section and
uploaded as a separate paper (arXiv:2410.22574v1)
[LINK]
http://arxiv.org/abs/2410.11113v3
[DATE]
2025-01-15 05:50:37+08:00
[CATEGORIES]
cs.LG
SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine
[AUTHORS]
Sadanand Modak, Noah Patton, Isil Dillig, Joydeep Biswas
[ABSTRACT]
This paper addresses the problem of preference learning, which aims to align
robot behaviors through learning user-specific preferences (e.g. “good
pull-over location”) from visual demonstrations. Despite its similarity to
learning factual concepts (e.g. “red door”), preference learning is a
fundamentally harder problem due to its subjective nature and the paucity of
person-specific training data. We address this problem using a novel framework
called SYNAPSE, which is a neuro-symbolic approach designed to efficiently
learn preferential concepts from limited data. SYNAPSE represents preferences
as neuro-symbolic programs, facilitating inspection of individual parts for
alignment, in a domain-specific language (DSL) that operates over images and
leverages a novel combination of visual parsing, large language models, and
program synthesis to learn programs representing individual preferences. We
perform extensive evaluations on various preferential concepts as well as user
case studies demonstrating its ability to align well with dissimilar user
preferences. Our method significantly outperforms baselines, especially when it
comes to out-of-distribution generalization. We show the importance of the
design choices in the framework through multiple ablation studies. Code,
additional results, and supplementary material can be found on the website:
https://amrl.cs.utexas.edu/synapse
[COMMENTS]
Accepted (oral) at AAAI 25
[LINK]
http://arxiv.org/abs/2403.16689v3
[DATE]
2025-01-15 05:37:31+08:00
[CATEGORIES]
cs.LG
Augmentation Invariant Manifold Learning
[AUTHORS]
Shulei Wang
[ABSTRACT]
Data augmentation is a widely used technique and an essential ingredient in
the recent advance in self-supervised representation learning. By preserving
the similarity between augmented data, the resulting data representation can
improve various downstream analyses and achieve state-of-the-art performance in
many applications. Despite the empirical effectiveness, most existing methods
lack theoretical understanding under a general nonlinear setting. To fill this
gap, we develop a statistical framework on a low-dimension product manifold to
model the data augmentation transformation. Under this framework, we introduce
a new representation learning method called augmentation invariant manifold
learning and design a computationally efficient algorithm by reformulating it
as a stochastic optimization problem. Compared with existing self-supervised
methods, the new method simultaneously exploits the manifold’s geometric
structure and invariant property of augmented data and has an explicit
theoretical guarantee. Our theoretical investigation characterizes the role of
data augmentation in the proposed method and reveals why and how the data
representation learned from augmented data can improve the $k$-nearest neighbor
classifier in the downstream analysis, showing that a more complex data
augmentation leads to more improvement in downstream analysis. Finally,
numerical experiments on simulated and real data sets are presented to
demonstrate the merit of the proposed method.
[LINK]
http://arxiv.org/abs/2211.00460v3
[DATE]
2025-01-15 05:26:20+08:00
[CATEGORIES]
cs.LG
FARE: A Deep Learning-Based Framework for Radar-based Face Recognition and Out-of-distribution Detection
[AUTHORS]
Sabri Mustafa Kahya, Boran Hamdi Sivrikaya, Muhammet Sami Yavuz, Eckehard Steinbach
[ABSTRACT]
In this work, we propose a novel pipeline for face recognition and
out-of-distribution (OOD) detection using short-range FMCW radar. The proposed
system utilizes Range-Doppler and micro Range-Doppler Images. The architecture
features a primary path (PP) responsible for the classification of
in-distribution (ID) faces, complemented by intermediate paths (IPs) dedicated
to OOD detection. The network is trained in two stages: first, the PP is
trained using triplet loss to optimize ID face classification. In the second
stage, the PP is frozen, and the IPs, comprising simple linear autoencoder
networks, are trained specifically for OOD detection. Using our dataset
generated with a 60 GHz FMCW radar, our method achieves an ID classification
accuracy of 99.30% and an OOD detection AUROC of 96.91%.
[COMMENTS]
Accepted at ICASSP 2025
[LINK]
http://arxiv.org/abs/2501.08440v1
[DATE]
2025-01-15 05:08:08+08:00
[CATEGORIES]
cs.LG
Do generative video models learn physical principles from watching videos?
[AUTHORS]
Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos
[ABSTRACT]
AI video generation is undergoing a revolution, with quality and realism
advancing rapidly. These advances have led to a passionate scientific debate:
Do video models learn “world models” that discover laws of physics – or,
alternatively, are they merely sophisticated pixel predictors that achieve
visual realism without understanding the physical principles of reality? We
address this question by developing Physics-IQ, a comprehensive benchmark
dataset that can only be solved by acquiring a deep understanding of various
physical principles, like fluid dynamics, optics, solid mechanics, magnetism
and thermodynamics. We find that across a range of current models (Sora,
Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical
understanding is severely limited, and unrelated to visual realism. At the same
time, some test cases can already be successfully solved. This indicates that
acquiring certain physical principles from observation alone may be possible,
but significant challenges remain. While we expect rapid advances ahead, our
work demonstrates that visual realism does not imply physical understanding.
Our project page is at https://physics-iq.github.io; code at
https://github.com/google-deepmind/physics-IQ-benchmark.
[LINK]
http://arxiv.org/abs/2501.09038v1
[DATE]
2025-01-15 04:59:37+08:00
[CATEGORIES]
cs.LG
Learning Discrete Concepts in Latent Hierarchical Models
[AUTHORS]
Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang
[ABSTRACT]
Learning concepts from natural high-dimensional data (e.g., images) holds
potential in building human-aligned and interpretable machine learning models.
Despite its encouraging prospect, formalization and theoretical insights into
this crucial task are still lacking. In this work, we formalize concepts as
discrete latent causal variables that are related via a hierarchical causal
model that encodes different abstraction levels of concepts embedded in
high-dimensional data (e.g., a dog breed and its eye shapes in natural images).
We formulate conditions to facilitate the identification of the proposed causal
model, which reveals when learning such concepts from unsupervised data is
possible. Our conditions permit complex causal hierarchical structures beyond
latent trees and multi-level directed acyclic graphs in prior work and can
handle high-dimensional, continuous observed variables, which is well-suited
for unstructured data modalities such as images. We substantiate our
theoretical claims with synthetic data experiments. Further, we discuss our
theory’s implications for understanding the underlying mechanisms of latent
diffusion models and provide corresponding empirical evidence for our
theoretical insights.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.00519v2
[DATE]
2025-01-15 04:44:45+08:00
[CATEGORIES]
cs.LG
Physics-informed neural networks for phase-resolved data assimilation and prediction of nonlinear ocean waves
[AUTHORS]
Svenja Ehlers, Norbert Hoffmann, Tianning Tang, Adrian H. Callaghan, Rui Cao, Enrique M. Padilla, Yuxin Fang, Merten Stender
[ABSTRACT]
The assimilation and prediction of phase-resolved surface gravity waves are
critical challenges in ocean science and engineering. Potential flow theory
(PFT) has been widely employed to develop wave models and numerical techniques
for wave prediction. However, traditional wave prediction methods are often
limited. For example, most simplified wave models have a limited ability to
capture strong wave nonlinearity, while fully nonlinear PFT solvers often fail
to meet the speed requirements of engineering applications. This computational
inefficiency also hinders the development of effective data assimilation
techniques, which are required to reconstruct spatial wave information from
sparse measurements to initialize the wave prediction. To address these
challenges, we propose a novel solver method that leverages physics-informed
neural networks (PINNs) that parameterize PFT solutions as neural networks.
This provides a computationally inexpensive way to assimilate and predict wave
data. The proposed PINN framework is validated through comparisons with
analytical linear PFT solutions and experimental data collected in a laboratory
wave flume. The results demonstrate that our approach accurately captures and
predicts irregular, nonlinear, and dispersive wave surface dynamics. Moreover,
the PINN can infer the fully nonlinear velocity potential throughout the entire
fluid volume solely from surface elevation measurements, enabling the
calculation of fluid velocities that are difficult to measure experimentally.
[COMMENTS]
22 pages, 12 Figures, preprint
[LINK]
http://arxiv.org/abs/2501.08430v1
[DATE]
2025-01-15 04:44:17+08:00
[CATEGORIES]
cs.LG
Physics-Informed Latent Neural Operator for Real-time Predictions of Complex Physical Systems
[AUTHORS]
Sharmila Karumuri, Lori Graham-Brady, Somdatta Goswami
[ABSTRACT]
Deep operator network (DeepONet) has shown great promise as a surrogate model
for systems governed by partial differential equations (PDEs), learning
mappings between infinite-dimensional function spaces with high accuracy.
However, achieving low generalization errors often requires highly
overparameterized networks, posing significant challenges for large-scale,
complex systems. To address these challenges, latent DeepONet was proposed,
introducing a two-step approach: first, a reduced-order model is used to learn
a low-dimensional latent space, followed by operator learning on this latent
space. While effective, this method is inherently data-driven, relying on large
datasets and making it difficult to incorporate governing physics into the
framework. Additionally, the decoupled nature of these steps prevents
end-to-end optimization and the ability to handle data scarcity. This work
introduces PI-Latent-NO, a physics-informed latent operator learning framework
that overcomes these limitations. Our architecture employs two coupled
DeepONets in an end-to-end training scheme: the first, termed Latent-DeepONet,
identifies and learns the low-dimensional latent space, while the second,
Reconstruction-DeepONet, maps the latent representations back to the original
physical space. By integrating governing physics directly into the training
process, our approach requires significantly fewer data samples while achieving
high accuracy. Furthermore, the framework is computationally and memory
efficient, exhibiting nearly constant scaling behavior on a single GPU and
demonstrating the potential for further efficiency gains with distributed
training. We validate the proposed method on high-dimensional parametric PDEs,
demonstrating its effectiveness as a proof of concept and its potential
scalability for large-scale systems.
[LINK]
http://arxiv.org/abs/2501.08428v1
[DATE]
2025-01-15 04:38:30+08:00
[CATEGORIES]
cs.LG
Causal vs. Anticausal merging of predictors
[AUTHORS]
Sergio Hernan Garrido Mejia, Patrick Blöbaum, Bernhard Schölkopf, Dominik Janzing
[COMMENTS]
Presented at the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2501.08426v1
[DATE]
2025-01-15 04:38:15+08:00
[CATEGORIES]
cs.LG
Is Stochastic Gradient Descent Effective? A PDE Perspective on Machine Learning processes
[AUTHORS]
Davide Barbieri, Matteo Bonforte, Peio Ibarrondo
[ABSTRACT]
In this paper we analyze the behaviour of the stochastic gradient descent
(SGD), a widely used method in supervised learning for optimizing neural
network weights via a minimization of non-convex loss functions. Since the
pioneering work of E, Li and Tai (2017), the underlying structure of such
processes can be understood via parabolic PDEs of Fokker-Planck type, which are
at the core of our analysis. Although Fokker-Planck equations have a long
history and an extensive literature, almost nothing is known when the potential
is non-convex or when the diffusion matrix is degenerate, and this is the main
difficulty that we face in our analysis.
We identify two different regimes: in the initial phase of SGD, the loss
function drives the weights to concentrate around the nearest local minimum. We
refer to this phase as the drift regime and we provide quantitative estimates
on this concentration phenomenon. Next, we introduce the diffusion regime,
where stochastic fluctuations help the learning process to escape suboptimal
local minima. We analyze the Mean Exit Time (MET) and prove upper and lower
bounds of the MET. Finally, we address the asymptotic convergence of SGD for a
non-convex cost function and a degenerate diffusion matrix, a setting in which
the standard approaches do not apply and new techniques are required. For this purpose,
we exploit two different methods: duality and entropy methods.
We provide new results about the dynamics and effectiveness of SGD, offering
a deep connection between stochastic optimization and PDE theory, and some
answers and insights to basic questions in the Machine Learning processes: How
long does SGD take to escape from a bad minimum? Do neural network parameters
converge using SGD? How do parameters evolve in the first stage of training
with SGD?
[LINK]
http://arxiv.org/abs/2501.08425v1
[DATE]
2025-01-15 04:33:30+08:00
[CATEGORIES]
cs.LG
A Constant Velocity Latent Dynamics Approach for Accelerating Simulation of Stiff Nonlinear Systems
[AUTHORS]
William Cole Nockolds, C. G. Krishnanunni, Tan Bui-Thanh
[ABSTRACT]
Solving stiff ordinary differential equations (StODEs) requires sophisticated
numerical solvers, which are often computationally expensive. In particular,
StODEs often cannot be solved with traditional explicit time integration
schemes and one must resort to costly implicit methods to compute solutions. On
the other hand, state-of-the-art machine learning (ML) based methods such as
Neural ODE (NODE) poorly handle the timescale separation of various elements of
the solutions to StODEs and require expensive implicit solvers for integration
at inference time. In this work, we embark on a different path which involves
learning a latent dynamics for StODEs, in which one completely avoids numerical
integration. To that end, we consider a constant velocity latent dynamical
system whose solution is a sequence of straight lines. Given the initial
condition and parameters of the ODE, the encoder networks learn the slope (i.e.,
the constant velocity) and the initial condition for the latent dynamics. In
other words, the solution of the original dynamics is encoded into a sequence
of straight lines which can be decoded back to retrieve the actual solution as
and when required. Another key idea in our approach is a nonlinear
transformation of time, which allows for the “stretching/squeezing” of time in
the latent space, thereby allowing for varying levels of attention to different
temporal regions in the solution. Additionally, we provide a simple
universal-approximation-type proof showing that our approach can approximate
the solution of a stiff nonlinear system on a compact set to any degree of
accuracy, $\epsilon$. We show that the dimension of the latent dynamical system
in our approach is independent of $\epsilon$. Numerical investigation on
prototype StODEs suggests that our method outperforms state-of-the-art machine
learning approaches for handling StODEs.
[LINK]
http://arxiv.org/abs/2501.08423v1
[DATE]
2025-01-15 04:32:31+08:00
[CATEGORIES]
cs.LG
CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks
[AUTHORS]
Zijiang Yan, Hao Zhou, Jianhua Pei, Aryan Kaushik, Hina Tabassum, Ping Wang
[ABSTRACT]
Efficient resource allocation is essential for optimizing various tasks in
wireless networks, which are usually formulated as generalized assignment
problems (GAP). GAP, as a generalized version of the linear sum assignment
problem, involves both equality and inequality constraints that add
computational challenges. In this work, we present a novel Conditional Value at
Risk (CVaR)-based Variational Quantum Eigensolver (VQE) framework to address
GAP in vehicular networks (VNets). Our approach leverages a hybrid
quantum-classical structure, integrating a tailored cost function that balances
both objective and constraint-specific penalties to improve solution quality
and stability. Using the CVaR-VQE model, we handle the GAP efficiently by
focusing optimization on the lower tail of the solution space, enhancing both
convergence and resilience on noisy intermediate-scale quantum (NISQ) devices.
We apply this framework to a user-association problem in VNets, where our
method achieves 23.5% improvement compared to the deep neural network (DNN)
approach.
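
The lower-tail focus mentioned in the abstract is commonly implemented with a
sample-based CVaR estimator: sort the measured energies and average only the
best fraction alpha. The Python sketch below illustrates that estimator under
stated assumptions; the function name `cvar_lower_tail`, the sample values, and
the choices of alpha are hypothetical and are not taken from the paper.

```python
import math


def cvar_lower_tail(energies, alpha):
    """Average of the lowest ceil(alpha * n) sampled energies (0 < alpha <= 1).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not energies:
        raise ValueError("need at least one sample")
    k = math.ceil(alpha * len(energies))   # number of samples kept in the tail
    lowest = sorted(energies)[:k]          # best (lowest-energy) samples
    return sum(lowest) / k


# Made-up energies standing in for measured cost values of sampled bitstrings.
samples = [4.0, 1.0, 3.0, 2.0]
tail_cost = cvar_lower_tail(samples, 0.5)  # averages only the two lowest samples
mean_cost = cvar_lower_tail(samples, 1.0)  # alpha = 1 recovers the plain mean
```

With alpha = 1 the estimator reduces to the ordinary expectation value, so
alpha acts as a knob between the standard objective and a best-sample objective.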
[COMMENTS]
Accepted in IEEE International Conference on Communications (ICC
2025)
[LINK]
http://arxiv.org/abs/2501.08418v1
[DATE]
2025-01-15 04:21:06+08:00
[CATEGORIES]
cs.LG
Accelerating the discovery of low-energy structure configurations: a computational approach that integrates first-principles calculations, Monte Carlo sampling, and Machine Learning
[AUTHORS]
Md Rajib Khan Musa, Yichen Qian, Jie Peng, David Cereceda
[ABSTRACT]
Finding Minimum Energy Configurations (MECs) is essential in fields such as
physics, chemistry, and materials science, as they represent the most stable
states of the systems. In particular, identifying such MECs in multi-component
alloys considered candidate PFMs is key because it determines the most stable
arrangement of atoms within the alloy, directly influencing its phase
stability, structural integrity, and thermo-mechanical properties. However,
since the search space grows exponentially with the number of atoms considered,
obtaining such MECs using computationally expensive first-principles DFT
calculations often results in a cumbersome task. To escape the above compromise
between physical fidelity and computational efficiency, we have developed a
novel physics-based data-driven approach that combines Monte Carlo sampling,
first-principles DFT calculations, and Machine Learning to accelerate the
discovery of MECs in multi-component alloys. More specifically, we have
leveraged well-established Cluster Expansion (CE) techniques with Local Outlier
Factor models to establish strategies that enhance the reliability of the CE
method. In this work, we demonstrated the capabilities of the proposed approach
for the particular case of a tungsten-based quaternary high-entropy alloy.
However, the method is applicable to other types of alloys and enables a wide
range of applications.
[COMMENTS]
added changes made during revision of manuscript
[LINK]
http://arxiv.org/abs/2410.05604v2
[DATE]
2025-01-15 04:02:17+08:00
[CATEGORIES]
cs.LG
BiDepth Multimodal Neural Network: Bidirectional Depth Deep Learning Architecture for Spatial-Temporal Prediction
[AUTHORS]
Sina Ehsani, Fenglian Pan, Qingpei Hu, Jian Liu
[ABSTRACT]
Accurate prediction of spatial-temporal (ST) information in dynamic systems,
such as urban mobility and weather patterns, is a crucial yet challenging
problem. The complexity stems from the intricate interplay between spatial
proximity and temporal relevance, where both long-term trends and short-term
fluctuations are present in convoluted patterns. Existing approaches, including
traditional statistical methods and conventional neural networks, may provide
inaccurate results due to the lack of an effective mechanism that
simultaneously incorporates information at variable temporal depths while
maintaining spatial context, resulting in a trade-off between comprehensive
long-term historical analysis and responsiveness to short-term new information.
To bridge this gap, this paper proposes the BiDepth Multimodal Neural Network
(BDMNN) with bidirectional depth modulation that enables a comprehensive
understanding of both long-term seasonality and short-term fluctuations,
adapting to the complex ST context. Case studies with real-world public data
demonstrate significant improvements in prediction accuracy, with a 12%
reduction in Mean Squared Error for urban traffic prediction and a 15%
improvement in rain precipitation forecasting compared to state-of-the-art
benchmarks, without demanding extra computational resources.
[COMMENTS]
This paper has been submitted to Applied Intelligence for review
[LINK]
http://arxiv.org/abs/2501.08411v1
[DATE]
2025-01-15 03:59:59+08:00
[CATEGORIES]
cs.LG
Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation
[AUTHORS]
Hansoo Park, Chanwoo Kim, Jihyeon Kim, Hoseong Cho, Nhat Nguyen Bao Truong, Taehwan Kim, Seungryul Baek
[ABSTRACT]
RGB-based 3D pose estimation methods have been successful with the
development of deep learning and the emergence of high-quality 3D pose
datasets. However, most existing methods do not operate well for testing images
whose distribution is far from that of training data. This problem might be
alleviated by involving diverse
data during training, but it is non-trivial to collect such diverse data
with corresponding labels (i.e., 3D pose). In this paper, we introduce an
unsupervised domain adaptation framework for 3D pose estimation that utilizes
the unlabeled data in addition to labeled data via masked image modeling (MIM)
framework. Foreground-centric reconstruction and attention regularization are
further proposed to increase the effectiveness of unlabeled data usage.
Experiments are conducted on the various datasets in human and hand pose
estimation tasks, especially in the cross-domain scenario. We demonstrate
the effectiveness of our method by achieving state-of-the-art accuracy on all
datasets.
[COMMENTS]
16 pages, 7 figures
[LINK]
http://arxiv.org/abs/2501.08408v1
[DATE]
2025-01-15 03:56:43+08:00
[CATEGORIES]
cs.LG
Predict Confidently, Predict Right: Abstention in Dynamic Graph Learning
[AUTHORS]
Jayadratha Gayen, Himanshu Pal, Naresh Manwani, Charu Sharma
[ABSTRACT]
Many real-world systems can be modeled as dynamic graphs, where nodes and
edges evolve over time, requiring specialized models to capture their evolving
dynamics in risk-sensitive applications effectively. Temporal graph neural
networks (GNNs) are one such category of specialized models. For the first
time, our approach integrates a reject option strategy within the framework of
GNNs for continuous-time dynamic graphs. This allows the model to strategically
abstain from making predictions when the uncertainty is high and confidence is
low, thus minimizing the risk of critical misclassification and enhancing the
results and reliability. We propose a coverage-based abstention prediction
model to implement the reject option that maximizes prediction within a
specified coverage. It improves the prediction score for link prediction and
node classification tasks. Temporal GNNs deal with extremely skewed datasets
for the next state prediction or node classification task. In the case of class
imbalance, our method can be further tuned to provide a higher weightage to the
minority class. Exhaustive experiments are presented on four datasets for
dynamic link prediction and two datasets for dynamic node classification tasks.
This demonstrates the effectiveness of our approach in improving the
reliability and area under the curve (AUC) / average precision (AP) scores for
predictions in dynamic graph scenarios. The results highlight our model’s
ability to efficiently handle the trade-offs between prediction confidence and
coverage, making it a dependable solution for applications requiring high
precision in dynamic and uncertain environments.
[LINK]
http://arxiv.org/abs/2501.08397v1
[DATE]
2025-01-15 03:27:58+08:00
[CATEGORIES]
cs.LG
Gradient Equilibrium in Online Learning: Theory and Applications
[AUTHORS]
Anastasios N. Angelopoulos, Michael I. Jordan, Ryan J. Tibshirani
[ABSTRACT]
We present a new perspective on online learning that we refer to as gradient
equilibrium: a sequence of iterates achieves gradient equilibrium if the
average of gradients of losses along the sequence converges to zero. In
general, this condition neither implies nor is implied by sublinear regret. It
turns out that gradient equilibrium is achievable by standard online learning
methods such as gradient descent and mirror descent with constant step sizes
(rather than decaying step sizes, as is usually required for no regret).
Further, as we show through examples, gradient equilibrium translates into an
interpretable and meaningful property in online prediction problems spanning
regression, classification, quantile estimation, and others. Notably, we show
that the gradient equilibrium framework can be used to develop a debiasing
scheme for black-box predictions under arbitrary distribution shift, based on
simple post hoc online descent updates. We also show that post hoc gradient
updates can be used to calibrate predicted quantiles under distribution shift,
and that the framework leads to unbiased Elo scores for pairwise preference
prediction.
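
The debiasing idea in the abstract can be illustrated with a small Python
sketch: a scalar offset is added to black-box predictions and updated by online
gradient descent on the squared loss with a constant step size. The data
generator, the step size `eta`, and the function name `debias_online` are
assumptions made for this illustration; the authors' own code is linked in the
comments field of this entry.

```python
import random


def debias_online(preds, targets, eta=0.1):
    """Post hoc online gradient descent on a scalar offset theta.

    Loss at step t: 0.5 * (y_t - (f_t + theta))**2, so the gradient
    with respect to theta is (f_t + theta) - y_t.
    """
    theta = 0.0
    corrected, grads = [], []
    for f, y in zip(preds, targets):
        y_hat = f + theta          # corrected prediction issued at step t
        corrected.append(y_hat)
        g = y_hat - y              # gradient of the squared loss at theta
        grads.append(g)
        theta -= eta * g           # constant step size, no decay
    return corrected, grads


rng = random.Random(0)
T = 5000
targets = [rng.uniform(-1.0, 1.0) for _ in range(T)]
# Hypothetical black box that under-predicts by 3 on average, plus bounded noise.
preds = [y - 3.0 + rng.uniform(-1.0, 1.0) for y in targets]

corrected, grads = debias_online(preds, targets)
raw_bias = sum(f - y for f, y in zip(preds, targets)) / T  # bias before correction
avg_grad = sum(grads) / T                                  # average corrected error
```

Because the updates telescope, the average gradient equals
(theta_1 - theta_{T+1}) / (eta * T); the offset stays bounded here, so the
average corrected error shrinks at rate 1/T even though the step size is fixed.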
[COMMENTS]
Code available at
https://github.com/aangelopoulos/gradient-equilibrium/
[LINK]
http://arxiv.org/abs/2501.08330v1
[DATE]
2025-01-15 02:59:09+08:00
[CATEGORIES]
cs.LG
Diffusion Adversarial Post-Training for One-Step Video Generation
[AUTHORS]
Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
[ABSTRACT]
Diffusion models are widely used for image and video generation, but
their iterative generation process is slow and expensive. While existing
distillation approaches have demonstrated the potential for one-step generation
in the image domain, they still suffer from significant quality degradation. In
this work, we propose Adversarial Post-Training (APT) against real data
following diffusion pre-training for one-step video generation. To improve the
training stability and quality, we introduce several improvements to the model
architecture and training procedures, along with an approximated R1
regularization objective. Empirically, our experiments show that our
adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720,
24fps videos in real time using a single forward evaluation step. Additionally,
our model is capable of generating 1024px images in a single step, achieving
quality comparable to state-of-the-art methods.
[LINK]
http://arxiv.org/abs/2501.08316v1
[DATE]
2025-01-15 02:51:48+08:00
[CATEGORIES]
cs.LG
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
[AUTHORS]
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun, Lawrence H. Staib, John A. Onofrey
[ABSTRACT]
Accurate uncertainty estimation is crucial for deploying neural networks in
risk-sensitive applications such as medical diagnosis. Monte Carlo Dropout is a
widely used technique for approximating predictive uncertainty by performing
stochastic forward passes with dropout during inference. However, using static
dropout rates across all layers and inputs can lead to suboptimal uncertainty
estimates, as it fails to adapt to the varying characteristics of individual
inputs and network layers. Existing approaches optimize dropout rates during
training using labeled data, resulting in fixed inference-time parameters that
cannot adjust to new data distributions, compromising uncertainty estimates in
Monte Carlo simulations.
In this paper, we propose Rate-In, an algorithm that dynamically adjusts
dropout rates during inference by quantifying the information loss induced by
dropout in each layer’s feature maps. By treating dropout as controlled noise
injection and leveraging information-theoretic principles, Rate-In adapts
dropout rates per layer and per input instance without requiring ground truth
labels. By quantifying the functional information loss in feature maps, we
adaptively tune dropout rates to maintain perceptual quality across diverse
medical imaging tasks and architectural configurations. Our extensive empirical
study on synthetic data and real-world medical imaging tasks demonstrates that
Rate-In improves calibration and sharpens uncertainty estimates compared to
fixed or heuristic dropout rates without compromising predictive performance.
Rate-In offers a practical, unsupervised, inference-time approach to optimizing
dropout for more reliable predictive uncertainty estimation in critical
applications.
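For context, the static-rate baseline that the abstract contrasts with can be sketched in a few lines. This is a minimal illustration of Monte Carlo Dropout on a toy linear model, not an implementation of Rate-In; the model, the function names, and the choice to apply dropout to input features are assumptions made for the example.

```python
import random
import statistics


def forward(x, weights, drop_rate, rng):
    """One stochastic forward pass of a toy linear model with inverted dropout."""
    keep = 1.0 - drop_rate
    total = 0.0
    for xi, wi in zip(x, weights):
        # Keep each feature with probability `keep` and rescale it,
        # so the expected output matches the dropout-free model.
        if rng.random() < keep:
            total += wi * xi / keep
    return total


def mc_dropout_predict(x, weights, drop_rate, n_passes=200, seed=0):
    """Repeat stochastic passes; the spread of outputs is the uncertainty proxy."""
    rng = random.Random(seed)
    outputs = [forward(x, weights, drop_rate, rng) for _ in range(n_passes)]
    return statistics.mean(outputs), statistics.pstdev(outputs)


if __name__ == "__main__":
    mean, std = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], drop_rate=0.5)
    print(f"predictive mean={mean:.3f}, predictive std={std:.3f}")
```

Here `drop_rate` is a single fixed value for every input, which is exactly the limitation the abstract describes; an adaptive scheme would choose it per layer and per input.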
[COMMENTS]
Updated author affiliation
[LINK]
http://arxiv.org/abs/2412.07169v3
[DATE]
2025-01-15 02:51:43+08:00
[CATEGORIES]
cs.LG
Polynomial Threshold Functions of Bounded Tree-Width: Some Explainability and Complexity Aspects
[AUTHORS]
Karine Chubarian, Johnny Joyce, Gyorgy Turan
[ABSTRACT]
The tree-width of a multivariate polynomial is the tree-width of the
hypergraph with hyperedges corresponding to its terms. Multivariate polynomials
of bounded tree-width have been studied by Makowsky and Meer as a new sparsity
condition that allows for polynomial solvability of problems which are
intractable in general. We consider a variation on this theme for Boolean
variables. A representation of a Boolean function as the sign of a polynomial
is called a polynomial threshold representation. We discuss Boolean functions
representable as polynomial threshold functions of bounded tree-width and
present two applications to Bayesian network classifiers, a probabilistic
graphical model. Both applications are in Explainable Artificial Intelligence
(XAI), the research area dealing with the black-box nature of many recent
machine learning models. We also give a separation result between the
representational power of positive and general polynomial threshold functions.
[COMMENTS]
22 pages, 3 figures. To be published in Festschrift in honor of
Johann A. Makowsky
[LINK]
http://arxiv.org/abs/2501.08297v1
[DATE]
2025-01-15 02:28:08+08:00
[CATEGORIES]
cs.LG
Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve
[AUTHORS]
Pedro Pessoa, Max Schweiger, Lance W. Q. Xu, Tristan Manha, Ayush Saurabh, Julian Antolin Camarena, Steve Pressé
[ABSTRACT]
Across the scientific realm, we find ourselves subtracting or dividing
stochastic signals. For instance, consider a stochastic realization, $x$,
generated from the addition or multiplication of two stochastic signals $a$ and
$b$, namely $x=a+b$ or $x = ab$. For the $x=a+b$ example, $a$ can be
fluorescence background and $b$ the signal of interest whose statistics are to
be learned from the measured $x$. Similarly, when writing $x=ab$, $a$ can be
thought of as the illumination intensity and $b$ the density of fluorescent
molecules of interest. Yet dividing or subtracting stochastic signals amplifies
noise, and we ask instead whether, using the statistics of $a$ and the
measurement of $x$ as input, we can recover the statistics of $b$. Here, we
show how normalizing flows can generate an approximation of the probability
distribution over $b$, thereby avoiding subtraction or division altogether.
This method is implemented in our software package, NFdeconvolve, available on
GitHub with a tutorial linked in the main text.
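The noise amplification mentioned above can be checked with a small simulation. This sketch does not use normalizing flows or the NFdeconvolve package; it only illustrates the problem for the additive case $x=a+b$, assuming Gaussian signals with made-up parameters. Subtracting an independent background measurement from $x$ yields a quantity with variance Var(b) + 2 Var(a), which can be far larger than Var(b).

```python
import random
import statistics


def simulate_subtraction(n=20000, seed=0):
    """Compare the variance of b with that of the naive estimate x - a'."""
    rng = random.Random(seed)
    # Hypothetical signal of interest b and background a (parameters are assumptions).
    b = [rng.gauss(2.0, 0.5) for _ in range(n)]
    x = [bi + rng.gauss(5.0, 1.0) for bi in b]          # measured x = a + b
    background = [rng.gauss(5.0, 1.0) for _ in range(n)]  # separate draws of a
    naive_b = [xi - ai for xi, ai in zip(x, background)]
    return statistics.pvariance(b), statistics.pvariance(naive_b)


if __name__ == "__main__":
    var_b, var_naive = simulate_subtraction()
    print(f"Var(b)={var_b:.3f}, Var(x - a')={var_naive:.3f}")
```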
[LINK]
http://arxiv.org/abs/2501.08288v1
[DATE]
2025-01-15 02:08:52+08:00
[CATEGORIES]
cs.LG
Can Bayesian Neural Networks Explicitly Model Input Uncertainty?
[AUTHORS]
Matias Valdenegro-Toro, Marco Zullich
[ABSTRACT]
Inputs to machine learning models can have associated noise or uncertainties,
but they are often ignored and not modelled. It is unknown if Bayesian Neural
Networks and their approximations are able to consider uncertainty in their
inputs. In this paper, we build a two-input Bayesian Neural Network (mean and
standard deviation) and evaluate its capabilities for input uncertainty
estimation across different methods like Ensembles, MC-Dropout, and Flipout.
Our results indicate that only some uncertainty estimation methods for
approximate Bayesian NNs can model input uncertainty, in particular Ensembles
and Flipout.
[COMMENTS]
12 pages, 11 figures, VISAPP 2025 camera ready
[LINK]
http://arxiv.org/abs/2501.08285v1
[DATE]
2025-01-15 02:00:41+08:00
[CATEGORIES]
cs.LG
Decoding Interpretable Logic Rules from Neural Networks
[AUTHORS]
Chuqin Geng, Xiaojie Xu, Zhaoyue Wang, Ziyu Zhao, Xujie Si
[ABSTRACT]
As deep neural networks continue to excel across various domains, their
black-box nature has raised concerns about transparency and trust. In
particular, interpretability has become increasingly essential for applications
that demand high safety and knowledge rigor, such as drug discovery, autonomous
driving, and genomics. However, progress in understanding even the simplest
deep neural networks - such as fully connected networks - has been limited,
despite their role as foundational elements in state-of-the-art models like
ResNet and Transformer. In this paper, we address this challenge by introducing
NeuroLogic, a novel approach for decoding interpretable logic rules from neural
networks. NeuroLogic leverages neural activation patterns to capture the
model’s critical decision-making processes, translating them into logical rules
represented by hidden predicates. Thanks to its flexible design in the
grounding phase, NeuroLogic can be adapted to a wide range of neural networks.
For simple fully connected neural networks, hidden predicates can be grounded
in certain split patterns of original input features to derive
decision-tree-like rules. For large, complex vision neural networks, NeuroLogic
grounds hidden predicates into high-level visual concepts that are
understandable to humans. Our empirical study demonstrates that NeuroLogic can
extract global and interpretable rules from state-of-the-art models such as
ResNet, a task at which existing work struggles. We believe NeuroLogic can help
pave the way for understanding the black-box nature of neural networks.
[COMMENTS]
23 pages, 7 figures
[LINK]
http://arxiv.org/abs/2501.08281v1
[DATE]
2025-01-15 01:57:26+08:00
[CATEGORIES]
cs.LG
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
[AUTHORS]
Ivan Butakov, Alexander Semenenko, Alexander Tolmachev, Andrey Gladkov, Marina Munkhoeva, Alexey Frolov
[ABSTRACT]
Deep InfoMax (DIM) is a well-established method for self-supervised
representation learning (SSRL) based on maximization of the mutual information
between the input and the output of a deep neural network encoder. Although
DIM and contrastive SSRL in general are well explored, the task of learning
representations conforming to a specific distribution (i.e., distribution
matching, DM) is still under-addressed. Motivated by the importance of DM to
several downstream tasks (including generative modeling, disentanglement,
outlier detection, and others), we enhance DIM to enable automatic matching of
learned representations to a selected prior distribution. To achieve this, we
propose injecting independent noise into the normalized outputs of the
encoder, while keeping the same InfoMax training objective. We show that such
a modification allows for learning uniformly and normally distributed
representations, as well as representations of other absolutely continuous
distributions. Our approach is tested on various downstream tasks. The results
indicate a moderate trade-off between the performance on the downstream tasks
and quality of DM.
[COMMENTS]
25 pages, 7 figures
[LINK]
http://arxiv.org/abs/2410.06993v2
[DATE]
2025-01-15 01:52:40+08:00
[CATEGORIES]
cs.LG
Multiplayer Federated Learning: Reaching Equilibrium with Less Communication
[AUTHORS]
TaeHo Yoon, Sayantan Choudhury, Nicolas Loizou
[ABSTRACT]
Traditional Federated Learning (FL) approaches assume collaborative clients
with aligned objectives working towards a shared global model. However, in many
real-world scenarios, clients act as rational players with individual
objectives and strategic behaviors, a concept that existing FL frameworks are
not equipped to adequately address. To bridge this gap, we introduce
Multiplayer Federated Learning (MpFL), a novel framework that models the
clients in the FL environment as players in a game-theoretic context, aiming to
reach an equilibrium. In this scenario, each player tries to optimize their own
utility function, which may not align with the collective goal. Within MpFL, we
propose Per-Player Local Stochastic Gradient Descent (PEARL-SGD), an algorithm
in which each player/client performs local updates independently and
periodically communicates with other players. We theoretically analyze
PEARL-SGD and prove that it reaches a neighborhood of equilibrium with less
communication in the stochastic setup compared to its non-local counterpart.
Finally, we verify our theoretical findings through numerical experiments.
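The local-update-then-communicate pattern described above can be sketched on a toy two-player game. This is an illustrative simplification, not the authors' PEARL-SGD: it uses exact gradients instead of stochastic ones, and the quadratic utilities, step size, and function names are assumptions chosen so that the equilibrium (x, y) = (0.4, 1.2) can be verified by hand.

```python
def grad_player1(x, y):
    # Gradient in x of the hypothetical cost f1(x, y) = 0.5*x**2 + 0.5*x*y - x.
    return x + 0.5 * y - 1.0


def grad_player2(x, y):
    # Gradient in y of the hypothetical cost f2(x, y) = 0.5*y**2 - 0.5*x*y - y.
    return y - 0.5 * x - 1.0


def local_gd_game(rounds=50, local_steps=5, lr=0.1):
    """Each player runs several local steps against a stale copy of the
    opponent's strategy, then the players exchange strategies."""
    x, y = 0.0, 0.0
    for _ in range(rounds):
        x_seen, y_seen = x, y  # communication: snapshot of current strategies
        for _ in range(local_steps):
            x -= lr * grad_player1(x, y_seen)  # player 1 only sees y_seen
        for _ in range(local_steps):
            y -= lr * grad_player2(x_seen, y)  # player 2 only sees x_seen
    return x, y


if __name__ == "__main__":
    print(local_gd_game())
```

With `local_steps=5`, the players communicate once per five gradient steps rather than after every step, which is the communication saving the abstract refers to.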
[COMMENTS]
43 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.08263v1
[DATE]
2025-01-15 01:23:14+08:00
[CATEGORIES]
cs.LG
FDPP: Fine-tune Diffusion Policy with Human Preference
[AUTHORS]
Yuxin Chen, Devesh K. Jha, Masayoshi Tomizuka, Diego Romeres
[ABSTRACT]
Imitation learning from human demonstrations enables robots to perform
complex manipulation tasks and has recently witnessed huge success. However,
these techniques often struggle to adapt behavior to new preferences or changes
in the environment. To address these limitations, we propose Fine-tuning
Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function
through preference-based learning. This reward is then used to fine-tune the
pre-trained policy with reinforcement learning (RL), resulting in alignment of
pre-trained policy with new human preferences while still solving the original
task. Our experiments across various robotic tasks and preferences demonstrate
that FDPP effectively customizes policy behavior without compromising
performance. Additionally, we show that incorporating Kullback-Leibler (KL)
regularization during fine-tuning prevents over-fitting and helps maintain the
competencies of the initial policy.
[LINK]
http://arxiv.org/abs/2501.08259v1
[DATE]
2025-01-15 01:15:27+08:00
[CATEGORIES]
cs.LG
Particle Semi-Implicit Variational Inference
[AUTHORS]
Jen Ning Lim, Adam M. Johansen
[ABSTRACT]
Semi-implicit variational inference (SIVI) enriches the expressiveness of
variational families by utilizing a kernel and a mixing distribution to
hierarchically define the variational distribution. Existing SIVI methods
parameterize the mixing distribution using implicit distributions, leading to
intractable variational densities. As a result, directly maximizing the
evidence lower bound (ELBO) is not possible, so they resort to one of the
following: optimizing bounds on the ELBO, employing costly inner-loop Markov
chain Monte Carlo runs, or solving minimax objectives. In this paper, we
propose a novel method for SIVI called Particle Variational Inference (PVI)
which employs empirical measures to approximate the optimal mixing
distributions characterized as the minimizer of a free energy functional. PVI
arises naturally as a particle approximation of a Euclidean–Wasserstein
gradient flow and, unlike prior works, it directly optimizes the ELBO whilst
making no parametric assumption about the mixing distribution. Our empirical
results demonstrate that PVI performs favourably compared to other SIVI methods
across various tasks. Moreover, we provide a theoretical analysis of the
behaviour of the gradient flow of a related free energy functional:
establishing the existence and uniqueness of solutions as well as propagation
of chaos results.
[COMMENTS]
NeurIPS 2024 Camera ready
[LINK]
http://arxiv.org/abs/2407.00649v3
[DATE]
2025-01-15 01:08:47+08:00
[CATEGORIES]
cs.LG
Automated Detection and Analysis of Minor Deformations in Flat Walls Due to Railway Vibrations Using LiDAR and Machine Learning
[AUTHORS]
Surjo Dey, Ankit Sharma, Hritu Raj, Susham Biswas
[ABSTRACT]
This study introduces an advanced methodology for automatically identifying
minor deformations in flat walls caused by vibrations from nearby railway
tracks. It leverages high-density Terrestrial Laser Scanner (TLS) LiDAR surveys
and AI/ML techniques to collect and analyze data. The scan data is processed
into a detailed point cloud, which is segmented to distinguish ground points,
trees, buildings, and other objects. The analysis focuses on identifying
sections along flat walls and estimating their deformations relative to the
ground orientation.
Findings from the study, conducted at the RGIPT campus, reveal significant
deformations in walls close to the railway corridor, with the highest
deformations ranging from 7 to 8 cm and an average of 3 to 4 cm. In contrast,
walls further from the corridor show negligible deformations. The developed
automated process for feature extraction and deformation monitoring
demonstrates potential for structural health monitoring. By integrating LiDAR
data with machine learning, the methodology provides an efficient system for
identifying and analyzing structural deformations, highlighting the importance
of continuous monitoring for ensuring structural integrity and public safety in
urban infrastructure. This approach represents a substantial advancement in
automated feature extraction and deformation analysis, contributing to more
effective management of urban infrastructure.
[COMMENTS]
I am requesting the withdrawal of my paper due to the need for
significant revisions to ensure the accuracy and integrity of the presented
findings
[LINK]
http://arxiv.org/abs/2501.06457v2
[DATE]
2025-01-15 00:58:26+08:00
[CATEGORIES]
cs.LG
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
[AUTHORS]
Jonathan Nöther, Adish Singla, Goran Radanović
[ABSTRACT]
Recent work has proposed automated red-teaming methods for testing the
vulnerabilities of a given target large language model (LLM). These methods use
red-teaming LLMs to uncover inputs that induce harmful behavior in a target
LLM. In this paper, we study red-teaming strategies that enable a targeted
security assessment. We propose an optimization framework for red-teaming with
proximity constraints, where the discovered prompts must be similar to
reference prompts from a given dataset. This dataset serves as a template for
the discovered prompts, anchoring the search for test-cases to specific topics,
writing styles, or types of harmful behavior. We show that established
auto-regressive model architectures do not perform well in this setting. We
therefore introduce a black-box red-teaming method inspired by text-diffusion
models: Diffusion for Auditing and Red-Teaming (DART). DART modifies the
reference prompt by perturbing it in the embedding space, directly controlling
the amount of change introduced. We systematically evaluate our method by
comparing its effectiveness with established methods based on model fine-tuning
and zero- and few-shot prompting. Our results show that DART is significantly
more effective at discovering harmful inputs in close proximity to the
reference prompt.
[COMMENTS]
This is an extended version of a paper published at AAAI 25
[LINK]
http://arxiv.org/abs/2501.08246v1
[DATE]
2025-01-15 00:32:01+08:00
[CATEGORIES]
cs.LG
Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps
[AUTHORS]
Kannan Parthasarathy, Karthik Vaidhyanathan, Rudra Dhar, Venkat Krishnamachari, Basil Muhammed, Adyansh Kakran, Sreemaee Akshathala, Shrikara Arun, Sumant Dubey, Mohan Veerubhotla, Amey Karan
[ABSTRACT]
Cloud Operations (CloudOps) is a rapidly growing field focused on the
automated management and optimization of cloud infrastructure, which is
essential for organizations navigating increasingly complex cloud environments.
MontyCloud Inc. is one of the major companies in the CloudOps domain that
leverages autonomous bots to manage cloud compliance, security, and continuous
operations. To make the platform more accessible and effective for
customers, we leveraged GenAI.
Developing a GenAI-based solution for autonomous CloudOps for the existing
MontyCloud system presented us with various challenges such as i) diverse data
sources; ii) orchestration of multiple processes; and iii) handling complex
workflows to automate routine tasks. To this end, we developed MOYA, a
multi-agent framework that leverages GenAI and balances autonomy with the
necessary human control. This framework integrates various internal and
external systems and is optimized for factors like task orchestration,
security, and error mitigation while producing accurate, reliable, and relevant
insights by utilizing Retrieval Augmented Generation (RAG). Evaluations of our
multi-agent system with the help of practitioners as well as using automated
checks demonstrate enhanced accuracy, responsiveness, and effectiveness over
non-agentic approaches across complex workflows.
[COMMENTS]
The paper has been accepted as full paper to CAIN 2025
(https://conf.researchr.org/home/cain-2025), co-located with ICSE 2025
(https://conf.researchr.org/home/icse-2025). The paper was submitted to CAIN
for review on 9 November 2024
[LINK]
http://arxiv.org/abs/2501.08243v1
[DATE]
2025-01-15 00:30:10+08:00
[CATEGORIES]
cs.LG
A Feature-Level Ensemble Model for COVID-19 Identification in CXR Images using Choquet Integral and Differential Evolution Optimization
[AUTHORS]
Amir Reza Takhsha, Maryam Rastgarpour, Mozhgan Naderi
[ABSTRACT]
The COVID-19 pandemic has profoundly impacted billions globally. It
challenges public health and healthcare systems due to its rapid spread and
severe respiratory effects. An effective strategy to mitigate the COVID-19
pandemic involves integrating testing to identify infected individuals. While
RT-PCR is considered the gold standard for diagnosing COVID-19, it has some
limitations such as the risk of false negatives. To address this problem, this
paper introduces a novel Deep Learning Diagnosis System that integrates
pre-trained Deep Convolutional Neural Networks (DCNNs) within an ensemble
learning framework to achieve precise identification of COVID-19 cases from
Chest X-ray (CXR) images. We combine feature vectors from the final hidden
layers of pre-trained DCNNs using the Choquet integral to capture interactions
between different DCNNs that a linear approach cannot. We employed
Sugeno-$\lambda$ measure theory to derive fuzzy measures for subsets of
networks to enable aggregation. We utilized Differential Evolution to estimate
fuzzy densities. We developed a TensorFlow-based layer for Choquet operation to
facilitate efficient aggregation, due to the intricacies involved in
aggregating feature vectors. Experimental results on the COVIDx dataset show
that our ensemble model achieved 98% accuracy in three-class classification
and 99.50% in binary classification, outperforming its components, DenseNet-201
(97% for three-class, 98.75% for binary), Inception-v3 (96.25% for
three-class, 98.50% for binary), and Xception (94.50% for three-class, 98%
for binary), and surpassing many previous methods.
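The aggregation step can be illustrated on scalar scores. The following is a minimal sketch, not the authors' TensorFlow layer: it solves for the Sugeno $\lambda$ from given fuzzy densities by bisection and evaluates the discrete Choquet integral of non-negative scores. The function names, the bisection solver, and the example densities are assumptions for illustration; in the paper the densities are estimated with Differential Evolution and the inputs are feature vectors.

```python
def sugeno_lambda(densities):
    """Solve prod(1 + lam*g_i) = 1 + lam for lam > -1 (lam = 0 if densities sum to 1)."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total < 1.0:
        # The non-trivial root is positive; grow the bracket until f changes sign.
        lo, hi = 1e-9, 1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:
        # The non-trivial root lies in (-1, 0).
        lo, hi = -1.0 + 1e-12, -1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def choquet(values, densities):
    """Discrete Choquet integral of non-negative values w.r.t. a Sugeno lambda-measure."""
    lam = sugeno_lambda(densities)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    measure = 0.0  # measure of the set of sources visited so far
    result = 0.0
    for rank, i in enumerate(order):
        g = densities[i]
        measure = g + measure + lam * g * measure
        nxt = values[order[rank + 1]] if rank + 1 < len(order) else 0.0
        result += (values[i] - nxt) * measure
    return result


if __name__ == "__main__":
    # Three hypothetical model scores and fuzzy densities.
    print(choquet([0.9, 0.1, 0.5], [0.2, 0.3, 0.4]))
```

When the densities sum to 1, $\lambda = 0$ and the integral reduces to an ordinary weighted average; other values of $\lambda$ let the measure of a group of networks differ from the sum of its members' densities, which is the interaction a linear combination cannot express.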
[LINK]
http://arxiv.org/abs/2501.08241v1
[DATE]
2025-01-15 00:28:02+08:00
[CATEGORIES]
cs.LG
Privacy-Preserving Model and Preprocessing Verification for Machine Learning
[AUTHORS]
Wenbiao Li, Anisa Halimi, Xiaoqian Jiang, Jaideep Vaidya, Erman Ayday
[ABSTRACT]
This paper presents a framework for privacy-preserving verification of
machine learning models, focusing on models trained on sensitive data.
Integrating Local Differential Privacy (LDP) with model explanations from LIME
and SHAP, our framework enables robust verification without compromising
individual privacy. It addresses two key tasks: binary classification, to
verify if a target model was trained correctly by applying the appropriate
preprocessing steps, and multi-class classification, to identify specific
preprocessing errors. Evaluations on three real-world datasets (Diabetes, Adult,
and Student Record) demonstrate that while the ML-based approach is particularly
effective in binary tasks, the threshold-based method performs comparably in
multi-class tasks. Results indicate that although verification accuracy varies
across datasets and noise levels, the framework provides effective detection of
preprocessing errors, strong privacy guarantees, and practical applicability
for safeguarding sensitive data.
[LINK]
http://arxiv.org/abs/2501.08236v1
[DATE]
2025-01-15 00:21:54+08:00
[CATEGORIES]
cs.LG
Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning
[AUTHORS]
Enrique Adrian Villarrubia-Martin, Luis Rodriguez-Benitez, David Muñoz-Valero, Giovanni Montana, Luis Jimenez-Linares
[ABSTRACT]
This paper addresses a critical challenge in the high-speed passenger railway
industry: designing effective dynamic pricing strategies in the context of
competing and cooperating operators. To address this, a multi-agent
reinforcement learning (MARL) framework based on a non-zero-sum Markov game is
proposed, incorporating random utility models to capture passenger decision
making. Unlike prior studies in areas such as energy, airlines, and mobile
networks, dynamic pricing for railway systems using deep reinforcement learning
has received limited attention. A key contribution of this paper is a
parametrisable and versatile reinforcement learning simulator designed to model
a variety of railway network configurations and demand patterns while enabling
realistic, microscopic modelling of user behaviour, called RailPricing-RL. This
environment supports the proposed MARL framework, which models heterogeneous
agents competing to maximise individual profits while fostering cooperative
behaviour to synchronise connecting services. Experimental results validate the
framework, demonstrating how user preferences affect MARL performance and how
pricing policies influence passenger choices, utility, and overall system
dynamics. This study provides a foundation for advancing dynamic pricing
strategies in railway systems, aligning profitability with system-wide
efficiency, and supporting future research on optimising pricing policies.
[COMMENTS]
37 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.08234v1
[DATE]
2025-01-15 00:19:25+08:00
[CATEGORIES]
cs.LG
Efficient Deep Learning-based Forward Solvers for Brain Tumor Growth Models
[AUTHORS]
Zeineb Haouari, Jonas Weidner, Ivan Ezhov, Aswathi Varma, Daniel Rueckert, Bjoern Menze, Benedikt Wiestler
[ABSTRACT]
Glioblastoma, a highly aggressive brain tumor, poses major challenges due to
its poor prognosis and high morbidity rates. Partial differential
equation-based models offer promising potential to enhance therapeutic outcomes
by simulating patient-specific tumor behavior for improved radiotherapy
planning. However, model calibration remains a bottleneck due to the high
computational demands of optimization methods like Monte Carlo sampling and
evolutionary algorithms. To address this, we recently introduced an approach
leveraging a neural forward solver with gradient-based optimization to
significantly reduce calibration time. This approach requires a highly accurate
and fully differentiable forward model. We investigate multiple architectures,
including (i) an enhanced TumorSurrogate, (ii) a modified nnU-Net, and (iii) a
3D Vision Transformer (ViT). The optimized TumorSurrogate achieved the best
overall results, excelling in both tumor outline matching and voxel-level
prediction of tumor cell concentration. It halved the MSE relative to the
baseline model and achieved the highest Dice score across all tumor cell
concentration thresholds. Our study demonstrates significant enhancement in
forward solver performance and outlines important future research directions.
[LINK]
http://arxiv.org/abs/2501.08226v1
[DATE]
2025-01-15 00:10:25+08:00
[CATEGORIES]
cs.LG
Pareto Set Learning for Multi-Objective Reinforcement Learning
[AUTHORS]
Erlong Liu, Yu-Chang Wu, Xiaobin Huang, Chengrui Gao, Ren-Jian Wang, Ke Xue, Chao Qian
[ABSTRACT]
Multi-objective decision-making problems have emerged in numerous real-world
scenarios, such as video games, navigation and robotics. Considering the clear
advantages of Reinforcement Learning (RL) in optimizing decision-making
processes, researchers have delved into the development of Multi-Objective RL
(MORL) methods for solving multi-objective decision problems. However, previous
methods either cannot obtain the entire Pareto front, or employ only a single
policy network for all the preferences over multiple objectives, which may not
produce personalized solutions for each preference. To address these
limitations, we propose a novel decomposition-based framework for MORL, Pareto
Set Learning for MORL (PSL-MORL), which harnesses the generation capability of a
hypernetwork to produce the parameters of the policy network for each
decomposition weight, generating relatively distinct policies for various
scalarized subproblems with high efficiency. PSL-MORL is a general framework
that is compatible with any RL algorithm. The theoretical result guarantees the
superiority of the model capacity of PSL-MORL and the optimality of the
obtained policy network. Through extensive experiments on diverse benchmarks,
we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the
Pareto front, significantly outperforming state-of-the-art MORL methods in the
hypervolume and sparsity indicators.
[COMMENTS]
AAAI 2025 Accept
[LINK]
http://arxiv.org/abs/2501.06773v2
[DATE]
2025-01-15 00:08:28+08:00
[CATEGORIES]
cs.LG
Big Batch Bayesian Active Learning by Considering Predictive Probabilities
[AUTHORS]
Sebastian W. Ober, Samuel Power, Tom Diethe, Henry B. Moss
[ABSTRACT]
We observe that BatchBALD, a popular acquisition function for batch Bayesian
active learning for classification, can conflate epistemic and aleatoric
uncertainty, leading to suboptimal performance. Motivated by this observation,
we propose to focus on the predictive probabilities, which only exhibit
epistemic uncertainty. The result is an acquisition function that not only
performs better, but is also faster to evaluate, allowing for larger batches
than before.
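The epistemic/aleatoric distinction can be made concrete with the standard single-point BALD score, which BatchBALD extends to batches. The sketch below is not the acquisition function proposed in the paper; it only shows the usual decomposition of predictive entropy into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term. The function names and the example inputs are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(sampled_probs):
    """Mutual information between the label and the model parameters.

    `sampled_probs` holds one class-probability vector per posterior sample.
    """
    n = len(sampled_probs)
    k = len(sampled_probs[0])
    mean_probs = [sum(s[c] for s in sampled_probs) / n for c in range(k)]
    total = entropy(mean_probs)                              # total uncertainty
    aleatoric = sum(entropy(s) for s in sampled_probs) / n   # expected entropy
    return total - aleatoric                                 # epistemic part


if __name__ == "__main__":
    # Posterior samples agree on a noisy label: no epistemic uncertainty.
    print(bald_score([[0.5, 0.5], [0.5, 0.5]]))
    # Posterior samples disagree confidently: high epistemic uncertainty.
    print(bald_score([[1.0, 0.0], [0.0, 1.0]]))
```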
[COMMENTS]
7 pages, 2 figures; presented as a lightning talk at the NeurIPS
Workshop on Bayesian Decision-making and Uncertainty (BDU; 2024)
[LINK]
http://arxiv.org/abs/2501.08223v1
[DATE]
2025-01-15 00:06:54+08:00
[CATEGORIES]
cs.LG
Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings
[AUTHORS]
Paul Joe Maliakel, Shashikant Ilager, Ivona Brandic
[ABSTRACT]
Large language models (LLMs) have shown significant improvements in many
natural language processing (NLP) tasks, accelerating their rapid adoption
across many industries. These models are resource-intensive, requiring
extensive computational resources both during training and inference, leading
to increased energy consumption and negative environmental impact. As their
adoption accelerates, the sustainability of LLMs has become a critical issue,
necessitating strategies to optimize their runtime efficiency without
compromising performance. Hence, it is imperative to identify the parameters
that significantly influence the performance and energy efficiency of LLMs. To
that end, in this work, we investigate the effect of important parameters on
the performance and energy efficiency of LLMs during inference and examine
their trade-offs.
First, we analyze how different types of models with varying numbers of
parameters and architectures perform on tasks like text generation, question
answering, and summarization by benchmarking LLMs such as Falcon-7B,
Mistral-7B-v0.1, T5-3B, GPT-2, GPT-J-6B, and GPT-Neo-2.7B. Second, we study
input and output sequence characteristics such as sequence length with respect to
energy consumption, performance, and throughput. Finally, we explore the impact
of hardware-based power-saving techniques, i.e., Dynamic Voltage Frequency
Scaling (DVFS), on the models’ latency and energy efficiency. Our extensive
benchmarking and statistical analysis reveal many interesting findings,
uncovering how specific optimizations can reduce energy consumption while
maintaining throughput and accuracy. This study provides actionable insights
for researchers and practitioners to design energy-efficient LLM inference
systems.
[LINK]
http://arxiv.org/abs/2501.08219v1
[DATE]
2025-01-15 00:02:33+08:00
[CATEGORIES]
cs.LG
Logic Augmented Generation
[AUTHORS]
Aldo Gangemi, Andrea Giovanni Nuzzolese
[ABSTRACT]
Semantic Knowledge Graphs (SKG) face challenges with scalability,
flexibility, contextual understanding, and handling unstructured or ambiguous
information. However, they offer formal and structured knowledge enabling
highly interpretable and reliable results by means of reasoning and querying.
Large Language Models (LLMs) overcome those limitations, making them suitable for
open-ended tasks and unstructured environments. Nevertheless, LLMs are neither
interpretable nor reliable. To solve the dichotomy between LLMs and SKGs, we
envision Logic Augmented Generation (LAG) that combines the benefits of the two
worlds. LAG uses LLMs as Reactive Continuous Knowledge Graphs that can generate
potentially infinite relations and tacit knowledge on-demand. SKGs are key for
injecting a discrete heuristic dimension with clear logical and factual
boundaries. We exemplify LAG in two tasks of collective intelligence, i.e.,
medical diagnostics and climate projections. Understanding the properties and
limitations of LAG, which are still mostly unknown, is of utmost importance for
enabling a variety of tasks involving tacit knowledge in order to provide
interpretable and effective results.
[COMMENTS]
10 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.14012v2
[DATE]
2025-01-14 23:58:02+08:00
[CATEGORIES]
cs.CL
ASTRID – An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
[AUTHORS]
Mohita Chowdhury, Yajie Vera He, Aisling Higham, Ernest Lim
[ABSTRACT]
Large Language Models (LLMs) have shown impressive potential in clinical
question answering (QA), with Retrieval Augmented Generation (RAG) emerging as
a leading approach for ensuring the factual accuracy of model responses.
However, current automated RAG metrics perform poorly in clinical and
conversational use cases. Using clinical human evaluations of responses is
expensive, unscalable, and not conducive to the continuous iterative
development of RAG systems. To address these challenges, we introduce ASTRID -
an Automated and Scalable TRIaD for evaluating clinical QA systems leveraging
RAG - consisting of three metrics: Context Relevance (CR), Refusal Accuracy
(RA), and Conversational Faithfulness (CF). Our novel evaluation metric, CF, is
designed to better capture the faithfulness of a model’s response to the
knowledge base without penalising conversational elements. To validate our
triad, we curate a dataset of over 200 real-world patient questions posed to an
LLM-based QA agent during surgical follow-up for cataract surgery - the highest
volume operation in the world - augmented with clinician-selected questions for
emergency, clinical, and non-clinical out-of-domain scenarios. We demonstrate
that CF can predict human ratings of faithfulness better than existing
definitions for conversational use cases. Furthermore, we show that evaluation
using our triad consisting of CF, RA, and CR exhibits alignment with clinician
assessment for inappropriate, harmful, or unhelpful responses. Finally, using
nine different LLMs, we demonstrate that the three metrics can closely agree
with human evaluations, highlighting the potential of these metrics for use in
LLM-driven automated evaluation pipelines. We also publish the prompts and
datasets for these experiments, providing valuable resources for further
research and development.
[COMMENTS]
29 pages
[LINK]
http://arxiv.org/abs/2501.08208v1
[DATE]
2025-01-14 23:46:39+08:00
[CATEGORIES]
cs.CL
Personalized LLM Response Generation with Parameterized Memory Injection
[AUTHORS]
Kai Zhang, Yejin Kim, Xiaozhong Liu
[ABSTRACT]
Large Language Models (LLMs) have exhibited remarkable proficiency in
comprehending and generating natural language. On the other hand, personalized
LLM response generation holds the potential to offer substantial benefits for
individuals in critical areas such as medicine. Existing research has explored
memory-augmented methods to prompt the LLM with pre-stored user-specific
knowledge for personalized response generation in terms of new queries. We
contend that such a paradigm is unable to perceive fine-grained information.
In this study, we propose a novel Memory-injected approach using
parameter-efficient fine-tuning (PEFT), along with a Bayesian Optimisation
search strategy, to achieve LLM Personalization (MiLP).
[LINK]
http://arxiv.org/abs/2404.03565v3
[DATE]
2025-01-14 23:30:50+08:00
[CATEGORIES]
cs.CL
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation
[AUTHORS]
Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray
[ABSTRACT]
Large Language Models (LLMs) have significantly aided developers by
generating or assisting in code writing, enhancing productivity across various
tasks. While identifying incorrect code is often straightforward, detecting
vulnerabilities in functionally correct code is more challenging, especially
for developers with limited security knowledge, which poses considerable
security risks of using LLM-generated code and underscores the need for robust
evaluation benchmarks that assess both functional correctness and security.
Current benchmarks like CyberSecEval and SecurityEval attempt to address this need but
are hindered by unclear and impractical specifications, failing to assess both
functionality and security accurately. To tackle these deficiencies, we
introduce CWEval, a novel outcome-driven evaluation framework designed to
enhance the evaluation of secure code generation by LLMs. This framework
assesses code functionality and security simultaneously, using high-quality
task specifications and outcome-driven test oracles that provide high
accuracy. Coupled with CWEval-bench, a multilingual, security-critical
coding benchmark, CWEval provides a rigorous empirical security evaluation on
LLM-generated code, overcoming previous benchmarks’ shortcomings. Through our
evaluations, CWEval reveals a notable portion of functional but insecure code
produced by LLMs, and shows a serious inaccuracy of previous evaluations,
ultimately contributing significantly to the field of secure code generation.
We open-source our artifact at: https://github.com/Co1lin/CWEval .
[COMMENTS]
to be published in LLM4Code 2025
[LINK]
http://arxiv.org/abs/2501.08200v1
[DATE]
2025-01-14 23:27:01+08:00
[CATEGORIES]
cs.CL
cs.LG
WebWalker: Benchmarking LLMs in Web Traversal
[AUTHORS]
Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang, Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang
[ABSTRACT]
Retrieval-augmented generation (RAG) demonstrates remarkable performance
across tasks in open-domain question-answering. However, traditional search
engines may retrieve shallow content, limiting the ability of LLMs to handle
complex, multi-layered information. To address this, we introduce WebWalkerQA, a
benchmark designed to assess the ability of LLMs to perform web traversal. It
evaluates the capacity of LLMs to traverse a website’s subpages to extract
high-quality data systematically. We propose WebWalker, which is a multi-agent
framework that mimics human-like web navigation through an explore-critic
paradigm. Extensive experimental results show that WebWalkerQA is challenging
and demonstrate the effectiveness of RAG combined with WebWalker through
horizontal and vertical integration in real-world scenarios.
[LINK]
http://arxiv.org/abs/2501.07572v2
[DATE]
2025-01-14 23:06:56+08:00
[CATEGORIES]
cs.CL
Inductive Learning of Logical Theories with LLMs: An Expressivity-Graded Analysis
[AUTHORS]
João Pedro Gandarela, Danilo S. Carvalho, André Freitas
[ABSTRACT]
This work presents a novel systematic methodology to analyse the capabilities
and limitations of Large Language Models (LLMs) with feedback from a formal
inference engine, on logic theory induction. The analysis is complexity-graded
w.r.t. rule dependency structure, allowing quantification of specific inference
challenges on LLM performance. Integrating LLMs with formal methods is a
promising frontier in the Natural Language Processing field, as an important
avenue for improving model inference control and explainability. In particular,
inductive learning over complex sets of facts and rules poses unique
challenges for current autoregressive models, as they lack explicit symbolic
grounding. While they can be complemented by formal systems, the properties
delivered by LLMs regarding inductive learning are not well understood or
quantified. Empirical results indicate that the largest LLMs can achieve
competitive results against a SOTA Inductive Logic Programming (ILP) system
baseline, but also that tracking long predicate relationship chains is a more
difficult obstacle than theory complexity for LLMs.
[LINK]
http://arxiv.org/abs/2408.16779v2
[DATE]
2025-01-14 22:26:03+08:00
[CATEGORIES]
cs.CL
In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR
[AUTHORS]
Markus J. Buehler
[ABSTRACT]
The pursuit of automated scientific discovery has fueled progress from
symbolic logic to modern AI, forging new frontiers in reasoning and pattern
recognition. Transformers function as potential systems, where every possible
relationship remains latent potentiality until tasks impose constraints, akin
to measurement. Yet, refining their sampling requires more than probabilistic
selection: solutions must conform to specific structures or rules, ensuring
consistency and the invocation of general principles. We present
Graph-PReFLexOR (Graph-based Preference-based Recursive Language Modeling for
Exploratory Optimization of Reasoning), a framework that combines graph
reasoning with symbolic abstraction to dynamically expand domain knowledge.
Inspired by reinforcement learning, Graph-PReFLexOR defines reasoning as a
structured mapping, where tasks yield knowledge graphs, abstract patterns, and
ultimately, final answers. Inspired by category theory, it encodes concepts as
nodes and their relationships as edges, supporting hierarchical inference and
adaptive learning through isomorphic representations. Demonstrations include
hypothesis generation, materials design, and creative reasoning, such as
discovering relationships between mythological concepts like ‘thin places’ and
materials science. We propose a ‘knowledge garden growth’ strategy that
integrates insights across domains, promoting interdisciplinary connections.
Results with a 3-billion-parameter Graph-PReFLexOR model show superior
reasoning depth and adaptability, underscoring the potential for transparent,
multidisciplinary AI-driven discovery. It lays the groundwork for general
autonomous reasoning solutions.
[LINK]
http://arxiv.org/abs/2501.08120v1
[DATE]
2025-01-14 21:52:41+08:00
[CATEGORIES]
cs.CL
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
[AUTHORS]
Chang Gao, Wenxuan Zhang, Guizhen Chen, Wai Lam
[ABSTRACT]
Instruction tuning is vital for enhancing the performance of large language
models (LLMs), but existing text-to-text methods, referred to as TextTuning,
struggle with issues such as generalization, robustness, and controllability
due to their lack of explicit task structures. We introduce JsonTuning, a
structure-to-structure approach that uses JSON structures to represent tasks.
This method improves generalization by clarifying task elements and their
relations, boosts robustness by minimizing ambiguity, and enhances
controllability by allowing precise control over outputs. We conduct an
extensive comparative analysis between JsonTuning and TextTuning using various
language models and benchmarks. Our findings reveal that JsonTuning
consistently surpasses TextTuning in terms of performance, robustness, and
controllability across different scenarios. By overcoming the limitations of
TextTuning, JsonTuning demonstrates significant potential for developing more
effective and reliable LLMs capable of handling diverse scenarios.
[LINK]
http://arxiv.org/abs/2310.02953v4
[DATE]
2025-01-14 20:55:27+08:00
[CATEGORIES]
cs.CL
Dynamic Multimodal Sentiment Analysis: Leveraging Cross-Modal Attention for Enabled Classification
[AUTHORS]
Hui Lee, Singh Suniljit, Yong Siang Ong
[ABSTRACT]
This paper explores the development of a multimodal sentiment analysis model
that integrates text, audio, and visual data to enhance sentiment
classification. The goal is to improve emotion detection by capturing the
complex interactions between these modalities, thereby enabling more accurate
and nuanced sentiment interpretation. The study evaluates three feature fusion
strategies – late stage fusion, early stage fusion, and multi-headed attention
– within a transformer-based architecture. Experiments were conducted using
the CMU-MOSEI dataset, which includes synchronized text, audio, and visual
inputs labeled with sentiment scores. Results show that early stage fusion
significantly outperforms late stage fusion, achieving an accuracy of 71.87%,
while the multi-headed attention approach offers marginal improvement, reaching
72.39%. The findings suggest that integrating modalities early in the process
enhances sentiment classification, while attention mechanisms may have limited
impact within the current framework. Future work will focus on refining feature
fusion techniques, incorporating temporal data, and exploring dynamic feature
weighting to further improve model performance.
[LINK]
http://arxiv.org/abs/2501.08085v1
[DATE]
2025-01-14 20:54:19+08:00
[CATEGORIES]
cs.CL
cs.LG
Optimizing Speech Multi-View Feature Fusion through Conditional Computation
[AUTHORS]
Weiqiao Shan, Yuhao Zhang, Yuchen Han, Bei Li, Xiaofeng Zhao, Yuang Li, Min Zhang, Hao Yang, Tong Xiao, Jingbo Zhu
[ABSTRACT]
Recent advancements have highlighted the efficacy of self-supervised learning
(SSL) features in various speech-related tasks, providing lightweight and
versatile multi-view speech representations. However, our study reveals that
while SSL features expedite model convergence, they conflict with traditional
spectral features like FBanks in terms of update directions. In response, we
propose a novel generalized feature fusion framework grounded in conditional
computation, featuring a gradient-sensitive gating network and a multi-stage
dropout strategy. This framework mitigates feature conflicts and bolsters model
robustness to multi-view input features. By integrating SSL and spectral
features, our approach accelerates convergence and maintains performance on par
with spectral models across multiple speech translation tasks on the MUSTC
dataset.
[COMMENTS]
ICASSP 2025
[LINK]
http://arxiv.org/abs/2501.08057v1
[DATE]
2025-01-14 20:12:06+08:00
[CATEGORIES]
cs.CL
TreeKV: Smooth Key-Value Cache Compression with Tree Structures
[AUTHORS]
Ziwei He, Jian Yuan, Haoli Bai, Jingwen Leng, Bo Jiang
[ABSTRACT]
Efficient key-value (KV) cache compression is critical for scaling
transformer-based Large Language Models (LLMs) in long sequences and
resource-limited settings. Existing methods evict tokens based on their
positions or importance scores, but position-based strategies can miss crucial
information outside predefined regions, while those relying on global
importance scores result in strong regional biases, limiting the KV cache’s
overall context retention and potentially impairing the performance of LLMs on
complex tasks. Our wavelet analysis reveals that as tokens approach the end of
a sequence, their contributions to generation gradually increase and tend to
diverge more from neighboring tokens, indicating a smooth transition with
increasing complexity and variability from distant to nearby context. Motivated
by this observation, we propose TreeKV, an intuitive, training-free method that
employs a tree structure for smooth cache compression. TreeKV maintains a fixed
cache size, allowing LLMs to deliver high-quality output even in long text
scenarios. Unlike most compression methods, TreeKV is applicable to both the
generation and prefilling stages. TreeKV consistently surpasses all baseline
models in language modeling tasks on PG19 and OpenWebText2, allowing LLMs
trained with a short context window to generalize to a longer window with a 16x
cache reduction. On the LongBench benchmark, TreeKV achieves the best
performance with only 6% of the budget at optimal efficiency.
[LINK]
http://arxiv.org/abs/2501.04987v2
[DATE]
2025-01-14 20:06:33+08:00
[CATEGORIES]
cs.CL
Exploring Narrative Clustering in Large Language Models: A Layerwise Analysis of BERT
[AUTHORS]
Awritrojit Banerjee, Achim Schilling, Patrick Krauss
[ABSTRACT]
This study investigates the internal mechanisms of BERT, a transformer-based
large language model, with a focus on its ability to cluster narrative content
and authorial style across its layers. Using a dataset of narratives developed
via GPT-4, featuring diverse semantic content and stylistic variations, we
analyze BERT’s layerwise activations to uncover patterns of localized neural
processing. Through dimensionality reduction techniques such as Principal
Component Analysis (PCA) and Multidimensional Scaling (MDS), we reveal that
BERT exhibits strong clustering based on narrative content in its later layers,
with progressively compact and distinct clusters. While strong stylistic
clustering might occur when narratives are rephrased into different text types
(e.g., fables, sci-fi, kids’ stories), minimal clustering is observed for
authorial style specific to individual writers. These findings highlight BERT’s
prioritization of semantic content over stylistic features, offering insights
into its representational capabilities and processing hierarchy. This study
contributes to understanding how transformer models like BERT encode linguistic
information, paving the way for future interdisciplinary research in artificial
intelligence and cognitive neuroscience.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2408.03062,
arXiv:2408.04270, arXiv:2307.01577
[LINK]
http://arxiv.org/abs/2501.08053v1
[DATE]
2025-01-14 20:01:54+08:00
[CATEGORIES]
cs.CL
READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data
[AUTHORS]
Rohit Sharma, Shanu Kumar, Avinash Kumar
[ABSTRACT]
Pre-trained transformer models such as BERT have shown massive gains across
many text classification tasks. However, these models usually need an enormous
amount of labeled data to achieve impressive performance. Obtaining labeled data is
often expensive and time-consuming, whereas collecting unlabeled data using
some heuristics is relatively much cheaper for any task. Therefore, this paper
proposes a method that encapsulates reinforcement learning-based text
generation and semi-supervised adversarial learning approaches in a novel way
to improve the model’s performance. Our method READ, Reinforcement-based
Adversarial learning, utilizes an unlabeled dataset to generate diverse
synthetic text through reinforcement learning, improving the model’s
generalization capability using adversarial learning. Our experimental results
show that READ outperforms the existing state-of-the-art methods on multiple
datasets.
[LINK]
http://arxiv.org/abs/2501.08035v1
[DATE]
2025-01-14 19:39:55+08:00
[CATEGORIES]
cs.CL
AdaptVC: High Quality Voice Conversion with Adaptive Learning
[AUTHORS]
Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung
[ABSTRACT]
The goal of voice conversion is to transform the speech of a source speaker
to sound like that of a reference speaker while preserving the original
content. A key challenge is to extract disentangled linguistic content from the
source and voice style from the reference. While existing approaches leverage
various methods to isolate the two, generalization still requires further
attention, especially for robustness in zero-shot scenarios. In this paper, we
achieve successful disentanglement of content and speaker features by tuning
self-supervised speech features with adapters. The adapters are trained to
dynamically encode nuanced features from rich self-supervised features, and the
decoder fuses them to produce speech that accurately resembles the reference
with minimal loss of content. Moreover, we leverage a conditional flow matching
decoder with cross-attention speaker conditioning to further boost the
synthesis quality and efficiency. Subjective and objective evaluations in a
zero-shot scenario demonstrate that the proposed method outperforms existing
models in speech quality and similarity to the reference speech.
[COMMENTS]
ICASSP 2025; demo available https://mm.kaist.ac.kr/projects/AdaptVC
[LINK]
http://arxiv.org/abs/2501.01347v4
[DATE]
2025-01-14 19:36:42+08:00
[CATEGORIES]
cs.CL
Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey
[AUTHORS]
Hamza Kheddar
[ABSTRACT]
With significant advancements in Transformers and LLMs, NLP has extended its
reach into many research fields due to its enhanced capabilities in text
generation and user interaction. One field benefiting greatly from these
advancements is cybersecurity. In cybersecurity, many parameters that need to
be protected and exchanged between senders and receivers are in the form of
text and tabular data, making NLP a valuable tool in enhancing the security
measures of communication protocols. This survey paper provides a comprehensive
analysis of the utilization of Transformers and LLMs in cyber-threat detection
systems. The methodology of paper selection and bibliometric analysis is
outlined to establish a rigorous framework for evaluating existing research.
The fundamentals of Transformers are discussed, including background
information on various cyber-attacks and datasets commonly used in this field.
The survey explores the application of Transformers in IDSs, focusing on
different architectures such as Attention-based models, LLMs like BERT and GPT,
CNN/LSTM-Transformer hybrids, and emerging approaches like ViTs, among others.
Furthermore, it explores the diverse environments and applications where
Transformer- and LLM-based IDSs have been implemented, including computer
networks, IoT devices, critical infrastructure protection, cloud computing,
SDN, and autonomous vehicles. The paper also addresses research
challenges and future directions in this area, identifying key issues such as
interpretability, scalability, and adaptability to evolving threats, and more.
Finally, the conclusion summarizes the findings and highlights the significance
of Transformers and LLMs in enhancing cyber-threat detection capabilities,
while also outlining potential avenues for further research and development.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2405.04760 by other authors
[LINK]
http://arxiv.org/abs/2408.07583v2
[DATE]
2025-01-14 18:52:15+08:00
[CATEGORIES]
cs.CL
TriAdaptLoRA: Brain-Inspired Triangular Adaptive Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
[AUTHORS]
Yao Liang, Yuwei Wang, Yi Zeng
[ABSTRACT]
The fine-tuning of Large Language Models (LLMs) is pivotal for achieving
optimal performance across diverse downstream tasks. However, while full
fine-tuning delivers superior results, it entails significant computational and
resource costs. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA,
address these challenges by reducing the number of trainable parameters, but
they often struggle with rank adjustment efficiency and task-specific
adaptability. We propose Triangular Adaptive Low-Rank Adaptation
(TriAdaptLoRA), a novel PEFT framework inspired by neuroscience principles,
which dynamically optimizes the allocation of trainable parameters.
TriAdaptLoRA introduces three key innovations: 1) a triangular split of
transformation matrices into lower and upper triangular components to maximize
parameter utilization, 2) a parameter importance metric based on normalized
Frobenius norms for efficient adaptation, and 3) an adaptive rank-growth
strategy governed by dynamic thresholds, allowing flexible parameter allocation
across training steps. Experiments conducted on a variety of natural language
understanding and generation tasks demonstrate that TriAdaptLoRA consistently
outperforms existing PEFT methods. It achieves superior performance, enhanced
stability, and reduced computational overhead, particularly under linear
threshold-driven rank growth. These results highlight its efficacy as a
scalable and resource-efficient solution for fine-tuning LLMs.
[LINK]
http://arxiv.org/abs/2501.08008v1
[DATE]
2025-01-14 18:51:31+08:00
[CATEGORIES]
cs.CL
Formalising lexical and syntactic diversity for data sampling in French
[AUTHORS]
Louis Estève, Manon Scholivet, Agata Savary
[ABSTRACT]
Diversity is an important property of datasets and sampling data for
diversity is useful in dataset creation. Finding the optimally diverse sample
is expensive; we therefore present a heuristic that significantly increases
diversity relative to random sampling. We also explore whether different kinds
of diversity – lexical and syntactic – correlate, with the purpose of
sampling for expensive syntactic diversity through inexpensive lexical
diversity. We find that correlations fluctuate with different datasets and
versions of diversity measures. This shows that an arbitrarily chosen measure
may fall short of capturing diversity-related properties of datasets.
[LINK]
http://arxiv.org/abs/2501.08003v1
[DATE]
2025-01-14 18:47:33+08:00
[CATEGORIES]
cs.CL
Gandalf the Red: Adaptive Security for LLMs
[AUTHORS]
Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Natalie Wu, Mateo Rojas-Carulla
[ABSTRACT]
Current evaluations of defenses against prompt attacks in large language
model (LLM) applications often overlook two critical factors: the dynamic
nature of adversarial behavior and the usability penalties imposed on
legitimate users by restrictive defenses. We propose D-SEC (Dynamic Security
Utility Threat Model), which explicitly separates attackers from legitimate
users, models multi-step interactions, and rigorously expresses the
security-utility in an optimizable form. We further address the shortcomings in
existing evaluations by introducing Gandalf, a crowd-sourced, gamified
red-teaming platform designed to generate realistic, adaptive attack datasets.
Using Gandalf, we collect and release a dataset of 279k prompt attacks.
Complemented by benign user data, our analysis reveals the interplay between
security and utility, showing that defenses integrated in the LLM (e.g., system
prompts) can degrade usability even without blocking requests. We demonstrate
that restricted application domains, defense-in-depth, and adaptive defenses
are effective strategies for building secure and useful LLM applications. Code
is available at https://github.com/lakeraai/dsec-gandalf .
[COMMENTS]
Niklas Pfister, Václav Volhejn and Manuel Knott contributed equally
[LINK]
http://arxiv.org/abs/2501.07927v1
[DATE]
2025-01-14 16:30:49+08:00
[CATEGORIES]
cs.LG
cs.CL
Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques
[AUTHORS]
Aziida Nanyonga, Hassan Wasswa, Ugur Turhan, Keith Joiner, Graham Wild
[ABSTRACT]
Aviation safety is a global concern, requiring detailed investigations into
incidents to understand contributing factors comprehensively. This study uses
the National Transportation Safety Board (NTSB) dataset. It applies advanced
natural language processing (NLP) techniques, including Latent Dirichlet
Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic
Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and K-means
clustering. The main objectives are identifying latent themes, exploring
semantic relationships, assessing probabilistic connections, and clustering
incidents based on shared characteristics. This research contributes to
aviation safety by providing insights into incident narratives and
demonstrating the versatility of NLP and topic modelling techniques in
extracting valuable information from complex datasets. The results, including
topics identified from various techniques, provide an understanding of
recurring themes. Comparative analysis reveals that LDA performed best with a
coherence value of 0.597, followed by pLSA (0.583), LSA (0.542), and NMF (0.437).
K-means clustering further reveals commonalities and unique insights into
incident narratives. In conclusion, this study uncovers latent patterns and
thematic structures within incident narratives, offering a comparative analysis
of multiple topic modelling techniques. Future research avenues include
exploring temporal patterns, incorporating additional datasets, and developing
predictive models for early identification of safety issues. This research lays
the groundwork for enhancing the understanding and improvement of aviation
safety by utilising the wealth of information embedded in incident narratives.
[LINK]
http://arxiv.org/abs/2501.07924v1
[DATE]
2025-01-14 16:23:15+08:00
[CATEGORIES]
cs.CL
Aviation Safety Enhancement via NLP & Deep Learning: Classifying Flight Phases in ATSB Safety Reports
[AUTHORS]
Aziida Nanyonga, Hassan Wasswa, Graham Wild
[ABSTRACT]
Aviation safety is paramount, demanding precise analysis of safety
occurrences during different flight phases. This study employs Natural Language
Processing (NLP) and Deep Learning models, including LSTM, CNN, Bidirectional
LSTM (BLSTM), and simple Recurrent Neural Networks (sRNN), to classify flight
phases in safety reports from the Australian Transport Safety Bureau (ATSB).
The models exhibited high accuracy, precision, recall, and F1 scores, with LSTM
achieving the highest performance of 87%, 88%, 87%, and 88%, respectively. This
performance highlights their effectiveness in automating safety occurrence
analysis. The integration of NLP and Deep Learning technologies promises
transformative enhancements in aviation safety analysis, enabling targeted
safety measures and streamlined report handling.
[COMMENTS]
NLP, Aviation Safety, ATSB, Deep learning, Flight phase. arXiv admin
note: substantial text overlap with arXiv:2501.01694
[LINK]
http://arxiv.org/abs/2501.07923v1
[DATE]
2025-01-14 16:18:41+08:00
[CATEGORIES]
cs.LG
cs.CL
MoPE: Mixture of Prompt Experts for Parameter-Efficient and Scalable Multimodal Fusion
[AUTHORS]
Ruixiang Jiang, Lingbo Liu, Changwen Chen
[ABSTRACT]
Despite the demonstrated parameter efficiency of prompt-based multimodal
fusion methods, their limited adaptivity and expressiveness often result in
suboptimal performance compared to other tuning approaches. In this paper, we
introduce the Mixture of Prompt Experts (MoPE), the first technique designed to
overcome these limitations by decomposing standard prompts to capture
instance-level features adaptively. Building on this decomposition, MoPE
enhances prompt fusion’s expressiveness by leveraging multimodal pairing priors
to route the most effective prompt for each instance dynamically. Compared to
vanilla prompting, our MoPE-based fusion method exhibits greater
expressiveness, scaling more effectively with the training data and the overall
number of trainable parameters. We also investigate regularization terms for
expert routing, which lead to emergent expert specialization with enhanced
adaptiveness and interpretability. Extensive experiments across six multimodal
datasets spanning four modalities demonstrate state-of-the-art performance for
prompt fusion, matching or even surpassing the performance of fine-tuning while
requiring only 0.8% of the trainable parameters. Project homepage:
https://github.com/songrise/MoPE
[COMMENTS]
Under Review, Extended version of arXiv:2312.03734
[LINK]
http://arxiv.org/abs/2403.10568v3
[DATE]
2025-01-14 16:01:17+08:00
[CATEGORIES]
cs.LG
cs.CL
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
[AUTHORS]
Bo Yang, Qingping Yang, Yingwei Ma, Runtao Liu
[ABSTRACT]
The evaluation of mathematical reasoning capabilities is essential for
advancing Artificial General Intelligence (AGI). While Large Language Models
(LLMs) have shown impressive performance in solving mathematical problems,
existing benchmarks such as GSM8K and MATH present limitations, including
narrow problem definitions with specific numbers and reliance on predetermined
rules that hinder accurate assessments of reasoning and generality. This paper
introduces the UTMath Benchmark, a robust evaluation framework designed to
assess LLMs through extensive unit tests, with a focus on both the accuracy and
generality of model responses. It comprises 1,053 cutting-edge problems
spanning nine mathematical domains, with an average of 68 test cases per
problem. UTMath is highly challenging, with the best-performing model, o1-mini,
solving only 32.57% of the problems, followed by o1-preview at 27.16%, and
GPT-4o at 26.93%. Furthermore, we present the Reasoning-to-Coding of Thoughts
(RCoT) approach, which encourages LLMs to engage in explicit reasoning prior to
code generation, thereby facilitating the production of more sophisticated
solutions and enhancing overall performance and efficiency. We also release
the UTMath-Train training dataset (more than 70k samples) to
support the community in further exploring mathematical reasoning. Our
benchmark can be accessed via the following link:
https://github.com/UTMathGroup/UTMath
[LINK]
http://arxiv.org/abs/2411.07240v2
[DATE]
2025-01-14 15:57:26+08:00
[CATEGORIES]
cs.CL
GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
[AUTHORS]
Chen Tang, Bo Lv, Zifan Zheng, Bohao Yang, Kun Zhao, Ning Liao, Xiaoxing Wang, Feiyu Xiong, Zhiyu Li, Nayu Liu, Jingchi Jiang
[ABSTRACT]
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple
smaller expert models as opposed to a single large network. However, these
experts typically operate independently, leaving a question open about whether
interconnecting these models could enhance the performance of MoE networks. In
response, we introduce GRAPHMOE, a novel method aimed at augmenting the
cognitive depth of language models via a self-rethinking mechanism constructed
on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to
simulate iterative thinking steps, thereby facilitating the flow of information
among expert nodes. We implement the GRAPHMOE architecture using Low-Rank
Adaptation (LoRA) techniques and conduct extensive experiments on various
benchmark datasets. The experimental results reveal that GRAPHMOE outperforms
other LoRA-based models, achieving state-of-the-art (SOTA) performance.
Additionally, this study explores a novel recurrent routing strategy that may
inspire further advancements in enhancing the reasoning capabilities of
language models.
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2501.07890v1
[DATE]
2025-01-14 14:59:51+08:00
[CATEGORIES]
cs.CL
What Makes Cryptic Crosswords Challenging for LLMs?
[AUTHORS]
Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar
[COMMENTS]
COLING 2025. arXiv admin note: text overlap with arXiv:2403.12094
[LINK]
http://arxiv.org/abs/2412.09012v2
[DATE]
2025-01-14 14:06:54+08:00
[CATEGORIES]
cs.CL
ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
[AUTHORS]
Zhongxiang Sun, Qipeng Wang, Weijie Yu, Xiaoxue Zang, Kai Zheng, Jun Xu, Xiao Zhang, Song Yang, Han Li
[ABSTRACT]
Retrieval-Augmented Generation (RAG) systems for Large Language Models (LLMs)
hold promise in knowledge-intensive tasks but face limitations in complex
multi-step reasoning. While recent methods have integrated RAG with
chain-of-thought reasoning or test-time search using Process Reward Models
(PRMs), these approaches encounter challenges such as a lack of explanations,
bias in PRM training data, early-step bias in PRM scores, and insufficient
post-training optimization of reasoning potential. To address these issues, we
propose Retrieval-Augmented Reasoning through Trustworthy Process Rewarding
(ReARTeR), a framework that enhances RAG systems’ reasoning capabilities
through post-training and test-time scaling. At test time, ReARTeR introduces
Trustworthy Process Rewarding via a Process Reward Model for accurate scalar
scoring and a Process Explanation Model (PEM) for generating natural language
explanations, enabling step refinement. During post-training, it utilizes Monte
Carlo Tree Search guided by Trustworthy Process Rewarding to collect
high-quality step-level preference data, optimized through Iterative Preference
Optimization. ReARTeR addresses three core challenges: (1) misalignment between
PRM and PEM, tackled through off-policy preference learning; (2) bias in PRM
training data, mitigated by balanced annotation methods and stronger
annotations for challenging examples; and (3) early-step bias in PRM, resolved
through a temporal-difference-based look-ahead search strategy. Experimental
results on multi-step reasoning benchmarks demonstrate significant
improvements, underscoring ReARTeR’s potential to advance the reasoning
capabilities of RAG systems.
[COMMENTS]
11 pages, 5 figures
[LINK]
http://arxiv.org/abs/2501.07861v1
[DATE]
2025-01-14 13:56:26+08:00
[CATEGORIES]
cs.CL
Multi-matrix Factorization Attention
[AUTHORS]
Jingcheng Hu, Houyi Li, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, Daxin Jiang
[ABSTRACT]
We propose novel attention architectures, Multi-matrix Factorization
Attention (MFA) and MFA-Key-Reuse (MFA-KR). Existing variants of standard
Multi-Head Attention (MHA), including SOTA methods like MLA, fail to maintain
comparably strong performance under stringent Key-Value cache (KV cache)
constraints.
MFA enhances model capacity by efficiently scaling up both the number and
dimension of attention heads through low-rank matrix factorization in the
Query-Key (QK) circuit. Extending MFA, MFA-KR further reduces memory
requirements by repurposing the key cache as value through value projection
re-parameterization. MFA’s design enables strong model capacity when working
under a tight KV cache budget, while MFA-KR is suitable for even harsher KV
cache limits with a minor performance trade-off. Notably, in our extensive and
large-scale experiments, the proposed architecture outperforms MLA and performs
comparably to MHA, while reducing KV cache usage by up to 56% and 93.7%,
respectively.
[LINK]
http://arxiv.org/abs/2412.19255v2
[DATE]
2025-01-14 13:48:07+08:00
[CATEGORIES]
cs.LG
cs.CL
Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques
[AUTHORS]
Shobhit Ratan, Farley Knight, Ghada Jerfel, Sze Chung Ho
[ABSTRACT]
This study explores the fine-tuning (FT) of the Open Pre-trained Transformer
(OPT-125M) for grammatical acceptability tasks using the CoLA dataset. By
comparing Vanilla-Fine-Tuning (VFT), Pattern-Based-Fine-Tuning (PBFT), and
Parameter-Efficient Fine-Tuning techniques (PEFT) like Low-Rank Adaptation
(LoRA), we demonstrate significant improvements in computational efficiency
while maintaining high accuracy. Our experiments reveal that while VFT achieves
the highest accuracy (81.2%), LoRA enhances FT by reducing memory usage and
iteration time by more than 50% and increases accuracy in the PBFT case. Context
Distillation (CD), though computationally efficient, underperformed with
accuracy around 31%. Our findings contribute to democratizing access to large
language models (LLMs) by reducing computational barriers.
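
As a rough illustration of why LoRA reduces memory, the following minimal
NumPy sketch adds a low-rank update to a frozen weight matrix. The shapes, the
scaling factor `alpha`, and the function name `lora_forward` are illustrative
assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
d_out, d_in, rank, alpha = 8, 16, 2, 4.0
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, rank))                   # trainable, zero init

def lora_forward(x):
    """Frozen projection plus the scaled low-rank update (alpha / rank) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.standard_normal((3, d_in))
y = lora_forward(x)

# Only A and B would be trained, so the trainable parameter count is
# rank * (d_in + d_out) instead of d_in * d_out.
full_params = W.size
lora_params = A.size + B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen
layer exactly; training then changes only the two small factors.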
[LINK]
http://arxiv.org/abs/2501.07853v1
[DATE]
2025-01-14 13:41:09+08:00
[CATEGORIES]
cs.CL
Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning
[AUTHORS]
Haoyu Han, Yaochen Xie, Hui Liu, Xianfeng Tang, Sreyashi Nag, William Headden, Hui Liu, Yang Li, Chen Luo, Shuiwang Ji, Qi He, Jiliang Tang
[ABSTRACT]
Large language models (LLMs) have demonstrated remarkable success across a
wide range of tasks; however, they still encounter challenges in reasoning
tasks that require understanding and inferring relationships between distinct
pieces of information within text sequences. This challenge is particularly
pronounced in tasks involving multi-step processes, such as logical reasoning
and multi-hop question answering, where understanding implicit relationships
between entities and leveraging multi-hop connections in the given context are
crucial. Graphs, as fundamental data structures, explicitly represent pairwise
relationships between entities, thereby offering the potential to enhance LLMs’
reasoning capabilities. External graphs have proven effective in supporting
LLMs across multiple tasks. However, in many reasoning tasks, no pre-existing
graph structure is provided. Can we structure implicit knowledge derived from
context into graphs to assist LLMs in reasoning? In this paper, we propose
Reasoning with Graphs (RwG) by first constructing explicit graphs from the
context and then leveraging these graphs to enhance LLM reasoning performance
on reasoning tasks. Extensive experiments demonstrate the effectiveness of the
proposed method in improving both logical reasoning and multi-hop question
answering tasks.
[LINK]
http://arxiv.org/abs/2501.07845v1
[DATE]
2025-01-14 13:18:20+08:00
[CATEGORIES]
cs.CL
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
[AUTHORS]
Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe
[ABSTRACT]
End-to-end automatic speech recognition (E2E-ASR) can be classified by its
decoder architectures, such as connectionist temporal classification (CTC),
recurrent neural network transducer (RNN-T), attention-based encoder-decoder,
and Mask-CTC models. Each decoder architecture has advantages and
disadvantages, leading practitioners to switch between these different models
depending on application requirements. Instead of building separate models, we
propose a joint modeling scheme where four decoders (CTC, RNN-T, attention, and
Mask-CTC) share the same encoder – we refer to this as 4D modeling. The 4D
model is trained jointly, which will bring model regularization and maximize
the model robustness thanks to their complementary properties. To efficiently
train the 4D model, we introduce a two-stage training strategy that stabilizes
the joint training. In addition, we propose three novel joint beam search
algorithms by combining three decoders (CTC, RNN-T, and attention) to further
improve performance. These three beam search algorithms differ in which decoder
is used as the primary decoder. We carefully evaluate the performance and
computational tradeoffs associated with each algorithm. Experimental results
demonstrate that the jointly trained 4D model outperforms the E2E-ASR models
trained with only one individual decoder. Furthermore, we demonstrate that the
proposed joint beam search algorithm outperforms the previously proposed
CTC/attention decoding.
[COMMENTS]
accepted to IEEE/ACM Transactions on Audio Speech and Language
Processing
[LINK]
http://arxiv.org/abs/2406.02950v2
[DATE]
2025-01-14 13:03:19+08:00
[CATEGORIES]
cs.CL
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA
[AUTHORS]
Jiaang Li, Quan Wang, Zhongnan Wang, Yongdong Zhang, Zhendong Mao
[ABSTRACT]
Large language models (LLMs) require model editing to efficiently update
specific knowledge within them and avoid factual errors. Most model editing
methods are solely designed for single-time use and result in a significant
forgetting effect in lifelong editing scenarios, where sequential edits are
conducted over time. Previous approaches manage sequential edits by freezing
original parameters and discretely allocating new parameters for each knowledge
update. However, these methods lack robustness to minor input variations due to
the discrete mapping between data and parameters. To overcome this challenge,
we propose ELDER, a novel approach to create a continuous association between
data and adapters. ELDER integrates multiple LoRAs through a router network and
is trained to establish a smooth data-adapter association, thereby enhancing
the edit robustness and generalization of semantically equivalent inputs. To
ensure inputs containing the same knowledge will be processed by the same
LoRAs, we design a novel loss to guide the model to link LoRA allocations with
edit knowledge. Furthermore, we propose a deferral mechanism to retain the
original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and
LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong
setting, outperforming eight baselines while exhibiting strong scalability and
preserving LLMs’ general abilities on downstream tasks. Our code is available
at https://github.com/JiaangL/ELDER.
[COMMENTS]
Accepted by AAAI-25
[LINK]
http://arxiv.org/abs/2408.11869v3
[DATE]
2025-01-14 12:25:23+08:00
[CATEGORIES]
cs.CL
cs.LG
Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation
[AUTHORS]
Zerui Xu, Fang Wu, Yuanyuan Zhang, Yue Zhao
[ABSTRACT]
Machine learning (ML) exhibits promise in the clinical domain. However, it is
constrained by data scarcity and ethical considerations, as the generation of
clinical trials presents significant challenges due to stringent privacy
regulations, high costs, and the extended duration required for conducting
studies with human participants. Despite the advancements of large language
models (LLMs) in general generation tasks, their potential in facilitating the
generation of synthetic clinical trials is under-explored. To address this gap,
we introduce a novel Retrieval-Reasoning few-shot framework that leverages LLMs
to generate artificial yet realistic and diverse clinical trials with binary
success/failure labels. Experiments conducted on real clinical trials from the
ClinicalTrials.gov database demonstrate that our synthetic data can
effectively augment real datasets. Furthermore, by fine-tuning a pre-trained
model as a binary classifier on synthetic clinical trial datasets, we
demonstrate that this augmentation enhances model training for downstream tasks
such as trial outcome prediction. Our findings suggest that LLMs for synthetic
clinical trial generation hold promise for accelerating clinical research and
upholding ethical standards for patient privacy. The code is publicly available
at
https://anonymous.4open.science/r/Retrieval_Reasoning_Clinical_Trial_Generation-3EC4.
[LINK]
http://arxiv.org/abs/2410.12476v2
[DATE]
2025-01-14 12:19:49+08:00
[CATEGORIES]
cs.CL
cs.LG
Real-time Verification and Refinement of Language Model Text Generation
[AUTHORS]
Joonho Ko, Jinheon Baek, Sung Ju Hwang
[ABSTRACT]
Large language models (LLMs) have shown remarkable performance across a wide
range of natural language tasks. However, a critical challenge remains in that
they sometimes generate factually incorrect answers. Much previous work has
addressed this by identifying errors in the generation and further refining
them, but these methods are slow in deployment since they are designed to
verify the response from LLMs only after their entire generation (from the
first to last tokens) is done. Further, we observe that once LLMs generate
incorrect tokens early on, there is a higher likelihood that subsequent tokens
will also be factually incorrect. To this end, in this work, we propose
Streaming-VR (Streaming Verification and Refinement), a novel approach designed
to enhance the efficiency of verification and refinement of LLM outputs.
Specifically, the proposed Streaming-VR enables on-the-fly verification and
correction of tokens as they are being generated, similar to a streaming
process, ensuring that each subset of tokens is checked and refined in
real-time by another LLM as the LLM constructs its response. Through
comprehensive evaluations on multiple datasets, we demonstrate that our
approach not only enhances the factual accuracy of LLMs, but also offers a more
efficient solution compared to prior refinement methods.
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2501.07824v1
[DATE]
2025-01-14 11:59:48+08:00
[CATEGORIES]
cs.CL
cs.LG
A Multi-Encoder Frozen-Decoder Approach for Fine-Tuning Large Language Models
[AUTHORS]
Kaustubh D. Dhole
[ABSTRACT]
Among parameter-efficient fine-tuning methods, freezing has emerged as a
popular strategy for speeding up training, reducing catastrophic forgetting,
and improving downstream performance. We investigate the impact of freezing the
decoder in a multi-task setup comprising diverse natural language tasks, aiming
to reduce deployment overhead and enhance portability to novel tasks. Our
experiments, conducted by fine-tuning both individual and multi-task setups on
the AlexaTM model, reveal that freezing decoders is highly effective for tasks
with natural language outputs and mitigates catastrophic forgetting in
multilingual tasks. However, we find that pairing frozen decoders with a larger
model can effectively maintain or even enhance performance in structured and QA
tasks, making it a viable strategy for a broader range of task types.
[LINK]
http://arxiv.org/abs/2501.07818v1
[DATE]
2025-01-14 11:43:23+08:00
[CATEGORIES]
cs.CL
cs.LG
Energy-Efficient Split Learning for Fine-Tuning Large Language Models in Edge Networks
[AUTHORS]
Zuguang Li, Shaohua Wu, Liang Li, Songge Zhang
[ABSTRACT]
In this letter, we propose an energy-efficient split learning (SL) framework
for fine-tuning large language models (LLMs) using geo-distributed personal
data at the network edge, where LLMs are split and fine-tuned alternately
across massive mobile devices and an edge server. Considering the device
heterogeneity and channel dynamics in edge networks, a Cut lAyer and computing
Resource Decision (CARD) algorithm is
developed to minimize training delay and energy consumption. Simulation results
demonstrate that the proposed approach reduces the average training delay and
the server’s energy consumption by 70.8% and 53.1%, respectively, compared to
the benchmarks.
[COMMENTS]
5 pages, 6 figures
[LINK]
http://arxiv.org/abs/2412.00090v2
[DATE]
2025-01-14 11:27:10+08:00
[CATEGORIES]
cs.LG
cs.CL
Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models
[AUTHORS]
Dhruv Dhamani, Mary Lou Maher
[ABSTRACT]
Recent advances in prompting techniques and multi-agent systems for Large
Language Models (LLMs) have produced increasingly complex approaches. However,
we lack a framework for characterizing and comparing prompting techniques or
understanding their relationship to multi-agent LLM systems. This position
paper introduces and explains the concepts of linear contexts (a single,
continuous sequence of interactions) and non-linear contexts (branching or
multi-path) in LLM systems. These concepts enable the development of an
agent-centric projection of prompting techniques, a framework that can reveal
deep connections between prompting strategies and multi-agent systems. We
propose three conjectures based on this framework: (1) results from non-linear
prompting techniques can predict outcomes in equivalent multi-agent systems,
(2) multi-agent system architectures can be replicated through single-LLM
prompting techniques that simulate equivalent interaction patterns, and (3)
these equivalences suggest novel approaches for generating synthetic training
data. We argue that this perspective enables systematic cross-pollination of
research findings between prompting and multi-agent domains, while providing
new directions for improving both the design and training of future LLM
systems.
[COMMENTS]
8 pages, 5 figures. Accepted at ICAART 2025. Derived from an early
draft at 2312.17601. arXiv admin note: substantial text overlap with
arXiv:2312.17601
[LINK]
http://arxiv.org/abs/2501.07815v1
[DATE]
2025-01-14 11:26:43+08:00
[CATEGORIES]
cs.CL
Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering
[AUTHORS]
Feijie Wu, Zitao Li, Fei Wei, Yaliang Li, Bolin Ding, Jing Gao
[ABSTRACT]
Leveraging large language models (LLMs), an agent can utilize
retrieval-augmented generation (RAG) techniques to integrate external knowledge
and increase the reliability of its responses. Current RAG-based agents
integrate single, domain-specific knowledge sources, limiting their ability and
leading to hallucinated or inaccurate responses when addressing cross-domain
queries. Integrating multiple knowledge bases into a unified RAG-based agent
raises significant challenges, including increased retrieval overhead and data
sovereignty when sensitive data is involved. In this work, we propose RopMura,
a novel multi-agent system that addresses these limitations by incorporating
highly efficient routing and planning mechanisms. RopMura features two key
components: a router that intelligently selects the most relevant agents based
on knowledge boundaries and a planner that decomposes complex multi-hop queries
into manageable steps, allowing for coordinating cross-domain responses.
Experimental results demonstrate that RopMura effectively handles both
single-hop and multi-hop queries, with the routing mechanism enabling precise
answers for single-hop queries and the combined routing and planning mechanisms
achieving accurate, multi-step resolutions for complex queries.
[COMMENTS]
Work In Progress
[LINK]
http://arxiv.org/abs/2501.07813v1
[DATE]
2025-01-14 11:25:26+08:00
[CATEGORIES]
cs.CL
Don’t Command, Cultivate: An Exploratory Study of System-2 Alignment
[AUTHORS]
Yuhang Wang, Yuxiang Zhang, Yanxu Zhu, Xinyan Wen, Jitao Sang
[ABSTRACT]
The o1 system card identifies the o1 models as the most robust within OpenAI,
with their defining characteristic being the progression from rapid, intuitive
thinking to slower, more deliberate reasoning. This observation motivated us to
investigate the influence of System-2 thinking patterns on model safety. In our
preliminary research, we conducted safety evaluations of the o1 model,
including complex jailbreak attack scenarios using adversarial natural language
prompts and mathematical encoding prompts. Our findings indicate that the o1
model demonstrates relatively improved safety performance; however, it still
exhibits vulnerabilities, particularly against jailbreak attacks employing
mathematical encoding. Through detailed case analysis, we identified specific
patterns in the o1 model’s responses. We also explored the alignment of
System-2 safety in open-source models using prompt engineering and supervised
fine-tuning techniques. Experimental results show that some simple methods to
encourage the model to carefully scrutinize user requests are beneficial for
model safety. Additionally, we proposed an implementation plan for process
supervision to enhance safety alignment. The implementation details and
experimental results will be provided in future versions.
[COMMENTS]
In this version, the DPO and reinforcement learning methods have been
added
[LINK]
http://arxiv.org/abs/2411.17075v5
[DATE]
2025-01-14 11:05:10+08:00
[CATEGORIES]
cs.CL
$\text{Transformer}^2$: Self-adaptive LLMs
[AUTHORS]
Qi Sun, Edoardo Cetin, Yujin Tang
[ABSTRACT]
Self-adaptive large language models (LLMs) aim to solve the challenges posed
by traditional fine-tuning methods, which are often computationally intensive
and static in their ability to handle diverse tasks. We introduce
$\text{Transformer}^2$, a novel self-adaptation framework that adapts LLMs for
unseen tasks in real-time by selectively adjusting only the singular components
of their weight matrices. During inference, $\text{Transformer}^2$ employs a
two-pass mechanism: first, a dispatch system identifies the task properties,
and then task-specific “expert” vectors, trained using reinforcement learning,
are dynamically mixed to obtain targeted behavior for the incoming prompt. Our
method outperforms ubiquitous approaches such as LoRA, with fewer parameters
and greater efficiency. $\text{Transformer}^2$ demonstrates versatility across
different LLM architectures and modalities, including vision-language tasks.
$\text{Transformer}^2$ represents a significant leap forward, offering a
scalable, efficient solution for enhancing the adaptability and task-specific
performance of LLMs, paving the way for truly dynamic, self-organizing AI
systems.
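
To make "adjusting only the singular components of their weight matrices"
concrete, here is a minimal NumPy sketch that rescales the singular values of a
matrix with a per-component vector. The function name `scale_singular_values`
and the example vectors are hypothetical; this is an illustration of the
general idea under those assumptions, not the authors' implementation.

```python
import numpy as np

def scale_singular_values(weight, z):
    """Return U @ diag(s * z) @ Vt, i.e. `weight` with rescaled singular values."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Broadcasting multiplies column j of U by s[j] * z[j].
    return (u * (s * z)) @ vt

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

z_identity = np.ones(3)                # leaves the matrix unchanged
z_expert = np.array([1.0, 0.5, 0.0])   # hypothetical task-specific vector

W_same = scale_singular_values(W, z_identity)
W_adapted = scale_singular_values(W, z_expert)
```

In this sketch only the vector `z` would be learned per task, which is far
smaller than the weight matrix itself.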
[COMMENTS]
18 pages, 11 figures, 9 tables
[LINK]
http://arxiv.org/abs/2501.06252v2
[DATE]
2025-01-14 10:52:26+08:00
[CATEGORIES]
cs.LG
cs.CL
Gradient descent with generalized Newton’s method
[AUTHORS]
Zhiqi Bu, Shiyun Xu
[ABSTRACT]
We propose the generalized Newton’s method (GeN) – a Hessian-informed
approach that applies to any optimizer such as SGD and Adam, and covers the
Newton-Raphson method as a sub-case. Our method automatically and dynamically
selects the learning rate that accelerates the convergence, without the
intensive tuning of the learning rate scheduler. In practice, our method is
easily implementable, since it only requires additional forward passes with
almost zero computational overhead (in terms of training time and memory cost),
if the overhead is amortized over many iterations. We present extensive
experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that
GeN optimizers match the state-of-the-art performance, which was achieved with
carefully tuned learning rate schedulers.
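
The idea of choosing a learning rate from extra forward passes can be sketched
as a quadratic fit of the loss along the update direction. The sketch below is
a minimal illustration under that assumption; the function name
`quadratic_lr`, the probe step, and the fallback rule are hypothetical and not
taken from the paper.

```python
import numpy as np

def quadratic_lr(loss_fn, w, direction, probe=0.1):
    """Estimate a step size along `direction` from three forward passes.

    Fits L(w - eta * d) ~ L0 - b * eta + 0.5 * a * eta**2 using loss values
    at eta in {-probe, 0, +probe} and returns the minimizer b / a.
    """
    l_minus = loss_fn(w + probe * direction)           # eta = -probe
    l_zero = loss_fn(w)                                # eta = 0
    l_plus = loss_fn(w - probe * direction)            # eta = +probe
    b = (l_minus - l_plus) / (2.0 * probe)             # estimates g^T d
    a = (l_plus - 2.0 * l_zero + l_minus) / probe**2   # estimates d^T H d
    if a <= 0:
        # No positive curvature along d: fall back to the probe step.
        return probe
    return b / a

# Toy quadratic loss 0.5 * w^T A w, with the gradient as update direction.
A = np.diag([1.0, 4.0])
loss = lambda w: 0.5 * w @ A @ w
w0 = np.array([1.0, 1.0])
grad = A @ w0
eta = quadratic_lr(loss, w0, grad)   # exact line-search step here is 17/65
```

No backward pass through a Hessian is needed: the curvature term comes from
loss values alone, which is why the extra cost is a few forward passes.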
[LINK]
http://arxiv.org/abs/2407.02772v2
[DATE]
2025-01-14 10:30:09+08:00
[CATEGORIES]
cs.LG
cs.CL
Let the Rule Speak: Enhancing In-context Learning Debiasing with Interpretability
[AUTHORS]
Ruixi Lin, Yang You
[ABSTRACT]
In-context learning, which allows large language models to perform diverse
tasks with a few demonstrations, is found to have imbalanced per-class
prediction accuracy on multi-class text classification. Although notable output
correction methods have been developed to tackle the issue and simultaneously
improve downstream prediction accuracy, they may fail to answer the core
interpretability challenges: which classes need corrections and why, and, more
importantly, how to tailor a correction to each sample’s per-class
probability. To address such interpretability gaps, we first find that the
imbalance arises from certain classes consistently receiving high ICL output
probabilities, whereas others receive lower or mixed ranges, so the former are
more frequently chosen, resulting in higher accuracy; more crucially, we find
that these ranges have significantly varying degrees of influence on the
accuracy bias, highlighting the need for precise, interpretable probability
corrections by range. Motivated by this, we propose FuRud, a Fuzzy Rule
Optimization based Debiasing method, that (1) detects which classes need
corrections, and (2) for each correction-needed class, detects its probability
ranges and applies asymmetric amplifications or reductions to correct them
interpretably. Notably, across seven benchmark datasets, FuRud reduces the
pairwise class accuracy bias (COBias) by more than half (56%), while achieving
a relative increase of 21% in accuracy, outperforming state-of-the-art
debiasing methods. Moreover, FuRud can optimize downstream tasks with as few as
10 optimization examples. Furthermore, FuRud can work for prompt formats that
lead to highly skewed predictions. For example, FuRud greatly improves ICL
outputs which use letter options, with 44% relative accuracy increase and 54%
relative COBias reduction.
[LINK]
http://arxiv.org/abs/2412.19018v2
[DATE]
2025-01-14 09:42:40+08:00
[CATEGORIES]
cs.CL
Large Language Models for Knowledge Graph Embedding Techniques, Methods, and Challenges: A Survey
[AUTHORS]
Bingchen Liu, Xin Li
[ABSTRACT]
Large Language Models (LLMs) have attracted a lot of attention in various
fields due to their superior performance, aiming to train hundreds of millions
or more parameters on large amounts of text data to understand and generate
natural language. As the superior performance of LLMs becomes apparent, they
are increasingly being applied to knowledge graph embedding (KGE) related tasks
to improve the processing results. As a deep learning model in the field of
Natural Language Processing (NLP), an LLM learns from a large amount of textual
data to predict the next word or generate content related to a given text.
However,
LLMs have recently been invoked to varying degrees in different types of KGE
related scenarios such as multi-modal KGE and open KGE according to their task
characteristics. In this paper, we investigate a wide range of approaches for
performing LLM-related tasks in different types of KGE scenarios. To better
compare the various approaches, we summarize each KGE scenario in a
classification. In addition to the categorization methods, we provide a tabular
overview of the methods and their source code links for a more direct
comparison. In the article, we also discuss the applications in which the
methods are mainly used and suggest several forward-looking directions for the
development of this new research area.
[LINK]
http://arxiv.org/abs/2501.07766v1
[DATE]
2025-01-14 08:47:24+08:00
[CATEGORIES]
cs.CL
Modeling Feature Maps for Quantum Machine Learning
[AUTHORS]
Navneet Singh, Shiva Raj Pokhrel
[ABSTRACT]
Quantum Machine Learning (QML) offers significant potential for complex tasks
like genome sequence classification, but quantum noise on Noisy
Intermediate-Scale Quantum (NISQ) devices poses practical challenges. This
study systematically evaluates how various quantum noise models, including
dephasing, amplitude damping, depolarizing, thermal noise, bit-flip, and
phase-flip, affect key QML algorithms (QSVC, Peg-QSVC, QNN, VQC) and feature
mapping techniques (ZFeatureMap, ZZFeatureMap, and PauliFeatureMap). Results
indicate that QSVC is notably robust under noise, whereas Peg-QSVC and QNN are
more sensitive, particularly to depolarizing and amplitude-damping noise. The
PauliFeatureMap is especially vulnerable, highlighting difficulties in
maintaining accurate classification under noisy conditions. These findings
underscore the critical importance of feature map selection and noise
mitigation strategies in optimizing QML for genomic classification, with
promising implications for personalized medicine.
[LINK]
http://arxiv.org/abs/2501.08205v1
[DATE]
2025-01-14 23:45:27+08:00
[CATEGORIES]
cs.LG
Data-driven system identification using quadratic embeddings of nonlinear dynamics
[AUTHORS]
Stefan Klus, Joel-Pascal N’Konzi
[ABSTRACT]
We propose a novel data-driven method called QENDy (Quadratic Embedding of
Nonlinear Dynamics) that not only allows us to learn quadratic representations
of highly nonlinear dynamical systems, but also to identify the governing
equations. The approach is based on an embedding of the system into a
higher-dimensional feature space in which the dynamics become quadratic. Just
like SINDy (Sparse Identification of Nonlinear Dynamics), our method requires
trajectory data, time derivatives for the training data points, which can also
be estimated using finite difference approximations, and a set of preselected
basis functions, called a dictionary. We illustrate the efficacy and accuracy of
QENDy with the aid of various benchmark problems and compare its performance
with SINDy and a deep learning method for identifying quadratic embeddings.
Furthermore, we analyze the convergence of QENDy and SINDy in the infinite data
limit, highlight their similarities and main differences, and compare the
quadratic embedding with linearization techniques based on the Koopman
operator.
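
The three ingredients named above (trajectory data, finite-difference
derivative estimates, and a dictionary of basis functions) can be illustrated
with a small regression example. This sketch uses plain least squares on a toy
one-dimensional system; it is neither QENDy nor the sparse regression used by
SINDy, and the system and dictionary are assumptions chosen for illustration.

```python
import numpy as np

# Trajectory of the toy system dx/dt = -2x with x(0) = 1.
t = np.linspace(0.0, 1.0, 2001)
x = np.exp(-2.0 * t)

# Time derivatives estimated by finite differences.
dx = np.gradient(x, t, edge_order=2)

# Dictionary of preselected basis functions evaluated on the data: 1, x, x^2.
theta = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares fit of dx ~ theta @ coeffs; the coefficient on x should be
# close to -2 and the other two close to 0.
coeffs, *_ = np.linalg.lstsq(theta, dx, rcond=None)
```

The fitted coefficients directly give the governing equation in terms of the
dictionary, which is what makes this family of methods interpretable.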
[LINK]
http://arxiv.org/abs/2501.08202v1
[DATE]
2025-01-14 23:37:03+08:00
[CATEGORIES]
cs.LG
Globally Convergent Variational Inference
[AUTHORS]
Declan McNamara, Jackson Loper, Jeffrey Regier
[ABSTRACT]
In variational inference (VI), an approximation of the posterior distribution
is selected from a family of distributions through numerical optimization. With
the most common variational objective function, known as the evidence lower
bound (ELBO), only convergence to a local optimum can be guaranteed. In this
work, we instead establish the global convergence of a particular VI method.
This VI method, which may be considered an instance of neural posterior
estimation (NPE), minimizes an expectation of the inclusive (forward) KL
divergence to fit a variational distribution that is parameterized by a neural
network. Our convergence result relies on the neural tangent kernel (NTK) to
characterize the gradient dynamics that arise from considering the variational
objective in function space. In the asymptotic regime of a fixed,
positive-definite neural tangent kernel, we establish conditions under which
the variational objective admits a unique solution in a reproducing kernel
Hilbert space (RKHS). Then, we show that the gradient descent dynamics in
function space converge to this unique function. In ablation studies and
practical problems, we demonstrate that our results explain the behavior of NPE
in non-asymptotic finite-neuron settings, and show that NPE outperforms
ELBO-based optimization, which often converges to shallow local optima.
[COMMENTS]
Accepted to the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2501.08201v1
[DATE]
2025-01-14 23:36:32+08:00
[CATEGORIES]
cs.LG
Self-supervised Deep Hyperspectral Inpainting with the Plug and Play and Deep Image Prior Models
[AUTHORS]
Shuo Li, Mehrdad Yaghoobi
[ABSTRACT]
Hyperspectral images are typically composed of hundreds of narrow and
contiguous spectral bands, each containing information regarding the material
composition of the imaged scene. However, these images can be affected by
various sources of noise, distortions, or data loss, which can significantly
degrade their quality and usefulness. This paper introduces an algorithm with
guaranteed convergence, LRS-PnP-DIP(1-Lip), which successfully addresses the
instability issue of DHP that has been reported before. The proposed algorithm
extends the successful joint low-rank and sparse model to further exploit the
underlying data structures beyond the conventional and sometimes restrictive
unions of subspace models. A stability analysis guarantees the convergence of
the proposed algorithm under mild assumptions, which is crucial for its
application in real-world scenarios. Extensive experiments demonstrate that the
proposed solution consistently delivers visually and quantitatively superior
inpainting results, establishing state-of-the-art performance.
[COMMENTS]
31 pages, 9 Figures, 7 Tables. arXiv admin note: text overlap with
arXiv:2306.08128
[LINK]
http://arxiv.org/abs/2501.08195v1
[DATE]
2025-01-14 23:18:28+08:00
[CATEGORIES]
cs.LG
Modeling Quantum Machine Learning for Genomic Data Analysis
[AUTHORS]
Navneet Singh, Shiva Raj Pokhrel
[ABSTRACT]
Quantum Machine Learning (QML) continues to evolve, unlocking new
opportunities for diverse applications. In this study, we investigate and
evaluate the applicability of QML models for binary classification of genome
sequence data by employing various feature mapping techniques. We present an
open-source, independent Qiskit-based implementation to conduct experiments on
a benchmark genomic dataset. Our simulations reveal that the interplay between
feature mapping techniques and QML algorithms significantly influences
performance. Notably, the Pegasos Quantum Support Vector Classifier
(Pegasos-QSVC) exhibits high sensitivity, particularly excelling in recall
metrics, while Quantum Neural Networks (QNN) achieve the highest training
accuracy across all feature maps. However, the pronounced variability in
classifier performance, dependent on feature mapping, highlights the risk of
overfitting to localized output distributions in certain scenarios. This work
underscores the transformative potential of QML for genomic data classification
while emphasizing the need for continued advancements to enhance the robustness
and accuracy of these methodologies.
[LINK]
http://arxiv.org/abs/2501.08193v1
[DATE]
2025-01-14 23:14:26+08:00
[CATEGORIES]
cs.LG
A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation
[AUTHORS]
Steven Landgraf, Rongjun Qin, Markus Ulrich
[ABSTRACT]
While recent foundation models have enabled significant breakthroughs in
monocular depth estimation, a clear path towards safe and reliable deployment
in the real world remains elusive. Metric depth estimation, which involves
predicting absolute distances, poses particular challenges, as even the most
advanced foundation models remain prone to critical errors. Since quantifying
the uncertainty has emerged as a promising endeavor to address these
limitations and enable trustworthy deployment, we fuse five different
uncertainty quantification methods with the current state-of-the-art
DepthAnythingV2 foundation model. To cover a wide range of metric depth
domains, we evaluate their performance on four diverse datasets. Our findings
identify fine-tuning with the Gaussian Negative Log-Likelihood Loss (GNLL) as a
particularly promising approach, offering reliable uncertainty estimates while
maintaining predictive performance and computational efficiency on par with the
baseline, encompassing both training and inference time. By fusing uncertainty
quantification and foundation models within the context of monocular depth
estimation, this paper lays a critical foundation for future research aimed at
improving not only model performance but also its explainability. Extending
this critical synthesis of uncertainty quantification and foundation models
into other crucial tasks, such as semantic segmentation and pose estimation,
presents exciting opportunities for safer and more reliable machine vision
systems.
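
For readers unfamiliar with the Gaussian Negative Log-Likelihood loss named above, the following is a minimal sketch of its standard per-sample form for a predicted mean and variance. The function names and the `eps` clamp are illustrative assumptions, not details taken from the paper.

```python
import math


def gaussian_nll(target, mean, variance, eps=1e-6):
    """Standard Gaussian negative log-likelihood for one scalar prediction.

    The network is assumed to output both a mean and a variance; the loss
    penalises large errors but lets the model lower that penalty by
    predicting a larger variance (at the cost of the log-variance term).
    """
    var = max(variance, eps)  # assumed clamp for numerical stability
    return 0.5 * (math.log(2.0 * math.pi * var) + (target - mean) ** 2 / var)


def mean_gaussian_nll(targets, means, variances):
    """Average the per-sample loss, e.g. over all pixels of a depth map."""
    losses = [gaussian_nll(t, m, v) for t, m, v in zip(targets, means, variances)]
    return sum(losses) / len(losses)
```

For a fixed prediction error, a larger predicted variance can reduce the loss, which is why the predicted variance can be read as an uncertainty estimate.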
[LINK]
http://arxiv.org/abs/2501.08188v1
[DATE]
2025-01-14 23:13:00+08:00
[CATEGORIES]
cs.LG
D$^2$-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models
[AUTHORS]
Qian Zeng, Jie Song, Han Zheng, Hao Jiang, Mingli Song
[ABSTRACT]
Diffusion models have achieved cutting-edge performance in image generation.
However, their lengthy denoising process and computationally intensive score
estimation network impede their scalability in low-latency and
resource-constrained scenarios. Post-training quantization (PTQ) compresses and
accelerates diffusion models without retraining, but it inevitably introduces
additional quantization noise, resulting in mean and variance deviations. In
this work, we propose D2-DPM, a dual denoising mechanism aimed at precisely
mitigating the adverse effects of quantization noise on the noise estimation
network. Specifically, we first unravel the impact of quantization noise on the
sampling equation into two components: the mean deviation and the variance
deviation. The mean deviation alters the drift coefficient of the sampling
equation, influencing the trajectory trend, while the variance deviation
magnifies the diffusion coefficient, impacting the convergence of the sampling
trajectory. The proposed D2-DPM is thus devised to denoise the quantization
noise at each time step, and then denoise the noisy sample through the inverse
diffusion iterations. Experimental results demonstrate that D2-DPM achieves
superior generation quality, yielding a 1.42 lower FID than the full-precision
model while achieving 3.99x compression and 11.67x bit-operation acceleration.
[COMMENTS]
9 pages, 4 figures, accepted by AAAI 2025
[LINK]
http://arxiv.org/abs/2501.08180v1
[DATE]
2025-01-14 23:03:53+08:00
[CATEGORIES]
cs.LG
ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection
[AUTHORS]
Jui-Che Chiang, Hou-Ning Hu, Bo-Syuan Hou, Chia-Yu Tseng, Yu-Lun Liu, Min-Hung Chen, Yen-Yu Lin
[ABSTRACT]
Although facial landmark detection (FLD) has gained significant progress,
existing FLD methods still suffer from performance drops on partially
non-visible faces, such as faces with occlusions or under extreme lighting
conditions or poses. To address this issue, we introduce ORFormer, a novel
transformer-based method that can detect non-visible regions and recover their
missing features from visible parts. Specifically, ORFormer associates each
image patch token with one additional learnable token called the messenger
token. The messenger token aggregates features from all but its patch. This
way, the consensus between a patch and other patches can be assessed by
referring to the similarity between its regular and messenger embeddings,
enabling non-visible region identification. Our method then recovers occluded
patches with features aggregated by the messenger tokens. Leveraging the
recovered features, ORFormer compiles high-quality heatmaps for the downstream
FLD task. Extensive experiments show that our method generates heatmaps
resilient to partial occlusions. By integrating the resultant heatmaps into
existing FLD methods, our method performs favorably against the state of the
art on challenging datasets such as WFLW and COFW.
[COMMENTS]
WACV 2025. Project link: https://ben0919.github.io/ORFormer/
[LINK]
http://arxiv.org/abs/2412.13174v2
[DATE]
2025-01-14 22:48:32+08:00
[CATEGORIES]
cs.LG
FairTTTS: A Tree Test Time Simulation Method for Fairness-Aware Classification
[AUTHORS]
Nurit Cohen-Inger, Lior Rokach, Bracha Shapira, Seffi Cohen
[ABSTRACT]
Algorithmic decision-making has become deeply ingrained in many domains, yet
biases in machine learning models can still produce discriminatory outcomes,
often harming unprivileged groups. Achieving fair classification is inherently
challenging, requiring a careful balance between predictive performance and
ethical considerations. We present FairTTTS, a novel post-processing bias
mitigation method inspired by the Tree Test Time Simulation (TTTS) method.
Originally developed to enhance accuracy and robustness against adversarial
inputs through probabilistic decision-path adjustments, TTTS serves as the
foundation for FairTTTS. By building on this accuracy-enhancing technique,
FairTTTS mitigates bias and improves predictive performance. FairTTTS uses a
distance-based heuristic to adjust decisions at protected attribute nodes,
ensuring fairness for unprivileged samples. This fairness-oriented adjustment
occurs as a post-processing step, allowing FairTTTS to be applied to
pre-trained models, diverse datasets, and various fairness metrics without
retraining. Extensive evaluation on seven benchmark datasets shows that
FairTTTS outperforms traditional methods in fairness improvement, achieving a
20.96% average increase over the baseline compared to 18.78% for related work,
and further enhances accuracy by 0.55%. In contrast, competing methods
typically reduce accuracy by 0.42%. These results confirm that FairTTTS
effectively promotes more equitable decision-making while simultaneously
improving predictive performance.
[LINK]
http://arxiv.org/abs/2501.08155v1
[DATE]
2025-01-14 22:29:36+08:00
[CATEGORIES]
cs.LG
Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data
[AUTHORS]
Phai Vu Dinh, Diep N. Nguyen, Dinh Thai Hoang, Quang Uy Nguyen, Eryk Dutkiewicz
[ABSTRACT]
Anomaly detection (AD) plays a pivotal role in AI applications, e.g., in
classification, and intrusion/threat detection in cybersecurity. However, most
existing methods face challenges of heterogeneity amongst feature subsets posed
by non-independent and identically distributed (non-IID) data. We propose a
novel neural network model called Multiple-Input Auto-Encoder for AD (MIAEAD)
to address this. MIAEAD assigns an anomaly score to each feature subset of a
data sample to indicate its likelihood of being an anomaly. This is done by
using the reconstruction error of its sub-encoder as the anomaly score. All
sub-encoders are then simultaneously trained using unsupervised learning to
determine the anomaly scores of feature subsets. The final AUC of MIAEAD is
calculated for each sub-dataset, and the maximum AUC obtained among the
sub-datasets is selected. To leverage the modelling of the distribution of
normal data to identify anomalies of the generative models, we develop a novel
neural network architecture/model called Multiple-Input Variational
Auto-Encoder (MIVAE). MIVAE can process feature subsets through its
sub-encoders before learning distribution of normal data in the latent space.
This allows MIVAE to identify anomalies that deviate from the learned
distribution. We theoretically prove that the difference in the average anomaly
score between normal samples and anomalies obtained by the proposed MIVAE is
greater than that of the Variational Auto-Encoder (VAEAD), resulting in a
higher AUC for MIVAE. Extensive experiments on eight real-world anomaly
datasets demonstrate the superior performance of MIAEAD and MIVAE over
conventional methods and the state-of-the-art unsupervised models, by up to 6%
in terms of AUC score. Alternatively, MIAEAD and MIVAE have a high AUC when
applied to feature subsets with low heterogeneity based on the coefficient of
variation (CV) score.
[COMMENTS]
16 pages
[LINK]
http://arxiv.org/abs/2501.08149v1
[DATE]
2025-01-14 22:25:10+08:00
[CATEGORIES]
cs.LG
WINE: Wavelet-Guided GAN Inversion and Editing for High-Fidelity Refinement
[AUTHORS]
Chaewon Kim, Seung-Jun Moon, Gyeong-Moon Park
[ABSTRACT]
Recent advanced GAN inversion models aim to convey high-fidelity information
from original images to generators through methods using generator tuning or
high-dimensional feature learning. Despite these efforts, accurately
reconstructing image-specific details remains a challenge due to the
inherent limitations both in terms of training and structural aspects, leading
to a bias towards low-frequency information. In this paper, we look into the
widely used pixel loss in GAN inversion, revealing its predominant focus on the
reconstruction of low-frequency features. We then propose WINE, a
Wavelet-guided GAN Inversion aNd Editing model, which transfers the
high-frequency information through wavelet coefficients via newly proposed
wavelet loss and wavelet fusion scheme. Notably, WINE is the first attempt to
interpret GAN inversion in the frequency domain. Our experimental results
showcase the precision of WINE in preserving high-frequency details and
enhancing image quality. Even in editing scenarios, WINE outperforms existing
state-of-the-art GAN inversion models with a fine balance between editability
and reconstruction quality.
[LINK]
http://arxiv.org/abs/2210.09655v2
[DATE]
2025-01-14 22:22:05+08:00
[CATEGORIES]
cs.LG
Bootstrapping Corner Cases: High-Resolution Inpainting for Safety Critical Detect and Avoid for Automated Flying
[AUTHORS]
Jonathan Lyhs, Lars Hinneburg, Michael Fischer, Florian Ölsner, Stefan Milz, Jeremy Tschirner, Patrick Mäder
[ABSTRACT]
Modern machine learning techniques have shown tremendous potential,
especially for object detection on camera images. For this reason, they are
also used to enable safety-critical automated processes such as autonomous
drone flights. We present a study on object detection for Detect and Avoid, a
safety-critical function for drones that detects air traffic during automated
flights for safety reasons. An ill-posed problem is the generation of good and
especially large data sets, since detection itself is the corner case. Most
models suffer from limited ground truth in raw data, e.g., recorded air traffic
or frontal flight with a small aircraft. It often leads to poor and critical
detection rates. We overcome this problem by using inpainting methods to
bootstrap the dataset such that it explicitly contains the corner cases of the
raw data. We provide an overview of inpainting methods and generative models
and present an example pipeline given a small annotated dataset. We validate
our method by generating a high-resolution dataset, which we make publicly
available, and presenting it to an independent object detector that was fully
trained on real data.
[LINK]
http://arxiv.org/abs/2501.08142v1
[DATE]
2025-01-14 22:21:48+08:00
[CATEGORIES]
cs.LG
EEG-ReMinD: Enhancing Neurodegenerative EEG Decoding through Self-Supervised State Reconstruction-Primed Riemannian Dynamics
[AUTHORS]
Zirui Wang, Zhenxi Song, Yi Guo, Yuxin Liu, Guoyang Xu, Min Zhang, Zhiguo Zhang
[ABSTRACT]
The development of EEG decoding algorithms confronts challenges such as data
sparsity, subject variability, and the need for precise annotations, all of
which are vital for advancing brain-computer interfaces and enhancing the
diagnosis of diseases. To address these issues, we propose a novel two-stage
approach named Self-Supervised State Reconstruction-Primed Riemannian Dynamics
(EEG-ReMinD), which mitigates reliance on supervised learning and integrates
inherent geometric features. This approach efficiently handles EEG data
corruptions and reduces the dependency on labels. EEG-ReMinD utilizes
self-supervised and geometric learning techniques, along with an attention
mechanism, to analyze the temporal dynamics of EEG features within the
framework of Riemannian geometry, referred to as Riemannian dynamics.
Comparative analyses on both intact and corrupted datasets from two different
neurodegenerative disorders underscore the enhanced performance of EEG-ReMinD.
[LINK]
http://arxiv.org/abs/2501.08139v1
[DATE]
2025-01-14 22:19:40+08:00
[CATEGORIES]
cs.LG
An Empirical Wall-Pressure Spectrum Model for Aeroacoustic Predictions Based on Symbolic Regression
[AUTHORS]
Laura Botero Bolívar, David Huergo, Fernanda L. dos Santos, Cornelis H. Venner, Leandro D. de Santana, Esteban Ferrer
[ABSTRACT]
Fast-turnaround methods to predict airfoil trailing-edge noise are crucial
for incorporating noise limitations into design optimization loops of several
applications. Among these aeroacoustic predictive models, Amiet’s theory offers
the best balance between accuracy and simplicity. The accuracy of the model
relies heavily on precise wall-pressure spectrum predictions, which are often
based on single-equation formulations with adjustable parameters. These
parameters are calibrated for particular airfoils and flow conditions and
consequently tend to fail when applied outside their calibration range. This
paper introduces a new wall-pressure spectrum empirical model designed to
enhance the robustness and accuracy of current state-of-the-art predictions
while widening the range of applicability of the model to different airfoils
and flow conditions. The model is developed using AI-based symbolic regression
via a genetic-algorithm-based approach, and applied to a dataset of
wall-pressure fluctuations measured on NACA 0008 and NACA 63018 airfoils at
multiple angles of attack and inflow velocities, covering turbulent boundary
layers with both adverse and favorable pressure gradients. Validation against
experimental data (outside the training dataset) demonstrates the robustness of
the model compared to well-accepted semi-empirical models. Finally, the model
is integrated with Amiet’s theory to predict the aeroacoustic noise of a
full-scale wind turbine, showing good agreement with experimental measurements.
[LINK]
http://arxiv.org/abs/2501.08134v1
[DATE]
2025-01-14 22:14:22+08:00
[CATEGORIES]
cs.LG
Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression
[AUTHORS]
Abhinav Das, Stephan Schlüter, Lorenz Schneider
[ABSTRACT]
This paper presents a new hybrid model for predicting German electricity
prices. The algorithm is based on combining Gaussian Process Regression (GPR)
and Support Vector Regression (SVR). While GPR is a competent model for
learning the stochastic pattern within the data and interpolation, its
performance for out-of-sample data is not very promising. By choosing a
suitable data-dependent covariance function, we can enhance the performance of
GPR for the tested German hourly power prices. However, since the out-of-sample
prediction depends on the training data, the prediction is vulnerable to noise
and outliers. To overcome this issue, a separate prediction is made using SVR,
which applies margin-based optimization, having an advantage in dealing with
non-linear processes and outliers, since only certain necessary points (support
vectors) in the training data are responsible for regression. Both individual
predictions are later combined using the performance-based weight assignment
method. A test on historic German power prices shows that this approach
outperforms its chosen benchmarks such as the autoregressive exogenous model,
the naive approach, as well as the long short-term memory approach of
prediction.
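
The abstract does not spell out the performance-based weight assignment. As a rough sketch of one common choice, the example below weights each model inversely to its validation error and blends the forecasts. The inverse-error rule, the function names, and the `eps` guard are all assumptions for illustration and may differ from the paper's scheme.

```python
def inverse_error_weights(errors, eps=1e-12):
    """Turn per-model validation errors into weights that sum to one.

    Assumption: a model with lower error gets a proportionally larger weight.
    """
    inverse = [1.0 / (e + eps) for e in errors]  # eps guards against zero error
    total = sum(inverse)
    return [w / total for w in inverse]


def combine_forecasts(forecasts, weights):
    """Blend several equally long forecast series with the given weights."""
    horizon = len(forecasts[0])
    return [
        sum(w * series[t] for w, series in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Hypothetical usage: the first model (e.g. GPR) has validation error 1.0,
# the second (e.g. SVR) has validation error 3.0.
weights = inverse_error_weights([1.0, 3.0])
blended = combine_forecasts([[10.0, 20.0], [14.0, 24.0]], weights)
```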
[LINK]
http://arxiv.org/abs/2412.00123v3
[DATE]
2025-01-14 22:01:36+08:00
[CATEGORIES]
cs.LG
Set-based Neural Network Encoding Without Weight Tying
[AUTHORS]
Bruno Andreis, Soro Bedionita, Philip H. S. Torr, Sung Ju Hwang
[ABSTRACT]
We propose a neural network weight encoding method for network property
prediction that utilizes set-to-set and set-to-vector functions to efficiently
encode neural network parameters. Our approach is capable of encoding neural
networks in a model zoo of mixed architecture and different parameter sizes as
opposed to previous approaches that require custom encoding models for
different architectures. Furthermore, our Set-based Neural
network Encoder (SNE) takes into consideration the hierarchical
computational structure of neural networks. To respect symmetries inherent in
network weight space, we utilize Logit Invariance to learn the required minimal
invariance properties. Additionally, we introduce a pad-chunk-encode
pipeline to efficiently encode neural network layers that is adjustable to
computational and memory constraints. We also introduce two new tasks for
neural network property prediction: cross-dataset and cross-architecture. In
cross-dataset property prediction, we evaluate how well property predictors
generalize across model zoos trained on different datasets but of the same
architecture. In cross-architecture property prediction, we evaluate how well
property predictors transfer to model zoos of different architecture not seen
during training. We show that SNE outperforms the relevant baselines on
standard benchmarks.
[COMMENTS]
23 pages
[LINK]
http://arxiv.org/abs/2305.16625v3
[DATE]
2025-01-14 21:48:49+08:00
[CATEGORIES]
cs.LG
Approximation Rates in Fréchet Metrics: Barron Spaces, Paley-Wiener Spaces, and Fourier Multipliers
[AUTHORS]
Ahmed Abdeljawad, Thomas Dittrich
[ABSTRACT]
Operator learning is a recent development in the simulation of Partial
Differential Equations (PDEs) by means of neural networks. The idea behind this
approach is to learn the behavior of an operator, such that the resulting
neural network is an (approximate) mapping in infinite-dimensional spaces that
is capable of (approximately) simulating the solution operator governed by the
PDE. In our work, we study some general approximation capabilities for linear
differential operators by approximating the corresponding symbol in the Fourier
domain. Analogous to the structure of the class of Hörmander symbols, we
consider the approximation with respect to a topology that is induced by a
sequence of semi-norms. In that sense, we measure the approximation error in
terms of a Fréchet metric, and our main result identifies sufficient
conditions for achieving a predefined approximation error. Secondly, we then
focus on a natural extension of our main theorem, in which we manage to reduce
the assumptions on the sequence of semi-norms. Based on existing approximation
results for the exponential spectral Barron space, we then present a concrete
example of symbols that can be approximated well.
[COMMENTS]
Minor revision
[LINK]
http://arxiv.org/abs/2501.04023v2
[DATE]
2025-01-14 21:40:35+08:00
[CATEGORIES]
cs.LG
Smooth Handovers via Smoothed Online Learning
[AUTHORS]
Michail Kalntis, Andra Lutu, Jesús Omaña Iglesias, Fernando A. Kuipers, George Iosifidis
[ABSTRACT]
With users demanding seamless connectivity, handovers (HOs) have become a
fundamental element of cellular networks. However, optimizing HOs is a
challenging problem, further exacerbated by the growing complexity of mobile
networks. This paper presents the first countrywide study of HO optimization,
through the prism of Smoothed Online Learning (SOL). We first analyze an
extensive dataset from a commercial mobile network operator (MNO) in Europe
with more than 40M users, to understand and reveal important features and
performance impacts on HOs. Our findings highlight a correlation between HO
failures/delays, and the characteristics of radio cells and end-user devices,
showcasing the impact of heterogeneity in mobile networks nowadays. We
subsequently model UE-cell associations as dynamic decisions and propose a
realistic system model for smooth and accurate HOs that extends existing
approaches by (i) incorporating device and cell features on HO optimization,
and (ii) eliminating (prior) strong assumptions about requiring future signal
measurements and knowledge of end-user mobility. Our algorithm, aligned with
the O-RAN paradigm, provides robust dynamic regret guarantees, even in
challenging environments, and shows superior performance in multiple scenarios
with real-world and synthetic data.
[LINK]
http://arxiv.org/abs/2501.08099v1
[DATE]
2025-01-14 21:16:33+08:00
[CATEGORIES]
cs.LG
Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning
[AUTHORS]
Yan Fan, Yu Wang, Pengfei Zhu, Qinghua Hu
[ABSTRACT]
Continual learning (CL) has shown promising results and comparable
performance to learning at once in a fully supervised manner. However, CL
strategies typically require a large number of labeled samples, making their
real-life deployment challenging. In this work, we focus on semi-supervised
continual learning (SSCL), where the model progressively learns from partially
labeled data with unknown categories. We provide a comprehensive analysis of
SSCL and demonstrate that unreliable distributions of unlabeled data lead to
unstable training and refinement of the progressing stages. This problem
severely impacts the performance of SSCL. To address the limitations, we
propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for
semi-supervised continual learning, which leverages both semantic and
structural information to achieve more stable knowledge distillation on
unlabeled data and exhibit robustness against distribution bias. Firstly, we
formalize a general model of structural distillation and design a dynamic graph
construction for the continual learning progress. Next, we define a structure
distillation vector and design a dynamic sub-graph distillation algorithm,
which enables end-to-end training and adaptability to scale up tasks. The
entire proposed method is adaptable to various CL methods and supervision
settings. Finally, experiments conducted on three datasets CIFAR10, CIFAR100,
and ImageNet-100, with varying supervision ratios, demonstrate the
effectiveness of our proposed approach in mitigating the catastrophic
forgetting problem in semi-supervised continual learning scenarios.
[LINK]
http://arxiv.org/abs/2312.16409v2
[DATE]
2025-01-14 21:14:00+08:00
[CATEGORIES]
cs.LG
Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations
[AUTHORS]
Julius Aka, Johannes Brunnemann, Jörg Eiden, Arne Speerforck, Lars Mikelsons
[ABSTRACT]
Variational Autoencoders (VAEs) are a powerful framework for learning latent
representations of reduced dimensionality, while Neural ODEs excel in learning
transient system dynamics. This work combines the strengths of both to generate
fast surrogate models with adjustable complexity reacting to time-varying
input signals. By leveraging the VAE’s dimensionality reduction using a
nonhierarchical prior, our method adaptively assigns stochastic noise,
naturally complementing known NeuralODE training enhancements and enabling
probabilistic time series modeling. We show that standard Latent ODEs struggle
with dimensionality reduction in systems with time-varying inputs. Our approach
mitigates this by continuously propagating variational parameters through time,
establishing fixed information channels in latent space. This results in a
flexible and robust method that can learn different system complexities, e.g.
deep neural networks or linear matrices. Hereby, it enables efficient
approximation of the Koopman operator without the need for predefining its
dimensionality. As our method balances dimensionality reduction and
reconstruction accuracy, we call it Balanced Neural ODE (B-NODE). We
demonstrate the effectiveness of this method on several academic and
real-world test cases, e.g. a power plant or MuJoCo data.
[COMMENTS]
Conference paper under review, after revision
[LINK]
http://arxiv.org/abs/2410.10174v3
[DATE]
2025-01-14 21:11:05+08:00
[CATEGORIES]
cs.LG
Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving
[AUTHORS]
Guizhe Jin, Zhuoren Li, Bo Leng, Wei Han, Lu Xiong, Chen Sun
[ABSTRACT]
Reinforcement Learning (RL) has shown excellent performance in solving
decision-making and control problems of autonomous driving, which is
increasingly applied in diverse driving scenarios. However, driving is a
multi-attribute problem, leading to challenges in achieving multi-objective
compatibility for current RL methods, especially in both policy execution and
policy iteration. On the one hand, the common action space structure with
a single action type limits driving flexibility or results in large behavior
fluctuations during policy execution. On the other hand, the multi-attribute
weighted single reward function results in the agent’s disproportionate
attention to certain objectives during policy iterations. To this end, we
propose a Multi-objective Ensemble-Critic reinforcement learning method with
Hybrid Parametrized Action for multi-objective compatible autonomous driving.
Specifically, a parameterized action space is constructed to generate hybrid
driving actions, combining both abstract guidance and concrete control
commands. A multi-objective critics architecture is constructed considering
multiple attribute rewards, to ensure simultaneously focusing on different
driving objectives. Additionally, an uncertainty-based exploration strategy is
introduced to help the agent approach a viable driving policy faster. The
experimental results in both the simulated traffic environment and the HighD
dataset demonstrate that our method can achieve multi-objective compatible
autonomous driving in terms of driving efficiency, action consistency, and
safety. It enhances the general performance of the driving while significantly
increasing training efficiency.
[COMMENTS]
12 pages, 9 figures, 5 tables
[LINK]
http://arxiv.org/abs/2501.08096v1
[DATE]
2025-01-14 21:10:13+08:00
[CATEGORIES]
cs.LG
Optimal Policy Adaptation under Covariate Shift
[AUTHORS]
Xueqing Liu, Qinwei Yang, Zhaoqing Tian, Ruocheng Guo, Peng Wu
[ABSTRACT]
Transfer learning of prediction models has been extensively studied, while
the corresponding policy learning approaches are rarely discussed. In this
paper, we propose principled approaches for learning the optimal policy in the
target domain by leveraging two datasets: one with full information from the
source domain and the other from the target domain with only covariates. First,
under the setting of covariate shift, we formulate the problem from a
perspective of causality and present the identifiability assumptions for the
reward induced by a given policy. Then, we derive the efficient influence
function and the semiparametric efficiency bound for the reward. Based on this,
we construct a doubly robust and semiparametric efficient estimator for the
reward and then learn the optimal policy by optimizing the estimated reward.
Moreover, we theoretically analyze the bias and the generalization error bound
for the learned policy. Furthermore, in the presence of both covariate and
concept shifts, we propose a novel sensitivity analysis method to evaluate the
robustness of the proposed policy learning approach. Extensive experiments
demonstrate that the approach not only estimates the reward more accurately but
also yields a policy that closely approximates the theoretically optimal
policy.
[LINK]
http://arxiv.org/abs/2501.08067v1
[DATE]
2025-01-14 20:33:02+08:00
[CATEGORIES]
cs.LG
ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection
[AUTHORS]
Delyan Boychev, Radostin Cholakov
[ABSTRACT]
Recent generative models produce images with a level of authenticity that
makes them nearly indistinguishable from real photos and artwork. Potentially
harmful use cases of these models necessitate the creation of robust synthetic
image detectors. However, current datasets in the field contain generated
images with questionable quality or have examples from one predominant content
type, which leads to poor generalizability of the underlying detectors. We find
that the curation of a balanced amount of high-resolution generated images
across various content types is crucial for the generalizability of detectors,
and introduce ImagiNet, a dataset of 200K examples, spanning four categories:
photos, paintings, faces, and miscellaneous. Synthetic images in ImagiNet are
produced with both open-source and proprietary generators, whereas real
counterparts for each content type are collected from public datasets. The
structure of ImagiNet allows for a two-track evaluation system: i)
classification as real or synthetic and ii) identification of the generative
model. To establish a strong baseline, we train a ResNet-50 model using a
self-supervised contrastive objective (SelfCon) for each track which achieves
evaluation AUC of up to 0.99 and balanced accuracy ranging from 86% to 95%,
even under conditions that involve compression and resizing. The provided model
is generalizable enough to achieve zero-shot state-of-the-art performance on
previous synthetic detection benchmarks. We provide ablations to demonstrate
the importance of content types and publish code and data.
[COMMENTS]
Workshop on Datasets and Evaluators of AI Safety, AAAI 2025
[LINK]
http://arxiv.org/abs/2407.20020v3
[DATE]
2025-01-14 20:31:48+08:00
[CATEGORIES]
cs.LG
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
[AUTHORS]
Zixuan Wang, Chi-Keung Tang, Yu-Wing Tai
[ABSTRACT]
We introduce Audio-Agent, a multimodal framework for audio generation,
editing and composition based on text or video inputs. Conventional approaches
for text-to-audio (TTA) tasks often make single-pass inferences from text
descriptions. While straightforward, this design struggles to produce
high-quality audio when given complex text conditions. In our method, we
utilize a pre-trained TTA diffusion network as the audio generation agent to
work in tandem with GPT-4, which decomposes the text condition into atomic,
specific instructions and calls the agent for audio generation. In doing so,
Audio-Agent can generate high-quality audio that is closely aligned with the
provided text or video exhibiting complex and multiple events, while supporting
variable-length and variable-volume generation. For video-to-audio (VTA) tasks,
most existing methods require training a timestamp detector to synchronize
video events with the generated audio, a process that can be tedious and
time-consuming. Instead, we propose a simpler approach by fine-tuning a
pre-trained Large Language Model (LLM), e.g., Gemma2-2B-it, to obtain both
semantic and temporal conditions that bridge the video and audio modality.
Consequently, our framework contributes a comprehensive solution for both TTA
and VTA tasks without substantial computational overhead in training.
[LINK]
http://arxiv.org/abs/2410.03335v2
[DATE]
2025-01-14 19:59:03+08:00
[CATEGORIES]
cs.LG
On the use of Statistical Learning Theory for model selection in Structural Health Monitoring
[AUTHORS]
C. A. Lindley, N. Dervilis, K. Worden
[ABSTRACT]
Whenever data-based systems are employed in engineering applications,
defining an optimal statistical representation is subject to the problem of
model selection. This paper focusses on how well models can generalise in
Structural Health Monitoring (SHM). Although statistical model validation in
this field is often performed heuristically, it is possible to estimate
generalisation more rigorously using the bounds provided by Statistical
Learning Theory (SLT). Therefore, this paper explores the selection process of
a kernel smoother for modelling the impulse response of a linear oscillator
from the perspective of SLT. It is demonstrated that incorporating domain
knowledge into the regression problem yields a lower guaranteed risk, thereby
enhancing generalisation.
[LINK]
http://arxiv.org/abs/2501.08050v1
[DATE]
2025-01-14 19:56:05+08:00
[CATEGORIES]
cs.LG
UFGraphFR: An attempt at a federated recommendation system based on user text characteristics
[AUTHORS]
Xudong Wang
[ABSTRACT]
Federated learning has become an important research area in ‘private
computing’ due to the ‘useable invisibility’ of data during training. Inspired
by Federated learning, the federated recommendation system has gradually become
a new recommendation service architecture that can protect users’ privacy. The
use of user graphs to enhance federated recommendations is a promising research
topic. However, it is a great challenge to construct a user graph
without compromising privacy in a federated learning scenario. Inspired by the
simple idea that similar users often have the same attribute characteristics,
we propose a personalized federated recommendation algorithm based on the user
relationship graph constructed from user text characteristics (Graph
Federation Recommendation System based on User Text description Features,
UFGraphFR). The method uses the embedding-layer weights of the user’s text
feature description to construct the user relationship graph. It introduces a
Transformer to model the user’s historical interaction sequence. Because the
method needs no access to users’ interaction histories or specific attributes,
it preserves the ‘useable invisibility’ of data that federated learning
promises. Preliminary experiments on some benchmark datasets
demonstrate the superior performance of UFGraphFR. Our experiments show that
this model can protect user privacy to some extent without affecting the
performance of the recommendation system. The code will be available at
https://github.com/trueWangSyutung/UFGraphFR.
[LINK]
http://arxiv.org/abs/2501.08044v1
[DATE]
2025-01-14 19:52:16+08:00
[CATEGORIES]
cs.LG
PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning
[AUTHORS]
Marta Andronic, Jiawen Li, George A. Constantinides
[ABSTRACT]
Standard deep neural network inference involves the computation of
interleaved linear maps and nonlinear activation functions. Prior work for
ultra-low latency implementations has hardcoded these operations inside FPGA
lookup tables (LUTs). However, FPGA LUTs can implement a much greater variety
of functions. In this paper, we propose a novel approach to training DNNs for
FPGA deployment using multivariate polynomials as the basic building block. Our
method takes advantage of the flexibility offered by the soft logic, hiding the
polynomial evaluation inside the LUTs with minimal overhead. By using
polynomial building blocks, we achieve the same accuracy using considerably
fewer layers of soft logic than by using linear functions, leading to
significant latency and area improvements. LUT-based implementations also face
a significant challenge: the LUT size grows exponentially with the number of
inputs. Prior work relies on a priori fixed sparsity, with results heavily
dependent on seed selection. To address this, we propose a structured pruning
strategy using a bespoke hardware-aware group regularizer that encourages a
particular sparsity pattern that leads to a small number of inputs per neuron.
We demonstrate the effectiveness of PolyLUT on three tasks: network intrusion
detection, jet identification at the CERN Large Hadron Collider, and MNIST.
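
The exponential growth of LUT size noted above can be made concrete with a small sketch. It assumes a neuron whose `fan_in` inputs are each quantized to `bits` bits and whose whole input-output map is stored as one truth table; the function names and the example numbers are illustrative and are not taken from the paper. The table size depends only on the input bit count, whereas the number of monomials a polynomial of a given degree can use grows with the degree.

```python
from math import comb


def lut_entries(fan_in: int, bits: int) -> int:
    """Truth-table entries needed for `fan_in` inputs of `bits` bits each."""
    return 2 ** (fan_in * bits)


def monomial_count(fan_in: int, degree: int) -> int:
    """Number of monomials of total degree <= `degree` in `fan_in` variables."""
    return comb(fan_in + degree, degree)


if __name__ == "__main__":
    # Hypothetical settings: 2-bit inputs, degree-2 polynomials.
    for fan_in in (2, 4, 6):
        print(fan_in, lut_entries(fan_in, bits=2), monomial_count(fan_in, degree=2))
```

Under these assumptions, pruning a neuron from six inputs to four shrinks its table by a factor of 16, which is why a sparsity pattern with few inputs per neuron matters.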
[COMMENTS]
arXiv admin note: text overlap with arXiv:2309.02334
[LINK]
http://arxiv.org/abs/2501.08043v1
[DATE]
2025-01-14 19:51:57+08:00
[CATEGORIES]
cs.LG
Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks
[AUTHORS]
Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos
[ABSTRACT]
Recurrent neural networks (RNNs) are commonly trained with the truncated
backpropagation-through-time (TBPTT) algorithm. For the purposes of
computational tractability, the TBPTT algorithm truncates the chain rule and
calculates the gradient on a finite block of the overall data sequence. Such
approximation could lead to significant inaccuracies, as the block length for
the truncated backpropagation is typically limited to be much smaller than the
overall sequence length. In contrast, Real-time recurrent learning (RTRL) is an
online optimization algorithm which asymptotically follows the true gradient of
the loss on the data sequence as the number of sequence time steps $t
\rightarrow \infty$. RTRL forward propagates the derivatives of the RNN
hidden/memory units with respect to the parameters and, using the forward
derivatives, performs online updates of the parameters at each time step in the
data sequence. RTRL’s online forward propagation allows for exact optimization
over extremely long data sequences, although it can be computationally costly
for models with large numbers of parameters. We prove convergence of the RTRL
algorithm for a class of RNNs. The convergence analysis establishes a fixed
point for the joint distribution of the data sequence, RNN hidden layer, and
the RNN hidden layer forward derivatives as the number of data samples from the
sequence and the number of training steps tend to infinity. We prove
convergence of the RTRL algorithm to a stationary point of the loss. Numerical
studies illustrate our theoretical results. One potential application area for
RTRL is the analysis of financial data, which typically involve long time
series and models with small to medium numbers of parameters. This makes RTRL
computationally tractable and a potentially appealing optimization method for
training models. Thus, we include an example of RTRL applied to limit order
book data.
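
The forward propagation of derivatives described above can be sketched on a toy model. The example assumes a scalar RNN $h_t = \tanh(w h_{t-1} + u x_t)$ with squared-error loss at each step; the model, the function names, and the learning rate are illustrative assumptions and not the class of RNNs analyzed in the paper.

```python
import math


def rtrl_step(w, u, h_prev, pw_prev, pu_prev, x):
    """Advance the hidden state and its forward sensitivities by one step."""
    h = math.tanh(w * h_prev + u * x)
    d = 1.0 - h * h                    # tanh'(pre-activation)
    pw = d * (h_prev + w * pw_prev)    # dh_t/dw, carried forward in time
    pu = d * (x + w * pu_prev)         # dh_t/du, carried forward in time
    return h, pw, pu


def total_loss(w, u, xs, ys):
    """Sum over time of 0.5 * (h_t - y_t)^2, starting from h_0 = 0."""
    h, loss = 0.0, 0.0
    for x, y in zip(xs, ys):
        h = math.tanh(w * h + u * x)
        loss += 0.5 * (h - y) ** 2
    return loss


def rtrl_gradient(w, u, xs, ys):
    """Gradient of total_loss with the parameters held fixed."""
    h = pw = pu = 0.0
    gw = gu = 0.0
    for x, y in zip(xs, ys):
        h, pw, pu = rtrl_step(w, u, h, pw, pu, x)
        gw += (h - y) * pw
        gu += (h - y) * pu
    return gw, gu


def rtrl_train(w, u, xs, ys, lr=0.05):
    """Online variant: update the parameters at every time step."""
    h = pw = pu = 0.0
    for x, y in zip(xs, ys):
        h, pw, pu = rtrl_step(w, u, h, pw, pu, x)
        err = h - y
        w -= lr * err * pw
        u -= lr * err * pu
    return w, u
```

With fixed parameters, `rtrl_gradient` reproduces the full untruncated gradient without storing past states. In `rtrl_train` the sensitivities were accumulated under earlier parameter values, so each update follows the true gradient only approximately.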
[LINK]
http://arxiv.org/abs/2501.08040v1
[DATE]
2025-01-14 19:46:36+08:00
[CATEGORIES]
cs.LG
A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation
[AUTHORS]
Hugo Lebeau, Florent Chatelain, Romain Couillet
[ABSTRACT]
This work presents a comprehensive understanding of the estimation of a
planted low-rank signal from a general spiked tensor model near the
computational threshold. Relying on standard tools from the theory of large
random matrices, we characterize the large-dimensional spectral behavior of the
unfoldings of the data tensor and exhibit relevant signal-to-noise ratios
governing the detectability of the principal directions of the signal. These
results allow us to accurately predict the reconstruction performance of truncated
multilinear SVD (MLSVD) in the non-trivial regime. This is particularly
important since it serves as an initialization of the higher-order orthogonal
iteration (HOOI) scheme, whose convergence to the best low-multilinear-rank
approximation depends entirely on its initialization. We give a sufficient
condition for the convergence of HOOI and show that the number of iterations
before convergence tends to $1$ in the large-dimensional limit.
[LINK]
http://arxiv.org/abs/2402.03169v3
[DATE]
2025-01-14 19:32:56+08:00
[CATEGORIES]
cs.LG
Fast, Scale-Adaptive, and Uncertainty-Aware Downscaling of Earth System Model Fields with Generative Machine Learning
[AUTHORS]
Philipp Hess, Michael Aich, Baoxiang Pan, Niklas Boers
[ABSTRACT]
Accurate and high-resolution Earth system model (ESM) simulations are
essential to assess the ecological and socio-economic impacts of anthropogenic
climate change, but are computationally too expensive to be run at sufficiently
high spatial resolution. Recent machine learning approaches have shown
promising results in downscaling ESM simulations, outperforming
state-of-the-art statistical approaches. However, existing methods require
computationally costly retraining for each ESM and extrapolate poorly to
climates unseen during training. We address these shortcomings by learning a
consistency model (CM) that efficiently and accurately downscales arbitrary ESM
simulations without retraining in a zero-shot manner. Our approach yields
probabilistic downscaled fields at a resolution only limited by the
observational reference data. We show that the CM outperforms state-of-the-art
diffusion models at a fraction of computational cost while maintaining high
controllability on the downscaling task. Further, our method generalizes to
climate states unseen during training without explicitly formulated physical
constraints.
[LINK]
http://arxiv.org/abs/2403.02774v3
[DATE]
2025-01-14 19:14:57+08:00
[CATEGORIES]
cs.LG
Learning Symmetries via Weight-Sharing with Doubly Stochastic Tensors
[AUTHORS]
Putri A. van der Linden, Alejandro García-Castellanos, Sharvaree Vadgama, Thijs P. Kuipers, Erik J. Bekkers
[ABSTRACT]
Group equivariance has emerged as a valuable inductive bias in deep learning,
enhancing generalization, data efficiency, and robustness. Classically, group
equivariant methods require the groups of interest to be known beforehand,
which may not be realistic for real-world data. Additionally, baking in fixed
group equivariance may impose overly restrictive constraints on model
architecture. This highlights the need for methods that can dynamically
discover and apply symmetries as soft constraints. For neural network
architectures, equivariance is commonly achieved through group transformations
of a canonical weight tensor, resulting in weight sharing over a given group
$G$. In this work, we propose to learn such a weight-sharing scheme by defining
a collection of learnable doubly stochastic matrices that act as soft
permutation matrices on canonical weight tensors, which can take regular group
representations as a special case. This yields learnable kernel transformations
that are jointly optimized with downstream tasks. We show that when the dataset
exhibits strong symmetries, the permutation matrices will converge to regular
group representations and our weight-sharing networks effectively become
regular group convolutions. Additionally, the flexibility of the method enables
it to effectively pick up on partial symmetries.
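
One common way to obtain a doubly stochastic matrix from unconstrained parameters is Sinkhorn normalization: exponentiate the parameters, then alternately normalize rows and columns. The abstract does not state how the matrices are parameterized, so the sketch below is an assumption, and `sinkhorn` and `apply_soft_permutation` are hypothetical names.

```python
import math


def sinkhorn(logits, n_iters=100):
    """Return an approximately doubly stochastic matrix built from square `logits`."""
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(n_iters):
        # Make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def apply_soft_permutation(p, w):
    """Mix the entries of a canonical weight vector `w` with the matrix `p`."""
    return [sum(p[i][j] * w[j] for j in range(len(w))) for i in range(len(p))]


if __name__ == "__main__":
    logits = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.0], [-0.5, 0.2, 1.0]]
    p = sinkhorn(logits)
    print(apply_soft_permutation(p, [1.0, 2.0, 3.0]))
```

When one entry per row dominates, the result approaches a hard permutation matrix, which is the sense in which such matrices behave as soft permutations.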
[COMMENTS]
19 pages, 14 figures, 4 tables
[LINK]
http://arxiv.org/abs/2412.04594v2
[DATE]
2025-01-14 19:03:05+08:00
[CATEGORIES]
cs.LG
Unsupervised Feature Construction for Anomaly Detection in Time Series – An Evaluation
[AUTHORS]
Marine Hamon, Vincent Lemaire, Nour Eddine Yassine Nair-Benrekia, Samuel Berlemont, Julien Cumin
[ABSTRACT]
To detect anomalies with precision and without prior knowledge in time
series, is it better to build a detector from the initial temporal
representation, or to compute a new (tabular) representation using an existing
automatic variable construction library? In this article, we address this
question by conducting an in-depth experimental study for two popular detectors
(Isolation Forest and Local Outlier Factor). The obtained results, for 5
different datasets, show that the new representation, computed using the
tsfresh library, allows Isolation Forest to significantly improve its
performance.
[COMMENTS]
7
[LINK]
http://arxiv.org/abs/2501.07999v1
[DATE]
2025-01-14 18:41:46+08:00
[CATEGORIES]
cs.LG
Scalable and Resource-Efficient Second-Order Federated Learning via Over-the-Air Aggregation
[AUTHORS]
Abdulmomen Ghalkha, Chaouki Ben Issaid, Mehdi Bennis
[ABSTRACT]
Second-order federated learning (FL) algorithms offer faster convergence than
their first-order counterparts by leveraging curvature information. However,
they are hindered by high computational and storage costs, particularly for
large-scale models. Furthermore, the communication overhead associated with
large models and digital transmission exacerbates these challenges, causing
communication bottlenecks. In this work, we propose a scalable second-order FL
algorithm using a sparse Hessian estimate and leveraging over-the-air
aggregation, making it feasible for larger models. Our simulation results
demonstrate savings of more than $67\%$ in communication resources and energy
compared to other first- and second-order baselines.
[COMMENTS]
6 pages, 1 figure, 4 subfigures, letter
[LINK]
http://arxiv.org/abs/2410.07662v3
[DATE]
2025-01-14 18:41:34+08:00
[CATEGORIES]
cs.LG
Reward Compatibility: A Framework for Inverse RL
[AUTHORS]
Filippo Lazzati, Mirco Mutti, Alberto Metelli
[ABSTRACT]
We provide an original theoretical study of Inverse Reinforcement Learning
(IRL) through the lens of reward compatibility, a novel framework to quantify
the compatibility of a reward with the given expert’s demonstrations.
Intuitively, a reward is more compatible with the demonstrations the closer the
performance of the expert’s policy computed with that reward is to the optimal
performance for that reward. This generalizes the notion of feasible reward
set, the most common framework in the theoretical IRL literature, for which a
reward is either compatible or not compatible. The grayscale introduced by the
reward compatibility is the key to extending the realm of provably efficient IRL
far beyond what is attainable with the feasible reward set: from tabular to
large-scale MDPs. We analyze the IRL problem across various settings, including
optimal and suboptimal expert’s demonstrations and both online and offline data
collection. For all of these dimensions, we provide a tractable algorithm and
corresponding sample complexity analysis, as well as various insights on reward
compatibility and how the framework can pave the way to yet more general
problem settings.
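
The intuition above can be sketched as a suboptimality gap: under a candidate reward, compare the optimal performance with the performance of the expert's policy. The example assumes a small deterministic, discounted tabular MDP with a known expert policy; the paper's formal definition and settings may differ, and all names here are hypothetical.

```python
def policy_value(next_state, reward, policy, gamma=0.9, iters=500):
    """Iteratively evaluate a deterministic policy in a deterministic MDP."""
    n = len(next_state)
    v = [0.0] * n
    for _ in range(iters):
        v = [reward[s][policy[s]] + gamma * v[next_state[s][policy[s]]]
             for s in range(n)]
    return v


def optimal_value(next_state, reward, gamma=0.9, iters=500):
    """Value iteration for the same MDP."""
    n = len(next_state)
    v = [0.0] * n
    for _ in range(iters):
        v = [max(reward[s][a] + gamma * v[next_state[s][a]]
                 for a in range(len(next_state[s])))
             for s in range(n)]
    return v


def suboptimality(next_state, reward, expert_policy, start=0, gamma=0.9):
    """Optimal minus expert performance; a smaller gap means a more compatible reward."""
    v_star = optimal_value(next_state, reward, gamma)
    v_exp = policy_value(next_state, reward, expert_policy, gamma)
    return v_star[start] - v_exp[start]


if __name__ == "__main__":
    next_state = [[0, 1], [0, 1]]          # action a always leads to state a
    expert = [1, 1]                        # the expert always picks action 1
    likes_action_1 = [[0.0, 1.0], [0.0, 1.0]]
    likes_action_0 = [[1.0, 0.0], [1.0, 0.0]]
    print(suboptimality(next_state, likes_action_1, expert))
    print(suboptimality(next_state, likes_action_0, expert))
```

A feasible-set view would only report whether the gap is zero; keeping the gap as a number is what gives the graded notion of compatibility.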
[LINK]
http://arxiv.org/abs/2501.07996v1
[DATE]
2025-01-14 18:39:04+08:00
[CATEGORIES]
cs.LG
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
[AUTHORS]
Qishuai Wen, Chun-Guang Li
[ABSTRACT]
State-of-the-art methods for Transformer-based semantic segmentation
typically adopt Transformer decoders that are used to extract additional
embeddings from image embeddings via cross-attention, refine either or both
types of embeddings via self-attention, and project image embeddings onto the
additional embeddings via dot-product. Despite their remarkable success, these
empirical designs still lack theoretical justifications or interpretations,
thus hindering potentially principled improvements. In this paper, we argue
that there are fundamental connections between semantic segmentation and
compression, especially between the Transformer decoders and Principal
Component Analysis (PCA). From such a perspective, we derive a white-box, fully
attentional DEcoder for PrIncipled semantiC segmenTation (DEPICT), with the
interpretations as follows: 1) the self-attention operator refines image
embeddings to construct an ideal principal subspace that aligns with the
supervision and retains most information; 2) the cross-attention operator seeks
to find a low-rank approximation of the refined image embeddings, which is
expected to be a set of orthonormal bases of the principal subspace and
corresponds to the predefined classes; 3) the dot-product operation yields
compact representation for image embeddings as segmentation masks. Experiments
conducted on the ADE20K dataset show that DEPICT consistently outperforms its
black-box counterpart, Segmenter, while being lightweight and more robust.
[COMMENTS]
NeurIPS 2024. Code: https://github.com/QishuaiWen/DEPICT/
[LINK]
http://arxiv.org/abs/2411.03033v3
[DATE]
2025-01-14 18:34:00+08:00
[CATEGORIES]
cs.LG
GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
[AUTHORS]
Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma
[ABSTRACT]
Safe Reinforcement Learning (SRL) aims to realize a safe learning process for
Deep Reinforcement Learning (DRL) algorithms by incorporating safety
constraints. However, the efficacy of SRL approaches often relies on accurate
function approximations, which are notably challenging to achieve in the early
learning stages due to data insufficiency. To address this issue, we introduce
in this work a novel Generalizable Safety enhancer (GenSafe) that is able to
overcome the challenge of data insufficiency and enhance the performance of SRL
approaches. Leveraging model order reduction techniques, we first propose an
innovative method to construct a Reduced Order Markov Decision Process (ROMDP)
as a low-dimensional approximator of the original safety constraints. Then, by
solving the reformulated ROMDP-based constraints, GenSafe refines the actions
of the agent to increase the possibility of constraint satisfaction.
Essentially, GenSafe acts as an additional safety layer for SRL algorithms. We
evaluate GenSafe on multiple SRL approaches and benchmark problems. The results
demonstrate its capability to improve safety performance, especially in the
early learning phases, while maintaining satisfactory task performance. Our
proposed GenSafe not only offers a novel measure to augment existing SRL
methods but also shows broad compatibility with various SRL algorithms, making
it applicable to a wide range of systems and SRL problems.
[LINK]
http://arxiv.org/abs/2406.03912v2
[DATE]
2025-01-14 18:32:32+08:00
[CATEGORIES]
cs.LG
CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing
[AUTHORS]
Emma Cramer, Lukas Jäschke, Sebastian Trimpe
[ABSTRACT]
Robotic systems are increasingly employed for industrial automation, with
contact-rich tasks like polishing requiring dexterity and compliant behaviour.
These tasks are difficult to model, making classical control challenging. Deep
reinforcement learning (RL) offers a promising solution by enabling the
learning of models and control policies directly from data. However, its
application to real-world problems is limited by data inefficiency and unsafe
exploration. Adaptive hybrid RL methods blend classical control and RL
adaptively, combining the strengths of both: structure from control and
learning from RL. This has led to improvements in data efficiency and
exploration safety. However, their potential for hardware applications remains
underexplored, with no evaluations on physical systems to date. Such
evaluations are critical to fully assess the practicality and effectiveness of
these methods in real-world settings. This work presents an experimental
demonstration of the hybrid RL algorithm CHEQ for robotic polishing with
variable impedance, a task requiring precise force and velocity tracking. In
simulation, we show that variable impedance enhances polishing performance. We
compare standalone RL with adaptive hybrid RL, demonstrating that CHEQ achieves
effective learning while adhering to safety constraints. On hardware, CHEQ
achieves effective polishing behaviour, requiring only eight hours of training
and incurring just five failures. These results highlight the potential of
adaptive hybrid RL for real-world, contact-rich tasks trained directly on
hardware.
[LINK]
http://arxiv.org/abs/2501.07985v1
[DATE]
2025-01-14 18:13:41+08:00
[CATEGORIES]
cs.LG
Fair CoVariance Neural Networks
[AUTHORS]
Andrea Cavallo, Madeline Navarro, Santiago Segarra, Elvin Isufi
[ABSTRACT]
Covariance-based data processing is widespread across signal processing and
machine learning applications due to its ability to model data
interconnectivities and dependencies. However, harmful biases in the data may
become encoded in the sample covariance matrix and cause data-driven methods to
treat different subpopulations unfairly. Existing works such as fair principal
component analysis (PCA) mitigate these effects, but remain unstable in low
sample regimes, which in turn may jeopardize the fairness goal. To address both
biases and instability, we propose Fair coVariance Neural Networks (FVNNs),
which perform graph convolutions on the covariance matrix for both fair and
accurate predictions. Our FVNNs provide a flexible model compatible with
several existing bias mitigation techniques. In particular, FVNNs allow for
mitigating the bias in two ways: first, they operate on fair covariance
estimates that remove biases from their principal components; second, they are
trained in an end-to-end fashion via a fairness regularizer in the loss
function so that the model parameters are tailored to solve the task directly
in a fair manner. We prove that FVNNs are intrinsically fairer than analogous
PCA approaches thanks to their stability in low sample regimes. We validate the
robustness and fairness of our model on synthetic and real-world data,
showcasing the flexibility of FVNNs along with the tradeoff between fair and
accurate performance.
[LINK]
http://arxiv.org/abs/2409.08558v2
[DATE]
2025-01-14 18:02:39+08:00
[CATEGORIES]
cs.LG
Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
[AUTHORS]
Charles O’Neill
[ABSTRACT]
Self-attention mechanisms have revolutionised deep learning architectures,
yet their core mathematical structures remain incompletely understood. In this
work, we develop a category-theoretic framework focusing on the linear
components of self-attention. Specifically, we show that the query, key, and
value maps naturally define a parametric 1-morphism in the 2-category
$\mathbf{Para(Vect)}$. On the underlying 1-category $\mathbf{Vect}$, these maps
induce an endofunctor whose iterated composition precisely models multi-layer
attention. We further prove that stacking multiple self-attention layers
corresponds to constructing the free monad on this endofunctor. For positional
encodings, we demonstrate that strictly additive embeddings correspond to
monoid actions in an affine sense, while standard sinusoidal encodings, though
not additive, retain a universal property among injective (faithful)
position-preserving maps. We also establish that the linear portions of
self-attention exhibit natural equivariance to permutations of input tokens,
and show how the “circuits” identified in mechanistic interpretability can be
interpreted as compositions of parametric 1-morphisms. This categorical
perspective unifies geometric, algebraic, and interpretability-based approaches
to transformer analysis, making explicit the underlying structures of
attention. We restrict to linear maps throughout, deferring the treatment of
nonlinearities such as softmax and layer normalisation, which require more
advanced categorical constructions. Our results build on and extend recent work
on category-theoretic foundations for deep learning, offering deeper insights
into the algebraic structure of attention mechanisms.
[LINK]
http://arxiv.org/abs/2501.02931v2
[DATE]
2025-01-14 18:01:41+08:00
[CATEGORIES]
cs.LG
Derivation of Output Correlation Inferences for Multi-Output (aka Multi-Task) Gaussian Process
[AUTHORS]
Shuhei Watanabe
[ABSTRACT]
The Gaussian process (GP) is arguably one of the most widely used machine
learning algorithms in practice. One of its prominent applications is Bayesian
optimization (BO). Although the vanilla GP itself is already a powerful tool
for BO, it is often beneficial to be able to consider the dependencies of
multiple outputs. To do so, Multi-task GP (MTGP) is formulated, but it is not
trivial to fully understand the derivations of its formulations and their
gradients from the previous literature. This paper provides reader-friendly derivations
of the MTGP formulations and their gradients.
[LINK]
http://arxiv.org/abs/2501.07964v1
[DATE]
2025-01-14 17:35:49+08:00
[CATEGORIES]
cs.LG
Synthesis and Analysis of Data as Probability Measures with Entropy-Regularized Optimal Transport
[AUTHORS]
Brendan Mallery, James M. Murphy, Shuchin Aeron
[ABSTRACT]
We consider synthesis and analysis of probability measures using the
entropy-regularized Wasserstein-2 cost and its unbiased version, the Sinkhorn
divergence. The synthesis problem consists of computing the barycenter, with
respect to these costs, of $m$ reference measures given a set of coefficients
belonging to the $m$-dimensional simplex. The analysis problem consists of
finding the coefficients for the closest barycenter in the Wasserstein-2
distance to a given measure $\mu$. Under the weakest assumptions on the
measures thus far in the literature, we compute the derivative of the
entropy-regularized Wasserstein-2 cost. We leverage this to establish a
characterization of regularized barycenters as solutions to a fixed-point
equation for the average of the entropic maps from the barycenter to the
reference measures. This characterization yields a finite-dimensional, convex,
quadratic program for solving the analysis problem when $\mu$ is a barycenter.
It is shown that these coordinates, as well as the value of the barycenter
functional, can be estimated from samples with dimension-independent rates of
convergence, a hallmark of entropy-regularized optimal transport, and we verify
these rates experimentally. We also establish that barycentric coordinates are
stable with respect to perturbations in the Wasserstein-2 metric, suggesting a
robustness of these coefficients to corruptions. We employ the barycentric
coefficients as features for classification of corrupted point cloud data, and
show that compared to neural network baselines, our approach is more efficient
in small training data regimes.
[COMMENTS]
58 pages. Code to reproduce experiments:
https://github.com/brendanmallery9/Entropic-Barycenters
[LINK]
http://arxiv.org/abs/2501.07446v2
[DATE]
2025-01-14 17:17:26+08:00
[CATEGORIES]
cs.LG
Set-Based Training for Neural Network Verification
[AUTHORS]
Lukas Koller, Tobias Ladner, Matthias Althoff
[ABSTRACT]
Neural networks are vulnerable to adversarial attacks, i.e., small input
perturbations can significantly affect the outputs of a neural network.
Therefore, to ensure safety in safety-critical environments, the robustness of
a neural network must be formally verified against input perturbations, e.g.,
from noisy sensors. To improve the robustness of neural networks and thus
simplify the formal verification, we present a novel set-based training
procedure in which we compute the set of possible outputs given the set of
possible inputs and compute for the first time a gradient set, i.e., each
possible output has a different gradient. Therefore, we can directly reduce the
size of the output enclosure by choosing gradients toward its center. Small
output enclosures increase the robustness of a neural network and, at the same
time, simplify its formal verification. The latter benefit is due to the fact
that a larger size of propagated sets increases the conservatism of most
verification methods. Our extensive evaluation demonstrates that set-based
training produces robust neural networks with competitive performance, which
can be verified using fast (polynomial-time) verification algorithms due to the
reduced output set.
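
Computing a set of possible outputs from a set of possible inputs can be sketched with interval (box) arithmetic, the simplest set representation. This is an assumption for illustration: the abstract does not name the set representation the authors use, and the gradient-set computation is not shown. All function names are hypothetical.

```python
def affine_interval(lo, hi, weights, bias):
    """Propagate the box [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        low = high = b
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                low += w * a
                high += w * c
            else:
                # A negative weight swaps which bound produces the extreme.
                low += w * c
                high += w * a
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi


def relu_interval(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def output_enclosure(x, eps, layers):
    """Enclose all outputs for inputs within `eps` of `x` in the max norm."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, (weights, bias) in enumerate(layers):
        lo, hi = affine_interval(lo, hi, weights, bias)
        if i < len(layers) - 1:            # no activation after the last layer
            lo, hi = relu_interval(lo, hi)
    return lo, hi


if __name__ == "__main__":
    layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
    print(output_enclosure([1.0, 0.0], 0.1, layers))
```

The width `hi - lo` of the result is the quantity a set-based training objective would try to shrink, since a tighter enclosure is easier to verify.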
[LINK]
http://arxiv.org/abs/2401.14961v3
[DATE]
2025-01-14 16:56:48+08:00
[CATEGORIES]
cs.LG
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
[AUTHORS]
Jipeng Han
[ABSTRACT]
Program synthesis methods, whether formal or neural-based, lack fine-grained
control and flexible modularity, which limits their adaptation to complex
software development. These limitations stem from rigid Domain-Specific
Language (DSL) frameworks and incorrect neural network predictions. To this
end, we propose the Chain of Logic (CoL), which organizes the synthesis process
into an activity flow and provides heuristic control to guide the process.
Furthermore, by integrating neural networks with libraries and introducing a
Neural Network Feedback Control (NNFC) mechanism, our approach modularizes
synthesis and mitigates the impact of neural network mispredictions.
Experiments on relational and symbolic synthesis tasks show that CoL
significantly enhances the efficiency and reliability of DSL program synthesis
across multiple metrics. Specifically, CoL improves accuracy by 70% while
reducing tree operations by 91% and time by 95%. Additionally, NNFC further
boosts accuracy by 6%, with a 64% reduction in tree operations under
challenging conditions such as insufficient training data, increased
difficulty, and multidomain synthesis. These improvements confirm COOL as a
highly efficient and reliable program synthesis framework.
[COMMENTS]
31 pages, 11 figures
[LINK]
http://arxiv.org/abs/2410.13874v4
[DATE]
2025-01-14 16:42:23+08:00
[CATEGORIES]
cs.LG
Phase of Flight Classification in Aviation Safety using LSTM, GRU, and BiLSTM: A Case Study with ASN Dataset
[AUTHORS]
Aziida Nanyonga, Hassan Wasswa, Graham Wild
[ABSTRACT]
Safety is the main concern in the aviation industry, where even minor
operational issues can lead to serious consequences. This study addresses the
need for comprehensive aviation accident analysis by leveraging natural
language processing (NLP) and advanced AI models to classify the phase of
flight from unstructured aviation accident analysis narratives. The research
aims to determine whether the phase of flight can be inferred from narratives
of post-accident events using NLP techniques. The classification performance of
various deep learning models was evaluated. For single RNN-based models, LSTM
achieved an accuracy of 63%, precision 60%, and recall 61%. BiLSTM recorded an
accuracy of 64%, precision 63%, and a recall of 64%. GRU exhibited balanced
performance with an accuracy and recall of 60% and a precision of 63%. Joint
RNN-based models further enhanced predictive capabilities. GRU-LSTM,
LSTM-BiLSTM, and GRU-BiLSTM demonstrated accuracy rates of 62%, 67%, and 60%,
respectively, showcasing the benefits of combining these architectures. To
provide a comprehensive overview of model performance, single and combined
models were compared in terms of the various metrics. These results underscore
the models’ capacity to classify the phase of flight from raw text narratives,
equipping aviation industry stakeholders with valuable insights for proactive
decision-making. Therefore, this research signifies a substantial advancement
in the application of NLP and deep learning models to enhance aviation safety.
[COMMENTS]
Aviation Safety, Deep learning algorithms, Flight phase, NLP, ASN,
and Classification
[LINK]
http://arxiv.org/abs/2501.07925v1
[DATE]
2025-01-14 16:26:58+08:00
[CATEGORIES]
cs.LG
Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments
[AUTHORS]
Mohamed A. Taha
[ABSTRACT]
Long-range sequence modeling is a crucial aspect of natural language
processing and time series analysis. However, traditional models like Recurrent
Neural Networks (RNNs) and Transformers suffer from computational and memory
inefficiencies, especially when dealing with long sequences. This paper
introduces Logarithmic Memory Networks (LMNs), a novel architecture that
leverages a hierarchical logarithmic tree structure to efficiently store and
retrieve past information. LMNs dynamically summarize historical context,
significantly reducing the memory footprint and computational complexity of
attention mechanisms from O(n^2) to O(log(n)). The model employs a
single-vector, targeted attention mechanism to access stored information, and
the memory block construction worker (summarizer) layer operates in two modes:
a parallel execution mode during training for efficient processing of
hierarchical tree structures and a sequential execution mode during inference,
which acts as a memory management system. It also implicitly encodes positional
information, eliminating the need for explicit positional encodings. These
features make LMNs a robust and scalable solution for processing long-range
sequences in resource-constrained environments, offering practical improvements
in efficiency and scalability. The code is publicly available under the MIT
License on GitHub: https://github.com/AhmedBoin/LogarithmicMemory.
[COMMENTS]
18 pages, 10 figures
[LINK]
http://arxiv.org/abs/2501.07905v1
[DATE]
2025-01-14 15:50:09+08:00
[CATEGORIES]
cs.LG
Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound
[AUTHORS]
Catalin E. Brita, Jacobus G. M. van der Linden, Emir Demirović
[ABSTRACT]
Computing an optimal classification tree that provably maximizes training
performance within a given size limit is NP-hard, and in practice, most
state-of-the-art methods do not scale beyond computing optimal trees of depth
three. Therefore, most methods rely on a coarse binarization of continuous
features to maintain scalability. We propose a novel algorithm that optimizes
trees directly on the continuous feature data using dynamic programming with
branch-and-bound. We develop new pruning techniques that eliminate many
sub-optimal splits in the search when similar to previously computed splits and
we provide an efficient subroutine for computing optimal depth-two trees. Our
experiments demonstrate that these techniques improve runtime by one or more
orders of magnitude over state-of-the-art optimal methods and improve test
accuracy by 5% over greedy heuristics.
[COMMENTS]
In the proceedings of AAAI-25
[LINK]
http://arxiv.org/abs/2501.07903v1
[DATE]
2025-01-14 15:46:33+08:00
[CATEGORIES]
cs.LG
Layer-Adaptive State Pruning for Deep State Space Models
[AUTHORS]
Minseon Gwak, Seongrok Moon, Joohwan Ko, PooGyeon Park
[ABSTRACT]
Due to the lack of state dimension optimization methods, deep state space
models (SSMs) have sacrificed model capacity, training search space, or
stability to alleviate computational costs caused by high state dimensions. In
this work, we provide a structured pruning method for SSMs, Layer-Adaptive
STate pruning (LAST), which reduces the state dimension of each layer in
minimizing model-level output energy loss by extending modal truncation for a
single system. LAST scores are evaluated using the $\mathcal{H}_{\infty}$ norms
of subsystems and layer-wise energy normalization. The scores serve as global
pruning criteria, enabling cross-layer comparison of states and layer-adaptive
pruning. Across various sequence benchmarks, LAST optimizes previous SSMs,
revealing the redundancy and compressibility of their state spaces. Notably, we
demonstrate that, on average, pruning 33% of states still maintains performance
with 0.52% accuracy loss in multi-input multi-output SSMs without retraining.
Code is available at https://github.com/msgwak/LAST.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.02824v2
[DATE]
2025-01-14 15:30:20+08:00
[CATEGORIES]
cs.LG
Data-driven Bayesian State Estimation with Compressed Measurement of Model-free Process using Semi-supervised Learning
[AUTHORS]
Anubhab Ghosh, Yonina C. Eldar, Saikat Chatterjee
[ABSTRACT]
We study data-driven Bayesian state estimation with compressed
measurement (BSCM) of a model-free process, for example in a (causal) tracking
application. The dimension of the temporal measurement vector is lower than the
dimension of the temporal state vector to be estimated. Hence the state
estimation problem is an underdetermined inverse problem. The underlying
dynamical model of the states is assumed to be unknown and hence, we use the
terminology ‘model-free process’. In the absence of the dynamical model, we cannot
employ traditional model-driven methods like Kalman Filter (KF) and Particle
Filter (PF), and instead require data-driven methods. We first experimentally
show that two existing unsupervised learning-based data-driven methods fail to
address the BSCM problem for a model-free process: the data-driven
nonlinear state estimation (DANSE) method and the deep Markov model (DMM) method.
The unsupervised learning uses unlabelled data comprised of only noisy, linear
measurements. While DANSE provides a good predictive / forecasting performance
to model the temporal measurement data as time-series, its unsupervised
learning lacks a regularization for state estimation. We then investigate the
use of a semi-supervised learning approach, and develop a semi-supervised
learning-based DANSE method, referred to as SemiDANSE. In SemiDANSE, we use a
limited amount of labelled data along with a large amount of unlabelled data,
and that helps to bring the desired regularization for addressing the BSCM
problem. The labelled data means pairwise measurement-and-state data. Using
three chaotic dynamical systems (or processes) with nonlinear dynamical models
as benchmark, we show that the data-driven SemiDANSE provides competitive
performance for BSCM against a hybrid method called KalmanNet and two
model-driven methods – an extended KF (EKF) and an unscented KF (UKF).
[COMMENTS]
14 pages, under review at IEEE TSP
[LINK]
http://arxiv.org/abs/2407.07368v2
[DATE]
2025-01-14 15:28:06+08:00
[CATEGORIES]
cs.LG
FoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model
[AUTHORS]
Haoye Chai, Xiaoqian Qi, Shiyuan Zhang, Yong Li
[ABSTRACT]
Mobile traffic forecasting allows operators to anticipate network dynamics
and performance in advance, offering substantial potential for enhancing
service quality and improving user experience. However, existing models are
often task-oriented and are trained with tailored data, which limits their
effectiveness in diverse mobile network tasks of Base Station (BS) deployment,
resource allocation, energy optimization, etc. and hinders generalization
across different urban environments. Foundation models have made remarkable
strides across various domains of NLP and CV due to their multi-tasking
adaption and zero/few-shot learning capabilities. In this paper, we propose an
innovative Foundation model for Mobile traffic forecasting (FoMo), aiming to
handle diverse forecasting tasks of short/long-term predictions and
distribution generation across multiple cities to support network planning and
optimization. FoMo combines diffusion models and transformers, where various
spatio-temporal masks are proposed to enable FoMo to learn intrinsic features
of different tasks, and a contrastive learning strategy is developed to capture
the correlations between mobile traffic and urban contexts, thereby improving
its transfer learning capability. Extensive experiments on 9 real-world
datasets demonstrate that FoMo outperforms current models concerning diverse
forecasting tasks and zero/few-shot learning, showcasing a strong universality.
[COMMENTS]
11 pages, 7 figures
[LINK]
http://arxiv.org/abs/2410.15322v2
[DATE]
2025-01-14 14:59:12+08:00
[CATEGORIES]
cs.LG
Generating Less Certain Adversarial Examples Improves Robust Generalization
[AUTHORS]
Minxing Zhang, Michael Backes, Xiao Zhang
[ABSTRACT]
This paper revisits the robust overfitting phenomenon of adversarial
training. Observing that models with better robust generalization performance
are less certain in predicting adversarially generated training inputs, we
argue that overconfidence in predicting adversarial examples is a potential
cause. Therefore, we hypothesize that generating less certain adversarial
examples improves robust generalization, and propose a formal definition of
adversarial certainty that captures the variance of the model’s predicted
logits on adversarial examples. Our theoretical analysis of synthetic
distributions characterizes the connection between adversarial certainty and
robust generalization. Accordingly, built upon the notion of adversarial
certainty, we develop a general method to search for models that can generate
training-time adversarial inputs with reduced certainty, while maintaining the
model’s capability in distinguishing adversarial examples. Extensive
experiments on image benchmarks demonstrate that our method effectively learns
models with consistently improved robustness and mitigates robust overfitting,
confirming the importance of generating less certain adversarial examples for
robust generalization. Our implementations are available as open-source code
at: https://github.com/TrustMLRG/AdvCertainty.
[COMMENTS]
Published in Transactions on Machine Learning Research (TMLR)
[LINK]
http://arxiv.org/abs/2310.04539v4
[DATE]
2025-01-14 14:42:51+08:00
[CATEGORIES]
cs.LG
Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal
[AUTHORS]
Deheng Yuan, Tao Guo, Zhongyi Huang
[ABSTRACT]
Consider the communication-constrained problem of nonparametric function
estimation, in which each distributed terminal holds multiple i.i.d. samples.
Under certain regularity assumptions, we characterize the minimax optimal rates
for all regimes, and identify phase transitions of the optimal rates as the
samples per terminal vary from sparse to dense. This fully solves the problem
left open by previous works, whose scopes are limited to regimes with either
dense samples or a single sample per terminal. To achieve the optimal rates, we
design a layered estimation protocol by exploiting protocols for the parametric
density estimation problem. We show the optimality of the protocol using
information-theoretic methods and strong data processing inequalities, and
incorporating the classic balls and bins model. The optimal rates are immediate
for various special cases such as density estimation, Gaussian, binary, Poisson
and heteroskedastic regression models.
[LINK]
http://arxiv.org/abs/2501.07879v1
[DATE]
2025-01-14 14:41:55+08:00
[CATEGORIES]
cs.LG
Exploring Gradient Subspaces: Addressing and Overcoming LoRA’s Limitations in Federated Fine-Tuning of Large Language Models
[AUTHORS]
Navyansh Mahla, Kshitij Sharad Jadhav, Ganesh Ramakrishnan
[ABSTRACT]
Large Language Models (LLMs) have demonstrated remarkable capabilities across
various domains, particularly in task generalization for both text and vision
data. While fine-tuning these models can significantly enhance their
performance on specific downstream tasks, it often requires high-quality data
that cannot be shared due to privacy concerns. Federated Learning (FL) offers a
promising solution for collaborative training without direct data sharing.
However, many parameter-efficient fine-tuning strategies for LLMs in FL,
particularly those based on Low-Rank Adaptation (LoRA), face limitations. In
this paper, we critically analyze the convergence and performance guarantees of
popular FL frameworks utilizing LoRA, highlighting its suboptimal nature due to
constrained subspace learning of low-rank matrices. This limitation hinders
effective fine-tuning of LLMs in federated settings. Through rigorous
analytical and empirical evaluations, we demonstrate that direct weight
averaging outperforms LoRA-based strategies, leading to superior performance
for fine-tuned models. Our comprehensive comparison unmasks inefficiencies in
LoRA approaches and underscores the advantages of direct weight aggregation. We
extend our analysis to low-rank gradient-based optimizers, such as GaLore, used
during local training steps. Our findings show that GaLore along with
direct-weight aggregation is a more effective approach, outperforming federated
LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities.
While privacy remains paramount in FL discourse, our focus is on assessing
performance outcomes of federated fine-tuned models and evaluating various FL
frameworks from both theoretical and empirical perspectives. Our findings
advocate reassessing the reliance on LoRA within FL contexts, paving the way
for more efficient training methodologies.
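
As a rough illustration of why low-rank aggregation can be restrictive, the sketch below compares two ways of combining rank-1 client updates: averaging the full weight updates versus averaging the low-rank factors and multiplying them afterwards. The two-client setup, the factor values, and all helper names are assumptions made for this example; it is not the analysis or the experimental protocol of the paper.

```python
# Toy comparison (illustrative assumption, not the paper's setup):
# each client i holds a rank-1 update delta_W_i = outer(b_i, a_i).

def outer(b, a):
    """Outer product of column vector b and row vector a."""
    return [[bi * aj for aj in a] for bi in b]

def mean_vec(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_mat(matrices):
    """Element-wise mean of equally sized matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]

def det2(m):
    """Determinant of a 2x2 matrix (non-zero means full rank)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Two hypothetical clients with orthogonal rank-1 updates.
clients = [([1.0, 0.0], [1.0, 0.0]),
           ([0.0, 1.0], [0.0, 1.0])]

# Direct weight averaging: average the full updates.
avg_of_products = mean_mat([outer(b, a) for b, a in clients])

# Factor averaging: average b and a separately, then multiply.
product_of_avgs = outer(mean_vec([b for b, _ in clients]),
                        mean_vec([a for _, a in clients]))

print(avg_of_products)   # [[0.5, 0.0], [0.0, 0.5]]
print(product_of_avgs)   # [[0.25, 0.25], [0.25, 0.25]]
```

In this toy case the directly averaged update has rank 2, so no single rank-1 factor pair can reproduce it, while the factor-averaged update stays confined to rank 1.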
[LINK]
http://arxiv.org/abs/2410.23111v6
[DATE]
2025-01-14 14:25:54+08:00
[CATEGORIES]
cs.LG
Random Policy Enables In-Context Reinforcement Learning within Trust Horizons
[AUTHORS]
Weiqin Chen, Santiago Paternain
[ABSTRACT]
Pretrained foundation models have exhibited extraordinary in-context learning
performance, allowing zero-shot generalization to new tasks not encountered
during pretraining. In the case of reinforcement learning (RL), in-context RL
(ICRL) emerges when pretraining FMs on decision-making problems in an
autoregressive-supervised manner. Nevertheless, current state-of-the-art ICRL
algorithms, like Algorithm Distillation, Decision Pretrained Transformer and
Decision Importance Transformer, impose stringent requirements on the
pretraining dataset concerning the source policies, context information, and
action labels. Notably, these algorithms either demand optimal policies or
require varying degrees of well-trained behavior policies for all pretraining
environments. This significantly hinders the application of ICRL to real-world
scenarios, where acquiring optimal or well-trained policies for a substantial
volume of real-world training environments can be intractable. To overcome this
challenge, we introduce a novel approach, termed State-Action Distillation
(SAD), that generates an effective pretraining dataset guided solely
by random policies. In particular, SAD selects query states and corresponding
action labels by distilling outstanding state-action pairs from the entire
state and action spaces by using random policies within a trust horizon, and
then inherits the classical autoregressive-supervised mechanism during
pretraining. To the best of our knowledge, this is the first work that enables
effective ICRL under random policies and random contexts. We also establish
quantitative analysis of the trustworthiness as well as the performance
guarantees of SAD. Moreover, our empirical results across multiple popular ICRL
benchmark environments demonstrate that, on average, SAD outperforms the best
baseline by 236.3% in the offline evaluation and by 135.2% in the online
evaluation.
[LINK]
http://arxiv.org/abs/2410.19982v2
[DATE]
2025-01-14 14:18:03+08:00
[CATEGORIES]
cs.LG
Doubly-Bounded Queue for Constrained Online Learning: Keeping Pace with Dynamics of Both Loss and Constraint
[AUTHORS]
Juncheng Wang, Bingjie Yan, Yituo Liu
[ABSTRACT]
We consider online convex optimization with time-varying constraints and
conduct performance analysis using two stringent metrics: dynamic regret with
respect to the online solution benchmark, and hard constraint violation that
does not allow any compensated violation over time. We propose an efficient
algorithm called Constrained Online Learning with Doubly-bounded Queue (COLDQ),
which introduces a novel virtual queue that is both lower and upper bounded,
allowing tight control of the constraint violation without the need for the
Slater condition. We prove via a new Lyapunov drift analysis that COLDQ
achieves $O(T^{\frac{1+V_x}{2}})$ dynamic regret and $O(T^{V_g})$ hard constraint
violation, where $V_x$ and $V_g$ capture the dynamics of the loss and
constraint functions. For the first time, the two bounds smoothly approach
the best-known $O(T^{\frac{1}{2}})$ regret and $O(1)$ violation, as the dynamics
of the losses and constraints diminish. For strongly convex loss functions,
COLDQ matches the best-known $O(\log{T})$ static regret while maintaining the
$O(T^{V_g})$ hard constraint violation. We further introduce an expert-tracking
variation of COLDQ, which achieves the same performance bounds without any
prior knowledge of the system dynamics. Simulation results demonstrate that
COLDQ outperforms the state-of-the-art approaches.
[COMMENTS]
To appear in AAAI 2025
[LINK]
http://arxiv.org/abs/2412.10703v2
[DATE]
2025-01-14 14:02:00+08:00
[CATEGORIES]
cs.LG
State-of-the-Art Transformer Models for Image Super-Resolution: Techniques, Challenges, and Applications
[AUTHORS]
Debasish Dutta, Deepjyoti Chetia, Neeharika Sonowal, Sanjib Kr Kalita
[ABSTRACT]
Image Super-Resolution (SR) aims to recover a high-resolution image from its
low-resolution counterpart, which has been affected by a specific degradation
process. This is achieved by enhancing detail and visual quality. Recent
advancements in transformer-based methods have remolded image super-resolution
by enabling high-quality reconstructions surpassing previous deep-learning
approaches such as CNN- and GAN-based methods. This effectively addresses the limitations
of previous methods, such as limited receptive fields, poor global context
capture, and challenges in high-frequency detail recovery. Additionally, the
paper reviews recent trends and advancements in transformer-based SR models,
exploring various innovative techniques and architectures that combine
transformers with traditional networks to balance global and local contexts.
These neoteric methods are critically analyzed, revealing promising yet
unexplored gaps and potential directions for future research. Several
visualizations of models and techniques are included to foster a holistic
understanding of recent trends. This work seeks to offer a structured roadmap
for researchers at the forefront of deep learning, specifically exploring the
impact of transformers on super-resolution techniques.
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2501.07855v1
[DATE]
2025-01-14 13:43:59+08:00
[CATEGORIES]
cs.LG
An Intra- and Cross-frame Topological Consistency Scheme for Semi-supervised Atherosclerotic Coronary Plaque Segmentation
[AUTHORS]
Ziheng Zhang, Zihan Li, Dandan Shan, Yuehui Qiu, Qingqi Hong, Qingqiang Wu
[ABSTRACT]
Enhancing the precision of segmenting coronary atherosclerotic plaques from
CT Angiography (CTA) images is pivotal for advanced Coronary Atherosclerosis
Analysis (CAA), which distinctively relies on the analysis of vessel
cross-section images reconstructed via Curved Planar Reformation. This task
presents significant challenges due to the indistinct boundaries and structures
of plaques and blood vessels, leading to the inadequate performance of current
deep learning models, compounded by the inherent difficulty in annotating such
complex data. To address these issues, we propose a novel dual-consistency
semi-supervised framework that integrates Intra-frame Topological Consistency
(ITC) and Cross-frame Topological Consistency (CTC) to leverage labeled and
unlabeled data. ITC employs a dual-task network for simultaneous segmentation
mask and Skeleton-aware Distance Transform (SDT) prediction, achieving similar
prediction of topology structure through consistency constraint without
additional annotations. Meanwhile, CTC utilizes an unsupervised estimator for
analyzing pixel flow between skeletons and boundaries of adjacent frames,
ensuring spatial continuity. Experiments on two CTA datasets show that our
method surpasses existing semi-supervised methods and approaches the
performance of supervised methods on CAA. In addition, our method also performs
better than other methods on the ACDC dataset, demonstrating its
generalization.
[COMMENTS]
Accepted by ICASSP 2025
[LINK]
http://arxiv.org/abs/2501.07850v1
[DATE]
2025-01-14 13:23:42+08:00
[CATEGORIES]
cs.LG
AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making
[AUTHORS]
Yizhe Huang, Xingbo Wang, Hao Liu, Fanqi Kong, Aoyang Qin, Min Tang, Song-Chun Zhu, Mingjie Bi, Siyuan Qi, Xue Feng
[COMMENTS]
Accepted at NeurIPS D&B 2024
[LINK]
http://arxiv.org/abs/2411.03865v4
[DATE]
2025-01-14 13:23:03+08:00
[CATEGORIES]
cs.LG
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
[AUTHORS]
Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan
[ABSTRACT]
Transformer-based models have emerged as one of the most widely used
architectures for natural language processing, natural language generation, and
image generation. The size of the state-of-the-art models has increased
steadily, reaching billions of parameters. These huge models are memory hungry
and incur significant inference latency even on cutting-edge AI accelerators,
such as GPUs. Specifically, the time and memory complexity of the attention
operation is quadratic in terms of the total context length, i.e., prompt and
output tokens. Thus, several optimizations such as key-value tensor caching and
FlashAttention computation have been proposed to deliver the low latency
demands of applications relying on such large models. However, these techniques
do not cater to the computationally distinct nature of different phases during
inference.
To that end, we propose LeanAttention, a scalable technique of computing
self-attention for the token-generation phase (decode-phase) of decoder-only
transformer models. LeanAttention enables scaling the attention mechanism
implementation for the challenging case of long context lengths by re-designing
the execution flow for the decode-phase. We identify that the associative
property of online softmax can be treated as a reduction operation, thus
allowing us to parallelize the attention computation over these large context
lengths. We extend the “stream-K” style reduction of tiled calculation to
self-attention to enable parallel computation resulting in an average of 2.6x
attention execution speedup over FlashAttention-2 and up to 8.33x speedup for
512k context lengths.
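
The observation that online softmax behaves like an associative reduction can be sketched in a few lines. The code below is a minimal single-query illustration in plain Python, not the LeanAttention kernel: each chunk of keys yields a partial state (running maximum, normalizer, unnormalized output), and partial states are merged pairwise by rescaling. The function names and the chunking scheme are assumptions made for this example.

```python
import math
from functools import reduce

def partial_attention(scores, values):
    """Partial softmax-attention state for one chunk of keys.

    Returns (max score, sum of exp(score - max), unnormalized output)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    normalizer = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, normalizer, out

def merge(a, b):
    """Combine two partial states; the result does not depend on grouping."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    normalizer = l_a * scale_a + l_b * scale_b
    out = [x * scale_a + y * scale_b for x, y in zip(o_a, o_b)]
    return m, normalizer, out

def finalize(state):
    """Divide the unnormalized output by the softmax normalizer."""
    _, normalizer, out = state
    return [x / normalizer for x in out]

def chunked_attention(scores, values, chunk):
    """Attention output computed chunk by chunk, then reduced with merge."""
    parts = [partial_attention(scores[i:i + chunk], values[i:i + chunk])
             for i in range(0, len(scores), chunk)]
    return finalize(reduce(merge, parts))
```

Because `merge` is associative up to floating-point rounding, the chunks could in principle be processed by different workers and combined in any grouping, which is the property a parallel reduction relies on.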
[COMMENTS]
13 pages, 10 figures
[LINK]
http://arxiv.org/abs/2405.10480v2
[DATE]
2025-01-14 13:00:34+08:00
[CATEGORIES]
cs.LG
Poisoning Attacks on Federated Learning-based Wireless Traffic Prediction
[AUTHORS]
Zifan Zhang, Minghong Fang, Jiayuan Huang, Yuchen Liu
[ABSTRACT]
Federated Learning (FL) offers a distributed framework to train a global
control model across multiple base stations without compromising the privacy of
their local network data. This makes it ideal for applications like wireless
traffic prediction (WTP), which plays a crucial role in optimizing network
resources, enabling proactive traffic flow management, and enhancing the
reliability of downstream communication-aided applications, such as IoT
devices, autonomous vehicles, and industrial automation systems. Despite its
promise, the security aspects of FL-based distributed wireless systems,
particularly in regression-based WTP problems, remain inadequately
investigated. In this paper, we introduce a novel fake traffic injection (FTI)
attack, designed to undermine the FL-based WTP system by injecting fabricated
traffic distributions with minimal knowledge. We further propose a defense
mechanism, termed global-local inconsistency detection (GLID), which
strategically removes abnormal model parameters that deviate beyond a specific
percentile range estimated through statistical methods in each dimension.
Extensive experimental evaluations, performed on real-world wireless traffic
datasets, demonstrate that both our attack and defense strategies significantly
outperform existing baselines.
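
To make the idea of per-dimension percentile filtering concrete, here is a minimal sketch of a coordinate-wise aggregation rule that discards client values outside a percentile range before averaging. It is only an assumption-laden illustration: the fixed 10th/90th percentile bounds, the interpolation rule, and the function names are hypothetical, and the actual GLID procedure for estimating the range may differ.

```python
def percentile(sorted_vals, q):
    """Percentile q in [0, 100] with linear interpolation between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def filtered_mean(updates, low_q=10, high_q=90):
    """Average client updates per dimension after dropping outlying values.

    `updates` is a list of equally long parameter vectors, one per client.
    The percentile bounds are illustrative defaults, not values from the paper."""
    aggregated = []
    for d in range(len(updates[0])):
        column = sorted(u[d] for u in updates)
        lower = percentile(column, low_q)
        upper = percentile(column, high_q)
        kept = [v for v in column if lower <= v <= upper]
        aggregated.append(sum(kept) / len(kept))
    return aggregated

# Four benign clients and one hypothetical poisoned client (last row).
client_updates = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [100.0, -50.0]]
robust = filtered_mean(client_updates)
```

With these inputs the poisoned values 100.0 and -50.0 fall outside the per-dimension range and are excluded, so the aggregate stays close to the benign updates.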
[COMMENTS]
Accepted by IFIP/IEEE Networking 2024
[LINK]
http://arxiv.org/abs/2404.14389v2
[DATE]
2025-01-14 12:58:26+08:00
[CATEGORIES]
cs.LG
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
[AUTHORS]
Jitao Wang, Chengchun Shi, John D. Piette, Joshua R. Loftus, Donglin Zeng, Zhenke Wu
[ABSTRACT]
When applied in healthcare, reinforcement learning (RL) seeks to dynamically
match the right interventions to subjects to maximize population benefit.
However, the learned policy may disproportionately allocate efficacious actions
to one subpopulation, creating or exacerbating disparities in other
socioeconomically-disadvantaged subgroups. These biases tend to occur in
multi-stage decision making and can be self-perpetuating, which if unaccounted
for could cause serious unintended consequences that limit access to care or
treatment benefit. Counterfactual fairness (CF) offers a promising statistical
tool grounded in causal inference to formulate and study fairness. In this
paper, we propose a general framework for fair sequential decision making. We
theoretically characterize the optimal CF policy and prove its stationarity,
which greatly simplifies the search for optimal CF policies by leveraging
existing RL algorithms. The theory also motivates a sequential data
preprocessing algorithm to achieve CF decision making under an additive noise
assumption. We prove and then validate our policy learning approach in
controlling unfairness and attaining optimal value through simulations.
Analysis of a digital health dataset designed to reduce opioid misuse shows
that our proposal greatly enhances fair access to counseling.
[LINK]
http://arxiv.org/abs/2501.06366v2
[DATE]
2025-01-14 12:42:08+08:00
[CATEGORIES]
cs.LG
Flow: A Modular Approach to Automated Agentic Workflow Generation
[AUTHORS]
Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, Tongliang Liu
[ABSTRACT]
Multi-agent frameworks powered by large language models (LLMs) have
demonstrated great success in automated planning and task execution. However,
the effective adjustment of agentic workflows during execution has not been
well-studied. Effective workflow adjustment is crucial, as in many real-world
scenarios, the initial plan must adjust to unforeseen challenges and changing
conditions in real-time to ensure the efficient execution of complex tasks. In
this paper, we define workflows as activity-on-vertex (AOV) graphs. We
continuously refine the workflow by dynamically adjusting task allocations
based on historical performance and previous AOV with LLM agents. To further
enhance system performance, we emphasize modularity in workflow design based on
measuring parallelism and dependence complexity. Our proposed multi-agent
framework achieved efficient sub-task concurrent execution, goal achievement,
and error tolerance. Empirical results across different practical tasks
demonstrate dramatic improvements in the efficiency of multi-agent frameworks
through dynamic workflow updating and modularization.
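
An activity-on-vertex graph treats each sub-task as a vertex and each dependency as a directed edge, so the sub-tasks that can run concurrently are exactly those whose prerequisites are all finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group a workflow into such batches; the task names and the `parallel_batches` helper are hypothetical and only illustrate the graph model, not the framework proposed in the paper.

```python
from graphlib import TopologicalSorter

def parallel_batches(dependencies):
    """Group tasks of an AOV graph into batches that can run concurrently.

    `dependencies` maps each task to the set of tasks it depends on."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with all prerequisites done
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch as finished
    return batches

# Hypothetical workflow: two sub-tasks depend on "design" and can run in parallel.
workflow = {
    "design": set(),
    "frontend": {"design"},
    "backend": {"design"},
    "integrate": {"frontend", "backend"},
}

print(parallel_batches(workflow))
# [['design'], ['backend', 'frontend'], ['integrate']]
```

Fewer and wider batches indicate more parallelism, which is one simple way to compare candidate workflows for the same goal.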
[LINK]
http://arxiv.org/abs/2501.07834v1
[DATE]
2025-01-14 12:35:37+08:00
[CATEGORIES]
cs.LG
Computational and Statistical Asymptotic Analysis of the JKO Scheme for Iterative Algorithms to update distributions
[AUTHORS]
Shang Wu, Yazhen Wang
[ABSTRACT]
The seminal paper of Jordan, Kinderlehrer, and Otto introduced what is now
widely known as the JKO scheme, an iterative algorithmic framework for
computing distributions. This scheme can be interpreted as a Wasserstein
gradient flow and has been successfully applied in machine learning contexts,
such as deriving policy solutions in reinforcement learning. In this paper, we
extend the JKO scheme to accommodate models with unknown parameters.
Specifically, we develop statistical methods to estimate these parameters and
adapt the JKO scheme to incorporate the estimated values. To analyze the
adopted statistical JKO scheme, we establish an asymptotic theory via
stochastic partial differential equations that describes its limiting dynamic
behavior. Our framework allows both the sample size used in parameter
estimation and the number of algorithmic iterations to go to infinity. This
study offers a unified framework for joint computational and statistical
asymptotic analysis of the statistical JKO scheme. On the computational side,
we examine the scheme’s dynamic behavior as the number of iterations increases,
while on the statistical side, we investigate the large-sample behavior of the
resulting distributions computed through the scheme. We conduct numerical
simulations to evaluate the finite-sample performance of the proposed methods
and validate the developed asymptotic theory.
[LINK]
http://arxiv.org/abs/2501.06408v2
[DATE]
2025-01-14 12:30:31+08:00
[CATEGORIES]
cs.LG
AI Foundation Models for Wearable Movement Data in Mental Health Research
[AUTHORS]
Franklin Y. Ruan, Aiwei Zhang, Jenny Y. Oh, SouYoung Jin, Nicholas C. Jacobson
[ABSTRACT]
Pretrained foundation models and transformer architectures have driven the
success of large language models (LLMs) and other modern AI breakthroughs.
However, similar advancements in health data modeling remain limited due to the
need for innovative adaptations. Wearable movement data offers a valuable
avenue for exploration, as it’s a core feature in nearly all commercial
smartwatches, well established in clinical and mental health research, and the
sequential nature of the data shares similarities to language. We introduce the
Pretrained Actigraphy Transformer (PAT), the first open source foundation model
designed for time-series wearable movement data. Leveraging transformer-based
architectures and novel techniques, such as patch embeddings, and pretraining
on data from 29,307 participants in a national U.S. sample, PAT achieves
state-of-the-art performance in several mental health prediction tasks. PAT is
also lightweight and easily interpretable, making it a robust tool for mental
health research.
GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/
[LINK]
http://arxiv.org/abs/2411.15240v3
[DATE]
2025-01-14 12:10:46+08:00
[CATEGORIES]
cs.LG
Prediction Interval Construction Method for Electricity Prices
[AUTHORS]
Xin Lu
[ABSTRACT]
Accurate prediction of electricity prices plays an essential role in the
electricity market. To reflect the uncertainty of electricity prices, price
intervals are predicted. This paper proposes a novel prediction interval
construction method. A conditional generative adversarial network is first
presented to generate electricity price scenarios, with which the prediction
intervals can be constructed. Then, different generated scenarios are stacked
to obtain the probability densities, which can be applied to accurately reflect
the uncertainty of electricity prices. Furthermore, a reinforced prediction
mechanism based on the volatility level of weather factors is introduced to
address the spikes or volatile prices. A case study is conducted to verify the
effectiveness of the proposed novel prediction interval construction method.
The method can also provide the probability density of each price scenario
within the prediction interval, and its reinforced prediction mechanism makes it
better suited to handling volatile prices and price spikes.
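
One simple way to turn a set of generated price scenarios into a prediction interval is to take empirical quantiles of the scenarios for a given hour. The sketch below assumes the scenarios are already available as a list of numbers; the generator itself, the 80% coverage level, and the function name are assumptions for illustration and are not taken from the paper.

```python
def empirical_interval(scenarios, coverage=0.8):
    """Central prediction interval from sampled price scenarios.

    Uses linear interpolation between order statistics; `coverage` is the
    nominal share of scenarios the interval should contain."""
    ordered = sorted(scenarios)
    tail = (1 - coverage) / 2

    def quantile(p):
        pos = (len(ordered) - 1) * p
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return quantile(tail), quantile(1 - tail)

# Hypothetical scenarios for one hour: prices 0, 1, ..., 100.
lower, upper = empirical_interval([float(p) for p in range(101)])
```

For these evenly spaced scenarios the interval is approximately (10, 90), up to floating-point rounding.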
[LINK]
http://arxiv.org/abs/2501.07827v1
[DATE]
2025-01-14 12:02:08+08:00
[CATEGORIES]
cs.LG
STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via Embedded Anomaly Detection
[AUTHORS]
Yuanyuan Liang, Tianhao Zhang, Tingyu Xie
[ABSTRACT]
Handling anomalies is a critical preprocessing step in multivariate time
series prediction. However, existing approaches that separate anomaly
preprocessing from model training for multivariate time series prediction
encounter significant limitations. Specifically, these methods fail to utilize
auxiliary information crucial for identifying latent anomalies associated with
spatiotemporal factors during the preprocessing stage. Instead, they rely
solely on data distribution for anomaly detection, which can result in the
incorrect processing of numerous samples that could otherwise contribute
positively to model training. To address this, we propose STTS-EAD, an
end-to-end method that seamlessly integrates anomaly detection into the
training process of multivariate time series forecasting and aims to improve
Spatio-Temporal learning based Time Series prediction via Embedded Anomaly
Detection. Our proposed STTS-EAD leverages spatio-temporal information for
forecasting and anomaly detection, with the two parts alternately executed and
optimized for each other. To the best of our knowledge, STTS-EAD is the first
to integrate anomaly detection and forecasting tasks in the training phase for
improving the accuracy of multivariate time series forecasting. Extensive
experiments on a public stock dataset and two real-world sales datasets from a
renowned coffee chain enterprise show that our proposed method can effectively
process detected anomalies in the training stage to improve forecasting
performance in the inference stage and significantly outperform baselines.
[COMMENTS]
11 pages
[LINK]
http://arxiv.org/abs/2501.07814v1
[DATE]
2025-01-14 11:26:05+08:00
[CATEGORIES]
cs.LG
Conformal mapping Coordinates Physics-Informed Neural Networks (CoCo-PINNs): learning neural networks for designing neutral inclusions
[AUTHORS]
Daehee Cho, Hyeonmin Yun, Jaeyong Lee, Mikyoung Lim
[ABSTRACT]
We focus on designing and solving the neutral inclusion problem via neural
networks. The neutral inclusion problem has a long history in the theory of
composite materials, and it is exceedingly challenging to identify the precise
condition that precipitates a general-shaped inclusion into a neutral
inclusion. Physics-informed neural networks (PINNs) have recently become a
highly successful approach to addressing both forward and inverse problems
associated with partial differential equations. We found that traditional PINNs
perform inadequately when applied to the inverse problem of designing neutral
inclusions with arbitrary shapes. In this study, we introduce a novel approach,
Conformal mapping Coordinates Physics-Informed Neural Networks (CoCo-PINNs),
which integrates complex analysis techniques into PINNs. This method exhibits
strong performance in solving forward-inverse problems to construct neutral
inclusions of arbitrary shapes in two dimensions, where the imperfect interface
condition on the inclusion’s boundary is modeled by training neural networks.
Notably, we mathematically prove that training with a single linear field is
sufficient to achieve neutrality for untrained linear fields in arbitrary
directions, given a minor assumption. We demonstrate that CoCo-PINNs offer
enhanced performance in terms of credibility, consistency, and stability.
[LINK]
http://arxiv.org/abs/2501.07809v1
[DATE]
2025-01-14 11:20:17+08:00
[CATEGORIES]
cs.LG
Can Go AIs be adversarially robust?
[AUTHORS]
Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave
[ABSTRACT]
Prior work found that superhuman Go AIs can be defeated by simple adversarial
strategies, especially “cyclic” attacks. In this paper, we study whether adding
natural countermeasures can achieve robustness in Go, a favorable domain for
robustness since it benefits from incredible average-case capability and a
narrow, innately adversarial setting. We test three defenses: adversarial
training on hand-constructed positions, iterated adversarial training, and
changing the network architecture. We find that though some of these defenses
protect against previously discovered attacks, none withstand freshly trained
adversaries. Furthermore, most of the reliably effective attacks these
adversaries discover are different realizations of the same overall class of
cyclic attacks. Our results suggest that building robust AI systems is
challenging even with extremely superhuman systems in some of the most
tractable settings, and highlight two key gaps: efficient generalization of
defenses, and diversity in training. For interactive examples of attacks and a
link to our codebase, see https://goattack.far.ai.
[COMMENTS]
63 pages, AAAI 2025
[LINK]
http://arxiv.org/abs/2406.12843v3
[DATE]
2025-01-14 11:08:02+08:00
[CATEGORIES]
cs.LG
BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos
[AUTHORS]
Farnoosh Koleini, Muhammad Usama Saleem, Pu Wang, Hongfei Xue, Ahmed Helmy, Abbey Fenwick
[ABSTRACT]
Recent advancements in 3D human pose estimation from single-camera images and
videos have relied on parametric models, like SMPL. However, these models
oversimplify anatomical structures, limiting their accuracy in capturing true
joint locations and movements, which reduces their applicability in
biomechanics, healthcare, and robotics. Biomechanically accurate pose
estimation, on the other hand, typically requires costly marker-based motion
capture systems and optimization techniques in specialized labs. To bridge this
gap, we propose BioPose, a novel learning-based framework for predicting
biomechanically accurate 3D human pose directly from monocular videos. BioPose
includes three key components: a Multi-Query Human Mesh Recovery model
(MQ-HMR), a Neural Inverse Kinematics (NeurIK) model, and a 2D-informed pose
refinement technique. MQ-HMR leverages a multi-query deformable transformer to
extract multi-scale fine-grained image features, enabling precise human mesh
recovery. NeurIK treats the mesh vertices as virtual markers, applying a
spatial-temporal network to regress biomechanically accurate 3D poses under
anatomical constraints. To further improve 3D pose estimations, a 2D-informed
refinement step optimizes the query tokens during inference by aligning the 3D
structure with 2D pose observations. Experiments on benchmark datasets
demonstrate that BioPose significantly outperforms state-of-the-art methods.
Project website:
https://m-usamasaleem.github.io/publication/BioPose/BioPose.html
[LINK]
http://arxiv.org/abs/2501.07800v1
[DATE]
2025-01-14 10:56:19+08:00
[CATEGORIES]
cs.LG
Physically Guided Deep Unsupervised Inversion for 1D Magnetotelluric Models
[AUTHORS]
Paul Goyes-Peñafiel, Umair bin Waheed, Henry Arguello
[ABSTRACT]
The global demand for unconventional energy sources such as geothermal energy
and white hydrogen requires new exploration techniques for precise subsurface
structure characterization and potential reservoir identification. The
Magnetotelluric (MT) method is crucial for these tasks, providing critical
information on the distribution of subsurface electrical resistivity at depths
ranging from hundreds to thousands of meters. However, traditional iterative
algorithm-based inversion methods require the adjustment of multiple
parameters, demanding time-consuming and exhaustive tuning processes to achieve
proper cost function minimization. Recent advances have incorporated deep
learning algorithms for MT inversion, primarily based on supervised learning,
and large labeled datasets are needed for training. This work utilizes
TensorFlow operations to create a differentiable forward MT operator,
leveraging its automatic differentiation capability. Moreover, instead of
solving for the subsurface model directly, as classical algorithms do,
this paper presents a new deep unsupervised inversion algorithm guided by
physics to estimate 1D MT models. Instead of using datasets with the observed
data and their respective model as labels during training, our method employs a
differentiable modeling operator that physically guides the cost function
minimization, making the proposed method solely dependent on observed data.
Therefore, the optimization algorithm updates the network weights to minimize
the data misfit. We test the proposed method with field and synthetic data at
different acquisition frequencies, demonstrating that the resistivity models
obtained are more accurate than those calculated using other techniques.
[COMMENTS]
5 pages, 6 figures, GitHub repository, submitted to IEEE-GRSL
[LINK]
http://arxiv.org/abs/2410.15274v3
[DATE]
2025-01-14 10:52:40+08:00
[CATEGORIES]
cs.LG
E2ESlack: An End-to-End Graph-Based Framework for Pre-Routing Slack Prediction
[AUTHORS]
Saurabh Bodhe, Zhanguang Zhang, Atia Hamidizadeh, Shixiong Kai, Yingxue Zhang, Mingxuan Yuan
[ABSTRACT]
Pre-routing slack prediction remains a critical area of research in
Electronic Design Automation (EDA). Despite numerous machine learning-based
approaches targeting this task, there is still a lack of a truly end-to-end
framework that engineers can use to obtain TNS/WNS metrics from raw circuit
data at the placement stage. Existing works have demonstrated effectiveness in
Arrival Time (AT) prediction but lack a mechanism for Required Arrival Time
(RAT) prediction, which is essential for slack prediction and obtaining TNS/WNS
metrics. In this work, we propose E2ESlack, an end-to-end graph-based framework
for pre-routing slack prediction. The framework includes a TimingParser that
supports DEF, SDF and LIB files for feature extraction and graph construction,
an arrival time prediction model and a fast RAT estimation module. To the best
of our knowledge, this is the first work capable of predicting path-level
slacks at the pre-routing stage. We perform extensive experiments and
demonstrate that our proposed RAT estimation method outperforms both the SOTA
ML-based prediction method and the pre-routing STA tool. Additionally, the
proposed E2ESlack framework achieves TNS/WNS values comparable to post-routing
STA results while saving up to 23x runtime.
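For readers unfamiliar with the timing metrics named above, the following minimal sketch shows how slack, WNS, and TNS relate to arrival time (AT) and required arrival time (RAT). It assumes the common setup-timing convention slack = RAT - AT, with WNS reported as the most negative slack (0 if no endpoint violates) and TNS as the sum of negative slacks; the function name and sample values are hypothetical and are not taken from E2ESlack.

```python
import numpy as np


def slack_metrics(arrival_time, required_arrival_time):
    """Compute per-endpoint slack, WNS, and TNS.

    Assumed convention (setup timing): slack = RAT - AT, so a negative
    slack marks a timing violation.
    """
    at = np.asarray(arrival_time, dtype=float)
    rat = np.asarray(required_arrival_time, dtype=float)
    slack = rat - at
    # Worst negative slack: the most negative value, clipped at zero.
    wns = float(min(slack.min(), 0.0))
    # Total negative slack: sum over violating endpoints only.
    tns = float(np.minimum(slack, 0.0).sum())
    return slack, wns, tns


# Three hypothetical timing endpoints (times in nanoseconds).
demo_slack, demo_wns, demo_tns = slack_metrics(
    arrival_time=[1.2, 0.8, 2.5],
    required_arrival_time=[1.0, 1.0, 2.0],
)
```

Under this convention, an end-to-end predictor needs both AT and RAT at every endpoint, which is why a framework that predicts only AT cannot report TNS/WNS.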
[LINK]
http://arxiv.org/abs/2501.07564v2
[DATE]
2025-01-14 10:38:26+08:00
[CATEGORIES]
cs.LG
Linearly Convergent Mixup Learning
[AUTHORS]
Gakuto Obi, Ayato Saito, Yuto Sasaki, Tsuyoshi Kato
[ABSTRACT]
Learning in a reproducing kernel Hilbert space (RKHS), as in the support
vector machine, has been recognized as a promising technique. It continues to be
highly effective and competitive in numerous prediction tasks, particularly in
settings where there is a shortage of training data or computational
limitations exist. These methods are especially valued for their ability to
work with small datasets and their interpretability. To address the issue of
limited training data, mixup data augmentation, widely used in deep learning,
has remained challenging to apply to learning in RKHS due to the generation of
intermediate class labels. Although gradient descent methods handle these
labels effectively, dual optimization approaches are typically not directly
applicable. In this study, we present two novel algorithms that extend to a
broader range of binary classification models. Unlike gradient-based
approaches, our algorithms do not require hyperparameters like learning rates,
simplifying their implementation and optimization. Both the number of
iterations to converge and the computational cost per iteration scale linearly
with respect to the dataset size. The numerical experiments demonstrate that
our algorithms achieve faster convergence to the optimal solution compared to
gradient descent approaches, and that mixup data augmentation consistently
improves the predictive performance across various loss functions.
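The "intermediate class labels" mentioned above arise directly from how mixup forms training pairs. The sketch below shows standard mixup for binary labels in {-1, +1}; sampling the mixing weight from a Beta(alpha, alpha) distribution follows the usual deep learning recipe and is an assumption here, as are the function name and the toy data. It does not implement the paper's dual optimization algorithms.

```python
import numpy as np


def mixup(X, y, alpha=0.2, rng=None):
    """Return convex combinations of randomly paired examples.

    X: array of shape (n, d); y: labels in {-1, +1}, shape (n,).
    Each mixed label is lam * y_i + (1 - lam) * y_j, so it can fall
    anywhere in [-1, 1] rather than on a class value.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    lam = rng.beta(alpha, alpha, size=n)   # one mixing weight per example
    partner = rng.permutation(n)           # random partner for each example
    X_mix = lam[:, None] * X + (1.0 - lam[:, None]) * X[partner]
    y_mix = lam * y + (1.0 - lam) * y[partner]
    return X_mix, y_mix, lam


# Toy data: four 2-D points with binary labels.
demo_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
demo_y = np.array([-1.0, -1.0, 1.0, 1.0])
demo_X_mix, demo_y_mix, demo_lam = mixup(
    demo_X, demo_y, alpha=0.2, rng=np.random.default_rng(0)
)
```

Because the mixed labels are real-valued, solvers that assume labels of exactly -1 or +1 do not apply unchanged, which is the difficulty the abstract describes for dual optimization.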
[LINK]
http://arxiv.org/abs/2501.07794v1
[DATE]
2025-01-14 10:33:40+08:00
[CATEGORIES]
cs.LG
Can AI Help with Your Personal Finances?
[AUTHORS]
Oudom Hean, Utsha Saha, Binita Saha
[ABSTRACT]
In recent years, Large Language Models (LLMs) have emerged as a
transformative development in artificial intelligence (AI), drawing significant
attention from industry and academia. Trained on vast datasets, these
sophisticated AI systems exhibit impressive natural language processing and
content generation capabilities. This paper explores the potential of LLMs to
address key challenges in personal finance, focusing on the United States. We
evaluate several leading LLMs, including OpenAI’s ChatGPT, Google’s Gemini,
Anthropic’s Claude, and Meta’s Llama, to assess their effectiveness in
providing accurate financial advice on topics such as mortgages, taxes, loans,
and investments. Our findings show that while these models achieve an average
accuracy rate of approximately 70%, they also display notable limitations in
certain areas. Specifically, LLMs struggle to provide accurate responses for
complex financial queries, with performance varying significantly across
different topics. Despite these limitations, the analysis reveals notable
improvements in newer versions of these models, highlighting their growing
utility for individuals and financial advisors. As these AI systems continue to
evolve, their potential for advancing AI-driven applications in personal
finance becomes increasingly promising.
[LINK]
http://arxiv.org/abs/2412.19784v4
[DATE]
2025-01-14 10:28:28+08:00
[CATEGORIES]
cs.LG
Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation
[AUTHORS]
Nishan Gunawardena, Gough Yumu Lui, Jeewani Anupama Ginige, Bahman Javadi
[ABSTRACT]
A significant limitation of current smartphone-based eye-tracking algorithms
is their low accuracy when applied to video-type visual stimuli, as they are
typically trained on static images. Also, the increasing demand for real-time
interactive applications like games, VR, and AR on smartphones requires
overcoming the limitations posed by resource constraints such as limited
computational power, battery life, and network bandwidth. Therefore, we
developed two new smartphone eye-tracking techniques for video-type visuals by
combining Convolutional Neural Networks (CNN) with two different Recurrent
Neural Networks (RNN), namely Long Short Term Memory (LSTM) and Gated Recurrent
Unit (GRU). Our CNN+LSTM and CNN+GRU models achieved an average Root Mean
Square Error of 0.955 cm and 1.091 cm, respectively. To address the
computational constraints of smartphones, we developed an edge intelligence
architecture to enhance the performance of smartphone-based eye tracking. We
applied various optimisation methods like quantisation and pruning to deep
learning models for better energy, CPU, and memory usage on edge devices,
focusing on real-time processing. Using model quantisation, the model inference
time in the CNN+LSTM and CNN+GRU models was reduced by 21.72% and 19.50%,
respectively, on edge devices.
[COMMENTS]
We have included three closely related papers as references. We have
expanded the future work section to provide a more thorough
discussion of the concepts of “varying lighting conditions” and “dynamic user
environments.” We have added a note below Table 4 to clarify the
meaning of the abbreviations. We have elaborated on the role of the Domain
Expert within the presentation layer in Section 4.1.
[LINK]
http://arxiv.org/abs/2408.12463v2
[DATE]
2025-01-14 09:57:04+08:00
[CATEGORIES]
cs.LG
EPIC: Effective Prompting for Imbalanced-Class Data Synthesis in Tabular Data Classification via Large Language Models
[AUTHORS]
Jinhee Kim, Taesung Kim, Jaegul Choo
[ABSTRACT]
Large language models (LLMs) have demonstrated remarkable in-context learning
capabilities across diverse applications. In this work, we explore the
effectiveness of LLMs for generating realistic synthetic tabular data,
identifying key prompt design elements to optimize performance. We introduce
EPIC, a novel approach that leverages balanced, grouped data samples and
consistent formatting with unique variable mapping to guide LLMs in generating
accurate synthetic data across all classes, even for imbalanced datasets.
Evaluations on real-world datasets show that EPIC achieves state-of-the-art
machine learning classification performance, significantly improving generation
efficiency. These findings highlight the effectiveness of EPIC for synthetic
tabular data generation, particularly in addressing class imbalance. The source
code for our work is available at:
https://seharanul17.github.io/project-synthetic-tabular-llm/
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2404.12404v4
[DATE]
2025-01-14 09:41:21+08:00
[CATEGORIES]
cs.LG
A systematic review of the use of Deep Learning in Satellite Imagery for Agriculture
[AUTHORS]
Brandon Victor, Zhen He, Aiden Nibali
[ABSTRACT]
Agricultural research is essential for increasing food production to meet the
requirements of an increasing population in the coming decades. Recently,
satellite technology has been improving rapidly and deep learning has seen much
success in generic computer vision tasks and many application areas, which
presents an important opportunity to improve analysis of agricultural land.
Here we present a systematic review of 150 studies to find the current uses of
deep learning on satellite imagery for agricultural research. Although we
identify 5 categories of agricultural monitoring tasks, the majority of the
research interest is in crop segmentation and yield prediction. We found that,
when used, modern deep learning methods consistently outperformed traditional
machine learning across most tasks; the only exception was that Long Short-Term
Memory (LSTM) Recurrent Neural Networks did not consistently outperform Random
Forests (RF) for yield prediction. The reviewed studies have largely adopted
methodologies from generic computer vision, except for one major omission:
benchmark datasets are not utilised to evaluate models across studies, making
it difficult to compare results. Additionally, some studies have specifically
utilised the extra spectral resolution available in satellite imagery, but
other divergent properties of satellite images - such as the hugely different
scales of spatial patterns - are not being taken advantage of in the reviewed
studies.
[COMMENTS]
23 pages, 5 figures and 10 tables in main paper. Final version, as
submitted and accepted at JSTARS
[LINK]
http://arxiv.org/abs/2210.01272v3
[DATE]
2025-01-14 09:34:10+08:00
[CATEGORIES]
cs.LG
Transforming Indoor Localization: Advanced Transformer Architecture for NLOS Dominated Wireless Environments with Distributed Sensors
[AUTHORS]
Saad Masrur, Jung-Fu Cheng, Atieh R. Khamesi, Ismail Guvenc
[ABSTRACT]
Indoor localization in challenging non-line-of-sight (NLOS) environments
often leads to mediocre accuracy with traditional approaches. Deep learning
(DL) has been applied to tackle these challenges; however, many DL approaches
overlook computational complexity, especially for floating-point operations
(FLOPs), making them unsuitable for resource-limited devices. Transformer-based
models have achieved remarkable success in natural language processing (NLP)
and computer vision (CV) tasks, motivating their use in wireless applications.
However, their use in indoor localization remains nascent, and directly
applying Transformers for indoor localization can be computationally
intensive and limited in accuracy. To address these challenges, in
this work, we introduce a novel tokenization approach, referred to as Sensor
Snapshot Tokenization (SST), which preserves variable-specific representations
of power delay profile (PDP) and enhances attention mechanisms by effectively
capturing multi-variate correlation. Complementing this, we propose a
lightweight Swish-Gated Linear Unit-based Transformer (L-SwiGLU Transformer)
model, designed to reduce computational complexity without compromising
localization accuracy. Together, these contributions mitigate the computational
burden and dependency on large datasets, making Transformer models more
efficient and suitable for resource-constrained scenarios. The proposed
tokenization method enables the Vanilla Transformer to achieve a 90th
percentile positioning error of 0.388 m in a highly NLOS indoor factory,
surpassing conventional tokenization methods. The L-SwiGLU ViT further reduces
the error to 0.355 m, achieving an 8.51% improvement. Additionally, the
proposed model outperforms a 14.1 times larger model with a 46.13% improvement,
underscoring its computational efficiency.
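The 8.51% figure above can be checked from the two reported errors, and the metric itself is straightforward to compute from raw positioning errors. The sketch below is illustrative only: the function names and the sample error list are hypothetical, and NumPy's default linear interpolation for percentiles is an assumption about how the statistic is evaluated.

```python
import numpy as np


def percentile_error(errors_m, q=90.0):
    """q-th percentile of per-sample positioning errors (in meters)."""
    return float(np.percentile(np.asarray(errors_m, dtype=float), q))


def relative_improvement(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline


# Reported 90th-percentile errors: 0.388 m (Vanilla Transformer with SST)
# and 0.355 m (L-SwiGLU model).
demo_gain = relative_improvement(0.388, 0.355)  # about 8.51

# Hypothetical per-sample errors, only to show the percentile computation.
demo_p90 = percentile_error([0.0, 1.0, 2.0, 3.0, 4.0, 5.0,
                             6.0, 7.0, 8.0, 9.0, 10.0])
```

The 46.13% improvement over the larger model cannot be checked the same way, because the abstract does not report that model's error.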
[COMMENTS]
The paper has been submitted to IEEE Transactions on Machine Learning
in Communications and Networking
[LINK]
http://arxiv.org/abs/2501.07774v1
[DATE]
2025-01-14 09:16:30+08:00
[CATEGORIES]
cs.LG
Symmetry-Aware Generative Modeling through Learned Canonicalization
[AUTHORS]
Kusha Sareen, Daniel Levy, Arnab Kumar Mondal, Sékou-Oumar Kaba, Tara Akhound-Sadegh, Siamak Ravanbakhsh
[ABSTRACT]
Generative modeling of symmetric densities has a range of applications in AI
for science, from drug discovery to physics simulations. The existing
generative modeling paradigm for invariant densities combines an invariant
prior with an equivariant generative process. However, we observe that this
technique is not necessary and has several drawbacks resulting from the
limitations of equivariant networks. Instead, we propose to model a learned
slice of the density so that only one representative element per orbit is
learned. To accomplish this, we learn a group-equivariant canonicalization
network that maps training samples to a canonical pose and train a
non-equivariant generative model over these canonicalized samples. We implement
this idea in the context of diffusion models. Our preliminary experimental
results on molecular modeling are promising, demonstrating improved sample
quality and faster inference time.
[COMMENTS]
NeurReps 2024 Workshop Version
[LINK]
http://arxiv.org/abs/2501.07773v1
[DATE]
2025-01-14 09:08:15+08:00
[CATEGORIES]
cs.LG
PINN-FEM: A Hybrid Approach for Enforcing Dirichlet Boundary Conditions in Physics-Informed Neural Networks
[AUTHORS]
Nahil Sobh, Rini Jasmine Gladstone, Hadi Meidani
[ABSTRACT]
Physics-Informed Neural Networks (PINNs) solve partial differential equations
(PDEs) by embedding governing equations and boundary/initial conditions into
the loss function. However, enforcing Dirichlet boundary conditions accurately
remains challenging, often leading to soft enforcement that compromises
convergence and reliability in complex domains. We propose a hybrid approach,
PINN-FEM, which combines PINNs with finite element methods (FEM) to impose
strong Dirichlet boundary conditions via domain decomposition. This method
incorporates FEM-based representations near the boundary, ensuring exact
enforcement without compromising convergence. Through six experiments of
increasing complexity, PINN-FEM outperforms standard PINN models, showcasing
superior accuracy and robustness. While distance functions and similar
techniques have been proposed for boundary condition enforcement, they lack
generality for real-world applications. PINN-FEM bridges this gap by leveraging
FEM near boundaries, making it well-suited for industrial and scientific
problems.
[COMMENTS]
22 pages
[LINK]
http://arxiv.org/abs/2501.07765v1
[DATE]
2025-01-14 08:47:15+08:00
[CATEGORIES]
cs.LG
On the Statistical Capacity of Deep Generative Models
[AUTHORS]
Edric Tam, David B. Dunson
[ABSTRACT]
Deep generative models are routinely used in generating samples from complex,
high-dimensional distributions. Despite their apparent successes, their
statistical properties are not well understood. A common assumption is that
with enough training data and sufficiently large neural networks, deep
generative model samples will have arbitrarily small errors in sampling from
any continuous target distribution. We set up a unifying framework that debunks
this belief. We demonstrate that broad classes of deep generative models,
including variational autoencoders and generative adversarial networks, are not
universal generators. Under the predominant case of Gaussian latent variables,
these models can only generate concentrated samples that exhibit light tails.
Using tools from concentration of measure and convex geometry, we give
analogous results for more general log-concave and strongly log-concave latent
variable distributions. We extend our results to diffusion models via a
reduction argument. We use the Gromov–Levy inequality to give similar
guarantees when the latent variables lie on manifolds with positive Ricci
curvature. These results shed light on the limited capacity of common deep
generative models to handle heavy tails. We illustrate the empirical relevance
of our work with simulations and financial data.
[LINK]
http://arxiv.org/abs/2501.07763v1
[DATE]
2025-01-14 08:39:46+08:00
[CATEGORIES]
cs.LG
Impatient Bandits: Optimizing for the Long-Term Without Delay
[AUTHORS]
Kelly W. Zhang, Thomas Baldwin-McDonald, Kamil Ciosek, Lucas Maystre, Daniel Russo
[ABSTRACT]
Increasingly, recommender systems are tasked with improving users’ long-term
satisfaction. In this context, we study a content exploration task, which we
formalize as a bandit problem with delayed rewards. There is an apparent
trade-off in choosing the learning signal: waiting for the full reward to
become available might take several weeks, slowing the rate of learning,
whereas using short-term proxy rewards reflects the actual long-term goal only
imperfectly. First, we develop a predictive model of delayed rewards that
incorporates all information obtained to date. Rewards as well as shorter-term
surrogate outcomes are combined through a Bayesian filter to obtain a
probabilistic belief. Second, we devise a bandit algorithm that quickly learns
to identify content aligned with long-term success using this new predictive
model. We prove a regret bound for our algorithm that depends on the
Value of Progressive Feedback, an information theoretic metric that
captures the quality of short-term leading indicators that are observed prior
to the long-term reward. We apply our approach to a podcast recommendation
problem, where we seek to recommend shows that users engage with repeatedly
over two months. We empirically validate that our approach significantly
outperforms methods that optimize for short-term proxies or rely solely on
delayed rewards, as demonstrated by an A/B test in a recommendation system that
serves hundreds of millions of users.
[LINK]
http://arxiv.org/abs/2501.07761v1
[DATE]
2025-01-14 08:28:26+08:00
[CATEGORIES]
cs.LG
Continuous GNN-based Anomaly Detection on Edge using Efficient Adaptive Knowledge Graph Learning
[AUTHORS]
Sanggeon Yun, Ryozo Masukawa, William Youngwoo Chung, Minhyoung Na, Nathaniel Bastian, Mohsen Imani
[ABSTRACT]
The increasing demand for robust security solutions across various industries
has made Video Anomaly Detection (VAD) a critical task in applications such as
intelligent surveillance, evidence investigation, and violence detection.
Traditional approaches to VAD often rely on finetuning large pre-trained
models, which can be computationally expensive and impractical for real-time or
resource-constrained environments. To address this, MissionGNN introduced a
more efficient method by training a graph neural network (GNN) using a fixed
knowledge graph (KG) derived from large language models (LLMs) like GPT-4.
While this approach demonstrated significant efficiency in computational power
and memory, it faces limitations in dynamic environments where frequent updates
to the KG are necessary due to evolving behavior trends and shifting data
patterns. These updates typically require cloud-based computation, posing
challenges for edge computing applications. In this paper, we propose a novel
framework that facilitates continuous KG adaptation directly on edge devices,
overcoming the limitations of cloud dependency. Our method dynamically modifies
the KG through a three-phase process: pruning, alternating, and creating nodes,
enabling real-time adaptation to changing data trends. This continuous learning
approach enhances the robustness of anomaly detection models, making them more
suitable for deployment in dynamic and resource-constrained environments.
[COMMENTS]
Accepted to DATE 2025
[LINK]
http://arxiv.org/abs/2411.09072v2
[DATE]
2025-01-14 08:21:51+08:00
[CATEGORIES]
cs.LG
Exploiting Boosting in Hyperdimensional Computing for Enhanced Reliability in Healthcare
[AUTHORS]
SungHeon Jeong, Hamza Errahmouni Barkam, Sanggeon Yun, Yeseong Kim, Shaahin Angizi, Mohsen Imani
[ABSTRACT]
Hyperdimensional computing (HDC) enables efficient data encoding and
processing in high-dimensional space, benefiting machine learning and data
analysis. However, underutilization of these spaces can lead to overfitting and
reduced model reliability, especially in data-limited systems, a critical issue
in sectors like healthcare that demand robustness and consistent performance.
We introduce BoostHD, an approach that applies boosting algorithms to partition
the hyperdimensional space into subspaces, creating an ensemble of weak
learners. By integrating boosting with HDC, BoostHD enhances performance and
reliability beyond existing HDC methods. Our analysis highlights the importance
of efficient utilization of hyperdimensional spaces for improved model
performance. Experiments on healthcare datasets show that BoostHD outperforms
state-of-the-art methods. On the WESAD dataset, it achieved an accuracy of
98.37%, surpassing Random Forest, XGBoost, and OnlineHD. BoostHD also
demonstrated superior inference efficiency and stability, maintaining high
accuracy under data imbalance and noise. In person-specific evaluations, it
achieved an average accuracy of 96.19%, outperforming other models. By
addressing the limitations of both boosting and HDC, BoostHD expands the
applicability of HDC in critical domains where reliability and precision are
paramount.
[COMMENTS]
Accepted to DATE 2025
[LINK]
http://arxiv.org/abs/2411.14612v2
[DATE]
2025-01-14 08:20:32+08:00
[CATEGORIES]
cs.LG