MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
[AUTHORS]
Liyan Tang, Philippe Laban, Greg Durrett
[ABSTRACT]
Recognizing if LLM output can be grounded in evidence is central to many
tasks in NLP: retrieval-augmented generation, summarization, document-grounded
dialogue, and more. Current approaches to this kind of “fact-checking” are
based on verifying each piece of a model generation against potential evidence
using an LLM. However, this process can be very computationally expensive,
requiring many calls to LLMs to check a single response. In this work, we show
how to build small models that have GPT-4-level performance but for 400x lower
cost. We do this by constructing synthetic training data with GPT-4, which
involves creating realistic yet challenging instances of factual errors via a
structured generation procedure. Training on this data teaches models to check
each fact in the claim and recognize synthesis of information across sentences.
For evaluation, we unify pre-existing datasets into a benchmark LLM-AggreFact,
collected from recent work on fact-checking and grounding LLM generations. Our
best system MiniCheck-FT5 (770M parameters) outperforms all systems of
comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for
data synthesis, and models.
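The abstract describes fact-checking as verifying each piece of a generation against evidence. As a rough illustration of that aggregation, and not MiniCheck itself, the sketch below makes several assumptions. The document is split into multi-sentence chunks, so a claim can draw on information spread across sentences. Each response sentence is scored against every chunk and keeps its best chunk score, and the response counts as grounded only if every sentence clears a threshold. `toy_support_score` is a hypothetical token-overlap stand-in for a trained checker such as MiniCheck-FT5. The chunk size and threshold are arbitrary.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_support_score(chunk: str, sentence: str) -> float:
    """Hypothetical stand-in for a trained checker: share of sentence tokens found in the chunk."""
    sent = tokens(sentence)
    return len(sent & tokens(chunk)) / len(sent) if sent else 1.0

def chunk_document(doc: str, size: int = 2) -> list[str]:
    """Group consecutive sentences into chunks of `size` sentences (assumed chunking scheme)."""
    sents = split_sentences(doc)
    return [" ".join(sents[i:i + size]) for i in range(0, len(sents), size)]

def check_response(doc: str, response: str, threshold: float = 0.9):
    """Return (response_supported, per_sentence_labels); a sentence needs one chunk that supports it."""
    chunks = chunk_document(doc)
    labels = [
        max(toy_support_score(c, s) for c in chunks) >= threshold
        for s in split_sentences(response)
    ]
    return all(labels), labels

DOC = "The Eiffel Tower is in Paris. It was completed in 1889. It is made of iron."
RESPONSE = "The Eiffel Tower is in Paris. It was completed in 1920."
supported, labels = check_response(DOC, RESPONSE)
print(supported, labels)  # False [True, False]
```

Taking the maximum over chunks and then requiring all sentences to pass means a single unsupported sentence is enough to flag the whole response.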
[COMMENTS]
LLM-AggreFact benchmark, MiniCheck models, data generation code at
https://github.com/Liyan06/MiniCheck
[LINK]
http://arxiv.org/abs/2404.10774v1
[DATE]
2024-04-17 01:59:10+08:00
[CATEGORIES]
cs.CL
Large Language Models as Generalizable Policies for Embodied Tasks
[AUTHORS]
Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
[ABSTRACT]
We show that large language models (LLMs) can be adapted to be generalizable
policies for embodied visual tasks. Our approach, called Large LAnguage model
Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take
as input text instructions and visual egocentric observations and output
actions directly in the environment. Using reinforcement learning, we train
LLaRP to see and act solely through environmental interactions. We show that
LLaRP is robust to complex paraphrasings of task instructions and can
generalize to new tasks that require novel optimal behavior. In particular, on
1,000 unseen tasks it achieves 42% success rate, 1.7x the success rate of other
common learned baselines or zero-shot applications of LLMs. Finally, to aid the
community in studying language-conditioned, massively multi-task, embodied AI
problems we release a novel benchmark, Language Rearrangement, consisting of
150,000 training and 1,000 testing tasks for language-conditioned
rearrangement. Video examples of LLaRP in unseen Language Rearrangement
instructions are at https://llm-rl.github.io.
[LINK]
http://arxiv.org/abs/2310.17722v2
[DATE]
2024-04-17 01:54:06+08:00
[CATEGORIES]
cs.LG
cs.CL
When can transformers reason with abstract symbols?
[AUTHORS]
Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
[ABSTRACT]
We investigate the capabilities of transformer models on relational reasoning
tasks. In these tasks, models are trained on a set of strings encoding abstract
relations, and are then tested out-of-distribution on data that contains
symbols that did not appear in the training dataset. We prove that for any
relational reasoning task in a large family of tasks, transformers learn the
abstract relations and generalize to the test set when trained by gradient
descent on sufficiently large quantities of training data. This is in contrast
to classical fully-connected networks, which we prove fail to learn to reason.
Our results inspire modifications of the transformer architecture that add only
two trainable parameters per head, and that we empirically demonstrate improve
data efficiency for learning to reason.
[COMMENTS]
25 figures
[LINK]
http://arxiv.org/abs/2310.09753v2
[DATE]
2024-04-17 01:53:37+08:00
[CATEGORIES]
cs.CL
cs.LG
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
[AUTHORS]
Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
[ABSTRACT]
Diffusion models have exhibited remarkable capabilities in text-to-image
generation. However, their performance in image-to-text generation,
specifically image captioning, has lagged behind Auto-Regressive (AR) models,
casting doubt on their applicability for such tasks. In this work, we revisit
diffusion models, highlighting their capacity for holistic context modeling and
parallel decoding. With these benefits, diffusion models can alleviate the
inherent limitations of AR methods, including their slow inference speed, error
propagation, and unidirectional constraints. Furthermore, we attribute the prior
underperformance of diffusion models to the absence of an effective latent
space for image-text alignment and to the discrepancy between continuous
diffusion processes and discrete textual data. In response, we introduce a
novel architecture, LaDiC, which utilizes a split BERT to create a dedicated
latent space for captions and integrates a regularization module to manage
varying text lengths. Our framework also includes a diffuser for semantic
image-to-text conversion and a Back&Refine technique to enhance token
interactivity during inference. LaDiC achieves state-of-the-art performance for
diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2
CIDEr, demonstrating exceptional performance without pre-training or ancillary
modules. This indicates strong competitiveness with AR models, revealing the
previously untapped potential of diffusion models in image-to-text generation.
[LINK]
http://arxiv.org/abs/2404.10763v1
[DATE]
2024-04-17 01:47:16+08:00
[CATEGORIES]
cs.CL
AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models
[AUTHORS]
Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, Peter Anthony Beerel
[ABSTRACT]
We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed
Adaptive Freezing of Low Rank Adaptation (AFLoRA). Specifically, for each
pre-trained frozen weight tensor, we add a parallel path of trainable low-rank
matrices, namely a down-projection and an up-projection matrix, each of which
is followed by a feature transformation vector. Based on a novel freezing
score, we incrementally freeze these projection matrices during fine-tuning
to reduce computation and alleviate over-fitting. Our experimental results
demonstrate that we can achieve state-of-the-art performance with an average
improvement of up to $0.85\%$ on the GLUE benchmark while yielding up
to $9.5\times$ fewer average trainable parameters. In terms of runtime,
AFLoRA yields up to a $1.86\times$ improvement over similar
PEFT alternatives. Besides the practical utility of our approach, we provide
insights on the trainability requirements of LoRA paths at different modules
and the freezing schedule for the different projection matrices. Code will be
released.
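The parallel path described above can be sketched as a forward pass in plain Python. It has a frozen weight `W0`, a down-projection `A` followed by an element-wise feature-transformation vector, and an up-projection `B` followed by a second vector. The abstract does not define the freezing score. `maybe_freeze` therefore uses a hypothetical sensitivity-style score (the mean of |weight × gradient|) with a fixed threshold, and should be treated as a placeholder for the paper's actual rule.

```python
def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LowRankPath:
    """Frozen weight W0 plus a parallel path: A (down) -> s_down (scale) -> B (up) -> s_up (scale)."""

    def __init__(self, W0, A, B, s_down, s_up):
        self.W0, self.A, self.B = W0, A, B
        self.s_down, self.s_up = s_down, s_up
        self.frozen = {"A": False, "B": False}  # projections stay trainable until frozen

    def forward(self, x):
        base = matvec(self.W0, x)                                      # frozen pre-trained path
        h = [s * z for s, z in zip(self.s_down, matvec(self.A, x))]    # down-project, then scale
        delta = [s * z for s, z in zip(self.s_up, matvec(self.B, h))]  # up-project, then scale
        return [b + d for b, d in zip(base, delta)]

    def maybe_freeze(self, name, grad, threshold):
        """Hypothetical rule: freeze a projection once mean |weight * grad| falls below threshold."""
        W = getattr(self, name)
        prods = [abs(w * g) for w_row, g_row in zip(W, grad) for w, g in zip(w_row, g_row)]
        score = sum(prods) / len(prods)
        if score < threshold:
            self.frozen[name] = True
        return score

layer = LowRankPath(W0=[[1.0, 0.0], [0.0, 1.0]], A=[[1.0, 1.0]], B=[[1.0], [0.0]],
                    s_down=[0.5], s_up=[1.0, 2.0])
print(layer.forward([1.0, 2.0]))  # [2.5, 2.0]
layer.maybe_freeze("A", grad=[[0.01, 0.02]], threshold=0.1)
print(layer.frozen)  # {'A': True, 'B': False}
```

Once a projection is frozen, an optimizer would simply skip its update, which is where the runtime and trainable-parameter savings come from.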
[COMMENTS]
5 pages, 5 figures
[LINK]
http://arxiv.org/abs/2403.13269v3
[DATE]
2024-04-17 01:37:12+08:00
[CATEGORIES]
cs.CL
cs.LG
Dataset Reset Policy Optimization for RLHF
[AUTHORS]
Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
[ABSTRACT]
Reinforcement Learning (RL) from Human Preference-based feedback is a popular
paradigm for fine-tuning generative models, which has produced impressive
models such as GPT-4 and Claude3 Opus. This framework often consists of two
steps: learning a reward model from an offline preference dataset followed by
running online RL to optimize the learned reward model. In this work,
leveraging the idea of reset, we propose a new RLHF algorithm with provable
guarantees. Motivated by the fact that the offline preference dataset provides
informative states (i.e., data that is preferred by the labelers), our new
algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing
offline preference dataset into the online policy training procedure via
dataset reset: it directly resets the policy optimizer to the states in the
offline dataset, instead of always starting from the initial state
distribution. In theory, we show that DR-PO learns to perform at least as well
as any policy that is covered by the offline dataset under general function
approximation with finite sample complexity. In experiments, we demonstrate
that on both the TL;DR summarization and the Anthropic Helpful Harmful (HH)
dataset, the generation from DR-PO is better than that from Proximal Policy
Optimization (PPO) and Direct Preference Optimization (DPO), under the
metric of GPT4 win-rate. Code for this work can be found at
https://github.com/Cornell-RL/drpo.
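Here is a minimal sketch of the dataset-reset idea for text generation, under assumptions the abstract does not spell out. A state is the prompt plus a partial response. The offline data are dicts with hypothetical `prompt`/`chosen`/`rejected` keys. A reset starts a rollout from a random prefix of the preferred (`chosen`) response. Mixing resets with ordinary prompt-only starts through `reset_prob` is also an illustrative assumption.

```python
import random

def reset_state(dataset, rng):
    """Sample a reset state: a prompt plus a random-length prefix of its preferred response."""
    example = rng.choice(dataset)
    chosen_tokens = example["chosen"].split()  # whitespace tokens stand in for model tokens
    cut = rng.randint(0, len(chosen_tokens))   # inclusive; 0 means "prompt only"
    return example["prompt"], chosen_tokens[:cut]

def rollout_start(dataset, rng, reset_prob=0.5):
    """Start from a dataset reset, or from the usual initial state (prompt, empty response)."""
    if rng.random() < reset_prob:
        return reset_state(dataset, rng)
    return rng.choice(dataset)["prompt"], []

offline = [
    {"prompt": "Summarize: the meeting moved to Friday.",
     "chosen": "Meeting moved to Friday.",
     "rejected": "A meeting."},
]
rng = random.Random(0)
for _ in range(3):
    print(rollout_start(offline, rng))  # the policy would continue generating from each prefix
```

Starting partway through a labeler-preferred response lets the optimizer practice from informative states it might rarely reach on its own.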
[COMMENTS]
28 pages, 6 tables, 3 Figures, 3 Algorithms
[LINK]
http://arxiv.org/abs/2404.08495v3
[DATE]
2024-04-17 01:36:39+08:00
[CATEGORIES]
cs.LG
cs.CL
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification
[AUTHORS]
Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu
[ABSTRACT]
Light curves serve as a valuable source of information on stellar formation
and evolution. With the rapid advancement of machine learning techniques, they
can be effectively processed to extract astronomical patterns and information.
In this study, we present a comprehensive evaluation of deep-learning and large
language model (LLM) based models for the automatic classification of variable
star light curves, based on large datasets from the Kepler and K2 missions.
Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries,
examining the influence of observational cadence and phase distribution on
classification precision. Employing AutoDL optimization, we achieve striking
performance with the 1D-Convolution+BiLSTM architecture and the Swin
Transformer, hitting accuracies of 94\% and 99\% respectively, with the
latter demonstrating a notable 83\% accuracy in discerning the elusive Type II
Cepheids, which comprise merely 0.02\% of the total dataset. We unveil StarWhisper
LightCurve (LC), an innovative series comprising three LLM-based models: LLM,
multimodal large language model (MLLM), and Large Audio Language Model (LALM).
Each model is fine-tuned with strategic prompt engineering and customized
training methods to explore the emergent abilities of these models for
astronomical data. Remarkably, StarWhisper LC Series exhibit high accuracies
around 90\%, significantly reducing the need for explicit feature engineering,
thereby paving the way for streamlined parallel data processing and the
progression of multifaceted multimodal models in astronomical applications. The
study furnishes two detailed catalogs illustrating the impacts of phase and
sampling intervals on deep learning classification accuracy, showing that a
substantial decrease of up to 14\% in observation duration and 21\% in sampling
points can be realized without compromising accuracy by more than 10\%.
[COMMENTS]
35 pages, 20 figures
[LINK]
http://arxiv.org/abs/2404.10757v1
[DATE]
2024-04-17 01:35:25+08:00
[CATEGORIES]
cs.CL
cs.LG
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning
[AUTHORS]
Alexander Scarlatos, Andrew Lan
[ABSTRACT]
Recent developments in large pre-trained language models have enabled
unprecedented performance on a variety of downstream tasks. Achieving best
performance with these models often leverages in-context learning, where a
model performs a (possibly new) task given one or more examples. However,
recent work has shown that the choice of examples can have a large impact on
task performance and that finding an optimal set of examples is non-trivial.
While there are many existing methods for selecting in-context examples, they
generally score examples independently, ignoring the dependency between them
and the order in which they are provided to the model. In this work, we propose
Retrieval for In-Context Learning (RetICL), a learnable method for modeling and
optimally selecting examples sequentially for in-context learning. We frame the
problem of sequential example selection as a Markov decision process and train
an example retriever using reinforcement learning. We evaluate RetICL on math
word problem solving and scientific question answering tasks and show that it
consistently outperforms or matches heuristic and learnable baselines. We also
use case studies to show that RetICL implicitly learns representations of
problem solving strategies.
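To make the MDP framing concrete, here is a sketch in which the state is the query plus the examples selected so far, and each action appends one more example. The scoring rule (relevance to the query minus redundancy with already-chosen examples) and the toy 2-D embeddings are hypothetical. RetICL learns its retriever with reinforcement learning, which this sketch only hints at by returning the log-probability a policy-gradient update would use.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def policy_probs(query, chosen, candidates, w_red=1.0):
    """Softmax over remaining candidates; scores depend on the state (query + examples chosen so far)."""
    remaining = [i for i in range(len(candidates)) if i not in chosen]
    scores = []
    for i in remaining:
        relevance = dot(query, candidates[i])
        redundancy = max((dot(candidates[i], candidates[j]) for j in chosen), default=0.0)
        scores.append(relevance - w_red * redundancy)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return remaining, [e / z for e in exps]

def select_sequence(query, candidates, k, rng, greedy=False, w_red=1.0):
    """Roll out one episode: k sequential actions, each appending one example."""
    chosen, log_prob = [], 0.0
    for _ in range(k):
        remaining, probs = policy_probs(query, chosen, candidates, w_red)
        if greedy:
            idx = max(range(len(probs)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(probs)), weights=probs)[0]
        chosen.append(remaining[idx])
        log_prob += math.log(probs[idx])  # a REINFORCE-style update would scale this by the reward
    return chosen, log_prob

candidates = [[1.0, 0.0], [0.9, 0.1], [0.2, 1.0]]  # toy example embeddings
query = [1.0, 0.5]
print(select_sequence(query, candidates, 2, random.Random(0), greedy=True)[0])             # [0, 2]
print(select_sequence(query, candidates, 2, random.Random(0), greedy=True, w_red=0.0)[0])  # [0, 1]
```

The two runs show why sequential modeling matters. Scored independently, the second pick is the near-duplicate example 1. Conditioning on the first pick steers the policy toward the complementary example 2.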
[LINK]
http://arxiv.org/abs/2305.14502v2
[DATE]
2024-04-17 01:25:25+08:00
[CATEGORIES]
cs.CL
cs.LG
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
[AUTHORS]
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
[ABSTRACT]
Reinforcement Learning from Human Feedback (RLHF) is currently the most
widely used method to align large language models (LLMs) with human
preferences. Existing RLHF methods can be roughly categorized as either
reward-based or reward-free. Novel applications such as ChatGPT and Claude
leverage reward-based methods that first learn a reward model and apply
actor-critic algorithms, such as Proximal Policy Optimization (PPO). However,
in academic benchmarks, state-of-the-art results are often achieved via
reward-free methods, such as Direct Preference Optimization (DPO). Is DPO truly
superior to PPO? Why does PPO perform poorly on these benchmarks? In this
paper, we first conduct both theoretical and empirical studies on the
algorithmic properties of DPO and show that DPO may have fundamental
limitations. Moreover, we also comprehensively examine PPO and reveal the key
factors for the best performances of PPO in fine-tuning LLMs. Finally, we
benchmark DPO and PPO across a collection of RLHF testbeds, ranging
from dialogue to code generation. Experiment results demonstrate that PPO is
able to surpass other alignment methods in all cases and achieve
state-of-the-art results in challenging code competitions.
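For reference, the two objectives being compared can be written per sample as below. These are the standard formulations from the original DPO and PPO papers: DPO's logistic loss on the policy-versus-reference log-ratio margin, and PPO's clipped surrogate. They are not the specific PPO variant or training techniques studied in this paper. Inputs are assumed to be summed sequence log-probabilities and a precomputed advantage.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * [(policy - reference) log-ratio margin])."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), written to avoid overflow for large |z|
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO clipped surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped * advantage)

print(round(dpo_loss(-1.0, -1.0, -1.0, -1.0), 4))  # 0.6931, i.e. log 2 when there is no margin
print(ppo_clip_objective(0.5, 0.0, 1.0))           # 1.2: the ratio exp(0.5) is clipped at 1 + eps
```

The structural difference is visible in the signatures. DPO needs only log-probabilities of fixed preference pairs, while PPO needs fresh samples (for `logp_new`) and an advantage from a learned reward and critic.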
[COMMENTS]
16 pages, 2 figures, 14 tables
[LINK]
http://arxiv.org/abs/2404.10719v1
[DATE]
2024-04-17 00:51:53+08:00
[CATEGORIES]
cs.CL
Dual Modalities of Text: Visual and Textual Generative Pre-training
[AUTHORS]
Yekun Chai, Qingyi Liu, Jingwu Xiao, Shuohuan Wang, Yu Sun, Hua Wu
[ABSTRACT]
Harnessing visual texts represents a burgeoning frontier in the evolution of
language modeling. In this paper, we introduce a novel pre-training framework
for a suite of pixel-based autoregressive language models, pre-training on a
corpus of over 400 million documents rendered as RGB images. Our approach is
characterized by a dual-modality training regimen, engaging both visual data
through next patch prediction with a regression head and textual data via next
token prediction with a classification head. This study is particularly focused
on investigating the synergistic interplay between visual and textual
modalities of language. Our comprehensive evaluation across a diverse array of
benchmarks reveals that the confluence of visual and textual data substantially
augments the efficacy of pixel-based language models. Notably, our findings
show that a unidirectional pixel-based model, devoid of textual data during
training, can match the performance levels of advanced bidirectional
pixel-based models on various language understanding benchmarks. This work
highlights the considerable untapped potential of integrating visual and
textual information for language modeling purposes. We will release our code,
data, and checkpoints to inspire further research advancement.
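The two training signals can be illustrated with a toy loss. The visual stream uses mean squared error between predicted and target patch values (the regression head). The textual stream uses cross-entropy over the vocabulary (the classification head). Equal weights and summing both terms in one step are assumptions for illustration only; the paper's regimen may instead alternate or mix batches.

```python
import math

def mse(pred, target):
    """Regression loss for one predicted patch (flattened pixel values)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Classification loss for one next-token prediction (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def dual_modality_loss(patch_preds, patch_targets, token_logits, token_labels,
                       w_visual=1.0, w_text=1.0):
    """Weighted sum of the mean next-patch MSE and the mean next-token cross-entropy."""
    visual = sum(mse(p, t) for p, t in zip(patch_preds, patch_targets)) / len(patch_preds)
    text = sum(cross_entropy(l, y) for l, y in zip(token_logits, token_labels)) / len(token_logits)
    return w_visual * visual + w_text * text

# Perfect patch prediction and uniform logits over a 4-token vocabulary: loss = log 4, about 1.386
print(dual_modality_loss([[0.1, 0.2]], [[0.1, 0.2]], [[0.0, 0.0, 0.0, 0.0]], [2]))
```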
[LINK]
http://arxiv.org/abs/2404.10710v1
[DATE]
2024-04-17 00:36:50+08:00
[CATEGORIES]
cs.CL
Question Difficulty Ranking for Multiple-Choice Reading Comprehension
[AUTHORS]
Vatsal Raina, Mark Gales
[ABSTRACT]
Multiple-choice (MC) tests are an efficient method to assess English
learners. It is useful for test creators to rank candidate MC questions by
difficulty during exam curation. Typically, the difficulty is determined by
having human test takers trial the questions in a pretesting stage. However,
this is expensive and not scalable. Therefore, we explore automated approaches
to rank MC questions by difficulty. However, there is limited data for explicit
training of a system for difficulty scores. Hence, we compare task transfer and
zero-shot approaches: task transfer adapts level classification and reading
comprehension systems for difficulty ranking while zero-shot prompting of
instruction-finetuned language models compares absolute assessment with
comparative assessment. It is found that level classification transfers better than
reading comprehension. Additionally, zero-shot comparative assessment is more
effective at difficulty ranking than absolute assessment and even the task
transfer approaches, achieving a Spearman’s correlation of 40.4%. Combining the
systems is observed to further boost the
correlation.
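The comparative approach and the evaluation metric can be illustrated together. Pairwise "which question is harder?" judgments are turned into a ranking by counting wins, and that ranking is then scored against reference difficulty with Spearman's rank correlation. Here the judge is any callable (in the paper, a prompted instruction-finetuned model). The win-count aggregation and the difficulty values are illustrative assumptions, and the closed-form Spearman formula below assumes no ties.

```python
from itertools import combinations

def rank_by_comparisons(n, harder):
    """Score each question by pairwise wins; harder(i, j) is True if i is judged harder than j."""
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if harder(i, j):
            wins[i] += 1
        else:
            wins[j] += 1
    return wins  # more wins = judged harder

def ranks(values):
    """1-based ascending ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

true_difficulty = [0.3, 0.9, 0.1, 0.5]  # hypothetical pretesting results
perfect = rank_by_comparisons(4, lambda i, j: true_difficulty[i] > true_difficulty[j])
# A judge that gets one pair (question 0 vs. 3) wrong
noisy = rank_by_comparisons(4, lambda i, j: (i, j) == (0, 3) or true_difficulty[i] > true_difficulty[j])
print(spearman(perfect, true_difficulty), spearman(noisy, true_difficulty))  # 1.0 0.8
```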
[COMMENTS]
7 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.10704v1
[DATE]
2024-04-17 00:23:10+08:00
[CATEGORIES]
cs.CL
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
[AUTHORS]
Qiwei Di, Jiafan He, Quanquan Gu
[ABSTRACT]
Learning from human feedback plays an important role in aligning generative
models, such as large language models (LLM). However, the effectiveness of this
approach can be influenced by adversaries, who may intentionally provide
misleading preferences to manipulate the output in an undesirable or harmful
direction. To tackle this challenge, we study a specific model within this
problem domain–contextual dueling bandits with adversarial feedback, where the
true preference label can be flipped by an adversary. We propose an algorithm
called robust contextual dueling bandit (RCDB), which is based on
uncertainty-weighted maximum likelihood estimation. Our algorithm achieves an
$\tilde O(d\sqrt{T}+dC)$ regret bound, where $T$ is the number of rounds, $d$
is the dimension of the context, and $0 \le C \le T$ is the total number of
rounds with adversarial feedback. We also prove a lower bound to show that our regret bound
is nearly optimal, both in scenarios with and without ($C=0$) adversarial
feedback. Additionally, we conduct experiments to evaluate our proposed
algorithm against various types of adversarial feedback. Experimental results
demonstrate its superiority over the state-of-the-art dueling bandit algorithms
in the presence of adversarial feedback.
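The following sketch illustrates uncertainty-weighted maximum likelihood estimation for a linear (logistic) preference model, where each comparison's feature is $\phi_t = \phi(x_t, a_t) - \phi(x_t, b_t)$. The weight rule $w_t = \min(1, \alpha / \|\phi_t\|_{\Sigma_t^{-1}})$, with $\Sigma_t$ accumulating weighted outer products, is an assumption modeled on weighted estimators common in the robust linear-bandit literature. The exact weighting, confidence bonus, and action-selection rule of the proposed algorithm are not shown. The intuition is that comparisons in poorly explored directions get smaller weights, which limits how far any single flipped label can move the estimate.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def norm_inv(Sigma, v):
    """||v||_{Sigma^{-1}} = sqrt(v^T Sigma^{-1} v)."""
    x = solve(Sigma, v)
    return math.sqrt(sum(a * c for a, c in zip(v, x)))

def uncertainty_weighted_mle(phis, labels, alpha=1.0, lam=1.0, steps=500, lr=0.5):
    """phis[t] = phi(x_t, a_t) - phi(x_t, b_t); labels[t] = 1 if a_t was (reportedly) preferred."""
    d = len(phis[0])
    Sigma = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    weights = []
    for phi in phis:
        # Assumed rule: down-weight comparisons whose features are poorly covered so far
        w = min(1.0, alpha / norm_inv(Sigma, phi))
        weights.append(w)
        for i in range(d):
            for j in range(d):
                Sigma[i][j] += w * phi[i] * phi[j]
    # Weighted, L2-regularized logistic MLE by plain gradient descent
    theta = [0.0] * d
    for _ in range(steps):
        grad = [lam * t for t in theta]
        for w, phi, y in zip(weights, phis, labels):
            p = 1.0 / (1.0 + math.exp(-sum(t * f for t, f in zip(theta, phi))))
            for i in range(d):
                grad[i] += w * (p - y) * phi[i]
        theta = [t - lr * g / len(phis) for t, g in zip(theta, grad)]
    return theta, weights

phis = [[3.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
labels = [1, 1, 0, 1]
theta, weights = uncertainty_weighted_mle(phis, labels)
print(theta, weights)  # the first, poorly covered comparison gets weight 1/3
```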
[COMMENTS]
24 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.10776v1
[DATE]
2024-04-17 01:59:55+08:00
[CATEGORIES]
cs.LG
TENG: Time-Evolving Natural Gradient for Solving PDEs with Deep Neural Net
[AUTHORS]
Zhuo Chen, Jacob McCarran, Esteban Vizcaino, Marin Soljačić, Di Luo
[ABSTRACT]
Partial differential equations (PDEs) are instrumental for modeling dynamical
systems in science and engineering. The advent of neural networks has initiated
a significant shift in tackling these complexities though challenges in
accuracy persist, especially for initial value problems. In this paper, we
introduce the Time-Evolving Natural Gradient (TENG), generalizing
time-dependent variational principles and optimization-based time integration,
leveraging natural gradient optimization to obtain high accuracy in
neural-network-based PDE solutions. Our comprehensive development includes
algorithms like TENG-Euler and its high-order variants, such as TENG-Heun,
tailored for enhanced precision and efficiency. TENG’s effectiveness is further
validated through its performance, surpassing current leading methods and
achieving machine precision in step-by-step optimizations across a spectrum of
PDEs, including the heat equation, Allen-Cahn equation, and Burgers’ equation.
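The core idea of optimization-based time integration can be shown on the heat equation $u_t = u_{xx}$. At each step, new parameters are chosen so the model matches an explicit-Euler target $u + \Delta t\, u_{xx}$ on collocation points, in the spirit of TENG-Euler. To keep the sketch short and exact, it assumes a model that is linear in its parameters (a sine basis on a periodic grid), so each step reduces to a linear least-squares problem solved directly. TENG itself applies natural-gradient optimization to a neural network, which this example does not implement. The basis size, grid, initial condition, and step size are arbitrary.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

K = 4   # basis functions sin(k x), k = 1..K (assumed model)
N = 32  # collocation points on the periodic domain [0, 2*pi)
XS = [2 * math.pi * j / N for j in range(N)]

def basis(x):
    return [math.sin(k * x) for k in range(1, K + 1)]

def u(theta, x):
    return sum(t * b for t, b in zip(theta, basis(x)))

def u_xx(theta, x):
    """Exact second derivative of the model: d^2/dx^2 sin(kx) = -k^2 sin(kx)."""
    return sum(-(k ** 2) * t * math.sin(k * x) for k, t in zip(range(1, K + 1), theta))

def teng_euler_step(theta, dt):
    """Fit new parameters to the explicit-Euler target u + dt * u_xx by least squares on the grid."""
    targets = [u(theta, x) + dt * u_xx(theta, x) for x in XS]
    Phi = [basis(x) for x in XS]
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(K)] for i in range(K)]
    b = [sum(row[i] * y for row, y in zip(Phi, targets)) for i in range(K)]
    return solve(A, b)

def integrate(theta0, t_end, steps):
    theta, dt = list(theta0), t_end / steps
    for _ in range(steps):
        theta = teng_euler_step(theta, dt)
    return theta

theta0 = [1.0, 0.0, 0.5, 0.0]  # u(x, 0) = sin x + 0.5 sin 3x
theta = integrate(theta0, t_end=0.1, steps=100)
exact = [math.exp(-(k ** 2) * 0.1) * c for k, c in zip(range(1, K + 1), theta0)]
print(theta)
print(exact)  # each mode decays as exp(-k^2 t); the Euler error shrinks with dt
```

The first-order Euler target is the weak point here. Higher-order variants such as the TENG-Heun scheme named above replace it with a more accurate target.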
[LINK]
http://arxiv.org/abs/2404.10771v1
[DATE]
2024-04-17 01:55:31+08:00
[CATEGORIES]
cs.LG
Finite-dimensional approximations of push-forwards on locally analytic functionals and truncation of least-squares polynomials
[AUTHORS]
Isao Ishikawa
[ABSTRACT]
This paper introduces a theoretical framework for investigating analytic maps
from finite discrete data, elucidating mathematical machinery underlying the
polynomial approximation with least-squares in multivariate situations. Our
approach is to consider the push-forward on the space of locally analytic
functionals, instead of directly handling the analytic map itself. We establish
a methodology enabling appropriate finite-dimensional approximation of the
push-forward from finite discrete data, through the theory of the
Fourier–Borel transform and the Fock space. Moreover, we prove a rigorous
convergence result with a convergence rate. As an application, we prove that it
is not the least-squares polynomial, but the polynomial obtained by truncating
its higher-degree terms, that approximates analytic functions and further
allows for approximation beyond the support of the data distribution. One
advantage of our theory is that it enables us to apply linear algebraic
operations to the finite-dimensional approximation of the push-forward.
Utilizing this, we prove the convergence of a method for approximating an
analytic vector field from finite data of the flow map of an ordinary
differential equation.
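The operation at the center of the application result is easy to state in code: fit a least-squares polynomial, then discard its higher-degree terms. This univariate sketch (the paper treats the multivariate case) only illustrates the truncation step. The degrees are arbitrary choices, and the convergence guarantees belong to the paper, not to this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq_poly(xs, ys, degree):
    """Least-squares polynomial coefficients c[0..degree] (ascending powers) via normal equations."""
    V = [[x ** k for k in range(degree + 1)] for x in xs]
    A = [[sum(row[i] * row[j] for row in V) for j in range(degree + 1)] for i in range(degree + 1)]
    b = [sum(row[i] * y for row, y in zip(V, ys)) for i in range(degree + 1)]
    return solve(A, b)

def truncate(coeffs, keep_degree):
    """Drop all terms of degree greater than keep_degree."""
    return coeffs[:keep_degree + 1]

def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

xs = [-1 + 2 * i / 9 for i in range(10)]
ys = [1 + 2 * x - x ** 3 for x in xs]
full = lstsq_poly(xs, ys, degree=5)
low = truncate(full, keep_degree=3)
print([round(c, 6) for c in low])  # approximately [1.0, 2.0, 0.0, -1.0]
print(evaluate(low, 1.5))          # the truncated polynomial can be evaluated outside [-1, 1]
```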
[COMMENTS]
30 pages. 2 figures. Comments are welcome
[LINK]
http://arxiv.org/abs/2404.10769v1
[DATE]
2024-04-17 01:53:59+08:00
[CATEGORIES]
cs.LG
Confidential Federated Computations
[AUTHORS]
Hubert Eichner, Daniel Ramage, Kallista Bonawitz, Dzmitry Huba, Tiziano Santoro, Brett McLarnon, Timon Van Overveldt, Nova Fallen, Peter Kairouz, Albert Cheu, Katharine Daly, Adria Gascon, Marco Gruteser, Brendan McMahan
[ABSTRACT]
Federated Learning and Analytics (FLA) have seen widespread adoption by
technology platforms for processing sensitive on-device data. However, basic
FLA systems have privacy limitations: they do not necessarily require
anonymization mechanisms like differential privacy (DP), and provide limited
protections against a potentially malicious service provider. Adding DP to a
basic FLA system currently requires either adding excessive noise to each
device’s updates, or assuming an honest service provider that correctly
implements the mechanism and only uses the privatized outputs. Secure
multiparty computation (SMPC)-based oblivious aggregations can limit the
service provider’s access to individual user updates and improve DP tradeoffs,
but the tradeoffs are still suboptimal, and they suffer from scalability
challenges and susceptibility to Sybil attacks. This paper introduces a novel
system architecture that leverages trusted execution environments (TEEs) and
open-sourcing to both ensure confidentiality of server-side computations and
provide externally verifiable privacy properties, bolstering the robustness and
trustworthiness of private federated computations.
[LINK]
http://arxiv.org/abs/2404.10764v1
[DATE]
2024-04-17 01:47:27+08:00
[CATEGORIES]
cs.LG
Laplace-HDC: Understanding the geometry of binary hyperdimensional computing
[AUTHORS]
Saeid Pourmand, Wyatt D. Whiting, Alireza Aghasi, Nicholas F. Marshall
[ABSTRACT]
This paper studies the geometry of binary hyperdimensional computing (HDC), a
computational scheme in which data are encoded using high-dimensional binary
vectors. We establish a result about the similarity structure induced by the
HDC binding operator and show that the Laplace kernel naturally arises in this
setting, motivating our new encoding method Laplace-HDC, which improves upon
previous methods. We describe how our results indicate limitations of binary
HDC in encoding spatial information from images and discuss potential
solutions, including using Haar convolutional features and the definition of a
translation-equivariant HDC encoding. Several numerical experiments
highlighting the improved accuracy of Laplace-HDC in contrast to alternative
methods are presented. We also numerically study other aspects of the proposed
framework such as robustness and the underlying translation-equivariant
encoding.
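For readers unfamiliar with binary HDC, the basic operations the abstract refers to can be sketched with Python integers as D-bit vectors. Random hypervectors are nearly orthogonal, with normalized Hamming similarity near 0.5. XOR binding is self-inverse and produces a vector dissimilar to its inputs, while majority-vote bundling stays similar to each input. This shows generic binary HDC only, not the Laplace-HDC encoding. D = 10,000 is a conventional but arbitrary choice.

```python
import random

D = 10_000  # hypervector dimension

def random_hv(rng):
    """Uniformly random D-bit hypervector stored as a Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """XOR binding: self-inverse, so bind(bind(a, b), b) == a."""
    return a ^ b

def bundle(*hvs):
    """Bitwise majority vote over an odd number of hypervectors."""
    out = 0
    for bit in range(D):
        ones = sum((h >> bit) & 1 for h in hvs)
        if 2 * ones > len(hvs):
            out |= 1 << bit
    return out

def similarity(a, b):
    """1 - normalized Hamming distance: 1.0 for identical vectors, about 0.5 for unrelated ones."""
    return 1 - bin(a ^ b).count("1") / D

rng = random.Random(0)
a, b, c = random_hv(rng), random_hv(rng), random_hv(rng)
print(similarity(a, b))                # near 0.5: random vectors are quasi-orthogonal
print(similarity(bind(a, b), a))       # near 0.5: binding hides its inputs
print(similarity(bundle(a, b, c), a))  # near 0.75: bundling preserves similarity
```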
[COMMENTS]
23 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.10759v1
[DATE]
2024-04-17 01:36:21+08:00
[CATEGORIES]
cs.LG
Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
[AUTHORS]
Chris Cundy, Rishi Desai, Stefano Ermon
[ABSTRACT]
As reinforcement learning techniques are increasingly applied to real-world
decision problems, attention has turned to how these algorithms use potentially
sensitive information. We consider the task of training a policy that maximizes
reward while minimizing disclosure of certain sensitive state variables through
the actions. We give examples of how this setting covers real-world problems in
privacy for sequential decision-making. We solve this problem in the policy
gradients framework by introducing a regularizer based on the mutual
information (MI) between the sensitive state and the actions. We develop a
model-based stochastic gradient estimator for optimization of
privacy-constrained policies. We also discuss an alternative MI regularizer
that serves as an upper bound to our main MI regularizer and can be optimized
in a model-free setting, and a powerful direct estimator that can be used in an
environment with differentiable dynamics. We contrast previous work in
differentially-private RL to our mutual-information formulation of information
disclosure. Experimental results show that our training method results in
policies that hide the sensitive state, even in challenging high-dimensional
tasks.
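The regularizer can be made concrete for a one-step problem with a discrete sensitive state. Compute $I(S;A)$ from the prior over states and the policy's action probabilities, then subtract $\lambda I(S;A)$ from expected reward. This exact computation only works for tiny discrete cases. The paper's model-based and model-free gradient estimators are not shown, and the example numbers are hypothetical.

```python
import math

def mutual_information(p_s, pi):
    """I(S; A) in nats, for prior p_s[s] and policy pi[s][a] = P(a | s)."""
    n_s, n_a = len(p_s), len(pi[0])
    p_a = [sum(p_s[s] * pi[s][a] for s in range(n_s)) for a in range(n_a)]
    mi = 0.0
    for s in range(n_s):
        for a in range(n_a):
            joint = p_s[s] * pi[s][a]
            if joint > 0:
                mi += joint * math.log(pi[s][a] / p_a[a])
    return mi

def regularized_objective(p_s, pi, reward, lam):
    """Expected one-step reward minus lam * I(S; A)."""
    expected = sum(p_s[s] * pi[s][a] * reward[s][a]
                   for s in range(len(p_s)) for a in range(len(pi[0])))
    return expected - lam * mutual_information(p_s, pi)

p_s = [0.5, 0.5]                      # sensitive bit, equally likely
reward = [[1.0, 0.0], [0.0, 1.0]]     # matching the action to the secret pays 1
revealing = [[1.0, 0.0], [0.0, 1.0]]  # action copies the secret
private = [[0.5, 0.5], [0.5, 0.5]]    # action ignores the secret
print(mutual_information(p_s, revealing))  # log 2, about 0.693 nats
print(regularized_objective(p_s, revealing, reward, 2.0),
      regularized_objective(p_s, private, reward, 2.0))
```

With $\lambda = 2$, the revealing policy's extra reward (1.0 vs. 0.5) no longer pays for the $\log 2$ nats it discloses, so the regularized objective prefers the private policy.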
[COMMENTS]
Accepted to AISTATS 2024
[LINK]
http://arxiv.org/abs/2012.15019v3
[DATE]
2024-04-17 01:27:34+08:00
[CATEGORIES]
cs.LG
Interpolation and differentiation of alchemical degrees of freedom in machine learning interatomic potentials
[AUTHORS]
Juno Nam, Rafael Gómez-Bombarelli
[ABSTRACT]
Machine learning interatomic potentials (MLIPs) have become a workhorse of
modern atomistic simulations, and recently published universal MLIPs,
pre-trained on large datasets, have demonstrated remarkable accuracy and
generalizability. However, the computational cost of MLIPs limits their
applicability to chemically disordered systems requiring large simulation cells
or to sample-intensive statistical methods. Here, we report the use of
continuous and differentiable alchemical degrees of freedom in atomistic
materials simulations, exploiting the fact that graph neural network MLIPs
represent discrete elements as real-valued tensors. The proposed method
introduces alchemical atoms with corresponding weights into the input graph,
alongside modifications to the message-passing and readout mechanisms of MLIPs,
and allows smooth interpolation between the compositional states of materials.
The end-to-end differentiability of MLIPs enables efficient calculation of the
gradient of energy with respect to the compositional weights. Leveraging these
gradients, we propose methodologies for optimizing the composition of solid
solutions towards target macroscopic properties and conducting alchemical free
energy simulations to quantify the free energy of vacancy formation and
composition changes. The approach offers an avenue for extending the
capabilities of universal MLIPs in the modeling of compositional disorder and
characterizing the phase stabilities of complex materials systems.
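Here is a toy version of the idea. Two partial atoms sharing one site get the weights $\lambda$ and $1 - \lambda$. Each atom's contribution is scaled by its weight and each pairwise term by the product of weights, and the energy is then differentiated with respect to $\lambda$. `toy_energy` is a hypothetical stand-in for a graph-neural-network MLIP; real models modify message passing and readout, as the abstract notes. The numbers are arbitrary. The point is that the chain-rule gradient matches a finite difference, and that $\lambda = 0$ and $\lambda = 1$ recover the two pure compositions.

```python
def toy_energy(weights, site_energy, pair):
    """Hypothetical MLIP stand-in: weighted per-atom energies plus pair terms scaled by both weights."""
    n = len(weights)
    energy = sum(w * e for w, e in zip(weights, site_energy))
    for i in range(n):
        for j in range(i + 1, n):
            energy += weights[i] * weights[j] * pair[i][j]
    return energy

def weights_of(lam):
    """Atom 0 is ordinary; atoms 1 (element A) and 2 (element B) share one site."""
    return [1.0, lam, 1.0 - lam]

def grad_lambda(lam, site_energy, pair):
    """dE/dlam by the chain rule: dE/dw1 * (+1) + dE/dw2 * (-1); pair must be symmetric."""
    w = weights_of(lam)
    dE_dw = [site_energy[i] + sum(w[j] * pair[i][j] for j in range(len(w)) if j != i)
             for i in range(len(w))]
    return dE_dw[1] - dE_dw[2]

site_energy = [-3.0, -2.0, -1.5]  # arbitrary values
pair = [[0.0, -0.4, -0.1],
        [-0.4, 0.0, 0.0],         # A and B never interact: they are alternatives for one site
        [-0.1, 0.0, 0.0]]

def energy_at(lam):
    return toy_energy(weights_of(lam), site_energy, pair)

h = 1e-5
print(grad_lambda(0.3, site_energy, pair), (energy_at(0.3 + h) - energy_at(0.3 - h)) / (2 * h))
print(energy_at(1.0), energy_at(0.0))  # pure A vs. pure B composition
```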
[LINK]
http://arxiv.org/abs/2404.10746v1
[DATE]
2024-04-17 01:24:22+08:00
[CATEGORIES]
cs.LG
Settling Constant Regrets in Linear Markov Decision Processes
[AUTHORS]
Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
[ABSTRACT]
We study the constant regret guarantees in reinforcement learning (RL). Our
objective is to design an algorithm that incurs only finite regret over
infinite episodes with high probability. We introduce an algorithm,
Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs) where
both the transition kernel and the reward function can be approximated by some
linear function up to misspecification level $\zeta$. At the core of
Cert-LSVI-UCB is an innovative certified estimator, which facilitates a
fine-grained concentration analysis for multi-phase value-targeted regression,
enabling us to establish an instance-dependent regret bound that is constant
w.r.t. the number of episodes. Specifically, we demonstrate that for an MDP
characterized by a minimal suboptimality gap $\Delta$, Cert-LSVI-UCB has a
cumulative regret of $\tilde{\mathcal{O}}(d^3H^5/\Delta)$ with high
probability, provided that the misspecification level $\zeta$ is below
$\tilde{\mathcal{O}}(\Delta / (\sqrt{d}H^2))$. Remarkably, this regret bound
remains constant relative to the number of episodes $K$. To the best of our
knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant,
instance-dependent, high-probability regret bound in RL with linear function
approximation for infinite runs without relying on prior distribution
assumptions. This not only highlights the robustness of Cert-LSVI-UCB to model
misspecification but also introduces novel algorithmic designs and analytical
techniques of independent interest.
[COMMENTS]
46 pages, 2 tables
[LINK]
http://arxiv.org/abs/2404.10745v1
[DATE]
2024-04-17 01:23:19+08:00
[CATEGORIES]
cs.LG
Gaussian process learning of nonlinear dynamics
[AUTHORS]
Dongwei Ye, Mengwu Guo
[ABSTRACT]
One of the pivotal tasks in scientific machine learning is to represent
underlying dynamical systems from time series data. Many methods for such
dynamics learning explicitly require the derivatives of state data, which are
not directly available and can be approximated conventionally by finite
differences. However, the discrete approximations of time derivatives may
result in poor estimations when state data are scarce and/or corrupted by
noise, thus compromising the predictiveness of the learned dynamical models. To
overcome this technical hurdle, we propose a new method that learns nonlinear
dynamics through a Bayesian inference of characterizing model parameters. This
method leverages a Gaussian process representation of states, and constructs a
likelihood function using the correlation between state data and their
derivatives, yet prevents explicit evaluations of time derivatives. Through a
Bayesian scheme, a probabilistic estimate of the model parameters is given by
the posterior distribution, and thus a quantification is facilitated for
uncertainties from noisy state data and the learning process. Specifically, we
will discuss the applicability of the proposed method to several typical
scenarios for dynamical systems: identification and estimation with an affine
parametrization, nonlinear parametric approximation without prior knowledge,
and general parameter estimation for a given dynamical system.
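The key ingredient is a Gaussian process representation of the states, whose time derivatives come from the kernel rather than from finite differences. It can be sketched as follows. The sketch only shows the GP fit and the analytic derivative of its posterior mean, with fixed, hand-picked hyperparameters: a unit-variance squared-exponential kernel, length scale 1, and small noise. The paper's likelihood linking states and derivatives, and its Bayesian inference over model parameters, are not reproduced.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def fit_gp(ts, xs, noise=1e-4, ell=1.0):
    """Return alpha = (K + noise * I)^{-1} x, so the posterior mean at t is sum_i alpha_i k(t, t_i)."""
    K = [[rbf(a, b, ell) + (noise if i == j else 0.0) for j, b in enumerate(ts)]
         for i, a in enumerate(ts)]
    return solve(K, xs)

def gp_mean(t, ts, alpha, ell=1.0):
    return sum(a * rbf(t, s, ell) for a, s in zip(alpha, ts))

def gp_mean_derivative(t, ts, alpha, ell=1.0):
    """Time derivative of the posterior mean, using d/dt k(t, s) = -(t - s) / ell^2 * k(t, s)."""
    return sum(a * (-(t - s) / ell ** 2) * rbf(t, s, ell) for a, s in zip(alpha, ts))

ts = [6.0 * i / 14 for i in range(15)]
xs = [math.sin(t) for t in ts]  # samples of x(t) = sin t; derivatives are never observed
alpha = fit_gp(ts, xs)
print(gp_mean_derivative(3.0, ts, alpha), math.cos(3.0))  # should be close
```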
[LINK]
http://arxiv.org/abs/2312.12193v2
[DATE]
2024-04-17 01:06:37+08:00
[CATEGORIES]
cs.LG
Insight Gained from Migrating a Machine Learning Model to Intelligence Processing Units
[AUTHORS]
Hieu Le, Zhenhua He, Mai Le, Dhruva K. Chakravorty, Lisa M. Perez, Akhil Chilumuru, Yan Yao, Jiefu Chen
[ABSTRACT]
The discoveries in this paper show that Intelligence Processing Units (IPUs)
offer a viable accelerator alternative to GPUs for machine learning (ML)
applications within the fields of materials science and battery research. We
investigate the process of migrating a model from GPU to IPU and explore
several optimization techniques, including pipelining and gradient
accumulation, aimed at enhancing the performance of IPU-based models.
Furthermore, we have effectively migrated a specialized model to the IPU
platform. This model is employed for predicting effective conductivity, a
parameter crucial in ion transport processes, which govern the performance of
multiple charge and discharge cycles of batteries. The model utilizes a
Convolutional Neural Network (CNN) architecture to perform prediction tasks for
effective conductivity. The performance of this model on the IPU is found to be
comparable to its execution on GPUs. We also analyze the utilization and
performance of Graphcore’s Bow IPU. Through benchmark tests, we observe
significantly improved performance with the Bow IPU when compared to its
predecessor, the Colossus IPU.
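Gradient accumulation, one of the optimizations named above, sums gradients over several micro-batches before applying a single weight update, so that a large effective batch fits in limited device memory. The following framework-agnostic Python sketch (not Graphcore's Poplar or PopTorch API) uses a hypothetical one-parameter least-squares model to show that, for equal-sized micro-batches, averaging the micro-batch gradients reproduces the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the hypothetical model y ~ w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch):
    """Average the gradients of equal-sized micro-batches before one update."""
    assert len(xs) % micro_batch == 0, "sketch assumes equal-sized micro-batches"
    total, steps = 0.0, 0
    for i in range(0, len(xs), micro_batch):
        total += grad_mse(w, xs[i:i + micro_batch], ys[i:i + micro_batch])
        steps += 1
    return total / steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0
full = grad_mse(w, xs, ys)                         # one pass over the whole batch
accum = accumulated_grad(w, xs, ys, micro_batch=2)  # two micro-batches of size 2
```

Both values equal -30.0 here; an optimizer would then apply one update with the accumulated gradient.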
[LINK]
http://arxiv.org/abs/2404.10730v1
[DATE]
2024-04-17 01:02:52+08:00
[CATEGORIES]
cs.LG
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning
[AUTHORS]
Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu
[ABSTRACT]
We present the first study on provably efficient randomized exploration in
cooperative multi-agent reinforcement learning (MARL). We propose a unified
algorithmic framework for randomized exploration in parallel Markov Decision
Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE
and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy
and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are
flexible in design and easy to implement in practice. For a special class of
parallel MDPs where the transition is (approximately) linear, we theoretically
prove that both CoopTS-PHE and CoopTS-LMC achieve a
$\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication
complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature
dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is
the number of episodes. This is the first theoretical result for randomized
exploration in cooperative MARL. We evaluate our proposed method on multiple
parallel RL environments, including a deep exploration problem (i.e., the
$N$-chain), a video game, and a real-world problem in energy systems. Our
experimental results indicate that our framework achieves better performance,
even under conditions of misspecified transition models. Additionally, we
establish a connection between our unified framework and the practical
application of federated learning.
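Perturbed-history exploration randomizes a value estimate by refitting it on the agent's history with freshly sampled noise added to the observed rewards, and then acting greedily with respect to the perturbed fit. The NumPy sketch below shows this idea for a single linear model fitted by ridge regression, in the spirit of PHE for linear bandits; it is a simplified single-agent illustration, not the paper's CoopTS-PHE algorithm, and the names `perturbed_ridge_fit`, `noise_scale`, and `reg` are hypothetical.

```python
import numpy as np


def perturbed_ridge_fit(features, rewards, noise_scale, reg, rng):
    """Ridge regression on rewards perturbed with i.i.d. Gaussian noise (PHE-style)."""
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    perturbed = rewards + noise_scale * rng.standard_normal(rewards.shape)
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ perturbed)


def choose_action(action_features, theta):
    """Act greedily with respect to the randomized parameter estimate."""
    return int(np.argmax(np.asarray(action_features) @ theta))


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))              # hypothetical feature history
true_theta = np.array([1.0, -0.5, 0.25])
y = X @ true_theta + 0.1 * rng.standard_normal(50)

theta_greedy = perturbed_ridge_fit(X, y, noise_scale=0.0, reg=1.0, rng=rng)
theta_explore = perturbed_ridge_fit(X, y, noise_scale=1.0, reg=1.0, rng=rng)
```

Setting `noise_scale=0` recovers the plain ridge estimate; a positive value yields a different randomized estimate each episode, which is what drives exploration.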
[COMMENTS]
80 pages, 14 figures, 1 table. Hao-Lun Hsu and Weixin Wang
contributed equally to this work
[LINK]
http://arxiv.org/abs/2404.10728v1
[DATE]
2024-04-17 01:01:38+08:00
[CATEGORIES]
cs.LG
How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
[AUTHORS]
Umberto Tomasini, Matthieu Wyart
[ABSTRACT]
Understanding what makes high-dimensional data learnable is a fundamental
question in machine learning. On the one hand, it is believed that the success
of deep learning lies in its ability to build a hierarchy of representations
that become increasingly abstract with depth, going from simple features
like edges to more complex concepts. On the other hand, learning to be
insensitive to invariances of the task, such as smooth transformations for
image datasets, has been argued to be important for deep networks and it
strongly correlates with their performance. In this work, we aim to explain
this correlation and unify these two viewpoints. We show that by introducing
sparsity to generative hierarchical models of data, the task acquires
insensitivity to spatial transformations that are discrete versions of smooth
transformations. In particular, we introduce the Sparse Random Hierarchy Model
(SRHM), where we observe and rationalize that a hierarchical representation
mirroring the hierarchical model is learnt precisely when such insensitivity is
learnt, thereby explaining the strong correlation between the latter and
performance. Moreover, we quantify how the sample complexity of CNNs learning
the SRHM depends on both the sparsity and hierarchical structure of the task.
[COMMENTS]
9 pages, 6 figures
[LINK]
http://arxiv.org/abs/2404.10727v1
[DATE]
2024-04-17 01:01:27+08:00
[CATEGORIES]
cs.LG
Automatic re-calibration of quantum devices by reinforcement learning
[AUTHORS]
T. Crosta, L. Rebón, F. Vilariño, J. M. Matera, M. Bilkis
[ABSTRACT]
During their operation, due to shifts in environmental conditions, devices
undergo various forms of detuning from their optimal settings. Typically, this
is addressed through control loops, which monitor variables and the device
performance, to maintain settings at their optimal values. Quantum devices are
particularly challenging since their functionality relies on precisely tuning
their parameters. At the same time, the detailed modeling of the environmental
behavior is often computationally unaffordable, while a direct measure of the
parameters defining the system state is costly and introduces extra noise in
the mechanism. In this study, we investigate the application of reinforcement
learning techniques to develop a model-free control loop for continuous
recalibration of quantum device parameters. Furthermore, we explore the
advantages of incorporating minimal environmental noise models. As an example,
the application to numerical simulations of a Kennedy receiver-based
long-distance quantum communication protocol is presented.
[LINK]
http://arxiv.org/abs/2404.10726v1
[DATE]
2024-04-17 00:59:50+08:00
[CATEGORIES]
cs.LG
PCN: A Deep Learning Approach to Jet Tagging Utilizing Novel Graph Construction Methods and Chebyshev Graph Convolutions
[AUTHORS]
Yash Semlani, Mihir Relan, Krithik Ramesh
[ABSTRACT]
Jet tagging is a classification problem in high-energy physics experiments
that aims to identify the collimated sprays of subatomic particles, jets, from
particle collisions and tag them to their emitter particle. Advances in jet
tagging present opportunities for searches of new physics beyond the Standard
Model. Current approaches use deep learning to uncover hidden patterns in
complex collision data. However, the representations of jets used as inputs to
deep learning models have varied, and informative features are often withheld
from models. In this study, we propose a graph-based representation of a jet
that encodes as much information as possible. To learn best from this
representation, we design Particle Chebyshev Network (PCN), a graph neural
network (GNN) using Chebyshev graph convolutions (ChebConv). ChebConv has been
demonstrated as an effective alternative to classical graph convolutions in
GNNs and has yet to be explored in jet tagging. PCN achieves a substantial
improvement in accuracy over existing taggers and opens the door to future
studies into graph-based representations of jets and ChebConv layers in
high-energy physics experiments. Code is available at
https://github.com/YVSemlani/PCN-Jet-Tagging.
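For readers unfamiliar with Chebyshev graph convolutions, the layer as commonly defined (Defferrard et al., 2016) filters node features with a K-term Chebyshev polynomial of the rescaled graph Laplacian L~ = 2L / lambda_max - I, using the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}. Below is a dense NumPy sketch of one such layer; it is not PCN's implementation, and the weight shapes and the default lambda_max = 2 (an upper bound on the spectrum of the symmetric normalized Laplacian) are assumptions.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5   # isolated nodes keep a zero factor
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def cheb_conv(x, adj, weights, lambda_max=2.0):
    """One Chebyshev graph convolution: sum over k of T_k(L~) @ x @ weights[k]."""
    lap = normalized_laplacian(adj)
    l_tilde = 2.0 * lap / lambda_max - np.eye(len(lap))  # rescale spectrum to [-1, 1]
    t_prev, t_curr = x, l_tilde @ x                     # T_0(L~) x and T_1(L~) x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for k in range(2, len(weights)):
        # Chebyshev recurrence applied to the feature matrix
        t_prev, t_curr = t_curr, 2.0 * l_tilde @ t_curr - t_prev
        out = out + t_curr @ weights[k]
    return out


# Hypothetical 3-node path graph (e.g., three particles) with 2 features per node.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.arange(6.0).reshape(3, 2)
rng = np.random.default_rng(1)
W = [rng.standard_normal((2, 4)) for _ in range(3)]   # K = 3 filters, 4 output channels
out = cheb_conv(x, adj, W)
```

The recurrence avoids an eigendecomposition: each additional order costs one sparse (here dense) matrix product and enlarges the receptive field by one hop.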
[COMMENTS]
16 pages, 2 figures, and 7 tables
[LINK]
http://arxiv.org/abs/2309.08630v4
[DATE]
2024-04-17 00:57:12+08:00
[CATEGORIES]
cs.LG
Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation
[AUTHORS]
Chanyoung Chung, Georgios Georgakis, Patrick Spieler, Curtis Padgett, Shehryar Khattak
[ABSTRACT]
Understanding terrain topology at long range is crucial for the success of
off-road robotic missions, especially when navigating at high speeds. LiDAR
sensors, which are currently heavily relied upon for geometric mapping, provide
sparse measurements when mapping at greater distances. To address this
challenge, we present a novel learning-based approach capable of predicting
terrain elevation maps at long range using only onboard egocentric images in
real time. Our proposed method comprises three main elements. First, a
transformer-based encoder is introduced that learns cross-view associations
between the egocentric views and prior bird's-eye-view elevation map predictions.
Second, an orientation-aware positional encoding is proposed to incorporate the
3D vehicle pose information over complex unstructured terrain with multi-view
visual image features. Lastly, a history-augmented learnable map embedding is
proposed to achieve better temporal consistency between elevation map
predictions to facilitate the downstream navigational tasks. We experimentally
validate the applicability of our proposed approach for autonomous offroad
robotic navigation in complex and unstructured terrain using real-world offroad
driving data. Furthermore, the method is qualitatively and quantitatively
compared against the current state-of-the-art methods. Extensive field
experiments demonstrate that our method surpasses baseline models in accurately
predicting terrain elevation while effectively capturing the overall terrain
topology at long range. Finally, ablation studies are conducted to highlight
and understand the effect of key components of the proposed approach and
validate their suitability to improve offroad robotic navigation capabilities.
[COMMENTS]
8 pages, 6 figures, Accepted in IEEE Robotics and Automation Letters
[LINK]
http://arxiv.org/abs/2401.17484v2
[DATE]
2024-04-17 00:55:35+08:00
[CATEGORIES]
cs.LG
Dynamic Frequency-Based Fingerprinting Attacks against Modern Sandbox Environments
[AUTHORS]
Debopriya Roy Dipta, Thore Tiemann, Berk Gulmezoglu, Eduard Marin Fabregas, Thomas Eisenbarth
[LINK]
http://arxiv.org/abs/2404.10715v1
[DATE]
2024-04-17 00:45:47+08:00
[CATEGORIES]
cs.LG
RiemannONets: Interpretable Neural Operators for Riemann Problems
[AUTHORS]
Ahmad Peyvan, Vivek Oommen, Ameya D. Jagtap, George Em Karniadakis
[ABSTRACT]
Developing the proper representations for simulating high-speed flows with
strong shock waves, rarefactions, and contact discontinuities has been a
long-standing question in numerical analysis. Herein, we employ neural
operators to solve Riemann problems encountered in compressible flows for
extreme pressure jumps (up to $10^{10}$ pressure ratio). In particular, we
first consider a DeepONet trained in a two-stage process, following the
recent work of \cite{lee2023training}: in the first stage, a basis is
extracted from the trunk net and orthonormalized, and in the second stage it is
used to train the branch net. This simple modification of
DeepONet has a profound effect on its accuracy, efficiency, and robustness, and
leads to very accurate solutions to Riemann problems relative to the vanilla
version. It also enables us to interpret the results physically, as the
hierarchical, data-driven basis reflects all the flow features that
would otherwise be introduced using ad hoc feature expansion layers. We also
compare the results with another neural operator, based on the U-Net, for low,
intermediate, and very high pressure ratios; owing to its multiscale nature, the
U-Net operator is very accurate for Riemann problems, especially at large
pressure ratios, but it is computationally more expensive. Overall, our study demonstrates that simple
neural network architectures, if properly pre-trained, can achieve very
accurate solutions of Riemann problems for real-time forecasting. The source
code, along with its corresponding data, can be found at the following URL:
https://github.com/apey236/RiemannONet/tree/main
[LINK]
http://arxiv.org/abs/2401.08886v2
[DATE]
2024-04-17 00:37:28+08:00
[CATEGORIES]
cs.LG
Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions
[AUTHORS]
Zulqarnain Khan, Davin Hill, Aria Masoomi, Joshua Bone, Jennifer Dy
[ABSTRACT]
Machine learning methods have significantly improved in their predictive
capabilities, but at the same time they are becoming more complex and less
transparent. As a result, explainers are often relied on to provide
interpretability to these black-box prediction models. As crucial diagnostic
tools, it is important that these explainers themselves be robust. In this
paper we focus on one particular aspect of robustness, namely that an explainer
should give similar explanations for similar data inputs. We formalize this
notion by introducing and defining explainer astuteness, analogous to
astuteness of prediction functions. Our formalism allows us to connect
explainer robustness to the predictor’s probabilistic Lipschitzness, which
captures the probability of local smoothness of a function. We provide lower
bound guarantees on the astuteness of a variety of explainers (e.g., SHAP,
RISE, CXPlain) given the Lipschitzness of the prediction function. These
theoretical results imply that locally smooth prediction functions lend
themselves to locally robust explanations. We evaluate these results
empirically on simulated as well as real datasets.
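Probabilistic Lipschitzness asks how often nearby inputs (within radius r) satisfy |f(x) - f(x')| <= L ||x - x'||. The Monte Carlo estimator below is a hedged illustration of that quantity, not the paper's procedure: the scheme for drawing nearby pairs (a base point from the data plus a uniform perturbation inside a box that fits within the radius-r ball) and the function name `estimate_prob_lipschitz` are assumptions.

```python
import math
import random


def estimate_prob_lipschitz(f, points, lipschitz, radius, n_pairs, rng):
    """Monte Carlo estimate of P(|f(x) - f(x')| <= L * ||x - x'||) over nearby pairs.

    Assumption: x is drawn from `points`, and x' = x + delta with each coordinate of
    delta uniform in [-radius / sqrt(d), radius / sqrt(d)], so ||x - x'|| <= radius.
    """
    hits = 0
    for _ in range(n_pairs):
        x = rng.choice(points)
        half = radius / math.sqrt(len(x))
        x_prime = [xi + rng.uniform(-half, half) for xi in x]
        if abs(f(x) - f(x_prime)) <= lipschitz * math.dist(x, x_prime):
            hits += 1
    return hits / n_pairs


rng = random.Random(0)
pts = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]        # hypothetical data points
smooth = lambda x: 0.5 * x[0] + 0.5 * x[1]          # globally 0.71-Lipschitz
step = lambda x: 1.0 if x[0] > 0 else 0.0           # discontinuous at x[0] = 0
p_smooth = estimate_prob_lipschitz(smooth, pts, 1.0, 0.1, 500, rng)
p_step = estimate_prob_lipschitz(step, pts, 1.0, 0.1, 500, rng)
```

The smooth predictor satisfies the bound on every sampled pair, while the step function fails on roughly half of the pairs drawn around the origin, which is the kind of local non-smoothness that the paper links to less robust explanations.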
[LINK]
http://arxiv.org/abs/2206.12481v3
[DATE]
2024-04-17 00:27:15+08:00
[CATEGORIES]
cs.LG
Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
[AUTHORS]
Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte
[ABSTRACT]
Modern smartphone camera quality heavily relies on the image signal processor
(ISP) to enhance captured raw images, utilizing carefully designed modules to
produce final output images encoded in a standard color space (e.g., sRGB).
Neural-based end-to-end learnable ISPs offer promising advancements,
potentially replacing traditional ISPs with their ability to adapt without
requiring extensive tuning for each new camera model, as is often the case for
nearly every module in traditional ISPs. However, the key challenge with the
recent learning-based ISPs is the need to collect large paired datasets for
each distinct camera model due to the influence of intrinsic camera
characteristics on the formation of input raw images. This paper tackles this
challenge by introducing a novel method for unpaired learning of raw-to-raw
translation across diverse cameras. Specifically, we propose Rawformer, an
unsupervised Transformer-based encoder-decoder method for raw-to-raw
translation. It accurately maps raw images captured by a certain camera to the
target camera, facilitating the generalization of learnable ISPs to new unseen
cameras. Our method demonstrates superior performance on real camera datasets,
achieving higher accuracy compared to previous state-of-the-art techniques, and
preserving a more robust correlation between the original and translated raw
images.
[COMMENTS]
15 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.10700v1
[DATE]
2024-04-17 00:17:48+08:00
[CATEGORIES]
cs.LG
Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning
[AUTHORS]
Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni
[ABSTRACT]
Existing value-based algorithms for cooperative multi-agent reinforcement
learning (MARL) commonly rely on random exploration, such as $\epsilon$-greedy,
to explore the environment. However, such exploration is inefficient at finding
effective joint actions in states that require cooperation of multiple agents.
In this work, we propose ensemble value functions for multi-agent exploration
(EMAX), a general framework to seamlessly extend value-based MARL algorithms
with ensembles of value functions. EMAX leverages the ensemble of value
functions to guide the exploration of agents, stabilises their optimisation,
and makes their policies more robust to miscoordination. These benefits are
achieved by using a combination of three techniques. (1) EMAX uses the
uncertainty of value estimates across the ensemble in a UCB policy to guide the
exploration. This exploration policy focuses on parts of the environment which
require cooperation across agents and, thus, enables agents to more efficiently
learn how to cooperate. (2) During the optimisation, EMAX computes target
values as average value estimates across the ensemble. These targets exhibit
lower variance compared to commonly applied target networks, leading to
significant benefits in MARL which commonly suffers from high variance caused
by the exploration and non-stationary policies of other agents. (3) During
evaluation, EMAX selects actions following a majority vote across the ensemble,
which reduces the likelihood of selecting sub-optimal actions. We instantiate
three value-based MARL algorithms with EMAX, independent DQN, VDN and QMIX, and
evaluate them in 21 tasks across four environments. Using ensembles of five
value functions, EMAX improves sample efficiency and final evaluation returns
of these algorithms by 60%, 47%, and 539%, respectively, averaged across 21
tasks.
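The three EMAX ingredients listed above can be sketched for a single agent in a single state, given each ensemble member's Q-values as a list. This is a minimal illustration, not the authors' code: the function names, the coefficient `beta`, and the exact form of the averaged target (here, the mean over members of each member's greedy next-state value) are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def ucb_action(ensemble_q, beta):
    """(1) Exploration: argmax over actions of ensemble mean + beta * ensemble std."""
    n_actions = len(ensemble_q[0])
    scores = []
    for a in range(n_actions):
        values = [q[a] for q in ensemble_q]
        scores.append(mean(values) + beta * pstdev(values))
    return max(range(n_actions), key=lambda a: scores[a])


def ensemble_target(reward, next_ensemble_q, gamma, done):
    """(2) Training target: reward plus discounted average of members' greedy next values."""
    if done:
        return reward
    next_value = mean(max(q) for q in next_ensemble_q)
    return reward + gamma * next_value


def majority_vote_action(ensemble_q):
    """(3) Evaluation: each member votes for its greedy action; ties go to the lowest index."""
    votes = Counter(max(range(len(q)), key=lambda a: q[a]) for q in ensemble_q)
    best = max(votes.values())
    return min(a for a, c in votes.items() if c == best)


# Hypothetical ensemble of three members over two actions.
q = [[1.0, 0.0], [1.0, 2.0], [1.0, -2.0]]
```

Here action 1 has a lower mean but high disagreement, so UCB with `beta=1.0` explores it, while the majority vote used at evaluation still picks action 0.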
[COMMENTS]
Preprint. Previously presented at the Adaptive and Learning Agents
Workshop (ALA) at the AAMAS conference 2023
[LINK]
http://arxiv.org/abs/2302.03439v6
[DATE]
2024-04-17 00:13:00+08:00
[CATEGORIES]
cs.LG
Network architecture search of X-ray based scientific applications
[AUTHORS]
Adarsha Balaji, Ramyad Hadidi, Gregory Kollmer, Mohammed E. Fouda, Prasanna Balaprakash
[ABSTRACT]
X-ray and electron diffraction-based microscopy use Bragg peak detection and
ptychography to perform 3-D imaging at atomic resolution. Typically, these
techniques are implemented using computationally complex tasks such as fitting a
Pseudo-Voigt function or solving a complex inverse problem. Recently, the use
of deep neural networks has improved on the existing state-of-the-art approaches.
However, the design and development of the neural network models depend on
time- and labor-intensive tuning of the model by application experts. To that
end, we propose a hyperparameter search (HPS) and neural architecture search (NAS)
approach to automate the design and optimization of the neural network models
for model size, energy consumption, and throughput. We demonstrate the improved
performance of the auto-tuned models compared to the manually tuned
BraggNN and PtychoNN benchmarks. We study and demonstrate the importance of
exploring the search space of tunable hyperparameters in enhancing the
performance of Bragg peak detection and ptychographic reconstruction. Our NAS
and HPS of (1) BraggNN achieves a 31.03% improvement in Bragg peak detection
accuracy with an 87.57% reduction in model size, and (2) PtychoNN achieves a
16.77% improvement in model accuracy and a 12.82% reduction in model size
compared to the baseline PtychoNN model. When run for inference on the Orin-AGX
edge platform, the optimized BraggNN and PtychoNN models demonstrate a 10.51% and
9.47% reduction in inference latency and a 44.18% and 15.34% reduction in
energy consumption, respectively, compared to their baselines.
[LINK]
http://arxiv.org/abs/2404.10689v1
[DATE]
2024-04-17 00:09:38+08:00
[CATEGORIES]
cs.LG
Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution
[AUTHORS]
Yutao Yuan, Chun Yuan
[ABSTRACT]
Image super-resolution is a fundamentally ill-posed problem because multiple
valid high-resolution images exist for one low-resolution image.
Super-resolution methods based on diffusion probabilistic models can deal with
the ill-posed nature by learning the distribution of high-resolution images
conditioned on low-resolution images, avoiding the problem of blurry images in
PSNR-oriented methods. However, existing diffusion-based super-resolution
methods are time-consuming because of iterative sampling, while
the quality and consistency of generated images are less than ideal due to
problems like color shifting. In this paper, we propose Efficient Conditional
Diffusion Model with Probability Flow Sampling (ECDP) for image
super-resolution. To reduce the time consumption, we design a continuous-time
conditional diffusion model for image super-resolution, which enables the use
of probability flow sampling for efficient generation. Additionally, to improve
the consistency of generated images, we propose a hybrid parametrization for
the denoiser network, which interpolates between the data-predicting
parametrization and the noise-predicting parametrization for different noise
scales. Moreover, we design an image quality loss as a complement to the score
matching loss of diffusion models, further improving the consistency and
quality of super-resolution. Extensive experiments on DIV2K, ImageNet, and
CelebA demonstrate that our method achieves higher super-resolution quality
than existing diffusion-based image super-resolution methods while having lower
time consumption. Our code is available at https://github.com/Yuan-Yutao/ECDP.
[COMMENTS]
AAAI 2024
[LINK]
http://arxiv.org/abs/2404.10688v1
[DATE]
2024-04-17 00:08:59+08:00
[CATEGORIES]
cs.LG
Driver Fatigue Prediction using Randomly Activated Neural Networks for Smart Ridesharing Platforms
[AUTHORS]
Sree Pooja Akula, Mukund Telukunta, Venkata Sriram Siddhardh Nadendla
[ABSTRACT]
Drivers in ridesharing platforms exhibit cognitive atrophy and fatigue as
they accept ride offers along the day, which can have a significant impact on
the overall efficiency of the ridesharing platform. In contrast to the current
literature which focuses primarily on modeling and learning driver’s
preferences across different ride offers, this paper proposes a novel Dynamic
Discounted Satisficing (DDS) heuristic to model and predict driver’s sequential
ride decisions during a given shift. Based on the DDS heuristic, a novel stochastic
neural network with random activations is proposed to model the DDS heuristic and
predict the final decision made by a given driver. The presence of random
activations in the network necessitated the development of a novel training
algorithm called Sampling-Based Back Propagation Through Time (SBPTT), where
gradients are computed for independent instances of neural networks (obtained
via sampling the distribution of activation threshold) and aggregated to update
the network parameters. Using both simulation experiments and a real
Chicago taxi dataset, this paper demonstrates the improved performance of the
proposed approach compared to state-of-the-art methods.
[LINK]
http://arxiv.org/abs/2404.10684v1
[DATE]
2024-04-17 00:04:11+08:00
[CATEGORIES]
cs.LG
Noncontact Respiratory Anomaly Detection Using Infrared Light-Wave Sensing
[AUTHORS]
Md Zobaer Islam, Brenden Martin, Carly Gotcher, Tyler Martinez, John F. O’Hara, Sabit Ekin
[ABSTRACT]
Human respiratory rate and its pattern convey essential information about the
physical and psychological states of the subject. Abnormal breathing can
indicate fatal health issues leading to further diagnosis and treatment.
Wireless light-wave sensing (LWS) using incoherent infrared light shows promise
in safe, discreet, efficient, and non-invasive human breathing monitoring
without raising privacy concerns. The respiration monitoring system needs to be
trained on different types of breathing patterns to identify breathing
anomalies. The system must also validate the collected data as a breathing
waveform, discarding any faulty data caused by external interruption, user
movement, or system malfunction. To address these needs, this study simulated
normal and different types of abnormal respiration using a robot that mimics
human breathing patterns. Then, time-series respiration data were collected
using infrared light-wave sensing technology. Three machine learning
algorithms, decision tree, random forest and XGBoost, were applied to detect
breathing anomalies and faulty data. Model performances were evaluated through
cross-validation, assessing classification accuracy, precision and recall
scores. The random forest model achieved the highest classification accuracy of
96.75% with data collected at a 0.5 m distance. In general, ensemble models like
random forest and XGBoost performed better than a single model in classifying
the data collected at multiple distances from the light-wave sensing setup.
[COMMENTS]
12 pages, 15 figures, published in IEEE Transactions on Human-Machine
Systems
[LINK]
http://arxiv.org/abs/2301.03713v4
[DATE]
2024-04-17 00:00:09+08:00
[CATEGORIES]
cs.LG
The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text
[AUTHORS]
Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, Chloé Clavel
[ABSTRACT]
This study investigates the consequences of training language models on
synthetic data generated by their predecessors, an increasingly prevalent
practice given the prominence of powerful generative models. Diverging from the
usual emphasis on performance metrics, we focus on the impact of this training
methodology on linguistic diversity, especially when conducted recursively over
time. To assess this, we adapt and develop a set of novel metrics targeting
lexical, syntactic, and semantic diversity, applying them in recursive
finetuning experiments across various natural language generation tasks in
English. Our findings reveal a consistent decrease in the diversity of the
model outputs through successive iterations, especially remarkable for tasks
demanding high levels of creativity. This trend underscores the potential risks
of training language models on synthetic text, particularly concerning the
preservation of linguistic richness. Our study highlights the need for careful
consideration of the long-term effects of such training approaches on the
linguistic capabilities of language models.
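As an illustration of the kind of lexical-diversity measure involved, the sketch below computes distinct-n, the ratio of unique to total n-grams pooled over a set of generated texts. Distinct-n is a widely used metric chosen here for illustration only; it is not necessarily one of the metrics the authors adapt or develop, and lowercased whitespace tokenization is a simplifying assumption.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list (empty if the list is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n):
    """Unique n-grams divided by total n-grams, pooled over all texts."""
    all_ngrams = []
    for text in texts:
        all_ngrams.extend(ngrams(text.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


outputs = ["the cat sat", "the cat ran"]   # hypothetical model generations
unigram_diversity = distinct_n(outputs, 1)  # 4 unique of 6 unigrams
bigram_diversity = distinct_n(outputs, 2)   # 3 unique of 4 bigrams
```

Tracking such a score across recursive fine-tuning iterations would show a decline if later generations reuse the same words and phrases.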
[COMMENTS]
Accepted to NAACL 2024 Findings
[LINK]
http://arxiv.org/abs/2311.09807v2
[DATE]
2024-04-16 23:57:11+08:00
[CATEGORIES]
cs.CL
Self-playing Adversarial Language Game Enhances LLM Reasoning
[AUTHORS]
Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du
[ABSTRACT]
We explore the self-play training procedure of large language models (LLMs)
in a two-player adversarial language game called Adversarial Taboo. In this
game, an attacker and a defender communicate with respect to a target word only
visible to the attacker. The attacker aims to induce the defender to utter the
target word unconsciously, while the defender tries to infer the target word
from the attacker’s utterances. To win the game, both players should have
sufficient knowledge about the target word and high-level reasoning ability to
infer and express in this information-reserved conversation. Hence, we are
curious about whether LLMs’ reasoning ability can be further enhanced by
Self-Play in this Adversarial language Game (SPAG). With this goal, we let LLMs
act as the attacker and play against a copy of themselves as the defender on an
extensive range of target words. Through reinforcement learning on the game
outcomes, we observe that the LLMs’ performance uniformly improves on a broad
range of reasoning benchmarks. Furthermore, iteratively adopting this self-play
process can continuously improve the LLM's reasoning ability. The code is at
https://github.com/Linear95/SPAG.
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2404.10642v1
[DATE]
2024-04-16 23:16:22+08:00
[CATEGORIES]
cs.CL
cs.LG
WebArena: A Realistic Web Environment for Building Autonomous Agents
[AUTHORS]
Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
[COMMENTS]
Our code, data, environment reproduction resources, and video
demonstrations are publicly available at https://webarena.dev/
[LINK]
http://arxiv.org/abs/2307.13854v4
[DATE]
2024-04-16 23:13:18+08:00
[CATEGORIES]
cs.CL
cs.LG
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
[AUTHORS]
Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
[ABSTRACT]
Getting large language models (LLMs) to perform well on the downstream tasks
requires pre-training over trillions of tokens. This typically demands a large
number of powerful computational devices in addition to a stable distributed
training framework to accelerate the training. The growing number of
applications leveraging AI/ML has led to a scarcity of expensive
conventional accelerators (such as GPUs), which creates a need for
alternative specialized accelerators that are scalable and cost-efficient. AWS
Trainium is the second-generation machine learning accelerator that has been
purposely built for training large deep learning models. Its corresponding
instance, Amazon EC2 trn1, is an alternative to GPU instances for LLM training.
However, training LLMs with billions of parameters on trn1 is challenging due
to its relatively nascent software ecosystem. In this paper, we showcase HLAT:
a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over
1.8 trillion tokens. The performance of HLAT is benchmarked against popular
open source baseline models including LLaMA and OpenLLaMA, which have been
trained on NVIDIA GPUs and Google TPUs, respectively. On various evaluation
tasks, we show that HLAT achieves model quality on par with the baselines. We
also share best practices for using the Neuron Distributed Training Library
(NDTL), a customized distributed training library for AWS Trainium to achieve
efficient training. Our work demonstrates that AWS Trainium powered by the NDTL
is able to successfully pre-train state-of-the-art LLMs with high
performance and cost-effectiveness.
[LINK]
http://arxiv.org/abs/2404.10630v1
[DATE]
2024-04-16 23:02:46+08:00
[CATEGORIES]
cs.CL
cs.LG
Anatomy of Industrial Scale Multilingual ASR
[AUTHORS]
Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka
[ABSTRACT]
This paper describes AssemblyAI’s industrial-scale automatic speech
recognition (ASR) system, designed to meet the requirements of large-scale,
multilingual ASR serving various application needs. Our system leverages a
diverse training dataset comprising unsupervised (12.5M hours), supervised
(188k hours), and pseudo-labeled (1.6M hours) data across four languages. We
provide a detailed description of our model architecture, consisting of a
full-context 600M-parameter Conformer encoder pre-trained with BEST-RQ and an
RNN-T decoder fine-tuned jointly with the encoder. Our extensive evaluation
demonstrates competitive word error rates (WERs) against larger and more
computationally expensive models, such as Whisper large and Canary-1B.
Furthermore, our architectural choices yield several key advantages, including
an improved code-switching capability, a 5x inference speedup compared to an
optimized Whisper baseline, a 30% reduction in hallucination rate on speech
data, and a 90% reduction in ambient noise compared to Whisper, along with
significantly improved time-stamp accuracy. Throughout this work, we adopt a
system-centric approach to analyzing various aspects of fully-fledged ASR
models to gain practically relevant insights useful for real-world services
operating at scale.
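Word error rate, the metric behind the comparison above, is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and no text normalization (production evaluations typically normalize casing and punctuation first):

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalized by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words (initially the empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here one substitution ("sat" to "sit") and one deletion ("the") over six reference words give a WER of 2/6.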
[LINK]
http://arxiv.org/abs/2404.09841v2
[DATE]
2024-04-16 22:55:13+08:00
[CATEGORIES]
cs.CL
cs.LG
Explicitly Representing Syntax Improves Sentence-to-layout Prediction of Unexpected Situations
[AUTHORS]
Wolf Nuyts, Ruben Cartuyvels, Marie-Francine Moens
[ABSTRACT]
Recognizing visual entities in a natural language sentence and arranging them
in a 2D spatial layout require a compositional understanding of language and
space. This task of layout prediction is valuable in text-to-image synthesis as
it allows localized and controlled in-painting of the image. In this
comparative study it is shown that we can predict layouts from language
representations that implicitly or explicitly encode sentence syntax, if the
sentences mention similar entity-relationships to the ones seen during
training. To test compositional understanding, we collect a test set of
grammatically correct sentences and layouts describing compositions of entities
and relations that are unlikely to have been seen during training. Performance on this
test set substantially drops, showing that current models rely on correlations
in the training data and have difficulties in understanding the structure of
the input sentences. We propose a novel structural loss function that better
enforces the syntactic structure of the input sentence and show large
performance gains in the task of 2D spatial layout prediction conditioned on
text. The loss has the potential to be used in other generation tasks where a
tree-like structure underlies the conditioning modality. Code, trained models
and the USCOCO evaluation set are available via GitHub.
[COMMENTS]
Published in TACL
[LINK]
http://arxiv.org/abs/2401.14212v2
[DATE]
2024-04-16 22:25:39+08:00
[CATEGORIES]
cs.CL
Linguistic Analysis using Paninian System of Sounds and Finite State Machines
[AUTHORS]
Shreekanth M Prabhu, Abhisek Midye
[ABSTRACT]
The study of spoken languages comprises phonology, morphology, and grammar.
Analysis of a language can be based on its syntax, semantics, and pragmatics.
The languages can be classified as root languages, inflectional languages, and
stem languages. All these factors lead to the formation of vocabulary which has
commonality/similarity as well as distinct and subtle differences across
languages. In this paper, we make use of Paninian system of sounds to construct
a phonetic map and then words are represented as state transitions on the
phonetic map. Each group of related words that cut across languages is
represented by an m-language (morphological language). Morphological Finite
Automata (MFA) are defined that accept the words belonging to a given
m-language. This exercise can enable us to better understand the
inter-relationships between words in spoken languages in both language-agnostic
and language-cognizant manner. Based on our study and analysis, we propose an
Ecosystem Model for Linguistic Development with Sanskrit at the core, in place
of the widely accepted family tree model.
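To make the idea of a finite automaton that accepts an m-language concrete, the sketch below builds a deterministic automaton (a trie) that accepts exactly a given set of words. It is an illustration under our own assumptions: symbols are plain letters rather than the Paninian sound classes of the phonetic map, the class name `WordAutomaton` is hypothetical, and the romanized word forms in the usage line are chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


class WordAutomaton:
    """Deterministic automaton (a trie) accepting exactly a given set of words.

    States are integers; state 0 is the start state. Transitions are keyed
    by (state, symbol). Words sharing a prefix share the same states.
    """

    def __init__(self, words: Iterable[str]) -> None:
        self.delta: Dict[Tuple[int, str], int] = {}
        self.accepting: Set[int] = set()
        self.n_states = 1
        for word in words:
            state = 0
            for symbol in word:
                key = (state, symbol)
                if key not in self.delta:
                    # Create a fresh state for an unseen transition.
                    self.delta[key] = self.n_states
                    self.n_states += 1
                state = self.delta[key]
            self.accepting.add(state)

    def accepts(self, word: str) -> bool:
        """Follow transitions symbol by symbol; accept only in a final state."""
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in self.delta:
                return False
            state = self.delta[key]
        return state in self.accepting
```

For example, `WordAutomaton(["matr", "mater", "mother"]).accepts("mater")` returns `True`, while the prefix `"mat"` is rejected because it does not end in an accepting state.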
[COMMENTS]
47 Pages, 18 Figures, 24 Tables
[LINK]
http://arxiv.org/abs/2301.12463v2
[DATE]
2024-04-16 22:19:58+08:00
[CATEGORIES]
cs.CL
The application of Augmented Reality (AR) in Remote Work and Education
[AUTHORS]
Keqin Li, Peng Xirui, Jintong Song, Bo Hong, Jin Wang
[ABSTRACT]
With the rapid advancement of technology, Augmented Reality (AR) technology,
known for its ability to deeply integrate virtual information with the real
world, is gradually transforming traditional work modes and teaching methods.
Particularly in the realms of remote work and online education, AR technology
shows broad application prospects. This paper delves into
the application potential and actual effects of AR technology in remote work
and education. Through a systematic literature review, this study outlines the
key features, advantages, and challenges of AR technology. Based on theoretical
analysis, it discusses the scientific basis and technical support that AR
technology provides for enhancing remote work efficiency and promoting
innovation in educational teaching models. Additionally, by designing an
empirical research plan and analyzing experimental data, this article reveals
the specific performance and influencing factors of AR technology in practical
applications. Finally, based on the results of the experiments, this research
summarizes the application value of AR technology in remote work and education,
looks forward to its future development trends, and proposes forward-looking
research directions and strategic suggestions, offering empirical foundation
and theoretical guidance for further promoting the in-depth application of AR
technology in related fields.
[LINK]
http://arxiv.org/abs/2404.10579v1
[DATE]
2024-04-16 22:04:46+08:00
[CATEGORIES]
cs.CL
Language of Bargaining
[AUTHORS]
Mourad Heddaya, Solomon Dworkin, Chenhao Tan, Rob Voigt, Alexander Zentefis
[COMMENTS]
ACL 2023 Main Conference
[LINK]
http://arxiv.org/abs/2306.07117v2
[DATE]
2024-04-16 21:19:04+08:00
[CATEGORIES]
cs.CL
CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity
[AUTHORS]
Moshe Berchansky, Daniel Fleischer, Moshe Wasserblat, Peter Izsak
[ABSTRACT]
State-of-the-art performance in QA tasks is currently achieved by systems
employing Large Language Models (LLMs); however, these models tend to
hallucinate information in their responses. One approach focuses on enhancing
the generation process by incorporating attribution from the given input to the
output. However, the challenge of identifying appropriate attributions and
verifying their accuracy against a source is a complex task that requires
significant improvements in assessing such systems. We introduce an
attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy
of attributions. This approach focuses the reasoning process on generating an
attribution-centric output. Evaluations on two context-enhanced
question-answering datasets using GPT-4 demonstrate improved accuracy and
correctness of attributions. In addition, the combination of our method with
finetuning enhances the response and attribution accuracy of two smaller LLMs,
showing their potential to outperform GPT-4 in some cases.
[LINK]
http://arxiv.org/abs/2404.10513v1
[DATE]
2024-04-16 20:37:10+08:00
[CATEGORIES]
cs.CL
cs.LG
Self-Supervised Visual Preference Alignment
[AUTHORS]
Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang
[ABSTRACT]
This paper makes the first attempt towards unsupervised preference alignment
in Vision-Language Models (VLMs). We generate chosen and rejected responses
with regard to the original and augmented image pairs, and conduct preference
alignment with direct preference optimization. It is based on a core idea:
properly designed augmentation of the image input will induce the VLM to generate
false but hard negative responses, which helps the model to learn from and
produce more robust and powerful answers. The whole pipeline no longer hinges
on supervision from GPT4 or human involvement during alignment, and is highly
efficient with few lines of code. With only 8k randomly sampled unsupervised
data, it achieves a score of 90% relative to GPT-4 on complex reasoning in
LLaVA-Bench, and improves LLaVA-7B/13B by 6.7%/5.6% on the complex
multi-modal benchmark MM-Vet. Visualizations show its improved ability to
align with user intentions. A series of ablations is conducted to
reveal the latent mechanism of the approach, which also indicates its potential
towards further scaling. Code will be available.
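The alignment step relies on direct preference optimization (DPO). A minimal sketch of the standard per-pair DPO loss follows, assuming per-response log-probabilities from the trainable VLM and from a frozen reference copy are already computed; the function names and the default `beta` are illustrative, not taken from the paper's code. Following the abstract, the chosen response would come from the original image and the rejected one from the augmented image.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from sequence log-probabilities.

    chosen   = response to the original image (assumed preferred)
    rejected = response to the augmented image (assumed hard negative)
    """
    # Log-ratio of the policy to the frozen reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)
```

With `beta=1.0`, a pair where the policy has raised the chosen response's log-probability by 1 and lowered the rejected one's by 1 relative to the reference gives `dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0)` ≈ 0.1269, below the log 2 ≈ 0.693 obtained when policy and reference agree.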
[LINK]
http://arxiv.org/abs/2404.10501v1
[DATE]
2024-04-16 20:19:54+08:00
[CATEGORIES]
cs.CL
cs.LG
When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm
[AUTHORS]
Chenggian Ma, Xiangyu Zhao, Chunhui Zhang, Yanzhao Qin, Wentao Zhang
[ABSTRACT]
With the development of Large Language Models (LLMs), numerous prompts have
been proposed, each with a rich set of features and its own merits. This
paper summarizes prompts for LLMs,
categorizing them into stimulating and framework types, and proposes an
Auto-Prompt Graphical Paradigm (APGP) that combines both stimulating and
framework prompts to enhance the problem-solving capabilities of LLMs across
multiple domains, then exemplifies it with a framework that adheres to this
paradigm. The framework involves automated prompt generation and consideration
of emotion-stimulus factors, guiding LLMs in problem abstraction, diversified
solutions generation, comprehensive optimization, and self-verification after
providing answers, ensuring solution accuracy. Compared to traditional stimuli
and framework prompts, this framework integrates the advantages of both by
adopting automated approaches inspired by APE work, overcoming the limitations
of manually designed prompts. Test results on the ruozhiba and BBH datasets
demonstrate that this framework can effectively improve the efficiency and
accuracy of LLMs in problem-solving, paving the way for new applications of
LLMs.
[COMMENTS]
9 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.10500v1
[DATE]
2024-04-16 20:19:08+08:00
[CATEGORIES]
cs.CL
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
[AUTHORS]
Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria
[ABSTRACT]
Generative multimodal content is increasingly prevalent in much of the
content creation arena, as it has the potential to allow artists and media
personnel to create pre-production mockups by quickly bringing their ideas to
life. The generation of audio from text prompts is an important aspect of such
processes in the music and film industry. Many of the recent diffusion-based
text-to-audio models focus on training increasingly sophisticated diffusion
models on a large set of datasets of prompt-audio pairs. These models do not
explicitly focus on the presence of concepts or events and their temporal
ordering in the output audio with respect to the input prompt. We hypothesize
that focusing on these aspects of audio generation could improve audio
generation performance in the presence of limited data. As such, in this work,
using an existing text-to-audio model Tango, we synthetically create a
preference dataset where each prompt has a winner audio output and some loser
audio outputs for the diffusion model to learn from. The loser outputs, in
theory, have some concepts from the prompt missing or in an incorrect order. We
fine-tune the publicly available Tango text-to-audio model using diffusion-DPO
(direct preference optimization) loss on our preference dataset and show that
it leads to improved audio output over Tango and AudioLDM2, in terms of both
automatic- and manual-evaluation metrics.
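Diffusion-DPO replaces sequence log-likelihoods with denoising errors. The sketch below follows the general form of that objective as we understand it, for a single noised sample of the winner and loser audio latents at one timestep; `beta` absorbs the timestep weighting, tensors are flattened into Python lists, and all names are hypothetical rather than taken from the Tango 2 code.

```python
import math
from typing import Sequence


def sq_err(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance between two flattened tensors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def diffusion_dpo_loss(noise_w: Sequence[float], pred_w_policy: Sequence[float],
                       pred_w_ref: Sequence[float], noise_l: Sequence[float],
                       pred_l_policy: Sequence[float], pred_l_ref: Sequence[float],
                       beta: float = 1.0) -> float:
    """Diffusion-DPO-style loss for one winner/loser pair at one timestep."""
    # Change in denoising error (policy minus frozen reference) on each output;
    # a negative value means the fine-tuned model denoises that output better.
    win_gap = sq_err(noise_w, pred_w_policy) - sq_err(noise_w, pred_w_ref)
    lose_gap = sq_err(noise_l, pred_l_policy) - sq_err(noise_l, pred_l_ref)
    # Favour improving on the winner more than on the loser.
    return neg_log_sigmoid(-beta * (win_gap - lose_gap))
```

If the fine-tuned model reduces its denoising error on the winner relative to the reference while doing no better on the loser, the sigmoid's argument is positive and the loss falls below log 2.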
[COMMENTS]
https://github.com/declare-lab/tango
[LINK]
http://arxiv.org/abs/2404.09956v2
[DATE]
2024-04-16 20:12:39+08:00
[CATEGORIES]
cs.CL
No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks
[AUTHORS]
Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie
[ABSTRACT]
While the progression of Large Language Models (LLMs) has notably propelled
financial analysis, their application has largely been confined to singular
language realms, leaving untapped the potential of bilingual Chinese-English
capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating
the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis.
ICE-PIXIU uniquely integrates a spectrum of Chinese tasks, alongside translated
and original English datasets, enriching the breadth and depth of bilingual
financial modeling. It provides unrestricted access to diverse model variants,
a substantial compilation of diverse cross-lingual and multi-modal instruction
data, and an evaluation benchmark with expert annotations, comprising 10 NLP
tasks, 20 bilingual specific tasks, totaling 95k datasets. Our thorough
evaluation emphasizes the advantages of incorporating these bilingual datasets,
especially in translation tasks and utilizing original English data, enhancing
both linguistic flexibility and analytical acuity in financial contexts.
Notably, ICE-INTENT distinguishes itself by showcasing significant enhancements
over conventional LLMs and existing financial LLMs in bilingual milieus,
underscoring the profound impact of robust bilingual data on the accuracy and
efficacy of financial NLP.
[COMMENTS]
24 pages, 5 figures, 12 tables, including Appendix
[LINK]
http://arxiv.org/abs/2403.06249v2
[DATE]
2024-04-16 20:05:33+08:00
[CATEGORIES]
cs.CL
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
[AUTHORS]
Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin
[ABSTRACT]
Large language models (LLMs) have achieved success in acting as agents, which
interact with environments through tools such as search engines. However, LLMs
are optimized for language generation instead of tool use during training or
alignment, limiting their effectiveness as agents. To resolve this problem,
previous work has first collected interaction trajectories between LLMs and
environments, using only trajectories that successfully finished the task to
fine-tune smaller models, making fine-tuning data scarce and acquiring it both
difficult and costly. Discarding failed trajectories also leads to significant
wastage of data and resources and limits the possible optimization paths during
fine-tuning. In this paper, we argue that unsuccessful trajectories offer
valuable insights, and LLMs can learn from these trajectories through
appropriate quality control and fine-tuning strategies. By simply adding a
prefix or suffix that tells the model whether to generate a successful
trajectory during training, we improve model performance by a large margin on
mathematical reasoning, multi-hop question answering, and strategic question
answering tasks. We further analyze the inference results and find that our
method provides a better trade-off between valuable information and errors in
unsuccessful trajectories. To our knowledge, we are the first to demonstrate
the value of negative trajectories and their application in agent-tuning
scenarios. Our findings offer guidance for developing better agent-tuning
methods and low-resource data usage techniques.
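The key idea in the abstract, marking each training trajectory as successful or not, can be sketched as a data-formatting step. The marker strings, the choice of a prefix rather than a suffix, and the `Trajectory` fields are all assumptions for illustration; the paper's exact wording is not given here.

```python
from dataclasses import dataclass

# Hypothetical marker strings; the paper's actual prefix/suffix text may differ.
SUCCESS_PREFIX = "[GOOD TRAJECTORY] "
FAILURE_PREFIX = "[BAD TRAJECTORY] "


@dataclass
class Trajectory:
    prompt: str      # task given to the agent
    steps: str       # serialized tool calls and observations
    succeeded: bool  # whether the trajectory finished the task


def to_training_text(traj: Trajectory) -> str:
    """Prefix a trajectory with a marker saying whether it succeeded."""
    marker = SUCCESS_PREFIX if traj.succeeded else FAILURE_PREFIX
    return marker + traj.prompt + "\n" + traj.steps


def inference_prompt(prompt: str) -> str:
    """At inference time, always condition on the success marker."""
    return SUCCESS_PREFIX + prompt + "\n"
```

At inference time the model sees only the success marker, so it is asked to imitate successful trajectories while still having learned from the failed ones during fine-tuning.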
[COMMENTS]
Agent, LLM, Large Language Model
[LINK]
http://arxiv.org/abs/2402.11651v2
[DATE]
2024-04-16 19:41:13+08:00
[CATEGORIES]
cs.CL
How Good Are LLMs at Out-of-Distribution Detection?
[AUTHORS]
Bo Liu, Liming Zhan, Zexin Lu, Yujie Feng, Lei Xue, Xiao-Ming Wu
[ABSTRACT]
Out-of-distribution (OOD) detection plays a vital role in enhancing the
reliability of machine learning (ML) models. The emergence of large language
models (LLMs) has catalyzed a paradigm shift within the ML community,
showcasing their exceptional capabilities across diverse natural language
processing tasks. While existing research has probed OOD detection with
relatively small-scale Transformers like BERT, RoBERTa and GPT-2, the stark
differences in scales, pre-training objectives, and inference paradigms call
into question the applicability of these findings to LLMs. This paper embarks
on a pioneering empirical investigation of OOD detection in the domain of LLMs,
focusing on LLaMA series ranging from 7B to 65B in size. We thoroughly evaluate
commonly-used OOD detectors, scrutinizing their performance in both zero-grad
and fine-tuning scenarios. Notably, we alter previous discriminative
in-distribution fine-tuning into generative fine-tuning, aligning the
pre-training objective of LLMs with downstream tasks. Our findings unveil that
a simple cosine distance OOD detector demonstrates superior efficacy,
outperforming other OOD detectors. We provide an intriguing explanation for
this phenomenon by highlighting the isotropic nature of the embedding spaces of
LLMs, which distinctly contrasts with the anisotropic property observed in
smaller BERT family models. The new insight enhances our understanding of how
LLMs detect OOD data, thereby enhancing their adaptability and reliability in
dynamic environments. We have released the source code at
https://github.com/Awenbocc/LLM-OOD for other researchers to reproduce
our results.
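A nearest-neighbour version of a cosine distance OOD detector can be sketched as follows, assuming embeddings have already been extracted from the LLM for the in-distribution training set and for the test input. Scoring by the minimum distance and the `threshold` value are our assumptions; the paper's exact scoring rule may differ.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def ood_score(query: Sequence[float], id_embeddings: List[Sequence[float]]) -> float:
    """Cosine distance from the query to its nearest in-distribution embedding.

    Larger scores suggest the query is more likely out-of-distribution.
    """
    return min(1.0 - cosine_similarity(query, e) for e in id_embeddings)


def is_ood(query: Sequence[float], id_embeddings: List[Sequence[float]],
           threshold: float) -> bool:
    """Flag the query as OOD when its score exceeds the threshold."""
    return ood_score(query, id_embeddings) > threshold
```

Because cosine distance ignores vector length, only the direction of an embedding matters, which is one reason such a detector can behave differently across embedding spaces with different geometry.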
[COMMENTS]
Accepted at COLING 2024
[LINK]
http://arxiv.org/abs/2308.10261v4
[DATE]
2024-04-16 19:38:35+08:00
[CATEGORIES]
cs.CL
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
[AUTHORS]
Yu Li, Zhihua Wei, Han Jiang, Chuanyang Gong
[ABSTRACT]
Despite the remarkable achievements of language models (LMs) across a broad
spectrum of tasks, their propensity for generating toxic outputs remains a
prevalent concern. Current solutions involving fine-tuning or auxiliary models
usually require extensive memory and computational resources, rendering them
less practical for deployment in large language models (LLMs). In this paper,
we propose DeStein, a novel method that detoxifies LMs by altering their
internal representations in the activation space with lower resource and time
cost. Specifically, we leverage self-induced steering pairs to identify
detoxification vectors through arithmetic operations in the activation space.
During inference, detoxification is achieved by blending the detoxification
vectors with the original representations. Empirical results demonstrate that
our method significantly outperforms previous state-of-the-art approaches on
popular detoxification metrics, while also maintaining satisfactory generation
quality and diversity. Furthermore, we extend our method to multiple LLMs,
demonstrating its practicality and scalability. Warning: some example model
outputs contain highly offensive or disturbing text.
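The arithmetic idea of a detoxification vector can be sketched in a simplified form: take paired activations for non-toxic and toxic versions of the same input, average their differences, and add a scaled copy to a hidden state at inference. This omits the head-wise fusion that DeStein performs, and the pairing scheme, the function names, and the strength `alpha` are assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def detox_vector(nontoxic_acts: Sequence[Sequence[float]],
                 toxic_acts: Sequence[Sequence[float]]) -> Vector:
    """Mean difference (non-toxic minus toxic) over paired activations."""
    n = len(nontoxic_acts)
    dim = len(nontoxic_acts[0])
    return [sum(p[i] - q[i] for p, q in zip(nontoxic_acts, toxic_acts)) / n
            for i in range(dim)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Blend the detoxification direction into a hidden state with strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

Because the vector is computed once and only added at inference, no model weights change, which is consistent with the abstract's point about lower resource cost.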
[LINK]
http://arxiv.org/abs/2404.10464v1
[DATE]
2024-04-16 19:07:48+08:00
[CATEGORIES]
cs.CL
A Systematic Review of Aspect-based Sentiment Analysis (ABSA): Domains, Methods, and Trends
[AUTHORS]
Yan Cathy Hua, Paul Denny, Katerina Taskova, Jörg Wicker
[ABSTRACT]
Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment
analysis that identifies aspects and their associated opinions from a given
text. With the surge of digital opinionated text data, ABSA has gained increasing
popularity for its ability to mine more detailed and targeted insights. Many
review papers on ABSA subtasks and solution methodologies exist; however, few
focus on trends over time or systemic issues relating to research application
domains, datasets, and solution approaches. To fill the gap, this paper
presents a Systematic Literature Review (SLR) of ABSA studies with a focus on
trends and high-level relationships among these fundamental components. This
review is one of the largest SLRs on ABSA, and also, to our knowledge, the
first that systematically examines the trends and inter-relations among ABSA
research and data distribution across domains and solution paradigms and
approaches. Our sample includes 519 primary studies screened from 4191 search
results without time constraints via an innovative automatic filtering process.
Our quantitative analysis not only identifies trends in nearly two decades of
ABSA research development but also unveils a systemic lack of dataset and
domain diversity as well as domain mismatch that may hinder the development of
future ABSA research. We discuss these findings and their implications and
propose suggestions for future research.
[LINK]
http://arxiv.org/abs/2311.10777v4
[DATE]
2024-04-16 18:59:11+08:00
[CATEGORIES]
cs.CL
cs.LG
Language Proficiency and F0 Entrainment: A Study of L2 English Imitation in Italian, French, and Slovak Speakers
[AUTHORS]
Zheng Yuan, Štefan Beňuš, Alessandro D’Ausilio
[ABSTRACT]
This study explores F0 entrainment in second language (L2) English speech
imitation during an Alternating Reading Task (ART). Participants with Italian,
French, and Slovak native languages imitated English utterances, and their F0
entrainment was quantified using the Dynamic Time Warping (DTW) distance
between the parameterized F0 contours of the imitated utterances and those of
the model utterances. Results indicate a nuanced relationship between L2
English proficiency and entrainment: speakers with higher proficiency generally
exhibit less entrainment in pitch variation and declination. However, within
dyads, the more proficient speakers demonstrate a greater ability to mimic
pitch range, leading to increased entrainment. This suggests that proficiency
influences entrainment differently at individual and dyadic levels,
highlighting the complex interplay between language skill and prosodic
adaptation.
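The F0 entrainment measure rests on dynamic time warping. Below is a minimal sketch of classic DTW between two pitch sequences, assuming raw F0 values sampled at a fixed rate rather than the parameterized contours used in the study; a smaller distance means the imitation follows the model contour more closely.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference local cost and steps (1,0), (0,1), (1,1)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Warping lets a contour that is merely slower or faster still match closely: `dtw_distance([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0])` is 0.0.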
[COMMENTS]
Accepted at Speech Prosody 2024
[LINK]
http://arxiv.org/abs/2404.10440v1
[DATE]
2024-04-16 18:10:19+08:00
[CATEGORIES]
cs.CL
On Training Data Influence of GPT Models
[AUTHORS]
Qingyi Liu, Yekun Chai, Shuohuan Wang, Yu Sun, Qiwei Peng, Keze Wang, Hua Wu
[ABSTRACT]
Amidst the rapid advancements in generative language models, the
investigation of how training data shapes the performance of GPT models is
still emerging. This paper presents GPTfluence, a novel approach that leverages
a featurized simulation to assess the impact of training examples on the
training dynamics of GPT models. Our approach not only traces the influence of
individual training instances on performance trajectories, such as loss and
other key metrics, on targeted test points but also enables a comprehensive
comparison with existing methods across various training scenarios in GPT
models, ranging from 14 million to 2.8 billion parameters, across a range of
downstream tasks. Unlike earlier methods, which struggle with generalization
to new data, GPTfluence introduces a parameterized simulation of training
dynamics, demonstrating robust generalization capabilities to unseen training
data. This adaptability is evident across both fine-tuning and
instruction-tuning scenarios, spanning tasks in natural language understanding
and generation. We will make our code and data publicly available.
[LINK]
http://arxiv.org/abs/2404.07840v2
[DATE]
2024-04-16 18:05:27+08:00
[CATEGORIES]
cs.CL
cs.LG
A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets
[AUTHORS]
Tanja Samardzic, Ximena Gutierrez, Christian Bentz, Steven Moran, Olga Pelloni
[ABSTRACT]
Typologically diverse benchmarks are increasingly created to track the
progress achieved in multilingual NLP. Linguistic diversity of these data sets
is typically measured as the number of languages or language families included
in the sample, but such measures do not consider structural properties of the
included languages. In this paper, we propose assessing linguistic diversity of
a data set against a reference language sample as a means of maximising
linguistic diversity in the long run. We represent languages as sets of
features and apply a version of the Jaccard index suitable for comparing sets
of measures. In addition to the features extracted from typological databases,
we propose an automatic text-based measure, which can be used as a means of
overcoming the well-known problem of data sparsity in manually collected
features. Our diversity score is interpretable in terms of linguistic features
and can identify the types of languages that are not represented in a data set.
Using our method, we analyse a range of popular multilingual data sets (UD,
Bible100, mBERT, XTREME, XGLUE, XNLI, XCOPA, TyDiQA, XQuAD). In addition to
ranking these data sets, we find, for example, that (poly)synthetic languages
are missing in almost all of them.
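One common version of the Jaccard index for non-negative measures (sometimes called the weighted Jaccard or Ruzicka similarity) divides the sum of feature-wise minima by the sum of maxima. Whether this matches the paper's exact variant is an assumption; the sketch only shows how two feature profiles, keyed by hypothetical feature names, would be compared.

```python
from typing import Dict


def weighted_jaccard(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Sum of feature-wise minima over sum of maxima (non-negative values).

    Missing features count as 0. Two all-zero profiles are treated as identical.
    """
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 1.0
```

With binary 0/1 feature values this reduces to the ordinary Jaccard index |A ∩ B| / |A ∪ B|.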
[COMMENTS]
Accepted to NAACL 2024 Findings
[LINK]
http://arxiv.org/abs/2403.03909v2
[DATE]
2024-04-16 18:00:41+08:00
[CATEGORIES]
cs.CL
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
[AUTHORS]
Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao
[ABSTRACT]
This research aims to accelerate the inference speed of large language models
(LLMs) with billions of parameters. We propose \textbf{S}mart \textbf{P}arallel
\textbf{A}uto-\textbf{C}orrect d\textbf{E}coding (SPACE), an innovative
approach designed for achieving lossless acceleration of LLMs. By integrating
semi-autoregressive inference and speculative decoding capabilities, SPACE
uniquely enables autoregressive LLMs to parallelize token generation and
verification. This is realized through a specialized semi-autoregressive
supervised fine-tuning process that equips existing LLMs with the ability to
simultaneously predict multiple tokens. Additionally, an auto-correct decoding
algorithm facilitates the simultaneous generation and verification of token
sequences within a single model invocation. Through extensive experiments on a
range of LLMs, SPACE has demonstrated inference speedups ranging from 2.7x to 4.0x
on HumanEval-X while maintaining output quality.
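The verification half of generate-then-verify decoding can be illustrated with the generic greedy acceptance rule used in speculative decoding: keep the longest prefix of the draft that matches the model's own greedy choices, then take the model's token at the first mismatch. This is a standard sketch, not SPACE's auto-correct algorithm; token IDs are plain integers and `verified` is assumed to come from one forward pass over the prefix plus the draft.

```python
from typing import List, Sequence


def accept_draft(draft: Sequence[int], verified: Sequence[int]) -> List[int]:
    """Greedy verification of a draft block.

    draft:    k candidate tokens proposed in parallel.
    verified: k + 1 tokens the model itself predicts greedily, where
              verified[i] is its choice given the prefix plus draft[:i].
    Returns the accepted draft prefix plus one token from the verifier.
    """
    if len(verified) != len(draft) + 1:
        raise ValueError("verified must have exactly one more token than draft")
    accepted: List[int] = []
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted.append(d)
    # Correct the first mismatch, or extend after a fully accepted draft.
    accepted.append(verified[len(accepted)])
    return accepted
```

Because every emitted token equals what greedy decoding would have produced, the output is unchanged; the speedup comes from accepting several tokens per model call when drafts are good.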
[LINK]
http://arxiv.org/abs/2402.11809v2
[DATE]
2024-04-16 16:36:31+08:00
[CATEGORIES]
cs.CL
cs.LG
Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering
[AUTHORS]
Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao
[ABSTRACT]
Large language models (LLMs), such as GPT-3.5, GPT-4 and LLaMA2, perform
surprisingly well and outperform human experts on many tasks. However, in many
domain-specific evaluations, these LLMs often suffer from hallucination
problems due to insufficient training on relevant corpora. Furthermore,
fine-tuning large models may face obstacles: the LLMs may not be open source,
or high-quality domain instructions may be difficult to construct. Therefore,
structured knowledge databases such as knowledge graphs (KGs) can better provide
domain background knowledge for LLMs and make full use of the reasoning and
analysis capabilities of LLMs. In some previous works, an LLM was called multiple
times to determine whether the current triplet was suitable for inclusion in
the subgraph when retrieving subgraphs for a question. Especially for
questions that require a multi-hop reasoning path, frequent calls to the LLM
consume a lot of computing power. Moreover, when choosing the reasoning path,
the LLM is called once for each step, and if one of the steps is selected
incorrectly, errors accumulate in the following steps.
In this paper, we integrate and optimize a pipeline for selecting reasoning
paths from a KG based on an LLM, which can reduce the dependency on the LLM. In
addition, we propose a simple and effective subgraph retrieval method based on
chain of thought (CoT) and PageRank, which returns the paths most likely to
contain the answer. We conduct experiments on three datasets: GenMedGPT-5k
[14], WebQuestions [2], and CMCQA [21]. Finally, RoK demonstrates that using
fewer LLM calls can achieve the same results as previous SOTA models.
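PageRank-style scoring of candidate nodes can be sketched with personalized PageRank, which restarts random walks at the entities mentioned in the question. This illustrates the general technique only: how RoK combines CoT with PageRank to score paths is not specified here, and the toy graph, the damping value, and the function name are assumptions.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seeds: List[str],
                          damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank with restarts to the seed (question) entities.

    graph maps each node to its out-neighbours; every node must appear as a key,
    and every seed must be a node.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            outs = graph[n]
            if outs:
                # Spread this node's mass evenly over its out-neighbours.
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank
```

Scores for the nodes at the end of candidate paths could then be used to rank those paths, so that only the top few are passed to the LLM.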
[LINK]
http://arxiv.org/abs/2404.10384v1
[DATE]
2024-04-16 16:28:16+08:00
[CATEGORIES]
cs.CL
Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese
[AUTHORS]
Rifki Afina Putri, Faiz Ghifari Haznitrama, Dea Adhista, Alice Oh
[ABSTRACT]
Large Language Models (LLMs) are increasingly being used to generate
synthetic data for training and evaluating models. However, it is unclear
whether they can generate good-quality question answering (QA) datasets
that incorporate knowledge and cultural nuances embedded in a language,
especially for low-resource languages. In this study, we investigate the
effectiveness of using LLMs in generating culturally relevant commonsense QA
datasets for Indonesian and Sundanese languages. To do so, we create datasets
for these languages using various methods involving both LLMs and human
annotators, resulting in ~4.5K questions per language (~9K in total), making
our dataset the largest of its kind. Our experiments show that automatic data
adaptation from an existing English dataset is less effective for Sundanese.
Interestingly, using the direct generation method on the target language, GPT-4
Turbo can generate questions with adequate general knowledge in both languages,
albeit not as culturally ‘deep’ as humans. We also observe a higher occurrence
of fluency errors in the Sundanese dataset, highlighting the discrepancy
between medium- and lower-resource languages.
[LINK]
http://arxiv.org/abs/2402.17302v2
[DATE]
2024-04-16 15:41:12+08:00
[CATEGORIES]
cs.CL
Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs
[AUTHORS]
Syed Mekael Wasti, Ken Q. Pu, Ali Neshati
[ABSTRACT]
The evolution of Large Language Models (LLMs) has showcased remarkable
capacities for logical reasoning and natural language comprehension. These
capabilities can be leveraged in solutions that semantically and textually
model complex problems. In this paper, we present our efforts toward
constructing a framework that can serve as an intermediary between a user and
their user interface (UI), enabling dynamic and real-time interactions. We
employ a system that stands upon textual semantic mappings of UI components, in
the form of annotations. These mappings are stored, parsed, and scaled in a
custom data structure, supplementary to an agent-based prompting backend
engine. Employing textual semantic mappings allows each component to not only
explain its role to the engine but also provide expectations. By comprehending
the needs of both the user and the components, our LLM engine can classify the
most appropriate application, extract relevant parameters, and subsequently
execute precise predictions of the user’s expected actions. Such an integration
evolves static user interfaces into highly dynamic and adaptable solutions,
introducing a new frontier of intelligent and responsive user experiences.
[COMMENTS]
Accepted as peer-reviewed publication
[LINK]
http://arxiv.org/abs/2402.07938v2
[DATE]
2024-04-16 15:39:05+08:00
[CATEGORIES]
cs.CL
cs.LG
Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models
[AUTHORS]
Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu
[ABSTRACT]
Chain of thought finetuning aims to endow small student models with reasoning
capacity to improve their performance towards a specific task by allowing them
to imitate the reasoning procedure of large language models (LLMs) beyond
simply predicting the answer to the question. However, existing methods 1)
generate the rationale before the answer, making answer correctness sensitive
to hallucinations in the rationale; 2) force the student model to repeat the
LLM's exact rationale word for word, which could bias the model toward
learning the surface expression of the rationale and keep it from
understanding the core logic behind it. Therefore, we propose a robust
Post-Semantic-Thinking (PST) strategy to generate answers before rationale.
Thanks to this answer-first setting, 1) the answering procedure can escape
the adverse effects caused by hallucinations in the rationale; 2) the complex
reasoning procedure is tightly bound to the relatively concise answer, making
the reasoning for questions easier with the prior information in the answer; 3)
the efficiency of the method also benefits from the setting, since users can
stop generation right after the answer is output during inference.
Furthermore, the PST strategy loosens the constraint on the generated
rationale, requiring it to be close to the LLM's gold-standard rationale in the
hidden semantic space instead of the vocabulary space, thus helping the small
student model better comprehend the semantic reasoning logic in the rationale. Extensive
experiments conducted across 12 reasoning tasks demonstrate the effectiveness
of PST.
[LINK]
http://arxiv.org/abs/2404.09170v2
[DATE]
2024-04-16 15:38:51+08:00
[CATEGORIES]
cs.CL
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
[AUTHORS]
Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
[ABSTRACT]
The quadratic complexity and weak length extrapolation of Transformers limit
their ability to scale to long sequences, and while sub-quadratic solutions
like linear attention and state space models exist, they empirically
underperform Transformers in pretraining efficiency and downstream task
accuracy. We introduce Megalodon, a neural architecture for efficient sequence
modeling with unlimited context length. Megalodon inherits the architecture of
Mega (exponential moving average with gated attention), and further introduces
multiple technical components to improve its capability and stability,
including complex exponential moving average (CEMA), timestep normalization
layer, normalized attention mechanism and pre-norm with two-hop residual
configuration. In a controlled head-to-head comparison with Llama2, Megalodon
achieves better efficiency than Transformer at the scale of 7 billion
parameters and 2 trillion training tokens. Megalodon reaches a training loss of
1.70, landing mid-way between Llama2-7B (1.75) and 13B (1.67). Code:
https://github.com/XuezheMax/megalodon
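Megalodon builds on Mega's damped exponential moving average. A one-channel sketch of that recurrence, h_t = α·x_t + (1 − α·δ)·h_{t−1}, is shown below, assuming scalar α and δ in (0, 1) and a zero initial state; Megalodon's complex extension (CEMA) is not shown. Each step costs constant time, so a full pass is linear in sequence length.

```python
from typing import List, Sequence


def damped_ema(x: Sequence[float], alpha: float, delta: float) -> List[float]:
    """One channel of a damped EMA: h_t = alpha*x_t + (1 - alpha*delta)*h_{t-1}.

    alpha (weight on the new input) and delta (damping) are assumed to lie
    in (0, 1); the state h starts at 0.
    """
    h = 0.0
    out: List[float] = []
    for xt in x:
        h = alpha * xt + (1.0 - alpha * delta) * h
        out.append(h)
    return out
```

With `delta=1.0` this reduces to an ordinary EMA; smaller `delta` makes the state decay more slowly, so it retains information from further back in the sequence.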
[COMMENTS]
9 pages, 6 figures and 8 tables
[LINK]
http://arxiv.org/abs/2404.08801v2
[DATE]
2024-04-16 15:27:58+08:00
[CATEGORIES]
cs.LG
cs.CL
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
[AUTHORS]
Gang Liu, Jinlong He, Pengfei Li, Genrong He, Zhaolin Chen, Shenjun Zhong
[ABSTRACT]
Multimodal large language models (MLLMs) represent an evolutionary expansion
in the capabilities of traditional large language models, enabling them to
tackle challenges that surpass the scope of purely text-based applications. They
leverage the knowledge previously encoded within these language models,
thereby enhancing their applicability and functionality in the realm of
multimodal contexts. Recent works investigate the adaptation of MLLMs as a
universal solution to address medical multi-modal problems as a generative
task. In this paper, we propose a parameter efficient framework for fine-tuning
MLLMs, specifically validated on medical visual question answering (Med-VQA)
and medical report generation (MRG) tasks, using public benchmark datasets. We
also introduce an evaluation metric using the 5-point Likert scale and its
weighted average value to measure the quality of the generated reports for MRG
tasks, where the scale ratings are assigned both manually by human annotators and by the
GPT-4 model. We further assess the consistency of performance metrics across
traditional measures, GPT-4, and human ratings for both VQA and MRG tasks. The
results indicate that semantic similarity assessments using GPT-4 align closely
with human annotators and provide greater stability, yet they reveal a
discrepancy when compared to conventional lexical similarity measurements. This
questions the reliability of lexical similarity metrics for evaluating the
performance of generative models in Med-VQA and report generation tasks.
In addition, our fine-tuned model significantly outperforms GPT-4v. This indicates
that without additional fine-tuning, multi-modal models like GPT-4v do not
perform effectively on medical imaging tasks. The code will be available here:
https://github.com/jinlHe/PeFoMed.
[COMMENTS]
12 pages, 8 figures, 12 tables
[LINK]
http://arxiv.org/abs/2401.02797v2
[DATE]
2024-04-16 14:50:58+08:00
[CATEGORIES]
cs.CL
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience
[AUTHORS]
Haixia Han, Tingyun Li, Shisong Chen, Jie Shi, Chengyu Du, Yanghua Xiao, Jiaqing Liang, Xin Lin
[ABSTRACT]
Large Language Models (LLMs) have exhibited remarkable performance across
various downstream tasks, but they may generate inaccurate or false information
with a confident tone. One possible solution is to equip the LLM with a
confidence expression capability, such that the expressed confidence is
well aligned with the true probability of the generated answer being correct.
However, accurately capturing response uncertainty in LLMs, whether through
their intrinsic abilities or through the output logits of answers, has proven
challenging. Therefore, drawing inspiration from cognitive
diagnostics, we propose a method of Learning from Past experience (LePe) to
enhance the capability for confidence expression. Specifically, we first
identify three key problems: (1) How to capture the inherent confidence of the
LLM? (2) How to teach the LLM to express confidence? (3) How to evaluate the
confidence expression of the LLM? Then we devise three stages in LePe to deal
with these problems. In addition, to accurately capture the confidence of an LLM
when constructing the training data, we design a complete pipeline including
question preparation and answer sampling. We also conduct experiments using the
Llama family of LLMs to verify the effectiveness of our proposed method on four
datasets.
[LINK]
http://arxiv.org/abs/2404.10315v1
[DATE]
2024-04-16 14:47:49+08:00
[CATEGORIES]
cs.CL
Scaling Properties of Speech Language Models
[AUTHORS]
Santiago Cuervo, Ricard Marxer
[ABSTRACT]
Speech Language Models (SLMs) aim to learn language from raw audio, without
textual resources. Despite significant advances, our current models exhibit
weak syntax and semantic abilities. However, if the scaling properties of
neural language models hold for the speech modality, these abilities will
improve as the amount of compute used for training increases. In this paper, we
use models of this scaling behavior to estimate the scale at which our current
methods will yield an SLM with the English proficiency of text-based Large
Language Models (LLMs). We establish a strong correlation between pre-training
loss and downstream syntactic and semantic performance in SLMs and LLMs, which
results in predictable scaling of linguistic performance. We show that the
linguistic performance of SLMs scales up to three orders of magnitude more
slowly than that of text-based LLMs. Additionally, we study the benefits of
synthetic data designed to boost semantic understanding and the effects of
coarser speech tokenization.
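Estimating the scale at which a model reaches a given proficiency amounts to fitting a scaling curve and extrapolating it. As a minimal sketch, assuming a pure power law L(C) = a·C^(−b) between training compute C and loss L (the functional form and the function names are our assumptions, not necessarily the paper's exact model), one can fit log L against log C by least squares and invert the fit:

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Fit loss = a * compute**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (a, b)


def compute_for_loss(a: float, b: float, target_loss: float) -> float:
    """Invert loss = a * C**(-b) to get the compute C that reaches target_loss."""
    return (a / target_loss) ** (1.0 / b)


if __name__ == "__main__":
    cs = [1e1, 1e2, 1e3]
    losses = [10.0 * c ** -0.5 for c in cs]  # synthetic points exactly on a power law
    a, b = fit_power_law(cs, losses)
    print(round(a, 6), round(b, 6))  # 10.0 0.5
    print(round(compute_for_loss(a, b, 0.1)))  # 10000
```

Under this assumed form, a smaller fitted exponent b means far more compute is needed for the same loss reduction, which is one way to read a modality "scaling more slowly" than another.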
[LINK]
http://arxiv.org/abs/2404.00685v2
[DATE]
2024-04-16 14:46:18+08:00
[CATEGORIES]
cs.CL
Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models
[AUTHORS]
Linan Yue, Qi Liu, Lili Zhao, Li Wang, Weibo Gao, Yanqing An
[ABSTRACT]
With the development of legal intelligence, Criminal Court View Generation
has attracted much attention as a crucial task of legal intelligence, which
aims to generate concise and coherent texts that summarize case facts and
provide explanations for verdicts. Existing research explores the key
information in case facts to yield court views. Most of these works employ a
coarse-grained approach that partitions the facts into broad segments (e.g.,
verdict-related sentences) to make predictions. However, this approach fails to
capture the complex details present in the case facts, such as various criminal
elements and legal events. To this end, in this paper, we propose an Event
Grounded Generation (EGG) method for criminal court view generation with
cooperative (Large) Language Models, which introduces the fine-grained event
information into the generation. Specifically, we first design an LLM-based
extraction method that can extract events in case facts without massive
annotated events. Then, we incorporate the extracted events into court view
generation by merging case facts and events. Besides, considering the
computational burden posed by the use of LLMs in the extraction phase of EGG,
we propose an LLMs-free EGG method that can eliminate the requirement for event
extraction using LLMs in the inference phase. Extensive experimental results on
a real-world dataset clearly validate the effectiveness of our proposed method.
[COMMENTS]
Accepted to SIGIR2024
[LINK]
http://arxiv.org/abs/2404.07001v3
[DATE]
2024-04-16 14:34:31+08:00
[CATEGORIES]
cs.CL
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
[AUTHORS]
Hengyuan Zhang, Yanru Wu, Dawei Li, Zacc Yang, Rui Zhao, Yong Jiang, Fei Tan
[ABSTRACT]
Aligned Large Language Models (LLMs) showcase remarkable versatility, capable
of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected
to exhibit speciality, excelling in specific applications. However, fine-tuning
with extra data, a common practice to gain speciality, often leads to
catastrophic forgetting (CF) of previously acquired versatility, hindering the
model’s performance across diverse tasks. In response to this challenge, we
propose CoFiTune, a coarse to fine framework in an attempt to strike the
balance between speciality and versatility. At the coarse-grained level, an
empirical tree-search algorithm is utilized to pinpoint and update specific
modules that are crucial for speciality, while keeping other parameters frozen;
at the fine-grained level, a soft-masking mechanism regulates the update to the
LLMs, mitigating the CF issue without harming speciality. In an overall
evaluation of both speciality and versatility, CoFiTune consistently
outperforms baseline methods across diverse tasks and model scales. Compared to
the full-parameter SFT, CoFiTune leads to about 14% versatility improvement and
marginal speciality loss on a 13B model. Lastly, based on further analysis, we
provide a speculative insight into the information forwarding process in LLMs,
which helps explain the effectiveness of the proposed method. The code is
available at https://github.com/rattlesnakey/CoFiTune.
[COMMENTS]
43 pages, 10 figures
[LINK]
http://arxiv.org/abs/2404.10306v1
[DATE]
2024-04-16 14:27:39+08:00
[CATEGORIES]
cs.CL
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism
[AUTHORS]
Lang Cao
[ABSTRACT]
Large language models (LLMs) have demonstrated impressive language
understanding and generation capabilities, enabling them to answer a wide range
of questions across various domains. However, these models are not flawless and
often produce responses that contain errors or misinformation. These
inaccuracies, commonly referred to as hallucinations, render LLMs unreliable
and even unusable in many scenarios. In this paper, our focus is on mitigating
the issue of hallucination in LLMs, particularly in the context of
question-answering. Instead of attempting to answer all questions, we explore a
refusal mechanism that instructs LLMs to refuse to answer challenging questions
in order to avoid errors. We then propose a simple yet effective solution
called Learn to Refuse (L2R), which incorporates the refusal mechanism to
enable LLMs to recognize and refuse to answer questions that they find
difficult to address. To achieve this, we utilize a structured knowledge base
to represent all the LLM’s understanding of the world, enabling it to provide
traceable gold knowledge. This knowledge base is separate from the LLM and
initially empty. It can be filled with validated knowledge and progressively
expanded. When an LLM encounters questions outside its domain, the system
recognizes its knowledge scope and determines whether it can answer the
question independently. Additionally, we introduce a method for automatically
and efficiently expanding the knowledge base of LLMs. Through qualitative and
quantitative analysis, we demonstrate that our approach enhances the
controllability and reliability of LLMs.
[LINK]
http://arxiv.org/abs/2311.01041v2
[DATE]
2024-04-16 14:24:38+08:00
[CATEGORIES]
cs.CL
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
[AUTHORS]
Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang
[ABSTRACT]
In this work we systematically review the recent advancements in code
processing with language models, covering 50+ models, 30+ evaluation tasks,
170+ datasets, and 800 related works. We break down code processing models into
general language models represented by the GPT family and specialized models
that are specifically pretrained on code, often with tailored objectives. We
discuss the relations and differences between these models, and highlight the
historical transition of code modeling from statistical models and RNNs to
pretrained Transformers and LLMs, which mirrors the course previously
taken by NLP. We also discuss code-specific features such as AST, CFG, and
unit tests, along with their application in training code language models, and
identify key challenges and potential future directions in this domain. We keep
the survey open and updated on GitHub at
https://github.com/codefuse-ai/Awesome-Code-LLM.
[COMMENTS]
Repo is available at https://github.com/codefuse-ai/Awesome-Code-LLM.
8 figures, 10 tables, and 796 references
[LINK]
http://arxiv.org/abs/2311.07989v5
[DATE]
2024-04-16 14:19:46+08:00
[CATEGORIES]
cs.CL
Future Language Modeling from Temporal Document History
[AUTHORS]
Changmao Li, Jeffrey Flanigan
[ABSTRACT]
Predicting the future is of great interest across many aspects of human
activity. Businesses are interested in future trends, traders are interested in
future stock prices, and companies are highly interested in future
technological breakthroughs. While there are many automated systems for
predicting future numerical data, such as weather, stock prices, and demand for
products, there is relatively little work in automatically predicting textual
data. Humans are interested in textual predictions because text is a natural
format for our consumption, and experts routinely make predictions in a textual
format (Christensen et al., 2004; Tetlock & Gardner, 2015; Frick, 2015).
However, there has been relatively little formalization of this general problem
in the machine learning or natural language processing communities. To address
this gap, we introduce the task of future language modeling: probabilistic
modeling of texts in the future based on a temporal history of texts. To our
knowledge, ours is the first work to formalize the task of predicting the
future in this way. We show that it is indeed possible to build future language
models that improve upon strong non-temporal language model baselines, opening
the door to working on this important and widely applicable problem.
[COMMENTS]
Accepted by ICLR 2024
[LINK]
http://arxiv.org/abs/2404.10297v1
[DATE]
2024-04-16 13:45:52+08:00
[CATEGORIES]
cs.CL
Large Language Models are In-Context Molecule Learners
[AUTHORS]
Jiatong Li, Wei Liu, Zhihao Ding, Wenqi Fan, Yuqiang Li, Qing Li
[ABSTRACT]
Large Language Models (LLMs) have demonstrated exceptional performance in
biochemical tasks, especially the molecule caption translation task, which aims
to bridge the gap between molecules and natural language texts. However,
previous methods in adapting LLMs to the molecule-caption translation task
required extra domain-specific pre-training stages, suffered weak alignment
between molecular and textual spaces, or imposed stringent demands on the scale
of LLMs. To resolve the challenges, we propose In-Context Molecule Adaptation
(ICMA), as a new paradigm allowing LLMs to learn the molecule-text alignment
from context examples via In-Context Molecule Tuning. Specifically, ICMA
incorporates the following three stages: Hybrid Context Retrieval,
Post-retrieval Re-ranking, and In-context Molecule Tuning. Initially, Hybrid
Context Retrieval utilizes BM25 Caption Retrieval and Molecule Graph Retrieval
to retrieve informative context examples. We also propose
Post-retrieval Re-ranking with Sequence Reversal and Random Walk to further
improve the quality of retrieval results. Finally, In-Context Molecule Tuning
unlocks the in-context molecule learning capability of LLMs with retrieved
examples and adapts the parameters of LLMs for the molecule-caption translation
task. Experimental results demonstrate that ICMA can empower LLMs to achieve
state-of-the-art or comparable performance without extra training corpora and
intricate structures, showing that LLMs are inherently in-context molecule
learners.
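Hybrid Context Retrieval relies partly on BM25 caption retrieval. For readers unfamiliar with BM25, a minimal, self-contained Okapi BM25 scorer is sketched below; the parameter defaults (k1 = 1.5, b = 0.75), lowercase whitespace tokenization, and the function name are our assumptions, this is not ICMA's retrieval code, and the Molecule Graph Retrieval half of the hybrid retriever is not shown.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokens)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores: List[float] = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1.0 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    captions = ["benzene ring compound", "aliphatic chain", "ring of benzene derivative"]
    # The first caption ranks highest: same term matches as the third, but shorter.
    print(bm25_scores("benzene ring", captions))
```

In a retrieval-augmented setup like the one described, the top-scoring captions would be used as in-context examples; how ICMA fuses these scores with graph-based retrieval and re-ranking is specified in the paper.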
[LINK]
http://arxiv.org/abs/2403.04197v2
[DATE]
2024-04-16 13:07:52+08:00
[CATEGORIES]
cs.CL
Evaluating Large Language Models at Evaluating Instruction Following
[AUTHORS]
Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2310.07641v2
[DATE]
2024-04-16 12:50:08+08:00
[CATEGORIES]
cs.CL
cs.LG
Social Choice for AI Alignment: Dealing with Diverse Human Feedback
[AUTHORS]
Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker
[ABSTRACT]
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise
problematic behavior, so that, for example, they refuse to comply with requests
for help with committing crimes or with producing racist text. One approach to
fine-tuning, called reinforcement learning from human feedback, learns from
humans’ expressed preferences over multiple outputs. Another approach is
constitutional AI, in which the input from humans is a list of high-level
principles. But how do we deal with potentially diverging input from humans?
How can we aggregate the input into consistent data about “collective”
preferences or otherwise use it to make collective choices about model
behavior? In this paper, we argue that the field of social choice is well
positioned to address these questions, and we discuss ways forward for this
agenda, drawing on discussions in a recent workshop on Social Choice for AI
Ethics and Safety held in Berkeley, CA, USA in December 2023.
[COMMENTS]
15 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.10271v1
[DATE]
2024-04-16 11:59:33+08:00
[CATEGORIES]
cs.LG
cs.CL
Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation
[AUTHORS]
Yue Zhou, Barbara Di Eugenio, Brian Ziebart, Lisa Sharp, Bing Liu, Nikolaos Agadakos
[ABSTRACT]
Health coaching helps patients achieve personalized and lifestyle-related
goals, effectively managing chronic conditions and alleviating mental health
issues. It is particularly beneficial for low-socioeconomic-status populations,
but its highly personalized and labor-intensive nature makes it
cost-prohibitive. In this paper, we propose a neuro-symbolic goal
summarizer to support health coaches in keeping track of the goals and a
text-units-text dialogue generation model that converses with patients and
helps them create and accomplish specific goals for physical activities. Our
models outperform the previous state of the art while eliminating the need for
predefined schema and corresponding annotation. We also propose a new health
coaching dataset extending previous work and a metric to measure the
unconventionality of the patient’s response based on data difficulty,
facilitating potential coach alerts during deployment.
[COMMENTS]
Accepted to the main conference of LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.10268v1
[DATE]
2024-04-16 11:46:30+08:00
[CATEGORIES]
cs.CL
ProSwitch: Knowledge-Guided Instruction Tuning to Generate Professional and Non-Professional Styled Text
[AUTHORS]
Chang Zong, Yuyan Chen, Weiming Lu, Jian Shao, Yueting Zhuang
[ABSTRACT]
Large Language Models (LLMs) have demonstrated efficacy in various linguistic
applications, including text summarization and controlled text generation.
However, their capacity to switch between styles via
fine-tuning remains underexplored. This study concentrates on textual
professionalism and introduces a novel methodology, named ProSwitch, which
equips a language model with the ability to produce both professional and
non-professional responses through knowledge-guided instruction tuning.
ProSwitch unfolds across three phases: data preparation for gathering domain
knowledge and training corpus; instruction tuning for optimizing language
models with multiple levels of instruction formats; and comprehensive
evaluation for assessing the professionalism discrimination and reference-based
quality of generated text. Comparative analysis of ProSwitch against both
general and specialized language models reveals that our approach outperforms
baselines in switching between professional and non-professional text
generation.
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2403.09131v3
[DATE]
2024-04-16 11:31:25+08:00
[CATEGORIES]
cs.CL
Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy
[AUTHORS]
Tunazzina Islam, Dan Goldwasser
[ABSTRACT]
The widespread use of social media has led to a surge in popularity for
automated methods of analyzing public opinion. Supervised methods are adept at
text categorization, yet the dynamic nature of social media discussions poses a
continual challenge for these techniques due to the constant shifting of the
focus. On the other hand, traditional unsupervised methods for extracting
themes from public discourse, such as topic modeling, often reveal overarching
patterns that might not capture specific nuances. Consequently, a significant
portion of research into social media discourse still depends on
labor-intensive manual coding techniques and a human-in-the-loop approach,
which are both time-consuming and costly. In this work, we study the problem of
discovering arguments associated with a specific theme. We propose a generic
LLMs-in-the-Loop strategy that leverages the advanced capabilities of Large
Language Models (LLMs) to extract latent arguments from social media messaging.
To demonstrate our approach, we apply our framework to contentious topics. We
use two publicly available datasets: (1) the climate campaigns dataset of 14k
Facebook ads with 25 themes and (2) the COVID-19 vaccine campaigns dataset of
9k Facebook ads with 14 themes. Furthermore, we analyze demographic targeting
and the adaptation of messaging based on real-world events.
[LINK]
http://arxiv.org/abs/2404.10259v1
[DATE]
2024-04-16 11:26:43+08:00
[CATEGORIES]
cs.CL
cs.LG
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
[AUTHORS]
Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
[ABSTRACT]
Large Language Models (LLMs) have greatly advanced the natural language
processing paradigm. However, the high computational load and huge model sizes
pose a grand challenge for deployment on edge devices. To this end, we propose
APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs,
which considers not only the second-order information of each layer’s weights,
but also, for the first time, the nonlinear effect of attention outputs on the
entire model. We leverage the Hessian trace as a sensitivity metric for
mixed-precision quantization, ensuring an informed precision reduction that
retains model performance. Experiments show APTQ surpasses previous
quantization methods, achieving a perplexity of 5.22 on the C4 dataset at an
average bit width of 4, nearly equivalent to full precision. In addition, APTQ
attains state-of-the-art zero-shot accuracy of 68.24% and 70.48% at an
average bitwidth of 3.8 in LLaMa-7B and LLaMa-13B, respectively, demonstrating
its effectiveness to produce high-quality quantized LLMs.
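APTQ uses the Hessian trace as a per-layer sensitivity score when choosing bit widths. The sketch below shows one simple, hypothetical way such scores could drive mixed-precision allocation: every layer starts at 4 bits, and the least sensitive layers are demoted to 2 bits until a parameter-weighted average bit budget is met. The greedy rule, the 4/2-bit options, and the function name are our assumptions; APTQ's actual allocation procedure and its attention-aware Hessian computation are described in the paper.

```python
from typing import List


def allocate_bits(traces: List[float], sizes: List[int], target_avg: float,
                  high: int = 4, low: int = 2) -> List[int]:
    """Greedy mixed-precision allocation driven by per-layer Hessian traces.

    Layers with the smallest trace (least sensitive) are demoted from `high`
    to `low` bits until the parameter-weighted average bit width <= target_avg.
    """
    bits = [high] * len(traces)
    total = sum(sizes)

    def avg_bits() -> float:
        return sum(bw * n for bw, n in zip(bits, sizes)) / total

    # Visit layers from least to most sensitive.
    for i in sorted(range(len(traces)), key=lambda j: traces[j]):
        if avg_bits() <= target_avg:
            break
        bits[i] = low
    return bits


if __name__ == "__main__":
    print(allocate_bits([5.0, 1.0, 3.0, 0.5], [100, 100, 100, 100], target_avg=3.0))
    # [4, 2, 4, 2]
```

Because whole layers are demoted at once, this greedy rule can undershoot the budget (for example, landing at 2.5 bits when 3.5 was allowed); a finer-grained allocator would use more bit-width options or a knapsack-style formulation.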
[COMMENTS]
6 pages, 2 figures, published to DAC 2024: 61st IEEE/ACM Design
Automation Conference. (DAC’24)
[LINK]
http://arxiv.org/abs/2402.14866v2
[DATE]
2024-04-16 11:18:38+08:00
[CATEGORIES]
cs.LG
cs.CL
A Survey on Open Information Extraction from Rule-based Model to Large Language Model
[AUTHORS]
Pai Liu, Wenyang Gao, Wenjie Dong, Lin Ai, Ziwei Gong, Songfang Huang, Zongsheng Li, Ehsan Hoque, Julia Hirschberg, Yue Zhang
[ABSTRACT]
Open information extraction is an important NLP task that targets extracting
structured information from unstructured text without limitations on the
relation type or the domain of the text. This survey paper covers open
information extraction technologies from 2007 to 2022 with a focus on new
models not covered by previous surveys. We propose a new categorization method
from the source of information perspective to accommodate the development of
recent OIE technologies. In addition, we summarize three major approaches based
on task settings as well as current popular datasets and model evaluation
metrics. Based on this comprehensive review, we outline several future
directions in terms of datasets, source of information, output form, method,
and evaluation metrics.
[COMMENTS]
The first five authors contributed to this work equally. Names are
ordered randomly
[LINK]
http://arxiv.org/abs/2208.08690v2
[DATE]
2024-04-16 11:16:22+08:00
[CATEGORIES]
cs.CL
Generative Text Steganography with Large Language Model
[AUTHORS]
Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng
[ABSTRACT]
Recent advances in large language models (LLMs) have blurred the boundary of
high-quality text generation between humans and machines, which is favorable
for generative text steganography. However, current advanced steganographic
mapping is not suitable for LLMs since most users are restricted to accessing
only the black-box API or user interface of the LLMs, thereby lacking access to
the training vocabulary and its sampling probabilities. In this paper, we
explore a black-box generative text steganographic method based on the user
interfaces of large language models, called LLM-Stega. The main goal of
LLM-Stega is to enable secure covert communication between Alice (sender) and
Bob (receiver) using the user interfaces of LLMs. Specifically,
we first construct a keyword set and design a new encrypted steganographic
mapping to embed secret messages. Furthermore, to guarantee accurate extraction
of secret messages and rich semantics in the generated stego texts, we propose an
optimization mechanism based on rejection sampling. Comprehensive experiments
demonstrate that the proposed LLM-Stega outperforms current state-of-the-art
methods.
[LINK]
http://arxiv.org/abs/2404.10229v1
[DATE]
2024-04-16 10:19:28+08:00
[CATEGORIES]
cs.CL
Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks
[AUTHORS]
Joshua Melton, Shannon Reid, Gabriel Terejanu, Siddharth Krishnan
[ABSTRACT]
The high volume and rapid evolution of content on social media present major
challenges for studying the stance of social media users. In this work, we
develop a two-stage stance labeling method that utilizes the user-hashtag
bipartite graph and the user-user interaction graph. In the first stage, a
simple and efficient heuristic for stance labeling uses the user-hashtag
bipartite graph to iteratively update the stance association of user and
hashtag nodes via a label propagation mechanism. This set of soft labels is
then integrated with the user-user interaction graph to train a graph neural
network (GNN) model using semi-supervised learning. We evaluate this method on
two large-scale datasets containing tweets related to climate change from June
2021 to June 2022 and gun control from January 2022 to January 2023.
Experiments demonstrate that our user-hashtag heuristic and the semi-supervised
GNN method outperform zero-shot stance labeling using LLMs such as GPT4.
Further analysis illustrates how the stance labeling information and
interaction graph can be used for evaluating the polarization of social media
interactions on divisive issues such as climate change and gun control.
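The first stage propagates stance labels over the user-hashtag bipartite graph. A minimal sketch of that idea is shown below, assuming a handful of seed hashtags with known stance scores (+1 or −1) that stay clamped and simple neighbor averaging; the averaging rule, iteration count, and names are our assumptions rather than the authors' exact heuristic, and the second-stage GNN is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def propagate_stance(edges: List[Tuple[str, str]], seeds: Dict[str, float],
                     iterations: int = 10) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternate neighbor averaging on a user-hashtag bipartite graph.

    edges: (user, hashtag) pairs; seeds: hashtag -> fixed stance score in [-1, 1].
    Returns (user_scores, hashtag_scores); the sign gives a soft stance label.
    """
    user_tags: Dict[str, List[str]] = defaultdict(list)
    tag_users: Dict[str, List[str]] = defaultdict(list)
    for user, tag in edges:
        user_tags[user].append(tag)
        tag_users[tag].append(user)

    tag_score = {t: seeds.get(t, 0.0) for t in tag_users}
    user_score: Dict[str, float] = {}
    for _ in range(iterations):
        # Users take the mean stance of the hashtags they used.
        user_score = {u: sum(tag_score[t] for t in ts) / len(ts)
                      for u, ts in user_tags.items()}
        # Unseeded hashtags take the mean stance of their users; seeds stay clamped.
        for t, us in tag_users.items():
            if t not in seeds:
                tag_score[t] = sum(user_score[u] for u in us) / len(us)
    return user_score, tag_score


if __name__ == "__main__":
    edges = [("u1", "#pro"), ("u1", "#new"), ("u2", "#anti"), ("u3", "#new")]
    users, tags = propagate_stance(edges, {"#pro": 1.0, "#anti": -1.0})
    print(users, tags)
```

In the toy graph, the unlabeled hashtag "#new" inherits a positive score from user u1, and u3 (who only used "#new") ends up positive as well; soft scores like these are what the abstract describes feeding into the semi-supervised GNN stage.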
[LINK]
http://arxiv.org/abs/2404.10228v1
[DATE]
2024-04-16 10:18:30+08:00
[CATEGORIES]
cs.LG
cs.CL
In-Context Learning Dynamics with Random Binary Sequences
[AUTHORS]
Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman
[ABSTRACT]
Large language models (LLMs) trained on huge corpora of text datasets
demonstrate intriguing capabilities, achieving state-of-the-art performance on
tasks they were not explicitly trained for. The precise nature of LLM
capabilities is often mysterious, and different prompts can elicit different
capabilities through in-context learning. We propose a framework that enables
us to analyze in-context learning dynamics to understand latent concepts
underlying LLMs’ behavioral patterns. This provides a more nuanced
understanding than success-or-failure evaluation benchmarks, but does not
require observing internal activations as a mechanistic interpretation of
circuits would. Inspired by the cognitive science of human randomness
perception, we use random binary sequences as context and study dynamics of
in-context learning by manipulating properties of context data, such as
sequence length. In the latest GPT-3.5+ models, we find emergent abilities to
generate seemingly random numbers and learn basic formal languages, with
striking in-context learning dynamics where model outputs transition sharply
from seemingly random behaviors to deterministic repetition.
[LINK]
http://arxiv.org/abs/2310.17639v3
[DATE]
2024-04-16 09:35:03+08:00
[CATEGORIES]
cs.CL
cs.LG
Controllable Prosody Generation With Partial Inputs
[AUTHORS]
Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, Zack Hodari
[COMMENTS]
5 pages
[LINK]
http://arxiv.org/abs/2303.09446v2
[DATE]
2024-04-16 09:33:24+08:00
[CATEGORIES]
cs.CL
cs.LG
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
[AUTHORS]
Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang
[COMMENTS]
Accepted at ICLR 2024 Workshop on Reliable and Responsible Foundation
Models
[LINK]
http://arxiv.org/abs/2404.09127v2
[DATE]
2024-04-16 09:12:09+08:00
[CATEGORIES]
cs.CL
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
[AUTHORS]
Huihan Li, Liwei Jiang, Nouha Dziri, Xiang Ren, Yejin Choi
[ABSTRACT]
As the utilization of large language models (LLMs) has proliferated
worldwide, it is crucial for them to have adequate knowledge and fair
representation for diverse global cultures. In this work, we uncover the cultural
perceptions of three SOTA models across 110 countries and regions and 8
culture-related topics through culture-conditioned generations, and extract
symbols from these generations that the LLM associates with each culture.
We discover that culture-conditioned generations contain linguistic “markers”
that distinguish marginalized cultures from default cultures. We also
discover that LLMs have an uneven degree of diversity in the culture symbols,
and that cultures from different geographic regions have different presence in
LLMs’ culture-agnostic generation. Our findings promote further research in
studying the knowledge and fairness of global culture perception in LLMs. Code
and data can be found at https://github.com/huihanlhh/Culture-Gen/
[LINK]
http://arxiv.org/abs/2404.10199v1
[DATE]
2024-04-16 08:50:43+08:00
[CATEGORIES]
cs.CL
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs’ internal prior
[AUTHORS]
Kevin Wu, Eric Wu, James Zou
[ABSTRACT]
Retrieval augmented generation (RAG) is often used to fix hallucinations and
provide up-to-date knowledge for large language models (LLMs). However, in
cases when the LLM alone incorrectly answers a question, does providing the
correct retrieved content always fix the error? Conversely, in cases where the
retrieved content is incorrect, does the LLM know to ignore the wrong
information, or does it recapitulate the error? To answer these questions, we
systematically analyze the tug-of-war between an LLM’s internal knowledge (i.e.,
its prior) and the retrieved information in settings where they disagree. We
test GPT-4 and other LLMs on question-answering abilities across datasets with
and without reference documents. As expected, providing the correct retrieved
information fixes most model mistakes (94% accuracy). However, when the
reference document is perturbed with increasing levels of wrong values, the LLM
is more likely to recite the incorrect, modified information when its internal
prior is weaker but is more resistant when its prior is stronger. Similarly, we
also find that the more the modified information deviates from the model’s
prior, the less likely the model is to prefer it. These results highlight an
underlying tension between a model’s prior knowledge and the information
presented in reference documents.
[LINK]
http://arxiv.org/abs/2404.10198v1
[DATE]
2024-04-16 08:43:03+08:00
[CATEGORIES]
cs.CL
Can Large Language Models Automatically Score Proficiency of Written Essays?
[AUTHORS]
Watheq Mansour, Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed
[ABSTRACT]
Although several methods were proposed to address the problem of automated
essay scoring (AES) in the last 50 years, there is still much to be desired in
terms of effectiveness. Large Language Models (LLMs) are transformer-based
models that demonstrate extraordinary capabilities on various tasks. In this
paper, we test the ability of LLMs, given their powerful linguistic knowledge,
to analyze and effectively score written essays. We experimented with two
popular LLMs, namely ChatGPT and Llama. We aim to check if these models can do
this task and, if so, how their performance is positioned among the
state-of-the-art (SOTA) models across two levels, holistically and per
individual writing trait. We used prompt-engineering tactics to design
four different prompts that bring out their full potential on this task. Our
experiments conducted on the ASAP dataset revealed several interesting
observations. First, choosing the right prompt depends highly on the model and
nature of the task. Second, the two LLMs exhibited comparable average
performance in AES, with a slight advantage for ChatGPT. Finally, despite the
performance gap between the two LLMs and SOTA models in terms of predictions,
they provide feedback to enhance the quality of the essays, which can
potentially help both teachers and students.
[COMMENTS]
V2 (published version of LREC-COLING 2024)
[LINK]
http://arxiv.org/abs/2403.06149v2
[DATE]
2024-04-16 08:24:55+08:00
[CATEGORIES]
cs.CL
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
[AUTHORS]
Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar
[ABSTRACT]
Contextual biasing enables speech recognizers to transcribe important phrases
in the speaker’s context, such as contact names, even if they are rare in, or
absent from, the training data. Attention-based biasing is a leading approach
which allows for full end-to-end cotraining of the recognizer and biasing
system and requires no separate inference-time components. Such biasers
typically consist of a context encoder, followed by a context filter that
narrows down the context to apply (improving per-step inference time), and,
finally, context application via cross attention. Though much work has gone
into optimizing per-frame performance, the context encoder is at least as
important: recognition cannot begin before context encoding ends. Here, we show
that the lightweight phrase selection pass can be moved before context encoding,
resulting in a speedup of up to 16.1 times and enabling biasing to scale to 20K
phrases with a maximum pre-decoding delay under 33ms. With the addition of
phrase- and wordpiece-level cross-entropy losses, our technique also achieves
up to a 37.5% relative WER reduction over the baseline without the losses and
lightweight phrase selection pass.
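The central change is one of ordering: select first, then encode. A minimal Python sketch of that ordering follows. The word-overlap scorer `cheap_score`, the `hint_tokens` input and the `encode` callback are hypothetical placeholders, not the paper's learned selection pass or context encoder.

```python
import heapq
from typing import Callable, List, Sequence, Set, Tuple

def cheap_score(phrase: str, hint_tokens: Set[str]) -> float:
    """Hypothetical stand-in for the lightweight selection pass:
    fraction of the phrase's words found in a set of hint tokens."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    return sum(w in hint_tokens for w in words) / len(words)

def deferred_context_encoding(
    phrases: Sequence[str],
    hint_tokens: Set[str],
    k: int,
    encode: Callable[[str], List[float]],
) -> List[Tuple[str, List[float]]]:
    # Step 1: score every phrase cheaply and keep only the top K.
    top_k = heapq.nlargest(k, phrases, key=lambda p: cheap_score(p, hint_tokens))
    # Step 2: run the expensive context encoder on the K survivors only,
    # so encoding cost no longer grows with the full phrase list.
    return [(p, encode(p)) for p in top_k]
```

With 20K biasing phrases and a small K, the expensive encoder runs K times instead of 20K times, which is where the pre-decoding latency saving comes from in this sketch.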
[COMMENTS]
9 pages, 3 figures, accepted by NAACL 2024 - Industry Track
[LINK]
http://arxiv.org/abs/2404.10180v1
[DATE]
2024-04-16 07:28:13+08:00
[CATEGORIES]
cs.CL
cs.LG
Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?
[AUTHORS]
Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan
[ABSTRACT]
Machine learning practitioners often fine-tune generative pre-trained models
like GPT-3 to improve model performance at specific tasks. Previous works,
however, suggest that fine-tuned machine learning models memorize and emit
sensitive information from the original fine-tuning dataset. Companies such as
OpenAI offer fine-tuning services for their models, but no prior work has
conducted a memorization attack on any closed-source models. In this work, we
simulate a privacy attack on GPT-3 using OpenAI’s fine-tuning API. Our
objective is to determine if personally identifiable information (PII) can be
extracted from this model. We (1) explore the use of naive prompting methods on
a GPT-3 fine-tuned classification model, and (2) we design a practical word
generation task called Autocomplete to investigate the extent of PII
memorization in fine-tuned GPT-3 within a real-world context. Our findings
reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and
disclosing critical personally identifiable information (PII) obtained from the
underlying fine-tuning dataset. To encourage further research, we have made our
code and datasets publicly available on GitHub at:
https://github.com/albertsun1/gpt3-pii-attacks
[LINK]
http://arxiv.org/abs/2307.16382v3
[DATE]
2024-04-16 06:34:22+08:00
[CATEGORIES]
cs.LG
cs.CL
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
[AUTHORS]
Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin, Daniel Cer
[ABSTRACT]
There has been limited success for dense retrieval models in multilingual
retrieval, due to uneven and scarce training data available across multiple
languages. Synthetic training data generation is promising (e.g., InPars or
Promptagator), but has been investigated only for English. Therefore, to study
model capabilities across both cross-lingual and monolingual retrieval tasks,
we develop SWIM-IR, a synthetic retrieval training dataset containing 33 (high
to very-low resource) languages for fine-tuning multilingual dense retrievers
without requiring any human supervision. To construct SWIM-IR, we propose SAP
(summarize-then-ask prompting), where the large language model (LLM) generates
a textual summary prior to the query generation step. SAP assists the LLM in
generating informative queries in the target language. Using SWIM-IR, we
explore synthetic fine-tuning of multilingual dense retrieval models and
evaluate them robustly on three retrieval benchmarks: XOR-Retrieve
(cross-lingual), MIRACL (monolingual) and XTREME-UP (cross-lingual). Our
models, called SWIM-X, are competitive with human-supervised dense retrieval
models, e.g., mContriever-X, finding that SWIM-IR can cheaply substitute for
expensive human-labeled retrieval training data. SWIM-IR dataset and SWIM-X
models are available at https://github.com/google-research-datasets/SWIM-IR.
[COMMENTS]
Accepted at NAACL 2024. Data released at
https://github.com/google-research-datasets/swim-ir
[LINK]
http://arxiv.org/abs/2311.05800v2
[DATE]
2024-04-16 06:11:33+08:00
[CATEGORIES]
cs.CL
NL2KQL: From Natural Language to Kusto Query
[AUTHORS]
Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing
[ABSTRACT]
Data is growing rapidly in volume and complexity. Proficiency in database
query languages is pivotal for crafting effective queries. As coding assistants
become more prevalent, there is significant opportunity to enhance database
query languages. The Kusto Query Language (KQL) is a widely used query language
for large semi-structured data such as logs, telemetries, and time-series for
big data analytics platforms. This paper introduces NL2KQL, an innovative
framework that uses large language models (LLMs) to convert natural language
queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several
key components: Schema Refiner which narrows down the schema to its most
pertinent elements; the Few-shot Selector which dynamically selects relevant
examples from a few-shot dataset; and the Query Refiner which repairs syntactic
and semantic errors in KQL queries. Additionally, this study outlines a method
for generating large datasets of synthetic NLQ-KQL pairs that are valid within
a specific database context. To validate NL2KQL’s performance, we utilize an
array of online (based on query execution) and offline (based on query parsing)
metrics. Through ablation studies, the significance of each framework component
is examined, and the datasets used for benchmarking are made publicly
available. This work is the first of its kind and is compared with available
baselines to demonstrate its effectiveness.
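A minimal sketch of a few-shot selection step follows. It assumes plain word-overlap (Jaccard) similarity as the relevance criterion and a simple prompt layout. `select_few_shot`, `build_prompt` and the example pairs are hypothetical, and the abstract does not say how NL2KQL actually measures relevance. The example uses the StormEvents sample table that appears throughout the Kusto documentation.

```python
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (natural-language query, KQL query)

def _tokens(text: str) -> set:
    return set(text.lower().split())

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_few_shot(nlq: str, pool: Sequence[Example], k: int) -> List[Example]:
    """Rank stored (NLQ, KQL) pairs by word overlap with the new question; keep the top k."""
    q = _tokens(nlq)
    return sorted(pool, key=lambda ex: jaccard(q, _tokens(ex[0])), reverse=True)[:k]

def build_prompt(nlq: str, table: str, columns: Sequence[str],
                 examples: Sequence[Example]) -> str:
    """Refined schema first, then the selected examples, then the new question."""
    lines = [f"Table: {table}({', '.join(columns)})"]
    for question, kql in examples:
        lines += [f"Q: {question}", f"KQL: {kql}"]
    lines += [f"Q: {nlq}", "KQL:"]
    return "\n".join(lines)
```

An embedding-based retriever would be the more usual choice in practice. Word overlap keeps the sketch dependency-free.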
[LINK]
http://arxiv.org/abs/2404.02933v2
[DATE]
2024-04-16 06:10:17+08:00
[CATEGORIES]
cs.CL
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step
[AUTHORS]
Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, Yejin Choi
[ABSTRACT]
Chain-of-thought prompting (e.g., “Let’s think step-by-step”) primes large
language models to verbalize rationalization for their predictions. While
chain-of-thought can lead to dramatic performance gains, benefits appear to
emerge only for sufficiently large models (beyond 50B parameters). We show that
orders-of-magnitude smaller models (125M – 1.3B parameters) can still benefit
from chain-of-thought prompting. To achieve this, we introduce Symbolic
Chain-of-Thought Distillation (SCoTD), a method to train a smaller student
model on rationalizations sampled from a significantly larger teacher model.
Experiments across several commonsense benchmarks show that: 1) SCoTD enhances
the performance of the student model in both supervised and few-shot settings,
and especially for challenge sets; 2) sampling many reasoning chains per
instance from the teacher is paramount; and 3) after distillation, student
chain-of-thoughts are judged by humans as comparable to the teacher, despite
orders of magnitude fewer parameters. We test several hypotheses regarding what
properties of chain-of-thought samples are important, e.g., diversity vs.
teacher likelihood vs. open-endedness. We release our corpus of
chain-of-thought samples and code.
[COMMENTS]
ACL 2023
[LINK]
http://arxiv.org/abs/2306.14050v2
[DATE]
2024-04-16 05:58:27+08:00
[CATEGORIES]
cs.CL
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition
[AUTHORS]
Md Mahadi Hasan Nahid, Davood Rafiei
[ABSTRACT]
Table reasoning is a challenging task that requires understanding both
natural language questions and structured tabular data. Large language models
(LLMs) have shown impressive capabilities in natural language understanding and
generation, but they often struggle with large tables due to their limited
input length. In this paper, we propose TabSQLify, a novel method that
leverages text-to-SQL generation to decompose tables into smaller and relevant
sub-tables, containing only essential information for answering questions or
verifying statements, before performing the reasoning task. In our
comprehensive evaluation on four challenging datasets, our approach
demonstrates comparable or superior performance compared to prevailing methods
reliant on full tables as input. Moreover, our method can reduce the input
context length significantly, making it more scalable and efficient for
large-scale table reasoning applications. Our method performs remarkably well
on the WikiTQ benchmark, achieving an accuracy of 64.7%. Additionally, on the
TabFact benchmark, it achieves a high accuracy of 79.5%. These results surpass
other LLM-based baseline models on gpt-3.5-turbo (ChatGPT). TabSQLify can
reduce the table size significantly, alleviating the computational load on LLMs
when handling large tables without compromising performance.
[COMMENTS]
Accepted to NAACL 2024 (long, main)
[LINK]
http://arxiv.org/abs/2404.10150v1
[DATE]
2024-04-16 05:42:20+08:00
[CATEGORIES]
cs.CL
Can MLLMs Perform Text-to-Image In-Context Learning?
[AUTHORS]
Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee
[ABSTRACT]
The evolution from Large Language Models (LLMs) to Multimodal Large Language
Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to
its multimodal counterpart. Existing studies of this kind have primarily concentrated
on image-to-text ICL. However, the Text-to-Image ICL (T2I-ICL), with its unique
characteristics and potential applications, remains underexplored. To address
this gap, we formally define the task of T2I-ICL and present CoBSAT, the first
T2I-ICL benchmark dataset, encompassing ten tasks. Utilizing our dataset to
benchmark six state-of-the-art MLLMs, we uncover considerable difficulties
MLLMs encounter in solving T2I-ICL. We identify the primary challenges as the
inherent complexity of multimodality and image generation, and show that
strategies such as fine-tuning and Chain-of-Thought prompting help to mitigate
these difficulties, leading to notable improvements in performance. Our code
and dataset are available at https://github.com/UW-Madison-Lee-Lab/CoBSAT.
[LINK]
http://arxiv.org/abs/2402.01293v2
[DATE]
2024-04-16 05:30:10+08:00
[CATEGORIES]
cs.LG
cs.CL
[AUTHORS]
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
[ABSTRACT]
Recent advances in language models (LMs) have led to significant improvements
in quality on complex NLP tasks, but at the expense of increased inference
costs. Cascading offers a simple strategy to achieve more favorable
cost-quality tradeoffs: here, a small model is invoked for most “easy”
instances, while a few “hard” instances are deferred to the large model. While
the principles underpinning cascading are well-studied for classification tasks
(with deferral based on predicted class uncertainty favored both theoretically and
practically), a similar understanding is lacking for generative LM tasks. In
this work, we initiate a systematic study of deferral rules for LM cascades. We
begin by examining the natural extension of predicted class uncertainty to
generative LM tasks, namely, the predicted sequence uncertainty. We show that
this measure suffers from the length bias problem, either over- or
under-emphasizing outputs based on their lengths. This is because LMs produce a
sequence of uncertainty values, one for each output token; and moreover, the
number of output tokens is variable across examples. To mitigate this issue, we
propose to exploit the richer token-level uncertainty information implicit in
generative LMs. We argue that naive predicted sequence uncertainty corresponds
to a simple aggregation of these uncertainties. By contrast, we show that
incorporating token-level uncertainty through learned post-hoc deferral rules
can significantly outperform such simple aggregation strategies, via
experiments on a range of natural language benchmarks with FLAN-T5 models. We
further show that incorporating embeddings from the smaller model and
intermediate layers of the larger model can give an additional boost in the
overall cost-quality tradeoff.
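The length bias can be seen with toy numbers. This is a minimal sketch that assumes per-token probabilities from the small model are available. Summing log-probabilities penalises a long, confident answer more than a short, unsure one, and taking the mean removes that effect. Quantile features are shown as one hypothetical input to a learned post-hoc deferral rule. The classifier itself is not shown, and the exact features used in the paper may differ.

```python
import math
from typing import List, Sequence

def sequence_score(token_probs: Sequence[float]) -> float:
    """Sum of token log-probabilities (the plain 'predicted sequence uncertainty')."""
    return sum(math.log(p) for p in token_probs)

def mean_score(token_probs: Sequence[float]) -> float:
    """Length-normalised variant: average token log-probability."""
    return sequence_score(token_probs) / len(token_probs)

def quantile_features(token_probs: Sequence[float],
                      qs=(0.0, 0.25, 0.5, 0.75, 1.0)) -> List[float]:
    """Quantiles of per-token log-probabilities, usable as inputs to a learned deferral rule."""
    logps = sorted(math.log(p) for p in token_probs)
    n = len(logps)
    return [logps[int(q * (n - 1))] for q in qs]

def defer(score: float, threshold: float) -> bool:
    """Send the example to the large model when the small model's score is below threshold."""
    return score < threshold
```

For example, a 20-token answer with every token at probability 0.95 has a summed log-probability of about -1.03, lower than a 1-token answer at probability 0.6 (about -0.51). A fixed threshold on the sum would therefore defer the confident long answer and keep the unsure short one.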
[LINK]
http://arxiv.org/abs/2404.10136v1
[DATE]
2024-04-16 05:02:48+08:00
[CATEGORIES]
cs.CL
cs.LG
Chinchilla Scaling: A replication attempt
[AUTHORS]
Tamay Besiroglu, Ege Erdil, Matthew Barnett, Josh You
[ABSTRACT]
Hoffmann et al. (2022) propose three methods for estimating a compute-optimal
scaling law. We attempt to replicate their third estimation procedure, which
involves fitting a parametric loss function to a reconstruction of data from
their plots. We find that the reported estimates are inconsistent with their
first two estimation methods, fail at fitting the extracted data, and report
implausibly narrow confidence intervals (intervals this narrow would require
over 600,000 experiments, while they likely only ran fewer than 500). In
contrast, our rederivation of the scaling law using the third approach yields
results that are compatible with the findings from the first two estimation
procedures described by Hoffmann et al.
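For reference, the parametric form fitted in Hoffmann et al.'s third approach is L(N, D) = E + A/N^α + B/D^β, where N is the parameter count and D is the number of training tokens. The sketch below evaluates this form. It also finds the loss-minimising model size in closed form under the common approximation that training compute is C ≈ 6ND FLOPs. Setting the derivative of L(N, C/(6N)) to zero gives N_opt = G·(C/6)^(β/(α+β)) with G = (αA/(βB))^(1/(α+β)). The coefficients in the usage check are illustrative placeholders, not the estimates reported by either paper.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ParametricLoss:
    """L(N, D) = E + A / N**alpha + B / D**beta (N parameters, D training tokens)."""
    E: float
    A: float
    B: float
    alpha: float
    beta: float

    def __call__(self, n: float, d: float) -> float:
        return self.E + self.A / n**self.alpha + self.B / d**self.beta

    def compute_optimal(self, c: float) -> Tuple[float, float]:
        """Minimise L subject to the FLOPs approximation C = 6 * N * D."""
        a, b = self.alpha, self.beta
        g = (a * self.A / (b * self.B)) ** (1 / (a + b))
        n_opt = g * (c / 6) ** (b / (a + b))
        return n_opt, c / (6 * n_opt)
```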
[LINK]
http://arxiv.org/abs/2404.10102v1
[DATE]
2024-04-16 03:19:56+08:00
[CATEGORIES]
cs.CL
Visual Grounding Methods for VQA are Working for the Wrong Reasons!
[AUTHORS]
Robik Shrestha, Kushal Kafle, Christopher Kanan
[ABSTRACT]
Existing Visual Question Answering (VQA) methods tend to exploit dataset
biases and spurious statistical correlations, instead of producing right
answers for the right reasons. To address this issue, recent bias mitigation
methods for VQA propose to incorporate visual cues (e.g., human attention maps)
to better ground the VQA models, showcasing impressive gains. However, we show
that the performance improvements are not a result of improved visual
grounding, but a regularization effect which prevents over-fitting to
linguistic priors. For instance, we find that it is not actually necessary to
provide proper, human-based cues; random, insensible cues also result in
similar improvements. Based on this observation, we propose a simpler
regularization scheme that does not require any external annotations and yet
achieves near state-of-the-art performance on VQA-CPv2.
[COMMENTS]
ACL 2020
[LINK]
http://arxiv.org/abs/2004.05704v3
[DATE]
2024-04-16 03:09:39+08:00
[CATEGORIES]
cs.CL
Near-Term Advances in Quantum Natural Language Processing
[AUTHORS]
Dominic Widdows, Aaranya Alexander, Daiwei Zhu, Chase Zimmerman, Arunava Majumder
[ABSTRACT]
This paper describes experiments showing that some tasks in natural language
processing (NLP) can already be performed using quantum computers, though so
far only with small datasets.
We demonstrate various approaches to topic classification. The first uses an
explicit word-based approach, in which word-topic scoring weights are
implemented as fractional rotations of individual qubits, and a new phrase is
classified based on the accumulation of these weights in a scoring qubit using
entangling controlled-NOT gates. This is compared with more scalable quantum
encodings of word embedding vectors, which are used in the computation of
kernel values in a quantum support vector machine: this approach achieved an
average of 62% accuracy on classification tasks involving over 10000 words,
which is the largest such quantum computing experiment to date.
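The word-based approach can be illustrated with a classical simulation of a single scoring qubit. This is a simplification. In the described circuit, word weights are applied as rotations of individual qubits and accumulated on the scoring qubit with controlled-NOT gates. The sketch below instead applies each word's signed rotation directly to one qubit. The weights and topic labels are hypothetical.

```python
import math
from typing import Dict, Tuple

State = Tuple[float, float]  # real amplitudes of |0> and |1>

def ry(state: State, theta: float) -> State:
    """Apply the single-qubit rotation RY(theta) to a real-amplitude state."""
    a, b = state
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return (c * a - s * b, s * a + c * b)

def classify(phrase: str, weights: Dict[str, float]) -> Tuple[str, float]:
    # Start in an equal superposition so an empty phrase scores P(|1>) = 0.5.
    state = ry((1.0, 0.0), math.pi / 2)
    for word in phrase.lower().split():
        # Each known word adds a signed fractional rotation; unknown words add nothing.
        state = ry(state, weights.get(word, 0.0))
    p_one = state[1] ** 2  # probability of measuring |1> on the scoring qubit
    return ("topic_1" if p_one > 0.5 else "topic_0"), p_one
```

Because RY rotations about the same axis compose additively, the final probability is sin²((π/2 + Σθ)/2). The phrase score is therefore an accumulation of per-word weights, read out through a measurement probability.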
We describe a quantum probability approach to bigram modeling that can be
applied to sequences of words and formal concepts, investigating a generative
approximation to these distributions using a quantum circuit Born machine, and
an approach to ambiguity resolution in verb-noun composition using single-qubit
rotations for simple nouns and 2-qubit controlled-NOT gates for simple verbs.
The smaller systems described have been run successfully on physical quantum
computers, and the larger ones have been simulated. We show that statistically
meaningful results can be obtained using real datasets, but that these results are
much more difficult to predict than those on the easier artificial-language examples
used previously in developing quantum NLP systems.
Other approaches to quantum NLP are compared, partly with respect to
contemporary issues including informal language, fluency, and truthfulness.
[LINK]
http://arxiv.org/abs/2206.02171v3
[DATE]
2024-04-16 02:53:56+08:00
[CATEGORIES]
cs.CL
README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP
[AUTHORS]
Zonghai Yao, Nandyala Siddharth Kantu, Guanghao Wei, Hieu Tran, Zhangqi Duan, Sunjae Kwon, Zhichao Yang, README annotation team, Hong Yu
[ABSTRACT]
The advancement in healthcare has shifted focus toward patient-centric
approaches, particularly in self-care and patient education, facilitated by
access to Electronic Health Records (EHR). However, medical jargon in EHRs
poses significant challenges to patient comprehension. To address this, we
introduce a new task of automatically generating lay definitions, aiming to
simplify complex medical terms into patient-friendly lay language. We first
created the README dataset, an extensive collection of over 50,000 unique
(medical term, lay definition) pairs and 300,000 mentions, each offering
context-aware lay definitions manually annotated by domain experts. We have
also engineered a data-centric Human-AI pipeline that synergizes data
filtering, augmentation, and selection to improve data quality. We then used
README as the training data for models and leveraged a Retrieval-Augmented
Generation method to reduce hallucinations and improve the quality of model
outputs. Our extensive automatic and human evaluations demonstrate that
open-source mobile-friendly models, when fine-tuned with high-quality data, are
capable of matching or even surpassing the performance of state-of-the-art
closed-source large language models like ChatGPT. This research represents a
significant stride in closing the knowledge gap in patient education and
advancing patient-centric healthcare solutions.
[LINK]
http://arxiv.org/abs/2312.15561v2
[DATE]
2024-04-16 02:44:25+08:00
[CATEGORIES]
cs.CL
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
[AUTHORS]
Juan Diego Rodriguez, Katrin Erk, Greg Durrett
[ABSTRACT]
Understanding when two pieces of text convey the same information is a goal
touching many subproblems in NLP, including textual entailment and
fact-checking. This problem becomes more complex when those two pieces of text
are in different languages. Here, we introduce X-PARADE (Cross-lingual
Paragraph-level Analysis of Divergences and Entailments), the first
cross-lingual dataset of paragraph-level information divergences. Annotators
label a paragraph in a target language at the span level and evaluate it with
respect to a corresponding paragraph in a source language, indicating whether a
given piece of information is the same, new, or new but can be inferred. This
last notion establishes a link with cross-language NLI. Aligned paragraphs are
sourced from Wikipedia pages in different languages, reflecting real
information divergences observed in the wild. Armed with our dataset, we
investigate a diverse set of approaches for this problem, including token
alignment from machine translation, textual entailment methods that localize
their decisions, and prompting LLMs. Our results show that these methods vary
in their capability to handle inferable information, but they all fall short of
human performance.
[COMMENTS]
To be published in NAACL 2024
[LINK]
http://arxiv.org/abs/2309.08873v2
[DATE]
2024-04-16 02:39:01+08:00
[CATEGORIES]
cs.CL
AIGeN: An Adversarial Approach for Instruction Generation in VLN
[AUTHORS]
Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi, Rita Cucchiara
[ABSTRACT]
In the last few years, the research interest in Vision-and-Language
Navigation (VLN) has grown significantly. VLN is a challenging task that
involves an agent following human instructions and navigating in a previously
unknown environment to reach a specified goal. Recent work in the literature
focuses on different ways to augment the available datasets of instructions for
improving navigation performance by exploiting synthetic training data. In this
work, we propose AIGeN, a novel architecture inspired by Generative Adversarial
Networks (GANs) that produces meaningful and well-formed synthetic instructions
to improve navigation agents’ performance. The model is composed of a
Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the
training phase, the decoder generates sentences for a sequence of images
describing the agent’s path to a particular point while the encoder
discriminates between real and fake instructions. Experimentally, we evaluate
the quality of the generated instructions and perform extensive ablation
studies. Additionally, we generate synthetic instructions for 217K trajectories
using AIGeN on Habitat-Matterport 3D Dataset (HM3D) and show an improvement in
the performance of an off-the-shelf VLN method. The validation analysis of our
proposal is conducted on REVERIE and R2R and highlights the promising aspects
of our proposal, achieving state-of-the-art performance.
[COMMENTS]
Accepted to 7th Multimodal Learning and Applications Workshop (MULA
2024) at the IEEE/CVF Conference on Computer Vision and Pattern Recognition
2024
[LINK]
http://arxiv.org/abs/2404.10054v1
[DATE]
2024-04-16 02:00:30+08:00
[CATEGORIES]
cs.CL
H2O-Danube-1.8B Technical Report
[AUTHORS]
Philipp Singer, Pascal Pfeiffer, Yauhen Babakhin, Maximilian Jeblick, Nischay Dhankhar, Gabor Fodor, Sri Satish Ambati
[ABSTRACT]
We present H2O-Danube, a series of small 1.8B language models consisting of
H2O-Danube-1.8B, trained on 1T tokens, and the incrementally improved
H2O-Danube2-1.8B trained on an additional 2T tokens. Our models exhibit highly
competitive metrics across a multitude of benchmarks and, as of the time of
this writing, H2O-Danube2-1.8B achieves the top ranking on Open LLM Leaderboard
for all models below the 2B parameter range. The models follow core principles
of Llama 2 and Mistral, and we leverage and refine various techniques for
pre-training large language models. We additionally release chat models trained
with supervised fine-tuning followed by direct preference optimization. We make
all models openly available under the Apache 2.0 license, further democratizing LLMs
by making them economically accessible to a wider audience.
[LINK]
http://arxiv.org/abs/2401.16818v2
[DATE]
2024-04-16 01:58:01+08:00
[CATEGORIES]
cs.CL
cs.LG
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
[AUTHORS]
Xue-Yong Fu, Md Tahmid Rahman Laskar, Elena Khasanova, Cheng Chen, Shashi Bhushan TN
[COMMENTS]
Accepted by NAACL 2024 (Industry Track). The first two authors
contributed equally to this work
[LINK]
http://arxiv.org/abs/2402.00841v2
[DATE]
2024-04-16 01:56:58+08:00
[CATEGORIES]
cs.CL
Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems
[AUTHORS]
Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke
[COMMENTS]
Accepted at NAACL 2024 Findings
[LINK]
http://arxiv.org/abs/2404.09980v1
[DATE]
2024-04-16 01:56:39+08:00
[CATEGORIES]
cs.CL
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
[AUTHORS]
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov
[ABSTRACT]
Large language models (LLMs) are susceptible to hallucinations, which have sparked
a widespread effort to detect and prevent them. Recent work attempts to
mitigate hallucinations by intervening in the model’s computation during
generation, using different setups and heuristics. These works do not distinguish
between different causes of hallucination. In this work, we first introduce an
approach for constructing datasets based on the model knowledge for detection
and intervention methods in closed-book and open-book question-answering
settings. We then characterize the effect of different choices for
intervention, such as the intervened components (MLPs, attention block,
residual stream, and specific heads), and how often and how strongly to
intervene. We find that intervention success varies depending on the component,
with some components being detrimental to language modeling capabilities.
Finally, we find that interventions can benefit from a pre-hallucination steering
direction rather than a post-hallucination one. The code is available at
https://github.com/technion-cs-nlp/hallucination-mitigation
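Interventions of this kind typically add a steering vector to a chosen component's activations. The sketch below shows one common recipe on plain Python vectors: take the difference of mean activations between non-hallucinated and hallucinated examples, normalise it, and add it with strength `alpha`. This is a generic illustration, not the paper's procedure. The caller decides which component and token position the activations come from; for instance, they could be captured before the answer is generated.

```python
import math
from typing import List, Sequence

Vector = List[float]

def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_direction(good: Sequence[Sequence[float]],
                       bad: Sequence[Sequence[float]]) -> Vector:
    """Unit vector pointing from the mean 'bad' activation to the mean 'good' one."""
    diff = [g - b for g, b in zip(mean_vector(good), mean_vector(bad))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff] if norm > 0 else diff

def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Intervene on one hidden-state vector by adding alpha times the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

`alpha` corresponds to "how strongly to intervene". Applying `steer` at every generation step versus only some steps corresponds to "how often".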
[LINK]
http://arxiv.org/abs/2404.09971v1
[DATE]
2024-04-16 01:48:46+08:00
[CATEGORIES]
cs.CL
Tuning Language Models by Proxy
[AUTHORS]
Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
[COMMENTS]
fix typo in Table 13, add acknowledgments section. code available at
https://github.com/alisawuffles/proxy-tuning
[LINK]
http://arxiv.org/abs/2401.08565v3
[DATE]
2024-04-16 01:20:09+08:00
[CATEGORIES]
cs.CL
On the Fragility of Active Learners
[AUTHORS]
Abhishek Ghose, Emma Thuong Nguyen
[ABSTRACT]
Active learning (AL) techniques aim to maximally utilize a labeling budget by
iteratively selecting instances that are most likely to improve prediction
accuracy. However, their benefit compared to random sampling has not been
consistent across various setups, e.g., different datasets, classifiers. In
this empirical study, we examine how a combination of different factors might
obscure any gains from an AL technique. Focusing on text classification, we
rigorously evaluate AL techniques over around 1000 experiments that vary with respect to
the dataset, batch size, text representation and the classifier. We show that
AL is only effective in a narrow set of circumstances. We also address the
problem of choosing metrics that are better aligned with real-world expectations.
The impact of this study is in its insights for a practitioner: (a) the choice
of text representation and classifier is as important as that of an AL
technique, (b) choice of the right metric is critical in assessment of the
latter, and, finally, (c) reported AL results must be holistically interpreted,
accounting for variables other than just the query strategy.
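The query strategy is only one of the variables the study varies. A minimal sketch of two strategies that might be compared in each labeling round follows: entropy-based uncertainty sampling and the random-sampling baseline. The function names are hypothetical. The sketch assumes the pool maps example ids to predicted class probabilities from whatever classifier and text representation are in use.

```python
import math
import random
from typing import Dict, List, Sequence

def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

def query_uncertain(pool: Dict[int, Sequence[float]], batch_size: int) -> List[int]:
    """Uncertainty sampling: ids whose predicted class distribution has the highest entropy."""
    return sorted(pool, key=lambda i: entropy(pool[i]), reverse=True)[:batch_size]

def query_random(pool: Dict[int, Sequence[float]], batch_size: int,
                 seed: int = 0) -> List[int]:
    """Random-sampling baseline that an AL strategy should be compared against."""
    return random.Random(seed).sample(sorted(pool), batch_size)
```

Because `query_uncertain` ranks by the classifier's own probabilities, changing the classifier or text representation changes which examples get labeled. This is one way those choices can matter as much as the query strategy itself.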
[LINK]
http://arxiv.org/abs/2403.15744v3
[DATE]
2024-04-16 01:10:46+08:00
[CATEGORIES]
cs.LG
cs.CL
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
[AUTHORS]
Indraneil Paul, Goran Glavaš, Iryna Gurevych
[ABSTRACT]
Code understanding and generation have fast become some of the most popular
applications of language models (LMs). Nonetheless, research on multilingual
aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual
transfer between different programming languages, language-specific data
augmentation, and post-hoc LM adaptation, alongside exploitation of data
sources other than the original textual content, has been much sparser than for
their natural language counterparts. In particular, most mainstream Code-LMs
have been pre-trained on source code files alone. In this work, we investigate
the prospect of leveraging readily available compiler intermediate
representations (IR) - shared across programming languages - to improve the
multilingual capabilities of Code-LMs and facilitate cross-lingual transfer.
To this end, we first compile SLTrans, a parallel dataset consisting of
nearly 4M self-contained source code files coupled with respective intermediate
representations. Next, starting from various base Code-LMs (ranging in size
from 1.1B to 7.3B parameters), we carry out continued causal language modelling
training on SLTrans, forcing the Code-LMs to (1) learn the IR language and (2)
align the IR constructs with respective constructs of various programming
languages. Our resulting models, dubbed IRCoder, display sizeable and
consistent gains across a wide variety of code generation tasks and metrics,
including prompt robustness, multilingual code completion, code understanding,
and instruction following.
[LINK]
http://arxiv.org/abs/2403.03894v3
[DATE]
2024-04-16 00:29:41+08:00
[CATEGORIES]
cs.CL
Progressive Knowledge Graph Completion
[AUTHORS]
Jiayi Li, Ruilin Luo, Jiaqi Sun, Jing Xiao, Yujiu Yang
[ABSTRACT]
Knowledge Graph Completion (KGC) has emerged as a promising solution to
address the issue of incompleteness within Knowledge Graphs (KGs). Traditional
KGC research primarily centers on triple classification and link prediction.
Nevertheless, we contend that these tasks do not align well with real-world
scenarios and merely serve as surrogate benchmarks. In this paper, we
investigate three crucial processes relevant to real-world construction
scenarios: (a) the verification process, which arises from the necessity and
limitations of human verifiers; (b) the mining process, which identifies the
most promising candidates for verification; and (c) the training process, which
harnesses verified data for subsequent utilization; in order to achieve a
transition toward more realistic challenges. By integrating these three
processes, we introduce the Progressive Knowledge Graph Completion (PKGC) task,
which simulates the gradual completion of KGs in real-world scenarios.
Furthermore, to expedite PKGC processing, we propose two acceleration modules:
Optimized Top-$k$ algorithm and Semantic Validity Filter. These modules
significantly enhance the efficiency of the mining procedure. Our experiments
demonstrate that performance in link prediction does not accurately reflect
performance in PKGC. A more in-depth analysis reveals the key factors
influencing the results and provides potential directions for future research.
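As an illustration of the mining step, the sketch below filters candidate triples with a simple type-signature check. It then keeps the K highest-scoring survivors for human verification using `heapq.nlargest`. The type check is a hypothetical stand-in for the Semantic Validity Filter, and `heapq.nlargest` is not the paper's Optimized Top-k algorithm. In practice the scores would come from a KGC model.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def mine_for_verification(
    candidates: Iterable[Triple],
    score: Callable[[Triple], float],
    entity_types: Dict[str, Set[str]],
    signatures: Dict[str, Tuple[str, str]],
    k: int,
) -> List[Triple]:
    def plausible(t: Triple) -> bool:
        # Keep a triple only if head and tail have the types the relation expects.
        head, rel, tail = t
        sig = signatures.get(rel)
        return (sig is not None
                and sig[0] in entity_types.get(head, set())
                and sig[1] in entity_types.get(tail, set()))
    # Score only plausible triples; return the top K for human verifiers.
    return heapq.nlargest(k, (t for t in candidates if plausible(t)), key=score)
```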
[COMMENTS]
14 pages, 10 figures
[LINK]
http://arxiv.org/abs/2404.09897v1
[DATE]
2024-04-16 00:16:59+08:00
[CATEGORIES]
cs.CL
cs.LG
Doing Experiments and Revising Rules with Natural Language and Probabilistic Reasoning
[AUTHORS]
Wasu Top Piriyakulkij, Kevin Ellis
[ABSTRACT]
We build a computational model of how humans actively infer hidden rules by
doing experiments. The basic principle behind the model is that, even if the
rule is deterministic, the learner considers a broader space of fuzzy
probabilistic rules, which it represents in natural language, and updates its
hypotheses online after each experiment according to approximately Bayesian
principles. In the same framework we also model experiment design according to
information-theoretic criteria. We find that the combination of these three
principles – explicit hypotheses, probabilistic rules, and online updates –
can explain human performance on a Zendo-style task, and that removing any of
these components leaves the model unable to account for the data.
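The online update and experiment-design loop can be illustrated with a minimal Python sketch. It is not the authors' model. As simplifying assumptions, the hypothetical rules are Python predicates over toy structures rather than natural-language rules, each rule is made fuzzy by a fixed noise rate `eps`, and the next experiment is chosen by expected information gain.

```python
import math

# Hypothetical toy domain: a "structure" is a dict of attributes.
# Each hypothesis is a deterministic predicate; fuzziness comes from eps.
HYPOTHESES = {
    "contains a red piece": lambda s: "red" in s["colors"],
    "has exactly two pieces": lambda s: s["count"] == 2,
    "contains a blue piece": lambda s: "blue" in s["colors"],
}

def likelihood(rule, structure, label, eps=0.1):
    """P(label | rule): the rule's verdict is trusted with probability 1 - eps."""
    return 1 - eps if rule(structure) == label else eps

def update(posterior, structure, label, eps=0.1):
    """One online Bayesian update after observing whether a structure follows the hidden rule."""
    unnorm = {h: p * likelihood(HYPOTHESES[h], structure, label, eps) for h, p in posterior.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def expected_info_gain(posterior, structure, eps=0.1):
    """Expected reduction in posterior entropy from testing `structure`."""
    gain = entropy(posterior)
    for label in (True, False):
        p_label = sum(p * likelihood(HYPOTHESES[h], structure, label, eps) for h, p in posterior.items())
        gain -= p_label * entropy(update(posterior, structure, label, eps))
    return gain

prior = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
candidates = [
    {"colors": {"red"}, "count": 1},
    {"colors": {"red", "blue"}, "count": 2},
    {"colors": {"green"}, "count": 3},
]
# Design: pick the experiment that best separates the hypotheses.
best = max(candidates, key=lambda s: expected_info_gain(prior, s))
# Update: suppose the chosen structure turns out to follow the hidden rule.
posterior = update(prior, candidates[0], label=True)
```

A structure on which every hypothesis agrees (the second and third candidates) carries no information, so the design step prefers the first one.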
[LINK]
http://arxiv.org/abs/2402.06025v3
[DATE]
2024-04-16 00:11:50+08:00
[CATEGORIES]
cs.CL
Towards Verifiable Text Generation with Symbolic References
[AUTHORS]
Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
[ABSTRACT]
LLMs are vulnerable to hallucinations, and thus their outputs generally
require laborious human verification for high-stakes applications. To this end,
we propose symbolically grounded generation (SymGen) as a simple approach for
enabling easier manual validation of an LLM’s output. SymGen prompts an LLM to
interleave its regular output text with explicit symbolic references to fields
present in some conditioning data (e.g., a table in JSON format). The
references can be used to display the provenance of different spans of text in
the generation, reducing the effort required for manual verification. Across a
range of data-to-text and question-answering experiments, we find that LLMs are
able to directly output text that makes use of accurate symbolic references
while maintaining fluency and factuality. In a human study we further find that
such annotations can streamline human verification of machine-generated text.
Our code will be available at http://symgen.github.io.
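A minimal Python sketch of how symbolic references could be resolved into displayed text plus provenance spans. The `{{ path }}` reference syntax, the helper names `resolve` and `render_with_provenance`, and the example table are assumptions for illustration, not SymGen's actual format or code.

```python
import json
import re

# Assumed reference syntax for illustration: {{ dotted.path }} pointing into the JSON data.
REF = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def resolve(data, path):
    """Follow a dotted path such as 'home.points' through nested dicts."""
    value = data
    for key in path.split("."):
        value = value[key]
    return value

def render_with_provenance(symbolic_text, data):
    """Replace each reference by its value; record (start, end, path) spans in the output."""
    out, spans, cursor, length = [], [], 0, 0
    for m in REF.finditer(symbolic_text):
        literal = symbolic_text[cursor:m.start()]
        out.append(literal)
        length += len(literal)
        value = str(resolve(data, m.group(1)))
        spans.append((length, length + len(value), m.group(1)))
        out.append(value)
        length += len(value)
        cursor = m.end()
    out.append(symbolic_text[cursor:])
    return "".join(out), spans

data = json.loads('{"home": {"name": "Celtics", "points": 112}, "away": {"name": "Heat", "points": 104}}')
generation = "The {{ home.name }} beat the {{ away.name }} {{ home.points }}-{{ away.points }}."
text, spans = render_with_provenance(generation, data)
```

Each span tells a verifier exactly which field a piece of the output came from, so checking it reduces to a lookup rather than a search through the table.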
[COMMENTS]
57 pages, 8 figures, 8 tables
[LINK]
http://arxiv.org/abs/2311.09188v2
[DATE]
2024-04-16 00:09:33+08:00
[CATEGORIES]
cs.CL
cs.LG
Automating REST API Postman Test Cases Using LLM
[AUTHORS]
S Deepika Sri, Mohammed Aadil S, Sanjjushri Varshini R, Raja CSP Raman, Gopinath Rajagopal, S Taranath Chan
[ABSTRACT]
In the contemporary landscape of technological advancement, the automation
of manual processes is crucial and drives the demand for large datasets to
effectively train and test machines. This research paper explores and
implements an automated approach to generating test cases
specifically using Large Language Models. The methodology integrates
OpenAI to enhance the efficiency and effectiveness of test case generation for
training and evaluating Large Language Models. This formalized approach with
LLMs simplifies the testing process, making it more efficient and
comprehensive. Leveraging natural language understanding, LLMs can
intelligently formulate test cases that cover a broad range of REST API
properties, ensuring comprehensive testing. The model developed during
the research is trained using manually collected Postman test cases or
instances for various REST APIs. LLMs enhance the creation of Postman test
cases by automating the generation of varied and intricate test scenarios.
Postman test cases offer streamlined automation, collaboration, and dynamic
data handling, providing a user-friendly and efficient approach to API testing
compared to traditional test cases. Thus, the developed model not only conforms
to current technological standards but also promises to become increasingly
important as technology advances.
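The abstract does not describe the generation pipeline in detail. As an illustration only, the sketch below assumes the LLM has already returned test specifications in a hypothetical dict format and assembles them into a Postman Collection v2.1 JSON document whose test scripts use Postman's `pm.test` and `pm.expect` sandbox functions. The spec fields (`expected_status`, `required_fields`) and helper names are hypothetical.

```python
import json

SCHEMA = "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"

def postman_item(spec):
    """Turn one hypothetical LLM-produced test spec into a Postman collection item."""
    status = spec["expected_status"]
    exec_lines = [
        f'pm.test("Status code is {status}", function () {{',
        f"    pm.response.to.have.status({status});",
        "});",
    ]
    # One extra assertion per field the response body must contain.
    for field in spec.get("required_fields", []):
        exec_lines += [
            f'pm.test("Response has {field}", function () {{',
            f'    pm.expect(pm.response.json()).to.have.property("{field}");',
            "});",
        ]
    return {
        "name": spec["name"],
        "request": {"method": spec["method"], "url": spec["url"]},
        # Scripts attached to the "test" event run after the response arrives.
        "event": [{"listen": "test", "script": {"type": "text/javascript", "exec": exec_lines}}],
    }

def build_collection(name, specs):
    return {"info": {"name": name, "schema": SCHEMA}, "item": [postman_item(s) for s in specs]}

specs = [{"name": "Get user", "method": "GET", "url": "https://api.example.com/users/1",
          "expected_status": 200, "required_fields": ["id", "email"]}]
collection = build_collection("Generated tests", specs)
payload = json.dumps(collection, indent=2)  # importable into Postman or runnable with Newman
```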
[LINK]
http://arxiv.org/abs/2404.10678v1
[DATE]
2024-04-16 23:53:41+08:00
[CATEGORIES]
cs.LG
Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks
[AUTHORS]
Mohsen Hami, Mahdi JameBozorg
[ABSTRACT]
Images captured from the real world are often affected by different types of
noise, which can significantly impact the performance of Computer Vision
systems and the quality of visual data. This study presents a novel approach
for defect detection in noisy images of cast products, specifically focusing on
submersible pump impellers. The methodology involves utilizing deep learning
models such as VGG16, InceptionV3, and other models in both the spatial and
frequency domains to identify noise types and defect status. The research
process begins with preprocessing images, followed by applying denoising
techniques tailored to specific noise categories. The goal is to enhance the
accuracy and robustness of defect detection by integrating noise detection and
denoising into the classification pipeline. Using VGG16 for noise type
classification in the frequency domain, the study achieved an accuracy of over
99%. Removal of salt-and-pepper noise resulted in an average SSIM of 87.9, while
Gaussian noise removal had an average SSIM of 64.0, and periodic noise removal
yielded an average SSIM of 81.6. These results show the effectiveness of the
deep AutoEncoder model and the median filter as denoising strategies in
real-world industrial applications. Finally, our study reports significant
improvements in binary
classification accuracy for defect detection compared to previous methods. For
the VGG16 classifier, accuracy increased from 94.6% to 97.0%, demonstrating the
effectiveness of the proposed noise detection and denoising approach.
Similarly, for the InceptionV3 classifier, accuracy improved from 84.7% to
90.0%, further validating the benefits of integrating noise analysis into the
classification pipeline.
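The median filter mentioned above is a standard remedy for salt-and-pepper noise. A minimal NumPy sketch, assuming 8-bit grayscale images and a 3 x 3 window; it uses a simplified single-window SSIM rather than the usual locally windowed SSIM, so its values are not comparable to those reported in the abstract. The AutoEncoder branch is not shown.

```python
import numpy as np

def add_salt_and_pepper(img, amount, rng):
    """Set a fraction `amount` of pixels to 0 or 255 at random."""
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0
    noisy[u > 1 - amount / 2] = 255
    return noisy

def median_filter(img, k=3):
    """k x k median filter with edge padding; well suited to impulse noise."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over the whole image (standard SSIM averages local windows)."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 64), (64, 1)).astype(np.uint8)  # synthetic gradient image
noisy = add_salt_and_pepper(clean, 0.05, rng)
denoised = median_filter(noisy)
```

A median ignores isolated extreme values in its window, which is why it suits impulse noise better than Gaussian noise, where a learned denoiser such as an autoencoder is more appropriate.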
[COMMENTS]
13 pages, 13 figures, 13th International conference on innovative
technologies in the field of science, engineering and technology
[LINK]
http://arxiv.org/abs/2404.10664v1
[DATE]
2024-04-16 23:40:18+08:00
[CATEGORIES]
cs.LG
Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay
[AUTHORS]
Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, Zhi Wang
[ABSTRACT]
We study continual offline reinforcement learning, a practical paradigm that
facilitates forward transfer and mitigates catastrophic forgetting to tackle
sequential offline tasks. We propose a dual generative replay framework that
retains previous knowledge by concurrent replay of generated pseudo-data.
First, we decouple the continual learning policy into a diffusion-based
generative behavior model and a multi-head action evaluation model, allowing
the policy to inherit distributional expressivity for encompassing a
progressive range of diverse behaviors. Second, we train a task-conditioned
diffusion model to mimic state distributions of past tasks. Generated states
are paired with corresponding responses from the behavior generator to
represent old tasks with high-fidelity replayed samples. Finally, by
interleaving pseudo samples with real ones of the new task, we continually
update the state and behavior generators to model progressively diverse
behaviors, and regularize the multi-head critic via behavior cloning to
mitigate forgetting. Experiments demonstrate that our method achieves better
forward transfer with less forgetting, and closely approximates the results of
using previous ground-truth data due to its high-fidelity replay of the sample
space. Our code is available at
https://github.com/NJU-RL/CuGRO.
[LINK]
http://arxiv.org/abs/2404.10662v1
[DATE]
2024-04-16 23:39:11+08:00
[CATEGORIES]
cs.LG
Mori-Zwanzig latent space Koopman closure for nonlinear autoencoder
[AUTHORS]
Priyam Gupta, Peter J. Schmid, Denis Sipp, Taraneh Sayadi, Georgios Rigas
[ABSTRACT]
The Koopman operator presents an attractive approach to achieve global
linearization of nonlinear systems, making it a valuable method for simplifying
the understanding of complex dynamics. While data-driven methodologies have
exhibited promise in approximating finite Koopman operators, they grapple with
various challenges, such as the judicious selection of observables,
dimensionality reduction, and the ability to predict complex system behaviors
accurately. This study presents a novel approach termed Mori-Zwanzig
autoencoder (MZ-AE) to robustly approximate the Koopman operator in
low-dimensional spaces. The proposed method leverages a nonlinear autoencoder
to extract key observables for approximating a finite invariant Koopman
subspace and integrates a non-Markovian correction mechanism using the
Mori-Zwanzig formalism. Consequently, this approach yields a closed
representation of dynamics within the latent manifold of the nonlinear
autoencoder, thereby enhancing the precision and stability of the Koopman
operator approximation. Demonstrations showcase the technique’s ability to
capture regime transitions in the flow around a cylinder. It also provides a
low-dimensional approximation of the Kuramoto-Sivashinsky equation with promising
short-term predictability and robust long-term statistical performance. By
bridging the gap between data-driven techniques and the mathematical
foundations of Koopman theory, MZ-AE offers a promising avenue for improved
understanding and prediction of complex nonlinear dynamics.
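The idea of adding a non-Markovian correction to a linear (Koopman) model can be illustrated in a fixed observable space. This NumPy sketch is a toy and not MZ-AE. It assumes there is no autoencoder, uses a DMD-style least-squares fit as the finite Koopman approximation, and lets a single lagged term `M x[t-1]` stand in for the Mori-Zwanzig memory kernel. The data come from a hypothetical linear system that truly has one step of memory.

```python
import numpy as np

def fit_markov(X):
    """Least-squares K with X[:, t+1] ~ K X[:, t] (a DMD-style Koopman estimate)."""
    past, future = X[:, :-1], X[:, 1:]
    return future @ np.linalg.pinv(past)

def fit_with_memory(X):
    """Least-squares [K, M] with X[:, t+1] ~ K X[:, t] + M X[:, t-1] (one-step memory)."""
    n = X.shape[0]
    stacked = np.vstack([X[:, 1:-1], X[:, :-2]])
    KM = X[:, 2:] @ np.linalg.pinv(stacked)
    return KM[:, :n], KM[:, n:]

def one_step_error(X, K, M=None):
    """Relative one-step prediction error on the samples t = 1 .. T-2."""
    pred = K @ X[:, 1:-1]
    if M is not None:
        pred = pred + M @ X[:, :-2]
    return np.linalg.norm(X[:, 2:] - pred) / np.linalg.norm(X[:, 2:])

# Hypothetical system with memory: x[t+1] = A x[t] + B x[t-1].
rng = np.random.default_rng(0)
A = 0.8 * np.array([[0.9, 0.2], [-0.2, 0.9]])
B = np.array([[0.1, 0.0], [0.0, -0.1]])
X = np.zeros((2, 40))
X[:, 0], X[:, 1] = rng.normal(size=2), rng.normal(size=2)
for t in range(1, 39):
    X[:, t + 1] = A @ X[:, t] + B @ X[:, t - 1]

K, M = fit_with_memory(X)
memory_err = one_step_error(X, K, M)
markov_err = one_step_error(X, fit_markov(X))
```

The memory model recovers `A` and `B`, while the memoryless fit cannot close the dynamics on the two observed coordinates, which is the gap a memory correction is meant to fill.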
[COMMENTS]
22 pages, 11 figures
[LINK]
http://arxiv.org/abs/2310.10745v2
[DATE]
2024-04-16 23:22:04+08:00
[CATEGORIES]
cs.LG
HOEG: A New Approach for Object-Centric Predictive Process Monitoring
[AUTHORS]
Tim K. Smit, Hajo A. Reijers, Xixi Lu
[ABSTRACT]
Predictive Process Monitoring focuses on predicting future states of ongoing
process executions, such as forecasting the remaining time. Recent developments
in Object-Centric Process Mining have enriched event data with objects and
their explicit relations to events. To leverage this enriched data, we
propose the Heterogeneous Object Event Graph encoding (HOEG), which integrates
events and objects into a graph structure with diverse node types. It does so
without aggregating object features, thus creating a more nuanced and
informative representation. We then adopt a heterogeneous Graph Neural Network
architecture, which incorporates these diverse object features in prediction
tasks. We evaluate the performance and scalability of HOEG in predicting
remaining time, benchmarking it against two established graph-based encodings
and two baseline models. Our evaluation uses three Object-Centric Event Logs
(OCELs), including one from a real-life process at a major Dutch financial
institution. The results indicate that HOEG competes well with existing models
and surpasses them when OCELs contain informative object attributes and
event-object interactions.
[COMMENTS]
accepted to 36th International Conference on Advanced Information
Systems Engineering (CAISE), 2024
[LINK]
http://arxiv.org/abs/2404.05316v2
[DATE]
2024-04-16 23:14:50+08:00
[CATEGORIES]
cs.LG
A Systematic Review of Low-Rank and Local Low-Rank Matrix Approximation in Big Data Medical Imaging
[AUTHORS]
Sisipho Hamlomo, Marcellin Atemkeng, Yusuf Brima, Chuneeta Nunhokee, Jeremy Baxter
[ABSTRACT]
The large volume and complexity of medical imaging datasets are bottlenecks
for storage, transmission, and processing. To tackle these challenges, the
application of low-rank matrix approximation (LRMA) and its derivative, local
LRMA (LLRMA) has demonstrated potential.
A detailed analysis of the literature identifies LRMA and LLRMA methods
applied to various imaging modalities, and the challenges and limitations
associated with existing LRMA and LLRMA methods are addressed.
We note a significant shift towards a preference for LLRMA in the medical
imaging field since 2015, demonstrating its potential and effectiveness in
capturing complex structures in medical data compared to LRMA. Acknowledging
the limitations of shallow similarity methods used with LLRMA, we suggest
using advanced semantic image segmentation as the similarity measure, explaining
in detail how it can identify similar patches and why this is feasible.
We note that LRMA and LLRMA are mainly applied to unstructured medical data,
and we propose extending their application to different medical data types,
including structured and semi-structured. This paper also discusses how LRMA
and LLRMA can be applied to regular data with missing entries and the effects of
inaccuracies in predicting the missing values. We discuss the
impact of patch size and propose the use of random search (RS) to determine the
optimal patch size. To enhance feasibility, a hybrid approach using Bayesian
optimization and RS is proposed, which could improve the application of LRMA
and LLRMA in medical imaging.
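A minimal NumPy sketch of the difference between LRMA and LLRMA, assuming non-overlapping square patches (the reviewed methods typically group similar, possibly overlapping patches instead). Global LRMA is a truncated SVD, which gives the best rank-r approximation in the Frobenius norm; local LRMA applies the same truncation patch by patch. The helper names are hypothetical.

```python
import numpy as np

def low_rank(M, r):
    """Best rank-r approximation in Frobenius norm (Eckart-Young) via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def local_low_rank(M, r, patch):
    """Apply a rank-r approximation independently to non-overlapping patch x patch blocks."""
    out = np.empty_like(M, dtype=float)
    for i in range(0, M.shape[0], patch):
        for j in range(0, M.shape[1], patch):
            block = M[i:i + patch, j:j + patch]
            out[i:i + patch, j:j + patch] = low_rank(block, min(r, *block.shape))
    return out

# Toy "image": each 8 x 8 quadrant has its own rank-1 structure.
rng = np.random.default_rng(0)
M = np.zeros((16, 16))
for i in (0, 8):
    for j in (0, 8):
        M[i:i + 8, j:j + 8] = np.outer(rng.normal(size=8), rng.normal(size=8))

local = local_low_rank(M, r=1, patch=8)
global_ = low_rank(M, r=1)
```

When structure varies across the image, each patch can be low rank even though the whole matrix is not, which is the motivation for LLRMA. The patch size controls this trade-off, which is why its selection (for example by random search) matters.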
[LINK]
http://arxiv.org/abs/2402.14045v2
[DATE]
2024-04-16 22:45:44+08:00
[CATEGORIES]
cs.LG
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
[AUTHORS]
Sinisa Stekovic, Stefan Ainetter, Mattia D’Urso, Friedrich Fraundorfer, Vincent Lepetit
[ABSTRACT]
We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D
objects from images using interpretable shape programs. In comparison to
traditional CAD model retrieval methods, the use of shape programs for 3D
reconstruction allows for reasoning about the semantic properties of
reconstructed objects, editing, low memory footprint, etc. However, the
utilization of shape programs for 3D scene understanding has been largely
neglected in past works. As our main contribution, we enable gradient-based
optimization by introducing a module that translates shape programs designed in
Blender, for example, into efficient PyTorch code. We also provide a method
that relies on PyTorchGeoNodes and is inspired by Monte Carlo Tree Search
(MCTS) to jointly optimize discrete and continuous parameters of shape programs
and reconstruct 3D objects for input scenes. In our experiments, we apply our
algorithm to reconstruct 3D objects in the ScanNet dataset and evaluate our
results against CAD model retrieval-based reconstructions. Our experiments
indicate that our reconstructions match the input scenes well while enabling
semantic reasoning about reconstructed objects.
[COMMENTS]
In Submission
[LINK]
http://arxiv.org/abs/2404.10620v1
[DATE]
2024-04-16 22:43:33+08:00
[CATEGORIES]
cs.LG
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
[AUTHORS]
Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo
[ABSTRACT]
Text-to-image diffusion models have been shown to suffer from sample-level
memorization, possibly reproducing near-perfect replicas of images that they are
trained on, which may be undesirable. To remedy this issue, we develop the
first differentially private (DP) retrieval-augmented generation algorithm that
is capable of generating high-quality image samples while providing provable
privacy guarantees. Specifically, we assume access to a text-to-image diffusion
model trained on a small amount of public data, and design a DP retrieval
mechanism to augment the text prompt with samples retrieved from a private
retrieval dataset. Our \emph{differentially private retrieval-augmented
diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to
adapt to another domain, and can use state-of-the-art generative models to
generate high-quality image samples while satisfying rigorous DP guarantees.
For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a
privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in
FID compared to public-only retrieval for up to $10,000$ queries.
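The abstract does not give the private retrieval mechanism. The sketch below shows one generic way to privatize retrieval, assuming embeddings are compared by inner product, clipped to norm `C`, averaged over the `k` nearest neighbors, and released with Gaussian noise scaled to the mean's sensitivity. All names are hypothetical, and converting `sigma` into an (epsilon, delta) guarantee across many queries is omitted.

```python
import numpy as np

def clip_rows(E, C):
    """Scale each row so its L2 norm is at most C."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E * np.minimum(1.0, C / np.maximum(norms, 1e-12))

def private_retrieval_embedding(query, private_embs, k, C, sigma, rng):
    """Noisy mean of the k nearest private embeddings (Gaussian-mechanism sketch)."""
    sims = private_embs @ query
    idx = np.argsort(-sims)[:k]
    neighbors = clip_rows(private_embs[idx], C)
    mean = neighbors.mean(axis=0)
    # Assumption: changing one private record swaps at most one retrieved neighbor,
    # which moves the clipped mean by at most 2C/k in L2 norm.
    sensitivity = 2 * C / k
    return mean + rng.normal(scale=sigma * sensitivity, size=mean.shape)

rng = np.random.default_rng(0)
private_embs = rng.normal(size=(1000, 16))
query = private_embs[0] / np.linalg.norm(private_embs[0])
noisy = private_retrieval_embedding(query, private_embs, k=10, C=1.0, sigma=2.0, rng=rng)
```

Averaging over more neighbors lowers the sensitivity, so less noise is needed for the same privacy level, at the cost of a less specific retrieved signal.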
[LINK]
http://arxiv.org/abs/2403.14421v2
[DATE]
2024-04-16 22:16:48+08:00
[CATEGORIES]
cs.LG
Do Counterfactual Examples Complicate Adversarial Training?
[AUTHORS]
Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li
[ABSTRACT]
We leverage diffusion models to study the robustness-performance tradeoff of
robust classifiers. Our approach introduces a simple, pretrained diffusion
method to generate low-norm counterfactual examples (CEs): semantically altered
data that belong to a different true class. We report that the
confidence and accuracy of robust models on their clean training data are
associated with the proximity of the data to their CEs. Moreover, robust models
perform very poorly when evaluated on the CEs directly, as they become
increasingly invariant to the low-norm, semantic changes brought by CEs. The
results indicate a significant overlap between non-robust and semantic
features, countering the common assumption that non-robust features are not
interpretable.
[COMMENTS]
Accepted as a short paper to the GCV Workshop at CVPR’24
[LINK]
http://arxiv.org/abs/2404.10588v1
[DATE]
2024-04-16 22:13:44+08:00
[CATEGORIES]
cs.LG
Data-driven subgrouping of patient trajectories with chronic diseases: Evidence from low back pain
[AUTHORS]
Christof Naumzik, Alice Kongsted, Werner Vach, Stefan Feuerriegel
[ABSTRACT]
Clinical data informs the personalization of health care with a potential for
more effective disease management. In practice, this is achieved by
subgrouping, whereby clusters with similar patient characteristics are
identified and then receive customized treatment plans with the goal of
targeting subgroup-specific disease dynamics. In this paper, we propose a novel
mixture hidden Markov model for subgrouping patient trajectories from chronic
diseases. Our model is probabilistic and carefully designed to capture
different trajectory phases of chronic diseases (i.e., “severe”, “moderate”,
and “mild”) through tailored latent states. We demonstrate our subgrouping
framework based on a longitudinal study across 847 patients with non-specific
low back pain. Here, our subgrouping framework identifies 8 subgroups. Further,
we show that our subgrouping framework outperforms common baselines in terms of
cluster validity indices. Finally, we discuss the applicability of the model to
other chronic and long-lasting diseases.
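A minimal sketch of how a mixture of HMMs assigns a trajectory to subgroups, assuming discrete pain-level observations and hand-set parameters for two hypothetical subgroups; the paper's tailored latent states and its estimation procedure are not reproduced. Each subgroup is scored with the forward algorithm in log space, and Bayes' rule gives the subgroup posterior.

```python
import numpy as np

STATES = ("severe", "moderate", "mild")  # latent phases named in the abstract

def logsumexp(a, axis=None):
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis) if axis is not None else out.item()

def hmm_loglik(obs, pi, A, B):
    """Forward algorithm in log space; B[state, symbol] are emission probabilities."""
    log_alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        log_alpha = logsumexp(log_alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
    return logsumexp(log_alpha)

def subgroup_posterior(obs, weights, components):
    """P(subgroup | trajectory) for a mixture of HMMs, each given as (pi, A, B)."""
    logp = np.array([np.log(w) + hmm_loglik(obs, *c) for w, c in zip(weights, components)])
    return np.exp(logp - logsumexp(logp))

# Hypothetical parameters; observations 0/1/2 = high/medium/low reported pain.
pi = np.array([0.8, 0.15, 0.05])
B = np.array([[0.8, 0.15, 0.05], [0.15, 0.7, 0.15], [0.05, 0.15, 0.8]])
A_improving = np.array([[0.6, 0.35, 0.05], [0.05, 0.6, 0.35], [0.02, 0.08, 0.9]])
A_persistent = np.array([[0.9, 0.08, 0.02], [0.3, 0.6, 0.1], [0.3, 0.3, 0.4]])

trajectory = [0, 0, 1, 2, 2]
post = subgroup_posterior(trajectory, [0.5, 0.5], [(pi, A_improving, B), (pi, A_persistent, B)])
```

Working in log space keeps the forward recursion stable for long trajectories, where raw probabilities would underflow.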
[COMMENTS]
Forthcoming at Conference on Health, Inference, and Learning (CHIL)
2024
[LINK]
http://arxiv.org/abs/2404.10580v1
[DATE]
2024-04-16 22:05:29+08:00
[CATEGORIES]
cs.LG
EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
[AUTHORS]
Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
[ABSTRACT]
A key challenge in contrastive learning is to generate negative samples from
a large sample set to contrast with positive samples, for learning better
encoding of the data. These negative samples often follow a softmax
distribution that is dynamically updated during the training process.
However, sampling from this distribution is non-trivial due to the high
computational costs in computing the partition function. In this paper, we
propose an Efficient Markov Chain Monte Carlo negative sampling method for
Contrastive learning (EMC$^2$). We follow the global contrastive learning loss
as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive
Metropolis-Hastings subroutine to generate hardness-aware negative samples in
an online fashion during the optimization. We prove that EMC$^2$ finds an
$\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in
$T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that
exhibits global convergence (to stationarity) regardless of the choice of batch
size while exhibiting low computation and memory cost. Numerical experiments
validate that EMC$^2$ is effective with small batch training and achieves
comparable or better performance than baseline algorithms. We report the
results for pre-training image encoders on STL-10 and ImageNet-100.
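The core idea, sampling negatives from a softmax without computing its partition function, can be illustrated with plain Metropolis-Hastings. This sketch assumes a fixed score vector and a uniform proposal; EMC$^2$'s adaptive, hardness-aware subroutine and its interleaving with optimization are not reproduced. In EMC$^2$ the chain state persists across training iterations, whereas here it is simply run for a fixed number of steps.

```python
import numpy as np

def mh_negative_sampler(scores, n_steps, rng, start=0):
    """Metropolis-Hastings chain targeting p(j) proportional to exp(scores[j]).

    Only score differences are needed, so the partition function is never computed."""
    current = start
    samples = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        proposal = rng.integers(len(scores))
        # The uniform proposal is symmetric, so the acceptance ratio is exp(s_prop - s_cur).
        if np.log(rng.random()) < scores[proposal] - scores[current]:
            current = proposal
        samples[t] = current
    return samples

scores = np.array([0.0, 1.0, 2.0, 0.5])  # e.g. similarities between an anchor and candidates
samples = mh_negative_sampler(scores, 50_000, np.random.default_rng(0))
empirical = np.bincount(samples, minlength=len(scores)) / len(samples)
```

Higher-scoring (harder) negatives are visited more often, matching the softmax weighting, and each step costs one score evaluation instead of a pass over the whole sample set.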
[COMMENTS]
20 pages
[LINK]
http://arxiv.org/abs/2404.10575v1
[DATE]
2024-04-16 21:53:58+08:00
[CATEGORIES]
cs.LG
Uncertainty-guided Open-Set Source-Free Unsupervised Domain Adaptation with Target-private Class Segregation
[AUTHORS]
Mattia Litrico, Davide Talon, Sebastiano Battiato, Alessio Del Bue, Mario Valerio Giuffrida, Pietro Morerio
[ABSTRACT]
Standard Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from
a labeled source domain to an unlabeled target but usually requires
simultaneous access to both source and target data. Moreover, UDA approaches
commonly assume that source and target domains share the same label space.
Yet, these two assumptions are rarely satisfied in real-world scenarios. This
paper considers the more challenging Source-Free Open-set Domain Adaptation
(SF-OSDA) setting, where both assumptions are dropped. We propose a novel
approach for SF-OSDA that exploits the granularity of target-private categories
by segregating their samples into multiple unknown classes. Starting from an
initial clustering-based assignment, our method progressively improves the
segregation of target-private samples by refining their pseudo-labels with the
guidance of an uncertainty-based sample selection module. Additionally, we propose
a novel contrastive loss, named NL-InfoNCELoss, that integrates negative
learning into self-supervised contrastive learning to enhance the model's
robustness to noisy pseudo-labels. Extensive experiments on benchmark datasets
demonstrate the superiority of the proposed method over existing approaches,
establishing new state-of-the-art performance. Notably, additional analyses
show that our method is able to learn the underlying semantics of novel
classes, opening the possibility to perform novel class discovery.
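A hedged illustration of uncertainty-based sample selection. It assumes uncertainty is measured by the entropy of the softmax prediction and that a fixed fraction of the most certain samples is kept for pseudo-labeling; the paper's actual criterion and its NL-InfoNCELoss are not shown, and `select_confident` is a hypothetical helper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_confident(logits, keep_fraction):
    """Return indices of the lowest-entropy predictions and their pseudo-labels."""
    p = softmax(np.asarray(logits, dtype=float))
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(p))))
    idx = np.argsort(entropy)[:n_keep]
    return idx, p[idx].argmax(axis=1)

idx, pseudo = select_confident([[5, 0, 0], [0.1, 0, 0], [0, 0, 4], [0.2, 0.1, 0]], keep_fraction=0.5)
```

Refining pseudo-labels only on the selected subset limits how much label noise is fed back into training.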
[LINK]
http://arxiv.org/abs/2404.10574v1
[DATE]
2024-04-16 21:52:00+08:00
[CATEGORIES]
cs.LG
HiGraphDTI: Hierarchical Graph Representation Learning for Drug-Target Interaction Prediction
[AUTHORS]
Bin Liu, Siqi Wu, Jin Wang, Xin Deng, Ao Zhou
[ABSTRACT]
The discovery of drug-target interactions (DTIs) plays a crucial role in
pharmaceutical development. Deep learning models achieve more accurate
results in DTI prediction due to their ability to extract robust and expressive
features from drug and target chemical structures. However, existing deep
learning methods typically generate drug features via aggregating molecular
atom representations, ignoring the chemical properties carried by motifs, i.e.,
substructures of the molecular graph. Such two-level (atom and drug) molecular
representation learning cannot fully exploit structural information and fails
to interpret the DTI mechanism from the motif perspective. In addition,
sequential model-based target feature extraction either fuses limited
contextual information or requires expensive computational resources. To tackle
the above issues, we propose a hierarchical graph representation learning-based
DTI prediction method (HiGraphDTI). Specifically, HiGraphDTI learns
hierarchical drug representations from triple-level molecular graphs to
thoroughly exploit chemical information embedded in atoms, motifs, and
molecules. Then, an attentional feature fusion module incorporates information
from different receptive fields to extract expressive target features. Finally, the
hierarchical attention mechanism identifies crucial molecular segments, which
offers complementary views for interpreting interaction mechanisms. The
experimental results not only demonstrate the superiority of HiGraphDTI over the
state-of-the-art methods, but also confirm the practical ability of our model
in interaction interpretation and new DTI discovery.
[LINK]
http://arxiv.org/abs/2404.10561v1
[DATE]
2024-04-16 21:35:24+08:00
[CATEGORIES]
cs.LG
Sharp error bounds for imbalanced classification: how many examples in the minority class?
[AUTHORS]
Anass Aghbalou, François Portier, Anne Sabourin
[ABSTRACT]
When dealing with imbalanced classification data, reweighting the loss
function is a standard procedure for balancing the true
positive and true negative rates within the risk measure. Despite significant
theoretical work in this area, existing results do not adequately address a
main challenge within the imbalanced classification framework, which is the
negligible size of one class in relation to the full sample size and the need
to rescale the risk function by a probability tending to zero. To address this
gap, we present two novel contributions in the setting where the rare class
probability approaches zero: (1) a non-asymptotic fast-rate probability bound
for constrained balanced empirical risk minimization, and (2) a consistent
upper bound for balanced nearest neighbors estimates. Our findings provide a
clearer understanding of the benefits of class-weighting in realistic settings,
opening new avenues for further research in this field.
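Balanced empirical risk can be written in a few lines. In this sketch, which assumes binary labels in {0, 1} and precomputed per-example losses, averaging the per-class mean losses is equivalent to weighting each example by 1 / (2 n_class), the class-weighting discussed above.

```python
import numpy as np

def balanced_risk(losses, labels):
    """Average of per-class mean losses, so the rare class counts as much as the common one."""
    losses, labels = np.asarray(losses, float), np.asarray(labels)
    return 0.5 * (losses[labels == 1].mean() + losses[labels == 0].mean())

# 1% positives and a classifier that always predicts the majority class.
labels = np.array([1] * 10 + [0] * 990)
predictions = np.zeros_like(labels)
losses = (predictions != labels).astype(float)  # 0-1 loss

plain = losses.mean()
balanced = balanced_risk(losses, labels)

# Equivalent per-example weights 1 / (2 * n_class).
n1, n0 = (labels == 1).sum(), (labels == 0).sum()
weights = np.where(labels == 1, 1 / (2 * n1), 1 / (2 * n0))
```

The plain risk of 0.01 hides that every positive is misclassified; the balanced risk of 0.5 exposes it. The rarer the positive class, the fewer examples stand behind its term, which is the regime the bounds above address.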
[LINK]
http://arxiv.org/abs/2310.14826v2
[DATE]
2024-04-16 21:25:38+08:00
[CATEGORIES]
cs.LG
Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem
[AUTHORS]
Roumen Nikolaev Popov
[ABSTRACT]
We propose an analytical solution for approximating the gradient of the
Evidence Lower Bound (ELBO) in variational inference problems where the
statistical model is a Bayesian network consisting of observations drawn from a
mixture of a Gaussian distribution embedded in unrelated clutter, known as the
clutter problem. The method employs the reparameterization trick to move the
gradient operator inside the expectation and relies on the assumption that,
because the likelihood factorizes over the observed data, the variational
distribution is generally more compactly supported than the Gaussian
distribution in the likelihood factors. This allows efficient local
approximation of the individual likelihood factors, which leads to an
analytical solution for the integral defining the gradient expectation. We
integrate the proposed gradient approximation as the expectation step in an EM
(Expectation Maximization) algorithm for maximizing ELBO and test against
classical deterministic approaches in Bayesian inference, such as the Laplace
approximation, Expectation Propagation and Mean-Field Variational Inference.
The proposed method demonstrates good accuracy and rate of convergence together
with linear computational complexity.
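For reference, the reparameterization trick mentioned above rewrites z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1), so the gradient can move inside the expectation. This sketch gives the plain Monte Carlo estimator, not the paper's analytical approximation, and checks it on f(z) = z^2, where E[f(z)] = mu^2 + sigma^2 has known gradients.

```python
import numpy as np

def reparam_grad(f_prime, mu, sigma, n, rng):
    """Monte Carlo estimate of d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[f(z)]."""
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    g = f_prime(z)
    # dz/dmu = 1 and dz/dsigma = eps, so the chain rule gives these two averages.
    return g.mean(), (g * eps).mean()

rng = np.random.default_rng(0)
g_mu, g_sigma = reparam_grad(lambda z: 2 * z, mu=1.5, sigma=0.5, n=200_000, rng=rng)
# Exact values for f(z) = z^2: d/dmu = 2 * mu = 3.0, d/dsigma = 2 * sigma = 1.0
```

The estimator needs many samples to be accurate; replacing the expectation with a closed-form local approximation, as proposed above, removes this sampling noise.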
[COMMENTS]
16 pages, 4 figures, supporting code available at
https://github.com/rpopov42/elbo_gaa
[LINK]
http://arxiv.org/abs/2404.10550v1
[DATE]
2024-04-16 21:19:46+08:00
[CATEGORIES]
cs.LG
Classification of Prostate Cancer in 3D Magnetic Resonance Imaging Data based on Convolutional Neural Networks
[AUTHORS]
Malte Rippa, Ruben Schulze, Marian Himstedt, Felice Burn
[ABSTRACT]
Prostate cancer is a commonly diagnosed cancer among men
worldwide. Even with modern technology such as multi-parametric magnetic
resonance tomography and guided biopsies, the process for diagnosing prostate
cancer remains time-consuming and requires highly trained professionals. In
this paper, different convolutional neural networks (CNN) are evaluated on
their abilities to reliably classify whether an MRI sequence contains malignant
lesions. Implementations of a ResNet, a ConvNet and a ConvNeXt for 3D image
data are trained and evaluated. The models are trained using different data
augmentation techniques, learning rates, and optimizers. The data are taken from
a private dataset provided by the Cantonal Hospital Aarau. The best result was
achieved by a ResNet3D, yielding an average precision score of 0.4583 and AUC
ROC score of 0.6214.
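The two reported metrics can be computed as follows. A minimal NumPy sketch: ROC AUC via its pairwise (Mann-Whitney) interpretation, and average precision as the mean precision at the rank of each positive, assuming no tied scores for the latter.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which positives occur (no tie handling)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    hits = y_true[order] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
auc, ap = roc_auc(y, s), average_precision(y, s)
```

An AUC of 0.5 corresponds to random ranking, and average precision at chance level equals the positive rate, so both reported scores should be read against those baselines.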
[COMMENTS]
Previous version published in Buzug T.M., Handels H., Müller S.,
Hübner C., Mertins A., Rostalski P.: Student Conference Proceedings 2023,
Infinite Science Publishing, 2023 (ISBN/EAN 978-3-945954-72-0). 7 pages, 2
figures
[LINK]
http://arxiv.org/abs/2404.10548v1
[DATE]
2024-04-16 21:18:02+08:00
[CATEGORIES]
cs.LG
A/B testing under Interference with Partial Network Information
[AUTHORS]
Shiv Shankar, Ritwik Sinha, Yash Chandak, Saayan Mitra, Madalina Fiterau
[ABSTRACT]
A/B tests are often required to be conducted on subjects that might have
social connections. Examples include experiments on social media and medical
or social interventions to control the spread of an epidemic. In such settings,
the SUTVA assumption for randomized controlled trials is violated due to
network interference, or spill-over effects, as treatments to group A can
potentially also affect the control group B. When the underlying social network
is known exactly, prior works have demonstrated how to conduct A/B tests
adequately to estimate the global average treatment effect (GATE). However, in
practice, it is often impossible to obtain knowledge about the exact underlying
network. In this paper, we present UNITE, a novel estimator that relaxes this
assumption and can identify GATE while relying only on knowledge of a
superset of the neighbors of each subject in the graph. Through theoretical
analysis and extensive experiments, we show that the proposed approach
outperforms standard estimators.
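A small simulation makes the SUTVA violation concrete. Under an assumed outcome model on a ring network, each unit responds to its own treatment (`tau`) and to the share of its two neighbors that are treated (`gamma`), so the GATE (everyone treated versus no one treated) is `tau + gamma`. The naive difference-in-means estimator recovers only `tau`. This is not UNITE; it only illustrates the bias that such estimators target.

```python
import numpy as np

def simulate(n, tau, gamma, p, rng):
    """Ring network: outcome depends on own treatment and on the treated share of 2 neighbors."""
    z = (rng.random(n) < p).astype(float)
    neighbor_share = 0.5 * (np.roll(z, 1) + np.roll(z, -1))
    y = 1.0 + tau * z + gamma * neighbor_share + rng.normal(0, 1, n)
    return z, y

def difference_in_means(z, y):
    return y[z == 1].mean() - y[z == 0].mean()

rng = np.random.default_rng(0)
tau, gamma = 1.0, 2.0
z, y = simulate(100_000, tau, gamma, 0.5, rng)
naive = difference_in_means(z, y)
gate = tau + gamma  # all treated vs. none treated under this model
```

Because treated and control units are equally exposed to treated neighbors under Bernoulli assignment, the spillover cancels in the naive contrast, so the estimate lands near 1.0 instead of the true GATE of 3.0.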
[COMMENTS]
AISTATS 2024
[LINK]
http://arxiv.org/abs/2404.10547v1
[DATE]
2024-04-16 21:16:41+08:00
[CATEGORIES]
cs.LG
Warm-Start Variational Quantum Policy Iteration
[AUTHORS]
Nico Meyer, Jakob Murauer, Alexander Popov, Christian Ufrecht, Axel Plinge, Christopher Mutschler, Daniel D. Scherer
[ABSTRACT]
Reinforcement learning is a powerful framework aiming to determine optimal
behavior in highly complex decision-making scenarios. This objective can be
achieved using policy iteration, which requires solving a typically large
linear system of equations. We propose the variational quantum policy iteration
(VarQPI) algorithm, realizing this step with a NISQ-compatible quantum-enhanced
subroutine. Its scalability is supported by an analysis of the structure of
generic reinforcement learning environments, laying the foundation for
potential quantum advantage with utility-scale quantum computers. Furthermore,
we introduce the warm-start initialization variant (WS-VarQPI) that
significantly reduces resource overhead. The algorithm solves a large
FrozenLake environment with an underlying 256x256-dimensional linear system,
indicating its practical robustness.
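For context, the classical policy iteration loop is sketched below in NumPy. The policy-evaluation step solves the linear system (I - gamma P_pi) V = r_pi, which is the step VarQPI performs with a quantum subroutine. The toy three-state chain MDP is an assumption for illustration.

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Classical policy iteration. P[a, s, s'] are transition probabilities, R[s, a] rewards."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
        P_pi = P[policy, np.arange(n_states)]  # row s is P[policy[s], s, :]
        r_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy chain: action 0 = stay, action 1 = move right; reward 1 in the last state.
P = np.zeros((2, 3, 3))
for s in range(3):
    P[0, s, s] = 1.0
    P[1, s, min(s + 1, 2)] = 1.0
R = np.zeros((3, 2))
R[2, :] = 1.0
policy, V = policy_iteration(P, R, gamma=0.9)
```

For a 256-state environment such as the FrozenLake instance above, each evaluation step is a 256 x 256 solve, repeated once per improvement round.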
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. 9 pages, 6 figures, 1 table
[LINK]
http://arxiv.org/abs/2404.10546v1
[DATE]
2024-04-16 21:16:19+08:00
[CATEGORIES]
cs.LG
Regularization by Texts for Latent Diffusion Inverse Solvers
[AUTHORS]
Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye
[ABSTRACT]
The recent advent of diffusion models has led to significant progress in
solving inverse problems, leveraging these models as effective generative
priors. Nonetheless, there remain challenges related to the ill-posed nature of
such problems, often due to inherent ambiguities in measurements or intrinsic
system symmetries. To address this, drawing inspiration from the human ability
to resolve visual ambiguities through perceptual biases, we introduce a novel
latent diffusion inverse solver based on regularization by texts (TReg).
Specifically, TReg applies a textual description of the preconceived solution
during reverse diffusion sampling, and this description is dynamically
reinforced through null-text optimization for adaptive negation.
Our comprehensive experimental results demonstrate that TReg successfully
mitigates ambiguity in the inverse problems, enhancing their effectiveness and
accuracy.
[LINK]
http://arxiv.org/abs/2311.15658v2
[DATE]
2024-04-16 20:58:57+08:00
[CATEGORIES]
cs.LG
Fossil 2.0: Formal Certificate Synthesis for the Verification and Control of Dynamical Models
[AUTHORS]
Alec Edwards, Andrea Peruffo, Alessandro Abate
[ABSTRACT]
This paper presents Fossil 2.0, a new major release of a software tool for
the synthesis of certificates (e.g., Lyapunov and barrier functions) for
dynamical systems modelled as ordinary differential and difference equations.
Fossil 2.0 is much improved from its original release, including new
interfaces, a significantly expanded certificate portfolio, controller
synthesis and enhanced extensibility. We present these new features as part of
this tool paper. Fossil implements a counterexample-guided inductive synthesis
(CEGIS) loop ensuring the soundness of the method. Our tool uses neural
networks as templates to generate candidate functions, which are then formally
proven by an SMT solver acting as an assertion verifier. Improvements with
respect to the first release include a wider range of certificates, synthesis
of control laws, and support for discrete-time models.
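The verifier's role in the CEGIS loop can be illustrated with a quadratic Lyapunov candidate V(x) = x^T P x for a linear system dx/dt = A x. This sketch swaps Fossil's SMT solver for random sampling, which can only refute a candidate, never prove it; the counterexamples it returns are what a CEGIS learner would train on next. The function name is hypothetical.

```python
import numpy as np

def lyapunov_counterexamples(A, P, n_samples, rng, radius=1.0):
    """Sampling-based check of V(x) = x^T P x for dx/dt = A x.

    Returns sampled points where V(x) > 0 or dV/dt < 0 fails. Unlike an SMT
    verifier, finding none proves nothing; it only fails to refute the candidate."""
    X = rng.uniform(-radius, radius, size=(n_samples, A.shape[0]))
    X = X[np.linalg.norm(X, axis=1) > 1e-6]  # the conditions are strict only away from 0
    V = np.einsum("ni,ij,nj->n", X, P, X)
    Q = A.T @ P + P @ A                      # dV/dt = x^T (A^T P + P A) x
    V_dot = np.einsum("ni,ij,nj->n", X, Q, X)
    return X[(V <= 0) | (V_dot >= 0)]

A = np.array([[-1.0, 2.0], [-2.0, -1.0]])  # a stable spiral
rng = np.random.default_rng(0)
good = lyapunov_counterexamples(A, np.eye(2), 2000, rng)
bad = lyapunov_counterexamples(A, np.diag([1.0, -1.0]), 2000, rng)
```

With P = I the derivative is -2|x|^2, so no counterexample exists; the indefinite candidate is refuted immediately. Sound tools replace the sampler with a solver that searches the whole region.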
[COMMENTS]
HSCC 2024 Tool Paper
[LINK]
http://arxiv.org/abs/2311.09793v2
[DATE]
2024-04-16 20:51:47+08:00
[CATEGORIES]
cs.LG
VFLAIR: A Research Library and Benchmark for Vertical Federated Learning
[AUTHORS]
Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Ya-Qin Zhang
[ABSTRACT]
Vertical Federated Learning (VFL) has emerged as a collaborative training
paradigm that allows participants with different features of the same group of
users to accomplish cooperative training without exposing their raw data or
model parameters. VFL has gained significant attention for its research
potential and real-world applications in recent years, but still faces
substantial challenges, such as defending against various kinds of data
inference and backdoor attacks. Moreover, most existing VFL projects are
industry-facing and not easily used for keeping track of the current research
progress. To address this need, we present an extensible and lightweight VFL
framework VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which
supports VFL training with a variety of models, datasets and protocols, along
with standardized modules for comprehensive evaluations of attacks and defense
strategies. We also benchmark the performance of 11 attacks and 8 defenses under
different communication and model partition settings and draw concrete insights
and recommendations on the choice of defense strategies for different practical
VFL deployment scenarios.
[COMMENTS]
39 pages, 22 figures, 19 tables
[LINK]
http://arxiv.org/abs/2310.09827v2
[DATE]
2024-04-16 20:34:39+08:00
[CATEGORIES]
cs.LG
Four-hour thunderstorm nowcasting using deep diffusion models of satellite
[AUTHORS]
Kuai Dai, Xutao Li, Junying Fang, Yunming Ye, Demin Yu, Di Xian, Danyu Qin
[ABSTRACT]
Convection (thunderstorm) develops rapidly within hours and is highly
destructive, posing a significant challenge for nowcasting and resulting in
substantial losses to nature and society. After the emergence of artificial
intelligence (AI)-based methods, convection nowcasting has experienced rapid
advancements, with its performance surpassing that of physics-based numerical
weather prediction and other conventional approaches. However, its lead time
and coverage still leave much to be desired and hardly meet the needs of
disaster emergency response. Here, we propose a deep diffusion model of
satellite (DDMS) to establish an AI-based convection nowcasting system. On one
hand, it employs diffusion processes to effectively simulate complicated
spatiotemporal evolution patterns of convective clouds, significantly improving
the forecast lead time. On the other hand, it utilizes geostationary satellite
brightness temperature data, thereby achieving planetary-scale forecast
coverage. During long-term tests and objective validation based on the
FengYun-4A satellite, our system achieves, for the first time, effective
convection nowcasting up to 4 hours, with broad coverage (about 20,000,000
km²), remarkable accuracy, and high resolution (15 minutes; 4 km). Its
performance reaches a new height in convection nowcasting compared to the
existing models. In terms of application, our system operates efficiently
(forecasting 4 hours of convection in 8 minutes), and is highly transferable
with the potential to collaborate with multiple satellites for global
convection nowcasting. Furthermore, our results highlight the remarkable
capabilities of diffusion models in convective cloud forecasting, as well as
the significant value of geostationary satellite data when empowered by AI
technologies.
[LINK]
http://arxiv.org/abs/2404.10512v1
[DATE]
2024-04-16 20:33:44+08:00
[CATEGORIES]
cs.LG
Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation
[AUTHORS]
Hao Tang, Lianglun Cheng, Guoheng Huang, Zhengguang Tan, Junhao Lu, Kaihong Wu
[ABSTRACT]
Image segmentation holds a vital position in the realms of diagnosis and
treatment within the medical domain. Traditional convolutional neural networks
(CNNs) and Transformer models have made significant advancements in this realm,
but they still encounter challenges because of limited receptive fields or high
computational complexity. Recently, State Space Models (SSMs), particularly Mamba
and its variants, have demonstrated notable performance in the field of vision.
However, their feature extraction methods may not be sufficiently effective and
retain some redundant structures, leaving room for parameter reduction.
Motivated by previous spatial and channel attention methods, we propose Triplet
Mamba-UNet (TM-UNet). The method leverages residual VSS Blocks to extract intensive
contextual features, while Triplet SSM is employed to fuse features across
spatial and channel dimensions. We conducted experiments on ISIC17, ISIC18,
CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets,
demonstrating the superior segmentation performance of our proposed TM-UNet.
Additionally, compared to the previous VM-UNet, our model achieves a one-third
reduction in parameters.
[LINK]
http://arxiv.org/abs/2403.17701v3
[DATE]
2024-04-16 19:46:39+08:00
[CATEGORIES]
cs.LG
Would You Trust an AI Doctor? Building Reliable Medical Predictions with Kernel Dropout Uncertainty
[AUTHORS]
Ubaid Azam, Imran Razzak, Shelly Vishwakarma, Hakim Hacid, Dell Zhang, Shoaib Jameel
[ABSTRACT]
The growing capabilities of AI raise questions about its trustworthiness in
healthcare, particularly due to opaque decision-making and limited data
availability. This paper proposes a novel approach to address these challenges,
introducing a Bayesian Monte Carlo Dropout model with kernel modelling. Our
model is designed to enhance reliability on small medical datasets, a crucial
barrier to the wider adoption of AI in healthcare. This model leverages
existing language models for improved effectiveness and seamlessly integrates
with current workflows. We demonstrate significant improvements in reliability,
even with limited data, offering a promising step towards building trust in
AI-driven medical predictions and unlocking its potential to improve patient
care.
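To make "Monte Carlo Dropout" concrete, here is a minimal sketch of the generic technique, assuming a toy logistic model with hypothetical weights and dropout applied only to its inputs. The paper's kernel modelling and language-model backbone are not reproduced. Dropout stays active at prediction time, and the spread over repeated stochastic passes serves as an uncertainty estimate.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(x: List[float], w: List[float], b: float,
                       keep_prob: float, rng: random.Random) -> float:
    """One forward pass of a toy logistic model with inverted dropout on its inputs."""
    z = b
    for xi, wi in zip(x, w):
        if rng.random() < keep_prob:       # keep this input with probability keep_prob
            z += wi * xi / keep_prob       # rescale so the expected pre-activation is unchanged
    return sigmoid(z)


def mc_dropout_predict(x: List[float], w: List[float], b: float, keep_prob: float = 0.8,
                       n_samples: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Keep dropout on at test time; return mean and std of the predicted probability."""
    rng = random.Random(seed)
    probs = [stochastic_forward(x, w, b, keep_prob, rng) for _ in range(n_samples)]
    return mean(probs), pstdev(probs)


# Hypothetical weights and a single hypothetical patient feature vector.
p, spread = mc_dropout_predict([1.0, 0.5, -0.2], [0.8, -1.2, 0.3], b=0.1)
print(f"P(positive) ~ {p:.3f}, uncertainty (std) ~ {spread:.3f}")
```

With keep_prob = 1.0 every pass is identical and the spread collapses to zero, which is a quick sanity check on the sampling logic.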
[LINK]
http://arxiv.org/abs/2404.10483v1
[DATE]
2024-04-16 19:43:26+08:00
[CATEGORIES]
cs.LG
BayesJudge: Bayesian Kernel Language Modelling with Confidence Uncertainty in Legal Judgment Prediction
[AUTHORS]
Ubaid Azam, Imran Razzak, Shelly Vishwakarma, Hakim Hacid, Dell Zhang, Shoaib Jameel
[ABSTRACT]
Predicting legal judgments with reliable confidence is paramount for
responsible legal AI applications. While transformer-based deep neural networks
(DNNs) like BERT have demonstrated promise in legal tasks, accurately assessing
their prediction confidence remains crucial. We present a novel Bayesian
approach called BayesJudge that harnesses the synergy between deep learning and
deep Gaussian Processes to quantify uncertainty through Bayesian kernel Monte
Carlo dropout. Our method leverages informative priors and flexible data
modelling via kernels, surpassing existing methods in both predictive accuracy
and confidence estimation as measured by the Brier score. Extensive
evaluations on public legal datasets showcase our model’s superior performance
across diverse tasks. We also introduce an optimal solution to automate the
scrutiny of unreliable predictions, resulting in a significant increase in the
accuracy of the model’s predictions by up to 27%. By empowering judges and
legal professionals with more reliable information, our work paves the way for
trustworthy and transparent legal AI applications that facilitate informed
decisions grounded in both knowledge and quantified uncertainty.
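For reference, the binary Brier score mentioned above is the mean squared difference between a predicted probability and the 0/1 outcome (lower is better). The sketch below computes it, plus a hypothetical triage rule that flags low-confidence predictions for human review. The threshold rule is an illustrative assumption and not the "optimal solution" the authors describe.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of (p - y)^2 over predictions p = P(y = 1) and binary labels y."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def flag_uncertain(probs: Sequence[float], threshold: float = 0.75) -> List[int]:
    """Hypothetical rule: flag indices whose confidence max(p, 1 - p) is below threshold."""
    return [i for i, p in enumerate(probs) if max(p, 1.0 - p) < threshold]


print(brier_score([0.9, 0.2], [1, 0]))      # (0.01 + 0.04) / 2, about 0.025
print(flag_uncertain([0.9, 0.6, 0.1]))
```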
[LINK]
http://arxiv.org/abs/2404.10481v1
[DATE]
2024-04-16 19:42:06+08:00
[CATEGORIES]
cs.LG
Instabilities in Convnets for Raw Audio
[AUTHORS]
Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs
[ABSTRACT]
What makes waveform-based deep learning so hard? Despite numerous attempts at
training convolutional neural networks (convnets) for filterbank design, the
resulting models often fail to outperform hand-crafted baselines. These baselines are linear
time-invariant systems: as such, they can be approximated by convnets with wide
receptive fields. Yet, in practice, gradient-based optimization leads to
suboptimal approximations. In our article, we approach this phenomenon from the
perspective of initialization. We present a theory of large deviations for the
energy response of FIR filterbanks with random Gaussian weights. We find that
deviations worsen for large filters and locally periodic input signals, which
are both typical for audio signal processing applications. Numerical
simulations align with our theory and suggest that the condition number of a
convolutional layer follows a logarithmic scaling law between the number and
length of the filters, which is reminiscent of discrete wavelet bases.
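A minimal sketch of how one might measure the conditioning of a random FIR filterbank, assuming an undecimated filterbank and taking the condition number as the ratio B/A of its frame bounds, i.e. the max over min of the energy response sum_k |H_k(w)|^2 on a frequency grid. This definition and the frequency grid are assumptions; the paper's exact setup may differ.

```python
import cmath
import math
import random
from typing import List


def energy_response(filters: List[List[float]], n_freqs: int) -> List[float]:
    """sum_k |H_k(w)|^2 at n_freqs uniformly spaced frequencies (n_freqs >= filter length)."""
    resp = []
    for m in range(n_freqs):
        w = 2.0 * math.pi * m / n_freqs
        total = 0.0
        for h in filters:
            H = sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h)))   # DTFT sample
            total += abs(H) ** 2
        resp.append(total)
    return resp


def condition_number(filters: List[List[float]], n_freqs: int = 256) -> float:
    """Ratio B/A of upper to lower frame bound of the undecimated filterbank."""
    r = energy_response(filters, n_freqs)
    return max(r) / min(r)


def random_filterbank(num_filters: int, length: int, seed: int = 0) -> List[List[float]]:
    """Filters with i.i.d. standard Gaussian taps (scaling does not change B/A)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_filters)]


for length in (8, 32, 64):
    print(length, condition_number(random_filterbank(16, length)))
```

A single unit impulse, or the pair [1, 1] and [1, -1], gives a flat energy response and hence B/A = 1, which makes a convenient check.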
[COMMENTS]
4 pages, 5 figures, 1 page appendix, published in IEEE SPL
[LINK]
http://arxiv.org/abs/2309.05855v3
[DATE]
2024-04-16 19:40:46+08:00
[CATEGORIES]
cs.LG
Machine Learning Based Optimization Workflow for Tuning Numerical Settings of Differential Equation Solvers for Boundary Value Problems
[AUTHORS]
Viny Saajan Victor, Manuel Ettmüller, Andre Schmeißer, Heike Leitte, Simone Gramsch
[ABSTRACT]
Several numerical differential equation solvers have been employed
effectively over the years as an alternative to analytical solvers to quickly
and conveniently solve differential equations. One category of these is
boundary value solvers, which are used to solve real-world problems formulated
as differential equations with boundary conditions. To solve these equations,
such solvers require certain numerical settings, which affect their
solvability and performance. A systematic fine-tuning of these settings
is required to obtain the desired solution and performance. Currently, these
settings are either selected by trial and error or require domain expertise. In
this paper, we propose a machine learning-based optimization workflow for
fine-tuning the numerical settings to reduce the time and domain expertise
required in the process. In the evaluation section, we discuss the scalability,
stability, and reliability of the proposed workflow. We demonstrate our
workflow on a numerical boundary value problem solver.
[LINK]
http://arxiv.org/abs/2404.10472v1
[DATE]
2024-04-16 19:25:00+08:00
[CATEGORIES]
cs.LG
Advancing Long-Term Multi-Energy Load Forecasting with Patchformer: A Patch and Transformer-Based Approach
[AUTHORS]
Qiuyi Hong, Fanlin Meng, Felipe Maldonado
[ABSTRACT]
In the context of increasing demands for long-term multi-energy load
forecasting in real-world applications, this paper introduces Patchformer, a
novel model that integrates patch embedding with encoder-decoder
Transformer-based architectures. To address the limitation in existing
Transformer-based models, which struggle with intricate temporal patterns in
long-term forecasting, Patchformer employs patch embedding: it separates
multivariate time-series data into multiple univariate series and segments
each of them into multiple patches. This method effectively enhances the
model’s ability to capture local and global semantic dependencies. The
numerical analysis shows that Patchformer obtains overall better
prediction accuracy in both multivariate and univariate long-term forecasting
on the novel Multi-Energy dataset and other benchmark datasets. In addition,
we find that the interdependence among energy-related products improves
long-term time-series forecasting performance for Patchformer and the other
compared models, and Patchformer again outperforms the other models, marking a
significant advancement in handling the interdependence and complexities of
long-term multi-energy forecasting. Lastly, Patchformer is the only model whose
performance correlates positively with the length of the past sequence,
demonstrating its ability to capture long-range past local semantic
information.
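The patching step described above (split a multivariate series into univariate channels, then cut each channel into patches) can be sketched as follows. Patch length, stride, and dropping the trailing steps instead of padding are assumptions, not Patchformer's documented settings, and the subsequent linear embedding of each patch is not shown.

```python
from typing import List


def patchify(series: List[List[float]], patch_len: int, stride: int) -> List[List[List[float]]]:
    """Split series[t][c] into out[c][k], the k-th patch of channel c.

    Trailing steps that do not fill a whole patch are dropped (an assumption;
    an implementation could pad instead).
    """
    n_steps, n_channels = len(series), len(series[0])
    if patch_len > n_steps:
        raise ValueError("patch_len must not exceed the series length")
    out = []
    for c in range(n_channels):
        channel = [row[c] for row in series]               # univariate series for channel c
        starts = range(0, n_steps - patch_len + 1, stride)
        out.append([channel[s:s + patch_len] for s in starts])
    return out


# Toy 2-channel series of 10 steps (illustrative values only).
series = [[float(t), float(10 * t)] for t in range(10)]
patches = patchify(series, patch_len=4, stride=2)
print(len(patches), len(patches[0]))   # -> 2 4
print(patches[0][1])                   # -> [2.0, 3.0, 4.0, 5.0]
```

Each channel yields floor((T - P) / S) + 1 patches for series length T, patch length P, and stride S, which is 4 here.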
[LINK]
http://arxiv.org/abs/2404.10458v1
[DATE]
2024-04-16 18:56:33+08:00
[CATEGORIES]
cs.LG
Revealing data leakage in protein interaction benchmarks
[AUTHORS]
Anton Bushuiev, Roman Bushuiev, Jiri Sedlar, Tomas Pluskal, Jiri Damborsky, Stanislav Mazurenko, Josef Sivic
[ABSTRACT]
In recent years, there has been remarkable progress in machine learning for
protein-protein interactions. However, prior work has predominantly focused on
improving learning algorithms, with less attention paid to evaluation
strategies and data preparation. Here, we demonstrate that further development
of machine learning methods may be hindered by the quality of existing
train-test splits. Specifically, we find that commonly used splitting
strategies for protein complexes, based on protein sequence or metadata
similarity, introduce major data leakage. This may result in overoptimistic
evaluation of generalization, as well as unfair benchmarking of the models,
biased towards assessing their overfitting capacity rather than practical
utility. To overcome the data leakage, we recommend constructing data splits
based on 3D structural similarity of protein-protein interfaces and suggest
corresponding algorithms. We believe that addressing the data leakage problem
is critical for further progress in this research area.
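One generic way to build leakage-resistant splits is to group samples into clusters of mutually similar items and assign whole clusters to either train or test. The sketch below assumes the similar pairs are already given (the paper derives similarity from 3D protein-protein interfaces, which is not reproduced here), and the greedy fill of the test set with small clusters is an illustrative choice rather than the authors' algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def similarity_clusters(n: int, similar_pairs: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Connected components of the 'structurally similar' graph, via union-find."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                 # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_split(clusters: List[List[int]], test_fraction: float = 0.2) -> Tuple[List[int], List[int]]:
    """Assign whole clusters (never individual items) to train or test."""
    n = sum(len(c) for c in clusters)
    train: List[int] = []
    test: List[int] = []
    for cluster in sorted(clusters, key=len):        # small clusters fill the test set first
        target = test if len(test) < test_fraction * n else train
        target.extend(cluster)
    return sorted(train), sorted(test)


# Hypothetical: 7 complexes, with pairs whose interfaces were judged similar.
pairs = [(0, 1), (1, 2), (3, 4)]
train, test = group_split(similarity_clusters(7, pairs))
print(train, test)
```

Because clusters are transitive closures of the similarity relation, no similar pair can end up straddling the split.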
[LINK]
http://arxiv.org/abs/2404.10457v1
[DATE]
2024-04-16 18:54:48+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks for Protein-Protein Interactions - A Short Survey
[AUTHORS]
Mingda Xu, Peisheng Qian, Ziyuan Zhao, Zeng Zeng, Jianguo Chen, Weide Liu, Xulei Yang
[ABSTRACT]
Protein-protein interactions (PPIs) play key roles in a broad range of
biological processes. Numerous strategies have been proposed for predicting
PPIs, and among them, graph-based methods have demonstrated promising outcomes
owing to the inherent graph structure of PPI networks. This paper reviews
various graph-based methodologies and discusses their applications in PPI
prediction. We classify these approaches into two primary groups based on their
model structures. The first category employs Graph Neural Networks (GNN) or
Graph Convolutional Networks (GCN), while the second category utilizes Graph
Attention Networks (GAT), Graph Auto-Encoders and Graph-BERT. We highlight the
distinctive methodologies of each approach in managing the graph-structured
data inherent in PPI networks and anticipate future research directions in this
domain.
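As a concrete reference point for the GCN family surveyed above, here is a sketch of one propagation step in the widely used Kipf and Welling form, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a hypothetical three-protein interaction graph with one-hot features and arbitrary weights. Individual PPI methods in the survey may use different normalizations or layers.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]  # self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)          # aggregate neighbours, then transform features
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical PPI graph: protein 0 interacts with proteins 1 and 2.
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot node features
w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]                # arbitrary layer weights
print(gcn_layer(adj, h, w))
```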
[LINK]
http://arxiv.org/abs/2404.10450v1
[DATE]
2024-04-16 18:39:25+08:00
[CATEGORIES]
cs.LG
Minerva: A File-Based Ransomware Detector
[AUTHORS]
Dorjan Hitaj, Giulio Pagnotta, Fabio De Gaspari, Lorenzo De Carli, Luigi V. Mancini
[ABSTRACT]
Ransomware attacks have caused billions of dollars in damages in recent
years, and are expected to cause billions more in the future. Consequently,
significant effort has been devoted to ransomware detection and mitigation.
Behavioral-based ransomware detection approaches have garnered considerable
attention recently. These behavioral detectors typically rely on process-based
behavioral profiles to identify malicious behaviors. However, with an
increasing body of literature highlighting the vulnerability of such approaches
to evasion attacks, a comprehensive solution to the ransomware problem remains
elusive. This paper presents Minerva, a novel robust approach to ransomware
detection. Minerva is engineered to be robust by design against evasion
attacks, with architectural and feature selection choices informed by their
resilience to adversarial manipulation. We conduct a comprehensive analysis of
Minerva across a diverse spectrum of ransomware types, encompassing unseen
ransomware as well as variants designed specifically to evade Minerva. Our
evaluation showcases the ability of Minerva to accurately identify ransomware,
generalize to unseen threats, and withstand evasion attacks. Furthermore,
Minerva achieves remarkably low detection times, enabling the adoption of data
loss prevention techniques with near-zero overhead.
[COMMENTS]
14 pages
[LINK]
http://arxiv.org/abs/2301.11050v2
[DATE]
2024-04-16 18:31:27+08:00
[CATEGORIES]
cs.LG
SparseDM: Toward Sparse Efficient Diffusion Models
[AUTHORS]
Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu
[ABSTRACT]
Diffusion models have been extensively used in data generation tasks and are
recognized as one of the best generative models. However, their time-consuming
deployment, long inference time, and requirements on large memory limit their
application on mobile devices. In this paper, we propose a method based on the
improved Straight-Through Estimator to improve the deployment efficiency of
diffusion models. Specifically, we add sparse masks to the Convolution and
Linear layers in a pre-trained diffusion model, then design progressive
sparsity for model training in the fine-tuning stage, and switch the inference
mask on and off, which supports a flexible choice of sparsity during inference
according to the FID and MACs requirements. Experiments on four datasets
conducted on a state-of-the-art Transformer-based diffusion model demonstrate
that our method reduces MACs by 50% while increasing FID by only 1.5 on
average. Under other MACs budgets, its FID is also 1 to 137 lower than that of
other methods.
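The ingredients named above (a magnitude-based sparse mask, a progressive sparsity schedule, and straight-through gradient flow) can be sketched on a toy linear model. This uses a plain straight-through estimator, not the paper's improved variant, and the linear ramp schedule is an assumption.

```python
from typing import List


def topk_mask(w: List[float], sparsity: float) -> List[int]:
    """Keep the (1 - sparsity) fraction of weights with the largest magnitude."""
    k = round(len(w) * (1.0 - sparsity))
    keep = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k]
    mask = [0] * len(w)
    for i in keep:
        mask[i] = 1
    return mask


def progressive_sparsity(step: int, total_steps: int, target: float) -> float:
    """Assumed schedule: ramp sparsity linearly from 0 to target during fine-tuning."""
    return target * min(step, total_steps) / total_steps


def ste_sgd_step(w: List[float], x: List[float], target: float,
                 sparsity: float, lr: float = 0.1) -> List[float]:
    """One SGD step on y = sum(w * m * x) with squared loss and a plain STE."""
    mask = topk_mask(w, sparsity)
    y = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))   # forward pass uses sparse weights
    err = y - target                                          # loss = err ** 2
    # dL/d(w*m) = 2*err*x; the STE treats d(w*m)/dw as 1, so every dense weight
    # (including currently pruned ones) receives the gradient and can regrow.
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]


print(topk_mask([0.1, -3.0, 0.5, 2.0], progressive_sparsity(10, 10, 0.5)))
print(ste_sgd_step([1.0, 0.1], x=[1.0, 1.0], target=0.0, sparsity=0.5))
```

Letting pruned weights keep receiving gradients is what allows the mask to change as training proceeds, and it is the basic reason STE-style training is used for sparse fine-tuning.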
[LINK]
http://arxiv.org/abs/2404.10445v1
[DATE]
2024-04-16 18:31:06+08:00
[CATEGORIES]
cs.LG
Semi-supervised Fréchet Regression
[AUTHORS]
Rui Qiu, Zhou Yu, Zhenhua Lin
[ABSTRACT]
This paper explores the field of semi-supervised Fréchet regression, driven
by the significant costs associated with obtaining non-Euclidean labels.
Methodologically, we propose two novel methods: semi-supervised NW Fréchet
regression and semi-supervised kNN Fréchet regression, both based on graph
distance acquired from all feature instances. These methods extend the scope of
existing semi-supervised Euclidean regression methods. We establish their
convergence rates with limited labeled data and large amounts of unlabeled
data, taking into account the low-dimensional manifold structure of the feature
space. Through comprehensive simulations across diverse settings and
applications to real data, we demonstrate the superior performance of our
methods over their supervised counterparts. This study addresses existing
research gaps and paves the way for further exploration and advancements in the
field of semi-supervised Fréchet regression.
[LINK]
http://arxiv.org/abs/2404.10444v1
[DATE]
2024-04-16 18:30:52+08:00
[CATEGORIES]
cs.LG
AGHINT: Attribute-Guided Representation Learning on Heterogeneous Information Networks with Transformer
[AUTHORS]
Jinhui Yuan, Shan Lu, Peibo Duan, Jieyue He
[ABSTRACT]
Recently, heterogeneous graph neural networks (HGNNs) have achieved
impressive success in representation learning by capturing long-range
dependencies and heterogeneity at the node level. However, few existing studies
have delved into the utilization of node attributes in heterogeneous
information networks (HINs). In this paper, we investigate the impact of
inter-node attribute disparities on HGNN performance within the benchmark
task, i.e., node classification, and empirically find that typical models
exhibit significant performance decline when classifying nodes whose attributes
markedly differ from their neighbors. To alleviate this issue, we propose a
novel Attribute-Guided heterogeneous Information Networks representation
learning model with Transformer (AGHINT), which allows a more effective
aggregation of neighbor node information under the guidance of attributes.
Specifically, AGHINT transcends the constraints of the original graph structure
by directly integrating higher-order similar neighbor features into the
learning process and modifies the message-passing mechanism between nodes based
on their attribute disparities. Extensive experimental results on three
real-world heterogeneous graph benchmarks with target node attributes
demonstrate that AGHINT outperforms state-of-the-art methods.
[COMMENTS]
9 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.10443v1
[DATE]
2024-04-16 18:30:48+08:00
[CATEGORIES]
cs.LG
Tree Bandits for Generative Bayes
[AUTHORS]
Sean O’Hagan, Jungeum Kim, Veronika Rockova
[ABSTRACT]
In generative models with obscured likelihood, Approximate Bayesian
Computation (ABC) is often the tool of last resort for inference. However, ABC
demands many prior parameter trials to keep only a small fraction that passes
an acceptance test. To accelerate ABC rejection sampling, this paper develops a
self-aware framework that learns from past trials and errors. We apply
recursive partitioning classifiers on the ABC lookup table to sequentially
refine high-likelihood regions into boxes. Each box is regarded as an arm in a
binary bandit problem treating ABC acceptance as a reward. Each arm has a
proclivity for being chosen for the next ABC evaluation, depending on the prior
distribution and past rejections. The method places more splits in those areas
where the likelihood resides, shying away from low-probability regions destined
for ABC rejections. We provide two versions: (1) ABC-Tree for posterior
sampling, and (2) ABC-MAP for maximum a posteriori estimation. We demonstrate
accurate ABC approximability at much lower simulation cost. We justify the use
of our tree-based bandit algorithms with nearly optimal regret bounds. Finally,
we successfully apply our approach to the problem of masked image
classification using deep generative models.
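For context, this is the plain ABC rejection sampler that the paper sets out to accelerate, sketched for a toy model chosen purely for illustration: y ~ N(theta, 1), a Uniform(-5, 5) prior, the sample mean as summary statistic, and a tolerance eps. The tree-based bandit proposal (ABC-Tree / ABC-MAP) is not shown. The sketch only makes visible why most prior draws are wasted.

```python
import random
from typing import List


def sample_mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def abc_rejection(observed: List[float], n_trials: int, eps: float, seed: int = 0) -> List[float]:
    """Plain ABC rejection sampling for theta in a toy N(theta, 1) model.

    Prior: theta ~ Uniform(-5, 5). Summary statistic: the sample mean.
    A prior draw is kept only if its simulated summary lies within eps of the observed one.
    """
    rng = random.Random(seed)
    s_obs = sample_mean(observed)
    accepted: List[float] = []
    for _ in range(n_trials):
        theta = rng.uniform(-5.0, 5.0)                                 # 1. draw from the prior
        sim = [rng.gauss(theta, 1.0) for _ in range(len(observed))]   # 2. simulate a data set
        if abs(sample_mean(sim) - s_obs) < eps:                        # 3. acceptance test
            accepted.append(theta)
    return accepted


data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(20)]   # synthetic data with theta = 2
posterior = abc_rejection(observed, n_trials=20000, eps=0.1)
print(f"acceptance rate {len(posterior) / 20000:.3f}, posterior mean {sample_mean(posterior):.3f}")
```

Only a small fraction of the 20,000 simulations survive the acceptance test here. Concentrating proposals in high-likelihood regions instead of the whole prior is the inefficiency the tree-bandit framework targets.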
[LINK]
http://arxiv.org/abs/2404.10436v1
[DATE]
2024-04-16 18:02:36+08:00
[CATEGORIES]
cs.LG
Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI
[AUTHORS]
Hugo Caselles-Dupré, Charles Mellerio, Paul Hérent, Alizée Lopez-Persem, Benoit Béranger, Mathieu Soularue, Pierre Fautrel, Gauthier Vernier, Matthieu Cord
[ABSTRACT]
The reconstruction of images observed by subjects from fMRI data collected
during visual stimuli has made significant strides in the past decade, thanks
to the availability of extensive fMRI datasets and advancements in generative
models for image generation. However, the application of visual reconstruction
has remained limited. Reconstructing visual imagination presents a greater
challenge, with potentially revolutionary applications ranging from aiding
individuals with disabilities to verifying witness accounts in court. The
primary hurdles in this field are the absence of data collection protocols for
visual imagery and the lack of datasets on the subject. Traditionally,
fMRI-to-image relies on data collected from subjects exposed to visual stimuli,
which is problematic for reconstructing visual imagery because brain activity
differs between visual stimulation and visual imagery. For the first
time, we have compiled a substantial dataset (around 6 h of scans) on visual
imagery along with a proposed data collection protocol. We then train a
modified version of an fMRI-to-image model and demonstrate the feasibility of
reconstructing images from two modes of imagination: from memory and from pure
imagination. This marks an important step towards creating a technology that
allows direct reconstruction of visual imagery.
[COMMENTS]
Pre-print to be updated. Work in progress
[LINK]
http://arxiv.org/abs/2404.05468v3
[DATE]
2024-04-16 18:02:17+08:00
[CATEGORIES]
cs.LG
AudioProtoPNet: An interpretable deep learning model for bird sound classification
[AUTHORS]
René Heinrich, Bernhard Sick, Christoph Scholz
[ABSTRACT]
Recently, scientists have proposed several deep learning models to monitor
the diversity of bird species. These models can detect bird species with high
accuracy by analyzing acoustic signals. However, traditional deep learning
algorithms are black-box models that provide no insight into their
decision-making process. For domain experts, such as ornithologists, it is
crucial that these models are not only efficient, but also interpretable in
order to be used as assistive tools. In this study, we present an adaptation of
the Prototypical Part Network (ProtoPNet) for audio classification that
provides inherent interpretability through its model architecture. Our approach
is based on a ConvNeXt backbone architecture for feature extraction and learns
prototypical patterns for each bird species using spectrograms of the training
data. Classification of new data is done by comparison with these prototypes in
latent space, which simultaneously serve as easily understandable explanations
for the model’s decisions.
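Classification "by comparison with prototypes in latent space" can be sketched with the log-activation similarity of the original ProtoPNet, log((d + 1) / (d + eps)) for squared distance d. AudioProtoPNet may use a different similarity, and the final linear layer over max-pooled similarities is simplified here to taking, for each class, the best patch-prototype match. Latent vectors and species labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarity(z: Vector, p: Vector, eps: float = 1e-4) -> float:
    """ProtoPNet-style log activation: large when latent patch z is close to prototype p."""
    d = squared_distance(z, p)
    return math.log((d + 1.0) / (d + eps))


def classify(patches: List[Vector], prototypes: Dict[str, List[Vector]]) -> str:
    """Score each class by its best patch-prototype match (a simplification of
    ProtoPNet's final linear layer), then return the highest-scoring class."""
    scores = {
        label: max(prototype_similarity(z, p) for z in patches for p in protos)
        for label, protos in prototypes.items()
    }
    return max(scores, key=scores.get)


# Hypothetical 2-D latent patches from one spectrogram and two hypothetical species.
patches = [[0.0, 0.0], [1.0, 1.0]]
prototypes = {"species_a": [[0.0, 0.1]], "species_b": [[5.0, 5.0]]}
print(classify(patches, prototypes))
```

The matched prototype and patch are themselves the explanation: an expert can listen to the training clip a prototype came from.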
[COMMENTS]
Work in progress
[LINK]
http://arxiv.org/abs/2404.10420v1
[DATE]
2024-04-16 17:37:41+08:00
[CATEGORIES]
cs.LG
VDTuner: Automated Performance Tuning for Vector Data Management Systems
[AUTHORS]
Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu
[ABSTRACT]
Vector data management systems (VDMSs) have become an indispensable
cornerstone in large-scale information retrieval and machine learning systems
like large language models. To enhance the efficiency and flexibility of
similarity search, VDMSs expose many tunable index parameters and system
parameters for users to specify. However, due to the inherent characteristics
of VDMS, automatic performance tuning for VDMS faces several critical
challenges, which cannot be well addressed by the existing auto-tuning methods.
In this paper, we introduce VDTuner, a learning-based automatic performance
tuning framework for VDMS, leveraging multi-objective Bayesian optimization.
VDTuner overcomes the challenges associated with VDMS by efficiently exploring
a complex multi-dimensional parameter space without requiring any prior
knowledge. Moreover, it is able to achieve a good balance between search speed
and recall rate, delivering an optimal configuration. Extensive evaluations
demonstrate that VDTuner can markedly improve VDMS performance (14.12% in
search speed and 186.38% in recall rate) compared with default setting, and is
more efficient compared with state-of-the-art baselines (up to 3.57 times
faster in terms of tuning time). In addition, VDTuner scales to specific
user preferences and cost-aware optimization objectives. VDTuner is available
online at https://github.com/tiannuo-yang/VDTuner.
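Balancing search speed against recall is a two-objective problem, and multi-objective Bayesian optimization typically tracks the Pareto set of non-dominated configurations. The sketch below shows only that dominance filter on hypothetical measured configurations. It does not implement Bayesian optimization or VDTuner's surrogate model, and the configuration names and numbers are made up for illustration.

```python
from typing import List, Tuple

Config = Tuple[str, float, float]   # (name, queries per second, recall); both maximized


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_front(configs: List[Config]) -> List[Config]:
    """Configurations not dominated by any other configuration."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical measurements, not results from the paper.
configs = [("A", 1000.0, 0.90), ("B", 800.0, 0.95), ("C", 700.0, 0.93), ("D", 1200.0, 0.85)]
print([name for name, _, _ in pareto_front(configs)])
```

A configuration never dominates itself (the strict-improvement clause fails), so no self-exclusion is needed in `pareto_front`.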
[COMMENTS]
Accepted by ICDE 2024
[LINK]
http://arxiv.org/abs/2404.10413v1
[DATE]
2024-04-16 17:31:19+08:00
[CATEGORIES]
cs.LG
Manifold Gaussian Variational Bayes on the Precision Matrix
[AUTHORS]
Martin Magris, Mostafa Shabani, Alexandros Iosifidis
[ABSTRACT]
We propose an optimization algorithm for Variational Inference (VI) in
complex models. Our approach relies on natural gradient updates where the
variational space is a Riemann manifold. We develop an efficient algorithm for
Gaussian Variational Inference whose updates satisfy the positive definite
constraint on the variational covariance matrix. Our Manifold Gaussian
Variational Bayes on the Precision matrix (MGVBP) solution provides simple
update rules, is straightforward to implement, and the use of the precision
matrix parametrization has a significant computational advantage. Due to its
black-box nature, MGVBP stands as a ready-to-use solution for VI in complex
models. Over five datasets, we empirically validate our feasible approach on
different statistical and econometric models, discussing its performance with
respect to baseline methods.
[LINK]
http://arxiv.org/abs/2210.14598v4
[DATE]
2024-04-16 17:23:24+08:00
[CATEGORIES]
cs.LG
A Phone-based Distributed Ambient Temperature Measurement System with An Efficient Label-free Automated Training Strategy
[AUTHORS]
Dayin Chen, Xiaodan Shi, Haoran Zhang, Xuan Song, Dongxiao Zhang, Yuntian Chen, Jinyue Yan
[ABSTRACT]
Enhancing the energy efficiency of buildings significantly relies on
monitoring indoor ambient temperature. The potential limitations of
conventional temperature measurement techniques, together with the omnipresence
of smartphones, have redirected researchers’ attention towards the exploration
of phone-based ambient temperature estimation technology. Nevertheless,
numerous obstacles remain to be addressed in order to achieve a practical
implementation of this technology. This study proposes a distributed
phone-based ambient temperature estimation system which enables collaboration
between multiple phones to accurately measure the ambient temperature in each
small area of an indoor space. Besides, it offers a secure, efficient, and
cost-effective training strategy to train a new estimation model for each newly
added phone, eliminating the need for manual collection of labeled data. This
innovative training strategy can yield a high-performing estimation model for a
new phone with just 5 data points, requiring only a few iterations. Meanwhile,
by crowdsourcing, our system automatically provides accurate inferred labels
for all newly collected data. At the end of this study, we also highlight the
potential of integrating federated learning into our system to ensure privacy
protection. We believe this study has the potential to advance the practical
application of phone-based ambient temperature measurement, facilitating
energy-saving efforts in buildings.
[LINK]
http://arxiv.org/abs/2404.10401v1
[DATE]
2024-04-16 17:03:13+08:00
[CATEGORIES]
cs.LG
Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States
[AUTHORS]
Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche
[ABSTRACT]
As control engineering methods are applied to increasingly complex systems,
data-driven approaches for system identification appear as a promising
alternative to physics-based modeling. While the Bayesian approaches prevalent
for safety-critical applications usually rely on the availability of state
measurements, the states of a complex system are often not directly measurable.
It may then be necessary to jointly estimate the dynamics and the latent state,
making the quantification of uncertainties and the design of controllers with
formal performance guarantees considerably more challenging. This paper
proposes a novel method for the computation of an optimal input trajectory for
unknown nonlinear systems with latent states based on a combination of particle
Markov chain Monte Carlo methods and scenario theory. Probabilistic performance
guarantees are derived for the resulting input trajectory, and an approach to
validate the performance of arbitrary control laws is presented. The
effectiveness of the proposed method is demonstrated in a numerical simulation.
[COMMENTS]
Accepted version submitted to the 22nd European Control Conference
[LINK]
http://arxiv.org/abs/2303.17963v3
[DATE]
2024-04-16 16:45:26+08:00
[CATEGORIES]
cs.LG
Neuron-centric Hebbian Learning
[AUTHORS]
Andrea Ferigo, Elia Cunegatti, Giovanni Iacca
[ABSTRACT]
One of the most striking capabilities behind the learning mechanisms of the
brain is the adaptation, through structural and functional plasticity, of its
synapses. While synapses have the fundamental role of transmitting information
across the brain, several studies show that it is the neuron activations that
produce changes in synapses. Yet, most plasticity models devised for artificial
Neural Networks (NNs), e.g., the ABCD rule, focus on synapses, rather than
neurons, therefore optimizing synaptic-specific Hebbian parameters. This
approach, however, increases the complexity of the optimization process since
each synapse is associated with multiple Hebbian parameters. To overcome this
limitation, we propose a novel plasticity model, called Neuron-centric Hebbian
Learning (NcHL), where optimization focuses on neuron- rather than
synaptic-specific Hebbian parameters. Compared to the ABCD rule, NcHL reduces
the parameters from $5W$ to $5N$, where $W$ and $N$ are the numbers of weights and
neurons, respectively, and usually $N \ll W$. We also devise a “weightless” NcHL model,
which requires less memory by approximating the weights based on a record of
neuron activations. Our experiments on two robotic locomotion tasks reveal that
NcHL performs comparably to the ABCD rule, despite using up to $\sim97$ times
fewer parameters, thus allowing for scalable plasticity.
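The $5W$ versus $5N$ comparison follows from the generalized Hebbian ABCD rule, where each synapse carries five parameters (eta, A, B, C, D) in the update dw = eta * (A*pre*post + B*pre + C*post + D). The sketch shows that rule and the parameter counts for a hypothetical fully connected network. Counting input neurons in N is an assumption, and NcHL's rule for combining per-neuron parameters into a synapse update is not reproduced.

```python
from typing import List, Tuple


def abcd_update(w: float, pre: float, post: float,
                eta: float, A: float, B: float, C: float, D: float) -> float:
    """Generalized Hebbian ABCD rule: w + eta * (A*pre*post + B*pre + C*post + D)."""
    return w + eta * (A * pre * post + B * pre + C * post + D)


def hebbian_param_counts(layer_sizes: List[int]) -> Tuple[int, int]:
    """(synapse-centric 5W, neuron-centric 5N) for a fully connected feed-forward net."""
    n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    n_neurons = sum(layer_sizes)   # assumption: input neurons are counted too
    return 5 * n_weights, 5 * n_neurons


print(hebbian_param_counts([8, 16, 4]))   # -> (960, 140)
```

Because W grows with the product of adjacent layer widths while N grows with their sum, the gap widens quickly for larger networks.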
[COMMENTS]
Accepted at Genetic and Evolutionary Computation Conference (GECCO
2024)
[LINK]
http://arxiv.org/abs/2403.12076v2
[DATE]
2024-04-16 16:19:47+08:00
[CATEGORIES]
cs.LG
Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments
[AUTHORS]
Yongming Huang, Xiaohu You, Hang Zhan, Shiwen He, Ningning Fu, Wei Xu
[ABSTRACT]
Intelligent communications have played a pivotal role in shaping the
evolution of 6G networks. Native artificial intelligence (AI) within green
communication systems must meet stringent real-time requirements. To achieve
this, deploying lightweight and resource-efficient AI models is necessary.
However, as wireless networks generate a multitude of data fields and
indicators during operation, only a fraction of them has a significant impact
on the network AI models. Therefore, real-time intelligence of communication
systems heavily relies on a small but critical set of the data that profoundly
influences the performance of network AI models. These challenges underscore
the need for innovative architectures and solutions. In this paper, we propose
a solution, termed the pervasive multi-level (PML) native AI architecture,
which integrates the concept of knowledge graph (KG) into the intelligent
operational manipulations of mobile networks, resulting in the establishment of
a wireless data KG. Leveraging the wireless data KG, we characterize the
massive and complex data collected from wireless communication networks and
analyze the relationships among various data fields. The obtained graph of data
field relations enables the on-demand generation of minimal and effective
datasets, referred to as feature datasets, tailored to specific application
requirements. Consequently, this architecture not only enhances AI training,
inference, and validation processes but also significantly reduces resource
wastage and overhead for communication networks. To implement this
architecture, we have developed a specific solution comprising a
spatio-temporal heterogeneous graph attention neural network model (STREAM) as
well as a feature dataset generation algorithm. Experiments are conducted to
validate the effectiveness of the proposed architecture.
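One way to picture on-demand feature dataset generation is as a reverse traversal of the data field relation graph: starting from a target indicator, collect every field that directly or transitively influences it. The sketch below assumes the graph is a plain adjacency dict; the field names are hypothetical, and neither the paper's KG construction nor the STREAM model is reproduced.

```python
from collections import deque


def feature_dataset_fields(relations, target):
    """Collect every data field that (transitively) influences `target`.

    relations: dict mapping a field to the list of fields it directly influences.
    """
    # Invert the edges so we can walk from the target back to its influencers.
    parents = {}
    for src, dsts in relations.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)
    selected, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in selected:  # also guards against cycles
                selected.add(p)
                queue.append(p)
    return sorted(selected)


# Hypothetical field relations, for illustration only.
relations = {
    "SINR": ["CQI"],
    "CQI": ["MCS"],
    "MCS": ["throughput"],
    "PRB_usage": ["throughput"],
    "temperature": ["fan_speed"],
}
fields = feature_dataset_fields(relations, "throughput")
```

Here `fields` is `['CQI', 'MCS', 'PRB_usage', 'SINR']`; the unrelated `temperature` field is left out of the feature dataset.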
[COMMENTS]
12 pages, 11 figures
[LINK]
http://arxiv.org/abs/2404.10365v1
[DATE]
2024-04-16 15:55:34+08:00
[CATEGORIES]
cs.LG
Generating Counterfactual Trajectories with Latent Diffusion Models for Concept Discovery
[AUTHORS]
Payal Varshney, Adriano Lucieri, Christoph Balada, Andreas Dengel, Sheraz Ahmed
[ABSTRACT]
Trustworthiness is a major prerequisite for the safe application of opaque
deep learning models in high-stakes domains like medicine. Understanding the
decision-making process not only contributes to fostering trust but might also
reveal previously unknown decision criteria of complex models that could
advance the state of medical research. The discovery of decision-relevant
concepts from black box models is a particularly challenging task. This study
proposes Concept Discovery through Latent Diffusion-based Counterfactual
Trajectories (CDCT), a novel three-step framework for concept discovery
leveraging the superior image synthesis capabilities of diffusion models. In
the first step, CDCT uses a Latent Diffusion Model (LDM) to generate a
counterfactual trajectory dataset. This dataset is used to derive a
disentangled representation of classification-relevant concepts using a
Variational Autoencoder (VAE). Finally, a search algorithm is applied to
identify relevant concepts in the disentangled latent space. The application of
CDCT to a classifier trained on the largest public skin lesion dataset revealed
not only the presence of several biases but also meaningful biomarkers.
Moreover, the counterfactuals generated within CDCT show better FID scores than
those produced by a previously established state-of-the-art method, while being
12 times more resource-efficient. Unsupervised concept discovery holds great
potential for the application of trustworthy AI and the further development of
human knowledge in various domains. CDCT represents a further step in this
direction.
[COMMENTS]
Submitted to International Conference on Pattern Recognition (ICPR)
2024
[LINK]
http://arxiv.org/abs/2404.10356v1
[DATE]
2024-04-16 15:44:08+08:00
[CATEGORIES]
cs.LG
Rethinking the Graph Polynomial Filter via Positive and Negative Coupling Analysis
[AUTHORS]
Haodong Wen, Bodong Du, Ruixun Liu, Deyu Meng, Xiangyong Cao
[ABSTRACT]
Recently, the optimization of polynomial filters within Spectral Graph Neural
Networks (GNNs) has emerged as a prominent research focus. Existing spectral
GNNs mainly emphasize polynomial properties in filter design, introducing
computational overhead and neglecting the integration of crucial graph
structure information. We argue that incorporating graph information into basis
construction can enhance understanding of polynomial basis, and further
facilitate simplified polynomial filter design. Motivated by this, we first
propose a Positive and Negative Coupling Analysis (PNCA) framework, where the
concepts of positive and negative activation are defined and their respective
and mixed effects are analysed. Then, we explore PNCA from the message
propagation perspective, revealing the subtle information hidden in the
activation process. Subsequently, PNCA is used to analyze the mainstream
polynomial filters, and a novel simple basis that decouples the positive and
negative activation and fully utilizes graph structure information is designed.
Finally, a simple GNN (called GSCNet) is proposed based on the new basis.
Experimental results on the benchmark datasets for node classification verify
that our GSCNet obtains better or comparable results compared with existing
state-of-the-art GNNs while demanding relatively less computational time.
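For reference, a generic spectral polynomial filter applies $\sum_k \theta_k \hat{A}^k$ to node features, where $\hat{A}$ is the symmetrically normalized adjacency matrix. The sketch below uses the plain monomial basis and evaluates each power by repeated propagation; it is a baseline for comparison, not the PNCA-derived basis used by GSCNet, whose exact form the abstract does not give.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def polynomial_filter(A, X, theta):
    """Compute sum_k theta[k] * A_hat^k @ X without forming matrix powers."""
    A_hat = normalized_adjacency(A)
    out = theta[0] * X
    H = X
    for coeff in theta[1:]:
        H = A_hat @ H  # one more hop of message propagation
        out = out + coeff * H
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
X = np.array([[1.0], [0.0], [0.0]])
Y = polynomial_filter(A, X, theta=[0.5, 0.3, 0.2])
```

Each coefficient $\theta_k$ weights the signal after $k$ hops, which is why the filter's cost grows linearly with the polynomial order.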
[COMMENTS]
13 pages, 8 figures, 6 tables
[LINK]
http://arxiv.org/abs/2404.10353v1
[DATE]
2024-04-16 15:41:29+08:00
[CATEGORIES]
cs.LG
Asset management, condition monitoring and Digital Twins: damage detection and virtual inspection on a reinforced concrete bridge
[AUTHORS]
Arnulf Hagen, Trond Michael Andersen
[ABSTRACT]
In April 2021, the Stava bridge, a main bridge on the E6 in Norway, was abruptly
closed to traffic. A structural defect had seriously compromised the bridge's
structural integrity. The Norwegian Public Roads Administration (NPRA) closed
it, implemented a temporary solution, and reopened it with severe traffic
restrictions. The incident was flagged by the bridge's Digital Twin, which
processes data from Internet of Things sensors. The solution was crucial for
online and offline diagnostics, and the case demonstrates the value of such
technologies both for tackling emerging dangerous situations and for acting
preventively. Critical, rapidly developing damage was detected in time to
stop its progression, but not in time to avoid the incident altogether. The
paper puts risk in a broader perspective for an organization responsible for
highway infrastructure. It positions online monitoring and Digital Twins in the
context of Risk- and Condition-Based Maintenance. The situation that arose at
Stava bridge, and how it was detected, analyzed, and diagnosed during virtual
inspection, is described. The case demonstrates how combining physics-based
methods with Machine Learning can facilitate damage detection and diagnostics.
A summary of lessons learnt from both technical and organizational
perspectives, as well as plans for future work, is presented.
[LINK]
http://arxiv.org/abs/2404.10341v1
[DATE]
2024-04-16 15:24:54+08:00
[CATEGORIES]
cs.LG
Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks
[AUTHORS]
Zhiyu Zhang, Chenkaixiang Lu, Wenchong Tian, Zhenliang Liao, Zhiguo Yuan
[ABSTRACT]
Physics-based models are computationally time-consuming and infeasible for
real-time scenarios of urban drainage networks, and a surrogate model is needed
to accelerate the online predictive modelling. Fully-connected neural networks
(NNs) are potential surrogate models, but may suffer from low interpretability
and efficiency in fitting complex targets. Owing to the state-of-the-art
modelling power of graph neural networks (GNNs) and their match with urban
drainage networks in the graph structure, this work proposes a GNN-based
surrogate of the flow routing model for the hydraulic prediction problem of
drainage networks, which regards recent hydraulic states as initial conditions,
and future runoff and control policy as boundary conditions. To incorporate
hydraulic constraints and physical relationships into drainage modelling,
physics-guided mechanisms are designed on top of the surrogate model to
restrict the prediction variables with flow balance and flooding occurrence
constraints. According to case results in a stormwater network, the GNN-based
model is more cost-effective with better hydraulic prediction accuracy than the
NN-based model after equal training epochs, and the designed mechanisms further
limit prediction errors with interpretable domain knowledge. As the model
structure adheres to the flow routing mechanisms and hydraulic constraints in
urban drainage networks, it provides an interpretable and effective solution
for data-driven surrogate modelling. Simultaneously, the surrogate model
accelerates the predictive modelling of urban drainage networks for real-time
use compared with the physics-based model.
[LINK]
http://arxiv.org/abs/2404.10324v1
[DATE]
2024-04-16 15:08:04+08:00
[CATEGORIES]
cs.LG
Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional Systems
[AUTHORS]
Dan MacKinlay, Russell Tsuchida, Dan Pagendam, Petra Kuhnert
[ABSTRACT]
Efficient inference in high-dimensional models remains a central challenge in
machine learning. This paper introduces the Gaussian Ensemble Belief
Propagation (GEnBP) algorithm, a fusion of the Ensemble Kalman filter and
Gaussian belief propagation (GaBP) methods. GEnBP updates ensembles by passing
low-rank local messages in a graphical model structure. This combination
inherits favourable qualities from each method. Ensemble techniques allow GEnBP
to handle high-dimensional states, parameters and intricate, noisy, black-box
generation processes. The use of local messages in a graphical model structure
ensures that the approach is suited to distributed computing and can
efficiently handle complex dependence structures. GEnBP is particularly
advantageous when the ensemble size is considerably smaller than the inference
dimension. This scenario often arises in fields such as spatiotemporal
modelling, image processing and physical model inversion. GEnBP can be applied
to general problem structures, including jointly learning system parameters,
observation parameters, and latent state variables.
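GEnBP builds on the Ensemble Kalman filter. As background only, the sketch below is the standard stochastic (perturbed-observation) EnKF analysis step for a linear observation operator; it shows how ensemble anomalies yield a low-rank covariance estimate, so the $d \times d$ state covariance is never formed. It is not the GEnBP message-passing algorithm itself.

```python
import numpy as np


def enkf_update(X, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X: (d, m) forecast ensemble; y: (p,) observation;
    H: (p, d) linear observation operator; R: (p, p) observation-noise covariance.
    """
    m = X.shape[1]
    anomalies = X - X.mean(axis=1, keepdims=True)  # (d, m)
    HA = H @ anomalies                             # (p, m)
    # Cross- and innovation covariances from the ensemble; the d x d state
    # covariance is never formed, which keeps large d tractable.
    P_xy = anomalies @ HA.T / (m - 1)              # (d, p)
    P_yy = HA @ HA.T / (m - 1) + R                 # (p, p), symmetric
    K = np.linalg.solve(P_yy, P_xy.T).T            # Kalman gain P_xy @ inv(P_yy)
    # Each member assimilates its own perturbed copy of the observation.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=m).T
    return X + K @ (Y - H @ X)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))  # 50-dim state, 20 members (ensemble much smaller than d)
H = np.zeros((2, 50))
H[0, 0] = H[1, 1] = 1.0        # observe the first two components
y = np.array([3.0, -2.0])
Xa = enkf_update(X, y, H, 1e-6 * np.eye(2), rng)
```

With a near-noiseless observation, the observed components of every member are pulled almost exactly onto `y`, while unobserved components move only through their sample correlations with the observed ones.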
[COMMENTS]
Under conference submission
[LINK]
http://arxiv.org/abs/2402.08193v2
[DATE]
2024-04-16 15:05:13+08:00
[CATEGORIES]
cs.LG
Awareness of uncertainty in classification using a multivariate model and multi-views
[AUTHORS]
Alexey Kornaev, Elena Kornaeva, Oleg Ivanov, Ilya Pershin, Danis Alukaev
[ABSTRACT]
One way to make artificial intelligence more natural is to give it some room
for doubt. Two main questions must be resolved to this end: how can a model be
trained to estimate the uncertainty of its own predictions, and what should be
done with uncertain predictions when they appear? First, we proposed an
uncertainty-aware negative log-likelihood loss based on an N-dimensional
multivariate normal distribution with a spherical covariance matrix, applied to
N-class classification tasks. The loss is similar to the
heteroscedastic regression loss. The proposed model regularizes uncertain
predictions, and trains to calculate both the predictions and their uncertainty
estimations. The model fits well with the label smoothing technique. Second, we
expanded the limits of data augmentation at the training and test stages, and
made the trained model give multiple predictions for a given number of
augmented versions of each test sample. Given the multi-view predictions
together with their uncertainties and confidences, we proposed several methods
to calculate final predictions, including mode values and bin counts with soft
and hard weights. For the latter method, we formalized the model tuning task in
the form of multimodal optimization with non-differentiable criteria of maximum
accuracy, and applied particle swarm optimization to solve the tuning task. The
proposed methodology was tested on the CIFAR-10 dataset with clean and noisy
labels and demonstrated good results in comparison with other uncertainty
estimation methods related to sample selection, co-teaching, and label
smoothing.
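For a one-hot (or label-smoothed) target $y \in \mathbb{R}^N$ modeled as $y \sim \mathcal{N}(\mu, \sigma^2 I_N)$, the negative log-likelihood is $\tfrac12\left(\lVert y-\mu\rVert^2/\sigma^2 + N\log\sigma^2 + N\log 2\pi\right)$, the classification analogue of the heteroscedastic regression loss. The sketch below implements that loss together with one soft-weighted bin-count rule for combining augmented views. Predicting $\log\sigma^2$ and weighting each view by $1/\sigma^2$ are our assumptions, not necessarily the paper's exact parameterization.

```python
import numpy as np


def spherical_gaussian_nll(y, mu, log_var):
    """Mean NLL of targets y under N(mu, sigma^2 I_N), sigma^2 = exp(log_var).

    y, mu: (B, N) targets (one-hot or smoothed) and predicted means;
    log_var: (B,) one predicted log-variance per sample.
    """
    N = y.shape[1]
    sq_err = np.sum((y - mu) ** 2, axis=1)
    nll = 0.5 * (sq_err * np.exp(-log_var) + N * log_var + N * np.log(2 * np.pi))
    return nll.mean()


def aggregate_views(view_mu, view_log_var, n_classes):
    """Soft-weighted bin count: each augmented view votes for its argmax class
    with weight 1 / sigma^2, so confident views count for more."""
    votes = np.zeros(n_classes)
    for mu, log_var in zip(view_mu, view_log_var):
        votes[np.argmax(mu)] += np.exp(-log_var)
    return int(np.argmax(votes))


# Three augmented views of one test image (hypothetical model outputs).
view_mu = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.9, 0.0]])
view_log_var = np.array([2.0, 2.0, -2.0])
pred = aggregate_views(view_mu, view_log_var, n_classes=3)
```

Here the single low-variance view outweighs the two high-variance ones, so `pred` is class 1, whereas an unweighted mode would return class 0. The loss also shows why the model learns to "doubt": when the squared error is large, a larger predicted variance lowers the NLL.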
[LINK]
http://arxiv.org/abs/2404.10314v1
[DATE]
2024-04-16 14:40:51+08:00
[CATEGORIES]
cs.LG
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
[AUTHORS]
Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin
[ABSTRACT]
Large language models (LLMs) have shown remarkable performance in various
natural language processing tasks. However, a primary constraint they face is
the context limit, i.e., the maximum number of tokens they can process.
Previous works have explored architectural changes and modifications in
positional encoding to relax the constraint, but they often require expensive
training or do not address the computational demands of self-attention. In this
paper, we present Hierarchical cOntext MERging (HOMER), a new training-free
scheme designed to overcome the limitations. HOMER uses a divide-and-conquer
algorithm, dividing long inputs into manageable chunks. Each chunk is then
processed collectively, employing a hierarchical strategy that merges adjacent
chunks at progressive transformer layers. A token reduction technique precedes
each merging, ensuring memory usage efficiency. We also propose an optimized
computational order under which the memory requirement scales logarithmically
with input length, making it especially favorable for environments
with tight memory restrictions. Our experiments demonstrate the proposed
method’s superior performance and memory efficiency, enabling the broader use
of LLMs in contexts requiring extended context. Code is available at
https://github.com/alinlab/HOMER.
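The divide-and-conquer schedule can be sketched independently of any transformer: adjacent chunks are merged pairwise, and each chunk is pruned to half the token budget before a merge, so every merged chunk stays within the budget and the merge tree has $\lceil \log_2 n \rceil$ levels for $n$ chunks. Tokens here are `(position, score)` pairs, and the score is a hypothetical importance proxy; the abstract does not specify HOMER's actual reduction criterion, and this sketch omits the attention computation entirely.

```python
def reduce_tokens(chunk, keep):
    """Keep the `keep` highest-scoring tokens of a chunk, preserving order.

    Tokens are (position, score) pairs; the score is a hypothetical
    importance proxy standing in for HOMER's actual reduction criterion.
    """
    if len(chunk) <= keep:
        return chunk
    top = sorted(chunk, key=lambda t: t[1], reverse=True)[:keep]
    return sorted(top, key=lambda t: t[0])


def hierarchical_merge(chunks, budget):
    """Merge adjacent chunks pairwise until one remains.

    Each chunk is pruned to budget // 2 tokens before merging, so no merged
    chunk exceeds `budget`. Returns (final_chunk, number_of_levels).
    """
    levels = 0
    while len(chunks) > 1:
        pruned = [reduce_tokens(c, budget // 2) for c in chunks]
        chunks = [
            pruned[i] + (pruned[i + 1] if i + 1 < len(pruned) else [])
            for i in range(0, len(pruned), 2)
        ]
        levels += 1
    return chunks[0], levels


# 8 chunks of 16 tokens each (128 positions) with deterministic pseudo-scores.
chunks = [[(p, (p * 37) % 101) for p in range(i * 16, (i + 1) * 16)] for i in range(8)]
final, levels = hierarchical_merge(chunks, budget=16)
```

For 8 chunks the merge takes 3 levels, and the surviving 16 tokens remain in their original positional order.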
[COMMENTS]
Accepted to ICLR 2024. The first two authors contributed equally
[LINK]
http://arxiv.org/abs/2404.10308v1
[DATE]
2024-04-16 14:34:08+08:00
[CATEGORIES]
cs.LG
LLM-Powered Test Case Generation for Detecting Tricky Bugs
[AUTHORS]
Kaibo Liu, Yiyang Liu, Zhenpeng Chen, Jie M. Zhang, Yudong Han, Yun Ma, Ge Li, Gang Huang
[ABSTRACT]
Conventional automated test generation tools struggle to generate test
oracles and tricky bug-revealing test inputs. Large Language Models (LLMs) can
be prompted to produce test inputs and oracles for a program directly, but the
precision of the tests can be very low for complex scenarios (only 6.3% based
on our experiments). To fill this gap, this paper proposes AID, which combines
LLMs with differential testing to generate fault-revealing test inputs and
oracles targeting plausibly correct programs (i.e., programs that have passed
all the existing tests). In particular, AID selects test inputs that yield
diverse outputs on a set of program variants generated by LLMs, then constructs
the test oracle based on the outputs. We evaluate AID on two large-scale
datasets with tricky bugs: TrickyBugs and EvalPlus, and compare it with three
state-of-the-art baselines. The evaluation results show that the recall,
precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x,
2.65x, and 1.66x, respectively.
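The input-selection idea can be illustrated with plain Python callables standing in for LLM-generated program variants: prefer the candidate input on which the variants disagree most, then derive an expected output from the variants' outputs. Majority voting as the oracle is our assumption (the abstract only says the oracle is constructed from the outputs), and outputs are assumed hashable.

```python
from collections import Counter


def select_test_input(variants, candidate_inputs):
    """Return the candidate on which the variants produce the most distinct outputs."""
    def n_distinct(x):
        return len({v(x) for v in variants})  # outputs assumed hashable
    return max(candidate_inputs, key=n_distinct)


def majority_oracle(variants, x):
    """Hypothetical oracle: the output produced by the most variants."""
    return Counter(v(x) for v in variants).most_common(1)[0][0]


# Stand-ins for LLM-generated variants of an absolute-value function.
variants = [
    lambda x: abs(x),
    lambda x: x if x > 0 else -x,
    lambda x: x,  # buggy variant: wrong for negative inputs
]
x = select_test_input(variants, [3, 0, -4])
oracle = majority_oracle(variants, x)
```

Only `-4` separates the variants, so it is selected, and the majority output `4` becomes the expected value. A vote is only as reliable as the variants: if most of them share a bug, the oracle inherits it.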
[LINK]
http://arxiv.org/abs/2404.10304v1
[DATE]
2024-04-16 14:20:06+08:00
[CATEGORIES]
cs.LG
Long-form music generation with latent diffusion
[AUTHORS]
Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
[ABSTRACT]
Audio-based generative models for music have made great strides recently, but
so far have not managed to produce full-length music tracks with coherent
musical structure. We show that by training a generative model on long temporal
contexts it is possible to produce long-form music of up to 4m45s. Our model
consists of a diffusion-transformer operating on a highly downsampled
continuous latent representation (latent rate of 21.5 Hz). It achieves
state-of-the-art results on metrics of audio quality and prompt
alignment, and subjective tests reveal that it produces full-length music with
coherent structure.
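To see how compact the latent sequence is, assume (this is not stated in the abstract) 44.1 kHz audio and an autoencoder downsampling factor of 2048. These values give a latent rate of about 21.5 Hz, consistent with the figure above, and a 4m45s track then needs only about six thousand latent frames.

```python
import math

SAMPLE_RATE_HZ = 44_100  # assumption: 44.1 kHz audio
DOWNSAMPLING = 2048      # assumption: audio samples per latent frame

latent_rate_hz = SAMPLE_RATE_HZ / DOWNSAMPLING           # 21.533203125
duration_s = 4 * 60 + 45                                 # 4m45s = 285 s
audio_samples = duration_s * SAMPLE_RATE_HZ              # 12,568,500 per channel
latent_frames = math.ceil(audio_samples / DOWNSAMPLING)  # 6137 frames
```

The diffusion transformer therefore attends over a few thousand positions rather than the roughly 12.6 million raw samples per channel.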
[LINK]
http://arxiv.org/abs/2404.10301v1
[DATE]
2024-04-16 14:09:33+08:00
[CATEGORIES]
cs.LG
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
[AUTHORS]
Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
[ABSTRACT]
The growing demand for Large Language Models (LLMs) in applications such as
content generation, intelligent chatbots, and sentiment analysis poses
considerable challenges for LLM service providers. To efficiently use GPU
resources and boost throughput, batching multiple requests has emerged as a
popular paradigm; to further speed up batching, LLM quantization techniques
reduce memory consumption and increase computing capacity. However, prevalent
quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully
leverage the capabilities of modern GPUs, such as 4-bit integer operators,
resulting in sub-optimal performance.
To maximize LLMs’ serving throughput, we introduce Atom, a low-bit
quantization method that achieves high throughput improvements with negligible
accuracy loss. Atom significantly boosts serving throughput by using low-bit
operators and considerably reduces memory consumption via low-bit quantization.
It attains high accuracy by applying a novel mixed-precision and fine-grained
quantization process. We evaluate Atom on 4-bit weight-activation quantization
in the serving context. Atom improves end-to-end throughput (tokens/s) by up to
$7.7\times$ compared to FP16 and by $2.5\times$ compared to INT8
quantization, while maintaining the same latency target.
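The following is a simplified illustration of fine-grained (group-wise) symmetric INT4 quantization with a mixed-precision escape hatch for outlier channels. It simulates quantization in NumPy by returning dequantized values, keeps outlier channels in full precision, and uses a per-row scale for each group of columns. Atom's actual kernels, channel reordering, and precision choices are not reproduced here, and `group_size` and `n_outlier_channels` are illustrative parameters.

```python
import numpy as np


def quantize_int4_groupwise(x, group_size=128, n_outlier_channels=0):
    """Simulated symmetric per-group INT4 quantization of a (rows, cols) matrix.

    The `n_outlier_channels` columns with the largest magnitude stay in full
    precision (a simple stand-in for mixed precision); the remaining columns
    are split into groups of `group_size`, each with its own per-row scale.
    Returns the dequantized matrix so the error can be inspected.
    """
    x = np.asarray(x, dtype=np.float32)
    out = x.copy()
    col_mag = np.abs(x).max(axis=0)
    outliers = np.argsort(col_mag)[::-1][:n_outlier_channels]
    normal = np.setdiff1d(np.arange(x.shape[1]), outliers)
    for start in range(0, len(normal), group_size):
        cols = normal[start:start + group_size]
        g = x[:, cols]
        scale = np.abs(g).max(axis=1, keepdims=True) / 7.0  # INT4 range is [-8, 7]
        scale[scale == 0] = 1.0
        q = np.clip(np.round(g / scale), -8, 7)
        out[:, cols] = q * scale
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 256)).astype(np.float32)
x[:, 5] *= 100.0  # one outlier channel
err_plain = np.abs(quantize_int4_groupwise(x, 128, 0) - x).mean()
err_mixed = np.abs(quantize_int4_groupwise(x, 128, 1) - x).mean()
```

Without the escape hatch, the outlier inflates its group's scale and most neighbouring values round to zero; keeping that one channel in higher precision lets the rest of the group use a tight scale.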
[LINK]
http://arxiv.org/abs/2310.19102v3
[DATE]
2024-04-16 14:08:05+08:00
[CATEGORIES]
cs.LG
Human-in-the-Loop Segmentation of Multi-species Coral Imagery
[AUTHORS]
Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer
[ABSTRACT]
Broad-scale marine surveys performed by underwater vehicles significantly
increase the availability of coral reef imagery; however, it is costly and
time-consuming for domain experts to label the images. Point label propagation
is an approach used to leverage existing image data labeled with sparse point
labels. The resulting augmented ground truth is then used to train a
semantic segmentation model. Here, we first demonstrate that recent advances in
foundation models enable generation of multi-species coral augmented ground
truth masks using denoised DINOv2 features and K-Nearest Neighbors (KNN),
without the need for any pre-training or custom-designed algorithms. For
extremely sparsely labeled images, we propose a labeling regime based on
human-in-the-loop principles, resulting in significant improvement in
annotation efficiency: If only 5 point labels per image are available, our
proposed human-in-the-loop approach improves on the state-of-the-art by 17.3%
for pixel accuracy and 22.6% for mIoU; and by 10.6% and 19.1% when 10 point
labels per image are available. Even if the human-in-the-loop labeling regime
is not used, the denoised DINOv2 features with a KNN outperform the prior
state-of-the-art by 3.5% for pixel accuracy and 5.7% for mIoU (5 grid points).
We also provide a detailed analysis of how point labeling style and the
quantity of points per image affect the point label propagation quality and
provide general recommendations on maximizing point label efficiency.
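The propagation step itself reduces to nearest-neighbour classification in feature space. The sketch below assumes per-pixel feature maps have already been computed (for instance by upsampling DINOv2 patch features; the denoising step is not shown) and labels each pixel by a majority vote over its `k` nearest labeled points. The brute-force distance matrix is only practical for small images.

```python
import numpy as np


def propagate_point_labels(features, point_coords, point_labels, k=1):
    """Label every pixel by majority vote of its k nearest labeled points.

    features: (H, W, C) per-pixel features; point_coords: (P, 2) integer
    (row, col) positions of labeled points; point_labels: (P,) class ids.
    """
    H, W, C = features.shape
    flat = features.reshape(-1, C)
    point_feats = features[point_coords[:, 0], point_coords[:, 1]]  # (P, C)
    # Squared Euclidean distance from every pixel to every labeled point.
    d2 = ((flat[:, None, :] - point_feats[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]  # (H*W, k)
    votes = point_labels[nearest]
    n_classes = int(point_labels.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1).reshape(H, W)


# Toy 4x4 "image": the right half has clearly different features.
features = np.zeros((4, 4, 1))
features[:, 2:, 0] = 10.0
coords = np.array([[0, 0], [0, 3]])
labels = np.array([0, 1])
mask = propagate_point_labels(features, coords, labels)
```

Two labeled points are enough to produce a dense mask here because the features already separate the two regions; the quality of the features is what drives propagation quality.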
[COMMENTS]
Accepted at the CVPR2024 3rd Workshop on Learning with Limited
Labelled Data for Image and Video Understanding (L3D-IVU), 10 pages, 6
figures, an additional 4 pages of supplementary material
[LINK]
http://arxiv.org/abs/2404.09406v2
[DATE]
2024-04-16 13:58:39+08:00
[CATEGORIES]
cs.LG
Clustering and Data Augmentation to Improve Accuracy of Sleep Assessment and Sleep Individuality Analysis
[AUTHORS]
Shintaro Tamai, Masayuki Numao, Ken-ichi Fukui
[ABSTRACT]
Recently, with growing health awareness, novel methods have allowed individuals
to monitor sleep at home. Using sleep sounds offers advantages over conventional
methods such as smartwatches: it is non-intrusive and capable of detecting
various physiological activities. This study aims to construct a
machine learning-based sleep assessment model providing evidence-based
assessments, such as poor sleep due to frequent movement during sleep onset.
Extracting sleep sound events, deriving latent representations with a VAE,
clustering them with a GMM, and training an LSTM for subjective sleep assessment
achieved a high accuracy of 94.8% in distinguishing sleep satisfaction. Moreover,
TimeSHAP revealed differences in impactful sound event types and timings for
different individuals.
[LINK]
http://arxiv.org/abs/2404.10299v1
[DATE]
2024-04-16 13:56:41+08:00
[CATEGORIES]
cs.LG
Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration
[AUTHORS]
Chanwook Park, Sourav Saha, Jiachen Guo, Xiaoyu Xie, Satyajit Mojumder, Miguel A. Bessa, Dong Qian, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam Liu
[ABSTRACT]
The evolution of artificial intelligence (AI) and neural network theories has
revolutionized the way software is programmed, shifting from hard-coded
instructions to vast neural networks. However, this transition in
engineering software has faced challenges such as data scarcity, multi-modality
of data, low model accuracy, and slow inference. Here, we propose a new network
based on interpolation theories and tensor decomposition, the interpolating
neural network (INN). Instead of interpolating training data, a common notion
in computer science, INN interpolates interpolation points in the physical
space whose coordinates and values are trainable. It can also extrapolate if
the interpolation points reside outside of the range of training data and the
interpolation functions have a larger support domain. INN features orders of
magnitude fewer trainable parameters, faster training, a smaller memory
footprint, and higher model accuracy compared to feed-forward neural networks
(FFNN) or physics-informed neural networks (PINN). INN is poised to usher in
Engineering Software 2.0, a unified neural network that spans various domains
of space, time, parameters, and initial/boundary conditions. This has
previously been computationally prohibitive due to the exponentially growing
number of trainable parameters, easily exceeding the parameter size of ChatGPT,
which is over 1 trillion. INN addresses this challenge by leveraging tensor
decomposition and tensor product, with adaptable network architecture.
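The parameter blow-up that tensor decomposition avoids is easy to quantify: a full tensor-product grid with $n$ nodes in each of $D$ dimensions has $n^D$ nodal values, whereas a rank-$R$ CP (canonical polyadic) format stores $R \cdot D \cdot n$. The sketch below evaluates such a CP-format interpolant with trainable nodal values using piecewise-linear interpolation in each dimension. The choice of CP format and linear shape functions is an illustrative assumption, not a description of the INN architecture.

```python
import numpy as np


def cp_interpolant(x, grids, factors):
    """Evaluate f(x) = sum_r prod_d interp(x_d; grids[d], factors[d][r]).

    x: (B, D) query points; grids: D sorted 1-D node arrays;
    factors: D arrays of shape (R, n_d) holding trainable nodal values.
    np.interp is piecewise linear and clamps outside the grid, so this
    sketch does not extrapolate.
    """
    B, D = x.shape
    R = factors[0].shape[0]
    prod = np.ones((B, R))
    for d in range(D):
        vals = np.stack(
            [np.interp(x[:, d], grids[d], factors[d][r]) for r in range(R)], axis=1
        )
        prod *= vals  # multiply in the contribution of dimension d
    return prod.sum(axis=1)  # sum over the R rank-one terms


def param_counts(n_nodes, dim, rank):
    """Nodal values for a full tensor grid vs. a rank-`rank` CP format."""
    return n_nodes ** dim, rank * dim * n_nodes


# (2x + 1) * 3y is rank one and linear along each axis, so the interpolant
# reproduces it exactly at arbitrary points in [0, 1]^2.
grid = np.linspace(0.0, 1.0, 11)
factors = [np.array([2 * grid + 1]), np.array([3 * grid])]
pts = np.random.default_rng(0).uniform(size=(20, 2))
approx = cp_interpolant(pts, [grid, grid], factors)
```

For example, `param_counts(100, 6, 10)` gives $10^{12}$ values for the full grid but only 6000 in CP format, which is the kind of gap that makes high-dimensional space-time-parameter models tractable.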
[COMMENTS]
9 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.10296v1
[DATE]
2024-04-16 13:40:30+08:00
[CATEGORIES]
cs.LG
Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
[AUTHORS]
Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu
[ABSTRACT]
Inductive biases are crucial in disentangled representation learning for
narrowing down an underspecified solution set. In this work, we consider
endowing a neural network autoencoder with three select inductive biases from
the literature: data compression into a grid-like latent space via
quantization, collective independence amongst latents, and minimal functional
influence of any latent on how other latents determine data generation. In
principle, these inductive biases are deeply complementary: they most directly
specify properties of the latent space, encoder, and decoder, respectively. In
practice, however, naively combining existing techniques instantiating these
inductive biases fails to yield significant benefits. To address this, we
propose adaptations to the three techniques that simplify the learning problem,
equip key regularization terms with stabilizing invariances, and quash
degenerate incentives. The resulting model, Tripod, achieves state-of-the-art
results on a suite of four image disentanglement benchmarks. We also verify
that Tripod significantly improves upon its naive incarnation and that all
three of its “legs” are necessary for best performance.
[COMMENTS]
22 pages, 10 figures, code available at
https://github.com/kylehkhsu/tripod
[LINK]
http://arxiv.org/abs/2404.10282v1
[DATE]
2024-04-16 12:52:41+08:00
[CATEGORIES]
cs.LG
DE-HNN: An effective neural model for Circuit Netlist representation
[AUTHORS]
Zhishang Luo, Truong Son Hy, Puoya Tabaghi, Donghyeon Koh, Michael Defferrard, Elahe Rezaei, Ryan Carey, Rhett Davis, Rajeev Jain, Yusu Wang
[ABSTRACT]
The run-time of optimization tools used in chip design has grown with the
complexity of designs to the point where a single design cycle can take several
days, which has become a bottleneck. Designers want fast tools that
can quickly give feedback on a design. Using the input and output data of the
tools from past designs, one can attempt to build a machine learning model that
predicts the outcome of a design in significantly shorter time than running the
tool. The accuracy of such models is affected by the representation of the
design data, which is usually a netlist that describes the elements of the
digital circuit and how they are connected. Graph representations for the
netlist together with graph neural networks have been investigated for such
models. However, the characteristics of netlists pose several challenges for
existing graph learning frameworks, due to the large number of nodes and the
importance of long-range interactions between nodes. To address these
challenges, we represent the netlist as a directed hypergraph and propose a
Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective
learning of (directed) hypergraphs. Theoretically, we show that our DE-HNN can
universally approximate any node or hyperedge based function that satisfies
certain permutation equivariant and invariant properties natural for directed
hypergraphs. We compare the proposed DE-HNN with several state-of-the-art
(SOTA) machine learning models for (hyper)graphs and netlists, and show that
the DE-HNN significantly outperforms them in predicting the outcome of
optimized place-and-route tools directly from the input netlists. Our source
code and the netlists data used are publicly available at
https://github.com/YusuLab/chips.git
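To make the directed-hypergraph view concrete: each net has one driver cell and a set of sink cells, and a direction-aware layer can treat the two roles with separate weights. The sketch below is a minimal, untrained illustration with hypothetical weight matrices and a `tanh` nonlinearity. Averaging over sinks makes the output independent of the order in which sinks are listed, echoing the permutation properties mentioned above. It is not the DE-HNN architecture.

```python
import numpy as np


def directed_hypergraph_layer(X, nets, W_drv, W_snk, W_out_drv, W_out_snk):
    """One round of direction-aware message passing on a netlist.

    X: (n_cells, F) cell features; nets: list of (driver, [sinks]) pairs.
    Cell -> net: the driver and the mean of the sinks use separate weights.
    Net -> cell: drivers and sinks receive the net state through different weights.
    """
    n, F = X.shape
    agg = np.zeros((n, W_out_drv.shape[1]))
    deg = np.zeros(n)
    for driver, sinks in nets:
        sink_mean = X[sinks].mean(axis=0) if sinks else np.zeros(F)
        net_state = np.tanh(X[driver] @ W_drv + sink_mean @ W_snk)  # hyperedge state
        agg[driver] += net_state @ W_out_drv
        deg[driver] += 1
        for s in sinks:
            agg[s] += net_state @ W_out_snk
            deg[s] += 1
    return agg / np.maximum(deg, 1)[:, None]  # mean over incident nets


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))     # 4 cells, 3 features each
nets = [(0, [1, 2]), (2, [3])]  # (driver, sinks) for each net
W_drv, W_snk = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))
W_out_drv, W_out_snk = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
out = directed_hypergraph_layer(X, nets, W_drv, W_snk, W_out_drv, W_out_snk)
```

Cell 2 drives one net and sinks another, so its output mixes both roles; an undirected hypergraph layer would not distinguish them.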
[LINK]
http://arxiv.org/abs/2404.00477v3
[DATE]
2024-04-16 12:47:23+08:00
[CATEGORIES]
cs.LG
Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous Graphs
[AUTHORS]
Pengfei Ding, Yan Wang, Guanfeng Liu, Nan Wang, Xiaofang Zhou
[ABSTRACT]
Heterogeneous graph few-shot learning (HGFL) has been developed to address
the label sparsity issue in heterogeneous graphs (HGs), which consist of
various types of nodes and edges. The core concept of HGFL is to extract
knowledge from rich-labeled classes in a source HG, transfer this knowledge to
a target HG to facilitate learning new classes with few-labeled training data,
and finally make predictions on unlabeled testing data. Existing methods
typically assume that the source HG, training data, and testing data all share
the same distribution. However, in practice, distribution shifts among these
three types of data are inevitable due to two reasons: (1) the limited
availability of the source HG that matches the target HG distribution, and (2)
the unpredictable data generation mechanism of the target HG. Such distribution
shifts result in ineffective knowledge transfer and poor learning performance
in existing methods, thereby leading to a novel problem of out-of-distribution
(OOD) generalization in HGFL. To address this challenging problem, we propose a
novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In
COHF, we first characterize distribution shifts in HGs with a structural causal
model, establishing an invariance principle for OOD generalization in HGFL.
Then, following this invariance principle, we propose a new variational
autoencoder-based heterogeneous graph neural network to mitigate the impact of
distribution shifts. Finally, by integrating this network with a novel
meta-learning framework, COHF effectively transfers knowledge to the target HG
to predict new classes with few-labeled data. Extensive experiments on seven
real-world datasets have demonstrated the superior performance of COHF over the
state-of-the-art methods.
[LINK]
http://arxiv.org/abs/2401.03597v3
[DATE]
2024-04-16 12:36:18+08:00
[CATEGORIES]
cs.LG
OptiGrad: A Fair and more Efficient Price Elasticity Optimization via a Gradient Based Learning
[AUTHORS]
Vincent Grari, Marcin Detyniecki
[ABSTRACT]
This paper presents a novel approach to optimizing profit margins in non-life
insurance markets through a gradient descent-based method, targeting three key
objectives: 1) maximizing profit margins, 2) ensuring conversion rates, and 3)
enforcing fairness criteria such as demographic parity (DP). Traditional
pricing optimization, which leans heavily on linear and semidefinite
programming, encounters challenges in balancing profitability and fairness.
These challenges become especially pronounced in situations that necessitate
continuous rate adjustments and the incorporation of fairness criteria.
Specifically, indirect Ratebook optimization, a widely used method for
new-business price setting, relies on predictor models such as XGBoost or
GLMs/GAMs to estimate downstream individually optimized prices. However, this strategy
is prone to sequential errors and struggles to effectively manage optimizations
for continuous rate scenarios. In practice, to save time, actuaries frequently
opt for optimization within discrete intervals (e.g., a range of [-20\%, +20\%]
with fixed increments), leading to approximate estimates. Moreover, to
circumvent infeasible solutions they often use relaxed constraints leading to
suboptimal pricing strategies. The reverse-engineered nature of traditional
models complicates the enforcement of fairness and can lead to biased outcomes.
Our method addresses these challenges by employing a direct optimization
strategy in the continuous space of rates and by embedding fairness through an
adversarial predictor model. This innovation not only reduces sequential errors
and simplifies the complexities found in traditional models but also directly
integrates fairness measures into the commercial premium calculation. We
demonstrate improved margin performance and stronger enforcement of fairness,
highlighting the critical need to evolve existing pricing strategies.
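A toy version of direct optimization in the continuous space of rates can be written as plain gradient ascent. The sketch assumes a hypothetical logistic conversion model $P(\text{convert} \mid r) = \sigma(a - b r)$, prices of the form $\text{cost} \cdot (1 + r)$, and, in place of the paper's adversarial predictor, a simple closed-form penalty on the gap between group-average rates. It only illustrates how a fairness term trades margin for parity; it is not OptiGrad.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def optimize_rates(cost, a, group, b=5.0, lam=0.0, lr=0.02, steps=2000):
    """Gradient ascent on continuous rate changes r, with price = cost * (1 + r).

    Hypothetical conversion model: P(convert | r) = sigmoid(a - b * r).
    Objective: sum_i P_i * cost_i * r_i (expected margin)
               - lam * (mean r over group 1 - mean r over group 0) ** 2.
    """
    r = np.zeros_like(cost, dtype=float)
    g1, g0 = group == 1, group == 0
    for _ in range(steps):
        s = sigmoid(a - b * r)
        # d/dr of s * cost * r, using ds/dr = -b * s * (1 - s)
        d_margin = cost * s * (1.0 - b * r * (1.0 - s))
        gap = r[g1].mean() - r[g0].mean()
        # d/dr of lam * gap**2 for members of each group
        d_pen = np.where(g1, 2 * lam * gap / g1.sum(), -2 * lam * gap / g0.sum())
        r += lr * (d_margin - d_pen)
    return r


cost = np.array([1.0, 1.0])  # expected cost per policy (hypothetical)
a = np.array([2.0, 0.0])     # group 1 has a higher baseline conversion propensity
group = np.array([1, 0])
unconstrained = optimize_rates(cost, a, group)
fair = optimize_rates(cost, a, group, lam=10.0)
```

Without the penalty, the two customers settle at clearly different rates (about 0.40 and 0.26); with `lam=10.0` the gap shrinks to a small fraction of that, at some cost in expected margin. The step size must stay below roughly $2 / (4\lambda)$ for this penalty to keep plain gradient ascent stable.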
[COMMENTS]
17 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.10275v1
[DATE]
2024-04-16 12:21:59+08:00
[CATEGORIES]
cs.LG
Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling
[AUTHORS]
Tara Kelly, Jessica Gupta
[ABSTRACT]
Traffic congestion at intersections is a significant issue in urban areas,
leading to increased commute times, safety hazards, and operational
inefficiencies. This study aims to develop a predictive model for congestion at
intersections in major U.S. cities, utilizing a dataset of trip-logging metrics
from commercial vehicles across 4,800 intersections. The dataset encompasses 27
features, including intersection coordinates, street names, time of day, and
traffic metrics (Kashyap et al., 2019). Additional features, such as
rainfall/snowfall percentage, distance from downtown and outskirts, and road
types, were incorporated to enhance the model’s predictive power. The
methodology involves data exploration, feature transformation, and handling
missing values through low-rank models and label encoding. The proposed model
has the potential to assist city planners and governments in anticipating
traffic hot spots, optimizing operations, and identifying infrastructure
challenges.
[LINK]
http://arxiv.org/abs/2404.08838v2
[DATE]
2024-04-16 12:20:32+08:00
[CATEGORIES]
cs.LG
Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso
[AUTHORS]
R V Raghavendra Rao, U Srinivasulu Reddy
[ABSTRACT]
The challenge of imbalanced soil nutrient datasets significantly hampers
accurate predictions of soil fertility. To tackle this, this research proposes
a new method combining Uniform Manifold Approximation and
Projection (UMAP) with Least Absolute Shrinkage and Selection Operator (LASSO).
The main aim is to counter the impact of uneven data distribution and improve
soil fertility models’ predictive precision. The model introduced uses Sparse
Attention Regression, effectively incorporating pertinent features from the
imbalanced dataset. UMAP is utilized initially to reduce data complexity,
unveiling hidden structures and important patterns. Following this, LASSO is
applied to refine features and enhance the model’s interpretability. The
experimental outcomes highlight the effectiveness of the UMAP and LASSO hybrid
approach. The proposed model achieves outstanding performance metrics, reaching
a predictive accuracy of 98%, demonstrating its capability in accurate soil
fertility predictions. Additionally, it showcases a Precision of 91.25%,
indicating its adeptness in identifying fertile soil instances accurately. The
Recall metric stands at 90.90%, emphasizing the model’s ability to capture true
positive cases effectively.
[LINK]
http://arxiv.org/abs/2404.10274v1
[DATE]
2024-04-16 12:17:17+08:00
[CATEGORIES]
cs.LG
Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach
[AUTHORS]
Juan C. Mejía-Fragoso, Manuel A. Florez, Rocío Bernal-Olaya
[ABSTRACT]
Accurate determination of the geothermal gradient is critical for assessing
the geothermal energy potential of a given region. Of particular interest is
the case of Colombia, a country with abundant geothermal resources. A history
of active oil and gas exploration and production has left drilled boreholes in
different geological settings, providing direct measurements of the geothermal
gradient. Unfortunately, large regions of the country where geothermal
resources might exist lack such measurements. Indirect geophysical measurements
are costly and difficult to perform at regional scales. Computational thermal
models could be constructed, but they require very detailed knowledge of the
underlying geology and uniform sampling of subsurface temperatures to be
well-constrained. We present an alternative approach that leverages recent
advances in supervised machine learning and available direct measurements to
predict the geothermal gradient in regions where only global-scale geophysical
datasets and coarse geological knowledge are available. We find that a Gradient
Boosted Regression Tree algorithm yields the best predictions, and we extensively
validate the trained model. We show that the predictions of our model are
accurate to within 12\% and that independent measurements performed by other
authors agree well with our model. Finally, we present a geothermal gradient map
for Colombia that highlights regions where further exploration and data
collection should be performed.
[COMMENTS]
This is the version we re-submitted to the journal after addressing
all the peer review requirements
[LINK]
http://arxiv.org/abs/2404.05184v3
[DATE]
2024-04-16 11:48:27+08:00
[CATEGORIES]
cs.LG
Federated Multi-Task Learning on Non-IID Data Silos: An Experimental Study
[AUTHORS]
Yuwen Yang, Yuxiang Lu, Suizhi Huang, Shalayiding Sirejiding, Hongtao Lu, Yue Ding
[ABSTRACT]
The innovative Federated Multi-Task Learning (FMTL) approach consolidates the
benefits of Federated Learning (FL) and Multi-Task Learning (MTL), enabling
collaborative model training on multi-task learning datasets. However, a
comprehensive evaluation method, integrating the unique features of both FL and
MTL, is currently absent in the field. This paper fills this void by
introducing a novel framework, FMTL-Bench, for systematic evaluation of the
FMTL paradigm. This benchmark covers various aspects at the data, model, and
optimization algorithm levels, and comprises seven sets of comparative
experiments, encapsulating a wide array of non-independent and identically
distributed (Non-IID) data partitioning scenarios. We propose a systematic
process for comparing baselines of diverse indicators and conduct a case study
on communication expenditure, time, and energy consumption. Through our
exhaustive experiments, we aim to provide valuable insights into the strengths
and limitations of existing baseline methods, contributing to the ongoing
discourse on optimal FMTL application in practical scenarios. The source code
is available at https://github.com/youngfish42/FMTL-Benchmark.
[COMMENTS]
Accepted by ICMR’24
[LINK]
http://arxiv.org/abs/2402.12876v2
[DATE]
2024-04-16 11:48:17+08:00
[CATEGORIES]
cs.LG
Attention-based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine from MR Images
[AUTHORS]
Linchen Qian, Jiasong Chen, Linhai Ma, Timur Urakov, Weiyong Gu, Liang Liang
[ABSTRACT]
Lumbar disc degeneration, the progressive structural wear and tear of the lumbar
intervertebral disc, is regarded as a major contributor to low back pain, a
significant global health concern. Automated lumbar spine geometry
reconstruction from MR images will enable fast measurement of medical
parameters to evaluate the lumbar status, in order to determine a suitable
treatment. Existing image segmentation-based techniques often generate
erroneous segments or unstructured point clouds, unsuitable for medical
parameter measurement. In this work, we present TransDeformer: a novel
attention-based deep learning approach that reconstructs the geometry of the
lumbar spine with high spatial accuracy and mesh correspondence across
patients, and we also present a variant of TransDeformer for error estimation.
Specifically, we devise new attention modules with a new attention formula, which
integrate image features and tokenized contour features to predict the
displacements of the points on a shape template without the need for image
segmentation. The deformed template reveals the lumbar spine geometry in an
image. Experimental results show that our TransDeformer generates artifact-free
geometry outputs, and its variant predicts the error of a reconstructed
geometry. Our code is available at
https://github.com/linchenq/TransDeformer-Mesh.
[LINK]
http://arxiv.org/abs/2404.00231v2
[DATE]
2024-04-16 11:38:31+08:00
[CATEGORIES]
cs.LG
Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport
[AUTHORS]
Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac
[ABSTRACT]
In this paper, we tackle Multi-Source Domain Adaptation (MSDA), a task in
transfer learning where one adapts multiple heterogeneous, labeled source
probability measures towards a different, unlabeled target measure. We propose
a novel framework for MSDA, based on Optimal Transport (OT) and Gaussian
Mixture Models (GMMs). Our framework has two key advantages. First, OT between
GMMs can be solved efficiently via linear programming. Second, it provides a
convenient model for supervised learning, especially classification, as
components in the GMM can be associated with existing classes. Based on the
GMM-OT problem, we propose a novel technique for calculating barycenters of
GMMs. Based on this novel algorithm, we propose two new strategies for MSDA:
GMM-WBT and GMM-DaDiL. We empirically evaluate our proposed methods on four
benchmarks in image classification and fault diagnosis, showing that we improve
over the prior art while being faster and involving fewer parameters.
[COMMENTS]
Under review
[LINK]
http://arxiv.org/abs/2404.10261v1
[DATE]
2024-04-16 11:31:28+08:00
[CATEGORIES]
cs.LG
The Marginal Value of Momentum for Small Learning Rate SGD
[AUTHORS]
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
[ABSTRACT]
Momentum is known to accelerate the convergence of gradient descent in
strongly convex settings without stochastic gradient noise. In stochastic
optimization, such as training neural networks, folklore suggests that momentum
may help deep learning optimization by reducing the variance of the stochastic
gradient update, but previous theoretical analyses do not find momentum to
offer any provable acceleration. Theoretical results in this paper clarify the
role of momentum in stochastic settings where the learning rate is small and
gradient noise is the dominant source of instability, suggesting that SGD with
and without momentum behave similarly in the short and long time horizons.
Experiments show that momentum indeed has limited benefits for both
optimization and generalization in practical training regimes where the optimal
learning rate is not very large, including small- to medium-batch training from
scratch on ImageNet and fine-tuning language models on downstream tasks.
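As a toy illustration of the claim (not the authors' analysis), the sketch below runs heavy-ball SGD with momentum beta at learning rate lr, and plain SGD at the effective learning rate lr / (1 - beta), a correspondence commonly used when comparing the two. The noisy 1-D quadratic objective, the Gaussian gradient noise, and all constants are hypothetical choices made for illustration.

```python
import random


def run(lr, beta, steps=100_000, curvature=1.0, noise=1.0, seed=0):
    """Minimize the toy objective f(x) = curvature / 2 * x^2 using noisy gradients.

    Heavy-ball update: v <- beta * v + g;  x <- x - lr * v.
    With beta = 0 this reduces to plain SGD.
    Returns the mean of x^2 over the second half of the run, a proxy for the
    stationary spread of the iterates around the minimum at x = 0.
    """
    rng = random.Random(seed)
    x, v = 5.0, 0.0
    tail = []
    for t in range(steps):
        g = curvature * x + noise * rng.gauss(0.0, 1.0)  # stochastic gradient
        v = beta * v + g
        x = x - lr * v
        if t >= steps // 2:  # discard the first half as burn-in
            tail.append(x * x)
    return sum(tail) / len(tail)


beta = 0.9
lr = 0.001
spread_momentum = run(lr, beta)
spread_sgd = run(lr / (1 - beta), 0.0)  # plain SGD at the matched effective learning rate
print(f"momentum: {spread_momentum:.5f}  plain SGD: {spread_sgd:.5f}")
```

With a small learning rate, the two runs settle to nearly the same spread around the minimum; this toy comparison is only meant to hold in that small-learning-rate, noise-dominated regime.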
[LINK]
http://arxiv.org/abs/2307.15196v2
[DATE]
2024-04-16 11:25:54+08:00
[CATEGORIES]
cs.LG
Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration
[AUTHORS]
Kotaro Yoshida, Hiroki Naganuma
[ABSTRACT]
Machine learning models traditionally assume that training and test data are
independently and identically distributed. However, in real-world applications,
the test distribution often differs from training. This problem, known as
out-of-distribution generalization, challenges conventional models. Invariant
Risk Minimization (IRM) emerges as a solution, aiming to identify features
invariant across different environments to enhance out-of-distribution
robustness. However, IRM’s complexity, particularly its bi-level optimization,
has led to the development of various approximate methods. Our study
investigates these approximate IRM techniques, employing the Expected
Calibration Error (ECE) as a key metric. ECE, which measures the reliability of
model prediction, serves as an indicator of whether models effectively capture
environment-invariant features. Through a comparative analysis of datasets with
distributional shifts, we observe that Information Bottleneck-based IRM, which
condenses representational information, achieves a balance, improving ECE
while preserving accuracy relatively well. This finding is pivotal, as it
demonstrates a feasible path to maintaining robustness without compromising
accuracy. Nonetheless, our experiments also caution against
over-regularization, which can diminish accuracy. This underscores the
necessity for a systematic approach to evaluating out-of-distribution
generalization metrics, one that goes beyond mere accuracy to address the
nuanced interplay between accuracy and calibration.
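Since the study relies on the Expected Calibration Error as its key metric, a minimal sketch of the standard equal-width binned estimator may help. The bin count of 10 and the binning convention (a confidence of exactly 1.0 falls into the last bin) are assumptions; the paper may use a different ECE variant.

```python
def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Equal-width binned ECE: sum over bins B of (|B| / n) * |acc(B) - conf(B)|."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        # Map a confidence in [0, 1] to a bin; 1.0 is clamped into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == label))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, correct in b if correct) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


# Four toy predictions: two confident and correct, two wrong.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1], [1, 1, 0, 1]))
```

A perfectly calibrated model has ECE 0; overconfident wrong predictions, as in the toy example, push it upward.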
[LINK]
http://arxiv.org/abs/2401.17541v3
[DATE]
2024-04-16 11:15:45+08:00
[CATEGORIES]
cs.LG
Multi-Constraint Safe RL with Objective Suppression for Safety-Critical Applications
[AUTHORS]
Zihan Zhou, Jonathan Booher, Khashayar Rohanimanesh, Wei Liu, Aleksandr Petiushko, Animesh Garg
[ABSTRACT]
Safe reinforcement learning tasks with multiple constraints are a challenging
domain despite being very common in the real world. In safety-critical domains,
properly handling the constraints becomes even more important. To address this
challenge, we first describe the multi-constraint problem with a stronger
Uniformly Constrained MDP (UCMDP) model; we then propose Objective Suppression,
a novel method that adaptively suppresses the task-reward-maximizing objectives
according to a safety critic, as a solution to the Lagrangian dual of a UCMDP.
We benchmark Objective Suppression in two multi-constraint safety domains,
including an autonomous driving domain where any incorrect behavior can lead to
disastrous consequences. Empirically, we demonstrate that our proposed method,
when combined with existing safe RL algorithms, can match the task reward
achieved by our baselines with significantly fewer constraint violations.
[LINK]
http://arxiv.org/abs/2402.15650v2
[DATE]
2024-04-16 11:00:51+08:00
[CATEGORIES]
cs.LG
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach
[AUTHORS]
Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
[ABSTRACT]
In this paper, we propose an online convex optimization approach with two
different levels of adaptivity. On a higher level, our approach is agnostic to
the unknown types and curvatures of the online functions, while at a lower
level, it can exploit the unknown niceness of the environments and attain
problem-dependent guarantees. Specifically, we obtain $\mathcal{O}(\log V_T)$,
$\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for
strongly convex, exp-concave and convex loss functions, respectively, where $d$
is the dimension, $V_T$ denotes problem-dependent gradient variations and the
$\hat{\mathcal{O}}(\cdot)$-notation omits $\log V_T$ factors. Our result not
only safeguards the worst-case guarantees but also directly implies the
small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic
convex optimization and game theory problems, our result enhances the existing
universal guarantees. Our approach is based on a multi-layer online ensemble
framework incorporating novel ingredients, including a carefully designed
optimism for unifying diverse function types and cascaded corrections for
algorithmic stability. Notably, despite its multi-layer structure, our
algorithm necessitates only one gradient query per round, making it favorable
when the gradient evaluation is time-consuming. This is facilitated by a novel
regret decomposition equipped with carefully designed surrogate losses.
[COMMENTS]
NeurIPS 2023
[LINK]
http://arxiv.org/abs/2307.08360v3
[DATE]
2024-04-16 10:58:34+08:00
[CATEGORIES]
cs.LG
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
[AUTHORS]
Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
[ABSTRACT]
Featurizing microscopy images for use in biological research remains a
significant challenge, especially for large-scale experiments spanning millions
of images. This work explores the scaling properties of weakly supervised
classifiers and self-supervised masked autoencoders (MAEs) when training with
increasingly larger model backbones and microscopy datasets. Our results show
that ViT-based MAEs outperform weakly supervised classifiers on a variety of
tasks, achieving as much as an 11.5% relative improvement when recalling known
biological relationships curated from public databases. Additionally, we
develop a new channel-agnostic MAE architecture (CA-MAE) that allows for
inputting images of different numbers and orders of channels at inference time.
We demonstrate that CA-MAEs effectively generalize by inferring and evaluating
on a microscopy image dataset (JUMP-CP) generated under different experimental
conditions with a different channel structure than our pretraining data
(RPI-93M). Our findings motivate continued research into scaling
self-supervised learning on microscopy data in order to create powerful
foundation models of cellular biology that have the potential to catalyze
advancements in drug discovery and beyond.
[COMMENTS]
CVPR 2024 Highlight. arXiv admin note: text overlap with
arXiv:2309.16064
[LINK]
http://arxiv.org/abs/2404.10242v1
[DATE]
2024-04-16 10:42:06+08:00
[CATEGORIES]
cs.LG
Harmonizing SO(3)-Equivariance with Neural Expressiveness: a Hybrid Deep Learning Framework Oriented to the Prediction of Electronic Structure Hamiltonian
[AUTHORS]
Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He
[ABSTRACT]
Deep learning for predicting the electronic structure Hamiltonian of quantum
systems necessitates satisfying the covariance laws, among which achieving
SO(3)-equivariance without sacrificing the non-linear expressive capability of
networks remains unsolved. To navigate the harmonization between equivariance
and expressiveness, we propose a deep learning method synergizing two distinct
categories of neural mechanisms as a two-stage cascaded regression framework.
The first stage corresponds to group theory-based neural mechanisms with
inherent SO(3)-equivariant properties prior to the parameter learning process,
while the second stage is characterized by a non-linear 3D graph Transformer
network we propose featuring high capability on non-linear expressiveness. The
novelty of the combination lies in the fact that the first stage predicts baseline
Hamiltonians with abundant SO(3)-equivariant features extracted, assisting the
second stage in empirical learning of equivariance; and in turn, the second
stage refines the first stage’s output as a fine-grained prediction of
Hamiltonians using powerful non-linear neural mappings, compensating for the
intrinsic weakness in the non-linear expressive capability of the mechanisms in the
first stage. Our method enables precise, generalizable predictions while
maintaining robust SO(3)-equivariance under rotational transformations, and
achieves state-of-the-art performance in Hamiltonian prediction on six
benchmark databases.
[LINK]
http://arxiv.org/abs/2401.00744v8
[DATE]
2024-04-16 10:04:29+08:00
[CATEGORIES]
cs.LG
LatticeML: A data-driven application for predicting the effective Young Modulus of high temperature graph based architected materials
[AUTHORS]
Akshansh Mishra
[ABSTRACT]
Architected materials, with their unique topology and geometry, offer the
potential to modify physical and mechanical properties. Machine learning can
accelerate the design and optimization of these materials by identifying
optimal designs and forecasting performance. This work presents LatticeML, a
data-driven application for predicting the effective Young’s Modulus of
high-temperature graph-based architected materials. The study considers eleven
graph-based lattice structures with two high-temperature alloys, Ti-6Al-4V and
Inconel 625. Finite element simulations were used to compute the effective
Young’s Modulus of the 2x2x2 unit cell configurations. A machine learning
framework was developed to predict Young’s Modulus, involving data collection,
preprocessing, implementation of regression models, and deployment of the
best-performing model. Five supervised learning algorithms were evaluated, with
the XGBoost Regressor achieving the highest accuracy (MSE = 2.7993, MAE =
1.1521, R-squared = 0.9875). The application uses the Streamlit framework to
create an interactive web interface, allowing users to input material and
geometric parameters and obtain predicted Young’s Modulus values.
[COMMENTS]
32 pages, 11 figures
[LINK]
http://arxiv.org/abs/2404.09470v2
[DATE]
2024-04-16 09:52:45+08:00
[CATEGORIES]
cs.LG
Anomaly Correction of Business Processes Using Transformer Autoencoder
[AUTHORS]
Ziyou Gong, Xianwen Fang, Ping Wu
[ABSTRACT]
An event log records all events that occur during the execution of business
processes, so detecting and correcting anomalies in event logs can provide a
reliable guarantee for subsequent process analysis. Previous works mainly
include next-event-prediction-based methods and autoencoder-based methods.
These methods cannot detect and correct anomalies accurately and efficiently at
the same time, and they all rely on a preset threshold to detect
anomalies. To solve these problems, we propose a business process anomaly
correction method based on Transformer autoencoder. By using self-attention
mechanism and autoencoder structure, it can efficiently process event sequences
of arbitrary length, and can directly output corrected business process
instances, so that it can adapt to various scenarios. At the same time, the
anomaly detection is transformed into a classification problem by means of
self-supervised learning, so that there is no need to set a specific threshold
in anomaly detection. The experimental results on several real-life event logs
show that the proposed method is superior to the previous methods in terms of
anomaly detection accuracy and anomaly correction results while ensuring high
running efficiency.
[LINK]
http://arxiv.org/abs/2404.10211v1
[DATE]
2024-04-16 09:45:18+08:00
[CATEGORIES]
cs.LG
Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models
[AUTHORS]
Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen
[LINK]
http://arxiv.org/abs/2404.10209v1
[DATE]
2024-04-16 09:38:34+08:00
[CATEGORIES]
cs.LG
HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system
[AUTHORS]
Ruibo Yang, Jiazhou Wang, Andrew Mullhaupt
[ABSTRACT]
In this paper, we study the stochastic multi-armed bandit problem, where the
reward is driven by an unknown random variable. We propose a new variant of the
Upper Confidence Bound (UCB) algorithm called Hellinger-UCB, which leverages
the squared Hellinger distance to build the upper confidence bound. We prove
that the Hellinger-UCB reaches the theoretical lower bound. We also show that
the Hellinger-UCB has a solid statistical interpretation. We show that
Hellinger-UCB is effective in finite time horizons with numerical experiments
between Hellinger-UCB and other variants of the UCB algorithm. As a real-world
example, we apply the Hellinger-UCB algorithm to solve the cold-start problem
for a content recommender system of a financial app. Under reasonable
assumptions, the Hellinger-UCB algorithm offers the convenient and important
property of lower latency. The online experiment also illustrates that the Hellinger-UCB
outperforms both KL-UCB and UCB1 in the sense of a higher click-through rate
(CTR).
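The abstract does not give the exact Hellinger-UCB index, so the sketch below shows only the ingredients, under stated assumptions: the squared Hellinger distance between two Bernoulli reward distributions, the classic UCB1 index used as a baseline, and the generic shape of a divergence-based upper confidence bound (the largest mean q whose distance from the empirical mean stays below a threshold). The function `divergence_upper_bound` and its threshold `delta` are hypothetical; the paper's actual exploration schedule is not reproduced here.

```python
import math


def hellinger_sq(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q),
    using the convention H^2 = 1 - sum_i sqrt(p_i * q_i), which lies in [0, 1]."""
    bc = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))  # Bhattacharyya coefficient
    return 1.0 - bc


def ucb1_index(mean, pulls, t):
    """Classic UCB1 index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def divergence_upper_bound(mean, delta, tol=1e-9):
    """Hypothetical divergence-based bound: largest q in [mean, 1] with
    hellinger_sq(mean, q) <= delta, found by bisection. This works because
    hellinger_sq(mean, q) is nondecreasing in q for q >= mean."""
    if hellinger_sq(mean, 1.0) <= delta:
        return 1.0
    lo, hi = mean, 1.0  # invariant: lo satisfies the constraint, hi does not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hellinger_sq(mean, mid) <= delta:
            lo = mid
        else:
            hi = mid
    return lo


print(ucb1_index(0.3, 10, 100), divergence_upper_bound(0.3, 0.01))
```

In a full bandit loop, one would pick the arm with the largest index each round; how `delta` should shrink with the pull count is exactly the part the paper specifies and this sketch leaves open.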
[LINK]
http://arxiv.org/abs/2404.10207v1
[DATE]
2024-04-16 09:20:51+08:00
[CATEGORIES]
cs.LG
Towards a Novel Perspective on Adversarial Examples Driven by Frequency
[AUTHORS]
Zhun Zhang, Yi Zeng, Qihe Liu, Shijie Zhou
[ABSTRACT]
Enhancing our understanding of adversarial examples is crucial for the secure
application of machine learning models in real-world scenarios. A prevalent
method for analyzing adversarial examples is through a frequency-based
approach. However, existing research indicates that attacks designed to exploit
low-frequency or high-frequency information can enhance attack performance,
leading to an unclear relationship between adversarial perturbations and
different frequency components. In this paper, we seek to demystify this
relationship by exploring the characteristics of adversarial perturbations
within the frequency domain. We employ wavelet packet decomposition for
detailed frequency analysis of adversarial examples and conduct statistical
examinations across various frequency bands. Intriguingly, our findings
indicate that significant adversarial perturbations are present within the
high-frequency components of low-frequency bands. Drawing on this insight, we
propose a black-box adversarial attack algorithm based on combining different
frequency bands. Experiments conducted on multiple datasets and models
demonstrate that combining low-frequency bands and high-frequency components of
low-frequency bands can significantly enhance attack efficiency. The average
attack success rate reaches 99\%, surpassing attacks that utilize a single
frequency segment. Additionally, we introduce the normalized disturbance
visibility index as a solution to the limitations of $L_2$ norm in assessing
continuous and discrete perturbations.
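To make "high-frequency components of low-frequency bands" concrete, here is a minimal 1-D Haar wavelet packet decomposition in pure Python. It is a simplified sketch under assumptions: the paper analyzes images, and the abstract does not state which wavelet it uses. Node paths read left to right, so "ad" is the detail (high-pass) part of the approximation (low-pass) band.

```python
import math


def haar_split(x):
    """One orthonormal Haar analysis step: return (low-pass, high-pass) halves."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(x, level):
    """Full Haar wavelet packet tree down to `level`.

    Unlike the plain wavelet transform, detail ('d') branches are split again
    as well as approximation ('a') branches. Returns {path: coefficients}.
    """
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    nodes = {"": list(x)}
    for _ in range(level):
        nxt = {}
        for path, sig in nodes.items():
            low, high = haar_split(sig)
            nxt[path + "a"] = low
            nxt[path + "d"] = high
        nodes = nxt
    return nodes


# Energy per band for a toy signal; the orthonormal transform preserves total energy.
tree = wavelet_packet([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0], 2)
for path, coeffs in sorted(tree.items()):
    print(path, sum(c * c for c in coeffs))
```

Because the Haar filters are orthonormal, the band energies sum to the signal's energy, which makes per-band statistics like those in the abstract directly comparable.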
[LINK]
http://arxiv.org/abs/2404.10202v1
[DATE]
2024-04-16 08:58:46+08:00
[CATEGORIES]
cs.LG
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
[AUTHORS]
Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Samson Zhou, Kunal Talwar
[ABSTRACT]
We study the problem of private vector mean estimation in the shuffle model
of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$.
We propose a new multi-message protocol that achieves the optimal error using
$\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user.
Moreover, we show that any (unbiased) protocol that achieves optimal error
requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages,
demonstrating the optimality of our message complexity up to logarithmic
factors. Additionally, we study the single-message setting and design a
protocol that achieves mean squared error
$\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any
single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$,
showing that our protocol is optimal in the standard setting where $\varepsilon
= \Theta(1)$. Finally, we study robustness to malicious users and show that
malicious users can incur large additive error with a single shuffler.
[LINK]
http://arxiv.org/abs/2404.10201v1
[DATE]
2024-04-16 08:56:36+08:00
[CATEGORIES]
cs.LG
Learning to Manipulate under Limited Information
[AUTHORS]
Wesley H. Holliday, Alexander Kristoffersen, Eric Pacuit
[ABSTRACT]
By classic results in social choice theory, any reasonable preferential
voting method sometimes gives individuals an incentive to report an insincere
preference. The extent to which different voting methods are more or less
resistant to such strategic manipulation has become a key consideration for
comparing voting methods. Here we measure resistance to manipulation by whether
neural networks of varying sizes can learn to profitably manipulate a given
voting method in expectation, given different types of limited information
about how other voters will vote. We trained over 70,000 neural networks of 26
sizes to manipulate against 8 different voting methods, under 6 types of
limited information, in committee-sized elections with 5-21 voters and 3-6
candidates. We find that some voting methods, such as Borda, are highly
manipulable by networks with limited information, while others, such as Instant
Runoff, are not, despite being quite profitably manipulated by an ideal
manipulator with full information. For the two probability models for elections
that we use, the overall least manipulable of the 8 methods we study are
Condorcet methods, namely Minimax and Split Cycle.
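For readers unfamiliar with the setting, the sketch below computes a Borda winner and brute-forces the best ballot for a single manipulator with full information about the other ballots, the "ideal manipulator" that the abstract contrasts with networks seeing only limited information. The tie-breaking rule (earlier candidate in the list wins) and the example profile are hypothetical.

```python
from itertools import permutations


def borda_winner(profile, candidates):
    """Borda count: with m candidates, a ballot gives m-1 points to its first
    choice, m-2 to its second, ..., 0 to its last."""
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in profile:
        for position, c in enumerate(ranking):
            scores[c] += m - 1 - position
    # Hypothetical tie-breaking: the candidate listed earlier wins a tie.
    return max(candidates, key=lambda c: (scores[c], -candidates.index(c)))


def best_manipulation(sincere, others, candidates):
    """Full-information manipulator: try every possible ballot and keep the one
    whose winner the voter ranks highest under their sincere preferences."""
    rank = {c: i for i, c in enumerate(sincere)}
    best_ballot = list(sincere)
    best_winner = borda_winner(others + [list(sincere)], candidates)
    for ballot in permutations(candidates):
        winner = borda_winner(others + [list(ballot)], candidates)
        if rank[winner] < rank[best_winner]:
            best_ballot, best_winner = list(ballot), winner
    return best_ballot, best_winner


cands = ["a", "b", "c"]
others = [["a", "c", "b"], ["b", "a", "c"]]
sincere = ["b", "a", "c"]
print(borda_winner(others + [sincere], cands))
print(best_manipulation(sincere, others, cands))
```

In this toy profile, sincere voting elects "a", but burying "a" at the bottom of the ballot elects the manipulator's favorite "b"; this is the kind of profitable insincere report the study asks whether networks with limited information can learn.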
[COMMENTS]
Appears at the 1st Workshop on Social Choice and Learning Algorithms
(SCaLA 2024) held at the 23rd International Conference on Autonomous Agents
and Multiagent Systems, organized by B. Armstrong, R. Fairstein, N. Mattei,
and Z. Terzopoulou, May 6-7, 2024, Auckland, New Zealand
[LINK]
http://arxiv.org/abs/2401.16412v2
[DATE]
2024-04-16 08:49:53+08:00
[CATEGORIES]
cs.LG
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery
[AUTHORS]
Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown
[ABSTRACT]
Automating robotic surgery via learning from demonstration (LfD) techniques
is extremely challenging. This is because surgical tasks often involve
sequential decision-making processes with complex interactions of physical
objects and have low tolerance for mistakes. Prior works assume that all
demonstrations are fully observable and optimal, which might not be practical
in the real world. This paper introduces a sample-efficient method that learns
a robust reward function from a limited amount of ranked suboptimal
demonstrations consisting of partial-view point cloud observations. The method
then learns a policy by optimizing the learned reward function using
reinforcement learning (RL). We show that using a learned reward function to
obtain a policy is more robust than pure imitation learning. We apply our
approach to a physical surgical electrocautery task and demonstrate that our
method can perform well even when the provided demonstrations are suboptimal
and the observations are high-dimensional point clouds. Code and videos are
available at https://sites.google.com/view/lfdinelectrocautery.
[COMMENTS]
In proceedings of the International Symposium on Medical Robotics
(ISMR) 2024. Equal contribution from two first authors
[LINK]
http://arxiv.org/abs/2404.07185v2
[DATE]
2024-04-16 08:23:03+08:00
[CATEGORIES]
cs.LG
Multi-objective evolutionary GAN for tabular data synthesis
[AUTHORS]
Nian Ran, Bahrul Ilmi Nasution, Claire Little, Richard Allmendinger, Mark Elliot
[ABSTRACT]
Synthetic data has a key role to play in data sharing by statistical agencies
and other generators of statistical data products. Generative Adversarial
Networks (GANs), typically applied to image synthesis, are also a promising
method for tabular data synthesis. However, there are unique challenges in
tabular data compared to images: e.g., tabular data may contain both continuous
and discrete variables and conditional sampling, and, critically, the data
should possess high utility and low disclosure risk (the risk of re-identifying
a population unit or learning something new about them), providing an
opportunity for multi-objective (MO) optimization. Inspired by MO GANs for
images, this paper proposes a smart MO evolutionary conditional tabular GAN
(SMOE-CTGAN). This approach models conditional synthetic data by applying
conditional vectors in training, and uses concepts from MO optimisation to
balance disclosure risk against utility. Our results indicate that SMOE-CTGAN
is able to discover synthetic datasets with different risk and utility levels
for multiple national census datasets. We also find a sweet spot in the early
stage of training where a competitive utility and extremely low risk are
achieved, by using an Improvement Score. The full code can be downloaded from
https://github.com/HuskyNian/SMO_EGAN_pytorch.
[LINK]
http://arxiv.org/abs/2404.10176v1
[DATE]
2024-04-16 07:07:57+08:00
[CATEGORIES]
cs.LG
EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning
[AUTHORS]
Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta
[ABSTRACT]
From a visual perception perspective, modern graphical user interfaces (GUIs)
comprise a complex graphics-rich two-dimensional visuospatial arrangement of
text, images, and interactive objects such as buttons and menus. While existing
models can accurately predict regions and objects that are likely to attract
attention “on average”, so far there is no scanpath model capable of
predicting scanpaths for an individual. To close this gap, we introduce
EyeFormer, which leverages a Transformer architecture as a policy network to
guide a deep reinforcement learning algorithm that controls gaze locations. Our
model has the unique capability of producing personalized predictions when
given a few user scanpath samples. It can predict full scanpath information,
including fixation positions and duration, across individuals and various
stimulus types. Additionally, we demonstrate applications in GUI layout
optimization driven by our model. Our software and models will be publicly
available.
[LINK]
http://arxiv.org/abs/2404.10163v1
[DATE]
2024-04-16 06:26:27+08:00
[CATEGORIES]
cs.LG
Optimal Kernel Tuning Parameter Prediction using Deep Sequence Models
[AUTHORS]
Khawir Mahmood, Jehandad Khan, Hammad Afzal
[ABSTRACT]
GPU kernels have come to the forefront of computing due to their utility in
varied fields, from high-performance computing to machine learning. A typical
GPU compute kernel is invoked millions, if not billions, of times in a typical
application, which makes its performance highly critical. Due to the unknown
nature of the optimization surface, an exhaustive search is required to
discover the global optimum, which is infeasible due to the potentially
exponential number of parameter combinations. In this work, we propose a
methodology that uses deep sequence-to-sequence models to predict the optimal
tuning parameters governing compute kernels. This work treats the prediction
of kernel parameters as a sequence-to-sequence translation problem,
borrowing models from the Natural Language Processing (NLP) domain.
Parameters describing the input, output and weight tensors are considered as
the input language to the model that emits the corresponding kernel parameters.
In essence, the model translates the problem parameter language to kernel
parameter language. The core contributions of this work are: a) proposing that
a sequence-to-sequence model can accurately learn the performance dynamics of a
GPU compute kernel; b) a novel network architecture that predicts the kernel
tuning parameters for GPU kernels; and c) a constrained beam search that
incorporates the physical limits of the GPU hardware as well as other expert
knowledge, reducing the search space. The proposed algorithm can achieve more
than 90% accuracy on various convolutional kernels in MIOpen, the AMD machine
learning primitives library. As a result, the proposed technique can reduce the
development time and compute resources required to tune unseen input
configurations, resulting in shorter development cycles, reduced development
costs, and better user experience.
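The constrained beam search named in contribution c) can be illustrated in miniature. This is a minimal sketch, not the authors' implementation: the parameter names (`tile_x`, `tile_y`), the 256-element tile budget, and the log2-based score that stands in for the sequence model's per-step log-probability are all hypothetical. The point it shows is that partial assignments already violating a constraint are pruned before they can enter the beam, which is how hardware limits shrink the search space.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def constrained_beam_search(
    param_names: Sequence[str],
    choices: Dict[str, Sequence[int]],
    score: Callable[[Config, str, int], float],
    feasible: Callable[[Config], bool],
    beam_width: int = 3,
) -> List[Tuple[float, Config]]:
    """Assign one parameter per step, keeping the `beam_width` best feasible partial configs."""
    beam: List[Tuple[float, Config]] = [(0.0, {})]
    for name in param_names:
        candidates: List[Tuple[float, Config]] = []
        for total, partial in beam:
            for value in choices[name]:
                extended = {**partial, name: value}
                if not feasible(extended):
                    continue  # prune: this partial config already violates a constraint
                candidates.append((total + score(partial, name, value), extended))
        # Python's sort is stable (also with reverse=True), so ties keep insertion order.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam


# Hypothetical stand-ins for the trained model and the hardware limits.
def toy_score(partial: Config, name: str, value: int) -> float:
    # Placeholder for a decoder log-probability; here larger tiles simply score higher.
    return math.log2(value)


def fits_tile_budget(cfg: Config) -> bool:
    # Hypothetical limit: a tile may hold at most 256 elements.
    if "tile_x" in cfg and "tile_y" in cfg:
        return cfg["tile_x"] * cfg["tile_y"] <= 256
    return True


beam = constrained_beam_search(
    ["tile_x", "tile_y"],
    {"tile_x": [8, 16, 32], "tile_y": [8, 16, 32]},
    toy_score,
    fits_tile_budget,
)
best_score, best_cfg = beam[0]
```

Because the constraint is checked on every partial configuration, infeasible combinations such as a 32 x 16 tile are never scored, and every configuration left in the beam respects the budget.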
[LINK]
http://arxiv.org/abs/2404.10162v1
[DATE]
2024-04-16 06:25:54+08:00
[CATEGORIES]
cs.LG
Salient Object-Aware Background Generation using Text-Guided Diffusion Models
[AUTHORS]
Amir Erfan Eshratifar, Joao V. B. Soares, Kapil Thadani, Shaunak Mishra, Mikhail Kuznetsov, Yueh-Ning Ku, Paloma de Juan
[ABSTRACT]
Generating background scenes for salient objects plays a crucial role across
various domains including creative design and e-commerce, as it enhances the
presentation and context of subjects by integrating them into tailored
environments. Background generation can be framed as a task of text-conditioned
outpainting, where the goal is to extend image content beyond a salient
object’s boundaries on a blank background. Although popular diffusion models
for text-guided inpainting can also be used for outpainting by mask inversion,
they are trained to fill in missing parts of an image rather than to place an
object into a scene. Consequently, when used for background creation,
inpainting models frequently extend the salient object’s boundaries and thereby
change the object’s identity, which is a phenomenon we call “object expansion.”
This paper introduces a model for adapting inpainting diffusion models to the
salient object outpainting task using Stable Diffusion and ControlNet
architectures. We present a series of qualitative and quantitative results
across models and datasets, including a newly proposed metric to measure object
expansion that does not require any human labeling. Compared to Stable
Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by
3.6x on average with no degradation in standard visual metrics across multiple
datasets.
[COMMENTS]
Accepted for publication at CVPR 2024’s Generative Models for
Computer Vision workshop
[LINK]
http://arxiv.org/abs/2404.10157v1
[DATE]
2024-04-16 06:13:35+08:00
[CATEGORIES]
cs.LG
Quality Assessment of Prompts Used in Code Generation
[AUTHORS]
Mohammed Latif Siddiq, Simantika Dristi, Joy Saha, Joanna C. S. Santos
[ABSTRACT]
Large Language Models (LLMs) are gaining popularity among software engineers.
A crucial aspect of developing effective code-generation LLMs is to evaluate
these models using a robust benchmark. Evaluation benchmarks with quality
issues can provide a false sense of performance. In this work, we conduct the
first-of-its-kind study of the quality of prompts within benchmarks used to
compare the performance of different code generation models. To conduct this
study, we analyzed 3,566 prompts from 9 code generation benchmarks to identify
quality issues in them. We also investigated whether fixing the identified
quality issues in the benchmarks’ prompts affects a model’s performance. We
also studied memorization issues of the evaluation dataset, which can put into
question a benchmark’s trustworthiness. We found that code generation
evaluation benchmarks mainly focused on Python and coding exercises and had
very limited contextual dependencies to challenge the model. These datasets and
the developers’ prompts suffer from quality issues like spelling and
grammatical errors, unclear sentences to express developers’ intent, and not
using proper documentation style. Fixing all these issues in the benchmarks can
improve performance for Python code generation, but no significant
improvement was observed for Java code generation. We also found evidence that
GPT-3.5-Turbo and CodeGen-2.5 models possibly have data contamination issues.
[COMMENTS]
Under review
[LINK]
http://arxiv.org/abs/2404.10155v1
[DATE]
2024-04-16 06:02:58+08:00
[CATEGORIES]
cs.LG
Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond
[AUTHORS]
Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova
[ABSTRACT]
Homophily is a graph property describing the tendency of edges to connect
similar nodes; the opposite is called heterophily. It is often believed that
heterophilous graphs are challenging for standard message-passing graph neural
networks (GNNs), and much effort has been put into developing efficient methods
for this setting. However, there is no universally agreed-upon measure of
homophily in the literature. In this work, we show that commonly used homophily
measures have critical drawbacks preventing the comparison of homophily levels
across different datasets. To address this, we formalize desirable properties for a
proper homophily measure and verify which measures satisfy which properties. In
particular, we show that a measure that we call adjusted homophily satisfies
more desirable properties than other popular homophily measures while being
rarely used in graph machine learning literature. Then, we go beyond the
homophily-heterophily dichotomy and propose a new characteristic that allows
one to further distinguish different sorts of heterophily. The proposed label
informativeness (LI) characterizes how much information a neighbor’s label
provides about a node’s label. We prove that this measure satisfies important
desirable properties. We also observe empirically that LI better agrees with
GNN performance compared to homophily measures, which confirms that it is a
useful characteristic of the graph structure.
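The measures contrasted above can be computed directly from an edge list. This is a minimal sketch based on our understanding of the standard definitions: edge homophily is the fraction of same-label edges; adjusted homophily subtracts the value expected from degree-weighted class shares p̄(k) = D_k / (2|E|), where D_k is the total degree of class-k nodes; and edge-based label informativeness is I(y_ξ; y_η) / H(y_ξ) for a uniformly random edge (ξ, η). It assumes an undirected graph given as a list of distinct edges with at least two label classes (otherwise both normalizers are zero); the function names are our own.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_homophily(edges: List[Edge], labels: Dict[Hashable, int]) -> float:
    """Fraction of edges whose two endpoints share a label."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


def degree_weighted_class_shares(edges: List[Edge], labels: Dict[Hashable, int]) -> Dict[int, float]:
    """p̄(k) = D_k / (2|E|): share of edge endpoints that belong to class k."""
    counts: Counter = Counter()
    for u, v in edges:
        counts[labels[u]] += 1
        counts[labels[v]] += 1
    total = 2 * len(edges)
    return {k: c / total for k, c in counts.items()}


def adjusted_homophily(edges: List[Edge], labels: Dict[Hashable, int]) -> float:
    """Edge homophily corrected for the value expected from class-degree shares alone."""
    expected = sum(p * p for p in degree_weighted_class_shares(edges, labels).values())
    return (edge_homophily(edges, labels) - expected) / (1 - expected)


def label_informativeness(edges: List[Edge], labels: Dict[Hashable, int]) -> float:
    """I(y_xi; y_eta) / H(y_xi) over a random edge, counting each edge in both directions."""
    joint: Counter = Counter()
    for u, v in edges:
        joint[(labels[u], labels[v])] += 1
        joint[(labels[v], labels[u])] += 1
    total = 2 * len(edges)
    h_joint = -sum((c / total) * math.log(c / total) for c in joint.values())
    shares = degree_weighted_class_shares(edges, labels)
    h_marginal = -sum(p * math.log(p) for p in shares.values())
    # I = 2 H(marginal) - H(joint), so I / H(marginal) = 2 - H(joint) / H(marginal).
    return 2 - h_joint / h_marginal


labels = {"a": 0, "b": 0, "c": 1, "d": 1}
homophilous = [("a", "b"), ("c", "d")]
heterophilous = [("a", "c"), ("b", "d")]
```

On this toy graph, the two edge lists receive opposite adjusted homophily (1 and -1), yet in both cases a neighbor's label fully determines the node's label, so label informativeness is 1 for each. This is the kind of heterophily that homophily measures alone do not distinguish.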
[LINK]
http://arxiv.org/abs/2209.06177v5
[DATE]
2024-04-16 05:49:09+08:00
[CATEGORIES]
cs.LG
Rate-Optimal Non-Asymptotics for the Quadratic Prediction Error Method
[AUTHORS]
Charis Stamouli, Ingvar Ziemann, George J. Pappas
[ABSTRACT]
We study the quadratic prediction error method – i.e., nonlinear least
squares – for a class of time-varying parametric predictor models satisfying a
certain identifiability condition. While this method is known to asymptotically
achieve the optimal rate for a wide range of problems, there have been no
non-asymptotic results matching these optimal rates outside of a select few,
typically linear, model classes. By leveraging modern tools from learning with
dependent data, we provide the first rate-optimal non-asymptotic analysis of
this method for our more general setting of nonlinearly parametrized model
classes. Moreover, we show that our results can be applied to a particular
class of identifiable AutoRegressive Moving Average (ARMA) models, resulting in
the first optimal non-asymptotic rates for identification of ARMA models.
[COMMENTS]
38 pages, added acknowledgements
[LINK]
http://arxiv.org/abs/2404.07937v2
[DATE]
2024-04-16 05:12:20+08:00
[CATEGORIES]
cs.LG
Epistemic Uncertainty Quantification For Pre-trained Neural Network
[AUTHORS]
Hanjing Wang, Qiang Ji
[ABSTRACT]
Epistemic uncertainty quantification (UQ) identifies where models lack
knowledge. Traditional UQ methods, often based on Bayesian neural networks, are
not suitable for pre-trained non-Bayesian models. Our study addresses
quantifying epistemic uncertainty for any pre-trained model with an approach
that needs neither the original training data nor model modifications, ensuring
broad applicability regardless of network architectures or training techniques.
Specifically, we propose a gradient-based approach to assess epistemic
uncertainty, analyzing the gradients of outputs relative to model parameters,
and thereby indicating necessary model adjustments to accurately represent the
inputs. We first explore theoretical guarantees of gradient-based methods for
epistemic UQ, questioning the view that this uncertainty is only calculable
through differences between multiple models. We further improve gradient-driven
UQ by using class-specific weights for integrating gradients and emphasizing
distinct contributions from neural network layers. Additionally, we enhance UQ
accuracy by combining gradient and perturbation methods to refine the
gradients. We evaluate our approach on out-of-distribution detection,
uncertainty calibration, and active learning, demonstrating its superiority
over current state-of-the-art UQ methods for pre-trained models.
[COMMENTS]
Published at CVPR 2024
[LINK]
http://arxiv.org/abs/2404.10124v1
[DATE]
2024-04-16 04:21:05+08:00
[CATEGORIES]
cs.LG
Online Estimation via Offline Estimation: An Information-Theoretic Framework
[AUTHORS]
Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin
[ABSTRACT]
The classical theory of statistical estimation aims to estimate a
parameter of interest under data generated from a fixed design (“offline
estimation”), while the contemporary theory of online learning provides
algorithms for estimation under adaptively chosen covariates (“online
estimation”). Motivated by connections between estimation and interactive
decision making, we ask: is it possible to convert offline estimation
algorithms into online estimation algorithms in a black-box fashion? We
investigate this question from an information-theoretic perspective by
introducing a new framework, Oracle-Efficient Online Estimation (OEOE), where
the learner can only interact with the data stream indirectly through a
sequence of offline estimators produced by a black-box algorithm operating on
the stream. Our main results settle the statistical and computational
complexity of online estimation in this framework.
• Statistical complexity. We show that information-theoretically,
there exist algorithms that achieve near-optimal online estimation error via
black-box offline estimation oracles, and give a nearly-tight characterization
for minimax rates in the OEOE framework.
• Computational complexity. We show that the guarantees above cannot
be achieved in a computationally efficient fashion in general, but give a
refined characterization for the special case of conditional density
estimation: computationally efficient online estimation via black-box offline
estimation is possible whenever it is possible via unrestricted algorithms.
Finally, we apply our results to give offline oracle-efficient algorithms for
interactive decision making.
[LINK]
http://arxiv.org/abs/2404.10122v1
[DATE]
2024-04-16 04:19:18+08:00
[CATEGORIES]
cs.LG
Multiple-Input Fourier Neural Operator (MIFNO) for source-dependent 3D elastodynamics
[AUTHORS]
Fanny Lehmann, Filippo Gatti, Didier Clouteau
[ABSTRACT]
Numerical simulations are essential tools to evaluate the solution of the
wave equation in complex settings, such as three-dimensional (3D) domains with
heterogeneous properties. However, their application is limited by high
computational costs, and existing surrogate models lack the flexibility of
numerical solvers. This work introduces the Multiple-Input Fourier Neural
Operator (MIFNO) to deal with structured 3D fields representing material
properties as well as vectors describing the source characteristics. The MIFNO
is applied to the problem of elastic wave propagation in the Earth’s crust. It
is trained on the HEMEW^S-3D database containing 30000 earthquake simulations
in different heterogeneous domains with random source positions and
orientations. Outputs are time- and space-dependent surface wavefields. The
MIFNO predictions are assessed as good to excellent based on Goodness-Of-Fit
(GOF) criteria. Wave arrival times and wave-front propagation are very
accurate, since 80% of the predictions have an excellent phase GOF. The
fluctuation amplitudes are good for 87% of the predictions. The envelope score
is hindered by the small-scale fluctuations that are challenging to capture due
to the complex physical phenomena associated with high-frequency features.
Nevertheless, the MIFNO can generalize to sources located outside the training
domain, and it shows good generalization ability to a real complex overthrust
geology. When focusing on a region of interest, transfer learning improves the
accuracy with limited additional costs, since GOF scores improved by more than
1 GOF unit with only 500 additional specific samples. The MIFNO is the first
surrogate model offering the flexibility of an earthquake simulator with
varying sources and material properties. Its good accuracy and massive speed-up
offer new perspectives to replace numerical simulations in many-query problems.
[LINK]
http://arxiv.org/abs/2404.10115v1
[DATE]
2024-04-16 04:07:44+08:00
[CATEGORIES]
cs.LG
Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning
[AUTHORS]
Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao
[ABSTRACT]
E-health allows smart devices and medical institutions to collaboratively
collect patients’ data, which is used to train Artificial Intelligence (AI)
models that help doctors make diagnoses. By allowing multiple devices to
train models collaboratively, federated learning is a promising solution to
address the communication and privacy issues in e-health. However, applying
federated learning in e-health faces many challenges. First, medical data is
both horizontally and vertically partitioned. Since single Horizontal Federated
Learning (HFL) or Vertical Federated Learning (VFL) techniques cannot deal with
both types of data partitioning, directly applying them may consume excessive
communication cost due to transmitting a part of raw data when requiring high
modeling accuracy. Second, a naive combination of HFL and VFL has limitations
including low training efficiency, unsound convergence analysis, and lack of
parameter tuning strategies. In this paper, we provide a thorough study of an
effective integration of HFL and VFL, to achieve communication efficiency and
overcome the above limitations when data is both horizontally and vertically
partitioned. Specifically, we propose a hybrid federated learning framework
with one intermediate result exchange and two aggregation phases. Based on this
framework, we develop a Hybrid Stochastic Gradient Descent (HSGD) algorithm to
train models. Then, we theoretically analyze the convergence upper bound of the
proposed algorithm. Using the convergence results, we design adaptive
strategies to adjust the training parameters and shrink the size of transmitted
data. Experimental results validate that the proposed HSGD algorithm can
achieve the desired accuracy while reducing communication cost, and they also
verify the effectiveness of the adaptive strategies.
[LINK]
http://arxiv.org/abs/2404.10110v1
[DATE]
2024-04-16 03:45:07+08:00
[CATEGORIES]
cs.LG
GeoAI Reproducibility and Replicability: a computational and spatial perspective
[AUTHORS]
Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Peter Kedron
[ABSTRACT]
GeoAI has emerged as an exciting interdisciplinary research area that
combines spatial theories and data with cutting-edge AI models to address
geospatial problems in a novel, data-driven manner. While GeoAI research has
flourished in the GIScience literature, its reproducibility and replicability
(R&R), fundamental principles that determine the reusability, reliability, and
scientific rigor of research findings, have rarely been discussed. This paper
aims to provide an in-depth analysis of this topic from both computational and
spatial perspectives. We first categorize the major goals for reproducing GeoAI
research, namely, validation (repeatability), learning and adapting the method
for solving a similar or new problem (reproducibility), and examining the
generalizability of the research findings (replicability). Each of these goals
requires different levels of understanding of GeoAI, as well as different
methods to ensure its success. We then discuss the factors that may cause the
lack of R&R in GeoAI research, with an emphasis on (1) the selection and use of
training data; (2) the uncertainty that resides in the GeoAI model design,
training, deployment, and inference processes; and more importantly (3) the
inherent spatial heterogeneity of geospatial data and processes. We use a deep
learning-based image analysis task as an example to demonstrate the results’
uncertainty and spatial variance caused by different factors. The findings
reiterate the importance of knowledge sharing, as well as the generation of a
“replicability map” that takes spatial autocorrelation and spatial
heterogeneity into consideration in quantifying the spatial replicability of
GeoAI research.
[COMMENTS]
Accepted by Annals of the American Association of Geographers
[LINK]
http://arxiv.org/abs/2404.10108v1
[DATE]
2024-04-16 03:43:16+08:00
[CATEGORIES]
cs.LG
Feature selection in linear SVMs via hard cardinality constraint: a scalable SDP decomposition approach
[AUTHORS]
Immanuel Bomze, Federico D’Onofrio, Laura Palagi, Bo Peng
[ABSTRACT]
In this paper, we study the embedded feature selection problem in linear
Support Vector Machines (SVMs), in which a cardinality constraint is employed,
leading to a fully explainable selection model. The problem is NP-hard due to
the presence of the cardinality constraint, even though the original linear SVM
amounts to a problem solvable in polynomial time. To handle the hard problem,
we first introduce two mixed-integer formulations for which novel SDP
relaxations are proposed. Exploiting the sparsity pattern of the relaxations,
we decompose the problems and obtain equivalent relaxations in a much smaller
cone, making the conic approaches scalable. To make the best use of the
decomposed relaxations, we propose heuristics using the information of their
optimal solutions. Moreover, an exact procedure is proposed by solving a
sequence of mixed-integer decomposed SDPs. Numerical results on classical
benchmarking datasets are reported, showing the efficiency and effectiveness of
our approach.
[COMMENTS]
Submitted to European Journal of Operational Research. arXiv admin
note: text overlap with arXiv:1808.02435 by other authors
[LINK]
http://arxiv.org/abs/2404.10099v1
[DATE]
2024-04-16 03:15:32+08:00
[CATEGORIES]
cs.LG
LegalPro-BERT: Classification of Legal Provisions by fine-tuning BERT Large Language Model
[AUTHORS]
Amit Tewari
[ABSTRACT]
A contract is a type of legal document commonly used in organizations.
Contract review is an integral and repetitive process to avoid business risk
and liability. Contract analysis requires the identification and classification
of key provisions and paragraphs within an agreement. Identification and
validation of contract clauses can be a time-consuming and challenging task
demanding the services of trained and expensive lawyers, paralegals or other
legal assistants. Classification of legal provisions in contracts using
artificial intelligence and natural language processing is complex due to the
requirement of domain-specialized legal language for model training and the
scarcity of sufficient labeled data in the legal domain. Using general-purpose
models is not effective in this context due to the use of specialized legal
vocabulary in contracts which may not be recognized by a general model. To
address this problem, we propose the use of a pre-trained large language model
which is subsequently calibrated on legal taxonomy. We propose LegalPro-BERT, a
BERT transformer architecture model that we fine-tune to efficiently handle
the classification task for legal provisions. We conducted experiments to measure
and compare metrics with current benchmark results. We found that LegalPro-BERT
outperforms the previous benchmark used for comparison in this research.
[COMMENTS]
17 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.10097v1
[DATE]
2024-04-16 03:08:48+08:00
[CATEGORIES]
cs.LG
Towards DNA-Encoded Library Generation with GFlowNets
[AUTHORS]
Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
[ABSTRACT]
DNA-encoded libraries (DELs) are a powerful approach for rapidly screening
large numbers of diverse compounds. One of the key challenges in using DELs is
library design, which involves choosing the building blocks that will be
combinatorially combined to produce the final library. In this paper we
consider the task of protein-protein interaction (PPI) biased DEL design. To
this end, we evaluate several machine learning algorithms on the PPI modulation
task and use them as a reward for the proposed GFlowNet-based generative
approach. We additionally investigate the possibility of using structural
information about building blocks to design a hierarchical action space for the
GFlowNet. The observed results indicate that GFlowNets are a promising approach
for generating diverse combinatorial library candidates.
[LINK]
http://arxiv.org/abs/2404.10094v1
[DATE]
2024-04-16 03:01:20+08:00
[CATEGORIES]
cs.LG
Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics
[AUTHORS]
Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su
[ABSTRACT]
Federated learning is a popular distributed learning approach for training a
machine learning model without disclosing raw data. It consists of a parameter
server and a possibly large collection of clients (e.g., in cross-device
federated learning) that may operate in congested and changing environments. In
this paper, we study federated learning in the presence of stochastic and
dynamic communication failures wherein the uplink between the parameter server
and client $i$ is on with unknown probability $p_i^t$ in round $t$.
Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary.
We first demonstrate that when the $p_i^t$’s vary across clients, the most
widely adopted federated learning algorithm, Federated Averaging (FedAvg),
experiences significant bias. To address this, we propose Federated
Postponed Broadcast (FedPBC), a simple variant of FedAvg. FedPBC differs from
FedAvg in that the parameter server postpones broadcasting the global model
until the end of each round. Despite uplink failures, we show that FedPBC
converges to a stationary point of the original non-convex objective. On the
technical front, postponing the global model broadcasts enables implicit
gossiping among the clients with active links in round $t$. Despite the
time-varying nature of $p_i^t$, we can bound the perturbation of the global
model dynamics using techniques to control gossip-type information mixing
errors. Extensive experiments have been conducted on real-world datasets over
diverse unreliable uplink patterns to corroborate our analysis.
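The bias of FedAvg under heterogeneous link probabilities, and the effect of postponing the broadcast, can be seen in a one-dimensional toy. This is a sketch under our reading of the abstract, not the authors' code: we assume the postponed broadcast reaches only the clients whose uplink was active in that round, while the others keep their own local models. The quadratic client objectives f_i(x) = (x - a_i)^2 / 2, the two clients, and their link probabilities are hypothetical choices; the global objective is the unweighted average, minimized at mean(a_i) = 5.

```python
import random
from typing import List, Sequence


def local_step(x: float, target: float, lr: float) -> float:
    """One gradient step on the toy client objective 0.5 * (x - target)^2."""
    return x - lr * (x - target)


def fedavg(targets: Sequence[float], probs: Sequence[float], rounds: int, lr: float, seed: int = 0) -> List[float]:
    """Server broadcasts at the start of each round and averages only the updates that arrive."""
    rng = random.Random(seed)
    x = 0.0
    history = []
    for _ in range(rounds):
        updates = [local_step(x, a, lr) for a in targets]
        received = [u for u, p in zip(updates, probs) if rng.random() < p]
        if received:  # if every uplink fails, the global model is unchanged
            x = sum(received) / len(received)
        history.append(x)
    return history


def fedpbc(targets: Sequence[float], probs: Sequence[float], rounds: int, lr: float, seed: int = 0) -> List[float]:
    """Clients keep local models; the server's broadcast at round end reaches only active clients."""
    rng = random.Random(seed)
    local = [0.0] * len(targets)
    for _ in range(rounds):
        local = [local_step(x, a, lr) for x, a in zip(local, targets)]
        active = [i for i, p in enumerate(probs) if rng.random() < p]
        if active:
            global_model = sum(local[i] for i in active) / len(active)
            for i in active:  # replacing active models by their mean preserves the overall sum
                local[i] = global_model
    return local


TARGETS = [0.0, 10.0]  # hypothetical client optima; global optimum is 5.0
PROBS = [0.9, 0.1]     # hypothetical uplink probabilities, fixed over time for simplicity

fedavg_history = fedavg(TARGETS, PROBS, rounds=2000, lr=0.1)
fedavg_tail_mean = sum(fedavg_history[500:]) / len(fedavg_history[500:])

fedpbc_models = fedpbc(TARGETS, PROBS, rounds=2000, lr=0.1)
fedpbc_average = sum(fedpbc_models) / len(fedpbc_models)  # ≈ 5.0
```

In this linear toy, FedAvg hovers near the optimum of the frequently connected client, far below 5. Under the assumed FedPBC update, the averaging step acts like gossip among active clients and preserves the mean of all local models, so that mean contracts toward 5 regardless of the link probabilities. The paper's analysis covers time-varying $p_i^t$ and non-convex objectives, which this sketch does not attempt to reproduce.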
[COMMENTS]
This is a substantial extension of the conference paper “Towards Bias
Correction of Fedavg over Nonuniform and Time-varying Communications”, which
was published in 2023 62nd IEEE Conference on Decision and Control (CDC),
DOI: 10.1109/CDC49753.2023.10383258
[LINK]
http://arxiv.org/abs/2404.10091v1
[DATE]
2024-04-16 02:58:39+08:00
[CATEGORIES]
cs.LG
Principal-Agent Hypothesis Testing
[AUTHORS]
Stephen Bates, Michael I. Jordan, Michael Sklar, Jake A. Soloff
[ABSTRACT]
Consider the relationship between a regulator (the principal) and an
experimenter (the agent) such as a pharmaceutical company. The pharmaceutical
company wishes to sell a drug for profit, whereas the regulator wishes to allow
only efficacious drugs to be marketed. The efficacy of the drug is not known to
the regulator, so the pharmaceutical company must run a costly trial to prove
efficacy to the regulator. Critically, the statistical protocol used to
establish efficacy affects the behavior of a strategic, self-interested agent;
a lower standard of statistical evidence incentivizes the agent to run more
trials that are less likely to be effective. The interaction between the
statistical protocol and the incentives of the pharmaceutical company is
crucial for understanding this system and designing protocols with high social
utility. In this work, we discuss how the regulator can set up a protocol with
payoffs based on statistical evidence. We show how to design protocols that are
robust to an agent’s strategic actions, and derive the optimal protocol in the
presence of strategic entrants.
[LINK]
http://arxiv.org/abs/2205.06812v3
[DATE]
2024-04-16 02:38:26+08:00
[CATEGORIES]
cs.LG
Variational quantum simulation: a case study for understanding warm starts
[AUTHORS]
Ricard Puig i Valls, Marc Drudis, Supanut Thanasilp, Zoë Holmes
[ABSTRACT]
The barren plateau phenomenon, characterized by loss gradients that vanish
exponentially with system size, poses a challenge to scaling variational
quantum algorithms. Here we explore the potential of warm starts, whereby one
initializes closer to a solution in the hope of enjoying larger loss variances.
Focusing on an iterative variational method for learning shorter-depth circuits
for quantum real and imaginary time evolution, we conduct a case study to
elucidate the potential and limitations of warm starts. We start by proving
that the iterative variational algorithm will exhibit substantial (at worst
vanishing polynomially in system size) gradients in a small region around the
initializations at each time-step. Convexity guarantees for these regions are
then established, suggesting trainability for polynomial size time-steps.
However, our study highlights scenarios where a good minimum shifts outside the
region with trainability guarantees. Our analysis leaves open the question of
whether such minima jumps necessitate optimization across barren plateau
landscapes or whether there exist gradient flows, i.e., fertile valleys away
from the plateau with substantial gradients, that allow for training.
[COMMENTS]
9 + 26 pages, 5 + 2 figures
[LINK]
http://arxiv.org/abs/2404.10044v1
[DATE]
2024-04-16 02:00:03+08:00
[CATEGORIES]
cs.LG
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
[AUTHORS]
Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng
[ABSTRACT]
Neural Radiance Field (NeRF) is a representation for 3D reconstruction from
multi-view images. Although some recent work has shown preliminary success in
editing a reconstructed NeRF with a diffusion prior, these methods still struggle
to synthesize reasonable geometry in completely uncovered regions. One major
reason is the high diversity of synthetic contents from the diffusion model,
which hinders the radiance field from converging to a crisp and deterministic
geometry. Moreover, applying latent diffusion models on real data often yields
a textural shift incoherent with the image condition due to auto-encoding errors.
These two problems are further exacerbated by the use of pixel-distance
losses. To address these issues, we propose tempering the diffusion model’s
stochasticity with per-scene customization and mitigating the textural shift
with masked adversarial training. In our analyses, we also found that the
commonly used pixel and perceptual losses are harmful in the NeRF inpainting
task. Through rigorous experiments, our framework yields state-of-the-art NeRF
inpainting results on various real-world scenes. Project page:
https://hubert0527.github.io/MALD-NeRF
[COMMENTS]
Project page: https://hubert0527.github.io/MALD-NeRF
[LINK]
http://arxiv.org/abs/2404.09995v1
[DATE]
2024-04-16 01:59:57+08:00
[CATEGORIES]
cs.LG
Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning
[AUTHORS]
Kostadin Garov, Dimitar I. Dimitrov, Nikola Jovanović, Martin Vechev
[ABSTRACT]
Malicious server (MS) attacks have enabled the scaling of data stealing in
federated learning to large batch sizes and secure aggregation, settings
previously considered private. However, many concerns regarding the client-side
detectability of MS attacks were raised, questioning their practicality. In
this work, for the first time, we thoroughly study client-side detectability.
We first demonstrate that all prior MS attacks are detectable by principled
checks, and formulate a necessary set of requirements that a practical MS
attack must satisfy. Next, we propose SEER, a novel attack framework that
satisfies these requirements. The key insight of SEER is the use of a secret
decoder, jointly trained with the shared model. We show that SEER can steal
user data from gradients of realistic networks, even for large batch sizes of
up to 512 and under secure aggregation. Our work is a promising step towards
assessing the true vulnerability of federated learning in real-world settings.
[LINK]
http://arxiv.org/abs/2306.03013v5
[DATE]
2024-04-16 01:50:38+08:00
[CATEGORIES]
cs.LG
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
[AUTHORS]
Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal
[ABSTRACT]
ControlNets are widely used for adding spatial control in image generation
with different conditions, such as depth maps, canny edges, and human poses.
However, there are several challenges when leveraging the pretrained image
ControlNets for controlled video generation. First, pretrained ControlNet
cannot be directly plugged into new backbone models due to the mismatch of
feature spaces, and the cost of training ControlNets for new backbones is a big
burden. Second, ControlNet features for different frames might not effectively
handle the temporal consistency. To address these challenges, we introduce
Ctrl-Adapter, an efficient and versatile framework that adds diverse controls
to any image/video diffusion model, by adapting pretrained ControlNets (and
improving temporal alignment for videos). Ctrl-Adapter provides diverse
capabilities including image control, video control, video control with sparse
frames, multi-condition control, compatibility with different backbones,
adaptation to unseen control conditions, and video editing. In Ctrl-Adapter, we
train adapter layers that fuse pretrained ControlNet features to different
image/video diffusion models, while keeping the parameters of the ControlNets
and the diffusion models frozen. Ctrl-Adapter consists of temporal and spatial
modules so that it can effectively handle the temporal consistency of videos.
We also propose latent skipping and inverse timestep sampling for robust
adaptation and sparse control. Moreover, Ctrl-Adapter enables control from
multiple conditions by simply taking the (weighted) average of ControlNet
outputs. With diverse image/video diffusion backbones (SDXL, Hotshot-XL,
I2VGen-XL, and SVD), Ctrl-Adapter matches ControlNet for image control and
outperforms all baselines for video control (achieving the SOTA accuracy on the
DAVIS 2017 dataset) with significantly lower computational costs (less than 10
GPU hours).
[COMMENTS]
First two authors contributed equally; Project page:
https://ctrl-adapter.github.io/
[LINK]
http://arxiv.org/abs/2404.09967v1
[DATE]
2024-04-16 01:45:36+08:00
[CATEGORIES]
cs.LG
Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition
[AUTHORS]
Masato Tamura
[ABSTRACT]
Social group activity recognition is a challenging task extended from group
activity recognition, where social groups must be recognized with their
activities and group members. Existing methods tackle this task by leveraging
region features of individuals following existing group activity recognition
methods. However, the effectiveness of region features is susceptible to person
localization and variable semantics of individual actions. To overcome these
issues, we propose leveraging attention modules in transformers to generate
social group features. In this method, multiple embeddings are used to
aggregate features for a social group, each of which is assigned to a group
member without duplication. Due to this non-duplicated assignment, the number
of embeddings must be large to avoid missing group members, which in turn
renders attention in transformers ineffective. To find optimal attention
designs with a large number of embeddings, we explore several design choices of
queries for feature aggregation and self-attention modules in transformer
decoders. Extensive experimental results show that the proposed method achieves
state-of-the-art performance and verify that the proposed attention designs are
highly effective on social group activity recognition.
[COMMENTS]
Accepted to IJCV, preprint version
[LINK]
http://arxiv.org/abs/2404.09964v1
[DATE]
2024-04-16 01:40:23+08:00
[CATEGORIES]
cs.LG
Prompt Stealing Attacks Against Text-to-Image Generation Models
[AUTHORS]
Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang
[ABSTRACT]
Text-to-Image generation models have revolutionized the artwork design
process and enabled anyone to create high-quality images by entering text
descriptions called prompts. Creating a high-quality prompt that consists of a
subject and several modifiers can be time-consuming and costly. In consequence,
a trend of trading high-quality prompts on specialized marketplaces has
emerged. In this paper, we perform the first study on understanding the threat
of a novel attack, namely prompt stealing attack, which aims to steal prompts
from generated images by text-to-image generation models. Successful prompt
stealing attacks directly violate the intellectual property of prompt engineers
and jeopardize the business model of prompt marketplaces. We first perform a
systematic analysis on a self-collected dataset and show that a
successful prompt stealing attack should consider a prompt’s subject as well as
its modifiers. Based on this observation, we propose a simple yet effective
prompt stealing attack, PromptStealer. It consists of two modules: a subject
generator trained to infer the subject and a modifier detector for identifying
the modifiers within the generated image. Experimental results demonstrate that
PromptStealer is superior to three baseline methods, both quantitatively and
qualitatively. We also make some initial attempts to defend against PromptStealer. In
general, our study uncovers a new attack vector within the ecosystem
established by the popular text-to-image generation models. We hope our results
can contribute to understanding and mitigating this emerging threat.
[LINK]
http://arxiv.org/abs/2302.09923v2
[DATE]
2024-04-16 01:40:04+08:00
[CATEGORIES]
cs.LG
Invariant Subspace Decomposition
[AUTHORS]
Margherita Lazzaretto, Jonas Peters, Niklas Pfister
[ABSTRACT]
We consider the task of predicting a response Y from a set of covariates X in
settings where the conditional distribution of Y given X changes over time. For
this to be feasible, assumptions on how the conditional distribution changes
over time are required. Existing approaches assume, for example, that changes
occur smoothly over time so that short-term prediction using only the recent
past becomes feasible. In this work, we propose a novel invariance-based
framework for linear conditionals, called Invariant Subspace Decomposition
(ISD), that splits the conditional distribution into a time-invariant and a
residual time-dependent component. As we show, this decomposition can be
utilized both for zero-shot and time-adaptation prediction tasks, that is,
settings where either no or a small amount of training data is available at the
time points at which we want to predict Y, respectively. We propose a practical
estimation procedure, which automatically infers the decomposition using tools
from approximate joint matrix diagonalization. Furthermore, we provide finite
sample guarantees for the proposed estimator and demonstrate empirically that
it indeed improves on approaches that do not use the additional invariant
structure.
[LINK]
http://arxiv.org/abs/2404.09962v1
[DATE]
2024-04-16 01:39:44+08:00
[CATEGORIES]
cs.LG
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model
[AUTHORS]
Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski
[ABSTRACT]
Automated segmentation is a fundamental medical image analysis task, which
has seen significant advances due to the advent of deep learning. While
foundation models have been useful in natural language processing and some
vision tasks for some time, the foundation model designed for image
segmentation, the Segment Anything Model (SAM), was developed only
recently and has shown similar promise. However, there are still no systematic
analyses or “best-practice” guidelines for optimal fine-tuning of SAM for
medical image segmentation. This work summarizes existing fine-tuning
strategies with various backbone architectures, model components, and
fine-tuning algorithms across 18 combinations, and evaluates them on 17
datasets covering all common radiology modalities. Our study reveals that (1)
fine-tuning SAM leads to slightly better performance than previous segmentation
methods, (2) fine-tuning strategies that use parameter-efficient learning in
both the encoder and decoder are superior to other strategies, (3) network
architecture has a small impact on final performance, (4) further training SAM
with self-supervised learning can improve final model performance. We also
demonstrate the ineffectiveness of some methods popular in the literature and
further expand our experiments into few-shot and prompt-based settings. Lastly,
we released our code and MRI-specific fine-tuned weights, which consistently
obtained superior performance over the original SAM, at
https://github.com/mazurowski-lab/finetune-SAM.
[COMMENTS]
Code available at https://github.com/mazurowski-lab/finetune-SAM
[LINK]
http://arxiv.org/abs/2404.09957v1
[DATE]
2024-04-16 01:31:32+08:00
[CATEGORIES]
cs.LG
Classification Tree-based Active Learning: A Wrapper Approach
[AUTHORS]
Ashna Jose, Emilie Devijver, Massih-Reza Amini, Noel Jakse, Roberta Poloni
[ABSTRACT]
Supervised machine learning often requires large training sets to train
accurate models, yet obtaining large amounts of labeled data is not always
feasible. Hence, it becomes crucial to explore active learning methods for
reducing the size of training sets while maintaining high accuracy. The aim is
to select the optimal subset of data for labeling from an initial unlabeled
set, ensuring precise prediction of outcomes. However, conventional active
learning approaches often perform comparably to classical random sampling. This
paper proposes a wrapper active learning method for classification that
organizes the sampling process into a tree structure and improves on
state-of-the-art algorithms. A classification tree built on an initial set of
labeled samples is used to decompose the input space into low-entropy regions.
Input-space based criteria are then used to sub-sample from these regions, with
the total number of points to be labeled divided among the regions. This
adaptation proves to be a significant enhancement over existing active learning
methods. Through experiments on various benchmark data sets, the paper
demonstrates that the proposed framework constructs accurate classification
models, even when provided with a severely restricted labeled data set.
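The abstract says the total labeling budget is divided among the tree's low-entropy regions but does not state the rule. A minimal sketch, assuming proportional-to-size allocation with largest-remainder rounding (a stand-in, not necessarily the paper's rule); regions are identified by hypothetical leaf names and `allocate_budget` is an illustrative helper.

```python
import math


def allocate_budget(region_sizes: dict[str, int], budget: int) -> dict[str, int]:
    """Split a labeling budget across tree leaves in proportion to their size.

    Assumption: proportional allocation with largest-remainder rounding.
    Because budget <= total, no region is asked for more points than it holds.
    """
    total = sum(region_sizes.values())
    if budget < 0 or budget > total:
        raise ValueError("budget must be between 0 and the number of unlabeled points")
    if total == 0:
        return {r: 0 for r in region_sizes}
    quotas = {r: budget * n / total for r, n in region_sizes.items()}
    alloc = {r: math.floor(q) for r, q in quotas.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining queries to the regions with the largest fractional parts.
    order = sorted(region_sizes, key=lambda r: quotas[r] - alloc[r], reverse=True)
    for r in order[:leftover]:
        alloc[r] += 1
    return alloc


# Unlabeled-point counts per leaf of a (hypothetical) fitted classification tree.
leaves = {"leaf_a": 50, "leaf_b": 30, "leaf_c": 20}
print(allocate_budget(leaves, 7))  # {'leaf_a': 4, 'leaf_b': 2, 'leaf_c': 1}
```

Within each leaf, an input-space criterion (for example, picking points far from already-labeled ones) would then choose which specific points receive the allotted queries.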
[LINK]
http://arxiv.org/abs/2404.09953v1
[DATE]
2024-04-16 01:27:00+08:00
[CATEGORIES]
cs.LG
A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning
[AUTHORS]
Nan Jiang
[ABSTRACT]
This note clarifies some confusions (and perhaps throws out more) around
model-based reinforcement learning and its theoretical understanding in the
context of deep RL. Main topics of discussion are (1) how to reconcile
model-based RL’s bad empirical reputation on error compounding with its
superior theoretical properties, and (2) the limitations of empirically popular
losses. For the latter, concrete counterexamples for the “MuZero loss” are
constructed to show that it not only fails in stochastic environments, but also
suffers exponential sample complexity in deterministic environments when data
provides sufficient coverage.
[LINK]
http://arxiv.org/abs/2404.09946v1
[DATE]
2024-04-16 01:15:18+08:00
[CATEGORIES]
cs.LG
Global Safe Sequential Learning via Efficient Knowledge Transfer
[AUTHORS]
Cen-You Li, Olaf Duennbier, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer
[ABSTRACT]
Sequential learning methods such as active learning and Bayesian optimization
select the most informative data to learn about a task. In many medical or
engineering applications, the data selection is constrained by a priori unknown
safety conditions. A promising line of safe learning methods utilizes Gaussian
processes (GPs) to model the safety probability and perform data selection in
areas with high safety confidence. However, accurate safety modeling requires
prior knowledge or consumes data. In addition, the safety confidence centers
around the given observations, which leads to local exploration. As transferable
source knowledge is often available in safety-critical experiments, we propose
to consider transfer safe sequential learning to accelerate the learning of
safety. We further consider a pre-computation of source components to reduce
the additional computational load that is introduced by incorporating source
data. In this paper, we theoretically analyze the maximum explorable safe
regions of conventional safe learning methods. Furthermore, we empirically
demonstrate that our approach 1) learns a task with lower data consumption, 2)
globally explores multiple disjoint safe regions under guidance of the source
knowledge, and 3) operates with computation comparable to conventional safe
learning methods.
[LINK]
http://arxiv.org/abs/2402.14402v2
[DATE]
2024-04-16 00:57:36+08:00
[CATEGORIES]
cs.LG
Autonomous Path Planning for Intercostal Robotic Ultrasound Imaging Using Reinforcement Learning
[AUTHORS]
Yuan Bi, Cheng Qian, Zhicheng Zhang, Nassir Navab, Zhongliang Jiang
[ABSTRACT]
Ultrasound (US) has been widely used in daily clinical practice for screening
internal organs and guiding interventions. However, due to the acoustic shadow
cast by the subcutaneous rib cage, the US examination for thoracic application
is still challenging. To fully cover and reconstruct the region of interest in
US for diagnosis, an intercostal scanning path is necessary. To tackle this
challenge, we present a reinforcement learning (RL) approach for planning
scanning paths between ribs to monitor changes in lesions on internal organs,
such as the liver and heart, which are covered by rib cages. Structured
anatomical information of the human skeleton is crucial for planning these
intercostal paths. To obtain such anatomical insight, an RL agent is trained in
a virtual environment constructed using computed tomography (CT) templates
with randomly initialized tumors of various shapes and locations. In addition,
task-specific state representation and reward functions are introduced to
ensure the convergence of the training process while minimizing the effects of
acoustic attenuation and shadows during scanning. To validate the effectiveness
of the proposed approach, experiments have been carried out on unseen CTs with
randomly defined single or multiple scanning targets. The results demonstrate
the efficiency of the proposed RL framework in planning non-shadowed US
scanning trajectories in areas with limited acoustic access.
[LINK]
http://arxiv.org/abs/2404.09927v1
[DATE]
2024-04-16 00:52:53+08:00
[CATEGORIES]
cs.LG
Comprehensive Library of Variational LSE Solvers
[AUTHORS]
Nico Meyer, Martin Röhn, Jakob Murauer, Axel Plinge, Christopher Mutschler, Daniel D. Scherer
[ABSTRACT]
Linear systems of equations can be found in various mathematical domains, as
well as in the field of machine learning. By employing noisy intermediate-scale
quantum devices, variational solvers promise to accelerate finding solutions
for large systems. Although there is a wealth of theoretical research on these
algorithms, only fragmentary implementations exist. To fill this gap, we have
developed the variational-lse-solver framework, which realizes existing
approaches in literature, and introduces several enhancements. The
user-friendly interface is designed for researchers who work at the
abstraction level of identifying and developing end-to-end applications.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. 3 pages, 2 figures, 1 table
[LINK]
http://arxiv.org/abs/2404.09916v1
[DATE]
2024-04-16 00:43:13+08:00
[CATEGORIES]
cs.LG
Doubly Robust Inference in Causal Latent Factor Models
[AUTHORS]
Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah
[ABSTRACT]
This article introduces a new estimator of average treatment effects under
unobserved confounding in modern data-rich environments featuring large numbers
of units and outcomes. The proposed estimator is doubly robust, combining
outcome imputation, inverse probability weighting, and a novel cross-fitting
procedure for matrix completion. We derive finite-sample and asymptotic
guarantees, and show that the error of the new estimator converges to a
mean-zero Gaussian distribution at a parametric rate. Simulation results
demonstrate the practical relevance of the formal properties of the estimators
analyzed in this article.
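To illustrate what "doubly robust" means, here is the textbook augmented inverse probability weighting (AIPW) estimator. This is a generic sketch, not the authors' estimator: their outcome imputation comes from a cross-fitted matrix-completion procedure, whereas here the imputed outcomes `mu1`, `mu0` and propensities `e` are simply passed in, and `aipw_ate` is a hypothetical name.

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],
    t: Sequence[int],
    mu1: Sequence[float],
    mu0: Sequence[float],
    e: Sequence[float],
) -> float:
    """Augmented IPW estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators (0/1);
    mu1, mu0: imputed outcomes under treatment / control;
    e: propensity scores P(T=1 | unit), strictly between 0 and 1.
    The estimate remains consistent if either the outcome model (mu1, mu0)
    or the propensity model (e) is correctly specified.
    """
    n = len(y)
    if n == 0 or not (n == len(t) == len(mu1) == len(mu0) == len(e)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie in (0, 1)")
        # Outcome-imputation term plus inverse-probability-weighted residuals.
        total += (m1 - m0) + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
    return total / n


y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
# Outcome model that matches the observed outcomes: residuals vanish.
print(aipw_ate(y, t, [3.0, 2.0, 4.0, 3.0], [2.0, 1.0, 3.0, 2.0], [0.5] * 4))  # 1.0
```

When the outcome model fits the observed outcomes exactly, the residual terms vanish and the estimate reduces to the imputation average; when it is wrong but the propensities are right, the weighted residuals correct the bias.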
[LINK]
http://arxiv.org/abs/2402.11652v2
[DATE]
2024-04-16 00:39:15+08:00
[CATEGORIES]
cs.LG
Near-optimal Closed-loop Method via Lyapunov Damping for Convex Optimization
[AUTHORS]
Severin Maier, Camille Castera, Peter Ochs
[ABSTRACT]
We introduce an autonomous system with closed-loop damping for first-order
convex optimization. While, to this day, optimal rates of convergence are
almost exclusively achieved by non-autonomous methods via open-loop damping
(e.g., Nesterov’s algorithm), we show that our system, featuring a closed-loop
damping, exhibits a rate arbitrarily close to the optimal one. We do so by
coupling the damping and the speed of convergence of the system via a
well-chosen Lyapunov function. By discretizing our system we then derive an
algorithm and present numerical experiments supporting our theoretical
findings.
[LINK]
http://arxiv.org/abs/2311.10053v2
[DATE]
2024-04-16 00:37:57+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Yongquan Qu, Mohamed Aziz Bhouri, Pierre Gentine
[ABSTRACT]
Accurate representations of unknown and sub-grid physical processes through
parameterizations (or closure) in numerical simulations with quantified
uncertainty are critical for resolving the coarse-grained partial differential
equations that govern many problems ranging from weather and climate prediction
to turbulence simulations. Recent advances have seen machine learning (ML)
increasingly applied to model these subgrid processes, resulting in the
development of hybrid physics-ML models through the integration with numerical
solvers. In this work, we introduce a novel framework for the joint estimation
of physical parameters and machine learning parameterizations with uncertainty
quantification. Our framework incorporates online training and efficient
Bayesian inference within a high-dimensional parameter space, facilitated by
differentiable programming. This proof of concept underscores the substantial
potential of differentiable programming in synergistically combining machine
learning with differential equations, thereby enhancing the capabilities of
hybrid physics-ML modeling.
[COMMENTS]
Accepted at ICLR 2024 Workshop on AI4Differential Equations in
Science
[LINK]
http://arxiv.org/abs/2403.02215v2
[DATE]
2024-04-16 00:35:51+08:00
[CATEGORIES]
cs.LG
Is Table Retrieval a Solved Problem? Join-Aware Multi-Table Retrieval
[AUTHORS]
Peter Baile Chen, Yi Zhang, Dan Roth
[ABSTRACT]
Retrieving relevant tables containing the necessary information to accurately
answer a given question over tables is critical to open-domain
question-answering (QA) systems. Previous methods assume the answer to such a
question can be found either in a single table or multiple tables identified
through question decomposition or rewriting. However, neither of these
approaches is sufficient, as many questions require retrieving multiple tables
and joining them through a join plan that cannot be discerned from the user
query itself. If the join plan is not considered in the retrieval stage, the
subsequent steps of reasoning and answering based on those retrieved tables are
likely to be incorrect. To address this problem, we introduce a method that
uncovers useful join relations for any query and database during table
retrieval. We use a novel re-ranking method formulated as a mixed-integer
program that considers not only table-query relevance but also table-table
relevance that requires inferring join relationships. Our method outperforms
the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score
and for end-to-end QA by up to 5.4% in accuracy.
[LINK]
http://arxiv.org/abs/2404.09889v1
[DATE]
2024-04-15 23:55:01+08:00
[CATEGORIES]
cs.CL
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
[AUTHORS]
Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu
[COMMENTS]
Accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2311.10774v2
[DATE]
2024-04-15 23:48:48+08:00
[CATEGORIES]
cs.CL
Machine Translation for Ge’ez Language
[AUTHORS]
Aman Kassahun Wassie
[ABSTRACT]
Machine translation (MT) for low-resource languages such as Ge’ez, an ancient
language that is no longer the native language of any community, faces
challenges such as out-of-vocabulary words, domain mismatches, and lack of
sufficient labeled training data. In this work, we explore various methods to
improve Ge’ez MT, including transfer-learning from related languages,
optimizing shared vocabulary and token segmentation approaches, finetuning
large pre-trained models, and using large language models (LLMs) for few-shot
translation with fuzzy matches. We develop a multilingual neural machine
translation (MNMT) model based on language relatedness, which brings an
average performance improvement of about 4 BLEU compared to standard bilingual
models. We also attempt to finetune the NLLB-200 model, one of the most
advanced translation models available today, but find that it performs poorly
with only 4k training samples for Ge’ez. Furthermore, we experiment with using
GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches,
which leverages embedding similarity-based retrieval to find context examples
from a parallel corpus. We observe that GPT-3.5 achieves a remarkable BLEU
score of 9.2 with no initial knowledge of Ge’ez, which is still lower than the MNMT
baseline of 15.2. Our work provides insights into the potential and limitations
of different approaches for low-resource and ancient language MT.
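The few-shot setup retrieves context examples from a parallel corpus by embedding similarity. A minimal sketch, assuming sentence embeddings are already computed (the toy vectors and placeholder strings below stand in for real Ge'ez/English pairs) and using cosine similarity; the prompt layout in `build_prompt` is a hypothetical format, not the one used in the paper.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_fuzzy_matches(
    query_vec: Sequence[float],
    corpus: Sequence[tuple[str, str, Sequence[float]]],
    k: int = 2,
) -> list[tuple[str, str]]:
    """Return the k (source, target) pairs whose source embedding is closest."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(src, tgt) for src, tgt, _ in ranked[:k]]


def build_prompt(query_text: str, examples: list[tuple[str, str]]) -> str:
    """Hypothetical few-shot prompt: retrieved pairs, then the query to translate."""
    blocks = [f"Ge'ez: {src}\nEnglish: {tgt}" for src, tgt in examples]
    blocks.append(f"Ge'ez: {query_text}\nEnglish:")
    return "\n\n".join(blocks)


corpus = [
    ("<geez sentence 1>", "<english 1>", [1.0, 0.0]),
    ("<geez sentence 2>", "<english 2>", [0.0, 1.0]),
    ("<geez sentence 3>", "<english 3>", [0.9, 0.1]),
]
examples = top_k_fuzzy_matches([1.0, 0.0], corpus, k=2)
print(build_prompt("<geez query>", examples))
```

Ranking by similarity of the source side means the LLM sees translations of sentences that share vocabulary or structure with the query, which is the intuition behind using fuzzy matches for a language the model has little prior exposure to.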
[COMMENTS]
8 pages, 1 figure
[LINK]
http://arxiv.org/abs/2311.14530v3
[DATE]
2024-04-15 23:08:43+08:00
[CATEGORIES]
cs.CL
Gradient Flow of Energy: A General and Efficient Approach for Entity Alignment Decoding
[AUTHORS]
Yuanyi Wang, Haifeng Sun, Jingyu Wang, Qi Qi, Shaoling Sun, Jianxin Liao
[ABSTRACT]
Entity alignment (EA), a pivotal process in integrating multi-source
Knowledge Graphs (KGs), seeks to identify equivalent entity pairs across these
graphs. Most existing approaches regard EA as a graph representation learning
task, concentrating on enhancing graph encoders. However, the decoding process
in EA, essential for effective operation and alignment accuracy, has received
limited attention and remains tailored to specific datasets and model
architectures, necessitating both entity and additional explicit relation
embeddings. This specificity limits its applicability, particularly in
GNN-based models. To address this gap, we introduce a novel, generalized, and
efficient decoding approach for EA, relying solely on entity embeddings. Our
method optimizes the decoding process by minimizing Dirichlet energy, which
gives rise to a gradient flow within the graph and maximizes graph homophily. The
discretization of the gradient flow produces a fast and scalable approach,
termed Triple Feature Propagation (TFP). TFP innovatively generalizes adjacency
matrices to multi-view matrices: entity-to-entity, entity-to-relation,
relation-to-entity, and relation-to-triple. The gradient flow through
generalized matrices enables TFP to harness the multi-view structural
information of KGs. Rigorous experimentation on diverse public datasets
demonstrates that our approach significantly enhances various EA methods.
Notably, the approach achieves these advancements with less than 6 seconds of
additional computational time, establishing a new benchmark in efficiency and
adaptability for future EA methods.
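The core idea, discretizing the gradient flow of the Dirichlet energy, can be sketched on a single entity-to-entity adjacency matrix. This is a simplified illustration, not TFP itself (which uses the multi-view matrices listed above): it assumes a symmetric adjacency `A`, the energy E(X) = 1/2 sum_ij A_ij ||x_i - x_j||^2, and explicit Euler steps of the heat flow dX/dt = -LX with L = D - A. For a small enough step size `tau` (tau times the largest Laplacian eigenvalue below 2), each step does not increase the energy. Function names are hypothetical.

```python
Matrix = list[list[float]]


def dirichlet_energy(adj: Matrix, x: Matrix) -> float:
    """E(X) = 1/2 * sum_ij A_ij * ||x_i - x_j||^2 for a symmetric adjacency A."""
    n = len(adj)
    e = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                e += adj[i][j] * sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
    return 0.5 * e


def propagate(adj: Matrix, x: Matrix, tau: float = 0.1, steps: int = 1) -> Matrix:
    """Explicit Euler steps of dX/dt = -L X with L = D - A.

    Each step moves every embedding toward its neighbours:
    x_i <- x_i + tau * sum_j A_ij * (x_j - x_i).
    """
    n = len(adj)
    for _ in range(steps):
        new_x = []
        for i in range(n):
            row = list(x[i])
            for j in range(n):
                if adj[i][j]:
                    for d in range(len(row)):
                        row[d] += tau * adj[i][j] * (x[j][d] - x[i][d])
            new_x.append(row)
        x = new_x  # synchronous update: all nodes use the previous step's values
    return x


# Path graph 0 - 1 - 2 with one-dimensional toy embeddings.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[0.0], [1.0], [4.0]]
print(dirichlet_energy(adj, x))                 # 10.0
print(dirichlet_energy(adj, propagate(adj, x)))  # smaller than 10.0
```

Because each step only touches neighbours, the cost per step scales with the number of edges, which is consistent with the abstract's emphasis on a fast, scalable decoder.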
[LINK]
http://arxiv.org/abs/2401.12798v2
[DATE]
2024-04-15 22:47:12+08:00
[CATEGORIES]
cs.CL
On the Calibration of Multilingual Question Answering LLMs
[AUTHORS]
Yahan Yang, Soham Dan, Dan Roth, Insup Lee
[ABSTRACT]
Multilingual pre-trained Large Language Models (LLMs) are incredibly
effective at Question Answering (QA), a core task in Natural Language
Understanding, achieving high accuracies on several multilingual benchmarks.
However, little is known about how well their confidences are calibrated. In
this paper, we comprehensively benchmark the calibration of several
multilingual LLMs (MLLMs) on a variety of QA tasks. We perform extensive
experiments, spanning encoder-only, encoder-decoder, and decoder-only QA models
(size varying from 110M to 7B parameters) and diverse languages, including both
high- and low-resource ones. We study different dimensions of calibration in
in-distribution, out-of-distribution, and cross-lingual transfer settings, and
investigate strategies to improve it, including post-hoc methods and
regularized fine-tuning. For decoder-only LLMs such as LlaMa2, we additionally
find that in-context learning improves confidence calibration on multilingual
data. We also conduct several ablation experiments to study the effect of
language distances, language corpus size, and model size on calibration, and
how multilingual models compare with their monolingual counterparts for diverse
tasks and languages. Our experiments suggest that the multilingual QA models
are poorly calibrated for languages other than English and incorporating a
small set of cheaply translated multilingual samples during
fine-tuning/calibration effectively enhances the calibration performance.
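The abstract does not name its calibration metric; expected calibration error (ECE) is a common choice, so the following minimal sketch assumes it. It uses equal-width confidence bins and computes the size-weighted gap between accuracy and mean confidence in each bin; `expected_calibration_error` is an illustrative name.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B| / N) * |acc(B) - conf(B)|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Confidence exactly 1.0 goes into the last bin instead of overflowing.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            conf = sum(c for c, _ in b) / len(b)
            ece += len(b) / n * abs(acc - conf)
    return ece


# Well calibrated: 75% confidence, 3 of 4 answers correct.
print(expected_calibration_error([0.75] * 4, [True, True, True, False]))  # 0.0
# Overconfident: 95% confidence, both answers wrong.
print(expected_calibration_error([0.95, 0.95], [False, False]))
```

Computing this per language on held-out QA data is one straightforward way to see the kind of English versus non-English calibration gap the abstract reports.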
[COMMENTS]
Preprint. Under Submission
[LINK]
http://arxiv.org/abs/2311.08669v2
[DATE]
2024-04-15 22:44:04+08:00
[CATEGORIES]
cs.CL
cs.LG
Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency
[AUTHORS]
Yuchen Shi, Deqing Yang, Jingping Liu, Yanghua Xiao, Zongyu Wang, Huimin Xu
[ABSTRACT]
Previous work on negation understanding mainly focuses on negation cue
detection and scope resolution, without identifying the negation subject, which is
also important for downstream tasks. In this paper, we propose a new
negation triplet extraction (NTE) task which aims to extract negation subject
along with negation cue and scope. To achieve NTE, we devise a novel
Syntax&Semantic-Enhanced Negation Extraction model, namely SSENE, which is
built on a generative pretrained language model (PLM) with an encoder-decoder
architecture and a multi-task learning framework. Specifically, the given
sentence’s syntactic dependency tree is incorporated into the PLM’s encoder to
discover the correlations between the negation subject, cue and scope.
Moreover, the semantic consistency between the sentence and the extracted
triplet is ensured by an auxiliary learning task. Furthermore, we have
constructed a high-quality Chinese dataset NegComment based on the users’
reviews from the real-world platform of Meituan, upon which our evaluations
show that SSENE achieves the best NTE performance compared to the baselines.
Our ablation and case studies also demonstrate that incorporating the syntactic
information helps the PLM recognize the distant dependency between the
subject and cue, and that the auxiliary task helps extract negation
triplets with more semantic consistency.
[COMMENTS]
Accepted by COLING 2024
[LINK]
http://arxiv.org/abs/2404.09830v1
[DATE]
2024-04-15 22:28:33+08:00
[CATEGORIES]
cs.CL
Impact of Preference Noise on the Alignment Performance of Generative Language Models
[AUTHORS]
Yang Gao, Dana Alon, Donald Metzler
[ABSTRACT]
A key requirement in developing Generative Language Models (GLMs) is to have
their values aligned with human values. Preference-based alignment is a widely
used paradigm for this purpose, in which preferences over generation pairs are
first elicited from human annotators or AI systems, and then fed into some
alignment techniques, e.g., Direct Preference Optimization. However, a
substantial proportion (20-40%) of the preference pairs used in GLM alignment
are noisy, and it remains unclear how the noise affects the alignment
performance and how to mitigate its negative impact. In this paper, we propose
a framework to inject desired amounts and types of noise into the preferences,
and systematically study the impact of preference noise on the alignment
performance in two tasks (summarization and dialogue generation). We find that
the alignment performance can be highly sensitive to the noise rates in the
preference data: e.g., a 10 percentage point (pp) increase in the noise rate
can lead to a 30 pp drop in alignment performance (in win rate). To mitigate
the impact of noise, confidence-based data filtering shows significant benefit
when certain types of noise are present. We hope our work can help the
community better understand and mitigate the impact of preference noise in GLM
alignment.
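A minimal sketch of the two operations the abstract describes, injecting a controlled amount of preference noise and filtering by confidence. It models only one noise type (uniformly random label flips, i.e. swapping the preferred and dispreferred responses in an exact fraction of pairs); the pair representation and the helper names `inject_flip_noise` and `filter_by_confidence` are assumptions for illustration, and the confidence scores are assumed to come from the annotator or a reward model.

```python
import random
from typing import Sequence

Pair = tuple[str, str]  # (preferred response, dispreferred response)


def inject_flip_noise(pairs: Sequence[Pair], rate: float, seed: int = 0) -> list[Pair]:
    """Swap preferred/dispreferred in exactly round(rate * len(pairs)) pairs.

    Only uniformly random flips are modeled; other noise types are not covered.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so a given noise level is reproducible
    n_flip = round(rate * len(pairs))
    flip_idx = set(rng.sample(range(len(pairs)), n_flip))
    return [(b, a) if i in flip_idx else (a, b) for i, (a, b) in enumerate(pairs)]


def filter_by_confidence(
    pairs: Sequence[Pair], confidences: Sequence[float], threshold: float
) -> list[Pair]:
    """Keep only pairs whose preference confidence meets the threshold."""
    return [p for p, c in zip(pairs, confidences) if c >= threshold]


clean = [(f"good_{i}", f"bad_{i}") for i in range(10)]
noisy = inject_flip_noise(clean, rate=0.3, seed=1)
print(sum(p != q for p, q in zip(clean, noisy)))  # 3 flipped pairs
```

Sweeping `rate` and retraining the aligned model at each level is how one would trace the sensitivity curve the abstract describes.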
[LINK]
http://arxiv.org/abs/2404.09824v1
[DATE]
2024-04-15 22:21:53+08:00
[CATEGORIES]
cs.CL
Out-of-distribution Evidence-aware Fake News Detection via Dual Adversarial Debiasing
[AUTHORS]
Qiang Liu, Junfei Wu, Shu Wu, Liang Wang
[ABSTRACT]
Evidence-aware fake news detection aims to conduct reasoning between news and
evidence, which is retrieved based on news content, to find consistency or
inconsistency. However, we find that evidence-aware detection models suffer from
biases, i.e., spurious correlations between news/evidence contents and
true/fake news labels, and generalize poorly to Out-Of-Distribution
(OOD) situations. To deal with this, we propose a novel Dual Adversarial
Learning (DAL) approach. We incorporate news-aspect and evidence-aspect
debiasing discriminators, whose targets are both true/fake news labels, in DAL.
Then, DAL reversely optimizes news-aspect and evidence-aspect debiasing
discriminators to mitigate the impact of news and evidence content biases. At
the same time, DAL also optimizes the main fake news predictor, so that the
news-evidence interaction module can be learned. This process allows us to
teach evidence-aware fake news detection models to better conduct news-evidence
reasoning and minimize the impact of content biases. Notably, our proposed
DAL approach is a plug-and-play module that works well with existing backbones.
We conduct comprehensive experiments under two OOD settings, and plug DAL in
four evidence-aware fake news detection backbones. Results demonstrate that
DAL significantly and stably outperforms the original backbones and some
competitive debiasing methods.
[LINK]
http://arxiv.org/abs/2304.12888v2
[DATE]
2024-04-15 22:17:30+08:00
[CATEGORIES]
cs.CL
ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models
[AUTHORS]
Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, Yvette Graham
[ABSTRACT]
Parameter-efficient fine-tuning (PEFT) is widely studied for its
effectiveness and efficiency in the era of large language models. Low-rank
adaptation (LoRA) has demonstrated commendable performance as a popular and
representative method. However, it is implemented with a fixed intrinsic rank
that might not be the ideal setting for the downstream tasks. Recognizing the
need for more flexible downstream task adaptation, we extend the methodology of
LoRA to an innovative approach we call allocating low-rank adaptation (ALoRA)
that enables dynamic adjustments to the intrinsic rank during the adaptation
process. First, we propose a novel method, AB-LoRA, that can effectively
estimate the importance score of each LoRA rank. Second, guided by AB-LoRA, we
gradually prune redundant and negatively impacting LoRA ranks and allocate the
pruned LoRA budgets to important Transformer modules needing higher ranks. We
have conducted experiments on various tasks, and the experimental results
demonstrate that our ALoRA method can outperform the recent baselines with
comparable tunable parameters.
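Per-rank pruning is possible because a LoRA update factors into rank-one terms: `B A x = sum_k b_k * (a_k . x)`, where `a_k` is row k of A and `b_k` is column k of B. The sketch below attaches a gate to each rank so that pruning a rank amounts to zeroing its gate. The importance scores are supplied by the caller; how AB-LoRA estimates them, and how freed budget is reallocated to other modules, is not reproduced here. `lora_delta` and `prune_lowest` are hypothetical helpers, and `scale` plays the role of LoRA's usual alpha/r factor.

```python
from typing import Sequence


def lora_delta(
    x: Sequence[float],
    a_rows: Sequence[Sequence[float]],
    b_cols: Sequence[Sequence[float]],
    gates: Sequence[float],
    scale: float = 1.0,
) -> list[float]:
    """Gated LoRA update: scale * sum_k g_k * b_k * (a_k . x).

    a_rows[k] is row k of A (length d_in); b_cols[k] is column k of B (length d_out).
    A zero gate removes rank k without affecting the other ranks.
    """
    d_out = len(b_cols[0])
    out = [0.0] * d_out
    for a_k, b_k, g_k in zip(a_rows, b_cols, gates):
        if g_k == 0.0:
            continue  # pruned rank contributes nothing
        coeff = scale * g_k * sum(ai * xi for ai, xi in zip(a_k, x))
        for i in range(d_out):
            out[i] += coeff * b_k[i]
    return out


def prune_lowest(
    gates: Sequence[float], importance: Sequence[float], n_prune: int
) -> list[float]:
    """Zero the gates of the n_prune ranks with the lowest importance scores."""
    order = sorted(range(len(importance)), key=lambda k: importance[k])
    pruned = set(order[:n_prune])
    return [0.0 if k in pruned else g for k, g in enumerate(gates)]


x = [1.0, 2.0]
a_rows = [[1.0, 0.0], [0.0, 1.0]]
b_cols = [[1.0, 0.0], [0.0, 1.0]]
gates = prune_lowest([1.0, 1.0], importance=[0.1, 0.9], n_prune=1)
print(gates, lora_delta(x, a_rows, b_cols, gates))  # [0.0, 1.0] [0.0, 2.0]
```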
[COMMENTS]
Accepted by NAACL-2024
[LINK]
http://arxiv.org/abs/2403.16187v2
[DATE]
2024-04-15 21:25:05+08:00
[CATEGORIES]
cs.CL
KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models
[AUTHORS]
Avinash Anand, Mohit Gupta, Kritarth Prasad, Ujjwal Goel, Naman Lal, Astha Verma, Rajiv Ratn Shah
[ABSTRACT]
Citation Text Generation (CTG) is a task in natural language processing (NLP)
that aims to produce text that accurately cites or references a cited document
within a source document. In CTG, the generated text draws upon contextual cues
from both the source document and the cited paper, ensuring accurate and
relevant citation information is provided. Previous work in the field of
citation generation is mainly based on the text summarization of documents.
Building on this, this paper presents a framework and a comparative study to
demonstrate the use of Large Language Models (LLMs) for the task of citation
generation. We also show that citation generation improves when the knowledge
graph relations of the papers are incorporated into the prompt, helping the LLM
better learn the relationship between the papers. To assess our model's
performance, we use a subset of the standard S2ORC dataset that consists only
of computer science research papers in English. Vicuna performs best for this
task with 14.15 METEOR, 12.88 ROUGE-1, 1.52 ROUGE-2, and 10.94 ROUGE-L. With
knowledge graphs included, Alpaca performs best, improving ROUGE-1 by 36.98%
and METEOR by 33.14%.
[LINK]
http://arxiv.org/abs/2404.09763v1
[DATE]
2024-04-15 21:06:32+08:00
[CATEGORIES]
cs.CL
Evaluating the Deductive Competence of Large Language Models
[AUTHORS]
Spencer M. Seals, Valerie L. Shalin
[COMMENTS]
17 pages, 7 figures, accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2309.05452v2
[DATE]
2024-04-15 21:01:30+08:00
[CATEGORIES]
cs.CL
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
[AUTHORS]
Nicolas Wagner, Dongyang Fan, Martin Jaggi
[ABSTRACT]
We explore on-device self-supervised collaborative fine-tuning of large
language models with limited local data availability. Taking inspiration from
the collaborative learning community, we introduce three distinct
trust-weighted gradient aggregation schemes: weight similarity-based,
prediction similarity-based and validation performance-based. To minimize
communication overhead, we integrate Low-Rank Adaptation (LoRA) and only
exchange LoRA weight updates. Our protocols, driven by prediction and
performance metrics, surpass both FedAvg and local fine-tuning methods, an
advantage that is particularly evident in realistic scenarios with more diverse
local data distributions. The results underscore the effectiveness of our approach in
addressing heterogeneity and scarcity within local datasets.
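As a rough illustration of what a weight similarity-based trust scheme could look like, the sketch below weights each peer's flattened LoRA update by its cosine similarity to the client's own update, clips negative similarities to zero, and normalizes. The function names, the clipping rule, and the uniform fallback are assumptions for illustration, not the authors' protocol.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two flattened update vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def trust_weights(own: List[float], peers: List[List[float]]) -> List[float]:
    """Hypothetical weighting: clip negative similarities to 0, then normalize."""
    sims = [max(cosine(own, p), 0.0) for p in peers]
    total = sum(sims)
    if total == 0.0:
        # No peer points in a similar direction: fall back to uniform weights.
        return [1.0 / len(peers)] * len(peers)
    return [s / total for s in sims]


def aggregate(own: List[float], peers: List[List[float]]) -> List[float]:
    """Trust-weighted average of peer LoRA updates (peers may include `own`)."""
    w = trust_weights(own, peers)
    return [sum(wi * p[j] for wi, p in zip(w, peers)) for j in range(len(own))]
```

A client would typically include its own update among the peers, e.g. `aggregate(own, [own, peer_a, peer_b])`, so that dissimilar peers are down-weighted rather than averaged in uniformly as in FedAvg.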
[LINK]
http://arxiv.org/abs/2404.09753v1
[DATE]
2024-04-15 20:54:31+08:00
[CATEGORIES]
cs.CL
cs.LG
PerkwE_COQA: Enhanced Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models
[AUTHORS]
Pardis Moradbeiki, Nasser Ghadiri
[ABSTRACT]
Smart cities need the involvement of their residents to enhance quality of
life. Conversational question-answering (CQA) is an emerging approach for user
engagement, and there is increasing demand for advanced CQA systems that go
beyond classic systems. Existing approaches have shown that large language
models (LLMs) offer promising capabilities for CQA, but may struggle to
capture the nuances of conversational contexts. The new approach involves
understanding the content and engaging in a multi-step conversation with the
user to fulfill their needs. This paper presents a novel method to elevate the
performance of Persian CQA systems. It combines the strengths of LLMs with
contextual keyword extraction. Our method extracts keywords specific to the
conversational flow,
providing the LLM with additional context to understand the user’s intent and
generate more relevant and coherent responses. We evaluated the effectiveness
of this combined approach through various metrics, demonstrating significant
improvements in CQA performance compared to an LLM-only baseline. The proposed
method effectively handles implicit questions, delivers contextually relevant
answers, and tackles complex questions that rely heavily on conversational
context. The findings indicate that our method outperforms existing methods
and the LLM-only baseline by up to 8% on the evaluation benchmarks.
[LINK]
http://arxiv.org/abs/2404.05406v2
[DATE]
2024-04-15 20:38:33+08:00
[CATEGORIES]
cs.CL
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
[AUTHORS]
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
[ABSTRACT]
Transformers are ubiquitous across a wide range of tasks, and interpreting
their internals is a pivotal goal. Nevertheless, one of their components, the
feed-forward (FF) blocks, has typically been analyzed less than others despite
holding a substantial share of the parameters. We analyze the input
contextualization effects of FF blocks by
rendering them in the attention maps as a human-friendly visualization scheme.
Our experiments with both masked- and causal-language models reveal that FF
networks modify the input contextualization to emphasize specific types of
linguistic compositions. In addition, FF and its surrounding components tend to
cancel out each other’s effects, suggesting potential redundancy in the
processing of the Transformer layer.
[COMMENTS]
ICLR 2024 Spotlight; 37 pages, 32 figures, 3 tables
[LINK]
http://arxiv.org/abs/2302.00456v3
[DATE]
2024-04-15 20:27:00+08:00
[CATEGORIES]
cs.CL
Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model
[AUTHORS]
Hyunsoo Cho
[COMMENTS]
Under review @ *ACL
[LINK]
http://arxiv.org/abs/2404.09717v1
[DATE]
2024-04-15 20:20:09+08:00
[CATEGORIES]
cs.CL
cs.LG
Psychometric Predictive Power of Large Language Models
[AUTHORS]
Tatsuki Kuribayashi, Yohei Oseki, Timothy Baldwin
[COMMENTS]
23 pages; Findings of NAACL 2024
[LINK]
http://arxiv.org/abs/2311.07484v3
[DATE]
2024-04-15 20:12:24+08:00
[CATEGORIES]
cs.CL
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
[AUTHORS]
Guangyan Li, Yongqiang Tang, Wensheng Zhang
[ABSTRACT]
Large language models (LLMs) show excellent performance in difficult tasks,
but they often require massive memory and computational resources. Reducing
the parameter scale of LLMs has therefore become a research hotspot. In this
study, we make an important observation: the multi-head self-attention (MHA)
sub-layer of the Transformer exhibits a noticeable low-rank structure, while
the feed-forward network (FFN) sub-layer does not. Accordingly, we design a
mixed compression model that organically combines Low-Rank matrix
approximation And structured Pruning (LoRAP). For the MHA sub-layer, we propose
an input activation weighted singular value decomposition method to strengthen
the low-rank characteristic. Furthermore, we discover that the weight matrices
in MHA sub-layer have different low-rank degrees. Thus, a novel parameter
allocation scheme according to the discrepancy of low-rank degrees is devised.
For the FFN sub-layer, we propose a gradient-free structured channel pruning
method. During pruning, we make an interesting finding: the least important
1% of parameters actually plays a vital role in model performance.
Extensive evaluations on zero-shot perplexity and zero-shot task classification
indicate that our proposal is superior to previous structured compression
rivals under multiple compression ratios.
[COMMENTS]
8 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.09695v1
[DATE]
2024-04-15 19:53:22+08:00
[CATEGORIES]
cs.LG
cs.CL
Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation
[AUTHORS]
Juhwan Choi, Jungmin Yun, Kyohoon Jin, YoungBin Kim
[ABSTRACT]
The quality of the dataset is crucial for ensuring optimal performance and
reliability of downstream task models. However, datasets often contain noisy
data inadvertently included during the construction process. Numerous attempts
have been made to correct this issue through human annotators. However, hiring
and managing human annotators is expensive and time-consuming. As an
alternative, recent studies are exploring the use of large language models
(LLMs) for data annotation.
In this study, we present a case study that extends the application of
LLM-based data annotation to enhance the quality of existing datasets through a
cleansing strategy. Specifically, we leverage approaches such as
chain-of-thought (CoT) and majority voting to imitate human annotation and
classify unrelated documents from the Multi-News dataset, which is widely used
for the multi-document summarization task. Through our proposed cleansing
method, we introduce Multi-News+, an enhanced version of the dataset. By employing LLMs for data
cleansing, we demonstrate an efficient and effective approach to improving
dataset quality without relying on expensive human annotation efforts.
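The majority-voting step can be pictured with a short sketch: several sampled LLM judgments (for example, from repeated chain-of-thought prompts) vote on whether each source document is related to its summary, and documents voted unrelated are dropped. The label strings and the keep-on-tie rule are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import List


def majority_label(votes: List[str]) -> str:
    """Return the most common label; ties (and no votes) keep the document."""
    if not votes:
        return "related"
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        # Conservative assumption: never drop a document on a split vote.
        return "related"
    return ranked[0][0]


def cleanse(docs: List[str], votes_per_doc: List[List[str]]) -> List[str]:
    """Drop documents whose majority vote is 'unrelated'."""
    return [d for d, v in zip(docs, votes_per_doc) if majority_label(v) != "unrelated"]
```

Keeping documents on ties trades a little residual noise for a lower risk of deleting genuinely relevant sources.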
[LINK]
http://arxiv.org/abs/2404.09682v1
[DATE]
2024-04-15 19:36:10+08:00
[CATEGORIES]
cs.CL
CBQ: Cross-Block Quantization for Large Language Models
[AUTHORS]
Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang
[ABSTRACT]
Post-training quantization (PTQ) has played a key role in compressing large
language models (LLMs) with ultra-low costs. However, existing PTQ methods only
focus on handling outliers within a single layer or block and ignore the
dependencies between blocks, which leads to severe performance degradation in low-bit
settings. In this paper, we propose CBQ, a cross-block reconstruction-based PTQ
method for LLMs. CBQ employs a cross-block dependency using a homologous
reconstruction scheme, establishing long-range dependencies across multiple
blocks to minimize error accumulation. Furthermore, CBQ incorporates a
coarse-to-fine preprocessing (CFP) strategy for suppressing weight and
activation outliers, coupled with an adaptive LoRA-Rounding technique for
precise weight quantization. These innovations enable CBQ to not only handle
extreme outliers effectively but also improve overall quantization accuracy.
Extensive experiments show that CBQ achieves superior low-bit quantization
(W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across
various LLMs and datasets. Notably, CBQ quantizes the 4-bit LLAMA1-65B model
within only 4.3 hours on a single GPU, achieving a commendable tradeoff between
performance and quantization efficiency.
[LINK]
http://arxiv.org/abs/2312.07950v4
[DATE]
2024-04-15 18:57:16+08:00
[CATEGORIES]
cs.LG
cs.CL
Learn Your Reference Model for Real Good Alignment
[AUTHORS]
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov
[ABSTRACT]
The complexity of the alignment problem stems from the fact that existing
methods are unstable. Researchers continuously invent various tricks to address
this shortcoming. For instance, in the fundamental Reinforcement Learning From
Human Feedback (RLHF) technique of Language Model alignment, in addition to
reward maximization, the Kullback-Leibler divergence between the trainable
policy and the SFT policy is minimized. This addition prevents the model from
being overfitted to the Reward Model (RM) and generating texts that are
out-of-domain for the RM. The Direct Preference Optimization (DPO) method
reformulates the optimization task of RLHF and eliminates the Reward Model
while tacitly maintaining the requirement for the policy to be close to the SFT
policy. In our paper, we argue that this implicit limitation in the DPO method
leads to sub-optimal results. We propose a new method called Trust Region DPO
(TR-DPO), which updates the reference policy during training. With such a
straightforward update, we demonstrate the effectiveness of TR-DPO against DPO
on the Anthropic HH and TLDR datasets. We show that TR-DPO outperforms DPO by
up to 19%, measured by automatic evaluation with GPT-4. The new alignment
approach that we propose improves model quality along several dimensions at
once, such as coherence, correctness, level of detail,
helpfulness, and harmlessness.
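The abstract states only that TR-DPO updates the reference policy during training. The sketch below is a minimal illustration under stated assumptions: a standard per-pair DPO loss plus two illustrative update rules, a soft blend with weight `alpha` and a hard copy every `tau` steps. Parameters are modeled as a plain dict of floats, and the names `alpha`, `tau`, and `beta` are assumptions here rather than the paper's exact formulation.

```python
import math
from typing import Dict


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair, from sequence log-probs of the chosen (w) and
    rejected (l) responses under the policy (pi) and the reference (ref)."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def soft_update(ref: Dict[str, float], policy: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Blend the current policy into the reference: alpha*policy + (1-alpha)*ref."""
    return {k: alpha * policy[k] + (1.0 - alpha) * ref[k] for k in ref}


def hard_update(ref: Dict[str, float], policy: Dict[str, float],
                step: int, tau: int) -> Dict[str, float]:
    """Replace the reference with a copy of the policy every `tau` steps."""
    return dict(policy) if step % tau == 0 else ref
```

With `alpha = 0` (or `tau` larger than the number of steps) the reference never moves and the procedure reduces to ordinary DPO with a frozen SFT reference.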
[LINK]
http://arxiv.org/abs/2404.09656v1
[DATE]
2024-04-15 18:44:31+08:00
[CATEGORIES]
cs.LG
cs.CL
Real-world Instance-specific Image Goal Navigation for Service Robots: Bridging the Domain Gap with Contrastive Learning
[AUTHORS]
Taichi Sakaguchi, Akira Taniguchi, Yoshinobu Hagiwara, Lotfi El Hafi, Shoichi Hasegawa, Tadahiro Taniguchi
[ABSTRACT]
Improving instance-specific image goal navigation (InstanceImageNav), which
locates the identical object in a real-world environment from a query image, is
essential for robotic systems to assist users in finding desired objects. The
challenge lies in the domain gap between low-quality images observed by the
moving robot, characterized by motion blur and low-resolution, and high-quality
query images provided by the user. Such domain gaps could significantly reduce
the task success rate but have not been the focus of previous work. To address
this, we propose a novel method called Few-shot Cross-quality Instance-aware
Adaptation (CrossIA), which employs contrastive learning with an instance
classifier to align features between many low-quality images and a few high-quality images.
This approach effectively reduces the domain gap by bringing the latent
representations of cross-quality images closer on an instance basis.
Additionally, the system integrates an object image collection with a
pre-trained deblurring model to enhance the observed image quality. Our method
fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We
evaluated our method’s effectiveness through an InstanceImageNav task with 20
different types of instances, where the robot identifies the same instance in a
real-world environment as a high-quality query image. Our experiments showed
that our method improves the task success rate by up to three times compared to
the baseline, a conventional approach based on SuperGlue. These findings
highlight the potential of leveraging contrastive learning and image
enhancement techniques to bridge the domain gap and improve object localization
in robotic applications. The project website is
https://emergentsystemlabstudent.github.io/DomainBridgingNav/.
[COMMENTS]
See website at
https://emergentsystemlabstudent.github.io/DomainBridgingNav/. Submitted to
IROS2024
[LINK]
http://arxiv.org/abs/2404.09645v1
[DATE]
2024-04-15 18:24:32+08:00
[CATEGORIES]
cs.CL
Improving Recall of Large Language Models: A Model Collaboration Approach for Relational Triple Extraction
[AUTHORS]
Zepeng Ding, Wenhao Huang, Jiaqing Liang, Deqing Yang, Yanghua Xiao
[COMMENTS]
Accepted at LREC-COLING 2024 main conference
[LINK]
http://arxiv.org/abs/2404.09593v1
[DATE]
2024-04-15 17:03:05+08:00
[CATEGORIES]
cs.CL
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
[AUTHORS]
Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy
[COMMENTS]
Accepted to the 3rd Workshop on Perspectivist Approaches to NLP at
LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2307.12973v2
[DATE]
2024-04-15 17:00:26+08:00
[CATEGORIES]
cs.CL
Transformers, Contextualism, and Polysemy
[AUTHORS]
Jumbly Grindrod
[ABSTRACT]
The transformer architecture, introduced by Vaswani et al. (2017), is at the
heart of the remarkable recent progress in the development of language models,
including famous chatbots such as ChatGPT and Bard. In this paper, I argue
that we can extract from the way the transformer architecture works a picture of
the relationship between context and meaning. I call this the transformer
picture, and I argue that it is novel with regard to two related
philosophical debates: the contextualism debate regarding the extent of
context-sensitivity across natural language, and the polysemy debate regarding
how polysemy should be captured within an account of word meaning. Although
much of the paper merely tries to position the transformer picture with respect
to these two debates, I will also begin to make the case for the transformer
picture.
[LINK]
http://arxiv.org/abs/2404.09577v1
[DATE]
2024-04-15 16:38:43+08:00
[CATEGORIES]
cs.CL
Large language models and linguistic intentionality
[AUTHORS]
Jumbly Grindrod
[ABSTRACT]
Do large language models like ChatGPT or LLaMA meaningfully use the words
they produce? Or are they merely clever prediction machines, simulating
language use by producing statistically plausible text? There have already been
some initial attempts to answer this question by showing that these models meet
the criteria for entering meaningful states according to metasemantic theories
of mental content. In this paper, I will argue for a different approach: that
we should instead consider whether language models meet the criteria given by
our best metasemantic theories of linguistic content. In that vein, I will
illustrate how this can be done by applying two such theories to the case of
language models: Gareth Evans’ (1982) account of naming practices and Ruth
Millikan’s (1984, 2004, 2005) teleosemantics. In doing so, I will argue that it
is a mistake to think that the failure of LLMs to meet plausible conditions for
mental intentionality thereby renders their outputs meaningless, and that a
distinguishing feature of linguistic intentionality, namely dependency on a
pre-existing linguistic system, allows for the plausible result that LLM outputs
are meaningful.
[LINK]
http://arxiv.org/abs/2404.09576v1
[DATE]
2024-04-15 16:37:26+08:00
[CATEGORIES]
cs.CL
Reliability Estimation of News Media Sources: Birds of a Feather Flock Together
[AUTHORS]
Sergio Burdisso, Dairazalia Sánchez-Cortés, Esaú Villatoro-Tello, Petr Motlicek
[COMMENTS]
Accepted to NAACL 2024 Main Conference
[LINK]
http://arxiv.org/abs/2404.09565v1
[DATE]
2024-04-15 16:27:47+08:00
[CATEGORIES]
cs.CL
cs.LG
Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend
[AUTHORS]
Ning Lu, Shengcai Liu, Zhirui Zhang, Qi Wang, Haifeng Liu, Ke Tang
[ABSTRACT]
Word-level textual adversarial attacks have demonstrated notable efficacy in
misleading Natural Language Processing (NLP) models. Despite their success, the
underlying reasons for their effectiveness and the fundamental characteristics
of adversarial examples (AEs) remain obscure. This work aims to interpret
word-level attacks by examining their $n$-gram frequency patterns. Our
comprehensive experiments reveal that in approximately 90% of cases,
word-level attacks lead to the generation of examples where the frequency of
$n$-grams decreases, a tendency we term $n$-gram Frequency Descend
($n$-FD). This finding suggests a straightforward strategy to enhance model
robustness: training models using examples with $n$-FD. To examine the
feasibility of this strategy, we employed the $n$-gram frequency information,
as an alternative to conventional loss gradients, to generate perturbed
examples in adversarial training. The experimental results indicate that the
frequency-based approach performs comparably with the gradient-based approach
in improving model robustness. Our research offers a novel and more intuitive
perspective for understanding word-level textual adversarial attacks and
proposes a new direction to improve model robustness.
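A small sketch of how one might test whether a perturbation exhibits $n$-FD: count $n$-grams in a reference corpus, average the corpus counts of a sentence's $n$-grams, and compare the perturbed sentence with the original. Using the mean count as the frequency measure is an assumption for illustration; the paper's exact statistic may differ.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(corpus: List[List[str]], n: int) -> Counter:
    """n-gram frequencies over a tokenized reference corpus."""
    counts: Counter = Counter()
    for toks in corpus:
        counts.update(ngrams(toks, n))
    return counts


def mean_freq(tokens: List[str], counts: Counter, n: int) -> float:
    """Average corpus frequency of a sentence's n-grams (unseen n-grams count 0)."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return sum(counts[g] for g in grams) / len(grams)


def is_n_fd(original: List[str], perturbed: List[str], counts: Counter, n: int) -> bool:
    """True if the perturbation lowers the average n-gram frequency."""
    return mean_freq(perturbed, counts, n) < mean_freq(original, counts, n)
```

Swapping a common word for a rarer synonym (e.g. "movie" to "film" in a corpus dominated by "movie") typically breaks frequent bigrams, which is the kind of descent the abstract describes.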
[COMMENTS]
To be published in: 2024 IEEE Conference on Artificial Intelligence
(CAI 2024)
[LINK]
http://arxiv.org/abs/2302.02568v4
[DATE]
2024-04-15 16:11:18+08:00
[CATEGORIES]
cs.CL
cs.LG
Large Language Models as Optimizers
[AUTHORS]
Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen
[COMMENTS]
ICLR 2024; 42 pages, 26 figures, 15 tables. Code at
https://github.com/google-deepmind/opro
[LINK]
http://arxiv.org/abs/2309.03409v3
[DATE]
2024-04-15 15:50:32+08:00
[CATEGORIES]
cs.LG
cs.CL
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
[AUTHORS]
Siyan Zhao, Daniel Israel, Guy Van den Broeck, Aditya Grover
[ABSTRACT]
During inference for transformer-based large language models (LLM),
prefilling is the computation of the key-value (KV) cache for input tokens in
the prompt prior to autoregressive generation. For longer input prompt lengths,
prefilling will incur a significant overhead on decoding time. In this work, we
highlight the following pitfall of prefilling: for batches containing
highly varying prompt lengths, significant computation is wasted by the standard
practice of padding sequences to the maximum length. As LLMs increasingly
support longer context lengths, potentially up to 10 million tokens, variations
in prompt lengths within a batch become more pronounced. To address this, we
propose Prepacking, a simple yet effective method to optimize prefilling
computation. To avoid redundant computation on pad tokens, prepacking combines
prompts of varying lengths into a sequence and packs multiple sequences into a
compact batch using a bin-packing algorithm. It then modifies the attention
mask and positional encoding to compute multiple prefilled KV-caches for
multiple prompts within a single sequence. On standard curated datasets
containing prompts of varying lengths, we obtain significant speed and
memory-efficiency improvements compared to the default padding-based
prefilling computation in Hugging Face across a range of base model
configurations and inference serving scenarios.
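The packing and masking steps described above can be sketched in pure Python. The sketch uses first-fit decreasing, one common bin-packing heuristic (the abstract does not say which algorithm the authors use), then builds per-token position ids that restart at zero for each prompt and a block-diagonal causal mask so that packed prompts cannot attend to one another. Function names are hypothetical, this is not the API of the authors' repository, and splitting the resulting KV cache back per prompt is omitted.

```python
from typing import List, Tuple


def pack_prompts(lengths: List[int], capacity: int) -> List[List[int]]:
    """First-fit decreasing: group prompt indices into sequences of <= capacity tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    free: List[int] = []  # remaining room in each bin
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"prompt {i} is longer than the capacity")
        for b, room in enumerate(free):
            if lengths[i] <= room:
                bins[b].append(i)
                free[b] -= lengths[i]
                break
        else:
            bins.append([i])
            free.append(capacity - lengths[i])
    return bins


def positions_and_mask(bin_lengths: List[int]) -> Tuple[List[int], List[List[bool]]]:
    """Position ids restart per prompt; mask[q][k] is True if query q may attend to key k."""
    positions: List[int] = []
    segment: List[int] = []
    for seg_id, n in enumerate(bin_lengths):
        positions.extend(range(n))
        segment.extend([seg_id] * n)
    total = len(positions)
    # Block-diagonal (same prompt only) and causal (no attending to later tokens).
    mask = [[segment[q] == segment[k] and k <= q for k in range(total)]
            for q in range(total)]
    return positions, mask
```

For example, prompts of lengths 5, 3, 2, and 4 with capacity 8 pack into two sequences, `[[0, 1], [3, 2]]`, instead of four rows each padded to length 5.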
[COMMENTS]
18 pages, code in https://github.com/siyan-zhao/prepacking
[LINK]
http://arxiv.org/abs/2404.09529v1
[DATE]
2024-04-15 15:49:10+08:00
[CATEGORIES]
cs.LG
cs.CL
Neuron-level LLM Patching for Code Generation
[AUTHORS]
Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
[ABSTRACT]
Large Language Models (LLMs) have found widespread adoption in software
engineering, particularly in code generation tasks. However, updating these
models with new knowledge can be prohibitively expensive, yet it is essential
for maximizing their utility. In this paper, we propose a novel and effective
model editing approach, MENT, to patch LLMs in coding tasks.
MENT is effective, efficient, and reliable. It can correct a neural
model by patching 1 or 2 neurons. As pioneering work on neuron-level model
editing of generative models, we formalize the editing process and introduce
the involved concepts. We also introduce new measures to evaluate its
generalization ability and build a benchmark for further study. Our approach
is evaluated on three coding tasks: API-seq recommendation,
line-level code generation, and pseudocode-to-code translation. The
experimental results show that the proposed approach outperforms
state-of-the-art methods by a significant margin in both effectiveness and
efficiency. In addition, we demonstrate uses of MENT for LLM reasoning in
software engineering: by editing LLM knowledge, the directly or indirectly
dependent API-invocation behaviors in the chain of thought change
accordingly. This illustrates the significance of repairing LLMs.
[COMMENTS]
12 pages, 6 figures, 6 tables, under peer-review
[LINK]
http://arxiv.org/abs/2312.05356v3
[DATE]
2024-04-15 15:31:00+08:00
[CATEGORIES]
cs.CL
cs.LG
State Space Model for New-Generation Network Alternative to Transformers: A Survey
[AUTHORS]
Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang
[ABSTRACT]
In the post-deep learning era, the Transformer architecture has demonstrated
its powerful performance across pre-trained big models and various downstream
tasks. However, the enormous computational demands of this architecture have
deterred many researchers. To further reduce the complexity of attention
models, numerous efforts have been made to design more efficient methods. Among
them, the State Space Model (SSM), as a possible replacement for the
self-attention based Transformer model, has drawn more and more attention in
recent years. In this paper, we give the first comprehensive review of these
works and also provide experimental comparisons and analysis to better
demonstrate the features and advantages of SSM. Specifically, we first give a
detailed description of principles to help the readers quickly capture the key
ideas of SSM. After that, we dive into the reviews of existing SSMs and their
various applications, including natural language processing, computer vision,
graph, multi-modal and multi-media, point cloud/event stream, time series data,
and other domains. In addition, we give statistical comparisons and analysis of
these models and hope it helps the readers to understand the effectiveness of
different structures on various tasks. Then, we propose possible research
directions to better promote the development of the theoretical
model and application of SSM. A list of related works will be continuously
updated in the following GitHub repository:
https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
[COMMENTS]
The first review of State Space Model (SSM)/Mamba and their
applications in artificial intelligence, 33 pages
[LINK]
http://arxiv.org/abs/2404.09516v1
[DATE]
2024-04-15 15:24:45+08:00
[CATEGORIES]
cs.LG
cs.CL
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
[AUTHORS]
Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty
[ABSTRACT]
Large Language Models (LLMs) have demonstrated significant potential in
handling complex reasoning tasks through step-by-step rationale generation.
However, recent studies have raised concerns regarding the hallucination and
flaws in their reasoning process. Substantial efforts are being made to improve
the reliability and faithfulness of the generated rationales. Some approaches
model reasoning as planning, while others focus on annotating for process
supervision. Nevertheless, the planning-based search process often results in
high latency due to the frequent assessment of intermediate reasoning states
and the extensive exploration space. Additionally, supervising the reasoning
process with human annotation is costly and challenging to scale for LLM
training. To address these issues, in this paper, we propose a framework to
learn planning-based reasoning through Direct Preference Optimization (DPO) on
collected trajectories, which are ranked according to synthesized process
rewards. Our results on challenging logical reasoning benchmarks demonstrate
the effectiveness of our learning framework, showing that our 7B model can
surpass strong counterparts such as GPT-3.5-Turbo.
[COMMENTS]
17 pages, 9 figures
[LINK]
http://arxiv.org/abs/2402.00658v2
[DATE]
2024-04-15 14:36:24+08:00
[CATEGORIES]
cs.CL
A Novel Paradigm Boosting Translation Capabilities of Large Language Models
[AUTHORS]
Jiaxin Guo, Hao Yang, Zongyao Li, Daimeng Wei, Hengchao Shang, Xiaoyu Chen
[ABSTRACT]
This paper presents a study on strategies to enhance the translation
capabilities of large language models (LLMs) in the context of machine
translation (MT) tasks. The paper proposes a novel paradigm consisting of three
stages: Secondary Pre-training using Extensive Monolingual Data, Continual
Pre-training with Interlinear Text Format Documents, and Leveraging
Source-Language Consistent Instruction for Supervised Fine-Tuning. Previous
research on LLMs focused on various strategies for supervised fine-tuning
(SFT), but their effectiveness has been limited. While traditional machine
translation approaches rely on vast amounts of parallel bilingual data, our
paradigm highlights the importance of using smaller sets of high-quality
bilingual data. We argue that the focus should be on augmenting LLMs’
cross-lingual alignment abilities during pre-training rather than solely
relying on extensive bilingual data during SFT. Experimental results conducted
using the Llama2 model, particularly on Chinese-Llama2 after monolingual
augmentation, demonstrate the improved translation capabilities of LLMs. A
significant contribution of our approach lies in Stage2: Continual Pre-training
with Interlinear Text Format Documents, which requires less than 1B training
data, making our method highly efficient. Additionally, in Stage3, we observed
that setting instructions consistent with the source language benefits the
supervised fine-tuning process. Experimental results demonstrate that our
approach surpasses previous work and achieves superior performance compared to
models such as NLLB-54B and GPT3.5-text-davinci-003, despite having a
significantly smaller parameter count of only 7B or 13B. This achievement
establishes our method as a pioneering strategy in the field of machine
translation.
[COMMENTS]
Accepted in NAACL 2024
[LINK]
http://arxiv.org/abs/2403.11430v2
[DATE]
2024-04-15 14:34:04+08:00
[CATEGORIES]
cs.CL
Bridging the Gap between Different Vocabularies for LLM Ensemble
[AUTHORS]
Yangyifan Xu, Jinliang Lu, Jiajun Zhang
[ABSTRACT]
Ensembling different large language models (LLMs) to unleash their
complementary potential and harness their individual strengths is highly
valuable. Nevertheless, vocabulary discrepancies among various LLMs have
constrained previous studies to either selecting or blending completely
generated outputs. This limitation hinders the dynamic correction and
enhancement of outputs during the generation process, resulting in a limited
capacity for effective ensemble. To address this issue, we propose a novel
method to Ensemble LLMs via Vocabulary Alignment (EVA). EVA bridges the lexical
gap among various LLMs, enabling meticulous ensemble at each generation step.
Specifically, we first learn mappings between the vocabularies of different
LLMs with the assistance of overlapping tokens. Subsequently, these mappings
are employed to project output distributions of LLMs into a unified space,
facilitating a fine-grained ensemble. Finally, we design a filtering strategy
to exclude models that generate unfaithful tokens. Experimental results on
commonsense reasoning, arithmetic reasoning, machine translation, and
data-to-text generation tasks demonstrate the superiority of our approach
compared with individual LLMs and previous ensemble methods conducted on
complete outputs. Further analyses confirm that our approach can leverage
knowledge from different language models and yield consistent improvement.
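To make vocabulary projection concrete, here is a toy sketch. Given a dictionary from model B tokens to model A tokens (EVA learns such mappings with the help of overlapping tokens; here the mapping is simply given), B's next-token distribution is moved into A's vocabulary and mixed with A's distribution. Dropping mass on unmapped B tokens, renormalizing, and the mixing weight `weight_a` are assumptions for illustration, and EVA's filtering of models that generate unfaithful tokens is not shown.

```python
from typing import Dict


def project(dist_b: Dict[str, float], mapping: Dict[str, str]) -> Dict[str, float]:
    """Move model B's probability mass onto the mapped model A tokens."""
    out: Dict[str, float] = {}
    for tok, p in dist_b.items():
        a_tok = mapping.get(tok)
        if a_tok is not None:  # unmapped B tokens are dropped (assumption)
            out[a_tok] = out.get(a_tok, 0.0) + p
    return out


def ensemble(dist_a: Dict[str, float], dist_b: Dict[str, float],
             mapping: Dict[str, str], weight_a: float = 0.5) -> Dict[str, float]:
    """Weighted mix of A's distribution and B's projected distribution, renormalized."""
    proj = project(dist_b, mapping)
    vocab = set(dist_a) | set(proj)
    mixed = {t: weight_a * dist_a.get(t, 0.0) + (1.0 - weight_a) * proj.get(t, 0.0)
             for t in vocab}
    total = sum(mixed.values())
    return {t: p / total for t, p in mixed.items()} if total > 0 else mixed
```

Because the mix happens over next-token distributions, the ensemble can steer every generation step rather than only choosing among complete outputs.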
[COMMENTS]
Accepted to the main conference of NAACL 2024
[LINK]
http://arxiv.org/abs/2404.09492v1
[DATE]
2024-04-15 14:28:20+08:00
[CATEGORIES]
cs.CL
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems
[AUTHORS]
Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Jing Ma
[ABSTRACT]
Programming often involves converting detailed and complex specifications
into code, a process during which developers typically utilize visual aids to
more effectively convey concepts. While recent developments in Large Multimodal
Models have demonstrated remarkable abilities in visual reasoning and
mathematical tasks, there is little work on investigating whether these models
can effectively interpret visual elements for code generation. To this end, we
present MMCode, the first multi-modal coding dataset for evaluating algorithmic
problem-solving skills in visually rich contexts. MMCode contains 3,548
questions and 6,620 images collected from real-world programming challenges
harvested from 10 code competition websites, presenting significant challenges
due to the extreme demand for reasoning abilities. Our experiment results show
that current state-of-the-art models struggle to solve these problems. The
results highlight the lack of powerful vision-code models, and we hope MMCode
can serve as an inspiration for future works in this domain. The data and code
are publicly available at https://github.com/happylkx/MMCode.
[COMMENTS]
46 pages, 21 figures and 6 tables
[LINK]
http://arxiv.org/abs/2404.09486v1
[DATE]
2024-04-15 14:15:46+08:00
[CATEGORIES]
cs.CL
Mitigating Hallucination in Abstractive Summarization with Domain-Conditional Mutual Information
[AUTHORS]
Kyubyung Chae, Jaepill Choi, Yohan Jo, Taesup Kim
[ABSTRACT]
A primary challenge in abstractive summarization is hallucination – the
phenomenon where a model generates plausible text that is absent in the source
text. We hypothesize that the domain (or topic) of the source text triggers the
model to generate text that is highly probable in the domain, neglecting the
details of the source text. To alleviate this model bias, we introduce a
decoding strategy based on domain-conditional pointwise mutual information.
This strategy adjusts the generation probability of each token by comparing it
with the token’s marginal probability within the domain of the source text.
According to evaluation on the XSUM dataset, our method demonstrates
improvement in terms of faithfulness and source relevance. The code is publicly
available at https://github.com/qqplot/dcpmi.
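The core adjustment can be sketched as a greedy decoding step that scores each candidate token by its log-probability given the source minus a weighted log-probability of the same token given only the domain. The weight `lam`, the `floor` used for tokens missing from the domain distribution, and applying the adjustment at every step are assumptions; the paper may estimate the domain-conditional probability differently or apply the penalty selectively.

```python
import math
from typing import Dict


def dcpmi_pick(cond_logp: Dict[str, float], domain_logp: Dict[str, float],
               lam: float = 0.5, floor: float = math.log(1e-9)) -> str:
    """Greedy pick of argmax_y [log p(y | source, prefix) - lam * log p(y | domain, prefix)]."""
    def score(tok: str) -> float:
        # Penalize tokens that are probable merely because of the domain.
        return cond_logp[tok] - lam * domain_logp.get(tok, floor)
    return max(cond_logp, key=score)
```

With `lam = 0` this reduces to plain greedy decoding; larger values push the model away from tokens that the domain alone makes likely, such as a city name that is common in news but unsupported by the article.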
[COMMENTS]
Accepted by Findings of NAACL 2024
[LINK]
http://arxiv.org/abs/2404.09480v1
[DATE]
2024-04-15 14:06:43+08:00
[CATEGORIES]
cs.CL
Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions
[AUTHORS]
Jiahuan Li, Hao Zhou, Shujian Huang, Shanbo Cheng, Jiajun Chen
[ABSTRACT]
Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT4, have
shown strong abilities in multilingual translations, without being explicitly
trained on parallel corpora. How LLMs acquire the ability to carry out
translation instructions for different languages is an interesting question. In this
paper, we present a detailed analysis by finetuning a multilingual pretrained
language model, XGLM-7B, to perform multilingual translation following given
instructions. Firstly, we show that multilingual LLMs have stronger translation
abilities than previously demonstrated. For a certain language, the performance
depends on its similarity to English and the amount of data used in the
pretraining phase. Secondly, we find that LLMs’ ability to carry out
translation instructions relies on the understanding of translation
instructions and the alignment among different languages. With multilingual
finetuning, LLMs could learn to perform the translation task well even for
those language pairs unseen during the instruction tuning phase.
[COMMENTS]
Accepted by Transactions of the ACL, pre-MIT version
[LINK]
http://arxiv.org/abs/2305.15083v4
[DATE]
2024-04-15 14:02:59+08:00
[CATEGORIES]
cs.CL
Rectifying Demonstration Shortcut in In-Context Learning
[AUTHORS]
Joonwon Jang, Sanghwan Jang, Wonbin Kweon, Minjin Jeon, Hwanjo Yu
[COMMENTS]
NAACL 2024
[LINK]
http://arxiv.org/abs/2403.09488v3
[DATE]
2024-04-15 12:29:33+08:00
[CATEGORIES]
cs.CL
Flames: Benchmarking Value Alignment of LLMs in Chinese
[AUTHORS]
Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
[ABSTRACT]
The widespread adoption of large language models (LLMs) across various
regions underscores the urgent need to evaluate their alignment with human
values. Current benchmarks, however, fall short of effectively uncovering
safety vulnerabilities in LLMs. Despite numerous models achieving high scores
and ‘topping the chart’ in these evaluations, there is still a significant gap
in LLMs’ deeper alignment with human values and achieving genuine harmlessness.
To this end, this paper proposes a value alignment benchmark named Flames,
which encompasses both common harmlessness principles and a unique morality
dimension that integrates specific Chinese values such as harmony. Accordingly,
we carefully design adversarial prompts that incorporate complex scenarios and
jailbreaking methods, mostly with implicit malice. By prompting 17 mainstream
LLMs, we obtain model responses and rigorously annotate them for detailed
evaluation. Our findings indicate that all the evaluated LLMs demonstrate
relatively poor performance on Flames, particularly in the safety and fairness
dimensions. We also develop a lightweight specified scorer capable of scoring
LLMs across multiple dimensions to efficiently evaluate new models on the
benchmark. The complexity of Flames has far exceeded existing benchmarks,
setting a new challenge for contemporary LLMs and highlighting the need for
further alignment of LLMs. Our benchmark is publicly available at
https://github.com/AIFlames/Flames.
[COMMENTS]
Accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2311.06899v4
[DATE]
2024-04-15 12:18:59+08:00
[CATEGORIES]
cs.CL
Recommender Systems in the Era of Large Language Models (LLMs)
[AUTHORS]
Wenqi Fan, Zihuai Zhao, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, Qing Li
[ABSTRACT]
With the prosperity of e-commerce and web applications, Recommender Systems
(RecSys) have become an important component of our daily life, providing
personalized suggestions that cater to user preferences. While Deep Neural
Networks (DNNs) have made significant advancements in enhancing recommender
systems by modeling user-item interactions and incorporating textual side
information, DNN-based methods still face limitations, such as difficulties in
understanding users’ interests and capturing textual side information,
inabilities in generalizing to various recommendation scenarios and reasoning
on their predictions, etc. Meanwhile, the emergence of Large Language Models
(LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural
Language Processing (NLP) and Artificial Intelligence (AI), due to their
remarkable abilities in the fundamental tasks of language understanding
and generation, as well as impressive generalization and reasoning
capabilities. As a result, recent studies have attempted to harness the power
of LLMs to enhance recommender systems. Given the rapid evolution of this
research direction in recommender systems, there is a pressing need for a
systematic overview that summarizes existing LLM-empowered recommender systems,
to provide researchers in relevant fields with an in-depth understanding.
Therefore, in this paper, we conduct a comprehensive review of LLM-empowered
recommender systems from various aspects including Pre-training, Fine-tuning,
and Prompting. More specifically, we first introduce representative methods to
harness the power of LLMs (as a feature encoder) for learning representations
of users and items. Then, we review recent techniques of LLMs for enhancing
recommender systems from three paradigms, namely pre-training, fine-tuning, and
prompting. Finally, we comprehensively discuss future directions in this
emerging field.
[COMMENTS]
Accepted by IEEE TKDE
[LINK]
http://arxiv.org/abs/2307.02046v3
[DATE]
2024-04-15 12:18:34+08:00
[CATEGORIES]
cs.CL
Can LLM-Generated Misinformation Be Detected?
[AUTHORS]
Canyu Chen, Kai Shu
[COMMENTS]
Accepted to Proceedings of ICLR 2024. 9 pages for main paper, 38
pages including appendix. The code, results, dataset for this paper and more
resources on “LLMs Meet Misinformation” have been released on the project
website: https://llm-misinformation.github.io/
[LINK]
http://arxiv.org/abs/2309.13788v4
[DATE]
2024-04-15 11:01:09+08:00
[CATEGORIES]
cs.CL
cs.LG
Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM
[AUTHORS]
Ruohong Zhang, Yau-Shian Wang, Yiming Yang
[ABSTRACT]
The remarkable performance of large language models (LLMs) in zero-shot
language understanding has garnered significant attention. However, employing
LLMs for large-scale inference or domain-specific fine-tuning requires immense
computational resources due to their substantial model size. To overcome these
limitations, we introduce a novel method, namely GenCo, which leverages the
strong generative power of LLMs to assist in training a smaller and more
adaptable language model. In our method, an LLM plays an important role in the
self-training loop of a smaller model in two important ways. Firstly, the LLM
is used to augment each input instance with a variety of possible
continuations, enriching its semantic context for better understanding.
Secondly, it helps craft additional high-quality training pairs by
rewriting input texts conditioned on predicted labels. This ensures the
generated texts are highly relevant to the predicted labels, alleviating the
prediction error during pseudo-labeling, while reducing the dependency on large
volumes of unlabeled text. In our experiments, GenCo outperforms previous
state-of-the-art methods when only limited (<5% of original) in-domain text
data is available. Notably, our approach surpasses the performance of Alpaca-7B
with human prompts, highlighting the potential of leveraging LLMs for
self-training.
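One simple way to picture how LLM-generated continuations could feed a self-training loop is confidence-filtered pseudo-labeling over augmented views. The sketch below is an assumption for illustration, not GenCo's actual procedure: `classify` is a hypothetical stand-in for the small model, the continuations are hand-written stand-ins for LLM output, and majority voting with a `threshold` is our own choice.

```python
from collections import Counter

def pseudo_label(text, continuations, classify, threshold=0.6):
    """Assign a pseudo-label by voting over the input and its augmented views.

    continuations: hypothetical LLM-generated continuations of `text`.
    classify:      hypothetical small classifier returning a label string.
    Returns the majority label, or None if agreement falls below `threshold`.
    """
    views = [text] + [text + " " + c for c in continuations]
    votes = Counter(classify(v) for v in views)
    label, count = votes.most_common(1)[0]
    return label if count / len(views) >= threshold else None

# Toy keyword classifier standing in for the smaller model.
classify = lambda s: "sports" if "goal" in s else "other"
text = "Late drama at the stadium."
conts = ["A late goal won it.", "The goal came in stoppage time.", "The winning goal was a header."]
print(pseudo_label(text, conts, classify))
```

The bare input is ambiguous to the toy classifier, but three of the four views agree, so the instance receives a pseudo-label; raising `threshold` trades coverage for label precision.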
[LINK]
http://arxiv.org/abs/2304.11872v2
[DATE]
2024-04-15 10:40:54+08:00
[CATEGORIES]
cs.CL
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
[AUTHORS]
Adrián Bazaga, Pietro Liò, Gos Micklem
[ABSTRACT]
In recent years, there has been growing interest in text-to-SQL translation,
which is the task of converting natural language questions into executable SQL
queries. This technology is important for its potential to democratize data
extraction from databases. However, some of its key hurdles include domain
generalisation, which is the ability to adapt to previously unseen databases,
and alignment of natural language questions with the corresponding SQL queries.
To overcome these challenges, we introduce SQLformer, a novel Transformer
architecture specifically crafted to perform text-to-SQL translation tasks. Our
model predicts SQL queries as abstract syntax trees (ASTs) in an autoregressive
way, incorporating structural inductive bias in the encoder and decoder layers.
This bias, guided by database table and column selection, aids the decoder in
generating SQL query ASTs represented as graphs in a Breadth-First Search
canonical order. Comprehensive experiments show the state-of-the-art
performance of SQLformer across five widely used text-to-SQL benchmarks. Our
implementation is available at https://github.com/AdrianBZG/SQLformer.
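To illustrate what a breadth-first canonical order means for a query AST, the sketch below linearizes a small tree level by level, left to right. The node labels and tree shape are a hypothetical, simplified grammar; SQLformer's actual AST grammar, graph encoding, and decoder are not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # list of child Node objects

def bfs_order(root):
    """Linearize an AST in breadth-first order (level by level, left to right)."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.label)
        queue.extend(node.children)
    return order

# Hypothetical, simplified AST for: SELECT name FROM users WHERE age > 30
ast = Node("Query", [
    Node("Select", [Node("Column:name")]),
    Node("From", [Node("Table:users")]),
    Node("Where", [Node(">", [Node("Column:age"), Node("Value:30")])]),
])
print(bfs_order(ast))
```

A fixed traversal order like this gives the autoregressive decoder a single, deterministic target sequence for each query tree.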
[COMMENTS]
13 pages, 4 figures, 8 tables
[LINK]
http://arxiv.org/abs/2310.18376v3
[DATE]
2024-04-15 10:26:43+08:00
[CATEGORIES]
cs.CL
cs.LG
A Large-Scale Evaluation of Speech Foundation Models
[AUTHORS]
Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee
[ABSTRACT]
The foundation model paradigm leverages a shared foundation model to achieve
state-of-the-art (SOTA) performance for various tasks, requiring minimal
downstream-specific modeling and data annotation. This approach has proven
crucial in the field of Natural Language Processing (NLP). However, the speech
processing community lacks a similar setup to explore the paradigm
systematically. In this work, we establish the Speech processing Universal
PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for
speech. We propose a unified multi-tasking framework to address speech
processing tasks in SUPERB using a frozen foundation model followed by
task-specialized, lightweight prediction heads. Combining our results with
community submissions, we verify that the foundation model paradigm is
promising for speech, and our multi-tasking framework is simple yet effective,
as the best-performing foundation model shows competitive generalizability
across most SUPERB tasks. For reproducibility and extensibility, we have
developed a long-term maintained platform that enables deterministic
benchmarking, allows for result sharing via an online leaderboard, and promotes
collaboration through a community-driven benchmark database to support new
development cycles. Finally, we conduct a series of analyses to offer an
in-depth understanding of SUPERB and speech foundation models, including
information flows across tasks inside the models, the correctness of the
weighted-sum benchmarking protocol and the statistical significance and
robustness of the benchmark.
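The weighted-sum protocol mentioned above combines the hidden layers of the frozen foundation model with softmax-normalized weights before the lightweight task head. A minimal sketch, assuming one feature vector per layer and fixed logits; in practice the features are per-frame tensors and the logits are trained jointly with the head.

```python
import math

def weighted_layer_sum(layer_features, layer_logits):
    """Combine per-layer features of a frozen model with softmax-normalized weights.

    layer_features: list over layers of equal-length feature vectors.
    layer_logits:   one learnable scalar per layer (fixed here for illustration).
    """
    m = max(layer_logits)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in layer_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(layer_features[0])
    return [sum(w * feats[d] for w, feats in zip(weights, layer_features)) for d in range(dim)]

feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
uniform = weighted_layer_sum(feats, [0.0, 0.0, 0.0])   # equal logits: a plain average
peaked = weighted_layer_sum(feats, [10.0, 0.0, 0.0])   # almost entirely layer 0
```

Because the learned weights indicate which layers a task draws on, inspecting them is one way to analyze information flow across tasks inside the model.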
[COMMENTS]
The extended journal version for SUPERB and SUPERB-SG. Accepted to
TASLP. The arxiv version is further refined
[LINK]
http://arxiv.org/abs/2404.09385v1
[DATE]
2024-04-15 08:03:16+08:00
[CATEGORIES]
cs.CL
Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields
[AUTHORS]
Ryan Cotterell, Kevin Duh
[ABSTRACT]
Low-resource named entity recognition is still an open problem in NLP. Most
state-of-the-art systems require tens of thousands of annotated sentences in
order to obtain high performance. However, for most of the world’s languages,
it is infeasible to obtain such annotation. In this paper, we present a
transfer learning scheme, whereby we train character-level neural CRFs to
predict named entities for both high-resource languages and low-resource
languages jointly. Learning character representations for multiple related
languages allows transfer among the languages, improving F1 by up to 9.8 points
over a loglinear CRF baseline.
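The CRF layer on top of the character-level network is decoded with the Viterbi algorithm, which finds the highest-scoring tag sequence given per-position emission scores and tag-transition scores. In the sketch below, the emission scores are hand-set stand-ins for the neural network's output and the tag set is an assumption. Cross-lingual training is not shown.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list over positions of {tag: score} (e.g., from a neural encoder)
    transitions: {(prev_tag, tag): score}
    """
    best = [{t: emissions[0][t] for t in tags}]  # best score of any path ending in t
    back = [{}]                                  # backpointers to the previous tag
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + transitions[(p, t)]) for p in tags),
                key=lambda x: x[1],
            )
            best[i][t] = score + emissions[i][t]
            back[i][t] = prev
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

tags = ["O", "B-PER", "I-PER"]
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I-PER")] = -10.0  # discourage I-PER directly after O
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.5},  # "Ada"
    {"O": 0.5, "B-PER": 0.3, "I-PER": 1.0},  # "Lovelace"
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.2},  # "wrote"
]
print(viterbi(emissions, transitions, tags))  # ['B-PER', 'I-PER', 'O']
```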
[COMMENTS]
IJCNLP 2017
[LINK]
http://arxiv.org/abs/2404.09383v1
[DATE]
2024-04-15 07:44:49+08:00
[CATEGORIES]
cs.CL
Raidar: geneRative AI Detection viA Rewriting
[AUTHORS]
Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang
[COMMENTS]
Accepted by ICLR 2024, Large Language Models, Detection
[LINK]
http://arxiv.org/abs/2401.12970v2
[DATE]
2024-04-15 06:34:37+08:00
[CATEGORIES]
cs.CL
The Effect of Data Partitioning Strategy on Model Generalizability: A Case Study of Morphological Segmentation
[AUTHORS]
Zoey Liu, Bonnie J. Dorr
[ABSTRACT]
Recent work to enhance data partitioning strategies for more realistic model
evaluation faces challenges in providing a clear optimal choice. This study
addresses these challenges, focusing on morphological segmentation and
synthesizing limitations related to language diversity, adoption of multiple
datasets and splits, and detailed model comparisons. Our study leverages data
from 19 languages, including ten indigenous or endangered languages across 10
language families with diverse morphological systems (polysynthetic, fusional,
and agglutinative) and different degrees of data availability. We conduct
large-scale experimentation with varying sized combinations of training and
evaluation sets as well as new test data. Our results show that, when faced
with new test data: (1) models trained from random splits are able to achieve
higher numerical scores; (2) model rankings derived from random splits tend to
generalize more consistently.
[COMMENTS]
Accepted to 2024 Annual Conference of the North American Chapter of
the Association for Computational Linguistics (16 pages including 9 tables
and 1 figure)
[LINK]
http://arxiv.org/abs/2404.09371v1
[DATE]
2024-04-15 06:22:58+08:00
[CATEGORIES]
cs.CL
Understanding the Role of Temperature in Diverse Question Generation by GPT-4
[AUTHORS]
Arav Agarwal, Karthik Mittal, Aidan Doyle, Pragnya Sridhar, Zipiao Wan, Jacob Arthur Doughty, Jaromir Savelka, Majd Sakr
[ABSTRACT]
We conduct a preliminary study of the effect of GPT’s temperature parameter
on the diversity of GPT4-generated questions. We find that using higher
temperature values leads to significantly higher diversity, with different
temperatures exposing different types of similarity between generated sets of
questions. We also demonstrate that diverse question generation is especially
difficult for questions targeting lower levels of Bloom’s Taxonomy.
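For context on why temperature affects diversity, the sketch below shows the standard temperature-scaled softmax used in sampling: dividing logits by a larger temperature flattens the distribution and raises its entropy, so samples vary more. This is the generic textbook mechanism; GPT-4's internal sampling implementation is not public, and the logits here are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by temperature before the softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    p = softmax_with_temperature(logits, t)
    print(t, [round(x, 3) for x in p], round(entropy(p), 3))
```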
[LINK]
http://arxiv.org/abs/2404.09366v1
[DATE]
2024-04-15 05:38:50+08:00
[CATEGORIES]
cs.CL
Detection of ChatGPT Fake Science with the xFakeSci Learning Algorithm
[AUTHORS]
Ahmed Abdeen Hamed, Xindong Wu
[ABSTRACT]
Generative AI tools exemplified by ChatGPT are becoming a new reality. This
study is motivated by the premise that “AI-generated content may exhibit a
distinctive behavior that can be separated from scientific articles”. In this
study, we show how articles can be generated using means of prompt engineering
for various diseases and conditions. We then show how we tested this premise in
two phases and prove its validity. Subsequently, we introduce xFakeSci, a novel
learning algorithm, that is capable of distinguishing ChatGPT-generated
articles from publications produced by scientists. The algorithm is trained
using network models derived from both sources. As for the classification step,
it was performed using 300 articles per condition. The actual labeling step took
place against an equal mix of 50 generated articles and 50 authentic PubMed
abstracts. The testing also spanned publication periods from 2010 to 2024 and
encompassed research on three distinct diseases: cancer, depression, and
Alzheimer’s. Further, we evaluated the accuracy of the xFakeSci algorithm
against some of the classical data mining algorithms (e.g., Support Vector
Machines, Regression, and Naive Bayes). The xFakeSci algorithm achieved F1
scores ranging from 80% to 94%, outperforming common data mining algorithms,
which scored F1 values between 38% and 52%. We attribute the noticeable
difference to the introduction of calibration and a proximity distance
heuristic, which underscores this promising performance. Indeed, the prediction
of fake science generated by ChatGPT presents a considerable challenge.
Nonetheless, the introduction of the xFakeSci algorithm is a significant step
on the way to combating fake science.
[COMMENTS]
18 pages, 8 figures, 8 tables, 5 algorithms
[LINK]
http://arxiv.org/abs/2308.11767v4
[DATE]
2024-04-15 05:20:58+08:00
[CATEGORIES]
cs.CL
Rethinking ASTE: A Minimalist Tagging Scheme Alongside Contrastive Learning
[AUTHORS]
Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu
[ABSTRACT]
Aspect Sentiment Triplet Extraction (ASTE) is a burgeoning subtask of
fine-grained sentiment analysis, aiming to extract structured sentiment
triplets from unstructured textual data. Existing approaches to ASTE often
complicate the task with additional structures or external data. In this
research, we propose a novel tagging scheme and employ a contrastive learning
approach to mitigate these challenges. The proposed approach demonstrates
comparable or superior performance in comparison to state-of-the-art
techniques, while featuring a more compact design and reduced computational
overhead. Notably, even in the era of Large Language Models (LLMs), our method
exhibits superior efficacy compared to GPT-3.5 and GPT-4 in few-shot learning
scenarios. This study also provides valuable insights for the advancement of
ASTE techniques within the paradigm of large language models.
[LINK]
http://arxiv.org/abs/2403.07342v2
[DATE]
2024-04-15 04:53:02+08:00
[CATEGORIES]
cs.CL
LLeMpower: Understanding Disparities in the Control and Access of Large Language Models
[AUTHORS]
Vishwas Sathish, Hannah Lin, Aditya K Kamath, Anish Nyayachavadi
[ABSTRACT]
Large Language Models (LLMs) are a powerful technology that augment human
skill to create new opportunities, akin to the development of steam engines and
the internet. However, LLMs come with a high cost. They require significant
computing resources and energy to train and serve. Inequity in their control
and access has led to concentration of ownership and power in a small
collection of corporations. In our study, we collect training and inference
requirements for various LLMs. We then analyze the economic strengths of
nations and organizations in the context of developing and serving these
models. Additionally, we also look at whether individuals around the world can
access and use this emerging technology. We compare and contrast these groups
to show that these technologies are monopolized by a surprisingly few entities.
We conclude with a qualitative study on the ethical implications of our
findings and discuss future directions towards equity in LLM access.
[COMMENTS]
11 total pages, 7 page text, 4 page references, 3 figures (with
subfigures), 1 table
[LINK]
http://arxiv.org/abs/2404.09356v1
[DATE]
2024-04-15 04:49:53+08:00
[CATEGORIES]
cs.CL
Towards Practical Tool Usage for Continually Learning LLMs
[AUTHORS]
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar
[ABSTRACT]
Large language models (LLMs) show an innate skill for solving language-based
tasks. However, evidence suggests that they cannot adjust when information or
task-solving skills become outdated, as their knowledge, stored directly
within their parameters, remains static in time. Tool use helps by offloading
work to systems that the LLM can access through an interface, but LLMs that use
them still must adapt to nonstationary environments for prolonged use, as new
tools can emerge and existing tools can change. Nevertheless, tools require
less specialized knowledge; we therefore hypothesize that they are better suited for
continual learning (CL) as they rely less on parametric memory for solving
tasks and instead focus on learning when to apply pre-defined tools. To verify
this, we develop a synthetic benchmark and follow this by aggregating existing
NLP tasks to form a more realistic testing scenario. While we demonstrate that
scaling model size is not a solution regardless of tool usage, continual
learning techniques can enable tool LLMs to adapt faster while forgetting
less, highlighting their potential as continual learners.
[COMMENTS]
20 pages, 11 tables, 7 figures
[LINK]
http://arxiv.org/abs/2404.09339v1
[DATE]
2024-04-15 03:45:47+08:00
[CATEGORIES]
cs.CL
cs.LG
Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
[AUTHORS]
Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu
[ABSTRACT]
Large language models (LLMs) exhibit impressive natural language capabilities
but suffer from hallucination – generating content ungrounded in the realities
of training data. Recent work has focused on decoding techniques to improve
factuality during inference by leveraging LLMs’ hierarchical representation of
factual knowledge, manipulating the predicted distributions at inference time.
Current state-of-the-art approaches refine decoding by contrasting early-exit
distributions from a lower layer with the final layer to exploit information
related to factuality within the model forward procedure. However, such methods
often assume the final layer is the most reliable and the lower layer selection
process depends on it. In this work, we first propose extrapolation of critical
token probabilities beyond the last layer for more accurate contrasting. We
additionally employ layer-wise entropy-guided lower layer selection, decoupling
the selection process from the final layer. Experiments demonstrate strong
performance - surpassing state-of-the-art on multiple different datasets by
large margins. Analyses show different kinds of prompts respond to different
selection strategies.
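A minimal sketch of layer-contrastive decoding with entropy-based selection of the lower layer. Several parts are assumptions made for illustration, not details from the abstract. The candidate layer with the highest entropy is the one selected. The `alpha` cutoff is a plausibility filter of the kind used in DoLa-style layer-contrastive decoding. The extrapolation of token probabilities beyond the last layer is not shown. The distributions are toy values.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def contrast_decode(layer_probs, candidate_layers, alpha=0.2):
    """Greedy next-token choice by contrasting the final layer with a lower layer.

    layer_probs: next-token distributions per layer; the last entry is the final layer.
    candidate_layers: indices of lower layers eligible for contrast.
    alpha: keep only tokens with p_final >= alpha * max(p_final) (assumed filter).
    Assumption: the candidate lower layer with the HIGHEST entropy is selected.
    """
    final = layer_probs[-1]
    lower_idx = max(candidate_layers, key=lambda i: entropy(layer_probs[i]))
    lower = layer_probs[lower_idx]
    cutoff = alpha * max(final)
    scores = {
        tok: math.log(final[tok]) - math.log(lower[tok])
        for tok in range(len(final))
        if final[tok] >= cutoff
    }
    return max(scores, key=scores.get), lower_idx

layers = [
    [0.20, 0.70, 0.10],  # lower layer 0 (entropy ~0.80 nats)
    [0.90, 0.05, 0.05],  # lower layer 1 (entropy ~0.39 nats)
    [0.45, 0.50, 0.05],  # final layer
]
token, chosen_layer = contrast_decode(layers, candidate_layers=[0, 1])
```

Greedy decoding on the final layer alone would return token 1. The contrast favors token 0, whose probability grows most between the selected lower layer and the final layer.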
[COMMENTS]
Work in Progress
[LINK]
http://arxiv.org/abs/2404.09338v1
[DATE]
2024-04-15 03:45:35+08:00
[CATEGORIES]
cs.CL
Contextual Label Projection for Cross-Lingual Structured Prediction
[AUTHORS]
Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng
[ABSTRACT]
Label projection, which involves obtaining translated labels and texts
jointly, is essential for leveraging machine translation to facilitate
cross-lingual transfer in structured prediction tasks. Prior research exploring
label projection often compromises translation accuracy by favoring simplified
label translation or relying solely on word-level alignments. In this paper, we
introduce a novel label projection approach, CLaP, which translates text to the
target language and performs contextual translation on the labels using the
translated text as the context, ensuring better accuracy for the translated
labels. We leverage instruction-tuned language models with multilingual
capabilities as our contextual translator, imposing the constraint of the
presence of translated labels in the translated text via instructions. We
benchmark CLaP with other label projection techniques on zero-shot
cross-lingual transfer across 39 languages on two representative structured
prediction tasks - event argument extraction (EAE) and named entity recognition
(NER), showing over 2.4 F1 improvement for EAE and 1.4 F1 improvement for NER.
We further explore the applicability of CLaP on ten extremely low-resource
languages to showcase its potential for cross-lingual structured prediction.
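The constraint that translated labels must appear in the translated text can be checked after translation by locating each label span. In the sketch below, the French sentence and labels are hand-written stand-ins for what CLaP's instruction-tuned translator would produce. Using first-match string search is our own simplifying assumption, so repeated mentions resolve to the first occurrence.

```python
def project_labels(translated_text, translated_labels):
    """Locate each translated label inside the translated sentence.

    Returns ({label: (start, end)} for labels found verbatim,
             [labels that violate the 'label must appear in the text' constraint]).
    """
    spans, missing = {}, []
    for label in translated_labels:
        start = translated_text.find(label)  # first occurrence only
        if start == -1:
            missing.append(label)
        else:
            spans[label] = (start, start + len(label))
    return spans, missing

text = "Marie Curie a travaillé à Paris."
spans, missing = project_labels(text, ["Marie Curie", "Paris", "Varsovie"])
print(spans, missing)
```

Labels that end up in `missing`, such as "Varsovie" here, are the cases where the projection failed and the example would need re-translation or filtering.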
[COMMENTS]
Accepted at NAACL 2024
[LINK]
http://arxiv.org/abs/2309.08943v3
[DATE]
2024-04-15 03:38:57+08:00
[CATEGORIES]
cs.CL
Self-Selected Attention Span for Accelerating Large Language Model Inference
[AUTHORS]
Tian Jin, Wanzin Yazar, Zifei Xu, Sayeh Sharify, Xin Wang
[ABSTRACT]
Large language models (LLMs) can solve challenging tasks. However, their
inference computation on modern GPUs is highly inefficient due to the
increasing number of tokens they must attend to as they generate new ones. To
address this inefficiency, we capitalize on LLMs’ problem-solving capabilities
to optimize their own inference-time efficiency. We demonstrate with two
specific tasks: (a) evaluating complex arithmetic expressions and (b)
summarizing news articles. For both tasks, we create custom datasets to
fine-tune an LLM. The goal of fine-tuning is twofold: first, to make the LLM
learn to solve the evaluation or summarization task, and second, to train it to
identify the minimal attention spans required for each step of the task. As a
result, the fine-tuned model is able to convert these self-identified minimal
attention spans into sparse attention masks on-the-fly during inference. We
develop a custom CUDA kernel to take advantage of the reduced context to attend
to. We demonstrate that using this custom CUDA kernel improves the throughput
of LLM inference by 28%. Our work presents an end-to-end demonstration showing
that training LLMs to self-select their attention spans speeds up
autoregressive inference in solving real-world tasks.
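To make the idea of converting self-selected spans into a sparse attention mask concrete, the sketch below builds a boolean mask over the existing context for one decoding step. The `(start, end)` span format is hypothetical. The paper's custom CUDA kernel is not reproduced. A real kernel gains speed by skipping masked positions rather than by computing a dense mask.

```python
def sparse_attention_mask(seq_len, spans):
    """Boolean mask over the context for the next decoding step.

    spans: hypothetical list of (start, end) ranges (end exclusive) that the
    model selected as the minimal context it needs for this step.
    """
    mask = [False] * seq_len
    for start, end in spans:
        for i in range(max(0, start), min(end, seq_len)):
            mask[i] = True
    return mask

mask = sparse_attention_mask(10, [(0, 2), (6, 9)])
print(mask, sum(mask) / len(mask))  # fraction of context attended
```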
[LINK]
http://arxiv.org/abs/2404.09336v1
[DATE]
2024-04-15 03:36:04+08:00
[CATEGORIES]
cs.CL
Good Books are Complex Matters: Gauging Complexity Profiles Across Diverse Categories of Perceived Literary Quality
[AUTHORS]
Yuri Bizzoni, Pascale Feldkamp, Ida Marie Lassen, Mia Jacobsen, Mads Rosendahl Thomsen, Kristoffer Nielbo
[ABSTRACT]
In this study, we employ a classification approach to show that different
categories of literary “quality” display unique linguistic profiles, leveraging
a corpus that encompasses titles from the Norton Anthology, Penguin Classics
series, and the Open Syllabus project, contrasted against contemporary
bestsellers, Nobel prize winners and recipients of prestigious literary awards.
Our analysis reveals that canonical and so called high-brow texts exhibit
distinct textual features when compared to other quality categories such as
bestsellers and popular titles as well as to control groups, likely responding
to distinct (but not mutually exclusive) models of quality. We apply a classic
machine learning approach, namely Random Forest, to distinguish quality novels
from “control groups”, achieving F1 scores of up to 77% in differentiating
between the categories. We find that quality categories tend to be easier to
distinguish from control groups than from other quality categories, suggesting
that literary quality features might be distinguishable but shared through
quality proxies.
[LINK]
http://arxiv.org/abs/2404.04022v2
[DATE]
2024-04-15 01:30:24+08:00
[CATEGORIES]
cs.CL
Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A Case Study at HCMUT
[AUTHORS]
Tuan Bui, Oanh Tran, Phuong Nguyen, Bao Ho, Long Nguyen, Thang Bui, Tho Quan
[ABSTRACT]
In today’s rapidly evolving landscape of Artificial Intelligence, large
language models (LLMs) have emerged as a vibrant research topic. LLMs find
applications in various fields and contribute significantly. Despite their
powerful language capabilities, similar to pre-trained language models (PLMs),
LLMs still face challenges in remembering events, incorporating new
information, and addressing domain-specific issues or hallucinations. To
overcome these limitations, researchers have proposed Retrieval-Augmented
Generation (RAG) techniques, while others have proposed integrating LLMs
with Knowledge Graphs (KGs) to provide factual context, thereby improving
performance and delivering more accurate feedback to user queries.
Education plays a crucial role in human development and progress. With the
technology transformation, traditional education is being replaced by digital
or blended education. Therefore, educational data in the digital environment is
increasing day by day. Data in higher education institutions are diverse,
comprising various sources such as unstructured/structured text, relational
databases, web/app-based API access, etc. Constructing a Knowledge Graph from
these cross-data sources is not a simple task. This article proposes a method
for automatically constructing a Knowledge Graph from multiple data sources and
discusses some initial applications (experimental trials) of KG in conjunction
with LLMs for question-answering tasks.
[COMMENTS]
8 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.09296v1
[DATE]
2024-04-15 00:34:31+08:00
[CATEGORIES]
cs.CL
ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation
[AUTHORS]
Divyang Doshi, Jung-Eun Kim
[ABSTRACT]
In this research, we propose an innovative method to boost Knowledge
Distillation efficiency without the need for resource-heavy teacher models.
Knowledge Distillation trains a smaller “student” model with guidance from a
larger “teacher” model, which is computationally costly. However, the main
benefit comes from the soft labels provided by the teacher, helping the student
grasp nuanced class similarities. In our work, we propose an efficient method
for generating these soft labels, thereby eliminating the need for a large
teacher model. We employ a compact autoencoder to extract essential features
and calculate similarity scores between different classes. Afterward, we apply
the softmax function to these similarity scores to obtain a soft probability
vector. This vector serves as valuable guidance during the training of the
student model. Our extensive experiments on various datasets, including
CIFAR-100, Tiny Imagenet, and Fashion MNIST, demonstrate the superior resource
efficiency of our approach compared to traditional knowledge distillation
methods that rely on large teacher models. Importantly, our approach
consistently achieves similar or even superior performance in terms of model
accuracy. We also perform a comparative study with various techniques recently
developed for knowledge distillation showing our approach achieves competitive
performance while using significantly fewer resources. We also show that our
approach can be easily added to any logit-based knowledge distillation method.
This research contributes to making knowledge distillation more accessible and
cost-effective for practical applications, making it a promising avenue for
improving the efficiency of model training. The code for this work is available
at https://github.com/JEKimLab/ReffAKD.
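A minimal sketch of the soft-label step described above: compute class-to-class similarity scores, then apply a softmax to obtain a soft probability vector for each class. Two choices here are assumptions, not details from the abstract. Each class is represented by a single feature vector, standing in for a mean autoencoder embedding, and cosine similarity is the similarity measure. The vectors and `temperature` are toy values.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def soft_label_table(class_features, temperature=1.0):
    """Map each class to a softmax over its similarity with every class.

    class_features: {class_name: feature_vector}, e.g. hypothetical mean embeddings.
    """
    names = list(class_features)
    table = {}
    for c in names:
        sims = [cosine(class_features[c], class_features[o]) / temperature for o in names]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        table[c] = {o: e / z for o, e in zip(names, exps)}
    return table

feats = {"cat": [1.0, 0.9, 0.0], "dog": [0.9, 1.0, 0.1], "truck": [0.0, 0.1, 1.0]}
soft = soft_label_table(feats, temperature=0.5)
print({k: round(v, 3) for k, v in soft["cat"].items()})
```

The resulting target for "cat" puts more mass on "dog" than on "truck", which is the kind of nuanced class similarity a teacher's soft labels would otherwise provide.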
[LINK]
http://arxiv.org/abs/2404.09886v1
[DATE]
2024-04-15 23:54:30+08:00
[CATEGORIES]
cs.LG
Backdoor Federated Learning by Poisoning Backdoor-Critical Layers
[AUTHORS]
Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, Xu Yuan
[COMMENTS]
Accepted to ICLR’24
[LINK]
http://arxiv.org/abs/2308.04466v3
[DATE]
2024-04-15 23:52:41+08:00
[CATEGORIES]
cs.LG
A probabilistic, data-driven closure model for RANS simulations with aleatoric, model uncertainty
[AUTHORS]
Atul Agrawal, Phaedon-Stelios Koutsourelakis
[ABSTRACT]
We propose a data-driven closure model for Reynolds-averaged Navier-Stokes
(RANS) simulations that incorporates aleatoric model uncertainty. The proposed
closure consists of two parts. The first is parametric and utilizes previously
proposed neural-network-based tensor basis functions that depend on the
invariants of the rate-of-strain and rotation tensors. This is complemented by
latent random variables, which account for aleatoric model errors. A fully Bayesian
formulation is proposed, combined with a sparsity-inducing prior in order to
identify regions in the problem domain where the parametric closure is
insufficient and where stochastic corrections to the Reynolds stress tensor are
needed. Training is performed using sparse, indirect data, such as mean
velocities and pressures, in contrast to the majority of alternatives that
require direct Reynolds stress data. For inference and learning, a Stochastic
Variational Inference scheme is employed, which is based on Monte Carlo
estimates of the pertinent objective in conjunction with the reparametrization
trick. This necessitates derivatives of the output of the RANS solver, for
which we developed an adjoint-based formulation. In this manner, the parametric
sensitivities from the differentiable solver can be combined with the built-in,
automatic differentiation capability of the neural network library in order to
enable an end-to-end differentiable framework. We demonstrate the capability of
the proposed model to produce accurate, probabilistic, predictive estimates for
all flow quantities, even in regions where model errors are present, on a
separated flow in the backward-facing step benchmark problem.
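The reparametrization trick mentioned above makes Monte Carlo gradient estimates possible by writing a Gaussian sample as z = mu + sigma * eps, where eps ~ N(0, 1). Gradients then pass through z to the variational parameters. In the sketch below, a closed-form toy derivative `f_prime` stands in for the derivative the paper obtains from the adjoint RANS solver combined with automatic differentiation.

```python
import random

def reparam_grad_mu(f_prime, mu, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)].

    With z = mu + sigma * eps and eps ~ N(0, 1), dz/dmu = 1, so each sample
    contributes f'(z) to the gradient estimate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        total += f_prime(z)  # chain rule: f'(z) * dz/dmu, with dz/dmu = 1
    return total / n_samples

# For f(z) = z**2, E[f] = mu**2 + sigma**2, so the exact gradient is 2 * mu = 3.0.
est = reparam_grad_mu(lambda z: 2 * z, mu=1.5, sigma=0.5)
print(est)
```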
[COMMENTS]
31 pages, 10 figures
[LINK]
http://arxiv.org/abs/2307.02432v2
[DATE]
2024-04-15 23:48:39+08:00
[CATEGORIES]
cs.LG
Towards White Box Deep Learning
[AUTHORS]
Maciej Satkiewicz
[ABSTRACT]
This paper introduces semantic features as a candidate conceptual framework
for white-box neural networks. The proof of concept model is well-motivated,
inherently interpretable, has low parameter-count and achieves almost
human-level adversarial test metrics - with no adversarial training! These
results and the general nature of the approach warrant further research on
semantic features. The code is available at
https://github.com/314-Foundation/white-box-nn
[COMMENTS]
15 pages, 12 figures, independent research, v4 changes: Added
visualization of SFmatch and PoC architecture, refactored Introduction, Added
Section 7.2 Limitations, expanded Figure descriptions
[LINK]
http://arxiv.org/abs/2403.09863v4
[DATE]
2024-04-15 23:42:57+08:00
[CATEGORIES]
cs.LG
Efficiently Computable Safety Bounds for Gaussian Processes in Active Learning
[AUTHORS]
Jörn Tebbe, Christoph Zimmer, Ansgar Steland, Markus Lange-Hegermann, Fabian Mies
[ABSTRACT]
Active learning of physical systems must commonly respect practical safety
constraints, which restricts the exploration of the design space. Gaussian
Processes (GPs) and their calibrated uncertainty estimations are widely used
for this purpose. In many technical applications the design space is explored
via continuous trajectories, along which the safety needs to be assessed. This
is particularly challenging for strict safety requirements in GP methods, as they
rely on computationally expensive Monte Carlo sampling of high quantiles. We
address these challenges by providing provable safety bounds based on the
adaptively sampled median of the supremum of the posterior GP. Our method
significantly reduces the number of samples required for estimating high safety
probabilities, resulting in faster evaluation without sacrificing accuracy and
exploration speed. The effectiveness of our safe active learning approach is
demonstrated through extensive simulations and validated using a real-world
engine example.
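The quantity the bounds are built on, the median of the supremum of a GP along a discretized trajectory, can be estimated naively by plain (non-adaptive) Monte Carlo, which is the kind of sampling the method aims to make cheaper. This is a minimal sketch under stated assumptions: a zero-mean GP prior stands in for the posterior, and the exponential (Matern-1/2) kernel, its hyperparameters, and all function names are illustrative choices, not the authors' implementation.

```python
import math
import random
from statistics import median

def matern12(x, y, variance=1.0, lengthscale=0.3):
    """Exponential (Matern-1/2) kernel; an illustrative, well-conditioned choice."""
    return variance * math.exp(-abs(x - y) / lengthscale)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def median_of_supremum(grid, n_samples, rng, jitter=1e-9):
    """Plain Monte Carlo estimate of median(max_t f(t)) over the grid points.

    f ~ GP(0, k) is a prior here; a posterior GP would replace the zero mean and
    the kernel matrix with the posterior mean and covariance.
    """
    n = len(grid)
    cov = [[matern12(grid[i], grid[j]) + (jitter if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    sups = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        path = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # one sample path
        sups.append(max(path))  # supremum of this path on the trajectory
    return median(sups)

trajectory = [i / 10 for i in range(11)]  # hypothetical 1-D trajectory on [0, 1]
m = median_of_supremum(trajectory, 2000, random.Random(0))
print(m)
```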
[COMMENTS]
AISTATS 2024
[LINK]
http://arxiv.org/abs/2402.18260v2
[DATE]
2024-04-15 23:40:06+08:00
[CATEGORIES]
cs.LG
Unsupervised Federated Optimization at the Edge: D2D-Enabled Learning without Labels
[AUTHORS]
Satyavrat Wagle, Seyyedali Hosseinalipour, Naji Khosravan, Christopher G. Brinton
[ABSTRACT]
Federated learning (FL) is a popular solution for distributed machine
learning (ML). While FL has traditionally been studied for supervised ML tasks,
in many applications, it is impractical to assume availability of labeled data
across devices. To this end, we develop Cooperative Federated unsupervised
Contrastive Learning (CF-CL) to facilitate FL across edge devices with
unlabeled datasets. CF-CL employs local device cooperation where either
explicit (i.e., raw data) or implicit (i.e., embeddings) information is
exchanged through device-to-device (D2D) communications to improve local
diversity. Specifically, we introduce a smart information push-pull
methodology for data/embedding exchange tailored to FL settings with either
soft or strict data privacy restrictions. Information sharing is conducted
through a probabilistic importance sampling technique at receivers leveraging a
carefully crafted reserve dataset provided by transmitters. In the implicit
case, embedding exchange is further integrated into the local ML training at
the devices via a regularization term incorporated into the contrastive loss,
augmented with a dynamic contrastive margin to adjust the volume of latent
space explored. Numerical evaluations demonstrate that CF-CL leads to
alignment of latent spaces learned across devices, results in faster and more
efficient global model training, and is effective in extreme non-i.i.d. data
distribution settings across devices.
[COMMENTS]
16 pages, 11 figures
[LINK]
http://arxiv.org/abs/2404.09861v1
[DATE]
2024-04-15 23:17:38+08:00
[CATEGORIES]
cs.LG
Statistical learning for constrained functional parameters in infinite-dimensional models with applications in fair machine learning
[AUTHORS]
Razieh Nabi, Nima S. Hejazi, Mark J. van der Laan, David Benkeser
[ABSTRACT]
Constrained learning has become increasingly important, especially in the
realm of algorithmic fairness and machine learning. In these settings,
predictive models are developed specifically to satisfy pre-defined notions of
fairness. Here, we study the general problem of constrained statistical machine
learning through a statistical functional lens. We consider learning a
function-valued parameter of interest under the constraint that one or several
pre-specified real-valued functional parameters equal zero or are otherwise
bounded. We characterize the constrained functional parameter as the minimizer
of a penalized risk criterion using a Lagrange multiplier formulation. We show
that closed-form solutions for the optimal constrained parameter are often
available, providing insight into mechanisms that drive fairness in predictive
models. Our results also suggest natural estimators of the constrained
parameter that can be constructed by combining estimates of unconstrained
parameters of the data generating distribution. Thus, our estimation procedure
for constructing fair machine learning algorithms can be applied in conjunction
with any statistical learning approach and off-the-shelf software. We
demonstrate the generality of our method by explicitly considering a number of
examples of statistical fairness constraints and implementing the approach
using several popular learning approaches.
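As a toy illustration of a closed-form constrained parameter obtained with a Lagrange multiplier, consider squared-error risk with the constraint that mean predictions be equal across two groups. This instance, the function name `fair_projection`, and the 0/1 group encoding are our own assumptions, not an example taken from the paper. With h(A) = A/p1 - (1 - A)/p0, the minimizer of mean((f - preds)^2) subject to mean(f h) = 0 is f = preds - lambda * h with lambda = mean(preds * h) / mean(h^2), so it is built by combining unconstrained predictions with the group proportions.

```python
def fair_projection(preds, groups):
    """Project unconstrained predictions onto the set with equal group means.

    preds:  unconstrained predictions (e.g. from any off-the-shelf regressor).
    groups: 0/1 group labels; both groups are assumed to be present.
    Minimizes mean((f - preds)**2) s.t. mean(f | A=1) == mean(f | A=0).
    """
    n = len(preds)
    p1 = sum(groups) / n
    p0 = 1.0 - p1
    h = [a / p1 - (1 - a) / p0 for a in groups]  # constraint direction
    # Lagrange multiplier chosen so that the constraint holds exactly.
    lam = sum(f * hi for f, hi in zip(preds, h)) / sum(hi * hi for hi in h)
    return [f - lam * hi for f, hi in zip(preds, h)]

adjusted = fair_projection([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
print(adjusted)  # group means are now equal; within-group differences are kept
```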
[LINK]
http://arxiv.org/abs/2404.09847v1
[DATE]
2024-04-15 22:59:21+08:00
[CATEGORIES]
cs.LG
Exploration of the search space of Gaussian graphical models for paired data
[AUTHORS]
Alberto Roverato, Dung Ngoc Nguyen
[ABSTRACT]
We consider the problem of learning a Gaussian graphical model in the case
where the observations come from two dependent groups sharing the same
variables. We focus on a family of coloured Gaussian graphical models
specifically suited for the paired data problem. Commonly, graphical models are
ordered by the submodel relationship so that the search space is a lattice,
called the model inclusion lattice. We introduce a novel order between models,
named the twin order. We show that, equipped with this order, the model space
is a lattice that, unlike the model inclusion lattice, is distributive.
Furthermore, we provide the relevant rules for the computation of the
neighbours of a model. The latter are more efficient than the same operations
in the model inclusion lattice, and are then exploited to achieve a more
efficient exploration of the search space. These results can be applied to
improve the efficiency of both greedy and Bayesian model search procedures.
Here we implement a stepwise backward elimination procedure and evaluate its
performance by means of simulations. Finally, the procedure is applied to learn
a brain network from fMRI data where the two groups correspond to the left and
right hemispheres, respectively.
[LINK]
http://arxiv.org/abs/2303.05561v2
[DATE]
2024-04-15 22:43:05+08:00
[CATEGORIES]
cs.LG
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
[AUTHORS]
Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo
[ABSTRACT]
With the rise of text-to-image (T2I) generative AI models reaching wide
audiences, it is critical to evaluate model robustness against non-obvious
attacks to mitigate the generation of offensive images. By focusing on
“implicitly adversarial” prompts (those that trigger T2I models to generate
unsafe images for non-obvious reasons), we isolate a set of difficult safety
issues that human creativity is well-suited to uncover. To this end, we built
the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing
a diverse set of implicitly adversarial prompts. We have assembled a suite of
state-of-the-art T2I models, employed a simple user interface to identify and
annotate harms, and engaged diverse populations to capture long-tail safety
issues that may be overlooked in standard testing. The challenge is run in
consecutive rounds to enable a sustained discovery and analysis of safety
pitfalls in T2I models.
In this paper, we present an in-depth account of our methodology, a
systematic study of novel attack strategies and discussion of safety failures
revealed by challenge participants. We also release a companion visualization
tool for easy exploration and derivation of insights from the dataset. The
first challenge round resulted in over 10k prompt-image pairs with machine
annotations for safety. A subset of 1.5k samples contains rich human
annotations of harm types and attack styles. We find that 14% of images that
humans consider harmful are mislabeled as “safe” by machines. We have
identified new attack strategies that highlight the complexity of ensuring T2I
model robustness. Our findings emphasize the necessity of continual auditing
and adaptation as new vulnerabilities emerge. We are confident that this work
will enable proactive, iterative safety assessments and promote responsible
development of T2I models.
[COMMENTS]
15 pages, 6 figures
[LINK]
http://arxiv.org/abs/2403.12075v2
[DATE]
2024-04-15 22:41:09+08:00
[CATEGORIES]
cs.LG
No-Regret Algorithms in non-Truthful Auctions with Budget and ROI Constraints
[AUTHORS]
Gagan Aggarwal, Giannis Fikioris, Mingfei Zhao
[ABSTRACT]
Advertisers increasingly use automated bidding to optimize their ad campaigns
on online advertising platforms. Autobidding optimizes an advertiser’s
objective subject to various constraints, e.g. average ROI and budget
constraints. In this paper, we study the problem of designing online
autobidding algorithms to optimize value subject to ROI and budget constraints
when the platform is running any mixture of first- and second-price auctions.
We consider the following stochastic setting: There is an item for sale in
each of $T$ rounds. In each round, buyers submit bids and an auction is run to
sell the item. We focus on one buyer, possibly with budget and ROI constraints.
We assume that the buyer’s value and the highest competing bid are drawn i.i.d.
from some unknown (joint) distribution in each round. We design a low-regret
bidding algorithm that satisfies the buyer’s constraints. Our benchmark is the
objective value achievable by the best possible Lipschitz function that maps
values to bids, which is rich enough to best respond to many different
correlation structures between value and highest competing bid. Our main result
is an algorithm with full information feedback that guarantees a near-optimal
$\tilde O(\sqrt T)$ regret with respect to the best Lipschitz function. Our
result applies to a wide range of auctions, most notably any mixture of first-
and second-price auctions (the price is a convex combination of the first and
second prices). In addition, our result holds for both value-maximizing buyers
and quasi-linear utility-maximizing buyers.
We also study the bandit setting, where we show an $\Omega(T^{2/3})$ lower
bound on the regret for first-price auctions, revealing a large disparity between
the full information and bandit settings. We also design an algorithm with
$\tilde O(T^{3/4})$ regret, when the value distribution is known and is
independent of the highest competing bid.
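The auction format above, where the price is a convex combination of the first and second price, can be made concrete with a small payment rule. This is a sketch under assumptions: the mixing weight `alpha`, strict-inequality tie-breaking, and the way the average-ROI and budget constraints are checked are our illustrative choices, and the function names are hypothetical.

```python
def mixed_auction_payment(bid, competing, alpha):
    """One round of a first/second-price mixture auction for a single item.

    alpha = 1 gives a first-price auction, alpha = 0 a second-price auction.
    Assumption: the buyer wins only if its bid strictly exceeds the highest
    competing bid (ties go to the competitor).
    Returns (won, payment).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if bid <= competing:
        return False, 0.0
    return True, alpha * bid + (1.0 - alpha) * competing

def constraints_satisfied(values_won, payments, target_roi, budget):
    """Average-ROI constraint (value >= target_roi * spend) and budget constraint."""
    spend = sum(payments)
    return sum(values_won) >= target_roi * spend and spend <= budget

print(mixed_auction_payment(1.0, 0.6, 1.0))  # first price: pays its bid
print(mixed_auction_payment(1.0, 0.6, 0.0))  # second price: pays the competing bid
```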
[LINK]
http://arxiv.org/abs/2404.09832v1
[DATE]
2024-04-15 22:31:53+08:00
[CATEGORIES]
cs.LG
Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models
[AUTHORS]
Hyeonggeun Yun
[ABSTRACT]
In computer vision, explainable AI (xAI) methods seek to mitigate the
‘black-box’ problem by making the decision-making process of deep learning
models more interpretable and transparent. Traditional xAI methods concentrate
on visualizing input features that influence model predictions, providing
insights primarily suited for experts. In this work, we present an
interaction-based xAI method that enhances user comprehension of image
classification models through user interaction. To this end, we developed a web-based
prototype allowing users to modify images via painting and erasing, thereby
observing changes in classification results. Our approach enables users to
discern critical features influencing the model’s decision-making process,
aligning their mental models with the model’s logic. Experiments conducted with
five images demonstrate the potential of the method to reveal feature
importance through user interaction. Our work contributes a novel perspective
to xAI by centering on end-user engagement and understanding, paving the way
for more intuitive and accessible explainability in AI systems.
[COMMENTS]
5 pages, 2 figures, 1 table
[LINK]
http://arxiv.org/abs/2404.09828v1
[DATE]
2024-04-15 22:26:00+08:00
[CATEGORIES]
cs.LG
A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness
[AUTHORS]
Yuri Kinoshita, Taro Toyoizumi
[ABSTRACT]
While neural networks can enjoy outstanding flexibility and exhibit
unprecedented performance, the mechanism behind their behavior is still not
well understood. To tackle this fundamental challenge, researchers have tried
to restrict and manipulate some of their properties in order to gain new
insights and better control over them. In particular, over the past few years,
the concept of bi-Lipschitzness has proven to be a beneficial
inductive bias in many areas. However, due to its complexity, the design and
control of bi-Lipschitz architectures have lagged behind, and there is still no
model precisely designed for bi-Lipschitzness that offers direct and simple
control of the constants along with solid theoretical analysis. In this
work, we investigate and propose a novel framework for bi-Lipschitzness that
can achieve such a clear and tight control based on convex neural networks and
the Legendre-Fenchel duality. Its desirable properties are illustrated with
concrete experiments. We also apply this framework to uncertainty estimation
and monotone problem settings to illustrate its broad range of applications.
[LINK]
http://arxiv.org/abs/2404.09821v1
[DATE]
2024-04-15 22:21:01+08:00
[CATEGORIES]
cs.LG
FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity
[AUTHORS]
Kai Yi, Nidham Gazagnadou, Peter Richtárik, Lingjuan Lyu
[ABSTRACT]
The interest in federated learning has surged in recent research due to its
unique ability to train a global model using privacy-secured information held
locally on each client. This paper pays particular attention to the issue of
client-side model heterogeneity, a pervasive challenge in the practical
implementation of FL that escalates its complexity. Assuming a scenario where
each client possesses varied memory storage, processing capabilities and
network bandwidth - a phenomenon referred to as system heterogeneity - there is
a pressing need to customize a unique model for each client. In response to
this, we present an effective and adaptable federated framework FedP3,
representing Federated Personalized and Privacy-friendly network Pruning,
tailored for model heterogeneity scenarios. Our proposed methodology can
incorporate and adapt well-established techniques to its specific instances. We
offer a theoretical interpretation of FedP3 and its locally
differentially private variant, DP-FedP3, and theoretically validate their
efficiency.
[LINK]
http://arxiv.org/abs/2404.09816v1
[DATE]
2024-04-15 22:14:05+08:00
[CATEGORIES]
cs.LG
Solving the Tree Containment Problem Using Graph Neural Networks
[AUTHORS]
Arkadiy Dushatskiy, Esther Julien, Leo van Iersel, Leen Stougie
[ABSTRACT]
Tree Containment is a fundamental problem in phylogenetics useful for
verifying a proposed phylogenetic network, representing the evolutionary
history of certain species. Tree Containment asks whether the given
phylogenetic tree (for instance, constructed from a DNA fragment showing
tree-like evolution) is contained in the given phylogenetic network. In the
general case, this is an NP-complete problem. We propose to solve it
approximately using Graph Neural Networks. In particular, we propose to combine
the given network and the tree and apply a Graph Neural Network to this
network-tree graph. This way, we achieve the capability of solving the tree
containment instances representing a larger number of species than the
instances contained in the training dataset (i.e., our algorithm has
inductive learning ability). Our algorithm demonstrates an accuracy of over
$95\%$ in solving the tree containment problem on instances with up to 100
leaves.
[LINK]
http://arxiv.org/abs/2404.09812v1
[DATE]
2024-04-15 22:10:06+08:00
[CATEGORIES]
cs.LG
Neighbour-level Message Interaction Encoding for Improved Representation Learning on Graphs
[AUTHORS]
Haimin Zhang, Min Xu
[ABSTRACT]
Message passing has become the dominant framework in graph representation
learning. The essential idea of the message-passing framework is to update node
embeddings based on the information aggregated from local neighbours. However,
most existing aggregation methods do not encode neighbour-level message
interactions into the aggregated message, resulting in information loss in
embedding generation. This information loss can accumulate and become
more serious as more layers are added to the graph network model. To address
this issue, we propose a neighbour-level message interaction information
encoding method for improving graph representation learning. For messages that
are aggregated at a node, we explicitly generate an encoding between each
message and the remaining messages using an encoding function. Then we aggregate
these learned encodings and take the sum of the aggregated encoding and the
aggregated message to update the embedding for the node. In this way,
neighbour-level message interaction information is integrated into the
generated node embeddings. The proposed encoding method is a generic method
which can be integrated into message-passing graph convolutional networks.
Extensive experiments are conducted on six popular benchmark datasets across
four highly demanded tasks. The results show that integrating neighbour-level
message interactions improves the performance of the base models,
advancing state-of-the-art results for representation learning over graphs.
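The node update described above (encode each message against the remaining messages, aggregate the encodings, then add them to the aggregated message) can be sketched as follows. Assumptions: sum aggregation, and a fixed elementwise product `hadamard_encoding` standing in for the learned encoding function; both the helper names and the choice of encoder are hypothetical, not the authors' implementation.

```python
def sum_vectors(vectors, dim):
    """Elementwise sum of equal-length vectors (zero vector if the list is empty)."""
    total = [0.0] * dim
    for v in vectors:
        for k, x in enumerate(v):
            total[k] += x
    return total

def hadamard_encoding(message, rest):
    """Hypothetical stand-in for the learned encoding: elementwise product."""
    return [a * b for a, b in zip(message, rest)]

def update_with_interactions(messages, dim, encode=hadamard_encoding):
    """Aggregated message plus aggregated message-interaction encodings for one node."""
    agg_msg = sum_vectors(messages, dim)
    encodings = []
    for m in messages:
        rest = [a - b for a, b in zip(agg_msg, m)]  # sum of all *other* messages
        encodings.append(encode(m, rest))
    agg_enc = sum_vectors(encodings, dim)
    return [a + b for a, b in zip(agg_msg, agg_enc)]

print(update_with_interactions([[1.0, 2.0], [3.0, 4.0]], dim=2))
```

Computing each "rest" term as the aggregate minus the message itself keeps the update linear in the number of neighbours rather than quadratic; with a learned pairwise encoder this shortcut may not apply.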
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2404.09809v1
[DATE]
2024-04-15 22:07:33+08:00
[CATEGORIES]
cs.LG
The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs
[AUTHORS]
Saroj Gopali, Akbar S. Namin, Faranak Abri, Keith S. Jones
[ABSTRACT]
Cyber attacks continue to pose significant threats to individuals and
organizations, stealing sensitive data such as personally identifiable
information, financial information, and login credentials. Hence, detecting
malicious websites before they cause any harm is critical to preventing fraud
and monetary loss. To address the increasing number of phishing attacks,
protective mechanisms must be highly responsive, adaptive, and scalable.
Fortunately, advances in the field of machine learning, coupled with access to
vast amounts of data, have led to the adoption of various deep learning models
for timely detection of these cyber crimes. This study focuses on the detection
of phishing websites using deep learning models such as Multi-Head Attention,
Temporal Convolutional Network (TCN), Bi-LSTM, and LSTM, where URLs of the
phishing websites are treated as sequences. The results demonstrate that
the Multi-Head Attention and Bi-LSTM models outperform the other deep
learning-based algorithms, TCN and LSTM, achieving better precision,
recall, and F1-scores.
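Treating a URL as a sequence typically means mapping it to a fixed-length sequence of integer token ids before it enters an LSTM, TCN, or attention model. The character-level vocabulary, lowercasing, padding id 0, unknown id 1, and `max_len=64` below are illustrative assumptions; the abstract does not specify the tokenization used.

```python
import string

# Hypothetical character vocabulary: id 0 = padding, id 1 = unknown character.
CHARS = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
VOCAB = {c: i + 2 for i, c in enumerate(CHARS)}

def encode_url(url, max_len=64):
    """Map a URL to a fixed-length list of character ids (truncate, then right-pad)."""
    ids = [VOCAB.get(c, 1) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

print(encode_url("http://a.b", max_len=16))
```

Lowercasing shrinks the vocabulary but discards case, which can itself be a phishing signal; a case-sensitive vocabulary is an equally reasonable design choice.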
[LINK]
http://arxiv.org/abs/2404.09802v1
[DATE]
2024-04-15 21:58:22+08:00
[CATEGORIES]
cs.LG
Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets
[AUTHORS]
Andrea Roncoli, Aleksandra Ćiprijanović, Maggie Voetberg, Francisco Villaescusa-Navarro, Brian Nord
[ABSTRACT]
Deep learning models have been shown to outperform methods that rely on
summary statistics, like the power spectrum, in extracting information from
complex cosmological data sets. However, due to differences in the subgrid
physics implementation and numerical approximations across different simulation
suites, models trained on data from one cosmological simulation show a drop in
performance when tested on another. Similarly, models trained on any of the
simulations would also likely experience a drop in performance when applied to
observational data. Training on data from two different suites of the CAMELS
hydrodynamic cosmological simulations, we examine the generalization
capabilities of Domain Adaptive Graph Neural Networks (DA-GNNs). By utilizing
GNNs, we capitalize on their capacity to capture structured scale-free
cosmological information from galaxy distributions. Moreover, by including
unsupervised domain adaptation via Maximum Mean Discrepancy (MMD), we enable
our models to extract domain-invariant features. We demonstrate that DA-GNN
achieves higher accuracy and robustness on cross-dataset tasks (up to $28\%$
better relative error and up to almost an order of magnitude better $\chi^2$).
Using data visualizations, we show the effects of domain adaptation on proper
latent space data alignment. This shows that DA-GNNs are a promising method for
extracting domain-independent cosmological information, a vital step toward
robust deep learning for real cosmic survey data.
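Maximum Mean Discrepancy measures the distance between two feature distributions, so adding it as a loss term on latent features from two simulation suites pushes them toward a shared, domain-invariant representation. Below is the standard biased estimate of squared MMD with a Gaussian kernel; the bandwidth `sigma` and the function names are assumptions, and the exact kernel used by DA-GNN is not stated in the abstract.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys (lists of vectors)."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

source = [[0.0], [0.1], [0.2]]   # hypothetical latent features, suite A
target = [[3.0], [3.1], [3.2]]   # hypothetical latent features, suite B
print(mmd2_biased(source, target))
```

In training, one would typically minimize something like `task_loss + weight * mmd2_biased(latent_a, latent_b)`, where `weight` is a hypothetical hyperparameter.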
[COMMENTS]
Accepted in Machine Learning and the Physical Sciences Workshop at
NeurIPS 2023; 9 pages, 2 figures, 1 table
[LINK]
http://arxiv.org/abs/2311.01588v3
[DATE]
2024-04-15 21:56:24+08:00
[CATEGORIES]
cs.LG
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
[AUTHORS]
Iryna Hartsock, Ghulam Rasool
[ABSTRACT]
Medical vision-language models (VLMs) combine computer vision (CV) and
natural language processing (NLP) to analyze visual and textual medical data.
Our paper reviews recent advancements in developing VLMs specialized for
healthcare, focusing on models designed for medical report generation and
visual question answering (VQA). We provide background on NLP and CV,
explaining how techniques from both fields are integrated into VLMs to enable
learning from multimodal data. Key areas we address include the exploration of
medical vision-language datasets, in-depth analyses of architectures and
pre-training strategies employed in recent noteworthy medical VLMs, and
comprehensive discussion of evaluation metrics for assessing VLMs’ performance
in medical report generation and VQA. We also highlight current challenges and
propose future directions, including enhancing clinical validity and addressing
patient privacy concerns. Overall, our review summarizes recent progress in
developing VLMs to harness multimodal medical data for improved healthcare
applications.
[COMMENTS]
43 pages; paper edited and restructured
[LINK]
http://arxiv.org/abs/2403.02469v2
[DATE]
2024-04-15 21:51:30+08:00
[CATEGORIES]
cs.LG
Emergent Language Symbolic Autoencoder (ELSA) with Weak Supervision to Model Hierarchical Brain Networks
[AUTHORS]
Ammar Ahmed Pallikonda Latheef, Alberto Santamaria-Pang, Craig K Jones, Haris I Sair
[ABSTRACT]
Brain networks display a hierarchical organization, a complexity that poses a
challenge for existing deep learning models, often structured as flat
classifiers, leading to difficulties in interpretability and the ‘black box’
issue. To bridge this gap, we propose a novel architecture: a symbolic
autoencoder informed by weak supervision and an Emergent Language (EL)
framework. This model moves beyond traditional flat classifiers by producing
hierarchical clusters and corresponding imagery, subsequently represented
through symbolic sentences to improve the clinical interpretability of
hierarchically organized data such as intrinsic brain networks, which can be
characterized using resting-state fMRI images. Our innovation includes a
generalized hierarchical loss function designed to ensure that both sentences
and images accurately reflect the hierarchical structure of functional brain
networks. This enables us to model functional brain networks from a broader
perspective down to more granular details. Furthermore, we introduce a
quantitative method to assess the hierarchical consistency of these symbolic
representations. Our qualitative analyses show that our model successfully
generates hierarchically organized, clinically interpretable images, a finding
supported by our quantitative evaluations. We find that our best performing
loss function leads to a hierarchical consistency of over 97% when identifying
images corresponding to brain networks. This approach not only advances the
interpretability of deep learning models in neuroimaging analysis but also
represents a significant step towards modeling the intricate hierarchical
nature of brain networks.
[COMMENTS]
10 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.10031v1
[DATE]
2024-04-15 21:51:05+08:00
[CATEGORIES]
cs.LG
Off-Policy Primal-Dual Safe Reinforcement Learning
[AUTHORS]
Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang
[COMMENTS]
ICLR 2024 Poster
[LINK]
http://arxiv.org/abs/2401.14758v2
[DATE]
2024-04-15 21:44:11+08:00
[CATEGORIES]
cs.LG
A replica analysis of under-bagging
[AUTHORS]
Takashi Takahashi
[ABSTRACT]
Sharp asymptotics of the under-bagging (UB) method, a popular ensemble
learning method for training classifiers from imbalanced data, are derived
and used to compare UB with several other standard methods for learning
from imbalanced data, in the scenario where a linear classifier is trained from
a binary mixture data. The methods compared include the under-sampling (US)
method, which trains a model using a single realization of the subsampled
dataset, and the simple weighting (SW) method, which trains a model with a
weighted loss on the entire data. It is shown that the performance of UB is
improved by increasing the size of the majority class, even when the class
imbalance is large, especially when the size of the minority class is
small. This is in contrast to US, whose performance does not change as the size
of the majority class increases, and SW, whose performance decreases as the
imbalance increases. These results differ from the case of naive
bagging in training generalized linear models without considering the structure
of class imbalance, indicating the intrinsic difference between the ensembling
and the direct regularization on the parameters.
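The difference between under-sampling (US) and under-bagging (UB) can be sketched in one dimension: US fits a single classifier on one balanced subsample, while UB averages classifiers fitted on many independent balanced subsamples. The nearest-centroid linear base learner and all function names are illustrative assumptions; this does not reproduce the replica analysis.

```python
import random
from statistics import mean

def centroid_classifier(pos, neg):
    """1-D linear classifier f(x) = w*x + b with its boundary midway between class means."""
    mu_p, mu_n = mean(pos), mean(neg)
    w = mu_p - mu_n
    b = -(mu_p ** 2 - mu_n ** 2) / 2.0
    return w, b

def under_sample(minority, majority, rng):
    """US: train on one balanced subsample of the majority class."""
    return centroid_classifier(minority, rng.sample(majority, len(minority)))

def under_bagging(minority, majority, n_bags, rng):
    """UB: average the linear models trained on n_bags independent balanced subsamples."""
    models = [under_sample(minority, majority, rng) for _ in range(n_bags)]
    return mean(w for w, _ in models), mean(b for _, b in models)

minority = [2.0, 2.5, 1.5]                           # toy minority class
majority = [-1.0 + 0.01 * i for i in range(200)]     # toy majority class
w, b = under_bagging(minority, majority, n_bags=50, rng=random.Random(0))
print(w, b)
```

Averaging linear models is the same as averaging their scores, so UB uses many more majority samples overall while each base model still sees a balanced training set.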
[COMMENTS]
18 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.09779v1
[DATE]
2024-04-15 21:31:31+08:00
[CATEGORIES]
cs.LG
RandAlign: A Parameter-Free Method for Regularizing Graph Convolutional Networks
[AUTHORS]
Haimin Zhang, Min Xu
[ABSTRACT]
Studies continually find that message-passing graph convolutional networks
suffer from over-smoothing. Over-smoothing
refers to the phenomenon that the learned embeddings for all nodes can become
very similar to one another and therefore are uninformative after repeatedly
applying message passing iterations. Intuitively, we can expect the generated
embeddings to become asymptotically smoother layer by layer, that is, each layer of
graph convolution generates a smoothed version of embeddings as compared to
that generated by the previous layer. Based on this intuition, we propose
RandAlign, a stochastic regularization method for graph convolutional networks.
The idea of RandAlign is to randomly align the learned embedding for each node
with that of the previous layer using random interpolation in each graph
convolution layer. Through alignment, the smoothness of the generated
embeddings is explicitly reduced. To better maintain the benefit yielded by the
graph convolution, in the alignment step we first scale the
embedding of the previous layer to the same norm as the generated embedding and
then perform random interpolation for aligning the generated embedding.
RandAlign is a parameter-free method and can be directly applied without
introducing additional trainable weights or hyper-parameters. We experimentally
evaluate RandAlign on different graph domain tasks on seven benchmark datasets.
The experimental results show that RandAlign is a general method that improves
the generalization performance of various graph convolutional network models
and also improves the numerical stability of optimization, advancing the
state-of-the-art performance for graph representation learning.
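The alignment step described above (rescale the previous-layer embedding to the norm of the new embedding, then randomly interpolate between the two) can be sketched per node. Assumptions: one interpolation coefficient drawn uniformly from [0, 1] per node, the step applied only during training, and the function name `rand_align`; these details are not fixed by the abstract.

```python
import math
import random

def rand_align(h_new, h_prev, rng=random):
    """Sketch of one RandAlign step for a layer (assumed form).

    h_new, h_prev: lists of node embeddings (lists of floats) of equal dimension.
    """
    out = []
    for new, prev in zip(h_new, h_prev):
        norm_new = math.sqrt(sum(x * x for x in new))
        norm_prev = math.sqrt(sum(x * x for x in prev))
        # Rescale prev so that its norm matches the new embedding (guard zero norm).
        scale = norm_new / norm_prev if norm_prev > 0 else 0.0
        prev_scaled = [scale * x for x in prev]
        lam = rng.random()  # hypothetical: one random coefficient per node
        out.append([lam * a + (1.0 - lam) * b for a, b in zip(new, prev_scaled)])
    return out

print(rand_align([[3.0, 4.0], [1.0, 0.0]], [[6.0, 8.0], [0.0, 2.0]], random.Random(0)))
```

Since both endpoints have the same norm, the interpolated embedding never exceeds that norm, and no trainable weights or extra hyperparameters are introduced, matching the parameter-free claim.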
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2404.09774v1
[DATE]
2024-04-15 21:28:13+08:00
[CATEGORIES]
cs.LG
Effective Reinforcement Learning Based on Structural Information Principles
[AUTHORS]
Xianghua Zeng, Hao Peng, Dingli Su, Angsheng Li
[ABSTRACT]
Although Reinforcement Learning (RL) algorithms acquire sequential behavioral
patterns through interactions with the environment, their effectiveness in
noisy and high-dimensional scenarios typically relies on specific structural
priors. In this paper, we propose a novel and general Structural Information
principles-based framework for effective Decision-Making, namely SIDM,
approached from an information-theoretic perspective. This paper presents a
specific unsupervised partitioning method that forms vertex communities in the
state and action spaces based on their feature similarities. An aggregation
function, which utilizes structural entropy as the vertex weight, is devised
within each community to obtain its embedding, thereby facilitating
hierarchical state and action abstractions. By extracting abstract elements
from historical trajectories, a directed, weighted, homogeneous transition
graph is constructed. The minimization of this graph’s high-dimensional entropy
leads to the generation of an optimal encoding tree. An innovative two-layer
skill-based learning mechanism is introduced to compute the common path entropy
of each state transition as its identified probability, thereby obviating the
requirement for expert knowledge. Moreover, SIDM can be flexibly incorporated
into various single-agent and multi-agent RL algorithms, enhancing their
performance. Finally, extensive evaluations on challenging benchmarks
demonstrate that, compared with SOTA baselines, our framework significantly and
consistently improves the policy’s quality, stability, and efficiency by up to
32.70%, 88.26%, and 64.86%, respectively.
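SIDM builds on structural entropy. Its simplest case is the one-dimensional structural entropy of a weighted graph, which is the Shannon entropy of the degree distribution: H1(G) = -sum_i (d_i / vol) log2(d_i / vol), where d_i is the weighted degree and vol = sum_i d_i. The sketch below computes only this base quantity. It is not SIDM's high-dimensional encoding-tree optimization, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def one_dim_structural_entropy(edges):
    """One-dimensional structural entropy of an undirected weighted graph.

    edges: iterable of (u, v, weight).
    """
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    vol = sum(degree.values())
    return -sum((d / vol) * math.log2(d / vol) for d in degree.values() if d > 0)

triangle = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
print(one_dim_structural_entropy(triangle))  # log2(3) for a regular 3-vertex graph
```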
[LINK]
http://arxiv.org/abs/2404.09760v1
[DATE]
2024-04-15 21:02:00+08:00
[CATEGORIES]
cs.LG
Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?
[AUTHORS]
Shruthi Gowda, Elahe Arani, Bahram Zonooz
[ABSTRACT]
Self-supervised learning (SSL) has emerged as a promising solution for
addressing the challenge of limited labeled data in deep neural networks
(DNNs), offering scalability potential. However, the impact of design
dependencies within the SSL framework remains insufficiently investigated. In
this study, we comprehensively explore SSL behavior across a spectrum of
augmentations, revealing their crucial role in shaping SSL model performance
and learning mechanisms. Leveraging these insights, we propose a novel learning
approach that integrates prior knowledge, with the aim of curtailing the need
for extensive data augmentations and thereby amplifying the efficacy of learned
representations. Notably, our findings underscore that SSL models imbued with
prior knowledge exhibit reduced texture bias, diminished reliance on shortcuts
and augmentations, and improved robustness against both natural and adversarial
corruptions. These findings not only illuminate a new direction in SSL
research, but also pave the way for enhancing DNN performance while
concurrently alleviating the imperative for intensive data augmentation,
thereby enhancing scalability and real-world problem-solving capabilities.
[LINK]
http://arxiv.org/abs/2404.09752v1
[DATE]
2024-04-15 20:53:48+08:00
[CATEGORIES]
cs.LG
Amortized Network Intervention to Steer the Excitatory Point Processes
[AUTHORS]
Zitao Song, Wendi Ren, Shuang Li
[ABSTRACT]
Excitatory point processes (i.e., event flows) occurring over dynamic graphs
(i.e., evolving topologies) provide a fine-grained model to capture how
discrete events may spread over time and space. How to effectively steer the
event flows by modifying the dynamic graph structures is an interesting
problem, motivated by applications ranging from curbing the spread of infectious
diseases through strategically locking down cities to mitigating traffic
congestion via traffic light optimization. To address the intricacies of planning and overcome the
high dimensionality inherent to such decision-making problems, we design an
Amortized Network Interventions (ANI) framework, allowing for the pooling of
optimal policies from history and other contexts while ensuring a permutation
equivariance property. This property enables efficient knowledge transfer and
sharing across diverse contexts. Each task is solved by an H-step lookahead
model-based reinforcement learning, where neural ODEs are introduced to model
the dynamics of the excitatory point processes. Instead of simulating rollouts
from the dynamics model, we derive an analytical mean-field approximation for
the event flows given the dynamics, which makes online planning more efficient to
solve. We empirically illustrate that this ANI approach substantially
enhances policy learning for unseen dynamics and exhibits promising outcomes in
steering event flows through network intervention using synthetic and real
COVID datasets.
[LINK]
http://arxiv.org/abs/2310.04159v2
[DATE]
2024-04-15 20:52:30+08:00
[CATEGORIES]
cs.LG
Federated Learning on Riemannian Manifolds with Differential Privacy
[AUTHORS]
Zhenwei Huang, Wen Huang, Pratik Jawanpuria, Bamdev Mishra
[ABSTRACT]
In recent years, federated learning (FL) has emerged as a prominent paradigm
in distributed machine learning. Despite the partial safeguarding of agents’
information within FL systems, a malicious adversary can potentially infer
sensitive information through various means. In this paper, we propose a
generic private FL framework defined on Riemannian manifolds (PriRFed) based on
the differential privacy (DP) technique. We analyze the privacy guarantee while
establishing the convergence properties. To the best of our knowledge, this is
the first federated learning framework on Riemannian manifolds with a privacy
guarantee and convergence results. Numerical simulations are performed on
synthetic and real-world datasets to showcase the efficacy of the proposed
PriRFed approach.
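To make the ingredients concrete, here is a minimal sketch of one noisy Riemannian gradient step on the unit sphere, a simple Riemannian manifold. It assumes gradient clipping to bound sensitivity, Gaussian noise drawn in the tangent space, and normalization as the retraction; these choices and the helper names (`tangent_project`, `private_riemannian_step`) are hypothetical illustrations, not the PriRFed algorithm, and the code performs no privacy accounting.

```python
import numpy as np


def tangent_project(x, v):
    """Project v onto the tangent space of the unit sphere at the unit vector x."""
    return v - np.dot(x, v) * x


def private_riemannian_step(x, euclid_grad, lr, clip, sigma, rng):
    """One noisy Riemannian gradient step on the unit sphere (illustrative only).

    1. Turn the Euclidean gradient into a Riemannian one by tangent projection.
    2. Clip its norm to `clip` so each agent's contribution has bounded sensitivity.
    3. Add Gaussian noise (std sigma * clip) projected onto the same tangent space.
    4. Retract back onto the sphere by normalization.
    """
    g = tangent_project(x, euclid_grad)
    norm = np.linalg.norm(g)
    if norm > clip:
        g = g * (clip / norm)
    noise = tangent_project(x, rng.normal(0.0, sigma * clip, size=x.shape))
    y = x - lr * (g + noise)
    return y / np.linalg.norm(y)
```

Setting `sigma` to 0 reduces the step to ordinary Riemannian gradient descent, which is a convenient way to check that iterates stay on the manifold and converge before turning the noise on.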
[LINK]
http://arxiv.org/abs/2404.10029v1
[DATE]
2024-04-15 20:32:20+08:00
[CATEGORIES]
cs.LG
Convergence Analysis of Probability Flow ODE for Score-based Generative Models
[AUTHORS]
Daniel Zhengyu Huang, Jiaoyang Huang, Zhengjiang Lin
[ABSTRACT]
Score-based generative models have emerged as a powerful approach for
sampling high-dimensional probability distributions. Despite their
effectiveness, their theoretical underpinnings remain relatively
underdeveloped. In this work, we study the convergence properties of
deterministic samplers based on probability flow ODEs from both theoretical and
numerical perspectives. Assuming access to $L^2$-accurate estimates of the
score function, we prove that the total variation distance between the target and the
generated data distributions can be bounded above by
$\mathcal{O}(d\sqrt{\delta})$ in the continuous time level, where $d$ denotes
the data dimension and $\delta$ represents the $L^2$-score matching error. For
practical implementations using a $p$-th order Runge-Kutta integrator with step
size $h$, we establish error bounds of $\mathcal{O}(d(\sqrt{\delta} + (dh)^p))$
at the discrete level. Finally, we present numerical studies on problems of up to
$128$ dimensions to verify our theory, which indicate a better dependence on the
score-matching error and the dimension.
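The discrete-level setting can be illustrated on a one-dimensional Gaussian, where the score is known exactly (so $\delta = 0$) and only the integrator error remains. This is a minimal sketch, assuming an Ornstein-Uhlenbeck forward process $dx = -x\,dt + \sqrt{2}\,dW$ and a target $N(0, s_0^2)$; classical RK4 plays the role of a $p = 4$ Runge-Kutta integrator, and all helper names are hypothetical rather than taken from the authors' code.

```python
import math


def ou_variance(t, s0_sq):
    """Marginal variance v(t) of x_t under dx = -x dt + sqrt(2) dW when x_0 ~ N(0, s0_sq)."""
    decay = math.exp(-2.0 * t)
    return s0_sq * decay + (1.0 - decay)


def pf_ode_drift(x, t, s0_sq):
    """Probability-flow drift f(x, t) - 0.5 * g^2 * score with f = -x, g^2 = 2, exact score -x / v(t)."""
    score = -x / ou_variance(t, s0_sq)
    return -x - score


def rk4_sample(x_T, T, n_steps, s0_sq):
    """Integrate the probability-flow ODE backward from t = T to t = 0 with classical RK4."""
    h = -T / n_steps  # negative step size: integration runs backward in time
    x, t = x_T, T
    for _ in range(n_steps):
        k1 = pf_ode_drift(x, t, s0_sq)
        k2 = pf_ode_drift(x + 0.5 * h * k1, t + 0.5 * h, s0_sq)
        k3 = pf_ode_drift(x + 0.5 * h * k2, t + 0.5 * h, s0_sq)
        k4 = pf_ode_drift(x + h * k3, t + h, s0_sq)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x


def exact_map(x_T, T, s0_sq):
    """For a Gaussian target the flow map is linear: x_0 = x_T * sqrt(v(0) / v(T))."""
    return x_T * math.sqrt(s0_sq / ou_variance(T, s0_sq))
```

Comparing `rk4_sample` against `exact_map` for decreasing step sizes isolates the $(dh)^p$ term of the bound; once $h$ is small, halving it should shrink the error by roughly $2^4 = 16$.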
[COMMENTS]
33 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.09730v1
[DATE]
2024-04-15 20:29:28+08:00
[CATEGORIES]
cs.LG
Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis
[AUTHORS]
Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang
[ABSTRACT]
Considering the variability of amplitude and phase patterns in
electrocardiogram (ECG) signals due to cardiac activity and individual
differences, existing entropy-based studies have not fully utilized these two
patterns and lack integration. To address this gap, this paper proposes a novel
fusion entropy metric, morphological ECG entropy (MEE) for the first time,
specifically designed for ECG morphology, to comprehensively describe the
fusion of amplitude and phase patterns. MEE is computed based on beat-level
samples, enabling detailed analysis of each cardiac cycle. Experimental results
demonstrate that MEE achieves rapid, accurate, and label-free localization of
abnormal ECG arrhythmia regions. Furthermore, MEE provides a method for
assessing sample diversity, facilitating compression of imbalanced training
sets (via representative sample selection), and outperforms random pruning.
Additionally, MEE can describe regions of poor signal quality. We further
discuss and demonstrate the robustness of the MEE computation to noise
interference, as well as its low computational complexity. Finally, we integrate
this method into a clinical interactive interface to provide a more convenient
and intuitive user experience. These findings indicate that MEE serves as a
valuable clinical descriptor for ECG characterization. The implementation code
is available at https://github.com/fdu-harry/ECG-MEE-metric.
[COMMENTS]
16 pages, 12 figures
[LINK]
http://arxiv.org/abs/2404.09729v1
[DATE]
2024-04-15 20:29:16+08:00
[CATEGORIES]
cs.LG
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization
[AUTHORS]
Guy Kornowski, Ohad Shamir
[ABSTRACT]
We study the complexity of producing $(\delta,\epsilon)$-stationary points of
Lipschitz objectives which are possibly neither smooth nor convex, using only
noisy function evaluations. Recent works proposed several stochastic zero-order
algorithms that solve this task, all of which suffer from a
dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the
problem, which was conjectured to be optimal. We refute this conjecture by
providing a faster algorithm that has complexity
$O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants)
with respect to $d$ and also optimal with respect to the accuracy parameters
$\delta,\epsilon$, thus solving an open question due to Lin et al.
(NeurIPS’22). Moreover, the convergence rate achieved by our algorithm is also
optimal for smooth objectives, proving that in the nonconvex stochastic
zero-order setting, nonsmooth optimization is as easy as smooth optimization.
We provide algorithms that achieve the aforementioned convergence rate in
expectation as well as with high probability. Our analysis is based on a simple
yet powerful lemma regarding the Goldstein-subdifferential set, which allows
utilizing recent advancements in first-order nonsmooth nonconvex optimization.
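The oracle model here gives the algorithm only function values. A common building block in zero-order methods of this kind is the two-point randomized-smoothing gradient estimator, which estimates the gradient of a smoothed surrogate of $f$ from two evaluations. The sketch below assumes directions drawn uniformly from the unit sphere and a smoothing radius `delta`; it illustrates the oracle model only, is not the authors' improved algorithm, and the function names are hypothetical.

```python
import numpy as np


def sphere_direction(d, rng):
    """Draw u uniformly from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)


def two_point_grad_estimate(f, x, delta, rng):
    """Two-point zero-order gradient estimate of the delta-smoothed surrogate of f.

    Uses only function values: g = d / (2 * delta) * (f(x + delta*u) - f(x - delta*u)) * u.
    """
    d = x.shape[0]
    u = sphere_direction(d, rng)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u


def zero_order_sgd(f, x0, delta, lr, n_iters, seed=0):
    """Plain descent loop driven by the estimator (no rate guarantee is claimed for this loop)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - lr * two_point_grad_estimate(f, x, delta, rng)
    return x
```

For a smooth quadratic the estimator averages to the true gradient, and the same code runs unchanged on a nonsmooth objective such as the $\ell_1$ norm, which is the regime the abstract targets.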
[COMMENTS]
Accepted to Journal of Machine Learning Research (JMLR); improved
dependence on Lipschitz constant; some minor edits following reviews
[LINK]
http://arxiv.org/abs/2307.04504v3
[DATE]
2024-04-15 20:26:40+08:00
[CATEGORIES]
cs.LG
VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication
[AUTHORS]
Xun Yuan, Yang Yang, Prosanta Gope, Aryan Pasikhani, Biplab Sikdar
[ABSTRACT]
In the current artificial intelligence (AI) era, the scale and quality of the
dataset play a crucial role in training a high-quality AI model. However, good
data is not a free lunch and is often hard to access due to privacy
regulations like the General Data Protection Regulation (GDPR). A potential
solution is to release a synthetic dataset with a similar distribution to that
of the private dataset. Nevertheless, in some scenarios, it has been found that
the attributes needed to train an AI model belong to different parties, and
they cannot share the raw data for synthetic data publication due to privacy
regulations. In PETS 2023, Xue et al. proposed the first generative adversarial
network (GAN)-based model, VertiGAN, for vertically partitioned data publication.
However, after a thorough investigation, we found that VertiGAN is less
effective in preserving the correlation among the attributes of different
parties. This article proposes a Vertical Federated Learning-based Generative
Adversarial Network, VFLGAN, for vertically partitioned data publication to
address the above issues. Our experimental results show that compared with
VertiGAN, VFLGAN significantly improves the quality of synthetic data. Taking
the MNIST dataset as an example, the quality of the synthetic dataset generated
by VFLGAN is 3.2 times better than that generated by VertiGAN w.r.t. the
Fréchet Distance. We also designed a more efficient and effective Gaussian
mechanism for the proposed VFLGAN to provide the synthetic dataset with a
differential privacy guarantee. However, differential privacy only gives a
worst-case upper bound on privacy leakage. This article also
proposes a practical auditing scheme that applies membership inference attacks
to estimate privacy leakage through the synthetic dataset.
[LINK]
http://arxiv.org/abs/2404.09722v1
[DATE]
2024-04-15 20:25:41+08:00
[CATEGORIES]
cs.LG
Graph Convolutional Networks for Simulating Multi-phase Flow and Transport in Porous Media
[AUTHORS]
Jiamin Jiang, Bo Guo
[ABSTRACT]
Numerical simulation of multi-phase fluid dynamics in porous media is
critical for many energy and environmental applications in Earth’s subsurface.
Data-driven surrogate modeling provides computationally inexpensive
alternatives to high-fidelity numerical simulators. While the commonly used
convolutional neural networks (CNNs) are powerful in approximating partial
differential equation solutions, it remains challenging for CNNs to handle
irregular and unstructured simulation meshes. However, simulation models for
Earth’s subsurface often involve unstructured meshes with complex mesh
geometries, which limits the application of CNNs. To address this challenge, we
construct surrogate models based on Graph Convolutional Networks (GCNs) to
approximate the spatial-temporal solutions of multi-phase flow and transport
processes in porous media. We propose a new GCN architecture suited to the
hyperbolic character of the coupled PDE system, to better capture transport
dynamics. Results of 2D heterogeneous test cases show that our surrogates
predict the evolutions of pressure and saturation states with high accuracy,
and the predicted rollouts remain stable for multiple timesteps. Moreover, the
GCN-based models generalize well to irregular domain geometries and
unstructured meshes that are unseen in the training dataset.
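How a GCN sidesteps the regular-grid requirement of CNNs can be seen in a single standard graph-convolution layer of the Kipf and Welling form, $H' = \mathrm{ReLU}(D^{-1/2}(A + I)D^{-1/2} H W)$. This minimal sketch assumes mesh cells become graph nodes and cells sharing a face are joined by an edge; it is not the new architecture the authors propose for the hyperbolic PDE system, and the helper names are hypothetical.

```python
import numpy as np


def normalized_adjacency(edges, n_nodes):
    """Build D^{-1/2} (A + I) D^{-1/2} from an undirected edge list (e.g. mesh-cell adjacency)."""
    a = np.eye(n_nodes)  # self-loops keep each cell's own state in the aggregation
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One graph convolution: average neighbor features, apply a shared linear map, then ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)
```

Because the layer only needs an edge list, the same weights `w` apply to any mesh connectivity, which is one reason GCN surrogates can be evaluated on unstructured meshes not seen during training.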
[LINK]
http://arxiv.org/abs/2307.04449v2
[DATE]
2024-04-15 20:24:07+08:00
[CATEGORIES]
cs.LG
Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning
[AUTHORS]
Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian
[ABSTRACT]
One of the notorious issues for Reinforcement Learning (RL) is poor sample
efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent
Reinforcement Learning (MARL) is more challenging because of its inherent
partial observability, non-stationary training, and enormous strategy space.
Although much effort has been devoted to developing new methods that enhance
sample efficiency, we instead examine the widely used episodic training mechanism. In
each training step, tens of frames are collected, but only one gradient step is
made. We argue that this episodic training could be a source of poor sample
efficiency. To better exploit the data already collected, we propose to
increase the frequency of the gradient updates per environment interaction
(a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we
evaluate $3$ MARL methods on $6$ SMAC tasks. The empirical results validate
that a higher replay ratio significantly improves the sample efficiency for
MARL algorithms. The code to reproduce the results presented in this paper
is open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.
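The change being proposed is small at the level of the training loop: collect an episode, then take several gradient steps on replayed data instead of one. The toy loop below sketches this, assuming the replay ratio is counted per collected episode; environment interaction and the learner update are placeholders, and `run_episodic_training` is a hypothetical name, not the authors' code.

```python
import random
from collections import deque


def run_episodic_training(num_episodes, replay_ratio, frames_per_episode=32,
                          batch_size=8, seed=0):
    """Toy episodic loop that takes `replay_ratio` gradient steps per collected episode.

    Returns (number of gradient updates, number of frames collected).
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)  # replay buffer holding collected frames
    n_updates, n_frames = 0, 0
    for episode in range(num_episodes):
        # Placeholder rollout: each "frame" is just an (episode, timestep) tag.
        for t in range(frames_per_episode):
            buffer.append((episode, t))
            n_frames += 1
        # replay_ratio = 1 recovers the usual one-gradient-step-per-episode scheme.
        for _ in range(replay_ratio):
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            # A real learner would compute a loss on `batch` and step the optimizer here.
            n_updates += 1
    return n_updates, n_frames
```

For example, `run_episodic_training(10, 4)` returns `(40, 320)`: the same 320 frames are reused for four times as many updates as the default scheme.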
[LINK]
http://arxiv.org/abs/2404.09715v1
[DATE]
2024-04-15 20:18:09+08:00
[CATEGORIES]
cs.LG
Concentrated Differential Privacy for Bandits
[AUTHORS]
Achraf Azize, Debabrota Basu
[ABSTRACT]
Bandits serve as the theoretical foundation of sequential learning and an
algorithmic foundation of modern recommender systems. However, recommender
systems often rely on user-sensitive data, making privacy a critical concern.
This paper contributes to the understanding of Differential Privacy (DP) in
bandits with a trusted centralised decision-maker, and especially the
implications of ensuring zero Concentrated Differential Privacy (zCDP). First,
we formalise and compare different adaptations of DP to bandits, depending on
the considered input and the interaction protocol. Then, we propose three
private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit
settings, namely finite-armed bandits, linear bandits, and linear contextual
bandits. The three algorithms share a generic algorithmic blueprint, i.e. the
Gaussian mechanism and adaptive episodes, to ensure a good privacy-utility
trade-off. We analyse and upper bound the regret of these three algorithms. Our
analysis shows that in all of these settings, the prices of imposing zCDP are
(asymptotically) negligible in comparison with the regrets incurred oblivious
to privacy. Next, we complement our regret upper bounds with the first minimax
lower bounds on the regret of bandits with zCDP. To prove the lower bounds, we
develop a new proof technique based on couplings and optimal transport. We
conclude by experimentally validating our theoretical results for the three
different settings of bandits.
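The Gaussian mechanism named in the blueprint can be calibrated directly for zCDP using the standard fact that adding $N(0, \sigma^2)$ noise to a query of L2 sensitivity $\Delta$ satisfies $\Delta^2 / (2\sigma^2)$-zCDP. The sketch below shows that calibration for a private mean reward, plus a doubling schedule as one possible form of adaptive episodes; the schedule and all helper names are assumptions for illustration, not the exact AdaC-UCB procedure.

```python
import math
import random


def gaussian_sigma_for_zcdp(sensitivity, rho):
    """Noise std so that adding N(0, sigma^2) to a query with this L2 sensitivity is rho-zCDP.

    Uses rho = sensitivity**2 / (2 * sigma**2), i.e. sigma = sensitivity / sqrt(2 * rho).
    """
    return sensitivity / math.sqrt(2.0 * rho)


def private_mean(rewards, rho, rng):
    """Release the mean of rewards in [0, 1] via the Gaussian mechanism.

    Changing one reward moves the mean by at most 1/n, so the L2 sensitivity is 1/n.
    """
    n = len(rewards)
    sigma = gaussian_sigma_for_zcdp(1.0 / n, rho)
    return sum(rewards) / n + rng.gauss(0.0, sigma)


def doubling_episode_ends(horizon):
    """Hypothetical adaptive-episode schedule: refresh private statistics at t = 1, 2, 4, 8, ..."""
    ends, t = [], 1
    while t <= horizon:
        ends.append(t)
        t *= 2
    return ends
```

Since the noise scale shrinks like $1/n$, refreshing statistics only at geometrically spaced times keeps the number of noisy releases logarithmic in the horizon, which is one intuition for pairing the Gaussian mechanism with adaptive episodes.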
[COMMENTS]
Appears in IEEE SaTML 2024
[LINK]
http://arxiv.org/abs/2309.00557v3
[DATE]
2024-04-15 20:08:53+08:00
[CATEGORIES]
cs.LG
Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context
[AUTHORS]
Moyu Zhang, Yongxiang Tang, Jinxin Hu, Yu Zhang
[ABSTRACT]
Existing methods often adjust representations adaptively only after
aggregating user behavior sequences. This coarse-grained approach to
re-weighting the entire user sequence hampers the model’s ability to accurately
model the user interest migration across different scenarios. To enhance the
model’s capacity to capture user interests from historical behavior sequences
in each scenario, we develop a ranking framework named the Scenario-Adaptive
Fine-Grained Personalization Network (SFPNet), a fine-grained method for
multi-scenario personalized recommendation.
Specifically, SFPNet comprises a series of sequentially stacked blocks, named
Scenario-Tailoring Blocks. Each block first deploys a parameter
personalization unit to integrate scenario information at a coarse-grained
level by redefining fundamental features. Subsequently, we consolidate
scenario-adaptively adjusted feature representations to serve as context
information. By employing residual connection, we incorporate this context into
the representation of each historical behavior, allowing for context-aware
fine-grained customization of the behavior representations at the
scenario-level, which in turn supports scenario-aware user interest modeling.
[COMMENTS]
Accepted by SIGIR 2024, 10 pages, 5 figures, 5 tables
[LINK]
http://arxiv.org/abs/2404.09709v1
[DATE]
2024-04-15 20:08:44+08:00
[CATEGORIES]
cs.LG
Kernel-based learning with guarantees for multi-agent applications
[AUTHORS]
Krzysztof Kowalczyk, Paweł Wachel, Cristian R. Rojas
[ABSTRACT]
This paper addresses a kernel-based learning problem for a network of agents
locally observing a latent multidimensional, nonlinear phenomenon in a noisy
environment. We propose a learning algorithm that requires only mild a priori
knowledge about the phenomenon under investigation and delivers a model with
corresponding non-asymptotic high probability error bounds. Both non-asymptotic
analysis of the method and numerical simulation results are presented and
discussed in the paper.
[LINK]
http://arxiv.org/abs/2404.09708v1
[DATE]
2024-04-15 20:06:22+08:00
[CATEGORIES]
cs.LG
Adaptive Patching for High-resolution Image Segmentation with Transformers
[AUTHORS]
Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Mohamed Wahib, Masaharu Munetomo
[ABSTRACT]
Attention-based models are proliferating in the space of image analytics,
including segmentation. The standard method of feeding images to transformer
encoders is to divide the images into patches and then feed the patches to the
model as a linear sequence of tokens. For high-resolution images, e.g.
microscopic pathology images, the quadratic compute and memory cost prohibits
the use of an attention-based model, if we are to use smaller patch sizes that
are favorable in segmentation. Existing solutions either use custom complex
multi-resolution models or approximate attention schemes. We take inspiration
from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the
images, as a pre-processing step, based on the image details to reduce the
number of patches being fed to the model, by orders of magnitude. This method
has a negligible overhead, and works seamlessly with any attention-based model,
i.e. it is a pre-processing step that can be adopted by any attention-based
model without friction. We demonstrate superior segmentation quality over SoTA
segmentation models for real-world pathology datasets while gaining a geomean
speedup of $6.9\times$ for resolutions up to $64K^2$, on up to $2,048$ GPUs.
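The AMR-style pre-processing idea can be sketched as a quadtree: start from large tiles and split only where the image has detail, so flat regions become a few large patches. This minimal sketch assumes pixel standard deviation as the detail criterion and requires `max_size / min_size` to be a power of two; the abstract does not state the paper's exact refinement criterion, so that choice and the function name are hypothetical.

```python
import numpy as np


def adaptive_patches(image, min_size, max_size, std_threshold):
    """Quadtree-style adaptive patching, loosely modeled on adaptive mesh refinement.

    Tiles the image with max_size squares, then splits any square whose pixel standard
    deviation exceeds std_threshold into four quadrants, down to min_size.
    Returns (row, col, size) tuples that cover the image exactly once.
    """
    h, w = image.shape[:2]
    if h % max_size or w % max_size:
        raise ValueError("image dimensions must be multiples of max_size")
    patches = []
    stack = [(r, c, max_size) for r in range(0, h, max_size) for c in range(0, w, max_size)]
    while stack:
        r, c, s = stack.pop()
        tile = image[r:r + s, c:c + s]
        if s > min_size and tile.std() > std_threshold:
            half = s // 2  # refine: replace this tile with its four quadrants
            stack += [(r, c, half), (r, c + half, half),
                      (r + half, c, half), (r + half, c + half, half)]
        else:
            patches.append((r, c, s))  # detail is low enough: keep as a single token
    return patches
```

On a 64 x 64 image that is flat except for a 16 x 16 textured corner, this produces 10 patches instead of the 64 a uniform 8 x 8 grid would need; each patch would then be resized to a common token size before being fed to the transformer.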
[LINK]
http://arxiv.org/abs/2404.09707v1
[DATE]
2024-04-15 20:06:00+08:00
[CATEGORIES]
cs.LG
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
[AUTHORS]
David Stotko, Nils Wandel, Reinhard Klein
[ABSTRACT]
3D reconstruction of dynamic scenes is a long-standing problem in computer
graphics that becomes increasingly difficult the less information is available.
Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry
from RGB images or video sequences, often leveraging just a single monocular
camera without depth information, such as regular smartphone recordings.
Unfortunately, existing reconstruction methods are either unphysical and noisy
or slow in optimization. To solve this problem, we propose a novel SfT
reconstruction algorithm for cloth using a pre-trained neural surrogate model
that is fast to evaluate, stable, and produces smooth reconstructions due to a
regularizing physics simulation. Differentiable rendering of the simulated mesh
enables pixel-wise comparisons between the reconstruction and a target video
sequence that can be used for a gradient-based optimization procedure to
extract not only shape information but also physical parameters such as
stretching, shearing, or bending stiffness of the cloth. This allows us to retain
a precise, stable, and smooth reconstructed geometry while reducing the runtime
by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based
SfT approach.
[LINK]
http://arxiv.org/abs/2311.12796v3
[DATE]
2024-04-15 19:40:39+08:00
[CATEGORIES]
cs.LG
AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
[AUTHORS]
Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou
[ABSTRACT]
Offline batch inference is a common task in the industry for deep learning
applications, but it can be challenging to ensure stability and performance
when dealing with large amounts of data and complicated inference pipelines.
This paper presents AntBatchInfer, an elastic batch inference framework
specially optimized for non-dedicated clusters. AntBatchInfer
addresses these challenges by providing multi-level fault-tolerant
capabilities, enabling the stable execution of versatile and long-running
inference tasks. It also improves inference efficiency by pipelining,
intra-node, and inter-node scaling. It further optimizes the performance in
complicated multiple-model batch inference scenarios. Through extensive
experiments and real-world statistics, we demonstrate the superiority of our
framework in terms of stability and efficiency. In the experiments, it
outperforms the baseline by at least $2\times$ and $6\times$ in
single-model and multiple-model batch inference, respectively. It is also widely used at Ant
Group, with thousands of daily jobs from various scenarios, including DLRM, CV,
and NLP, which proves its practicability in the industry.
[LINK]
http://arxiv.org/abs/2404.09686v1
[DATE]
2024-04-15 19:37:40+08:00
[CATEGORIES]
cs.LG
Robust agents learn causal world models
[AUTHORS]
Jonathan Richens, Tom Everitt
[ABSTRACT]
It has long been hypothesised that causal reasoning plays a fundamental role
in robust and general intelligence. However, it is not known if agents must
learn causal models in order to generalise to new domains, or if other
inductive biases are sufficient. We answer this question, showing that any
agent capable of satisfying a regret bound under a large set of distributional
shifts must have learned an approximate causal model of the data generating
process, which converges to the true causal model for optimal agents. We
discuss the implications of this result for several research areas including
transfer learning and causal inference.
[COMMENTS]
ICLR 2024 (oral). Proofs in appendix simplified. Typos corrected
[LINK]
http://arxiv.org/abs/2402.10877v6
[DATE]
2024-04-15 19:34:52+08:00
[CATEGORIES]
cs.LG
AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
[AUTHORS]
Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou
[ABSTRACT]
Many distributed training techniques like Parameter Server and AllReduce have
been proposed to take advantage of the increasingly large data and rich
features. However, stragglers frequently occur in distributed training due to
resource contention and hardware heterogeneity, which significantly hampers the
training efficiency. Previous works only address part of the stragglers and
could not adaptively solve various stragglers in practice. Additionally, it is
challenging to use a systematic framework to address all stragglers because
different stragglers require diverse data allocation and fault-tolerance
mechanisms. Therefore, this paper proposes a unified distributed training
framework called AntDT (Ant Distributed Training Framework) to adaptively solve
the straggler problems. Firstly, the framework consists of four components,
including the Stateful Dynamic Data Sharding service, Monitor, Controller, and
Agent. These components work collaboratively to efficiently distribute
workloads and provide a range of pre-defined straggler mitigation methods with
fault tolerance, thereby hiding messy details of data allocation and fault
handling. Secondly, the framework provides a high degree of flexibility,
allowing for the customization of straggler mitigation solutions based on the
specific circumstances of the cluster. Leveraging this flexibility, we
introduce two straggler mitigation solutions, namely AntDT-ND for non-dedicated
clusters and AntDT-DD for dedicated clusters, as practical examples to resolve
various types of stragglers at Ant Group. Justified by our comprehensive
experiments and industrial deployment statistics, AntDT outperforms other SOTA
methods by more than 3x in terms of training efficiency. Additionally, in Alipay’s
homepage recommendation scenario, using AntDT reduces the training duration of
the ranking model from 27.8 hours to just 5.4 hours.
[LINK]
http://arxiv.org/abs/2404.09679v1
[DATE]
2024-04-15 19:20:44+08:00
[CATEGORIES]
cs.LG
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
[AUTHORS]
Ziyu Wang, Yue Xu, Cewu Lu, Yong-Lu Li
[ABSTRACT]
Recently, dataset distillation has paved the way towards efficient machine
learning, especially for image datasets. However, the distillation for videos,
characterized by an exclusive temporal dimension, remains an underexplored
domain. In this work, we provide the first systematic study of video
distillation and introduce a taxonomy to categorize temporal compression. Our
investigation reveals that the temporal information is usually not well learned
during distillation, and the temporal dimension of synthetic data contributes
little. The observations motivate our unified framework of disentangling the
dynamic and static information in the videos. It first distills the videos into
still images as static memory and then compensates the dynamic and motion
information with a learnable dynamic memory block. Our method achieves
state-of-the-art performance on video datasets at different scales, with a notably smaller
memory storage budget. Our code is available at
https://github.com/yuz1wan/video_distillation.
[COMMENTS]
CVPR 2024, project page: https://mvig-rhos.com/video-distill
[LINK]
http://arxiv.org/abs/2312.00362v2
[DATE]
2024-04-15 19:03:06+08:00
[CATEGORIES]
cs.LG
Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows
[AUTHORS]
Georg Rabenstein, Lars Ullrich, Knut Graichen
[ABSTRACT]
Alongside optimization-based planners, sampling-based approaches are often
used in trajectory planning for autonomous driving due to their simplicity.
Model predictive path integral control is a framework that builds upon
optimization principles while incorporating stochastic sampling of input
trajectories. This paper investigates several sampling approaches for
trajectory generation. In this context, normalizing flows originating from the
field of variational inference are considered for the generation of sampling
distributions, as they model transformations of simple to more complex
distributions. Accordingly, learning-based normalizing flow models are trained
for a more efficient exploration of the input domain for the task at hand. The
developed algorithm and the proposed sampling distributions are evaluated in
two simulation scenarios.
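For readers unfamiliar with model predictive path integral control, one iteration samples perturbed input sequences, weights them by exponentiated negative cost, and averages the perturbations. The sketch below shows that standard update with Gaussian perturbations, which is exactly the component the paper proposes to replace with learned normalizing-flow sampling distributions; the function name and the temperature parameter `lam` are illustrative assumptions.

```python
import numpy as np


def mppi_update(nominal_u, cost_fn, n_samples, noise_std, lam, rng):
    """One model predictive path integral (MPPI) update of an input sequence.

    nominal_u: (horizon, n_inputs) array; cost_fn maps one input sequence to a scalar cost.
    Gaussian perturbations are used as the sampling distribution here.
    Returns the updated sequence and the normalized sample weights.
    """
    eps = rng.normal(0.0, noise_std, size=(n_samples,) + nominal_u.shape)
    samples = nominal_u[None] + eps
    costs = np.array([cost_fn(u) for u in samples])
    # Softmin weights with temperature lam; subtracting the minimum avoids overflow.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    # The cost-weighted average of the perturbations shifts the nominal sequence.
    return nominal_u + np.tensordot(weights, eps, axes=1), weights
```

A better sampling distribution concentrates `samples` in low-cost regions, so fewer samples are wasted; that is the efficiency argument for learned proposals.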
[COMMENTS]
Accepted to be published as part of the 2024 IEEE Intelligent
Vehicles Symposium (IV), Jeju Shinhwa World, Jeju Island, Korea, June 2-5,
2024
[LINK]
http://arxiv.org/abs/2404.09657v1
[DATE]
2024-04-15 18:45:12+08:00
[CATEGORIES]
cs.LG
All-in-one simulation-based inference
[AUTHORS]
Manuel Gloeckler, Michael Deistler, Christian Weilbach, Frank Wood, Jakob H. Macke
[ABSTRACT]
Amortized Bayesian inference trains neural networks to solve stochastic
inference problems using model simulations, thereby making it possible to
rapidly perform Bayesian inference for any newly observed data. However,
current simulation-based amortized inference methods are simulation-hungry and
inflexible: They require the specification of a fixed parametric prior,
simulator, and inference tasks ahead of time. Here, we present a new amortized
inference method – the Simformer – which overcomes these limitations. By
training a probabilistic diffusion model with transformer architectures, the
Simformer outperforms current state-of-the-art amortized inference approaches
on benchmark tasks and is substantially more flexible: It can be applied to
models with function-valued parameters, it can handle inference scenarios with
missing or unstructured data, and it can sample arbitrary conditionals of the
joint distribution of parameters and data, including both posterior and
likelihood. We showcase the performance and flexibility of the Simformer on
simulators from ecology, epidemiology, and neuroscience, and demonstrate that
it opens up new possibilities and application domains for amortized Bayesian
inference on simulation-based models.
[LINK]
http://arxiv.org/abs/2404.09636v1
[DATE]
2024-04-15 18:12:33+08:00
[CATEGORIES]
cs.LG
Bridging Vision and Language Spaces with Assignment Prediction
[AUTHORS]
Jungin Park, Jiyoung Lee, Kwanghoon Sohn
[ABSTRACT]
This paper introduces VLAP, a novel approach that bridges pretrained vision
models and large language models (LLMs) to make frozen LLMs understand the
visual world. VLAP transforms the embedding space of pretrained vision models
into the LLMs’ word embedding space using a single linear layer for efficient
and general-purpose visual and language understanding. Specifically, we harness
well-established word embeddings to bridge two modality embedding spaces. The
visual and text representations are simultaneously assigned to a set of word
embeddings within pretrained LLMs by formulating the assigning procedure as an
optimal transport problem. We predict the assignment of one modality from the
representation of another modality data, enforcing consistent assignments for
paired multimodal data. This allows vision and language representations to
contain the same information, grounding the frozen LLMs’ word embedding space
in visual data. Moreover, a robust semantic taxonomy of LLMs can be preserved
with visual data since the LLMs interpret and reason about linguistic information
from correlations between word embeddings. Experimental results show that VLAP
achieves substantial improvements over the previous linear transformation-based
approaches across a range of vision-language tasks, including image captioning,
visual question answering, and cross-modal retrieval. We also demonstrate that the
learned visual representations hold a semantic taxonomy of LLMs, making visual
semantic arithmetic possible.
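A common way to solve an entropy-regularized assignment problem of this kind is Sinkhorn-Knopp scaling, as popularized by SwAV. The sketch below assumes cosine similarities as scores and uniform (equipartition) marginals over features and word embeddings; these are illustrative choices rather than details confirmed by the abstract, and `sinkhorn_assign` is a hypothetical name.

```python
import numpy as np


def sinkhorn_assign(features, word_embeddings, eps=0.05, n_iters=50):
    """Soft-assign N feature vectors to K word embeddings with entropic optimal transport.

    Scores are cosine similarities; Sinkhorn-Knopp scaling enforces uniform marginals
    (each feature sends mass 1/N, each word embedding receives mass 1/K).
    Returns an (N, K) transport plan whose rows sum to 1/N.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = word_embeddings / np.linalg.norm(word_embeddings, axis=1, keepdims=True)
    scores = f @ w.T
    q = np.exp((scores - scores.max()) / eps)  # Gibbs kernel, shifted for numerical stability
    n, k = q.shape
    for _ in range(n_iters):
        q /= k * q.sum(axis=0, keepdims=True)  # column sums -> 1/K
        q /= n * q.sum(axis=1, keepdims=True)  # row sums -> 1/N
    return q
```

In an assignment-prediction setup, the plan computed from one modality would serve as the target that the other modality's representation is trained to predict, which is how consistent assignments for paired image-text data could be enforced.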
[COMMENTS]
ICLR 2024 Camera-ready
[LINK]
http://arxiv.org/abs/2404.09632v1
[DATE]
2024-04-15 18:04:15+08:00
[CATEGORIES]
cs.LG
Privacy-Preserving Intrusion Detection using Convolutional Neural Networks
[AUTHORS]
Martin Kodys, Zhongmin Dai, Vrizlynn L. L. Thing
[ABSTRACT]
Privacy-preserving analytics is designed to protect valuable assets. A common
service provision involves the input data from the client and the model on the
analyst’s side. The importance of privacy preservation is fuelled by legal
obligations and intellectual property concerns. We explore the use case of a
model owner providing an analytic service on a customer’s private data. No
information about the data shall be revealed to the analyst and no information
about the model shall be leaked to the customer. Current methods involve costs:
accuracy deterioration and computational complexity. The complexity, in turn,
results in a longer processing time, increased requirement on computing
resources, and involves data communication between the client and the server.
To deploy such a service architecture, we need to determine the optimal
setting that fits these constraints, which is what this paper addresses. In
this work, we enhance an attack detection system based on Convolutional Neural
Networks with privacy-preserving technology based on PriMIA framework that is
initially designed for medical data.
[COMMENTS]
Accepted at IEEE Conference on Artificial Intelligence (CAI) 2024
[LINK]
http://arxiv.org/abs/2404.09625v1
[DATE]
2024-04-15 17:56:36+08:00
[CATEGORIES]
cs.LG
TTK is Getting MPI-Ready
[AUTHORS]
Eve Le Guillou, Michael Will, Pierre Guillou, Jonas Lukasczyk, Pierre Fortin, Christoph Garth, Julien Tierny
[ABSTRACT]
This system paper documents the technical foundations for the extension of
the Topology ToolKit (TTK) to distributed-memory parallelism with the Message
Passing Interface (MPI). While several recent papers introduced topology-based
approaches for distributed-memory environments, these reported
experiments obtained with tailored, mono-algorithm implementations. In
contrast, we describe in this paper a versatile approach (supporting both
triangulated domains and regular grids) for the support of topological analysis
pipelines, i.e. a sequence of topological algorithms interacting together.
While developing this extension, we faced several algorithmic and software
engineering challenges, which we document in this paper. We describe an MPI
extension of TTK’s data structure for triangulation representation and
traversal, a central component to the global performance and generality of
TTK’s topological implementations. We also introduce an intermediate interface
between TTK and MPI, both at the global pipeline level and at the fine-grained
algorithmic level. We provide a taxonomy for the distributed-memory topological
algorithms supported by TTK, depending on their communication needs and provide
examples of hybrid MPI+thread parallelizations. Performance analyses show that
parallel efficiencies range from 20% to 80% (depending on the algorithms), and
that the MPI-specific preconditioning introduced by our framework induces a
negligible computation time overhead. We illustrate the new distributed-memory
capabilities of TTK with an example of an advanced analysis pipeline, combining
multiple algorithms, run on the largest publicly available dataset we have
found (120 billion vertices) on a cluster with 64 nodes (for a total of 1536
cores). Finally, we provide a roadmap for the completion of TTK’s MPI
extension, along with generic recommendations for each algorithm communication
category.
[COMMENTS]
18 pages, 13 figures
[LINK]
http://arxiv.org/abs/2310.08339v2
[DATE]
2024-04-15 17:51:15+08:00
[CATEGORIES]
cs.LG
Safeguarding adaptive methods: global convergence of Barzilai-Borwein and other stepsize choices
[AUTHORS]
Ou Hongjia, Andreas Themelis
[ABSTRACT]
Leveraging recent advances in adaptive methods for convex minimization
problems, this paper provides a linesearch-free proximal gradient framework for
globalizing the convergence of popular stepsize choices such as
Barzilai-Borwein and one-dimensional Anderson acceleration. This framework can
cope with problems in which the gradient of the differentiable function is
merely locally Hölder continuous. Our analysis not only encompasses but also
refines existing results upon which it builds. The theory is corroborated by
numerical evidence that showcases the synergetic interplay between fast
stepsize selections and adaptive methods.
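To make the stepsize rule concrete, the sketch below runs plain gradient descent with the classical Barzilai-Borwein (BB1) stepsize $\alpha_k = \langle s_k, s_k\rangle / \langle s_k, y_k\rangle$, where $s_k = x_k - x_{k-1}$ and $y_k = \nabla f(x_k) - \nabla f(x_{k-1})$. It illustrates only the stepsize choice on a smooth problem with no nonsmooth term; it omits the safeguarding and proximal machinery the paper contributes, so it should not be read as the authors' algorithm. The function name `bb_gradient_descent` and its parameters are hypothetical.

```python
def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=100):
    """Gradient descent with the Barzilai-Borwein (BB1) stepsize.

    Unsafeguarded sketch: nothing here guarantees global convergence on
    general problems, which is the gap the paper's framework addresses.
    """
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # No curvature history yet, so the first step uses a small fixed stepsize.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g = grad(x)
        s = [a - b for a, b in zip(x, x_prev)]  # s_k = x_k - x_{k-1}
        y = [a - b for a, b in zip(g, g_prev)]  # y_k = grad_k - grad_{k-1}
        ss = sum(a * a for a in s)
        sy = sum(a * b for a, b in zip(s, y))
        if ss == 0.0 or sy <= 0.0:
            break  # converged, or no positive curvature along s: stop in this sketch
        alpha = ss / sy  # BB1 stepsize
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Usage: minimize f(x) = 0.5 * (x1^2 + 10 * x2^2), whose minimizer is the origin.
x_min = bb_gradient_descent(lambda x: [x[0], 10.0 * x[1]], [1.0, 1.0])
print(x_min)
```

On strongly convex quadratics like this one, the unsafeguarded BB iteration is known to converge even though it is nonmonotone; on more general problems that is no longer guaranteed, which is what motivates a globalization framework.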
[LINK]
http://arxiv.org/abs/2404.09617v1
[DATE]
2024-04-15 17:46:12+08:00
[CATEGORIES]
cs.LG
A Review and Efficient Implementation of Scene Graph Generation Metrics
[AUTHORS]
Julian Lorenz, Robin Schön, Katja Ludwig, Rainer Lienhart
[ABSTRACT]
Scene graph generation has emerged as a prominent research field in computer
vision, witnessing significant advances in recent years. However,
despite these strides, precise and thorough definitions for the metrics used to
evaluate scene graph generation models are lacking. In this paper, we address
this gap in the literature by providing a review and precise definition of
commonly used metrics in scene graph generation. Our comprehensive examination
clarifies the underlying principles of these metrics and can serve as a
reference or introduction to scene graph metrics.
Furthermore, to facilitate the usage of these metrics, we introduce a
standalone Python package called SGBench that efficiently implements all
defined metrics, ensuring their accessibility to the research community.
Additionally, we present a scene graph benchmarking web service, that enables
researchers to compare scene graph generation methods and increase visibility
of new methods in a central place.
All of our code can be found at https://lorjul.github.io/sgbench/.
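As a concrete reference point for the kind of metric the paper defines, below is a minimal sketch of triplet Recall@K, one widely used scene graph metric. It is not the SGBench API: the function name is hypothetical, it matches labels only, and it leaves out box-overlap matching and graph-constraint variants, which the paper's precise definitions cover.

```python
def triplet_recall_at_k(predictions, ground_truth, k):
    """Triplet Recall@K for one image.

    predictions:  list of (score, (subject, predicate, object)) pairs.
    ground_truth: list of (subject, predicate, object) tuples.
    Exact label matching only; real SGG evaluators also require the subject
    and object boxes to overlap the ground truth (e.g. IoU >= 0.5).
    """
    if not ground_truth:
        return 0.0  # assumed convention; evaluators often skip such images instead
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top_k = {triplet for _, triplet in ranked[:k]}
    hits = sum(1 for triplet in ground_truth if triplet in top_k)
    return hits / len(ground_truth)


preds = [
    (0.9, ("man", "riding", "horse")),
    (0.8, ("man", "wearing", "hat")),
    (0.3, ("horse", "on", "grass")),
]
gt = [("man", "riding", "horse"), ("horse", "on", "grass")]
print(triplet_recall_at_k(preds, gt, k=2))  # 0.5
print(triplet_recall_at_k(preds, gt, k=3))  # 1.0
```

Dataset-level Recall@K is then typically the mean of this per-image value over all evaluated images.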
[LINK]
http://arxiv.org/abs/2404.09616v1
[DATE]
2024-04-15 17:40:44+08:00
[CATEGORIES]
cs.LG
View selection in multi-view stacking: Choosing the meta-learner
[AUTHORS]
Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij
[ABSTRACT]
Multi-view stacking is a framework for combining information from different
views (i.e. different feature sets) describing the same set of objects. In this
framework, a base-learner algorithm is trained on each view separately, and the
resulting predictions are then combined by a meta-learner algorithm. In a previous
study, stacked penalized logistic regression, a special case of multi-view
stacking, has been shown to be useful in identifying which views are most
important for prediction. In this article we expand this research by
considering seven different algorithms to use as the meta-learner, and
evaluating their view selection and classification performance in simulations
and two applications on real gene-expression data sets. Our results suggest
that if both view selection and classification accuracy are important to the
research at hand, then the nonnegative lasso, nonnegative adaptive lasso and
nonnegative elastic net are suitable meta-learners. Exactly which among these
three is to be preferred depends on the research context. The remaining four
meta-learners, namely nonnegative ridge regression, nonnegative forward
selection, stability selection and the interpolating predictor, show little
advantage that would make them preferable to the other three.
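To illustrate how a nonnegative meta-learner performs view selection, the sketch below fits nonnegative least-squares weights to stacked base-learner predictions by projected gradient descent. This is a simplified stand-in, not one of the seven meta-learners compared in the paper: it uses a squared-error loss rather than the logistic loss of stacked penalized logistic regression, applies no lasso or elastic-net penalty, and the function name is hypothetical. A view whose weight is driven to zero is effectively deselected.

```python
def nonnegative_meta_learner(Z, y, iters=5000):
    """Fit y ~ Z @ w subject to w >= 0 by projected gradient descent.

    Z: n x V list of (ideally cross-validated) base-learner predictions,
       one column per view. Views with weight 0 are "not selected".
    """
    n, V = len(Z), len(Z[0])
    # ||Z||_F^2 upper-bounds the Lipschitz constant of the gradient Z^T (Z w - y),
    # so 1 / ||Z||_F^2 is a safe (if conservative) stepsize.
    step = 1.0 / sum(z * z for row in Z for z in row)
    w = [0.0] * V
    for _ in range(iters):
        residual = [sum(Z[i][j] * w[j] for j in range(V)) - y[i] for i in range(n)]
        grad = [sum(Z[i][j] * residual[i] for i in range(n)) for j in range(V)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(V)]
    return w


# View 0 predicts y perfectly (y = 2 * view 0); view 1 is uninformative.
Z = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.2], [4.0, -0.3]]
y = [2.0, 4.0, 6.0, 8.0]
print(nonnegative_meta_learner(Z, y))  # approximately [2.0, 0.0]
```

The nonnegativity constraint is what lets the meta-learner both combine and select views: a negative weight would have no natural interpretation as "how much to trust this view".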
[COMMENTS]
47 pages, 17 figures. Accepted manuscript
[LINK]
http://arxiv.org/abs/2010.16271v3
[DATE]
2024-04-15 17:39:57+08:00
[CATEGORIES]
cs.LG
LoRA Dropout as a Sparsity Regularizer for Overfitting Control
[AUTHORS]
Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, Hong Mei
[ABSTRACT]
Parameter-efficient fine-tuning methods, represented by LoRA, play an
essential role in adapting large-scale pre-trained models to downstream tasks.
However, fine-tuning LoRA-series models also risks overfitting the
training dataset, and there is still a lack of theoretical guidance and
practical mechanisms for controlling overfitting in LoRA-based PEFT methods. In this
paper, we propose a LoRA Dropout mechanism for LoRA-based methods that
introduces random noise into the learnable low-rank matrices and increases
parameter sparsity. We then demonstrate the theoretical mechanism of our LoRA
Dropout mechanism from the perspective of sparsity regularization by providing
a generalization error bound under this framework. Theoretical results show
that appropriate sparsity would help tighten the gap between empirical and
generalization risks and thereby control overfitting. Furthermore, based on the
LoRA Dropout framework, we introduce a test-time ensemble strategy and provide
theoretical evidence demonstrating that the ensemble method can further
compress the error bound and lead to better performance at inference time.
Extensive experiments on various NLP tasks provide practical validations of the
effectiveness of our LoRA Dropout framework in improving model accuracy and
calibration.
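A minimal sketch of the idea, assuming inverted dropout (mask, then rescale by $1/(1-p)$) applied to whole rows of the up-projection $B$ and whole columns of the down-projection $A$; the exact masking granularity, scaling, and ensemble procedure in the paper may differ, and all names are hypothetical. For a single linear layer, averaging the outputs of several dropout-masked models equals averaging their weight updates $\Delta W = BA$, which is what `ensemble_delta` does.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lora_dropout_delta(B, A, p, rng):
    """One stochastic weight update Delta W = B_masked @ A_masked.

    B: d_out x r, A: r x d_in. Each row of B and each column of A is kept with
    probability 1 - p and rescaled by 1 / (1 - p) so the expectation is unchanged.
    Requires 0 <= p < 1.
    """
    keep = 1.0 - p
    row_mask = [1.0 / keep if rng.random() >= p else 0.0 for _ in B]
    col_mask = [1.0 / keep if rng.random() >= p else 0.0 for _ in A[0]]
    B_m = [[b * m for b in row] for row, m in zip(B, row_mask)]
    A_m = [[a * m for a, m in zip(row, col_mask)] for row in A]
    return matmul(B_m, A_m)


def ensemble_delta(B, A, p, n_samples, seed=0):
    """Test-time ensemble: average Delta W over n_samples independent masks."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_samples):
        d = lora_dropout_delta(B, A, p, rng)
        total = d if total is None else [[s + v for s, v in zip(r1, r2)]
                                         for r1, r2 in zip(total, d)]
    return [[s / n_samples for s in row] for row in total]


B = [[1.0, 2.0], [3.0, 4.0]]            # d_out = 2, rank r = 2
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # r = 2, d_in = 3
print(matmul(B, A))                      # deterministic LoRA update
print(ensemble_delta(B, A, p=0.2, n_samples=2000))  # close to it on average
```

Each individual masked update is sparse, while the ensemble average approaches the unmasked update as the number of samples grows.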
[LINK]
http://arxiv.org/abs/2404.09610v1
[DATE]
2024-04-15 17:32:12+08:00
[CATEGORIES]
cs.LG
Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression
[AUTHORS]
Dilyara Bareeva, Maximilian Dreyer, Frederik Pahde, Wojciech Samek, Sebastian Lapuschkin
[ABSTRACT]
Deep Neural Networks are prone to learning and relying on spurious
correlations in the training data, which, for high-risk applications, can have
fatal consequences. Various approaches to suppress model reliance on harmful
features have been proposed that can be applied post-hoc without additional
training. Although these methods are efficient to apply, they also tend
to harm model performance by globally shifting the distribution of latent
features. To mitigate unintended overcorrection of model behavior, we propose a
reactive approach conditioned on model-derived knowledge and eXplainable
Artificial Intelligence (XAI) insights. While the reactive approach can be
applied to many post-hoc methods, we demonstrate the incorporation of
reactivity in particular for P-ClArC (Projective Class Artifact Compensation),
introducing a new method called R-ClArC (Reactive Class Artifact Compensation).
Through rigorous experiments in controlled settings (FunnyBirds) and with a
real-world dataset (ISIC2019), we show that introducing reactivity can minimize
the detrimental effect of the applied correction while simultaneously ensuring
low reliance on spurious features.
[LINK]
http://arxiv.org/abs/2404.09601v1
[DATE]
2024-04-15 17:16:49+08:00
[CATEGORIES]
cs.LG
Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection
[AUTHORS]
Lisang Zhou, Meng Wang, Ning Zhou
[ABSTRACT]
Distributed training can facilitate the processing of large medical image
datasets, and improve the accuracy and efficiency of disease diagnosis while
protecting patient privacy, which is crucial for achieving efficient medical
image analysis and accelerating medical research progress. This paper presents
an innovative approach to medical image classification, leveraging Federated
Learning (FL) to address the dual challenges of data privacy and efficient
disease diagnosis. Traditional centralized machine learning models, despite
their widespread use in medical imaging for tasks such as disease diagnosis,
raise significant privacy concerns due to the sensitive nature of patient data.
As an alternative, FL emerges as a promising solution by allowing the training
of a collective global model across local clients without centralizing the
data, thus preserving privacy. Focusing on the application of FL in Magnetic
Resonance Imaging (MRI) brain tumor detection, this study demonstrates the
effectiveness of the Federated Learning framework coupled with EfficientNet-B0
and the FedAvg algorithm in enhancing both privacy and diagnostic accuracy.
Through a meticulous selection of preprocessing methods, algorithms, and
hyperparameters, and a comparative analysis of various Convolutional Neural
Network (CNN) architectures, the research uncovers optimal strategies for image
classification. The experimental results reveal that EfficientNet-B0
outperforms other models like ResNet in handling data heterogeneity and
achieving higher accuracy and lower loss, highlighting the potential of FL in
overcoming the limitations of traditional models. The study underscores the
significance of addressing data heterogeneity and proposes further research
directions for broadening the applicability of FL in medical image analysis.
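The aggregation step of FedAvg, the algorithm the study pairs with EfficientNet-B0, can be sketched as a dataset-size-weighted average of client parameters. The sketch below shows only the server-side averaging on flat parameter lists; local training on each client (several local epochs of SGD in standard FedAvg) is omitted, and the function name is hypothetical.

```python
def fedavg_aggregate(client_params, client_sizes):
    """Server step of FedAvg: weight each client's parameters by its number of samples.

    client_params: list of equal-length flat parameter lists, one per client.
    client_sizes:  number of local training samples per client.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients: the second holds three times as much data,
# so it receives 3/4 of the weight.
global_params = fedavg_aggregate([[1.0, 2.0], [3.0, 6.0]], [100, 300])
print(global_params)  # [2.5, 5.0]
```

Only parameters leave each client; the raw MRI data never does, which is the privacy argument made above.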
[LINK]
http://arxiv.org/abs/2404.10026v1
[DATE]
2024-04-15 17:07:19+08:00
[CATEGORIES]
cs.LG
Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing
[AUTHORS]
Song Xia, Yu Yi, Xudong Jiang, Henghui Ding
[ABSTRACT]
Randomized Smoothing (RS) has proven to be a promising method for endowing an
arbitrary image classifier with certified robustness. However, the substantial
uncertainty inherent in high-dimensional isotropic Gaussian noise imposes
the curse of dimensionality on RS. Specifically, the upper bound on the ${\ell_2}$
certified robustness radius provided by RS shrinks as the input dimension $d$
grows, decreasing proportionally to $1/\sqrt{d}$. This paper explores the
feasibility of providing ${\ell_2}$
certified robustness for high-dimensional input through the utilization of dual
smoothing in the lower-dimensional space. The proposed Dual Randomized
Smoothing (DRS) down-samples the input image into two sub-images and smooths
the two sub-images in lower dimensions. Theoretically, we prove that DRS
guarantees a tight ${\ell_2}$ certified robustness radius for the original
input and reveal that DRS attains a superior upper bound on the ${\ell_2}$
robustness radius, which decreases proportionally at a rate of $(1/\sqrt m +
1/\sqrt n )$ with $m+n=d$. Extensive experiments demonstrate the
generalizability and effectiveness of DRS, which exhibits a notable capability
to integrate with established methodologies, yielding substantial improvements
over RS baselines in both accuracy and ${\ell_2}$ certified robustness on the
CIFAR-10 and ImageNet datasets. Code is available at
https://github.com/xiasong0501/DRS.
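For reference, the standard Gaussian randomized-smoothing certificate (Cohen et al., 2019) gives radius $R = \sigma\,\Phi^{-1}(\underline{p_A})$ when the runner-up class probability is bounded by $1-\underline{p_A}$. The sketch below computes that radius, shows one hypothetical way of down-sampling an image into two sub-images (a checkerboard split; the paper's down-sampling may differ), and compares the two scaling rates quoted above for an even split $m = n = d/2$. It is not the DRS certification procedure itself, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def rs_certified_radius(p_a_lower, sigma):
    """l2 radius certified by Gaussian smoothing: sigma * Phi^{-1}(p_A lower bound).

    Returns 0.0 (abstain) when the lower bound does not exceed 1/2.
    """
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


def checkerboard_split(img):
    """Hypothetical down-sampling of a 2-D image into two sub-images of ~d/2 pixels each."""
    even, odd = [], []
    for i, row in enumerate(img):
        for j, v in enumerate(row):
            (even if (i + j) % 2 == 0 else odd).append(v)
    return even, odd


def rate_ratio(d):
    """(1/sqrt(m) + 1/sqrt(n)) / (1/sqrt(d)) for an even split m = n = d / 2."""
    m = n = d / 2
    return (1 / math.sqrt(m) + 1 / math.sqrt(n)) / (1 / math.sqrt(d))


print(rs_certified_radius(0.9, sigma=0.5))  # about 0.64
print(rate_ratio(3 * 32 * 32))               # 2*sqrt(2), about 2.83, for CIFAR-10-sized inputs
```

For an even split, the ratio is $2\sqrt{2}$ regardless of $d$, which is one way to read "superior upper bound" in the abstract: the bound keeps the same $1/\sqrt{d}$ shape but with a constant factor of about 2.83.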
[LINK]
http://arxiv.org/abs/2404.09586v1
[DATE]
2024-04-15 16:54:33+08:00
[CATEGORIES]
cs.LG
A Graph Transformer-Driven Approach for Network Robustness Learning
[AUTHORS]
Yu Zhang, Jia Li, Jie Ding, Xiang Li
[ABSTRACT]
Learning and analysis of network robustness, including controllability
robustness and connectivity robustness, is critical for protecting networked
systems against attacks. Traditionally, network robustness is determined by
attack simulations, which are very time-consuming and even infeasible for
large-scale networks. Network Robustness Learning, which is dedicated to
learning network robustness with high precision and high speed, provides a
powerful tool to analyze network robustness by replacing simulations. In this
paper, a novel versatile and unified robustness learning approach via graph
transformer (NRL-GT) is proposed, which accomplishes the task of
controllability robustness learning and connectivity robustness learning from
multiple aspects including robustness curve learning, overall robustness
learning, and synthetic network classification. Numerous experiments show that:
1) NRL-GT is a unified learning framework for controllability robustness and
connectivity robustness, demonstrating a strong generalization ability to
ensure high precision when training and test sets are distributed differently;
2) Compared to the cutting-edge methods, NRL-GT can simultaneously perform
network robustness learning from multiple aspects and obtains superior results
in less time. NRL-GT is also able to handle complex networks of different
sizes with low learning error and high efficiency; 3) Notably, the backbone
of NRL-GT can serve as a transferable feature learning module
for complex networks of different sizes and different downstream tasks.
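The attack simulations that NRL-GT is meant to replace can be sketched for connectivity robustness as follows: repeatedly remove the currently highest-degree node and record the fraction of nodes in the largest connected component. This is a common formulation assumed here for illustration (the paper may use different attack strategies, normalizations, or a controllability-based measure), and the function names are hypothetical. Its cost, a full connectivity check after every removal, is what makes simulation slow on large networks.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def connectivity_robustness_curve(adj):
    """Adaptive highest-degree attack; returns the LCC fraction after each removal."""
    n = len(adj)
    removed, curve = set(), []
    for _ in range(n):
        remaining = [u for u in adj if u not in removed]
        # Highest current degree first; ties broken by the smallest (numeric) node id.
        target = max(remaining,
                     key=lambda u: (sum(1 for v in adj[u] if v not in removed), -u))
        removed.add(target)
        curve.append(largest_component_size(adj, removed) / n)
    return curve


# Path graph 0-1-2-3-4 as an adjacency dict.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
curve = connectivity_robustness_curve(path)
print(curve)                    # [0.6, 0.2, 0.2, 0.2, 0.0]
print(sum(curve) / len(curve))  # overall robustness score, about 0.24
```

The curve corresponds to the "robustness curve" target mentioned above, and its mean to the "overall robustness" target; a learned model predicts these directly instead of running the removal loop.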
[COMMENTS]
14 pages, 7 figures
[LINK]
http://arxiv.org/abs/2306.06913v2
[DATE]
2024-04-15 16:54:09+08:00
[CATEGORIES]
cs.LG
Backward Learning for Goal-Conditioned Policies
[AUTHORS]
Marc Höftmann, Jan Robine, Stefan Harmeling
[ABSTRACT]
Can we learn policies in reinforcement learning without rewards? Can we learn
a policy just by trying to reach a goal state? We answer these questions
positively by proposing a multi-step procedure that first learns a world model
that goes backward in time, secondly generates goal-reaching backward
trajectories, thirdly improves those sequences using shortest path finding
algorithms, and finally trains a neural network policy by imitation learning.
We evaluate our method on a deterministic maze environment where the
observations are $64\times 64$ pixel bird’s-eye images and show that it
consistently reaches several goals.
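The pipeline can be illustrated on a tiny grid maze. In the sketch below the backward model is assumed to be known exactly (the paper learns it from $64\times 64$ images), random backward rollouts from the goal collect forward transitions, and a breadth-first search over those transitions plays the role of shortest-path improvement, labelling each visited state with the first action of a shortest path to the goal. The resulting state-to-action table stands in for the imitation-learning dataset; training the neural network policy is omitted, and all names are hypothetical.

```python
import random
from collections import deque

# Grid moves: action -> (row delta, column delta).
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def predecessors(maze, cell):
    """Exact backward model: (state, action) pairs whose forward step lands in `cell`."""
    r, c = cell
    preds = []
    for a, (dr, dc) in ACTIONS.items():
        pr, pc = r - dr, c - dc
        if 0 <= pr < len(maze) and 0 <= pc < len(maze[0]) and maze[pr][pc] == ".":
            preds.append(((pr, pc), a))
    return preds


def backward_rollouts(maze, goal, n_rollouts, length, seed=0):
    """Random walks backward in time from the goal, stored as forward transitions."""
    rng = random.Random(seed)
    transitions = set()
    for _ in range(n_rollouts):
        cell = goal
        for _ in range(length):
            preds = predecessors(maze, cell)
            if not preds:
                break
            prev, a = rng.choice(preds)
            transitions.add((prev, a, cell))  # taking `a` in `prev` reaches `cell`
            cell = prev
    return transitions


def shortest_path_labels(transitions, goal):
    """BFS backward from the goal over the collected transitions."""
    incoming = {}
    for s, a, s_next in transitions:
        incoming.setdefault(s_next, []).append((s, a))
    labels, dist = {}, {goal: 0}
    queue = deque([goal])
    while queue:
        u = queue.popleft()
        for s, a in incoming.get(u, []):
            if s not in dist:
                dist[s] = dist[u] + 1
                labels[s] = a  # first action of a shortest path from s
                queue.append(s)
    return labels, dist


maze = ["...",
        ".#.",
        "..."]
goal = (0, 2)
labels, dist = shortest_path_labels(backward_rollouts(maze, goal, 200, 10), goal)
print(labels[(0, 1)], dist[(2, 0)])  # R 4
```

The random backward walks themselves are usually far from optimal; the shortest-path step is what turns them into clean demonstrations worth imitating.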
[COMMENTS]
World Models, Goal-conditioned, Reward-free, Workshop on
Goal-Conditioned Reinforcement Learning - NeurIPS 2023
[LINK]
http://arxiv.org/abs/2312.05044v2
[DATE]
2024-04-15 16:45:16+08:00
[CATEGORIES]
cs.LG
On the Convergence of Continual Learning with Adaptive Methods
[AUTHORS]
Seungyub Han, Yeongmo Kim, Taehyun Cho, Jungwoo Lee
[ABSTRACT]
One of the objectives of continual learning is to prevent catastrophic
forgetting in learning multiple tasks sequentially, and the existing solutions
have been driven by the conceptualization of the plasticity-stability dilemma.
However, the convergence of continual learning for each sequential task has been
little studied so far. In this paper, we provide a convergence analysis of
memory-based continual learning with stochastic gradient descent and empirical
evidence that training current tasks causes the cumulative degradation of
previous tasks. We propose an adaptive method for nonconvex continual learning
(NCCL), which adjusts step sizes of both previous and current tasks with the
gradients. The proposed method can achieve the same convergence rate as the SGD
method when the catastrophic forgetting term which we define in the paper is
suppressed at each iteration. Further, we demonstrate that the proposed
algorithm improves the performance of continual learning over existing methods
for several image classification tasks.
[COMMENTS]
Proceedings of the Thirty-Ninth Conference on Uncertainty in
Artificial Intelligence (UAI 2023), see
https://proceedings.mlr.press/v216/han23a.html
[LINK]
http://arxiv.org/abs/2404.05555v2
[DATE]
2024-04-15 16:44:13+08:00
[CATEGORIES]
cs.LG
Inference from Real-World Sparse Measurements
[AUTHORS]
Arnaud Pannatier, Kyle Matoba, François Fleuret
[ABSTRACT]
Real-world problems often involve complex and unstructured sets of
measurements, which occur when sensors are sparsely placed in either space or
time. Being able to model this irregular spatiotemporal data and extract
meaningful forecasts is crucial. Designing deep learning architectures that can
process sets of measurements whose positions vary from set to set, and
extract readouts anywhere, is methodologically difficult. Current
state-of-the-art models are graph neural networks and require domain-specific
knowledge for proper setup.
We propose an attention-based model focused on robustness and practical
applicability, with two key design contributions. First, we adopt a ViT-like
transformer that takes both context points and read-out positions as inputs,
eliminating the need for an encoder-decoder structure. Second, we use a unified
method for encoding both context and read-out positions. This approach is
intentionally straightforward and integrates well with other systems. Compared
to existing approaches, our model is simpler, requires less specialized
knowledge, and does not suffer from a problematic bottleneck effect, all of
which contribute to superior performance.
We conduct in-depth ablation studies that characterize this problematic
bottleneck in the latent representations of alternative models, which inhibits
information utilization and impedes training efficiency. We also perform
experiments across various problem domains, including high-altitude wind
nowcasting, two-day weather forecasting, fluid dynamics, and heat diffusion.
Our attention-based model consistently outperforms state-of-the-art models in
handling irregularly sampled data. Notably, our model reduces the root mean
square error (RMSE) for wind nowcasting from 9.24 to 7.98 and for heat
diffusion tasks from 0.126 to 0.084.
[COMMENTS]
27 pages, 12 figures, Published at TMLR
https://openreview.net/forum?id=y9IDfODRns
[LINK]
http://arxiv.org/abs/2210.11269v7
[DATE]
2024-04-15 16:24:11+08:00
[CATEGORIES]
cs.LG
σ-GPTs: A New Approach to Autoregressive Models
[AUTHORS]
Arnaud Pannatier, Evann Courdier, François Fleuret
[ABSTRACT]
Autoregressive models, such as the GPT family, use a fixed order, usually
left-to-right, to generate sequences. However, this is not a necessity. In this
paper, we challenge this assumption and show that by simply adding a positional
encoding for the output, this order can be modulated on the fly per sample,
which offers key advantages. It allows sampling from and
conditioning on arbitrary subsets of tokens, and it also allows dynamically sampling
multiple tokens in one shot according to a rejection strategy, leading to
a sub-linear number of model evaluations. We evaluate our method across various
domains, including language modeling, path-solving, and aircraft vertical rate
prediction, decreasing the number of steps required for generation by an order
of magnitude.
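The core change, adding a positional encoding for the output, can be illustrated at the data level: each training step is told both where the current input token sits and which position it must predict next, so any permutation of the sequence is a valid training order. The sketch below builds such (token, position, next-position) triples; the tuple layout, the `<bos>` convention, and the function name are assumptions, and it implements neither the model nor the rejection-sampling decoder.

```python
import random


def shuffled_ar_example(tokens, order=None, seed=None):
    """Build one training sequence for autoregression in an arbitrary order.

    Step t feeds (previous token, its position, position to predict next) and the
    target is the token at that next position. "<bos>" with position -1 starts
    the sequence. This layout is an illustrative assumption.
    """
    if order is None:
        order = list(range(len(tokens)))
        random.Random(seed).shuffle(order)
    inputs, targets = [], []
    prev_tok, prev_pos = "<bos>", -1
    for pos in order:
        inputs.append((prev_tok, prev_pos, pos))  # third field is the "output" position
        targets.append(tokens[pos])
        prev_tok, prev_pos = tokens[pos], pos
    return order, inputs, targets


order, inputs, targets = shuffled_ar_example(["the", "cat", "sat", "down"], seed=0)
for inp, tgt in zip(inputs, targets):
    print(inp, "->", tgt)

# With order = [0, 1, 2, ...] this reduces to ordinary left-to-right next-token training.
_, lr_inputs, lr_targets = shuffled_ar_example(["a", "b", "c"], order=[0, 1, 2])
print(lr_inputs)  # [('<bos>', -1, 0), ('a', 0, 1), ('b', 1, 2)]
```

Because the position to predict is an explicit input, a trained model can be queried for any not-yet-generated position, which is what makes conditioning on arbitrary token subsets possible.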
[LINK]
http://arxiv.org/abs/2404.09562v1
[DATE]
2024-04-15 16:22:47+08:00
[CATEGORIES]
cs.LG
A precise symbolic emulator of the linear matter power spectrum
[AUTHORS]
Deaglan J. Bartlett, Lukas Kammerer, Gabriel Kronberger, Harry Desmond, Pedro G. Ferreira, Benjamin D. Wandelt, Bogdan Burlacu, David Alonso, Matteo Zennaro
[ABSTRACT]
Computing the matter power spectrum, $P(k)$, as a function of cosmological
parameters can be prohibitively slow in cosmological analyses, hence emulating
this calculation is desirable. Previous analytic approximations are
insufficiently accurate for modern applications, so black-box, uninterpretable
emulators are often used. We utilise an efficient genetic programming based
symbolic regression framework to explore the space of potential mathematical
expressions which can approximate the power spectrum and $\sigma_8$. We learn
the ratio between an existing low-accuracy fitting function for $P(k)$ and that
obtained by solving the Boltzmann equations and thus still incorporate the
physics which motivated this earlier approximation. We obtain an analytic
approximation to the linear power spectrum with a root mean squared fractional
error of 0.2% between $k = 9\times10^{-3} - 9 \, h{\rm \, Mpc^{-1}}$ and across
a wide range of cosmological parameters, and we provide physical
interpretations for various terms in the expression. Our analytic approximation
is 950 times faster to evaluate than CAMB and 36 times faster than the neural
network based matter power spectrum emulator BACCO. We also provide a simple
analytic approximation for $\sigma_8$ with a similar accuracy, with a root mean
squared fractional error of just 0.1% when evaluated across the same range of
cosmologies. This function is easily invertible to obtain $A_{\rm s}$ as a
function of $\sigma_8$ and the other cosmological parameters, if preferred. It
is possible to obtain symbolic approximations to a seemingly complex function
at a precision required for current and future cosmological analyses without
resorting to deep-learning techniques, thus avoiding their black-box nature and
large number of parameters. Our emulator will be usable long after the codes on
which numerical approximations are built become outdated.
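The accuracy figures quoted above are root-mean-squared fractional errors. Assuming the usual definition, $\sqrt{\tfrac{1}{N}\sum_i \left((\hat P_i - P_i)/P_i\right)^2}$ with $P_i$ the Boltzmann-code value and $\hat P_i$ the emulator value at each evaluated point in $(k, \text{cosmology})$ space, it can be computed as below; the function name and toy numbers are illustrative only.

```python
import math


def rms_fractional_error(approx, exact):
    """Root-mean-squared fractional error between emulated and reference values."""
    if len(approx) != len(exact) or not exact:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum(((a - e) / e) ** 2 for a, e in zip(approx, exact)) / len(exact))


# Toy numbers, not values from the paper: each point is off by 0.2%.
print(rms_fractional_error([1.002, 1.996, 4.008], [1.0, 2.0, 4.0]))  # about 0.002
```

Using fractional rather than absolute error matters here because $P(k)$ spans several orders of magnitude across the quoted $k$ range.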
[COMMENTS]
9 pages, 5 figures. Accepted for publication in A&A
[LINK]
http://arxiv.org/abs/2311.15865v2
[DATE]
2024-04-15 16:18:47+08:00
[CATEGORIES]
cs.LG
Application of the representative measure approach to assess the reliability of decision trees in dealing with unseen vehicle collision data
[AUTHORS]
Javier Perera-Lago, Víctor Toscano-Durán, Eduardo Paluzo-Hidalgo, Sara Narteni, Matteo Rucco
[ABSTRACT]
Machine learning algorithms are fundamental components of novel data-informed
Artificial Intelligence architectures. In this domain, the imperative role of
representative datasets is a cornerstone in shaping the trajectory of
artificial intelligence (AI) development. Representative datasets are needed to
train machine learning components properly. Proper training has multiple
impacts: it reduces the final model’s complexity, power, and uncertainties. In
this paper, we investigate the reliability of the
$\varepsilon$-representativeness method to assess the dataset similarity from a
theoretical perspective for decision trees. We decided to focus on the family
of decision trees because it includes a wide variety of models known to be
explainable. Thus, in this paper, we provide a result guaranteeing that if two
datasets are related by $\varepsilon$-representativeness, i.e., both of them
have points closer than $\varepsilon$, then the predictions by the classic
decision tree are similar. Experimentally, we also show that
$\varepsilon$-representativeness correlates significantly with the
ordering of feature importance. Moreover, we extend the results
experimentally in the context of unseen vehicle collision data for XGBoost, a
machine-learning component widely adopted for dealing with tabular data.
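A direct check of the relation as paraphrased in the abstract can be sketched as follows. The formal definition is in the paper; this sketch assumes Euclidean distance, ignores class labels, uses a strict inequality ("closer than $\varepsilon$"), and applies a symmetric condition in which every point of each dataset has a point of the other within $\varepsilon$. The function name is hypothetical, and the quadratic-time nearest-point search is for clarity only.

```python
import math


def is_eps_representative(D1, D2, eps):
    """True if every point of each dataset lies closer than eps to some point of the other."""
    def covered(A, B):
        return all(any(math.dist(a, b) < eps for b in B) for a in A)
    return covered(D1, D2) and covered(D2, D1)


train = [(0.0, 0.0), (1.0, 1.0)]
unseen = [(0.1, 0.0), (1.0, 1.1)]
print(is_eps_representative(train, unseen, 0.2))   # True
print(is_eps_representative(train, unseen, 0.05))  # False
```

The smaller the $\varepsilon$ for which the check passes, the closer the two datasets are, and the paper's guarantee says decision-tree predictions on them should then be similar.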
[LINK]
http://arxiv.org/abs/2404.09541v1
[DATE]
2024-04-15 16:06:54+08:00
[CATEGORIES]
cs.LG
Beyond Noise: Privacy-Preserving Decentralized Learning with Virtual Nodes
[AUTHORS]
Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos
[ABSTRACT]
Decentralized learning (DL) enables collaborative learning without a server
and without training data leaving the users’ devices. However, the models
shared in DL can still be used to infer training data. Conventional privacy
defenses such as differential privacy and secure aggregation fall short in
effectively safeguarding user privacy in DL. We introduce Shatter, a novel DL
approach in which nodes create virtual nodes (VNs) to disseminate chunks of
their full model on their behalf. This enhances privacy by (i) preventing
attackers from collecting full models from other nodes, and (ii) hiding the
identity of the original node that produced a given model chunk. We
theoretically prove the convergence of Shatter and provide a formal analysis
demonstrating how Shatter reduces the efficacy of attacks compared to when
exchanging full models between participating nodes. We evaluate the convergence
and attack resilience of Shatter with existing DL algorithms, with
heterogeneous datasets, and against three standard privacy attacks, including
gradient inversion. Our evaluation shows that Shatter not only renders these
privacy attacks infeasible when each node operates 16 VNs but also exhibits a
positive impact on model convergence compared to standard DL. This enhanced
privacy comes with a manageable increase in communication volume.
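The chunking idea can be sketched as splitting a node's flat parameter vector into disjoint pieces, one per virtual node, each tagged with its offset so a receiver can place it back. Contiguous, near-equal chunks are an assumption made here for simplicity; how Shatter actually partitions parameters, routes chunks, and aggregates them is specified in the paper, and these function names are hypothetical.

```python
def split_into_chunks(params, n_virtual_nodes):
    """Split a flat parameter list into contiguous chunks whose sizes differ by at most 1."""
    q, r = divmod(len(params), n_virtual_nodes)
    chunks, start = [], 0
    for k in range(n_virtual_nodes):
        size = q + (1 if k < r else 0)
        chunks.append((start, params[start:start + size]))  # (offset, values)
        start += size
    return chunks


def reassemble(chunks, length):
    """Place (offset, values) chunks back into a list of the original length."""
    out = [None] * length
    for start, values in chunks:
        out[start:start + len(values)] = values
    return out


params = [0.5, -1.2, 3.3, 0.0, 2.1, -0.7, 1.4, 0.9, -2.2, 0.3]
chunks = split_into_chunks(params, 3)
print([len(values) for _, values in chunks])      # [4, 3, 3]
print(reassemble(chunks, len(params)) == params)  # True
```

In this sketch each virtual node holds only one chunk, so no single recipient sees the full parameter vector; only someone holding every chunk can reassemble it.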
[LINK]
http://arxiv.org/abs/2404.09536v1
[DATE]
2024-04-15 15:59:11+08:00
[CATEGORIES]
cs.LG
WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion
[AUTHORS]
Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Hongjie Zheng
[ABSTRACT]
Low-dose computed tomography (LDCT) has become the technology of choice for
diagnostic medical imaging, given its lower radiation dose compared to standard
CT, despite increasing image noise and potentially affecting diagnostic
accuracy. To address this, advanced deep learning-based LDCT denoising
algorithms have been developed, primarily using Convolutional Neural Networks
(CNNs) or Transformer Networks with the Unet architecture. This architecture
enhances image detail by integrating feature maps from the encoder and decoder
via skip connections. However, current methods often overlook enhancements to
the Unet architecture itself, focusing instead on optimizing encoder and
decoder structures. This approach can be problematic due to the significant
differences in feature map characteristics between the encoder and decoder,
where simple fusion strategies may not effectively reconstruct images. In this
paper, we introduce WiTUnet, a novel LDCT image denoising method that utilizes
nested, dense skip pathways instead of traditional skip connections to improve
feature integration. WiTUnet also incorporates a windowed Transformer structure
to process images in smaller, non-overlapping segments, reducing computational
load. Additionally, the integration of a Local Image Perception Enhancement
(LiPe) module in both the encoder and decoder replaces the standard multi-layer
perceptron (MLP) in Transformers, enhancing local feature capture and
representation. Through extensive experimental comparisons, WiTUnet has
demonstrated superior performance over existing methods in key metrics such as
Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean
Square Error (RMSE), significantly improving noise removal and image quality.
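The windowed Transformer structure mentioned above can be sketched as a partition of the feature map into non-overlapping $w\times w$ windows, with self-attention then computed only within each window. The sketch below shows the partition and the resulting count of attention pairs, $HW\cdot w^2$ instead of $(HW)^2$; it assumes $H$ and $W$ are divisible by $w$ and is not WiTUnet's actual implementation (names are hypothetical).

```python
def window_partition(feature_map, w):
    """Split an H x W grid (nested lists) into non-overlapping w x w windows, row-major."""
    H, W = len(feature_map), len(feature_map[0])
    if H % w or W % w:
        raise ValueError("H and W must be divisible by the window size")
    return [[row[j:j + w] for row in feature_map[i:i + w]]
            for i in range(0, H, w) for j in range(0, W, w)]


def attention_pairs(H, W, w=None):
    """Query-key pairs scored by global attention (w=None) or windowed attention."""
    if w is None:
        return (H * W) ** 2
    return (H * W // (w * w)) * (w * w) ** 2


grid = [[4 * r + c for c in range(4)] for r in range(4)]
windows = window_partition(grid, 2)
print(windows[0], windows[-1])                          # [[0, 1], [4, 5]] [[10, 11], [14, 15]]
print(attention_pairs(4, 4), attention_pairs(4, 4, 2))  # 256 64
```

The cost reduction grows with image size: for fixed $w$, windowed attention is linear in the number of pixels rather than quadratic.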
[LINK]
http://arxiv.org/abs/2404.09533v1
[DATE]
2024-04-15 15:53:07+08:00
[CATEGORIES]
cs.LG
TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
[AUTHORS]
Haojun Sun, Chen Tang, Zhi Wang, Yuan Meng, Jingyan Jiang, Xinzhu Ma, Wenwu Zhu
[ABSTRACT]
Diffusion models have emerged as preeminent contenders in the realm of
generative models. Distinguished by their distinctive sequential generative
processes, characterized by hundreds or even thousands of timesteps, diffusion
models progressively reconstruct images from pure Gaussian noise, with each
timestep necessitating full inference of the entire model. However, the
substantial computational demands inherent to these models present challenges
for deployment; quantization is therefore widely used to lower the bit-width and
reduce storage and computing overheads. Current quantization
methodologies primarily focus on model-side optimization, disregarding the
temporal dimension, such as the length of the timestep sequence, thereby
allowing redundant timesteps to continue consuming computational resources,
leaving substantial scope for accelerating the generative process. In this
paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and
quantization to achieve a superior performance-efficiency trade-off, addressing
both temporal and model optimization aspects. For timestep reduction, we devise
a non-uniform grouping scheme tailored to the non-uniform nature of the
denoising process, thereby mitigating the explosive combinations of timesteps.
In terms of quantization, we adopt a fine-grained layer-wise approach to
allocate varying bit-widths to different layers based on their respective
contributions to the final generative performance, thus rectifying performance
degradation observed in prior studies. To expedite the evaluation of
fine-grained quantization, we further devise a super-network to serve as a
precision solver by leveraging shared quantization results. These two design
components are seamlessly integrated within our framework, enabling rapid joint
exploration of the exponentially large decision space via a gradient-free
evolutionary search algorithm.
[LINK]
http://arxiv.org/abs/2404.09532v1
[DATE]
2024-04-15 15:51:40+08:00
[CATEGORIES]
cs.LG
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
[AUTHORS]
Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
[ABSTRACT]
The context window of large language models (LLMs) is rapidly increasing,
leading to a huge variance in resource usage between different requests as well
as between different phases of the same request. Restricted by static
parallelism strategies, existing LLM serving systems cannot efficiently utilize
the underlying resources to serve variable-length requests in different phases.
To address this problem, we propose a new parallelism paradigm, elastic
sequence parallelism (ESP), to elastically adapt to the variance between
different requests and phases. Based on ESP, we design and build LoongServe, an
LLM serving system that (1) improves computation efficiency by elastically
adjusting the degree of parallelism in real-time, (2) improves communication
efficiency by reducing key-value cache migration overhead and overlapping
partial decoding communication with computation, and (3) improves GPU memory
efficiency by reducing key-value cache fragmentation across instances. Our
evaluation under diverse real-world datasets shows that LoongServe improves the
maximum throughput by up to 3.85$\times$ compared to the chunked prefill and
5.81$\times$ compared to the prefill-decoding disaggregation.
[LINK]
http://arxiv.org/abs/2404.09526v1
[DATE]
2024-04-15 15:45:04+08:00
[CATEGORIES]
cs.LG
Dynamic fault detection and diagnosis of industrial alkaline water electrolyzer process with variational Bayesian dictionary learning
[AUTHORS]
Qi Zhang, Lei Xie, Weihua Xu, Hongye Su
[ABSTRACT]
Alkaline Water Electrolysis (AWE) is one of the simplest green hydrogen
production methods using renewable energy.
An AWE system typically yields process variables that are serially correlated
and contaminated by measurement uncertainty.
A novel robust dynamic variational Bayesian dictionary learning (RDVDL)
monitoring approach is proposed to improve the reliability and safety of AWE
operation.
RDVDL employs sparse Bayesian dictionary learning to preserve the dynamic
mechanism information of the AWE process, which allows easy interpretation of
fault detection results.
To improve the robustness to measurement uncertainty, a low-rank vector
autoregressive (VAR) method is derived to reliably extract the serial
correlation from process variables.
The effectiveness of the proposed approach is demonstrated with an industrial
hydrogen production process, and RDVDL can efficiently detect and diagnose
critical AWE faults.
[LINK]
http://arxiv.org/abs/2404.09524v1
[DATE]
2024-04-15 15:41:35+08:00
[CATEGORIES]
cs.LG
Optimal Inflationary Potentials
[AUTHORS]
Tomás Sousa, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira
[ABSTRACT]
Inflation is a highly favoured theory for the early Universe. It is
compatible with current observations of the cosmic microwave background and
large scale structure and is a driver in the quest to detect primordial
gravitational waves. It is also, given the current quality of the data, highly
under-determined with a large number of candidate implementations. We use a new
method in symbolic regression to generate all possible simple scalar field
potentials for one of two possible basis sets of operators. Treating these as
single-field, slow-roll inflationary models, we then score them with an
information-theoretic metric (“minimum description length”) that quantifies
their efficiency in compressing the information in current data. We explore two
possible priors on the parameter space of potentials, one related to the
functions’ structural complexity and one that uses a Katz back-off language
model to prefer functions that may be theoretically motivated. This enables us
to identify the inflaton potentials that optimally balance simplicity with
accuracy at explaining current data, which may subsequently find theoretical
motivation. Our exploratory study opens the door to extraction of fundamental
physics directly from data, and may be augmented with more refined theoretical
priors in the quest for a complete understanding of the early Universe.
[COMMENTS]
13+4 pages, 4 figures; Accepted for publication in Physical Review D
[LINK]
http://arxiv.org/abs/2310.16786v2
[DATE]
2024-04-15 15:36:00+08:00
[CATEGORIES]
cs.LG
Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control
[AUTHORS]
Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie
[ABSTRACT]
The accuracy of the underlying model predictions is crucial for the success
of model predictive control (MPC) applications. If the model cannot
accurately capture the dynamics of the controlled system, the performance and
stability guarantees provided by MPC may not be achieved. Learning-based MPC
can learn models from data, improving the applicability and reliability of MPC.
This study develops a nonlinear sparse variational Bayesian learning based MPC
(NSVB-MPC) for nonlinear systems, where the model is learned by the developed
NSVB method. NSVB-MPC uses variational inference to assess the predictive
accuracy and make the corrections needed to quantify system uncertainty. The
proposed approach ensures input-to-state stability (ISS) and recursive
feasibility of the constraints in accordance with the concept of an invariant
terminal region. Finally, an experiment on a PEMFC temperature control model
confirms the effectiveness of the NSVB-MPC method.
[LINK]
http://arxiv.org/abs/2404.09519v1
[DATE]
2024-04-15 15:30:26+08:00
[CATEGORIES]
cs.LG
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
[AUTHORS]
Soopil Kim, Sion An, Philip Chikontwe, Myeongkyun Kang, Ehsan Adeli, Kilian M. Pohl, Sang Hyun Park
[ABSTRACT]
Logical anomalies (LA) refer to data violating underlying logical constraints,
e.g., the quantity, arrangement, or composition of components within an image.
Accurately detecting such anomalies requires models to reason about various
component types through segmentation. However, curating pixel-level
annotations for semantic segmentation is both time-consuming and expensive.
Although there are some prior few-shot or unsupervised co-part segmentation
algorithms, they often fail on images of industrial objects. These images have
components with similar textures and shapes, which makes precise
differentiation challenging. In this study, we introduce a novel component
segmentation model for LA detection that leverages a few labeled samples and
unlabeled images sharing logical constraints. To ensure consistent segmentation
across unlabeled images, we employ a histogram matching loss in conjunction with
an entropy loss. As segmentation predictions play a crucial role, we propose to
enhance both local and global sample validity detection by capturing key
aspects from visual semantics via three memory banks: class histograms,
component composition embeddings, and patch-level representations. For effective
LA detection, we propose an adaptive scaling strategy to standardize anomaly
scores from different memory banks at inference. Extensive experiments on the
public benchmark MVTec LOCO AD show that our method achieves 98.1% AUROC in LA
detection vs. 89.6% for competing methods.
[COMMENTS]
Accepted at AAAI 2024
[LINK]
http://arxiv.org/abs/2312.13783v2
[DATE]
2024-04-15 15:18:45+08:00
[CATEGORIES]
cs.LG
On the Stability of Expressive Positional Encodings for Graphs
[AUTHORS]
Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, Pan Li
[ABSTRACT]
Designing effective positional encodings for graphs is key to building
powerful graph transformers and enhancing message-passing graph neural
networks. Although widespread, using Laplacian eigenvectors as positional
encodings faces two fundamental challenges: (1) \emph{Non-uniqueness}: there
are many different eigendecompositions of the same Laplacian, and (2)
\emph{Instability}: small perturbations to the Laplacian could result in
completely different eigenspaces, leading to unpredictable changes in
positional encoding. Despite many attempts to address non-uniqueness, most
methods overlook stability, leading to poor generalization on unseen graph
structures. We identify the cause of instability as a “hard partition” of
eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings
(SPE), an architecture for processing eigenvectors that uses eigenvalues to
“softly partition” eigenspaces. SPE is the first architecture that is (1)
provably stable, and (2) universally expressive for basis invariant functions
whilst respecting all symmetries of eigenvectors. Besides guaranteed stability,
we prove that SPE is at least as expressive as existing methods, and highly
capable of counting graph structures. Finally, we evaluate the effectiveness of
our method on molecular property prediction and out-of-distribution
generalization tasks, finding improved generalization compared to existing
positional encoding methods. Our code is available at
\url{https://github.com/Graph-COM/SPE}.
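To make the sign ambiguity behind non-uniqueness concrete, the sketch below assumes an SPE-style term of the form V diag(phi(lambda)) V^T, where `phi` is a hypothetical fixed stand-in for a learned function of the eigenvalues and V is a hand-built orthonormal matrix rather than the eigenvectors of a real Laplacian. Flipping the sign of one eigenvector changes V but leaves the filtered matrix unchanged.

```python
import math


def spectral_filter(V, lam, phi):
    """Return V diag(phi(lam)) V^T for an n x k eigenvector matrix V (list of rows)."""
    n, k = len(V), len(lam)
    w = [phi(x) for x in lam]
    return [[sum(V[i][c] * w[c] * V[j][c] for c in range(k)) for j in range(n)]
            for i in range(n)]


def flip_sign(V, col):
    """Negate one eigenvector (column): an equally valid eigendecomposition."""
    return [[-x if c == col else x for c, x in enumerate(row)] for row in V]


def phi(x):
    # Hypothetical eigenvalue filter; SPE would learn this map.
    return math.exp(-x)


# Toy orthonormal eigenvectors (as columns) and eigenvalues.
theta = 0.3
V = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
lam = [0.5, 1.5]

A = spectral_filter(V, lam, phi)
B = spectral_filter(flip_sign(V, 0), lam, phi)
# V and flip_sign(V, 0) differ, yet A and B agree entry by entry.
```

A sign flip is only the simplest case: with repeated eigenvalues any rotation within the eigenspace is valid, and the same expression stays invariant because phi assigns equal weights to equal eigenvalues.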
[COMMENTS]
ICLR 2023
[LINK]
http://arxiv.org/abs/2310.02579v2
[DATE]
2024-04-15 15:11:03+08:00
[CATEGORIES]
cs.LG
Listen to the Waves: Using a Neuronal Model of the Human Auditory System to Predict Ocean Waves
[AUTHORS]
Artur Matysiak, Volker Roeber, Henrik Kalisch, Reinhard König, Patrick J. C. May
[ABSTRACT]
Artificial neural networks (ANNs) have evolved from primitive 1940s models of
brain function into tools for artificial intelligence. They comprise many
units (artificial neurons) interlinked through weighted connections. ANNs are
trained to perform tasks through learning rules that modify the connection
weights. With these rules at the focus of research, ANNs have become a branch
of machine learning developing independently from neuroscience. Although
likely required for the development of truly intelligent
machines, the integration of neuroscience into ANNs has remained a neglected
proposition.
Here, we demonstrate that designing an ANN along biological principles
results in drastically improved task performance. As a challenging real-world
problem, we choose real-time ocean-wave prediction, which is essential for
various maritime operations. Motivated by the similarity of ocean waves
measured at a single location to sound waves arriving at the eardrum, we
redesign an echo state network to resemble the brain’s auditory system. This
yields a powerful predictive tool which is computationally lean, robust with
respect to network parameters, and works efficiently across a wide range of sea
states. Our results demonstrate the advantages of integrating neuroscience with
machine learning and offer a tool for use in the production of green energy
from ocean waves.
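For readers unfamiliar with echo state networks, the sketch below shows a generic leaky ESN reservoir driven by a scalar wave-like signal. It does not reproduce the auditory-inspired redesign described above; the reservoir size, weight scaling, leak rate, and sine input are illustrative assumptions. In an ESN only a linear readout on the collected states is trained (for example by ridge regression, not shown), which is what keeps the approach computationally lean.

```python
import math
import random


def make_reservoir(n, n_in, scale=0.9, seed=0):
    """Random recurrent weights W (n x n) and input weights W_in (n x n_in)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    # Rescale so the max absolute row sum is `scale` < 1. This bounds the spectral
    # radius and, since tanh is 1-Lipschitz, makes the state update contractive.
    norm = max(sum(abs(x) for x in row) for row in W)
    W = [[scale * x / norm for x in row] for row in W]
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n)]
    return W, W_in


def run_reservoir(W, W_in, inputs, leak=0.3):
    """Leaky update x <- (1 - a) x + a tanh(W x + W_in u); returns every state."""
    n = len(W)
    x = [0.0] * n
    states = []
    for u in inputs:
        pre = [sum(W[i][j] * x[j] for j in range(n))
               + sum(W_in[i][k] * u[k] for k in range(len(u))) for i in range(n)]
        x = [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n)]
        states.append(x)
    return states


# Toy "wave elevation" signal measured at a single location.
W, W_in = make_reservoir(n=20, n_in=1)
inputs = [[math.sin(0.2 * t)] for t in range(60)]
states = run_reservoir(W, W_in, inputs)
```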
[COMMENTS]
23 pages, 6 figures
[LINK]
http://arxiv.org/abs/2404.09510v1
[DATE]
2024-04-15 15:06:47+08:00
[CATEGORIES]
cs.LG
Diversity-Preserving K-Armed Bandits, Revisited
[AUTHORS]
Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz
[ABSTRACT]
We consider the bandit-based framework for diversity-preserving
recommendations introduced by Celis et al. (2019), who approached it in the
case of a polytope mainly by a reduction to the setting of linear bandits. We
design a UCB algorithm using the specific structure of the setting and show
that it enjoys a bounded distribution-dependent regret in the natural cases
when the optimal mixed actions put some probability mass on all actions (i.e.,
when diversity is desirable). The regret lower bounds provided show that
otherwise, at least when the model is mean-unbounded, a regret is suffered. We
also discuss an example beyond the special case of polytopes.
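As background for the UCB approach mentioned above, the sketch below is the classic UCB1 index policy for an ordinary K-armed bandit with Bernoulli rewards. It is not the diversity-preserving algorithm of the paper, which works with mixed actions constrained to a polytope; the arm means and horizon are made up for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Optimism: empirical mean plus a bonus that shrinks as an arm is pulled.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Most pulls should concentrate on the arm with mean 0.8; the suboptimal arms are sampled only often enough to rule them out.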
[LINK]
http://arxiv.org/abs/2010.01874v2
[DATE]
2024-04-15 14:39:11+08:00
[CATEGORIES]
cs.LG
ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs
[AUTHORS]
Yogesh Verma, Markus Heinonen, Vikas Garg
[ABSTRACT]
Climate and weather prediction traditionally relies on complex numerical
simulations of atmospheric physics. Deep learning approaches, such as
transformers, have recently challenged the simulation paradigm with complex
network forecasts. However, they often act as data-driven black-box models that
neglect the underlying physics and lack uncertainty quantification. We address
these limitations with ClimODE, a spatiotemporal continuous-time process that
implements a key principle of advection from statistical mechanics, namely,
weather changes due to a spatial movement of quantities over time. ClimODE
models precise weather evolution with value-conserving dynamics, learning
global weather transport as a neural flow, which also enables estimating the
uncertainty in predictions. Our approach outperforms existing data-driven
methods in global and regional forecasting with an order of magnitude smaller
parameterization, establishing a new state of the art.
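As a reference point for value-conserving transport, the textbook continuity (conservative advection) equation for a quantity u (for example temperature) carried by a velocity field v is shown below. The abstract does not give ClimODE's exact parameterization, so this is background rather than the model's definition.

```latex
\frac{\partial u}{\partial t}
  = -\nabla\cdot\left(u\,\mathbf{v}\right)
  = -\underbrace{\mathbf{v}\cdot\nabla u}_{\text{transport}}
    \;-\; \underbrace{u\,\nabla\cdot\mathbf{v}}_{\text{compression}},
\qquad
\frac{d}{dt}\int_{\Omega} u\,dx
  = -\oint_{\partial\Omega} u\,\mathbf{v}\cdot\mathbf{n}\,dS .
```

On a closed surface such as the globe there is no boundary, so the flux integral vanishes and the total amount of u is conserved.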
[COMMENTS]
Accepted as ICLR 2024 Oral. Project website:
https://yogeshverma1998.github.io/ClimODE/
[LINK]
http://arxiv.org/abs/2404.10024v1
[DATE]
2024-04-15 14:38:21+08:00
[CATEGORIES]
cs.LG
Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
[AUTHORS]
Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister
[ABSTRACT]
Large Language Models (LLMs), with their remarkable ability to tackle
challenging and unseen reasoning problems, hold immense potential for tabular
learning, which is vital for many real-world applications. In this paper, we
propose a novel in-context learning framework, FeatLLM, which employs LLMs as
feature engineers to produce an input data set that is optimally suited for
tabular predictions. The generated features are used to infer class likelihood
with a simple downstream machine learning model, such as linear regression,
yielding high-performance few-shot learning. The proposed FeatLLM framework
uses only this simple predictive model with the discovered features at
inference time. Compared to existing LLM-based approaches, FeatLLM eliminates
the need to send queries to the LLM for each sample at inference time.
Moreover, it requires only API-level access to LLMs and overcomes prompt size
limitations. As
demonstrated across numerous tabular datasets from a wide range of domains,
FeatLLM generates high-quality rules, significantly (10% on average)
outperforming alternatives such as TabLLM and STUNT.
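The pipeline described above (LLM-proposed rules turned into features, then a simple model) might look roughly like the sketch below. The rules, feature names, and weights are all hypothetical; the abstract mentions linear regression, and here hand-set linear scores with a softmax stand in for a fitted model. The point is that inference only evaluates cheap predicates, with no LLM call per sample.

```python
import math

# Hypothetical rules of the kind an LLM might propose for a toy loan-default task.
# Each class maps to a list of boolean predicates over a row (a dict of features).
RULES = {
    "default": [lambda r: r["income"] < 30000, lambda r: r["late_payments"] >= 2],
    "repaid": [lambda r: r["income"] >= 60000, lambda r: r["late_payments"] == 0],
}


def featurize(row):
    """Evaluate every rule on one row, giving a 0/1 feature vector per class."""
    return {cls: [1.0 if rule(row) else 0.0 for rule in rules]
            for cls, rules in RULES.items()}


def class_probs(row, weights, bias):
    """Linear score per class over its rule features, normalized with a softmax."""
    feats = featurize(row)
    scores = {cls: bias[cls] + sum(w * f for w, f in zip(weights[cls], feats[cls]))
              for cls in RULES}
    top = max(scores.values())
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


# Hand-set weights stand in for a fitted linear model.
WEIGHTS = {"default": [1.0, 1.5], "repaid": [1.0, 1.0]}
BIAS = {"default": 0.0, "repaid": 0.0}
probs = class_probs({"income": 25000, "late_payments": 3}, WEIGHTS, BIAS)
```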
[LINK]
http://arxiv.org/abs/2404.09491v1
[DATE]
2024-04-15 14:26:08+08:00
[CATEGORIES]
cs.LG
Data Imputation with Iterative Graph Reconstruction
[AUTHORS]
Jiajun Zhong, Weiwei Ye, Ning Gui
[ABSTRACT]
Effective data imputation demands rich latent “structure” discovery
capabilities from “plain” tabular data. Recent advances in graph neural
network-based data imputation solutions show their strong structure learning
potential by directly translating tabular data into bipartite graphs. However,
due to a lack of relations between samples, those solutions treat all samples
equally, which contradicts one important observation: “similar samples should
give more information about missing values.” This paper presents a novel
Iterative graph Generation and Reconstruction framework for Missing data
imputation (IGRM). Instead of treating all samples equally, we introduce the
concept of “friend networks” to represent different relations among samples. To
generate an accurate friend network with missing data, an end-to-end friend
network reconstruction solution is designed to allow for continuous friend
network optimization during imputation learning. The representation of the
optimized friend network, in turn, is used to further optimize the data
imputation process with differentiated message passing. Experimental results on
eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error
compared with nine baselines and 9.04% lower than the second-best. Our code is
available at https://github.com/G-AILab/IGRM.
[COMMENTS]
Published at AAAI 2023
[LINK]
http://arxiv.org/abs/2212.02810v2
[DATE]
2024-04-15 14:15:32+08:00
[CATEGORIES]
cs.LG
Differentiable Search for Finding Optimal Quantization Strategy
[AUTHORS]
Lianqiang Li, Chenqian Yan, Yefei Chen
[ABSTRACT]
To accelerate and compress deep neural networks (DNNs), many network
quantization algorithms have been proposed. Although the quantization strategy
of any state-of-the-art algorithm may outperform others on some network
architectures, it is hard to prove that the strategy is always better than the
others, or even that it is the best choice for every layer in a network. In
other words, existing quantization algorithms are suboptimal, as they ignore
the different characteristics of different layers and quantize all layers with
a uniform quantization strategy. To solve this issue, we propose a
differentiable quantization strategy search (DQSS) that assigns an optimal
quantization strategy to each layer by exploiting the benefits of different
quantization algorithms. Specifically, we formulate DQSS as a differentiable
neural architecture search problem and adopt an efficient convolution to
explore the mixed quantization strategies from a global perspective by
gradient-based optimization. We apply DQSS to post-training quantization so
that the quantized models perform comparably to full-precision models. We also
employ DQSS in quantization-aware training to further validate its
effectiveness. To avoid the expensive optimization cost of employing DQSS in
quantization-aware training, we update the hyper-parameters and the network
parameters in a single forward-backward pass. In addition, we adjust the
optimization process to avoid potential under-fitting. Comprehensive
experiments on a high-level computer vision task (image classification) and a
low-level computer vision task (image super-resolution), with various network
architectures, show that DQSS can outperform state-of-the-art methods.
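The differentiable-search idea can be sketched with the standard DARTS-style continuous relaxation: a layer mixes the outputs of candidate quantizers with softmax weights over architecture parameters alpha, and after search the argmax candidate is kept. The two candidate quantizers below (symmetric uniform quantization with a max-abs or a halved clipping range) are hypothetical stand-ins for the different quantization algorithms; in practice alpha would be updated by gradient descent in an autodiff framework, which is omitted here.

```python
import math


def quantize_uniform(xs, bits, clip):
    """Symmetric uniform quantizer: clip to [-clip, clip], then round to the grid."""
    levels = 2 ** (bits - 1) - 1
    step = clip / levels
    return [round(max(-clip, min(clip, x)) / step) * step for x in xs]


def candidates(xs, bits=4):
    """Hypothetical candidate strategies: max-abs range vs. a halved clipping range."""
    m = max(abs(x) for x in xs)
    return [quantize_uniform(xs, bits, m), quantize_uniform(xs, bits, 0.5 * m)]


def softmax(alpha):
    top = max(alpha)
    e = [math.exp(a - top) for a in alpha]
    s = sum(e)
    return [v / s for v in e]


def mixed_quantize(xs, alpha, bits=4):
    """Continuous relaxation: softmax(alpha)-weighted sum of candidate outputs."""
    outs = candidates(xs, bits)
    w = softmax(alpha)
    return [sum(wi * out[j] for wi, out in zip(w, outs)) for j in range(len(xs))]


def select_strategy(alpha):
    """After search, keep the candidate with the largest architecture weight."""
    return max(range(len(alpha)), key=lambda i: alpha[i])


xs = [0.1, -0.7, 0.35, 1.0]
c0, c1 = candidates(xs)
mixed = mixed_quantize(xs, [0.0, 0.0])  # equal weights: average of the candidates
```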
[LINK]
http://arxiv.org/abs/2404.08010v2
[DATE]
2024-04-15 14:08:51+08:00
[CATEGORIES]
cs.LG
SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection
[AUTHORS]
Yekai Li, Rufan Zhang, Wenxin Rong, Xianghang Mi
[ABSTRACT]
In this study, we introduce SpamDam, an SMS spam detection framework designed
to overcome key challenges in detecting and understanding SMS spam, such as the
lack of public SMS spam datasets, increasing privacy concerns about collecting
SMS data, and the need for adversary-resistant detection models. SpamDam
comprises four innovative modules: an SMS spam radar that identifies spam
messages from online social networks (OSNs); an SMS spam inspector for
statistical analysis; SMS spam detectors (SSDs) that enable both central
training and federated learning; and an SSD analyzer that evaluates model
resistance against adversaries in realistic scenarios. Leveraging SpamDam, we
have compiled over 76K SMS spam messages from Twitter and Weibo between 2018
and 2023, forming the largest dataset of its kind. This dataset has enabled new
insights into recent spam campaigns and the training of high-performing binary
and multi-label classifiers for spam detection. Furthermore, we demonstrate
the effectiveness of federated learning in enabling privacy-preserving SMS spam
detection. Additionally, we have rigorously tested the adversarial robustness
of SMS spam detection models, introducing the novel reverse backdoor attack,
which has shown effectiveness and stealthiness in practical tests.
[LINK]
http://arxiv.org/abs/2404.09481v1
[DATE]
2024-04-15 14:07:10+08:00
[CATEGORIES]
cs.LG
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
[AUTHORS]
Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
[ABSTRACT]
We introduce PhysGaussian, a new method that seamlessly integrates physically
grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel
motion synthesis. Employing a custom Material Point Method (MPM), our approach
enriches 3D Gaussian kernels with physically meaningful kinematic deformation
and mechanical stress attributes, all evolved in line with continuum mechanics
principles. A defining characteristic of our method is the seamless integration
between physical simulation and visual rendering: both components utilize the
same 3D Gaussian kernels as their discrete representations. This negates the
necessity for triangle/tetrahedron meshing, marching cubes, “cage meshes,” or
any other geometry embedding, highlighting the principle of “what you see is
what you simulate (WS$^2$).” Our method demonstrates exceptional versatility
across a wide variety of materials, including elastic entities, metals,
non-Newtonian fluids, and granular materials, showcasing its strong
capabilities in creating diverse visual content with novel viewpoints and
movements. Our project page is at: https://xpandora.github.io/PhysGaussian/
[COMMENTS]
Accepted by CVPR 2024
[LINK]
http://arxiv.org/abs/2311.12198v3
[DATE]
2024-04-15 14:04:55+08:00
[CATEGORIES]
cs.LG
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching
[AUTHORS]
Akira Ito, Masanori Yamada, Atsutoshi Kumagai
[ABSTRACT]
Recently, Ainsworth et al. showed that using weight matching (WM) to minimize
the $L_2$ distance in a permutation search of model parameters effectively
identifies permutations that satisfy linear mode connectivity (LMC), in which
the loss along a linear path between two independently trained models with
different seeds remains nearly constant. This paper provides a theoretical
analysis of LMC using WM, which is crucial for understanding stochastic
gradient descent’s effectiveness and its application in areas like model
merging. We first experimentally and theoretically show that permutations found
by WM do not significantly reduce the $L_2$ distance between two models and the
occurrence of LMC is not merely due to distance reduction by WM in itself. We
then provide theoretical insights showing that permutations can change the
directions of the singular vectors, but not the singular values, of the weight
matrices in each layer. This finding shows that permutations found by WM mainly
align the directions of singular vectors associated with large singular values
across models. This alignment brings the singular vectors with large singular
values, which determine the model functionality, closer between pre-merged and
post-merged models, so that the post-merged model retains functionality similar
to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze
the difference between WM and straight-through estimator (STE), a
dataset-dependent permutation search method, and show that WM outperforms STE,
especially when merging three or more models.
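To illustrate the permutation symmetry that weight matching exploits, the sketch below uses a tiny one-hidden-layer ReLU network: permuting hidden units leaves the function unchanged, and weight matching searches for the permutation that minimizes the L2 distance to a reference model before interpolating along the linear path. Exhaustive search over permutations only works for toy layers; Ainsworth et al. instead solve linear assignment problems layer by layer. The weights are made up, and model B is an exact permutation of A, so matching recovers A perfectly, which independently trained models would not allow.

```python
import itertools


def relu(v):
    return [max(0.0, x) for x in v]


def mlp(params, x):
    """One-hidden-layer network: W2 @ relu(W1 @ x + b1)."""
    W1, b1, W2 = params
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)])
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]


def permute(params, perm):
    """Reorder hidden units: rows of W1 and b1, and the matching columns of W2."""
    W1, b1, W2 = params
    return ([W1[p] for p in perm], [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])


def flatten(params):
    W1, b1, W2 = params
    return [x for row in W1 for x in row] + list(b1) + [x for row in W2 for x in row]


def sq_dist(pa, pb):
    return sum((a - b) ** 2 for a, b in zip(flatten(pa), flatten(pb)))


def weight_match(pa, pb):
    """Brute-force weight matching: the hidden-unit permutation of B closest to A."""
    n = len(pb[1])
    return min(itertools.permutations(range(n)),
               key=lambda p: sq_dist(pa, permute(pb, p)))


def interpolate(pa, pb, t):
    """Parameters at (1 - t) * A + t * B, the straight path used to test LMC."""
    def lerp(a, b):
        return (1 - t) * a + t * b
    W1 = [[lerp(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(pa[0], pb[0])]
    b1 = [lerp(a, b) for a, b in zip(pa[1], pb[1])]
    W2 = [[lerp(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(pa[2], pb[2])]
    return W1, b1, W2


A = ([[1.0, -0.5], [0.3, 0.8], [-0.6, 0.2]], [0.1, -0.2, 0.05], [[0.7, -1.0, 0.4]])
B = permute(A, (2, 0, 1))   # same function, hidden units shuffled
perm = weight_match(A, B)   # recovers the inverse shuffle
aligned = permute(B, perm)
```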
[COMMENTS]
26 pages
[LINK]
http://arxiv.org/abs/2402.04051v3
[DATE]
2024-04-15 13:57:26+08:00
[CATEGORIES]
cs.LG
Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?
[AUTHORS]
Dmitry Ignatov, Andrey Ignatov, Radu Timofte
[ABSTRACT]
We present ANYU, a new virtually augmented version of the NYU depth v2
dataset, designed for monocular depth estimation. In contrast to the well-known
approach where full 3D scenes of a virtual world are utilized to generate
artificial datasets, ANYU was created by incorporating RGB-D representations of
virtual reality objects into the original NYU depth v2 images. We deliberately
did not match each generated virtual object with an appropriate texture and a
suitable location within the real-world image. Instead, the assignment of
texture, location, lighting, and other rendering parameters was randomized to
maximize the diversity of the training data and to show that randomness can
improve the generalization ability of a dataset. By conducting extensive
experiments with our virtually modified dataset and validating on the original
NYU depth v2 and iBims-1 benchmarks, we show that ANYU improves the monocular
depth estimation performance and generalization of deep neural networks with
considerably different architectures, especially for the current
state-of-the-art VPD model. To the best of our knowledge, this is the first
work that augments a real-world dataset with randomly generated virtual 3D
objects for monocular depth estimation. We make our ANYU dataset publicly
available in two training configurations with 10% and 100% additional
synthetically enriched RGB-D pairs of training images, respectively, for
efficient training and empirical exploration of virtual augmentation at
https://github.com/ABrain-One/ANYU
[LINK]
http://arxiv.org/abs/2404.09469v1
[DATE]
2024-04-15 13:44:03+08:00
[CATEGORIES]
cs.LG
Scoring Intervals using Non-hierarchical Transformer For Automatic Piano Transcription
[AUTHORS]
Yujia Yan, Zhiyao Duan
[ABSTRACT]
The neural semi-Markov Conditional Random Field (semi-CRF) framework has
demonstrated promise for event-based piano transcription. In this framework,
all events (notes or pedals) are represented as closed intervals tied to
specific event types. The neural semi-CRF approach requires an interval scoring
matrix that assigns a score for every candidate interval. However, designing an
efficient and expressive architecture for scoring intervals is not trivial. In
this paper, we introduce a simple method for scoring intervals using scaled
inner product operations that resemble how attention scoring is done in
transformers. We show theoretically that, due to the special structure from
encoding the non-overlapping intervals, under a mild condition, the inner
product operations are expressive enough to represent an ideal scoring matrix
that can yield the correct transcription result. We then demonstrate that an
encoder-only non-hierarchical transformer backbone, operating only on a
low-time-resolution feature map, is capable of transcribing piano notes and
pedals with high accuracy and time precision. Experiments show that our
approach achieves new state-of-the-art performance across all subtasks in
terms of the F1 measure on the Maestro dataset.
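The scaled inner-product scoring can be sketched as follows, assuming each frame already has a query vector (read as a candidate onset) and a key vector (read as a candidate offset). How Q and K are produced by the transformer, and the separate score matrices per event type, are omitted; the vectors below are made up.

```python
import math


def interval_scores(Q, K):
    """Score every candidate interval [i, j] with i <= j by q_i . k_j / sqrt(d).

    Entries with j < i do not describe a valid interval and are set to -inf.
    """
    T, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)
    S = [[-math.inf] * T for _ in range(T)]
    for i in range(T):
        for j in range(i, T):
            S[i][j] = scale * sum(a * b for a, b in zip(Q[i], K[j]))
    return S


# Three frames with 2-dimensional query/key vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
S = interval_scores(Q, K)
```

The resulting upper-triangular matrix is the kind of interval scoring matrix a semi-CRF decoder consumes.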
[LINK]
http://arxiv.org/abs/2404.09466v1
[DATE]
2024-04-15 13:35:09+08:00
[CATEGORIES]
cs.LG
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
[AUTHORS]
Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang
[ABSTRACT]
With recent developments in Embodied Artificial Intelligence (EAI) research,
there has been a growing demand for high-quality, large-scale interactive scene
generation. While prior methods in scene synthesis have prioritized the
naturalness and realism of the generated scenes, the physical plausibility and
interactivity of scenes have been largely left unexplored. To address this
disparity, we introduce PhyScene, a novel method dedicated to generating
interactive 3D scenes characterized by realistic layouts, articulated objects,
and rich physical interactivity tailored for embodied agents. Based on a
conditional diffusion model for capturing scene layouts, we devise novel
physics- and interactivity-based guidance mechanisms that integrate constraints
from object collision, room layout, and object reachability. Through extensive
experiments, we demonstrate that PhyScene effectively leverages these guidance
functions for physically interactable scene synthesis, outperforming existing
state-of-the-art scene synthesis methods by a large margin. Our findings
suggest that the scenes generated by PhyScene hold considerable potential for
facilitating diverse skill acquisition among agents within interactive
environments, thereby catalyzing further advancements in embodied AI research.
Project website: http://physcene.github.io.
[COMMENTS]
Accepted by CVPR 2024, 18 pages
[LINK]
http://arxiv.org/abs/2404.09465v1
[DATE]
2024-04-15 13:29:23+08:00
[CATEGORIES]
cs.LG
A Lightweight Method for Tackling Unknown Participation Statistics in Federated Averaging
[AUTHORS]
Shiqiang Wang, Mingyue Ji
[ABSTRACT]
In federated learning (FL), clients usually have diverse participation
statistics that are unknown a priori, which can significantly harm the
performance of FL if not handled properly. Existing works aiming at addressing
this problem are usually based on global variance reduction, which requires a
substantial amount of additional memory in a multiplicative factor equal to the
total number of clients. An important open problem is to find a lightweight
method for FL in the presence of clients with unknown participation rates. In
this paper, we address this problem by adapting the aggregation weights in
federated averaging (FedAvg) based on the participation history of each client.
We first show that, with heterogeneous participation statistics, FedAvg with
non-optimal aggregation weights can diverge from the optimal solution of the
original FL objective, indicating the need to find optimal aggregation
weights. However, it is difficult to compute the optimal weights when the
participation statistics are unknown. To address this problem, we present a new
algorithm called FedAU, which improves FedAvg by adaptively weighting the
client updates based on online estimates of the optimal weights without knowing
the statistics of client participation. We provide a theoretical convergence
analysis of FedAU using a novel methodology to connect the estimation error and
convergence. Our theoretical results reveal important and interesting insights,
while showing that FedAU converges to an optimal solution of the original
objective and has desirable properties such as linear speedup. Our experimental
results also verify the advantage of FedAU over baseline methods with various
participation patterns.
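One simple way to see why aggregation weights matter under unknown participation is inverse-participation-rate reweighting, sketched below: each participating client's update is scaled by one over its empirically estimated participation rate, so that, averaged over rounds, rarely seen clients count as much as frequent ones. This is an illustrative estimator under an assumed independent-participation model, not necessarily FedAU's exact weighting; the client histories and updates are made up.

```python
def estimate_rates(history, floor=1e-3):
    """history[n] is a list of 0/1 flags: did client n participate in each past round?"""
    return [max(floor, sum(h) / len(h)) for h in history]


def aggregate(updates, participating, rates):
    """Inverse-participation-rate weighted average over the participating clients.

    updates: dict mapping client id -> update vector (participating clients only).
    """
    n_clients = len(rates)
    dim = len(next(iter(updates.values())))
    agg = [0.0] * dim
    for n in participating:
        w = 1.0 / (rates[n] * n_clients)
        for i, v in enumerate(updates[n]):
            agg[i] += w * v
    return agg


# Client 0 always participates; client 1 participates in half the rounds.
rates = estimate_rates([[1, 1, 1, 1], [1, 0, 1, 0]])
both = aggregate({0: [1.0], 1: [3.0]}, [0, 1], rates)
only0 = aggregate({0: [1.0]}, [0], rates)
```

Averaging the two equally likely rounds gives (3.5 + 0.5) / 2 = 2.0, the full-participation mean of the updates 1.0 and 3.0, whereas plain averaging over whoever shows up would be biased toward client 0.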
[COMMENTS]
Accepted to ICLR 2024
[LINK]
http://arxiv.org/abs/2306.03401v3
[DATE]
2024-04-15 13:13:25+08:00
[CATEGORIES]
cs.LG
Hyperbolic Heterogeneous Graph Attention Networks
[AUTHORS]
Jongmin Park, Seunghoon Han, Soohwan Jeong, Sungsu Lim
[ABSTRACT]
Most previous heterogeneous graph embedding models represent elements in a
heterogeneous graph as vector representations in a low-dimensional Euclidean
space. However, because heterogeneous graphs inherently possess complex
structures, such as hierarchical or power-law structures, distortions can occur
when representing them in Euclidean space. To overcome this limitation, we
propose Hyperbolic Heterogeneous Graph Attention Networks (HHGAT) that learn
vector representations in hyperbolic spaces with meta-path instances. We
conducted experiments on three real-world heterogeneous graph datasets,
demonstrating that HHGAT outperforms state-of-the-art heterogeneous graph
embedding models in node classification and clustering tasks.
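The distortion argument can be made concrete with the Poincaré ball, one common model of hyperbolic space. The sketch below assumes curvature -1 (the abstract does not specify HHGAT's model or curvature) and shows the geodesic distance and the exponential map at the origin that many hyperbolic GNNs use to move Euclidean features into the ball. Distances grow rapidly near the boundary, which leaves room to embed tree-like, hierarchical structure.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def poincare_dist(u, v):
    """Geodesic distance in the unit Poincare ball (curvature -1)."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - norm(u) ** 2) * (1 - norm(v) ** 2)
    return math.acosh(1 + 2 * diff / denom)


def expmap0(v):
    """Exponential map at the origin: tangent (Euclidean) vector -> point in the ball."""
    n = norm(v)
    if n == 0:
        return list(v)
    return [math.tanh(n) * x / n for x in v]
```

For example, `poincare_dist([0.0, 0.0], expmap0(v))` equals `2 * norm(v)`, and two points near the boundary are much farther apart than two points equally close together near the origin.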
[COMMENTS]
Accepted to the ACM Web Conference 2024 (short paper track)
[LINK]
http://arxiv.org/abs/2404.09456v1
[DATE]
2024-04-15 12:45:49+08:00
[CATEGORIES]
cs.LG
Utility-Fairness Trade-Offs and How to Find Them
[AUTHORS]
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti
[ABSTRACT]
When building classification systems with demographic fairness
considerations, there are two objectives to satisfy: 1) maximizing utility for
the specific task and 2) ensuring fairness w.r.t. a known demographic
attribute. These objectives often compete, so optimizing both can lead to a
trade-off between utility and fairness. While existing works acknowledge the
trade-offs and study their limits, two questions remain unanswered: 1) What are
the optimal trade-offs between utility and fairness? and 2) How can we
numerically quantify these trade-offs from data for a desired prediction task
and demographic attribute of interest? This paper addresses these questions. We
introduce two utility-fairness trade-offs: the Data-Space and Label-Space
Trade-offs. The trade-offs reveal three regions within the utility-fairness
plane, delineating what is fully and partially possible and impossible. We
propose U-FaTE, a method to numerically quantify the trade-offs for a given
prediction task and group fairness definition from data samples. Based on the
trade-offs, we introduce a new scheme for evaluating representations. An
extensive evaluation of fair representation learning methods and
representations from over 1000 pre-trained models revealed that most current
approaches are far from the estimated and achievable fairness-utility
trade-offs across multiple datasets and prediction tasks.
[COMMENTS]
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
[LINK]
http://arxiv.org/abs/2404.09454v1
[DATE]
2024-04-15 12:43:53+08:00
[CATEGORIES]
cs.LG
Towards Greener Nights: Exploring AI-Driven Solutions for Light Pollution Management
[AUTHORS]
Paras Varshney, Niral Desai, Uzair Ahmed
[ABSTRACT]
This research endeavors to address the pervasive issue of light pollution
through an interdisciplinary approach, leveraging data science and machine
learning techniques. By analyzing extensive datasets and research findings, we
aim to develop predictive models capable of estimating the degree of sky glow
observed in various locations and times. Our research seeks to inform
evidence-based interventions and promote responsible outdoor lighting practices
to mitigate the adverse impacts of light pollution on ecosystems, energy
consumption, and human well-being.
[LINK]
http://arxiv.org/abs/2404.09453v1
[DATE]
2024-04-15 12:41:53+08:00
[CATEGORIES]
cs.LG
The Role of Federated Learning in a Wireless World with Foundation Models
[AUTHORS]
Zihan Chen, Howard H. Yang, Y. C. Tay, Kai Fong Ernest Chong, Tony Q. S. Quek
[ABSTRACT]
Foundation models (FMs) are general-purpose artificial intelligence (AI)
models that have recently enabled multiple brand-new generative AI
applications. The rapid advances in FMs serve as an important contextual
backdrop for the vision of next-generation wireless networks, where federated
learning (FL) is a key enabler of distributed network intelligence. Currently,
the exploration of the interplay between FMs and FL is still in its nascent
stage. Naturally, FMs are capable of boosting the performance of FL, and FL
could also leverage decentralized data and computing resources to assist in the
training of FMs. However, the exceptionally high requirements that FMs have for
computing resources, storage, and communication overhead would pose critical
challenges to FL-enabled wireless networks. In this article, we explore the
extent to which FMs are suitable for FL over wireless networks, including a
broad overview of research challenges and opportunities. In particular, we
discuss multiple new paradigms for realizing future intelligent networks that
integrate FMs and FL. We also consolidate several broad research directions
associated with these paradigms.
[COMMENTS]
8 pages, 4 figures, 2 tables. This version has been accepted by IEEE
Wireless Communications
[LINK]
http://arxiv.org/abs/2310.04003v2
[DATE]
2024-04-15 12:40:08+08:00
[CATEGORIES]
cs.LG
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
[AUTHORS]
Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr
[ABSTRACT]
Rapid advancements in continual segmentation have yet to bridge the gap of
scaling to large continually expanding vocabularies under compute-constrained
scenarios. We find that traditional continual training leads to
catastrophic forgetting under compute constraints and fails to outperform
zero-shot segmentation methods. We introduce a novel strategy for semantic and
panoptic segmentation with zero forgetting, capable of adapting to continually
growing vocabularies without the need for retraining or large memory costs. Our
training-free approach, kNN-CLIP, leverages a database of instance embeddings
to enable open-vocabulary segmentation approaches to continually expand their
vocabulary on any given domain in a single pass through the data, storing only
embeddings and thereby minimizing both compute and memory costs. This method
achieves state-of-the-art mIoU performance across large-vocabulary semantic and
panoptic segmentation datasets. We hope kNN-CLIP represents a step forward in
enabling more efficient and adaptable continual segmentation, paving the way
for advances in real-world large-vocabulary continual segmentation methods.
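A minimal sketch of the retrieval idea, assuming a cosine-similarity kNN vote over a store of labeled embeddings that only grows. The class name `EmbeddingDatabase`, the parameter `k`, and the majority vote are hypothetical choices for illustration, not the kNN-CLIP implementation. The point it shows is that adding a new class is an append to the database, with no weights retrained.

```python
import math
from collections import Counter


def _normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class EmbeddingDatabase:
    """Hypothetical store of (unit embedding, label) pairs that only grows."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, label):
        # Expanding the vocabulary is just an append: no model weights change.
        self.entries.append((_normalize(embedding), label))

    def query(self, embedding, k=3):
        q = _normalize(embedding)
        # Rank stored embeddings by cosine similarity to the query.
        scored = sorted(
            self.entries,
            key=lambda entry: sum(a * b for a, b in zip(q, entry[0])),
            reverse=True,
        )
        # Majority vote among the k most similar stored embeddings.
        votes = Counter(label for _, label in scored[:k])
        return votes.most_common(1)[0][0]
```

Usage: after `db.add([1.0, 0.0], "cat")` and `db.add([0.0, 1.0], "dog")`, a later `db.add([-1.0, 0.0], "bird")` makes "bird" immediately retrievable without touching the existing entries.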
[COMMENTS]
10 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.09447v1
[DATE]
2024-04-15 12:20:01+08:00
[CATEGORIES]
cs.LG
Exploring Text-to-Motion Generation with Human Preference
[AUTHORS]
Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu
[ABSTRACT]
This paper presents an exploration of preference learning in text-to-motion
generation. We find that current improvements in text-to-motion generation
still rely on datasets requiring expert labelers with motion capture systems.
Instead, learning from human preference data does not require motion capture
systems; a labeler with no expertise simply compares two generated motions.
This is particularly efficient because evaluating the model’s output is easier
than gathering the motion that performs a desired task (e.g., a backflip). To
pioneer the exploration of this paradigm, we annotate 3,528 preference pairs
generated by MotionGPT, marking the first effort to investigate various
algorithms for learning from preference data. In particular, our exploration
highlights important design choices when using preference data. Additionally,
our experimental results show that preference learning has the potential to
greatly improve current text-to-motion generative models. Our code and dataset
are publicly available at
https://github.com/THU-LYJ-Lab/InstructMotion
to further facilitate research in this area.
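One common way to learn from pairwise preferences is the Bradley-Terry model, shown here only as an illustration and not necessarily one of the algorithms the paper compares: the probability that motion $a$ is preferred over motion $b$ is $\sigma(r_a - r_b)$ for scalar scores $r$. The sketch computes the negative log-likelihood of one annotated pair; the function name and score values are hypothetical.

```python
import math


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood -log(sigmoid(r_w - r_l)) for one preference pair.

    Uses log(1 + exp(-m)) for m >= 0 and -m + log(1 + exp(m)) for m < 0,
    so exp never receives a large positive argument.
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With equal scores the loss is $\log 2$; it shrinks as the preferred motion's score pulls ahead of the rejected one.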
[COMMENTS]
Accepted to CVPR 2024 HuMoGen Workshop
[LINK]
http://arxiv.org/abs/2404.09445v1
[DATE]
2024-04-15 12:14:42+08:00
[CATEGORIES]
cs.LG
Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network
[AUTHORS]
Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng
[ABSTRACT]
Federated learning is an emerging paradigm for decentralized training of
machine learning models on distributed clients, without revealing the data to
the central server. Most existing works have focused on horizontal or vertical
data distributions, where clients hold different samples that share the same
features, or clients share the same sample indices but hold different features, respectively.
However, the hybrid scheme is much less studied, even though it is much more
common in the real world. Therefore, in this paper, we propose a generalized
algorithm, FedGraph, that introduces a graph convolutional neural network to
capture feature-sharing information while learning features from a subset of
clients. We also develop a simple but effective clustering algorithm that
aggregates features produced by the deep neural networks of each client while
preserving data privacy.
[LINK]
http://arxiv.org/abs/2404.09443v1
[DATE]
2024-04-15 12:02:39+08:00
[CATEGORIES]
cs.LG
Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization
[AUTHORS]
Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh
[ABSTRACT]
In this paper, we consider the minimization of a nonsmooth nonconvex
objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of
$\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We
propose a unified framework for developing Lagrangian-based methods that, in
each iteration, update the primal variables with a single step of a subgradient
method. These subgradient methods are “embedded” into our
framework, in the sense that they are incorporated as black-box updates to the
primal variables. We prove that our proposed framework inherits the global
convergence guarantees from these embedded subgradient methods under mild
conditions. In addition, we show that our framework can be extended to solve
constrained optimization problems with expectation constraints. Based on the
proposed framework, we show that a wide range of existing stochastic
subgradient methods, including the proximal SGD, proximal momentum SGD, and
proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary
numerical experiments on deep learning tasks illustrate that our proposed
framework yields efficient variants of Lagrangian-based methods with
convergence guarantees for nonconvex nonsmooth constrained optimization
problems.
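To make "a single primal step per iteration" concrete, here is a minimal sketch of a generic augmented-Lagrangian loop on a smooth one-dimensional toy problem, assuming one gradient step on the primal variable followed by one dual ascent step on the multiplier. The toy problem and the parameters `eta`, `beta`, and `rho` are hypothetical. The paper's framework targets nonsmooth nonconvex problems and embeds stochastic subgradient methods such as proximal SGD, which this toy does not reproduce.

```python
def augmented_lagrangian_toy(iters=2000, eta=0.05, beta=0.05, rho=1.0):
    """Minimize f(x) = (x - 2)^2 subject to c(x) = x - 1 = 0 (hypothetical toy).

    Augmented Lagrangian: L(x, lam) = f(x) + lam * c(x) + (rho / 2) * c(x)^2.
    """
    x, lam = 0.0, 0.0
    for _ in range(iters):
        c = x - 1.0
        # Single primal step: one gradient step on L with respect to x.
        grad_x = 2.0 * (x - 2.0) + lam + rho * c
        x -= eta * grad_x
        # Dual ascent step on the multiplier using the updated constraint value.
        lam += beta * (x - 1.0)
    return x, lam
```

The iterates approach $x = 1$ with multiplier $\lambda = 2$, the KKT point where $2(x - 2) + \lambda = 0$ and $c(x) = 0$.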
[COMMENTS]
30 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.09438v1
[DATE]
2024-04-15 11:50:47+08:00
[CATEGORIES]
cs.LG
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
[AUTHORS]
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
[ABSTRACT]
Stochastic gradient descent (SGD) and its variants have been the dominant
optimization methods in machine learning. Compared to SGD with small-batch
training, SGD with large-batch training can better utilize the computational
power of current multi-core systems such as graphics processing units (GPUs)
and can reduce the number of communication rounds in distributed training
settings. Thus, SGD with large-batch training has attracted considerable
attention. However, existing empirical results show that large-batch training
typically leads to a drop in generalization accuracy. Hence, guaranteeing
generalization in large-batch training has become a challenging task.
In this paper, we propose a simple yet effective method, called stochastic
normalized gradient descent with momentum (SNGM), for large-batch training. We
prove that with the same number of gradient computations, SNGM can adopt a
larger batch size than momentum SGD (MSGD), which is one of the most widely
used variants of SGD, to converge to an $\epsilon$-stationary point. Empirical
results on deep learning verify that when adopting the same large batch size,
SNGM can achieve better test accuracy than MSGD and other state-of-the-art
large-batch training methods.
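A minimal sketch of a normalized-gradient momentum step, assuming the update $u_{t+1} = \beta u_t + g_t/\lVert g_t\rVert$, $w_{t+1} = w_t - \eta u_{t+1}$. This form is an assumption for illustration and may differ in details from SNGM as defined in the paper. It shows the key effect of normalization: the step no longer depends on the magnitude of the stochastic gradient, only on its direction.

```python
import math


def sngm_step(w, u, grad, lr=0.1, beta=0.9, eps=1e-12):
    """One normalized-gradient momentum step on list-of-floats parameters.

    Assumed form: u <- beta * u + g / ||g||;  w <- w - lr * u.
    """
    norm = math.sqrt(sum(g * g for g in grad)) + eps  # eps guards a zero gradient
    u = [beta * ui + gi / norm for ui, gi in zip(u, grad)]
    w = [wi - lr * ui for wi, ui in zip(w, u)]
    return w, u
```

For example, gradients `[3.0, 4.0]` and `[30.0, 40.0]` produce the same momentum buffer `[0.6, 0.8]` from a zero start, so scaling the gradient leaves the step unchanged.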
[LINK]
http://arxiv.org/abs/2007.13985v2
[DATE]
2024-04-15 11:27:58+08:00
[CATEGORIES]
cs.LG
LadleNet: A Two-Stage UNet for Infrared Image to Visible Image Translation Guided by Semantic Segmentation
[AUTHORS]
Tonghui Zou, Lei Chen
[ABSTRACT]
The translation of thermal infrared (TIR) images into visible light (VI)
images plays a critical role in enhancing model performance and generalization
capability, particularly in various fields such as registration and fusion of
TIR and VI images. However, current research in this field faces challenges of
insufficiently realistic image quality after translation and the difficulty of
existing models in adapting to unseen scenarios. In order to develop a more
generalizable image translation architecture, we conducted an analysis of
existing translation architectures. By exploring the interpretability of
intermediate modalities in existing translation architectures, we found that
the intermediate modality in the image translation process for street scene
images essentially performs semantic segmentation, distinguishing street images
based on background and foreground patterns before assigning color information.
Based on these principles, we propose an improved algorithm based on U-net
called LadleNet. This network utilizes a two-stage U-net concatenation
structure, consisting of Handle and Bowl modules. The Handle module is
responsible for constructing an abstract semantic space, while the Bowl module
decodes the semantic space to obtain the mapped VI image. Because the Handle
module essentially performs semantic segmentation, it is highly extensible.
We therefore also propose LadleNet+, which replaces the Handle module in
LadleNet with a pre-trained DeepLabv3+ network, strengthening the model's
ability to construct the semantic space. The proposed
methods were trained and tested on the KAIST dataset, followed by quantitative
and qualitative analysis. Compared to existing methods, LadleNet and LadleNet+
achieved an average improvement of 12.4% and 15.2% in SSIM metrics, and 37.9%
and 50.6% in MS-SSIM metrics, respectively.
[LINK]
http://arxiv.org/abs/2308.06603v3
[DATE]
2024-04-15 11:20:41+08:00
[CATEGORIES]
cs.LG
Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs
[AUTHORS]
Zihan Zhou, Honghao Wei, Lei Ying
[ABSTRACT]
This paper considers the best policy identification (BPI) problem in online
Constrained Markov Decision Processes (CMDPs). We are interested in algorithms
that are model-free, have low regret, and identify an approximately optimal
policy with a high probability. Existing model-free algorithms for online CMDPs
with sublinear regret and constraint violation do not provide any convergence
guarantee to an optimal policy and provide only average performance guarantees
when a policy is uniformly sampled at random from all previously used policies.
In this paper, we develop a new algorithm, named
Pruning-Refinement-Identification (PRI), based on a fundamental structural
property of CMDPs proved in prior work, which we call limited stochasticity. The
property states that for a CMDP with $N$ constraints, there exists an optimal policy
with at most $N$ stochastic decisions. The proposed algorithm first identifies
at which step and in which state a stochastic decision has to be taken and then
fine-tunes the distributions of these stochastic decisions. PRI achieves three
objectives: (i) it is model-free; (ii) it outputs an
approximately optimal policy with high probability at the end of learning;
and (iii) it guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint
violation, which significantly improves the best existing regret bound
$\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free
algorithm, where $H$ is the length of each episode, $S$ is the number of
states, $A$ is the number of actions, and the total number of episodes during
learning is $2K+\tilde{\mathcal{O}}(K^{0.25})$. We further present a matching lower bound
via an example showing that, under any online learning algorithm, there exists a
well-separated CMDP instance for which either the regret or the violation must be
$\Omega(H\sqrt{K})$, matching the upper bound up to a polylogarithmic factor.
[LINK]
http://arxiv.org/abs/2309.15395v5
[DATE]
2024-04-15 11:20:29+08:00
[CATEGORIES]
cs.LG
Stochastic Hessian Fittings with Lie Groups
[AUTHORS]
Xi-Lin Li
[ABSTRACT]
This paper studies the fitting of the Hessian or its inverse for stochastic
optimization using a Hessian fitting criterion from the preconditioned
stochastic gradient descent (PSGD) method, which is intimately related to many
commonly used second-order and adaptive gradient optimizers, e.g., BFGS,
Gauss-Newton, natural gradient descent, and AdaGrad. Our analyses reveal
the efficiency and reliability differences among a wide range of preconditioner
fitting methods, from closed-form to iterative solutions, using Hessian-vector
products or stochastic gradients only, with Hessian fittings in the Euclidean
space, the manifold of symmetric positive definite (SPD) matrices, and a variety
of Lie groups. The most intriguing discovery is that Hessian fitting itself,
as an optimization problem, is strongly convex under mild conditions with a
specific yet sufficiently general Lie group. This discovery turns Hessian fitting
into a well-behaved optimization problem and facilitates the design of highly
efficient and elegant Lie group sparse preconditioner fitting methods for
large-scale stochastic optimization.
[COMMENTS]
13 pages, 6 figures, 3 tables
[LINK]
http://arxiv.org/abs/2402.11858v3
[DATE]
2024-04-15 10:53:41+08:00
[CATEGORIES]
cs.LG
Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification
[AUTHORS]
Junpei Komiyama
[ABSTRACT]
We consider the fixed-budget best arm identification problem with rewards
following normal distributions. In this problem, the forecaster is given $K$
arms (or treatments) and $T$ time steps. The forecaster attempts to find the
arm with the largest mean, via an adaptive experiment conducted using an
algorithm. The algorithm’s performance is evaluated by simple regret,
reflecting the quality of the estimated best arm. While frequentist simple
regret can decrease exponentially with respect to $T$, Bayesian simple regret
decreases polynomially. This paper demonstrates that the Bayes optimal
algorithm, which minimizes the Bayesian simple regret, does not yield an
exponential decrease in simple regret under certain parameter settings. This
contrasts with the numerous findings that suggest the asymptotic equivalence of
Bayesian and frequentist approaches in fixed sampling regimes. Although the
Bayes optimal algorithm is formulated as a recursive equation that is virtually
impossible to compute exactly, we lay the groundwork for future research by
introducing a novel concept termed the expected Bellman improvement.
[LINK]
http://arxiv.org/abs/2202.05193v3
[DATE]
2024-04-15 10:46:34+08:00
[CATEGORIES]
cs.LG
On the Optimal Regret of Locally Private Linear Contextual Bandit
[AUTHORS]
Jiachun Li, David Simchi-Levi, Yining Wang
[LINK]
http://arxiv.org/abs/2404.09413v1
[DATE]
2024-04-15 10:00:24+08:00
[CATEGORIES]
cs.LG
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
[AUTHORS]
Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cassandra Burdziak, Tal Nawy, Anna Gilbert, Dana Pe’er
[ABSTRACT]
Optimal transport (OT) and the related Wasserstein metric (W) are powerful
and ubiquitous tools for comparing distributions. However, computing pairwise
Wasserstein distances rapidly becomes intractable as cohort size grows. An
attractive alternative would be to find an embedding space in which pairwise
Euclidean distances map to OT distances, akin to standard multidimensional
scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder
that embeds empirical distributions into a latent space wherein Euclidean
distances approximate OT distances. Extending MDS theory, we show that our
objective function implies a bound on the error incurred when embedding
non-Euclidean distances. Empirically, distances between Wormhole embeddings
closely match Wasserstein distances, enabling linear-time computation of OT
distances. Along with an encoder that maps distributions to embeddings,
Wasserstein Wormhole includes a decoder that maps embeddings back to
distributions, allowing for operations in the embedding space to generalize to
OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By
lending scalability and interpretability to OT approaches, Wasserstein Wormhole
unlocks new avenues for data analysis in the fields of computational geometry
and single-cell biology.
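For context on the cost that Wormhole amortizes: in one dimension, the 1-Wasserstein distance between two empirical distributions with the same number of equally weighted samples reduces to pairing sorted samples. The sketch computes this standard special case exactly; it is not the Wormhole model. The quadratic-size pairwise matrix over a cohort is what an embedding with Euclidean distances would replace. Function names are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Exact W1 between two equal-size, equally weighted 1-D empirical samples.

    In 1-D the optimal coupling pairs the i-th smallest of xs with the
    i-th smallest of ys, so W1 is the mean absolute difference of sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def pairwise_w1(cohort):
    """Full pairwise distance matrix: quadratic in the number of distributions."""
    n = len(cohort)
    return [[wasserstein_1d(cohort[i], cohort[j]) for j in range(n)] for i in range(n)]
```

Shifting a sample by a constant moves it by exactly that constant in W1, e.g. `wasserstein_1d([0, 1, 2], [1, 2, 3])` is `1.0`.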
[COMMENTS]
23 Figures, 7 main figures, 2 supplemental figures
[LINK]
http://arxiv.org/abs/2404.09411v1
[DATE]
2024-04-15 09:58:18+08:00
[CATEGORIES]
cs.LG
Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning
[AUTHORS]
Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan
[ABSTRACT]
Integrating and processing information from various sources or modalities is
critical for obtaining a comprehensive and accurate perception of the real
world in autonomous systems and cyber-physical systems. Drawing inspiration
from neuroscience, we develop the Information-Theoretic Hierarchical Perception
(ITHP) model, which utilizes the concept of the information bottleneck. Unlike
most traditional fusion models, which incorporate all modalities identically
in neural networks, our model designates a prime modality and regards the
remaining modalities as detectors in the information pathway, serving to
distill the flow of information. Our proposed perception model focuses on
constructing an effective and compact information flow by achieving a balance
between the minimization of mutual information between the latent state and the
input modal state, and the maximization of mutual information between the
latent states and the remaining modal states. This approach leads to compact
latent state representations that retain relevant information while minimizing
redundancy, thereby substantially enhancing the performance of multimodal
representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and
CMU-MOSEI datasets demonstrate that our model consistently distills crucial
information in multimodal learning scenarios, outperforming state-of-the-art
benchmarks. Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level
performance in the multimodal sentiment binary classification task across all
evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and
Pearson Correlation).
[COMMENTS]
The Twelfth International Conference on Learning Representations.
arXiv admin note: text overlap with arXiv:2309.15877
[LINK]
http://arxiv.org/abs/2404.09403v1
[DATE]
2024-04-15 09:34:44+08:00
[CATEGORIES]
cs.LG
Neural McKean-Vlasov Processes: Distributional Dependence in Diffusion Processes
[AUTHORS]
Haoming Yang, Ali Hasan, Yuting Ng, Vahid Tarokh
[ABSTRACT]
McKean-Vlasov stochastic differential equations (MV-SDEs) provide a
mathematical description of the behavior of an infinite number of interacting
particles by imposing a dependence on the particle density. As such, we study
the influence of explicitly including distributional information in the
parameterization of the SDE. We propose a series of semi-parametric methods for
representing MV-SDEs, and corresponding estimators for inferring parameters
from data based on the properties of the MV-SDE. We analyze the characteristics
of the different architectures and estimators, and consider their applicability
in relevant machine learning problems. We empirically compare the performance
of the different architectures and estimators on real and synthetic datasets
for time series and probabilistic modeling. The results suggest that explicitly
including distributional dependence in the parameterization of the SDE is
effective in modeling temporal data with interaction under an exchangeability
assumption while maintaining strong performance for standard Itô SDEs due to
the richer class of probability flows associated with MV-SDEs.
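To illustrate what dependence on the particle density means, here is a minimal Euler-Maruyama simulation of a toy McKean-Vlasov SDE, $dX_t = (\mathbb{E}[X_t] - X_t)\,dt + \sigma\,dW_t$, in which the expectation is replaced by the empirical mean of $N$ interacting particles. The drift, parameters, and function name are hypothetical and unrelated to the paper's learned architectures.

```python
import random


def simulate_mean_field(x0, steps=100, dt=0.01, sigma=0.0, seed=0):
    """Euler-Maruyama for dX = (mean(X) - X) dt + sigma dW over a particle system.

    Each particle's drift depends on the empirical distribution of all particles
    (here only through its mean), which is the defining McKean-Vlasov feature.
    """
    rng = random.Random(seed)
    xs = list(x0)
    for _ in range(steps):
        m = sum(xs) / len(xs)  # empirical mean approximates E[X_t]
        xs = [
            x + (m - x) * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            for x in xs
        ]
    return xs
```

With `sigma=0`, the particles contract toward their common mean by a factor of `1 - dt` per step while the mean itself stays fixed.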
[COMMENTS]
Appears in AISTATS 2024
[LINK]
http://arxiv.org/abs/2404.09402v1
[DATE]
2024-04-15 09:28:16+08:00
[CATEGORIES]
cs.LG
OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
[AUTHORS]
Robik Shrestha, Kushal Kafle, Christopher Kanan
[ABSTRACT]
Dataset bias and spurious correlations can significantly impair
generalization in deep neural networks. Many prior efforts have addressed this
problem using either alternative loss functions or sampling strategies that
focus on rare patterns. We propose a new direction: modifying the network
architecture to impose inductive biases that make the network robust to dataset
bias. Specifically, we propose OccamNets, which are biased to favor simpler
solutions by design. OccamNets have two inductive biases. First, they are
biased to use as little network depth as needed for an individual example.
Second, they are biased toward using fewer image locations for prediction.
While OccamNets are biased toward simpler hypotheses, they can learn more
complex hypotheses if necessary. In experiments, OccamNets outperform or rival
state-of-the-art methods run on architectures that do not incorporate these
inductive biases. Furthermore, we demonstrate that combining state-of-the-art
debiasing methods with OccamNets further improves results.
[COMMENTS]
ECCV 2022
[LINK]
http://arxiv.org/abs/2204.02426v5
[DATE]
2024-04-15 09:11:48+08:00
[CATEGORIES]
cs.LG
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
[AUTHORS]
Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton
[ABSTRACT]
To improve the efficiency of reinforcement learning, we propose a novel
asynchronous federated reinforcement learning framework termed AFedPG, which
constructs a global model through collaboration among $N$ agents using policy
gradient (PG) updates. To handle the challenge of lagged policies in
asynchronous settings, we design delay-adaptive lookahead and normalized update
techniques that can effectively handle the heterogeneous arrival times of
policy gradients. We analyze the theoretical global convergence bound of
AFedPG, and characterize the advantage of the proposed algorithm in terms of
both the sample complexity and time complexity. Specifically, our AFedPG method
achieves $\mathcal{O}(\frac{{\epsilon}^{-2.5}}{N})$ sample complexity at each
agent on average. Compared to the single agent setting with
$\mathcal{O}(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup
with respect to the number of agents. Moreover, compared to synchronous FedPG,
AFedPG improves the time complexity from $\mathcal{O}(\frac{t_{\max}}{N})$ to
$\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$, where $t_{i}$ denotes
the per-iteration time of agent $i$, and $t_{\max}$ is the
largest of these times. The latter complexity $\mathcal{O}(\frac{1}{\sum_{i=1}^{N}
\frac{1}{t_{i}}})$ is never larger than the former (the two coincide when all
$t_i$ are equal), and this improvement
becomes significant in large-scale federated settings with heterogeneous
computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the
improved performance of AFedPG in three MuJoCo environments with varying
numbers of agents. We also demonstrate the improvements with different
computing heterogeneity.
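The time-complexity comparison can be checked numerically. Given hypothetical per-iteration times $t_i$, the sketch evaluates the synchronous quantity $t_{\max}/N$ and the asynchronous quantity $1/\sum_i (1/t_i)$, ignoring the constants hidden by $\mathcal{O}(\cdot)$. Because every $1/t_i \ge 1/t_{\max}$, the sum is at least $N/t_{\max}$, so the asynchronous quantity never exceeds the synchronous one.

```python
def sync_time(times):
    """Synchronous cost t_max / N: every round waits for the slowest agent."""
    return max(times) / len(times)


def async_time(times):
    """Asynchronous cost 1 / sum(1 / t_i): agents contribute at their own rates."""
    return 1.0 / sum(1.0 / t for t in times)
```

With homogeneous times `[2.0] * 4` both quantities equal `0.5`; with one straggler, as in `[1.0, 1.0, 1.0, 100.0]`, the synchronous value is `25.0` while the asynchronous value is about `0.33`.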
[LINK]
http://arxiv.org/abs/2404.08003v2
[DATE]
2024-04-15 08:59:59+08:00
[CATEGORIES]
cs.LG
An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning
[AUTHORS]
Yujia Mu, Xizixiang Wei, Cong Shen
[ABSTRACT]
Wireless federated learning (FL) relies on efficient uplink communications to
aggregate model updates across distributed edge devices. Over-the-air
computation (a.k.a. AirComp) has emerged as a promising approach for addressing
the scalability challenge of FL over wireless links with limited communication
resources. Unlike conventional methods, AirComp allows multiple edge devices to
transmit uplink signals simultaneously, enabling the parameter server to
directly decode the average global model. However, existing AirComp solutions
are intrinsically analog, while modern wireless systems predominantly adopt
digital modulations. Consequently, careful constellation designs are necessary
to accurately decode the sum model updates without ambiguity. In this paper, we
propose an end-to-end communication system supporting AirComp with digital
modulation, aiming to overcome the challenges associated with accurate decoding
of the sum signal with constellation designs. We leverage autoencoder network
structures and explore the joint optimization of transmitter and receiver
components. Our approach fills an important gap in the context of accurately
decoding the sum signal in digital modulation-based AirComp, which can advance
the deployment of FL in contemporary wireless systems.
[LINK]
http://arxiv.org/abs/2404.09392v1
[DATE]
2024-04-15 08:25:12+08:00
[CATEGORIES]
cs.LG
Privacy at a Price: Exploring its Dual Impact on AI Fairness
[AUTHORS]
Mengmeng Yang, Ming Ding, Youyang Qu, Wei Ni, David Smith, Thierry Rakotoarivelo
[ABSTRACT]
The worldwide adoption of machine learning (ML) and deep learning models,
particularly in critical sectors, such as healthcare and finance, presents
substantial challenges in maintaining individual privacy and fairness. These
two elements are vital to a trustworthy environment for learning systems. While
numerous studies have concentrated on protecting individual privacy through
differential privacy (DP) mechanisms, emerging research indicates that
differential privacy in machine learning models can affect the prediction
accuracy of different demographic subgroups unequally. This raises a fairness
concern that manifests as biased performance. Although the prevailing view is
that enhancing privacy intensifies fairness disparities, a smaller, yet
significant, subset of research suggests the opposite view. In this article,
with extensive evaluation results, we demonstrate that the impact of
differential privacy on fairness is not monotonic. Instead, we observe that
the accuracy disparity initially grows as more DP noise (enhanced privacy) is
added to the ML process, but subsequently diminishes at higher privacy levels
with even more noise. Moreover, gradient clipping in differentially private
stochastic gradient descent (DP-SGD) can mitigate the negative impact of DP
noise on fairness: a lower clipping threshold moderates the growth of the
disparity.
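A minimal sketch of the clipping and noising steps of the standard DP-SGD recipe, not the paper's experimental setup. Each per-example gradient is rescaled to L2 norm at most `clip_norm`, the clipped gradients are summed, Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added, and the result is averaged. Gradients are plain lists of floats, and the function and parameter names are illustrative.

```python
import math
import random


def clip(grad, clip_norm):
    """Rescale a gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Noisy average of clipped per-example gradients."""
    rng = random.Random(seed)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Noise scale is tied to clip_norm, the per-example sensitivity bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]
```

A lower `clip_norm` bounds each example's influence on the update more tightly; this is the threshold the abstract links to moderating the growth of the accuracy disparity.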
[LINK]
http://arxiv.org/abs/2404.09391v1
[DATE]
2024-04-15 08:23:41+08:00
[CATEGORIES]
cs.LG
Masked and Shuffled Blind Spot Denoising for Real-World Images
[AUTHORS]
Hamadi Chihaoui, Paolo Favaro
[ABSTRACT]
We introduce a novel approach to single image denoising based on the Blind
Spot Denoising principle, which we call MAsked and SHuffled Blind Spot
Denoising (MASH). We focus on the case of correlated noise, which often plagues
real images. MASH is the result of a careful analysis to determine the
relationships between the level of blindness (masking) of the input and the
(unknown) noise correlation. Moreover, we introduce a shuffling technique to
weaken the local correlation of noise, which in turn yields an additional
denoising performance improvement. We evaluate MASH via extensive experiments
on real-world noisy image datasets and obtain results on par with or better
than those of existing self-supervised denoising methods.
[LINK]
http://arxiv.org/abs/2404.09389v1
[DATE]
2024-04-15 08:19:47+08:00
[CATEGORIES]
cs.LG
Spatiotemporal k-means
[AUTHORS]
Olga Dorabiala, Devavrat Vivek Dabke, Jennifer Webster, Nathan Kutz, Aleksandr Aravkin
[ABSTRACT]
Spatiotemporal data is increasingly available due to emerging sensor and data
acquisition technologies that track moving objects. Spatiotemporal clustering
addresses the need to efficiently discover patterns and trends in moving object
behavior without human supervision. One application of interest is the
discovery of moving clusters, where clusters have a static identity, but their
location and content can change over time. We propose a two-phase
spatiotemporal clustering method called spatiotemporal k-means (STkM) that is
able to analyze the multi-scale relationships within spatiotemporal data. By
optimizing an objective function that is unified over space and time, the
method can track dynamic clusters at both short and long timescales with
minimal parameter tuning and no post-processing. We begin by proposing a
theoretical generating model for spatiotemporal data and prove the efficacy of
STkM in this setting. We then evaluate STkM on a recently developed collective
animal behavior benchmark dataset and show that STkM outperforms baseline
methods in the low-data limit, which is a critical regime of consideration in
many emerging applications. Finally, we showcase how STkM can be extended to
more complex machine learning tasks, particularly unsupervised region of
interest detection and tracking in videos.
[COMMENTS]
18 pages, 5 figures
[LINK]
http://arxiv.org/abs/2211.05337v2
[DATE]
2024-04-15 08:19:41+08:00
[CATEGORIES]
cs.LG
RankCLIP: Ranking-Consistent Language-Image Pretraining
[AUTHORS]
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun
[ABSTRACT]
Amid the rapid development of vision-language models, contrastive
language-image pretraining (CLIP) has set new benchmarks in many downstream
tasks such as zero-shot classification by leveraging self-supervised
contrastive learning on large amounts of text-image pairs. However, its
dependency on rigid one-to-one mappings overlooks the complex and often
multifaceted relationships between and within texts and images. To this end, we
introduce RankCLIP, a novel pretraining method that extends beyond the rigid
one-to-one matching framework of CLIP and its variants. By leveraging both
in-modal and cross-modal ranking consistency, RankCLIP improves the alignment
process, enabling it to capture the nuanced many-to-many relationships between
and within each modality. Through comprehensive experiments, we demonstrate that
RankCLIP effectively improves performance across
various downstream tasks, notably achieving significant gains in zero-shot
classification over state-of-the-art methods, underscoring the potential of
RankCLIP in further advancing vision-language pretraining.
[COMMENTS]
10 pages, 3 figures, 6 tables. Code and model checkpoints are
available at https://github.com/Jam1ezhang/RankCLIP
[LINK]
http://arxiv.org/abs/2404.09387v1
[DATE]
2024-04-15 08:12:27+08:00
[CATEGORIES]
cs.LG
Integrating Marketing Channels into Quantile Transformation and Bayesian Optimization of Ensemble Kernels for Sales Prediction with Gaussian Process Models
[AUTHORS]
Shahin Mirshekari, Negin Hayeri Motedayen, Mohammad Ensaf
[ABSTRACT]
This study introduces an innovative Gaussian Process (GP) model utilizing an
ensemble kernel that integrates Radial Basis Function (RBF), Rational
Quadratic, and Matérn kernels for product sales forecasting. By applying
Bayesian optimization, we efficiently find the optimal weights for each kernel,
enhancing the model’s ability to handle complex sales data patterns. Our
approach significantly outperforms traditional GP models, achieving a notable
98% accuracy and superior performance across key metrics including Mean
Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE),
and Coefficient of Determination ($R^2$). This advancement underscores the
effectiveness of ensemble kernels and Bayesian optimization in improving
predictive accuracy, offering profound implications for machine learning
applications in sales forecasting.
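A minimal sketch of an ensemble kernel built as a weighted sum of RBF, rational quadratic, and Matérn ($\nu = 3/2$) kernels on scalar inputs. A sum of valid kernels with non-negative weights is itself a valid (positive semi-definite) kernel. The length scales, $\alpha$, the choice $\nu = 3/2$, and the default weights are hypothetical; in the paper the weights are found by Bayesian optimization, which is not shown here.

```python
import math


def rbf(x, y, length=1.0):
    """Radial basis function kernel exp(-r^2 / (2 l^2))."""
    return math.exp(-((x - y) ** 2) / (2 * length ** 2))


def rational_quadratic(x, y, length=1.0, alpha=1.0):
    """Rational quadratic kernel (1 + r^2 / (2 alpha l^2))^(-alpha)."""
    return (1 + (x - y) ** 2 / (2 * alpha * length ** 2)) ** (-alpha)


def matern32(x, y, length=1.0):
    """Matern kernel with nu = 3/2: (1 + sqrt(3) r / l) exp(-sqrt(3) r / l)."""
    r = math.sqrt(3) * abs(x - y) / length
    return (1 + r) * math.exp(-r)


def ensemble_kernel(x, y, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of base kernels; non-negative weights keep it a valid kernel."""
    w_rbf, w_rq, w_mat = weights
    return w_rbf * rbf(x, y) + w_rq * rational_quadratic(x, y) + w_mat * matern32(x, y)


def gram_matrix(xs, weights=(0.5, 0.3, 0.2)):
    """Kernel matrix K[i][j] = k(xs[i], xs[j]) used by a GP regressor."""
    return [[ensemble_kernel(a, b, weights) for b in xs] for a in xs]
```

Since each base kernel equals 1 at zero distance, the diagonal of the Gram matrix equals the sum of the weights.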
[COMMENTS]
11 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.09386v1
[DATE]
2024-04-15 08:11:01+08:00
[CATEGORIES]
cs.LG
Sampling-based Distributed Training with Message Passing Neural Network
[AUTHORS]
Priyesh Kakka, Sheel Nidhan, Rishikesh Ranade, Jonathan F. MacArt
[ABSTRACT]
In this study, we introduce a domain-decomposition-based distributed training
and inference approach for message-passing neural networks (MPNN). Our
objective is to address the challenge of scaling edge-based graph neural
networks as the number of nodes increases. Through our distributed training
approach, coupled with Nyström-approximation sampling techniques, we present
a scalable graph neural network, referred to as DS-MPNN (D and S standing for
distributed and sampled, respectively), capable of scaling up to $O(10^5)$
nodes. We validate our sampling and distributed training approach on two cases:
(a) a Darcy flow dataset and (b) steady RANS simulations of 2-D airfoils,
providing comparisons with both single-GPU implementation and node-based graph
convolution networks (GCNs). The DS-MPNN model demonstrates comparable accuracy
to single-GPU implementation, can accommodate a significantly larger number of
nodes compared to the single-GPU variant (S-MPNN), and significantly
outperforms the node-based GCN.
[LINK]
http://arxiv.org/abs/2402.15106v2
[DATE]
2024-04-15 08:10:25+08:00
[CATEGORIES]
cs.LG
Convex SGD: Generalization Without Early Stopping
[AUTHORS]
Julien Hendrickx, Alex Olshevsky
[ABSTRACT]
We consider the generalization error associated with stochastic gradient
descent on a smooth convex function over a compact set. We show the first bound
on the generalization error that vanishes when the number of iterations $T$ and
the dataset size $n$ go to infinity at arbitrary rates; our bound scales as
$\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $\alpha_t = 1/\sqrt{t}$. In
particular, strong convexity is not needed for stochastic gradient descent to
generalize well.
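To make the setting concrete, here is a sketch of projected SGD with step size $\alpha_t = 1/\sqrt{t}$ on a smooth convex loss (least squares) over a compact set (a Euclidean ball). The choice of loss, ball radius, iterate averaging, and toy data are assumptions for illustration; the code only measures an empirical train/test gap and does not reproduce the paper's bound.

```python
import numpy as np


def project_to_ball(w, radius):
    """Euclidean projection onto the compact set {w : ||w|| <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)


def projected_sgd(features, targets, num_steps, radius, rng):
    """Single-sample projected SGD on 0.5 * (x_i . w - y_i)^2 with step 1 / sqrt(t).

    Returns the last iterate and the running average of all iterates.
    """
    n, d = features.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, num_steps + 1):
        i = rng.integers(n)  # draw one training example uniformly with replacement
        grad = (features[i] @ w - targets[i]) * features[i]
        w = project_to_ball(w - grad / np.sqrt(t), radius)
        w_avg += (w - w_avg) / t  # incremental mean of w_1, ..., w_t
    return w, w_avg


def mean_loss(w, features, targets):
    return 0.5 * np.mean((features @ w - targets) ** 2)


rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.2])
x_train = rng.normal(size=(200, 3))
y_train = x_train @ w_true + 0.1 * rng.normal(size=200)
x_test = rng.normal(size=(2000, 3))
y_test = x_test @ w_true + 0.1 * rng.normal(size=2000)

w_last, w_avg = projected_sgd(x_train, y_train, num_steps=5000, radius=1.0, rng=rng)
gap = mean_loss(w_avg, x_test, y_test) - mean_loss(w_avg, x_train, y_train)
```

Note that the loss here is strongly convex only by accident of the data distribution; nothing in the update relies on strong convexity, which matches the abstract's point.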
[LINK]
http://arxiv.org/abs/2401.04067v2
[DATE]
2024-04-15 07:26:06+08:00
[CATEGORIES]
cs.LG
Tighter Generalization Bounds on Digital Computers via Discrete Optimal Transport
[AUTHORS]
Anastasis Kratsios, A. Martina Neuman, Gudmund Pammer
[ABSTRACT]
Machine learning models with inputs in a Euclidean space $\mathbb{R}^d$, when
implemented on digital computers, generalize, and their generalization
gap converges to $0$ at a rate of $c/N^{1/2}$ in the sample size $N$.
However, the constant $c>0$ obtained through classical methods can be large in
terms of the ambient dimension $d$ and the machine precision, which poses a
challenge when $N$ ranges from small to realistically large. In this paper, we derive a
family of generalization bounds $\{c_m/N^{1/(2\vee m)}\}_{m=1}^{\infty}$
tailored for learning models on digital computers, which adapt to both the
sample size $N$ and the so-called geometric representation dimension $m$
of the discrete learning problem. Adjusting the parameter $m$ according to $N$
results in significantly tighter generalization bounds for practical sample
sizes $N$, while setting $m$ small maintains the optimal dimension-free
worst-case rate of $\mathcal{O}(1/N^{1/2})$. Notably, $c_m \in
\mathcal{O}(\sqrt{m})$ for learning models on discretized Euclidean domains.
Furthermore, our adaptive generalization bounds are formulated based on our
new non-asymptotic result for concentration of measure in discrete optimal
transport, established via leveraging metric embedding arguments.
[LINK]
http://arxiv.org/abs/2402.05576v2
[DATE]
2024-04-15 07:17:15+08:00
[CATEGORIES]
cs.LG
Trajeglish: Traffic Modeling as Next-Token Prediction
[AUTHORS]
Jonah Philion, Xue Bin Peng, Sanja Fidler
[ABSTRACT]
A longstanding challenge for self-driving development is simulating dynamic
driving scenarios seeded from recorded driving logs. In pursuit of this
functionality, we apply tools from discrete sequence modeling to model how
vehicles, pedestrians and cyclists interact in driving scenarios. Using a
simple data-driven tokenization scheme, we discretize trajectories to
centimeter-level resolution using a small vocabulary. We then model the
multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder
that is autoregressive in time and takes into account intra-timestep
interaction between agents. Scenarios sampled from our model exhibit
state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark,
surpassing prior work along the realism meta metric by 3.3% and along the
interaction metric by 9.9%. We ablate our modeling choices in full autonomy and
partial autonomy settings, and show that the representations learned by our
model can quickly be adapted to improve performance on nuScenes. We
additionally evaluate the scalability of our model with respect to parameter
count and dataset size, and use density estimates from our model to quantify
the saliency of context length and intra-timestep interaction for the traffic
modeling task.
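As a rough illustration of discretizing trajectories with a fixed motion vocabulary, the sketch below tokenizes a 2-D path by picking, at each timestep, the vocabulary displacement that brings the reconstructed position closest to the true one. The uniform (dx, dy) grid, its 5 cm spacing, and the greedy closed-loop rule are assumptions chosen for simplicity; this is not the paper's tokenization scheme, and its bins**2 = 6561 tokens are far from a small vocabulary.

```python
import numpy as np


def build_vocabulary(max_step=2.0, bins=81):
    """Hypothetical motion vocabulary: a uniform grid of (dx, dy) displacements in metres."""
    axis = np.linspace(-max_step, max_step, bins)
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (bins**2, 2)


def tokenize(trajectory, vocab):
    """Greedy closed-loop tokenization: each token is chosen relative to the position
    reconstructed so far, so quantization error does not accumulate over time."""
    position = np.asarray(trajectory[0], dtype=float)
    tokens = []
    for target in trajectory[1:]:
        candidates = position + vocab
        idx = int(np.argmin(np.sum((candidates - target) ** 2, axis=1)))
        tokens.append(idx)
        position = candidates[idx]
    return tokens


def detokenize(start, tokens, vocab):
    """Replay the tokens from the start position to recover the discretized path."""
    positions = [np.asarray(start, dtype=float)]
    for idx in tokens:
        positions.append(positions[-1] + vocab[idx])
    return np.stack(positions)


t = np.arange(20, dtype=float)
trajectory = np.stack([1.0 * t, 0.02 * t**2], axis=1)  # toy curved path, one point per step
vocab = build_vocabulary()
tokens = tokenize(trajectory, vocab)
reconstruction = detokenize(trajectory[0], tokens, vocab)
```

As long as each step's displacement stays inside the grid's range, every reconstructed point lies within half a grid diagonal of the true point, independent of sequence length.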
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2312.04535v2
[DATE]
2024-04-15 06:51:18+08:00
[CATEGORIES]
cs.LG
FiP: a Fixed-Point Approach for Causal Generative Modeling
[AUTHORS]
Meyer Scetbon, Joel Jennings, Agrin Hilmkil, Cheng Zhang, Chao Ma
[ABSTRACT]
Modeling real-world data-generating processes lies at the heart of empirical
science. Structural Causal Models (SCMs) and their associated Directed Acyclic
Graphs (DAGs) provide an increasingly popular answer to such problems by
defining the causal generative process that transforms random noise into
observations. However, learning them from observational data poses an ill-posed
and NP-hard inverse problem in general. In this work, we propose a new,
equivalent formalism that describes SCMs as fixed-point problems on the
causally ordered variables without requiring DAGs, and we show three
important cases where SCMs can be uniquely recovered given the topological
ordering (TO). To the best of our knowledge, we obtain the weakest conditions
for their recovery when TO is known. Based on this, we design a two-stage
causal generative model that first infers the causal order from observations in
a zero-shot manner, thus bypassing the search, and then learns the generative
fixed-point SCM on the ordered variables. To infer TOs from observations, we
propose to amortize the learning of TOs on generated datasets by sequentially
predicting the leaves of graphs seen during training. To learn fixed-point
SCMs, we design a transformer-based architecture that exploits a new attention
mechanism enabling the modeling of causal structures, and show that this
parameterization is consistent with our formalism. Finally, we conduct an
extensive evaluation of each method individually, and show that when combined,
our model outperforms various baselines on generated out-of-distribution
problems.
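The fixed-point view can be illustrated with a toy additive-noise SCM. Once variables are sorted in topological order, each mechanism depends only on earlier variables, so the map x -> f(x) + n is triangular and plain fixed-point iteration reaches its unique fixed point in at most d steps, matching ancestral sampling. The tanh mechanism and random triangular weights are hypothetical; the paper's transformer-based parameterization is not shown.

```python
import numpy as np


def make_scm(d, rng):
    """Hypothetical SCM: strictly lower-triangular weights encode the causal order,
    so mechanism(x)[..., i] depends only on x[..., :i]."""
    weights = np.tril(rng.normal(size=(d, d)), k=-1)

    def mechanism(x):
        return np.tanh(x @ weights.T)

    return mechanism


def fixed_point_sample(mechanism, noise):
    """Iterate x <- f(x) + n; after k steps the first k coordinates are exact."""
    d = noise.shape[-1]
    x = np.zeros_like(noise)
    for _ in range(d):
        x = mechanism(x) + noise
    return x


def ancestral_sample(mechanism, noise):
    """Reference: fill in variables one at a time following the causal order."""
    d = noise.shape[-1]
    x = np.zeros_like(noise)
    for i in range(d):
        x[..., i] = mechanism(x)[..., i] + noise[..., i]
    return x


rng = np.random.default_rng(0)
mechanism = make_scm(5, rng)
noise = rng.normal(size=(100, 5))
x_fp = fixed_point_sample(mechanism, noise)
x_anc = ancestral_sample(mechanism, noise)
```

The triangularity is exactly what knowing the topological order buys: without it, the same iteration need not converge or have a unique fixed point.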
[LINK]
http://arxiv.org/abs/2404.06969v2
[DATE]
2024-04-15 06:44:11+08:00
[CATEGORIES]
cs.LG
Hierarchical Attention Models for Multi-Relational Graphs
[AUTHORS]
Roshni G. Iyer, Wei Wang, Yizhou Sun
[ABSTRACT]
We present Bi-Level Attention-Based Relational Graph Convolutional Networks
(BR-GCN), novel neural network architectures that use masked
self-attentional layers with relational graph convolutions, to effectively
operate on highly multi-relational data. BR-GCN models use bi-level attention
to learn node embeddings through (1) node-level attention, and (2)
relation-level attention. The node-level self-attentional layers use
intra-relational graph interactions to learn relation-specific node embeddings
using a weighted aggregation of neighborhood features in a sparse subgraph
region. The relation-level self-attentional layers use inter-relational graph
interactions to learn the final node embeddings using a weighted aggregation of
relation-specific node embeddings. The BR-GCN bi-level attention mechanism
extends Transformer-based multiplicative attention from the natural language
processing (NLP) domain, and Graph Attention Networks (GAT)-based attention, to
large-scale heterogeneous graphs (HGs). On node classification, BR-GCN
outperforms baselines by 0.29% to 14.95% as a stand-alone model, and on link
prediction, BR-GCN outperforms baselines by 0.02% to 7.40% as an auto-encoder
model. We also conduct ablation studies to evaluate the quality of BR-GCN’s
relation-level attention and discuss how its learning of graph structure may be
transferred to enrich other graph neural networks (GNNs). Through various
experiments, we show that BR-GCN’s attention mechanism is both scalable and
more effective in learning compared to state-of-the-art GNNs.
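A minimal NumPy sketch of bi-level attention as described above: node-level attention aggregates neighbours within each relation into relation-specific embeddings, and relation-level attention then weights those embeddings per node. The GAT-style additive scores at the node level, the scaled dot-product scores at the relation level, and the added self-loops are assumptions for illustration, not BR-GCN's exact formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def node_level_attention(h, adj, w, a):
    """GAT-style attention within one relation; returns relation-specific embeddings."""
    z = h @ w                                                           # (n, d)
    d = z.shape[1]
    scores = leaky_relu((z @ a[:d])[:, None] + (z @ a[d:])[None, :])  # e_ij, (n, n)
    mask = (adj + np.eye(len(h))) > 0                 # self-loops keep every row non-empty
    scores = np.where(mask, scores, -np.inf)          # attend only to neighbours under r
    return softmax(scores, axis=1) @ z


def relation_level_attention(h, per_relation, w_q, w_k):
    """Scaled dot-product attention over relations, computed separately for each node."""
    stacked = np.stack(per_relation, axis=1)          # (n, R, d)
    q = h @ w_q                                       # (n, d)
    k = stacked @ w_k                                 # (n, R, d)
    scores = np.einsum("nd,nrd->nr", q, k) / np.sqrt(q.shape[1])
    beta = softmax(scores, axis=1)                    # per-node weights over relations
    return np.einsum("nr,nrd->nd", beta, stacked), beta


rng = np.random.default_rng(0)
n, f, d, num_rel = 6, 4, 3, 2
h = rng.normal(size=(n, f))
adjs = [(rng.random((n, n)) < 0.4).astype(float) for _ in range(num_rel)]
per_relation = [
    node_level_attention(h, adj, rng.normal(size=(f, d)), rng.normal(size=2 * d))
    for adj in adjs
]
out, beta = relation_level_attention(
    h, per_relation, rng.normal(size=(f, d)), rng.normal(size=(d, d))
)
```

Keeping the two softmaxes separate means a node can ignore an uninformative relation entirely (small beta) regardless of how its neighbours within that relation are weighted.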
[LINK]
http://arxiv.org/abs/2404.09365v1
[DATE]
2024-04-15 05:37:39+08:00
[CATEGORIES]
cs.LG
Momentum-based gradient descent methods for Lie groups
[AUTHORS]
Cédric M. Campos, David Martín de Diego, José Torrente
[ABSTRACT]
Polyak’s Heavy Ball (PHB; Polyak, 1964), a.k.a. Classical Momentum, and
Nesterov’s Accelerated Gradient (NAG; Nesterov, 1983) are well-known examples of
momentum-descent methods for optimization. While the latter outperforms the
former, only generalizations of PHB-like methods to nonlinear spaces have
been described in the literature. We propose here a generalization of NAG-like
methods for Lie group optimization based on the variational one-to-one
correspondence between classical and accelerated momentum methods (Campos et
al., 2023). Numerical experiments are shown.
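For reference, the two Euclidean updates being generalized differ only in where the gradient is evaluated: PHB uses the current point, NAG a look-ahead point. The sketch below uses the common velocity form of both methods on a toy quadratic; the step size, momentum, and objective are assumptions, and the Lie-group versions proposed in the paper are not shown.

```python
import numpy as np


def heavy_ball(grad, x0, lr, momentum, steps):
    """Polyak heavy ball: v <- mu * v - lr * grad(x); x <- x + v."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)
        x = x + v
    return x


def nesterov(grad, x0, lr, momentum, steps):
    """Nesterov: same as heavy ball, but the gradient is taken at x + mu * v."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x + momentum * v)  # look-ahead gradient
        x = x + v
    return x


# Toy ill-conditioned quadratic f(x) = 0.5 * x^T A x with minimizer at the origin.
A = np.diag([1.0, 10.0])


def grad_quadratic(x):
    return A @ x


x_hb = heavy_ball(grad_quadratic, [1.0, 1.0], lr=0.05, momentum=0.9, steps=300)
x_nag = nesterov(grad_quadratic, [1.0, 1.0], lr=0.05, momentum=0.9, steps=300)
```

On a Lie group the additive steps x + v have no direct meaning, which is why both updates need a geometric reformulation there.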
[COMMENTS]
24 pages, 2 algorithms, 5 figures
[LINK]
http://arxiv.org/abs/2404.09363v1
[DATE]
2024-04-15 05:30:00+08:00
[CATEGORIES]
cs.LG
Exploring Feedback Generation in Automated Skeletal Movement Assessment: A Comprehensive Overview
[AUTHORS]
Tal Hakim
[ABSTRACT]
The application of machine-learning solutions to movement assessment from
skeleton videos has attracted significant research attention in recent years.
This advancement has made rehabilitation at home more accessible, utilizing
movement assessment algorithms that can operate on affordable equipment for
human pose detection from 2D or 3D videos. While the primary objective of
automatic assessment tasks is to score movements, the automatic generation of
feedback highlighting key movement issues has the potential to significantly
enhance and accelerate the rehabilitation process. In this study, we explain
the types of feedback that can be generated, review existing solutions for
automatic feedback generation, and discuss future research directions. To our
knowledge, this is the first comprehensive review of feedback generation in
skeletal movement assessment.
[LINK]
http://arxiv.org/abs/2404.09359v1
[DATE]
2024-04-15 05:14:47+08:00
[CATEGORIES]
cs.LG
Can AI Understand Our Universe? Test of Fine-Tuning GPT by Astrophysical Data
[AUTHORS]
Yu Wang, Shu-Rui Zhang, Aidin Momtaz, Rahim Moradi, Fatemeh Rastegarnia, Narek Sahakyan, Soroush Shakeri, Liang Li
[ABSTRACT]
ChatGPT has been the most talked-about concept in recent months, captivating
both professionals and the general public alike, and has sparked discussions
about the changes that artificial intelligence (AI) will bring to the world. As
physicists and astrophysicists, we are curious whether scientific data can be
correctly analyzed by large language models (LLMs) and yield accurate physics.
In this article, we fine-tune the generative pre-trained transformer (GPT)
model on astronomical data from observations of galaxies, quasars, stars,
gamma-ray bursts (GRBs), and from simulations of black holes (BHs). The
fine-tuned model demonstrates its capability to classify astrophysical
phenomena, distinguish between two types of GRBs, deduce the redshift of
quasars, and estimate BH parameters. We regard this as a successful test,
marking the LLM’s proven efficacy in scientific research. With the ever-growing
volume of multidisciplinary data and the advancement of AI technology, we look
forward to the emergence of a more fundamental and comprehensive understanding
of our universe. This article also shares some interesting thoughts on data
collection and AI design. Using the approach of understanding the universe
(looking outward at data and inward for fundamental building blocks) as a
guideline, we propose a method of series expansion for AI, suggesting ways to
train and control AI that is smarter than humans.
[COMMENTS]
27 pages, 7 figures. Comments welcome
[LINK]
http://arxiv.org/abs/2404.10019v1
[DATE]
2024-04-15 04:52:19+08:00
[CATEGORIES]
cs.LG
Machine learning-based identification of Gaia astrometric exoplanet orbits
[AUTHORS]
Johannes Sahlmann, Pablo Gómez
[ABSTRACT]
The third Gaia data release (DR3) contains $\sim$170 000 astrometric orbit
solutions of two-body systems located within $\sim$500 pc of the Sun.
Determining component masses in these systems, in particular of stars hosting
exoplanets, usually hinges on incorporating complementary observations in
addition to the astrometry, e.g. spectroscopy and radial velocities. Several
DR3 two-body systems with exoplanet, brown-dwarf, stellar, and black-hole
components have been confirmed in this way. We developed an alternative machine
learning approach that uses only the DR3 orbital solutions with the aim of
identifying the best candidates for exoplanets and brown-dwarf companions.
Based on confirmed substellar companions in the literature, we use
semi-supervised anomaly detection methods in combination with extreme gradient
boosting and random forest classifiers to determine likely low-mass outliers in
the population of non-single sources. We employ and study feature importance to
investigate the method’s plausibility and produce a list of the 22 best candidates,
of which four are exoplanet candidates and another five are either very massive
brown dwarfs or very-low-mass stars. Three candidates, including one initial
exoplanet candidate, correspond to false-positive solutions where longer-period
binary star motion was fitted with a biased shorter-period orbit. We highlight
nine candidates with brown-dwarf companions for preferential follow-up. One
candidate companion around the Sun-like star G 15-6 could be confirmed as a
genuine brown dwarf using external radial-velocity data. This new approach is a
powerful complement to the traditional identification methods for substellar
companions among Gaia astrometric orbits. It is particularly relevant in the
context of Gaia DR4 and its expected exoplanet discovery yield.
[COMMENTS]
14 pages, 15 figures. Submitted to MNRAS. Comments are welcome
[LINK]
http://arxiv.org/abs/2404.09350v1
[DATE]
2024-04-15 04:17:14+08:00
[CATEGORIES]
cs.LG
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
[AUTHORS]
Brian R. Bartoldson, James Diffenderfer, Konstantinos Parasyris, Bhavya Kailkhura
[ABSTRACT]
This paper revisits the simple, long-studied, yet still unsolved problem of
making image classifiers robust to imperceptible perturbations. Taking CIFAR10
as an example, SOTA clean accuracy is about $100$%, but SOTA robustness to
$\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand
this gap, we analyze how model size, dataset size, and synthetic data quality
affect robustness by developing the first scaling laws for adversarial
training. Our scaling laws reveal inefficiencies in prior art and provide
actionable feedback to advance the field. For instance, we discovered that SOTA
methods diverge notably from compute-optimal setups, using excess compute for
their level of robustness. Leveraging a compute-efficient setup, we surpass the
prior SOTA with $20$% ($70$%) fewer training (inference) FLOPs. We trained
various compute-efficient models, with our best achieving $74$% AutoAttack
accuracy ($+3$% gain). However, our scaling laws also predict robustness slowly
grows then plateaus at $90$%: dwarfing our new SOTA by scaling is impractical,
and perfect robustness is impossible. To better understand this predicted
limit, we carry out a small-scale human evaluation on the AutoAttack data that
fools our top-performing model. Concerningly, we estimate that human
performance also plateaus near $90$%, which we show to be attributable to
$\ell_{\infty}$-constrained attacks’ generation of invalid images not
consistent with their original labels. Having characterized limiting
roadblocks, we outline promising paths for future research.
[LINK]
http://arxiv.org/abs/2404.09349v1
[DATE]
2024-04-15 04:14:38+08:00
[CATEGORIES]
cs.LG
REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback
[AUTHORS]
Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi
[ABSTRACT]
The effectiveness of reinforcement learning (RL) agents in continuous control
robotics tasks is heavily dependent on the design of the underlying reward
function. However, a misalignment between the reward function and user
intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions
from human preferences; however, they inadvertently introduce a risk of reward
overoptimization. In this work, we address this challenge by advocating for the
adoption of regularized reward functions that more accurately mirror the
intended behaviors. We propose a novel concept of reward regularization within
the robotic RLHF (RL from Human Feedback) framework, which we refer to as
agent preferences. Our approach uniquely incorporates not just human
feedback in the form of preferences but also considers the preferences of the
RL agent itself during the reward function learning process. This dual
consideration significantly mitigates the issue of reward function
overoptimization in RL. We provide a theoretical justification for the proposed
approach by formulating the robotic RLHF problem as a bilevel optimization
problem. We demonstrate the efficiency of our algorithm, REBEL, on several
continuous control benchmarks, including DeepMind Control Suite
(Tassa et al., 2018), MetaWorld (Yu et al., 2021), and high-dimensional
visual environments, with an improvement of more than 70% in
sample efficiency in comparison to current SOTA baselines. This showcases our
approach’s effectiveness in aligning reward functions with true behavioral
intentions, setting a new benchmark in the field.
[LINK]
http://arxiv.org/abs/2312.14436v2
[DATE]
2024-04-15 04:07:19+08:00
[CATEGORIES]
cs.LG
Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling
[AUTHORS]
Mert Gürbüzbalaban, Yuanhan Hu, Lingjiong Zhu
[ABSTRACT]
We consider the constrained sampling problem where the goal is to sample from
a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie
on a convex body $\mathcal{C}$. Motivated by penalty methods from continuous
optimization, we propose penalized Langevin Dynamics (PLD) and penalized
underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained
sampling problem into an unconstrained sampling problem by introducing a
penalty function for constraint violations. When $f$ is smooth and gradients
are available, we get $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration
complexity for PLD to sample the target up to an $\varepsilon$-error where the
error is measured in the TV distance and $\tilde{\mathcal{O}}(\cdot)$ hides
logarithmic factors. For PULMC, we improve the result to
$\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is
Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our
knowledge, these are the first convergence results for underdamped Langevin
Monte Carlo methods for constrained sampling that handle non-convex $f$ and
provide guarantees with the best dimension dependency among existing methods
with deterministic gradient. If unbiased stochastic estimates of the gradient
of $f$ are available, we propose PSGLD and PSGULMC methods that can handle
stochastic gradients and are scalable to large datasets without requiring
Metropolis-Hastings correction steps. For PSGLD and PSGULMC, when $f$ is
strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$
and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexity in
W2 distance. When $f$ is smooth and can be non-convex, we provide finite-time
performance bounds and iteration complexity results. Finally, we illustrate the
performance on Bayesian LASSO regression and Bayesian constrained deep learning
problems.
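A minimal sketch of penalized overdamped Langevin dynamics (PLD): the constraint $x \in \mathcal{C}$ is replaced by a penalty $S(x)/\delta$ added to $f$, and the unconstrained Langevin update is run on $f + S/\delta$. The penalty $S(x) = \max(0, \|x\| - 1)^2$ for the unit ball, the Gaussian target $f(x) = \|x\|^2/2$, and the step and penalty sizes are assumptions; the paper's penalty choice and the underdamped and stochastic-gradient variants are not reproduced.

```python
import numpy as np


def penalty_grad(x, radius=1.0):
    """Gradient of S(x) = max(0, ||x|| - radius)^2, which is zero inside the ball."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    excess = np.maximum(norm - radius, 0.0)
    return 2.0 * excess * x / np.maximum(norm, 1e-12)


def penalized_langevin(grad_f, x0, step, delta, num_steps, rng):
    """Euler-Maruyama Langevin steps on the penalized potential f(x) + S(x) / delta.

    Each row of x0 is an independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        drift = grad_f(x) + penalty_grad(x) / delta
        x = x - step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x


rng = np.random.default_rng(0)
# Target: standard Gaussian restricted to the unit disk, i.e. f(x) = ||x||^2 / 2.
samples = penalized_langevin(
    lambda x: x, np.zeros((2000, 2)), step=1e-3, delta=1e-2, num_steps=3000, rng=rng
)
radii = np.linalg.norm(samples, axis=1)
```

Smaller delta enforces the constraint more tightly but stiffens the drift, so the step size has to shrink with it; this trade-off is one source of the high polynomial powers of $1/\varepsilon$ in the complexity bounds above.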
[LINK]
http://arxiv.org/abs/2212.00570v2
[DATE]
2024-04-15 03:57:21+08:00
[CATEGORIES]
cs.LG
SNN4Agents: A Framework for Developing Energy-Efficient Embodied Spiking Neural Networks for Autonomous Agents
[AUTHORS]
Rachmad Vidya Wicaksana Putra, Alberto Marchisio, Muhammad Shafique
[ABSTRACT]
Recent trends have shown that autonomous agents, such as Autonomous Ground
Vehicles (AGVs), Unmanned Aerial Vehicles (UAVs), and mobile robots,
effectively improve human productivity in solving diverse tasks. However, since
these agents are typically powered by portable batteries, they require
extremely low power/energy consumption to operate over a long lifespan. To solve
this challenge, neuromorphic computing has emerged as a promising solution,
where bio-inspired Spiking Neural Networks (SNNs) use spikes from event-based
cameras or data conversion pre-processing to perform sparse computations
efficiently. However, the studies of SNN deployments for autonomous agents are
still at an early stage. Hence, the optimization stages for enabling efficient
embodied SNN deployments for autonomous agents have not been defined
systematically. Toward this, we propose a novel framework called SNN4Agents
that consists of a set of optimization techniques for designing
energy-efficient embodied SNNs targeting autonomous agent applications. Our
SNN4Agents employs weight quantization, timestep reduction, and attention
window reduction to jointly improve the energy efficiency, reduce the memory
footprint, and optimize the processing latency while maintaining high accuracy. In
the evaluation, we investigate use cases of event-based car recognition, and
explore the trade-offs among accuracy, latency, memory, and energy consumption.
The experimental results show that our proposed framework can maintain high
accuracy (i.e., 84.12% accuracy) with 68.75% memory saving, 3.58x speed-up, and
4.03x energy efficiency improvement as compared to the state-of-the-art work
for NCARS dataset, thereby enabling energy-efficient embodied SNN deployments
for autonomous agents.
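Of the three techniques listed, weight quantization is the simplest to illustrate. The sketch below uses generic per-tensor symmetric uniform quantization and reports the nominal memory saving relative to 32-bit weights. The quantizer, bit widths, and 32-bit baseline are assumptions; SNN4Agents' actual quantization scheme, timestep reduction, and attention-window reduction are not shown.

```python
import numpy as np


def quantize_symmetric(weights, bits):
    """Per-tensor symmetric uniform quantization to signed `bits`-bit integer levels."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / levels if max_abs > 0 else 1.0  # guard against an all-zero tensor
    q = np.clip(np.round(weights / scale), -levels, levels).astype(np.int32)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float64) * scale


def memory_saving(bits, baseline_bits=32):
    """Nominal fraction of weight memory saved versus `baseline_bits`-bit storage."""
    return 1.0 - bits / baseline_bits


rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64))
q, scale = quantize_symmetric(w, bits=8)
w_hat = dequantize(q, scale)
```

The int32 container is only for convenience; the saving assumes the integers are packed into `bits` bits on the target hardware.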
[COMMENTS]
18 pages, 15 figures
[LINK]
http://arxiv.org/abs/2404.09331v1
[DATE]
2024-04-15 03:06:00+08:00
[CATEGORIES]
cs.LG
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
[AUTHORS]
Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, Radu Tudor Ionescu
[ABSTRACT]
Few-shot knowledge distillation recently emerged as a viable approach to
harness the knowledge of large-scale pre-trained models, using limited data and
computational resources. In this paper, we propose a novel few-shot feature
distillation approach for vision transformers. Our approach is based on two key
steps. Leveraging the fact that vision transformers have a consistent
depth-wise structure, we first copy the weights from intermittent layers of
existing pre-trained vision transformers (teachers) into shallower
architectures (students), where the intermittence factor controls the
complexity of the student transformer with respect to its teacher. Next, we
employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge
into the student in a few-shot scenario, aiming to recover the information
processing carried out by the skipped teacher layers. We present comprehensive
experiments with supervised and self-supervised transformers as teachers, on
five data sets from various domains, including natural, medical and satellite
images. The empirical results confirm the superiority of our approach over
competitive baselines. Moreover, the ablation results demonstrate the
usefulness of each component of the proposed pipeline.
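The two steps can be sketched on toy weight matrices. First, a shallower student is built by copying every k-th teacher layer, with k playing the role of the intermittence factor. Second, each copied weight is wrapped with a standard LoRA update. Treating each layer as a single matrix, the every-k-th selection rule, and plain (not enhanced) LoRA with zero-initialized B are assumptions for illustration.

```python
import numpy as np


def copy_intermittent_layers(teacher_layers, factor):
    """Build a shallower student by copying teacher layers 0, factor, 2*factor, ..."""
    return [layer.copy() for layer in teacher_layers[::factor]]


class LoRALinear:
    """Frozen copied weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, rng, alpha=1.0):
        out_dim, in_dim = weight.shape
        self.weight = weight
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))
        self.b = np.zeros((out_dim, rank))  # zero init: the update starts at exactly 0
        self.scaling = alpha / rank

    def __call__(self, x):
        return x @ (self.weight + self.scaling * self.b @ self.a).T


rng = np.random.default_rng(0)
teacher = [rng.normal(size=(16, 16)) for _ in range(12)]  # toy stand-ins for 12 blocks
student = copy_intermittent_layers(teacher, factor=3)      # keeps layers 0, 3, 6, 9
adapted = [LoRALinear(w, rank=4, rng=rng) for w in student]
x = rng.normal(size=(2, 16))
```

Zero-initializing B means the student starts out exactly equal to the copied teacher layers, and distillation only has to learn the low-rank correction for the skipped layers.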
[LINK]
http://arxiv.org/abs/2404.09326v1
[DATE]
2024-04-15 02:57:38+08:00
[CATEGORIES]
cs.LG
Exponential concentration in quantum kernel methods
[AUTHORS]
Supanut Thanasilp, Samson Wang, M. Cerezo, Zoë Holmes
[ABSTRACT]
Kernel methods in Quantum Machine Learning (QML) have recently gained
significant attention as a potential candidate for achieving a quantum
advantage in data analysis. Among other attractive properties, when training a
kernel-based model one is guaranteed to find the optimal model’s parameters due
to the convexity of the training landscape. However, this is based on the
assumption that the quantum kernel can be efficiently obtained from quantum
hardware. In this work we study the performance of quantum kernel models from
the perspective of the resources needed to accurately estimate kernel values.
We show that, under certain conditions, values of quantum kernels over
different input data can be exponentially concentrated (in the number of
qubits) towards some fixed value. Thus, when training with a polynomial number of
measurements, one ends up with a trivial model where the predictions on unseen
inputs are independent of the input data. We identify four sources that can
lead to concentration: expressivity of the data embedding, global
measurements, entanglement and noise. For each source, an associated
concentration bound of quantum kernels is analytically derived. Lastly, we show
that when dealing with classical data, training a parametrized data embedding
with a kernel alignment method is also susceptible to exponential
concentration. Our results are verified through numerical simulations for
several QML tasks. Altogether, we provide guidelines indicating that certain
features should be avoided to ensure the efficient evaluation of quantum
kernels and so the performance of quantum kernel methods.
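The expressivity source of concentration can be seen numerically in an idealized limit. If the data embedding produces Haar-random states, fidelity-kernel values $|\langle\psi(x)|\psi(x')\rangle|^2$ have mean $2^{-n}$ for $n$ qubits, so they shrink exponentially toward 0. The Haar-random embedding is an assumed stand-in for a highly expressive circuit; none of the paper's specific embeddings are simulated.

```python
import numpy as np


def haar_random_states(num_qubits, num_states, rng):
    """Haar-random pure states: normalized complex Gaussian vectors of dimension 2**n."""
    dim = 2 ** num_qubits
    z = rng.normal(size=(num_states, dim)) + 1j * rng.normal(size=(num_states, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def fidelity_kernel_samples(num_qubits, num_pairs, rng):
    """Fidelity kernel values |<psi|phi>|^2 for independently embedded data pairs."""
    psi = haar_random_states(num_qubits, num_pairs, rng)
    phi = haar_random_states(num_qubits, num_pairs, rng)
    return np.abs(np.sum(np.conj(psi) * phi, axis=1)) ** 2


rng = np.random.default_rng(0)
stats = {}
for n in (2, 4, 8):
    k = fidelity_kernel_samples(n, num_pairs=2000, rng=rng)
    stats[n] = (k.mean(), k.var())
```

Once typical kernel values fall below the shot-noise level of a polynomial number of measurements, the estimated Gram matrix is dominated by noise, which is the trivial-model failure described above.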
[COMMENTS]
15+50 pages, 15 figures
[LINK]
http://arxiv.org/abs/2208.11060v2
[DATE]
2024-04-15 02:47:09+08:00
[CATEGORIES]
cs.LG
Gradient Estimation with Discrete Stein Operators
[AUTHORS]
Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey
[ABSTRACT]
Gradient estimation – approximating the gradient of an expectation with
respect to the parameters of a distribution – is central to the solution of
many machine learning problems. However, when the distribution is discrete,
most common gradient estimators suffer from excessive variance. To improve the
quality of gradient estimation, we introduce a variance reduction technique
based on Stein operators for discrete distributions. We then use this technique
to build flexible control variates for the REINFORCE leave-one-out estimator.
Our control variates can be adapted online to minimize variance and do not
require extra evaluations of the target function. In benchmark generative
modeling tasks such as training binary variational autoencoders, our gradient
estimator achieves substantially lower variance than state-of-the-art
estimators with the same number of function evaluations.
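For context, the REINFORCE leave-one-out (RLOO) estimator that the control variates build on can be sketched for independent Bernoulli variables with logits theta. Each sample's baseline is the mean of f over the other samples, which keeps the estimator unbiased. The toy objective and sample sizes are assumptions, and the discrete Stein control variates themselves are not implemented here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rloo_gradient(theta, f, num_samples, rng):
    """RLOO estimate of d/dtheta E_{x ~ Bernoulli(sigmoid(theta))}[f(x)]."""
    p = sigmoid(theta)
    x = (rng.random((num_samples, theta.size)) < p).astype(float)  # (K, d) binary samples
    fx = f(x)                                                       # (K,)
    score = x - p  # d/dtheta log p(x) for independent Bernoulli logits
    # sum_k (f_k - mean_{j != k} f_j) * score_k / K simplifies to the expression below.
    return ((fx - fx.mean())[:, None] * score).sum(axis=0) / (num_samples - 1)


target = np.array([0.2, 0.7, 0.1])


def f(x):
    return np.sum((x - target) ** 2, axis=1)


theta = np.array([0.5, -1.0, 0.0])
rng = np.random.default_rng(0)
estimates = np.array([rloo_gradient(theta, f, num_samples=4, rng=rng) for _ in range(10000)])
p = sigmoid(theta)
exact = p * (1.0 - p) * (1.0 - 2.0 * target)  # closed form for this particular f
```

Averaging many independent estimates recovers the exact gradient, which checks unbiasedness; the paper's contribution is to lower the variance of each individual estimate without extra evaluations of f.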
[COMMENTS]
NeurIPS 2022. Source code: https://github.com/thjashin/rodeo
[LINK]
http://arxiv.org/abs/2202.09497v8
[DATE]
2024-04-15 01:08:45+08:00
[CATEGORIES]
cs.LG
Language Models for Text Classification: Is In-Context Learning Enough?
[AUTHORS]
Aleksandra Edwards, Jose Camacho-Collados
[ABSTRACT]
Recent foundational language models have shown state-of-the-art performance
in many NLP tasks in zero- and few-shot settings. An advantage of these models
over more standard approaches based on fine-tuning is the ability to understand
instructions written in natural language (prompts), which helps them generalise
better to different tasks and domains without the need for specific training
data. This makes them suitable for addressing text classification problems for
domains with limited amounts of annotated instances. However, existing research
is limited in scale and lacks understanding of how text generation models
combined with prompting techniques compare to more established methods for text
classification such as fine-tuning masked language models. In this paper, we
address this research gap by performing a large-scale evaluation study for 16
text classification datasets covering binary, multiclass, and multilabel
problems. In particular, we compare zero- and few-shot approaches of large
language models to fine-tuning smaller language models. We also analyse the
results by prompt, classification type, domain, and number of labels. In
general, the results show how fine-tuning smaller and more efficient language
models can still outperform few-shot approaches of larger language models,
which have room for improvement when it comes to text classification.
[COMMENTS]
Accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2403.17661v2
[DATE]
2024-04-14 23:45:53+08:00
[CATEGORIES]
cs.CL
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
[AUTHORS]
Quang Minh Dinh, Minh Khoi Ho, Anh Quan Dang, Hung Phong Tran
[ABSTRACT]
Traffic video description and analysis have received much attention recently
due to the growing demand for efficient and reliable urban surveillance
systems. Most existing methods focus only on locating traffic event segments
and severely lack descriptive details about the behaviour and context of
all the subjects of interest in the events. In this paper, we present
TrafficVLM, a novel multi-modal dense video captioning model for vehicle ego
camera view. TrafficVLM models traffic video events at different levels of
analysis, both spatially and temporally, and generates long fine-grained
descriptions for the vehicle and pedestrian at different phases of the event.
We also propose a conditional component for TrafficVLM to control the
generation outputs and a multi-task fine-tuning paradigm to enhance
TrafficVLM’s learning capability. Experiments show that TrafficVLM performs
well on both vehicle and overhead camera views. Our solution achieved
outstanding results in Track 2 of the AI City Challenge 2024, ranking third
in the challenge standings. Our code is publicly available at
https://github.com/quangminhdinh/TrafficVLM.
[LINK]
http://arxiv.org/abs/2404.09275v1
[DATE]
2024-04-14 22:51:44+08:00
[CATEGORIES]
cs.CL
cs.LG
Test Code Generation for Telecom Software Systems using Two-Stage Generative Model
[AUTHORS]
Mohamad Nabeel, Doumitrou Daniil Nimara, Tahar Zanouda
[ABSTRACT]
In recent years, the evolution of Telecom towards achieving intelligent,
autonomous, and open networks has led to increasingly complex Telecom
software systems that support heterogeneous deployment scenarios with
multi-standard and multi-vendor support. As a result, it becomes a challenge
for large-scale Telecom software companies to develop and test software for all
deployment scenarios. To address these challenges, we propose a framework for
Automated Test Generation for large-scale Telecom software systems. We begin by
using a time-series generative model, trained on historical Telecom network data
collected during field trials, to generate test case input data for the observed
test scenarios. The time-series generative model also helps preserve the
privacy of Telecom data. The generated time-series software performance data
are then combined with test descriptions written in natural language to
generate test scripts using a generative large language model. Our
comprehensive experiments on public datasets and Telecom datasets obtained from
operational Telecom networks demonstrate that the framework can effectively
generate comprehensive test case input data and useful test code.
[COMMENTS]
6 pages, 5 figures, Accepted at 1st Workshop on The Impact of Large
Language Models on 6G Networks - IEEE International Conference on
Communications (ICC) 2024
[LINK]
http://arxiv.org/abs/2404.09249v1
[DATE]
2024-04-14 21:25:15+08:00
[CATEGORIES]
cs.CL
cs.LG
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
[AUTHORS]
Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
[ABSTRACT]
Reinforcement learning (RL) trains agents to accomplish complex tasks through
environmental interaction data, but its capacity is also limited by the scope
of the available data. To obtain a knowledgeable agent, a promising approach is
to leverage the knowledge from large language models (LLMs). Despite previous
studies combining LLMs with RL, seamless integration of the two components
remains challenging due to their semantic gap. This paper introduces a novel
method, Knowledgeable Agents from Language Model Rollouts (KALM), which
extracts knowledge from LLMs in the form of imaginary rollouts that can be
easily learned by the agent through offline reinforcement learning methods. The
primary challenge of KALM lies in LLM grounding, as LLMs are inherently limited
to textual data, whereas environmental data often comprise numerical vectors
unseen to LLMs. To address this, KALM fine-tunes the LLM to perform various
tasks based on environmental data, including bidirectional translation between
natural language descriptions of skills and their corresponding rollout data.
This grounding process enhances the LLM’s comprehension of environmental
dynamics, enabling it to generate diverse and meaningful imaginary rollouts
that reflect novel skills. Initial empirical evaluations on the CLEVR-Robot
environment demonstrate that KALM enables agents to complete complex
rephrasings of task goals and extend their capabilities to novel tasks
requiring unprecedented optimal behaviors. KALM achieves a success rate of 46%
in executing tasks with unseen goals, substantially surpassing the 26% success
rate achieved by baseline methods. Furthermore, KALM effectively enables the
LLM to comprehend environmental dynamics, resulting in the generation of
meaningful imaginary rollouts that reflect novel skills and demonstrate the
seamless integration of large language models and reinforcement learning.
[LINK]
http://arxiv.org/abs/2404.09248v1
[DATE]
2024-04-14 21:19:40+08:00
[CATEGORIES]
cs.LG
cs.CL
EE-TTS: Emphatic Expressive TTS with Linguistic Information
[AUTHORS]
Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun
[ABSTRACT]
While current TTS systems perform well in synthesizing high-quality speech,
producing highly expressive speech remains a challenge. Emphasis, as a critical
factor in determining the expressiveness of speech, has attracted increasing
attention. Previous works usually enhance emphasis by adding intermediate
features, but they cannot guarantee the overall expressiveness of the speech.
To resolve this, we propose Emphatic Expressive TTS
(EE-TTS), which leverages multi-level linguistic information from syntax and
semantics. EE-TTS contains an emphasis predictor that can identify appropriate
emphasis positions from text and a conditioned acoustic model to synthesize
expressive speech with emphasis and linguistic information. Experimental
results indicate that EE-TTS outperforms the baseline, with MOS improvements of
0.49 and 0.67 in expressiveness and naturalness, respectively. EE-TTS also shows strong
generalization across different datasets according to AB test results.
[COMMENTS]
Accepted by Interspeech 2023, fix some typos
[LINK]
http://arxiv.org/abs/2305.12107v2
[DATE]
2024-04-14 20:33:07+08:00
[CATEGORIES]
cs.CL
Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts
[AUTHORS]
Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton
[ABSTRACT]
Despite the remarkable strides made by autoregressive language models, their
potential is often hampered by the slow inference speeds inherent in sequential
token generation. Blockwise parallel decoding (BPD) was proposed by Stern et
al. (2018) as a way to improve inference speed of language models. In this
paper, we make two contributions to understanding and improving BPD drafts. We
first offer an analysis of the token distributions produced by the BPD
prediction heads. Second, we use this analysis to inform algorithms that
improve BPD inference speed by refining the BPD drafts using small n-gram or
neural language models. We empirically show that these refined BPD drafts yield
a higher average verified prefix length across tasks.
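The metric named above can be made concrete with a small sketch. It assumes greedy verification: a drafted token is accepted only if it equals the token the base model would have produced at that position, and acceptance stops at the first mismatch. The function names and token IDs are hypothetical, and the sketch ignores bookkeeping details of the original BPD algorithm.

```python
from typing import Sequence


def verified_prefix_length(draft: Sequence[int], greedy: Sequence[int]) -> int:
    """Count leading draft tokens that match the base model's greedy tokens."""
    length = 0
    for drafted, expected in zip(draft, greedy):
        if drafted != expected:
            break  # verification stops at the first disagreement
        length += 1
    return length


def average_verified_prefix_length(
    steps: Sequence[tuple[Sequence[int], Sequence[int]]],
) -> float:
    """Average the verified prefix length over (draft, greedy) pairs, one per decoding step."""
    lengths = [verified_prefix_length(draft, greedy) for draft, greedy in steps]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical token IDs: each pair is (block draft, base model's greedy tokens).
    steps = [
        ([5, 9, 2, 7], [5, 9, 3, 7]),  # 2 tokens accepted
        ([1, 4, 4, 8], [1, 4, 4, 8]),  # 4 tokens accepted
        ([6, 0, 2, 2], [3, 0, 2, 2]),  # 0 tokens accepted
    ]
    print(average_verified_prefix_length(steps))  # 2.0
```

Under this reading, refining a draft (the abstract mentions small n-gram or neural LMs) aims to rewrite drafted tokens so that more of them survive this check per step.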
[LINK]
http://arxiv.org/abs/2404.09221v1
[DATE]
2024-04-14 19:49:38+08:00
[CATEGORIES]
cs.CL
cs.LG
TransformerFAM: Feedback attention is working memory
[AUTHORS]
Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
[ABSTRACT]
While Transformers have revolutionized deep learning, their quadratic
attention complexity hinders their ability to process infinitely long inputs.
We propose Feedback Attention Memory (FAM), a novel Transformer architecture
that leverages a feedback loop to enable the network to attend to its own
latent representations. This design fosters the emergence of working memory
within the Transformer, allowing it to process indefinitely long sequences.
TransformerFAM requires no additional weights, enabling seamless integration
with pre-trained models. Our experiments show that TransformerFAM significantly
improves Transformer performance on long-context tasks across various model
sizes (1B, 8B, and 24B). These results showcase the potential to empower Large
Language Models (LLMs) to process sequences of unlimited length.
[COMMENTS]
24 pages, 12 figures, 14 tables
[LINK]
http://arxiv.org/abs/2404.09173v1
[DATE]
2024-04-14 15:43:45+08:00
[CATEGORIES]
cs.LG
cs.CL
GeMQuAD : Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning
[AUTHORS]
Amani Namboori, Shivam Mangale, Andy Rosenbaum, Saleh Soltan
[ABSTRACT]
The emergence of Large Language Models (LLMs) with capabilities like
In-Context Learning (ICL) has ushered in new possibilities for data generation
across various domains while minimizing the need for extensive data collection
and modeling techniques. Researchers have explored ways to use this generated
synthetic data to optimize smaller student models for reduced deployment costs
and lower latency in downstream tasks. However, ICL-generated data often
suffers from low quality because task specificity is limited when only a few
examples are used in ICL. In this paper, we propose GeMQuAD, a semi-supervised learning
approach, extending the WeakDAP framework, applied to a dataset generated
through ICL with just one example in the target language using AlexaTM 20B
Seq2Seq LLM. Through our approach, we iteratively identify high-quality data to
enhance model performance, especially in low-resource multilingual settings for
the Extractive Question Answering task. Our framework outperforms
the machine translation-augmented model by 0.22/1.68 F1/EM (Exact Match) points
for Hindi and 0.82/1.37 F1/EM points for Spanish on the MLQA dataset, and it
surpasses a model trained on an English-only dataset by
5.05/6.50 F1/EM points for Hindi and 3.81/3.69 F1/EM points for Spanish on the
same dataset. Notably, our approach uses a pre-trained LLM for generation with
no fine-tuning (FT), utilizing just a single annotated example in ICL to
generate data, providing a cost-effective development process.
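The F1/EM figures above are extractive-QA metrics computed per question and then averaged over the dataset (usually reported on a 0-100 scale). A minimal sketch of the per-question scores, assuming a simplified normalization (lowercasing, stripping ASCII punctuation, whitespace tokenization); official evaluation scripts apply further, language-specific normalization, and `string.punctuation` does not cover Hindi punctuation such as the danda "।".

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall (bag-of-tokens overlap)."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Eiffel Tower.", "eiffel tower"))  # 1.0
    print(token_f1("the Eiffel Tower", "Eiffel Tower"))  # about 0.8
```

F1 gives partial credit for overlapping spans while EM does not, which is why the two numbers in each reported pair can move by different amounts.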
[COMMENTS]
Accepted to The 37th International Conference on Neural Information
Processing Systems (NeurIPS 2023), December 10-16, 2023 - SyntheticData4ML
workshop, New Orleans, United States https://neurips.cc/Conferences/2023
[LINK]
http://arxiv.org/abs/2404.09163v1
[DATE]
2024-04-14 14:55:42+08:00
[CATEGORIES]
cs.CL
Making Large Language Models Perform Better in Knowledge Graph Completion
[AUTHORS]
Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen
[ABSTRACT]
Large language model (LLM) based knowledge graph completion (KGC) aims to
predict the missing triples in the KGs with LLMs. However, research about
LLM-based KGC fails to sufficiently harness LLMs’ inference proficiencies,
overlooking critical structural information integral to KGs. In this paper, we
explore methods to incorporate structural information into the LLMs, with the
overarching goal of facilitating structure-aware reasoning. We first discuss
the existing LLM paradigms like in-context learning and instruction tuning,
proposing basic structural information injection approaches. Then we propose a
Knowledge Prefix Adapter (KoPA) to fulfill this stated goal. The KoPA uses a
structural pre-training phase to comprehend the intricate entities and
relations within KGs, representing them as structural embeddings. Then KoPA
communicates such cross-modal structural information understanding to the LLMs
through a knowledge prefix adapter which projects the structural embeddings
into the textual space and obtains virtual knowledge tokens positioned as a
prefix of the input prompt. We conduct comprehensive experiments and provide
incisive analysis of how introducing cross-modal structural
information benefits the LLM’s factual knowledge reasoning ability. Our
code and data are available at https://github.com/zjukg/KoPA .
[COMMENTS]
Work in progress
[LINK]
http://arxiv.org/abs/2310.06671v2
[DATE]
2024-04-14 13:30:19+08:00
[CATEGORIES]
cs.CL
The Curse of Recursion: Training on Generated Data Makes Models Forget
[AUTHORS]
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson
[ABSTRACT]
Stable Diffusion revolutionised image creation from descriptive text. GPT-2,
GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of
language tasks. ChatGPT introduced such language models to the general public.
It is now clear that large language models (LLMs) are here to stay, and will
bring about drastic change in the whole ecosystem of online text and images. In
this paper we consider what the future might hold. What will happen to GPT-{n}
once LLMs contribute much of the language found online? We find that use of
model-generated content in training causes irreversible defects in the
resulting models, where tails of the original content distribution disappear.
We refer to this effect as Model Collapse and show that it can occur in
Variational Autoencoders, Gaussian Mixture Models and LLMs. We build
theoretical intuition behind the phenomenon and portray its ubiquity amongst
all learned generative models. We demonstrate that it has to be taken seriously
if we are to sustain the benefits of training from large-scale data scraped
from the web. Indeed, data collected about genuine human interactions with
systems will be increasingly valuable in the presence of LLM-generated
content in data crawled from the Internet.
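A toy, single-Gaussian version of the recursive-training loop illustrates the effect (our simplification, not the paper's experimental setup). Each generation fits a Gaussian by maximum likelihood to samples drawn from the previous generation's fit. With few samples per generation, the fitted standard deviation tends to drift toward zero, so the tails of the original N(0, 1) disappear first. The sample size, number of generations, and seed are arbitrary choices.

```python
import random
import statistics


def simulate_collapse(n_samples: int = 10, generations: int = 500, seed: int = 0) -> list[float]:
    """Fit a Gaussian, sample from the fit, refit on those samples, and repeat.

    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)
    # Generation 0 is trained on "real" data from N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stds = []
    for _ in range(generations):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood estimate of the std
        stds.append(sigma)
        # The next model sees only samples produced by the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return stds


if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 10, 100, 499):
        print(f"generation {gen}: fitted std = {stds[gen]:.3g}")
```

The shrinkage comes from estimation error compounding across generations: every refit sees only a finite sample of the previous model, never the original distribution.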
[COMMENTS]
Fixed typos in eqn 4,5
[LINK]
http://arxiv.org/abs/2305.17493v3
[DATE]
2024-04-14 13:20:10+08:00
[CATEGORIES]
cs.LG
cs.CL
ToNER: Type-oriented Named Entity Recognition with Generative Language Model
[AUTHORS]
Guochao Jiang, Ziqin Luo, Yuchen Shi, Dixuan Wang, Jiaqing Liang, Deqing Yang
[COMMENTS]
Accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.09145v1
[DATE]
2024-04-14 13:13:37+08:00
[CATEGORIES]
cs.CL
In-Context Learning through the Bayesian Prism
[AUTHORS]
Madhur Panwar, Kabir Ahuja, Navin Goyal
[ABSTRACT]
In-context learning (ICL) is one of the surprising and useful features of
large language models and subject of intense research. Recently, stylized
meta-learning-like ICL setups have been devised that train transformers on
sequences of input-output pairs $(x, f(x))$. The function $f$ comes from a
function class and generalization is checked by evaluating on sequences
generated from unseen functions from the same class. One of the main
discoveries in this line of research has been that for several function
classes, such as linear regression, transformers successfully generalize to new
functions in the class. However, the inductive biases of these models resulting
in this behavior are not clearly understood. A model with unlimited training
data and compute is a Bayesian predictor: it learns the pretraining
distribution. In this paper we empirically examine how far this Bayesian
perspective can help us understand ICL. To this end, we generalize the previous
meta-ICL setup to a hierarchical meta-ICL setup that involves unions of multiple
task families. We instantiate this setup on a diverse range of linear and
nonlinear function families and find that transformers can do ICL in this
setting as well. Where Bayesian inference is tractable, we find evidence that
high-capacity transformers mimic the Bayesian predictor. The Bayesian
perspective provides insights into the inductive bias of ICL and how
transformers perform a particular task when they are trained on multiple tasks.
We also find that transformers can learn to generalize to new function classes
that were not seen during pretraining. This involves deviation from the
Bayesian predictor. We examine these deviations in more depth, offering new
insights and hypotheses.
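For the simplest function class mentioned, linear regression, the Bayesian predictor has a closed form. The sketch below assumes a 1-D function f(x) = w x, a Gaussian prior w ~ N(0, prior_var), and Gaussian observation noise with variance noise_var (all illustrative assumptions, not the paper's exact configuration). The posterior mean of w is sum(x_i y_i) / (sum(x_i^2) + noise_var / prior_var), and the Bayes-optimal prediction under squared loss is that mean times the query x. A transformer that "mimics the Bayesian predictor" would, given the same in-context pairs, output values close to `bayes_predict`.

```python
from typing import Sequence


def posterior_mean_weight(xs: Sequence[float], ys: Sequence[float],
                          noise_var: float = 1.0, prior_var: float = 1.0) -> float:
    """Posterior mean of w for y = w*x + noise, w ~ N(0, prior_var), noise ~ N(0, noise_var)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + noise_var / prior_var)


def bayes_predict(xs: Sequence[float], ys: Sequence[float], x_query: float,
                  noise_var: float = 1.0, prior_var: float = 1.0) -> float:
    """Posterior-mean prediction at x_query given the in-context pairs."""
    return posterior_mean_weight(xs, ys, noise_var, prior_var) * x_query


def least_squares_predict(xs: Sequence[float], ys: Sequence[float], x_query: float) -> float:
    """Ordinary least squares through the origin, for comparison."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs) * x_query


if __name__ == "__main__":
    xs, ys = [1.0, 2.0], [2.0, 4.0]            # in-context examples (x_i, f(x_i))
    print(bayes_predict(xs, ys, 3.0))          # about 5.0: shrunk toward the prior mean 0
    print(least_squares_predict(xs, ys, 3.0))  # 6.0
```

As noise_var / prior_var shrinks, the Bayesian prediction approaches least squares; with few, noisy examples the prior pulls it toward zero.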
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2306.04891v2
[DATE]
2024-04-14 13:12:52+08:00
[CATEGORIES]
cs.LG
cs.CL
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
[AUTHORS]
Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo
[COMMENTS]
ICLR 2024 Spotlight
[LINK]
http://arxiv.org/abs/2307.10928v4
[DATE]
2024-04-14 12:29:51+08:00
[CATEGORIES]
cs.CL
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation
[AUTHORS]
Artur Kiulian, Anton Polishko, Mykola Khandoga, Oryna Chubych, Jack Connor, Raghav Ravishankar, Adarsh Shirawalmath
[ABSTRACT]
In the rapidly advancing field of AI and NLP, generative large language
models (LLMs) stand at the forefront of innovation, showcasing unparalleled
abilities in text understanding and generation. However, the limited
representation of low-resource languages like Ukrainian poses a notable
challenge, restricting the reach and relevance of this technology. Our paper
addresses this by fine-tuning the open-source Gemma and Mistral LLMs with
Ukrainian datasets, aiming to improve their linguistic proficiency and
benchmarking them against other existing models capable of processing Ukrainian
language. This endeavor not only aims to mitigate language bias in technology
but also promotes inclusivity in the digital realm. Our transparent and
reproducible approach encourages further NLP research and development.
Additionally, we present the Ukrainian Knowledge and Instruction Dataset (UKID)
to aid future efforts in language model fine-tuning. Our research not only
advances the field of NLP but also highlights the importance of linguistic
diversity in AI, which is crucial for cultural preservation, education, and
expanding AI’s global utility. Ultimately, we advocate for a future where
technology is inclusive, enabling AI to communicate effectively across all
languages, especially those currently underrepresented.
[LINK]
http://arxiv.org/abs/2404.09138v1
[DATE]
2024-04-14 12:25:41+08:00
[CATEGORIES]
cs.CL
cs.LG
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
[AUTHORS]
Taojun Hu, Xiao-Hua Zhou
[ABSTRACT]
Natural Language Processing (NLP) is witnessing a remarkable breakthrough
driven by the success of Large Language Models (LLMs). LLMs have gained
significant attention across academia and industry for their versatile
applications in text generation, question answering, and text summarization. As
the landscape of NLP evolves with an increasing number of domain-specific LLMs
employing diverse techniques and trained on various corpora, evaluating the
performance of these models becomes paramount. Quantifying that performance
requires a comprehensive grasp of existing metrics, which play a pivotal role
in LLM evaluation. This paper offers a comprehensive exploration of LLM evaluation from a
metrics perspective, providing insights into the selection and interpretation
of metrics currently in use. Our main goal is to elucidate their mathematical
formulations and statistical interpretations. We shed light on the application
of these metrics using recent Biomedical LLMs. Additionally, we offer a
succinct comparison of these metrics, aiding researchers in selecting
appropriate metrics for diverse tasks. The overarching goal is to furnish
researchers with a pragmatic guide for effective LLM evaluation and metric
selection, thereby advancing the understanding and application of these large
language models.
[LINK]
http://arxiv.org/abs/2404.09135v1
[DATE]
2024-04-14 11:54:00+08:00
[CATEGORIES]
cs.CL
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
[AUTHORS]
Yanhong Li, Chenghao Yang, Allyson Ettinger
[COMMENTS]
NAACL 2024 Findings paper (Camera-Ready Version)
[LINK]
http://arxiv.org/abs/2404.09129v1
[DATE]
2024-04-14 10:47:32+08:00
[CATEGORIES]
cs.CL
Provable Interactive Learning with Hindsight Instruction Feedback
[AUTHORS]
Dipendra Misra, Aldo Pacchiano, Robert E. Schapire
[ABSTRACT]
We study interactive learning in a setting where the agent has to generate a
response (e.g., an action or trajectory) given a context and an instruction. In
contrast to typical approaches that train the system using reward or expert
supervision on the response, we study learning with hindsight instruction, where a
teacher provides an instruction that is most suitable for the agent’s generated
response. Such hindsight instruction labels are often easier to provide
than expert supervision of the optimal response, which may require
expert knowledge or be impractical to elicit. We initiate the theoretical
analysis of interactive learning with hindsight labeling. We first provide a
lower bound showing that in general, the regret of any algorithm must scale
with the size of the agent’s response space. We then study a specialized
setting where the underlying instruction-response distribution can be
decomposed as a low-rank matrix. We introduce an algorithm called LORIL for
this setting and show that its regret scales as $\sqrt{T}$, where $T$ is the
number of rounds, and depends on the intrinsic rank but not on the
size of the agent’s response space. We provide experiments in two domains
showing that LORIL outperforms baselines even when the low-rank assumption is
violated.
[LINK]
http://arxiv.org/abs/2404.09123v1
[DATE]
2024-04-14 10:18:07+08:00
[CATEGORIES]
cs.LG
cs.CL
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
[AUTHORS]
Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
[ABSTRACT]
We lack a systematic understanding of the effects of fine-tuning (via methods
such as instruction-tuning or reinforcement learning from human feedback),
particularly on tasks outside the narrow fine-tuning distribution. In a
simplified scenario, we demonstrate that improving performance on tasks within
the fine-tuning data distribution comes at the expense of capabilities on other
tasks. We hypothesize that language models implicitly infer the task of the
prompt and that fine-tuning skews this inference towards tasks in the
fine-tuning distribution. To test this, we propose Conjugate Prompting, which
artificially makes the task look farther from the fine-tuning distribution
while requiring the same capability, and we find that this recovers some of the
pretraining capabilities in our synthetic setup. Since real-world fine-tuning
distributions are predominantly English, we apply conjugate prompting to
recover pretrained capabilities in LLMs by simply translating the prompts to
different languages. This allows us to recover in-context learning abilities
lost via instruction tuning, natural reasoning capability lost during code
fine-tuning, and, more concerningly, harmful content generation suppressed by
safety fine-tuning in chatbots like ChatGPT.
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2309.10105v2
[DATE]
2024-04-14 09:15:31+08:00
[CATEGORIES]
cs.CL
cs.LG
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
[AUTHORS]
Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer
[COMMENTS]
ICLR 2024 camera-ready (updated figure1 and table2; corrected minor
details in the explanation of hierarchical merging)
[LINK]
http://arxiv.org/abs/2310.00785v4
[DATE]
2024-04-14 06:02:23+08:00
[CATEGORIES]
cs.CL
cs.LG
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
[AUTHORS]
Shuaijie She, Wei Zou, Shujian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
[ABSTRACT]
Though reasoning abilities are considered language-agnostic, existing LLMs
exhibit inconsistent reasoning abilities across different languages, e.g.,
reasoning in the dominant language like English is superior to other languages
due to the imbalance of multilingual training data. To enhance reasoning
abilities in non-dominant languages, we propose a
Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to
align the reasoning processes in other languages with the dominant language.
Specifically, we use an off-the-shelf translation model to measure the
consistency between answers in non-dominant and dominant languages, which we
adopt as the preference for optimization methods such as Direct Preference
Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves
significant improvements in the multilingual reasoning of various models on all
three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with
improved reasoning consistency across languages.
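MAPO feeds its translation-based consistency preference into standard preference optimizers such as DPO. The sketch below shows only the per-pair DPO loss, -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]), where y_w is the preferred answer (under MAPO, presumably the reasoning more consistent with the dominant-language answer) and y_l the rejected one. How MAPO constructs the pairs is not reproduced here; the log-probabilities and beta = 0.1 are placeholder values.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities.

    Uses the identity -log sigmoid(x) == softplus(-x).
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return softplus(-beta * margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities of the preferred and rejected answers.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))          # log 2, about 0.693: policy equals reference
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))  # smaller: policy now favors the preferred answer
```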
[COMMENTS]
The project is available at https://github.com/NJUNLP/MAPO
[LINK]
http://arxiv.org/abs/2401.06838v3
[DATE]
2024-04-14 02:27:04+08:00
[CATEGORIES]
cs.CL
Multilingual Evaluation of Semantic Textual Relatedness
[AUTHORS]
Sharvi Endait, Srushti Sonavane, Ridhima Sinare, Pritika Rohera, Advait Naik, Dipali Kadam
[ABSTRACT]
The explosive growth of online content demands robust Natural Language
Processing (NLP) techniques that can capture nuanced meanings and cultural
context across diverse languages. Semantic Textual Relatedness (STR) goes
beyond superficial word overlap, considering linguistic elements and
non-linguistic factors like topic, sentiment, and perspective. Despite its
pivotal role, prior NLP research has predominantly focused on English, limiting
its applicability across languages. Addressing this gap, our paper dives into
capturing deeper connections between sentences beyond simple word overlap.
Going beyond English-centric NLP research, we explore STR in Marathi, Hindi,
Spanish, and English, unlocking the potential for information retrieval,
machine translation, and more. Leveraging the SemEval-2024 shared task, we
explore various language models across three learning paradigms: supervised,
unsupervised, and cross-lingual. Our comprehensive methodology yields promising
results, demonstrating the effectiveness of our approach. This work aims to not
only showcase our achievements but also inspire further research in
multilingual STR, particularly for low-resourced languages.
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2404.09047v1
[DATE]
2024-04-14 01:16:03+08:00
[CATEGORIES]
cs.CL
Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering
[AUTHORS]
Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth
[ABSTRACT]
Recent advances in multimodal large language models (LLMs) have proven highly
effective in visual question answering (VQA). However, the end-to-end design of
these models prevents them from being interpretable to humans,
undermining trust and applicability in critical domains. While post-hoc
rationales offer certain insight into understanding model behavior, these
explanations are not guaranteed to be faithful to the model. In this paper, we
address these shortcomings by introducing an interpretable-by-design model that
factors model decisions into intermediate human-legible explanations, and
allows people to easily understand why a model fails or succeeds. We propose
the Dynamic Clue Bottleneck Model (DCLUB), a method that is designed towards
an inherently interpretable VQA system. DCLUB provides an explainable
intermediate space before the VQA decision and is faithful from the beginning,
while maintaining comparable performance to black-box systems. Given a
question, DCLUB first returns a set of visual clues: natural language
statements of visually salient evidence from the image, and then generates the
output based solely on the visual clues. To supervise and evaluate the
generation of VQA explanations within DCLUB, we collect a dataset of 1.7k
reasoning-focused questions with visual clues. Evaluations show that our
inherently interpretable system improves by 4.64% over a comparable black-box
system on reasoning-focused questions while preserving 99.43% of performance on
VQA-v2.
[COMMENTS]
Multimodal, Visual Question Answering, Vision and Language
[LINK]
http://arxiv.org/abs/2305.14882v2
[DATE]
2024-04-14 01:13:55+08:00
[CATEGORIES]
cs.CL
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
[AUTHORS]
Yutaro Yamada, Yingtian Tang, Yoyo Zhang, Ilker Yildirim
[ABSTRACT]
Large-scale vision-language models such as CLIP have shown impressive
performance on zero-shot image classification and image-to-text retrieval.
However, such performance does not carry over to tasks that require a
finer-grained correspondence between vision and language, such as Visual
Question Answering (VQA). As a potential cause of the difficulty of applying
these models to VQA and similar tasks, we report an interesting phenomenon of
vision-language models, which we call the Concept Association Bias (CAB). We
find that models with CAB tend to treat input as a bag of concepts and attempt
to fill in the other missing concept cross-modally, leading to an unexpected
zero-shot prediction. We demonstrate CAB by showing that CLIP’s zero-shot
classification performance greatly suffers when there is a strong concept
association between an object (e.g. eggplant) and an attribute (e.g. color
purple). We also show that the strength of CAB predicts the performance on VQA.
We observe that CAB is prevalent in vision-language models trained with
contrastive losses, even when autoregressive losses are jointly employed.
However, a model that solely relies on autoregressive loss seems to exhibit
minimal or no signs of CAB.
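CLIP-style zero-shot classification, the setting in which the abstract measures CAB, scores one image embedding against one text-prompt embedding per class and takes a softmax over scaled cosine similarities. A minimal sketch with hypothetical toy embeddings (real ones come from the model's image and text encoders); `logit_scale = 100.0` is a placeholder for the model's learned temperature. In these terms, CAB would show up as the image embedding landing closer to the prompt for an object strongly associated with one of its attributes than to the prompt for the object actually shown.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_probs(image_emb: Sequence[float],
                    prompt_embs: Sequence[Sequence[float]],
                    logit_scale: float = 100.0) -> list[float]:
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    logits = [logit_scale * cosine(image_emb, p) for p in prompt_embs]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 3-d embeddings for one image and two class prompts.
    image = [0.9, 0.1, 0.4]
    prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(zero_shot_probs(image, prompts))
```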
[COMMENTS]
EMNLP 2023 main
[LINK]
http://arxiv.org/abs/2212.12043v2
[DATE]
2024-04-14 01:02:25+08:00
[CATEGORIES]
cs.CL
cs.LG
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
[AUTHORS]
Jia Gu, Liang Pang, Huawei Shen, Xueqi Cheng
[ABSTRACT]
With the rapid advancement of large language models (LLMs) and their
remarkable capabilities in handling complex language tasks, an increasing
number of studies are employing LLMs as agents to emulate the sequential
decision-making processes of humans, often represented as Markov decision
processes (MDPs). The actions within this decision-making framework adhere to
specific probability distributions and require iterative sampling. This raises
the question of whether LLM agents can comprehend probability distributions
and thereby guide their behavioral decision-making through probabilistic
sampling and generate behavioral sequences. To answer this
question, we divide the problem into two main aspects: simulation where the
exact probability distribution is known, and generation of sequences where the
probability distribution is ambiguous. In the first case, the agent is required
to give the type and parameters of the probability distribution through the
problem description, and then give the sampling sequence. Our analysis
shows that LLM agents perform poorly in this case, although the sampling success
rate can be improved through programming tools. Real-world scenarios often
entail unknown probability distributions. Thus, in the second case, we ask the
agents to change the activity level in online social networks and analyze the
frequency of actions. Ultimately, our analysis shows that LLM agents cannot
sample probability distributions even using programming tools. Therefore,
careful consideration is still required before directly using LLM agents
to simulate human behavior.
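The abstract reports that delegating sampling to programming tools helps when the distribution is known. A sketch of such a tool call, with a hypothetical action set and activity probabilities for a social-network agent: draw actions with Python's `random.choices`, then compare the sample to the target using total variation distance. The choice of total variation as the check is ours; the paper's own success criterion may differ.

```python
import random
from collections import Counter


def sample_actions(probs: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw n actions i.i.d. from a categorical distribution."""
    rng = random.Random(seed)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=n)


def total_variation(probs: dict[str, float], samples: list[str]) -> float:
    """0.5 * sum over actions of |empirical frequency - target probability|."""
    counts = Counter(samples)
    n = len(samples)
    return 0.5 * sum(abs(counts[a] / n - p) for a, p in probs.items())


if __name__ == "__main__":
    target = {"post": 0.2, "comment": 0.3, "idle": 0.5}  # hypothetical activity levels
    draws = sample_actions(target, n=10_000)
    print(Counter(draws))
    print(f"TV distance: {total_variation(target, draws):.4f}")
```

A small distance indicates the sequence follows the stated distribution, which is the property the paper finds LLM agents struggle to achieve when generating samples directly in text.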
[LINK]
http://arxiv.org/abs/2404.09043v1
[DATE]
2024-04-14 00:59:28+08:00
[CATEGORIES]
cs.CL
Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling
[AUTHORS]
Junzhe Jiang, Shang Qu, Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Kai Zhang, Rui Li, Jiatong Li, Min Gao
[ABSTRACT]
Recommender systems are indispensable in the realm of online applications,
and sequential recommendation has enjoyed considerable prevalence due to its
capacity to encapsulate the dynamic shifts in user interests. However, previous
sequential modeling methods still have limitations in capturing contextual
information. The primary reason is the lack of understanding of domain-specific
knowledge and item-related textual content. Fortunately, the emergence of
powerful language models has unlocked the potential to incorporate extensive
world knowledge into recommendation algorithms, enabling them to go beyond
simple item attributes and truly understand the world surrounding user
preferences. To achieve this, we propose LANCER, which leverages the semantic
understanding capabilities of pre-trained language models to generate
personalized recommendations. Our approach bridges the gap between language
models and recommender systems, resulting in more human-like recommendations.
We demonstrate the effectiveness of our approach through a series of
experiments conducted on multiple benchmark datasets, showing promising results
and providing valuable insights into the influence of our model on sequential
recommendation tasks. Furthermore, our experimental code is publicly
available at https://github.com/Gnimixy/lancer.
[LINK]
http://arxiv.org/abs/2309.10435v4
[DATE]
2024-04-14 00:32:33+08:00
[CATEGORIES]
cs.CL
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
[AUTHORS]
Fanxu Meng, Zhaohui Wang, Muhan Zhang
[ABSTRACT]
As the parameters of LLMs expand, the computational cost of fine-tuning the
entire model becomes prohibitive. To address this challenge, we introduce a
PEFT method, Principal Singular values and Singular vectors Adaptation (PiSSA),
which optimizes a significantly reduced parameter space while achieving or
surpassing the performance of full-parameter fine-tuning. PiSSA is inspired by
Intrinsic SAID, which suggests that pre-trained, over-parametrized models
inhabit a space of low intrinsic dimension. Consequently, PiSSA represents a
matrix W within the model by the product of two trainable matrices A and B,
plus a residual matrix $W^{res}$ for error correction. SVD is employed to
factorize W, and the principal singular values and vectors of W are utilized to
initialize A and B. The residual singular values and vectors initialize the
residual matrix $W^{res}$, which is kept frozen during fine-tuning. Notably,
PiSSA shares the same architecture as LoRA. However, LoRA approximates $\Delta W$
through the product of two matrices: A, initialized with Gaussian noise, and
B, initialized with zeros, while PiSSA initializes A and B with principal
singular values and vectors of the original matrix W. PiSSA can better
approximate the outcomes of full-parameter fine-tuning at the beginning by
changing the essential parts while freezing the “noisy” parts. In comparison,
LoRA freezes the original matrix and updates the “noise”. This distinction
enables PiSSA to converge much faster than LoRA and also achieve better
performance in the end. Due to the same architecture, PiSSA inherits many of
LoRA’s advantages, such as parameter efficiency and compatibility with
quantization. Leveraging a fast SVD method, the initialization of PiSSA takes
only a few seconds, so switching from LoRA to PiSSA incurs negligible cost.
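The initialization described above can be sketched with a plain SVD. Assumptions: the adapted layer computes W_res + A @ B with A of shape (m, r) and B of shape (r, n), and the top-r singular values are split evenly as square roots between A and B; the paper's exact convention may differ. NumPy's full SVD stands in for the fast SVD method the abstract mentions, and `lora_init` (with an arbitrary 0.02 scale) is included only for contrast.

```python
import numpy as np


def pissa_init(W: np.ndarray, r: int):
    """Split W into a rank-r principal part A @ B (trainable) and a frozen residual."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_s            # (m, r): top-r left singular vectors, scaled
    B = sqrt_s[:, None] * Vt[:r, :]  # (r, n): top-r right singular vectors, scaled
    W_res = W - A @ B                # equals U[:, r:] @ diag(S[r:]) @ Vt[r:, :]
    return A, B, W_res


def lora_init(m: int, n: int, r: int, rng: np.random.Generator):
    """LoRA-style init for contrast: Gaussian A (arbitrary 0.02 scale), zero B."""
    return rng.normal(0.0, 0.02, size=(m, r)), np.zeros((r, n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(6, 4))
    A, B, W_res = pissa_init(W, r=2)
    print(np.allclose(A @ B + W_res, W))  # True: the layer's output is unchanged at init
    A0, B0 = lora_init(6, 4, 2, rng)
    print(np.allclose(A0 @ B0, 0.0))      # True: LoRA starts from a zero update
```

During fine-tuning only A and B would receive gradients while W_res stays frozen, so the trainable part starts on the largest singular directions rather than on a zero update.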
[LINK]
http://arxiv.org/abs/2404.02948v2
[DATE]
2024-04-14 23:24:10+08:00
[CATEGORIES]
cs.LG
Model-based Offline Quantum Reinforcement Learning
[AUTHORS]
Simon Eisenmann, Daniel Hein, Steffen Udluft, Thomas A. Runkler
[ABSTRACT]
This paper presents the first algorithm for model-based offline quantum
reinforcement learning and demonstrates its functionality on the cart-pole
benchmark. The model and the policy to be optimized are each implemented as
variational quantum circuits. The model is trained by gradient descent to fit a
pre-recorded data set. The policy is optimized with a gradient-free
optimization scheme using the return estimate given by the model as the fitness
function. This model-based approach allows, in principle, full realization on a
quantum computer during the optimization phase and gives hope that a quantum
advantage can be achieved as soon as sufficiently powerful quantum computers
are available.
[LINK]
http://arxiv.org/abs/2404.10017v1
[DATE]
2024-04-14 23:11:27+08:00
[CATEGORIES]
cs.LG
A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit
[AUTHORS]
Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula
[ABSTRACT]
In this paper, we consider the contextual variant of the MNL-Bandit problem.
More specifically, we consider a dynamic set optimization problem, where a
decision-maker offers a subset (assortment) of products to a consumer and
observes the response in every round. Consumers purchase products to maximize
their utility. We assume that a set of attributes describe the products, and
the mean utility of a product is linear in the values of these attributes. We
model consumer choice behavior using the widely used Multinomial Logit (MNL)
model and consider the decision maker problem of dynamically learning the model
parameters while optimizing cumulative revenue over the selling horizon $T$.
Though this problem has attracted considerable attention in recent times, many
existing methods often involve solving an intractable non-convex optimization
problem. Their theoretical performance guarantees depend on a problem-dependent
parameter which could be prohibitively large. In particular, existing
algorithms for this problem have regret bounded by $O(\sqrt{\kappa d T})$,
where $\kappa$ is a problem-dependent constant that can have an exponential
dependency on the number of attributes. In this paper, we propose an optimistic
algorithm and show that the regret is bounded by $O(\sqrt{dT} + \kappa)$,
significantly improving the performance over existing methods. Further, we
propose a convex relaxation of the optimization step, which allows for
tractable decision-making while retaining the favourable regret guarantee.
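To make the choice model concrete, the sketch below computes MNL purchase probabilities and expected revenue for one offered assortment. It assumes the common convention that the no-purchase option has utility 0, with product utilities linear in the attributes, $x_i^\top\theta$, as stated above; the parameter learning and optimistic assortment selection from the paper are not shown, and the function names are illustrative.

```python
import numpy as np

def mnl_choice_probs(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Purchase probability of each offered product under an MNL model.

    X: (k, d) attribute vectors of the k products in the assortment.
    theta: (d,) parameters; product i has mean utility X[i] @ theta.
    The outside (no-purchase) option is assumed to have utility 0.
    """
    weights = np.exp(X @ theta)
    return weights / (1.0 + weights.sum())

def expected_revenue(X: np.ndarray, theta: np.ndarray, prices: np.ndarray) -> float:
    """Revenue of offering this assortment: sum_i price_i * P(product i is bought)."""
    return float(prices @ mnl_choice_probs(X, theta))

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
theta = np.zeros(2)                      # every product as attractive as not buying
probs = mnl_choice_probs(X, theta)       # each product is bought with probability 1/3
revenue = expected_revenue(X, theta, np.array([3.0, 6.0]))  # 3/3 + 6/3 = 3
```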
[COMMENTS]
Bug fixed
[LINK]
http://arxiv.org/abs/2011.14033v7
[DATE]
2024-04-14 22:47:24+08:00
[CATEGORIES]
cs.LG
Foundational GPT Model for MEG
[AUTHORS]
Richard Csaky, Mats W. J. van Es, Oiwi Parker Jones, Mark Woolrich
[ABSTRACT]
Deep learning techniques can be used to first train unsupervised models on
large amounts of unlabelled data, before fine-tuning the models on specific
tasks. This approach has seen massive success for various kinds of data, e.g.
images, language, audio, and holds the promise of improving performance in
various downstream tasks (e.g. encoding or decoding brain data). However, there
has been limited progress taking this approach for modelling brain signals,
such as Magneto-/electroencephalography (M/EEG). Here we propose two classes of
deep learning foundational models that can be trained using forecasting of
unlabelled MEG. First, we consider a modified Wavenet; and second, we consider
a modified Transformer-based (GPT2) model. The modified GPT2 includes a novel
application of tokenisation and embedding methods, allowing a model developed
initially for the discrete domain of language to be applied to continuous
multichannel time series data. We also extend the forecasting framework to
include condition labels as inputs, enabling better modelling (encoding) of
task data. We compare the performance of these deep learning models with
standard linear autoregressive (AR) modelling on MEG data. This shows that
GPT2-based models provide better modelling capabilities than Wavenet and linear
AR models, by better reproducing the temporal, spatial and spectral
characteristics of real data and evoked activity in task data. We show how the
GPT2 model scales well to multiple subjects, while adapting its model to each
subject through subject embedding. Finally, we show how such a model can be
useful in downstream decoding tasks through data simulation. All code is
available on GitHub (https://github.com/ricsinaruto/MEG-transfer-decoding).
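The abstract does not specify how the continuous signal is tokenised. One standard way to turn a continuous waveform into discrete tokens, used by the original WaveNet, is μ-law companding followed by uniform quantisation; the sketch below assumes that scheme purely for illustration, with 256 bins and each channel pre-scaled to [-1, 1]. The function names and the random "recording" are hypothetical.

```python
import numpy as np

def mu_law_tokenize(x: np.ndarray, n_bins: int = 256) -> np.ndarray:
    """Map samples in [-1, 1] to integer tokens in [0, n_bins - 1]."""
    mu = n_bins - 1
    x = np.clip(x, -1.0, 1.0)
    # Companding: finer resolution near zero, coarser for large amplitudes.
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.rint((y + 1.0) / 2.0 * mu).astype(np.int64)

def mu_law_detokenize(tokens: np.ndarray, n_bins: int = 256) -> np.ndarray:
    """Approximate inverse of mu_law_tokenize (up to quantisation error)."""
    mu = n_bins - 1
    y = 2.0 * tokens.astype(np.float64) / mu - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# Hypothetical multichannel recording: 4 channels x 1000 samples.
rng = np.random.default_rng(0)
meg = rng.standard_normal((4, 1000))
scaled = meg / np.abs(meg).max(axis=1, keepdims=True)  # per-channel scale to [-1, 1]
tokens = mu_law_tokenize(scaled)                        # same shape, ints in [0, 255]
```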
[COMMENTS]
Code available on GitHub
(https://github.com/ricsinaruto/MEG-transfer-decoding). Part of PhD thesis
(https://ricsinaruto.github.io/docs/thesis_final_appendix.pdf)
[LINK]
http://arxiv.org/abs/2404.09256v1
[DATE]
2024-04-14 21:48:24+08:00
[CATEGORIES]
cs.LG
LSROM: Learning Self-Refined Organizing Map for Fast Imbalanced Streaming Data Clustering
[AUTHORS]
Yongqi Xu, Yujian Lee, Rong Zou, Yiqun Zhang, Yiu-Ming Cheung
[ABSTRACT]
Streaming data clustering is a popular research topic in the fields of data
mining and machine learning. Compared to static data, streaming data, which is
usually analyzed in data chunks, is more susceptible to the dynamic cluster
imbalance issue. That is, the degree of cluster imbalance varies across
streaming data chunks, degrading either the
accuracy or the efficiency of streaming data analysis based on existing
clustering methods. Therefore, we propose an efficient approach called Learning
Self-Refined Organizing Map (LSROM) to handle the imbalanced streaming data
clustering problem, where we propose an advanced SOM for representing the
global data distribution. The constructed SOM is first refined to guide the
partition of the dataset into many micro-clusters, so that small clusters in
imbalanced data are not missed. The micro-clusters are then efficiently merged
through quick retrieval based on the SOM, which can automatically
determine the true number of imbalanced clusters. Compared with existing
imbalanced data clustering approaches, LSROM has a lower time complexity of
$O(n\log n)$ while achieving very competitive clustering accuracy. Moreover,
LSROM is interpretable and insensitive to hyper-parameters. Extensive
experiments have verified its efficacy.
[COMMENTS]
13 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.09243v1
[DATE]
2024-04-14 21:08:21+08:00
[CATEGORIES]
cs.LG
Interpretable Neural Networks with Random Constructive Algorithm
[AUTHORS]
Jing Nan, Wei Dai
[ABSTRACT]
This paper introduces an Interpretable Neural Network (INN) incorporating
spatial information to tackle the opaque parameterization process of randomly
weighted neural networks. The INN leverages spatial information to elucidate
the connection between parameters and network residuals. Furthermore, it
devises a geometric relationship strategy using a pool of candidate nodes and
established relationships to select node parameters conducive to network
convergence. Additionally, a lightweight version of INN tailored for
large-scale data modeling tasks is proposed. The paper also showcases the
infinite approximation property of INN. Experimental findings on various
benchmark datasets and real-world industrial cases demonstrate INN’s
superiority over other neural networks of the same type in terms of modeling
speed, accuracy, and network structure.
[LINK]
http://arxiv.org/abs/2307.00185v3
[DATE]
2024-04-14 21:06:24+08:00
[CATEGORIES]
cs.LG
Fault Detection in Mobile Networks Using Diffusion Models
[AUTHORS]
Mohamad Nabeel, Doumitrou Daniil Nimara, Tahar Zanouda
[ABSTRACT]
In today’s hyper-connected world, ensuring the reliability of telecom
networks becomes increasingly crucial. Telecom networks encompass numerous
underlying and intertwined software and hardware components, each providing
different functionalities. To ensure the stability of telecom networks, telecom
software and hardware vendors have developed several methods to detect aberrant
behavior in telecom networks and enable instant feedback and alerts. These
approaches, although powerful, struggle to generalize due to the unsteady
nature of the software-intensive embedded system and the complexity and
diversity of multi-standard mobile networks. In this paper, we present a system
to detect anomalies in telecom networks using a generative AI model. We
evaluate several strategies using diffusion models to train the model for
anomaly detection using multivariate time-series data. The contributions of
this paper are threefold: (i) A proposal of a framework for utilizing diffusion
models for time-series anomaly detection in telecom networks, (ii) A proposal
of a particular Diffusion model architecture that outperforms other
state-of-the-art techniques, (iii) Experiments on a real-world dataset to
demonstrate that our model effectively provides explainable results, exposing
some of its limitations and suggesting future research avenues to enhance its
capabilities further.
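A common way to use a diffusion model for time-series anomaly detection is to noise a window of metrics, let the trained model estimate the clean window, and score the window by its reconstruction error. The sketch below shows that generic DDPM-style recipe under stated assumptions (a linear β schedule with the usual DDPM defaults and a hypothetical trained noise predictor `eps_model`); it is not the architecture proposed in the paper.

```python
import numpy as np

def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I); also return the noise used."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

def anomaly_score(x0, t, alpha_bar, eps_model, rng):
    """Noise a window to step t, estimate x0 back, and return the mean squared error.

    eps_model(x_t, t) is a hypothetical trained noise predictor; windows it
    reconstructs poorly (unlike the normal training data) score higher.
    """
    x_t, _ = q_sample(x0, t, alpha_bar, rng)
    eps_hat = eps_model(x_t, t)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    return float(np.mean((x0 - x0_hat) ** 2))

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
window = rng.standard_normal((8, 64))            # hypothetical: 8 KPIs x 64 time steps
untrained = lambda x_t, t: np.zeros_like(x_t)    # placeholder for a trained model
score = anomaly_score(window, t=100, alpha_bar=alpha_bar, eps_model=untrained, rng=rng)
```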
[COMMENTS]
6 pages, 4 figures, Accepted at Sixth International Workshop on Data
Driven Intelligence for Networks and Systems (DDINS) - IEEE International
Conference on Communications (ICC) 2024
[LINK]
http://arxiv.org/abs/2404.09240v1
[DATE]
2024-04-14 20:59:35+08:00
[CATEGORIES]
cs.LG
Breast Cancer Image Classification Method Based on Deep Transfer Learning
[AUTHORS]
Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li
[ABSTRACT]
To address the issues of limited samples, time-consuming feature design, and
low accuracy in detection and classification of breast cancer pathological
images, a breast cancer image classification algorithm combining deep
learning and transfer learning is proposed. This algorithm is based on the
DenseNet structure of deep neural networks, and constructs a network model by
introducing attention mechanisms, and trains on the enhanced dataset using
multi-level transfer learning. Experimental results demonstrate that the
algorithm achieves an efficiency of over 84.0\% on the test set, with a
significantly improved classification accuracy compared to previous models,
making it applicable to medical breast cancer detection tasks.
[LINK]
http://arxiv.org/abs/2404.09226v1
[DATE]
2024-04-14 20:09:47+08:00
[CATEGORIES]
cs.LG
Node Classification in Random Trees
[AUTHORS]
Wouter W. L. Nuijten, Vlado Menkovski
[ABSTRACT]
We propose a method for the classification of objects that are structured as
random trees. Our aim is to model a distribution over the node label
assignments in settings where the tree data structure is associated with node
attributes (typically high dimensional embeddings). The tree topology is not
predetermined and none of the label assignments are present during inference.
Other methods that produce a distribution over node label assignment in trees
(or more generally in graphs) either assume conditional independence of the
label assignment, operate on a fixed graph topology, or require part of the
node labels to be observed. Our method defines a Markov Network with the
corresponding topology of the random tree and an associated Gibbs distribution.
We parameterize the Gibbs distribution with a Graph Neural Network that
operates on the random tree and the node embeddings. This allows us to estimate
the likelihood of node assignments for a given random tree and use MCMC to
sample from the distribution of node assignments.
We evaluate our method on the tasks of node classification in trees on the
Stanford Sentiment Treebank dataset. Our method outperforms the baselines on
this dataset, demonstrating its effectiveness for modeling joint distributions
of node labels in random trees.
[LINK]
http://arxiv.org/abs/2311.12167v2
[DATE]
2024-04-14 19:28:37+08:00
[CATEGORIES]
cs.LG
Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds
[AUTHORS]
Shikun Li, Xiaobo Xia, Jiankang Deng, Shiming Ge, Tongliang Liu
[ABSTRACT]
Learning from crowds refers to the setting in which the annotations of training
data are obtained through crowd-sourcing services. Multiple annotators each
complete their own small part of the annotations, and annotator-dependent
labeling mistakes occur frequently. Modeling the label-noise generation process
with a noise transition matrix is a powerful tool for tackling label noise. In
real-world crowd-sourcing scenarios, noise transition matrices are both
annotator- and instance-dependent. However, given the high complexity of
annotator- and instance-dependent transition matrices (AIDTM), annotation
sparsity, meaning that each annotator labels only a small fraction of the
instances, makes modeling AIDTM very challenging. Prior works simplify the problem by
assuming the transition matrix is instance-independent or using simple
parametric ways, which lose modeling generality. Motivated by this, we target a
more realistic problem, estimating general AIDTM in practice. Without losing
modeling generality, we parameterize AIDTM with deep neural networks. To
alleviate the modeling challenge, we suppose every annotator shares its noise
pattern with similar annotators, and estimate AIDTM via knowledge transfer. We
hence first model the mixture of noise patterns by all annotators, and then
transfer this modeling to individual annotators. Furthermore, considering that
the transfer from the mixture of noise patterns to individuals may cause two
annotators with highly different noise generations to perturb each other, we
employ the knowledge transfer between identified neighboring annotators to
calibrate the modeling. Theoretical analyses are derived to demonstrate that
both the knowledge transfer from global to individuals and the knowledge
transfer between neighboring individuals can help model general AIDTM.
Experiments confirm the superiority of the proposed approach on synthetic and
real-world crowd-sourcing data.
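The role of a noise transition matrix can be illustrated in a few lines: given a classifier's belief about the clean label, the matrix maps it to the distribution of the label a particular annotator would write. The two annotator matrices below are made-up examples; in the setting described above, such matrices depend on both the annotator and the instance and are parameterized by deep networks.

```python
import numpy as np

def noisy_label_dist(p_clean: np.ndarray, T: np.ndarray) -> np.ndarray:
    """P(noisy = j | x) = sum_i P(clean = i | x) * T[i, j].

    T is row-stochastic: T[i, j] = P(annotator writes j | true class i, x).
    """
    return p_clean @ T

p_clean = np.array([0.9, 0.1])          # classifier's belief about the true label
careful = np.array([[0.95, 0.05],
                    [0.05, 0.95]])      # hypothetical reliable annotator
sloppy = np.array([[0.6, 0.4],
                   [0.3, 0.7]])         # hypothetical noisy annotator
p_careful = noisy_label_dist(p_clean, careful)   # [0.86, 0.14]
p_sloppy = noisy_label_dist(p_clean, sloppy)     # [0.57, 0.43]
```

Training typically fits the classifier so that these per-annotator noisy distributions explain the observed crowd labels, which is why an accurate estimate of each annotator's matrix matters.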
[COMMENTS]
Accepted by IEEE TPAMI. 22 pages, 4 figures, and 8 tables
[LINK]
http://arxiv.org/abs/2306.03116v3
[DATE]
2024-04-14 19:08:27+08:00
[CATEGORIES]
cs.LG
Qandle: Accelerating State Vector Simulation Using Gate-Matrix Caching and Circuit Splitting
[AUTHORS]
Gerhard Stenzel, Sebastian Zielinski, Michael Kölle, Philipp Altmann, Jonas Nüßlein, Thomas Gabor
[ABSTRACT]
To address the computational complexity associated with state-vector
simulation for quantum circuits, we propose a combination of advanced
techniques to accelerate circuit execution. Quantum gate matrix caching reduces
the overhead of repeated applications of the Kronecker product when applying a
gate matrix to the state vector by storing decomposed partial matrices for each
gate. Circuit splitting divides the circuit into sub-circuits with fewer gates
by constructing a dependency graph, enabling parallel or sequential execution
on disjoint subsets of the state vector. These techniques are implemented using
the PyTorch machine learning framework. We demonstrate the performance of our
approach by comparing it to other PyTorch-compatible quantum state-vector
simulators. Our implementation, named Qandle, is designed to seamlessly
integrate with existing machine learning workflows, providing a user-friendly
API and compatibility with the OpenQASM format. Qandle is an open-source
project hosted on GitHub https://github.com/gstenzel/qandle and PyPI
https://pypi.org/project/qandle/ .
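The caching idea can be sketched as follows: expanding a single-qubit gate to the full register requires a chain of Kronecker products, so memoizing the result per (gate, target, qubit count) avoids recomputing it on repeated use. This is a simplified NumPy illustration that caches the full $2^n \times 2^n$ matrix rather than the decomposed partial matrices Qandle stores, and it is not Qandle's API; the gate table and function names are hypothetical.

```python
from functools import lru_cache
import numpy as np

# Hypothetical gate table for this sketch.
GATES = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "H": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
}

@lru_cache(maxsize=None)
def full_gate_matrix(name: str, target: int, n_qubits: int) -> np.ndarray:
    """Kronecker-expand a single-qubit gate to a 2^n x 2^n operator (cached).

    Qubit 0 is the most significant bit. The cached array is shared between
    calls, so callers must not modify it in place.
    """
    m = np.array([[1.0 + 0j]])
    for q in range(n_qubits):
        m = np.kron(m, GATES[name] if q == target else np.eye(2))
    return m

def apply_gate(state: np.ndarray, name: str, target: int, n_qubits: int) -> np.ndarray:
    return full_gate_matrix(name, target, n_qubits) @ state

n = 3
state = np.zeros(2 ** n, dtype=complex)
state[0] = 1.0                          # |000>
state = apply_gate(state, "X", 0, n)    # |100>, i.e. index 4
state = apply_gate(state, "X", 0, n)    # cache hit; back to |000>
```

Caching the full matrix is only practical for small registers, since its size grows as $4^n$, so a practical simulator would avoid materializing it.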
[LINK]
http://arxiv.org/abs/2404.09213v1
[DATE]
2024-04-14 18:52:01+08:00
[CATEGORIES]
cs.LG
DEGNN: Dual Experts Graph Neural Network Handling Both Edge and Node Feature Noise
[AUTHORS]
Tai Hasegawa, Sukwon Yun, Xin Liu, Yin Jun Phua, Tsuyoshi Murata
[ABSTRACT]
Graph Neural Networks (GNNs) have achieved notable success in various
applications over graph data. However, recent research has revealed that
real-world graphs often contain noise, and GNNs are susceptible to noise in the
graph. To address this issue, several Graph Structure Learning (GSL) models
have been introduced. While GSL models are tailored to enhance robustness
against edge noise through edge reconstruction, a significant limitation
surfaces: their high reliance on node features. This inherent dependence
amplifies their susceptibility to noise within node features. Recognizing this
vulnerability, we present DEGNN, a novel GNN model designed to adeptly mitigate
noise in both edges and node features. The core idea of DEGNN is to design two
separate experts: an edge expert and a node feature expert. These experts
utilize self-supervised learning techniques to produce modified edges and node
features. Leveraging these modified representations, DEGNN subsequently
addresses downstream tasks, ensuring robustness against noise present in both
edges and node features of real-world graphs. Notably, the modification process
can be trained end-to-end, empowering DEGNN to adjust dynamically and achieve
optimal edge and node representations for specific tasks. Comprehensive
experiments demonstrate DEGNN’s efficacy in managing noise, both in original
real-world graphs and in graphs with synthetic noise.
[COMMENTS]
PAKDD 2024, the code is available at
https://github.com/TaiHasegawa/DEGNN
[LINK]
http://arxiv.org/abs/2404.09207v1
[DATE]
2024-04-14 18:04:44+08:00
[CATEGORIES]
cs.LG
AceMap: Knowledge Discovery through Academic Graph
[AUTHORS]
Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, Chenghu Zhou
[ABSTRACT]
The exponential growth of scientific literature requires effective management
and extraction of valuable insights. While existing scientific search engines
excel at delivering search results based on relational databases, they often
neglect the analysis of collaborations between scientific entities and the
evolution of ideas, as well as the in-depth analysis of content within
scientific publications. The representation of heterogeneous graphs and the
effective measurement, analysis, and mining of such graphs pose significant
challenges. To address these challenges, we present AceMap, an academic system
designed for knowledge discovery through academic graphs. We present advanced
database construction techniques to build the comprehensive AceMap database
with large-scale academic entities that contain rich visual, textual, and
numerical information. AceMap also employs innovative visualization,
quantification, and analysis methods to explore associations and logical
relationships among academic entities. AceMap introduces large-scale academic
network visualization techniques centered on nebular graphs, providing a
comprehensive view of academic networks from multiple perspectives. In
addition, AceMap proposes a unified metric based on structural entropy to
quantitatively measure the knowledge content of different academic entities.
Moreover, AceMap provides advanced analysis capabilities, including tracing the
evolution of academic ideas through citation relationships and concept
co-occurrence, and generating concise summaries informed by this evolutionary
process. In addition, AceMap uses machine reading methods to generate potential
new ideas at the intersection of different fields. Exploring the integration of
large language models and knowledge graphs is a promising direction for future
research in idea evolution. Please visit \url{https://www.acemap.info} for
further exploration.
[COMMENTS]
Technical Report for AceMap (https://www.acemap.info)
[LINK]
http://arxiv.org/abs/2403.02576v2
[DATE]
2024-04-14 17:57:48+08:00
[CATEGORIES]
cs.LG
Accelerated Optimization Landscape of Linear-Quadratic Regulator
[AUTHORS]
Lechen Feng, Yuan-Hua Ni
[ABSTRACT]
The linear-quadratic regulator (LQR), the focus of this paper, is a landmark
problem in optimal control. Generally, LQR is
classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR) based
on whether the full state is obtained. It has been suggested in existing
literature that both SLQR and OLQR could be viewed as \textit{constrained
nonconvex matrix optimization} problems in which the only variable to be
optimized is the feedback gain matrix. In this paper, we introduce a
first-order accelerated optimization framework for the LQR problem and
give its convergence analysis for the SLQR and OLQR cases, respectively.
Specifically, a Lipschitz Hessian property of the LQR performance criterion is
presented, which turns out to be a crucial property for the application of
modern optimization techniques. For the SLQR problem, a continuous-time hybrid
dynamic system is introduced, whose solution trajectory is shown to converge
exponentially to the optimal feedback gain with Nesterov-optimal order
$1-\frac{1}{\sqrt{\kappa}}$ ($\kappa$ the condition number). Then, the
symplectic Euler scheme is utilized to discretize the hybrid dynamic system,
and a Nesterov-type method with a restarting rule is proposed that preserves
the continuous-time convergence rate, i.e., the discretized algorithm admits
the Nesterov-optimal convergence order. For the OLQR problem, a Hessian-free
accelerated framework is proposed, which is a two-procedure method consisting
of semiconvex function optimization and negative curvature exploitation. In a
time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, the method can find an
$\epsilon$-stationary point of the performance criterion, which means the
method improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla
gradient descent. Moreover, our method provides a second-order guarantee of
stationarity.
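The discrete-time SLQR scheme above is a Nesterov-type method with a restarting rule. The sketch below shows a generic version of that idea, Nesterov's accelerated gradient with a function-value restart (reset the momentum whenever the objective increases), on an ill-conditioned quadratic rather than on the LQR cost. The restart rule and step size used in the paper are not given in the abstract, so both are assumptions here.

```python
import numpy as np

def nesterov_with_restart(f, grad, x0, step, iters):
    """Nesterov accelerated gradient with a function-value restart rule.

    Whenever the objective increases, momentum is reset (t = 1), so the
    next step is a plain gradient step from the current iterate.
    """
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x = y - step * grad(y)
        if f(x) > f(x_prev):
            t, y = 1.0, x.copy()        # restart: drop accumulated momentum
        else:
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = x + ((t - 1.0) / t_next) * (x - x_prev)
            t = t_next
        x_prev = x
    return x_prev

# Ill-conditioned quadratic f(x) = 0.5 x^T A x with condition number 100.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_acc = nesterov_with_restart(f, grad, np.ones(2), step=1 / 100, iters=300)
```

Plain gradient descent with the same step contracts the slow direction by only $1 - 1/\kappa$ per iteration, whereas the accelerated iterate approaches the $1 - 1/\sqrt{\kappa}$ regime that the abstract calls Nesterov-optimal.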
[LINK]
http://arxiv.org/abs/2307.03590v3
[DATE]
2024-04-14 16:10:02+08:00
[CATEGORIES]
cs.LG
The last Dance : Robust backdoor attack via diffusion models and bayesian approach
[AUTHORS]
Orson Mengara
[ABSTRACT]
Diffusion models are state-of-the-art deep learning generative models that
are trained on the principle of learning forward and backward diffusion
processes via the progressive addition of noise and denoising. In this paper,
we aim to fool audio-based DNN models, such as those from the Hugging Face
framework, in particular transformer-based
artificial intelligence models, which are powerful machine learning models that
save time and achieve results faster and more efficiently. We demonstrate the
feasibility of backdoor attacks (called BacKBayDiffMod) on audio transformers
derived from Hugging Face, a popular framework in the world of artificial
intelligence research. The backdoor attack developed in this paper is based on
poisoning model training data uniquely by incorporating backdoor diffusion
sampling and a Bayesian approach to the distribution of poisoned data.
[COMMENTS]
Preprint (Last update): audio backdoor attack on Hugging Face’s
Transformer pre-trained models. This attack incorporates state-of-the-art
Bayesian techniques, a modified Fokker-Planck equation (via Yang-Mills), and
a diffusion model approach
[LINK]
http://arxiv.org/abs/2402.05967v3
[DATE]
2024-04-14 15:58:40+08:00
[CATEGORIES]
cs.LG
An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
[AUTHORS]
Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin
[ABSTRACT]
Self-supervised learning has emerged as a powerful way to pre-train
generalizable machine learning models on large amounts of unlabeled data. It is
particularly compelling in the music domain, where obtaining labeled data is
time-consuming, error-prone, and ambiguous. During the self-supervised process,
models are trained on pretext tasks, with the primary objective of acquiring
robust and informative features that can later be fine-tuned for specific
downstream tasks. The choice of the pretext task is critical as it guides the
model to shape the feature space with meaningful constraints for information
encoding. In the context of music, most works have relied on contrastive
learning or masking techniques. In this study, we expand the scope of pretext
tasks applied to music by investigating and comparing the performance of new
self-supervised methods for music tagging. We open-source a simple ResNet model
trained on a diverse catalog of millions of tracks. Our results demonstrate
that, although most of these pre-training methods result in similar downstream
results, contrastive learning consistently results in better downstream
performance compared to other self-supervised pre-training methods. This holds
true in a limited-data downstream context.
[LINK]
http://arxiv.org/abs/2404.09177v1
[DATE]
2024-04-14 15:56:08+08:00
[CATEGORIES]
cs.LG
Guidance with Spherical Gaussian Constraint for Conditional Diffusion
[AUTHORS]
Lingxiao Yang, Shutong Ding, Yifan Cai, Jingyi Yu, Jingya Wang, Ye Shi
[ABSTRACT]
Recent advances in diffusion models attempt to handle conditional generative
tasks by utilizing a differentiable loss function for guidance without the need
for additional training. While these methods have achieved some success, they
often compromise on sample quality and require small guidance step sizes,
leading to longer sampling processes. This paper reveals that the fundamental
issue lies in the manifold deviation during the sampling process when loss
guidance is employed. We theoretically show the existence of manifold deviation
by establishing a certain lower bound for the estimation error of the loss
guidance. To mitigate this problem, we propose Diffusion with Spherical
Gaussian constraint (DSG), drawing inspiration from the concentration
phenomenon in high-dimensional Gaussian distributions. DSG effectively
constrains the guidance step within the intermediate data manifold through
optimization and enables the use of larger guidance steps. Furthermore, we
present a closed-form solution for DSG denoising with the Spherical Gaussian
constraint. Notably, DSG can seamlessly integrate as a plugin module within
existing training-free conditional diffusion methods. Implementing DSG merely
involves a few lines of additional code with almost no extra computational
overhead, yet it leads to significant performance improvements. Comprehensive
experimental results in various conditional generation tasks validate the
superiority and adaptability of DSG in terms of both sample quality and time
efficiency.
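The concentration phenomenon that DSG draws on is easy to check numerically: samples from a d-dimensional standard Gaussian have norms tightly clustered around $\sqrt{d}$, so in high dimensions they lie close to a sphere of that radius. The sketch below only demonstrates this property; it does not implement DSG's guidance step or its closed-form solution.

```python
import numpy as np

def norm_ratio_stats(d: int, n_samples: int = 100, seed: int = 0):
    """Mean and standard deviation of ||z|| / sqrt(d) for z ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, d))
    ratios = np.linalg.norm(z, axis=1) / np.sqrt(d)
    return float(ratios.mean()), float(ratios.std())

for d in (10, 1_000, 50_000):
    mean, std = norm_ratio_stats(d)
    print(f"d={d:>6}: mean ratio={mean:.3f}, std={std:.4f}")
```

The spread of the ratio shrinks roughly like $1/\sqrt{2d}$, which is why a constraint to a thin spherical shell loses little probability mass in high dimensions.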
[LINK]
http://arxiv.org/abs/2402.03201v2
[DATE]
2024-04-14 15:28:32+08:00
[CATEGORIES]
cs.LG
Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning
[AUTHORS]
Chenyu Zhang, Han Wang, Aritra Mitra, James Anderson
[ABSTRACT]
Federated reinforcement learning (FRL) has emerged as a promising paradigm
for reducing the sample complexity of reinforcement learning tasks by
exploiting information from different agents. However, when each agent
interacts with a potentially different environment, little to nothing is known
theoretically about the non-asymptotic performance of FRL algorithms. The lack
of such results can be attributed to various technical challenges and their
intricate interplay: Markovian sampling, linear function approximation,
multiple local updates to save communication, heterogeneity in the reward
functions and transition kernels of the agents’ MDPs, and continuous
state-action spaces. Moreover, in the on-policy setting, the behavior policies
vary with time, further complicating the analysis. In response, we introduce
FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped
with linear function approximation, to address these challenges and provide a
comprehensive finite-time error analysis. Notably, we establish that FedSARSA
converges to a policy that is near-optimal for all agents, with the extent of
near-optimality proportional to the level of heterogeneity. Furthermore, we
prove that FedSARSA leverages agent collaboration to enable linear speedups as
the number of agents increases, which holds for both fixed and adaptive
step-size configurations.
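A rough sketch of one communication round in federated linear SARSA follows, assuming each agent runs local SARSA(0) updates on its own transitions and the server then averages the parameter vectors. Features, step size, and discount are illustrative; the exact FedSARSA update, its step-size schedules, and the on-policy action selection are not reproduced here (the next state-action feature is simply given).

```python
import numpy as np

def local_sarsa(theta, transitions, alpha, gamma):
    """Run linear SARSA(0) updates on one agent's transitions.

    transitions: iterable of (phi_sa, reward, phi_next_sa), where phi_next_sa
    is the feature of the next state-action pair chosen by the agent's
    current behaviour policy.
    """
    theta = theta.copy()
    for phi, r, phi_next in transitions:
        td_error = r + gamma * phi_next @ theta - phi @ theta
        theta += alpha * td_error * phi
    return theta

def fed_round(theta, agent_transitions, alpha=0.1, gamma=0.9):
    """One communication round: local updates on every agent, then averaging."""
    local = [local_sarsa(theta, tr, alpha, gamma) for tr in agent_transitions]
    return np.mean(local, axis=0)

# Two heterogeneous agents that differ only in reward (1 versus 3).
agents = [[(np.array([1.0]), 1.0, np.array([1.0]))],
          [(np.array([1.0]), 3.0, np.array([1.0]))]]
theta = np.zeros(1)
for _ in range(2000):
    theta = fed_round(theta, agents)
```

In this toy case the averaged iterate settles at the value for the mean reward, 2 / (1 - 0.9) = 20, a simple view of how heterogeneity shifts the common solution away from each agent's own fixed point.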
[COMMENTS]
Published as a conference paper at ICLR 2024
[LINK]
http://arxiv.org/abs/2401.15273v2
[DATE]
2024-04-14 15:17:28+08:00
[CATEGORIES]
cs.LG
A convergence result of a continuous model of deep learning via Łojasiewicz–Simon inequality
[AUTHORS]
Noboru Isobe
[ABSTRACT]
This study focuses on a Wasserstein-type gradient flow, which represents an
optimization process of a continuous model of a Deep Neural Network (DNN).
First, we establish the existence of a minimizer for an average loss of the
model under $L^2$-regularization. Subsequently, we show the existence of a
curve of maximal slope of the loss. Our main result is the convergence of the flow
to a critical point of the loss as time goes to infinity. An essential aspect
of proving this result involves the establishment of the \L{}ojasiewicz–Simon
gradient inequality for the loss. We derive this inequality by assuming the
analyticity of NNs and loss functions. Our proofs offer a new approach for
analyzing the asymptotic behavior of Wasserstein-type gradient flows for
nonconvex functionals.
[COMMENTS]
31 pages, fix the title
[LINK]
http://arxiv.org/abs/2311.15365v2
[DATE]
2024-04-14 13:39:11+08:00
[CATEGORIES]
cs.LG
RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion
[AUTHORS]
Guoxuan Chi, Zheng Yang, Chenshu Wu, Jingao Xu, Yuchong Gao, Yunhao Liu, Tony Xiao Han
[ABSTRACT]
As AIGC (AI-generated content) shines in CV and NLP, its potential in the
wireless domain has also emerged in recent years. Yet, existing RF-oriented generative
solutions are ill-suited for generating high-quality, time-series RF data due
to limited representation capabilities. In this work, inspired by the stellar
achievements of the diffusion model in CV and NLP, we adapt it to the RF domain
and propose RF-Diffusion. To accommodate the unique characteristics of RF
signals, we first introduce a novel Time-Frequency Diffusion theory to enhance
the original diffusion model, enabling it to tap into the information within
the time, frequency, and complex-valued domains of RF signals. On this basis,
we propose a Hierarchical Diffusion Transformer to translate the theory into a
practical generative DNN through elaborated design spanning network
architecture, functional block, and complex-valued operator, making
RF-Diffusion a versatile solution to generate diverse, high-quality, and
time-series RF data. Performance comparison with three prevalent generative
models demonstrates RF-Diffusion’s superior performance in synthesizing
Wi-Fi and FMCW signals. We also showcase the versatility of RF-Diffusion in
boosting Wi-Fi sensing systems and performing channel estimation in 5G
networks.
[COMMENTS]
Accepted by MobiCom 2024
[LINK]
http://arxiv.org/abs/2404.09140v1
[DATE]
2024-04-14 12:56:05+08:00
[CATEGORIES]
cs.LG
Interactive Generative AI Agents for Satellite Networks through a Mixture of Experts Transmission
[AUTHORS]
Ruichen Zhang, Hongyang Du, Yinqiu Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Dong In Kim
[ABSTRACT]
In response to the needs of 6G global communications, satellite communication
networks have emerged as a key solution. However, the large-scale development
of satellite communication networks is constrained by the complex system
models, whose modeling is challenging for massive users. Moreover, transmission
interference between satellites and users seriously affects communication
performance. To solve these problems, this paper develops generative artificial
intelligence (AI) agents for model formulation and then applies a mixture of
experts (MoE) approach to design transmission strategies. Specifically, we
leverage large language models (LLMs) to build an interactive modeling paradigm
and utilize retrieval-augmented generation (RAG) to extract satellite expert
knowledge that supports mathematical modeling. Afterward, by integrating the
expertise of multiple specialized components, we propose an MoE-proximal policy
optimization (PPO) approach to solve the formulated problem. Each expert can
optimize the optimization variables at which it excels through specialized
training through its own network and then aggregates them through the gating
network to perform joint optimization. The simulation results validate the
accuracy and effectiveness of employing a generative agent for problem
formulation. Furthermore, the superiority of the proposed MoE-PPO approach over
other benchmarks is confirmed in solving the formulated problem. The
adaptability of MoE-PPO to various customized modeling problems has also been
demonstrated.
[COMMENTS]
13 pages, 9 figures
[LINK]
http://arxiv.org/abs/2404.09134v1
[DATE]
2024-04-14 11:44:54+08:00
[CATEGORIES]
cs.LG
Retro-fallback: retrosynthetic planning in an uncertain world
[AUTHORS]
Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato
[ABSTRACT]
Retrosynthesis is the task of planning a series of chemical reactions to
create a desired molecule from simpler, buyable molecules. While previous works
have proposed algorithms to find optimal solutions for a range of metrics (e.g.
shortest, lowest-cost), these works generally overlook the fact that we have
imperfect knowledge of the space of possible reactions, meaning plans created
by algorithms may not work in a laboratory. In this paper we propose a novel
formulation of retrosynthesis in terms of stochastic processes to account for
this uncertainty. We then propose a novel greedy algorithm called
retro-fallback which maximizes the probability that at least one synthesis plan
can be executed in the lab. Using in-silico benchmarks we demonstrate that
retro-fallback generally produces better sets of synthesis plans than the
popular MCTS and retro* algorithms.
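The objective retro-fallback maximizes, the probability that at least one synthesis plan can be executed, can be illustrated with a toy calculation. The sketch below is not the retro-fallback algorithm itself: it assumes each reaction is independently feasible with a known probability (a simplification; the paper's stochastic-process formulation is not detailed in this listing), treats a plan as the set of reactions it needs, and computes the success probability exactly by enumerating every feasibility outcome. The function name `prob_any_plan_succeeds` is hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List


def prob_any_plan_succeeds(
    reaction_probs: Dict[str, float], plans: List[FrozenSet[str]]
) -> float:
    """Exact P(at least one plan is fully feasible), assuming reactions
    succeed independently. Cost is O(2^R) in the number of reactions R,
    so this is only meant for toy examples."""
    reactions = sorted(reaction_probs)
    total = 0.0
    for outcome in product([False, True], repeat=len(reactions)):
        # Probability of this particular feasible/infeasible assignment.
        weight = 1.0
        for r, ok in zip(reactions, outcome):
            p = reaction_probs[r]
            weight *= p if ok else 1.0 - p
        feasible = {r for r, ok in zip(reactions, outcome) if ok}
        # A plan works only if every reaction it uses is feasible.
        if any(plan <= feasible for plan in plans):
            total += weight
    return total


# Two plans that share reaction "r1".
probs = {"r1": 0.5, "r2": 0.5, "r3": 0.5}
plans = [frozenset({"r1", "r2"}), frozenset({"r1", "r3"})]
print(prob_any_plan_succeeds(probs, plans))  # 0.375
```

Treating the two plans as independent would give $1 - (1 - 0.25)^2 = 0.4375$; the shared reaction lowers the true value to 0.375, which is one reason a set of plans is better scored jointly than plan by plan.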
[COMMENTS]
ICLR 2024 camera ready version
(https://openreview.net/forum?id=dl0u4ODCuW). 58 pages total. Code available
at: https://github.com/AustinT/retro-fallback-iclr24. This version has 1)
updated writing 2) updated figures 3) additional experimental results 4) more
complete explanation of AND/OR graphs in the appendices 5) corrected typos and
an error in the Fig. G.5 caption
[LINK]
http://arxiv.org/abs/2310.09270v3
[DATE]
2024-04-14 10:50:35+08:00
[CATEGORIES]
cs.LG
Intelligent Chemical Purification Technique Based on Machine Learning
[AUTHORS]
Wenchao Wu, Hao Xu, Dongxiao Zhang, Fanyang Mo
[ABSTRACT]
We present an innovative integration of artificial intelligence with column
chromatography, aiming to resolve inefficiencies and standardize data
collection in the chemical separation and purification domain. By developing an
automated platform for precise data acquisition and employing advanced machine
learning algorithms, we constructed predictive models to forecast key
separation parameters, thereby enhancing the efficiency and quality of
chromatographic processes. The application of transfer learning allows the
model to adapt across various column specifications, broadening its utility. A
novel metric, separation probability ($S_p$), quantifies the likelihood of
effective compound separation and is validated experimentally.
This study represents a significant step forward in the application of AI in
chemical research, offering a scalable solution to traditional chromatography
challenges and providing a foundation for future technological advancements in
chemical analysis and purification.
[COMMENTS]
22 pages, 5 Figures, Submitted to Nature Machine Intelligence
[LINK]
http://arxiv.org/abs/2404.09114v1
[DATE]
2024-04-14 09:44:58+08:00
[CATEGORIES]
cs.LG
Extending Mean-Field Variational Inference via Entropic Regularization: Theory and Computation
[AUTHORS]
Bohan Wu, David Blei
[ABSTRACT]
Variational inference (VI) has emerged as a popular method for approximate
inference for high-dimensional Bayesian models. In this paper, we propose a
novel VI method that extends the naive mean field via entropic regularization,
referred to as $\Xi$-variational inference ($\Xi$-VI). $\Xi$-VI has a close
connection to the entropic optimal transport problem and benefits from the
computationally efficient Sinkhorn algorithm. We show that $\Xi$-variational
posteriors effectively recover the true posterior dependency, where the
dependence is downweighted by the regularization parameter. We analyze the role
of dimensionality of the parameter space on the accuracy of $\Xi$-variational
approximation and how it affects computational considerations, providing a
rough characterization of the statistical-computational trade-off in $\Xi$-VI.
We also investigate the frequentist properties of $\Xi$-VI and establish
results on consistency, asymptotic normality, high-dimensional asymptotics, and
algorithmic stability. We provide sufficient criteria for achieving
polynomial-time approximate inference using the method. Finally, we demonstrate
the practical advantage of $\Xi$-VI over mean-field variational inference on
simulated and real data.
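The abstract ties $\Xi$-VI to entropic optimal transport and the Sinkhorn algorithm. For readers unfamiliar with the latter, here is a generic sketch of Sinkhorn scaling between two discrete marginals `a` and `b` with cost matrix `cost` and regularization strength `eps`. It is not the $\Xi$-VI procedure, and all names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, n_iters: int = 500) -> Matrix:
    """Entropic OT coupling P = diag(u) K diag(v) with K = exp(-cost / eps),
    found by alternately rescaling rows to match `a` and columns to match `b`."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5], eps=0.5)
print([round(sum(row), 6) for row in P])  # row sums match a: [0.5, 0.5]
```

Small `eps` pushes the coupling toward the unregularized optimal transport plan, while large `eps` pushes it toward the independent coupling $ab^\top$; for very small `eps`, a log-domain implementation is usually needed to avoid underflow in `K`.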
[LINK]
http://arxiv.org/abs/2404.09113v1
[DATE]
2024-04-14 09:40:11+08:00
[CATEGORIES]
cs.LG
Quantum Machine Learning with HQC Architectures using non-Classically Simulable Feature Maps
[AUTHORS]
Syed Farhan Ahmad, Raghav Rawat, Minal Moharir
[ABSTRACT]
Hybrid Quantum-Classical (HQC) Architectures are used in near-term NISQ
Quantum Computers for solving Quantum Machine Learning problems. The quantum
advantage comes into the picture due to the exponential speedup offered over
classical computing. One of the major challenges in implementing such
algorithms is the choice of quantum embeddings and the use of a functionally
correct quantum variational circuit. In this paper, we present an application
of QSVM (Quantum Support Vector Machines) to predict if a person will require
mental health treatment in the tech world in the future using the dataset from
OSMI Mental Health Tech Surveys. We achieve this with non-classically simulable
feature maps and prove that NISQ HQC Architectures for Quantum Machine Learning
can be used as an alternative to create well-performing models in near-term
real-world applications.
[COMMENTS]
The results from actual hardware are not performant enough and do
not match those of the simulator. Moreover, hyperparameters are not
considered
[LINK]
http://arxiv.org/abs/2103.11381v2
[DATE]
2024-04-14 09:07:38+08:00
[CATEGORIES]
cs.LG
Mixture of Experts Soften the Curse of Dimensionality in Operator Learning
[AUTHORS]
Anastasis Kratsios, Takashi Furuya, J. Antonio Lara B., Matti Lassas, Maarten de Hoop
[ABSTRACT]
In this paper, we construct a mixture of neural operators (MoNOs) between
function spaces whose complexity is distributed over a network of expert neural
operators (NOs), with each NO satisfying parameter scaling restrictions. Our
main result is a \textit{distributed} universal approximation theorem
guaranteeing that any Lipschitz non-linear operator between $L^2([0,1]^d)$
spaces can be approximated uniformly over the Sobolev unit ball therein, to any
given $\varepsilon>0$ accuracy, by an MoNO while satisfying the constraint
that: each expert NO has a depth, width, and rank of
$\mathcal{O}(\varepsilon^{-1})$. Naturally, our result implies that the
required number of experts must be large; however, each NO is guaranteed to be
small enough to be loadable into the active memory of most computers for
reasonable accuracies $\varepsilon$. During our analysis, we also obtain new
quantitative expression rates for classical NOs approximating uniformly
continuous non-linear operators uniformly on compact subsets of $L^2([0,1]^d)$.
[LINK]
http://arxiv.org/abs/2404.09101v1
[DATE]
2024-04-14 07:20:16+08:00
[CATEGORIES]
cs.LG
Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models
[AUTHORS]
Zeyu Zhou, Ruqi Bai, Sean Kulinski, Murat Kocaoglu, David I. Inouye
[ABSTRACT]
Answering counterfactual queries has important applications such as
explainability, robustness, and fairness but is challenging when the causal
variables are unobserved and the observations are non-linear mixtures of these
latent variables, such as pixels in images. One approach is to recover the
latent Structural Causal Model (SCM), which may be infeasible in practice due
to requiring strong assumptions, e.g., linearity of the causal mechanisms or
perfect atomic interventions. Meanwhile, more practical ML-based approaches
using naive domain translation models to generate counterfactual samples lack
theoretical grounding and may construct invalid counterfactuals. In this work,
we strive to strike a balance between practicality and theoretical guarantees
by analyzing a specific type of causal query called domain counterfactuals,
which hypothesizes what a sample would have looked like if it had been
generated in a different domain (or environment). We show that recovering the
latent SCM is unnecessary for estimating domain counterfactuals, thereby
sidestepping some of the theoretical challenges. By assuming invertibility and
sparsity of intervention, we prove domain counterfactual estimation error can
be bounded by a data fit term and intervention sparsity term. Building upon our
theoretical results, we develop a theoretically grounded practical algorithm
that simplifies the modeling process to generative model estimation under
autoregressive and shared parameter constraints that enforce intervention
sparsity. Finally, we show an improvement in counterfactual estimation over
baseline methods through extensive simulated and image-based experiments.
[COMMENTS]
In ICLR 2024
[LINK]
http://arxiv.org/abs/2306.11281v3
[DATE]
2024-04-14 05:52:38+08:00
[CATEGORIES]
cs.LG
Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming
[AUTHORS]
Sen Na, Michael W. Mahoney
[ABSTRACT]
We consider online statistical inference of constrained stochastic nonlinear
optimization problems. We apply the Stochastic Sequential Quadratic Programming
(StoSQP) method to solve these problems, which can be regarded as applying
second-order Newton’s method to the Karush-Kuhn-Tucker (KKT) conditions. In
each iteration, the StoSQP method computes the Newton direction by solving a
quadratic program, and then selects a proper adaptive stepsize $\bar{\alpha}_t$
to update the primal-dual iterate. To reduce the dominant computational cost of the
method, we inexactly solve the quadratic program in each iteration by employing
an iterative sketching solver. Notably, the approximation error of the
sketching solver need not vanish as iterations proceed, meaning that the
per-iteration computational cost does not blow up. For the above StoSQP method,
we show that under mild assumptions, the rescaled primal-dual sequence
$1/\sqrt{\bar{\alpha}_t}\cdot (x_t - x^\star, \lambda_t - \lambda^\star)$
converges to a mean-zero Gaussian distribution with a nontrivial covariance
matrix depending on the underlying sketching distribution. To perform inference
in practice, we also analyze a plug-in covariance matrix estimator. We
illustrate the asymptotic normality result of the method both on benchmark
nonlinear problems in the CUTEst test set and on linearly/nonlinearly constrained
regression problems.
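To make the KKT-based step concrete: for $\min_x f(x)$ s.t. $c(x) = 0$ with Lagrangian $L(x, \lambda) = f(x) + \lambda^\top c(x)$, one SQP (Newton-on-KKT) step solves $\begin{bmatrix} H & J^\top \\ J & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta\lambda \end{bmatrix} = -\begin{bmatrix} \nabla_x L \\ c \end{bmatrix}$ and moves $(x, \lambda)$ by a stepsize $\alpha$. The sketch below is deterministic and uses an exact dense solve; StoSQP instead uses stochastic estimates, an adaptive stepsize $\bar{\alpha}_t$, and an inexact sketching solver, none of which are reproduced here. All function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Dense Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def sqp_step(x, lam, grad_f, hess_lag, c, jac_c, alpha=1.0):
    """One Newton step on the KKT conditions of min f(x) s.t. c(x) = 0."""
    g, H, cv, J = grad_f(x), hess_lag(x, lam), c(x), jac_c(x)
    n, m = len(x), len(cv)
    # KKT matrix [[H, J^T], [J, 0]].
    kkt = [H[i] + [J[k][i] for k in range(m)] for i in range(n)]
    kkt += [J[k] + [0.0] * m for k in range(m)]
    grad_lag = [g[i] + sum(J[k][i] * lam[k] for k in range(m)) for i in range(n)]
    d = solve(kkt, [-v for v in grad_lag] + [-v for v in cv])
    x_new = [x[i] + alpha * d[i] for i in range(n)]
    lam_new = [lam[k] + alpha * d[n + k] for k in range(m)]
    return x_new, lam_new


# min x0^2 + x1^2  s.t.  x0 + x1 - 1 = 0  (solution x = (0.5, 0.5), lambda = -1)
x, lam = sqp_step(
    [0.0, 0.0], [0.0],
    grad_f=lambda x: [2 * x[0], 2 * x[1]],
    hess_lag=lambda x, lam: [[2.0, 0.0], [0.0, 2.0]],
    c=lambda x: [x[0] + x[1] - 1.0],
    jac_c=lambda x: [[1.0, 1.0]],
)
print(x, lam)
```

Because the example problem is a quadratic with a linear constraint, a single unit step lands exactly on the KKT point; on genuinely nonlinear problems the step is repeated.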
[COMMENTS]
59 pages, 3 figures, 11 tables
[LINK]
http://arxiv.org/abs/2205.13687v4
[DATE]
2024-04-14 05:08:29+08:00
[CATEGORIES]
cs.LG
Probabilistic Directed Distance Fields for Ray-Based Shape Representations
[AUTHORS]
Tristan Aumentado-Armstrong, Stavros Tsogkas, Sven Dickinson, Allan Jepson
[ABSTRACT]
In modern computer vision, the optimal representation of 3D shape continues
to be task-dependent. One fundamental operation applied to such representations
is differentiable rendering, as it enables inverse graphics approaches in
learning frameworks. Standard explicit shape representations (voxels, point
clouds, or meshes) are often easily rendered, but can suffer from limited
geometric fidelity, among other issues. On the other hand, implicit
representations (occupancy, distance, or radiance fields) preserve greater
fidelity, but suffer from complex or inefficient rendering processes, limiting
scalability. In this work, we devise Directed Distance Fields (DDFs), a novel
neural shape representation that builds upon classical distance fields. The
fundamental operation in a DDF maps an oriented point (position and direction)
to surface visibility and depth. This enables efficient differentiable
rendering, obtaining depth with a single forward pass per pixel, as well as
differential geometric quantity extraction (e.g., surface normals), with only
additional backward passes. Using probabilistic DDFs (PDDFs), we show how to
model inherent discontinuities in the underlying field. We then apply DDFs to
several applications, including single-shape fitting, generative modelling, and
single-image 3D reconstruction, showcasing strong performance with simple
architectural components via the versatility of our representation. Finally,
since the dimensionality of DDFs permits view-dependent geometric artifacts, we
conduct a theoretical investigation of the constraints necessary for view
consistency. We find a small set of field properties that are sufficient to
guarantee a DDF is consistent, without knowing, for instance, which shape the
field is expressing.
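A DDF maps an oriented point (position, direction) to surface visibility and depth. For a single sphere, that mapping has a closed form, which makes a convenient sanity check for what a learned DDF should output. The function below is a hypothetical analytic example, not the neural PDDF from the paper; it solves the ray-sphere intersection $|p + t d - c|^2 = r^2$ for the smallest positive $t$.

```python
import math
from typing import Optional, Sequence, Tuple


def sphere_ddf(p: Sequence[float], d: Sequence[float],
               center: Sequence[float] = (0.0, 0.0, 0.0),
               radius: float = 1.0) -> Tuple[bool, Optional[float]]:
    """Ground-truth directed distance for a sphere.

    Returns (visible, depth): whether the ray p + t*d (t > 0) hits the sphere,
    and the distance t to the first hit along the normalized direction d.
    """
    norm = math.sqrt(sum(di * di for di in d))
    d = [di / norm for di in d]
    oc = [pi - ci for pi, ci in zip(p, center)]
    b = sum(di * oi for di, oi in zip(d, oc))        # half the linear coefficient
    c = sum(oi * oi for oi in oc) - radius * radius  # > 0 when p is outside
    disc = b * b - c
    if disc < 0.0:
        return False, None                           # ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):                 # nearest intersection first
        if t > 0.0:
            return True, t
    return False, None                               # sphere lies behind the point


print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))   # (True, 2.0)
print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, -1.0)))  # (False, None)
```

The jump from "visible" to "not visible" as the direction sweeps past the sphere's silhouette is the kind of discontinuity the abstract says PDDFs are designed to model.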
[COMMENTS]
Extension of arXiv:2112.05300
[LINK]
http://arxiv.org/abs/2404.09081v1
[DATE]
2024-04-14 05:02:49+08:00
[CATEGORIES]
cs.LG
Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications
[AUTHORS]
Puze Liu, Haitham Bou-Ammar, Jan Peters, Davide Tateo
[ABSTRACT]
Integrating learning-based techniques, especially reinforcement learning,
into robotics is promising for solving complex problems in unstructured
environments. However, most existing approaches are trained in well-tuned
simulators and subsequently deployed on real robots without online fine-tuning.
In this setting, the simulation’s realism seriously impacts the deployment’s
success rate. Instead, learning with real-world interaction data offers a
promising alternative: it not only eliminates the need for a fine-tuned simulator
but also applies to a broader range of tasks where accurate modeling is
infeasible. One major problem for on-robot reinforcement learning is ensuring
safety, as uncontrolled exploration can cause catastrophic damage to the robot
or the environment. Indeed, safety specifications, often represented as
constraints, can be complex and non-linear, making safety challenging to
guarantee in learning systems. In this paper, we show how we can impose complex
safety constraints on learning-based robotics systems in a principled manner,
both from theoretical and practical points of view. Our approach is based on
the concept of the Constraint Manifold, representing the set of safe robot
configurations. Exploiting differential geometry techniques, i.e., the tangent
space, we can construct a safe action space, allowing learning agents to sample
arbitrary actions while ensuring safety. We demonstrate the method’s
effectiveness in a real-world Robot Air Hockey task, showing that our method
can handle high-dimensional tasks with complex constraints. Videos of the real
robot experiments are available on the project website
(https://puzeliu.github.io/TRO-ATACOM).
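The tangent-space idea can be illustrated for a single equality constraint $c(q) = 0$: projecting an arbitrary action onto the null space of the constraint Jacobian $J$ gives a velocity that keeps $c(q)$ constant to first order. This is a simplified sketch that assumes one constraint and uses orthogonal projection; the paper's actual construction (including how it handles inequality constraints and drift off the manifold) is not detailed in this listing, and the function name is hypothetical.

```python
from typing import List


def project_to_tangent(action: List[float], jac: List[float]) -> List[float]:
    """Remove the component of `action` along the constraint gradient `jac`,
    i.e. a - (J a / J J^T) J^T for a single constraint row J."""
    jj = sum(g * g for g in jac)
    if jj == 0.0:
        return list(action)  # degenerate Jacobian: nothing to remove
    coef = sum(g * a for g, a in zip(jac, action)) / jj
    return [a - coef * g for a, g in zip(action, jac)]


# Constraint c(q) = q0^2 + q1^2 - 1 (unit circle); its Jacobian at q is [2 q0, 2 q1].
q = [1.0, 0.0]
jac = [2.0 * q[0], 2.0 * q[1]]
safe = project_to_tangent([0.3, 0.5], jac)
print(safe)  # [0.0, 0.5]
```

Any action the agent samples is mapped to a direction along the circle, so exploration never pushes the state straight off the constraint to first order.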
[COMMENTS]
19 pages; submitted to IEEE Transactions on Robotics
[LINK]
http://arxiv.org/abs/2404.09080v1
[DATE]
2024-04-14 04:55:15+08:00
[CATEGORIES]
cs.LG
When and How: Learning Identifiable Latent States for Nonstationary Time Series Forecasting
[AUTHORS]
Zijian Li, Ruichu Cai, Zhenhui Yang, Haiqin Huang, Guangyi Chen, Yifan Shen, Zhengming Chen, Xiangchen Song, Zhifeng Hao, Kun Zhang
[ABSTRACT]
Temporal distribution shifts are ubiquitous in time series data. One of the
most popular methods assumes that the temporal distribution shift occurs
uniformly to disentangle the stationary and nonstationary dependencies. But
this assumption is difficult to meet, as we do not know when the distribution
shifts occur. To solve this problem, we propose to learn IDentifiable latEnt
stAtes (IDEA) to detect when the distribution shifts occur. Beyond that, we
further disentangle the stationary and nonstationary latent states via
sufficient observation assumption to learn how the latent states change.
Specifically, we formalize the causal process with environment-irrelevant
stationary and environment-related nonstationary variables. Under mild
conditions, we show that latent environments and stationary/nonstationary
variables are identifiable. Based on these theories, we devise the IDEA model,
which incorporates an autoregressive hidden Markov model to estimate latent
environments and modular prior networks to identify latent states. The IDEA
model outperforms several recent nonstationary forecasting methods on various
benchmark datasets, highlighting its advantages in real-world scenarios.
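The abstract mentions an autoregressive hidden Markov model for estimating latent environments. As a rough illustration of the underlying idea only, the sketch below runs standard (non-autoregressive) HMM forward filtering with fixed, hand-chosen transition probabilities and emission likelihoods. In IDEA these components are learned and the emissions also depend on past observations, which is not modeled here; the function name and inputs are hypothetical.

```python
from typing import List


def hmm_filter(init: List[float], trans: List[List[float]],
               emission_lik: List[List[float]]) -> List[List[float]]:
    """Forward filtering: returns P(e_t = k | x_1..x_t) for every step t.

    init[k]            prior P(e_1 = k)
    trans[i][j]        P(e_t = j | e_{t-1} = i)
    emission_lik[t][k] likelihood p(x_t | e_t = k)
    """
    K = len(init)
    alpha = [init[k] * emission_lik[0][k] for k in range(K)]
    z = sum(alpha)
    alpha = [a / z for a in alpha]
    posteriors = [alpha]
    for lik in emission_lik[1:]:
        # Predict with the transition matrix, then correct with the new likelihood.
        pred = [sum(alpha[i] * trans[i][j] for i in range(K)) for j in range(K)]
        alpha = [pred[j] * lik[j] for j in range(K)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        posteriors.append(alpha)
    return posteriors


post = hmm_filter([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                  [[0.5, 0.5], [0.1, 0.9], [0.1, 0.9]])
print([round(p[1], 3) for p in post])  # [0.5, 0.9, 0.976]
```

The posterior for environment 1 rises as consecutive observations favor it, which is how a filter of this kind can signal *when* a distribution shift has occurred.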
[LINK]
http://arxiv.org/abs/2402.12767v2
[DATE]
2024-04-14 04:03:26+08:00
[CATEGORIES]
cs.LG
Improving Convergence and Generalization Using Parameter Symmetries
[AUTHORS]
Bo Zhao, Robert M. Gower, Robin Walters, Rose Yu
[COMMENTS]
28 pages, 13 figures, ICLR 2024
[LINK]
http://arxiv.org/abs/2305.13404v3
[DATE]
2024-04-14 02:28:52+08:00
[CATEGORIES]
cs.LG
Scalable Spatiotemporally Varying Coefficient Modelling with Bayesian Kernelized Tensor Regression
[AUTHORS]
Mengying Lei, Aurelie Labbe, Lijun Sun
[ABSTRACT]
As a regression technique in spatial statistics, the spatiotemporally varying
coefficient model (STVC) is an important tool for discovering nonstationary and
interpretable response-covariate associations over both space and time.
However, it is difficult to apply STVC for large-scale spatiotemporal analyses
due to its high computational cost. To address this challenge, we summarize the
spatiotemporally varying coefficients using a third-order tensor structure and
propose to reformulate the spatiotemporally varying coefficient model as a
special low-rank tensor regression problem. The low-rank decomposition can
effectively model the global patterns of large data sets with a substantially
reduced number of parameters. To further incorporate the local spatiotemporal
dependencies, we use Gaussian process (GP) priors on the spatial and temporal
factor matrices. We refer to the overall framework as Bayesian Kernelized
Tensor Regression (BKTR), and kernelized tensor factorization can be considered
a new and scalable approach to modeling multivariate spatiotemporal processes
with a low-rank covariance structure. For model inference, we develop an
efficient Markov chain Monte Carlo (MCMC) algorithm, which uses Gibbs sampling
to update factor matrices and slice sampling to update kernel hyperparameters.
We conduct extensive experiments on both synthetic and real-world data sets,
and our results confirm the superior performance and efficiency of BKTR for
model estimation and parameter inference.
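The low-rank structure can be seen in a few lines. Assuming a CP (canonical polyadic) decomposition of rank $R$, the $S \times T \times K$ coefficient tensor is stored as three factor matrices $U$ ($S \times R$), $V$ ($T \times R$), $W$ ($K \times R$), with $\beta_{s,t,k} = \sum_r U_{s,r} V_{t,r} W_{k,r}$, so the parameter count drops from $STK$ to $R(S + T + K)$. This sketch shows only the reconstruction and the resulting prediction; the GP priors on the factors and the MCMC inference are omitted, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cp_reconstruct(U: Matrix, V: Matrix, W: Matrix) -> List[List[List[float]]]:
    """beta[s][t][k] = sum_r U[s][r] * V[t][r] * W[k][r] (rank R = number of columns)."""
    R = len(U[0])
    return [[[sum(U[s][r] * V[t][r] * W[k][r] for r in range(R))
              for k in range(len(W))]
             for t in range(len(V))]
            for s in range(len(U))]


def predict(X, beta):
    """y[s][t] = sum_k X[s][t][k] * beta[s][t][k]: covariates times local coefficients."""
    return [[sum(x * b for x, b in zip(X[s][t], beta[s][t]))
             for t in range(len(beta[s]))]
            for s in range(len(beta))]


U = [[1.0], [2.0]]  # 2 locations, rank 1
V = [[3.0], [4.0]]  # 2 time points
W = [[1.0], [0.5]]  # 2 covariates
beta = cp_reconstruct(U, V, W)
X = [[[1.0, 1.0] for _ in range(2)] for _ in range(2)]
print(predict(X, beta))  # [[4.5, 6.0], [9.0, 12.0]]
```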
[LINK]
http://arxiv.org/abs/2109.00046v4
[DATE]
2024-04-14 02:25:28+08:00
[CATEGORIES]
cs.LG
Tackling Structural Hallucination in Image Translation with Local Diffusion
[AUTHORS]
Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry F. J. Tregidgo, Asher Mullokandov, Philip Teare, Daniel C. Alexander
[ABSTRACT]
Recent developments in diffusion models have advanced conditioned image
generation, yet they struggle with reconstructing out-of-distribution (OOD)
images, such as unseen tumors in medical images, causing “image
hallucination” and risking misdiagnosis. We hypothesize such hallucinations
result from local OOD regions in the conditional images. We verify that
partitioning the OOD region and conducting separate image generations
alleviates hallucinations in several applications. From this, we propose a
training-free diffusion framework that reduces hallucination with multiple
Local Diffusion processes. Our approach involves OOD estimation followed by two
modules: a “branching” module generates locally both within and outside OOD
regions, and a “fusion” module integrates these predictions into one. Our
evaluation shows our method mitigates hallucination over baseline models
quantitatively and qualitatively, reducing misdiagnosis by 40% and 25% in the
real-world medical and natural image datasets, respectively. It also
demonstrates compatibility with various pre-trained diffusion models.
[LINK]
http://arxiv.org/abs/2404.05980v2
[DATE]
2024-04-14 02:10:00+08:00
[CATEGORIES]
cs.LG
Explainable Traffic Flow Prediction with Large Language Models
[AUTHORS]
Xusen Guo, Qiming Zhang, Junyue Jiang, Mingxing Peng, Meixin Zhu, Hao, Yang
[ABSTRACT]
Traffic flow prediction is crucial for intelligent transportation systems. It
has experienced significant advancements thanks to the power of deep learning
in capturing latent patterns of traffic data. However, recent deep-learning
architectures require intricate model designs and lack an intuitive
understanding of the mapping from input data to predicted results. Achieving
both accuracy and interpretability in traffic prediction models remains a
challenge due to the complexity of traffic data and the inherent opacity of
deep learning models. To tackle these challenges, we propose a novel approach,
Traffic Flow Prediction LLM (TF-LLM), which leverages large language models
(LLMs) to generate interpretable traffic flow predictions. By converting
multi-modal traffic data into natural language descriptions, TF-LLM captures
complex spatial-temporal patterns and external factors from comprehensive
traffic data. The LLM framework is fine-tuned using language-based instructions
to align with spatial-temporal traffic flow data. Empirically, TF-LLM shows
competitive accuracy compared with deep learning baselines, while providing
intuitive and interpretable predictions. We discuss the spatial-temporal and
input dependencies for explainable future flow forecasting, showcasing TF-LLM’s
potential for diverse city prediction tasks. This paper contributes to
advancing explainable traffic prediction models and lays a foundation for
future exploration of LLM applications in transportation. To the best of our
knowledge, this is the first study to use LLM for interpretable prediction of
traffic flow.
[COMMENTS]
27 pages, 8 figures
[LINK]
http://arxiv.org/abs/2404.02937v3
[DATE]
2024-04-14 00:12:53+08:00
[CATEGORIES]
cs.LG
Compressive Mahalanobis Metric Learning Adapts to Intrinsic Dimension
[AUTHORS]
Efstratios Palias, Ata Kabán
[ABSTRACT]
Metric learning aims at finding a suitable distance metric over the input
space, to improve the performance of distance-based learning algorithms. In
high-dimensional settings, it can also serve as dimensionality reduction by
imposing a low-rank restriction on the learnt metric. In this paper, we
consider the problem of learning a Mahalanobis metric, and instead of training
a low-rank metric on high-dimensional data, we use a randomly compressed
version of the data to train a full-rank metric in this reduced feature space.
We give theoretical guarantees on the error for Mahalanobis metric learning,
which depend on the stable dimension of the data support, but not on the
ambient dimension. Our bounds make no assumptions aside from i.i.d. data
sampling from a bounded support, and automatically tighten when benign
geometrical structures are present. An important ingredient is an extension of
Gordon’s theorem, which may be of independent interest. We also corroborate our
findings by numerical experiments.
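The compress-then-learn pipeline can be sketched as follows, assuming a Gaussian random projection $R$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries (a common compressor; the paper's exact choice is not stated in this listing). Points are mapped to $Rx$, and distances use a full-rank $k \times k$ metric $M$. How $M$ is learned is not shown: the diagonal `M` below is a stand-in, and all function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gaussian_projection(k: int, d: int, seed: int = 0) -> Matrix:
    """k x d random matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]


def project(R: Matrix, x: List[float]) -> List[float]:
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


def mahalanobis(u: List[float], v: List[float], M: Matrix) -> float:
    """sqrt((u - v)^T M (u - v)) for a positive semi-definite k x k matrix M."""
    diff = [a - b for a, b in zip(u, v)]
    m_diff = [sum(m * dj for m, dj in zip(row, diff)) for row in M]
    q = sum(di * mi for di, mi in zip(diff, m_diff))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off


d, k = 50, 5
R = gaussian_projection(k, d, seed=1)
x = [1.0] * d
y = [0.0] * d
M = [[2.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # stand-in metric
print(mahalanobis(project(R, x), project(R, y), M))
```

The metric lives in the $k$-dimensional compressed space, so its size is $k^2$ regardless of the ambient dimension $d$.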
[COMMENTS]
8 pages, 2 figures
[LINK]
http://arxiv.org/abs/2309.05751v3
[DATE]
2024-04-14 00:00:38+08:00
[CATEGORIES]
cs.LG
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
[AUTHORS]
Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang
[ABSTRACT]
Large language models like ChatGPT have shown substantial progress in natural
language understanding and generation, proving valuable across various
disciplines, including the medical field. Despite advancements, challenges
persist due to the complexity and diversity inherent in medical tasks which
often require multi-task learning capabilities. Previous approaches, although
beneficial, fall short in real-world applications because they necessitate
task-specific annotations at inference time, limiting broader generalization.
This paper introduces MING-MOE, a novel Mixture-of-Experts (MoE)-based medical
large language model designed to manage diverse and complex medical tasks
without requiring task-specific annotations, thus enhancing its usability
across extensive datasets. MING-MOE employs a Mixture of Low-Rank Adaptation
(MoLoRA) technique, allowing for efficient parameter usage by keeping base
model parameters frozen while adapting through a minimal set of trainable
parameters. We demonstrate that MING-MOE achieves state-of-the-art (SOTA)
performance on over 20 medical tasks, illustrating a significant improvement
over existing models. This approach not only extends the capabilities of
medical language models but also improves inference efficiency.
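A mixture of low-rank adapters can be sketched for a single linear layer: the frozen base weight $W_0$ is kept as-is, a router scores each expert, and the top-$k$ experts add gated low-rank updates, $y = W_0 x + s \sum_{i \in \text{top-}k} g_i B_i A_i x$. This is a generic illustration of the MoLoRA pattern under assumed choices (linear router, softmax renormalized over the selected experts, scale `scale`), not MING-MOE's actual implementation; all names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def molora_forward(x, w0, experts, router, top_k=1, scale=1.0):
    """y = W0 x + scale * sum_{i in top-k} g_i * B_i (A_i x).

    w0      frozen base weight (d_out x d_in), never updated
    experts list of (A_i, B_i) pairs: A_i is r x d_in, B_i is d_out x r
    router  one row of routing weights per expert (n_experts x d_in)
    """
    y = matvec(w0, x)
    logits = matvec(router, x)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalized over chosen experts
    for g, i in zip(gates, chosen):
        a, b = experts[i]
        delta = matvec(b, matvec(a, x))  # rank-r update B_i A_i x
        y = [yj + scale * g * dj for yj, dj in zip(y, delta)]
    return y


w0 = [[1.0, 0.0], [0.0, 1.0]]
experts = [([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0: rank-1 update on output 0
           ([[0.0, 1.0]], [[0.0], [1.0]])]  # expert 1: rank-1 update on output 1
router = [[1.0, 0.0], [0.0, 1.0]]
print(molora_forward([1.0, 2.0], w0, experts, router, top_k=1))  # [1.0, 4.0]
```

Only the adapters and router would be trained; `w0` is read but never modified, which is what keeps the trainable parameter count small.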
[COMMENTS]
15 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.09027v1
[DATE]
2024-04-13 23:28:52+08:00
[CATEGORIES]
cs.CL
Probing Large Language Models from A Human Behavioral Perspective
[AUTHORS]
Xintong Wang, Xiaoyu Li, Xingshan Li, Chris Biemann
[ABSTRACT]
Large Language Models (LLMs) have emerged as dominant foundational models in
modern NLP. However, the understanding of their prediction processes and
internal mechanisms, such as feed-forward networks (FFN) and multi-head
self-attention (MHSA), remains largely unexplored. In this work, we probe LLMs
from a human behavioral perspective, correlating values from LLMs with
eye-tracking measures, which are widely recognized as meaningful indicators of
human reading patterns. Our findings reveal that LLMs exhibit prediction
patterns similar to those of humans but distinct from those of Shallow Language
Models (SLMs). Moreover, from the middle layers onward, the correlation
coefficients in FFN and MHSA increase with layer depth, indicating
that the logits within FFN increasingly encapsulate word semantics suitable for
predicting tokens from the vocabulary.
[COMMENTS]
Accepted by LREC-COLING NeusymBridge 2024
[LINK]
http://arxiv.org/abs/2310.05216v2
[DATE]
2024-04-13 23:22:39+08:00
[CATEGORIES]
cs.CL
HyperCLOVA X Technical Report
[AUTHORS]
Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, Youngkyun Jin, Hyein Jun, Jaeseung Jung, Chanwoong Kim, Jinhong Kim, Jinuk Kim, Dokyeong Lee, Dongwook Park, Jeong Min Sohn, Sujung Han, Jiae Heo, Sungju Hong, Mina Jeon, Hyunhoon Jung, Jungeun Jung, Wangkyo Jung, Chungjoon Kim, Hyeri Kim, Jonghyun Kim, Min Young Kim, Soeun Lee, Joonhee Park, Jieun Shin, Sojin Yang, Jungsoon Yoon, Hwaran Lee, Sanghwan Bae, Jeehwan Cha, Karl Gylleus, Donghoon Ham, Mihak Hong, Youngki Hong, Yunki Hong, Dahyun Jang, Hyojun Jeon, Yujin Jeon, Yeji Jeong, Myunggeun Ji, Yeguk Jin, Chansong Jo, Shinyoung Joo, Seunghwan Jung, Adrian Jungmyung Kim, Byoung Hoon Kim, Hyomin Kim, Jungwhan Kim, Minkyoung Kim, Minseung Kim, Sungdong Kim, Yonghee Kim, Youngjun Kim, Youngkwan Kim, Donghyeon Ko, Dughyun Lee, Ha Young Lee, Jaehong Lee, Jieun Lee, Jonghyun Lee, Jongjin Lee, Min Young Lee, Yehbin Lee, Taehong Min, Yuri Min, Kiyoon Moon, Hyangnam Oh, Jaesun Park, Kyuyon Park, Younghun Park, Hanbae Seo, Seunghyun Seo, Mihyun Sim, Gyubin Son, Matt Yeo, Kyung Hoon Yeom, Wonjoon Yoo, Myungin You, Doheon Ahn, Homin Ahn, Joohee Ahn, Seongmin Ahn, Chanwoo An, Hyeryun An, Junho An, Sang-Min An, Boram Byun, Eunbin Byun, Jongho Cha, Minji Chang, Seunggyu Chang, Haesong Cho, Youngdo Cho, Dalnim Choi, Daseul Choi, Hyoseok Choi, Minseong Choi, Sangho Choi, Seongjae Choi, Wooyong Choi, Sewhan Chun, Dong Young Go, Chiheon Ham, Danbi Han, Jaemin Han, Moonyoung Hong, Sung Bum Hong, Dong-Hyun Hwang, Seongchan Hwang, Jinbae Im, Hyuk Jin Jang, Jaehyung Jang, Jaeni Jang, Sihyeon Jang, Sungwon Jang, Joonha Jeon, Daun Jeong, Joonhyun Jeong, Kyeongseok Jeong, Mini Jeong, Sol Jin, Hanbyeol Jo, Hanju Jo, Minjung Jo, Chaeyoon Jung, Hyungsik Jung, Jaeuk 
Jung, Ju Hwan Jung, Kwangsun Jung, Seungjae Jung, Soonwon Ka, Donghan Kang, Soyoung Kang, Taeho Kil, Areum Kim, Beomyoung Kim, Byeongwook Kim, Daehee Kim, Dong-Gyun Kim, Donggook Kim, Donghyun Kim, Euna Kim, Eunchul Kim, Geewook Kim, Gyu Ri Kim, Hanbyul Kim, Heesu Kim, Isaac Kim, Jeonghoon Kim, Jihye Kim, Joonghoon Kim, Minjae Kim, Minsub Kim, Pil Hwan Kim, Sammy Kim, Seokhun Kim, Seonghyeon Kim, Soojin Kim, Soong Kim, Soyoon Kim, Sunyoung Kim, Taeho Kim, Wonho Kim, Yoonsik Kim, You Jin Kim, Yuri Kim, Beomseok Kwon, Ohsung Kwon, Yoo-Hwan Kwon, Anna Lee, Byungwook Lee, Changho Lee, Daun Lee, Dongjae Lee, Ha-Ram Lee, Hodong Lee, Hwiyeong Lee, Hyunmi Lee, Injae Lee, Jaeung Lee, Jeongsang Lee, Jisoo Lee, Jongsoo Lee, Joongjae Lee, Juhan Lee, Jung Hyun Lee, Junghoon Lee, Junwoo Lee, Se Yun Lee, Sujin Lee, Sungjae Lee, Sungwoo Lee, Wonjae Lee, Zoo Hyun Lee, Jong Kun Lim, Kun Lim, Taemin Lim, Nuri Na, Jeongyeon Nam, Kyeong-Min Nam, Yeonseog Noh, Biro Oh, Jung-Sik Oh, Solgil Oh, Yeontaek Oh, Boyoun Park, Cheonbok Park, Dongju Park, Hyeonjin Park, Hyun Tae Park, Hyunjung Park, Jihye Park, Jooseok Park, Junghwan Park, Jungsoo Park, Miru Park, Sang Hee Park, Seunghyun Park, Soyoung Park, Taerim Park, Wonkyeong Park, Hyunjoon Ryu, Jeonghun Ryu, Nahyeon Ryu, Soonshin Seo, Suk Min Seo, Yoonjeong Shim, Kyuyong Shin, Wonkwang Shin, Hyun Sim, Woongseob Sim, Hyejin Soh, Bokyong Son, Hyunjun Son, Seulah Son, Chi-Yun Song, Chiyoung Song, Ka Yeon Song, Minchul Song, Seungmin Song, Jisung Wang, Yonggoo Yeo, Myeong Yeon Yi, Moon Bin Yim, Taehwan Yoo, Youngjoon Yoo, Sungmin Yoon, Young Jin Yoon, Hangyeol Yu, Ui Seon Yu, Xingdong Zuo, Jeongin Bae, Joungeun Bae, Hyunsoo Cho, Seonghyun Cho, Yongjin Cho, Taekyoon Choi, Yera Choi, Jiwan Chung, Zhenghui Han, Byeongho Heo, Euisuk Hong, Taebaek Hwang, Seonyeol Im, Sumin Jegal, Sumin Jeon, Yelim Jeong, Yonghyun Jeong, Can Jiang, Juyong Jiang, Jiho Jin, Ara Jo, Younghyun Jo, Hoyoun Jung, Juyoung Jung, Seunghyeong Kang, Dae Hee Kim, Ginam Kim, 
Hangyeol Kim, Heeseung Kim, Hyojin Kim, Hyojun Kim, Hyun-Ah Kim, Jeehye Kim, Jin-Hwa Kim, Jiseon Kim, Jonghak Kim, Jung Yoon Kim, Rak Yeong Kim, Seongjin Kim, Seoyoon Kim, Sewon Kim, Sooyoung Kim, Sukyoung Kim, Taeyong Kim, Naeun Ko, Bonseung Koo, Heeyoung Kwak, Haena Kwon, Youngjin Kwon, Boram Lee, Bruce W. Lee, Dagyeong Lee, Erin Lee, Euijin Lee, Ha Gyeong Lee, Hyojin Lee, Hyunjeong Lee, Jeeyoon Lee, Jeonghyun Lee, Jongheok Lee, Joonhyung Lee, Junhyuk Lee, Mingu Lee, Nayeon Lee, Sangkyu Lee, Se Young Lee, Seulgi Lee, Seung Jin Lee, Suhyeon Lee, Yeonjae Lee, Yesol Lee, Youngbeom Lee, Yujin Lee, Shaodong Li, Tianyu Liu, Seong-Eun Moon, Taehong Moon, Max-Lasse Nihlenramstroem, Wonseok Oh, Yuri Oh, Hongbeen Park, Hyekyung Park, Jaeho Park, Nohil Park, Sangjin Park, Jiwon Ryu, Miru Ryu, Simo Ryu, Ahreum Seo, Hee Seo, Kangdeok Seo, Jamin Shin, Seungyoun Shin, Heetae Sin, Jiangping Wang, Lei Wang, Ning Xiang, Longxiang Xiao, Jing Xu, Seonyeong Yi, Haanju Yoo, Haneul Yoo, Hwanhee Yoo, Liang Yu, Youngjae Yu, Weijie Yuan, Bo Zeng, Qian Zhou, Kyunghyun Cho, Jung-Woo Ha, Joonsuk Park, Jihyun Hwang, Hyoung Jo Kwon, Soonyong Kwon, Jungyeon Lee, Seungho Lee, Seonghyeon Lim, Hyunkyung Noh, Seungho Choi, Sang-Woo Lee, Jung Hwa Lim, Nako Sung
[ABSTRACT]
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored
to the Korean language and culture, along with competitive capabilities in
English, math, and coding. HyperCLOVA X was trained on a balanced mix of
Korean, English, and code data, followed by instruction-tuning with
high-quality human-annotated datasets while abiding by strict safety guidelines
reflecting our commitment to responsible AI. The model is evaluated across
various benchmarks, including comprehensive reasoning, knowledge, commonsense,
factuality, coding, math, chatting, instruction-following, and harmlessness, in
both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in
Korean backed by a deep understanding of the language and cultural nuances.
Further analysis of the inherent bilingual nature and its extension to
multilingualism highlights the model’s cross-lingual proficiency and strong
generalization ability to untargeted languages, including machine translation
between several language pairs and cross-lingual inference tasks. We believe
that HyperCLOVA X can provide helpful guidance for regions or countries in
developing their sovereign LLMs.
[COMMENTS]
44 pages; updated authors list and fixed author names
[LINK]
http://arxiv.org/abs/2404.01954v2
[DATE]
2024-04-13 23:06:19+08:00
[CATEGORIES]
cs.CL
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
[AUTHORS]
Benjue Weng
[ABSTRACT]
With the surge of ChatGPT, the use of large models has significantly
increased, rapidly rising to prominence across the industry and sweeping across
the internet. This article is a comprehensive review of fine-tuning methods for
large models. It investigates the latest technological advancements and
the application of advanced methods in aspects such as task-adaptive
fine-tuning, domain-adaptive fine-tuning, few-shot learning, knowledge
distillation, multi-task learning, parameter-efficient fine-tuning, and dynamic
fine-tuning.
[LINK]
http://arxiv.org/abs/2404.09022v1
[DATE]
2024-04-13 23:03:03+08:00
[CATEGORIES]
cs.LG
cs.CL
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
[AUTHORS]
Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
[ABSTRACT]
Natural Language Generation (NLG) typically involves evaluating the generated
text in various aspects (e.g., consistency and naturalness) to obtain a
comprehensive assessment. However, multi-aspect evaluation remains challenging
as it may require the evaluator to generalize to any given evaluation aspect
even if it’s absent during training. In this paper, we introduce X-Eval, a
two-stage instruction tuning framework to evaluate the text in both seen and
unseen aspects customized by end users. X-Eval consists of two learning stages:
the vanilla instruction tuning stage that improves the model’s ability to
follow evaluation instructions, and an enhanced instruction tuning stage that
exploits the connections between fine-grained evaluation aspects to better
assess text quality. To support the training of X-Eval, we collect
AspectInstruct, the first instruction tuning dataset tailored for multi-aspect
NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance
task diversity, we devise an augmentation strategy that converts human rating
annotations into diverse forms of NLG evaluation tasks, including scoring,
comparison, ranking, and Boolean question answering. Extensive experiments
across three essential categories of NLG tasks (dialogue generation,
summarization, and data-to-text), coupled with 21 aspects in meta-evaluation,
demonstrate that X-Eval enables even a lightweight language model to achieve
correlations with human judgments comparable to, if not higher than, those of
state-of-the-art NLG evaluators such as GPT-4.
[COMMENTS]
NAACL 2024 Main Conference. 20 pages, 6 figures, 17 tables
[LINK]
http://arxiv.org/abs/2311.08788v2
[DATE]
2024-04-13 22:41:24+08:00
[CATEGORIES]
cs.CL
cs.LG
Adapting Fake News Detection to the Era of Large Language Models
[AUTHORS]
Jinyan Su, Claire Cardie, Preslav Nakov
[COMMENTS]
Accepted to NAACL 2024 Findings
[LINK]
http://arxiv.org/abs/2311.04917v2
[DATE]
2024-04-13 21:52:01+08:00
[CATEGORIES]
cs.CL
WikiSplit++: Easy Data Refinement for Split and Rephrase
[AUTHORS]
Hayato Tsukagoshi, Tsutomu Hirao, Makoto Morishita, Katsuki Chousa, Ryohei Sasano, Koichi Takeda
[ABSTRACT]
The task of Split and Rephrase, which splits a complex sentence into multiple
simple sentences with the same meaning, improves readability and enhances the
performance of downstream tasks in natural language processing (NLP). However,
while Split and Rephrase can be improved using a text-to-text generation
approach that applies encoder-decoder models fine-tuned with a large-scale
dataset, it still suffers from hallucinations and under-splitting. To address
these issues, this paper presents a simple and strong data refinement approach.
Here, we create WikiSplit++ by removing instances in WikiSplit where complex
sentences do not entail at least one of the simpler sentences and reversing the
order of reference simple sentences. Experimental results show that training
with WikiSplit++ leads to better performance than training with WikiSplit, even
with fewer training instances. In particular, our approach yields significant
gains in the number of splits and the entailment ratio, a proxy for measuring
hallucinations.
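The two refinement steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the filter keeps an instance only when the complex sentence entails every reference simple sentence (our reading of the removal criterion), and `toy_entails` is a hypothetical word-overlap stand-in for a real NLI model.

```python
from typing import Callable, List, Tuple

# One WikiSplit-style instance: a complex sentence and its reference simple sentences.
Instance = Tuple[str, List[str]]


def refine(data: List[Instance], entails: Callable[[str, str], bool]) -> List[Instance]:
    """Drop instances with a non-entailed simple sentence, then reverse the references."""
    refined = []
    for complex_sent, simple_sents in data:
        # Assumed criterion: every simple sentence must be entailed by the complex one.
        if all(entails(complex_sent, s) for s in simple_sents):
            refined.append((complex_sent, list(reversed(simple_sents))))
    return refined


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: all hypothesis words occur in the premise."""
    words = set(premise.lower().replace(".", "").split())
    return all(w in words for w in hypothesis.lower().replace(".", "").split())


data = [
    ("Alice lives in Paris and works at a bank.",
     ["Alice lives in Paris.", "Alice works at a bank."]),
    ("Bob plays chess.", ["Bob plays chess.", "Bob won a trophy."]),  # "won a trophy" is unsupported
]
result = refine(data, toy_entails)
# The second instance is removed; the first keeps its references in reversed order.
```

In practice the entailment check would come from a trained NLI classifier; the toy predicate only keeps the example self-contained.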
[COMMENTS]
Accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.09002v1
[DATE]
2024-04-13 21:07:32+08:00
[CATEGORIES]
cs.CL
Labeled Morphological Segmentation with Semi-Markov Models
[AUTHORS]
Ryan Cotterell, Thomas Müller, Alexander Fraser, Hinrich Schütze
[COMMENTS]
CoNLL 2015
[LINK]
http://arxiv.org/abs/2404.08997v1
[DATE]
2024-04-13 20:51:53+08:00
[CATEGORIES]
cs.CL
RoNID: New Intent Discovery with Generated-Reliable Labels and Cluster-friendly Representations
[AUTHORS]
Shun Zhang, Chaoran Yan, Jian Yang, Changyu Ren, Jiaqi Bai, Tongliang Li, Zhoujun Li
[ABSTRACT]
New Intent Discovery (NID) aims to identify known intent groups and infer
novel ones in open-world scenarios. However, current methods suffer from
inaccurate pseudo-labels and poor representation learning, creating a
negative feedback loop that degrades overall model performance, including
accuracy and the adjusted Rand index. To address these challenges,
we propose a Robust New Intent Discovery (RoNID) framework optimized by an
EM-style method, which focuses on constructing reliable pseudo-labels and
obtaining cluster-friendly discriminative representations. RoNID comprises two
main modules: reliable pseudo-label generation module and cluster-friendly
representation learning module. Specifically, the pseudo-label generation
module assigns reliable synthetic labels by solving an optimal transport
problem in the E-step, which effectively provides high-quality supervised
signals for the input of the cluster-friendly representation learning module.
To learn cluster-friendly representation with strong intra-cluster compactness
and large inter-cluster separation, the representation learning module combines
intra-cluster and inter-cluster contrastive learning in the M-step to feed more
discriminative features into the generation module. RoNID can be performed
iteratively to ultimately yield a robust model with reliable pseudo-labels and
cluster-friendly representations. Experimental results on multiple benchmarks
demonstrate that our method brings substantial improvements over previous
state-of-the-art methods, by 1 to 4 points.
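The abstract states only that the E-step assigns pseudo-labels by solving an optimal transport problem. One common way to do this (used, for example, in SwAV-style clustering) is entropic OT solved with Sinkhorn-Knopp iterations. The sketch below assumes uniform cluster marginals and a hypothetical temperature `epsilon`, so read it as one plausible instance rather than RoNID's exact formulation.

```python
import math
from typing import List


def sinkhorn_pseudo_labels(scores: List[List[float]], epsilon: float = 0.05,
                           n_iters: int = 50) -> List[int]:
    """Balanced cluster assignment via entropic optimal transport (Sinkhorn-Knopp).

    scores[i][j] is the similarity of sample i to cluster j. Each sample sends mass 1
    and each cluster is assumed to receive mass n / k (uniform cluster prior).
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel; subtracting the row max is a per-row rescaling that the
    # row normalization below absorbs, and it avoids overflow in exp.
    q = [[math.exp((s - max(row)) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Scale columns so every cluster receives total mass n / k.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * (n / k) / col_sums[j] for j in range(k)] for i in range(n)]
        # Scale rows so every sample sends total mass 1.
        q = [[v / sum(row) for v in row] for row in q]
    # Hard pseudo-label: the cluster receiving the most mass from each sample.
    return [max(range(k), key=lambda j: row[j]) for row in q]


scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.6], [0.6, 0.55]]
labels = sinkhorn_pseudo_labels(scores)
```

Every sample above prefers cluster 0, so a per-sample argmax would put them all in one cluster; the balance constraint instead moves the two samples with the smallest margins to cluster 1, which is the kind of degenerate-cluster avoidance OT-based labeling is used for.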
[COMMENTS]
DASFAA 2024
[LINK]
http://arxiv.org/abs/2404.08977v1
[DATE]
2024-04-13 19:58:28+08:00
[CATEGORIES]
cs.CL
cs.LG
Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System
[AUTHORS]
Chang Tian, Wenpeng Yin, Marie-Francine Moens
[ABSTRACT]
A dialogue policy module is an essential part of task-completion dialogue
systems. Recently, interest in reinforcement learning (RL)-based dialogue
policies has grown. Their favorable performance and sound action decisions
rely on accurate estimation of action values. Overestimation is a well-known
issue in RL: the estimated maximum action value tends to be larger than the
ground truth, which results in an unstable learning process and a suboptimal
policy. This problem is detrimental to RL-based dialogue policy
learning. To mitigate this problem, this paper proposes a dynamic partial
average estimator (DPAV) of the ground truth maximum action value. DPAV
calculates the partial average between the predicted maximum action value and
minimum action value, where the weights are dynamically adaptive and
problem-dependent. We incorporate DPAV into a deep Q-network as the dialogue
policy and show that our method can achieve better or comparable results
compared to top baselines on three dialogue datasets of different domains with
a lower computational load. In addition, we theoretically prove the
convergence and derive the upper and lower bounds of the bias compared with
those of other methods.
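A minimal sketch of how a partial-average estimator slots into a DQN temporal-difference target. The abstract does not specify how DPAV adapts its weight, so here `beta` is simply a caller-supplied value in [0, 1] (a hypothetical simplification); `beta = 1` recovers the standard max-based target.

```python
from typing import Sequence


def dpav_target(reward: float, next_q: Sequence[float], beta: float,
                gamma: float = 0.99, done: bool = False) -> float:
    """TD target with a partial average of the max and min next-state action values.

    beta = 1 recovers the standard DQN target r + gamma * max_a Q(s', a); smaller
    beta pulls the estimate toward min_a Q(s', a) to counter overestimation.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if done:
        return reward  # no bootstrapping from a terminal state
    estimate = beta * max(next_q) + (1.0 - beta) * min(next_q)
    return reward + gamma * estimate


print(dpav_target(1.0, [2.0, 4.0, 0.0], beta=0.75, gamma=0.5))  # prints 2.5
```

In the paper the weight is dynamic and problem-dependent; a fixed `beta` is used here only to show where the estimator replaces the max operator.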
[COMMENTS]
NAACL Findings 2022, see
https://aclanthology.org/2022.findings-naacl.43
[LINK]
http://arxiv.org/abs/2207.11762v2
[DATE]
2024-04-13 19:51:55+08:00
[CATEGORIES]
cs.CL
OOVs in the Spotlight: How to Inflect them?
[AUTHORS]
Tomáš Sourada, Jana Straková, Rudolf Rosa
[ABSTRACT]
We focus on morphological inflection in out-of-vocabulary (OOV) conditions,
an under-researched subtask in which state-of-the-art systems are usually less
effective. We developed three systems: a retrograde model and two
sequence-to-sequence (seq2seq) models based on LSTM and Transformer. For
testing in OOV conditions, we automatically extracted a large dataset of nouns
in the morphologically rich Czech language, with lemma-disjoint data splits,
and we further manually annotated a real-world OOV dataset of neologisms. In
the standard OOV conditions, Transformer achieves the best results, with
increasing performance in ensemble with LSTM, the retrograde model and
SIGMORPHON baselines. On the real-world OOV dataset of neologisms, the
retrograde model outperforms all neural models. Finally, our seq2seq models
achieve state-of-the-art results in 9 out of 16 languages from SIGMORPHON 2022
shared task data in the OOV evaluation (feature overlap) in the large data
condition. We release the Czech OOV Inflection Dataset for rigorous evaluation
in OOV conditions. Further, we release the inflection system with the seq2seq
models as a ready-to-use Python library.
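A lemma-disjoint split, as mentioned above, guarantees that every test lemma is unseen during training. The sketch below shows one simple way to build such a split; the tuple layout, UniMorph-style tags, and sample Czech nouns are illustrative assumptions, not the released dataset's format.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (lemma, UniMorph-style feature string, inflected form)
Example = Tuple[str, str, str]


def lemma_disjoint_split(examples: List[Example], test_fraction: float = 0.2,
                         seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Split so that no lemma appears in both train and test (every test lemma is OOV)."""
    by_lemma: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_lemma[ex[0]].append(ex)
    lemmas = sorted(by_lemma)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(lemmas)
    n_test = max(1, round(len(lemmas) * test_fraction))
    test_lemmas = set(lemmas[:n_test])
    train = [ex for lem in lemmas if lem not in test_lemmas for ex in by_lemma[lem]]
    test = [ex for lem in lemmas if lem in test_lemmas for ex in by_lemma[lem]]
    return train, test


examples = [
    ("hrad", "N;GEN;SG", "hradu"), ("hrad", "N;NOM;PL", "hrady"),
    ("žena", "N;GEN;SG", "ženy"), ("město", "N;GEN;SG", "města"),
    ("kost", "N;GEN;SG", "kosti"),
]
train, test = lemma_disjoint_split(examples, test_fraction=0.25)
```

Splitting by lemma rather than by example is the key design choice: a random example-level split would let other forms of a test lemma leak into training.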
[COMMENTS]
To be published in LREC-COLING 2024. 12 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.08974v1
[DATE]
2024-04-13 19:40:06+08:00
[CATEGORIES]
cs.CL
Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
[AUTHORS]
Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy
[ABSTRACT]
Event coreference resolution (ECR) is the task of determining whether
distinct mentions of events within a multi-document corpus are actually linked
to the same underlying occurrence. Images of the events can help facilitate
resolution when language is ambiguous. Here, we propose a multimodal
cross-document event coreference resolution method that integrates visual and
textual cues with a simple linear map between vision and language models. As
existing ECR benchmark datasets rarely provide images for all event mentions,
we augment the popular ECB+ dataset with event-centric images scraped from the
internet and generated using image diffusion models. We establish three methods
that incorporate images and text for coreference: 1) a standard fused model
with finetuning, 2) a novel linear mapping method without finetuning and 3) an
ensembling approach based on splitting mention pairs by semantic and
discourse-level difficulty. We evaluate on two datasets: the augmented ECB+ and
AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish
an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing
assumptions used, and establish a novel baseline on AIDA Phase 1. Our results
demonstrate the utility of multimodal information in ECR for certain
challenging coreference problems, and highlight a need for more multimodal
resources in the coreference resolution space.
[COMMENTS]
To appear at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.08949v1
[DATE]
2024-04-13 18:01:58+08:00
[CATEGORIES]
cs.CL
Introducing Super RAGs in Mistral 8x7B-v1
[AUTHORS]
Ayush Thakur, Raghav Gupta
[ABSTRACT]
The relentless pursuit of enhancing Large Language Models (LLMs) has led to
the advent of Super Retrieval-Augmented Generation (Super RAGs), a novel
approach designed to elevate the performance of LLMs by integrating external
knowledge sources with minimal structural modifications. This paper presents
the integration of Super RAGs into the Mistral 8x7B v1, a state-of-the-art LLM,
and examines the resultant improvements in accuracy, speed, and user
satisfaction. Our methodology uses a fine-tuned instruct model setup and a
cache tuning fork system, ensuring efficient and relevant data retrieval. The
evaluation, conducted over several epochs, demonstrates significant
enhancements across all metrics. The findings suggest that Super RAGs can
effectively augment LLMs, paving the way for more sophisticated and reliable AI
systems. This research contributes to the field by providing empirical evidence
of the benefits of Super RAGs and offering insights into their potential
applications.
[LINK]
http://arxiv.org/abs/2404.08940v1
[DATE]
2024-04-13 17:33:00+08:00
[CATEGORIES]
cs.CL
cs.LG
Enforcing Paraphrase Generation via Controllable Latent Diffusion
[AUTHORS]
Wei Zou, Ziyuan Zhuang, Shujian Huang, Jia Liu, Jiajun Chen
[ABSTRACT]
Paraphrase generation aims to produce high-quality and diverse utterances of
a given text. Though state-of-the-art generation via the diffusion model
reconciles generation quality and diversity, textual diffusion suffers from a
truncation issue that hinders efficiency and quality control. In this work, we
propose Latent Diffusion Paraphraser (LDP), a novel paraphrase generation
method that models a controllable diffusion process over a learned latent
space. LDP achieves superior generation efficiency compared to its diffusion
counterparts. It relies only on input segments to enforce paraphrase semantics,
which further improves the results without external
features. Experiments show that LDP achieves improved and diverse paraphrase
generation compared to baselines. Further analysis shows that our method is
also helpful to other similar text generations and domain adaptations. Our code
and data are available at https://github.com/NIL-zhuang/ld4pg.
[LINK]
http://arxiv.org/abs/2404.08938v1
[DATE]
2024-04-13 17:24:32+08:00
[CATEGORIES]
cs.CL
[AUTHORS]
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
[ABSTRACT]
Internal language model (ILM) subtraction has been widely applied to improve
the performance of the RNN-Transducer with external language model (LM) fusion
for speech recognition. In this work, we show that sequence discriminative
training has a strong correlation with ILM subtraction from both theoretical
and empirical points of view. Theoretically, we derive that the global optimum
of maximum mutual information (MMI) training shares a similar formula as ILM
subtraction. Empirically, we show that ILM subtraction and sequence
discriminative training achieve similar effects across a wide range of
experiments on Librispeech, including both MMI and minimum Bayes risk (MBR)
criteria, as well as neural transducers and LMs of both full and limited
context. The benefit of ILM subtraction also becomes much smaller after
sequence discriminative training. We also provide an in-depth study to show
that sequence discriminative training has a minimal effect on the commonly used
zero-encoder ILM estimation, but a joint effect on both the encoder and the
prediction + joint network for posterior probability reshaping, including both ILM and
blank suppression.
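ILM subtraction in shallow fusion is commonly written as a log-linear hypothesis score, log p_AM(a|x) + λ_ELM · log p_ELM(a) − λ_ILM · log p_ILM(a). The sketch below applies that score to rerank two hypothetical hypotheses; the scale values and log-probabilities are made up for illustration (in practice the scales are tuned on held-out data).

```python
from typing import Dict, Tuple

# Per-hypothesis log-probabilities: (transducer model, external LM, internal LM estimate).
Scores = Tuple[float, float, float]


def fused_score(log_p_am: float, log_p_elm: float, log_p_ilm: float,
                lm_scale: float, ilm_scale: float) -> float:
    """Shallow fusion with ILM subtraction: add the scaled external LM score and
    subtract the scaled internal LM estimate."""
    return log_p_am + lm_scale * log_p_elm - ilm_scale * log_p_ilm


def rescore(hyps: Dict[str, Scores], lm_scale: float, ilm_scale: float) -> str:
    """Return the hypothesis with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(*hyps[h], lm_scale, ilm_scale))


hyps = {"hyp_a": (-1.0, -3.0, -1.0), "hyp_b": (-1.25, -3.0, -3.0)}
best_plain = rescore(hyps, lm_scale=0.5, ilm_scale=0.0)  # plain shallow fusion
best_ilm = rescore(hyps, lm_scale=0.5, ilm_scale=0.25)   # with ILM subtraction
```

Here `hyp_b` wins only after its low internal-LM probability is subtracted out. The abstract's finding is that sequence discriminative training already reshapes the posterior in a similar way, which is why the extra gain from this subtraction becomes much smaller.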
[COMMENTS]
accepted at ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.14130v2
[DATE]
2024-04-13 16:06:37+08:00
[CATEGORIES]
cs.CL
cs.LG
A Mathematical Theory for Learning Semantic Languages by Abstract Learners
[AUTHORS]
Kuo-Yu Liao, Cheng-Shang Chang, Y. -W. Peter Hong
[ABSTRACT]
Recent advances in Large Language Models (LLMs) have demonstrated the
emergence of capabilities (learned skills) when the number of system parameters
and the size of training data surpass certain thresholds. The exact mechanisms
behind such phenomena are not fully understood and remain a topic of active
research. Inspired by the skill-text bipartite graph model presented in [1] for
modeling semantic language, we develop a mathematical theory to explain the
emergence of learned skills, taking the learning (or training) process into
account. Our approach models the learning process for skills in the skill-text
bipartite graph as an iterative decoding process in Low-Density Parity Check
(LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density
evolution analysis, we demonstrate the emergence of learned skills when the
ratio of the size of training texts to the number of skills exceeds a certain
threshold. Our analysis also yields a scaling law for testing errors relative
to the size of training texts. Upon completion of the training, we propose a
method for semantic compression and discuss its application in semantic
communication.
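The density-evolution tool referenced above comes from coding theory. For orientation only, the sketch below runs the textbook recursion for a (3,6)-regular LDPC ensemble on a binary erasure channel, x ← ε(1 − (1 − x)^(dc−1))^(dv−1), and locates its threshold by bisection. It does not model the paper's skill-text graph; it only illustrates the threshold behavior such analyses rely on: below a critical erasure rate (about 0.4294 for this ensemble) the residual erasure probability collapses to zero, while above it the iteration stalls at a nonzero fixed point.

```python
def density_evolution(eps: float, dv: int = 3, dc: int = 6, n_iters: int = 2000) -> float:
    """Erasure probability of variable-to-check messages after n_iters rounds of
    iterative (peeling) decoding for a (dv, dc)-regular LDPC ensemble on a BEC(eps)."""
    x = eps
    for _ in range(n_iters):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
    return x


def threshold(dv: int = 3, dc: int = 6, tol: float = 1e-4) -> float:
    """Bisection for the largest eps at which decoding drives erasures to ~0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if density_evolution(mid, dv, dc) < 1e-9:
            lo = mid  # decoding succeeds, so the threshold is at least mid
        else:
            hi = mid
    return lo


print(round(threshold(), 3))
```

Because only a finite number of iterations is run, the bisection slightly underestimates the true threshold; the qualitative phase transition is what matters here.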
[COMMENTS]
V1 was submitted to ISIT 2024 on Jan. 28, 2024. V2 was uploaded to
arXiv on April 13, 2024
[LINK]
http://arxiv.org/abs/2404.07009v2
[DATE]
2024-04-13 14:43:47+08:00
[CATEGORIES]
cs.CL
cs.LG
Latent Distance Guided Alignment Training for Large Language Models
[AUTHORS]
Haotian Luo
[ABSTRACT]
Ensuring alignment with human preferences is a crucial characteristic of
large language models (LLMs). Presently, the primary alignment methods, RLHF
and DPO, require extensive human annotation, which is expensive despite their
efficacy. The significant expenses associated with current alignment techniques
motivate researchers to investigate the development of annotation-free
alignment training methods. In pursuit of improved alignment without relying on
external annotation, we introduce Latent Distance Guided Alignment Training
(LD-Align). This approach seeks to align the model with a high-quality
supervised fine-tuning dataset using guidance from a latent space. The latent
space is generated through sample reconstruction, akin to auto-encoding. We
then use the distance between sample pairs in the latent space to guide
DPO-based alignment training. Extensive experimentation and evaluation show
that our method achieves notable alignment.
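The standard DPO loss is −log σ(β[(log π_θ(y_w) − log π_ref(y_w)) − (log π_θ(y_l) − log π_ref(y_l))]). The abstract does not say exactly how latent distances enter training, so the sketch below uses one hypothetical scheme: each pair's DPO loss is weighted by the Euclidean distance between the latent codes of its two samples, normalized to mean 1. The weighting rule and the latent vectors are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# (policy logp chosen, policy logp rejected, reference logp chosen, reference logp rejected)
LogProbs = Tuple[float, float, float, float]


def dpo_loss(pc: float, pr: float, rc: float, rr: float, beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    z = beta * ((pc - rc) - (pr - rr))
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def latent_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_weighted_dpo(pairs: List[Tuple[LogProbs, Sequence[float], Sequence[float]]],
                          beta: float = 0.1) -> float:
    """Hypothetical weighting: pairs whose latent codes are farther apart count more
    (weights are the distances normalized to mean 1)."""
    dists = [latent_distance(zc, zr) for _, zc, zr in pairs]
    mean_d = sum(dists) / len(dists)
    total = 0.0
    for (logps, _, _), d in zip(pairs, dists):
        weight = d / mean_d if mean_d > 0 else 1.0
        total += weight * dpo_loss(*logps, beta=beta)
    return total / len(pairs)


pairs = [((0.0, 0.0, 0.0, 0.0), (0.0, 0.0), (3.0, 4.0)),
         ((0.0, 0.0, 0.0, 0.0), (0.0, 0.0), (0.0, 1.0))]
loss = distance_weighted_dpo(pairs)
```

Normalizing weights to mean 1 keeps the overall loss scale comparable to unweighted DPO, so only the relative emphasis across pairs changes.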
[LINK]
http://arxiv.org/abs/2404.06390v2
[DATE]
2024-04-13 13:20:45+08:00
[CATEGORIES]
cs.CL
Towards Enhancing Health Coaching Dialogue in Low-Resource Settings
[AUTHORS]
Yue Zhou, Barbara Di Eugenio, Brian Ziebart, Lisa Sharp, Bing Liu, Ben Gerber, Nikolaos Agadakos, Shweta Yadav
[ABSTRACT]
Health coaching helps patients identify and accomplish lifestyle-related
goals, effectively improving the control of chronic diseases and mitigating
mental health conditions. However, health coaching is cost-prohibitive due to
its highly personalized and labor-intensive nature. In this paper, we propose
to build a dialogue system that converses with the patients, helps them create
and accomplish specific goals, and can address their emotions with empathy.
However, building such a system is challenging since real-world health coaching
datasets are limited and empathy is subtle. Thus, we propose a modularized
health coaching dialogue system with simplified NLU and NLG frameworks combined
with mechanism-conditioned empathetic response generation. Through automatic
and human evaluation, we show that our system generates more empathetic,
fluent, and coherent responses and outperforms the state-of-the-art in NLU
tasks while requiring less annotation. We view our approach as a key step
towards building automated and more accessible health coaching systems.
[COMMENTS]
Accepted to the main conference of COLING 2022
[LINK]
http://arxiv.org/abs/2404.08888v1
[DATE]
2024-04-13 11:23:15+08:00
[CATEGORIES]
cs.CL
cs.LG
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
[AUTHORS]
Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea
[COMMENTS]
Accepted by NAACL 2024 Industry Track
[LINK]
http://arxiv.org/abs/2404.08886v1
[DATE]
2024-04-13 11:15:56+08:00
[CATEGORIES]
cs.CL
cs.LG
Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension
[AUTHORS]
Mengnan Qi, Yufan Huang, Yongqiang Yao, Maoquan Wang, Bin Gu, Neel Sundaresan
[ABSTRACT]
Large language models (LLMs) have experienced exponential growth and
demonstrate remarkable performance across various tasks. Nevertheless,
contemporary research primarily centers on enhancing the size and quality of
pretraining data while still using the next token prediction task on the
autoregressive transformer model structure. Whether this task truly
facilitates the model’s comprehension of code logic remains questionable: we
speculate that the model still interprets code as mere text, whereas humans
emphasize the underlying logical knowledge. To test this, we introduce a new
task, “Logically Equivalent Code Selection,” which requires selecting
logically equivalent code from a candidate set, given a query code. Our
experimental findings indicate that current LLMs underperform in this task,
since they understand code as an unordered bag of keywords. To improve their
performance, we propose an advanced pretraining task, “Next Token Prediction+”.
This task aims to modify the sentence embedding distribution of the LLM without
sacrificing its generative capabilities. Our experimental results reveal that
following this pretraining, both Code Llama and StarCoder, the prevalent code
domain pretraining models, display significant improvements on our logically
equivalent code selection task and the code completion task.
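The failure mode described above, treating code as an unordered bag of keywords, can be made concrete with a small Python sketch. `bag_of_tokens` and the three toy snippets are hypothetical illustrations, not the paper's benchmark: swapping operands keeps the token multiset identical while changing behavior, whereas a logically equivalent rewrite changes the tokens.

```python
import re
from collections import Counter


def bag_of_tokens(code: str) -> Counter:
    """Order-insensitive view of a snippet: identifiers, integers, and single symbols."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def run(src: str, *args):
    """Execute a snippet defining f and call it (trusted toy inputs only)."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace["f"](*args)


query = "def f(a, b):\n    return a - b"
swapped = "def f(a, b):\n    return b - a"        # same tokens, different logic
equivalent = "def f(a, b):\n    return -(b - a)"  # different tokens, same logic

print(bag_of_tokens(query) == bag_of_tokens(swapped))  # True: indistinguishable as bags
print(run(query, 5, 3), run(swapped, 5, 3), run(equivalent, 5, 3))  # 2 -2 2
```

A selector that matches candidates by token overlap would prefer `swapped` over `equivalent`, which is the opposite of the logically correct choice.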
[LINK]
http://arxiv.org/abs/2404.08885v1
[DATE]
2024-04-13 11:11:07+08:00
[CATEGORIES]
cs.CL
cs.LG
Aligning LLMs for FL-free Program Repair
[AUTHORS]
Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He
[ABSTRACT]
Large language models (LLMs) have achieved decent results on automated
program repair (APR). However, the next token prediction training objective of
decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction
objective of current infilling-style methods, which impedes LLMs from fully
leveraging pre-trained knowledge for program repair. In addition, while some
LLMs are capable of locating and repairing bugs end-to-end when using the
related artifacts (e.g., test cases) as input, existing methods regard them as
separate tasks and ask LLMs to generate patches at fixed locations. This
restriction hinders LLMs from exploring potential patches beyond the given
locations.
In this paper, we investigate a new approach to adapt LLMs to program repair.
Our core insight is that LLMs’ APR capability can be greatly improved by simply
aligning the output to their training objective and allowing them to refine the
whole program without first performing fault localization. Based on this
insight, we designed D4C, a straightforward prompting framework for APR. D4C
can repair 180 bugs correctly in Defects4J, with each patch being sampled only
10 times. This surpasses the SOTA APR methods with perfect fault localization
by 10% and reduces the patch sampling number by 90%. Our findings reveal that
(1) objective alignment is crucial for fully exploiting LLMs’ pre-trained
capabilities, and (2) replacing the traditional localize-then-repair workflow
with direct debugging is more effective for LLM-based APR methods. Thus, we
believe this paper introduces a new mindset for harnessing LLMs in APR.
[LINK]
http://arxiv.org/abs/2404.08877v1
[DATE]
2024-04-13 10:36:40+08:00
[CATEGORIES]
cs.CL
cs.LG
Toward Informal Language Processing: Knowledge of Slang in Large Language Models
[AUTHORS]
Zhewei Sun, Qian Hu, Rahul Gupta, Richard Zemel, Yang Xu
[COMMENTS]
Accepted to NAACL 2024 main conference
[LINK]
http://arxiv.org/abs/2404.02323v2
[DATE]
2024-04-13 10:17:01+08:00
[CATEGORIES]
cs.CL
Evaluating Spatial Understanding of Large Language Models
[AUTHORS]
Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim
[ABSTRACT]
Large language models (LLMs) show remarkable capabilities across a variety of
tasks. Despite the models only seeing text in training, several recent studies
suggest that LLM representations implicitly capture aspects of the underlying
grounded concepts. Here, we explore LLM representations of a particularly
salient kind of grounded knowledge – spatial relationships. We design
natural-language navigation tasks and evaluate the ability of LLMs, in
particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and
reason about spatial structures. These tasks reveal substantial variability in
LLM performance across different spatial structures, including square,
hexagonal, and triangular grids, rings, and trees. In extensive error analysis,
we find that LLMs’ mistakes reflect both spatial and non-spatial factors. These
findings suggest that LLMs appear to capture certain aspects of spatial
structure implicitly, but room for improvement remains.
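As a rough illustration of the kind of natural-language navigation task described above, the sketch below generates a random walk on a square grid of named cells and returns the prompt together with its gold answer. The wording, object names, and grid encoding are hypothetical, and it covers only the square-grid case (hexagonal, triangular, ring, and tree structures would need different neighbor rules).

```python
import random
from typing import Tuple

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_square_grid_task(n: int, n_steps: int, seed: int = 0) -> Tuple[str, str]:
    """Random walk on an n x n grid (n >= 2) of named cells; returns (prompt, gold answer)."""
    rng = random.Random(seed)
    names = [[f"obj{r * n + c}" for c in range(n)] for r in range(n)]
    r, c = rng.randrange(n), rng.randrange(n)
    start, steps = names[r][c], []
    for _ in range(n_steps):
        # Only moves that stay inside the grid are allowed.
        valid = [m for m, (dr, dc) in MOVES.items() if 0 <= r + dr < n and 0 <= c + dc < n]
        move = rng.choice(valid)
        r, c = r + MOVES[move][0], c + MOVES[move][1]
        steps.append(move)
    prompt = (f"You are in a {n}x{n} grid of rooms. You start at {start}. "
              + " ".join(f"Move {m}." for m in steps)
              + " Which room are you in now?")
    return prompt, names[r][c]


prompt, answer = make_square_grid_task(4, 6, seed=3)
```

Because the gold answer is computed by the generator, a model's reply can be scored automatically by comparing it with `answer`.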
[COMMENTS]
Accepted to TMLR 2024. Our code and data are available at
https://github.com/runopti/SpatialEvalLLM,
https://huggingface.co/datasets/yyamada/SpatialEvalLLM
[LINK]
http://arxiv.org/abs/2310.14540v3
[DATE]
2024-04-13 09:59:06+08:00
[CATEGORIES]
cs.CL
LLM In-Context Recall is Prompt Dependent
[AUTHORS]
Daniel Machlab, Rick Battle
[ABSTRACT]
The proliferation of Large Language Models (LLMs) highlights the critical
importance of conducting thorough evaluations to discern their comparative
advantages, limitations, and optimal use cases. Particularly important is
assessing their capacity to accurately retrieve information included in a given
prompt. A model’s ability to do this significantly influences how effectively
it can utilize contextual details, thus impacting its practical efficacy and
dependability in real-world applications.
Our research analyzes the in-context recall performance of various LLMs using
the needle-in-a-haystack method. In this approach, a factoid (the “needle”) is
embedded within a block of filler text (the “haystack”), which the model is
asked to retrieve. We assess the recall performance of each model across
various haystack lengths and with varying needle placements to identify
performance patterns. This study demonstrates that an LLM’s recall capability
not only depends on the prompt’s content but may also be compromised by
biases in its training data. Conversely, adjustments to model architecture,
training strategy, or fine-tuning can improve performance. Our analysis
provides insight into LLM behavior, offering direction for the development of
more effective applications of LLMs.
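The needle-in-a-haystack setup described above reduces to placing one factoid at a controlled relative depth inside filler text and sweeping over haystack lengths and depths. The sketch below is a minimal version under simplifying assumptions: lengths are counted in filler sentences rather than tokens, scoring is a case-insensitive substring match, and all names and sample strings are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_haystack_prompt(needle: str, filler: List[str], depth: float, question: str) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    pos = round(depth * len(filler))
    haystack = filler[:pos] + [needle] + filler[pos:]
    return " ".join(haystack) + "\n\nQuestion: " + question


def recalled(response: str, expected: str) -> bool:
    """Crude scoring: does the model's answer contain the needle's key fact?"""
    return expected.lower() in response.lower()


def sweep(lengths: Sequence[int], depths: Sequence[float]) -> List[Tuple[int, float]]:
    """All (haystack length, needle depth) cells to evaluate."""
    return [(n, d) for n in lengths for d in depths]


filler = [f"Filler sentence {i}." for i in range(4)]
needle = "The secret ingredient is cardamom."
prompt = build_haystack_prompt(needle, filler, 0.5, "What is the secret ingredient?")
```

Each cell of the sweep yields one prompt; plotting `recalled` over the grid gives the familiar length-by-depth recall map.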
[LINK]
http://arxiv.org/abs/2404.08865v1
[DATE]
2024-04-13 09:13:59+08:00
[CATEGORIES]
cs.CL
cs.LG
L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMs
[AUTHORS]
Md. Kowsher, Md. Shohanur Islam Sobuj, Asif Mahmud, Nusrat Jahan Prottasha, Prakash Bhat
[ABSTRACT]
Efficiently fine-tuning Large Language Models (LLMs) for specific tasks
presents a considerable challenge in natural language processing. Traditional
methods, like prompt or prefix tuning, typically rely on arbitrary tokens for
training, leading to prolonged training times and generalized token use across
various class labels. To address these issues, this paper introduces L-Tuning,
an efficient fine-tuning approach designed for classification tasks within the
Natural Language Inference (NLI) framework. Diverging from conventional
methods, L-Tuning focuses on the fine-tuning of label tokens processed through
a pre-trained LLM, thereby harnessing its pre-existing semantic knowledge. This
technique not only improves the fine-tuning accuracy and efficiency but also
facilitates the generation of distinct label embeddings for each class,
enhancing the model’s training nuance. Our experimental results indicate a
significant improvement in training efficiency and classification accuracy with
L-Tuning compared to traditional approaches, marking a promising advancement in
fine-tuning LLMs for complex language tasks.
[COMMENTS]
Published in the ICLR TinyPaper track
[LINK]
http://arxiv.org/abs/2402.01643v2
[DATE]
2024-04-13 08:14:21+08:00
[CATEGORIES]
cs.CL
cs.LG
On Speculative Decoding for Multimodal Large Language Models
[AUTHORS]
Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott
[ABSTRACT]
Inference with Multimodal Large Language Models (MLLMs) is slow due to their
large-language-model backbone which suffers from memory bandwidth bottleneck
and generates tokens auto-regressively. In this paper, we explore the
application of speculative decoding to enhance the inference efficiency of
MLLMs, specifically the LLaVA 7B model. We show that a language-only model can
serve as a good draft model for speculative decoding with LLaVA 7B, bypassing
the need for image tokens and their associated processing components from the
draft model. Our experiments across three different tasks show that speculative
decoding can achieve a memory-bound speedup of up to 2.37× using a 115M
parameter language model that we trained from scratch. Additionally, we
introduce a compact LLaVA draft model incorporating an image adapter, which
shows marginal performance gains in image captioning while maintaining
comparable results in other tasks.
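The draft-then-verify loop behind speculative decoding can be sketched with toy "models" that map a token context to their greedy next token. This is the greedy-acceptance variant (the sampling variant uses a rejection-sampling acceptance rule), and the toy target and draft functions are hypothetical stand-ins for LLaVA 7B and the small language-only draft model.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            proposal.append(draft(ctx))
            ctx.append(proposal[-1])
        # 2) The target checks the proposal (one batched forward pass in practice).
        rounds += 1
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)  # the target's token is always kept
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal
        else:
            ctx.append(target(ctx))  # all k accepted: the target adds one bonus token
        out = ctx
    return out[: len(prompt) + max_new], rounds


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def target_model(ctx: List[int]) -> int:
    return (3 * ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 5 if ctx[-1] == 3 else (3 * ctx[-1] + 1) % 10  # deliberately wrong after a 3


tokens, rounds = speculative_decode(target_model, draft_model, [1], max_new=12)
print(tokens == greedy_decode(target_model, [1], 12))  # True
```

The output always matches the target's own greedy decoding; the draft only affects how many verification rounds are needed, which is where the speedup comes from.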
[COMMENTS]
Accepted as a spotlight paper to ELVM workshop at CVPR 2024
[LINK]
http://arxiv.org/abs/2404.08856v1
[DATE]
2024-04-13 08:02:36+08:00
[CATEGORIES]
cs.CL
cs.LG
Using Letter Positional Probabilities to Assess Word Complexity
[AUTHORS]
Michael Dalvean
[ABSTRACT]
Word complexity is defined in a number of different ways. Psycholinguistic,
morphological and lexical proxies are often used. Human ratings are also used.
The problem here is that these proxies do not measure complexity directly, and
human ratings are susceptible to subjective bias. In this study we contend that
some form of ‘latent complexity’ can be approximated by using samples of simple
and complex words. We use a sample of ‘simple’ words from primary school
picture books and a sample of ‘complex’ words from high school and academic
settings. In order to analyse the differences between these classes, we look at
the letter positional probabilities (LPPs). We find strong statistical
associations between several LPPs and complexity. For example, simple words are
significantly (p<.001) more likely to start with w, b, s, h, g, k, j, t, y or
f, while complex words are significantly (p<.001) more likely to start with i,
a, e, r, v, u or d. We find similar strong associations for subsequent letter
positions, with 84 letter-position variables in the first 6 positions being
significant at the p<.001 level. We then use LPPs as variables in creating a
classifier that can classify the two classes with 83% accuracy. We test
these findings using a second data set, with 66 LPPs significant (p<.001) in
the first 6 positions common to both datasets. We use these 66 variables to
create a classifier that is able to classify a third dataset with an accuracy
of 70%. Finally, we create a fourth sample by combining the extreme high and
low scoring words generated by three classifiers built on the first three
separate datasets and use this sample to build a classifier which has an
accuracy of 97%. We use this to score the four levels of English word groups
from an ESL program.
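A minimal sketch of the letter-position variables described above: binary indicators such as "starts with w" for each word, and the per-class probability of each letter at each of the first six positions. The exact definition used in the study may differ (for instance, in how words shorter than a given position are counted), and the sample words are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def letter_position_features(word: str, max_pos: int = 6) -> Set[str]:
    """Binary letter-position indicators, e.g. '0=w' means the word starts with w."""
    return {f"{i}={ch}" for i, ch in enumerate(word.lower()[:max_pos])}


def letter_positional_probabilities(words: List[str],
                                    max_pos: int = 6) -> Dict[Tuple[int, str], float]:
    """P(letter at position i) among the words of one class that reach position i."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for w in words:
        for i, ch in enumerate(w.lower()[:max_pos]):
            counts[i][ch] += 1
    return {(i, ch): n / sum(c.values()) for i, c in counts.items() for ch, n in c.items()}


simple = ["bat", "ball", "sun", "hat"]
lpp = letter_positional_probabilities(simple)
```

Comparing these probabilities between the simple and complex word samples gives the letter-position associations, and the binary indicators can serve as input variables for a classifier.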
[COMMENTS]
25 Pages, 15 Tables
[LINK]
http://arxiv.org/abs/2404.07768v2
[DATE]
2024-04-13 08:02:25+08:00
[CATEGORIES]
cs.CL
Experimental Design for Active Transductive Inference in Large Language Models
[AUTHORS]
Subhojyoti Mukherjee, Ge Liu, Aniket Deshmukh, Anusha Lalitha, Yifei Ma, Branislav Kveton
[ABSTRACT]
Transduction, the ability to include query-specific examples in the prompt at
inference time, is one of the emergent abilities of large language models
(LLMs). In this work, we propose a framework for adaptive prompt design called
active transductive inference (ATI). We design the LLM prompt by adaptively
choosing few-shot examples for a given inference query. The examples are
initially unlabeled, and we query the user to label the most informative ones,
those that maximally reduce the uncertainty in the LLM prediction. We propose two
algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
We analyze these algorithms in linear models: we first analyze GO and then use
its equivalence with SAL to analyze SAL. We experiment with many different tasks and show that GO
and SAL outperform other methods for choosing few-shot examples in the LLM
prompt at inference time.
[LINK]
http://arxiv.org/abs/2404.08846v1
[DATE]
2024-04-13 07:27:46+08:00
[CATEGORIES]
cs.LG
cs.CL
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
[AUTHORS]
Adithya Renduchintala, Tugrul Konuk, Oleksii Kuchaiev
[ABSTRACT]
We introduce Tied-LoRA, a novel paradigm leveraging weight tying and
selective training to enhance the parameter efficiency of Low-rank Adaptation
(LoRA). Our exploration encompasses different plausible combinations of
parameter training and freezing, coupled with weight tying, aimed at
identifying the optimal trade-off between performance and the count of
trainable parameters. Across 5 diverse tasks and two foundational language
models with different parameter counts, our experiments provide comprehensive
insights into the inherent trade-offs between efficiency and performance.
Our findings reveal a specific Tied-LoRA configuration that performs
comparably to LoRA across multiple tasks while using only a fraction of the
parameters of the standard LoRA method, particularly at higher ranks. This
shows that Tied-LoRA can achieve strong results with a substantially smaller
number of trainable parameters.
[COMMENTS]
8 pages 4 figures
[LINK]
http://arxiv.org/abs/2311.09578v2
[DATE]
2024-04-13 07:15:51+08:00
[CATEGORIES]
cs.CL
cs.LG
Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation
[AUTHORS]
Yixin Wan, Fanyou Wu, Weijie Xu, Srinivasan H. Sengamedu
[ABSTRACT]
In this work, we propose sequence-level certainty as a common theme over
hallucination in Knowledge Grounded Dialogue Generation (KGDG). We explore the
correlation between the level of hallucination in model responses and two types
of sequence-level certainty: probabilistic certainty and semantic certainty.
Empirical results reveal that higher levels of both types of certainty in model
responses are correlated with lower levels of hallucination. We further propose
Certainty-based Response Ranking (CRR), a decoding-time hallucination
mitigation method that samples several response candidates, ranks them based on
sequence-level certainty, and outputs the response with the highest certainty
level. Aligning with our definitions of sequence-level certainty, we design 2
types of CRR approaches: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR).
P-CRR ranks individually sampled model responses using the arithmetic mean
log-probability of the entire sequence. S-CRR approaches certainty estimation
from meaning-space, and ranks model response candidates based on their semantic
certainty level as measured by an entailment-based Agreement Score (AS).
Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and 4
KGDG models, we validate the effectiveness of CRR for reducing hallucination in
the KGDG task.
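As described above, P-CRR reduces to choosing the candidate whose arithmetic mean token log-probability is highest. The minimal sketch below assumes that the decoder has already returned per-token log-probabilities for each sampled response. It does not show sampling or the entailment-based S-CRR Agreement Score, and the `Candidate` type and example responses are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    text: str
    token_logprobs: List[float]  # log p(token_t | context, earlier tokens), one per generated token

def mean_logprob(c: Candidate) -> float:
    """Arithmetic mean log-probability of the whole sequence (probabilistic certainty)."""
    if not c.token_logprobs:
        raise ValueError("candidate has no tokens")
    return sum(c.token_logprobs) / len(c.token_logprobs)

def p_crr(candidates: List[Candidate]) -> Candidate:
    """Return the sampled response with the highest probabilistic certainty."""
    return max(candidates, key=mean_logprob)

# Hypothetical sampled responses with made-up log-probabilities.
samples = [
    Candidate("The museum opens at 9 am.", [-0.1, -0.2, -0.1, -0.3]),
    Candidate("It opens at noon, I think.", [-0.4, -1.2, -0.9, -0.2]),
]
print(p_crr(samples).text)
```

Averaging rather than summing means that a response is not penalized simply for containing more tokens.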
[LINK]
http://arxiv.org/abs/2310.18794v3
[DATE]
2024-04-13 07:09:52+08:00
[CATEGORIES]
cs.CL
Constrained C-Test Generation via Mixed-Integer Programming
[AUTHORS]
Ji-Ung Lee, Marc E. Pfetsch, Iryna Gurevych
[ABSTRACT]
This work proposes a novel method to generate C-Tests, a variant of
cloze tests (a gap-filling exercise) in which only the last part of a word is
turned into a gap. In contrast to previous works that only consider varying the
gap size or gap placement to achieve locally optimal solutions, we propose a
mixed-integer programming (MIP) approach. This allows us to consider gap size
and placement simultaneously, achieving globally optimal solutions, and to
directly integrate state-of-the-art models for gap difficulty prediction into
the optimization problem. A user study with 40 participants across four C-Test
generation strategies (including GPT-4) shows that our approach (MIP)
significantly outperforms two of the baseline strategies (based on gap
placement and GPT-4) and performs on par with the third (based on gap size).
Our analysis shows that GPT-4 still struggles to fulfill explicit constraints
during generation and that MIP produces C-Tests that correlate best with the
perceived difficulty. We publish our code, model, and collected data consisting
of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap
responses) under an open source license.
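The gap format itself is simple to express in code. The sketch below turns the last part of selected words into gaps. Its default placement follows the conventional C-Test rule of gapping every second word and hiding its second half (the larger half for odd lengths). It does not implement the MIP optimization, which would instead choose gap positions and sizes jointly. Real C-Tests also usually leave the first and last sentences intact and handle punctuation, and this sketch omits both. All function names are hypothetical.

```python
from __future__ import annotations

def make_gap(word: str, gap_size: int) -> tuple[str, str]:
    """Split a word into a visible prefix and a hidden suffix of gap_size characters."""
    if not 0 < gap_size < len(word):
        raise ValueError("a gap must hide between 1 and len(word) - 1 characters")
    return word[:-gap_size], word[-gap_size:]

def classic_gaps(tokens: list[str], start: int = 1, step: int = 2) -> dict[int, int]:
    """Conventional placement: every second word, hiding its second half (larger half if odd)."""
    return {
        i: (len(tokens[i]) + 1) // 2
        for i in range(start, len(tokens), step)
        if len(tokens[i]) > 1  # a one-letter word cannot keep a visible prefix
    }

def render_c_test(tokens: list[str], gaps: dict[int, int]) -> tuple[str, list[str]]:
    """Return the gapped text (one underscore per hidden letter) and the answer key."""
    shown, answers = [], []
    for i, token in enumerate(tokens):
        if i in gaps:
            prefix, hidden = make_gap(token, gaps[i])
            shown.append(prefix + "_" * len(hidden))
            answers.append(hidden)
        else:
            shown.append(token)
    return " ".join(shown), answers

tokens = "The cat sat on the mat".split()
text, key = render_c_test(tokens, classic_gaps(tokens))
print(text)
print(key)
```

A method that optimizes gap size or placement would plug in at the `gaps` mapping, replacing `classic_gaps` with its own choice of positions and sizes.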
[COMMENTS]
Github:
https://github.com/UKPLab/arxiv2024-constrained-ctest-generation
[LINK]
http://arxiv.org/abs/2404.08821v1
[DATE]
2024-04-13 05:35:21+08:00
[CATEGORIES]
cs.CL
The Illusion of State in State-Space Models
[AUTHORS]
William Merrill, Jackson Petty, Ashish Sabharwal
[ABSTRACT]
State-space models (SSMs) have emerged as a potential alternative
architecture for building large language models (LLMs) compared to the
previously ubiquitous transformer architecture. One theoretical weakness of
transformers is that they cannot express certain kinds of sequential
computation and state tracking (Merrill and Sabharwal, 2023), which SSMs are
explicitly designed to address via their close architectural similarity to
recurrent neural networks (RNNs). But do SSMs truly have an advantage (over
transformers) in expressive power for state tracking? Surprisingly, the answer
is no. Our analysis reveals that the expressive power of SSMs is limited very
similarly to transformers: SSMs cannot express computation outside the
complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve
simple state-tracking problems like permutation composition. It follows that
SSMs are provably unable to accurately track chess moves with certain notation,
evaluate code, or track entities in a long narrative. To supplement our formal
analysis, we report experiments showing that Mamba-style SSMs indeed struggle
with state tracking. Thus, despite its recurrent formulation, the “state” in an
SSM is an illusion: SSMs have similar expressiveness limitations to
non-recurrent models like transformers, which may fundamentally limit their
ability to solve real-world state-tracking problems.
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2404.08819v1
[DATE]
2024-04-13 05:30:06+08:00
[CATEGORIES]
cs.LG
cs.CL
Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance
[AUTHORS]
Yewei Song, Cedric Lothritz, Daniel Tang, Tegawendé F. Bissyandé, Jacques Klein
[ABSTRACT]
This paper revisits recent code similarity evaluation metrics, particularly
focusing on the application of Abstract Syntax Tree (AST) edit distance in
diverse programming languages. In particular, we explore the usefulness of
these metrics and compare them to traditional sequence similarity metrics. Our
experiments showcase the effectiveness of AST edit distance in capturing
intricate code structures, revealing a high correlation with established
metrics. Furthermore, we explore the strengths and weaknesses of AST edit
distance and prompt-based GPT similarity scores in comparison to BLEU score,
execution match, and Jaccard similarity. We propose, optimize, and publish an
adaptable metric that demonstrates effectiveness across all tested languages,
representing an enhanced version of Tree Similarity of Edit Distance (TSED).
[LINK]
http://arxiv.org/abs/2404.08817v1
[DATE]
2024-04-13 05:28:18+08:00
[CATEGORIES]
cs.CL
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
[AUTHORS]
Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha
[ABSTRACT]
The advent of Large Language Models (LLMs) has significantly reshaped the
trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable
limitation, as they are primarily adept at processing textual information. To
address this constraint, researchers have endeavored to integrate visual
capabilities with LLMs, resulting in the emergence of Vision-Language Models
(VLMs). These advanced models are instrumental in tackling more intricate tasks
such as image captioning and visual question answering. In our comprehensive
survey paper, we delve into the key advancements within the realm of VLMs. Our
classification organizes VLMs into three distinct categories: models dedicated
to vision-language understanding, models that process multimodal inputs to
generate unimodal (textual) outputs, and models that both accept and produce
multimodal inputs and outputs. This classification is based on their respective
capabilities and functionalities in processing and generating various
modalities of data. We meticulously dissect each model, offering an extensive
analysis of its foundational architecture, training data sources, and its
strengths and limitations wherever possible, providing readers with a
comprehensive understanding of its essential components. We also analyze the
performance of VLMs on various benchmark datasets. By doing so, we aim to offer
a nuanced understanding of the diverse landscape of VLMs. Additionally, we
underscore potential avenues for future research in this dynamic domain,
anticipating further breakthroughs and advancements.
[COMMENTS]
The most extensive and up to date Survey on Visual Language Models
covering 76 Visual Language Models
[LINK]
http://arxiv.org/abs/2404.07214v2
[DATE]
2024-04-13 05:20:37+08:00
[CATEGORIES]
cs.CL
ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models
[AUTHORS]
Jierui Li, Vipul Raheja, Dhruv Kumar
[COMMENTS]
Accepted to NAACL 2024 main conference
[LINK]
http://arxiv.org/abs/2311.09182v2
[DATE]
2024-04-13 05:06:43+08:00
[CATEGORIES]
cs.CL
CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation
[AUTHORS]
Matthew DeLorenzo, Vasudev Gohil, Jeyavijayan Rajendran
[ABSTRACT]
Large Language Models (LLMs) have proved effective and efficient in
generating code, leading to their utilization within the hardware design
process. Prior works evaluating LLMs’ abilities for register transfer level
code generation solely focus on functional correctness. However, the creativity
associated with these LLMs, or the ability to generate novel and unique
solutions, is a metric not as well understood, in part due to the challenge of
quantifying this quality.
To address this research gap, we present CreativEval, a framework for
evaluating the creativity of LLMs within the context of generating hardware
designs. We quantify four creative sub-components, fluency, flexibility,
originality, and elaboration, through various prompting and post-processing
techniques. We then evaluate multiple popular LLMs (including GPT models,
CodeLlama, and VeriGen) upon this creativity metric, with results indicating
GPT-3.5 as the most creative model in generating hardware designs.
[LINK]
http://arxiv.org/abs/2404.08806v1
[DATE]
2024-04-13 04:41:47+08:00
[CATEGORIES]
cs.CL
PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck
[AUTHORS]
Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Totti Nguyen
[COMMENTS]
Findings of NAACL 2024 (long paper)
[LINK]
http://arxiv.org/abs/2403.05297v3
[DATE]
2024-04-13 04:10:29+08:00
[CATEGORIES]
cs.CL
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
[AUTHORS]
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
[ABSTRACT]
In this work, we extend the instruction-tuned Llama-2 model with end-to-end
general-purpose speech processing and reasoning abilities while maintaining the
wide range of original LLM capabilities, without using any carefully curated
paired data. The resulting end-to-end model, named AudioChatLlama, can utilize
audio prompts as a replacement for text and sustain a conversation. Such a
model also has extended cross-modal capabilities such as being able to perform
spoken question answering (QA), speech translation, and audio summarization
amongst many other closed and open-domain tasks. This is unlike prior
approaches in speech, in which LLMs are extended to handle audio for a limited
number of pre-designated tasks. On both synthesized and recorded speech QA test
sets, evaluations show that our end-to-end approach is on par with or
outperforms cascaded systems (speech recognizer + LLM) in terms of modeling the
response to a prompt. Furthermore, unlike cascades, our approach can
interchange text and audio modalities and intrinsically utilize prior context
in a conversation to provide better results.
[LINK]
http://arxiv.org/abs/2311.06753v2
[DATE]
2024-04-13 02:55:22+08:00
[CATEGORIES]
cs.CL
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
[AUTHORS]
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
[ABSTRACT]
Large Language Models (LLMs) have dramatically advanced AI applications, yet
their deployment remains challenging due to their immense inference costs.
Recent studies ameliorate the computational costs of LLMs by increasing their
activation sparsity but suffer from significant performance degradation on
downstream tasks. In this work, we introduce a new framework for sparsifying
the activations of base LLMs and reducing inference costs, dubbed Contextually
Aware Thresholding for Sparsity (CATS). CATS is relatively simple, easy to
implement, and highly effective. At the heart of our framework is a new
non-linear activation function. We demonstrate that CATS can be applied to
various base models, including Mistral-7B and Llama2-7B, and outperforms
existing sparsification techniques in downstream task performance. More
precisely, CATS-based models often achieve downstream task performance within
1-2% of their base models without any fine-tuning and even at activation
sparsity levels of 50%. Furthermore, CATS-based models converge faster and
display better task performance than competing techniques when fine-tuning is
applied. Finally, we develop a custom GPU kernel for efficient implementation
of CATS that translates its activation sparsity into real wall-clock
time speedups. Our custom kernel implementation of CATS results in a ~15%
improvement in wall-clock inference latency of token generation on both
Llama-7B and Mistral-7B.
[LINK]
http://arxiv.org/abs/2404.08763v1
[DATE]
2024-04-13 02:42:18+08:00
[CATEGORIES]
cs.LG
cs.CL
The Generation Gap: Exploring Age Bias in Large Language Models
[AUTHORS]
Siyang Liu, Trish Maturi, Siqi Shen, Rada Mihalcea
[COMMENTS]
4 pages
[LINK]
http://arxiv.org/abs/2404.08760v1
[DATE]
2024-04-13 02:36:20+08:00
[CATEGORIES]
cs.CL
Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping
[AUTHORS]
Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato
[ABSTRACT]
This paper presents Conformer-1, an end-to-end Automatic Speech Recognition
(ASR) model trained on an extensive dataset of 570k hours of speech audio data,
91% of which was acquired from publicly available sources. To achieve this, we
perform Noisy Student Training after generating pseudo-labels for the unlabeled
public data using a strong Conformer RNN-T baseline model. The addition of
these pseudo-labeled data yields relative Word Error Rate (WER) improvements
of 11.5% and 24.3% for our asynchronous and realtime models,
respectively. Additionally, the model is more robust to background noise owing
to the addition of these data. The results obtained in this study demonstrate
that the incorporation of pseudo-labeled publicly available data is a highly
effective strategy for improving ASR accuracy and noise robustness.
[LINK]
http://arxiv.org/abs/2404.07341v2
[DATE]
2024-04-13 02:23:35+08:00
[CATEGORIES]
cs.CL
cs.LG
Pre-training Small Base LMs with Fewer Tokens
[AUTHORS]
Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis
[ABSTRACT]
We study the effectiveness of a simple approach to develop a small base
language model (LM) starting from an existing large base LM: first inherit a
few transformer blocks from the larger LM, and then train this smaller model on
a very small subset (0.1%) of the raw pretraining data of the larger model. We
call our simple recipe Inheritune and first demonstrate it for building a small
base LM with 1.5B parameters using 1B tokens (and a starting few layers of
larger LM of 3B parameters); we do this using a single A6000 GPU for less than
half a day. Across 9 diverse evaluation datasets as well as the MMLU benchmark,
the resulting model compares favorably to publicly available base models of
1B-2B size, some of which have been trained using 50-1000 times more tokens.
We also investigate Inheritune in a slightly different setting where we train
small LMs utilizing larger LMs and their full pre-training dataset. Here we
show that smaller LMs trained utilizing some of the layers of GPT-2-medium
(355M) and GPT-2-large (770M) can effectively match the validation loss of their
bigger counterparts when trained from scratch for the same number of training
steps on the OpenWebText dataset with 9B tokens. We analyze our recipe with
extensive experiments and demonstrate its efficacy in diverse settings. Our code
is available at https://github.com/sanyalsunny111/LLM-Inheritune.
[COMMENTS]
15 pages, 6 figures, 10 tables
[LINK]
http://arxiv.org/abs/2404.08634v1
[DATE]
2024-04-13 01:53:34+08:00
[CATEGORIES]
cs.CL
cs.LG
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
[AUTHORS]
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
[ABSTRACT]
Large Multimodal Models (LMMs) have shown significant reasoning capabilities
by connecting a visual encoder and a large language model. LMMs typically use a
fixed number of visual tokens, such as the penultimate layer features in the
CLIP visual encoder, as the prefix content. Recent LMMs incorporate more
complex visual inputs, such as high-resolution images and videos, which
increase the number of visual tokens significantly. However, due to the design
of the Transformer architecture, computational costs associated with these
models tend to increase quadratically with the number of input tokens. To
tackle this problem, we explore a token reduction mechanism and find, similar
to prior work, that many visual tokens are spatially redundant. Based on this,
we propose PruMerge, a novel adaptive visual token reduction approach, which
largely reduces the number of visual tokens while maintaining comparable model
performance. We first select the unpruned visual tokens based on their
similarity to class tokens and spatial tokens. We then cluster the pruned
tokens based on key similarity and merge the clustered tokens with the unpruned
tokens to supplement their information. Empirically, when applied to LLaVA-1.5,
our approach can compress the visual tokens by 18x on average and achieve
comparable performance across diverse visual question-answering and reasoning
tasks. Code and checkpoints are at https://llava-prumerge.github.io/.
[COMMENTS]
Project page: https://llava-prumerge.github.io/
[LINK]
http://arxiv.org/abs/2403.15388v4
[DATE]
2024-04-13 01:34:29+08:00
[CATEGORIES]
cs.CL
Synthetic Dataset Creation and Fine-Tuning of Transformer Models for Question Answering in Serbian
[AUTHORS]
Aleksa Cvetanović, Predrag Tadić
[ABSTRACT]
In this paper, we focus on generating a synthetic question answering (QA)
dataset using an adapted Translate-Align-Retrieve method. Using this method, we
created the largest Serbian QA dataset of more than 87K samples, which we name
SQuAD-sr. To acknowledge the script duality in Serbian, we generated both
Cyrillic and Latin versions of the dataset. We investigate the dataset quality
and use it to fine-tune several pre-trained QA models. Best results were
obtained by fine-tuning the BERTić model on our Latin SQuAD-sr dataset,
achieving 73.91% Exact Match and 82.97% F1 score on the benchmark XQuAD
dataset, which we translated into Serbian for the purpose of evaluation. The
results show that our model exceeds zero-shot baselines but does not surpass
human performance. We note the advantage of using a monolingual pre-trained
model over a multilingual one, as well as the performance increase gained by using
Latin over Cyrillic. By performing additional analysis, we show that questions
about numeric values or dates are more likely to be answered correctly than
other types of questions. Finally, we conclude that SQuAD-sr is of sufficient
quality for fine-tuning a Serbian QA model, in the absence of a manually
crafted and annotated dataset.
[LINK]
http://arxiv.org/abs/2404.08617v1
[DATE]
2024-04-13 01:27:54+08:00
[CATEGORIES]
cs.CL
PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination
[AUTHORS]
Anant Khandelwal
[ABSTRACT]
The potential for zero-shot generalization in vision-language (V-L) models
such as CLIP has spurred their widespread adoption in addressing numerous
downstream tasks. Previous methods have employed test-time prompt tuning to
adapt the model to unseen domains, but they overlooked the issue of imbalanced
class distributions. In this study, we explicitly address this problem by
employing class-aware prototype alignment weighted by mean class probabilities
obtained for the test sample and filtered augmented views. Additionally, we
ensure that the class probabilities are as accurate as possible by performing
prototype discrimination using contrastive learning. The combination of
alignment and discriminative loss serves as a geometric regularizer, preventing
the prompt representation from collapsing onto a single class and effectively
bridging the distribution gap between the source and test domains. Our method,
named PromptSync, synchronizes the prompts for each test sample on both the
text and vision branches of the V-L model. In empirical evaluations on the
domain generalization benchmark, our method outperforms previous best methods
by 2.33% in overall performance, by 1% in base-to-novel generalization, and by
2.84% in cross-dataset transfer tasks.
[COMMENTS]
Accepted at CVPR 2024 LIMIT, 12 pages, 8 Tables, 2 Figures
[LINK]
http://arxiv.org/abs/2404.07520v2
[DATE]
2024-04-13 01:01:04+08:00
[CATEGORIES]
cs.CL
Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases
[AUTHORS]
Xiang Zhang, Khatoon Khedri, Reza Rawassizadeh
[ABSTRACT]
Large Language Models (LLMs) can automate or substitute different types of
tasks in the software engineering process. This study evaluates the resource
utilization and accuracy of LLMs in interpreting and executing natural language
queries against traditional SQL within relational database management systems.
We empirically examine the resource utilization and accuracy of nine LLMs
ranging from 7 to 34 billion parameters, including Llama2 7B, Llama2 13B,
Mistral, Mixtral, Optimus-7B, SUS-chat-34B, platypus-yi-34b,
NeuralHermes-2.5-Mistral-7B and Starling-LM-7B-alpha, using a small transaction
dataset. Our findings indicate that using LLMs for database queries incurs
significant energy overhead (even with small and quantized models), making it an
environmentally unfriendly approach. Therefore, we advise against replacing
relational databases with LLMs due to their substantial resource utilization.
[COMMENTS]
13 pages, 2 figures, 5 tables
[LINK]
http://arxiv.org/abs/2404.08727v1
[DATE]
2024-04-13 00:44:28+08:00
[CATEGORIES]
cs.CL
Small Models Are (Still) Effective Cross-Domain Argument Extractors
[AUTHORS]
William Gantt, Aaron Steven White
[COMMENTS]
ACL Rolling Review Short Paper
[LINK]
http://arxiv.org/abs/2404.08579v1
[DATE]
2024-04-13 00:23:41+08:00
[CATEGORIES]
cs.CL
cs.LG
Incremental Extractive Opinion Summarization Using Cover Trees
[AUTHORS]
Somnath Basu Roy Chowdhury, Nicholas Monath, Avinava Dubey, Manzil Zaheer, Andrew McCallum, Amr Ahmed, Snigdha Chaturvedi
[ABSTRACT]
Extractive opinion summarization involves automatically producing a summary
of text about an entity (e.g., a product’s reviews) by extracting
representative sentences that capture prevalent opinions in the review set.
Typically, in online marketplaces user reviews accumulate over time, and
opinion summaries need to be updated periodically to provide customers with
up-to-date information. In this work, we study the task of extractive opinion
summarization in an incremental setting, where the underlying review set
evolves over time. Many of the state-of-the-art extractive opinion
summarization approaches are centrality-based, such as CentroidRank (Radev et
al., 2004; Chowdhury et al., 2022). CentroidRank performs extractive
summarization by selecting a subset of review sentences closest to the centroid
in the representation space as the summary. However, these methods are not
capable of operating efficiently in an incremental setting, where reviews
arrive one at a time. In this paper, we present an efficient algorithm for
accurately computing the CentroidRank summaries in an incremental setting. Our
approach, CoverSumm, relies on indexing review representations in a cover tree
and maintaining a reservoir of candidate summary review sentences. CoverSumm’s
efficacy is supported by a theoretical and empirical analysis of running time.
Empirically, on a diverse collection of data (both real and synthetically
created to illustrate scaling considerations), we demonstrate that CoverSumm is
up to 36x faster than baseline methods, and capable of adapting to nuanced
changes in data distribution. We also conduct human evaluations of the
generated summaries and find that CoverSumm is capable of producing informative
summaries consistent with the underlying review set.
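For reference, the non-incremental CentroidRank selection described above can be sketched as follows. The sketch assumes that sentence embeddings are already computed and that Euclidean distance is the closeness measure. The naive incremental wrapper keeps a running sum, so updating the centroid is cheap, but it rescans every sentence on each query. That repeated full scan is the kind of cost an index such as CoverSumm's cover tree aims to avoid. This is not CoverSumm itself, and all names are hypothetical.

```python
import math
from typing import List, Sequence

def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]

def centroid_rank(vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k sentences closest (Euclidean) to the centroid, nearest first."""
    c = centroid(vectors)
    dists = [math.dist(v, c) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: dists[i])[:k]

class NaiveIncrementalCentroidRank:
    """Running sum makes centroid updates O(dim), but every query rescans all sentences."""

    def __init__(self, dim: int):
        self.total = [0.0] * dim
        self.vectors: List[List[float]] = []

    def add(self, vec: Sequence[float]) -> None:
        self.vectors.append(list(vec))
        self.total = [t + x for t, x in zip(self.total, vec)]

    def summary(self, k: int) -> List[int]:
        c = [t / len(self.vectors) for t in self.total]
        dists = [math.dist(v, c) for v in self.vectors]  # O(n * dim) per query
        return sorted(range(len(self.vectors)), key=lambda i: dists[i])[:k]

# Toy 2-D "embeddings" for three review sentences.
vecs = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
print(centroid_rank(vecs, 2))
```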
[COMMENTS]
Accepted at TMLR
[LINK]
http://arxiv.org/abs/2401.08047v2
[DATE]
2024-04-13 00:13:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Predicting Mergers and Acquisitions: Temporal Dynamic Industry Networks
[AUTHORS]
Dayu Yang
[ABSTRACT]
M&A activities are pivotal for market consolidation, enabling firms to
augment market power through strategic complementarities. Existing research
often overlooks the peer effect, the mutual influence of M&A behaviors among
firms, and fails to capture complex interdependencies within industry networks.
Common approaches suffer from reliance on ad-hoc feature engineering, data
truncation leading to significant information loss, reduced predictive
accuracy, and challenges in real-world application. Additionally, the rarity of
M&A events necessitates data rebalancing in conventional models, introducing
bias and undermining prediction reliability. We propose an innovative M&A
predictive model utilizing the Temporal Dynamic Industry Network (TDIN),
leveraging temporal point processes and deep learning to adeptly capture
industry-wide M&A dynamics. This model facilitates accurate, detailed
deal-level predictions without arbitrary data manipulation or rebalancing,
demonstrated through superior evaluation results from M&A cases between January
1997 and December 2020. Our approach marks a significant improvement over
traditional models by providing detailed insights into M&A activities and
strategic recommendations for specific firms.
[COMMENTS]
Data Processing Code:
https://github.com/dayuyang1999/Merger_Acquisition_Data
Modeling Code:
https://github.com/dayuyang1999/Merger_Acquisition_Prediction
[LINK]
http://arxiv.org/abs/2404.07298v2
[DATE]
2024-04-13 23:54:27+08:00
[CATEGORIES]
cs.LG
Active Learning for Control-Oriented Identification of Nonlinear Systems
[AUTHORS]
Bruce D. Lee, Ingvar Ziemann, George J. Pappas, Nikolai Matni
[ABSTRACT]
Model-based reinforcement learning is an effective approach for controlling
an unknown system. It is based on a longstanding pipeline familiar to the
control community in which one performs experiments on the environment to
collect a dataset, uses the resulting dataset to identify a model of the
system, and finally performs control synthesis using the identified model. As
interacting with the system may be costly and time-consuming, targeted
exploration is crucial for developing an effective control-oriented model with
minimal experimentation. Motivated by this challenge, recent work has begun to
study finite sample data requirements and sample efficient algorithms for the
problem of optimal exploration in model-based reinforcement learning. However,
existing theory and algorithms are limited to model classes which are linear in
the parameters. Our work instead focuses on models with nonlinear parameter
dependencies, and presents the first finite sample analysis of an active
learning algorithm suitable for a general class of nonlinear dynamics. In
certain settings, the excess control cost of our algorithm achieves the optimal
rate, up to logarithmic factors. We validate our approach in simulation,
showcasing the advantage of active, control-oriented exploration for
controlling nonlinear systems.
[LINK]
http://arxiv.org/abs/2404.09030v1
[DATE]
2024-04-13 23:40:39+08:00
[CATEGORIES]
cs.LG
Annealing Self-Distillation Rectification Improves Adversarial Training
[AUTHORS]
Yu-Yu Wu, Hung-Jui Wang, Shang-Tse Chen
[ABSTRACT]
In standard adversarial training, models are optimized to fit one-hot labels
within allowable adversarial perturbation budgets. However, ignoring the
underlying distribution shifts caused by perturbations leads to the problem of
robust overfitting. To address this issue and enhance adversarial robustness,
we analyze the characteristics of robust models and find that they
tend to produce smoother and better-calibrated outputs. Based on this observation,
we propose a simple yet effective method, Annealing Self-Distillation
Rectification (ADR), which generates soft labels as a better guidance mechanism
that accurately reflects the distribution shift under attack during adversarial
training. By utilizing ADR, we can obtain rectified distributions that
significantly improve model robustness without the need for pre-trained models
or extensive extra computation. Moreover, our method facilitates seamless
plug-and-play integration with other adversarial training techniques by
replacing the hard labels in their objectives. We demonstrate the efficacy of
ADR through extensive experiments and strong performance across datasets.
[COMMENTS]
Accepted to ICLR 2024
[LINK]
http://arxiv.org/abs/2305.12118v2
[DATE]
2024-04-13 23:01:14+08:00
[CATEGORIES]
cs.LG
Deep Reinforcement Learning-Based Approach for a Single Vehicle Persistent Surveillance Problem with Fuel Constraints
[AUTHORS]
Hritik Bana, Manav Mishra, Saswata Sarkar, Sujeevraja Sanjeevi, PB Sujit, Kaarthik Sundar
[ABSTRACT]
This article presents a deep reinforcement learning-based approach to tackle
a persistent surveillance mission requiring a single unmanned aerial vehicle
initially stationed at a depot with fuel or time-of-flight constraints to
repeatedly visit a set of targets with equal priority. Owing to the vehicle’s
fuel or time-of-flight constraints, the vehicle must be regularly refueled, or
its battery must be recharged at the depot. The objective of the problem is to
determine an optimal sequence of visits to the targets that minimizes the
maximum time elapsed between successive visits to any target while ensuring
that the vehicle never runs out of fuel or charge. We present a deep
reinforcement learning algorithm to solve this problem and present the results
of numerical experiments that corroborate the effectiveness of this approach in
comparison with common-sense greedy heuristics.
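The objective can be made concrete with a small evaluator for a fixed visit sequence. This sketch makes several assumptions: fuel is measured in flight-time units, travel times are given per directed leg, and refuelling at the depot is instantaneous and to full capacity. Revisit gaps are measured only between successive visits to a target, so the time before a target's first visit is ignored, which may differ from the paper's exact definition. The evaluator scores a candidate sequence, for example one proposed by a learned policy or a greedy heuristic, but does not learn one. The function names and route representation are hypothetical.

```python
from typing import Dict, List, Tuple

def simulate_route(route: List[str], travel: Dict[Tuple[str, str], float],
                   capacity: float, depot: str = "depot") -> List[Tuple[float, str]]:
    """Fly the route starting at the depot with a full tank.

    Returns (arrival_time, node) pairs and raises ValueError if a leg exceeds the
    remaining fuel. Refuelling at the depot is assumed to be instantaneous.
    """
    time, fuel, here = 0.0, capacity, depot
    visits = []
    for nxt in route:
        leg = travel[(here, nxt)]
        if leg > fuel:
            raise ValueError(f"out of fuel on leg {here} -> {nxt}")
        fuel -= leg
        time += leg
        visits.append((time, nxt))
        if nxt == depot:
            fuel = capacity  # refuel / recharge to full
        here = nxt
    return visits

def max_revisit_gap(visits: List[Tuple[float, str]], depot: str = "depot") -> float:
    """Largest time between successive visits to the same target (depot visits ignored)."""
    last_seen: Dict[str, float] = {}
    worst = 0.0
    for t, node in visits:
        if node == depot:
            continue
        if node in last_seen:
            worst = max(worst, t - last_seen[node])
        last_seen[node] = t
    return worst

legs = {("depot", "A"): 1.0, ("A", "B"): 1.0, ("B", "depot"): 1.0}
legs.update({(b, a): t for (a, b), t in list(legs.items())})  # symmetric travel times
route = ["A", "B", "depot", "A", "B", "depot"]
print(max_revisit_gap(simulate_route(route, legs, capacity=3.0)))
```

Among fuel-feasible sequences, the one with the smallest `max_revisit_gap` is the best under this objective.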
[COMMENTS]
6 pages
[LINK]
http://arxiv.org/abs/2404.06423v2
[DATE]
2024-04-13 22:58:53+08:00
[CATEGORIES]
cs.LG
Integrating Hyperparameter Search into Model-Free AutoML with Context-Free Grammars
[AUTHORS]
Hernán Ceferino Vázquez, Jorge Sanchez, Rafael Carrascosa
[ABSTRACT]
Automated Machine Learning (AutoML) has become increasingly popular in recent
years due to its ability to reduce the amount of time and expertise required to
design and develop machine learning systems. This is very important for the
practice of machine learning, as it allows building strong baselines quickly,
improving the efficiency of the data scientists, and reducing the time to
production. Despite these advantages, however, AutoML faces several
challenges, such as defining the solution space and exploring it efficiently.
Recently, some approaches have done so using tree-based
search algorithms and context-free grammars. In particular, GramML presents a
model-free reinforcement learning approach that leverages pipeline
configuration grammars and operates using Monte Carlo tree search. However, one
of the limitations of GramML is that it uses default hyperparameters, limiting
the search problem to finding optimal pipeline structures for the available
data preprocessors and models. In this work, we propose an extension to GramML
that supports larger search spaces including hyperparameter search. We
evaluated the approach using an OpenML benchmark and found significant
improvements compared to other state-of-the-art techniques.
[LINK]
http://arxiv.org/abs/2404.03419v2
[DATE]
2024-04-13 22:57:37+08:00
[CATEGORIES]
cs.LG
Theoretical research on generative diffusion models: an overview
[AUTHORS]
Melike Nur Yeğin, Mehmet Fatih Amasyalı
[ABSTRACT]
Generative diffusion models have been highly successful in many fields and
rest on a powerful theoretical background. They convert the data distribution
to noise and then remove the noise to recover a similar distribution. Many
existing reviews focus on specific application areas without concentrating on
research into the algorithm itself. Unlike them, we investigate the theoretical
developments of generative diffusion models. These approaches mainly divide
into two categories: training-based and sampling-based. Recognizing this
division allowed us to provide a clear and understandable categorization for
researchers who will make new developments in the future.
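As a concrete anchor for "convert the data distribution to noise and remove the noise", here is a minimal NumPy sketch of the widely used DDPM-style forward (noising) process and its one-step inversion given a noise estimate. The linear beta schedule and the function names are illustrative assumptions, and the noise estimate that a trained model would supply is here taken to be the true noise.

```python
import numpy as np


def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # abar_t = prod_{s<=t} (1 - beta_s)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps, alpha_bar


def recover_x0(xt, eps_hat, alpha_bar):
    """Invert the forward step given a noise estimate eps_hat."""
    return (xt - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)


# Linear schedule over 1000 steps (an assumption for illustration).
betas = np.linspace(1e-4, 0.02, 1000)
```

By the last step almost no signal from the data survives, which is what lets sampling start from pure noise.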
[LINK]
http://arxiv.org/abs/2404.09016v1
[DATE]
2024-04-13 22:08:56+08:00
[CATEGORIES]
cs.LG
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
[AUTHORS]
Zining Chen, Weiqiu Wang, Zhicheng Zhao, Fei Su, Aidong Men, Hongying Meng
[ABSTRACT]
Domain Generalization (DG) aims to resolve distribution shifts between source
and target domains, and current DG methods default to the setting in which
data from source and target domains share identical categories. Nevertheless,
unseen classes exist in target domains in practical scenarios. To address this
issue, Open Set Domain Generalization (OSDG) has emerged and several methods
have been proposed specifically for it. However, most existing methods adopt
complex architectures with only slight improvements over DG methods.
Recently, vision-language models (VLMs) have been introduced in DG following
the fine-tuning paradigm, but they incur huge training overhead with large
vision models. Therefore, in this paper, we transfer knowledge from VLMs to
lightweight vision models and improve robustness by introducing Perturbation
Distillation (PD) from three perspectives, Score, Class and Instance (SCI),
yielding SCI-PD. Moreover, previous methods are oriented toward benchmarks
with identical and fixed splits, ignoring the divergence between source
domains. These methods are shown to suffer sharp performance decay on our
proposed new benchmark, Hybrid Domain Generalization (HDG), and under a novel
metric, $H^{2}$-CV, which together construct various splits to comprehensively
assess the robustness of algorithms. Extensive experiments demonstrate that our
method outperforms state-of-the-art algorithms on multiple datasets, especially
improving the robustness when confronting data scarcity.
[COMMENTS]
Accepted to CVPR2024
[LINK]
http://arxiv.org/abs/2404.09011v1
[DATE]
2024-04-13 21:41:13+08:00
[CATEGORIES]
cs.LG
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
[AUTHORS]
Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj
[ABSTRACT]
Dynamic Facial Expression Recognition (DFER) has received significant
interest in recent years owing to its pivotal role in enabling empathic
and human-compatible technologies. Achieving robustness towards in-the-wild
data in DFER is particularly important for real-world applications. One of the
directions aimed at improving such models is multimodal emotion recognition
based on audio and video data. Multimodal learning in DFER increases the model
capabilities by leveraging richer, complementary data representations. Within
the field of multimodal DFER, recent methods have focused on exploiting
advances of self-supervised learning (SSL) for pre-training of strong
multimodal encoders. Another line of research has focused on adapting
pre-trained static models for DFER. In this work, we propose a different
perspective on the problem and investigate the advancement of multimodal DFER
performance by adapting SSL-pre-trained disjoint unimodal encoders. We identify
main challenges associated with this task, namely, intra-modality adaptation,
cross-modal alignment, and temporal adaptation, and propose solutions to each
of them. As a result, we demonstrate improvement over current state-of-the-art
on two popular DFER benchmarks, namely DFEW and MAFW.
[COMMENTS]
accepted to CVPR 2024 ABAW Workshop
[LINK]
http://arxiv.org/abs/2404.09010v1
[DATE]
2024-04-13 21:39:26+08:00
[CATEGORIES]
cs.LG
Proof-of-Learning with Incentive Security
[AUTHORS]
Zishuo Zhao, Zhixuan Fang, Xuechao Wang, Yuan Zhou
[ABSTRACT]
Most current blockchain systems rely heavily on the Proof-of-Work (PoW) or
Proof-of-Stake (PoS) mechanisms for decentralized consensus and security
assurance. However, the substantial energy expenditure stemming from
computationally intensive yet meaningless tasks has raised considerable
concerns surrounding traditional PoW approaches. The PoS mechanism, while free
of energy consumption, is subject to security and economic issues. Addressing
these issues, the paradigm of Proof-of-Useful-Work (PoUW) seeks to employ
challenges of practical significance as PoW, thereby imbuing energy consumption
with tangible value. While previous efforts in Proof of Learning (PoL) explored
the use of SGD-based deep learning model training tasks as PoUW challenges,
recent research has revealed its vulnerabilities to adversarial attacks and the
theoretical hardness in crafting a byzantine-secure PoL mechanism. In this
paper, we introduce the concept of incentive-security that incentivizes
rational provers to behave honestly for their best interest, bypassing the
existing hardness to design a PoL mechanism with computational efficiency, a
provable incentive-security guarantee and controllable difficulty.
In particular, our work is secure against two attacks on the recent work of Jia
et al. [2021], and also improves the computational overhead from $\Theta(1)$ to
$O(\frac{\log E}{E})$. Furthermore, while most recent research assumes trusted
problem providers and verifiers, our design also guarantees frontend
incentive-security even when problem providers are untrusted, and verifier
incentive-security that bypasses the Verifier’s Dilemma. By incorporating ML
training into blockchain consensus mechanisms with provable guarantees, our
research not only proposes an eco-friendly solution to blockchain systems, but
also provides a proposal for a completely decentralized computing power market
in the new AI age.
[COMMENTS]
22 pages, 6 figures
[LINK]
http://arxiv.org/abs/2404.09005v1
[DATE]
2024-04-13 21:18:40+08:00
[CATEGORIES]
cs.LG
MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images
[AUTHORS]
Yingjie Xi, Boyuan Cheng, Jingyao Cai, Jian Jun Zhang, Xiaosong Yang
[ABSTRACT]
Whole-body human X-rays could offer a valuable reference for various
applications, including medical diagnostics, digital animation modeling, and
ergonomic design. The traditional method of obtaining X-ray information
requires the use of CT (Computed Tomography) scan machines, which emit
potentially harmful radiation. This is a significant limitation for real-world
applications because the approach lacks adaptability and safety. In this work,
we propose a new method to directly generate 2D whole-body human X-rays
from human masking images. The predicted images are similar to real ones,
with the same image style and anatomic structure. We employ a data-driven
strategy. By leveraging advanced generative techniques, our model
MaSkel (Masking image to Skeleton X-rays) can generate a high-quality X-ray
image from a human masking image without the need for invasive and harmful
radiation exposure, which not only provides a new path to generate highly
anatomic and customized data but also reduces health risks. To our knowledge,
MaSkel is the first work to predict whole-body X-rays. The work in this
paper has two parts. First, to address the limited data, we use
diffusion-based techniques for data augmentation, which provides two synthetic
datasets for preliminary pretraining. Then we design a two-stage training
strategy to train MaSkel. Finally, we perform qualitative and quantitative
evaluations of the generated X-rays and invite professional doctors to assess
our predicted data. These evaluations demonstrate MaSkel’s superior ability to
generate anatomic X-rays from human masking images. The related code and
links to the dataset are available at https://github.com/2022yingjie/MaSkel.
[LINK]
http://arxiv.org/abs/2404.09000v1
[DATE]
2024-04-13 21:03:19+08:00
[CATEGORIES]
cs.LG
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
[AUTHORS]
Jinwoo Kim, Tien Dat Nguyen, Ayhan Suleymanzade, Hyeokjun An, Seunghoon Hong
[ABSTRACT]
We present a novel framework to overcome the limitations of equivariant
architectures in learning functions with group symmetries. In contrast to
equivariant architectures, we use an arbitrary base model such as an MLP or a
transformer and symmetrize it to be equivariant to the given group by employing
a small equivariant network that parameterizes the probabilistic distribution
underlying the symmetrization. The distribution is trained end-to-end with the
base model, which can maximize performance while reducing the sample complexity
of symmetrization. We show that this approach ensures not only equivariance to
the given group but also universal approximation capability in expectation. We
implement our method on various base models, including patch-based transformers
that can be initialized from pretrained vision transformers, and test them for
a wide range of symmetry groups including permutation and Euclidean groups and
their combinations. Empirical tests show competitive results against tailored
equivariant architectures, suggesting the potential for learning equivariant
functions for diverse groups using a non-equivariant universal base
architecture. We further show evidence of enhanced learning in symmetric
modalities, like graphs, when pretrained from non-symmetric modalities, like
vision. Code is available at https://github.com/jw9730/lps.
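The symmetrization idea can be illustrated in its simplest special case: replacing the learned distribution over group elements with a uniform average over the entire permutation group (classical group averaging), which is feasible only for very small sets. The sketch below symmetrizes an arbitrary, non-equivariant MLP so that its per-element output becomes permutation-equivariant. It is not the authors' learned probabilistic scheme; the sizes, random weights, and function names are hypothetical.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n, d, k, hidden = 4, 3, 2, 16  # set size, input dim, output dim, MLP width

# Hypothetical non-equivariant base model: an MLP on the flattened set.
W1 = rng.standard_normal((n * d, hidden))
W2 = rng.standard_normal((hidden, n * k))


def base_mlp(x):
    """Map an (n, d) array to an (n, k) array with no built-in symmetry."""
    h = np.tanh(x.reshape(-1) @ W1)
    return (h @ W2).reshape(n, k)


def symmetrized(x):
    """Average g . f(g^{-1} . x) uniformly over all n! row permutations g."""
    perms = list(itertools.permutations(range(n)))
    out = np.zeros((n, k))
    for perm in perms:
        p = np.array(perm)
        y = base_mlp(x[p])   # f applied to the permuted input
        back = np.empty_like(y)
        back[p] = y          # permute the output rows back
        out += back
    return out / len(perms)
```

Because every permutation is enumerated, the cost grows as n!, which is why practical schemes sample group elements instead; per the abstract, learning that sampling distribution is what reduces the sample complexity.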
[COMMENTS]
32 pages, 11 figures
[LINK]
http://arxiv.org/abs/2306.02866v3
[DATE]
2024-04-13 20:50:13+08:00
[CATEGORIES]
cs.LG
DTOR: Decision Tree Outlier Regressor to explain anomalies
[AUTHORS]
Riccardo Crupi, Daniele Regoli, Alessandro Damiano Sabatino, Immacolata Marano, Massimiliano Brinis, Luca Albertazzi, Andrea Cirillo, Andrea Claudio Cosentini
[ABSTRACT]
Explaining the occurrence of outliers and the mechanisms behind it can be
extremely important in a variety of domains. Malfunctions, frauds, and threats,
in addition to being correctly identified, often need a valid explanation so
that effective countermeasures can be taken. The ever more widespread use of
sophisticated Machine Learning approaches to identify anomalies makes such
explanations more challenging. We present the Decision Tree Outlier Regressor
(DTOR), a technique for producing rule-based explanations for individual data
points by estimating anomaly scores generated by an anomaly detection model.
This is accomplished by first applying a Decision Tree Regressor, which
computes the estimation score, and then extracting the relative path associated
with the data point score. Our results demonstrate the robustness of DTOR even
in datasets with a large number of features. Additionally, in contrast to other
rule-based approaches, the generated rules are consistently satisfied by the
points to be explained. Furthermore, our evaluation metrics indicate comparable
performance to Anchors in outlier explanation tasks, with reduced execution
time.
[LINK]
http://arxiv.org/abs/2403.10903v3
[DATE]
2024-04-13 20:49:43+08:00
[CATEGORIES]
cs.LG
Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
[AUTHORS]
Ye Wang, Yaxiong Wang, Yujiao Wu, Bingchen Zhao, Xueming Qian
[ABSTRACT]
Generalized Class Discovery (GCD) aims to dynamically assign labels to
unlabelled data partially based on knowledge learned from labelled data, where
the unlabelled data may come from known or novel classes. The prevailing
approach generally involves clustering across all data and learning conceptions
by prototypical contrastive learning. However, existing methods largely hinge
on the performance of clustering algorithms and are thus subject to their
inherent limitations. Firstly, the estimated cluster number is often smaller
than the ground truth, making the existing methods suffer from the lack of
prototypes for comprehensive conception learning. To address this issue, we
propose an adaptive probing mechanism that introduces learnable potential
prototypes to expand cluster prototypes (centers). As there is no ground truth
for the potential prototype, we develop a self-supervised prototype learning
framework to optimize the potential prototype in an end-to-end fashion.
Secondly, clustering is computationally intensive, and the conventional
strategy of clustering both labelled and unlabelled instances exacerbates this
issue. To counteract this inefficiency, we opt to cluster only the unlabelled
instances and subsequently expand the cluster prototypes with our introduced
potential prototypes to fast explore novel classes. Despite the simplicity of
our proposed method, extensive empirical analysis on a wide range of datasets
confirms that our method consistently delivers state-of-the-art results.
Specifically, our method surpasses the nearest competitor by a significant
margin of 9.7% on the Stanford Cars dataset and achieves 12× clustering
efficiency on the Herbarium 19 dataset. We will make the code and checkpoints
publicly available at https://github.com/xjtuYW/PNP.git.
[COMMENTS]
9 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.08995v1
[DATE]
2024-04-13 20:41:40+08:00
[CATEGORIES]
cs.LG
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning
[AUTHORS]
Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang
[ABSTRACT]
Large Language Models (LLMs) have demonstrated significant potential in
performing multiple tasks in multimedia applications, ranging from content
generation to interactive entertainment and artistic creation. However, the
diversity of downstream tasks in multitask scenarios presents substantial
adaptation challenges for LLMs. While traditional methods often succumb to
knowledge confusion in their monolithic dense models, Mixture-of-Experts (MoE)
has emerged as a promising solution, with its sparse architecture enabling
effective task decoupling. Inspired by the principles of human cognitive
neuroscience, we design a novel framework, Intuition-MoR1E, that
leverages the inherent semantic clustering of instances to mimic how the human
brain handles multiple tasks, offering implicit guidance to the router for
optimized feature allocation. Moreover, we introduce a cutting-edge Rank-1
Experts formulation designed to manage a spectrum of intuitions, demonstrating
enhanced parameter efficiency and effectiveness in multitask LLM finetuning.
Extensive experiments demonstrate that Intuition-MoR1E achieves superior
efficiency and a 2.15% overall accuracy improvement across 14 public datasets
against other state-of-the-art baselines.
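The abstract does not give the exact Rank-1 Experts formulation, so the following is only a generic sketch of the idea it names: a frozen weight matrix plus a router-gated sum of rank-1 updates b_i a_i^T. The softmax linear router, the shapes, and the name `rank1_moe_forward` are assumptions for illustration.

```python
import numpy as np


def rank1_moe_forward(x, W0, A, B, R):
    """Frozen layer plus a gated sum of rank-1 experts (hypothetical form).

    x:  (d_in,) input vector
    W0: (d_out, d_in) frozen pretrained weight
    A:  (E, d_in) rows are the expert factors a_i
    B:  (E, d_out) rows are the expert factors b_i
    R:  (E, d_in) hypothetical linear router
    """
    logits = R @ x
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                 # softmax gate g_i(x) over E experts
    expert_out = (A @ x) * gates         # g_i(x) * (a_i . x) for each expert
    return W0 @ x + B.T @ expert_out     # W0 x + sum_i g_i(x) b_i (a_i . x)
```

Each expert costs only d_in + d_out parameters, so E experts together add E(d_in + d_out) trainable parameters instead of the d_in * d_out of a dense update.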
[COMMENTS]
13 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.08985v1
[DATE]
2024-04-13 20:14:58+08:00
[CATEGORIES]
cs.LG
Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification
[AUTHORS]
Denis Huseljic, Paul Hahn, Marek Herde, Lukas Rauch, Bernhard Sick
[ABSTRACT]
Deep active learning (AL) seeks to minimize the annotation costs for training
deep neural networks. BAIT, a recently proposed AL strategy based on the Fisher
Information, has demonstrated impressive performance across various datasets.
However, BAIT’s high computational and memory requirements hinder its
applicability to large-scale classification tasks, leading current research
to neglect BAIT in its evaluations. This paper introduces two methods
to enhance BAIT’s computational efficiency and scalability. Notably, we
significantly reduce its time complexity by approximating the Fisher
Information. In particular, we adapt the original formulation by i) taking the
expectation over the most probable classes, and ii) constructing a binary
classification task, leading to an alternative likelihood for gradient
computations. Consequently, this allows the efficient use of BAIT on
large-scale datasets, including ImageNet. Our unified and comprehensive
evaluation across a variety of datasets demonstrates that our approximations
achieve strong performance with considerably reduced time complexity.
Furthermore, we provide an extensive open-source toolbox that implements recent
state-of-the-art AL strategies, available at
https://github.com/dhuseljic/dal-toolbox.
[LINK]
http://arxiv.org/abs/2404.08981v1
[DATE]
2024-04-13 20:09:37+08:00
[CATEGORIES]
cs.LG
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
[AUTHORS]
Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
[ABSTRACT]
Owing to its power in extracting feature representations, contrastive
self-supervised learning has been successfully integrated into the practice of
(deep) reinforcement learning (RL), leading to efficient policy learning in
various applications. Despite its tremendous empirical successes, the
understanding of contrastive learning for RL remains elusive. To narrow such a
gap, we study how RL can be empowered by contrastive learning in a class of
Markov decision processes (MDPs) and Markov games (MGs) with low-rank
transitions. For both models, we propose to extract the correct feature
representations of the low-rank model by minimizing a contrastive loss.
Moreover, under the online setting, we propose novel upper confidence bound
(UCB)-type algorithms that incorporate such a contrastive loss with online RL
algorithms for MDPs or MGs. We further theoretically prove that our algorithm
recovers the true representations and simultaneously achieves sample efficiency
in learning the optimal policy and Nash equilibrium in MDPs and MGs. We also
provide empirical studies to demonstrate the efficacy of the UCB-based
contrastive learning method for RL. To the best of our knowledge, we provide
the first provably efficient online RL algorithm that incorporates contrastive
learning for representation learning. Our code is available at
https://github.com/Baichenjia/Contrastive-UCB.
[COMMENTS]
ICML 2022
[LINK]
http://arxiv.org/abs/2207.14800v3
[DATE]
2024-04-13 20:08:51+08:00
[CATEGORIES]
cs.LG
Stability and Generalization in Free Adversarial Training
[AUTHORS]
Xiwei Cheng, Kexin Fu, Farzan Farnia
[ABSTRACT]
While adversarial training methods have resulted in significant improvements
in the deep neural nets’ robustness against norm-bounded adversarial
perturbations, their generalization performance from training samples to test
data has been shown to be considerably worse than standard empirical risk
minimization methods. Several recent studies seek to connect the generalization
behavior of adversarially trained classifiers to various gradient-based min-max
optimization algorithms used for their training. In this work, we study the
generalization performance of adversarial training methods using the
algorithmic stability framework. Specifically, our goal is to compare the
generalization performance of the vanilla adversarial training scheme fully
optimizing the perturbations at every iteration vs. the free adversarial
training simultaneously optimizing the norm-bounded perturbations and
classifier parameters. Our proven generalization bounds indicate that the free
adversarial training method could enjoy a lower generalization gap between
training and test samples due to the simultaneous nature of its min-max
optimization algorithm. We perform several numerical experiments to evaluate
the generalization performance of vanilla, fast, and free adversarial training
methods. Our empirical findings also show the improved generalization
performance of the free adversarial training method and further demonstrate
that the better generalization result could translate to greater robustness
against black-box attack schemes. The code is available at
https://github.com/Xiwei-Cheng/Stability_FreeAT.
[LINK]
http://arxiv.org/abs/2404.08980v1
[DATE]
2024-04-13 20:07:20+08:00
[CATEGORIES]
cs.LG
G-ACIL: Analytic Learning for Exemplar-Free Generalized Class Incremental Learning
[AUTHORS]
Huiping Zhuang, Yizhu Chen, Di Fang, Run He, Kai Tong, Hongxin Wei, Ziqian Zeng, Cen Chen
[ABSTRACT]
Class incremental learning (CIL) trains a network on sequential tasks with
separated categories but suffers from catastrophic forgetting, where models
quickly lose previously learned knowledge when acquiring new tasks. The
generalized CIL (GCIL) aims to address the CIL problem in a more real-world
scenario, where incoming data have mixed data categories and unknown sample
size distribution, leading to intensified forgetting. Existing GCIL methods
either perform poorly or invade data privacy by saving historical
exemplars. To address this, in this paper, we propose an exemplar-free
generalized analytic class incremental learning (G-ACIL). The G-ACIL adopts
analytic learning (a gradient-free training technique), and delivers an
analytical solution (i.e., closed-form) to the GCIL scenario. This solution is
derived via decomposing the incoming data into exposed and unexposed classes,
allowing an equivalence between the incremental learning and its joint
training, i.e., the weight-invariant property. Such an equivalence is
theoretically validated through matrix analysis tools, and hence contributes
interpretability to GCIL. It is also empirically evidenced by experiments on
various datasets and settings of GCIL. The results show that the G-ACIL
exhibits leading performance with high robustness compared with existing
competitive GCIL methods. Code will be available at
https://github.com/ZHUANGHP/Analytic-continual-learning.
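The weight-invariant property is easiest to see for the closed-form ridge-regression solutions that analytic learning builds on: updating the solution phase by phase with the Woodbury identity yields exactly the weights that joint training on all data would give. The NumPy sketch below shows this for a linear classifier head over pre-extracted features. It does not model the exposed/unexposed class decomposition; the names, the regularizer `gamma`, and the fixed upper bound on the number of classes are assumptions.

```python
import numpy as np


def ridge_joint(X, Y, gamma):
    """Closed-form ridge solution trained on all data at once."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + gamma * np.eye(d), X.T @ Y)


class RecursiveRidge:
    """Phase-by-phase ridge regression that never revisits old samples."""

    def __init__(self, d, c, gamma):
        self.R = np.eye(d) / gamma   # (X^T X + gamma I)^{-1} before any data
        self.W = np.zeros((d, c))    # weights for an assumed upper bound of c classes

    def update(self, Xk, Yk):
        RXt = self.R @ Xk.T
        K = np.linalg.inv(np.eye(Xk.shape[0]) + Xk @ RXt)
        self.R = self.R - RXt @ K @ RXt.T      # Woodbury update of the inverse
        self.W = self.W + self.R @ Xk.T @ (Yk - Xk @ self.W)
```

Sorting the samples by label before splitting them into phases mimics new classes arriving over time; the recursive weights still match the joint solution.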
[LINK]
http://arxiv.org/abs/2403.15706v2
[DATE]
2024-04-13 20:06:35+08:00
[CATEGORIES]
cs.LG
PraFFL: A Preference-Aware Scheme in Fair Federated Learning
[AUTHORS]
Rongguang Ye, Ming Tang
[ABSTRACT]
Fairness in federated learning has emerged as a critical concern, aiming to
develop an unbiased model for any special group (e.g., male or female) of
sensitive features. However, there is a trade-off between model performance and
fairness, i.e., improving fairness will decrease model performance. Existing
approaches have characterized such a trade-off by introducing hyperparameters
to quantify clients’ preferences for fairness and model performance.
Nevertheless, these methods are limited to scenarios where each client has only
a single pre-defined preference. In practical systems, each client may
simultaneously have multiple preferences for the model performance and
fairness. The key challenge is to design a method that allows the model to
adapt to diverse preferences of each client in real time. To this end, we
propose a Preference-aware scheme in Fair Federated Learning paradigm (called
PraFFL). PraFFL can adaptively adjust the model based on each client’s
preferences to meet their needs. We theoretically prove that PraFFL can provide
the optimal model for arbitrary client preferences. Experimental results show
that our proposed PraFFL outperforms five existing fair federated learning
algorithms in terms of the model’s capability in adapting to clients’ different
preferences.
[COMMENTS]
10 pages, 10 figures, and 1 table. This paper has been submitted to
MobiHoc’24
[LINK]
http://arxiv.org/abs/2404.08973v1
[DATE]
2024-04-13 19:40:05+08:00
[CATEGORIES]
cs.LG
Fast Gradient Computation for Gromov-Wasserstein Distance
[AUTHORS]
Wei Zhang, Zihao Wang, Jie Fan, Hao Wu, Yong Zhang
[ABSTRACT]
The Gromov-Wasserstein distance is a notable extension of optimal transport.
In contrast to the classic Wasserstein distance, it solves a quadratic
assignment problem that minimizes the pair-wise distance distortion under the
transportation of distributions and thus can be applied to distributions in
different spaces. These properties make Gromov-Wasserstein widely applicable to
many fields, such as computer graphics and machine learning. However, the
computation of the Gromov-Wasserstein distance and transport plan is expensive.
The well-known Entropic Gromov-Wasserstein approach has a cubic complexity
since the matrix multiplication operations need to be repeated in computing the
gradient of Gromov-Wasserstein loss. This becomes a key bottleneck of the
method. Existing methods for accelerating the computation focus on
sampling and approximation, which leads to low accuracy or incomplete transport
plans. In this work, we propose a novel method to accelerate accurate gradient
computation by dynamic programming techniques, reducing the complexity from
cubic to quadratic. In this way, the original computational bottleneck is
broken and the new entropic solution can be obtained with total quadratic time,
which is nearly optimal. Furthermore, it can easily be extended to some
variants. Extensive experiments validate the efficiency and
effectiveness of our method.
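For the square loss, the tensor-matrix product L(C1, C2) ⊗ T that appears in every gradient evaluation of entropic Gromov-Wasserstein can be written with two matrix products plus rank-one terms; this is the usual way it is computed and is where the cubic cost of matrix multiplication enters. The NumPy sketch below checks that decomposition against the naive quadruple sum. It does not implement the authors' dynamic-programming speedup, and the function names are hypothetical.

```python
import numpy as np


def tensor_product_naive(C1, C2, T):
    """(L(C1, C2) (x) T)[i, j] = sum_{k,l} (C1[i,k] - C2[j,l])**2 * T[k,l]."""
    n, m = T.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            diff = C1[i, :, None] - C2[j, None, :]  # (n, m) grid over (k, l)
            out[i, j] = np.sum(diff ** 2 * T)
    return out


def tensor_product_fast(C1, C2, T):
    """Same quantity via the square-loss decomposition (cubic time)."""
    p, q = T.sum(axis=1), T.sum(axis=0)        # marginals of the coupling T
    row_term = (C1 ** 2) @ p                   # sum_k C1[i,k]^2 p_k
    col_term = (C2 ** 2) @ q                   # sum_l C2[j,l]^2 q_l
    return row_term[:, None] + col_term[None, :] - 2.0 * C1 @ T @ C2.T
```

For symmetric cost matrices, the gradient of the Gromov-Wasserstein objective with respect to T is twice this product, so each solver iteration pays for the two matrix multiplications in `C1 @ T @ C2.T`.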
[COMMENTS]
Work in progress
[LINK]
http://arxiv.org/abs/2404.08970v1
[DATE]
2024-04-13 19:23:34+08:00
[CATEGORIES]
cs.LG
Concentration properties of fractional posterior in 1-bit matrix completion
[AUTHORS]
The Tien Mai
[ABSTRACT]
The problem of estimating a matrix based on a set of its observed entries is
commonly referred to as the matrix completion problem. In this work, we
specifically address the scenario of binary observations, often termed 1-bit
matrix completion. While numerous studies have explored Bayesian and
frequentist methods for real-valued matrix completion, there has been a lack of
theoretical exploration regarding Bayesian approaches in 1-bit matrix
completion. We tackle this gap by considering a general, non-uniform sampling
scheme and providing theoretical assurances on the efficacy of the fractional
posterior. Our contributions include obtaining concentration results for the
fractional posterior and demonstrating its effectiveness in recovering the
underlying parameter matrix. We accomplish this using two distinct types of
prior distributions: low-rank factorization priors and a spectral scaled
Student prior, with the latter requiring fewer assumptions. Importantly, our
results exhibit an adaptive nature by not mandating prior knowledge of the rank
of the parameter matrix. Our findings are comparable to those found in the
frequentist literature, yet demand fewer restrictive assumptions.
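For reference, a fractional (tempered) posterior raises the likelihood to a power α in (0, 1) before combining it with the prior. Under the common 1-bit observation model with a link function f (for example, the logistic function), labels Y_ij in {-1, 1}, and a set Ω of observed entries, it takes the form below. The notation is ours, and the specific observation model is an assumption rather than something stated in the abstract.

```latex
\pi_\alpha(M \mid Y) \;\propto\; \Bigl[\prod_{(i,j)\in\Omega}
  f(M_{ij})^{\mathbb{1}\{Y_{ij}=1\}}\,
  \bigl(1 - f(M_{ij})\bigr)^{\mathbb{1}\{Y_{ij}=-1\}}\Bigr]^{\alpha}\,\pi(M),
\qquad 0 < \alpha < 1 .
```

Setting α = 1 recovers the usual Bayesian posterior; the prior π(M) is where the low-rank factorization or spectral scaled Student prior enters.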
[LINK]
http://arxiv.org/abs/2404.08969v1
[DATE]
2024-04-13 19:22:53+08:00
[CATEGORIES]
cs.LG
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
[AUTHORS]
Bor-Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu
[ABSTRACT]
Recent advancements in post-hoc and inherently interpretable methods have
markedly enhanced the explanations of black box classifier models. These
methods operate either through post-analysis or by integrating concept learning
during model training. Although effective in bridging the semantic gap
between a model’s latent space and human interpretation, these explanation
methods only partially reveal the model’s decision-making process. The outcome
is typically limited to high-level semantics derived from the last feature map.
We argue that the explanations lacking insights into the decision processes at
low and mid-level features are neither fully faithful nor useful. Addressing
this gap, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet),
an inherently interpretable model. MCPNet autonomously learns meaningful
concept prototypes across multiple feature map levels using Centered Kernel
Alignment (CKA) loss and an energy-based weighted PCA mechanism, and it does so
without reliance on predefined concept labels. Further, we propose a novel
classifier paradigm that learns and aligns multi-level concept prototype
distributions for classification purposes via Class-aware Concept Distribution
(CCD) loss. Our experiments reveal that our proposed MCPNet, while being
adaptable to various model architectures, offers comprehensive multi-level
explanations while maintaining classification accuracy. Additionally, its
concept distribution-based classification approach shows improved
generalization capabilities in few-shot classification scenarios.
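The abstract names a Centered Kernel Alignment (CKA) loss without defining it. The standard linear CKA between two feature matrices (rows are samples, columns are features) is sketched below in NumPy. How MCPNet turns it into a loss, for example as 1 minus CKA, is not stated in the abstract, so treat any such use as an assumption.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)
```

Linear CKA equals 1 for identical representations and is invariant to orthogonal transformations and isotropic scaling of either input, which makes it suitable for comparing feature maps of different widths.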
[COMMENTS]
Accepted by CVPR 2024
[LINK]
http://arxiv.org/abs/2404.08968v1
[DATE]
2024-04-13 19:13:56+08:00
[CATEGORIES]
cs.LG
Understanding Multimodal Deep Neural Networks: A Concept Selection View
[AUTHORS]
Chenming Shang, Hengyuan Zhang, Hao Wen, Yujiu Yang
[ABSTRACT]
Multimodal deep neural networks, exemplified by CLIP, have enabled rich
downstream applications owing to their excellent performance, thus making
understanding the decision-making process of CLIP an essential research topic.
Due to the complex structure and the massive pre-training data, it is often
regarded as a black-box model that is too difficult to understand and
interpret. Concept-based models map the black-box visual representations
extracted by deep neural networks onto a set of human-understandable concepts
and use the concepts to make predictions, enhancing the transparency of the
decision-making process. However, these methods require datasets labeled
with fine-grained attributes by expert knowledge, which incur high costs and
introduce excessive human prior knowledge and bias. In this paper, we observe
the long-tail distribution of concepts, based on which we propose a two-stage
Concept Selection Model (CSM) to mine core concepts without introducing any
human priors. The concept greedy rough selection algorithm is applied to
extract head concepts, and then the concept mask fine selection method performs
the extraction of core concepts. Experiments show that our approach achieves
comparable performance to end-to-end black-box models, and human evaluation
demonstrates that the concepts discovered by our method are interpretable and
comprehensible for humans.
[LINK]
http://arxiv.org/abs/2404.08964v1
[DATE]
2024-04-13 19:06:49+08:00
[CATEGORIES]
cs.LG
Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
[AUTHORS]
Francesco G. Blanco, Enrico Russo, Maurizio Palesi, Davide Patti, Giuseppe Ascia, Vincenzo Catania
[ABSTRACT]
Currently, there is a growing trend of outsourcing the execution of DNNs to
cloud services. For service providers, managing multi-tenancy and ensuring
high-quality service delivery, particularly in meeting stringent execution time
constraints, assumes paramount importance, all while endeavoring to maintain
cost-effectiveness. In this context, the utilization of heterogeneous
multi-accelerator systems becomes increasingly relevant. This paper presents
RELMAS, a low-overhead deep reinforcement learning algorithm designed for the
online scheduling of DNNs in multi-tenant environments, taking into account the
dataflow heterogeneity of accelerators and memory bandwidth contention. By
doing so, service providers can employ the most efficient scheduling policy for
user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and
enhancing hardware utilization. The application of RELMAS to a heterogeneous
multi-accelerator system composed of various instances of Simba and Eyeriss
sub-accelerators resulted in up to a 173% improvement in SLA satisfaction rate
compared to state-of-the-art scheduling techniques across different workload
scenarios, with less than a 1.5% energy overhead.
[LINK]
http://arxiv.org/abs/2404.08950v1
[DATE]
2024-04-13 18:13:07+08:00
[CATEGORIES]
cs.LG
Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles
[AUTHORS]
Zhiwei Tang, Dmitry Rybin, Tsung-Hui Chang
[ABSTRACT]
In this study, we delve into an emerging optimization challenge involving a
black-box objective function that can only be gauged via a ranking oracle, a
situation frequently encountered in real-world scenarios, especially when the
function is evaluated by human judges. This challenge is inspired by
Reinforcement Learning with Human Feedback (RLHF), an approach recently
employed to enhance the performance of Large Language Models (LLMs) using human
guidance. We introduce ZO-RankSGD, an innovative zeroth-order optimization
algorithm designed to tackle this optimization problem, with theoretical
guarantees. Our algorithm uses a novel rank-based random
estimator to determine the descent direction and guarantees convergence to a
stationary point. Moreover, ZO-RankSGD is readily applicable to policy
optimization problems in Reinforcement Learning (RL), particularly when only
ranking oracles for the episode reward are available. Last but not least, we
demonstrate the effectiveness of ZO-RankSGD in a novel application: improving
the quality of images generated by a diffusion generative model with human
ranking feedback. In our experiments, ZO-RankSGD can
significantly enhance the detail of generated images with only a few rounds of
human feedback. Overall, our work advances the field of zeroth-order
optimization by addressing the problem of optimizing functions with only
ranking feedback, and offers a new and effective approach for aligning
Artificial Intelligence (AI) with human intentions.
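The idea of descending with nothing but a ranking oracle can be illustrated with a small sketch. It assumes a simple pairwise comparison estimator of our own choosing: sample m random directions, let the oracle rank the perturbed points x + mu*u_i, and for every ranked pair add u_worse - u_better, which points uphill in expectation. This is not claimed to be the exact ZO-RankSGD estimator; rank_based_direction, zo_rank_descent, m, mu and lr are hypothetical names and settings.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def rank_based_direction(x: Vector,
                         rank_oracle: Callable[[Sequence[Vector]], List[int]],
                         m: int, mu: float, rng: random.Random) -> Vector:
    """Estimate an ascent direction using only a ranking of m perturbed points.

    rank_oracle returns the indices of its inputs ordered from best (lowest
    objective) to worst, as a human judge ranking candidates would.
    """
    d = len(x)
    dirs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    candidates = [[xi + mu * ui for xi, ui in zip(x, u)] for u in dirs]
    order = rank_oracle(candidates)
    g = [0.0] * d
    # For every ranked pair (better, worse), u_worse - u_better points uphill.
    for a in range(m):
        for b in range(a + 1, m):
            better, worse = dirs[order[a]], dirs[order[b]]
            for k in range(d):
                g[k] += worse[k] - better[k]
    return g


def zo_rank_descent(x0: Vector, f: Callable[[Vector], float], steps: int = 200,
                    m: int = 5, mu: float = 0.1, lr: float = 0.01,
                    seed: int = 0) -> Vector:
    rng = random.Random(seed)

    def oracle(points: Sequence[Vector]) -> List[int]:
        # Reveals only the ordering of the points, never the values of f.
        return sorted(range(len(points)), key=lambda i: f(points[i]))

    x = list(x0)
    for _ in range(steps):
        g = rank_based_direction(x, oracle, m, mu, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

On a quadratic such as f(x) = |x|^2, zo_rank_descent([3.0, -2.0], f) moves close to the origin even though f's values are never exposed to the update rule.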
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2303.03751v3
[DATE]
2024-04-13 17:38:13+08:00
[CATEGORIES]
cs.LG
Developing An Attention-Based Ensemble Learning Framework for Financial Portfolio Optimisation
[AUTHORS]
Zhenglong Li, Vincent Tam
[ABSTRACT]
In recent years, deep or reinforcement learning approaches have been applied
to optimise investment portfolios through learning the spatial and temporal
information under the dynamic financial market. Yet in most cases, existing
approaches may produce biased trading signals from conventional price data
because of substantial market noise, and may therefore fail to balance
investment returns and risks. Accordingly, a multi-agent and self-adaptive
portfolio optimisation framework integrated with attention mechanisms and time
series, namely the MASAAT, is proposed in this work in which multiple trading
agents are created to observe and analyse the price series and directional
change data that recognises the significant changes of asset prices at
different levels of granularity for enhancing the signal-to-noise ratio of
price series. Afterwards, by reconstructing the tokens of financial data in a
sequence, the attention-based cross-sectional analysis module and temporal
analysis module of each agent can effectively capture the correlations between
assets and the dependencies between time points. Besides, a portfolio generator
is integrated into the proposed framework to fuse the spatial-temporal
information and then summarise the portfolios suggested by all trading agents
to produce a new ensemble portfolio that reduces biased trading actions and
balances overall returns and risks. The experimental results demonstrate that
the MASAAT framework outperforms many well-known portfolio optimisation
approaches on three challenging data sets: DJIA, S&P 500 and CSI 300. The
framework may also be applicable to a range of other problems in future work.
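Directional change (DC) data summarises a price series by the moments when the price reverses by at least a threshold theta from its last extreme; using several thresholds gives several levels of granularity. The sketch below shows the standard DC event rule. The abstract does not give MASAAT's exact definition, so the thresholds, the choice to start in an uptrend, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def directional_change_events(prices: Sequence[float],
                              theta: float) -> List[Tuple[int, str]]:
    """Return (index, 'up' | 'down') for each confirmed directional change.

    A downturn is confirmed once the price falls by a fraction theta below
    the last peak; an upturn once it rises by theta above the last trough.
    """
    events: List[Tuple[int, str]] = []
    mode = "up"          # assumption: start as if in an uptrend
    extreme = prices[0]  # running peak ('up' mode) or trough ('down' mode)
    for t, p in enumerate(prices[1:], start=1):
        if mode == "up":
            if p > extreme:
                extreme = p
            elif p <= extreme * (1 - theta):
                events.append((t, "down"))
                mode, extreme = "down", p
        else:
            if p < extreme:
                extreme = p
            elif p >= extreme * (1 + theta):
                events.append((t, "up"))
                mode, extreme = "up", p
    return events


def multi_scale_dc(prices: Sequence[float],
                   thetas: Sequence[float]) -> Dict[float, List[Tuple[int, str]]]:
    """Directional-change events at several thresholds (levels of granularity)."""
    return {theta: directional_change_events(prices, theta) for theta in thetas}
```

For prices [100, 105, 110, 104, 98, 103, 108], a 5% threshold confirms a downturn at index 3 and an upturn at index 5, while a 10% threshold only reacts at indices 4 and 6, so coarser thresholds filter out smaller swings.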
[LINK]
http://arxiv.org/abs/2404.08935v1
[DATE]
2024-04-13 17:10:05+08:00
[CATEGORIES]
cs.LG
Unraveling Batch Normalization for Realistic Test-Time Adaptation
[AUTHORS]
Zixian Su, Jingwei Guo, Kai Yao, Xi Yang, Qiufeng Wang, Kaizhu Huang
[ABSTRACT]
While recent test-time adaptation methods are effective at adjusting batch
normalization to narrow domain disparities, their effectiveness diminishes with
realistic mini-batches due to inaccurate target estimation. As previous
attempts merely introduce source statistics to mitigate this issue, the
fundamental problem of inaccurate target estimation still persists, leaving the
intrinsic test-time domain shifts unresolved. This paper delves into the
problem of mini-batch degradation. By unraveling batch normalization, we
discover that the inexact target statistics largely stem from the substantially
reduced class diversity in batch. Drawing upon this insight, we introduce a
straightforward tool, Test-time Exponential Moving Average (TEMA), to bridge
the class diversity gap between training and testing batches. Importantly, our
TEMA adaptively extends the scope of typical methods beyond the current batch
to incorporate a diverse set of class information, which in turn yields more
accurate target estimation. Building on this foundation, we further design a
novel layer-wise rectification strategy to consistently promote test-time
performance. Our proposed method enjoys a unique advantage as it requires
neither training nor tuning parameters, offering a truly hassle-free solution.
It significantly enhances model robustness against shifted domains and
maintains resilience in diverse real-world scenarios with various batch sizes,
achieving state-of-the-art performance on several major benchmarks. Code is
available at https://github.com/kiwi12138/RealisticTTA.
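The core mechanism, normalizing each test batch with statistics that are exponentially averaged across batches rather than computed from the current batch alone, can be sketched as follows. This is a minimal illustration, not the TEMA code: it assumes per-feature statistics, initialization from source (training) statistics, and a fixed momentum, and EMATestTimeNorm is a hypothetical name.

```python
import math
from typing import List, Sequence


class EMATestTimeNorm:
    """Normalize test batches with exponentially averaged statistics."""

    def __init__(self, source_mean: Sequence[float], source_var: Sequence[float],
                 momentum: float = 0.1, eps: float = 1e-5) -> None:
        self.mean = list(source_mean)
        self.var = list(source_var)
        self.momentum = momentum  # weight given to the newest batch (assumed)
        self.eps = eps

    def __call__(self, batch: Sequence[Sequence[float]]) -> List[List[float]]:
        n = len(batch)
        for j in range(len(self.mean)):
            col = [row[j] for row in batch]
            mu = sum(col) / n
            var = sum((v - mu) ** 2 for v in col) / n  # biased batch variance
            # Blend this batch into the running estimate, so statistics keep
            # information from earlier batches (e.g. other classes).
            self.mean[j] = (1 - self.momentum) * self.mean[j] + self.momentum * mu
            self.var[j] = (1 - self.momentum) * self.var[j] + self.momentum * var
        return [[(v - self.mean[j]) / math.sqrt(self.var[j] + self.eps)
                 for j, v in enumerate(row)] for row in batch]
```

If consecutive batches each contain a single class, plain batch statistics jump from batch to batch, while the averaged statistics sit between the classes, which is the class-diversity gap the abstract describes.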
[COMMENTS]
Accepted by AAAI 2024
[LINK]
http://arxiv.org/abs/2312.09486v3
[DATE]
2024-04-13 17:00:35+08:00
[CATEGORIES]
cs.LG
GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy
[AUTHORS]
Tianhao Peng, Wenjun Wu, Haitao Yuan, Zhifeng Bao, Zhao Pengrui, Xin Yu, Xuetao Lin, Yu Liang, Yanjun Pu
[ABSTRACT]
Graph neural networks (GNNs) have shown advantages in graph-based analysis
tasks. However, most existing methods rely on the homophily assumption and show
poor performance on heterophilic graphs, where the linked nodes have dissimilar
features and different class labels, and the semantically related nodes might
be multi-hop away. To address this limitation, this paper presents GraphRARE, a
general framework built upon node relative entropy and deep reinforcement
learning, to strengthen the expressive capability of GNNs. An innovative node
relative entropy, which considers node features and structural similarity, is
used to measure mutual information between node pairs. In addition, to avoid
the sub-optimal solutions caused by mixing useful information and noises of
remote nodes, a deep reinforcement learning-based algorithm is developed to
optimize the graph topology. This algorithm selects informative nodes and
discards noisy nodes based on the defined node relative entropy. Extensive
experiments are conducted on seven real-world datasets. The experimental
results demonstrate the superiority of GraphRARE in node classification and its
capability to optimize the original graph topology.
[COMMENTS]
14 pages, 7 figures
[LINK]
http://arxiv.org/abs/2312.09708v2
[DATE]
2024-04-13 16:52:55+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks with Diverse Spectral Filtering
[AUTHORS]
Jingwei Guo, Kaizhu Huang, Xinping Yi, Rui Zhang
[ABSTRACT]
Spectral Graph Neural Networks (GNNs) have achieved tremendous success in
graph machine learning, with polynomial filters applied for graph convolutions,
where all nodes share the same filter weights to mine their local
contexts. Despite the success, existing spectral GNNs usually fail to deal with
complex networks (e.g., WWW) due to such homogeneous spectral filtering setting
that ignores the regional heterogeneity as typically seen in real-world
networks. To tackle this issue, we propose a novel diverse spectral filtering
(DSF) framework, which automatically learns node-specific filter weights to
exploit the varying local structure properly. Particularly, the diverse filter
weights consist of two components – a global one shared among all nodes, and a
local one that varies along network edges to reflect node difference arising
from distinct graph parts – to balance between local and global information.
As such, not only can the global graph characteristics be captured, but also
the diverse local patterns can be mined with awareness of different node
positions. Interestingly, we formulate a novel optimization problem to assist
in learning diverse filters, which also enables us to enhance any spectral GNNs
with our DSF framework. We showcase the proposed framework on three
state-of-the-art spectral GNNs: GPR-GNN, BernNet, and JacobiConv. Extensive
experiments over 10 benchmark datasets demonstrate that our framework can
consistently boost model performance by up to 4.92% in node classification
tasks, producing diverse filters with enhanced interpretability. Code is
available at https://github.com/jingweio/DSF.
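Node-specific polynomial filtering can be sketched as y_i = sum_k (global_w[k] + local_w[i][k]) * (A_hat^k x)_i, where the order-k weight of node i is a global coefficient shared by all nodes plus a per-node offset, as the abstract describes. How DSF actually produces the local part (and its optimization problem) is not shown; diverse_poly_filter, global_w and local_w are hypothetical names, and A_hat is assumed to be a given normalized adjacency matrix.

```python
from typing import List, Sequence


def matvec(a: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def diverse_poly_filter(a_hat: Sequence[Sequence[float]], x: Sequence[float],
                        global_w: Sequence[float],
                        local_w: Sequence[Sequence[float]]) -> List[float]:
    """y_i = sum_k (global_w[k] + local_w[i][k]) * (A_hat^k x)_i."""
    n, order = len(x), len(global_w)
    y = [0.0] * n
    h = list(x)  # h holds A_hat^k x, starting at k = 0
    for k in range(order):
        for i in range(n):
            y[i] += (global_w[k] + local_w[i][k]) * h[i]
        h = matvec(a_hat, h)  # advance to the next power of A_hat
    return y
```

With all local offsets set to zero this reduces to an ordinary shared-weight polynomial filter (as in GPR-GNN-style models); non-zero offsets let individual nodes weight their k-hop neighbourhoods differently.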
[COMMENTS]
Accepted by Proceedings of the ACM Web Conference 2023 (WWW ’23)
[LINK]
http://arxiv.org/abs/2312.09041v2
[DATE]
2024-04-13 16:50:04+08:00
[CATEGORIES]
cs.LG
Meply: A Large-scale Dataset and Baseline Evaluations for Metastatic Perirectal Lymph Node Detection and Segmentation
[AUTHORS]
Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Jun Li, Peiquan Jin
[ABSTRACT]
Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial
for the staging and treatment of rectal cancer. However, existing segmentation
approaches face challenges due to the absence of pixel-level annotated datasets
tailored for lymph nodes around the rectum. Additionally, metastatic lymph
nodes are characterized by their relatively small size, irregular shapes, and
lower contrast compared to the background, further complicating the
segmentation task. To address these challenges, we present the first
large-scale perirectal metastatic lymph node CT image dataset called Meply,
which encompasses pixel-level annotations of 269 patients diagnosed with rectal
cancer. Furthermore, we introduce a novel lymph-node segmentation model named
CoSAM. CoSAM uses sequence-based detection to guide the segmentation of
metastatic lymph nodes in rectal cancer, contributing to improved localization
performance for the segmentation model. It comprises three key components:
a sequence-based detection module, a segmentation module, and a collaborative
convergence unit. To evaluate the effectiveness of CoSAM, we systematically
compare its performance with several popular segmentation methods using the
Meply dataset. Our code and dataset will be publicly available at:
https://github.com/kanydao/CoSAM.
[COMMENTS]
13 pages
[LINK]
http://arxiv.org/abs/2404.08916v1
[DATE]
2024-04-13 15:30:16+08:00
[CATEGORIES]
cs.LG
ES-GNN: Generalizing Graph Neural Networks Beyond Homophily with Edge Splitting
[AUTHORS]
Jingwei Guo, Kaizhu Huang, Rui Zhang, Xinping Yi
[ABSTRACT]
While Graph Neural Networks (GNNs) have achieved enormous success in multiple
graph analytical tasks, modern variants mostly rely on the strong inductive
bias of homophily. However, real-world networks typically exhibit both
homophilic and heterophilic linking patterns, wherein adjacent nodes may share
dissimilar attributes and distinct labels. Therefore, GNNs smoothing node
proximity holistically may aggregate both task-relevant and irrelevant (even
harmful) information, limiting their ability to generalize to heterophilic
graphs and potentially causing non-robustness. In this work, we propose a novel
Edge Splitting GNN (ES-GNN) framework to adaptively distinguish between graph
edges either relevant or irrelevant to learning tasks. This dynamically
transforms the original graph into two subgraphs with the same node set but
complementary edge sets. Information propagation on each subgraph and edge
splitting are then performed alternately, thus disentangling the task-relevant
and irrelevant features. Theoretically, we
show that our ES-GNN can be regarded as a solution to a disentangled graph
denoising problem, which further illustrates our motivations and interprets the
improved generalization beyond homophily. Extensive experiments on 11
benchmark datasets and 1 synthetic dataset not only demonstrate the
effectiveness of ES-GNN but also highlight its robustness to adversarial graphs
and mitigation of the over-smoothing problem.
[COMMENTS]
Under review
[LINK]
http://arxiv.org/abs/2205.13700v3
[DATE]
2024-04-13 15:15:04+08:00
[CATEGORIES]
cs.LG
On the best approximation by finite Gaussian mixtures
[AUTHORS]
Yun Ma, Yihong Wu, Pengkun Yang
[ABSTRACT]
We consider the problem of approximating a general Gaussian location mixture
by finite mixtures. The minimum order of finite mixtures that achieve a
prescribed accuracy (measured by various $f$-divergences) is determined within
constant factors for the family of mixing distributions with compact support
or appropriate assumptions on the tail probability including subgaussian and
subexponential. While the upper bound is achieved using the technique of local
moment matching, the lower bound is established by relating the best
approximation error to the low-rank approximation of certain trigonometric
moment matrices, followed by a refined spectral analysis of their minimum
eigenvalue. In the case of Gaussian mixing distributions, this result corrects
a previous lower bound in [Allerton Conference 48 (2010) 620-628].
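To make moment matching concrete: for a Gaussian mixing distribution N(0, sigma^2), the k-point Gauss-Hermite quadrature rule gives k atoms whose first 2k-1 moments match it exactly, so the finite mixture sum_j p_j N(a_j, 1) matches the first 2k-1 moments of the Gaussian location mixture N(0, 1 + sigma^2). This is the classical global construction, shown only as an illustration; it is not claimed to be the paper's local moment-matching scheme, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_moment_matching_atoms(k: int, sigma: float = 1.0):
    """Atoms and weights of a k-point distribution whose first 2k-1 moments
    equal those of N(0, sigma^2).

    hermegauss uses the weight function exp(-x^2 / 2), an unnormalized
    standard normal density, so the weights are rescaled to sum to 1.
    """
    nodes, weights = hermegauss(k)
    return sigma * nodes, weights / weights.sum()


def moment(atoms: np.ndarray, probs: np.ndarray, r: int) -> float:
    """r-th raw moment of the discrete distribution sum_j probs[j] * delta(atoms[j])."""
    return float(np.sum(probs * atoms ** r))
```

With k = 3 the atoms are 0 and plus or minus sqrt(3) with weights 2/3 and 1/6 each, which reproduce the standard normal's second and fourth moments (1 and 3) but give 9 instead of 15 for the sixth.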
[LINK]
http://arxiv.org/abs/2404.08913v1
[DATE]
2024-04-13 14:57:44+08:00
[CATEGORIES]
cs.LG
Nonstationary Reinforcement Learning with Linear Function Approximation
[AUTHORS]
Huozhi Zhou, Jinglin Chen, Lav R. Varshney, Ashish Jagmohan
[ABSTRACT]
We consider reinforcement learning (RL) in episodic Markov decision processes
(MDPs) with linear function approximation in a drifting environment.
Specifically, both the reward and state transition functions can evolve over
time but their total variations do not exceed a $\textit{variation budget}$. We
first develop the $\texttt{LSVI-UCB-Restart}$ algorithm, an optimistic modification
of least-squares value iteration with periodic restart, and bound its dynamic
regret when variation budgets are known. Then we propose a parameter-free
algorithm $\texttt{Ada-LSVI-UCB-Restart}$ that extends to unknown variation
budgets. We also derive the first minimax dynamic regret lower bound for
nonstationary linear MDPs and, as a byproduct, establish a minimax regret lower
bound for linear MDPs, which was left open by Jin et al. (2020). Finally, we provide
numerical experiments to demonstrate the effectiveness of our proposed
algorithms.
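Why periodic restarts help under drift can be seen in a toy setting far simpler than LSVI-UCB: a scalar ridge-regression estimate of theta in y = theta * x that discards all data every W steps. This is only an illustration of the restart idea under our own assumptions (scalar model, fixed period, no exploration bonus); restarted_ridge_estimates, restart_every and lam are hypothetical names.

```python
from typing import List, Sequence


def restarted_ridge_estimates(xs: Sequence[float], ys: Sequence[float],
                              restart_every: int, lam: float = 1.0) -> List[float]:
    """Scalar ridge estimate of theta in y = theta * x, forgetting all data
    every `restart_every` steps so stale samples stop biasing the estimate."""
    estimates: List[float] = []
    sxx = sxy = 0.0
    for t, (x, y) in enumerate(zip(xs, ys)):
        if t % restart_every == 0:
            sxx = sxy = 0.0  # restart: discard everything seen so far
        sxx += x * x
        sxy += x * y
        estimates.append(sxy / (lam + sxx))
    return estimates
```

If theta switches from 1 to -1 halfway through 100 steps, an estimator that never restarts ends at 0 (the two regimes cancel), while restarting every 25 steps ends near -1. The restart period trades this adaptivity against the variance of using fewer samples, which is the role the variation budget plays in the analysis.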
[LINK]
http://arxiv.org/abs/2010.04244v3
[DATE]
2024-04-13 14:52:10+08:00
[CATEGORIES]
cs.LG
On the Computational Complexity of Private High-dimensional Model Selection
[AUTHORS]
Saptarshi Roy, Zehua Wang, Ambuj Tewari
[ABSTRACT]
We consider the problem of model selection in a high-dimensional sparse
linear regression model under privacy constraints. We propose a differentially
private best subset selection method with strong utility properties by adopting
the well-known exponential mechanism for selecting the best model. We propose
an efficient Metropolis-Hastings algorithm and establish that it enjoys
polynomial mixing time to its stationary distribution. We also
establish approximate differential privacy for the estimates of the mixed
Metropolis-Hastings chain. Finally, we perform some illustrative experiments
that show the strong utility of our algorithm.
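The combination of the exponential mechanism with a Metropolis-Hastings sampler can be sketched as follows: the target distribution over supports S of size s is proportional to exp(epsilon * u(S) / (2 * Delta)) with utility u(S) = -RSS(S), and each step proposes swapping one selected variable for an unselected one. The score, the swap proposal, and the sensitivity value are assumptions for illustration; a real privacy guarantee requires a utility with bounded sensitivity (e.g. bounded data), which this sketch does not enforce.

```python
import math
import random
from typing import Set

import numpy as np


def rss(X: np.ndarray, y: np.ndarray, support: Set[int]) -> float:
    """Residual sum of squares of least squares restricted to `support`."""
    cols = sorted(support)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)


def mh_exponential_mechanism(X: np.ndarray, y: np.ndarray, s: int, epsilon: float,
                             sensitivity: float, iters: int = 500,
                             seed: int = 0) -> Set[int]:
    rng = random.Random(seed)
    p = X.shape[1]
    support = set(rng.sample(range(p), s))
    score = -rss(X, y, support)
    for _ in range(iters):
        # Symmetric proposal: swap one selected index for one unselected index.
        out_j = rng.choice(sorted(support))
        in_j = rng.choice(sorted(set(range(p)) - support))
        proposal = (support - {out_j}) | {in_j}
        new_score = -rss(X, y, proposal)
        # Log acceptance ratio of the exponential-mechanism target.
        log_accept = epsilon * (new_score - score) / (2 * sensitivity)
        if math.log(1.0 - rng.random()) < min(0.0, log_accept):
            support, score = proposal, new_score
    return support
```

Working in log space avoids overflow when score differences are large. Smaller epsilon flattens the target, so the chain wanders more among supports; that randomness is what the privacy guarantee pays for.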
[COMMENTS]
27 pages, 2 figures
[LINK]
http://arxiv.org/abs/2310.07852v3
[DATE]
2024-04-13 13:32:26+08:00
[CATEGORIES]
cs.LG
Enhancing path-integral approximation for non-linear diffusion with neural network
[AUTHORS]
Anna Knezevic
[ABSTRACT]
We enhance the existing solution for pricing fixed income instruments within
the Black-Karasinski model structure with a neural network at various
parameterisation points, and demonstrate that the method achieves superior
outcomes for multiple calibrations across extended projection horizons.
[LINK]
http://arxiv.org/abs/2404.08903v1
[DATE]
2024-04-13 13:15:46+08:00
[CATEGORIES]
cs.LG
Large Transformers are Better EEG Learners
[AUTHORS]
Bingxin Wang, Xiaowen Fu, Yuan Lan, Luchan Zhang, Wei Zheng, Yang Xiang
[ABSTRACT]
Pre-trained large transformer models have achieved remarkable performance in
the fields of natural language processing and computer vision. However, the
limited availability of public electroencephalogram (EEG) data presents a
unique challenge for extending the success of these models to EEG-based tasks.
To address this gap, we propose AdaCT, a set of plug-and-play Adapters designed for
Converting Time series data into spatio-temporal 2D pseudo-images or text
forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel
time series data into spatio-temporal 2D pseudo-images for fine-tuning
pre-trained vision transformers, while AdaCT-T converts short single-channel
data into text for fine-tuning pre-trained language transformers. The proposed
approach allows for seamless integration of pre-trained vision models and
language models in time series decoding tasks, particularly in EEG data
analysis. Experimental results on diverse benchmark datasets, including
Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the
superiority of AdaCT over baseline methods. Overall, we provide a promising
transfer learning framework for leveraging the capabilities of pre-trained
vision and language models in EEG-based tasks, thereby advancing the field of
time series decoding and enhancing interpretability in EEG data analysis. Our
code will be available at https://github.com/wangbxj1234/AdaCE.
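The two conversions can be pictured with a minimal sketch: folding a long single-channel series row by row into a 2D grid (a pseudo-image a vision transformer could consume) and rendering a short series as comma-separated text for a language model. The abstract does not specify AdaCT's actual folding or formatting scheme; the row-wise folding, zero padding, number format and function names are all assumptions.

```python
import math
from typing import List, Sequence


def series_to_pseudo_image(series: Sequence[float], width: int) -> List[List[float]]:
    """Fold a 1-D series into rows of `width` samples, zero-padding the tail."""
    rows = math.ceil(len(series) / width)
    padded = list(series) + [0.0] * (rows * width - len(series))
    return [padded[r * width:(r + 1) * width] for r in range(rows)]


def series_to_text(series: Sequence[float], decimals: int = 2) -> str:
    """Render a short series as text that a language model can tokenize."""
    return ", ".join(f"{v:.{decimals}f}" for v in series)
```

For example, series_to_pseudo_image([1, 2, 3, 4, 5], 2) gives three rows with the last one padded, and series_to_text([0.5, -1.25, 3]) gives "0.50, -1.25, 3.00". A real adapter would then resize or patch the grid to the vision model's input size.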
[LINK]
http://arxiv.org/abs/2308.11654v2
[DATE]
2024-04-13 13:11:03+08:00
[CATEGORIES]
cs.LG
Bullion: A Column Store for Machine Learning
[AUTHORS]
Gang Liao, Ye Liu, Jianjun Chen, Daniel J. Abadi
[ABSTRACT]
The past two decades have witnessed columnar storage revolutionizing data
warehousing and analytics. However, the rapid growth of machine learning poses
new challenges to this domain. This paper presents Bullion, a columnar storage
system tailored for machine learning workloads. Bullion addresses the
complexities of data compliance, optimizes the encoding of long sequence sparse
features, efficiently manages wide-table projections, and introduces feature
quantization in storage. By aligning with the evolving requirements of ML
applications, Bullion extends columnar storage to various scenarios, from
advertising and recommendation systems to the expanding realm of Generative AI.
Preliminary experimental results and theoretical analysis demonstrate
Bullion’s superior performance in handling the unique demands of machine
learning workloads compared to existing columnar storage solutions. Bullion
significantly reduces I/O costs for deletion compliance, achieves substantial
storage savings with its optimized encoding scheme for sparse features, and
drastically improves metadata parsing speed for wide-table projections. These
advancements position Bullion as a critical component in the future of machine
learning infrastructure, enabling organizations to efficiently manage and
process the massive volumes of data required for training and inference in
modern AI applications.
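Feature quantization in storage can be illustrated with generic symmetric int8 quantization of a float column: each value is stored as a one-byte code plus one shared scale, a quarter of the float32 footprint, at the cost of an error of at most half the scale per value. Bullion's actual quantization scheme is not described in the abstract; this sketch and its function names are assumptions.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric linear quantization of a float column to int8 codes."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]
```

Using a single per-column scale keeps metadata tiny; per-block scales would reduce error for columns with outliers at the cost of more metadata.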
[LINK]
http://arxiv.org/abs/2404.08901v1
[DATE]
2024-04-13 13:01:54+08:00
[CATEGORIES]
cs.LG
Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators
[AUTHORS]
Lifan Zhao, Yanyan Shen
[COMMENTS]
Accepted to ICLR 2024. Code is at https://github.com/SJTU-Quant/LIFT
[LINK]
http://arxiv.org/abs/2401.17548v5
[DATE]
2024-04-13 12:26:56+08:00
[CATEGORIES]
cs.LG
Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming
[AUTHORS]
Yubo Zhuang, Xiaohui Chen, Yun Yang, Richard Y. Zhang
[ABSTRACT]
$K$-means clustering is a widely used machine learning method for identifying
patterns in large datasets. Recently, semidefinite programming (SDP)
relaxations have been proposed for solving the $K$-means optimization problem,
which enjoy strong statistical optimality guarantees. However, the prohibitive
cost of implementing an SDP solver renders these guarantees inaccessible to
practical datasets. In contrast, nonnegative matrix factorization (NMF) is a
simple clustering algorithm widely used by machine learning practitioners, but
it lacks a solid statistical underpinning and theoretical guarantees. In this
paper, we consider an NMF-like algorithm that solves a nonnegative low-rank
restriction of the SDP-relaxed $K$-means formulation using a nonconvex
Burer–Monteiro factorization approach. The resulting algorithm is as simple
and scalable as state-of-the-art NMF algorithms while also enjoying the same
strong statistical optimality guarantees as the SDP. In our experiments, we
observe that our algorithm achieves significantly smaller mis-clustering errors
compared to the existing state-of-the-art while maintaining scalability.
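For reference, the SDP relaxation commonly used for K-means (the Peng-Wei relaxation), which we assume is the formulation meant here, and its nonnegative low-rank (Burer-Monteiro) restriction can be written as follows for n points stacked as the rows of X in R^{n x p} and a factorization rank r >= K:

```latex
\begin{aligned}
\max_{Z \in \mathbb{R}^{n \times n}} \quad & \langle X X^\top, Z \rangle
  \quad \text{s.t.} \quad Z \succeq 0,\; Z \ge 0,\; Z \mathbf{1}_n = \mathbf{1}_n,\; \operatorname{tr}(Z) = K, \\
\max_{U \in \mathbb{R}^{n \times r}_{\ge 0}} \quad & \langle X X^\top, U U^\top \rangle
  \quad \text{s.t.} \quad U U^\top \mathbf{1}_n = \mathbf{1}_n,\; \|U\|_F^2 = K.
\end{aligned}
```

Writing Z = UU^T with an entrywise nonnegative U makes both Z ⪰ 0 and Z ≥ 0 hold automatically, which is what turns the SDP into an NMF-like problem. A ground-truth partition C_1, ..., C_K corresponds to Z* = sum_k |C_k|^{-1} 1_{C_k} 1_{C_k}^T, which is feasible for both problems (take the columns of U to be 1_{C_k} / sqrt(|C_k|)).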
[COMMENTS]
Accepted to ICLR 2024
[LINK]
http://arxiv.org/abs/2305.18436v5
[DATE]
2024-04-13 12:05:41+08:00
[CATEGORIES]
cs.LG
HEAT: Head-level Parameter Efficient Adaptation of Vision Transformers with Taylor-expansion Importance Scores
[AUTHORS]
Yibo Zhong, Yao Zhou
[ABSTRACT]
Prior computer vision research extensively explores adapting pre-trained
vision transformers (ViT) to downstream tasks. However, the substantial number
of parameters requiring adaptation has led to a focus on Parameter Efficient
Transfer Learning (PETL) as an approach to efficiently adapt large pre-trained
models by training only a subset of parameters, achieving both parameter and
storage efficiency. Although the significantly reduced parameters have shown
promising performance under transfer learning scenarios, the structural
redundancy inherent in the model still leaves room for improvement, which
warrants further investigation. In this paper, we propose Head-level Efficient
Adaptation with Taylor-expansion importance score (HEAT): a simple method that
efficiently fine-tunes ViTs at the head level. In particular, the first-order
Taylor expansion is employed to calculate each head’s importance score, termed
Taylor-expansion Importance Score (TIS), indicating its contribution to
specific tasks. Additionally, we employ three strategies for calculating TIS
to maximize its effectiveness. These strategies calculate TIS
from different perspectives, reflecting varying contributions of parameters.
Besides ViT, HEAT has also been applied to hierarchical transformers such as
Swin Transformer, demonstrating its versatility across different transformer
architectures. Through extensive experiments, HEAT has demonstrated superior
performance over state-of-the-art PETL methods on the VTAB-1K benchmark.
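A first-order Taylor importance score for a head estimates how much the loss would change if that head's parameters were zeroed: L(theta) - L(0) is approximately sum_i theta_i * dL/dtheta_i over the head's weights. The sketch below computes the absolute value of that sum per head and keeps the top-k heads. It shows one common form of the score only; the abstract does not detail HEAT's three TIS variants, and the head-keyed dictionaries and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def taylor_head_scores(params: Dict[str, Sequence[float]],
                       grads: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """|sum_i theta_i * dL/dtheta_i| over each head's (flattened) weights."""
    return {head: abs(sum(p * g for p, g in zip(params[head], grads[head])))
            for head in params}


def select_heads(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k heads with the largest estimated contribution to the loss."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Only the selected heads would then be fine-tuned, which is where the parameter savings come from; gradients for the scores can be accumulated over a few batches of the downstream task.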
[LINK]
http://arxiv.org/abs/2404.08894v1
[DATE]
2024-04-13 12:01:35+08:00
[CATEGORIES]
cs.LG
ChangeAnywhere: Sample Generation for Remote Sensing Change Detection via Semantic Latent Diffusion Model
[AUTHORS]
Kai Tang, Jin Chen
[ABSTRACT]
Remote sensing change detection (CD) is a pivotal technique that pinpoints
changes on a global scale based on multi-temporal images. With the recent
expansion of deep learning, supervised deep learning-based CD models have shown
satisfactory performance. However, CD sample labeling is very time-consuming as
it is densely labeled and requires expert knowledge. To alleviate this problem,
we introduce ChangeAnywhere, a novel CD sample generation method using the
semantic latent diffusion model and single-temporal images. Specifically,
ChangeAnywhere leverages the relative ease of acquiring large single-temporal
semantic datasets to generate large-scale, diverse, and semantically annotated
bi-temporal CD datasets. ChangeAnywhere captures the two essentials of CD
samples, i.e., change implies a semantic difference, and non-change implies
reasonable variation under the same semantic constraints. We generated
ChangeAnywhere-100K, the largest synthetic CD dataset, with 100,000 pairs of CD
samples based on the proposed method. The ChangeAnywhere-100K significantly
improved both zero-shot and few-shot performance on two CD benchmark datasets
for various deep learning-based CD models, as demonstrated by transfer
experiments. This paper delineates the enormous potential of ChangeAnywhere for
CD sample generation and demonstrates the subsequent enhancement of model
performance. Therefore, ChangeAnywhere offers a potent tool for remote sensing
CD. All code and pre-trained models will be available at
https://github.com/tangkai-RS/ChangeAnywhere.
[COMMENTS]
Concise manuscript version of ChangeAnywhere
[LINK]
http://arxiv.org/abs/2404.08892v1
[DATE]
2024-04-13 11:46:35+08:00
[CATEGORIES]
cs.LG
Systematic Assessment of Tabular Data Synthesis Algorithms
[AUTHORS]
Yuntao Du, Ninghui Li
[ABSTRACT]
Data synthesis has been advocated as an important approach for utilizing data
while protecting data privacy. A large number of tabular data synthesis
algorithms (which we call synthesizers) have been proposed. Some synthesizers
satisfy Differential Privacy, while others aim to provide privacy in a
heuristic fashion. A comprehensive understanding of the strengths and
weaknesses of these synthesizers remains elusive due to drawbacks in evaluation
metrics and missing head-to-head comparisons of newly developed synthesizers
that take advantage of diffusion models and large language models with
state-of-the-art marginal-based synthesizers.
In this paper, we present a systematic evaluation framework for assessing
tabular data synthesis algorithms. Specifically, we examine and critique
existing evaluation metrics, and introduce a set of new metrics in terms of
fidelity, privacy, and utility to address their limitations. Based on the
proposed metrics, we also devise a unified objective for tuning, which can
consistently improve the quality of synthetic data for all methods. We
conducted extensive evaluations of 8 different types of synthesizers on 12
real-world datasets and identified several findings that offer new
directions for privacy-preserving data synthesis.
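As an example of the kind of fidelity metric such a framework examines, the total variation distance between the empirical one-way marginals of a categorical column in the real and synthetic tables is shown below (0 means identical marginals, 1 means disjoint support). It is a widely used measure, not necessarily one of the paper's new metrics, and marginal_tvd is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence


def marginal_tvd(real: Sequence[Hashable], synth: Sequence[Hashable]) -> float:
    """Total variation distance between the empirical marginals of one column."""
    p, q = Counter(real), Counter(synth)
    n, m = len(real), len(synth)
    return 0.5 * sum(abs(p[v] / n - q[v] / m) for v in set(p) | set(q))
```

One-way marginals miss correlations between columns, so evaluations usually also compare two-way marginals or downstream model utility.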
[COMMENTS]
The code is available at: https://github.com/zealscott/SynMeter
[LINK]
http://arxiv.org/abs/2402.06806v2
[DATE]
2024-04-13 11:11:56+08:00
[CATEGORIES]
cs.LG
Learning Decentralized Linear Quadratic Regulator with $\sqrt{T}$ Regret
[AUTHORS]
Lintao Ye, Ming Chi, Ruiquan Liao, Vijay Gupta
[ABSTRACT]
We propose an online learning algorithm that adaptively designs a
decentralized linear quadratic regulator when the system model is unknown a
priori and new data samples from a single system trajectory become
progressively available. The algorithm uses a disturbance-feedback
representation of state-feedback controllers coupled with online convex
optimization with memory and delayed feedback. Under the assumption that the
system is stable or given a known stabilizing controller, we show that our
controller enjoys an expected regret that scales as $\sqrt{T}$ with the time
horizon $T$ for the case of partially nested information pattern. For more
general information patterns, the optimal controller is unknown even if the
system model is known. In this case, the regret of our controller is shown with
respect to a linear sub-optimal controller. We validate our theoretical
findings using numerical experiments.
[COMMENTS]
49 pages, 3 figures
[LINK]
http://arxiv.org/abs/2210.08886v3
[DATE]
2024-04-13 11:02:47+08:00
[CATEGORIES]
cs.LG
Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision
[AUTHORS]
Zhe Wang, Jiayi Zhang, Hongyang Du, Ruichen Zhang, Dusit Niyato, Bo Ai, Khaled B. Letaief
[ABSTRACT]
Next-generation multiple input multiple output (MIMO) is expected to be
intelligent and scalable. In this paper, we study generative artificial
intelligence (AI) agent-enabled next-generation MIMO design. Firstly, we
provide an overview of the development, fundamentals, and challenges of the
next-generation MIMO. Then, we propose the concept of the generative AI agent,
which is capable of generating tailored and specialized content with the aid
of large language model (LLM) and retrieval augmented generation (RAG). Next,
we comprehensively discuss the features and advantages of the generative AI
agent framework. More importantly, to tackle existing challenges of
next-generation MIMO, we discuss generative AI agent-enabled next-generation
MIMO design, from the perspective of performance analysis, signal processing,
and resource allocation. Furthermore, we present two compelling case studies
that demonstrate the effectiveness of leveraging the generative AI agent for
performance analysis in complex configuration scenarios. These examples
highlight how the integration of generative AI agents can significantly enhance
the analysis and design of next-generation MIMO systems. Finally, we discuss
important potential future research directions.
[COMMENTS]
9 pages, 3 figures, 2 tables
[LINK]
http://arxiv.org/abs/2404.08878v1
[DATE]
2024-04-13 10:39:36+08:00
[CATEGORIES]
cs.LG
Price-Discrimination Game for Distributed Resource Management in Federated Learning
[AUTHORS]
Han Zhang, Halvin Yang, Guopeng Zhang
[ABSTRACT]
In vanilla federated learning (FL) such as FedAvg, the parameter server (PS)
and multiple distributed clients can form a typical buyer’s market, where the
number of PS/buyers of FL services is far smaller than the number of
clients/sellers. In order to improve the performance of FL and reduce the cost
of motivating clients to participate in FL, this paper proposes to
differentiate the pricing for services provided by different clients rather
than simply providing the same service pricing for different clients. The price
is differentiated based on the performance improvements brought to FL and their
heterogeneity in computing and communication capabilities. To this end, a
price-discrimination game (PDG) is formulated to comprehensively address the
distributed resource management problems in FL, including multi-objective
trade-off, client selection, and incentive mechanism. As the PDG is a
mixed-integer nonlinear programming (MINLP) problem, a distributed
semi-heuristic algorithm with low computational complexity and low
communication overhead is designed to solve it. Simulation results verify
the effectiveness of the proposed approach.
[LINK]
http://arxiv.org/abs/2308.13838v7
[DATE]
2024-04-13 09:41:23+08:00
[CATEGORIES]
cs.LG
An evaluation framework for synthetic data generation models
[AUTHORS]
Ioannis E. Livieris, Nikos Alimpertis, George Domalis, Dimitris Tsakalidis
[ABSTRACT]
Synthetic data has gained popularity as a cost-efficient data augmentation
strategy for improving machine learning model performance, as well as for
addressing concerns related to sensitive data privacy. Therefore, ensuring the
quality of generated synthetic data, in terms of how accurately it represents
real data, is of primary importance.
In this work, we present a new framework for evaluating synthetic data
generation models’ ability for developing high-quality synthetic data. The
proposed approach is able to provide strong statistical and theoretical
information about the evaluation framework and the compared models’ ranking.
Two use case scenarios demonstrate the applicability of the proposed framework
for evaluating the ability of synthetic data generation models to generated
high quality data. The implementation code can be found in
https://github.com/novelcore/synthetic_data_evaluation_framework.
[COMMENTS]
This paper has been accepted for presentation at IFIP International
Conference on Artificial Intelligence Applications and Innovations
[LINK]
http://arxiv.org/abs/2404.08866v1
[DATE]
2024-04-13 09:16:45+08:00
[CATEGORIES]
cs.LG
Improving Technical “How-to” Query Accuracy with Automated Search Results Verification and Reranking
[AUTHORS]
Lei Ding, Jeshwanth Bheemanpally, Yi Zhang
[ABSTRACT]
Many people use search engines to find online guidance to solve computer or
mobile device problems. Users frequently encounter challenges in identifying
effective solutions from search results, often wasting time trying ineffective
solutions that seem relevant yet fail to solve the real problems. This paper
introduces a novel approach to improving the accuracy and relevance of online
technical support search results through automated search results verification
and reranking. Taking “How-to” queries specific to on-device execution as a
starting point, we first developed a solution that allows an AI agent to
interpret and execute step-by-step instructions in the search results in a
controlled Android environment. We further integrated the agent’s findings into
a reranking mechanism that orders search results based on the success
indicators of the tested solutions.
The paper details the architecture of our solution and a comprehensive
evaluation of the system through a series of tests across various application
domains. The results demonstrate a significant improvement in the quality and
reliability of the top-ranked results. Our findings suggest a paradigm shift in
how search engine ranking for online technical support help can be optimized,
offering a scalable and automated solution to the pervasive challenge of
finding effective and reliable online help.
[COMMENTS]
12 pages, 2 columns, 3 figures
[LINK]
http://arxiv.org/abs/2404.08860v1
[DATE]
2024-04-13 08:20:09+08:00
[CATEGORIES]
cs.LG
Forward Learning of Graph Neural Networks
[AUTHORS]
Namyong Park, Xing Wang, Antoine Simoulin, Shuai Yang, Grey Yang, Ryan Rossi, Puja Trivedi, Nesreen Ahmed
[ABSTRACT]
Graph neural networks (GNNs) have achieved remarkable success across a wide
range of applications, such as recommendation, drug discovery, and question
answering. Behind the success of GNNs lies the backpropagation (BP) algorithm,
which is the de facto standard for training deep neural networks (NNs).
However, despite its effectiveness, BP imposes several constraints, which are
not only biologically implausible, but also limit the scalability, parallelism,
and flexibility in learning NNs. Examples of such constraints include storage
of neural activities computed in the forward pass for use in the subsequent
backward pass, and the dependence of parameter updates on non-local signals. To
address these limitations, the forward-forward algorithm (FF), which trains NNs
by performing two forward passes over positive and negative data, was recently
proposed as an alternative to BP in the image classification domain.
Inspired by this advance, we propose ForwardGNN in this work, a new forward
learning procedure for GNNs, which avoids the constraints imposed by BP via an
effective layer-wise local forward training. ForwardGNN extends the original FF
to deal with graph data and GNNs, and makes it possible to operate without
generating negative inputs (hence no longer forward-forward). Further,
ForwardGNN enables each layer to learn from both the bottom-up and top-down
signals without relying on the backpropagation of errors. Extensive experiments
on real-world datasets show the effectiveness and generality of the proposed
forward graph learning framework. We release our code at
https://github.com/facebookresearch/forwardgnn.
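The layer-local FF objective mentioned above can be sketched as follows, under stated assumptions. This follows the original forward-forward formulation, not ForwardGNN itself (which, per the abstract, avoids generating negative inputs): a layer's "goodness" is assumed to be the sum of its squared activations, and a softplus loss pushes goodness on positive data above a threshold and goodness on negative data below it. The function names and the `threshold` parameter are hypothetical.

```python
import math


def goodness(activations):
    """Sum of squared activations of one layer for one input."""
    return sum(a * a for a in activations)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ff_layer_loss(pos_acts, neg_acts, threshold):
    """Hypothetical layer-local FF loss.

    Pushes goodness on positive data above `threshold` and goodness on
    negative data below it. Only this layer's activations are used, so no
    error signal from later layers is required.
    """
    loss_pos = softplus(threshold - goodness(pos_acts))
    loss_neg = softplus(goodness(neg_acts) - threshold)
    return loss_pos + loss_neg
```

Because the loss depends only on one layer's activations, each layer can in principle be updated as soon as its own forward pass is done; the threshold is a tunable assumption.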
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2403.11004v2
[DATE]
2024-04-13 08:10:00+08:00
[CATEGORIES]
cs.LG
WROOM: An Autonomous Driving Approach for Off-Road Navigation
[AUTHORS]
Dvij Kalaria, Shreya Sharma, Sarthak Bhagat, Haoru Xue, John M. Dolan
[ABSTRACT]
Off-road navigation is a challenging problem both at the planning level to
get a smooth trajectory and at the control level to avoid flipping over,
hitting obstacles, or getting stuck at a rough patch. There have been several
recent works using classical approaches involving depth map prediction followed
by smooth trajectory planning and using a controller to track it. We design an
end-to-end reinforcement learning (RL) system for an autonomous vehicle in
off-road environments using a custom-designed simulator in the Unity game
engine. We warm-start the agent by imitating a rule-based controller and
utilize Proximal Policy Optimization (PPO) to improve the policy based on a
reward that incorporates Control Barrier Functions (CBF), facilitating the
agent’s ability to generalize effectively to real-world scenarios. The training
involves agents concurrently undergoing domain-randomized trials in various
environments. We also propose a novel simulation environment to replicate
off-road driving scenarios and deploy our proposed approach on a real buggy RC
car.
Videos and additional results: https://sites.google.com/view/wroom-utd/home
[LINK]
http://arxiv.org/abs/2404.08855v1
[DATE]
2024-04-13 07:55:59+08:00
[CATEGORIES]
cs.LG
Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance
[AUTHORS]
Amit Sharma, Teodor-Dumitru Ene, Kishor Kunal, Mingjie Liu, Zafar Hasan, Haoxing Ren
[ABSTRACT]
This paper presents a comparative analysis of total cost of ownership (TCO)
and performance between domain-adapted large language models (LLMs) and
state-of-the-art (SoTA) LLMs, with a particular emphasis on tasks related to
coding assistance for chip design. We examine the TCO and performance metrics
of a domain-adapted LLM, ChipNeMo, against two leading LLMs, Claude 3 Opus and
ChatGPT-4 Turbo, to assess their efficacy in chip design code generation.
Through a detailed evaluation of model accuracy, training
methodologies, and operational expenditures, this study aims to provide
stakeholders with critical information to select the most economically viable
and performance-efficient solutions for their specific needs. Our results
underscore the benefits of employing domain-adapted models such as ChipNeMo,
which demonstrate improved performance at significantly reduced costs compared
to their general-purpose counterparts. In particular, we reveal the potential
of domain-adapted LLMs to decrease TCO by approximately 90%-95%. The cost
advantages of ChipNeMo become more pronounced as the deployment scale expands,
making domain-adapted LLMs an attractive option for organizations with
substantial LLM-supported coding needs.
[LINK]
http://arxiv.org/abs/2404.08850v1
[DATE]
2024-04-13 07:37:56+08:00
[CATEGORIES]
cs.LG
LazyDP: Co-Designing Algorithm-Software for Scalable Training of Differentially Private Recommendation Models
[AUTHORS]
Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu
[ABSTRACT]
Differential privacy (DP) is widely employed in industry as a
practical standard for privacy protection. While private training of computer
vision or natural language processing applications has been studied
extensively, the computational challenges of training recommender systems
(RecSys) with DP have not been explored. In this work, we first present a
detailed characterization of private RecSys training using DP-SGD, root-causing
its several performance bottlenecks. Specifically, we identify that DP-SGD’s noise
sampling and noisy gradient update stages suffer from severe compute and
memory bandwidth limitations, respectively, causing significant performance
overhead in training private RecSys. Based on these findings, we propose
LazyDP, an algorithm-software co-design that addresses the compute and memory
challenges of training RecSys with DP-SGD. Compared to a state-of-the-art
DP-SGD training system, we demonstrate that LazyDP provides an average 119x
training throughput improvement while still training mathematically equivalent,
differentially private RecSys models.
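As background for the bottlenecks named above, a minimal sketch of one standard DP-SGD step (per-example clipping, Gaussian noise, averaged update) is given below in plain Python. It is not LazyDP; the names `clip` and `dp_sgd_step` and the flat list-based parameter layout are illustrative assumptions. The comment marks where dense noise sampling and dense updates arise when the parameters include a large embedding table.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One standard DP-SGD step: clip, sum, add Gaussian noise, average, update."""
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    sigma = noise_multiplier * max_norm
    # Noise is sampled for EVERY coordinate, including embedding rows that no
    # example in the batch touched, so both the sampling and the parameter
    # update are dense even when the true gradient is sparse.
    noisy = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * n for p, n in zip(params, noisy)]


rng = random.Random(0)
new_params = dp_sgd_step([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]], lr=0.1,
                         max_norm=1.0, noise_multiplier=1.0, rng=rng)
# new_params[1] and new_params[2] change even though their gradients are zero.
```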
[LINK]
http://arxiv.org/abs/2404.08847v1
[DATE]
2024-04-13 07:32:06+08:00
[CATEGORIES]
cs.LG
Multiply-Robust Causal Change Attribution
[AUTHORS]
Victor Quintas-Martinez, Mohammad Taha Bahadori, Eduardo Santiago, Jeff Mu, Dominik Janzing, David Heckerman
[ABSTRACT]
Comparing two samples of data, we observe a change in the distribution of an
outcome variable. In the presence of multiple explanatory variables, how much
of the change can be explained by each possible cause? We develop a new
estimation strategy that, given a causal model, combines regression and
re-weighting methods to quantify the contribution of each causal mechanism. Our
proposed methodology is multiply robust, meaning that it still recovers the
target parameter under partial misspecification. We prove that our estimator is
consistent and asymptotically normal. Moreover, it can be incorporated into
existing frameworks for causal attribution, such as Shapley values, which then
inherit its consistency and large-sample distribution properties. Our method
demonstrates excellent performance in Monte Carlo simulations, and we show its
usefulness in an empirical application.
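To illustrate the Shapley aggregation step mentioned above, the sketch below computes exact Shapley values over causal mechanisms from a value function. The value table is entirely hypothetical (made-up numbers for a two-mechanism model X → Y). In the paper, such quantities would come from the proposed multiply-robust estimator, which is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical numbers: mean outcome when the mechanisms in the subset are
# switched from their sample-1 to their sample-2 versions.
table = {
    frozenset(): 10.0,
    frozenset({"P(X)"}): 12.0,
    frozenset({"P(Y|X)"}): 11.0,
    frozenset({"P(X)", "P(Y|X)"}): 15.0,
}
attribution = shapley_values(["P(X)", "P(Y|X)"], table.__getitem__)
# {'P(X)': 3.0, 'P(Y|X)': 2.0}; the parts sum to the total change 15 - 10.
```

The exact computation enumerates every subset, so it scales exponentially with the number of mechanisms; that is acceptable for the handful of mechanisms in a typical causal graph.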
[LINK]
http://arxiv.org/abs/2404.08839v1
[DATE]
2024-04-13 06:57:01+08:00
[CATEGORIES]
cs.LG
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
[AUTHORS]
Awni Altabaa, Taylor Webb, Jonathan Cohen, John Lafferty
[ABSTRACT]
An extension of Transformers is proposed that enables explicit relational
reasoning through a novel module called the Abstractor. At the core of the
Abstractor is a variant of attention called relational cross-attention. The
approach is motivated by an architectural inductive bias for relational
learning that disentangles relational information from object-level features.
This enables explicit relational reasoning, supporting abstraction and
generalization from limited data. The Abstractor is first evaluated on simple
discriminative relational tasks and compared to existing relational
architectures. Next, the Abstractor is evaluated on purely relational
sequence-to-sequence tasks, where dramatic improvements are seen in sample
efficiency compared to standard Transformers. Finally, Abstractors are
evaluated on a collection of tasks based on mathematical problem solving, where
consistent improvements in performance and sample efficiency are observed.
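A minimal sketch of relational cross-attention as described above. It assumes a formulation in which attention scores are computed from the object features while the values are a separate set of input-independent symbols. The helper names, the identity projections, and the one-symbol-per-object layout are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relational_cross_attention(objects, w_q, w_k, symbols):
    """Scores come from the objects; values are input-independent symbols."""
    q = matmul(objects, w_q)
    k = matmul(objects, w_k)
    d = len(w_q[0])
    scores = matmul(q, transpose(k))
    relations = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Standard attention would use a projection of `objects` as values here.
    return matmul(relations, symbols)


identity = [[1.0, 0.0], [0.0, 1.0]]
symbols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one per object
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rotated = [[-y, x] for x, y in objects]  # 90-degree rotation keeps inner products
out_a = relational_cross_attention(objects, identity, identity, symbols)
out_b = relational_cross_attention(rotated, identity, identity, symbols)
```

Because the objects enter only through their pairwise inner products, a transformation that preserves those inner products (here a rotation, with identity projections) leaves the output unchanged. This is one way to see how relational information is separated from object-level features.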
[COMMENTS]
Published at ICLR 2024
[LINK]
http://arxiv.org/abs/2304.00195v4
[DATE]
2024-04-13 06:49:28+08:00
[CATEGORIES]
cs.LG
General surgery vision transformer: A video pre-trained foundation model for general surgery
[AUTHORS]
Samuel Schmidgall, Ji Woong Kim, Jeffrey Jopling, Axel Krieger
[ABSTRACT]
The absence of openly accessible data and specialized foundation models is a
major barrier for computational research in surgery. Toward this, (i) we
open-source the largest dataset of general surgery videos to date, consisting
of 680 hours of surgical videos, including data from robotic and laparoscopic
techniques across 28 procedures; (ii) we propose a technique for video
pre-training a general surgery vision transformer (GSViT) on surgical videos
based on forward video prediction that can run in real-time for surgical
applications, toward which we open-source the code and weights of GSViT; (iii)
we also release code and weights for procedure-specific fine-tuned versions of
GSViT across 10 procedures; (iv) we demonstrate the performance of GSViT on the
Cholec80 phase annotation task, displaying improved performance over
state-of-the-art single frame predictors.
[LINK]
http://arxiv.org/abs/2403.05949v3
[DATE]
2024-04-13 06:30:54+08:00
[CATEGORIES]
cs.LG
Variance Reduction based Experience Replay for Policy Optimization
[AUTHORS]
Hua Zheng, Wei Xie, M. Ben Feng
[ABSTRACT]
For reinforcement learning on complex stochastic systems, it is desirable to
effectively leverage the information from historical samples collected in
previous iterations to accelerate policy optimization. Classical experience
replay, while effective, treats all observations uniformly, neglecting their
relative importance. To address this limitation, we introduce a novel Variance
Reduction Experience Replay (VRER) framework, enabling the selective reuse of
relevant samples to improve policy gradient estimation. VRER, as an adaptable
method that can seamlessly integrate with different policy optimization
algorithms, forms the foundation of our sample efficient off-policy learning
algorithm known as Policy Gradient with VRER (PG-VRER). Furthermore, the lack
of a rigorous understanding of the experience replay approach in the literature
motivates us to introduce a novel theoretical framework that accounts for
sample dependencies induced by Markovian noise and behavior policy
interdependencies. This framework is then employed to analyze the finite-time
convergence of the proposed PG-VRER algorithm, revealing a crucial
bias-variance trade-off in policy gradient estimation: the reuse of older
experience tends to introduce a larger bias while simultaneously reducing
gradient estimation variance. Extensive experiments have shown that VRER offers
a notable and consistent acceleration in learning optimal policies and enhances
the performance of state-of-the-art (SOTA) policy optimization approaches.
[COMMENTS]
54 pages; Previously this version appeared as arXiv:2208.12341 which
was submitted as a new work by accident
[LINK]
http://arxiv.org/abs/2110.08902v4
[DATE]
2024-04-13 06:13:14+08:00
[CATEGORIES]
cs.LG
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
[AUTHORS]
Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang
[ABSTRACT]
Differential Privacy (DP) image data synthesis leverages the DP
technique to generate synthetic data that replaces sensitive data, allowing
organizations to share and utilize synthetic images without privacy concerns.
Previous methods incorporate the advanced techniques of generative models and
pre-training on a public dataset to produce exceptional DP image data, but
suffer from problems of unstable training and massive computational resource
demands. This paper proposes a novel DP image synthesis method, termed
PRIVIMAGE, which meticulously selects pre-training data, promoting the
efficient creation of DP datasets with high fidelity and utility. PRIVIMAGE
first establishes a semantic query function using a public dataset. Then, this
function assists in querying the semantic distribution of the sensitive
dataset, facilitating the selection of data from the public dataset with
analogous semantics for pre-training. Finally, we pre-train an image generative
model using the selected data and then fine-tune this model on the sensitive
dataset using Differentially Private Stochastic Gradient Descent (DP-SGD).
PRIVIMAGE allows us to train a lightly parameterized generative model, reducing
the noise in the gradient during DP-SGD training and enhancing training
stability. Extensive experiments demonstrate that PRIVIMAGE uses only 1% of the
public dataset for pre-training and 7.6% of the parameters in the generative
model compared to the state-of-the-art method, while achieving superior
synthetic performance and conserving more computational resources. On average,
PRIVIMAGE achieves 30.1% lower FID and 12.6% higher Classification Accuracy
than the state-of-the-art method. The replication package and datasets can be
accessed online.
[COMMENTS]
Accepted at USENIX Security 2024. The first two authors contributed
equally
[LINK]
http://arxiv.org/abs/2311.12850v3
[DATE]
2024-04-13 06:08:40+08:00
[CATEGORIES]
cs.LG
Structured Model Pruning for Efficient Inference in Computational Pathology
[AUTHORS]
Mohammed Adnan, Qinle Ba, Nazim Shaikh, Shivam Kalra, Satarupa Mukherjee, Auranuch Lorsakul
[ABSTRACT]
Recent years have seen significant efforts to adopt Artificial Intelligence
(AI) in healthcare for various use cases, from computer-aided diagnosis to ICU
triage. However, the size of AI models has been rapidly growing due to scaling
laws and the success of foundational models, which poses an increasing
challenge to leverage advanced models in practical applications. It is thus
imperative to develop efficient models, especially for deploying AI solutions
under resource constraints or with time sensitivity. One potential solution is
to perform model compression, a set of techniques that remove less important
model components or reduce parameter precision, to reduce model computation
demand. In this work, we demonstrate that model pruning, as a model compression
technique, can effectively reduce inference cost for computational and digital
pathology based analysis with a negligible loss of analysis performance. To
this end, we develop a methodology for pruning the widely used U-Net-style
architectures in biomedical imaging, with which we evaluate multiple pruning
heuristics on nuclei instance segmentation and classification, and empirically
demonstrate that pruning can compress models by at least 70% with a negligible
drop in performance.
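The abstract does not name the pruning heuristics it compares. As one common example of structured pruning, the sketch below ranks a layer's output channels by the L1 norm of their weights, keeps the top fraction, and removes the matching input channels of the next layer. The flat-list weight layout and all function names are hypothetical.

```python
def l1_scores(layer):
    """layer[c] holds the flattened weights of output channel c."""
    return [sum(abs(w) for w in channel) for channel in layer]


def channels_to_keep(layer, keep_ratio):
    """Indices of the highest-L1 output channels, in original order."""
    scores = l1_scores(layer)
    n_keep = max(1, round(len(scores) * keep_ratio))
    # sorted() with reverse=True is stable, so ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:n_keep])


def prune_pair(layer, next_layer, keep_ratio):
    """Drop output channels of `layer` and the matching inputs of `next_layer`.

    next_layer[o][i] holds the kernel weights linking input channel i to
    output channel o of the following layer.
    """
    keep = channels_to_keep(layer, keep_ratio)
    pruned_layer = [layer[c] for c in keep]
    pruned_next = [[row[i] for i in keep] for row in next_layer]
    return keep, pruned_layer, pruned_next
```

In U-Net-style models, skip connections concatenate encoder features into the decoder. A real implementation would therefore also need to keep channel indices consistent across those concatenations. This sketch handles only a plain layer-to-layer connection.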
[LINK]
http://arxiv.org/abs/2404.08831v1
[DATE]
2024-04-13 06:05:01+08:00
[CATEGORIES]
cs.LG
MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers
[AUTHORS]
Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi
[ABSTRACT]
Adversarial robustness often comes at the cost of degraded accuracy, impeding
the real-life application of robust classification models. Training-based
solutions for better trade-offs are limited by incompatibilities with
already-trained high-performance large models, necessitating the exploration of
training-free ensemble approaches. Observing that robust models are more
confident in correct predictions than in incorrect ones on clean and
adversarial data alike, we speculate amplifying this “benign confidence
property” can reconcile accuracy and robustness in an ensemble setting. To
achieve this, we propose “MixedNUTS”, a training-free method where the output
logits of a robust classifier and a standard non-robust classifier are
processed by nonlinear transformations with only three parameters, which are
optimized through an efficient algorithm. MixedNUTS then converts the
transformed logits into probabilities and mixes them as the overall output. On
CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom
strong adaptive attacks demonstrate MixedNUTS’s vastly improved accuracy and
near-SOTA robustness – it boosts CIFAR-100 clean accuracy by 7.86 points,
sacrificing merely 0.87 points in robust accuracy.
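The abstract states that a three-parameter nonlinear transformation is applied to the robust classifier's logits and that the resulting probabilities are then mixed with those of the standard classifier. It does not give the transformation itself. The sketch below uses a hypothetical shift-clamp-power-scale form with parameters `s`, `p`, `c`, plus a mixing weight `alpha`. These are assumptions, not the paper's exact definitions.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transform_logits(logits, s, p, c):
    """Hypothetical 3-parameter nonlinearity: shift by c, clamp at 0, power p, scale s."""
    return [s * max(z - c, 0.0) ** p for z in logits]


def mixed_prediction(std_logits, rob_logits, s, p, c, alpha):
    """Convex mix of standard-model and transformed robust-model probabilities."""
    std_probs = softmax(std_logits)
    rob_probs = softmax(transform_logits(rob_logits, s, p, c))
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(std_probs, rob_probs)]
```

With `p > 1` and `s > 1`, the gap between the top logit and the rest grows. This amplifies the robust model's confidence where it is already confident, which mirrors the "benign confidence property" the abstract relies on.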
[LINK]
http://arxiv.org/abs/2402.02263v3
[DATE]
2024-04-13 06:03:06+08:00
[CATEGORIES]
cs.LG
Measuring the Predictability of Recommender Systems using Structural Complexity Metrics
[AUTHORS]
Alfonso Valderrama, Andrés Abeliuk
[ABSTRACT]
Recommender systems (RS) are central to the filtering and curation of online
content. These algorithms predict user ratings for unseen items based on past
preferences. Despite their importance, the innate predictability of RS has
received limited attention. This study introduces data-driven metrics to
measure the predictability of RS based on the structural complexity of the
user-item rating matrix. A low predictability score indicates complex and
unpredictable user-item interactions, while a high predictability score reveals
less complex patterns with predictive potential. We propose two strategies that
use singular value decomposition (SVD) and matrix factorization (MF) to measure
structural complexity. By perturbing the data and evaluating the prediction of
the perturbed version, we explore the structural consistency indicated by the
SVD singular vectors. The assumption is that a random perturbation of highly
structured data does not change its structure. Empirical results show a high
correlation between our metrics and the accuracy of the best-performing
prediction algorithms on real data sets.
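The perturbation idea can be illustrated in a simplified form under several assumptions. The rating matrix is taken to be dense, the perturbation is small uniform noise, and consistency is measured as the absolute cosine between the top right singular vectors (found by power iteration) of the original and perturbed matrices. The abstract does not specify the paper's two metrics, and all names here are hypothetical.

```python
import math
import random


def matvec(a, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_right_singular_vector(a, iters=200):
    # Power iteration on A^T A; converges when the top singular value is separated.
    at = transpose(a)
    v = normalize([1.0] * len(a[0]))
    for _ in range(iters):
        v = normalize(matvec(at, matvec(a, v)))
    return v


def perturb(a, scale, rng):
    return [[x + rng.uniform(-scale, scale) for x in row] for row in a]


def structural_consistency(a, scale, rng):
    """|cos| between top singular vectors before and after perturbation."""
    v = top_right_singular_vector(a)
    w = top_right_singular_vector(perturb(a, scale, rng))
    return abs(sum(x * y for x, y in zip(v, w)))
```

A strongly structured (for example, low-rank) matrix keeps nearly the same leading singular direction under small noise, so its score stays close to 1. This matches the abstract's assumption that random perturbation does not change the structure of highly structured data.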
[COMMENTS]
Accepted at WWW-24 Workshop: DCAI Data-centric Artificial
Intelligence
[LINK]
http://arxiv.org/abs/2404.08829v1
[DATE]
2024-04-13 06:00:27+08:00
[CATEGORIES]
cs.LG
Hindsight PRIORs for Reward Learning from Human Preferences
[AUTHORS]
Mudit Verma, Katherine Metcalf
[ABSTRACT]
Preference-based Reinforcement Learning (PbRL) removes the need to
hand-specify a reward function by learning a reward from preference feedback over
policy behaviors. Current approaches to PbRL do not address the credit
assignment problem inherent in determining which parts of a behavior most
contributed to a preference, resulting in data-intensive approaches and
subpar reward functions. We address these limitations by introducing a credit
assignment strategy (Hindsight PRIOR) that uses a world model to approximate
state importance within a trajectory and then guides rewards to be proportional
to state importance through an auxiliary predicted return redistribution
objective. Incorporating state importance into reward learning improves the
speed of policy learning, overall policy performance, and reward recovery on
both locomotion and manipulation tasks. For example, on average Hindsight PRIOR
recovers significantly (p<0.05) more reward on MetaWorld (20%) and DMC (15%).
The performance gains and our ablations demonstrate the benefits even a simple
credit assignment strategy can have on reward learning and that state
importance in forward dynamics prediction is a strong proxy for a state’s
contribution to a preference decision. Code repository can be found at
https://github.com/apple/ml-rlhf-hindsight-prior.
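The sketch below follows one reading of the return redistribution objective described above. Given per-state importance weights from the world model, the target reward for each state is the segment's predicted return split in proportion to importance. An auxiliary squared-error loss then pulls the predicted rewards toward those targets. The function names and the mean-squared-error form are assumptions.

```python
def redistribution_targets(predicted_rewards, importance):
    """Split the segment's predicted return in proportion to state importance."""
    total_return = sum(predicted_rewards)
    total_importance = sum(importance)
    return [total_return * w / total_importance for w in importance]


def redistribution_loss(predicted_rewards, importance):
    """Hypothetical auxiliary loss: mean squared gap to the redistributed targets."""
    targets = redistribution_targets(predicted_rewards, importance)
    n = len(predicted_rewards)
    return sum((r - t) ** 2 for r, t in zip(predicted_rewards, targets)) / n
```

One design choice in a gradient-based loop is to treat the targets as constants (stop-gradient). The loss then reshapes how the predicted return is distributed across states rather than changing its total.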
[COMMENTS]
International Conference on Learning Representations, 2024
[LINK]
http://arxiv.org/abs/2404.08828v1
[DATE]
2024-04-13 05:59:42+08:00
[CATEGORIES]
cs.LG
Negative Feedback Training: A Novel Concept to Improve Robustness of NVCIM DNN Accelerators
[AUTHORS]
Yifan Qin, Zheyu Yan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi
[ABSTRACT]
Compute-in-memory (CIM) accelerators built upon non-volatile memory (NVM)
devices excel in energy efficiency and latency when performing Deep Neural
Network (DNN) inference, thanks to their in-situ data processing capability.
However, the stochastic nature and intrinsic variations of NVM devices often
result in performance degradation in DNN inference. Introducing these non-ideal
device behaviors during DNN training enhances robustness, but drawbacks include
limited accuracy improvement, reduced prediction confidence, and convergence
issues. This arises from a mismatch between the deterministic training and
non-deterministic device variations, as such training, though considering
variations, relies solely on the model’s final output. In this work, we draw
inspiration from control theory and propose a novel training concept:
Negative Feedback Training (NFT), which leverages the multi-scale noisy information
captured from the network. We develop two specific NFT instances, Oriented
Variational Forward (OVF) and Intermediate Representation Snapshot (IRS).
Extensive experiments show that our methods outperform existing
state-of-the-art methods with up to a 46.71% improvement in inference accuracy
while reducing epistemic uncertainty, boosting output confidence, and improving
convergence probability. Their effectiveness highlights the generality and
practicality of our NFT concept in enhancing DNN robustness against device
variations.
[LINK]
http://arxiv.org/abs/2305.14561v4
[DATE]
2024-04-13 05:56:21+08:00
[CATEGORIES]
cs.LG
Adversarial Patterns: Building Robust Android Malware Classifiers
[AUTHORS]
Dipkamal Bhusal, Nidhi Rastogi
[ABSTRACT]
Machine learning models are increasingly being adopted across various fields,
such as medicine, business, autonomous vehicles, and cybersecurity, to analyze
vast amounts of data, detect patterns, and make predictions or recommendations.
In the field of cybersecurity, these models have made significant improvements
in malware detection. However, despite their ability to understand complex
patterns from unstructured data, these models are susceptible to adversarial
attacks that perform slight modifications in malware samples, leading to
misclassification from malignant to benign. Numerous defense approaches have
been proposed to either detect such adversarial attacks or improve model
robustness. These approaches have resulted in a multitude of attack and defense
techniques and the emergence of a field known as ‘adversarial machine
learning.’ In this survey paper, we provide a comprehensive review of
adversarial machine learning in the context of Android malware classifiers.
Android is the most widely used operating system globally and is an easy target
for malicious agents. The paper first presents an extensive background on
Android malware classifiers, followed by an examination of the latest
advancements in adversarial attacks and defenses. Finally, the paper provides
guidelines for designing robust malware classifiers and outlines research
directions for the future.
[COMMENTS]
survey
[LINK]
http://arxiv.org/abs/2203.02121v2
[DATE]
2024-04-13 05:41:08+08:00
[CATEGORIES]
cs.LG
Single-image driven 3d viewpoint training data augmentation for effective wine label recognition
[AUTHORS]
Yueh-Cheng Huang, Hsin-Yi Chen, Cheng-Jui Hung, Jen-Hui Chuang, Jenq-Neng Hwang
[ABSTRACT]
Confronting the critical challenge of insufficient training data in the field
of complex image recognition, this paper introduces a novel 3D viewpoint
augmentation technique specifically tailored for wine label recognition. This
method enhances deep learning model performance by generating visually
realistic training samples from a single real-world wine label image,
overcoming the challenges posed by the intricate combinations of text and
logos. Classical Generative Adversarial Network (GAN) methods fall short in
synthesizing such intricate combinations of content. Our proposed solution
leverages time-tested computer vision and image processing strategies to expand
our training dataset, thereby broadening the range of training samples for deep
learning applications. This approach to data augmentation
circumvents the constraints of limited training resources. By applying batch-all
triplet metric learning to the augmented training images with a Vision
Transformer (ViT) architecture, we obtain highly discriminative embedding
features for every wine label, enabling one-shot recognition of both
wine labels in the training classes and newly collected wine
labels that were unavailable during training. Experimental results show a significant
increase in recognition accuracy over conventional 2D data augmentation
techniques.
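Below is a minimal sketch of the batch-all triplet loss named above, which averages the hinge loss over all valid anchor-positive-negative triplets in a batch and counts only triplets with positive loss. It is followed by a nearest-neighbor one-shot lookup against one reference embedding per label. The margin value, the Euclidean distance, and the function names are assumptions, and the ViT encoder is omitted.

```python
import math
from itertools import permutations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=0.2):
    """Average hinge loss over all valid triplets that still violate the margin."""
    losses = []
    n = len(embeddings)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue
        d_ap = euclidean(embeddings[a], embeddings[p])
        for neg in range(n):
            if labels[neg] == labels[a]:
                continue
            loss = d_ap - euclidean(embeddings[a], embeddings[neg]) + margin
            if loss > 0:
                losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0


def one_shot_classify(query, references):
    """references maps each label to a single reference embedding."""
    return min(references, key=lambda label: euclidean(query, references[label]))
```

Averaging only over active triplets keeps the loss from being diluted by the many easy triplets in a batch. Recognizing a newly collected label then only requires adding its single reference embedding to `references`.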
[LINK]
http://arxiv.org/abs/2404.08820v1
[DATE]
2024-04-13 05:30:09+08:00
[CATEGORIES]
cs.LG
Can Public Large Language Models Help Private Cross-device Federated Learning?
[AUTHORS]
Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H. Brendan McMahan, Sewoong Oh, Zheng Xu, Manzil Zaheer
[ABSTRACT]
We study (differentially) private federated learning (FL) of language models.
The language models in cross-device FL are relatively small and can be
trained with meaningful formal user-level differential privacy (DP) guarantees
when massive parallelism in training is enabled by the participation of a
moderately large population of users. Recently, public data has been used to improve
privacy-utility trade-offs for both large and small language models. In this
work, we provide a systematic study of using large-scale public data and LLMs
to help differentially private training of on-device FL models, and further
improve the privacy-utility trade-off with distillation techniques. Moreover,
we propose a novel distribution matching algorithm with theoretical grounding
to sample public data close to private data distribution, which significantly
improves the sample efficiency of (pre-)training on public data. The proposed
method is efficient and effective for training private models by taking
advantage of public data, especially for customized on-device architectures
that do not have ready-to-use pre-trained models.
[COMMENTS]
Published at Findings of NAACL 2024
[LINK]
http://arxiv.org/abs/2305.12132v2
[DATE]
2024-04-13 05:01:12+08:00
[CATEGORIES]
cs.LG
Reducing the Barriers to Entry for Foundation Model Training
[AUTHORS]
Paolo Faraboschi, Ellis Giles, Justin Hotard, Konstanty Owczarek, Andrew Wheeler
[ABSTRACT]
The world has recently witnessed an unprecedented acceleration in demands for
Machine Learning and Artificial Intelligence applications. This spike in demand
has imposed tremendous strain on the underlying technology stack, including the supply
chain, GPU-accelerated hardware, software, datacenter power density, and energy
consumption. If left on the current technological trajectory, future demands
show insurmountable spending trends, further limiting market players, stifling
innovation, and widening the technology gap. To address these challenges, we
propose a fundamental change in the AI training infrastructure throughout the
technology ecosystem. The changes require advancements in supercomputing and
novel AI training approaches, from high-end software to low-level hardware,
microprocessor, and chip design, while advancing the energy efficiency required
by a sustainable infrastructure. This paper presents an analytical framework
that quantitatively highlights the challenges and points to the opportunities
to reduce the barriers to entry for training large language models.
[LINK]
http://arxiv.org/abs/2404.08811v1
[DATE]
2024-04-13 04:58:25+08:00
[CATEGORIES]
cs.LG
Leveraging viscous Hamilton-Jacobi PDEs for uncertainty quantification in scientific machine learning
[AUTHORS]
Zongren Zou, Tingwei Meng, Paula Chen, Jérôme Darbon, George Em Karniadakis
[ABSTRACT]
Uncertainty quantification (UQ) in scientific machine learning (SciML)
combines the predictive power of SciML with methods for quantifying
the reliability of the learned models. However, two major challenges remain:
limited interpretability and expensive training procedures. We provide a new
interpretation for UQ problems by establishing a new theoretical connection
between some Bayesian inference problems arising in SciML and viscous
Hamilton-Jacobi partial differential equations (HJ PDEs). Namely, we show that
the posterior mean and covariance can be recovered from the spatial gradient
and Hessian of the solution to a viscous HJ PDE. As a first exploration of this
connection, we specialize to Bayesian inference problems with linear models,
Gaussian likelihoods, and Gaussian priors. In this case, the associated viscous
HJ PDEs can be solved using Riccati ODEs, and we develop a new Riccati-based
methodology that provides computational advantages when continuously updating
the model predictions. Specifically, our Riccati-based approach can efficiently
add data points to or remove them from the training set, independent of the
order of the data, and can continuously tune hyperparameters. Moreover, neither update requires
retraining on or access to previously incorporated data. We provide several
examples from SciML involving noisy data and \textit{epistemic uncertainty} to
illustrate the potential advantages of our approach. In particular, this
approach’s amenability to data streaming applications demonstrates its
potential for real-time inferences, which, in turn, allows for applications in
which the predicted uncertainty is used to dynamically alter the learning
process.
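The order-invariant add/remove property can be illustrated in a much simpler setting than the paper's. The sketch below keeps the Gaussian posterior of a scalar linear model y = w * x + noise in information (precision) form, assuming a known noise variance and a zero-mean Gaussian prior. It is a standard conjugate update, not the authors' Riccati-based or HJ PDE method, and the data points are illustrative toy values.

```python
from dataclasses import dataclass


@dataclass
class ScalarGaussianPosterior:
    """Gaussian posterior over w in y = w * x + noise, kept in information form.

    precision = prior_precision + sum_i x_i**2 / noise_var
    shift     = sum_i x_i * y_i / noise_var
    posterior mean = shift / precision, posterior variance = 1 / precision
    """

    noise_var: float
    precision: float  # initialise with the prior precision (zero-mean prior)
    shift: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Incorporating a point touches only its own (x, y), never old data.
        self.precision += x * x / self.noise_var
        self.shift += x * y / self.noise_var

    def remove(self, x: float, y: float) -> None:
        # Removing a point subtracts exactly what add() contributed.
        self.precision -= x * x / self.noise_var
        self.shift -= x * y / self.noise_var

    @property
    def mean(self) -> float:
        return self.shift / self.precision

    @property
    def variance(self) -> float:
        return 1.0 / self.precision


# Usage: the same three points added in two different orders.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
forward = ScalarGaussianPosterior(noise_var=0.25, precision=1.0)
backward = ScalarGaussianPosterior(noise_var=0.25, precision=1.0)
for x, y in data:
    forward.add(x, y)
for x, y in reversed(data):
    backward.add(x, y)
forward.add(10.0, -5.0)     # a new point arrives...
forward.remove(10.0, -5.0)  # ...and is later withdrawn
```

Because each update is additive in the sufficient statistics, the result does not depend on the order of arrival, and removal never needs the other data points.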
[LINK]
http://arxiv.org/abs/2404.08809v1
[DATE]
2024-04-13 04:54:01+08:00
[CATEGORIES]
cs.LG
SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
[AUTHORS]
Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
[ABSTRACT]
Recently, event-based vision sensors have gained attention for autonomous
driving applications, as conventional RGB cameras face limitations in handling
challenging dynamic conditions. However, the availability of real-world and
synthetic event-based vision datasets remains limited. In response to this gap,
we present SEVD, a first-of-its-kind multi-view synthetic event-based dataset
for ego and fixed perception, recorded with multiple dynamic vision sensors in
the CARLA simulator. Data sequences are recorded across diverse lighting (noon,
nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy)
with domain shifts (discrete and continuous). SEVD spans urban, suburban,
rural, and highway scenes featuring various classes of objects (car, truck,
van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes
RGB imagery, depth maps, optical flow, and semantic and instance segmentation,
facilitating a comprehensive understanding of the scene. Furthermore, we
evaluate the dataset using state-of-the-art event-based (RED, RVT) and
frame-based (YOLOv8) methods for traffic participant detection tasks and
provide baseline benchmarks for assessment. Additionally, we conduct
experiments to assess the synthetic event-based dataset’s generalization
capabilities. The dataset is available at
https://eventbasedvision.github.io/SEVD
[LINK]
http://arxiv.org/abs/2404.10540v1
[DATE]
2024-04-13 04:40:12+08:00
[CATEGORIES]
cs.LG
Semantic Approach to Quantifying the Consistency of Diffusion Model Image Generation
[AUTHORS]
Brinnae Bent
[ABSTRACT]
In this study, we identify the need for an interpretable, quantitative score
of the repeatability, or consistency, of image generation in diffusion models.
We propose a semantic approach, using a pairwise mean CLIP (Contrastive
Language-Image Pretraining) score as our semantic consistency score. We applied
this metric to compare two state-of-the-art open-source image generation
diffusion models, Stable Diffusion XL and PixArt-α, and found statistically
significant differences between the semantic consistency scores of the two
models. Agreement between the model selected by the Semantic Consistency Score
and aggregated human annotations was 94%. We also explored the consistency of
SDXL and a LoRA-fine-tuned version of SDXL and found that the fine-tuned model
had significantly higher semantic consistency in generated images. The Semantic
Consistency Score proposed here offers a measure of image generation alignment,
facilitating the evaluation of model architectures for specific tasks and
aiding in informed decision-making regarding model selection.
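A minimal sketch of a pairwise mean similarity score, assuming CLIP image embeddings for several generations of the same prompt have already been computed (the encoder call is omitted) and using plain cosine similarity as the pairwise score. The scaling used by the paper's CLIP score may differ.

```python
import itertools
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_consistency_score(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over all unordered pairs of image embeddings."""
    if len(embeddings) < 2:
        raise ValueError("need at least two generations to compare")
    pairs = list(itertools.combinations(embeddings, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
```

A higher score means the generations for one prompt sit closer together in embedding space, i.e. the model is more repeatable for that prompt.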
[COMMENTS]
Accepted to 2024 CVPR 3rd Explainable AI for Computer Vision (XAI4CV)
Workshop
[LINK]
http://arxiv.org/abs/2404.08799v1
[DATE]
2024-04-13 04:16:03+08:00
[CATEGORIES]
cs.LG
Diffusion-Based Joint Temperature and Precipitation Emulation of Earth System Models
[AUTHORS]
Katie Christensen, Lyric Otto, Seth Bassetti, Claudia Tebaldi, Brian Hutchinson
[ABSTRACT]
Earth system models (ESMs) are the principal tools used in climate science to
generate future climate projections under various atmospheric emissions
scenarios on a global or regional scale. Generative deep learning approaches
are suitable for emulating these tools due to their computational efficiency
and ability, once trained, to generate realizations in a fraction of the time
required by ESMs. We extend previous work that used a generative probabilistic
diffusion model to emulate ESMs by targeting the joint emulation of multiple
variables, temperature and precipitation, by a single diffusion model. Joint
generation of multiple variables is critical to generate realistic samples of
phenomena resulting from the interplay of multiple variables. The diffusion
model emulator takes in the monthly mean-maps of temperature and precipitation
and produces the daily values of each of these variables that exhibit
statistical properties similar to those generated by ESMs. Our results show the
outputs from our extended model closely resemble those from ESMs on various
climate metrics including dry spells and hot streaks, and that the joint
distribution of temperature and precipitation in our samples closely matches
that of ESMs.
[COMMENTS]
Presentation at Tackling Climate Change with Machine Learning, ICLR
2024
[LINK]
http://arxiv.org/abs/2404.08797v1
[DATE]
2024-04-13 04:13:19+08:00
[CATEGORIES]
cs.LG
Convergence of coordinate ascent variational inference for log-concave measures via optimal transport
[AUTHORS]
Manuel Arnese, Daniel Lacker
[ABSTRACT]
Mean field variational inference (VI) is the problem of finding the closest
product (factorized) measure, in the sense of relative entropy, to a given
high-dimensional probability measure $\rho$. The well-known Coordinate Ascent
Variational Inference (CAVI) algorithm aims to approximate this product measure
by iteratively optimizing over one coordinate (factor) at a time, which can be
done explicitly. Despite its popularity, the convergence of CAVI remains poorly
understood. In this paper, we prove the convergence of CAVI for log-concave
densities $\rho$. If additionally $\log \rho$ has Lipschitz gradient, we find a
linear rate of convergence, and if also $\rho$ is strongly log-concave, we find
an exponential rate. Our analysis starts from the observation that mean field
VI, while notoriously non-convex in the usual sense, is in fact displacement
convex in the sense of optimal transport when $\rho$ is log-concave. This
allows us to adapt techniques from the optimization literature on coordinate
descent algorithms in Euclidean space.
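For a Gaussian target, which is strongly log-concave, each CAVI coordinate update is explicit: with precision matrix L, the optimal Gaussian factor q_i has variance 1/L_ii and mean m_i = mu_i - (1/L_ii) * sum_{j != i} L_ij (m_j - mu_j). The sketch below implements only this textbook special case under that assumption; it is not the paper's general analysis, and the 2-D target is illustrative.

```python
from typing import List


def cavi_gaussian(mu: List[float], prec: List[List[float]], init: List[float],
                  sweeps: int) -> List[float]:
    """CAVI means for a Gaussian target N(mu, prec^{-1}).

    Factor q_i is Gaussian with fixed variance 1 / prec[i][i]; each update sets
    its mean to m_i = mu_i - (1 / prec_ii) * sum_{j != i} prec_ij * (m_j - mu_j),
    using the latest values of the other means (one coordinate at a time).
    """
    m = list(init)
    d = len(mu)
    for _ in range(sweeps):
        for i in range(d):
            coupling = sum(prec[i][j] * (m[j] - mu[j]) for j in range(d) if j != i)
            m[i] = mu[i] - coupling / prec[i][i]
    return m


# Illustrative 2-D target with correlated coordinates (positive-definite precision).
means = cavi_gaussian(mu=[1.0, -2.0], prec=[[2.0, 0.9], [0.9, 1.0]],
                      init=[0.0, 0.0], sweeps=200)
```

In this special case the updates coincide with Gauss-Seidel iterations on L(m - mu) = 0, so the factor means converge to the target mean.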
[LINK]
http://arxiv.org/abs/2404.08792v1
[DATE]
2024-04-13 03:43:54+08:00
[CATEGORIES]
cs.LG
Handling Reward Misspecification in the Presence of Expectation Mismatch
[AUTHORS]
Sarath Sreedharan, Malek Mechergui
[ABSTRACT]
Detecting and handling misspecified objectives, such as reward functions, has
been widely recognized as one of the central challenges within the domain of
Artificial Intelligence (AI) safety research. However, even with the
recognition of the importance of this problem, we are unaware of any works that
attempt to provide a clear definition for what constitutes (a) misspecified
objectives and (b) successfully resolving such misspecifications. In this work,
we use the theory of mind, i.e., the human user’s beliefs about the AI agent,
as a basis to develop a formal explanatory framework called Expectation
Alignment (EAL) to understand objective misspecification and its causes.
Our EAL framework not only acts as an explanatory framework for existing
works but also provides us with concrete insights into the limitations of
existing methods to handle reward misspecification and novel solution
strategies. We use these insights to propose a new interactive algorithm that
uses the specified reward to infer potential user expectations about the system
behavior. We show how one can efficiently implement this algorithm by mapping
the inference problem into linear programs. We evaluate our method on a set of
standard Markov Decision Process (MDP) benchmarks.
[LINK]
http://arxiv.org/abs/2404.08791v1
[DATE]
2024-04-13 03:43:37+08:00
[CATEGORIES]
cs.LG
Differentiable and Stable Long-Range Tracking of Multiple Posterior Modes
[AUTHORS]
Ali Younis, Erik Sudderth
[ABSTRACT]
Particle filters flexibly represent multiple posterior modes
nonparametrically, via a collection of weighted samples, but have classically
been applied to tracking problems with known dynamics and observation
likelihoods. Such generative models may be inaccurate or unavailable for
high-dimensional observations like images. We instead leverage training data to
discriminatively learn particle-based representations of uncertainty in latent
object states, conditioned on arbitrary observations via deep neural network
encoders. While prior discriminative particle filters have used heuristic
relaxations of discrete particle resampling, or biased learning by truncating
gradients at resampling steps, we achieve unbiased and low-variance gradient
estimates by representing posteriors as continuous mixture densities. Our
theory and experiments expose dramatic failures of existing
reparameterization-based estimators for mixture gradients, an issue we address
via an importance-sampling gradient estimator. Unlike standard recurrent neural
networks, our mixture density particle filter represents multimodal uncertainty
in continuous latent states, improving accuracy and robustness. On a range of
challenging tracking and robot localization problems, our approach achieves
dramatic improvements in accuracy, while also showing much greater stability
across multiple training runs.
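The idea of treating a weighted particle set as a continuous mixture density can be sketched in one dimension. This is a minimal sketch assuming Gaussian kernels with a fixed, hand-picked bandwidth. It shows density evaluation and resampling from the mixture, not the learned encoders or the paper's importance-sampling gradient estimator.

```python
import math
import random
from typing import List, Sequence


def mixture_density(x: float, particles: Sequence[float],
                    weights: Sequence[float], bandwidth: float) -> float:
    """Evaluate a 1-D Gaussian mixture with one component per weighted particle."""
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        (w / total) * norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
        for p, w in zip(particles, weights)
    )


def resample_from_mixture(particles: Sequence[float], weights: Sequence[float],
                          bandwidth: float, n: int,
                          rng: random.Random) -> List[float]:
    """Draw n new particles from the continuous mixture instead of copying old ones."""
    centers = rng.choices(particles, weights=weights, k=n)  # pick components by weight
    return [rng.gauss(c, bandwidth) for c in centers]       # then sample within each


particles, weights = [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3]
new_particles = resample_from_mixture(particles, weights, 0.3, 100, random.Random(0))
```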
[COMMENTS]
NeurIPS 2023
[LINK]
http://arxiv.org/abs/2404.08789v1
[DATE]
2024-04-13 03:33:52+08:00
[CATEGORIES]
cs.LG
Detecting AI-Generated Images via CLIP
[AUTHORS]
A. G. Moskowitz, T. Gaona, J. Peterson
[ABSTRACT]
As AI-generated image (AIGI) methods become more powerful and accessible, it
has become a critical task to determine if an image is real or AI-generated.
Because AIGI lack the signatures of photographs and have their own unique
patterns, new models are needed to determine if an image is AI-generated. In
this paper, we investigate the ability of the Contrastive Language-Image
Pre-training (CLIP) architecture, pre-trained on massive internet-scale data
sets, to perform this differentiation. We fine-tune CLIP on real images and
AIGI from several generative models, enabling CLIP to determine if an image is
AI-generated and, if so, determine what generation method was used to create
it. We show that the fine-tuned CLIP architecture differentiates AIGI as well
as or better than models whose architecture is specifically designed to
detect AIGI. Our method will significantly increase access to AIGI-detecting
tools and reduce the negative effects of AIGI on society, as our CLIP
fine-tuning procedures require no architecture changes from publicly available
model repositories and consume significantly fewer GPU resources than other AIGI
detection models.
[COMMENTS]
submitted for publication in Machine Vision and Applications
[LINK]
http://arxiv.org/abs/2404.08788v1
[DATE]
2024-04-13 03:29:10+08:00
[CATEGORIES]
cs.LG
Stochastic Halpern iteration in normed spaces and applications to reinforcement learning
[AUTHORS]
Mario Bravo, Juan Pablo Contreras
[ABSTRACT]
We analyze the oracle complexity of the stochastic Halpern iteration with
variance reduction, where we aim to approximate fixed-points of nonexpansive
and contractive operators in a normed finite-dimensional space. We show that if
the underlying stochastic oracle has uniformly bounded variance, our method
exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$,
improving recent rates established for the stochastic Krasnoselskii-Mann
iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$,
which applies to a wide range of algorithms, including all averaged iterations
even with minibatching. Using a suitable modification of our approach, we
derive a $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound in the case in
which the operator is a $\gamma$-contraction. As an application, we propose new
synchronous algorithms for average reward and discounted reward Markov decision
processes. In particular, for the average reward, our method improves on the
best-known sample complexity.
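The Halpern iteration anchors every step to the starting point x0: x_{k+1} = b_k x0 + (1 - b_k) T(x_k), here with the classical choice b_k = 1/(k+2). A minimal sketch, assuming a scalar operator observed through a noisy oracle whose calls are averaged over a fixed minibatch. The paper's variance-reduction scheme and step-size schedule are not reproduced, and the test operator is a hypothetical contraction with fixed point 2.

```python
import random
from typing import Callable


def stochastic_halpern(noisy_op: Callable[[float], float], x0: float,
                       iters: int, batch: int) -> float:
    """Halpern iteration x_{k+1} = b_k * x0 + (1 - b_k) * T_hat(x_k), b_k = 1/(k+2).

    T_hat averages `batch` independent noisy evaluations of the operator at x_k
    (plain minibatching, not the paper's variance-reduced estimator).
    """
    x = x0
    for k in range(iters):
        beta = 1.0 / (k + 2)  # anchoring weight, decays to 0
        t_hat = sum(noisy_op(x) for _ in range(batch)) / batch
        x = beta * x0 + (1.0 - beta) * t_hat
    return x


# Hypothetical example: T(x) = 0.5 * x + 1 (fixed point 2) seen through Gaussian noise.
rng = random.Random(0)


def noisy_contraction(x: float) -> float:
    return 0.5 * x + 1.0 + rng.gauss(0.0, 0.1)


x_final = stochastic_halpern(noisy_contraction, x0=0.0, iters=1000, batch=64)
```

The anchor term pulls iterates toward x0 with a weight that vanishes as 1/k, which is what distinguishes Halpern from Krasnoselskii-Mann averaging, where each step mixes only x_k and T(x_k).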
[COMMENTS]
Added references, typos corrected
[LINK]
http://arxiv.org/abs/2403.12338v2
[DATE]
2024-04-13 03:14:59+08:00
[CATEGORIES]
cs.LG
Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset
[AUTHORS]
Xiaomeng Zhu, Talha Bilal, Pär Mårtensson, Lars Hanson, Mårten Björkman, Atsuto Maki
[ABSTRACT]
This paper is about effectively utilizing synthetic data for training deep
neural networks for industrial parts classification, in particular, by taking
into account the domain gap against real-world images. To this end, we
introduce a synthetic dataset that may serve as a preliminary testbed for the
Sim-to-Real challenge; it contains 17 objects of six industrial use cases,
including isolated and assembled parts. A few subsets of objects exhibit large
similarities in shape and albedo for reflecting challenging cases of industrial
parts. All the sample images come with and without random backgrounds and
post-processing for evaluating the importance of domain randomization. We call
it Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of
SIP-17 through benchmarking the performance of five state-of-the-art deep
network models, supervised and self-supervised, trained only on the synthetic
data while testing them on real data. By analyzing the results, we deduce some
insights on the feasibility and challenges of using synthetic data for
industrial parts classification and for further developing larger-scale
synthetic datasets. Our dataset and code are publicly available.
[COMMENTS]
Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW)
[LINK]
http://arxiv.org/abs/2404.08778v1
[DATE]
2024-04-13 03:04:59+08:00
[CATEGORIES]
cs.LG
Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System
[AUTHORS]
Chollette Olisah, Lyndon Smith, Melvyn Smith, Lawrence Morolake, Osi Ojukwu
[ABSTRACT]
Crop yield prediction has been modeled on the assumption that there is no
interaction between weather and soil variables. However, this paper argues that
an interaction exists, and it can be finely modelled using the Kendall
Correlation coefficient. Given the nonlinearity of the interaction between
weather and soil variables, a deep neural network regressor (DNNR) is carefully
designed with consideration to the depth, number of neurons of the hidden
layers, and the hyperparameters with their optimizations. Additionally, a new
metric, the average of absolute root squared error (ARSE) is proposed to
combine the strengths of root mean square error (RMSE) and mean absolute error
(MAE). With the ARSE metric, the proposed DNNR(s), the optimised random forest
regressor (RFR), and the extreme gradient boosting regressor (XGBR) achieved
very small yield errors of 0.0172 t/ha and 0.0243 t/ha, 0.0001 t/ha, and
0.001 t/ha, respectively. However, when the explanatory variables were changed
to ensure generalizability to unseen data, the DNNR(s) performed best. Further
analysis reveals that a strong interaction does exist between weather and soil
variables. Specifically, yield is observed to increase when precipitation is
reduced and silt is increased, and vice versa. However, the
degree of decrease or increase is not quantified in this paper. Contrary to
existing yield models targeted towards agricultural policies and global food
security, the goal of the proposed corn yield model is to empower the
smallholder farmer to farm smartly and intelligently, thus the prediction model
is integrated into a mobile application that includes education and a
farmer-to-market access module.
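For reference, Kendall's correlation compares the orderings of two variables by counting concordant and discordant pairs. The sketch below computes tau-a, which does not correct for ties; the abstract does not say which variant (tau-a, tau-b, or another) the authors used.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / (n choose 2). Ties count as neither."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```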
[COMMENTS]
30 Pages, 11 Figures, 3 Tables
[LINK]
http://arxiv.org/abs/2401.03768v2
[DATE]
2024-04-13 02:49:46+08:00
[CATEGORIES]
cs.LG
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
[AUTHORS]
Junchi Wang, Lei Ke
[ABSTRACT]
Understanding human instructions to identify the target objects is vital for
perception systems. In recent years, the advancements of Large Language Models
(LLMs) have introduced new possibilities for image segmentation. In this work,
we study reasoning segmentation, a novel task in which a segmentation system
reasons about and interprets implicit user intentions via large language model
reasoning and then segments the corresponding target. Our work on reasoning
segmentation contributes to both methodological design and dataset
labeling. For the model, we propose a new framework named LLM-Seg, which
connects the foundational Segment Anything Model and the LLM through
mask proposal selection. For the dataset, we propose an automatic
data generation pipeline and construct a new reasoning segmentation dataset
named LLM-Seg40K. Experiments demonstrate that our LLM-Seg exhibits competitive
performance compared with existing methods. Furthermore, our proposed pipeline
can efficiently produce high-quality reasoning segmentation datasets. The
LLM-Seg40K dataset, developed through this pipeline, serves as a new benchmark
for training and evaluating various reasoning segmentation approaches. Our
code, models and dataset are at https://github.com/wangjunchi/LLMSeg.
[COMMENTS]
Github: https://github.com/wangjunchi/LLMSeg
[LINK]
http://arxiv.org/abs/2404.08767v1
[DATE]
2024-04-13 02:45:51+08:00
[CATEGORIES]
cs.LG
‘Eyes of a Hawk and Ears of a Fox’: Part Prototype Network for Generalized Zero-Shot Learning
[AUTHORS]
Joshua Feinglass, Jayaraman J. Thiagarajan, Rushil Anirudh, T. S. Jayram, Yezhou Yang
[ABSTRACT]
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon
base models which consider only a single class attribute vector representation
over the entire image. This is an oversimplification of the process of novel
category recognition, where different regions of the image may have properties
from different seen classes and thus have different predominant attributes.
With this in mind, we take a fundamentally different approach: a pre-trained
Vision-Language detector (VINVL) sensitive to attribute information is employed
to efficiently obtain region features. A learned function maps the region
features to region-specific attribute attention used to construct class part
prototypes. We conduct experiments on a popular GZSL benchmark consisting of
the CUB, SUN, and AWA2 datasets where our proposed Part Prototype Network (PPN)
achieves promising results when compared with other popular base models.
Corresponding ablation studies and analysis show that our approach is highly
practical and has a distinct advantage over global attribute attention when
localized proposals are available.
[COMMENTS]
Accepted to the CVPR 2024 LIMIT Workshop
[LINK]
http://arxiv.org/abs/2404.08761v1
[DATE]
2024-04-13 02:37:00+08:00
[CATEGORIES]
cs.LG
Generating Illustrated Instructions
[AUTHORS]
Sachit Menon, Ishan Misra, Rohit Girdhar
[ABSTRACT]
We introduce the new task of generating Illustrated Instructions, i.e.,
visual instructions customized to a user’s needs. We identify desiderata unique
to this task, and formalize it through a suite of automatic and human
evaluation metrics, designed to measure the validity, consistency, and efficacy
of the generations. We combine the power of large language models (LLMs)
together with strong text-to-image generation diffusion models to propose a
simple approach called StackedDiffusion, which generates such illustrated
instructions given text as input. The resulting model strongly outperforms
baseline approaches and state-of-the-art multimodal LLMs, and in 30% of cases,
users even prefer it to human-generated articles. Most notably, it enables
various new and exciting applications far beyond what static articles on the
web can provide, such as personalized instructions complete with intermediate
steps and pictures in response to a user’s individual situation.
[COMMENTS]
Accepted to CVPR 2024. Project website:
http://facebookresearch.github.io/IllustratedInstructions. Code reproduction:
https://github.com/sachit-menon/generating-illustrated-instructions-reproduction
[LINK]
http://arxiv.org/abs/2312.04552v2
[DATE]
2024-04-13 02:34:31+08:00
[CATEGORIES]
cs.LG
Training a Vision Language Model as Smartphone Assistant
[AUTHORS]
Nicolai Dorka, Janusz Marecki, Ammar Anwar
[COMMENTS]
ICLR 2024 workshop on Generative Models for Decision Making
[LINK]
http://arxiv.org/abs/2404.08755v1
[DATE]
2024-04-13 02:28:44+08:00
[CATEGORIES]
cs.LG
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
[AUTHORS]
Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis
[ABSTRACT]
Large Language Models (LLMs) trained on large volumes of data excel at
various natural language tasks, but they cannot handle tasks requiring
knowledge they were not trained on. One solution is to use a retriever that
fetches relevant information to expand an LLM’s knowledge scope. However,
existing text-oriented retrieval-based LLMs are ill-suited to structured table
data because of diverse data modalities and large table sizes.
In this work, we propose OpenTab, an open-domain table reasoning framework
powered by LLMs. OpenTab uses a table retriever to fetch relevant tables and
then generates SQL programs to parse the retrieved tables efficiently. Using
the intermediate data derived from the SQL executions, it conducts grounded
inference to produce accurate responses. Extensive
experimental evaluation shows that OpenTab significantly outperforms baselines
in both open- and closed-domain settings, achieving up to 21.5% higher
accuracy. We further run ablation studies to validate the efficacy of our
proposed designs of the system.
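The retrieve-then-execute step can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under loud assumptions: the retrieved table is a toy placeholder, and the SQL string is hard-coded where OpenTab would have an LLM generate it. Neither the retriever nor the grounded-inference step is shown.

```python
import sqlite3
from typing import List, Sequence, Tuple


def run_generated_sql(columns: Sequence[str], rows: Sequence[tuple],
                      sql: str) -> List[Tuple]:
    """Load one retrieved table into in-memory SQLite and run a (generated) query."""
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)  # assumes no quotes in names
        conn.execute(f"CREATE TABLE retrieved ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO retrieved VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Toy placeholder table and a hard-coded query standing in for LLM-generated SQL.
columns = ("name", "region", "amount")
rows = [("A", "X", 100), ("B", "X", 250), ("C", "Y", 80)]
sql = "SELECT name FROM retrieved WHERE region = 'X' ORDER BY amount DESC LIMIT 1"
result = run_generated_sql(columns, rows, sql)
```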
[COMMENTS]
Accepted by ICLR 2024
[LINK]
http://arxiv.org/abs/2402.14361v2
[DATE]
2024-04-13 02:27:34+08:00
[CATEGORIES]
cs.LG
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
[AUTHORS]
Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan
[ABSTRACT]
Reinforcement learning (RL) theory has largely focused on proving minimax
sample complexity bounds. These require strategic exploration algorithms that
use relatively limited function classes for representing the policy or value
function. Our goal is to explain why deep RL algorithms often perform well in
practice, despite using random exploration and much more expressive function
classes like neural networks. Our work arrives at an explanation by showing
that many stochastic MDPs can be solved by performing only a few steps of value
iteration on the random policy’s Q function and then acting greedily. When this
is true, we find that it is possible to separate the exploration and learning
components of RL, making it much easier to analyze. We introduce a new RL
algorithm, SQIRL, that iteratively learns a near-optimal policy by exploring
randomly to collect rollouts and then performing a limited number of steps of
fitted-Q iteration over those rollouts. Any regression algorithm that satisfies
basic in-distribution generalization properties can be used in SQIRL to
efficiently solve common MDPs. This can explain why deep RL works, since it is
empirically established that neural networks generalize well in-distribution.
Furthermore, SQIRL explains why random exploration works well in practice. We
leverage SQIRL to derive instance-dependent sample complexity bounds for RL
that are exponential only in an “effective horizon” of lookahead and that also
depend on the complexity of the class used for function approximation.
Empirically, we also find that SQIRL performance strongly correlates with PPO
and DQN performance in a variety of stochastic environments, which suggests
that our theoretical analysis is predictive of practical performance. Our code
and data are available at
https://github.com/cassidylaidlaw/effective-horizon.
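The key observation, a few steps of value iteration starting from the random policy's Q function followed by greedy action selection, can be illustrated on a small tabular MDP with known dynamics. A minimal sketch, assuming a discounted MDP given as hypothetical tables P (transitions) and R (expected rewards) and exact backups. SQIRL itself estimates these quantities from random rollouts with fitted-Q regression.

```python
from typing import List, Tuple

# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the expected reward.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def backup(P: Transitions, R: Rewards, gamma: float, v: List[float]) -> List[List[float]]:
    """Q(s, a) = R(s, a) + gamma * E[v(s')] for every state-action pair."""
    return [[R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))] for s in range(len(P))]


def random_policy_q(P: Transitions, R: Rewards, gamma: float,
                    sweeps: int = 500) -> List[List[float]]:
    """Policy evaluation for the uniform-random policy (iterated to near convergence)."""
    q = [[0.0] * len(P[s]) for s in range(len(P))]
    for _ in range(sweeps):
        v = [sum(row) / len(row) for row in q]  # value of acting uniformly at random
        q = backup(P, R, gamma, v)
    return q


def greedy_after_k_steps(P: Transitions, R: Rewards, gamma: float, k: int) -> List[int]:
    """Apply k Bellman optimality backups to Q_random, then act greedily."""
    q = random_policy_q(P, R, gamma)
    for _ in range(k):
        q = backup(P, R, gamma, [max(row) for row in q])
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Toy 3-state chain: action 0 moves left, action 1 moves right (succeeds w.p. 0.8).
P = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(0.8, 0), (0.2, 1)], [(0.8, 2), (0.2, 1)]],
    [[(1.0, 2)], [(1.0, 2)]],  # state 2 is absorbing and pays reward 1 per step
]
R = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
policy = greedy_after_k_steps(P, R, gamma=0.9, k=1)
```

In this toy chain a single backup already yields the optimal "move right" action in states 0 and 1, which is the kind of small effective horizon the abstract refers to.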
[LINK]
http://arxiv.org/abs/2312.08369v2
[DATE]
2024-04-13 02:26:36+08:00
[CATEGORIES]
cs.LG
Computing distances and means on manifolds with a metric-constrained Eikonal approach
[AUTHORS]
Daniel Kelshaw, Luca Magri
[ABSTRACT]
Computing distances on Riemannian manifolds is a challenging problem with
numerous applications, from physics, through statistics, to machine learning.
In this paper, we introduce the metric-constrained Eikonal solver to obtain
continuous, differentiable representations of distance functions on manifolds.
The differentiable nature of these representations allows for the direct
computation of globally length-minimising paths on the manifold. We showcase
the use of metric-constrained Eikonal solvers for a range of manifolds and
demonstrate the applications. First, we demonstrate that metric-constrained
Eikonal solvers can be used to obtain the Fréchet mean on a manifold,
employing the definition of a Gaussian mixture model, which has an analytical
solution to verify the numerical results. Second, we demonstrate how the
obtained distance function can be used to conduct unsupervised clustering on
the manifold – a task for which existing approaches are computationally
prohibitive. This work opens opportunities for distance computations on
manifolds.
[LINK]
http://arxiv.org/abs/2404.08754v1
[DATE]
2024-04-13 02:26:32+08:00
[CATEGORIES]
cs.LG
FastLogAD: Log Anomaly Detection with Mask-Guided Pseudo Anomaly Generation and Discrimination
[AUTHORS]
Yifei Lin, Hanqiu Deng, Xingyu Li
[ABSTRACT]
Large computer systems now output extensive logs to record their runtime status,
and it has become crucial to identify suspicious or malicious activities
from the information provided by real-time logs. Fast log anomaly detection is
therefore needed to automate detection that is infeasible to perform manually.
Most existing unsupervised methods are trained only on
normal log data, but they usually require either additional abnormal data for
hyperparameter selection or auxiliary datasets for discriminative model
optimization. In this paper, aiming for a highly effective discriminative model
that enables rapid anomaly detection, we propose FastLogAD, a
generator-discriminator framework trained to exhibit the capability of
generating pseudo-abnormal logs through the Mask-Guided Anomaly Generation
(MGAG) model and efficiently identifying the anomalous logs via the
Discriminative Abnormality Separation (DAS) model. Particularly,
pseudo-abnormal logs are generated by replacing randomly masked tokens in a
normal sequence with unlikely candidates. During the discriminative stage,
FastLogAD learns a distinct separation between normal and pseudo-abnormal
samples based on their embedding norms, allowing the selection of a threshold
without exposure to any test data and achieving competitive performance.
Extensive experiments on several common benchmarks show that our proposed
FastLogAD outperforms existing anomaly detection approaches. Furthermore,
FastLogAD achieves at least a 10x speedup in anomaly detection over prior
work. Our implementation is available at
https://github.com/YifeiLin0226/FastLogAD.
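The mask-and-replace idea behind pseudo-anomaly generation can be sketched without a learned generator. In this minimal sketch a hypothetical unigram frequency table built from normal logs stands in for the MGAG model's candidate scorer, and masked tokens are replaced with the k least frequent tokens. The log tokens, mask_ratio, and k are illustrative, not values from the paper.

```python
import random
from collections import Counter
from typing import List, Sequence


def make_pseudo_anomaly(seq: Sequence[str], freq: Counter, mask_ratio: float,
                        k: int, rng: random.Random) -> List[str]:
    """Mask a random subset of positions and fill each with a rarely seen token."""
    # Stand-in scorer (assumption): the k least frequent tokens count as "unlikely".
    unlikely = [tok for tok, _ in sorted(freq.items(), key=lambda kv: kv[1])[:k]]
    n_mask = max(1, int(len(seq) * mask_ratio))
    out = list(seq)
    for pos in rng.sample(range(len(seq)), n_mask):
        # Prefer a candidate that differs from the original token at this position.
        candidates = [t for t in unlikely if t != seq[pos]] or unlikely
        out[pos] = rng.choice(candidates)
    return out


normal_logs = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]
freq = Counter(tok for log in normal_logs for tok in log)
pseudo = make_pseudo_anomaly(normal_logs[1], freq, mask_ratio=0.5, k=1,
                             rng=random.Random(0))
```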
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2404.08750v1
[DATE]
2024-04-13 02:23:29+08:00
[CATEGORIES]
cs.LG
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
[AUTHORS]
Matteo Tucat, Anirbit Mukherjee
[ABSTRACT]
In this work, we instantiate a regularized form of the gradient clipping
algorithm and prove that it can converge to the global minima of deep neural
network loss functions provided that the net is of sufficient width. We present
empirical evidence that our theoretically founded regularized gradient clipping
algorithm is also competitive with the state-of-the-art deep-learning
heuristics. Hence the algorithm presented here constitutes a new approach to
rigorous deep learning.
The modification we make to standard gradient clipping is designed to leverage
the PL* condition, a variant of the Polyak-Lojasiewicz inequality which was
recently proven to be true for various neural networks for any depth within a
neighborhood of the initialisation.
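Standard norm-based gradient clipping scales each step by min(1, gamma/||g||), so the update norm never exceeds lr * gamma. The sketch below adds a lower floor delta on that factor as one plausible form of "regularized" clipping. Treat this exact rule as an assumption and consult the paper for the precise algorithm and its PL*-based conditions.

```python
import math
from typing import List


def clipped_step(params: List[float], grad: List[float], lr: float,
                 gamma: float, delta: float = 0.0) -> List[float]:
    """One step with multiplier lr * min(1, max(delta, gamma / ||grad||)).

    delta = 0 gives ordinary norm clipping; delta > 0 is the assumed
    regularization that keeps the multiplier from dropping below lr * delta.
    """
    g_norm = math.sqrt(sum(g * g for g in grad))
    factor = 1.0 if g_norm == 0.0 else min(1.0, max(delta, gamma / g_norm))
    return [p - lr * factor * g for p, g in zip(params, grad)]
```

With delta = 0 a very large gradient produces a step of norm exactly lr * gamma; with delta > 0 the step shrinks by at most a factor delta, so it cannot vanish however large the gradient becomes.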
[COMMENTS]
16 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.08624v1
[DATE]
2024-04-13 01:37:42+08:00
[CATEGORIES]
cs.LG
A Dynamical Model of Neural Scaling Laws
[AUTHORS]
Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
[ABSTRACT]
On a variety of tasks, the performance of neural networks predictably
improves with training time, dataset size and model size across many orders of
magnitude. This phenomenon is known as a neural scaling law. Of fundamental
importance is the compute-optimal scaling law, which reports the performance as
a function of units of compute when choosing model sizes optimally. We analyze
a random feature model trained with gradient descent as a solvable model of
network training and generalization. This reproduces many observations about
neural scaling laws. First, our model makes a prediction about why the scaling
of performance with training time and with model size have different power law
exponents. Consequently, the theory predicts an asymmetric compute-optimal
scaling rule in which the number of training steps is increased faster than model
parameters, consistent with recent empirical observations. Second, it has been
observed that early in training, networks converge to their infinite-width
dynamics at a rate $1/\textit{width}$ but at late time exhibit a rate
$\textit{width}^{-c}$, where $c$ depends on the structure of the architecture
and task. We show that our model exhibits this behavior. Lastly, our theory
shows how the gap between training and test loss can gradually build up over
time due to repeated reuse of data.
[COMMENTS]
Updated Appendix with new SGD section, more ensembling verification,
and connection to timescale/eigenvalue densities
[LINK]
http://arxiv.org/abs/2402.01092v2
[DATE]
2024-04-13 01:16:09+08:00
[CATEGORIES]
cs.LG
Hyperbolic Delaunay Geometric Alignment
[AUTHORS]
Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra Poklukar, Anastasia Varava, Danica Kragic
[ABSTRACT]
Hyperbolic machine learning is an emerging field aimed at representing data
with a hierarchical structure. However, there is a lack of tools for evaluation
and analysis of the resulting hyperbolic data representations. To this end, we
propose Hyperbolic Delaunay Geometric Alignment (HyperDGA) – a similarity
score for comparing datasets in a hyperbolic space. The core idea is counting
the edges of the hyperbolic Delaunay graph connecting datapoints across the
given sets. We provide an empirical investigation on synthetic and real-life
biological data and demonstrate that HyperDGA outperforms the hyperbolic
version of classical distances between sets. Furthermore, we showcase the
potential of HyperDGA for evaluating latent representations inferred by a
Hyperbolic Variational Auto-Encoder.
[LINK]
http://arxiv.org/abs/2404.08608v1
[DATE]
2024-04-13 01:14:58+08:00
[CATEGORIES]
cs.LG
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
[AUTHORS]
Lorenzo Bardone, Sebastian Goldt
[ABSTRACT]
Neural networks extract features from data using stochastic gradient descent
(SGD). In particular, higher-order input cumulants (HOCs) are crucial for their
performance. However, extracting information from the $p$th cumulant of
$d$-dimensional inputs is computationally hard: the number of samples required
to recover a single direction from an order-$p$ tensor (tensor PCA) using
online SGD grows as $d^{p-1}$, which is prohibitive for high-dimensional
inputs. This result raises the question of how neural networks extract relevant
directions from the HOCs of their inputs efficiently. Here, we show that
correlations between latent variables along the directions encoded in different
input cumulants speed up learning from higher-order correlations. We show this
effect analytically by deriving nearly sharp thresholds for the number of
samples required by a single neuron to weakly-recover these directions using
online SGD from a random start in high dimensions. Our analytical results are
confirmed in simulations of two-layer neural networks and unveil a new
mechanism for hierarchical learning in neural networks.
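The worked instance below makes the $d^{p-1}$ scaling concrete. It ignores constants and logarithmic factors, which the abstract does not spell out.

```latex
n \gtrsim d^{\,p-1}
\quad\Longrightarrow\quad
d = 1000:\;\; p = 3 \;\Rightarrow\; n \gtrsim 10^{6}, \qquad p = 4 \;\Rightarrow\; n \gtrsim 10^{9}.
```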
[LINK]
http://arxiv.org/abs/2404.08602v1
[DATE]
2024-04-13 01:01:25+08:00
[CATEGORIES]
cs.LG
Generating Synthetic Time Series Data for Cyber-Physical Systems
[AUTHORS]
Alexander Sommers, Somayeh Bakhtiari Ramezani, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure
[ABSTRACT]
Data augmentation is an important facilitator of deep learning applications
in the time series domain. We identify a gap in the literature: the transformer,
the dominant sequence model, has been sparsely explored for data augmentation
in time series. We put forth an architecture hybridizing several successful
priors and test it using a powerful time domain similarity metric. The results
highlight the difficulty of this domain and suggest several valuable directions
for future work.
[LINK]
http://arxiv.org/abs/2404.08601v1
[DATE]
2024-04-13 00:55:08+08:00
[CATEGORIES]
cs.LG
ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification
[AUTHORS]
Ahmad Sajedi, Samir Khaki, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
[ABSTRACT]
Multi-label image classification presents a challenging task in many domains,
including computer vision and medical imaging. Recent advancements have
introduced graph-based and transformer-based methods to improve performance and
capture label dependencies. However, these methods often include complex
modules that entail heavy computation and lack interpretability. In this paper,
we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel
framework to address these challenges in multi-label image classification
tasks. Our simple yet effective approach employs supervised contrastive
learning, in which samples that share enough labels with an anchor image based
on a decision threshold are introduced as a positive set. This structure
captures label dependencies by pulling positive pair embeddings together and
pushing away negative samples that fall below the threshold. We enhance
representation learning by incorporating a mixture density network into
contrastive learning and generating Gaussian mixture distributions to explore
the epistemic uncertainty of the feature encoder. We validate the effectiveness
of our framework through experimentation with datasets from the computer vision
and medical imaging domains. Our method outperforms the existing
state-of-the-art methods while achieving a low computational footprint on both
datasets. Visualization analyses also demonstrate that ProbMCL-learned
classifiers maintain a meaningful semantic topology.
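The positive-set rule can be sketched as follows. The abstract only says that samples must "share enough labels" with the anchor according to a decision threshold. Measuring that overlap as the Jaccard similarity of multi-hot label vectors is an assumption of this sketch, and `positive_set` is a hypothetical helper name.

```python
from typing import List

def positive_set(labels: List[List[int]], anchor: int, threshold: float) -> List[int]:
    """Indices of samples whose label overlap with the anchor reaches `threshold`.

    Overlap is measured as Jaccard similarity of multi-hot label vectors
    (an assumption; the paper's exact overlap measure may differ).
    """
    a = labels[anchor]
    positives = []
    for i, y in enumerate(labels):
        if i == anchor:
            continue
        inter = sum(1 for p, q in zip(a, y) if p and q)
        union = sum(1 for p, q in zip(a, y) if p or q)
        # Samples below the threshold are treated as negatives for this anchor.
        if union and inter / union >= threshold:
            positives.append(i)
    return positives
```

Lowering the threshold admits samples that share fewer labels with the anchor into the positive set.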
[COMMENTS]
This paper has been accepted for the ICASSP 2024 - 2024 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[LINK]
http://arxiv.org/abs/2401.01448v2
[DATE]
2024-04-13 00:37:46+08:00
[CATEGORIES]
cs.LG
Leap: molecular synthesisability scoring with intermediates
[AUTHORS]
Antonia Calvi, Théophile Gaudin, Dominik Miketa, Dominique Sydow, Liam Wilbraham
[ABSTRACT]
Assessing whether a molecule can be synthesised is a primary task in drug
discovery. It enables computational chemists to filter for viable compounds or
bias molecular generative models. The notion of synthesisability is dynamic as
it evolves depending on the availability of key compounds. A common approach in
drug discovery involves exploring the chemical space surrounding
synthetically-accessible intermediates. This strategy improves the
synthesisability of the derived molecules due to the availability of key
intermediates. Existing synthesisability scoring methods, such as SAScore,
SCScore and RAScore, cannot condition on intermediates dynamically. Our
approach, Leap, is a GPT-2 model trained on the depth, or longest linear path,
of predicted synthesis routes that allows information on the availability of
key intermediates to be included at inference time. We show that Leap surpasses
all other scoring methods by at least 5% on AUC score when identifying
synthesisable molecules, and can successfully adapt predicted scores when
presented with a relevant intermediate compound.
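A small recursive helper can illustrate the training target and how an available intermediate shortens it. The target is the depth, or longest linear path, of a synthesis route. Several things here are illustrative assumptions: the route encoding (each product mapped to the precursors of one reaction), counting depth in reaction steps, and the placeholder molecule names. Leap itself is a GPT-2 model that predicts this quantity; the sketch only computes it on a given route.

```python
from typing import AbstractSet, Dict, List

def route_depth(route: Dict[str, List[str]], target: str,
                available: AbstractSet[str] = frozenset()) -> int:
    """Longest linear path, in reaction steps, from `target` down to a starting material.

    `route` maps each product to the (non-empty) precursor list of the reaction
    that makes it. Molecules absent from `route`, or listed in `available`,
    are treated as starting materials (depth 0).
    """
    if target in available or target not in route:
        return 0
    return 1 + max(route_depth(route, p, available) for p in route[target])
```

With a hypothetical three-step route, marking an intermediate as available cuts the depth: `route_depth({"T": ["I2", "C"], "I2": ["I1"], "I1": ["A", "B"]}, "T")` is 3, and passing `available={"I2"}` gives 1.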
[COMMENTS]
New Frontiers of AI for Drug Discovery and Development workshop paper
[LINK]
http://arxiv.org/abs/2403.13005v2
[DATE]
2024-04-13 00:26:04+08:00
[CATEGORIES]
cs.LG
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
[AUTHORS]
Hanlin Tian, Kethan Reddy, Yuxiang Feng, Mohammed Quddus, Yiannis Demiris, Panagiotis Angeloudis
[ABSTRACT]
This paper introduces CRITICAL, a novel closed-loop framework for autonomous
vehicle (AV) training and testing. CRITICAL stands out for its ability to
generate diverse scenarios, focusing on critical driving situations that target
specific learning and performance gaps identified in the Reinforcement Learning
(RL) agent. The framework achieves this by integrating real-world traffic
dynamics, driving behavior analysis, surrogate safety measures, and an optional
Large Language Model (LLM) component. We show that establishing a closed
feedback loop between the data generation pipeline and the training process
can enhance the learning rate during training, elevate overall system
performance, and augment safety resilience. Our evaluations, conducted using
the Proximal Policy Optimization (PPO) and the HighwayEnv simulation
environment, demonstrate noticeable performance improvements with the
integration of critical case generation and LLM analysis, indicating CRITICAL’s
potential to improve the robustness of AV systems and streamline the generation
of critical scenarios. This ultimately serves to hasten the development of AV
agents, expand the general scope of RL training, and ameliorate validation
efforts for AV safety.
[COMMENTS]
7 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.08570v1
[DATE]
2024-04-13 00:13:10+08:00
[CATEGORIES]
cs.LG
Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation
[AUTHORS]
Liu Yang, Qiang Li, Xiaoyang Ren, Yi Fang, Shafei Wang
[ABSTRACT]
Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal
hardware-induced unique distortion resident in the transmit signals to identify
an emitter, is emerging as a means to enhance the security of communication
systems. Recently, machine learning has achieved great success in developing
state-of-the-art RFFI models. However, few works consider cross-receiver RFFI
problems, where the RFFI model is trained and deployed on different receivers.
Due to altered receiver characteristics, direct deployment of an RFFI model on a
new receiver leads to significant performance degradation. To address this
issue, we formulate the cross-receiver RFFI as a model adaptation problem,
which adapts the trained model to unlabeled signals from a new receiver. We
first develop a theoretical generalization error bound for the adaptation
model. Motivated by the bound, we propose a novel method to solve the
cross-receiver RFFI problem, which includes domain alignment and adaptive
pseudo-labeling. The former aims at finding a feature space where both domains
exhibit similar distributions, effectively reducing the domain discrepancy.
Meanwhile, the latter employs a dynamic pseudo-labeling scheme to implicitly
transfer the label information from the labeled receiver to the new receiver.
Experimental results indicate that the proposed method can effectively mitigate
the receiver impact and improve the cross-receiver RFFI performance.
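A generic form of confidence-based pseudo-labeling is sketched below. The abstract does not specify the paper's dynamic scheme. Both the fixed-threshold selection and the linear threshold schedule (`threshold_schedule`, with made-up defaults) are illustrative assumptions, not the authors' method.

```python
import numpy as np

def pseudo_label(probs: np.ndarray, threshold: float):
    """Select unlabeled target-receiver samples whose top class probability reaches
    `threshold`; return their indices and predicted (pseudo) labels."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, labels[keep]

def threshold_schedule(round_idx: int, start: float = 0.95, end: float = 0.7,
                       rounds: int = 10) -> float:
    """Hypothetical linear relaxation of the threshold, admitting more target
    samples as adaptation proceeds."""
    frac = min(round_idx / max(rounds - 1, 1), 1.0)
    return start + (end - start) * frac
```

In each adaptation round, the selected samples and their pseudo-labels would be added to the training signal for the new receiver.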
[COMMENTS]
Accepted by IEEE Internet of Things Journal
[LINK]
http://arxiv.org/abs/2404.08566v1
[DATE]
2024-04-13 00:08:32+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Neville K Kitson, Anthony C Constantinou
[ABSTRACT]
Causal Bayesian Networks provide an important tool for reasoning under
uncertainty with potential application to many complex causal systems.
Structure learning algorithms that can tell us something about the causal
structure of these systems are becoming increasingly important. In the
literature, the validity of these algorithms is often tested for sensitivity
over varying sample sizes, hyper-parameters, and occasionally objective
functions. In this paper, we show that the order in which the variables are
read from data can have much greater impact on the accuracy of the algorithm
than these factors. Because the variable ordering is arbitrary, any significant
effect it has on learnt graph accuracy is concerning, and this raises questions
about the validity of the results produced by algorithms that are sensitive to,
but have not been assessed against, different variable orderings.
[LINK]
http://arxiv.org/abs/2206.08952v2
[DATE]
2024-04-13 00:05:03+08:00
[CATEGORIES]
cs.LG
MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking
[AUTHORS]
Tianwen Tang, Tong Zhu, Haodong Liu, Yin Bai, Jia Cheng, Wenliang Chen
[COMMENTS]
Accepted to LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.08559v1
[DATE]
2024-04-12 23:57:41+08:00
[CATEGORIES]
cs.CL
VertAttack: Taking advantage of Text Classifiers’ horizontal vision
[AUTHORS]
Jonathan Rusert
[ABSTRACT]
Text classification systems have continuously improved in performance over
the years. However, nearly all current SOTA classifiers share a similar
shortcoming: they process text horizontally. Vertically written words
will not be recognized by a classifier. In contrast, humans are easily able to
recognize and read words written both horizontally and vertically. Hence, a
human adversary could write problematic words vertically and the meaning would
still be preserved to other humans. We simulate such an attack, VertAttack.
VertAttack identifies which words a classifier is reliant on and then rewrites
those words vertically. We find that VertAttack is able to greatly drop the
accuracy of 4 different transformer models on 5 datasets. For example, on the
SST2 dataset, VertAttack is able to drop RoBERTa’s accuracy from 94% to 13%.
Furthermore, since VertAttack does not replace the word, meaning is easily
preserved. We verify this via a human study and find that crowdworkers are able
to correctly label 77% of perturbed texts, compared to 81% of the
original texts. We believe VertAttack offers a look into how humans might
circumvent classifiers in the future and thus inspire a look into more robust
algorithms.
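The rewriting step can be sketched as below under two assumptions. First, the set of target words has already been chosen; VertAttack picks them by probing the classifier, which is not shown. Second, the layout keeps untouched words on the first line while each target word runs down its own column.

```python
from typing import List, Set

def vert_attack(words: List[str], targets: Set[str]) -> str:
    """Lay out a sentence on one line, but write each target word top-to-bottom
    in its own column, so a left-to-right tokenizer never sees it whole."""
    height = max((len(w) for w in words if w in targets), default=1)
    rows: List[List[str]] = [[] for _ in range(height)]
    for w in words:
        if w in targets:
            # One character per row; pad with a space once the word runs out.
            for r in range(height):
                rows[r].append(w[r] if r < len(w) else " ")
        else:
            # Ordinary words stay on the first row; lower rows get blank padding
            # of the same width so the vertical columns stay aligned.
            rows[0].append(w)
            for r in range(1, height):
                rows[r].append(" " * len(w))
    return "\n".join(" ".join(row).rstrip() for row in rows)
```

For example, `vert_attack(["a", "bad", "movie"], {"bad"})` prints "a b movie" with "a" and "d" stacked underneath the "b".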
[COMMENTS]
14 pages, 4 figures, accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2404.08538v1
[DATE]
2024-04-12 23:32:17+08:00
[CATEGORIES]
cs.CL
Rethinking How to Evaluate Language Model Jailbreak
[AUTHORS]
Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik
[ABSTRACT]
Large language models (LLMs) have become increasingly integrated with various
applications. To ensure that LLMs do not generate unsafe responses, they are
aligned with safeguards that specify what content is restricted. However, such
alignment can be bypassed to produce prohibited content using a technique
commonly referred to as jailbreak. Different systems have been proposed to
perform the jailbreak automatically. These systems rely on evaluation methods
to determine whether a jailbreak attempt is successful. However, our analysis
reveals that current jailbreak evaluation methods have two limitations. (1)
Their objectives lack clarity and do not align with the goal of identifying
unsafe responses. (2) They oversimplify the jailbreak result as a binary
outcome, successful or not. In this paper, we propose three metrics, safeguard
violation, informativeness, and relative truthfulness, to evaluate language
model jailbreak. Additionally, we demonstrate how these metrics correlate with
the goal of different malicious actors. To compute these metrics, we introduce
a multifaceted approach that extends the natural language generation evaluation
method after preprocessing the response. We evaluate our metrics on a benchmark
dataset produced from three malicious intent datasets and three jailbreak
systems. The benchmark dataset is labeled by three annotators. We compare our
multifaceted approach with three existing jailbreak evaluation methods.
Experiments demonstrate that our multifaceted evaluation outperforms existing
methods, with F1 scores improving on average by 17% compared to existing
baselines. Our findings motivate the need to move away from the binary view of
the jailbreak problem and incorporate a more comprehensive evaluation to ensure
the safety of the language model.
[LINK]
http://arxiv.org/abs/2404.06407v2
[DATE]
2024-04-12 23:02:15+08:00
[CATEGORIES]
cs.CL
cs.LG
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
[AUTHORS]
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma
[ABSTRACT]
While Large Language Models (LLMs) have seen widespread applications across
numerous fields, their limited interpretability poses concerns regarding their
safe operations from multiple aspects, e.g., truthfulness, robustness, and
fairness. Recent research has started developing quality assurance methods for
LLMs, introducing techniques such as offline detector-based or uncertainty
estimation methods. However, these approaches predominantly concentrate on
post-generation analysis, leaving the online safety analysis for LLMs during
the generation phase an unexplored area. To bridge this gap, in this work we
conduct a comprehensive evaluation of the effectiveness of existing online safety
analysis methods on LLMs. We begin with a pilot study that validates the
feasibility of detecting unsafe outputs in the early generation process.
Following this, we establish the first publicly available benchmark of online
safety analysis for LLMs, including a broad spectrum of methods, models, tasks,
datasets, and evaluation metrics. Utilizing this benchmark, we extensively
analyze the performance of state-of-the-art online safety analysis methods on
both open-source and closed-source LLMs. This analysis reveals the strengths
and weaknesses of individual methods and offers valuable insights into
selecting the most appropriate method based on specific application scenarios
and task requirements. Furthermore, we also explore the potential of using
hybridization methods, i.e., combining multiple methods to derive a collective
safety conclusion, to enhance the efficacy of online safety analysis for LLMs.
Our findings indicate a promising direction for the development of innovative
and trustworthy quality assurance methodologies for LLMs, facilitating their
reliable deployments across diverse domains.
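A hybrid check in the spirit described above might combine several detectors by voting on each partial output. The detectors here are toy keyword and length rules standing in for real online safety analysis methods. The vote-count rule and the `min_votes` parameter are likewise assumptions of this sketch.

```python
from typing import Callable, List

Detector = Callable[[str], bool]  # returns True if the partial output looks unsafe

def hybrid_unsafe(prefix: str, detectors: List[Detector], min_votes: int) -> bool:
    """Flag the partial generation when at least `min_votes` detectors fire."""
    return sum(1 for d in detectors if d(prefix)) >= min_votes

def monitor(tokens: List[str], detectors: List[Detector], min_votes: int) -> int:
    """Run the hybrid check after each generated token; return the index of the
    first token at which generation would be stopped, or -1 if it never triggers."""
    prefix = ""
    for i, tok in enumerate(tokens):
        prefix += tok
        if hybrid_unsafe(prefix, detectors, min_votes):
            return i
    return -1
```

Requiring more votes trades fewer false alarms for later (or missed) detection.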
[LINK]
http://arxiv.org/abs/2404.08517v1
[DATE]
2024-04-12 22:55:16+08:00
[CATEGORIES]
cs.CL
cs.LG
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
[AUTHORS]
Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
[ABSTRACT]
Are multimodal inputs necessary for grammar induction? Recent work has shown
that multimodal training inputs can improve grammar induction. However, these
improvements are based on comparisons to weak text-only baselines that were
trained on relatively little textual data. To determine whether multimodal
inputs are needed in regimes with large amounts of textual training data, we
design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is
a C-PCFG that incorporates embeddings from text-only large language models
(LLMs). We use a fixed grammar family to directly compare LC-PCFG to various
multi-modal grammar induction methods. We compare performance on four benchmark
datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1
compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is
also more computationally efficient, providing an up to 85% reduction in
parameter count and 8.8x reduction in training time compared to multimodal
approaches. These results suggest that multimodal inputs may not be necessary
for grammar induction, and emphasize the importance of strong vision-free
baselines for evaluating the benefit of multimodal approaches.
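Corpus-F1 pools bracket counts over all sentences before computing precision and recall. This minimal sketch assumes constituents are given as (start, end) spans. It also assumes the caller has already filtered out trivial spans (single words, whole sentence), a convention that varies between papers.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices of a constituent

def corpus_f1(pred: List[Set[Span]], gold: List[Set[Span]]) -> float:
    """Unlabeled bracketing F1 with counts pooled over the whole corpus
    (as opposed to averaging per-sentence F1 scores)."""
    tp = sum(len(p & g) for p, g in zip(pred, gold))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Pooling means long sentences with many constituents weigh more than short ones.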
[COMMENTS]
NAACL Findings 2024
[LINK]
http://arxiv.org/abs/2212.10564v3
[DATE]
2024-04-12 22:53:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
[AUTHORS]
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. Iyer
[ABSTRACT]
Large language models (LLMs) have been driving a new wave of interactive AI
applications across numerous domains. However, efficiently serving LLM
inference requests is challenging due to their unpredictable execution times
originating from the autoregressive nature of generative models. Existing LLM
serving systems exploit first-come-first-serve (FCFS) scheduling, suffering
from head-of-line blocking issues. To address the non-deterministic nature of
LLMs and enable efficient interactive LLM serving, we present a speculative
shortest-job-first (SSJF) scheduler that uses a light proxy model to predict
LLM output sequence lengths. Our open-source SSJF implementation does not
require changes to memory management or batching strategies. Evaluations on
real-world datasets and production workload traces show that SSJF reduces
average job completion times by 30.5-39.6% and increases throughput by 2.2-3.6x
compared to FCFS schedulers, across no batching, dynamic batching, and
continuous batching settings.
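The core scheduling idea can be sketched with a priority queue keyed on predicted output length. Here `predict` is a hypothetical callable standing in for the light proxy model. The completion-time helper assumes jobs run one at a time (no batching) at one time unit per output token.

```python
import heapq
from typing import Callable, List, Tuple

Job = Tuple[str, int]  # (request id, true output length in tokens)

def ssjf_order(jobs: List[Job], predict: Callable[[str], float]) -> List[Job]:
    """Order queued jobs by predicted output length, shortest first.
    Ties fall back to arrival order, i.e. FCFS."""
    heap = [(predict(jid), idx, (jid, length)) for idx, (jid, length) in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

def avg_completion_time(order: List[Job]) -> float:
    """Average completion time when jobs run back to back, one token per time unit."""
    t, total = 0, 0
    for _, length in order:
        t += length
        total += t
    return total / len(order)
```

Suppose the true lengths are 8, 2 and 4 and the predictions preserve their order. FCFS then gives an average completion time of 32/3, while the shortest-predicted-first order gives 22/3.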
[COMMENTS]
Accepted at AIOps’24
[LINK]
http://arxiv.org/abs/2404.08509v1
[DATE]
2024-04-12 22:46:15+08:00
[CATEGORIES]
cs.CL
cs.LG
Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing
[AUTHORS]
Zhenyu Qian, Yiming Qian, Yuting Song, Fei Gao, Hai Jin, Chen Yu, Xia Xie
[ABSTRACT]
Handling graph data is a challenging task. Traditional
techniques, such as those based on geometry and matrix factorization, rely on
assumptions about the data relations that become inadequate when handling large
and complex graph data. On the other hand, deep learning approaches demonstrate
promising results in handling large graph data, but they often fall short of
providing interpretable explanations. To equip the graph processing with both
high accuracy and explainability, we introduce a novel approach that harnesses
the power of a large language model (LLM), enhanced by an uncertainty-aware
module to provide a confidence score on the generated answer. We experiment
with our approach on two graph processing tasks: few-shot knowledge graph
completion and graph classification. Our results demonstrate that through
parameter efficient fine-tuning, the LLM surpasses state-of-the-art algorithms
by a substantial margin across ten diverse benchmark datasets. Moreover, to
address the challenge of explainability, we propose an uncertainty estimation
based on perturbation, along with a calibration scheme to quantify the
confidence scores of the generated answers. Our confidence measure achieves an
AUC of 0.8 or higher on seven out of the ten datasets in predicting the
correctness of the answer generated by LLM.
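One simple way to turn perturbations into a confidence score is to measure how often the model's answer survives them. `answer_fn` and the perturbation functions are hypothetical placeholders. The calibration step mentioned in the abstract is not included.

```python
from typing import Callable, List, Tuple

def perturbation_confidence(answer_fn: Callable[[str], str], prompt: str,
                            perturb_fns: List[Callable[[str], str]]) -> Tuple[str, float]:
    """Answer the original prompt, then re-ask under each perturbation and
    report the fraction of perturbed answers that agree with the original."""
    original = answer_fn(prompt)
    agree = sum(answer_fn(p(prompt)) == original for p in perturb_fns)
    return original, agree / len(perturb_fns)
```

A low agreement rate signals that the generated answer is fragile and should be trusted less.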
[COMMENTS]
Because my organization does not allow members to privately upload
papers to arXiv, I am requesting a withdrawal of my submission
[LINK]
http://arxiv.org/abs/2404.00589v2
[DATE]
2024-04-12 22:30:10+08:00
[CATEGORIES]
cs.LG
cs.CL
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
[AUTHORS]
Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang
[COMMENTS]
NAACL 2024
[LINK]
http://arxiv.org/abs/2404.08491v1
[DATE]
2024-04-12 22:19:16+08:00
[CATEGORIES]
cs.CL
Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian
[AUTHORS]
Stefano De Paoli
[ABSTRACT]
This paper proposes a test to perform Thematic Analysis (TA) with a Large
Language Model (LLM) on data in a language other than English.
While there has been initial promising work on using pre-trained LLMs for TA on
data in English, we lack any tests of whether these models can perform the
same analysis with good quality in other languages. In this paper, a test is
proposed using an open access dataset of semi-structured interviews in
Italian. The test shows that a pre-trained model can perform such a TA on the
data, also using prompts in Italian. A comparative test shows the model’s
capacity to produce themes that closely resemble those
produced independently by human researchers. The main implication of this study
is that pre-trained LLMs may thus be suitable to support analysis in
multilingual situations, so long as the language is supported by the model
used.
[LINK]
http://arxiv.org/abs/2404.08488v1
[DATE]
2024-04-12 22:10:09+08:00
[CATEGORIES]
cs.CL
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
[AUTHORS]
Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne
[ABSTRACT]
Minimum Bayes Risk (MBR) decoding can significantly improve translation
performance of Multilingual Large Language Models (MLLMs). However, MBR
decoding is computationally expensive. We show how the recently developed
Reinforcement Learning technique, Direct Preference Optimization (DPO), can
fine-tune MLLMs to get the gains of MBR without any additional computation in
inference. Our method uses only a small monolingual fine-tuning set and yields
significantly improved performance on multiple NMT test sets compared to MLLMs
without DPO.
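MBR decoding picks, among sampled translations, the one with the highest average utility against the others. A DPO preference pair can then be taken from the resulting ranking. In this sketch, token-overlap F1 stands in for whatever utility metric the authors use, and pairing the MBR-best with the MBR-worst candidate is an assumption.

```python
from collections import Counter
from typing import List, Tuple

def unigram_f1(a: str, b: str) -> float:
    """Token-overlap F1, a cheap stand-in for the MBR utility metric."""
    ta, tb = a.split(), b.split()
    overlap = sum((Counter(ta) & Counter(tb)).values())
    return 2 * overlap / (len(ta) + len(tb)) if ta or tb else 1.0

def mbr_rank(candidates: List[str]) -> List[str]:
    """Sort candidates by expected utility, using the other samples as pseudo-references."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(unigram_f1(candidates[i], r) for r in others) / len(others)
    order = sorted(range(len(candidates)), key=expected_utility, reverse=True)
    return [candidates[i] for i in order]

def dpo_pair(candidates: List[str]) -> Tuple[str, str]:
    """(chosen, rejected) preference pair: the MBR-best and MBR-worst candidates."""
    ranked = mbr_rank(candidates)
    return ranked[0], ranked[-1]
```

Fine-tuning on such pairs aims to make the model prefer MBR-like outputs directly, so no candidate sampling is needed at inference.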
[COMMENTS]
To appear at NAACL 2024
[LINK]
http://arxiv.org/abs/2311.08380v2
[DATE]
2024-04-12 22:07:38+08:00
[CATEGORIES]
cs.CL
Decoding AI: The inside story of data analysis in ChatGPT
[AUTHORS]
Ozan Evkaya, Miguel de Carvalho
[COMMENTS]
15 pages with figures and appendix
[LINK]
http://arxiv.org/abs/2404.08480v1
[DATE]
2024-04-12 21:57:30+08:00
[CATEGORIES]
cs.LG
cs.CL
QAQ: Quality Adaptive Quantization for LLM KV Cache
[AUTHORS]
Shichen Dong, Wen Cheng, Jiayu Qin, Wei Wang
[ABSTRACT]
The emergence of LLMs has ignited a fresh surge of breakthroughs in NLP
applications, particularly in domains such as question-answering systems and
text generation. As the need for longer context grows, a significant bottleneck
in model deployment emerges due to the linear expansion of the Key-Value (KV)
cache with the context length. Existing methods primarily rely on various
hypotheses, such as sorting the KV cache based on attention scores for
replacement or eviction, to compress the KV cache and improve model throughput.
However, heuristics used by these strategies may wrongly evict essential KV
cache, which can significantly degrade model performance. In this paper, we
propose QAQ, a Quality Adaptive Quantization scheme for the KV cache. We
theoretically demonstrate that key cache and value cache exhibit distinct
sensitivities to quantization, leading to the formulation of separate
quantization strategies for their non-uniform quantization. Through the
integration of dedicated outlier handling, as well as an improved
attention-aware approach, QAQ achieves up to a 10x compression ratio of the
KV cache with a negligible impact on model performance. QAQ significantly
reduces the practical hurdles of deploying LLMs, opening up new possibilities
for longer-context applications. The code is available at
github.com/ClubieDong/KVCacheQuantization.
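Plain uniform quantization can illustrate two ideas from QAQ: quantizing keys and values with separate settings, and keeping outliers in full precision. This is not QAQ's actual scheme, since the abstract does not give its formulas. `quantize_with_outliers`, the outlier fraction and any bit widths are illustrative assumptions.

```python
import numpy as np

def quantize_with_outliers(x: np.ndarray, bits: int, outlier_frac: float = 0.01) -> np.ndarray:
    """Uniform asymmetric quantization (returned already dequantized); the
    largest-magnitude `outlier_frac` of entries is kept in full precision."""
    flat = x.ravel()
    k = int(np.ceil(outlier_frac * flat.size))
    outlier_idx = np.argsort(np.abs(flat))[-k:] if k > 0 else np.array([], dtype=int)
    mask = np.zeros(flat.size, dtype=bool)
    mask[outlier_idx] = True
    # The quantization range is set by the inliers only, so outliers do not stretch it.
    inliers = flat[~mask]
    lo, hi = inliers.min(), inliers.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.round((flat - lo) / scale), 0, levels)
    deq = codes * scale + lo
    deq[mask] = flat[mask]  # outliers bypass quantization
    return deq.reshape(x.shape)
```

Keys and values would each be passed through the function with their own `bits` argument.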
[LINK]
http://arxiv.org/abs/2403.04643v2
[DATE]
2024-04-12 21:00:25+08:00
[CATEGORIES]
cs.CL
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation
[AUTHORS]
Valentin Leonhard Buchner, Lele Cao, Jan-Christoph Kalo, Vilhelm von Ehrenheim
[COMMENTS]
Accepted by NAACL 2024 industry track (6 pages, 4 figures). Source
code to be found at https://github.com/EQTPartners/PTEC
[LINK]
http://arxiv.org/abs/2309.12075v3
[DATE]
2024-04-12 20:25:50+08:00
[CATEGORIES]
cs.CL
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
[AUTHORS]
William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme
[ABSTRACT]
Large language models (LLMs) are increasingly capable of completing
knowledge-intensive tasks by recalling information from a static pretraining corpus. Here
we are concerned with LLMs in the context of evolving data requirements. For
instance: batches of new data that are introduced periodically; subsets of data
with user-based access controls; or requirements on dynamic removal of
documents with guarantees that associated knowledge cannot be recalled. We wish
to satisfy these requirements while at the same time ensuring a model does not
forget old information when new data becomes available. To address these
issues, we introduce AdapterSwap, a training and inference scheme that
organizes knowledge from a data collection into a set of low-rank adapters,
which are dynamically composed during inference. Our experiments demonstrate
AdapterSwap’s ability to support efficient continual learning, while also
enabling organizations to have fine-grained control over data access and
deletion.
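A small registry can illustrate the access-control and deletion bookkeeping: it maps each adapter to the user groups allowed to use it. `AdapterRegistry` is hypothetical. How the selected low-rank adapters are actually composed at inference time is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class AdapterRegistry:
    """Hypothetical bookkeeping: one low-rank adapter per data shard, tagged with
    the access groups allowed to query it."""
    access: Dict[str, Set[str]] = field(default_factory=dict)  # adapter id -> groups

    def add(self, adapter_id: str, groups: Set[str]) -> None:
        self.access[adapter_id] = set(groups)

    def delete(self, adapter_id: str) -> None:
        # In this sketch, dropping the adapter excludes its shard from every later composition.
        self.access.pop(adapter_id, None)

    def adapters_for(self, user_groups: Set[str]) -> List[str]:
        """Adapters a user may compose at inference time."""
        return sorted(a for a, g in self.access.items() if g & user_groups)
```

Because knowledge is partitioned by adapter, access checks and removals act on whole adapters rather than on the base model.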
[LINK]
http://arxiv.org/abs/2404.08417v1
[DATE]
2024-04-12 20:06:02+08:00
[CATEGORIES]
cs.LG
cs.CL
Learning representations of learning representations
[AUTHORS]
Rita González-Márquez, Dmitry Kobak
[ABSTRACT]
The ICLR conference is unique among the top machine learning conferences in
that all submitted papers are openly available. Here we present the ICLR
dataset consisting of abstracts of all 24 thousand ICLR submissions from
2017-2024 with meta-data, decision scores, and custom keyword-based labels. We
find that on this dataset, bag-of-words representation outperforms most
dedicated sentence transformer models in terms of $k$NN classification
accuracy, and the top performing language models barely outperform TF-IDF. We
see this as a challenge for the NLP community. Furthermore, we use the ICLR
dataset to study how the field of machine learning has changed over the last
seven years, finding some improvement in gender balance. Using a 2D embedding
of the abstracts’ texts, we describe a shift in research topics from 2017 to
2024 and identify hedgehogs and foxes among the authors with the highest number
of ICLR submissions.
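kNN classification accuracy on bag-of-words features can be computed with leave-one-out nearest neighbours under cosine similarity. This minimal standard-library sketch assumes whitespace tokenization and TF-IDF weighting with idf = log(N/df). The abstract does not state the paper's exact preprocessing.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """L2-normalised sparse TF-IDF vectors (term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        v = {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Vectors are already normalised, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_accuracy(docs: List[str], labels: List[str], k: int = 1) -> float:
    """Leave-one-out kNN accuracy: each document is classified by majority vote
    of its k most similar other documents."""
    vecs = tfidf_vectors(docs)
    correct = 0
    for i, vi in enumerate(vecs):
        sims = sorted(((cosine(vi, vj), j) for j, vj in enumerate(vecs) if j != i), reverse=True)
        votes = Counter(labels[j] for _, j in sims[:k])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(docs)
```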
[LINK]
http://arxiv.org/abs/2404.08403v1
[DATE]
2024-04-12 19:30:16+08:00
[CATEGORIES]
cs.CL
cs.LG
Topic-Controllable Summarization: Topic-Aware Evaluation and Transformer Methods
[AUTHORS]
Tatiana Passali, Grigorios Tsoumakas
[ABSTRACT]
Topic-controllable summarization is an emerging research area with a wide
range of potential applications. However, existing approaches suffer from
significant limitations. For example, the majority of existing methods are built
upon recurrent architectures, which can significantly limit their performance
compared to more recent Transformer-based architectures, and they also
require modifications to the model’s architecture for controlling the topic. At
the same time, there is currently no established evaluation metric designed
specifically for topic-controllable summarization. This work proposes a new
topic-oriented evaluation measure to automatically evaluate the generated
summaries based on the topic affinity between the generated summary and the
desired topic. The reliability of the proposed measure is demonstrated through
appropriately designed human evaluation. In addition, we adapt topic embeddings
to work with powerful Transformer architectures and propose a novel and
efficient approach for guiding the summary generation through control tokens.
Experimental results reveal that control tokens can achieve better performance
compared to more complicated embedding-based approaches while also being
significantly faster.
[COMMENTS]
11 pages, 1 figure, 6 tables
[LINK]
http://arxiv.org/abs/2206.04317v3
[DATE]
2024-04-12 18:33:56+08:00
[CATEGORIES]
cs.CL
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
[AUTHORS]
Junyu Lu, Dixiang Zhang, Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong Lin, Jiaxing Zhang, Bingyi Jing, Pingjian Zhang
[ABSTRACT]
Large Vision Language Models (LVLMs) have demonstrated impressive zero-shot
capabilities in various vision-language dialogue scenarios. However, the
absence of fine-grained visual object detection hinders the model from
understanding the details of images, leading to irreparable visual
hallucinations and factual errors. In this paper, we propose Lyrics, a novel
multi-modal pre-training and instruction fine-tuning paradigm that bootstraps
vision-language alignment from fine-grained cross-modal collaboration. Building
on the foundation of BLIP-2, Lyrics infuses local visual features extracted
from a visual refiner that includes image tagging, object detection and
semantic segmentation modules into the Querying Transformer, while on the text
side, the language inputs are equipped with the bounding boxes and tags derived
from the visual refiner. We further introduce a two-stage training scheme, in which the
pre-training stage bridges the modality gap through explicit and comprehensive
vision-language alignment targets. During the instruction fine-tuning stage, we
introduce semantic-aware visual feature extraction, a crucial method that
enables the model to extract informative features from concrete visual objects.
Our approach achieves robust performance on 13 datasets across various
vision-language tasks, and demonstrates promising multi-modal understanding,
perception and conversation capabilities in 11 scenario-based benchmark
toolkits.
[LINK]
http://arxiv.org/abs/2312.05278v2
[DATE]
2024-04-12 18:26:01+08:00
[CATEGORIES]
cs.CL
ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana
[AUTHORS]
Monica Romero, Sandra Gomez, Iván G. Torre
[ABSTRACT]
Indigenous languages are a fundamental legacy in the development of human
communication, embodying the unique identity and culture of local communities
of the Americas. The Second AmericasNLP Competition Track 1 of NeurIPS 2022 proposed
developing automatic speech recognition (ASR) systems for five indigenous
languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana. In this paper, we
propose a reliable ASR model for each target language by crawling speech
corpora spanning diverse sources and applying data augmentation methods that
resulted in the winning approach in this competition. To achieve this, we
systematically investigated, via Bayesian search, the impact of different
hyperparameters on the performance of the language models, specifically
focusing on the variants of the Wav2vec2.0 XLS-R model: 300M and 1B parameters.
Moreover, we performed a global sensitivity analysis to assess the contribution
of various hyperparametric configurations to the performance of our best
models. Importantly, our results show that freeze fine-tuning updates and
dropout rate are more vital parameters than the total number of epochs or the
learning rate (lr). Additionally, we release our best models (the first ASR
models reported to date for two of these languages, Wa’ikhana and Kotiria) and
the many experiments performed, to pave the way for other researchers to
continue improving ASR in minority languages. This insight opens up
interesting avenues for future work, allowing for the advancement of ASR
techniques in the preservation of minority indigenous languages and
acknowledging the complexities involved in this important endeavour.
[LINK]
http://arxiv.org/abs/2404.08368v1
[DATE]
2024-04-12 18:12:38+08:00
[CATEGORIES]
cs.CL
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
[AUTHORS]
Juraj Vladika, Florian Matthes
[COMMENTS]
Accepted to NAACL 2024 (Findings)
[LINK]
http://arxiv.org/abs/2404.08359v1
[DATE]
2024-04-12 17:56:12+08:00
[CATEGORIES]
cs.CL
TextMachina: Seamless Generation of Machine-Generated Text Datasets
[AUTHORS]
Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador
[COMMENTS]
14 pages, 10 figures
[LINK]
http://arxiv.org/abs/2401.03946v2
[DATE]
2024-04-12 17:52:05+08:00
[CATEGORIES]
cs.CL
Gaining More Insight into Neural Semantic Parsing with Challenging Benchmarks
[AUTHORS]
Xiao Zhang, Chunliu Wang, Rik van Noord, Johan Bos
[ABSTRACT]
The Parallel Meaning Bank (PMB) serves as a corpus for semantic processing
with a focus on semantic parsing and text generation. Currently, we witness an
excellent performance of neural parsers and generators on the PMB. This might
suggest that such semantic processing tasks have by and large been solved. We
argue that this is not the case and that performance scores from the past on
the PMB are inflated by non-optimal data splits and test sets that are too
easy. In response, we introduce several changes. First, instead of the prior
random split, we propose a more systematic splitting approach to improve the
reliability of the standard test data. Second, in addition to the standard test
set, we also propose two challenge sets: one with longer texts including
discourse structure, and one that addresses compositional generalization. We
evaluate five neural models for semantic parsing and meaning-to-text
generation. Our results show that model performance declines (in some cases
dramatically) on the challenge sets, revealing the limitations of neural models
when confronting such challenges.
[LINK]
http://arxiv.org/abs/2404.08354v1
[DATE]
2024-04-12 17:48:58+08:00
[CATEGORIES]
cs.CL
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
[AUTHORS]
Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
[ABSTRACT]
Large-scale vision-and-language models, such as CLIP, are typically trained
on web-scale data, which can introduce inappropriate content and lead to the
development of unsafe and biased behavior. This, in turn, hampers their
applicability in sensitive and trustworthy contexts and could raise significant
concerns in their adoption. Our research introduces a novel approach to
enhancing the safety of vision-and-language models by diminishing their
sensitivity to NSFW (not safe for work) inputs. In particular, our methodology
seeks to sever “toxic” linguistic and visual concepts, unlearning the linkage
between unsafe linguistic or visual items and unsafe regions of the embedding
space. We show how this can be done by fine-tuning a CLIP model on synthetic
data obtained from a large language model trained to convert between safe and
unsafe sentences, and a text-to-image generator. We conduct extensive
experiments on the resulting embedding space for cross-modal retrieval,
text-to-image, and image-to-text generation, where we show that our model can
be effectively employed with pre-trained generative models. Our source code and
trained models are available at: https://github.com/aimagelab/safe-clip.
[LINK]
http://arxiv.org/abs/2311.16254v2
[DATE]
2024-04-12 17:37:37+08:00
[CATEGORIES]
cs.CL
Using Large Language Models to Understand Telecom Standards
[AUTHORS]
Athanasios Karapantelakis, Mukesh Thakur, Alexandros Nikou, Farnaz Moradi, Christian Orlog, Fitsum Gaim, Henrik Holm, Doumitrou Daniil Nimara, Vincent Huang
[ABSTRACT]
The Third Generation Partnership Project (3GPP) has successfully introduced
standards for global mobility. However, the volume and complexity of these
standards have increased over time, thus complicating access to relevant
information for vendors and service providers. The use of Generative Artificial
Intelligence (AI), and in particular Large Language Models (LLMs), may provide
faster access to relevant information. In this paper, we evaluate the
capability of state-of-the-art LLMs to be used as Question Answering (QA)
assistants for 3GPP document reference. Our contribution is threefold. First,
we provide a benchmark and measuring methods for evaluating performance of
LLMs. Second, we perform data preprocessing and fine-tuning for one of these
LLMs and provide guidelines to increase accuracy of the responses that apply to
all LLMs. Third, we provide a model of our own, TeleRoBERTa, that performs on par
with foundation LLMs but with an order of magnitude fewer parameters.
Results show that LLMs can be used as a credible reference tool on telecom
technical documents, and thus have potential for a number of different
applications from troubleshooting and maintenance, to network operations and
software product development.
[COMMENTS]
Accepted to ICMLCN 2024, Stockholm, May 2024. Updating typo in
authors list
[LINK]
http://arxiv.org/abs/2404.02929v2
[DATE]
2024-04-12 17:08:30+08:00
[CATEGORIES]
cs.CL
Toward a Theory of Tokenization in LLMs
[AUTHORS]
Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran
[ABSTRACT]
While there has been a large body of research attempting to circumvent
tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the
current consensus is that it is a necessary initial step for designing
state-of-the-art performant language models. In this paper, we investigate
tokenization from a theoretical point of view by studying the behavior of
transformers on simple data generating processes. When trained on data drawn
from certain simple $k^{\text{th}}$-order Markov processes for $k > 1$,
transformers exhibit a surprising phenomenon: in the absence of tokenization,
they empirically fail to learn the right distribution and predict characters
according to a unigram model (Makkuva et al., 2024). With the addition of
tokenization, however, we empirically observe that transformers break through
this barrier and are able to model the probabilities of sequences drawn from
the source near-optimally, achieving small cross-entropy loss. With this
observation as a starting point, we study the end-to-end cross-entropy loss
achieved by transformers with and without tokenization. With the appropriate
tokenization, we show that even the simplest unigram models (over tokens)
learnt by transformers are able to model the probability of sequences drawn
from $k^{\text{th}}$-order Markov sources near optimally. Our analysis provides
a justification for the use of tokenization in practice through studying the
behavior of transformers on Markovian data.
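The gap described above can be quantified on a toy source: a unigram model over characters cannot do better than the entropy of the stationary symbol distribution, whereas an optimal predictor reaches the entropy rate of the source. The sketch below uses a first-order binary "switching" Markov chain, an assumption chosen for simplicity (the paper studies higher-order sources), and the function names are hypothetical.

```python
import math
from typing import Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def switching_chain_rates(p: float, q: float) -> Tuple[float, float]:
    """For a binary Markov chain with P(0 -> 1) = p and P(1 -> 0) = q, return
    (entropy rate, cross-entropy of the best character-level unigram model),
    both in bits per symbol."""
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0) or p + q == 0.0:
        raise ValueError("need p, q in [0, 1] with p + q > 0")
    pi1 = p / (p + q)  # stationary probability of symbol 1
    pi0 = 1.0 - pi1
    # Optimal predictor conditions on the previous symbol.
    entropy_rate = pi0 * binary_entropy(p) + pi1 * binary_entropy(q)
    # Best unigram model predicts the stationary distribution every step.
    unigram_ce = binary_entropy(pi1)
    return entropy_rate, unigram_ce


rate, unigram = switching_chain_rates(0.1, 0.1)
# rate is about 0.469 bits/symbol, while the unigram model is stuck at 1.0
```

On this chain a character-level unigram model pays roughly twice the optimal loss; tokens that merge runs of symbols let even a unigram model over tokens capture part of the dependence, which is the effect the abstract analyzes.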
[COMMENTS]
58 pages, 10 figures
[LINK]
http://arxiv.org/abs/2404.08335v1
[DATE]
2024-04-12 17:01:14+08:00
[CATEGORIES]
cs.CL
cs.LG
The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing
[AUTHORS]
Muzhi Li, Minda Hu, Irwin King, Ho-fung Leung
[ABSTRACT]
The Knowledge Graph Entity Typing (KGET) task aims to predict missing type
annotations for entities in knowledge graphs. Recent works only utilize the
structural knowledge in the local neighborhood of entities, disregarding the
semantic knowledge in the textual representations of entities, relations, and
types that are also crucial for type inference. Additionally, we observe that
the interaction between semantic and structural knowledge can be utilized to
address the false-negative problem. In this paper, we propose a novel Semantic
and Structure-aware KG Entity Typing (SSET) framework, which is composed of
three modules. First, the Semantic Knowledge Encoding module encodes factual
knowledge in the KG with a Masked Entity Typing task. Then, the Structural
Knowledge Aggregation module aggregates knowledge from the multi-hop
neighborhood of entities to infer missing types. Finally, the Unsupervised Type
Re-ranking module utilizes the inference results from the two modules above to
generate type predictions that are robust to false-negative samples. Extensive
experiments show that SSET significantly outperforms existing state-of-the-art
methods.
[COMMENTS]
Accepted in NAACL2024 main
[LINK]
http://arxiv.org/abs/2404.08313v1
[DATE]
2024-04-12 16:17:44+08:00
[CATEGORIES]
cs.CL
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
[AUTHORS]
Xincan Feng, Akifumi Yoshimoto
[ABSTRACT]
Recent advancements in Natural Language Processing (NLP) have seen
Large-scale Language Models (LLMs) excel at producing high-quality text for
various purposes. Notably, in Text-To-Speech (TTS) systems, the integration of
BERT for semantic token generation has underscored the importance of semantic
content in producing coherent speech outputs. Despite this, the specific
utility of LLMs in enhancing TTS synthesis remains considerably limited. This
research introduces an innovative approach, Llama-VITS, which enhances TTS
synthesis by enriching the semantic content of text using LLM. Llama-VITS
integrates semantic embeddings from Llama2 with the VITS model, a leading
end-to-end TTS framework. By leveraging Llama2 for the primary speech synthesis
process, our experiments demonstrate that Llama-VITS matches the naturalness of
the original VITS (ORI-VITS) and of the variant incorporating BERT (BERT-VITS) on
the LJSpeech dataset, a substantial collection of neutral, clear speech. Moreover,
our method significantly enhances emotive expressiveness on the EmoV_DB_bea_sem
dataset, a curated selection of emotionally consistent speech from the EmoV_DB
dataset, highlighting its potential to generate emotive speech.
[COMMENTS]
9 pages, 2 figures, 4 tables; accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.06714v2
[DATE]
2024-04-12 14:42:12+08:00
[CATEGORIES]
cs.CL
Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
[AUTHORS]
Wan-Hua Her, Udo Kruschwitz
[ABSTRACT]
Machine Translation has made impressive progress in recent years offering
close to human-level performance on many languages, but studies have primarily
focused on high-resource languages with broad online presence and resources.
With the help of growing Large Language Models, more and more low-resource
languages achieve better results through the presence of other languages.
However, studies have shown that not all low-resource languages can benefit
from multilingual systems, especially those with insufficient training and
evaluation data. In this paper, we revisit state-of-the-art Neural Machine
Translation techniques to develop automatic translation systems between German
and Bavarian. We investigate conditions of low-resource languages such as data
scarcity and parameter sensitivity and focus on refined solutions that combat
low-resource difficulties and creative solutions such as harnessing language
similarity. Our experiment entails applying Back-translation and Transfer
Learning to automatically generate more training data and achieve higher
translation performance. We show that the data is noisy and present our
approach to extensive text preprocessing. Evaluation was conducted using a
combination of metrics: BLEU, chrF and TER. Statistical significance results
with Bonferroni correction show surprisingly strong baseline systems and that
Back-translation leads to significant improvement. Furthermore, we present a
qualitative analysis of translation errors and system limitations.
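The Bonferroni correction mentioned above controls the family-wise error rate by testing each of m comparisons at level alpha / m instead of alpha. A minimal sketch, assuming the per-comparison p-values have already been computed by some significance test (the abstract does not say which); the function name and the example p-values are hypothetical.

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Mark each comparison as significant only if p < alpha / m,
    where m is the number of comparisons in the family."""
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for three system comparisons (not results from the paper).
example = {"baseline_vs_bt": 0.004, "baseline_vs_tl": 0.03, "bt_vs_tl": 0.2}
print(bonferroni_significant(example))
# {'baseline_vs_bt': True, 'baseline_vs_tl': False, 'bt_vs_tl': False}
```

Note that p = 0.03 would pass an uncorrected 0.05 test but fails the corrected threshold of about 0.0167, which is the conservatism the correction buys.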
[COMMENTS]
Preprint accepted at the 3rd Annual Meeting of the Special Interest
Group on Under-resourced Languages (SIGUL 2024)
[LINK]
http://arxiv.org/abs/2404.08259v1
[DATE]
2024-04-12 14:16:26+08:00
[CATEGORIES]
cs.CL
Increasing Trust in Language Models through the Reuse of Verified Circuits
[AUTHORS]
Philip Quirke, Clement Neo, Fazl Barez
[ABSTRACT]
Language Models (LMs) are increasingly used for a wide range of prediction
tasks, but their training can often neglect rare edge cases, reducing their
reliability. Here, we define a stringent standard of trustworthiness whereby
the task algorithm and circuit implementation must be verified, accounting for
edge cases, with no known failure modes. We show that a transformer model can
be trained to meet this standard if built using mathematically and logically
specified frameworks. In this paper, we fully verify a model for n-digit
integer addition. To exhibit the reusability of verified modules, we insert the
trained integer addition model into an untrained model and train the combined
model to perform both addition and subtraction. We find extensive reuse of the
addition circuits for both tasks, easing verification of the more complex
subtractor model. We discuss how inserting verified task modules into LMs can
leverage model reuse to improve verifiability and trustworthiness of language
models built using them. The reuse of verified circuits reduces the effort to
verify more complex composite models which we believe to be a significant step
towards safety of language models.
[COMMENTS]
8 pages, 10 figures
[LINK]
http://arxiv.org/abs/2402.02619v3
[DATE]
2024-04-12 11:57:24+08:00
[CATEGORIES]
cs.LG
cs.CL
Eye-gaze Guided Multi-modal Alignment Framework for Radiology
[AUTHORS]
Chong Ma, Hanqi Jiang, Wenting Chen, Zihao Wu, Xiaowei Yu, Fang Zeng, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li
[ABSTRACT]
In multi-modal frameworks, the alignment of cross-modal features presents a
significant challenge. The predominant approach in multi-modal pre-training
emphasizes either global or local alignment between modalities, utilizing
extensive datasets. This bottom-up driven method often suffers from a lack of
interpretability, a critical concern in radiology. Previous studies have
integrated high-level labels in medical images or text, but these still rely on
manual annotation, a costly and labor-intensive process. Our work introduces a
novel approach by using eye-gaze data, collected synchronously by radiologists
during diagnostic evaluations. This data, indicating radiologists’ focus areas,
naturally links chest X-rays to diagnostic texts. We propose the Eye-gaze
Guided Multi-modal Alignment (EGMA) framework to harness eye-gaze data for
better alignment of image and text features, aiming to reduce reliance on
manual annotations and thus cut training costs. Our model demonstrates robust
performance, outperforming other state-of-the-art methods in zero-shot
classification and retrieval tasks. The incorporation of easily-obtained
eye-gaze data during routine radiological diagnoses signifies a step towards
minimizing manual annotation dependency. Additionally, we explore the impact of
varying amounts of eye-gaze data on model performance, highlighting the
feasibility and utility of integrating this auxiliary data into multi-modal
pre-training.
[COMMENTS]
12 pages, 4 figures
[LINK]
http://arxiv.org/abs/2403.12416v2
[DATE]
2024-04-12 11:15:26+08:00
[CATEGORIES]
cs.CL
Measuring Cross-lingual Transfer in Bytes
[AUTHORS]
Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira
[ABSTRACT]
Multilingual pretraining has been a successful solution to the challenges
posed by the lack of resources for languages. These models can transfer
knowledge to target languages with minimal or no examples. Recent research
suggests that monolingual models also have a similar capability, but the
mechanisms behind this transfer remain unclear. Some studies have explored
factors like language contamination and syntactic similarity. An emerging line
of research suggests that the representations learned by language models
contain two components: a language-specific and a language-agnostic component.
The latter is responsible for transferring a more universal knowledge. However,
there is a lack of comprehensive exploration of these properties across diverse
target languages. To investigate this hypothesis, we conducted an experiment
inspired by the work on the Scaling Laws for Transfer. We measured the amount
of data transferred from a source language to a target language and found that
models initialized from diverse source languages perform similarly on a target
language in a cross-lingual setting. This was surprising because the amount of
data transferred to 10 diverse target languages, such as Spanish, Korean, and
Finnish, was quite similar. We also found evidence that this transfer is not
related to language contamination or language proximity, which strengthens the
hypothesis that the model also relies on language-agnostic knowledge. Our
experiments have opened up new possibilities for measuring how much data
represents the language-agnostic representations learned during pretraining.
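For reference, the Scaling Laws for Transfer (Hernandez et al., 2021) quantify transfer as "effective data transferred": the extra target-domain data a model trained from scratch would need to match a pretrained-then-fine-tuned model. It is an assumption, based on the wording of the abstract, that the measurements above follow this definition.

```latex
D_T = D_E - D_F, \qquad
f_T = \frac{D_T}{D_E} = \frac{D_T}{D_T + D_F}
```

Here D_F is the amount of target-language fine-tuning data, D_E is the amount of target-language data a from-scratch model needs to reach the same loss as the fine-tuned model, and f_T is the fraction of the effective data that comes from transfer.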
[COMMENTS]
NAACL 2024
[LINK]
http://arxiv.org/abs/2404.08191v1
[DATE]
2024-04-12 09:44:46+08:00
[CATEGORIES]
cs.CL
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
[AUTHORS]
Patrice Béchard, Orlando Marquez Ayala
[ABSTRACT]
A common and fundamental limitation of Generative AI (GenAI) is its
propensity to hallucinate. While large language models (LLM) have taken the
world by storm, without eliminating or at least reducing hallucinations,
real-world GenAI systems may face challenges in user adoption. In the process
of deploying an enterprise application that produces workflows based on natural
language requirements, we devised a system leveraging Retrieval Augmented
Generation (RAG) to greatly improve the quality of the structured output that
represents such workflows. Thanks to our implementation of RAG, our proposed
system significantly reduces hallucinations in the output and improves the
generalization of our LLM in out-of-domain settings. In addition, we show that
using a small, well-trained retriever encoder can reduce the size of the
accompanying LLM, thereby making deployments of LLM-based systems less
resource-intensive.
[COMMENTS]
To be presented at NAACL 2024. 11 pages and 4 figures
[LINK]
http://arxiv.org/abs/2404.08189v1
[DATE]
2024-04-12 09:42:09+08:00
[CATEGORIES]
cs.LG
cs.CL
Interpretation of Intracardiac Electrograms Through Textual Representations
[AUTHORS]
William Jongwon Han, Diana Gomez, Avi Alok, Chaojing Duan, Michael A. Rosenberg, Douglas Weber, Emerson Liu, Ding Zhao
[ABSTRACT]
Understanding the irregular electrical activity of atrial fibrillation (AFib)
has been a key challenge in electrocardiography. For serious cases of AFib,
catheter ablations are performed to collect intracardiac electrograms (EGMs).
EGMs offer intricately detailed and localized electrical activity of the heart
and are an ideal modality for interpretable cardiac studies. Recent
advancements in artificial intelligence (AI) have allowed some works to utilize
deep learning frameworks to interpret EGMs during AFib. Additionally, language
models (LMs) have shown exceptional performance in being able to generalize to
unseen domains, especially in healthcare. In this study, we are the first to
leverage pretrained LMs for finetuning of EGM interpolation and AFib
classification via masked language modeling. We formulate the EGM as a textual
sequence and present competitive performances on AFib classification compared
against other representations. Lastly, we provide a comprehensive
interpretability study to provide a multi-perspective intuition of the model’s
behavior, which could greatly benefit clinical use.
[COMMENTS]
18 pages, 9 figures; Accepted to CHIL 2024
[LINK]
http://arxiv.org/abs/2402.01115v3
[DATE]
2024-04-12 09:32:32+08:00
[CATEGORIES]
cs.CL
Large Language Model for Causal Decision Making
[AUTHORS]
Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song
[ABSTRACT]
Large Language Models (LLMs) have shown their success in language
understanding and reasoning on general topics. However, their capability to
perform inference based on user-specified structured data and knowledge in
corpus-rare concepts, such as causal decision-making, is still limited. In this
work, we explore the possibility of fine-tuning an open-sourced LLM into
LLM4Causal, which can identify the causal task, execute a corresponding
function, and interpret its numerical results based on users’ queries and the
provided dataset. Meanwhile, we propose a data generation process for more
controllable GPT prompting and present two instruction-tuning datasets: (1)
Causal-Retrieval-Bench for causal problem identification and input parameter
extraction for causal function calling and (2) Causal-Interpret-Bench for
in-context causal interpretation. By conducting end-to-end evaluations and two
ablation studies, we showed that LLM4Causal can deliver end-to-end solutions
for causal problems and provide easy-to-understand answers, significantly
outperforming the baselines.
[LINK]
http://arxiv.org/abs/2312.17122v3
[DATE]
2024-04-12 09:30:55+08:00
[CATEGORIES]
cs.CL
Provably Robust DPO: Aligning Language Models with Noisy Feedback
[AUTHORS]
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
[ABSTRACT]
Learning from preference-based feedback has recently gained traction as a
promising approach to align language models with human interests. While these
aligned generative models have demonstrated impressive capabilities across
various tasks, their dependence on high-quality human preference data poses a
bottleneck in practical applications. Specifically, noisy (incorrect and
ambiguous) preference pairs in the dataset might restrict the language models
from capturing human intent accurately. While practitioners have recently
proposed heuristics to mitigate the effect of noisy preferences, a complete
theoretical understanding of their workings remains elusive.
In this work, we aim to bridge this gap by introducing a general framework
for policy optimization in the presence of random preference flips. We focus on
the direct preference optimization (DPO) algorithm in particular since it
assumes that preferences adhere to the Bradley-Terry-Luce (BTL) model, raising
concerns about the impact of noisy data on the learned policy. We design a
novel loss function that debiases the effect of noise on average, making a
policy trained by minimizing that loss robust to the noise. Under log-linear
parameterization of the policy class and assuming good feature coverage of the
SFT policy, we prove that the sub-optimality gap of the proposed robust DPO
(rDPO) policy compared to the optimal policy is of the order
$O(\frac{1}{1-2\epsilon}\sqrt{\frac{d}{n}})$, where $\epsilon < 1/2$ is the flip
rate of labels, $d$ is the policy parameter dimension, and $n$ is the size of the dataset.
Our experiments on IMDb sentiment generation and Anthropic’s helpful-harmless
dataset show that rDPO is robust to noise in preference labels compared to
vanilla DPO and other heuristics proposed by practitioners.
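The "debias on average" idea can be sketched with the standard unbiased-loss construction for symmetric label flips: combine the DPO loss on the observed preference and on the flipped preference with weights (1 - eps) and -eps, then rescale by 1 / (1 - 2 eps). This is a minimal per-example sketch in plain Python, assuming precomputed sequence log-probabilities; the function names are hypothetical, and details of the authors' implementation (batching, reference-model handling) are not confirmed by the abstract.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin(policy_chosen: float, policy_rejected: float,
               ref_chosen: float, ref_rejected: float, beta: float) -> float:
    """Implicit reward margin: beta * [(log pi - log pi_ref)(chosen) - (log pi - log pi_ref)(rejected)]."""
    return beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))


def dpo_loss(margin: float) -> float:
    """Vanilla DPO loss for one preference pair: -log sigmoid(margin)."""
    return -log_sigmoid(margin)


def robust_dpo_loss(margin: float, eps: float) -> float:
    """De-biased DPO loss for a known label-flip rate 0 <= eps < 1/2 (sketch).

    Averaged over random flips of the observed label, it equals the clean DPO loss.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("flip rate eps must satisfy 0 <= eps < 1/2")
    return ((1.0 - eps) * dpo_loss(margin) - eps * dpo_loss(-margin)) / (1.0 - 2.0 * eps)


# Hypothetical log-probabilities for one preference pair.
m = dpo_margin(-10.0, -12.0, -11.0, -11.5, beta=0.1)
eps = 0.2
# Observed label is correct w.p. 1 - eps (margin m) and flipped w.p. eps (margin -m).
expected_robust = (1 - eps) * robust_dpo_loss(m, eps) + eps * robust_dpo_loss(-m, eps)
print(expected_robust, dpo_loss(m))  # the two values agree up to rounding
```

The final check illustrates the sense in which the loss is unbiased: its expectation over label noise recovers the noise-free DPO loss, at the cost of a 1 / (1 - 2 eps) scaling that also appears in the bound above.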
[LINK]
http://arxiv.org/abs/2403.00409v2
[DATE]
2024-04-12 09:09:37+08:00
[CATEGORIES]
cs.LG
cs.CL
UMBCLU at SemEval-2024 Task 1A and 1C: Semantic Textual Relatedness with and without machine translation
[AUTHORS]
Shubhashis Roy Dipta, Sai Vallurupalli
[ABSTRACT]
The aim of SemEval-2024 Task 1, “Semantic Textual Relatedness for African and
Asian Languages” is to develop models for identifying semantic textual
relatedness (STR) between two sentences using multiple languages (14 African
and Asian languages) and settings (supervised, unsupervised, and
cross-lingual). Large language models (LLMs) have shown impressive performance
on several natural language understanding tasks such as multilingual machine
translation (MMT), semantic similarity (STS), and encoding sentence embeddings.
Using a combination of LLMs that perform well on these tasks, we developed two
STR models, $\textit{TranSem}$ and $\textit{FineSem}$, for the supervised and
cross-lingual settings. We explore the effectiveness of several training
methods and the usefulness of machine translation. We find that direct
fine-tuning on the task is comparable to using sentence embeddings and
translating to English leads to better performance for some languages. In the
supervised setting, our model performance is better than the official baseline
for 3 languages with the remaining 4 performing on par. In the cross-lingual
setting, our model performance is better than the baseline for 3 languages
(leading to $1^{st}$ place for Afrikaans and $2^{nd}$ place for Indonesian), is
on par for 2 languages and performs poorly on the remaining 7 languages. Our
code is publicly available at https://github.com/dipta007/SemEval24-Task8.
[COMMENTS]
Accepted at SemEval 2024 (Colocated with NAACL 2024)
[LINK]
http://arxiv.org/abs/2402.12730v2
[DATE]
2024-04-12 08:53:29+08:00
[CATEGORIES]
cs.CL
cs.LG
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models
[AUTHORS]
Zhiyuan Peng, Xuyang Wu, Qifan Wang, Sravanthi Rajanala, Yi Fang
[ABSTRACT]
Parameter Efficient Fine-Tuning (PEFT) methods have been extensively utilized
in Large Language Models (LLMs) to improve downstream tasks without the
cost of fine-tuning the whole LLM. Recent studies have shown how to effectively
use PEFT for fine-tuning LLMs in ranking tasks with convincing performance;
however, there are some limitations, including the learned prompt being fixed
for different documents, overfitting to specific tasks, and low adaptation ability.
In this paper, we introduce a query-dependent parameter efficient fine-tuning
(Q-PEFT) approach for text reranking to leak the information of the true
queries to LLMs and thereby make the generation of true queries from input
documents much easier. Specifically, we utilize the query to extract the
top-$k$ tokens from concatenated documents, serving as contextual clues. We
further augment Q-PEFT by substituting the retrieval mechanism with a
multi-head attention layer to achieve end-to-end training and cover all the
tokens in the documents, guiding the LLMs to generate more document-specific
synthetic queries, thereby further improving the reranking performance.
Extensive experiments are conducted on four public datasets, demonstrating the
effectiveness of our proposed approach.
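The top-k extraction step can be pictured as scoring every document token against the query and keeping the k highest-scoring tokens, in their original order, as contextual clues. A minimal sketch, assuming dot-product similarity between hypothetical precomputed token and query embeddings; the abstract does not specify the scoring function, and `top_k_clue_tokens` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_clue_tokens(
    query_vec: Sequence[float],
    doc_tokens: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the k document tokens most similar to the query, kept in document order."""
    scored = [(dot(query_vec, vec), i) for i, (_, vec) in enumerate(doc_tokens)]
    # Highest similarity first; ties broken by earlier position.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    keep = sorted(i for _, i in best)
    return [doc_tokens[i][0] for i in keep]


# Toy 2-D embeddings (illustrative only).
tokens = [("the", [0.1, 0.0]), ("insulin", [0.9, 0.4]), ("dosage", [0.7, 0.6]), ("was", [0.0, 0.1])]
print(top_k_clue_tokens([1.0, 0.5], tokens, k=2))  # ['insulin', 'dosage']
```

This corresponds only to the hard retrieval variant; the abstract's end-to-end version replaces this selection with a multi-head attention layer over all tokens, which is not shown here.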
[LINK]
http://arxiv.org/abs/2404.04522v2
[DATE]
2024-04-12 08:18:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Language Model Prompt Selection via Simulation Optimization
[AUTHORS]
Haoting Zhang, Jinghai He, Rhonda Righter, Zeyu Zheng
[ABSTRACT]
With the advancement in generative language models, the selection of prompts
has gained significant attention in recent years. A prompt is an instruction or
description provided by the user, serving as a guide for the generative
language model in content generation. Whereas existing methods for prompt
selection rely on human labor, we consider facilitating this
selection through simulation optimization, aiming to maximize a pre-defined
score for the selected prompt. Specifically, we propose a two-stage framework.
In the first stage, we determine a feasible set of prompts in sufficient
numbers, where each prompt is represented by a moderate-dimensional vector. In
the subsequent stage for evaluation and selection, we construct a surrogate
model of the score as a function of the moderate-dimensional vectors that represent
the prompts. We propose sequentially selecting the prompt for evaluation based
on this constructed surrogate model. We prove the consistency of the sequential
evaluation procedure in our framework. We also conduct numerical experiments to
demonstrate the efficacy of our proposed framework, providing practical
instructions for implementation.
[LINK]
http://arxiv.org/abs/2404.08164v1
[DATE]
2024-04-12 08:03:56+08:00
[CATEGORIES]
cs.CL
cs.LG
RULER: What’s the Real Context Size of Your Long-Context Language Models?
[AUTHORS]
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg
[ABSTRACT]
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve
a piece of information (the “needle”) from long distractor texts (the
“haystack”), has been widely adopted to evaluate long-context language models
(LMs). However, this simple retrieval-based test is indicative of only a
superficial form of long-context understanding. To provide a more comprehensive
evaluation of long-context LMs, we create a new synthetic benchmark RULER with
flexible configurations for customized sequence length and task complexity.
RULER expands upon the vanilla NIAH test to encompass variations with diverse
types and quantities of needles. Moreover, RULER introduces new task categories,
multi-hop tracing and aggregation, to test behaviors beyond searching from
context. We evaluate ten long-context LMs with 13 representative tasks in
RULER. Despite achieving nearly perfect accuracy in the vanilla NIAH test, all
models exhibit large performance drops as the context length increases. While
these models all claim context sizes of 32K tokens or greater, only four models
(GPT-4, Command-R, Yi-34B, and Mixtral) can maintain satisfactory performance
at the length of 32K. Our analysis of Yi-34B, which supports context length of
200K, reveals large room for improvement as we increase input length and task
complexity. We open source RULER to spur comprehensive evaluation of
long-context LMs.
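To make the vanilla NIAH setup concrete, a haystack can be built by repeating distractor sentences up to a target length and inserting the needle at a chosen relative depth. A minimal sketch, assuming length is counted in whitespace-separated words rather than model tokens and using a hypothetical needle; this is not RULER's actual generator, and the function names are hypothetical.

```python
from typing import List


def build_niah_prompt(needle: str, distractors: List[str], target_words: int, depth: float) -> str:
    """Build a haystack of roughly target_words words with `needle` at relative depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    Length is approximated by whitespace-separated words, not model tokens.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not distractors:
        raise ValueError("need at least one distractor sentence")
    haystack: List[str] = []
    n_words = 0
    i = 0
    while n_words < target_words:
        sentence = distractors[i % len(distractors)]
        haystack.append(sentence)
        n_words += len(sentence.split())
        i += 1
    haystack.insert(round(depth * len(haystack)), needle)
    return " ".join(haystack)


def passes_niah(answer: str, expected_value: str) -> bool:
    """Score a model response by substring match on the hidden value."""
    return expected_value in answer


prompt = build_niah_prompt(
    needle="The special magic number is 4721.",  # hypothetical needle
    distractors=["The grass is green.", "The sky is blue."],
    target_words=40,
    depth=0.5,
)
question = "What is the special magic number mentioned in the text?"
```

Sweeping `target_words` and `depth` gives the usual length-by-depth grid; RULER's added task categories go beyond this single-needle retrieval.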
[LINK]
http://arxiv.org/abs/2404.06654v2
[DATE]
2024-04-12 07:53:59+08:00
[CATEGORIES]
cs.CL
Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models
[AUTHORS]
Md Messal Monem Miah, Ulie Schnaithmann, Arushi Raghuvanshi, Youngseo Son
[COMMENTS]
Published in NAACL 2024 Industry Track
[LINK]
http://arxiv.org/abs/2404.08156v1
[DATE]
2024-04-12 07:09:18+08:00
[CATEGORIES]
cs.CL
Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls
[AUTHORS]
Amin Hosseiny Marani, Ulie Schnaithmann, Youngseo Son, Akil Iyer, Manas Paldhe, Arushi Raghuvanshi
[ABSTRACT]
Current Conversational AI systems employ different machine learning
pipelines, as well as external knowledge sources and business logic to predict
the next action. Maintaining various components in dialogue managers’ pipeline
adds complexity in expansion and updates, increases processing time, and causes
additive noise through the pipeline that can lead to incorrect next action
prediction. This paper investigates graph integration into language
transformers to improve understanding of the relationships between humans’
utterances, previous actions, and next actions without dependency on external
sources or components. Experimental analyses on real calls indicate that the
proposed Graph Integrated Language Transformer models can achieve higher
performance compared to other production level conversational AI systems in
driving interactive calls with human users in real-world settings.
[COMMENTS]
Published in NAACL 2024 Industry Track
[LINK]
http://arxiv.org/abs/2404.08155v1
[DATE]
2024-04-12 06:47:50+08:00
[CATEGORIES]
cs.CL
Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs
[AUTHORS]
Jierui Li, Raymond Mooney
[ABSTRACT]
Distilling explicit chain-of-thought reasoning paths has emerged as an
effective method for improving the reasoning abilities of large language models
(LLMs) across various tasks. However, when tackling complex tasks that pose
significant challenges for state-of-the-art models, this technique often
struggles to produce effective chains of thought that lead to correct answers.
In this work, we propose a novel approach to distill reasoning abilities from
LLMs by leveraging their capacity to explain solutions. We apply our method to
solving competitive-level programming challenges. More specifically, we employ
an LLM to generate explanations for a set of <problem, solution-program> pairs,
then use <problem, explanation> pairs to fine-tune a smaller language model,
which we refer to as the Reasoner, to learn algorithmic reasoning that can
generate “how-to-solve” hints for unseen problems. Our experiments demonstrate
that learning from explanations enables the Reasoner to more effectively guide
program implementation by a Coder, resulting in higher solve rates than strong
chain-of-thought baselines on competitive-level programming problems. It also
outperforms models that learn directly from <problem, solution-program> pairs.
We curated an additional test set in the CodeContests format, which includes
246 more recent problems posted after the models’ knowledge cutoff.
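The two-stage pipeline described above works as a plain data flow. First, teacher explanations become fine-tuning targets. Then, at test time, the Reasoner's hint is passed to a Coder. This is a minimal sketch, not the authors' code. `explain`, `reasoner`, and `coder` are hypothetical callables standing in for the teacher LLM, the fine-tuned Reasoner, and the Coder model, and the record layout is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Example:
    problem: str
    target: str  # text the smaller model is fine-tuned to generate


def build_reasoner_data(
    pairs: Iterable[Tuple[str, str]],
    explain: Callable[[str, str], str],
) -> List[Example]:
    """Turn <problem, solution-program> pairs into <problem, explanation> pairs.

    `explain` is a hypothetical stand-in for a call to the teacher LLM."""
    return [Example(problem, explain(problem, program)) for problem, program in pairs]


def solve(
    problem: str,
    reasoner: Callable[[str], str],
    coder: Callable[[str, str], str],
) -> str:
    """The Reasoner writes a how-to-solve hint; the Coder turns problem + hint into a program."""
    hint = reasoner(problem)
    return coder(problem, hint)
```

The solution program is only used to produce the explanation. The smaller model never sees the program as its target, which matches the abstract's contrast with learning directly from <problem, solution-program> pairs.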
[COMMENTS]
pre-print
[LINK]
http://arxiv.org/abs/2404.08148v1
[DATE]
2024-04-12 06:19:50+08:00
[CATEGORIES]
cs.CL
Extending Translate-Train for ColBERT-X to African Language CLIR
[AUTHORS]
Eugene Yang, Dawn J. Lawrie, Paul McNamee, James Mayfield
[ABSTRACT]
This paper describes the submission runs from the HLTCOE team at the CIRAL
CLIR tasks for African languages at FIRE 2023. Our submissions use machine
translation models to translate the documents and the training passages, and
ColBERT-X as the retrieval model. Additionally, we present a set of unofficial
runs that use an alternative training procedure with a similar training
setting.
[COMMENTS]
10 pages, 2 figures. System description paper for HLTCOE’s
participation in CIRAL@FIRE 2023
[LINK]
http://arxiv.org/abs/2404.08134v1
[DATE]
2024-04-12 05:31:02+08:00
[CATEGORIES]
cs.CL
HLTCOE at TREC 2023 NeuCLIR Track
[AUTHORS]
Eugene Yang, Dawn Lawrie, James Mayfield
[ABSTRACT]
The HLTCOE team applied PLAID, an mT5 reranker, and document translation to
the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and
training techniques – the English model released with ColBERT v2,
translate-train (TT), Translate Distill (TD), and multilingual
translate-train (MTT). TT trains a ColBERT model with English queries and
passages automatically translated into the document language from the MS-MARCO
v1 collection. This results in three cross-language models for the track, one
per language. MTT creates a single model for all three document languages by
combining the translations of MS-MARCO passages in all three languages into
mixed-language batches. Thus the model learns about matching queries to
passages simultaneously in all languages. Distillation uses scores from the mT5
model over non-English translated document pairs to learn how to score
query-document pairs. The team submitted runs to all NeuCLIR tasks: the CLIR
and MLIR news tasks as well as the technical documents task.
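Here is a rough sketch of the MTT idea of mixed-language batches. It assumes the MS-MARCO passages have already been translated into each document language and remain index-aligned with their English queries. Two details are assumptions for illustration: the random per-example choice of language and the `translations` layout. The abstract says only that the translations are combined into mixed-language batches.

```python
import random


def mixed_language_batches(queries, translations, batch_size, seed=0):
    """Yield batches of (English query, translated passage, language) triples.

    queries:      list of English queries; query i goes with passage i.
    translations: dict mapping a language code to a list of translated
                  passages, each list index-aligned with `queries`.
    Each pair draws its passage language at random, so one batch mixes
    document languages (a hypothetical sampling scheme)."""
    rng = random.Random(seed)
    langs = sorted(translations)
    order = list(range(len(queries)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = []
        for i in order[start:start + batch_size]:
            lang = rng.choice(langs)
            batch.append((queries[i], translations[lang][i], lang))
        yield batch
```

A TT model would instead use only one language's list, producing one model per language.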
[COMMENTS]
6 pages. Part of TREC 2023 Proceedings
[LINK]
http://arxiv.org/abs/2404.08118v1
[DATE]
2024-04-12 04:46:18+08:00
[CATEGORIES]
cs.CL
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
[AUTHORS]
Guangzhi Wang, Tianyi Chen, Kamran Ghasedi, HsiangTao Wu, Tianyu Ding, Chris Nuesmeyer, Ilya Zharkov, Mohan Kankanhalli, Luming Liang
[ABSTRACT]
Face attribute editing plays a pivotal role in various applications. However,
existing methods encounter challenges in achieving high-quality results while
preserving identity, editing faithfulness, and temporal consistency. These
challenges are rooted in issues related to the training pipeline, including
limited supervision, architecture design, and optimization strategy. In this
work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training
framework for face video editing. S3Editor is a generic solution that
comprehensively addresses these challenges with three key contributions.
Firstly, S3Editor adopts a self-training paradigm to enhance the training
process through semi-supervision. Secondly, we propose a semantic disentangled
architecture with a dynamic routing mechanism that accommodates diverse editing
requirements. Thirdly, we present a structured sparse optimization schema that
identifies and deactivates malicious neurons to further disentangle impacts
from non-target attributes. S3Editor is model-agnostic and compatible with
various editing approaches. Our extensive qualitative and quantitative results
affirm that our approach significantly enhances identity preservation, editing
fidelity, and temporal consistency.
[LINK]
http://arxiv.org/abs/2404.08111v1
[DATE]
2024-04-12 04:25:26+08:00
[CATEGORIES]
cs.CL
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
[AUTHORS]
Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
[ABSTRACT]
Fine-tuning language models (LMs) has demonstrated success in a wide array of
downstream tasks. However, as LMs are scaled up, the memory requirements for
backpropagation become prohibitively high. Zeroth-order (ZO) optimization
methods can leverage memory-efficient forward passes to estimate gradients.
More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently
outperform zero-shot and in-context learning when combined with suitable task
prompts. In this work, we couple ZO methods with variance reduction techniques
to enhance stability and convergence for inference-based LM fine-tuning. We
introduce Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient
(MeZO-SVRG) and demonstrate its efficacy across multiple LM fine-tuning tasks,
eliminating the reliance on task-specific prompts. Evaluated across a range of
both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG
outperforms MeZO with up to 20% increase in test accuracies in both full- and
partial-parameter fine-tuning settings. MeZO-SVRG benefits from reduced
computation time as it often surpasses MeZO’s peak test accuracy with a
$2\times$ reduction in GPU-hours. MeZO-SVRG significantly reduces the required
memory footprint compared to first-order SGD, e.g., by $2\times$ for
autoregressive models. Our experiments highlight that MeZO-SVRG’s memory
savings progressively improve compared to SGD with larger batch sizes.
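The abstract combines two ingredients: forward-only (zeroth-order) gradient estimates and SVRG-style variance reduction. The NumPy sketch below shows one way to combine them. It is an illustration under stated assumptions, not the authors' MeZO-SVRG implementation. The toy quadratic objective, step size, and schedule are invented, and each estimate uses a single random direction. The minibatch estimate at the current parameters is corrected in two ways: the same minibatch's estimate at an anchor point is subtracted, and a full-batch estimate at that anchor is added. Both minibatch estimates share one perturbation direction `z`.

```python
import numpy as np


def zo_grad(loss, theta, z, eps=1e-3):
    """Two-point (SPSA-style) zeroth-order gradient estimate along direction z."""
    return (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps) * z


def make_loss(data, idx):
    """Toy objective: mean over selected samples of 0.5 * ||theta - a_i||^2."""
    batch = data[idx]
    return lambda th: 0.5 * np.mean(np.sum((th - batch) ** 2, axis=1))


def zo_svrg(data, theta0, lr=0.01, epochs=30, batch_size=4, seed=0):
    """SVRG-style variance reduction applied to forward-only gradient estimates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    n = len(data)
    full_loss = make_loss(data, np.arange(n))
    for _ in range(epochs):
        anchor = theta.copy()
        # Full-batch ZO estimate at the anchor, reused for the whole inner loop.
        mu = zo_grad(full_loss, anchor, rng.standard_normal(theta.shape))
        for _ in range(n // batch_size):
            idx = rng.choice(n, size=batch_size, replace=False)
            z = rng.standard_normal(theta.shape)  # shared by both minibatch estimates
            f = make_loss(data, idx)
            g = zo_grad(f, theta, z) - zo_grad(f, anchor, z) + mu
            theta -= lr * g
    return theta
```

Only loss evaluations (forward passes) are used; no backpropagation is needed. MeZO's memory savings also depend on regenerating the perturbation from a random seed instead of storing it, a detail this sketch omits.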
[COMMENTS]
29 pages, 25 tables, 9 figures
[LINK]
http://arxiv.org/abs/2404.08080v1
[DATE]
2024-04-12 02:35:49+08:00
[CATEGORIES]
cs.LG
cs.CL
SQBC: Active Learning using LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
[AUTHORS]
Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele, Stefan Harmeling
[ABSTRACT]
Stance detection is an important task for many applications that analyse or
support online political discussions. Common approaches include fine-tuning
transformer-based models. However, these models require a large amount of
labelled data, which might not be available. In this work, we present two
different ways to leverage LLM-generated synthetic data to train and improve
stance detection agents for online political discussions: first, we show that
augmenting a small fine-tuning dataset with synthetic data can improve the
performance of the stance detection model. Second, we propose a new active
learning method called SQBC based on the “Query-by-Committee” approach. The key
idea is to use LLM-generated synthetic data as an oracle to identify the most
informative unlabelled samples, which are then selected for manual labelling.
Comprehensive experiments show that both ideas can improve the stance detection
performance. Curiously, we observed that fine-tuning on actively selected
samples can exceed the performance of using the full dataset.
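Here is one way to picture a query-by-committee step that uses synthetic data as the oracle:

1. Embed every unlabelled post and every labelled synthetic post.
2. Treat the stance labels of each unlabelled post's k nearest synthetic neighbours as committee votes.
3. Send the posts with the highest vote entropy to annotators.

The k-NN committee and the entropy score are assumptions for illustration. The abstract does not specify SQBC's exact scoring rule.

```python
import numpy as np


def vote_entropy(votes, num_classes):
    """Entropy of the committee's label votes (0 when all members agree)."""
    counts = np.bincount(votes, minlength=num_classes)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def select_for_labelling(unlabelled_emb, synth_emb, synth_labels, k=5, budget=10, num_classes=2):
    """Rank unlabelled posts by disagreement among their k nearest synthetic neighbours."""
    # Cosine similarity between every unlabelled and every synthetic embedding.
    u = unlabelled_emb / np.linalg.norm(unlabelled_emb, axis=1, keepdims=True)
    s = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    sims = u @ s.T
    neighbours = np.argsort(-sims, axis=1)[:, :k]
    scores = np.array([vote_entropy(synth_labels[nb], num_classes) for nb in neighbours])
    # Highest disagreement first; these are sent for manual labelling.
    return np.argsort(-scores, kind="stable")[:budget]
```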
[LINK]
http://arxiv.org/abs/2404.08078v1
[DATE]
2024-04-12 02:34:11+08:00
[CATEGORIES]
cs.CL
cs.LG
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
[AUTHORS]
Mobashir Sadat, Cornelia Caragea
[COMMENTS]
Accepted to the NAACL 2024 Main Conference
[LINK]
http://arxiv.org/abs/2404.08066v1
[DATE]
2024-04-12 02:12:12+08:00
[CATEGORIES]
cs.CL
The Expressive Power of Transformers with Chain of Thought
[AUTHORS]
William Merrill, Ashish Sabharwal
[ABSTRACT]
Recent theoretical work has identified surprisingly simple reasoning
problems, such as checking if two nodes in a graph are connected or simulating
finite-state machines, that are provably unsolvable by standard transformers
that answer immediately after reading their input. However, in practice,
transformers’ reasoning can be improved by allowing them to use a “chain of
thought” or “scratchpad”, i.e., generate and condition on a sequence of
intermediate tokens before answering. Motivated by this, we ask: Does such
intermediate generation fundamentally extend the computational power of a
decoder-only transformer? We show that the answer is yes, but the amount of
increase depends crucially on the amount of intermediate generation. For
instance, we find that transformer decoders with a logarithmic number of
decoding steps (w.r.t. the input length) push the limits of standard
transformers only slightly, while a linear number of decoding steps, assuming
projected pre-norm (a slight generalization of standard pre-norm), adds a clear
new ability (under standard complexity conjectures): recognizing all regular
languages. Our results also imply that linear steps keep transformer decoders
within context-sensitive languages, and polynomial steps with generalized
pre-norm make them recognize exactly the class of polynomial-time solvable
problems – the first exact characterization of a type of transformers in terms
of standard complexity classes. Together, this provides a nuanced framework for
understanding how the length of a transformer’s chain of thought or scratchpad
impacts its reasoning power.
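Here is an informal illustration of why linearly many intermediate tokens suffice for regular languages. Suppose a model writes one automaton state per input symbol. Then at each step it only has to combine its last scratchpad token with the next input symbol. The Python sketch below simulates that chain of thought for a DFA directly. It is not a transformer construction and says nothing about the paper's projected pre-norm assumption.

```python
def run_with_scratchpad(transitions, start, accepting, word):
    """Simulate a DFA by emitting one intermediate 'state token' per input symbol.

    transitions: dict mapping (state, symbol) -> next state.
    Each step reads only the previous scratchpad token and the current
    symbol, so the scratchpad grows linearly with the input length."""
    scratchpad = [start]
    for symbol in word:
        scratchpad.append(transitions[(scratchpad[-1], symbol)])
    return scratchpad[-1] in accepting, scratchpad
```

For example, with a two-state parity automaton over {0, 1}, the input "1101" produces the scratchpad even, odd, even, even, odd. The input is rejected when "even" is the only accepting state.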
[COMMENTS]
9-page preprint. ICLR camera ready posted April 11
[LINK]
http://arxiv.org/abs/2310.07923v5
[DATE]
2024-04-12 02:03:53+08:00
[CATEGORIES]
cs.LG
cs.CL
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
[AUTHORS]
Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li
[ABSTRACT]
Large foundation models have recently emerged as a prominent focus of
interest, attaining superior performance in widespread scenarios. Due to the
scarcity of 3D data, many efforts have been made to adapt pre-trained
transformers from vision to 3D domains. However, such 2D-to-3D approaches are
still limited, due to the potential loss of spatial geometries and high
computation cost. More importantly, their frameworks are mainly designed for 2D
models, lacking a general any-to-3D paradigm. In this paper, we introduce
Any2Point, a parameter-efficient method to empower any-modality large models
(vision, language, audio) for 3D understanding. Given a frozen transformer from
any source modality, we propose a 3D-to-any (1D or 2D) virtual projection
strategy that correlates the input 3D points to the original 1D or 2D positions
within the source modality. This mechanism enables us to assign each 3D token
a positional encoding paired with the pre-trained model, which avoids 3D
geometry loss caused by the true projection and better motivates the
transformer for 3D learning with 1D/2D positional priors. Then, within each
transformer block, we insert an any-to-3D guided adapter module for
parameter-efficient fine-tuning. The adapter incorporates prior spatial
knowledge from the source modality to guide the local feature aggregation of 3D
tokens, compelling the semantic adaptation of any-modality transformers. We
conduct extensive experiments to showcase the effectiveness and efficiency of
our method. Code and models are released at
https://github.com/Ivan-Tang-3D/Any2Point.
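Here is a simplified sketch of the 3D-to-2D virtual projection idea:

1. Project each 3D point onto a few 2D planes.
2. Quantize the projected coordinates onto the patch grid of a 2D model.
3. Give the point that model's positional encodings.

Several choices here are assumptions, not the paper's: three axis-aligned orthographic views, averaging across views, and a sinusoidal table standing in for a real pre-trained positional-embedding table.

```python
import numpy as np


def sincos_2d_table(grid, dim):
    """Stand-in for a pre-trained ViT's 2D positional-embedding table, shape (grid*grid, dim)."""
    assert dim % 4 == 0
    freqs = 1.0 / (10000 ** (np.arange(dim // 4) / (dim // 4)))
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")

    def enc(pos):
        ang = pos.reshape(-1, 1) * freqs
        return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

    return np.concatenate([enc(ys), enc(xs)], axis=1)


def virtual_projection_pe(points, pe_table, grid):
    """Assign each 3D point the mean 2D positional encoding over three virtual views."""
    lo, hi = points.min(0), points.max(0)
    norm = (points - lo) / np.maximum(hi - lo, 1e-8)          # scale each axis to [0, 1]
    cells = np.minimum((norm * grid).astype(int), grid - 1)  # quantize to the patch grid
    views = [(0, 1), (1, 2), (0, 2)]                          # drop z, x, or y
    pes = [pe_table[cells[:, a] * grid + cells[:, b]] for a, b in views]
    return np.mean(pes, axis=0)
```

Because the encodings are looked up rather than learned, the 3D tokens reuse the frozen model's positional prior without a true 3D-to-2D rendering step.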
[COMMENTS]
Code and models are released at
https://github.com/Ivan-Tang-3D/Any2Point
[LINK]
http://arxiv.org/abs/2404.07989v1
[DATE]
2024-04-12 01:59:45+08:00
[CATEGORIES]
cs.CL
cs.LG
Manipulating Large Language Models to Increase Product Visibility
[AUTHORS]
Aounon Kumar, Himabindu Lakkaraju
[ABSTRACT]
Large language models (LLMs) are increasingly being integrated into search
engines to provide natural language responses tailored to user queries.
Customers and end-users are also becoming more dependent on these models for
quick and easy purchase decisions. In this work, we investigate whether
recommendations from LLMs can be manipulated to enhance a product’s visibility.
We demonstrate that adding a strategic text sequence (STS) – a carefully
crafted message – to a product’s information page can significantly increase
its likelihood of being listed as the LLM’s top recommendation. To understand
the impact of STS, we use a catalog of fictitious coffee machines and analyze
its effect on two target products: one that seldom appears in the LLM’s
recommendations and another that usually ranks second. We observe that the
strategic text sequence significantly enhances the visibility of both products
by increasing their chances of appearing as the top recommendation. This
ability to manipulate LLM-generated search responses provides vendors with a
considerable competitive advantage and has the potential to disrupt fair market
competition. Just as search engine optimization (SEO) revolutionized how
webpages are customized to rank higher in search engine results, influencing
LLM recommendations could profoundly impact content optimization for AI-driven
search services. Code for our experiments is available at
https://github.com/aounon/llm-rank-optimizer.
[LINK]
http://arxiv.org/abs/2404.07981v1
[DATE]
2024-04-12 01:57:32+08:00
[CATEGORIES]
cs.CL
LLoCO: Learning Long Contexts Offline
[AUTHORS]
Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
[ABSTRACT]
Processing long contexts remains a challenge for large language models (LLMs)
due to the quadratic computational and memory overhead of the self-attention
mechanism and the substantial KV cache sizes during generation. We propose a
novel approach to address this problem by learning contexts offline through
context compression and in-domain parameter-efficient finetuning. Our method
enables an LLM to create a concise representation of the original context and
efficiently retrieve relevant information to answer questions accurately. We
introduce LLoCO, a technique that combines context compression, retrieval, and
parameter-efficient finetuning using LoRA. Our approach extends the effective
context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We
evaluate our approach on several long-context question-answering datasets,
demonstrating that LLoCO significantly outperforms in-context learning while
using $30\times$ fewer tokens during inference. LLoCO achieves up to
$7.62\times$ speed-up and substantially reduces the cost of long document
question answering, making it a promising solution for efficient long context
processing. Our code is publicly available at
https://github.com/jeffreysijuntan/lloco.
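LLoCO's parameter-efficient finetuning uses LoRA. The sketch below shows a generic LoRA linear layer in NumPy. The pre-trained weight stays frozen, and only a rank-r update `B @ A` (scaled by `alpha / r`) is trained. `B` is zero-initialized so training starts from the original model. This illustrates LoRA in general (Hu et al., 2021), not LLoCO's code. Shapes and hyperparameters are placeholders.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                             # frozen, never updated
        self.A = rng.normal(0, 0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                 # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the low-rank update into a single dense weight for inference."""
        return self.weight + self.scale * self.B @ self.A
```

Only `rank * (d_in + d_out)` parameters are trained per layer. Because the update can be merged back into the weight, inference costs the same as the base layer.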
[COMMENTS]
The first two authors contributed equally to this work
[LINK]
http://arxiv.org/abs/2404.07979v1
[DATE]
2024-04-12 01:57:22+08:00
[CATEGORIES]
cs.CL
cs.LG
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
[AUTHORS]
Jiayi Wang, David Ifeoluwa Adelani, Sweta Agrawal, Marek Masiak, Ricardo Rei, Eleftheria Briakou, Marine Carpuat, Xuanli He, Sofia Bourhim, Andiswa Bukula, Muhidin Mohamed, Temitayo Olatoye, Tosin Adewumi, Hamam Mokayede, Christine Mwase, Wangui Kimotho, Foutse Yuehgoh, Anuoluwapo Aremu, Jessica Ojo, Shamsuddeen Hassan Muhammad, Salomey Osei, Abdul-Hakeem Omotayo, Chiamaka Chukwuneke, Perez Ogayo, Oumaima Hourrane, Salma El Anigri, Lolwethu Ndolela, Thabiso Mangwana, Shafie Abdi Mohamed, Ayinde Hassan, Oluwabusayo Olufunke Awoyomi, Lama Alkhaled, Sana Al-Azzawi, Naome A. Etori, Millicent Ochieng, Clemencia Siro, Samuel Njoroge, Eric Muchiri, Wangari Kimotho, Lyse Naomi Wamba Momo, Daud Abolade, Simbiat Ajao, Iyanuoluwa Shode, Ricky Macharm, Ruqayya Nasir Iro, Saheed S. Abdullahi, Stephen E. Moore, Bernard Opoku, Zainab Akinjobi, Abeeb Afolabi, Nnaemeka Obiefuna, Onyekachi Raphael Ogbu, Sam Brian, Verrah Akinyi Otiende, Chinedu Emmanuel Mbonu, Sakayo Toadoum Sari, Yao Lu, Pontus Stenetorp
[ABSTRACT]
Despite the recent progress on scaling multilingual machine translation (MT)
to several under-resourced African languages, accurately measuring this
progress remains challenging, since evaluation is often performed on n-gram
matching metrics such as BLEU, which typically show a weaker correlation with
human judgments. Learned metrics such as COMET have higher correlation;
however, the lack of evaluation data with human ratings for under-resourced
languages, complexity of annotation guidelines like Multidimensional Quality
Metrics (MQM), and limited language coverage of multilingual encoders have
hampered their applicability to African languages. In this paper, we address
these challenges by creating high-quality human evaluation data with simplified
MQM guidelines for error detection and direct assessment (DA) scoring for 13
typologically diverse African languages. Furthermore, we develop AfriCOMET:
COMET evaluation metrics for African languages by leveraging DA data from
well-resourced languages and an African-centric multilingual encoder
(AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African
languages with respect to Spearman-rank correlation with human judgments
(0.441).
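The headline number (0.441) is a Spearman rank correlation between metric scores and human judgments. Spearman's rho is the Pearson correlation of the two rank vectors. The sketch below computes it with averaged ranks for ties. The abstract does not say how scores are grouped before correlating (per segment or per language), so the inputs here are just two aligned lists.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(metric_scores, human_scores):
    """Spearman's rho: Pearson correlation of the rank vectors (ties averaged)."""
    rx, ry = average_ranks(metric_scores), average_ranks(human_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Only the ordering matters, so a metric is rewarded for ranking translations the way humans do, even if its scores live on a different scale.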
[COMMENTS]
Accepted by NAACL 2024
[LINK]
http://arxiv.org/abs/2311.09828v2
[DATE]
2024-04-12 01:38:09+08:00
[CATEGORIES]
cs.CL
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
[AUTHORS]
Zeyi Liao, Huan Sun
[ABSTRACT]
As large language models (LLMs) become increasingly prevalent and integrated
into autonomous systems, ensuring their safety is imperative. Despite
significant strides toward safety alignment, recent work
(GCG; Zou et al., 2023) proposes a discrete token optimization algorithm
and selects the single suffix with the lowest loss to successfully jailbreak
aligned LLMs. In this work, we first discuss the drawbacks of solely picking
the suffix with the lowest loss during GCG optimization for jailbreaking and
uncover the missed successful suffixes during the intermediate steps. Moreover,
we utilize those successful suffixes as training data to learn a generative
model, named AmpleGCG, which captures the distribution of adversarial suffixes
given a harmful query and enables the rapid generation of hundreds of suffixes
for any harmful query in seconds. AmpleGCG achieves a near-100% attack success
rate (ASR) on two aligned LLMs (Llama-2-7B-chat and Vicuna-7B), surpassing the two
strongest attack baselines. More interestingly, AmpleGCG also transfers
seamlessly to attack different models, including closed-source LLMs, achieving
a 99% ASR on the latest GPT-3.5. To summarize, our work amplifies the impact
of GCG by training a generative model of adversarial suffixes that is universal
to any harmful query and transferable from attacking open-source LLMs to
closed-source LLMs. In addition, it can generate 200 adversarial suffixes for
one harmful query in only 4 seconds, making it more challenging to defend against.
[LINK]
http://arxiv.org/abs/2404.07921v1
[DATE]
2024-04-12 01:05:50+08:00
[CATEGORIES]
cs.CL
A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models
[AUTHORS]
Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen
[ABSTRACT]
A central component of rational behavior is logical inference: the process of
determining which conclusions follow from a set of premises. Psychologists have
documented several ways in which humans’ inferences deviate from the rules of
logic. Do language models, which are trained on text generated by humans,
replicate such human biases, or are they able to overcome them? Focusing on the
case of syllogisms – inferences from two simple premises – we show that,
within the PaLM2 family of transformer language models, larger models are more
logical than smaller ones, and also more logical than humans. At the same time,
even the largest models make systematic errors, some of which mirror human
reasoning biases: they show sensitivity to the (irrelevant) ordering of the
variables in the syllogism, and draw confident but incorrect inferences from
particular syllogisms (syllogistic fallacies). Overall, we find that language
models often mimic the human biases included in their training data, but are
able to overcome them in some cases.
[COMMENTS]
NAACL 2024
[LINK]
http://arxiv.org/abs/2311.00445v2
[DATE]
2024-04-12 00:49:57+08:00
[CATEGORIES]
cs.CL
cs.LG
HGRN2: Gated Linear RNNs with State Expansion
[AUTHORS]
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
[ABSTRACT]
Hierarchically gated linear RNN (HGRN; Qin et al., 2023) has demonstrated
competitive training speed and performance in language modeling, while offering
efficient inference. However, the recurrent state size of HGRN remains
relatively small, which limits its expressiveness. To address this issue,
inspired by linear attention, we introduce a simple outer-product-based state
expansion mechanism so that the recurrent state size can be significantly
enlarged without introducing any additional parameters. The linear attention
form also allows for hardware-efficient training. Our extensive experiments
verify the advantage of HGRN2 over HGRN1 in language modeling, image
classification, and Long Range Arena. Our largest 3B HGRN2 model slightly
outperforms Mamba and a LLaMA-architecture Transformer for language modeling in a
controlled experiment setting, and performs competitively with many open-source
3B models in downstream evaluation while using far fewer total training
tokens.
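Here is a minimal sketch of a gated linear recurrence with an outer-product state, in the spirit of the state expansion described above. It rests on two assumptions:

- The forget gate `f` doubles as the key through `(1 - f)`. The state therefore grows from a vector to a `d_k x d_v` matrix without new projection weights.
- A query vector reads the state as in linear attention.

The names and exact parameterization are illustrative and may differ from the released HGRN2 code. A real implementation would use a chunked, hardware-efficient kernel instead of this Python loop.

```python
import numpy as np


def hgrn2_style_scan(q, f, v):
    """Gated linear recurrence with an outer-product (matrix) state.

    q, f: (T, d_k) query and forget-gate values (f in (0, 1)).
    v:    (T, d_v) input values.
    Returns (T, d_v) outputs."""
    T, d_k = f.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = np.empty((T, d_v))
    for t in range(T):
        # Decay each row of the state, then write the outer product (1 - f_t) v_t^T.
        S = f[t][:, None] * S + np.outer(1.0 - f[t], v[t])
        # Linear-attention-style read-out.
        outputs[t] = q[t] @ S
    return outputs
```

Unrolling the loop gives output t as a sum over earlier steps s of v_s. Each term is weighted by q_t times the product of forget gates after s, times (1 - f_s). This is the linear-attention form that allows parallel training.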
[COMMENTS]
Technical Report. Yiran Zhong is the corresponding author. The
source code is available at https://github.com/OpenNLPLab/HGRN2
[LINK]
http://arxiv.org/abs/2404.07904v1
[DATE]
2024-04-12 00:43:03+08:00
[CATEGORIES]
cs.CL
Analyzing Toxicity in Deep Conversations: A Reddit Case Study
[AUTHORS]
Vigneshwaran Shankaran, Rajesh Sharma
[ABSTRACT]
Online social media has become increasingly popular in recent years due to
its ease of access and ability to connect with others. One of social media’s
main draws is its anonymity, allowing users to share their thoughts and
opinions without fear of judgment or retribution. This anonymity has also made
social media prone to harmful content, which requires moderation to ensure
responsible and productive use. Several methods using artificial intelligence
have been employed to detect harmful content. However, conversation and
contextual analysis of hate speech are still understudied. Most promising works
analyze only a single text at a time rather than the conversation surrounding
it. In this work, we employ a tree-based approach to understand how users
behave concerning toxicity in public conversation settings. To this end, we
collect both the posts and the comment sections of the top 100 posts from 8
Reddit communities that allow profanity, totaling over 1 million responses. We
find that toxic comments increase the likelihood of subsequent toxic comments
being produced in online conversations. Our analysis also shows that the immediate
context, rather than the original post, plays a vital role in shaping a response.
We also study the effect of consensual profanity and observe overlapping
similarities with non-consensual profanity in terms of user behavior and
patterns.
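Here is a small sketch of a conversation-tree statistic behind the finding that toxic comments make further toxic comments more likely. It compares the toxicity rate of replies whose immediate parent is toxic with that of replies whose parent is not. The input format is hypothetical. The toxicity flags would come from a separate classifier or annotation.

```python
from collections import defaultdict


def reply_toxicity_rates(comments):
    """Return {parent_is_toxic: fraction of replies that are toxic}.

    comments: dict id -> (parent_id or None, is_toxic); top-level posts
    have parent None and are not counted as replies."""
    counts = defaultdict(lambda: [0, 0])  # parent_toxic -> [toxic replies, all replies]
    for parent_id, is_toxic in comments.values():
        if parent_id is None:
            continue
        parent_toxic = comments[parent_id][1]
        counts[parent_toxic][0] += int(is_toxic)
        counts[parent_toxic][1] += 1
    return {k: tox / total for k, (tox, total) in counts.items()}
```

Conditioning on the immediate parent rather than the root post mirrors the abstract's observation that immediate context matters more than the original post.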
[LINK]
http://arxiv.org/abs/2404.07879v1
[DATE]
2024-04-12 00:10:44+08:00
[CATEGORIES]
cs.CL
Scalability in Building Component Data Annotation: Enhancing Facade Material Classification with Synthetic Data
[AUTHORS]
Josie Harrison, Alexander Hollberg, Yinan Yu
[ABSTRACT]
Computer vision models trained on Google Street View images can create
material cadastres. However, current approaches need manually annotated
datasets that are difficult to obtain and often have class imbalance. To
address these challenges, this paper fine-tuned a Swin Transformer model on a
synthetic dataset generated with DALL-E and compared the performance to a
similar manually annotated dataset. Although manual annotation remains the gold
standard, the synthetic dataset performance demonstrates a reasonable
alternative. The findings will ease annotation needed to develop material
cadastres, offering architects insights into opportunities for material reuse,
thus contributing to the reduction of demolition waste.
[COMMENTS]
10 pages, 6 figures, submitted to 2024 European Conference of
Computing in Construction
[LINK]
http://arxiv.org/abs/2404.08557v1
[DATE]
2024-04-12 23:54:48+08:00
[CATEGORIES]
cs.LG
Rotation-equivariant Graph Neural Networks for Learning Glassy Liquids Representations
[AUTHORS]
Francesco Saverio Pezzicoli, Guillaume Charpiat, François P. Landes
[ABSTRACT]
The difficult problem of relating the static structure of glassy liquids and
their dynamics is a good target for Machine Learning, an approach which excels
at finding complex patterns hidden in data. Indeed, this approach is currently
a hot topic in the glassy liquids community, where the state of the art
consists of Graph Neural Networks (GNNs), which have great expressive power but
are heavy models and lack interpretability. Inspired by recent advances in the
field of Machine Learning group-equivariant representations, we build a GNN
that learns a robust representation of the glass’ static structure by
constraining it to preserve the roto-translation (SE(3)) equivariance. We show
that this constraint significantly improves the predictive power at comparable
or reduced number of parameters but most importantly, improves the ability to
generalize to unseen temperatures. While remaining a deep network, our model
has improved interpretability compared to other GNNs, as the action of our
basic convolution layer relates directly to well-known rotation-invariant
expert features. Through transfer-learning experiments displaying unprecedented
performance, we demonstrate that our network learns a robust representation,
which allows us to push forward the idea of a learned structural order
parameter for glasses.
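To make the roto-translation constraint concrete, the NumPy sketch below builds a much simpler layer than the paper's network. Each particle sums its neighbours' displacement vectors, weighted by a function of distance only. Rotating the whole configuration rotates the output the same way, and translating it changes nothing. The Gaussian radial weight and the cutoff are arbitrary choices for illustration.

```python
import numpy as np


def equivariant_vector_features(positions, cutoff=1.5, width=0.5):
    """Per-particle sum of neighbour displacements weighted by a radial function.

    positions: (N, 3). Weights depend only on distances (rotation-invariant),
    and displacements rotate with the system, so the (N, 3) output is
    rotation-equivariant and translation-invariant."""
    disp = positions[None, :, :] - positions[:, None, :]   # r_ij = x_j - x_i
    dist = np.linalg.norm(disp, axis=-1)
    weight = np.exp(-(dist / width) ** 2) * (dist < cutoff) * (dist > 0)
    return (weight[..., None] * disp).sum(axis=1)
```

Building the symmetry into the layer means the network never has to learn it from data, one reason such constraints can help at a comparable or smaller parameter count.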
[COMMENTS]
Submitted to SciPost. 15 pages, 9 figures plus references and 4 pages
of appendix
[LINK]
http://arxiv.org/abs/2211.03226v3
[DATE]
2024-04-12 23:52:37+08:00
[CATEGORIES]
cs.LG
Generalization in diffusion models arises from geometry-adaptive harmonic representations
[AUTHORS]
Zahra Kadkhodaie, Florentin Guth, Eero P. Simoncelli, Stéphane Mallat
[ABSTRACT]
Deep neural networks (DNNs) trained for image denoising are able to generate
high-quality samples with score-based reverse diffusion algorithms. These
impressive capabilities seem to imply an escape from the curse of
dimensionality, but recent reports of memorization of the training set raise
the question of whether these networks are learning the “true” continuous
density of the data. Here, we show that two DNNs trained on non-overlapping
subsets of a dataset learn nearly the same score function, and thus the same
density, when the number of training images is large enough. In this regime of
strong generalization, diffusion-generated images are distinct from the
training set, and are of high visual quality, suggesting that the inductive
biases of the DNNs are well-aligned with the data density. We analyze the
learned denoising functions and show that the inductive biases give rise to a
shrinkage operation in a basis adapted to the underlying image. Examination of
these bases reveals oscillating harmonic structures along contours and in
homogeneous regions. We demonstrate that trained denoisers are inductively
biased towards these geometry-adaptive harmonic bases since they arise not only
when the network is trained on photographic images, but also when it is trained
on image classes supported on low-dimensional manifolds for which the harmonic
basis is suboptimal. Finally, we show that when trained on regular image
classes for which the optimal basis is known to be geometry-adaptive and
harmonic, the denoising performance of the networks is near-optimal.
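A "shrinkage operation in a basis" works in three steps:

1. Project the noisy image onto an orthonormal basis.
2. Shrink each coefficient toward zero.
3. Project back.

The sketch uses soft thresholding with a fixed, given basis. In the paper, the basis is adapted to the underlying image and is implicit in the trained network, so this sketch shows only the form of the operation.

```python
import numpy as np


def shrinkage_denoise(y, basis, threshold):
    """Denoise y by soft-thresholding its coefficients in an orthonormal basis.

    basis: (n, n) matrix whose columns are orthonormal basis vectors."""
    coeffs = basis.T @ y                                      # analysis
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)
    return basis @ shrunk                                     # synthesis
```

When the clean signal is sparse in the chosen basis, most noise-only coefficients fall below the threshold and are removed. This is why the choice of basis determines denoising quality.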
[COMMENTS]
Accepted for oral presentation at ICLR, Vienna, May 2024
[LINK]
http://arxiv.org/abs/2310.02557v3
[DATE]
2024-04-12 23:48:47+08:00
[CATEGORIES]
cs.LG
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
[AUTHORS]
Tianyu Zhu, Myong Chol Jung, Jesse Clark
[ABSTRACT]
Contrastive learning has gained widespread adoption for retrieval tasks due
to its minimal requirement for manual annotations. However, popular contrastive
frameworks typically learn from binary relevance, making them ineffective at
incorporating direct fine-grained rankings. In this paper, we curate a
large-scale dataset featuring detailed relevance scores for each query-document
pair to facilitate future research and evaluation. Subsequently, we propose
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking (GCL),
which is designed to learn from fine-grained rankings beyond binary relevance
scores. Our results show that GCL achieves a 94.5% increase in NDCG@10 for
in-domain and 26.3 to 48.8% increases for cold-start evaluations, all relative
to the CLIP baseline and involving ground truth rankings.
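NDCG@10 scores a ranking by discounted gain over graded relevance labels, normalized by the gain of the ideal ordering. This is why it can reward fine-grained rankings that binary relevance ignores. Here is a minimal sketch, assuming the common exponential gain `2^rel - 1`. The abstract does not state the paper's exact gain and cutoff conventions.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain with exponential gain 2^rel - 1."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """ranked_relevances: graded relevance of documents in the order a system ranked them."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

Here the ideal ordering is computed from the retrieved list itself. A full evaluation would normally build it from all judged documents for the query.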
[LINK]
http://arxiv.org/abs/2404.08535v1
[DATE]
2024-04-12 23:30:03+08:00
[CATEGORIES]
cs.LG
Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement
[AUTHORS]
Lucas Murray, Tatiana Castillo, Jaime Carrasco, Andrés Weintraub, Richard Weber, Isaac Martín de Diego, José Ramón González, Jordi García-Gonzalo
[ABSTRACT]
Over the past decades, the increase in both frequency and intensity of
large-scale wildfires due to climate change has emerged as a significant
natural threat. The pressing need to design resilient landscapes capable of
withstanding such disasters has become paramount, requiring the development of
advanced decision-support tools. Existing methodologies, including Mixed
Integer Programming, Stochastic Optimization, and Network Theory, have proven
effective but are hindered by computational demands, limiting their
applicability.
In response to this challenge, we propose using artificial intelligence
techniques, specifically Deep Reinforcement Learning, to address the complex
problem of firebreak placement in the landscape. We employ value-function based
approaches like Deep Q-Learning, Double Deep Q-Learning, and Dueling Double
Deep Q-Learning. Utilizing the Cell2Fire fire spread simulator combined with
Convolutional Neural Networks, we have successfully implemented a computational
agent capable of learning firebreak locations within a forest environment,
achieving good results.
Furthermore, we incorporate a pre-training loop, initially teaching our agent
to mimic a heuristic-based algorithm and observe that it consistently exceeds
the performance of these solutions. Our findings underscore the immense
potential of Deep Reinforcement Learning for operational research challenges,
especially in fire prevention. Our approach demonstrates convergence with
highly favorable results in problem instances as large as 40 × 40 cells,
marking a significant milestone in applying Reinforcement Learning to this
critical issue.
To the best of our knowledge, this study represents a pioneering effort in
using Reinforcement Learning to address the aforementioned problem, offering
promising perspectives in fire prevention and landscape management.
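Of the value-based methods listed, Double DQN differs from plain DQN only in how the bootstrap target is formed. The online network chooses the next action, and the target network evaluates it. Here is a sketch of that target computation in NumPy. The paper's encoding of firebreak placements as actions and rewards is not described here, so the inputs are generic arrays.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN bootstrap targets for a batch of transitions.

    next_q_online, next_q_target: (batch, num_actions) Q-values at the next state.
    dones: 1.0 where the episode ended (no bootstrapping), else 0.0."""
    best_actions = np.argmax(next_q_online, axis=1)             # online net selects
    next_values = next_q_target[np.arange(len(rewards)), best_actions]  # target net evaluates
    return rewards + gamma * (1.0 - dones) * next_values
```

Decoupling action selection from evaluation reduces the overestimation bias of taking a max over noisy Q-estimates. The dueling variant changes the network architecture, not this target.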
[COMMENTS]
20 pages, 15 figures
[LINK]
http://arxiv.org/abs/2404.08523v1
[DATE]
2024-04-12 23:10:57+08:00
[CATEGORIES]
cs.LG
Adversarial Imitation Learning via Boosting
[AUTHORS]
Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun
[ABSTRACT]
Adversarial imitation learning (AIL) has stood out as a dominant framework
across various imitation learning (IL) applications, with Discriminator Actor
Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of
off-policy learning algorithms in improving sample efficiency and scalability
to higher-dimensional observations. Despite DAC’s empirical success, the
original AIL objective is on-policy and DAC’s ad-hoc application of off-policy
training does not guarantee successful imitation (Kostrikov et al., 2019;
2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this
issue by deriving a fully off-policy AIL objective. Instead, in this work, we
develop a novel and principled AIL algorithm via the framework of boosting.
Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly
weighted weak learners (i.e., policies) and trains a discriminator that
witnesses the maximum discrepancy between the distributions of the ensemble and
the expert policy. We maintain a weighted replay buffer to represent the
state-action distribution induced by the ensemble, allowing us to train
discriminators using all the data collected so far. In the weighted replay
buffer, the contribution of the data from older policies is properly
discounted with weights computed based on the boosting framework.
Empirically, we evaluate our algorithm on both controller state-based and
pixel-based environments from the DeepMind Control Suite. AILBoost outperforms
DAC on both types of environments, demonstrating the benefit of properly
weighting replay buffer data for off-policy training. On state-based
environments, DAC outperforms ValueDICE and IQ-Learn (Garg et al., 2021),
achieving competitive performance with as little as one expert trajectory.
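A minimal sketch of the weighted replay buffer described above, assuming a standard boosting-style mixture rule: each new policy enters the ensemble with weight alpha and all earlier weights are scaled by 1 - alpha. AILBoost's exact weight computation may differ; `mixture_weights`, `sample_weights`, and `sample_batch` are hypothetical names.

```python
import random


def mixture_weights(num_policies, alpha):
    """Assumed boosting-style ensemble weights: the first policy starts with
    weight 1; each later policy enters with weight alpha while all earlier
    weights are scaled by (1 - alpha), so the weights always sum to 1."""
    weights = []
    for k in range(num_policies):
        if k == 0:
            weights.append(1.0)
        else:
            weights = [w * (1.0 - alpha) for w in weights]
            weights.append(alpha)
    return weights


def sample_weights(policy_ids, alpha):
    """Per-transition sampling weights; policy_ids[t] is the index of the
    policy that collected transition t. Each policy's mixture weight is spread
    evenly over its own transitions, so data from older policies is discounted."""
    num_policies = max(policy_ids) + 1
    mix = mixture_weights(num_policies, alpha)
    counts = [0] * num_policies
    for pid in policy_ids:
        counts[pid] += 1
    return [mix[pid] / counts[pid] for pid in policy_ids]


def sample_batch(buffer, policy_ids, alpha, batch_size, seed=0):
    """Draw a discriminator minibatch from the weighted replay buffer."""
    rng = random.Random(seed)
    return rng.choices(buffer, weights=sample_weights(policy_ids, alpha), k=batch_size)
```

With alpha = 0.5 and three policies the mixture is [0.25, 0.25, 0.5], so the newest policy's data dominates the minibatches while older data still contributes.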
[COMMENTS]
19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024
[LINK]
http://arxiv.org/abs/2404.08513v1
[DATE]
2024-04-12 22:53:36+08:00
[CATEGORIES]
cs.LG
RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features
[AUTHORS]
Mateus P. Otto, Rafael Izbicki
[ABSTRACT]
Kernel methods provide a flexible and theoretically grounded approach to
nonlinear and nonparametric learning. While memory and run-time requirements
hinder their applicability to large datasets, many low-rank kernel
approximations, such as random Fourier features, were recently developed to
scale up such kernel methods. However, these scalable approaches are based on
approximations of isotropic kernels, which cannot remove the influence of
irrelevant features. In this work, we design random Fourier features for a
family of automatic relevance determination (ARD) kernels, and introduce
RFFNet, a new large-scale kernel method that learns the kernel relevances on
the fly via first-order stochastic optimization. We present an effective
initialization scheme for the method’s non-convex objective function, evaluate
whether hard-thresholding RFFNet’s learned relevances yields a sensible rule for
variable selection, and perform an extensive ablation study of RFFNet’s
components. Numerical validation on simulated and real-world data shows that
our approach has a small memory footprint and run-time, achieves low prediction
error, and effectively identifies relevant features, thus leading to more
interpretable solutions. We provide an efficient PyTorch-based library that
adheres to the standard scikit-learn API, along with code for fully
reproducing our results.
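The core idea of random Fourier features for an ARD Gaussian kernel can be sketched as follows: rescale each input coordinate by its relevance, then apply ordinary random Fourier features for the isotropic Gaussian kernel. This is an illustrative pure-Python sketch, not the RFFNet library's API; in RFFNet the relevances would be learned by stochastic optimization rather than fixed by hand as here, and all function names are hypothetical.

```python
import math
import random


def ard_rbf(x, y, relevances):
    """Exact ARD Gaussian kernel: exp(-0.5 * sum_j (r_j * (x_j - y_j))**2)."""
    return math.exp(-0.5 * sum((r * (a - b)) ** 2 for r, a, b in zip(relevances, x, y)))


def make_rff(dim, num_features, seed=0):
    """Draw frequencies w ~ N(0, I) and phases b ~ U[0, 2*pi)."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_map(x, relevances, omegas, phases):
    """Feature map z(x) with z(x) . z(y) ~= ard_rbf(x, y, relevances).
    Relevances rescale the inputs before the random projection, so a
    relevance of 0 removes that coordinate from the kernel entirely."""
    scaled = [r * a for r, a in zip(relevances, x)]
    d = len(omegas)
    return [math.sqrt(2.0 / d) * math.cos(sum(w * s for w, s in zip(om, scaled)) + b)
            for om, b in zip(omegas, phases)]
```

Setting a relevance to 0 makes the feature map ignore that coordinate exactly, which is why thresholding learned relevances can serve as a variable-selection rule.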
[COMMENTS]
New datasets, ablation studies, and discussion of the method’s
components. 45 pages, 11 figures
[LINK]
http://arxiv.org/abs/2211.06410v2
[DATE]
2024-04-12 22:51:32+08:00
[CATEGORIES]
cs.LG
Approximate Stein Classes for Truncated Density Estimation
[AUTHORS]
Daniel J. Williams, Song Liu
[ABSTRACT]
Estimating truncated density models is difficult, as these models have
intractable normalising constants and boundary conditions that are hard to satisfy.
Score matching can be adapted to solve the truncated density estimation
problem, but requires a continuous weighting function which takes zero at the
boundary and is positive elsewhere. Evaluation of such a weighting function
(and its gradient) often requires a closed-form expression of the truncation
boundary and finding a solution to a complicated optimisation problem. In this
paper, we propose approximate Stein classes, which in turn lead to a relaxed
Stein identity for truncated density estimation. We develop a novel discrepancy
measure, truncated kernelised Stein discrepancy (TKSD), which does not require
fixing a weighting function in advance, and can be evaluated using only samples
on the boundary. We estimate a truncated density model by minimising the
Lagrangian dual of TKSD. Finally, experiments show that our method is more
accurate than previous works, even without the explicit functional form of the
boundary.
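For reference, the standard (untruncated) kernelised Stein discrepancy that TKSD builds on can be estimated from samples with a U-statistic. The sketch assumes a 1-D model with score function s(x) = d/dx log p(x) and a Gaussian RBF kernel of bandwidth h; it does not implement the boundary handling or the Lagrangian dual that distinguish TKSD, and `ksd_squared` is a hypothetical name.

```python
import math


def ksd_squared(samples, score, h=1.0):
    """U-statistic estimate of the squared KSD between the sample distribution
    and a 1-D model with score function `score`, using an RBF kernel. The
    Stein kernel is s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy."""
    n = len(samples)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            x, y = samples[i], samples[j]
            d = x - y
            k = math.exp(-d * d / (2 * h * h))
            dkx = -d / (h * h) * k                       # dk/dx
            dky = d / (h * h) * k                        # dk/dy
            dkxy = (1 / (h * h) - d * d / h ** 4) * k    # d2k/dxdy
            total += score(x) * score(y) * k + score(x) * dky + score(y) * dkx + dkxy
    return total / (n * (n - 1))
```

Only the score is needed, so the intractable normalising constant never appears; for a N(mu, 1) model the score is -(x - mu).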
[COMMENTS]
Accepted to ICML 2023
[LINK]
http://arxiv.org/abs/2306.00602v2
[DATE]
2024-04-12 22:45:07+08:00
[CATEGORIES]
cs.LG
Identifying Important Group of Pixels using Interactions
[AUTHORS]
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
[ABSTRACT]
To better understand the behavior of image classifiers, it is useful to
visualize the contribution of individual pixels to the model prediction. In
this study, we propose a method, MoXI ($\textbf{Mo}$del e$\textbf{X}$planation
by $\textbf{I}$nteractions), that efficiently and accurately identifies a group
of pixels with high prediction confidence. The proposed method employs
game-theoretic concepts, Shapley values and interactions, taking into account
the effects of individual pixels and the cooperative influence of pixels on
model confidence. Theoretical analysis and experiments demonstrate that our
method identifies the pixels that contribute most to the model output better
than widely used visualizations such as Grad-CAM, Attention rollout, and
Shapley values. While prior studies have suffered from the exponential cost of
computing Shapley values and interactions, we show that this cost can be
reduced to quadratic for our task. The code is
available at https://github.com/KosukeSumiyasu/MoXI.
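To make the cost argument concrete, exact Shapley values can be computed by enumerating all coalitions, which is exponential in the number of players (pixels or patches). The value function in the usage note is a hypothetical toy stand-in for model confidence, and this sketch does not reproduce MoXI's quadratic-cost procedure.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost).
    `value` maps a frozenset of players to a score such as model confidence."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def pairwise_interaction(i, j, value, context=frozenset()):
    """Interaction of players i and j given a fixed context S:
    v(S + i + j) - v(S + i) - v(S + j) + v(S)."""
    s = frozenset(context)
    return value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
```

For a toy game where pixels a and b only raise confidence together (v = 1 if both are present, plus 0.5 if c is present), each pixel gets a Shapley value of 0.5, while the pairwise interaction of a and b is 1 and that of a and c is 0, exposing the cooperation that individual attributions hide.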
[COMMENTS]
CVPR 2024
[LINK]
http://arxiv.org/abs/2401.03785v2
[DATE]
2024-04-12 22:44:04+08:00
[CATEGORIES]
cs.LG
Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support
[AUTHORS]
Tim Reichelt, Luke Ong, Tom Rainforth
[ABSTRACT]
The posterior in probabilistic programs with stochastic support decomposes as
a weighted sum of the local posterior distributions associated with each
possible program path. We show that making predictions with this full posterior
implicitly performs a Bayesian model averaging (BMA) over paths. This is
potentially problematic, as BMA weights can be unstable due to model
misspecification or inference approximations, in turn leading to sub-optimal
predictions. To remedy this issue, we propose alternative mechanisms
for path weighting: one based on stacking and one based on ideas from
PAC-Bayes. We show how both can be implemented as a cheap post-processing step
on top of existing inference engines. In our experiments, we find them to be
more robust and to lead to better predictions than the default BMA weights.
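The difference between the two weighting schemes can be illustrated with a small sketch. BMA weights are proportional to each path's marginal likelihood, whereas stacking chooses simplex weights that maximise the log predictive density of held-out data. The held-out densities `dens[i][k]` (density of point i under path k's local posterior predictive) and the BMA weights in the usage note are hypothetical numbers, and the EM-style fixed-point update is one standard way to solve the stacking problem, not necessarily the paper's optimiser.

```python
import math


def log_score(weights, dens):
    """Held-out log score: sum over points i of log sum_k w_k * dens[i][k]."""
    return sum(math.log(sum(w * p for w, p in zip(weights, row))) for row in dens)


def stacking_weights(dens, init, iters=200):
    """Maximise the held-out log score over the probability simplex with
    EM-style fixed-point updates; each update cannot decrease the score,
    so starting from `init` (e.g. the BMA weights) only improves on it."""
    w = list(init)
    n = len(dens)
    for _ in range(iters):
        resp_sums = [0.0] * len(w)
        for row in dens:
            mix = sum(wk * pk for wk, pk in zip(w, row))
            for k, (wk, pk) in enumerate(zip(w, row)):
                resp_sums[k] += wk * pk / mix
        w = [r / n for r in resp_sums]
    return w
```

For example, with `dens = [[0.9, 0.2], [0.1, 0.8], [0.2, 0.7], [0.3, 0.6]]` and BMA weights `[0.95, 0.05]`, stacking moves weight towards the second path, which predicts most held-out points better, and its log score is at least that of the BMA weights.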
[COMMENTS]
Accepted at the 27th International Conference on Artificial
Intelligence and Statistics (AISTATS) 2024
[LINK]
http://arxiv.org/abs/2310.14888v2
[DATE]
2024-04-12 22:36:18+08:00
[CATEGORIES]
cs.LG
Integrated Variational Fourier Features for Fast Spatial Modelling with Gaussian Processes
[AUTHORS]
Talay M Cheema, Carl Edward Rasmussen
[ABSTRACT]
Sparse variational approximations are popular methods for scaling up
inference and learning in Gaussian processes to larger datasets. For $N$
training points, exact inference has $O(N^3)$ cost; with $M \ll N$ features,
state-of-the-art sparse variational methods have $O(NM^2)$ cost. Recently,
methods have been proposed using more sophisticated features; these promise
$O(M^3)$ cost, with good performance in low-dimensional tasks such as spatial
modelling, but they only work with a very limited class of kernels, excluding
some of the most commonly used. In this work, we propose integrated Fourier
features, which extend these performance benefits to a very broad class of
stationary covariance functions. We motivate the method and choice of
parameters from a convergence analysis and empirical exploration, and show
practical speedup in synthetic and real-world spatial regression tasks.
[LINK]
http://arxiv.org/abs/2308.14142v2
[DATE]
2024-04-12 22:31:51+08:00
[CATEGORIES]
cs.LG
On the Minimax Regret in Online Ranking with Top-k Feedback
[AUTHORS]
Mingyuan Zhang, Ambuj Tewari
[ABSTRACT]
In online ranking, a learning algorithm sequentially ranks a set of items and
receives feedback on its ranking in the form of relevance scores. Since
obtaining relevance scores typically involves human annotation, it is of great
interest to consider a partial feedback setting where feedback is restricted to
the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a
framework to analyze online ranking algorithms with top-$k$ feedback. A key
element in their work was the use of techniques from partial monitoring. In
this paper, we further investigate online ranking with top-$k$ feedback and
solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a
full characterization of minimax regret rates with the top-$k$ feedback model
for all $k$ and for the following ranking performance measures: Pairwise Loss,
Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient
algorithm that achieves the minimax regret rate for Precision@n.
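The quantities in this abstract can be made concrete with a small sketch: two of the ranking measures computed on a full relevance vector, and the partial information a learner actually receives under top-k feedback. The DCG here uses the common 1/log2(position + 1) discount with positions starting at 1 (other gain and discount variants exist), and the function names are hypothetical.

```python
import math


def precision_at_n(ranking, relevance, n):
    """Fraction of the top-n ranked items that are relevant (relevance 1)."""
    return sum(relevance[item] for item in ranking[:n]) / n


def dcg(ranking, relevance):
    """Discounted cumulative gain; enumerate() is 0-based, so the discount
    1 / log2(position + 1) for 1-based positions becomes log2(pos + 2)."""
    return sum(relevance[item] / math.log2(pos + 2) for pos, item in enumerate(ranking))


def top_k_feedback(ranking, relevance, k):
    """Under top-k feedback the learner only observes the relevance scores
    of the items it placed in the first k positions."""
    return {item: relevance[item] for item in ranking[:k]}
```

With ranking `[2, 0, 1, 3]` and relevance `[1, 0, 1, 0]`, Precision@2 is 1.0, but with k = 1 the learner sees only item 2's score, which is why full-information regret guarantees do not carry over directly.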
[LINK]
http://arxiv.org/abs/2309.02425v2
[DATE]
2024-04-12 22:28:39+08:00
[CATEGORIES]
cs.LG
Multimodal Learning for Materials
[AUTHORS]
Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić
[ABSTRACT]
Artificial intelligence is transforming computational materials science,
improving the prediction of material properties, and accelerating the discovery
of novel materials. Recently, publicly available material data repositories
have grown rapidly. This growth encompasses not only more materials, but also a
greater variety and quantity of their associated properties. Existing machine
learning efforts in materials science focus primarily on single-modality tasks,
i.e., relationships between materials and a single physical property, thus not
taking advantage of the rich and multimodal set of material properties. Here,
we introduce Multimodal Learning for Materials (MultiMat), which enables
self-supervised multi-modality training of foundation models for materials. We
demonstrate our framework’s potential using data from the Materials Project
database on multiple axes: (i) MultiMat achieves state-of-the-art performance
for challenging material property prediction tasks; (ii) MultiMat enables novel
and accurate material discovery via latent space similarity, enabling screening
for stable materials with desired properties; and (iii) MultiMat encodes
interpretable emergent features that may provide novel scientific insights.
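Screening via latent-space similarity can be sketched as nearest-neighbour retrieval in an embedding space. The sketch assumes that each material already has an embedding from a trained encoder and that cosine similarity is the similarity measure; both the embeddings and the choice of measure are illustrative assumptions, not MultiMat's released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def screen(query, candidates, top=3):
    """Return the ids of the `top` candidate materials whose embeddings are
    most similar to the query embedding (for example, the embedding of a
    known material with the desired property). `candidates` maps a
    hypothetical material id to its embedding."""
    ranked = sorted(candidates, key=lambda mid: cosine(query, candidates[mid]), reverse=True)
    return ranked[:top]
```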
[COMMENTS]
11 pages, 4 figures
[LINK]
http://arxiv.org/abs/2312.00111v3
[DATE]
2024-04-12 22:17:34+08:00
[CATEGORIES]
cs.LG
A Quadratic Synchronization Rule for Distributed Deep Learning
[AUTHORS]
Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
[COMMENTS]
camera-ready version for ICLR’24
[LINK]
http://arxiv.org/abs/2310.14423v2
[DATE]
2024-04-12 21:59:01+08:00
[CATEGORIES]
cs.LG
Solving Parametric PDEs with Radial Basis Functions and Deep Neural Networks
[AUTHORS]
Guanhang Lei, Zhen Lei, Lei Shi, Chenyu Zeng
[ABSTRACT]
We propose the POD-DNN, a novel algorithm leveraging deep neural networks
(DNNs) along with radial basis functions (RBFs) in the context of the proper
orthogonal decomposition (POD) reduced basis method (RBM), aimed at
approximating the parametric mapping of parametric partial differential
equations on irregular domains. The POD-DNN algorithm capitalizes on the
low-dimensional characteristics of the solution manifold for parametric
equations, alongside the inherent offline-online computational strategy of RBM
and DNNs. In numerical experiments, POD-DNN significantly accelerates
computation during the online phase: compared with other algorithms that use
RBFs without integrating DNNs, it substantially speeds up online inference.
Furthermore,
under reasonable assumptions, we have rigorously derived upper bounds on the
complexity of approximating parametric mappings with POD-DNN, thereby providing
a theoretical analysis of the algorithm’s empirical performance.
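The offline POD step that POD-DNN builds on can be sketched with NumPy: stack solution snapshots as columns, take a truncated SVD, and keep enough modes to capture almost all of the snapshot energy. The energy-based truncation rule and the toy solution family are assumptions for illustration; the RBF interpolation and DNN components of POD-DNN are not shown.

```python
import numpy as np


def pod_basis(snapshots, tol=1e-6):
    """Columns of `snapshots` are high-fidelity solutions u(mu_i) for sampled
    parameters mu_i. Keep the fewest left singular vectors whose squared
    singular values capture at least a (1 - tol) fraction of the energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = min(int(np.searchsorted(energy, 1.0 - tol)) + 1, len(s))
    return U[:, :r]


def reduced_coefficients(basis, u):
    """Project a full-order solution onto the POD basis. In a POD-DNN-style
    method, a network would be trained to predict these coefficients from mu."""
    return basis.T @ u


# Toy two-mode family u(mu) = mu * sin(pi x) + mu**2 * sin(2 pi x).
x = np.linspace(0.0, 1.0, 50)
mus = [0.5, 1.0, 1.5, 2.0]
S = np.column_stack([m * np.sin(np.pi * x) + m**2 * np.sin(2 * np.pi * x) for m in mus])
V = pod_basis(S)
```

Because every snapshot in the toy family is a combination of two spatial modes, the truncation keeps exactly two basis vectors, and the online problem reduces to predicting two coefficients per parameter value.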
[LINK]
http://arxiv.org/abs/2404.06834v2
[DATE]
2024-04-12 21:47:07+08:00
[CATEGORIES]
cs.LG
TSLANet: Rethinking Transformers for Time Series Representation Learning
[AUTHORS]
Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Xiaoli Li
[ABSTRACT]
Time series data, characterized by intrinsic long- and short-range
dependencies, poses a unique challenge across analytical applications. While
Transformer-based models excel at capturing long-range dependencies, they face
limitations in noise sensitivity, computational efficiency, and overfitting
with smaller datasets. In response, we introduce a novel Time Series
Lightweight Adaptive Network (TSLANet), a universal convolutional model for
diverse time series tasks. Specifically, we propose an Adaptive Spectral Block,
harnessing Fourier analysis to enhance feature representation and to capture
both long-term and short-term interactions while mitigating noise via adaptive
thresholding. Additionally, we introduce an Interactive Convolution Block and
leverage self-supervised learning to refine the capacity of TSLANet for
decoding complex temporal patterns and improve its robustness on different
datasets. Our comprehensive experiments demonstrate that TSLANet outperforms
state-of-the-art models in various tasks spanning classification, forecasting,
and anomaly detection, showcasing its resilience and adaptability across a
spectrum of noise levels and data sizes. The code is available at
https://github.com/emadeldeen24/TSLANet.
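A rough, hypothetical illustration of frequency-domain denoising in the spirit of the Adaptive Spectral Block: transform to the frequency domain, suppress low-energy components, and transform back. Here the threshold simply keeps the `keep` strongest frequencies of each input, whereas TSLANet learns its threshold and spectral filters; the naive O(n^2) DFT is used only to keep the sketch dependency-free.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    """Inverse DFT; returns the real part, which is exact for the
    conjugate-symmetric spectra of real signals."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_filter(x, keep=3):
    """Keep the `keep` highest-magnitude frequencies of this particular
    signal (together with their mirror bins) and zero the rest. The cutoff
    adapts to each input's spectrum but has no trainable parameters."""
    n = len(x)
    spec = dft(x)
    ranked = sorted(range(n // 2 + 1), key=lambda k: abs(spec[k]), reverse=True)
    top = set(ranked[:keep])
    kept = [c if min(k, (n - k) % n) in top else 0 for k, c in enumerate(spec)]
    return idft(kept)
```

Keeping mirror bins k and n - k together preserves conjugate symmetry, so the filtered output remains a real-valued series.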
[LINK]
http://arxiv.org/abs/2404.08472v1
[DATE]
2024-04-12 21:41:29+08:00
[CATEGORIES]
cs.LG
VADA: a Data-Driven Simulator for Nanopore Sequencing
[AUTHORS]
Jonas Niederle, Simon Koop, Marc Pagès-Gallego, Vlado Menkovski
[ABSTRACT]
Nanopore sequencing enables real-time analysis of long DNA
sequences at a low cost, enabling new applications such as early detection of
cancer. Due to the complex nature of nanopore measurements and the high cost of
obtaining ground truth datasets, there is a need for nanopore simulators.
Existing simulators rely on handcrafted rules and parameters and do not learn
an internal representation that would allow for analysing underlying biological
factors of interest. Instead, we propose VADA, a purely data-driven method for
simulating nanopores based on an autoregressive latent variable model. We embed
subsequences of DNA and introduce a conditional prior to address the challenge
of collapsing conditioning. We introduce an auxiliary regressor on the latent
variable to encourage our model to learn an informative latent representation.
We empirically demonstrate that our model achieves competitive simulation
performance on experimental nanopore data. Moreover, we show that the learned
latent representation is informative and predictive of the DNA labels. We
hypothesize that other biological factors of interest, beyond the DNA labels,
can potentially be extracted from such a learned latent representation.
[LINK]
http://arxiv.org/abs/2404.08722v1
[DATE]
2024-04-12 21:24:28+08:00
[CATEGORIES]
cs.LG
OTTER: Improving Zero-Shot Classification via Optimal Transport
[AUTHORS]
Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala
[ABSTRACT]
Popular zero-shot models suffer due to artifacts inherited from pretraining.
A particularly detrimental artifact, caused by unbalanced web-scale pretraining
data, is mismatched label distribution. Existing approaches that seek to repair
the label distribution are not suitable in zero-shot settings, as they have
incompatible requirements such as access to labeled downstream task data or
knowledge of the true label balance in the pretraining distribution. We
sidestep these challenges and introduce a simple and lightweight approach to
adjust pretrained model predictions via optimal transport. Our technique
requires only an estimate of the label distribution of a downstream task.
Theoretically, we characterize the improvement produced by our procedure under
certain mild conditions and provide bounds on the error caused by
misspecification. Empirically, we validate our method in a wide array of
zero-shot image and text classification tasks, improving accuracy by 4.8% and
15.9% on average, and beating baselines like Prior Matching, often by
significant margins, in 17 out of 21 datasets.
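Adjusting predictions with optimal transport can be sketched as follows: use the zero-shot class probabilities as a cost (-log p), find a transport plan from the uniform distribution over test examples to the estimated downstream label distribution, and read off each example's label from its row of the plan. For brevity this sketch uses entropy-regularised Sinkhorn iterations with regularisation `eps`, which is an assumption; the paper's solver and details may differ, and `sinkhorn_relabel` is a hypothetical name.

```python
def sinkhorn_relabel(probs, target, eps=0.1, iters=500):
    """probs[i][c]: zero-shot class probabilities for example i.
    target[c]: estimated downstream label proportions (summing to 1).
    Returns hard labels from an entropic OT plan between the uniform
    distribution over examples and `target`, with cost -log probs[i][c]."""
    n, m = len(probs), len(target)
    a = [1.0 / n] * n
    # Gibbs kernel exp(-cost / eps) = probs ** (1 / eps); clip to avoid log(0).
    K = [[max(p, 1e-12) ** (1.0 / eps) for p in row] for row in probs]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][c] * v[c] for c in range(m)) for i in range(n)]
        v = [target[c] / sum(K[i][c] * u[i] for i in range(n)) for c in range(m)]
    # Plan entry P[i][c] = u[i] * K[i][c] * v[c]; take the row-wise argmax.
    return [max(range(m), key=lambda c: u[i] * K[i][c] * v[c]) for i in range(n)]
```

For example, with probabilities `[[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]` the plain argmax is class 0 everywhere, but with a target of `[0.5, 0.5]` the plan assigns the two examples with the most class-1 probability (the last two) to class 1.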
[COMMENTS]
29 pages
[LINK]
http://arxiv.org/abs/2404.08461v1
[DATE]
2024-04-12 21:18:47+08:00
[CATEGORIES]
cs.LG
Unsupervised Learning of Group Invariant and Equivariant Representations
[AUTHORS]
Robin Winter, Marco Bertolini, Tuan Le, Frank Noé, Djork-Arné Clevert
[ABSTRACT]
Equivariant neural networks, whose hidden features transform according to
representations of a group G acting on the data, exhibit improved training
efficiency and generalisation performance. In this work, we extend group
invariant and equivariant representation learning to the field of unsupervised
deep learning. We propose a general learning strategy based on an
encoder-decoder framework in which the latent representation is separated into an
invariant term and an equivariant group action component. The key idea is that
the network learns to encode and decode data to and from a group-invariant
representation by additionally learning to predict the appropriate group action
to align input and output pose to solve the reconstruction task. We derive the
necessary conditions on the equivariant encoder, and we present a construction
valid for any G, both discrete and continuous. We explicitly describe our
construction for rotations, translations and permutations. We test the validity
and the robustness of our approach in a variety of experiments with diverse
data types employing different network architectures.
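The split into an invariant term and a group action can be illustrated for the permutation group with a hand-crafted, not learned, encoder: sorting gives a permutation-invariant representation, and the sorting permutation is the group element needed to restore the original order. In the paper both parts are learned by networks; this toy example, with hypothetical `encode` and `decode` functions, only shows why reconstruction works once the correct group action is predicted.

```python
def encode(x):
    """Split a sequence into a permutation-invariant part (its sorted values)
    and a group element (the permutation that sorts it)."""
    perm = sorted(range(len(x)), key=lambda i: x[i])
    invariant = [x[i] for i in perm]
    return invariant, perm


def decode(invariant, perm):
    """Apply the group action to the canonical representation to restore
    the original ordering (the 'pose')."""
    out = [None] * len(perm)
    for pos, idx in enumerate(perm):
        out[idx] = invariant[pos]
    return out
```

Any reordering of the input yields the same invariant part, while the permutation carries all of the ordering information.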
[LINK]
http://arxiv.org/abs/2202.07559v3
[DATE]
2024-04-12 21:16:54+08:00
[CATEGORIES]
cs.LG
On the Independence Assumption in Neurosymbolic Learning
[AUTHORS]
Emile van Krieken, Pasquale Minervini, Edoardo M. Ponti, Antonio Vergari
[ABSTRACT]
State-of-the-art neurosymbolic learning systems use probabilistic reasoning
to guide neural networks towards predictions that conform to logical
constraints over symbols. Many such systems assume that the probabilities of
the considered symbols are conditionally independent given the input to
simplify learning and reasoning. We study and criticise this assumption,
highlighting how it can hinder optimisation and prevent uncertainty
quantification. We prove that loss functions bias conditionally independent
neural networks to become overconfident in their predictions. As a result, they
are unable to represent uncertainty over multiple valid options. Furthermore,
we prove that these loss functions are difficult to optimise: they are
non-convex, and their minima are usually highly disconnected. Our theoretical
analysis gives the foundation for replacing the conditional independence
assumption and designing more expressive neurosymbolic probabilistic models.
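A two-symbol example illustrates the overconfidence effect. Suppose the constraint is that exactly one of two binary symbols is true (XOR). Under the conditional independence assumption the model can only output a product of two Bernoulli distributions, and the probability of satisfying the constraint is then maximised only at the deterministic corners. This sketch is our own illustration with hypothetical function names, not code from the paper.

```python
def prob_xor(p1, p2):
    """P(exactly one of two independent binary symbols is true)."""
    return p1 * (1 - p2) + (1 - p1) * p2


def best_independent_fit(steps=100):
    """Grid search over factorised distributions (p1, p2) for the one that
    maximises the probability of satisfying the XOR constraint."""
    grid = [i / steps for i in range(steps + 1)]
    return max(((p1, p2) for p1 in grid for p2 in grid), key=lambda p: prob_xor(*p))
```

The belief that puts probability 0.5 on each of the two valid worlds cannot be represented: with marginals (0.5, 0.5) an independent model satisfies the constraint only half the time, so maximising the constraint probability drives it to a confident corner such as (0, 1).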
[COMMENTS]
11 pages, 8 appendix pages, 9 figures
[LINK]
http://arxiv.org/abs/2404.08458v1
[DATE]
2024-04-12 21:09:48+08:00
[CATEGORIES]
cs.LG
A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations
[AUTHORS]
Lorenc Kapllani, Long Teng
[ABSTRACT]
In this work, we propose a novel backward differential deep learning-based
algorithm for solving high-dimensional nonlinear backward stochastic
differential equations (BSDEs), where the deep neural network (DNN) models are
trained not only on the inputs and labels but also on the differentials of the
corresponding labels. This is motivated by the fact that differential deep
learning can provide an efficient approximation of the labels and their
derivatives with respect to inputs. The BSDEs are reformulated as differential
deep learning problems by using Malliavin calculus. The Malliavin derivatives
of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a
system of BSDEs. Such a formulation requires the estimation of the solution, its
gradient, and the Hessian matrix, represented by the triple of processes
$\left(Y, Z, \Gamma\right).$ All the integrals within this system are
discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed
to approximate the triple of these unknown processes. The DNN parameters are
backwardly optimized at each time step by minimizing a differential learning
type loss function, which is defined as a weighted sum of the dynamics of the
discretized BSDE system, with the first term providing the dynamics of the
process $Y$ and the second those of the process $Z$. An error analysis is carried out to
show the convergence of the proposed algorithm. Various numerical experiments
in up to $50$ dimensions demonstrate its high efficiency. Both
theoretically and numerically, our proposed scheme is shown to be more
efficient than other contemporary deep learning-based
methodologies, especially in the computation of the process $\Gamma$.
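The Euler-Maruyama step mentioned above can be sketched for the forward SDE that drives a BSDE. The drift and diffusion functions are hypothetical placeholders, and the sketch covers only the time discretisation, not the DNN approximation of the triple (Y, Z, Γ) or the differential-learning loss.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, T, steps, seed=0):
    """Simulate one path of dX_t = b(t, X_t) dt + sigma(t, X_t) dW_t with a
    diagonal diffusion, using X_{n+1} = X_n + b dt + sigma * dW_n where the
    increments dW_n ~ N(0, dt) are independent per coordinate.
    Returns the list of states X_0, ..., X_N."""
    rng = random.Random(seed)
    dt = T / steps
    x = list(x0)
    path = [list(x)]
    for n in range(steps):
        t = n * dt
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in x]
        b, sig = drift(t, x), diffusion(t, x)
        x = [xi + bi * dt + si * wi for xi, bi, si, wi in zip(x, b, sig, dw)]
        path.append(list(x))
    return path
```

With zero diffusion and drift b(t, x) = x, the scheme reduces to explicit Euler for dx/dt = x, and X_N approaches e as the number of steps grows, which is a quick sanity check of the discretisation.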
[COMMENTS]
40 pages, 5 figures, 5 tables
[LINK]
http://arxiv.org/abs/2404.08456v1
[DATE]
2024-04-12 21:05:35+08:00
[CATEGORIES]
cs.LG
Federated Optimization with Doubly Regularized Drift Correction
[AUTHORS]
Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
[ABSTRACT]
Federated learning is a distributed optimization paradigm that allows
training machine learning models across decentralized devices while keeping the
data localized. The standard method, FedAvg, suffers from client drift, which
can hamper performance and increase communication costs over centralized
methods. Previous works proposed various strategies to mitigate drift, yet none
have shown uniformly improved communication-computation trade-offs over vanilla
gradient descent.
In this work, we revisit DANE, an established method in distributed
optimization. We show that (i) DANE can achieve the desired communication
reduction under Hessian similarity constraints. Furthermore, (ii) we present an
extension, DANE+, which supports arbitrary inexact local solvers and has more
freedom to choose how to aggregate the local updates. We propose (iii) a novel
method, FedRed, which has improved local computational complexity and retains
the same communication complexity compared to DANE/DANE+. This is achieved by
using doubly regularized drift correction.
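Client drift can be illustrated with plain FedAvg on two hypothetical quadratic client losses f_i(x) = 0.5 a_i (x - b_i)^2 with different curvatures. With one local step per round, FedAvg is gradient descent on the average loss and converges to its minimiser; with several local steps it converges to a biased point. This sketch illustrates the problem only; DANE, DANE+, and FedRed are not implemented here.

```python
def local_sgd(x, a, b, lr, steps):
    """Run `steps` gradient steps on the client loss 0.5 * a * (x - b)**2."""
    for _ in range(steps):
        x -= lr * a * (x - b)
    return x


def fedavg(clients, x0=0.0, lr=0.1, local_steps=10, rounds=200):
    """Plain FedAvg: each round, every client starts from the server model,
    runs `local_steps` local gradient steps, and the server averages them."""
    x = x0
    for _ in range(rounds):
        x = sum(local_sgd(x, a, b, lr, local_steps) for a, b in clients) / len(clients)
    return x
```

For clients `[(1.0, 0.0), (5.0, 1.0)]` the minimiser of the average loss is 5/6; one local step per round recovers it, while ten local steps move the FedAvg fixed point to roughly 0.6, because each client drifts towards its own optimum.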
[LINK]
http://arxiv.org/abs/2404.08447v1
[DATE]
2024-04-12 20:57:43+08:00
[CATEGORIES]
cs.LG
Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation
[AUTHORS]
Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
[ABSTRACT]
State-of-the-art methods for conditional average treatment effect (CATE)
estimation make widespread use of representation learning. Here, the idea is to
reduce the variance of the low-sample CATE estimation by a (potentially
constrained) low-dimensional representation. However, low-dimensional
representations can lose information about the observed confounders and thus
lead to bias, which typically undermines the validity of representation learning
for CATE estimation. In this paper, we propose a new,
representation-agnostic refutation framework for estimating bounds on the
representation-induced confounding bias that comes from dimensionality
reduction (or other constraints on the representations) in CATE estimation.
First, we establish theoretically under which conditions CATE is
non-identifiable given low-dimensional (constrained) representations. Second,
as our remedy, we propose a neural refutation framework which performs partial
identification of CATE or, equivalently, aims at estimating lower and upper
bounds on the representation-induced confounding bias. We demonstrate the
effectiveness of our bounds in a series of experiments. In sum, our refutation
framework is directly relevant in practice wherever the validity of CATE
estimation is important.
[LINK]
http://arxiv.org/abs/2311.11321v3
[DATE]
2024-04-12 20:57:40+08:00
[CATEGORIES]
cs.LG
An improved tabular data generator with VAE-GMM integration
[AUTHORS]
Patricia A. Apellániz, Juan Parras, Santiago Zazo
[ABSTRACT]
The rising use of machine learning in various fields requires robust methods
to create synthetic tabular data. Such data should preserve the key
characteristics of the real data while addressing data scarcity. Current
approaches based on
Generative Adversarial Networks, such as the state-of-the-art CTGAN model,
struggle with the complex structures inherent in tabular data. These data often
contain both continuous and discrete features with non-Gaussian distributions.
Therefore, we propose a novel Variational Autoencoder (VAE)-based model that
addresses these limitations. Inspired by the TVAE model, our approach
incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE
architecture. This avoids the limitations imposed by assuming a strictly
Gaussian latent space, allowing for a more accurate representation of the
underlying data distribution during data generation. Furthermore, our model
offers enhanced flexibility by allowing the use of various differentiable
distributions for individual features, making it possible to handle both
continuous and discrete data types. We thoroughly validate our model on three
real-world datasets with mixed data types, including two medically relevant
ones, in terms of resemblance and utility. This evaluation shows that our
model significantly outperforms CTGAN and TVAE, establishing its potential
as a valuable tool for generating synthetic tabular data in various domains,
particularly in healthcare.
[COMMENTS]
7 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.08434v1
[DATE]
2024-04-12 20:31:06+08:00
[CATEGORIES]
cs.LG
Adversarially Robust Spiking Neural Networks Through Conversion
[AUTHORS]
Ozan Özdenizci, Robert Legenstein
[ABSTRACT]
Spiking neural networks (SNNs) provide an energy-efficient alternative to a
variety of artificial neural network (ANN) based AI applications. As the
progress in neuromorphic computing with SNNs expands their use in applications,
the problem of adversarial robustness of SNNs becomes more pronounced. In
contrast to the widely explored end-to-end adversarial-training-based
solutions, we address the limited progress in scalable robust SNN training
methods by proposing an adversarially robust ANN-to-SNN conversion algorithm.
Our method provides an efficient way to incorporate various computationally
demanding robust learning objectives that have been proposed for ANNs. During a
post-conversion robust finetuning phase, our method adversarially optimizes
both layer-wise firing thresholds and synaptic connectivity weights of the SNN
to maintain transferred robustness gains from the pre-trained ANN. We perform
experimental evaluations in a novel setting proposed to rigorously assess the
robustness of SNNs, where numerous adaptive adversarial attacks that account
for the spike-based operation dynamics are considered. Results show that our
approach yields a scalable state-of-the-art solution for adversarially robust
deep SNNs with low latency.
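The link that ANN-to-SNN conversion relies on can be sketched with a single integrate-and-fire neuron: with reset-by-subtraction and a constant input, its firing rate approximates a ReLU clipped at 1 and scaled by the firing threshold. This generic conversion principle is assumed here for illustration; the paper's adversarial finetuning of thresholds and weights is not shown, and the function names are hypothetical.

```python
def if_neuron_rate(inp, threshold=1.0, steps=100):
    """Integrate-and-fire neuron with reset-by-subtraction driven by a
    constant input; returns the firing rate in spikes per time step."""
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += inp
        if v >= threshold:
            spikes += 1
            v -= threshold
    return spikes / steps


def clipped_relu(x, threshold=1.0):
    """ANN activation that the spike rate approximates: clip(x / threshold, 0, 1)."""
    return min(max(x / threshold, 0.0), 1.0)
```

The firing threshold sets the scale of this approximation, which is why layer-wise thresholds are natural parameters to adjust after conversion.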
[COMMENTS]
Transactions on Machine Learning Research (TMLR), 2024
[LINK]
http://arxiv.org/abs/2311.09266v2
[DATE]
2024-04-12 20:18:19+08:00
[CATEGORIES]
cs.LG
Contrastive Graph Pooling for Explainable Classification of Brain Networks
[AUTHORS]
Jiaxing Xu, Qingtian Bian, Xinhang Li, Aihu Zhang, Yiping Ke, Miao Qiao, Wei Zhang, Wei Khang Jeremy Sim, Balázs Gulyás
[ABSTRACT]
Functional magnetic resonance imaging (fMRI) is a commonly used technique to
measure neural activation. Its application has been particularly important in
identifying underlying neurodegenerative conditions such as Parkinson’s,
Alzheimer’s, and Autism. Recent analysis of fMRI data models the brain as a
graph and extracts features by graph neural networks (GNNs). However, the
unique characteristics of fMRI data require specially designed GNNs. Tailoring
GNNs to generate effective and domain-explainable features remains challenging.
In this paper, we propose a contrastive dual-attention block and a
differentiable graph pooling method called ContrastPool to better utilize GNN
for brain networks, meeting fMRI-specific requirements. We apply our method to
5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its
superiority over state-of-the-art baselines. Our case study confirms that the
patterns extracted by our method match the domain knowledge in neuroscience
literature and reveal direct and interesting insights. Our contributions
underscore the potential of ContrastPool for advancing the understanding of
brain networks and neurodegenerative conditions. The source code is available
at https://github.com/AngusMonroe/ContrastPool.
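A common way to turn resting-state fMRI into a brain network, assumed here because the abstract does not specify the preprocessing, is to treat regions of interest (ROIs) as nodes and connect pairs whose time series are strongly correlated. The threshold value and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def brain_graph(roi_series, threshold=0.5):
    """Undirected edge set over ROI indices: connect regions i < j whose
    absolute correlation exceeds `threshold`."""
    n = len(roi_series)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pearson(roi_series[i], roi_series[j])) > threshold}
```

A GNN such as the one described above would then operate on this graph, typically with node features derived from each region's connectivity profile.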
[LINK]
http://arxiv.org/abs/2307.11133v2
[DATE]
2024-04-12 20:05:57+08:00
[CATEGORIES]
cs.LG
Enhancing MAP-Elites with Multiple Parallel Evolution Strategies
[AUTHORS]
Manon Flageat, Bryan Lim, Antoine Cully
[ABSTRACT]
With the development of fast and massively parallel evaluations in many
domains, Quality-Diversity (QD) algorithms, which have already proved promising
in a wide range of applications, have seen their potential multiplied. However, we
have yet to understand how best to use a large number of evaluations, as using
them for random variations alone is not always effective. High-dimensional
search spaces are a typical situation where random variations struggle to
effectively search. Another situation is uncertain settings where solutions can
appear better than they truly are and naively evaluating more solutions might
mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a
novel QD algorithm based on Evolution Strategies (ES) designed to exploit fast
parallel evaluations more effectively. MEMES maintains multiple (up to 100)
simultaneous ES processes, each with its own independent objective and reset
mechanism designed for QD optimisation, all on just a single GPU. We show that
MEMES outperforms both gradient-based and mutation-based QD algorithms on
black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its
benefit across domains. Additionally, our approach outperforms sampling-based
QD methods in uncertain domains when given the same evaluation budget. Overall,
MEMES generates reproducible solutions that are high-performing and diverse
through large-scale ES optimisation on easily accessible hardware.
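Each ES process in MEMES follows the Evolution Strategies pattern of estimating a gradient from random perturbations. The sketch shows a single, generic antithetic ES estimator and ascent loop on a hypothetical objective; MEMES's per-process objectives, reset mechanism, MAP-Elites archive, and GPU parallelism are not modelled.

```python
import random


def es_gradient(f, theta, rng, sigma=0.1, pairs=50):
    """Antithetic ES estimate of grad f(theta): the average over
    eps ~ N(0, I) of (f(theta + sigma*eps) - f(theta - sigma*eps)) * eps / (2*sigma)."""
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for k, e in enumerate(eps):
            grad[k] += (plus - minus) * e / (2 * sigma * pairs)
    return grad


def es_ascent(f, theta, lr=0.05, iters=200, seed=0):
    """Gradient ascent on f using only function evaluations."""
    rng = random.Random(seed)
    for _ in range(iters):
        g = es_gradient(f, theta, rng)
        theta = [t + lr * gk for t, gk in zip(theta, g)]
    return theta
```

Every perturbation can be evaluated independently, which is what makes ES a good fit for the massively parallel evaluations discussed above.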
[LINK]
http://arxiv.org/abs/2303.06137v2
[DATE]
2024-04-12 19:51:29+08:00
[CATEGORIES]
cs.LG
Kernel-Based Testing for Single-Cell Differential Analysis
[AUTHORS]
Anthony Ozier-Lafontaine, Camille Fourneaux, Ghislain Durif, Polina Arsenteva, Céline Vallot, Olivier Gandrillon, Sandrine Giraud, Bertrand Michel, Franck Picard
[LINK]
http://arxiv.org/abs/2307.08509v3
[DATE]
2024-04-12 19:48:03+08:00
[CATEGORIES]
cs.LG
Incremental Learning with Concept Drift Detection and Prototype-based Embeddings for Graph Stream Classification
[AUTHORS]
Kleanthis Malialis, Jin Li, Christos G. Panayiotou, Marios M. Polycarpou
[COMMENTS]
IEEE World Congress on Computational Intelligence (WCCI) 2024;
Keywords: graph streams, concept drift, incremental learning, graph
prototypes, nonstationary environments
[LINK]
http://arxiv.org/abs/2404.02572v2
[DATE]
2024-04-12 19:43:07+08:00
[CATEGORIES]
cs.LG
Box Facets and Cut Facets of Lifted Multicut Polytopes
[AUTHORS]
Lucas Fabian Naumann, Jannik Irmai, Shengxian Zhao, Bjoern Andres
[ABSTRACT]
The lifted multicut problem is a combinatorial optimization problem whose
feasible solutions relate one-to-one to the decompositions of a graph $G = (V,
E)$. Given an augmentation $\widehat{G} = (V, E \cup F)$ of $G$ and given costs
$c \in \mathbb{R}^{E \cup F}$, the objective is to minimize the sum of those
$c_{uw}$ with $uw \in E \cup F$ for which $u$ and $w$ are in distinct
components. For $F = \emptyset$, the problem specializes to the multicut
problem, and for $E = \tbinom{V}{2}$ to the clique partitioning problem. We
study a binary linear program formulation of the lifted multicut problem. More
specifically, we contribute to the analysis of the associated lifted multicut
polytopes: Firstly, we establish a necessary, sufficient and efficiently
decidable condition for a lower box inequality to define a facet. Secondly, we
show that deciding whether a cut inequality of the binary linear program
defines a facet is NP-hard.
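To make the objective concrete, the sketch evaluates the lifted multicut cost of a given decomposition: the decomposition is encoded as a component label per node, each component must induce a connected subgraph of G, and the cost sums c_uw over pairs in E ∪ F whose endpoints are in distinct components. It only evaluates a candidate solution and says nothing about the facet results; `lifted_multicut_cost` is a hypothetical name.

```python
def lifted_multicut_cost(nodes, edges, lifted, costs, labels):
    """Cost of the decomposition encoded by `labels` (node -> component id):
    the sum of costs[uw] over pairs uw in E and F whose endpoints lie in
    distinct components. Raises ValueError if a component is not
    connected in G = (V, E)."""
    # Adjacency restricted to edges of E inside a single component.
    adj = {v: set() for v in nodes}
    for u, w in edges:
        if labels[u] == labels[w]:
            adj[u].add(w)
            adj[w].add(u)
    # Depth-first search from one member of each component.
    for comp in set(labels.values()):
        members = [v for v in nodes if labels[v] == comp]
        seen, stack = {members[0]}, [members[0]]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(seen) != len(members):
            raise ValueError(f"component {comp} is not connected in G")
    return sum(costs[uw] for uw in list(edges) + list(lifted) if labels[uw[0]] != labels[uw[1]])
```

On the path 1-2-3 with a lifted edge 13, grouping {1, 3} is rejected because it is not connected in G, even though it is a valid partition of the nodes; this connectivity requirement is what distinguishes decompositions from arbitrary partitions.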
[COMMENTS]
10 pages, 5 figures
[LINK]
http://arxiv.org/abs/2402.16814v3
[DATE]
2024-04-12 19:38:20+08:00
[CATEGORIES]
cs.LG
Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning
[AUTHORS]
Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo
[ABSTRACT]
Contemporary automatic first break (FB) picking methods typically analyze 1D
signals, 2D source gathers, or 3D source-receiver gathers. Utilizing
higher-dimensional data, such as 2D or 3D, incorporates global features,
improving the stability of local picking. Despite the benefits,
high-dimensional data requires structured input and increases computational
demands. Addressing this, we propose a novel approach using deep graph learning
called DGL-FB, constructing a large graph to efficiently extract information.
In this graph, each seismic trace is represented as a node, connected by edges
that reflect similarities. To manage the size of the graph, we develop a
subgraph sampling technique to streamline model training and inference. Our
proposed framework, DGL-FB, leverages deep graph learning for FB picking. It
encodes subgraphs into global features using a deep graph encoder.
Subsequently, the encoded global features are combined with local node signals
and fed into a ResUNet-based 1D segmentation network for FB detection. Field
survey evaluations of DGL-FB show superior accuracy and stability compared to a
2D U-Net-based benchmark method.
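One plausible way to build "edges that reflect similarities" is a k-nearest-neighbor graph over traces. The sketch below uses Pearson correlation as the similarity and a fixed `k`. Both are illustrative assumptions and not necessarily what DGL-FB uses. The subgraph sampling and the deep graph encoder are not shown.

```python
import math
from typing import List, Sequence, Set, Tuple


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length traces (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def knn_trace_graph(traces: List[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Connect each trace (node) to its k most correlated traces; edges are undirected."""
    edges: Set[Tuple[int, int]] = set()
    for i, ti in enumerate(traces):
        scores = [(correlation(ti, tj), j) for j, tj in enumerate(traces) if j != i]
        scores.sort(reverse=True)  # most similar first
        for _, j in scores[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```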
[LINK]
http://arxiv.org/abs/2404.08408v1
[DATE]
2024-04-12 19:36:24+08:00
[CATEGORIES]
cs.LG
Complexity of Probabilistic Reasoning for Neurosymbolic Classification Techniques
[AUTHORS]
Arthur Ledaguenel, Céline Hudelot, Mostepha Khouadjia
[ABSTRACT]
Neurosymbolic artificial intelligence is a growing field of research aiming
to combine neural network learning capabilities with the reasoning abilities of
symbolic systems. Informed multi-label classification is a sub-field of
neurosymbolic AI which studies how to leverage prior knowledge to improve
neural classification systems. A well-known family of neurosymbolic techniques
for informed classification uses probabilistic reasoning to integrate this
knowledge during learning, inference, or both. Therefore, the asymptotic
complexity of probabilistic reasoning is of cardinal importance to assess the
scalability of such techniques. However, this topic is rarely tackled in the
neurosymbolic literature, which can lead to a poor understanding of the limits
of probabilistic neurosymbolic techniques. In this paper, we introduce a
formalism for informed supervised classification tasks and techniques. We then
build upon this formalism to define three abstract neurosymbolic techniques
based on probabilistic reasoning. Finally, we show computational complexity
results on several representation languages for prior knowledge commonly found
in the neurosymbolic literature.
[COMMENTS]
21 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.08404v1
[DATE]
2024-04-12 19:31:37+08:00
[CATEGORIES]
cs.LG
Calibration-Aware Bayesian Learning
[AUTHORS]
Jiayi Huang, Sangwoo Park, Osvaldo Simeone
[ABSTRACT]
Deep learning models, including modern systems like large language models,
are well known to offer unreliable estimates of the uncertainty of their
decisions. To improve a model's calibration, i.e., the quality of its confidence
levels, common approaches add either data-dependent or data-independent
regularization terms to the training loss.
Data-dependent regularizers have been recently introduced in the context of
conventional frequentist learning to penalize deviations between confidence and
accuracy. In contrast, data-independent regularizers are at the core of
Bayesian learning, enforcing adherence of the variational distribution in the
model parameter space to a prior density. The former approach is unable to
quantify epistemic uncertainty, while the latter is severely affected by model
misspecification. In light of the limitations of both methods, this paper
proposes an integrated framework, referred to as calibration-aware Bayesian
neural networks (CA-BNNs), that applies both regularizers while optimizing over
a variational distribution as in Bayesian learning. Numerical results validate
the advantages of the proposed approach in terms of expected calibration error
(ECE) and reliability diagrams.
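For reference, expected calibration error is commonly computed over equal-width confidence bins as $\mathrm{ECE} = \sum_b \frac{|B_b|}{n}\,|\mathrm{acc}(B_b) - \mathrm{conf}(B_b)|$. Below is a minimal sketch. The bin count of 10 and the bin-edge convention are assumptions; the abstract does not say which variant the authors use.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE = sum_b (|B_b| / n) * |acc(B_b) - conf(B_b)| over equal-width bins.

    Bin b covers confidences in (b / n_bins, (b + 1) / n_bins]; a confidence of
    exactly 0.0 is placed in the first bin. `correct` holds 1 for a correct
    prediction and 0 otherwise.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        members = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not members:
            continue
        acc = sum(correct[i] for i in members) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(acc - conf)
    return ece
```

As a usage example, two predictions both made with confidence 0.9, one right and one wrong, give an accuracy of 0.5 in that bin and hence an ECE of 0.4.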
[COMMENTS]
submitted for conference publication
[LINK]
http://arxiv.org/abs/2305.07504v2
[DATE]
2024-04-12 19:30:04+08:00
[CATEGORIES]
cs.LG
Data-Driven Preference Sampling for Pareto Front Learning
[AUTHORS]
Rongguang Ye, Lei Chen, Weiduo Liao, Jinyuan Zhang, Hisao Ishibuchi
[ABSTRACT]
Pareto front learning is a technique that introduces preference vectors in a
neural network to approximate the Pareto front. Previous Pareto front learning
methods have demonstrated high performance in approximating simple Pareto
fronts. These methods often sample preference vectors from a fixed Dirichlet
distribution. However, no fixed sampling distribution can be adapted to diverse
Pareto fronts. Efficiently sampling preference vectors and accurately
estimating the Pareto front is a challenge. To address this challenge, we
propose a data-driven preference vector sampling framework for Pareto front
learning. We utilize the posterior information of the objective functions to
adjust the parameters of the sampling distribution flexibly. In this manner,
the proposed method can sample preference vectors from the location of the
Pareto front with a high probability. Moreover, we design the distribution of
the preference vector as a mixture of Dirichlet distributions to improve the
performance of the model on disconnected Pareto fronts. Extensive experiments
validate the superiority of the proposed method compared with state-of-the-art
algorithms.
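The mixture-of-Dirichlet idea can be illustrated with a short sampling sketch. Here the mixing weights and concentration vectors are fixed, hypothetical inputs. The paper's data-driven adaptation of these parameters from posterior information about the objectives is not reproduced.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_preference(
    weights: Sequence[float],
    alphas: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Sample a preference vector from a mixture of Dirichlet distributions.

    `weights[k]` is the (hypothetical) mixing weight of component k and
    `alphas[k]` its concentration vector, one entry per objective.
    """
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return sample_dirichlet(alphas[k], rng)
```

With two components concentrated near opposite corners of the simplex, e.g. `[[5.0, 1.0], [1.0, 5.0]]`, samples cluster near two separated regions. This is the kind of behavior a single fixed Dirichlet cannot produce for a disconnected front.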
[COMMENTS]
International Joint Conference on Neural Networks (IJCNN’24)
[LINK]
http://arxiv.org/abs/2404.08397v1
[DATE]
2024-04-12 19:06:22+08:00
[CATEGORIES]
cs.LG
Graph data augmentation with Gromov-Wasserstein Barycenters
[AUTHORS]
Andrea Ponti
[ABSTRACT]
Graphs are ubiquitous in various fields, and deep learning methods have been
successfully applied to graph classification tasks. However, building large and
diverse graph datasets for training can be expensive. While augmentation
techniques exist for structured data like images or numerical data, the
augmentation of graph data remains challenging. This is primarily due to the
complex and non-Euclidean nature of graph data. This paper proposes a novel
augmentation strategy for graphs that operates in a non-Euclidean space. This
approach leverages graphon estimation, which models the generative mechanism of
network sequences. Computational results
demonstrate the effectiveness of the proposed augmentation framework in
improving the performance of graph classification models. Additionally, using a
non-Euclidean distance, specifically the Gromov-Wasserstein distance, results
in better approximations of the graphon. This framework also provides a means
to validate different graphon estimation approaches, particularly in real-world
scenarios where the true graphon is unknown.
[COMMENTS]
6 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.08376v1
[DATE]
2024-04-12 18:22:55+08:00
[CATEGORIES]
cs.LG
Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning
[AUTHORS]
Martin Willbo, Aleksis Pirinen, John Martinsson, Edvin Listo Zec, Olof Mogren, Mikael Nilsson
[ABSTRACT]
Land cover classification and change detection are two important applications
of remote sensing and Earth observation (EO) that have benefited greatly from
the advances of deep learning. Convolutional and transformer-based U-net models
are the state-of-the-art architectures for these tasks, and their performances
have been boosted by an increased availability of large-scale annotated EO
datasets. However, the influence of different visual characteristics of the
input EO data on a model’s predictions is not well understood. In this work we
systematically examine model sensitivities with respect to several color- and
texture-based distortions on the input EO data during inference, given models
that have been trained without such distortions. We conduct experiments with
multiple state-of-the-art segmentation networks for land cover classification
and show that they are in general more sensitive to texture than to color
distortions. Beyond revealing intriguing characteristics of widely used land
cover classification models, our results can also be used to guide the
development of more robust models within the EO domain.
[LINK]
http://arxiv.org/abs/2403.04385v2
[DATE]
2024-04-12 18:15:45+08:00
[CATEGORIES]
cs.LG
Few-Shot Cross-System Anomaly Trace Classification for Microservice-based systems
[AUTHORS]
Yuqing Wang, Mika V. Mäntylä, Serge Demeyer, Mutlu Beyazit, Joanna Kisaakye, Jesse Nyyssölä
[ABSTRACT]
Microservice-based systems (MSS) may experience failures in various fault
categories due to their complex and dynamic nature. To effectively handle
failures, AIOps tools utilize trace-based anomaly detection and root cause
analysis. In this paper, we propose a novel framework for few-shot abnormal
trace classification for MSS. Our framework comprises two main components: (1)
Multi-Head Attention Autoencoder for constructing system-specific trace
representations, which enables (2) Transformer Encoder-based Model-Agnostic
Meta-Learning to perform effective and efficient few-shot learning for abnormal
trace classification. The proposed framework is evaluated on two representative
MSS, Trainticket and OnlineBoutique, with open datasets. The results show that
our framework can adapt the learned knowledge to classify new, unseen abnormal
traces of novel fault categories both within the same system it was initially
trained on and in a different MSS. Within the same MSS, our framework
achieves an average accuracy of 93.26% and 85.2% across 50 meta-testing tasks
for Trainticket and OnlineBoutique, respectively, when provided with 10
instances for each task. In a cross-system context, our framework achieves an
average accuracy of 92.19% and 84.77% for the same meta-testing tasks of the
respective system, also with 10 instances provided for each task. Our work
demonstrates the applicability of achieving few-shot abnormal trace
classification for MSS and shows how it can enable cross-system adaptability.
This opens an avenue for building more generalized AIOps tools that require
less system-specific data labeling for anomaly detection and root cause
analysis.
[COMMENTS]
12 pages
[LINK]
http://arxiv.org/abs/2403.18998v3
[DATE]
2024-04-12 18:09:16+08:00
[CATEGORIES]
cs.LG
Differentiable All-pole Filters for Time-varying Audio Systems
[AUTHORS]
Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas
[ABSTRACT]
Infinite impulse response filters are an essential building block of many
time-varying audio systems, such as audio effects and synthesisers. However,
their recursive structure impedes end-to-end training of these systems using
automatic differentiation. Although non-recursive filter approximations like
frequency sampling and frame-based processing have been proposed and widely
used in previous works, they cannot accurately reflect the gradient of the
original system. We alleviate this difficulty by re-expressing a time-varying
all-pole filter to backpropagate the gradients through itself, so the filter
implementation is not bound to the technical limitations of automatic
differentiation frameworks. This implementation can be employed within any
audio system containing filters with poles for efficient gradient evaluation.
We demonstrate its training efficiency and expressive capabilities for
modelling real-world dynamic audio systems on a phaser, time-varying
subtractive synthesiser, and feed-forward compressor. We make our code
available and provide the trained audio effect and synth models in a VST plugin
at https://christhetree.github.io/all_pole_filters/.
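The recursion that makes such filters awkward for automatic differentiation is $y[n] = x[n] - \sum_{k=1}^{M} a_k[n]\, y[n-k]$. The sketch below implements only this forward recursion and assumes per-sample coefficient lists. It is not the authors' gradient re-expression, and the function name is hypothetical.

```python
from typing import List, Sequence


def time_varying_all_pole(x: Sequence[float], a: Sequence[Sequence[float]]) -> List[float]:
    """y[n] = x[n] - sum_{k=1..M} a[n][k-1] * y[n-k], with y[m] = 0 for m < 0.

    `a[n]` holds the pole coefficients a_1[n], ..., a_M[n] used at sample n.
    """
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for k, coeff in enumerate(a[n], start=1):
            if n - k >= 0:
                acc -= coeff * y[n - k]  # feedback from the k-th past output
        y.append(acc)
    return y
```

Each output depends on previous outputs, so a naive autodiff graph grows one step per sample. That sequential dependency is what the paper's re-expression is designed to avoid.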
[COMMENTS]
Submitted to DAFx 2024
[LINK]
http://arxiv.org/abs/2404.07970v2
[DATE]
2024-04-12 17:58:58+08:00
[CATEGORIES]
cs.LG
Self-Supervised k-Space Regularization for Motion-Resolved Abdominal MRI Using Neural Implicit k-Space Representation
[AUTHORS]
Veronika Spieker, Hannah Eichhorn, Jonathan K. Stelter, Wenqi Huang, Rickmer F. Braren, Daniel Rückert, Francisco Sahli Costabal, Kerstin Hammernik, Claudia Prieto, Dimitrios C. Karampinos, Julia A. Schnabel
[ABSTRACT]
Neural implicit k-space representations have shown promising results for
dynamic MRI at high temporal resolutions. Yet, their exclusive training in
k-space limits the application of common image regularization methods to
improve the final reconstruction. In this work, we introduce the concept of
parallel imaging-inspired self-consistency (PISCO), which we incorporate as
a novel self-supervised k-space regularization enforcing a consistent
neighborhood relationship. At no additional data cost, the proposed
regularization significantly improves neural implicit k-space reconstructions
on simulated data. Abdominal in-vivo reconstructions using PISCO result in
enhanced spatio-temporal image quality compared to state-of-the-art methods.
Code is available at https://github.com/vjspi/PISCO-NIK.
[COMMENTS]
Under Review
[LINK]
http://arxiv.org/abs/2404.08350v1
[DATE]
2024-04-12 17:31:11+08:00
[CATEGORIES]
cs.LG
Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks
[AUTHORS]
Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang
[ABSTRACT]
Multi-modal learning aims to enhance performance by unifying models from
various modalities but often faces the “modality imbalance” problem in real
data, leading to a bias towards dominant modalities and neglecting others,
thereby limiting its overall effectiveness. To address this challenge, the core
idea is to balance the optimization of each modality to achieve a joint
optimum. Existing approaches often employ a modal-level control mechanism for
adjusting the update of each modal parameter. However, such a global-wise
updating mechanism ignores the different importance of each parameter. Inspired
by subnetwork optimization, we explore a uniform sampling-based optimization
strategy and find it more effective than global-wise updating. According to the
findings, we further propose a novel importance sampling-based, element-wise
joint optimization method, called Adaptively Mask Subnetworks Considering Modal
Significance (AMSS). Specifically, we incorporate mutual information rates to
determine the modal significance and employ non-uniform adaptive sampling to
select foreground subnetworks from each modality for parameter updates, thereby
rebalancing multi-modal learning. Additionally, we demonstrate the reliability
of the AMSS strategy through convergence analysis. Building upon theoretical
insights, we further enhance the multi-modal mask subnetwork strategy using
unbiased estimation, referred to as AMSS+. Extensive experiments reveal the
superiority of our approach over comparison methods.
[COMMENTS]
17 pages; 6 figures
[LINK]
http://arxiv.org/abs/2404.08347v1
[DATE]
2024-04-12 17:22:24+08:00
[CATEGORIES]
cs.LG
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
[AUTHORS]
Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
[ABSTRACT]
Transformers come with a high computational cost, yet their effectiveness in
addressing problems in language and vision has sparked extensive research aimed
at enhancing their efficiency. However, diverse experimental conditions,
spanning multiple input domains, prevent a fair comparison based solely on
reported results, posing challenges for model selection. To address this gap in
comparability, we design a comprehensive benchmark of more than 30 models for
image classification, evaluating key efficiency aspects, including accuracy,
speed, and memory usage. This benchmark provides a standardized baseline across
the landscape of efficiency-oriented transformers and our framework of
analysis, based on Pareto optimality, reveals surprising insights. Despite
claims of other models being more efficient, ViT remains Pareto optimal across
multiple metrics. We observe that hybrid attention-CNN models exhibit
remarkable inference memory- and parameter-efficiency. Moreover, our benchmark
shows that using a larger model in general is more efficient than using higher
resolution images. Thanks to our holistic evaluation, we provide a centralized
resource for practitioners and researchers, facilitating informed decisions
when selecting transformers or measuring progress of the development of
efficient transformers.
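The Pareto-optimality criterion behind such comparisons can be sketched for two metrics: accuracy (higher is better) and memory (lower is better). The model names and numbers in the test are made up for illustration. They are not results from the benchmark.

```python
from typing import Dict, List, Tuple

# (accuracy, memory) per model: accuracy is maximized, memory is minimized.
Metrics = Tuple[float, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if `a` is at least as good as `b` on both metrics and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_optimal(models: Dict[str, Metrics]) -> List[str]:
    """Names of models that no other model dominates, sorted alphabetically."""
    return sorted(
        name for name, m in models.items()
        if not any(dominates(o, m) for other, o in models.items() if other != name)
    )
```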
[LINK]
http://arxiv.org/abs/2308.09372v2
[DATE]
2024-04-12 17:21:33+08:00
[CATEGORIES]
cs.LG
Properties of Discrete Sliced Wasserstein Losses
[AUTHORS]
Eloi Tanguy, Rémi Flamary, Julie Delon
[ABSTRACT]
The Sliced Wasserstein (SW) distance has become a popular alternative to the
Wasserstein distance for comparing probability measures. Widespread
applications include image processing, domain adaptation and generative
modelling, where it is common to optimise some parameters in order to minimise
SW, which serves as a loss function between discrete probability measures
(since measures admitting densities are numerically unattainable). All these
optimisation problems bear the same sub-problem, which is minimising the Sliced
Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y
\longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between
two uniform discrete measures with the same number of points as a function of
the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We
investigate the regularity and optimisation properties of this energy, as well
as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in
SW using only $p$ samples) and show convergence results on the critical points
of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform
convergence and a uniform Central Limit result on the process
$\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic
Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge
towards (Clarke) critical points of these energies.
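Under the definitions above, $\mathcal{E}_p(Y)$ can be sketched as follows. Projection directions are drawn uniformly on the sphere by normalizing Gaussian vectors. For two uniform discrete measures with the same number of points, the squared 1D Wasserstein distance between projections equals the mean squared difference of the sorted projections. The function name is hypothetical.

```python
import math
import random
from typing import List, Sequence


def sliced_wasserstein_sq(
    Y: List[Sequence[float]], Z: List[Sequence[float]], p: int, rng: random.Random
) -> float:
    """Monte-Carlo estimate of SW_2^2(gamma_Y, gamma_Z) using p random directions."""
    n, d = len(Y), len(Y[0])
    assert len(Z) == n, "both measures must have the same number of points"
    total = 0.0
    for _ in range(p):
        # Uniform direction on the unit sphere S^{d-1}.
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both point clouds and match them in sorted order.
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in Y)
        pz = sorted(sum(a * b for a, b in zip(z, theta)) for z in Z)
        total += sum((a - b) ** 2 for a, b in zip(py, pz)) / n
    return total / p
```

In one dimension every direction is $\pm 1$, so shifting two points by 2 gives exactly 4 for any $p$. In higher dimensions the estimate fluctuates around $\mathcal{E}(Y)$ as $p$ varies.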
[LINK]
http://arxiv.org/abs/2307.10352v4
[DATE]
2024-04-12 16:51:55+08:00
[CATEGORIES]
cs.LG
Be Bayesian by Attachments to Catch More Uncertainty
[AUTHORS]
Shiyu Shen, Bin Pan, Tianyang Shi, Tao Li, Zhenwei Shi
[ABSTRACT]
Bayesian Neural Networks (BNNs) have become one of the promising approaches
for uncertainty estimation due to their solid theoretical foundations. However, the
performance of BNNs is affected by the ability of catching uncertainty. Instead
of only seeking the distribution of neural network weights by in-distribution
(ID) data, in this paper, we propose a new Bayesian Neural Network with an
Attached structure (ABNN) to catch more uncertainty from out-of-distribution
(OOD) data. We first construct a mathematical description for the uncertainty
of OOD data according to the prior distribution, and then develop an attached
Bayesian structure to integrate the uncertainty of OOD data into the backbone
network. ABNN is composed of an expectation module and several distribution
modules. The expectation module is a backbone deep network which focuses on the
original task, and the distribution modules are mini Bayesian structures which
serve as attachments of the backbone. In particular, the distribution modules
aim at extracting the uncertainty from both ID and OOD data. We further provide
theoretical analysis for the convergence of ABNN, and experimentally validate
its superiority by comparing with some state-of-the-art uncertainty estimation
methods. Code will be made available.
[LINK]
http://arxiv.org/abs/2310.13027v2
[DATE]
2024-04-12 16:37:18+08:00
[CATEGORIES]
cs.LG
Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning
[AUTHORS]
Chuhao Qin, Evangelos Pournaras
[ABSTRACT]
Swarms of autonomous interactive drones, with the support of recharging
technology, can provide compelling sensing capabilities in Smart Cities, such
as traffic monitoring and disaster response. This paper aims to deliver a novel
coordination solution for the cost-effective navigation, sensing, and
recharging of drones. Existing approaches, such as deep reinforcement learning
(DRL), offer long-term adaptability, but lack energy efficiency, resilience,
and flexibility in dynamic environments. Therefore, this paper proposes a novel
approach where each drone independently determines its flying direction and
recharging place using DRL, while adapting navigation and sensing through
distributed optimization, which improves energy-efficiency during sensing
tasks. Furthermore, drones efficiently exchange information while retaining
decision-making autonomy via a structured tree communication model. Extensive
experimentation with datasets generated from realistic urban mobility
underscores an outstanding performance of the proposed solution compared to
state-of-the-art methods. Significant new insights show that long-term methods
optimize scarce drone resources for traffic management, while the integration of
short-term methods is crucial for advising on charging policies and maintaining
battery safety.
[COMMENTS]
This work has been submitted to the IEEE Transactions on Systems, Man
and Cybernetics: Systems for possible publication. Copyright may be
transferred without notice, after which this version may no longer be
accessible.
[LINK]
http://arxiv.org/abs/2311.09852v5
[DATE]
2024-04-12 16:32:58+08:00
[CATEGORIES]
cs.LG
The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model
[AUTHORS]
Laixi Shi, Gen Li, Yuting Wei, Yuxin Chen, Matthieu Geist, Yuejie Chi
[ABSTRACT]
This paper investigates model robustness in reinforcement learning (RL) to
reduce the sim-to-real gap in practice. We adopt the framework of
distributionally robust Markov decision processes (RMDPs), aimed at learning a
policy that optimizes the worst-case performance when the deployed environment
falls within a prescribed uncertainty set around the nominal MDP. Despite
recent efforts, the sample complexity of RMDPs remained mostly unsettled
regardless of the uncertainty set in use. It was unclear if distributional
robustness bears any statistical consequences when benchmarked against standard
RL. Assuming access to a generative model that draws samples based on the
nominal MDP, we characterize the sample complexity of RMDPs when the
uncertainty set is specified via either the total variation (TV) distance or
$\chi^2$ divergence. The algorithm studied here is a model-based method called
distributionally robust value iteration, which is shown to be
near-optimal for the full range of uncertainty levels. Somewhat surprisingly,
our results uncover that RMDPs are not necessarily easier or harder to learn
than standard MDPs. The statistical consequence incurred by the robustness
requirement depends heavily on the size and shape of the uncertainty set: in
the case of the TV distance, the minimax sample complexity of RMDPs is
always smaller than that of standard MDPs; in the case of the $\chi^2$
divergence, the sample complexity of RMDPs can often far exceed the standard
MDP counterpart.
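In the TV case, the inner worst-case expectation of distributionally robust value iteration has a simple greedy solution when TV is taken as $\tfrac12\|q-p\|_1 \le \sigma$: shift up to $\sigma$ probability mass from the highest-value states onto a lowest-value state. The sketch below assumes a known nominal tabular MDP and uses hypothetical names. In the studied setting, the nominal model is instead estimated from samples drawn by a generative model, and this is not the authors' implementation.

```python
from typing import List, Sequence


def worst_case_tv(p: Sequence[float], v: Sequence[float], sigma: float) -> float:
    """min_q <q, v> over distributions q with 0.5 * ||q - p||_1 <= sigma.

    Greedy closed form: move up to `sigma` mass from the highest-value
    states onto a lowest-value state.
    """
    q = list(p)
    j_min = min(range(len(v)), key=lambda i: v[i])
    budget = sigma
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if budget <= 0:
            break
        if i == j_min:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[j_min] += moved
        budget -= moved
    return sum(qi * vi for qi, vi in zip(q, v))


def robust_value_iteration(
    P: Sequence[Sequence[Sequence[float]]],
    R: Sequence[Sequence[float]],
    gamma: float,
    sigma: float,
    iters: int = 200,
) -> List[float]:
    """Synchronous robust Bellman updates; P[s][a] is the nominal next-state distribution."""
    V = [0.0] * len(P)
    for _ in range(iters):
        # The right-hand side is built entirely from the previous V before rebinding.
        V = [
            max(R[s][a] + gamma * worst_case_tv(P[s][a], V, sigma) for a in range(len(P[s])))
            for s in range(len(P))
        ]
    return V
```

Setting `sigma = 0` recovers standard value iteration on the nominal MDP, which is a convenient sanity check.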
[COMMENTS]
Neural Information Processing Systems (2023)
[LINK]
http://arxiv.org/abs/2305.16589v2
[DATE]
2024-04-12 16:09:33+08:00
[CATEGORIES]
cs.LG
A Large Scale Survey of Motivation in Software Development and Analysis of its Validity
[AUTHORS]
Idan Amit, Dror G. Feitelson
[ABSTRACT]
Context: Motivation is known to improve performance. In software development
in particular, there has been considerable interest in the motivation of
contributors to open source. Objective: We identify 11 motivators from the
literature (enjoying programming, ownership of code, learning, self use, etc.),
and evaluate their relative effect on motivation. Since motivation is an
internal subjective feeling, we also analyze the validity of the answers.
Method: We conducted a survey with 66 questions on motivation which was
completed by 521 developers. Most of the questions used an 11-point scale. We
evaluated the validity of the answers by comparing related questions,
comparing to actual behavior on GitHub, and comparing with the same developer
in a follow-up survey. Results: Validity problems include moderate correlations
between answers to related questions, as well as self-promotion and mistakes in
the answers. Despite these problems, predictive analysis, investigating how
diverse motivators influence the probability of high motivation, provided
valuable insights. The correlations between the different motivators are low,
implying their independence. High values in all 11 motivators predict increased
probability of high motivation. In addition, improvement analysis shows that an
increase in most motivators predicts an increase in general motivation.
[LINK]
http://arxiv.org/abs/2404.08303v1
[DATE]
2024-04-12 15:51:21+08:00
[CATEGORIES]
cs.LG
Viewing the process of generating counterfactuals as a source of knowledge: a new approach for explaining classifiers
[AUTHORS]
Vincent Lemaire, Nathan Le Boudec, Victor Guyomard, Françoise Fessant
[ABSTRACT]
There are now many explainable AI methods for understanding the decisions of
a machine learning model. Among these are those based on counterfactual
reasoning, which involve simulating feature changes and observing the impact
on the prediction. This article proposes to view this simulation process as a
source of knowledge that can be stored and later used in different ways. This
process is illustrated in the additive model
and, more specifically, in the case of the naive Bayes classifier, whose
interesting properties for this purpose are shown.
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2309.04284v4
[DATE]
2024-04-12 15:49:57+08:00
[CATEGORIES]
cs.LG
Neural Likelihood Approximation for Integer Valued Time Series Data
[AUTHORS]
Luke O’Loughlin, John Maclean, Andrew Black
[ABSTRACT]
Stochastic processes defined on integer-valued state spaces are popular
within the physical and biological sciences. These models are necessary for
capturing the dynamics of small systems where the individual nature of the
populations cannot be ignored and stochastic effects are important. The
inference of the parameters of such models, from time series data, is
challenging due to intractability of the likelihood. To work at all, current
simulation based inference methods require the generation of realisations of
the model conditional on the data, which can be both tricky to implement and
computationally expensive. In this paper we instead construct a neural
likelihood approximation that can be trained using unconditional simulation of
the underlying model, which is much simpler. We demonstrate our method by
performing inference on a number of ecological and epidemiological models,
showing that we can accurately approximate the true posterior while achieving
significant computational speed ups compared to current best methods.
[LINK]
http://arxiv.org/abs/2310.12544v2
[DATE]
2024-04-12 15:45:49+08:00
[CATEGORIES]
cs.LG
Study of Emotion Concept Formation by Integrating Vision, Physiology, and Word Information using Multilayered Multimodal Latent Dirichlet Allocation
[AUTHORS]
Kazuki Tsurumaki, Chie Hieida, Kazuki Miyazawa
[ABSTRACT]
How are emotions formed? Through extensive debate and the promulgation of
diverse theories, the theory of constructed emotion has become prevalent in
recent research on emotions. According to this theory, an emotion concept
refers to a category formed by interoceptive and exteroceptive information
associated with a specific emotion. An emotion concept stores past experiences
as knowledge and can predict unobserved information from acquired information.
Therefore, in this study, we attempted to model the formation of emotion
concepts using a constructionist approach from the perspective of the
constructed emotion theory. Specifically, we constructed a model using
multilayered multimodal latent Dirichlet allocation, which is a probabilistic
generative model. We then trained the model for each subject using vision,
physiology, and word information obtained from multiple people who experienced
different visual emotion-evoking stimuli. To evaluate the model, we verified
whether the formed categories matched human subjectivity and determined whether
unobserved information could be predicted via categories. The verification
results exceeded chance level, suggesting that emotion concept formation can be
explained by the proposed model.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. We would like to thank Professor Takayuki Nagai for
useful discussions.
[LINK]
http://arxiv.org/abs/2404.08295v1
[DATE]
2024-04-12 15:34:46+08:00
[CATEGORIES]
cs.LG
State-Space Systems as Dynamic Generative Models
[AUTHORS]
Juan-Pablo Ortega, Florian Rossmannek
[ABSTRACT]
A probabilistic framework to study the dependence structure induced by
deterministic discrete-time state-space systems between input and output
processes is introduced. General sufficient conditions are formulated under
which output processes exist and are unique once an input process has been
fixed, a property that in the deterministic state-space literature is known as
the echo state property. When those conditions are satisfied, the given
state-space system becomes a generative model for probabilistic dependences
between two sequence spaces. Moreover, those conditions guarantee that the
output depends continuously on the input when using the Wasserstein metric. The
output processes whose existence is proved are shown to be causal in a specific
sense and to generalize those studied in purely deterministic situations. The
results in this paper constitute a significant stochastic generalization of
sufficient conditions for the deterministic echo state property to hold, in the
sense that the stochastic echo state property can be satisfied under
contractivity conditions that are strictly weaker than those in deterministic
situations. This means that state-space systems can induce a purely
probabilistic dependence structure between input and output sequence spaces
even when there is no functional relation between those two spaces.
[LINK]
http://arxiv.org/abs/2404.08717v1
[DATE]
2024-04-12 15:32:57+08:00
[CATEGORIES]
cs.LG
Transfer Learning Study of Motion Transformer-based Trajectory Predictions
[AUTHORS]
Lars Ullrich, Alex McMaster, Knut Graichen
[ABSTRACT]
Trajectory planning in autonomous driving is highly dependent on predicting
the emergent behavior of other road users. Learning-based methods are currently
showing impressive results in simulation-based challenges, with
transformer-based architectures technologically leading the way. Ultimately,
however, predictions are needed in the real world. In addition to the shifts
from simulation to the real world, many vehicle- and country-specific shifts,
i.e. differences in sensor systems, fusion and perception algorithms as well as
traffic rules and laws, are on the agenda. Since models that can cover all
system setups and design domains at once are not yet foreseeable, model
adaptation plays a central role. Therefore, a simulation-based study on
transfer learning techniques is conducted on the basis of a transformer-based
model. Furthermore, the study aims to provide insights into possible trade-offs
between computational time and performance to support effective transfers into
the real world.
[COMMENTS]
Accepted to be published as part of the 2024 IEEE Intelligent
Vehicles Symposium (IV), Jeju Shinhwa World, Jeju Island, Korea, June 2-5,
2024
[LINK]
http://arxiv.org/abs/2404.08271v1
[DATE]
2024-04-12 14:50:32+08:00
[CATEGORIES]
cs.LG
Efficient Graph Laplacian Estimation by Proximal Newton
[AUTHORS]
Yakov Medvedovsky, Eran Treister, Tirza Routtenberg
[ABSTRACT]
The Laplacian-constrained Gaussian Markov Random Field (LGMRF) is a common
multivariate statistical model for learning a weighted sparse dependency graph
from given data. This graph learning problem can be formulated as a maximum
likelihood estimation (MLE) of the precision matrix, subject to Laplacian
structural constraints, with a sparsity-inducing penalty term. This paper aims
to solve this learning problem accurately and efficiently. First, since the
commonly used $\ell_1$-norm penalty is inappropriate in this setting and may
lead to a complete graph, we employ the nonconvex minimax concave penalty
(MCP), which promotes sparse solutions with lower estimation bias. Second, as
opposed to existing first-order methods for this problem, we develop a
second-order proximal Newton approach to obtain an efficient solver, utilizing
several algorithmic features, such as using Conjugate Gradients,
preconditioning, and splitting to active/free sets. Numerical experiments
demonstrate the advantages of the proposed method in terms of both
computational complexity and graph learning accuracy compared to existing
methods.
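To make the contrast with the l1 penalty concrete, the sketch below gives the standard scalar form of the MCP, with `lam` the regularization weight and `gamma` the concavity parameter. These names are our own, and how the paper applies the penalty to the Laplacian weights inside its proximal Newton solver is not shown; this is an illustration of the penalty only.

```python
def mcp_penalty(x: float, lam: float, gamma: float) -> float:
    """Scalar minimax concave penalty (MCP).

    Grows like lam * |x| near zero, bends down quadratically, and is
    constant for |x| > gamma * lam, so large entries are not shrunk.
    """
    if lam < 0 or gamma <= 0:
        raise ValueError("require lam >= 0 and gamma > 0")
    a = abs(x)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # flat region: no further penalty growth


def l1_penalty(x: float, lam: float) -> float:
    """The l1 penalty keeps growing with |x|, which biases large entries."""
    return lam * abs(x)
```

For lam = 1 and gamma = 3, the l1 penalty keeps increasing with |x|, whereas the MCP reaches its cap of gamma * lam^2 / 2 = 1.5 at |x| = 3 and stays there.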
[COMMENTS]
Proceedings of Artificial Intelligence and Statistics (AISTATS), 2024
[LINK]
http://arxiv.org/abs/2302.06434v3
[DATE]
2024-04-12 14:38:32+08:00
[CATEGORIES]
cs.LG
FedAgg: Adaptive Federated Learning with Aggregated Gradients
[AUTHORS]
Wenhao Yuan, Xuehe Wang
[ABSTRACT]
Federated Learning (FL) has emerged as a pivotal paradigm within distributed
model training, facilitating collaboration among multiple devices to refine a
shared model, harnessing their respective datasets as orchestrated by a central
server, while ensuring the localization of private data. Nonetheless, the
non-independent-and-identically-distributed (Non-IID) data generated on
heterogeneous clients and the incessant information exchange among participants
may markedly impede training efficacy and retard the convergence rate. In this
paper, we refine the conventional stochastic gradient descent (SGD) methodology
by introducing aggregated gradients at each local training epoch and propose an
iterative adaptive learning rate algorithm that accounts for the divergence
between local and average parameters. To overcome the obstacle of acquiring
other clients’ local information, we introduce a mean-field approach that uses
two mean-field terms to approximately estimate the average local parameters and
gradients over time without requiring local information exchange among
clients, and we design a decentralized adaptive learning rate for
each client. Through meticulous theoretical analysis, we provide a robust
convergence guarantee for our proposed algorithm and ensure its wide
applicability. Our numerical experiments substantiate the superiority of our
framework in comparison with existing state-of-the-art FL strategies for
enhancing model performance and accelerating convergence rate under IID and
Non-IID data distributions.
[LINK]
http://arxiv.org/abs/2303.15799v4
[DATE]
2024-04-12 14:26:04+08:00
[CATEGORIES]
cs.LG
Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models
[AUTHORS]
Zeyu Yang, Peikun Guo, Khadija Zanna, Akane Sano
[ABSTRACT]
Diffusion models have emerged as a robust framework for various generative
tasks, such as image and audio synthesis, and have also demonstrated a
remarkable ability to generate mixed-type tabular data comprising both
continuous and discrete variables. However, current approaches to training
diffusion models on mixed-type tabular data tend to inherit the imbalanced
distributions of features present in the training dataset, which can result in
biased sampling. In this research, we introduce a fair diffusion model designed
to generate data that are balanced with respect to sensitive attributes. We present empirical
evidence demonstrating that our method effectively mitigates the class
imbalance in training data while maintaining the quality of the generated
samples. Furthermore, we provide evidence that our approach outperforms
existing methods for synthesizing tabular data in terms of performance and
fairness.
[LINK]
http://arxiv.org/abs/2404.08254v1
[DATE]
2024-04-12 14:08:43+08:00
[CATEGORIES]
cs.LG
Adaptive Federated Learning via New Entropy Approach
[AUTHORS]
Shensheng Zheng, Wenhao Yuan, Xuehe Wang, Lingjie Duan
[ABSTRACT]
Federated Learning (FL) has emerged as a prominent distributed machine
learning framework that enables geographically dispersed clients to train a
global model collaboratively while preserving their privacy-sensitive data.
However, due to the non-independent-and-identically-distributed (Non-IID) data
generated by heterogeneous clients, the performance of conventional
federated optimization schemes such as FedAvg and its variants deteriorates,
which calls for designs that adaptively adjust specific model parameters to
alleviate the negative influence of heterogeneity. In this paper, by leveraging
entropy as a new metric for assessing the degree of system disorder, we propose
an adaptive FEDerated learning algorithm based on ENTropy theory (FedEnt) to
alleviate the parameter deviation among heterogeneous clients and achieve fast
convergence. Nevertheless, given the data disparity and parameter deviation of
heterogeneous clients, determining the optimal dynamic learning rate for each
client becomes a challenging task as there is no communication among
participating clients during the local training epochs. To enable a
decentralized learning rate for each participating client, we first introduce
mean-field terms to estimate the components associated with other clients’
local parameters. Furthermore, we provide a rigorous theoretical analysis of the
existence and determination of the mean-field estimators. Based on the
mean-field estimators, the closed-form adaptive learning rate for each client
is derived by constructing the Hamilton equation. Moreover, the convergence
rate of our proposed FedEnt is proved. The extensive experimental results on
the real-world datasets (i.e., MNIST, EMNIST-L, CIFAR10, and CIFAR100) show
that our FedEnt algorithm surpasses FedAvg and its variants (i.e., FedAdam,
FedProx, and FedDyn) under Non-IID settings and achieves a faster convergence
rate.
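FedEnt is compared against FedAvg. As a reference point, the sketch below shows FedAvg's server-side aggregation step, in which each client's parameters are weighted by its share of the training data. It is the baseline only, under the assumption that parameters are flat lists of floats; FedEnt's entropy-based adaptive learning rate and mean-field estimators are not reproduced.

```python
from typing import List, Sequence


def fedavg_aggregate(client_params: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg server step).

    Client k is weighted by n_k / sum_j n_j, its share of the total data.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n_k / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params
```

Because the weights follow data volume rather than data distribution, clients with Non-IID data pull the global model in different directions, which is the setting FedEnt targets.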
[COMMENTS]
16 pages, 13 figures
[LINK]
http://arxiv.org/abs/2303.14966v3
[DATE]
2024-04-12 14:04:55+08:00
[CATEGORIES]
cs.LG
Kinematics Modeling of Peroxy Free Radicals: A Deep Reinforcement Learning Approach
[AUTHORS]
Subhadarsi Nayak, Hrithwik Shalu, Joseph Stember
[ABSTRACT]
Tropospheric ozone, known as a concerning air pollutant, has been associated
with health issues including asthma, bronchitis, and impaired lung function.
The rates at which peroxy radicals react with NO play a critical role in the
overall formation and depletion of tropospheric ozone. However, obtaining
comprehensive kinetic data for these reactions remains challenging. Traditional
approaches to determine rate constants are costly and technically intricate.
Fortunately, the emergence of machine-learning-based models offers a less
resource- and time-intensive alternative for acquiring kinetic information. In
this study, we leveraged deep reinforcement learning to predict ranges of rate
constants (\textit{k}) with exceptional accuracy, achieving a testing set
accuracy of 100%. To analyze reactivity trends based on the molecular structure
of peroxy radicals, we employed 51 global descriptors as input parameters.
These descriptors were derived from optimized minimum energy geometries of
peroxy radicals using the quantum composite G3B3 method. Through the
application of Integrated Gradients (IGs), we gained valuable insights into the
significance of the various descriptors in relation to reaction rates. We
successfully validated and contextualized our findings by conducting
cross-comparisons with established trends in the existing literature. These
results establish a solid foundation for pioneering advancements in chemistry,
where computer analysis serves as an inspirational source driving innovation.
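Integrated Gradients attributes a model output F(x) to each input feature by integrating the gradient along the straight path from a baseline x' to the input x. The minimal sketch below assumes a toy logistic model with an analytic gradient in place of the paper's trained network, an all-zeros baseline, and a midpoint Riemann sum for the integral; none of these choices come from the paper. The attributions satisfy the completeness property: they sum to F(x) - F(x').

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_logistic_model(w: Sequence[float], b: float):
    """Hypothetical stand-in model: F(x) = sigmoid(w . x + b) and its gradient."""
    def f(x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    def grad(x: Sequence[float]) -> Vector:
        s = f(x)
        return [s * (1.0 - s) * wi for wi in w]  # dF/dx_i = s(1 - s) w_i

    return f, grad


def integrated_gradients(grad_f: Callable[[Sequence[float]], Vector],
                         x: Sequence[float], baseline: Sequence[float],
                         steps: int = 200) -> Vector:
    """Midpoint Riemann approximation of
    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a (x - x')) da."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    summed = [0.0] * len(diff)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for i, gi in enumerate(grad_f(point)):
            summed[i] += gi
    return [di * si / steps for di, si in zip(diff, summed)]
```

With 51 descriptors as inputs, each descriptor would receive one such attribution, and features equal to their baseline value receive exactly zero.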
[LINK]
http://arxiv.org/abs/2404.10010v1
[DATE]
2024-04-12 13:51:28+08:00
[CATEGORIES]
cs.LG
Agile and versatile bipedal robot tracking control through reinforcement learning
[AUTHORS]
Jiayi Li, Linqi Ye, Yi Cheng, Houde Liu, Bin Liang
[ABSTRACT]
The remarkable athletic intelligence displayed by humans in complex dynamic
movements such as dancing and gymnastics suggests that the balance mechanism in
biological beings is decoupled from specific movement patterns. This decoupling
allows for the execution of both learned and unlearned movements under certain
constraints while maintaining balance through minor whole-body coordination. To
replicate this balance ability and body agility, this paper proposes a
versatile controller for bipedal robots. This controller achieves ankle and
body trajectory tracking across a wide range of gaits using a single
small-scale neural network and is built on a model-based IK solver and
reinforcement learning. We treat a single step as the smallest control unit
and design a universally applicable control input form suitable for any
single-step variation. Highly flexible gait control can be achieved by
combining these minimal control units with high-level policy through our
extensible control interface. To enhance the trajectory-tracking capability of
our controller, we utilize a three-stage training curriculum. After training,
the robot can move freely between target footholds at varying distances and
heights. The robot can also maintain static balance without repeated stepping
to adjust posture. Finally, we evaluate the tracking accuracy of our controller
on various bipedal tasks, and the effectiveness of our control framework is
verified in the simulation environment.
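The abstract says the controller builds on a model-based IK solver without specifying it. As a hypothetical illustration only, the textbook analytic IK for a planar two-link leg (hip at the origin, thigh length `l1`, shank length `l2`) maps a target ankle position to hip and knee angles; the paper's robot model and solver may differ substantially.

```python
import math
from typing import Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float,
                knee_sign: float = 1.0) -> Tuple[float, float]:
    """Analytic IK for a planar two-link leg with the hip at the origin.

    Returns (hip, knee) angles in radians so the foot reaches (x, y);
    knee_sign (+1 or -1) selects one of the two mirror solutions.
    """
    cos_knee = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_knee <= 1.0:
        raise ValueError("target is out of reach")
    knee = knee_sign * math.acos(cos_knee)  # law of cosines
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee),
                                        l1 + l2 * math.cos(knee))
    return hip, knee


def two_link_fk(hip: float, knee: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used to verify an IK solution."""
    x = l1 * math.cos(hip) + l2 * math.cos(hip + knee)
    y = l1 * math.sin(hip) + l2 * math.sin(hip + knee)
    return x, y
```

Running forward kinematics on the returned angles recovers the target, which is a quick sanity check for any IK routine.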
[LINK]
http://arxiv.org/abs/2404.08246v1
[DATE]
2024-04-12 13:25:03+08:00
[CATEGORIES]
cs.LG
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
[AUTHORS]
Hui Bai, Ran Cheng
[ABSTRACT]
Hyperparameter optimization plays a key role in the machine learning domain.
Its significance is especially pronounced in reinforcement learning (RL), where
agents continuously interact with and adapt to their environments, requiring
dynamic adjustments in their learning trajectories. To cater to this
dynamicity, Population-Based Training (PBT) was introduced, leveraging the
collective intelligence of a population of agents learning simultaneously.
However, PBT tends to favor high-performing agents, potentially neglecting the
explorative potential of agents on the brink of significant advancements. To
mitigate the limitations of PBT, we present Generalized Population-Based
Training (GPBT), a refined framework designed for enhanced granularity and
flexibility in hyperparameter adaptation. Complementing GPBT, we further
introduce Pairwise Learning (PL). Instead of merely focusing on elite agents,
PL employs a comprehensive pairwise strategy to identify performance
differentials and provide holistic guidance to underperforming agents. By
integrating the capabilities of GPBT and PL, our approach significantly
improves upon traditional PBT in terms of adaptability and computational
efficiency. Rigorous empirical evaluations across a range of RL benchmarks
confirm that our approach consistently outperforms not only the conventional
PBT but also its Bayesian-optimized variant.
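For context, the PBT baseline that GPBT generalizes is commonly implemented with a truncation-selection "exploit and explore" step like the minimal sketch below. The 25% cut-off and the 0.8/1.2 perturbation factors are common defaults assumed here, not values from the paper, and neither GPBT's finer-grained adaptation nor Pairwise Learning is reproduced.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Member:
    hyperparams: Dict[str, float]
    weights: List[float]
    score: float = 0.0


def exploit_and_explore(population: List[Member], frac: float = 0.25,
                        factors: Sequence[float] = (0.8, 1.2),
                        rng: Optional[random.Random] = None) -> List[Member]:
    """One PBT step with truncation selection.

    The bottom `frac` of members (by score) copy the weights of a random
    member of the top `frac` (exploit) and take that member's
    hyperparameters, each multiplied by a random factor (explore).
    """
    rng = rng or random.Random()
    n_cut = max(1, int(len(population) * frac))
    if 2 * n_cut > len(population):
        raise ValueError("population too small for the requested fraction")
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    top, bottom = ranked[:n_cut], ranked[-n_cut:]
    for member in bottom:
        donor = rng.choice(top)
        member.weights = list(donor.weights)  # exploit: copy weights
        member.hyperparams = {k: v * rng.choice(factors)  # explore: perturb
                              for k, v in donor.hyperparams.items()}
    return population
```

Because the bottom members are simply overwritten by copies of top performers, an agent that is temporarily behind but improving is discarded, which is the tendency the abstract attributes to PBT.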
[COMMENTS]
IEEE TETCI
[LINK]
http://arxiv.org/abs/2404.08233v1
[DATE]
2024-04-12 12:23:20+08:00
[CATEGORIES]
cs.LG
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
[AUTHORS]
Khadija Zanna, Akane Sano
[ABSTRACT]
This paper addresses the need for generalizable bias mitigation techniques in
machine learning, motivated by growing concerns about fairness and discrimination in
data-driven decision-making procedures across a range of industries. While many
existing methods for mitigating bias in machine learning have succeeded in
specific cases, they often lack generalizability and cannot be easily applied
to different data types or models. Additionally, the trade-off between accuracy
and fairness remains a fundamental tension in the field. To address these
issues, we propose a bias mitigation method based on multi-task learning,
utilizing the concept of Monte-Carlo dropout and Pareto optimality from
multi-objective optimization. This method optimizes accuracy and fairness while
improving the model’s explainability without using sensitive information. We
test this method on three datasets from different domains and show how it can
deliver the most desired trade-off between model fairness and performance. This
allows for tuning in specific domains where one metric may be more important
than another. With the framework we introduce in this paper, we aim to enhance
the fairness-performance trade-off and offer a solution to bias mitigation
methods’ generalizability issues in machine learning.
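Pareto optimality here means that no other candidate is at least as accurate and at least as fair while being strictly better on one of the two. A minimal sketch of extracting the Pareto front from candidate models' (accuracy, fairness) scores, assuming fairness is expressed as a higher-is-better score; the paper's multi-task, Monte-Carlo dropout procedure for producing the candidates is not reproduced.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order (O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Choosing a single model from the front is then a domain decision about how much accuracy to trade for fairness, which matches the abstract's point about tuning for domains where one metric matters more.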
[COMMENTS]
Under review at Journal of Machine Learning Research
[LINK]
http://arxiv.org/abs/2404.08230v1
[DATE]
2024-04-12 12:17:50+08:00
[CATEGORIES]
cs.LG
Differentially Private Log-Location-Scale Regression Using Functional Mechanism
[AUTHORS]
Jiewen Sheng, Xiaolei Fang
[ABSTRACT]
This article introduces differentially private log-location-scale (DP-LLS)
regression models, which incorporate differential privacy into LLS regression
through the functional mechanism. The proposed models are established by
injecting noise into the log-likelihood function of LLS regression for
perturbed parameter estimation. We derive the sensitivities used to
determine the magnitude of the injected noise and prove that the proposed
DP-LLS models satisfy $\epsilon$-differential privacy. In addition, we
conduct simulations and case studies to evaluate the performance of the
proposed models. The findings suggest that predictor dimension, training sample
size, and privacy budget are three key factors impacting the performance of the
proposed DP-LLS regression models. Moreover, the results indicate that a
sufficiently large training dataset is needed to simultaneously ensure decent
performance of the proposed models and achieve a satisfactory level of privacy
protection.
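In the functional mechanism, the objective (here, the LLS log-likelihood) is written as a polynomial in the model parameters, and privacy comes from perturbing that polynomial's coefficients rather than the final estimates. The sketch below shows only this generic Laplace-noise step as commonly described; the coefficient vector and the `sensitivity` argument are placeholders, since the polynomial expansion and the sensitivities the paper derives for DP-LLS are not reproduced.

```python
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale): the difference of two i.i.d. exponential
    draws with mean `scale` is Laplace-distributed with that scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_coefficients(coeffs: Sequence[float], sensitivity: float,
                         epsilon: float,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Add Laplace(sensitivity / epsilon) noise to every polynomial
    coefficient of the objective (the core step of the functional mechanism)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # smaller budget -> larger noise
    return [c + laplace_noise(scale, rng) for c in coeffs]
```

The perturbed polynomial is then minimized in place of the exact objective, so the noise scale sensitivity / epsilon directly controls the privacy and accuracy trade-off.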
[LINK]
http://arxiv.org/abs/2404.08715v1
[DATE]
2024-04-12 12:14:08+08:00
[CATEGORIES]
cs.LG
Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions
[AUTHORS]
Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma
[ABSTRACT]
Achieving distributed reinforcement learning (RL) for large-scale cooperative
multi-agent systems (MASs) is challenging because: (i) each agent has access to
only limited information; and (ii) issues of convergence or computational
complexity emerge due to the curse of dimensionality. In this paper, we propose
a general computationally efficient distributed framework for cooperative
multi-agent reinforcement learning (MARL) by utilizing the structures of graphs
involved in this problem. We introduce three coupling graphs describing three
types of inter-agent couplings in MARL, namely, the state graph, the
observation graph and the reward graph. By further considering a communication
graph, we propose two distributed RL approaches based on local value-functions
derived from the coupling graphs. The first approach is able to reduce sample
complexity significantly under specific conditions on the aforementioned four
graphs. The second approach provides an approximate solution and can be
efficient even for problems with dense coupling graphs. Here there is a
trade-off between minimizing the approximation error and reducing the
computational complexity. Simulations show that our RL algorithms have a
significantly improved scalability to large-scale MASs compared with
centralized and consensus-based distributed RL algorithms.
[COMMENTS]
This paper has been accepted by IEEE Transactions on Automatic
Control as a full paper and published online. Unlike the published
paper, the arXiv version contains more results. Moreover, we will
continue to update the arXiv version if we find any typos in the published
paper, so we suggest reading this arXiv version instead of the published
one. Thank you for your interest in our work.
[LINK]
http://arxiv.org/abs/2202.13046v5
[DATE]
2024-04-12 11:41:09+08:00
[CATEGORIES]
cs.LG
HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies
[AUTHORS]
Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou
[ABSTRACT]
Multivariate Time Series (MTS) anomaly detection focuses on pinpointing
samples that diverge from standard operational patterns, which is crucial for
ensuring the safety and security of industrial applications. The primary
challenge in this domain is to develop representations capable of discerning
anomalies effectively. The prevalent anomaly detection methods in the
literature are predominantly reconstruction-based or prediction-based.
However, they typically concentrate on a single-dimensional instance level,
thereby not fully harnessing the complex associations inherent in industrial
MTS. To address this issue, we propose a novel self-supervised hierarchical
contrastive consistency learning method for detecting anomalies in MTS, named
HCL-MTSAD. It innovatively leverages data consistency at multiple levels
inherent in industrial MTS, systematically capturing consistent associations
across four latent levels: measurement, sample, channel, and process. By
developing a multi-layer contrastive loss, HCL-MTSAD can extensively mine data
consistency and spatio-temporal association, resulting in more informative
representations. Subsequently, an anomaly discrimination module, grounded in
self-supervised hierarchical contrastive learning, is designed to detect
timestamp-level anomalies by calculating multi-scale data consistency.
Extensive experiments conducted on six diverse MTS datasets retrieved from real
cyber-physical systems and server machines, in comparison with 20 baselines,
indicate that HCL-MTSAD outperforms the state-of-the-art baseline models in
anomaly detection by an average of 1.8% in terms of F1 score.
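HCL-MTSAD's multi-layer contrastive loss is not spelled out in the abstract. As a generic, hypothetical illustration of the contrastive principle it builds on, a single-level InfoNCE loss pulls an anchor representation toward a positive (a consistent view) and away from negatives. The cosine similarity, the temperature of 0.1, and which views count as positives at each level are assumptions, not the paper's design.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor:
    -log(exp(s(a, p)/t) / (exp(s(a, p)/t) + sum_n exp(s(a, n)/t))),
    with s = cosine similarity, computed via a stable log-sum-exp."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

A multi-level variant would sum such terms over the measurement, sample, channel, and process levels, each with its own definition of positives; how the paper weights or defines them is not stated here.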
[COMMENTS]
11 pages, 4 figures, under review by IEEE Internet of Things Journal
[LINK]
http://arxiv.org/abs/2404.08224v1
[DATE]
2024-04-12 11:39:33+08:00
[CATEGORIES]
cs.LG
Probabilistic Survival Analysis by Approximate Bayesian Inference of Neural Networks
[AUTHORS]
Christian Marius Lillelund, Martin Magris, Christian Fischer Pedersen
[ABSTRACT]
Predicting future events always comes with uncertainty, but traditional
non-probabilistic methods cannot distinguish certain from uncertain
predictions. In survival analysis, probabilistic methods applied to
state-of-the-art solutions in the healthcare and biomedical field are still
novel, and their implications have not been fully evaluated. In this paper, we
study the benefits of modeling uncertainty in deep neural networks for survival
analysis with a focus on prediction and calibration performance. For this, we
present a Bayesian deep learning framework that consists of three probabilistic
network architectures, which we train by optimizing the Cox partial likelihood
while combining input-dependent aleatoric uncertainty with epistemic
uncertainty. This enables us to provide uncertainty estimates as credible
intervals when predicting the survival curve or as a probability density
function over the predicted median survival times. For our empirical analyses,
we evaluated our proposed method on four benchmark datasets and found that our
method demonstrates prediction performance comparable to the state-of-the-art
based on the concordance index and outperforms all other Cox-based approaches
in terms of the mean absolute error. Our work explicitly compares how much
different Bayesian approximation techniques differ from each other and how much
they improve predictions over traditional non-probabilistic alternatives.
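The Cox partial likelihood compares each observed event's risk score with the scores of all subjects still at risk at that time. A minimal sketch of the negative log partial likelihood for risk scores produced by any model, assuming tied times are simply kept in the risk set (no Efron correction) and averaging over observed events; the Bayesian layers and the aleatoric and epistemic uncertainty terms are not reproduced.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(risk_scores: Sequence[float],
                                   times: Sequence[float],
                                   events: Sequence[int]) -> float:
    """Negative log Cox partial likelihood, averaged over observed events:

    -(1/N_e) * sum_{i: event_i = 1} [eta_i - log sum_{j: t_j >= t_i} exp(eta_j)]

    risk_scores are the model outputs eta; events[i] = 1 means the event
    was observed, 0 means the subject was censored.
    """
    n = len(risk_scores)
    if not n == len(times) == len(events):
        raise ValueError("inputs must have equal length")
    total, n_events = 0.0, 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only contribute through risk sets
        at_risk = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    return -total / n_events if n_events else 0.0
```

Assigning higher risk to subjects who experience the event earlier lowers this loss, which is why the same ranking quality is what the concordance index measures at evaluation time.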
[LINK]
http://arxiv.org/abs/2404.06421v2
[DATE]
2024-04-12 11:27:02+08:00
[CATEGORIES]
cs.LG
Label-based Graph Augmentation with Metapath for Graph Anomaly Detection
[AUTHORS]
Hwan Kim, Junghoon Kim, Byung Suk Lee, Sungsu Lim
[ABSTRACT]
Graph anomaly detection has attracted considerable attention in recent years
from domains ranging from network security to finance. Because labeling is
very costly, existing methods are predominantly developed in an unsupervised
manner. However, the detected anomalies may turn out to be uninteresting
instances owing to the absence of prior knowledge about the anomalies being
sought. This issue may be solved by using a few labeled anomalies as prior
knowledge, and in real-world scenarios a few labeled anomalies can easily be
obtained. Efficiently leveraging labeled anomalies as prior knowledge is
crucial for graph anomaly detection; however, this process remains challenging
due to the inherently limited number of anomalies available. To address the
problem, we propose a novel approach that leverages metapath to embed actual
connectivity patterns between anomalous and normal nodes. To further
efficiently exploit context information from metapath-based anomaly subgraph,
we present a new framework, Metapath-based Graph Anomaly Detection (MGAD),
incorporating GCN layers in both the dual encoders and the decoders to efficiently
propagate context information between abnormal and normal nodes. Specifically,
MGAD employs a GNN-based graph autoencoder as its backbone network. Moreover,
dual encoders capture the complex interactions and metapath-based context
information between labeled and unlabeled nodes both globally and locally.
Through a comprehensive set of experiments conducted on seven real-world
networks, this paper demonstrates the superiority of the MGAD method compared
to state-of-the-art techniques. The code is available at
https://github.com/missinghwan/MGAD.
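MGAD's encoders and decoders are built from GCN layers. For reference, the sketch below implements the widely used symmetric-normalized GCN propagation rule with plain Python lists; it assumes an undirected graph given as a dense adjacency matrix and a ReLU activation, and it does not reproduce MGAD's metapath-based subgraph extraction or its dual-encoder design.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency, h: n x d node features, w: d x k weights.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]  # add self-loops
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]  # symmetric degree normalization
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU
```

Each layer mixes a node's features with those of its neighbors, which is how context from labeled anomalous nodes can reach the normal nodes connected to them.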
[LINK]
http://arxiv.org/abs/2308.10918v2
[DATE]
2024-04-12 11:10:27+08:00
[CATEGORIES]
cs.LG
Self-Supervised Dataset Distillation for Transfer Learning
[AUTHORS]
Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
[ABSTRACT]
Dataset distillation methods have achieved remarkable success in distilling a
large dataset into a small set of representative samples. However, they are not
designed to produce a distilled dataset that can be effectively used for
facilitating self-supervised pre-training. To this end, we propose the novel
problem of distilling an unlabeled dataset into a small set of synthetic
samples for efficient self-supervised learning (SSL). We first prove that the
gradient of synthetic samples with respect to an SSL objective in naive bilevel
optimization is \textit{biased} due to the randomness originating from data
augmentations or masking. To address this issue, we propose to minimize the
mean squared error (MSE) between a model’s representations of the synthetic
examples and their corresponding learnable target feature representations for
the inner objective, which does not introduce any randomness. Our primary
motivation is that the model obtained by the proposed inner optimization can
mimic the \textit{self-supervised target model}. To achieve this, we also
introduce the MSE between representations of the inner model and the
self-supervised target model on the original full dataset for outer
optimization. Lastly, assuming that a feature extractor is fixed, we only
optimize a linear head on top of the feature extractor, which allows us to
reduce the computational cost and obtain a closed-form solution of the head
with kernel ridge regression. We empirically validate the effectiveness of our
method on various applications involving transfer learning.
[LINK]
http://arxiv.org/abs/2310.06511v3
[DATE]
2024-04-12 09:53:33+08:00
[CATEGORIES]
cs.LG
BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development
[AUTHORS]
Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang
[ABSTRACT]
Despite the widespread application of machine learning force fields (MLFFs) to
solids and small molecules, there is a notable gap in applying MLFFs to complex
liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular
Simulation Booster), a novel framework for molecular dynamics (MD) simulations,
with a demonstration of its capabilities in the context of liquid electrolytes
for lithium batteries. We design a physics-inspired graph equivariant
transformer architecture as the backbone of BAMBOO to learn from quantum
mechanical simulations. Additionally, we pioneer an ensemble knowledge
distillation approach and apply it to MLFFs to improve the stability of MD
simulations. Finally, we propose the density alignment algorithm to align
BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art
accuracy in predicting key electrolyte properties such as density, viscosity,
and ionic conductivity across various solvents and salt combinations. Our
current model, trained on more than 15 chemical species, achieves an average
density error of 0.01 g/cm$^3$ on various compositions compared with
experimental data. Moreover, our model demonstrates transferability to
molecules not included in the quantum mechanical dataset. We envision this work
as paving the way to a “universal MLFF” capable of simulating properties of
common organic liquids.
[LINK]
http://arxiv.org/abs/2404.07181v3
[DATE]
2024-04-12 09:08:34+08:00
[CATEGORIES]
cs.LG
Optimal Universal Quantum Encoding for Statistical Inference
[AUTHORS]
Farhad Farokhi
[ABSTRACT]
Optimal encoding of classical data for statistical inference using quantum
computing is investigated. A universal encoder is sought that is optimal for a
wide array of statistical inference tasks. Accuracy of any statistical
inference is shown to be upper bounded by a term that is proportional to
maximal quantum leakage from the classical data, i.e., the input to the
inference model, through its quantum encoding. This demonstrates that the
maximal quantum leakage is a universal measure of the quality of the encoding
strategy for statistical inference as it only depends on the quantum encoding
of the data and not the inference task itself. The optimal universal encoding
strategy, i.e., the encoding strategy that maximizes the maximal quantum
leakage, is proved to be attained by pure states. When there are enough qubits,
basis encoding is proved to be universally optimal. An iterative method for
numerically computing the optimal universal encoding strategy is presented.
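Basis encoding, which the abstract identifies as universally optimal when enough qubits are available, maps each classical bit string to a computational basis state. A minimal sketch, assuming big-endian bit order and a plain list of complex amplitudes as the state representation; maximal quantum leakage itself is not computed here.

```python
from typing import List, Sequence


def basis_encode(bits: Sequence[int]) -> List[complex]:
    """Map a classical bit string b to the amplitude vector of |b>.

    An n-bit string needs n qubits; the state vector has 2**n entries,
    all zero except amplitude 1 at the index whose binary form is b
    (first bit = most significant).
    """
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    index = 0
    for b in bits:
        index = (index << 1) | b
    state = [0j] * (2 ** len(bits))
    state[index] = 1 + 0j  # a pure state with all weight on one basis vector
    return state
```

Distinct bit strings map to orthogonal pure states, so the encoding loses no information about the classical input, at the cost of one qubit per bit.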
[LINK]
http://arxiv.org/abs/2404.08172v1
[DATE]
2024-04-12 08:39:53+08:00
[CATEGORIES]
cs.LG
Systematically Assessing the Security Risks of AI/ML-enabled Connected Healthcare Systems
[AUTHORS]
Mohammed Elnawawy, Mohammadreza Hallajiyan, Gargi Mitra, Shahrear Iqbal, Karthik Pattabiraman
[ABSTRACT]
The adoption of machine-learning-enabled systems in the healthcare domain is
on the rise. While the use of ML in healthcare has several benefits, it also
expands the threat surface of medical systems. We show that the use of ML in
medical systems, particularly connected systems that involve interfacing the ML
engine with multiple peripheral devices, has security risks that might cause
life-threatening damage to a patient’s health in case of adversarial
interventions. These new risks arise due to security vulnerabilities in the
peripheral devices and communication channels. We present a case study where we
demonstrate an attack on an ML-enabled blood glucose monitoring system by
introducing adversarial data points during inference. We show that an adversary
can achieve this by exploiting a known vulnerability in the Bluetooth
communication channel connecting the glucose meter with the ML-enabled app. We
further show that state-of-the-art risk assessment techniques are not adequate
for identifying and assessing these new risks. Our study highlights the need
for novel risk analysis methods for analyzing the security of AI-enabled
connected health devices.
[COMMENTS]
13 pages, 5 figures, 3 tables
[LINK]
http://arxiv.org/abs/2401.17136v2
[DATE]
2024-04-12 08:33:58+08:00
[CATEGORIES]
cs.LG
Reinforcement Learning with Non-Cumulative Objective
[AUTHORS]
Wei Cui, Wei Yu
[ABSTRACT]
In reinforcement learning, the objective is almost always defined as a
\emph{cumulative} function over the rewards along the process. However, there
are many optimal control and reinforcement learning problems in various
application fields, especially in communications and networking, where the
objectives are not naturally expressed as summations of the rewards. In this
paper, we recognize the prevalence of non-cumulative objectives in various
problems, and propose a modification to existing algorithms for optimizing such
objectives. Specifically, we dive into the fundamental building block for many
optimal control and reinforcement learning algorithms: the Bellman optimality
equation. To optimize a non-cumulative objective, we replace the original
summation operation in the Bellman update rule with a generalized operation
corresponding to the objective. Furthermore, we provide sufficient conditions
on the form of the generalized operation as well as assumptions on the Markov
decision process under which the globally optimal convergence of the
generalized Bellman updates can be guaranteed. We demonstrate the idea
experimentally with the bottleneck objective, i.e., an objective determined
by the minimum reward along the process, on classical optimal control and
reinforcement learning tasks, as well as on two network routing problems of
maximizing flow rates.
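To make the generalized update concrete for the bottleneck objective, the sketch below replaces the summation r + V(s') in a Bellman update with min(r, V(s')). It assumes deterministic transitions, no discounting, and a single terminal state with value +infinity; this is an illustration, not the paper's algorithm, and its sufficient conditions for convergence are not checked here. On a routing graph whose rewards are link capacities, V(s) becomes the capacity of the widest path from s to the destination.

```python
import math
from typing import Dict, Tuple

# state -> action -> (reward, next_state); deterministic transitions
Transitions = Dict[str, Dict[str, Tuple[float, str]]]


def bottleneck_value_iteration(transitions: Transitions, terminal: str,
                               max_iters: int = 1000) -> Dict[str, float]:
    """Value iteration with the Bellman sum r + V(s') replaced by min(r, V(s')).

    V(s) = max_a min(r(s, a), V(s')), the best achievable minimum reward
    on the way from s to the terminal state.
    """
    states = set(transitions) | {terminal}
    for actions in transitions.values():
        states |= {nxt for _, nxt in actions.values()}
    values = {s: -math.inf for s in states}
    values[terminal] = math.inf  # empty path: no reward constrains the minimum
    for _ in range(max_iters):
        changed = False
        for s, actions in transitions.items():
            if s == terminal:
                continue
            best = max((min(r, values[nxt]) for r, nxt in actions.values()),
                       default=-math.inf)
            if best != values[s]:
                values[s], changed = best, True
        if not changed:
            break
    return values
```

For example, with links S->A (5), S->B (3), A->T (2), A->B (4), and B->T (4), the iteration returns V(S) = 4, the bottleneck of the path S->A->B->T.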
[COMMENTS]
13 pages, 6 figures. Published in IEEE Transactions on Machine
Learning in Communications and Networking (TMLCN)
[LINK]
http://arxiv.org/abs/2307.04957v2
[DATE]
2024-04-12 08:32:08+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Tom Gur, Mohammad Mahdi Jahanara, Mohammad Mahdi Khodabandeh, Ninad Rajgopal, Bahar Salamatian, Igor Shinkar
[ABSTRACT]
We continue the study of doubly-efficient proof systems for verifying
agnostic PAC learning, for which we obtain the following results.
[COMMENTS]
58 pages, to appear in STOC 2024
[LINK]
http://arxiv.org/abs/2404.08158v1
[DATE]
2024-04-12 07:16:21+08:00
[CATEGORIES]
cs.LG
Multi-blank Transducers for Speech Recognition
[AUTHORS]
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
[ABSTRACT]
This paper proposes a modification to RNN-Transducer (RNN-T) models for
automatic speech recognition (ASR). In standard RNN-T, the emission of a blank
symbol consumes exactly one input frame; in our proposed method, we introduce
additional blank symbols, which consume two or more input frames when emitted.
We refer to the added symbols as big blanks, and to the method as multi-blank RNN-T.
For training multi-blank RNN-Ts, we propose a novel logit under-normalization
method in order to prioritize emissions of big blanks. With experiments on
multiple languages and datasets, we show that multi-blank RNN-T methods bring
relative inference speedups of over +90% and +139% on the English
Librispeech and German Multilingual Librispeech datasets, respectively. The
multi-blank RNN-T method also improves ASR accuracy consistently. We will
release our implementation of the method in the NeMo
(https://github.com/NVIDIA/NeMo) toolkit.
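The frame-skipping idea can be seen in a greedy decoding loop. In this minimal sketch, `joint` is a hypothetical callback standing in for the encoder, prediction network, and joint network, and `max_symbols_per_frame` is an assumed safeguard against endless emissions on one frame; the logit under-normalization used in training is not shown, and this is not the NeMo implementation.

```python
from typing import Callable, Dict, List, Tuple

# joint(t, history) -> argmax symbol at frame t given the emitted history
Joint = Callable[[int, Tuple[str, ...]], str]


def greedy_multiblank_decode(joint: Joint, num_frames: int,
                             blank_durations: Dict[str, int],
                             max_symbols_per_frame: int = 5) -> Tuple[List[str], int]:
    """Greedy RNN-T-style decoding with multiple blanks.

    A blank advances t by its duration (1 for the standard blank, 2 or
    more for big blanks); a non-blank symbol is emitted and decoding
    stays on frame t. Returns (hypothesis, number of joint calls).
    """
    t, hyp, calls, emitted_here = 0, [], 0, 0
    while t < num_frames:
        sym = joint(t, tuple(hyp))
        calls += 1
        if sym in blank_durations:
            t += blank_durations[sym]  # big blanks skip several frames
            emitted_here = 0
        else:
            hyp.append(sym)
            emitted_here += 1
            if emitted_here >= max_symbols_per_frame:  # guard against loops
                t += 1
                emitted_here = 0
    return hyp, calls
```

When the model emits a big blank over stretches with no new tokens, the same hypothesis is reached with fewer joint evaluations, which is the intuition behind the reported inference speedups.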
[LINK]
http://arxiv.org/abs/2211.03541v2
[DATE]
2024-04-12 06:58:21+08:00
[CATEGORIES]
cs.LG
Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
[AUTHORS]
Runqi Lin, Chaojian Yu, Tongliang Liu
[ABSTRACT]
Single-step adversarial training (SSAT) has demonstrated the potential to
achieve both efficiency and robustness. However, SSAT suffers from catastrophic
overfitting (CO), a phenomenon that leads to a severely distorted classifier,
making it vulnerable to multi-step adversarial attacks. In this work, we
observe that some adversarial examples generated on the SSAT-trained network
exhibit anomalous behaviour, that is, although these training samples are
generated by the inner maximization process, their associated loss decreases
instead, which we named abnormal adversarial examples (AAEs). Upon further
analysis, we discover a close relationship between AAEs and classifier
distortion, as both the number and outputs of AAEs undergo a significant
variation with the onset of CO. Given this observation, we re-examine the SSAT
process and uncover that before the occurrence of CO, the classifier already
displayed a slight distortion, indicated by the presence of few AAEs.
Furthermore, the classifier directly optimizing these AAEs will accelerate its
distortion, and correspondingly, the variation of AAEs will sharply increase as
a result. In such a vicious circle, the classifier rapidly becomes highly
distorted and manifests as CO within a few iterations. These observations
motivate us to eliminate CO by hindering the generation of AAEs. Specifically,
we design a novel method, termed Abnormal Adversarial Examples Regularization
(AAER), which explicitly regularizes the variation of AAEs to hinder the
classifier from becoming distorted. Extensive experiments demonstrate that our
method can effectively eliminate CO and further boost adversarial robustness
with negligible additional computational overhead.
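The defining property of an AAE, a training sample whose loss goes down after the single-step inner maximization, can be checked directly. The sketch below uses a one-dimensional toy loss (`math.sin`) in place of a network's loss surface and a sign-gradient (FGSM-style) step. It only illustrates the detection criterion; the AAER regularizer itself is not shown, since the abstract does not give its exact form.

```python
import math
from typing import Callable, List


def fgsm_step(x: float, grad: Callable[[float], float], eps: float) -> float:
    """Single-step inner maximization: move eps along the sign of the gradient."""
    g = grad(x)
    direction = 1.0 if g > 0 else (-1.0 if g < 0 else 0.0)
    return x + eps * direction


def abnormal_mask(xs: List[float], loss: Callable[[float], float],
                  grad: Callable[[float], float], eps: float) -> List[bool]:
    """Flag points whose loss *decreases* after the single-step attack (AAE criterion)."""
    return [loss(fgsm_step(x, grad, eps)) < loss(x) for x in xs]


# Toy curved "loss surface" standing in for a network's loss (assumption).
print(abnormal_mask([0.0, 1.4], math.sin, math.cos, eps=1.0))  # [False, True]
print(abnormal_mask([1.4], math.sin, math.cos, eps=0.1))       # [False]
```

With eps=1.0 the step from 1.4 overshoots the maximum of the curved loss, so the "adversarial" point ends up with lower loss; with a small step the same point is not flagged.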
[LINK]
http://arxiv.org/abs/2404.08154v1
[DATE]
2024-04-12 06:43:44+08:00
[CATEGORIES]
cs.LG
On the Over-Memorization During Natural, Robust and Catastrophic Overfitting
[AUTHORS]
Runqi Lin, Chaojian Yu, Bo Han, Tongliang Liu
[ABSTRACT]
Overfitting negatively impacts the generalization ability of deep neural
networks (DNNs) in both natural and adversarial training. Existing methods
struggle to consistently address different types of overfitting, typically
designing strategies that focus separately on either natural or adversarial
patterns. In this work, we adopt a unified perspective by solely focusing on
natural patterns to explore different types of overfitting. Specifically, we
examine the memorization effect in DNNs and reveal a shared behaviour termed
over-memorization, which impairs their generalization capacity. This behaviour
manifests as DNNs suddenly becoming high-confidence in predicting certain
training patterns and retaining a persistent memory for them. Furthermore, when
DNNs over-memorize an adversarial pattern, they tend to simultaneously exhibit
high-confidence prediction for the corresponding natural pattern. These
findings motivate us to holistically mitigate different types of overfitting by
hindering DNNs from over-memorizing training patterns. To this end, we
propose a general framework, Distraction Over-Memorization (DOM), which
explicitly prevents over-memorization by either removing or augmenting the
high-confidence natural patterns. Extensive experiments demonstrate the
effectiveness of our proposed method in mitigating overfitting across various
training paradigms.
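The "removing" variant of DOM can be pictured as masking out training patterns that the network already predicts with very high confidence. In this sketch, confidence is proxied by a low per-sample training loss and `loss_threshold` is a hypothetical hyperparameter; the abstract does not state how DOM identifies such patterns, and the "augmenting" variant is not shown.

```python
import numpy as np


def high_confidence_mask(per_sample_loss: np.ndarray, loss_threshold: float) -> np.ndarray:
    """True for patterns the model already fits with very low loss (high confidence)."""
    return per_sample_loss < loss_threshold


def dom_remove_loss(per_sample_loss: np.ndarray, loss_threshold: float) -> float:
    """Mean training loss after removing high-confidence natural patterns.

    Returns 0.0 if every pattern in the batch was removed.
    """
    keep = ~high_confidence_mask(per_sample_loss, loss_threshold)
    if not keep.any():
        return 0.0
    return float(per_sample_loss[keep].mean())


batch_losses = np.array([0.01, 0.8, 0.002, 1.5])
print(high_confidence_mask(batch_losses, loss_threshold=0.05))  # flags the 0.01 and 0.002 losses
print(dom_remove_loss(batch_losses, loss_threshold=0.05))       # mean of 0.8 and 1.5
```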
[LINK]
http://arxiv.org/abs/2310.08847v2
[DATE]
2024-04-12 06:04:18+08:00
[CATEGORIES]
cs.LG
Self-Supervised Learning of Color Constancy
[AUTHORS]
Markus R. Ernst, Francisco M. López, Arthur Aubret, Roland W. Fleming, Jochen Triesch
[ABSTRACT]
Color constancy (CC) describes the ability of the visual system to perceive
an object as having a relatively constant color despite changes in lighting
conditions. While CC and its limitations have been carefully characterized in
humans, it is still unclear how the visual system acquires this ability during
development. Here, we present a first study showing that CC develops in a
neural network trained in a self-supervised manner through an invariance
learning objective. During learning, objects are presented under changing
illuminations, while the network aims to map subsequent views of the same
object onto close-by latent representations. This gives rise to representations
that are largely invariant to the illumination conditions, offering a plausible
example of how CC could emerge during human cognitive development via a form of
self-supervised learning.
[COMMENTS]
7 pages, 5 figures, submitted to the IEEE International Conference on
Development and Learning (ICDL 2024)
[LINK]
http://arxiv.org/abs/2404.08127v1
[DATE]
2024-04-12 05:07:38+08:00
[CATEGORIES]
cs.LG
Machine learning and economic forecasting: the role of international trade networks
[AUTHORS]
Thiago C. Silva, Paulo V. B. Wilhelm, Diego R. Amancio
[ABSTRACT]
This study examines the effects of de-globalization trends on international
trade networks and their role in improving forecasts for economic growth. Using
section-level trade data from nearly 200 countries from 2010 to 2022, we
identify significant shifts in the network topology driven by rising trade
policy uncertainty. Our analysis highlights key global players through
centrality rankings, with the United States, China, and Germany maintaining
consistent dominance. Using a horse race of supervised regressors, we find that
network topology descriptors evaluated from section-specific trade networks
substantially enhance the quality of a country’s GDP growth forecast. We also
find that non-linear models, such as Random Forest, XGBoost, and LightGBM,
outperform traditional linear models used in the economics literature. Using
SHAP values to interpret these non-linear models’ predictions, we find that
about half of the most important features originate from the network descriptors,
underscoring their vital role in refining forecasts. Moreover, this study
emphasizes the significance of recent economic performance, population growth,
and the primary sector’s influence in shaping economic growth predictions,
offering novel insights into the intricacies of economic growth forecasting.
[LINK]
http://arxiv.org/abs/2404.08712v1
[DATE]
2024-04-12 05:04:56+08:00
[CATEGORIES]
cs.LG
A least-square method for non-asymptotic identification in linear switching control
[AUTHORS]
Haoyuan Sun, Ali Jadbabaie
[ABSTRACT]
The focus of this paper is on linear system identification in the setting
where it is known that the underlying partially-observed linear dynamical
system lies within a finite collection of known candidate models. We first
consider the problem of identification from a given trajectory, which in this
setting reduces to identifying the index of the true model with high
probability. We characterize the finite-time sample complexity of this problem
by leveraging recent advances in the non-asymptotic analysis of linear
least-square methods in the literature. In comparison to the earlier results
that assume no prior knowledge of the system, our approach takes advantage of
the smaller hypothesis class and leads to the design of a learner with a
dimension-free sample complexity bound. Next, we consider the switching control
of linear systems, where there is a candidate controller for each of the
candidate models and data is collected through interaction of the system with a
collection of potentially destabilizing controllers. We develop a
dimension-dependent criterion that can detect those destabilizing controllers
in finite time. By leveraging these results, we propose a data-driven switching
strategy that identifies the unknown parameters of the underlying system. We
then provide a non-asymptotic analysis of its performance and discuss its
implications on the classical method of estimator-based supervisory control.
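In the simplest fully observed case, identifying the index of the true model from one trajectory reduces to comparing least-squares residuals across the candidates. The sketch below assumes full state observation, $x_{t+1} = A x_t + w_t$ with Gaussian noise, whereas the paper treats partially observed systems; it only conveys the "pick the best-fitting candidate" idea, and the candidate matrices are made up for illustration.

```python
import numpy as np


def identify_model(traj: np.ndarray, candidates: list) -> int:
    """Index of the candidate A with the smallest one-step residual sum of squares.

    traj has shape (T + 1, n) and is assumed to follow x_{t+1} = A x_t + w_t.
    """
    x_prev, x_next = traj[:-1], traj[1:]
    # Row-wise, x_prev @ A.T equals (A x_t)^T for every t.
    residuals = [np.sum((x_next - x_prev @ A.T) ** 2) for A in candidates]
    return int(np.argmin(residuals))


def simulate(A: np.ndarray, steps: int, noise_std: float, rng) -> np.ndarray:
    """Roll out x_{t+1} = A x_t + noise_std * N(0, I) from a random initial state."""
    x = np.zeros((steps + 1, A.shape[0]))
    x[0] = rng.normal(size=A.shape[0])
    for t in range(steps):
        x[t + 1] = A @ x[t] + noise_std * rng.normal(size=A.shape[0])
    return x


rng = np.random.default_rng(0)
candidates = [
    np.array([[0.9, 0.1], [0.0, 0.8]]),
    np.array([[0.5, -0.3], [0.2, 0.7]]),
    np.array([[0.95, 0.0], [0.1, 0.6]]),
]
trajectory = simulate(candidates[1], steps=1000, noise_std=0.1, rng=rng)
print(identify_model(trajectory, candidates))  # expected to recover index 1
```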
[LINK]
http://arxiv.org/abs/2404.08120v1
[DATE]
2024-04-12 04:55:38+08:00
[CATEGORIES]
cs.LG
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
[AUTHORS]
Hao Qin, Kwang-Sung Jun, Chicheng Zhang
[ABSTRACT]
We study $K$-armed bandit problems where the reward distributions of the arms
are all supported on the $[0,1]$ interval. It has been a challenge to design
regret-efficient randomized exploration algorithms in this setting. Maillard
sampling \cite{maillard13apprentissage}, an attractive alternative to Thompson
sampling, has recently been shown to achieve competitive regret guarantees in
the sub-Gaussian reward setting \cite{bian2022maillard} while maintaining
closed-form action probabilities, which is useful for offline policy
evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling
(KL-MS) algorithm, a natural extension of Maillard sampling for achieving
KL-style gap-dependent regret bound. We show that KL-MS enjoys the asymptotic
optimality when the rewards are Bernoulli and has a worst-case regret bound of
the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the
expected reward of the optimal arm, and $T$ is the time horizon length.
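The appeal of Maillard-style sampling is that action probabilities have a closed form. Below is a sketch of KL-MS for Bernoulli rewards, assuming the rule $p_a \propto \exp(-N_a\,\mathrm{kl}(\hat\mu_a, \hat\mu_{\max}))$ with empirical means $\hat\mu_a$, pull counts $N_a$, and one initial pull of every arm; consult the paper for the exact algorithm and any tie-breaking details.

```python
import math
import random
from typing import List


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Ber(p) || Ber(q)), with clipping so log(0) never occurs."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def klms_probabilities(counts: List[int], means: List[float]) -> List[float]:
    """Closed-form action probabilities p_a proportional to exp(-N_a * kl(mu_a, mu_max))."""
    mu_max = max(means)
    logits = [-n * bernoulli_kl(m, mu_max) for n, m in zip(counts, means)]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def run_klms(true_means: List[float], horizon: int, seed: int = 0) -> List[int]:
    """Simulate KL-MS on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize the empirical means
        else:
            means = [s / n for s, n in zip(sums, counts)]
            arm = rng.choices(range(k), weights=klms_probabilities(counts, means))[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


print(klms_probabilities([10, 10], [0.5, 0.5]))  # [0.5, 0.5]
print(run_klms([0.2, 0.8], horizon=2000))        # most pulls go to the better arm
```

Because the probabilities are explicit, they can be logged at every step, which is what makes this family of samplers convenient for offline policy evaluation.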
[COMMENTS]
Accepted by NeurIPS 2023
[LINK]
http://arxiv.org/abs/2304.14989v4
[DATE]
2024-04-12 04:34:38+08:00
[CATEGORIES]
cs.LG
Protein intrinsic disorder prediction using Attention U-Net and ProtTrans protein language model
[AUTHORS]
Krzysztof Kotowski, Irena Roterman, Katarzyna Stapor
[ABSTRACT]
The prediction of intrinsic disorder regions has significant implications for
understanding protein function, structure, and dynamics. It can help to
discover novel functions or protein-protein interactions essential to designing
new drugs, therapies, or enzymes. Recently, a new generation of predictors
based on protein language models is emerging. These algorithms reach
state-of-the-art accuracy without calculating time-consuming multiple sequence
alignments (MSAs). The article presents a new protein intrinsic disorder
predictor DisorderUnetLM based on the Attention U-Net convolutional neural
network using features from the protein language model ProtTrans.
DisorderUnetLM shows top results in the direct comparison with flDPnn and
IDP-CRF predictors using MSAs and with the SETH predictor using features from
the same ProtTrans model. Moreover, among 41 predictors from the latest
Critical Assessment of Protein Intrinsic Disorder Prediction (CAID-2)
benchmark, it ranks 9th for the Disorder-PDB subset (with ROC-AUC of 0.924) and
1st for the Disorder-NOX subset (with ROC-AUC of 0.844) which confirms its
potential to perform well in the upcoming CAID-3 challenge for which
DisorderUnetLM was submitted.
[COMMENTS]
11 pages, 8 figures, 2 tables, submitted to Journal of Chemical
Information and Modeling
[LINK]
http://arxiv.org/abs/2404.08108v1
[DATE]
2024-04-12 04:14:14+08:00
[CATEGORIES]
cs.LG
Neural-Fly Enables Rapid Learning for Agile Flight in Strong Winds
[AUTHORS]
Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, Soon-Jo Chung
[ABSTRACT]
Executing safe and precise flight maneuvers in dynamic high-speed winds is
important for the ongoing commoditization of uninhabited aerial vehicles
(UAVs). However, because the relationship between various wind conditions and
their effect on aircraft maneuverability is not well understood, it is
challenging to design effective robot controllers using traditional control
design methods. We present Neural-Fly, a learning-based approach that allows
rapid online adaptation by incorporating pretrained representations through
deep learning. Neural-Fly builds on two key observations that aerodynamics in
different wind conditions share a common representation and that the
wind-specific part lies in a low-dimensional space. To that end, Neural-Fly
uses a proposed learning algorithm, domain adversarially invariant
meta-learning (DAIML), to learn the shared representation, only using 12
minutes of flight data. With the learned representation as a basis, Neural-Fly
then uses a composite adaptation law to update a set of linear coefficients for
mixing the basis elements. When evaluated under challenging wind conditions
generated with the Caltech Real Weather Wind Tunnel, with wind speeds up to
43.6 kilometers/hour (12.1 meters/second), Neural-Fly achieves precise flight
control with substantially smaller tracking error than state-of-the-art
nonlinear and adaptive controllers. In addition to strong empirical
performance, the exponential stability of Neural-Fly results in robustness
guarantees. Last, our control design extrapolates to unseen wind conditions, is
shown to be effective for outdoor flights with only onboard sensors, and can
transfer across drones with minimal performance degradation.
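Once a basis is learned, online adaptation only has to estimate the linear mixing coefficients, which is cheap. The sketch below uses plain recursive least squares (RLS) with a forgetting factor as a simpler stand-in for Neural-Fly's composite adaptation law, which additionally uses the tracking error; the feature map `phi` is a hypothetical fixed basis, not the DAIML-learned network, and the coefficients are made up.

```python
import numpy as np


class RLSAdapter:
    """Recursive least squares for y ≈ phi @ a, with exponential forgetting."""

    def __init__(self, dim: int, forgetting: float = 0.99, init_cov: float = 100.0):
        self.a = np.zeros(dim)           # linear coefficients being adapted online
        self.P = init_cov * np.eye(dim)  # inverse of the (weighted) information matrix
        self.lam = forgetting            # < 1 lets the estimate track changing conditions

    def update(self, phi: np.ndarray, y: float) -> np.ndarray:
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        self.a = self.a + gain * (y - phi @ self.a)
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.a


def phi(x: float) -> np.ndarray:
    """Hypothetical basis; in Neural-Fly this role is played by the learned representation."""
    return np.array([1.0, np.sin(x), np.cos(x)])


rng = np.random.default_rng(0)
true_a = np.array([0.5, -1.0, 2.0])  # made-up "wind-specific" coefficients
adapter = RLSAdapter(dim=3)
for x in rng.uniform(-3.0, 3.0, size=200):
    adapter.update(phi(x), phi(x) @ true_a)  # noise-free residual-force measurement
print(adapter.a.round(3))  # close to true_a
```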
[COMMENTS]
This is the accepted version of Science Robotics Vol. 7, Issue 66,
eabm6597 (2022). Video: https://youtu.be/TuF9teCZX0U
[LINK]
http://arxiv.org/abs/2205.06908v2
[DATE]
2024-04-12 03:32:21+08:00
[CATEGORIES]
cs.LG
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints
[AUTHORS]
Matthias Freiberger, Peter Kun, Christian Igel, Anders Sundnes Løvlie, Sebastian Risi
[ABSTRACT]
Models leveraging both visual and textual data such as Contrastive
Language-Image Pre-training (CLIP), are the backbone of many recent advances in
artificial intelligence. In this work, we show that despite their versatility,
such models are vulnerable to what we refer to as fooling master images.
Fooling master images are capable of maximizing the confidence score of a CLIP
model for a significant number of widely varying prompts, while being either
unrecognizable or unrelated to the attacked prompts for humans. The existence
of such images is problematic, as they could be used by bad actors to maliciously
interfere with CLIP-trained image retrieval models in production with
comparatively small effort, since a single image can attack many different prompts. We
demonstrate how fooling master images for CLIP (CLIPMasterPrints) can be mined
using stochastic gradient descent, projected gradient descent, or blackbox
optimization. Contrary to many common adversarial attacks, the blackbox
optimization approach allows us to mine CLIPMasterPrints even when the weights
of the model are not accessible. We investigate the properties of the mined
images, and find that images trained on a small number of image captions
generalize to a much larger number of semantically related captions. We
evaluate possible mitigation strategies, where we increase the robustness of
the model and introduce an approach to automatically detect CLIPMasterPrints to
sanitize the input of vulnerable models. Finally, we find that vulnerability to
CLIPMasterPrints is related to a modality gap in contrastive pre-trained
multi-modal networks. Code available at
https://github.com/matfrei/CLIPMasterPrints.
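The core optimization, finding one input that scores highly against many prompts at once, can be illustrated in a toy embedding space. The sketch assumes unit-norm "prompt embeddings" built around a shared direction (mimicking semantically related captions) and treats the mined vector itself as the image embedding, i.e. an identity image encoder. It runs projected gradient ascent on the mean cosine similarity; a real attack would backpropagate through CLIP's image encoder to pixels, which is not shown.

```python
import numpy as np


def mean_cosine(x: np.ndarray, prompts: np.ndarray) -> float:
    """Mean cosine similarity between x and each unit-norm row of `prompts`."""
    return float(np.mean(prompts @ x) / np.linalg.norm(x))


def mine_master_vector(prompts: np.ndarray, steps: int = 300, lr: float = 0.5,
                       seed: int = 0) -> np.ndarray:
    """Projected gradient ascent on the unit sphere for the mean cosine score."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=prompts.shape[1])
    x /= np.linalg.norm(x)
    m = prompts.mean(axis=0)
    for _ in range(steps):
        grad = m - (m @ x) * x  # gradient of (m . x) / ||x|| at ||x|| = 1
        x = x + lr * grad
        x /= np.linalg.norm(x)  # project back onto the unit sphere
    return x


rng = np.random.default_rng(1)
shared = rng.normal(size=32)                             # shared "semantic" direction
prompts = shared + 0.8 * rng.normal(size=(20, 32))       # 20 related toy prompts
prompts /= np.linalg.norm(prompts, axis=1, keepdims=True)
x = mine_master_vector(prompts)
# The mined score should match the optimum ||mean of prompts||.
print(round(mean_cosine(x, prompts), 4), round(float(np.linalg.norm(prompts.mean(axis=0))), 4))
```

In this toy the optimum is simply the normalized mean prompt embedding, which scores higher on average than any individual prompt embedding does.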
[LINK]
http://arxiv.org/abs/2307.03798v2
[DATE]
2024-04-12 03:24:50+08:00
[CATEGORIES]
cs.LG
Efficient Representation of Natural Image Patches
[AUTHORS]
Cheng Guo
[ABSTRACT]
Utilizing an abstract information processing model based on minimal yet
realistic assumptions inspired by biological systems, we study how to achieve
the early visual system’s two ultimate objectives: efficient information
transmission and accurate sensor probability distribution modeling. We prove
that optimizing for information transmission does not guarantee optimal
probability distribution modeling in general. We illustrate, using a two-pixel
(2D) system and image patches, that an efficient representation can be realized
through a nonlinear population code driven by two types of biologically
plausible loss functions that depend solely on output. After unsupervised
learning, our abstract information processing model bears remarkable
resemblances to biological systems, despite not mimicking many features of real
neurons, such as spiking activity. A preliminary comparison with a contemporary
deep learning model suggests that our model offers a significant efficiency
advantage. Our model provides novel insights into the computational theory of
early visual systems as well as a potential new approach to enhance the
efficiency of deep learning models.
[LINK]
http://arxiv.org/abs/2210.13004v3
[DATE]
2024-04-12 03:22:41+08:00
[CATEGORIES]
cs.LG
Continual Learning of Range-Dependent Transmission Loss for Underwater Acoustic using Conditional Convolutional Neural Net
[AUTHORS]
Indu Kant Deo, Akash Venkateshwaran, Rajeev K. Jaiman
[ABSTRACT]
There is a significant need for precise and reliable forecasting of the
far-field noise emanating from shipping vessels. Conventional full-order models
based on the Navier-Stokes equations are unsuitable, and sophisticated model
reduction methods may be ineffective for accurately predicting far-field noise
in environments with seamounts and significant variations in bathymetry. Recent
advances in reduced-order models, particularly those based on convolutional and
recurrent neural networks, offer a faster and more accurate alternative. These
models use convolutional neural networks to reduce data dimensions effectively.
However, current deep-learning models face challenges in predicting wave
propagation over long periods and for remote locations, often relying on
auto-regressive prediction and lacking far-field bathymetry information. This
research aims to improve the accuracy of deep-learning models for predicting
underwater radiated noise in far-field scenarios. We propose a novel
range-conditional convolutional neural network that incorporates ocean
bathymetry data into the input. By integrating this architecture into a
continual learning framework, we aim to generalize the model for varying
bathymetry worldwide. To demonstrate the effectiveness of our approach, we
analyze our model on several test cases and a benchmark scenario involving
far-field prediction over Dickin’s seamount in the Northeast Pacific. Our
proposed architecture effectively captures transmission loss over a
range-dependent, varying bathymetry profile. This architecture can be
integrated into an adaptive management system for underwater radiated noise,
providing real-time end-to-end mapping between near-field ship noise sources
and received noise at the marine mammal’s location.
[COMMENTS]
14 pages, 18 figures
[LINK]
http://arxiv.org/abs/2404.08091v1
[DATE]
2024-04-12 03:13:38+08:00
[CATEGORIES]
cs.LG
Efficient Duple Perturbation Robustness in Low-rank MDPs
[AUTHORS]
Yang Hu, Haitong Ma, Bo Dai, Na Li
[ABSTRACT]
The pursuit of robustness has recently been a popular topic in reinforcement
learning (RL) research, yet the existing methods generally suffer from
efficiency issues that obstruct their real-world implementation. In this paper,
we introduce duple perturbation robustness, i.e. perturbation on both the
feature and factor vectors for low-rank Markov decision processes (MDPs), via a
novel characterization of $(\xi,\eta)$-ambiguity sets. The novel robust MDP
formulation is compatible with the function representation view, and therefore,
is naturally applicable to practical RL problems with large or even continuous
state-action spaces. Meanwhile, it also gives rise to a provably efficient and
practical algorithm with theoretical convergence rate guarantee. Examples are
designed to justify the new robustness concept, and algorithmic efficiency is
supported by both theoretical bounds and numerical simulations.
[COMMENTS]
25 pages, 8 figures, in submission to ICML’24
[LINK]
http://arxiv.org/abs/2404.08089v1
[DATE]
2024-04-12 03:07:15+08:00
[CATEGORIES]
cs.LG
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
[AUTHORS]
Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar
[ABSTRACT]
Recent advances in decentralized deep learning algorithms have demonstrated
cutting-edge performance on various tasks with large pre-trained models.
However, achieving this level of competitiveness incurs significant
communication and computation overheads when updating these
models, which prohibit their application to real-world scenarios. To
address this issue, drawing inspiration from advanced model merging techniques
without requiring additional training, we introduce the Decentralized Iterative
Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning
framework. Within DIMAT, each agent is trained on their local data and
periodically merged with their neighboring agents using advanced model merging
techniques like activation matching until convergence is achieved. DIMAT
provably converges with the best available rate for nonconvex functions with
various first-order methods, while yielding tighter error bounds compared to
the popular existing approaches. We conduct a comprehensive empirical analysis
to validate DIMAT’s superiority over baselines across diverse computer vision
tasks sourced from multiple datasets. Empirical results validate our
theoretical claims by showing that DIMAT attains faster and higher initial gain
in accuracy with independent and identically distributed (IID) and non-IID
data, incurring lower communication overhead. This DIMAT paradigm presents a
new opportunity for future decentralized learning, enhancing its
adaptability to real-world settings with sparse and light-weight communication and
computation.
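The train-locally-then-merge loop can be sketched with toy quadratic objectives. Here each agent $i$ minimizes $\tfrac12\|w - c_i\|^2$, agents sit on a ring (at least three of them), and "merging" is plain weighted parameter averaging with a doubly stochastic mixing matrix. This is an assumption standing in for the activation-matching merge the abstract mentions, which aligns neurons before averaging.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic matrix: each agent averages itself and its two ring neighbors (n >= 3)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def train_and_merge(targets: np.ndarray, rounds: int = 300, local_steps: int = 1,
                    lr: float = 0.1) -> np.ndarray:
    """Alternate local gradient steps on 0.5 * ||w - c_i||^2 with neighbor averaging."""
    n, d = targets.shape
    W = ring_mixing_matrix(n)
    params = np.zeros((n, d))
    for _ in range(rounds):
        for _ in range(local_steps):
            params = params - lr * (params - targets)  # each agent trains on local data
        params = W @ params                            # merge with neighboring agents
    return params


rng = np.random.default_rng(0)
targets = rng.normal(size=(4, 3))  # each agent's local optimum c_i
final = train_and_merge(targets)
# Near 0: the agents' average reaches the global optimum, the mean of the c_i.
print(np.abs(final.mean(axis=0) - targets.mean(axis=0)).max())
```

Because the mixing matrix is doubly stochastic, averaging preserves the network-wide mean, so the mean of the agents converges to the minimizer of the summed objective while individual agents stay close to it.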
[COMMENTS]
CVPR 2024 accepted paper, 22 pages, 12 figures
[LINK]
http://arxiv.org/abs/2404.08079v1
[DATE]
2024-04-12 02:34:29+08:00
[CATEGORIES]
cs.LG
Spurious Stationarity and Hardness Results for Mirror Descent
[AUTHORS]
He Chen, Jiajin Li, Anthony Man-Cho So
[ABSTRACT]
Despite the considerable success of Bregman proximal-type algorithms, such as
mirror descent, in machine learning, a critical question remains: Can existing
stationarity measures, often based on Bregman divergence, reliably distinguish
between stationary and non-stationary points? In this paper, we present a
groundbreaking finding: All existing stationarity measures necessarily imply
the existence of spurious stationary points. We further establish an
algorithm-independent hardness result: Bregman proximal-type algorithms are
unable to escape from a spurious stationary point in finite steps when the
initial point is unfavorable, even for convex problems. Our hardness result
points out the inherent distinction between Euclidean and Bregman geometries,
and introduces fundamental theoretical and numerical challenges to both
machine learning and optimization communities.
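A familiar special case of a Bregman proximal method is mirror descent with the negative-entropy mirror map on the probability simplex (the exponentiated-gradient update). The sketch below is not the paper's construction; it only illustrates how an unfavorable initialization can trap such a method on a convex problem. The multiplicative update can never revive a coordinate that starts at zero, so the non-optimal vertex [0, 1, 0] is a fixed point of the iteration.

```python
import numpy as np


def entropic_mirror_descent(grad, x0, step=0.1, iters=500):
    """Mirror descent with the negative-entropy mirror map on the simplex.

    Each step is the exponentiated-gradient update x <- x * exp(-step * grad(x)),
    followed by renormalization (the KL/Bregman projection onto the simplex).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))
        x = x / x.sum()
    return x


c = np.array([0.0, 1.0, 2.0])


def grad(x):
    """Gradient of the convex (linear) objective f(x) = <c, x>, minimized at e_0."""
    return c


print(entropic_mirror_descent(grad, [1 / 3, 1 / 3, 1 / 3]).round(3))  # approaches [1, 0, 0]
print(entropic_mirror_descent(grad, [0.0, 0.5, 0.5]).round(3))        # stuck at [0, 1, 0]
```

From the interior the iterates reach the minimizer, but starting on the face where the first coordinate is zero they converge to the best point of that face instead, no matter how many iterations are run.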
[LINK]
http://arxiv.org/abs/2404.08073v1
[DATE]
2024-04-12 02:28:01+08:00
[CATEGORIES]
cs.LG
Persistent Classification: A New Approach to Stability of Data and Adversarial Examples
[AUTHORS]
Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore
[ABSTRACT]
There are a number of hypotheses underlying the existence of adversarial
examples for classification problems. These include the high-dimensionality of
the data, high codimension in the ambient space of the data manifolds of
interest, and that the structure of machine learning models may encourage
classifiers to develop decision boundaries close to data points. This article
proposes a new framework for studying adversarial examples that does not depend
directly on the distance to the decision boundary. Similarly to the smoothed
classifier literature, we define a (natural or adversarial) data point to be
$(\gamma,\sigma)$-stable if the probability of the same classification is at
least $\gamma$ for points sampled in a Gaussian neighborhood of the point with
a given standard deviation $\sigma$. We focus on studying the differences
between persistence metrics along interpolants of natural and adversarial
points. We show that adversarial examples have significantly lower persistence
than natural examples for large neural networks in the context of the MNIST and
ImageNet datasets. We connect this lack of persistence with decision boundary
geometry by measuring angles of interpolants with respect to decision
boundaries. Finally, we connect this approach with robustness by developing a
manifold alignment gradient metric and demonstrating the increase in robustness
that can be achieved when training with the addition of this metric.
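The $(\gamma,\sigma)$-stability definition translates directly into a Monte Carlo test: sample Gaussian perturbations with standard deviation $\sigma$ around the point and check whether the fraction that keeps the original label is at least $\gamma$. The sketch uses a toy linear classifier and a fixed sample size; both are illustrative assumptions, and the estimate only approximates the true probability.

```python
import numpy as np


def stability_probability(classify, x, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of P[classify(x + sigma * z) == classify(x)], z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    label = classify(x[None, :])[0]
    noisy = x[None, :] + sigma * rng.normal(size=(n_samples, x.size))
    return float(np.mean(classify(noisy) == label))


def is_stable(classify, x, gamma, sigma, **kwargs):
    """(gamma, sigma)-stability check based on the Monte Carlo estimate above."""
    return stability_probability(classify, x, sigma, **kwargs) >= gamma


w = np.array([1.0, 0.0])


def classify(X):
    """Toy linear classifier: label 1 if w . x > 0, else 0."""
    return (X @ w > 0).astype(int)


print(stability_probability(classify, np.array([3.0, 0.0]), sigma=0.5))   # far from the boundary
print(stability_probability(classify, np.array([0.05, 0.0]), sigma=0.5))  # near the boundary
```

Evaluating the same estimate at points along an interpolant between a natural and an adversarial example gives a simple way to compare their persistence.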
[LINK]
http://arxiv.org/abs/2404.08069v1
[DATE]
2024-04-12 02:13:42+08:00
[CATEGORIES]
cs.LG
WildGraph: Realistic Graph-based Trajectory Generation for Wildlife
[AUTHORS]
Ali Al-Lawati, Elsayed Eshra, Prasenjit Mitra
[ABSTRACT]
Trajectory generation is an important task in movement studies; it
circumvents the privacy, ethical, and technical challenges of collecting real
trajectories from the target population. In particular, real trajectories in
the wildlife domain are scarce as a result of ethical and environmental
constraints of the collection process. In this paper, we consider the problem
of generating long-horizon trajectories, akin to wildlife migration, based on a
small set of real samples. We propose a hierarchical approach to learn the
global movement characteristics of the real dataset and recursively refine
localized regions. Our solution, WildGraph, discretizes the geographic path
into a prototype network of H3 (https://www.uber.com/blog/h3/) regions and
leverages a recurrent variational auto-encoder to probabilistically generate
paths over the regions, based on occupancy. WildGraph successfully generates
realistic months-long trajectories using a sample size as small as 60.
Experiments performed on two wildlife migration datasets demonstrate that our
proposed method improves the generalization of the generated trajectories in
comparison to existing work while achieving superior or comparable performance
in several benchmark metrics. Our code is published on the following
repository: https://github.com/aliwister/wildgraph.
[LINK]
http://arxiv.org/abs/2404.08068v1
[DATE]
2024-04-12 02:13:21+08:00
[CATEGORIES]
cs.LG
Multi-scale Topology Optimization using Neural Networks
[AUTHORS]
Hongrui Chen, Xingchen Liu, Levent Burak Kara
[ABSTRACT]
A long-standing challenge is designing multi-scale structures with good
connectivity between cells while optimizing each cell to reach close to the
theoretical performance limit. We propose a new method for direct multi-scale
topology optimization using neural networks. Our approach focuses on inverse
homogenization that seamlessly maintains compatibility across neighboring
microstructure cells. Our approach consists of a topology neural network that
optimizes the microstructure shape and distribution across the design domain as
a continuous field. Each microstructure cell is optimized based on a specified
elasticity tensor that also accommodates in-plane rotations. The neural network
takes as input the local coordinates within a cell to represent the density
distribution within a cell, as well as the global coordinates of each cell to
design spatially varying microstructure cells. As such, our approach models an
n-dimensional multi-scale optimization problem as a 2n-dimensional inverse
homogenization problem using neural networks. During the inverse homogenization
of each unit cell, we extend the boundary of each cell by scaling the input
coordinates such that the boundaries of neighboring cells are combined. Inverse
homogenization on the combined cell improves connectivity. We demonstrate our
method through the design and optimization of graded multi-scale structures.
[LINK]
http://arxiv.org/abs/2404.08708v1
[DATE]
2024-04-12 02:00:22+08:00
[CATEGORIES]
cs.LG
Latent Guard: a Safety Framework for Text-to-image Generation
[AUTHORS]
Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati
[ABSTRACT]
With the ability to generate high-quality images, text-to-image (T2I) models
can be exploited for creating inappropriate content. To prevent misuse,
existing safety measures are either based on text blacklists, which can be
easily circumvented, or harmful content classification, requiring large
datasets for training and offering low flexibility. Hence, we propose Latent
Guard, a framework designed to improve safety measures in text-to-image
generation. Inspired by blacklist-based approaches, Latent Guard learns a
latent space on top of the T2I model’s text encoder, where it is possible to
check the presence of harmful concepts in the input text embeddings. Our
proposed framework is composed of a data generation pipeline specific to the
task using large language models, ad-hoc architectural components, and a
contrastive learning strategy to benefit from the generated data. The
effectiveness of our method is verified on three datasets and against four
baselines. Code and data will be shared at
https://github.com/rt219/LatentGuard.
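At inference time, a blacklist check in an embedding space can be as simple as a nearest-concept cosine-similarity test. In this sketch the prompt and concept embeddings are made-up vectors and `threshold` is a hypothetical parameter. Latent Guard's contribution is learning the latent space in which such a check works, which is not reproduced here.

```python
import numpy as np


def flag_prompt(prompt_emb, concept_embs, threshold=0.8):
    """Return (flagged, index of the most similar blacklisted concept)."""
    p = prompt_emb / np.linalg.norm(prompt_emb)
    concepts = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = concepts @ p  # cosine similarity to each blacklisted concept
    best = int(np.argmax(sims))
    return bool(sims[best] >= threshold), best


# Hypothetical 3-d embeddings for two blacklisted concepts.
concept_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(flag_prompt(np.array([0.9, 0.1, 0.0]), concept_embs))  # (True, 0)
print(flag_prompt(np.array([0.0, 0.0, 1.0]), concept_embs))  # (False, 0)
```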
[COMMENTS]
under review
[LINK]
http://arxiv.org/abs/2404.08031v1
[DATE]
2024-04-12 01:59:52+08:00
[CATEGORIES]
cs.LG
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
[AUTHORS]
Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen
[ABSTRACT]
To enhance the controllability of text-to-image diffusion models, existing
efforts like ControlNet incorporated image-based conditional controls. In this
paper, we reveal that existing methods still face significant challenges in
generating images that align with the image conditional controls. To this end,
we propose ControlNet++, a novel approach that improves controllable generation
by explicitly optimizing pixel-level cycle consistency between generated images
and conditional controls. Specifically, for an input conditional control, we
use a pre-trained discriminative reward model to extract the corresponding
condition of the generated images, and then optimize the consistency loss
between the input conditional control and extracted condition. A
straightforward implementation would be generating images from random noise
and then calculating the consistency loss, but such an approach requires
storing gradients for multiple sampling timesteps, leading to considerable time
and memory costs. To address this, we introduce an efficient reward strategy
that deliberately disturbs the input images by adding noise, and then uses the
single-step denoised images for reward fine-tuning. This avoids the extensive
costs associated with image sampling, allowing for more efficient reward
fine-tuning. Extensive experiments show that ControlNet++ significantly
improves controllability under various conditional controls. For example, it
achieves improvements over ControlNet by 7.9% mIoU, 13.4% SSIM, and 7.6% RMSE,
respectively, for segmentation mask, line-art edge, and depth conditions.
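The efficient reward strategy rests on a standard diffusion identity: if an image is noised with the forward process, a single denoising step recovers an estimate of the clean image directly from the predicted noise, so no multi-step sampling chain has to be stored. The sketch assumes the DDPM epsilon-parameterization, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. In practice `eps_pred` would come from the diffusion model, and `reward_model` (here a made-up, non-differentiable threshold) stands in for the pre-trained discriminative reward model, so the loss is only illustrative.

```python
import numpy as np


def add_noise(x0: np.ndarray, eps: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def single_step_x0(x_t: np.ndarray, eps_pred: np.ndarray, alpha_bar: float) -> np.ndarray:
    """One-step estimate of the clean image by inverting the forward process."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)


def consistency_loss(reward_model, x0_hat: np.ndarray, condition: np.ndarray) -> float:
    """Pixel-level MSE between the condition extracted from x0_hat and the input condition."""
    return float(np.mean((reward_model(x0_hat) - condition) ** 2))


rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))
condition = (image > 0.5).astype(float)  # toy "segmentation mask" control


def reward_model(img):
    """Hypothetical condition extractor: thresholds the image into a mask."""
    return (img > 0.5).astype(float)


eps = rng.normal(size=image.shape)
x_t = add_noise(image, eps, alpha_bar=0.7)
x0_hat = single_step_x0(x_t, eps, alpha_bar=0.7)  # perfect noise prediction, for illustration
print(consistency_loss(reward_model, x0_hat, condition))
```

With an imperfect noise prediction, `x0_hat` deviates from the input image and the consistency loss becomes the signal used for reward fine-tuning.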
[COMMENTS]
Project Page: https://liming-ai.github.io/ControlNet_Plus_Plus
[LINK]
http://arxiv.org/abs/2404.07987v1
[DATE]
2024-04-12 01:59:09+08:00
[CATEGORIES]
cs.LG
Disguised Copyright Infringement of Latent Diffusion Models
[AUTHORS]
Yiwei Lu, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu
[ABSTRACT]
Copyright infringement may occur when a generative model produces samples
substantially similar to some copyrighted data that it had access to during the
training phase. The notion of access usually refers to including copyrighted
samples directly in the training dataset, which one may inspect to identify an
infringement. We argue that such visual auditing largely overlooks a concealed
copyright infringement, where one constructs a disguise that looks drastically
different from the copyrighted sample yet still induces the effect of training
Latent Diffusion Models on it. Such disguises only require indirect access to
the copyrighted material and cannot be visually distinguished, thus easily
circumventing the current auditing tools. In this paper, we provide a better
understanding of such disguised copyright infringement by uncovering the
disguise generation algorithm, the revelation of the disguises, and
importantly, how to detect them to augment the existing toolbox. Additionally,
we introduce a broader notion of acknowledgment for comprehending such indirect
access.
[LINK]
http://arxiv.org/abs/2404.06737v2
[DATE]
2024-04-12 01:54:13+08:00
[CATEGORIES]
cs.LG
Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation for Efficient Synthesis and Verification
[AUTHORS]
Lujie Yang, Hongkai Dai, Zhouxing Shi, Cho-Jui Hsieh, Russ Tedrake, Huan Zhang
[ABSTRACT]
Learning-based neural network (NN) control policies have shown impressive
empirical performance in a wide range of tasks in robotics and control.
However, formal (Lyapunov) stability guarantees over the region-of-attraction
(ROA) for NN controllers with nonlinear dynamical systems are challenging to
obtain, and most existing approaches rely on expensive solvers such as
sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability
modulo theories (SMT). In this paper, we demonstrate a new framework for
learning NN controllers together with Lyapunov certificates using fast
empirical falsification and strategic regularizations. We propose a novel
formulation that defines a larger verifiable region-of-attraction (ROA) than
shown in the literature, and refines the conventional restrictive constraints
on Lyapunov derivatives to focus only on certifiable ROAs. The Lyapunov
condition is rigorously verified post-hoc using branch-and-bound with scalable
linear bound propagation-based NN verification techniques. The approach is
efficient and flexible, and the full training and verification procedure is
accelerated on GPUs without relying on expensive solvers for SOS, MIP, or SMT.
The flexibility and efficiency of our framework allow us to demonstrate
Lyapunov-stable output feedback control with synthesized NN-based controllers
and NN-based observers with formal stability guarantees, for the first time in
the literature. Source code at
https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers.
[LINK]
http://arxiv.org/abs/2404.07956v1
[DATE]
2024-04-12 01:49:15+08:00
[CATEGORIES]
cs.LG
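The Lyapunov-stable control abstract above pairs training with "fast empirical falsification", i.e. searching for states that violate the Lyapunov conditions. Below is a minimal sampling-based sketch, assuming a discrete-time linear system x_{k+1} = A x_k and a quadratic candidate V(x) = xᵀPx in place of the paper's neural controller and neural certificate. Sampling can only find counterexamples, never prove their absence; the abstract states that formal verification is done post-hoc with branch-and-bound and linear bound propagation.

```python
import random


def quad_form(P, x):
    """V(x) = x^T P x."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))


def mat_vec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def falsify(A, P, radius=1.0, num_samples=10_000, seed=0):
    """Return sampled states violating V(x) > 0 or V(Ax) - V(x) < 0."""
    rng = random.Random(seed)
    n = len(A)
    counterexamples = []
    for _ in range(num_samples):
        x = [rng.uniform(-radius, radius) for _ in range(n)]
        if all(abs(xi) < 1e-9 for xi in x):
            continue  # the equilibrium itself is exempt from the strict conditions
        v = quad_form(P, x)
        dv = quad_form(P, mat_vec(A, x)) - v
        if v <= 0 or dv >= 0:
            counterexamples.append(x)
    return counterexamples
```

In a training loop, counterexamples like these would be fed back as extra penalty terms; here the function simply reports them.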
Neural population geometry and optimal coding of tasks with shared latent structure
[AUTHORS]
Albert J. Wakhloo, Will Slatton, SueYeon Chung
[ABSTRACT]
Humans and animals can recognize latent structures in their environment and
apply this information to efficiently navigate the world. However, it remains
unclear what aspects of neural activity contribute to these computational
capabilities. Here, we develop an analytical theory linking the geometry of a
neural population’s activity to the generalization performance of a linear
readout on a set of tasks that depend on a common latent structure. We show
that four geometric measures of the activity determine performance across
tasks. Using this theory, we find that experimentally observed disentangled
representations naturally emerge as an optimal solution to the multi-task
learning problem. When data is scarce, these optimal neural codes compress less
informative latent variables, and when data is abundant, they expand these
variables in the state space. We validate our theory using macaque ventral
stream recordings. Our results therefore tie population geometry to multi-task
learning.
[COMMENTS]
26 Pages and 7 figures in main text. 20 Pages and 7 figures in
supplemental material
[LINK]
http://arxiv.org/abs/2402.16770v2
[DATE]
2024-04-12 01:40:57+08:00
[CATEGORIES]
cs.LG
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
[AUTHORS]
Zhengdao Chen
[ABSTRACT]
To characterize the function space explored by neural networks (NNs) is an
important aspect of learning theory. In this work, noticing that a multi-layer
NN implicitly generates a hierarchy of reproducing kernel Hilbert spaces
(RKHSs) - named a neural Hilbert ladder (NHL) - we define the function space as
an infinite union of RKHSs, which generalizes the existing Barron space theory
of two-layer NNs. We then establish several theoretical properties of the new
space. First, we prove a correspondence between functions expressed by L-layer
NNs and those belonging to L-level NHLs. Second, we prove generalization
guarantees for learning an NHL with a controlled complexity measure. Third, we
derive a non-Markovian dynamics of random fields that governs the evolution of
the NHL which is induced by the training of multi-layer NNs in an
infinite-width mean-field limit. Fourth, we show examples of depth separation
in NHLs under the ReLU activation function. Finally, we perform numerical
experiments to illustrate the feature learning aspect of NN training through
the lens of NHLs.
[COMMENTS]
65 pages, 3 figures. Published by the Journal of Machine Learning
Research and presented partially at the 40th International Conference on
Machine Learning (ICML 2023)
[LINK]
http://arxiv.org/abs/2307.01177v2
[DATE]
2024-04-12 01:23:42+08:00
[CATEGORIES]
cs.LG
Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
[AUTHORS]
Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji
[ABSTRACT]
Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for
multi-level communication networks. In H-SGD, before each global aggregation,
workers send their updated local models to local servers for aggregations.
Despite recent research efforts, the effect of local aggregation on global
convergence still lacks theoretical understanding. In this work, we first
introduce a new notion of “upward” and “downward” divergences. We then use it
to conduct a novel analysis to obtain a worst-case convergence upper bound for
two-level H-SGD with non-IID data, non-convex objective function, and
stochastic gradient. By extending this result to the case with random grouping,
we observe that this convergence upper bound of H-SGD is between the upper
bounds of two single-level local SGD settings, with the number of local
iterations equal to the local and global update periods in H-SGD, respectively.
We refer to this as the “sandwich behavior”. Furthermore, we extend our
analytical approach based on “upward” and “downward” divergences to study the
convergence for the general case of H-SGD with more than two levels, where the
“sandwich behavior” still holds. Our theoretical results provide key insights
into why local aggregation can improve the convergence of
H-SGD.
[COMMENTS]
36 pages, in AAAI 2022
[LINK]
http://arxiv.org/abs/2010.12998v4
[DATE]
2024-04-12 01:05:18+08:00
[CATEGORIES]
cs.LG
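The H-SGD abstract above describes workers that periodically send their models to local servers for aggregation, with less frequent global aggregation. Below is a minimal sketch of that two-level schedule on toy scalar quadratic losses f_i(w) = 0.5 (w − c_i)², where distinct optima c_i stand in for non-IID data. Deterministic gradients replace stochastic ones for clarity, and `local_period`/`global_period` are illustrative names; `global_period` is assumed to be a multiple of `local_period`.

```python
def h_sgd(targets, groups, lr=0.1, local_period=2, global_period=10, total_steps=200):
    """Two-level hierarchical SGD on toy losses f_i(w) = 0.5 * (w - c_i)^2.

    targets: per-worker optimum c_i.
    groups: one list of worker indices per local server.
    """
    models = [0.0 for _ in targets]
    for step in range(1, total_steps + 1):
        # Local gradient step on every worker.
        models = [w - lr * (w - c) for w, c in zip(models, targets)]
        if step % global_period == 0:
            # Global aggregation: average across all workers.
            avg = sum(models) / len(models)
            models = [avg] * len(models)
        elif step % local_period == 0:
            # Local aggregation: each local server averages only its own group.
            for group in groups:
                g_avg = sum(models[i] for i in group) / len(group)
                for i in group:
                    models[i] = g_avg
    return models
```

For these quadratics, every averaging step preserves the mean model, so the run converges to the mean of the c_i; the interesting "sandwich" behavior in the abstract concerns how fast stochastic, non-convex runs get there.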
Low-rank Adaptation for Spatio-Temporal Forecasting
[AUTHORS]
Weilin Ruan, Wei Chen, Xilin Dang, Jianxiang Zhou, Weichuang Li, Xu Liu, Yuxuan Liang
[ABSTRACT]
Spatio-temporal forecasting is crucial in real-world dynamic systems,
predicting future changes using historical data from diverse locations.
Existing methods often prioritize the development of intricate neural networks
to capture the complex dependencies of the data, yet their accuracy fails to
show sustained improvement. Besides, these methods also overlook node
heterogeneity, hindering customized prediction modules from handling diverse
regional nodes effectively. In this paper, our goal is not to propose a new
model but to present a novel low-rank adaptation framework as an off-the-shelf
plugin for existing spatial-temporal prediction models, termed ST-LoRA, which
alleviates the aforementioned problems through node-level adjustments.
Specifically, we first tailor a node adaptive low-rank layer comprising
multiple trainable low-rank matrices. Additionally, we devise a multi-layer
residual fusion stacking module, injecting the low-rank adapters into predictor
modules of various models. Across six real-world traffic datasets and six
different types of spatio-temporal prediction models, our approach minimally
increases the parameters and training time of the original models by less than
4% while still achieving consistent and sustained performance gains.
[LINK]
http://arxiv.org/abs/2404.07919v1
[DATE]
2024-04-12 01:04:55+08:00
[CATEGORIES]
cs.LG
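The ST-LoRA abstract above describes a node-adaptive low-rank layer built from multiple trainable low-rank matrices. Below is a minimal sketch of one plausible form, assuming each node n gets its own pair (A_n, B_n) applied as a residual correction to that node's hidden features, with B_n zero-initialized as in standard LoRA. The class name is hypothetical, and the paper's multi-layer residual fusion stacking module is not reproduced.

```python
import random


class NodeAdaptiveLowRank:
    """Per-node low-rank residual adapter: y_n = h_n + B_n (A_n h_n)."""

    def __init__(self, num_nodes, dim, rank, seed=0):
        rng = random.Random(seed)
        # A_n: rank x dim with small random values; B_n: dim x rank of zeros,
        # so the adapter starts as an identity map on the base features.
        self.A = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(rank)]
            for _ in range(num_nodes)
        ]
        self.B = [[[0.0] * rank for _ in range(dim)] for _ in range(num_nodes)]

    def __call__(self, hidden):
        """hidden: one length-`dim` feature vector per node from the base predictor."""
        out = []
        for n, h in enumerate(hidden):
            z = [sum(a * x for a, x in zip(row, h)) for row in self.A[n]]  # down-project
            delta = [sum(b * zi for b, zi in zip(row, z)) for row in self.B[n]]  # up-project
            out.append([x + d for x, d in zip(h, delta)])
        return out

    def num_params(self):
        """Extra trainable parameters added on top of the base model."""
        return sum(len(a) * len(a[0]) + len(b) * len(b[0]) for a, b in zip(self.A, self.B))
```

The parameter overhead is 2 · dim · rank per node, which stays small relative to the predictor when rank is much smaller than dim.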
A Multi-Expert Large Language Model Architecture for Verilog Code Generation
[AUTHORS]
Bardia Nadimi, Hao Zheng
[ABSTRACT]
Recently, there has been surging interest in using large language models
(LLMs) for Verilog code generation. However, the existing approaches are
limited in terms of the quality of the generated Verilog code. To address such
limitations, this paper introduces an innovative multi-expert LLM architecture
for Verilog code generation (MEV-LLM). Our architecture uniquely integrates
multiple LLMs, each specifically fine-tuned with a dataset that is categorized
with respect to a distinct level of design complexity. It allows more targeted
learning, directly addressing the nuances of generating Verilog code for each
category. Empirical evidence from experiments highlights notable improvements
in terms of the percentage of generated Verilog outputs that are syntactically
and functionally correct. These findings underscore the efficacy of our
approach, promising a forward leap in the field of automated hardware design
through machine learning.
[LINK]
http://arxiv.org/abs/2404.08029v1
[DATE]
2024-04-12 00:58:29+08:00
[CATEGORIES]
cs.LG
KTbench: A Novel Data Leakage-Free Framework for Knowledge Tracing
[AUTHORS]
Yahya Badran, Christine Preisach
[ABSTRACT]
Knowledge Tracing (KT) is concerned with predicting students’ future
performance on learning items in intelligent tutoring systems. Learning items
are tagged with skill labels called knowledge concepts (KCs). Many KT models
expand the sequence of item-student interactions into KC-student interactions
by replacing learning items with their constituting KCs. This often results in
a longer sequence length. This approach addresses the issue of sparse
item-student interactions and minimises model parameters. However, two problems
have been identified with such models.
The first problem is the model’s ability to learn correlations between KCs
belonging to the same item, which can result in the leakage of ground truth
labels and hinder performance. This problem can lead to a significant decrease
in performance on datasets with a higher number of KCs per item. The second
problem is that the available benchmark implementations fail to account for
changes in sequence length when expanding KCs, leading to different models
being tested with varying sequence lengths but still compared against the same
benchmark.
To address these problems, we introduce a general masking framework that
mitigates the first problem and enhances the performance of such KT models
while preserving the original model architecture without significant
alterations. Additionally, we introduce KTbench, an open-source benchmark
library designed to ensure the reproducibility of this work while mitigating
the second problem.
[COMMENTS]
preprint
[LINK]
http://arxiv.org/abs/2403.15304v2
[DATE]
2024-04-12 00:39:54+08:00
[CATEGORIES]
cs.LG
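The KTbench abstract above explains that expanding items into their knowledge concepts (KCs) lengthens sequences and lets a model see labels of sibling KCs from the same item. Below is a minimal sketch of the expansion and of one attention mask that blocks such leakage. The abstract does not specify the paper's masking framework, so the mask rule and all function names here are assumptions for illustration.

```python
def expand_to_kcs(item_seq, item_to_kcs):
    """Replace each item with its KCs; also record which item each KC came from."""
    kcs, owner = [], []
    for pos, item in enumerate(item_seq):
        for kc in item_to_kcs[item]:
            kcs.append(kc)
            owner.append(pos)
    return kcs, owner


def leakage_free_mask(owner):
    """mask[i][j] is True if expanded position i may attend to position j.

    Causal (j < i), and additionally blocks earlier KCs of the same item,
    because their response label is identical to position i's label.
    """
    n = len(owner)
    return [[j < i and owner[j] != owner[i] for j in range(n)] for i in range(n)]
```

Note that two items expand to three positions here, which is the sequence-length mismatch the abstract says benchmark implementations ignore.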
Anomaly Detection in Power Grids via Context-Agnostic Learning
[AUTHORS]
SangWoo Park, Amritanshu Pandey
[ABSTRACT]
An important tool grid operators use to safeguard against failures, whether
naturally occurring or malicious, involves detecting anomalies in the power
system SCADA data. In this paper, we aim to solve a real-time anomaly detection
problem. Given time-series measurement values coming from a fixed set of
sensors on the grid, can we identify anomalies in the network topology or
measurement data? Existing methods, primarily optimization-based, mostly use
only a single snapshot of the measurement values and do not scale well with the
network size. Recent data-driven ML techniques have shown promise by using a
combination of current and historical data for anomaly detection but generally
do not consider physical attributes like the impact of topology or
load/generation changes on sensor measurements and thus cannot accommodate
regular context-variability in the historical data. To address this gap, we
propose a novel context-aware anomaly detection algorithm, GridCAL, that
considers the effect of regular topology and load/generation changes. This
algorithm converts the real-time power flow measurements to context-agnostic
values, which allows us to analyze measurements coming from different grid
contexts in an aggregate fashion, enabling us to derive a unified statistical
model that becomes the basis of anomaly detection. Through numerical
simulations on networks up to 2383 nodes, we show that our approach is
accurate, outperforming state-of-the-art approaches, and is computationally
efficient.
[LINK]
http://arxiv.org/abs/2404.07898v1
[DATE]
2024-04-12 00:37:01+08:00
[CATEGORIES]
cs.LG
SE(3)-Stochastic Flow Matching for Protein Backbone Generation
[AUTHORS]
Avishek Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet, Kilian Fatras, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong
[ABSTRACT]
The computational design of novel protein structures has the potential to
impact numerous scientific disciplines greatly. Toward this goal, we introduce
FoldFlow, a series of novel generative models of increasing modeling power
based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions – i.e.
the group $\text{SE}(3)$ – enabling accurate modeling of protein backbones. We
first introduce FoldFlow-Base, a simulation-free approach to learning
deterministic continuous-time dynamics and matching invariant target
distributions on $\text{SE}(3)$. We next accelerate training by incorporating
Riemannian optimal transport to create FoldFlow-OT, leading to the construction
of simpler and more stable flows. Finally, we design FoldFlow-SFM, coupling
both Riemannian OT and simulation-free training to learn stochastic
continuous-time dynamics over $\text{SE}(3)$. Our family of FoldFlow
generative models offers several key advantages over previous approaches to the
generative modeling of proteins: they are more stable and faster to train than
diffusion-based approaches, and our models enjoy the ability to map any
invariant source distribution to any invariant target distribution over
$\text{SE}(3)$. Empirically, we validate FoldFlow on protein backbone
generation of up to $300$ amino acids, yielding high-quality, designable,
diverse, and novel samples.
[COMMENTS]
ICLR 2024 Spotlight
[LINK]
http://arxiv.org/abs/2310.02391v4
[DATE]
2024-04-12 00:29:12+08:00
[CATEGORIES]
cs.LG
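The FoldFlow abstract above builds on simulation-free flow matching. Below is a minimal sketch of that idea for the translation (ℝ³) part of a rigid motion only, assuming a straight-line interpolation path and squared-error regression onto its velocity. The rotation part of SE(3) needs geodesic interpolation on the rotation group and is not shown; function names are illustrative, not FoldFlow's API.

```python
import random


def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) * x0 + t * x1 in R^3."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path, constant in t: u = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred, x0, x1):
    """Squared error between a predicted velocity and the path's velocity."""
    return sum((p - u) ** 2 for p, u in zip(v_pred, target_velocity(x0, x1)))


def sample_training_pair(x1, rng):
    """Simulation-free training input: noise x0, time t, and the point x_t."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    return x0, t, interpolate(x0, x1, t)


def euler_integrate(x0, velocity_fn, steps=100):
    """Generate a sample by integrating a velocity field from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Training never integrates the ODE; only generation does, which is what "simulation-free" refers to.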
FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification
[AUTHORS]
Faisal Ahmed, Myungjin Lee, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin
[ABSTRACT]
Federated Learning (FL) has garnered significant interest recently due to its
potential as an effective solution for tackling many challenges in diverse
application scenarios, for example, data privacy in network edge traffic
classification. Despite its recognized advantages, FL encounters obstacles
linked to statistical data heterogeneity and labeled data scarcity during the
training of single-task models for machine learning-based traffic
classification, leading to hindered learning performance. In response to these
challenges, adopting a hard-parameter sharing multi-task learning model with
auxiliary tasks proves to be a suitable approach. Such a model has the
capability to reduce communication and computation costs, navigate statistical
complexities inherent in FL contexts, and overcome labeled data scarcity by
leveraging knowledge derived from interconnected auxiliary tasks. This paper
introduces a new framework for federated auxiliary hard-parameter sharing
multi-task learning, namely, FedAuxHMTL. The introduced framework incorporates
model parameter exchanges between edge server and base stations, enabling base
stations from distributed areas to participate in the FedAuxHMTL process and
enhance the learning performance of the main task, network edge traffic
classification. Empirical experiments are conducted to validate and demonstrate
FedAuxHMTL’s effectiveness in terms of accuracy, total global loss,
communication costs, computing time, and energy consumption compared to its
counterparts.
[LINK]
http://arxiv.org/abs/2404.08028v1
[DATE]
2024-04-12 00:23:28+08:00
[CATEGORIES]
cs.LG
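The FedAuxHMTL abstract above mentions model parameter exchanges between an edge server and base stations for a hard-parameter-sharing multi-task model. The abstract does not say which parameters are exchanged or how they are combined. Below is a minimal sketch using plain sample-size-weighted federated averaging (FedAvg), a swapped-in standard technique, applied only to a hypothetical shared trunk while task heads stay local. The parameter names are illustrative.

```python
def fed_avg(client_params, client_sizes, keys=None):
    """Sample-size-weighted average of named parameter vectors.

    client_params: list of dicts mapping parameter name -> list of floats.
    client_sizes: number of local samples per client (used as weights).
    keys: names to aggregate, e.g. only the shared trunk; defaults to all.
    """
    total = sum(client_sizes)
    keys = keys if keys is not None else list(client_params[0])
    return {
        k: [
            sum(p[k][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(client_params[0][k]))
        ]
        for k in keys
    }
```

Restricting `keys` to shared layers is what lets each base station keep its own auxiliary-task heads under this assumption.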
Automatic nonlinear MPC approximation with closed-loop guarantees
[AUTHORS]
Abdullah Tokmak, Christian Fiedler, Melanie N. Zeilinger, Sebastian Trimpe, Johannes Köhler
[ABSTRACT]
Safety guarantees are vital in many control applications, such as robotics.
Model predictive control (MPC) provides a constructive framework for
controlling safety-critical systems, but is limited by its computational
complexity. We address this problem by presenting a novel algorithm that
automatically computes an explicit approximation to nonlinear MPC schemes while
retaining closed-loop guarantees. Specifically, the problem can be reduced to a
function approximation problem, which we then tackle by proposing ALKIA-X, the
Adaptive and Localized Kernel Interpolation Algorithm with eXtrapolated
reproducing kernel Hilbert space norm. ALKIA-X is a non-iterative algorithm
that ensures numerically well-conditioned computations, a fast-to-evaluate
approximating function, and the guaranteed satisfaction of any desired bound on
the approximation error. Hence, ALKIA-X automatically computes an explicit
function that approximates the MPC, yielding a controller suitable for
safety-critical systems and high sampling rates. We apply ALKIA-X to
approximate two nonlinear MPC schemes, demonstrating reduced computational
demand and applicability to realistic problems.
[COMMENTS]
Submitted to IEEE Transactions on Automatic Control. Compared to the
previously uploaded version, this version contains an additional numerical
example
[LINK]
http://arxiv.org/abs/2312.10199v2
[DATE]
2024-04-12 00:22:54+08:00
[CATEGORIES]
cs.LG
Grokking as the Transition from Lazy to Rich Training Dynamics
[AUTHORS]
Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
[ABSTRACT]
We propose that the grokking phenomenon, where the train loss of a neural
network decreases much earlier than its test loss, can arise due to a neural
network transitioning from lazy training dynamics to a rich, feature learning
regime. To illustrate this mechanism, we study the simple setting of vanilla
gradient descent on a polynomial regression problem with a two layer neural
network which exhibits grokking without regularization in a way that cannot be
explained by existing theories. We identify sufficient statistics for the test
loss of such a network, and tracking these over training reveals that grokking
arises in this setting when the network first attempts to fit a kernel
regression solution with its initial features, followed by late-time feature
learning where a generalizing solution is identified after train loss is
already low. We find that the key determinants of grokking are the rate of
feature learning – which can be controlled precisely by parameters that scale
the network output – and the alignment of the initial features with the target
function $y(x)$. We argue this delayed generalization arises when (1) the top
eigenvectors of the initial neural tangent kernel and the task labels $y(x)$
are misaligned, but (2) the dataset size is large enough so that it is possible
for the network to generalize eventually, but not so large that train loss
perfectly tracks test loss at all epochs, and (3) the network begins training
in the lazy regime and so does not learn features immediately. We conclude with
evidence that this transition from lazy (linear model) to rich training
(feature learning) can control grokking in more general settings, like on
MNIST, one-layer Transformers, and student-teacher networks.
[COMMENTS]
Adding new experiments on higher degree Hermite polynomials,
multi-index targets, removed DMFT analysis from this version
[LINK]
http://arxiv.org/abs/2310.06110v3
[DATE]
2024-04-12 00:15:34+08:00
[CATEGORIES]
cs.LG
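The grokking abstract above says the rate of feature learning can be controlled by parameters that scale the network output. One standard way to write such a scaling, taken from the lazy-training literature and assumed here rather than taken from the paper, centers the network at initialization and multiplies it by α:

```latex
f(x;\theta) = \alpha\,\bigl(h(x;\theta) - h(x;\theta_0)\bigr),
\qquad
\mathcal{L}(\theta) = \frac{1}{2\alpha^{2}} \sum_{i} \bigl(f(x_i;\theta) - y(x_i)\bigr)^{2}

% First-order expansion around the initialization \theta_0:
f(x;\theta) \approx \alpha\,\nabla_\theta h(x;\theta_0)^{\top}(\theta - \theta_0)
```

For large α the parameters only need to move on the order of 1/α to fit the data, so the model stays close to its linearization (kernel regression with the initial neural tangent kernel); smaller α lets the features change, which matches the lazy-to-rich transition the abstract describes.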
[AUTHORS]
Anirban Mukherjee, Hannah Hanwen Chang
[ABSTRACT]
Social science research often hinges on the relationship between categorical
variables and outcomes. We introduce CAVIAR, a novel method for embedding
categorical variables that assume values in a high-dimensional ambient space
but are sampled from an underlying manifold. Our theoretical and numerical
analyses outline challenges posed by such categorical variables in causal
inference. Specifically, dynamically varying and sparse levels can lead to
violations of the Donsker conditions and a failure of the estimation
functionals to converge to a tight Gaussian process. Traditional approaches,
including the exclusion of rare categorical levels and principled variable
selection models like LASSO, fall short. CAVIAR embeds the data into a
lower-dimensional global coordinate system. The mapping can be derived from
both structured and unstructured data, and ensures stable and robust estimates
through dimensionality reduction. In a dataset of direct-to-consumer apparel
sales, we illustrate how high-dimensional categorical variables, such as zip
codes, can be succinctly represented, facilitating inference and analysis.
[LINK]
http://arxiv.org/abs/2404.04979v2
[DATE]
2024-04-12 00:11:33+08:00
[CATEGORIES]
cs.LG
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
[AUTHORS]
Dayeon Ki, Marine Carpuat
[ABSTRACT]
Machine Translation (MT) remains one of the last NLP tasks where large
language models (LLMs) have not yet replaced dedicated supervised systems. This
work exploits the complementary strengths of LLMs and supervised MT by guiding
LLMs to automatically post-edit MT with external feedback on its quality,
derived from Multidimensional Quality Metric (MQM) annotations. Working with
LLaMA-2 models, we consider prompting strategies varying the nature of feedback
provided and then fine-tune the LLM to improve its ability to exploit the
provided guidance. Through experiments on Chinese-English, English-German, and
English-Russian MQM data, we demonstrate that prompting LLMs to post-edit MT
improves TER, BLEU and COMET scores, although the benefits of fine-grained
feedback are not clear. Fine-tuning helps integrate fine-grained feedback more
effectively and further improves translation quality based on both automatic
and human evaluation.
[COMMENTS]
21 pages, 8 figures
[LINK]
http://arxiv.org/abs/2404.07851v1
[DATE]
2024-04-11 23:47:10+08:00
[CATEGORIES]
cs.CL
MetaCheckGPT – A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models
[AUTHORS]
Rahul Mehta, Andrew Hoblitzell, Jack O’Keefe, Hyeju Jang, Vasudeva Varma
[ABSTRACT]
Hallucinations in large language models (LLMs) have recently become a
significant problem. A recent effort in this direction is a shared task at
SemEval-2024 Task 6, SHROOM, a Shared-task on Hallucinations and Related
Observable Overgeneration Mistakes. This paper describes our winning solution
ranked 1st and 2nd in the 2 sub-tasks of model agnostic and model aware tracks
respectively. We propose a meta-regressor framework of LLMs for model
evaluation and integration that achieves the highest scores on the leaderboard.
We also experiment with various transformer-based models and black box methods
like ChatGPT, Vectara, and others. In addition, we perform an error analysis
comparing GPT4 against our best model which shows the limitations of the
former.
[COMMENTS]
Entry for SemEval-2024 Shared Task 6: SHROOM, a Shared-task on
Hallucinations and Related Observable Overgeneration Mistakes
[LINK]
http://arxiv.org/abs/2404.06948v2
[DATE]
2024-04-11 23:39:44+08:00
[CATEGORIES]
cs.CL
Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation
[AUTHORS]
Juliette Faille, Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent
[ABSTRACT]
We explore question generation in the context of knowledge-grounded dialogs
focusing on explainability and evaluation. Inspired by previous work on
planning-based summarisation, we present a model which, instead of directly
generating a question, first predicts a fact and then a question. We
evaluate our approach on 37k test dialogs adapted from the KGConv dataset and
we show that, although more demanding in terms of inference, our approach
performs on par with a standard model which solely generates a question while
allowing for a detailed referenceless evaluation of the model behaviour in
terms of relevance, factuality and pronominalisation.
[LINK]
http://arxiv.org/abs/2404.07836v1
[DATE]
2024-04-11 23:24:50+08:00
[CATEGORIES]
cs.CL
MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish
[AUTHORS]
Stefan Bott, Horacio Saggion, Nelson Peréz Rojas, Martin Solis Salazar, Saul Calderon Ramirez
[ABSTRACT]
Automatic lexical simplification is a task to substitute lexical items that
may be unfamiliar and difficult to understand with easier and more common
words. This paper presents MultiLS-SP/CA, a novel dataset for lexical
simplification in Spanish and Catalan. This dataset represents the first of its
kind in Catalan and a substantial addition to the sparse data on automatic
lexical simplification which is available for Spanish. Specifically, MultiLS-SP
is the first dataset for Spanish which includes scalar ratings of the
understanding difficulty of lexical items. In addition, we describe experiments
with this dataset, which can serve as a baseline for future work on the same
data.
[COMMENTS]
Submitted to the 40th edition of the SEPLN Conference. Under Revision
[LINK]
http://arxiv.org/abs/2404.07814v1
[DATE]
2024-04-11 22:57:19+08:00
[CATEGORIES]
cs.CL
Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation
[AUTHORS]
Stephen Bothwell, Abigail Swenor, David Chiang
[COMMENTS]
Proceedings of the Third Workshop on Language Technologies for
Historical and Ancient Languages
[LINK]
http://arxiv.org/abs/2404.07792v1
[DATE]
2024-04-11 22:35:23+08:00
[CATEGORIES]
cs.CL
cs.LG
Discourse-Aware In-Context Learning for Temporal Expression Normalization
[AUTHORS]
Akash Kumar Gautam, Lukas Lange, Jannik Strötgen
[ABSTRACT]
Temporal expression (TE) normalization is a well-studied problem. However,
the predominantly used rule-based systems are highly restricted to specific
settings, and upcoming machine learning approaches suffer from a lack of
labeled data. In this work, we explore the feasibility of proprietary and
open-source large language models (LLMs) for TE normalization using in-context
learning to inject task, document, and example information into the model. We
explore various sample selection strategies to retrieve the most relevant set
of examples. By using a window-based prompt design approach, we can perform TE
normalization across sentences, while leveraging the LLM knowledge without
training the model. Our experiments show results competitive with models designed
for this task. In particular, our method achieves large performance
improvements for non-standard settings by dynamically including relevant
examples during inference.
[COMMENTS]
Accepted at NAACL 2024
[LINK]
http://arxiv.org/abs/2404.07775v1
[DATE]
2024-04-11 22:13:44+08:00
[CATEGORIES]
cs.CL
cs.LG
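The temporal expression normalization abstract above describes injecting task, document, and example information into a prompt, retrieving relevant examples, and using a window of sentences so normalization can work across sentence boundaries. Below is a minimal sketch of such a prompt builder. Word-overlap ranking is a stand-in for the paper's sample selection strategies, and the prompt wording, `doc_date`, and function names are hypothetical; no LLM is called.

```python
def select_examples(query, pool, k=2):
    """Rank (sentence, normalized_value) pairs by word overlap with the query.

    A deliberately simple stand-in retriever; ties keep the pool order.
    """
    q = set(query.lower().split())
    ranked = sorted(pool, key=lambda ex: len(q & set(ex[0].lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(sentences, target_idx, pool, doc_date, window=1, k=2):
    """Assemble task, document, and example information into one prompt."""
    lo = max(0, target_idx - window)
    hi = min(len(sentences), target_idx + window + 1)
    context = " ".join(sentences[lo:hi])  # window of neighboring sentences
    lines = [
        "Task: normalize every temporal expression in the input to a calendar value.",
        f"Document creation date: {doc_date}",
    ]
    for sentence, value in select_examples(sentences[target_idx], pool, k):
        lines.append(f"Input: {sentence}\nOutput: {value}")
    lines.append(f"Input: {context}\nOutput:")
    return "\n".join(lines)
```

Including the document date matters because relative expressions such as "yesterday" cannot be normalized without an anchor.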
AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
[AUTHORS]
Lukas Lange, Marc Müller, Ghazaleh Haratinezhad Torbati, Dragan Milchevski, Patrick Grau, Subhash Pujari, Annemarie Friedrich
[COMMENTS]
Accepted at LREC-COLING 2024. Corpus available at
https://github.com/boschresearch/anno-ctr-lrec-coling-2024
[LINK]
http://arxiv.org/abs/2404.07765v1
[DATE]
2024-04-11 22:04:36+08:00
[CATEGORIES]
cs.CL
cs.LG
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
[AUTHORS]
Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang
[LINK]
http://arxiv.org/abs/2404.07738v1
[DATE]
2024-04-11 21:36:29+08:00
[CATEGORIES]
cs.CL
cs.LG
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models
[AUTHORS]
Andreas Säuberli, Simon Clematide
[ABSTRACT]
Reading comprehension tests are used in a variety of applications, ranging
from education to assessing the comprehensibility of simplified texts. However,
creating such tests manually and ensuring their quality is difficult and
time-consuming. In this paper, we explore how large language models (LLMs) can
be used to generate and evaluate multiple-choice reading comprehension items.
To this end, we compiled a dataset of German reading comprehension items and
developed a new protocol for human and automatic evaluation, including a metric
we call text informativity, which is based on guessability and answerability.
We then used this protocol and the dataset to evaluate the quality of items
generated by Llama 2 and GPT-4. Our results suggest that both models are
capable of generating items of acceptable quality in a zero-shot setting, but
GPT-4 clearly outperforms Llama 2. We also show that LLMs can be used for
automatic evaluation by eliciting item responses from them. In this scenario,
evaluation results with GPT-4 were the most similar to human annotators.
Overall, zero-shot generation with LLMs is a promising approach for generating
and evaluating reading comprehension test items, in particular for languages
without large amounts of available data.
[COMMENTS]
Accepted for publication at the 3rd Workshop on Tools and Resources
for People with REAding DIfficulties (READI) at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.07720v1
[DATE]
2024-04-11 21:11:21+08:00
[CATEGORIES]
cs.CL
CLUE: A Clinical Language Understanding Evaluation for LLMs
[AUTHORS]
Amin Dada, Marie Bauer, Amanda Butler Contreras, Osman Alperen Koraş, Constantin Marc Seibold, Kaleb E Smith, Jens Kleesiek
[ABSTRACT]
Large Language Models (LLMs) have shown the potential to significantly
contribute to patient care, diagnostics, and administrative processes. Emerging
biomedical LLMs address healthcare-specific challenges, including privacy
demands and computational constraints. However, evaluation of these models has
primarily been limited to non-clinical tasks, which do not reflect the
complexity of practical clinical applications. Additionally, there has been no
thorough comparison between biomedical and general-domain LLMs for clinical
tasks. To fill this gap, we present the Clinical Language Understanding
Evaluation (CLUE), a benchmark tailored to evaluate LLMs on real-world clinical
tasks. CLUE includes two novel datasets derived from MIMIC IV discharge letters
and four existing tasks designed to test the practical applicability of LLMs in
healthcare settings. Our evaluation covers several biomedical and general
domain LLMs, providing insights into their clinical performance and
applicability. CLUE represents a step towards a standardized approach to
evaluating and developing LLMs in healthcare to align future model development
with the real-world needs of clinical application. We publish our evaluation
and data generation scripts: https://github.com/TIO-IKIM/CLUE.
[LINK]
http://arxiv.org/abs/2404.04067v2
[DATE]
2024-04-11 21:10:30+08:00
[CATEGORIES]
cs.CL
cs.LG
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs
[AUTHORS]
Lei Sun, Zhengwei Tao, Youdi Li, Hiroshi Arakawa
[ABSTRACT]
The integration of Large Language Models (LLMs) and knowledge graphs (KGs)
has achieved remarkable success in various natural language processing tasks.
However, existing methodologies that integrate LLMs and KGs often navigate the
task-solving process solely based on the LLM’s analysis of the question,
overlooking the rich cognitive potential inherent in the vast knowledge
encapsulated in KGs. To address this, we introduce Observation-Driven Agent
(ODA), a novel AI agent framework tailored for tasks involving KGs. ODA
incorporates KG reasoning abilities via global observation that enhances
reasoning capabilities through a cyclical paradigm of observation, action, and
reflection. To handle the exponential explosion of knowledge during
observation, we design a recursive observation mechanism.
Subsequently, we integrate the observed knowledge into the action and
reflection modules. Through extensive experiments, ODA demonstrates
state-of-the-art performance on several datasets, notably achieving accuracy
improvements of 12.87% and 8.9%.
[COMMENTS]
LLM+KG
[LINK]
http://arxiv.org/abs/2404.07677v1
[DATE]
2024-04-11 20:16:16+08:00
[CATEGORIES]
cs.CL
Curated Datasets and Neural Models for Machine Translation of Informal Registers between Mayan and Spanish Vernaculars
[AUTHORS]
Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena
[ABSTRACT]
The Mayan languages form a language family with an ancient history,
millions of speakers, and immense cultural value, which nevertheless remains
severely underrepresented in terms of resources and global exposure. In this
paper we develop, curate, and publicly release a set of corpora in several
Mayan languages spoken in Guatemala and Southern Mexico, which we call MayanV.
The datasets are parallel with Spanish, the dominant language of the region,
and are taken from official native sources focused on representing informal,
day-to-day, and non-domain-specific language. As such, and according to our
dialectometric analysis, they differ in register from most other available
resources. Additionally, we present neural machine translation models, trained
on as many resources and Mayan languages as possible, and evaluated exclusively
on our datasets. We observe lexical divergences between the dialects of Spanish
in our resources and the more widespread written standard of Spanish, and find that
resources other than the ones we present do not seem to improve translation
performance, indicating that many such resources may not accurately capture
common, real-life language usage. The MayanV dataset is available at
https://github.com/transducens/mayanv.
[COMMENTS]
13 pages, 3 figures, 8 tables, Submitted to NAACL 2024
[LINK]
http://arxiv.org/abs/2404.07673v1
[DATE]
2024-04-11 20:09:47+08:00
[CATEGORIES]
cs.CL
cs.LG
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
[AUTHORS]
Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen
[COMMENTS]
Accepted by LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2403.17645v3
[DATE]
2024-04-11 20:07:33+08:00
[CATEGORIES]
cs.CL
Unlocking the Potential of ChatGPT: A Comprehensive Exploration of its Applications, Advantages, Limitations, and Future Directions in Natural Language Processing
[AUTHORS]
Walid Hariri
[ABSTRACT]
Large language models have revolutionized the field of artificial
intelligence and have been used in various applications. Among these models,
ChatGPT (Chat Generative Pre-trained Transformer), developed by OpenAI, stands
out as a powerful tool that has been widely adopted. ChatGPT has been
successfully applied in numerous areas, including chatbots, content generation,
language translation, personalized recommendations, and even medical diagnosis
and treatment. Its success in these applications can be attributed to its
ability to generate human-like responses, understand natural language, and
adapt to different contexts. Its versatility and accuracy make it a powerful
tool for natural language processing (NLP). However, there are also limitations
to ChatGPT, such as its tendency to produce biased responses and its potential
to perpetuate harmful language patterns. This article provides a comprehensive
overview of ChatGPT, its applications, advantages, and limitations.
Additionally, the paper emphasizes the importance of ethical considerations
when using this robust tool in real-world scenarios. Finally, this paper
contributes to ongoing discussions surrounding artificial intelligence and its
impact on vision and NLP domains by providing insights into prompt engineering
techniques.
[LINK]
http://arxiv.org/abs/2304.02017v9
[DATE]
2024-04-11 19:44:38+08:00
[CATEGORIES]
cs.CL
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
[AUTHORS]
Nathan Godey, Éric de la Clergerie, Benoît Sagot
[ABSTRACT]
Recent advances in language modeling consist of pretraining highly
parameterized neural networks on extremely large web-mined text corpora.
Training and inference with such models can be costly in practice, which
incentivizes the use of smaller counterparts. However, it has been observed
that smaller models can suffer from saturation, characterized as a drop in
performance at some advanced point in training followed by a plateau. In this
paper, we find that such saturation can be explained by a mismatch between the
hidden dimension of smaller models and the high rank of the target contextual
probability distribution. This mismatch affects the performance of the linear
prediction head used in such models through the well-known softmax bottleneck
phenomenon. We measure the effect of the softmax bottleneck in various settings
and find that models with fewer than 1000 hidden dimensions tend to adopt
degenerate latent representations in late pretraining, which leads to reduced
evaluation performance.
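The softmax bottleneck is a rank argument: a linear head maps N contextual hidden states (an N x d matrix H) and a V x d output embedding matrix W to logits H W^T, whose rank cannot exceed d, and the per-row softmax normalizer adds at most one more. The minimal sketch below uses random matrices with arbitrarily chosen sizes, not a trained model, to make that ceiling visible with NumPy.

```python
import numpy as np


def logit_rank(num_contexts: int, vocab_size: int, hidden_dim: int, seed: int = 0) -> int:
    """Rank of the logit matrix produced by a linear LM head.

    H holds one hidden state per context (num_contexts x hidden_dim) and W is
    the output embedding matrix (vocab_size x hidden_dim). Their product
    H @ W.T has rank at most hidden_dim, however large the vocabulary is.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((num_contexts, hidden_dim))
    W = rng.standard_normal((vocab_size, hidden_dim))
    logits = H @ W.T  # shape: (num_contexts, vocab_size)
    return int(np.linalg.matrix_rank(logits))


for d in (8, 32, 128):
    print(f"hidden_dim={d}: rank={logit_rank(num_contexts=512, vocab_size=1000, hidden_dim=d)}")
```

Raising `vocab_size` or `num_contexts` does not lift the rank above `hidden_dim`; only a larger hidden dimension does, which is the mismatch the abstract attributes to small models.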
[LINK]
http://arxiv.org/abs/2404.07647v1
[DATE]
2024-04-11 19:10:36+08:00
[CATEGORIES]
cs.CL
Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective
[AUTHORS]
Victor Gallego
[ABSTRACT]
This paper proposes an interpretation of RLAIF as Bayesian inference by
introducing distilled Self-Critique (dSC), which refines the outputs of an LLM
through a Gibbs sampler that is later distilled into a fine-tuned model.
Requiring only synthetic data, dSC is evaluated in experiments on safety,
sentiment, and privacy control, showing that it can be a viable and cheap
alternative for aligning LLMs. Code is released at
https://github.com/vicgalle/distilled-self-critique.
[COMMENTS]
Accepted to ICLR 2024 (TinyPapers track)
[LINK]
http://arxiv.org/abs/2312.01957v3
[DATE]
2024-04-11 18:54:19+08:00
[CATEGORIES]
cs.CL
cs.LG
Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain
[AUTHORS]
Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, Andrea Zaninello
[ABSTRACT]
Research on language technology for the development of medical applications
is currently a hot topic in Natural Language Understanding and Generation.
Thus, a number of large language models (LLMs) have recently been adapted to
the medical domain so that they can be used as tools for mediating
human-AI interaction. While these LLMs display competitive performance on
automated medical text benchmarks, they have been pre-trained and evaluated
with a focus on a single language (mostly English). This is particularly true
of text-to-text models, which typically require large amounts of
domain-specific pre-training data, often not easily accessible for many
languages. In this paper, we address these shortcomings by compiling, to the
best of our knowledge, the largest multilingual corpus for the medical domain
in four languages, namely English, French, Italian and Spanish. This new corpus
has been used to train Medical mT5, the first open-source text-to-text
multilingual model for the medical domain. Additionally, we present two new
evaluation benchmarks for all four languages with the aim of facilitating
multilingual research in this domain. A comprehensive evaluation shows that
Medical mT5 outperforms both encoders and similarly sized text-to-text models
for the Spanish, French, and Italian benchmarks, while being competitive with
current state-of-the-art LLMs in English.
[COMMENTS]
LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.07613v1
[DATE]
2024-04-11 18:01:32+08:00
[CATEGORIES]
cs.CL
cs.LG
A Multi-Label Dataset of French Fake News: Human and Machine Insights
[AUTHORS]
Benjamin Icard, François Maine, Morgane Casanova, Géraud Faye, Julien Chanson, Guillaume Gadek, Ghislain Atemezing, François Bancilhon, Paul Égré
[COMMENTS]
Paper to appear in the Proceedings of the 2024 Joint International
Conference on Computational Linguistics, Language Resources and Evaluation
(LREC-COLING 2024)
[LINK]
http://arxiv.org/abs/2403.16099v2
[DATE]
2024-04-11 17:58:17+08:00
[CATEGORIES]
cs.CL
cs.LG
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective
[AUTHORS]
Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
[ABSTRACT]
Code generation aims to understand the problem description and generate
corresponding code snippets, where existing works generally decompose such
complex tasks into intermediate steps by prompting strategies, such as
Chain-of-Thought and its variants. While these studies have achieved some
success, their effectiveness is highly dependent on the capabilities of
advanced Large Language Models (LLMs) such as GPT-4, particularly in terms of
API calls, which significantly limits their practical applicability.
Consequently, how to enhance the code generation capabilities of small and
medium-scale code LLMs without significantly increasing training costs is an
appealing challenge. In this paper, we suggest that code comments are the
natural logic pivot between natural language and code language and propose
using comments to boost the code generation ability of code LLMs. Concretely,
we propose MANGO (comMents As Natural loGic pivOts), including a comment
contrastive training strategy and a corresponding logical comment decoding
strategy. Experiments are performed on HumanEval and MBPP, utilizing StarCoder
and WizardCoder as backbone models, and encompassing model parameter sizes
between 3B and 7B. The results indicate that MANGO significantly improves the
code pass rate over strong baselines. Meanwhile, the logical comment decoding
strategy is notably more robust than Chain-of-Thought prompting. The code is
publicly available at https://github.com/pppa2019/Mango.
[COMMENTS]
The code is publicly available at https://github.com/pppa2019/Mango
[LINK]
http://arxiv.org/abs/2404.07549v1
[DATE]
2024-04-11 16:30:46+08:00
[CATEGORIES]
cs.CL
Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning
[AUTHORS]
Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan
[ABSTRACT]
In-context Learning (ICL) has emerged as a powerful capability alongside the
development of scaled-up large language models (LLMs). By instructing LLMs
using few-shot demonstrative examples, ICL enables them to perform a wide range
of tasks without updating millions of parameters. However, the precise
contributions of demonstrations towards improving end-task performance have not
been thoroughly investigated in recent analytical studies. In this paper, we
empirically decompose the overall performance of ICL into three dimensions,
label space, format, and discrimination, and we evaluate four general-purpose
LLMs across a diverse range of tasks. Counter-intuitively, we find that the
demonstrations have a marginal impact on provoking discriminative knowledge of
language models. However, ICL exhibits significant efficacy in regulating the
label space and format which helps LLMs to respond in desired label words. We
then show that this ability functions similarly to detailed instructions for
LLMs to follow. We additionally provide an in-depth analysis of how retrieval
helps ICL and find that retrieving the most semantically similar examples
notably boosts the model’s discriminative capability.
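Retrieving the most semantically similar demonstrations is commonly implemented as a nearest-neighbour search over sentence embeddings. The sketch below assumes the embeddings have already been computed by some encoder (the abstract does not name one) and ranks a demonstration pool by cosine similarity; the function name and toy vectors are hypothetical.

```python
import numpy as np


def top_k_demonstrations(query_vec: np.ndarray, pool_vecs: np.ndarray, k: int) -> list:
    """Indices of the k pool examples whose embeddings are most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    P = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = P @ q  # cosine similarity with every pool example
    return np.argsort(-sims)[:k].tolist()  # most similar first


# Toy 2-D "embeddings" standing in for real sentence-encoder outputs.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([0.9, 0.1])
print(top_k_demonstrations(query, pool, k=2))  # [0, 2]
```

The selected indices would then pick the demonstrations placed in the prompt, most similar first.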
[COMMENTS]
36 pages, 8 figures
[LINK]
http://arxiv.org/abs/2404.07546v1
[DATE]
2024-04-11 16:20:10+08:00
[CATEGORIES]
cs.CL
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
[AUTHORS]
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
[ABSTRACT]
We analyze how well pre-trained large language models (e.g., Llama2, GPT-4,
Claude 3, etc.) can do linear and non-linear regression when given in-context
examples, without any additional training or gradient updates. Our findings
reveal that several large language models (e.g., GPT-4, Claude 3) are able to
perform regression tasks with a performance rivaling (or even outperforming)
that of traditional supervised methods such as Random Forest, Bagging, or
Gradient Boosting. For example, on the challenging Friedman #2 regression
dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM,
Random Forest, KNN, or Gradient Boosting. We then investigate how well the
performance of large language models scales with the number of in-context
exemplars. We borrow from the notion of regret from online learning and
empirically show that LLMs are capable of obtaining a sub-linear regret.
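In online learning, regret after T rounds is the learner's cumulative loss minus that of the best comparator in hindsight, and sub-linear regret means the average regret R_T / T tends to zero. The sketch below assumes per-round losses are already available for the LLM and for a set of comparator predictors; it illustrates the generic definition and is not the paper's evaluation code.

```python
import math


def cumulative_regret(learner_losses, comparator_losses):
    """Regret after each round t.

    learner_losses[t] is the learner's loss at round t; comparator_losses[i][t]
    is comparator i's loss at round t. Regret at t is the learner's cumulative
    loss minus the smallest comparator cumulative loss (best in hindsight).
    """
    learner_total = 0.0
    comparator_totals = [0.0] * len(comparator_losses)
    regrets = []
    for t, loss in enumerate(learner_losses):
        learner_total += loss
        for i, losses in enumerate(comparator_losses):
            comparator_totals[i] += losses[t]
        regrets.append(learner_total - min(comparator_totals))
    return regrets


# Synthetic example: the learner's loss decays like 1/sqrt(t) against a perfect
# comparator, so regret grows like sqrt(T) and the average regret R_T / T shrinks.
losses = [1 / math.sqrt(t + 1) for t in range(10_000)]
regrets = cumulative_regret(losses, [[0.0] * len(losses)])
print(regrets[99] / 100, regrets[-1] / len(losses))
```

A linearly growing regret would keep R_T / T roughly constant, so a shrinking ratio is the practical signature of sub-linear regret.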
[COMMENTS]
50 pages, 48 figures, preprint
[LINK]
http://arxiv.org/abs/2404.07544v1
[DATE]
2024-04-11 16:12:43+08:00
[CATEGORIES]
cs.CL
Technical Report: Impact of Position Bias on Language Models in Token Classification
[AUTHORS]
Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović
[ABSTRACT]
Language Models (LMs) have shown state-of-the-art performance in Natural
Language Processing (NLP) tasks. Downstream tasks such as Named Entity
Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data
imbalance issues, particularly regarding the ratio of positive to negative
examples and class disparities. This paper investigates an often-overlooked
issue of encoder models, specifically the position bias of positive examples in
token classification tasks. For completeness, we also include decoders in the
evaluation. We evaluate the impact of position bias using different position
embedding techniques, focusing on BERT with Absolute Position Embedding (APE),
Relative Position Embedding (RPE), and Rotary Position Embedding (RoPE).
We conduct an in-depth evaluation of the impact of position bias on
the performance of LMs when fine-tuned on token classification benchmarks. Our
study includes CoNLL03 and OntoNotes 5.0 for NER, and the English Treebank UD_en
and TweeBank for POS tagging. We propose an evaluation approach to investigate
position bias in transformer models and show that LMs can suffer from this bias,
with an average performance drop ranging from 3% to 9%. To mitigate
this effect, we propose two methods, Random Position Shifting and Context
Perturbation, which we apply to batches during training. The results
show an improvement of approximately 2% in model performance on
CoNLL03, UD_en, and TweeBank.
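The abstract does not spell out how Random Position Shifting is implemented. One plausible reading, sketched below under that assumption, is to offset the absolute position ids of each training sequence by a random amount so that labelled tokens do not always appear at the same positions; the helper and its parameters are hypothetical.

```python
import random


def random_position_shift(seq_len, max_positions, rng=random):
    """Position ids 0..seq_len-1 shifted by a random offset (hypothetical helper).

    The offset is drawn so the shifted ids still fit inside the model's
    position table; token order is preserved and only absolute positions move.
    """
    if seq_len > max_positions:
        raise ValueError("sequence is longer than the position table")
    offset = rng.randint(0, max_positions - seq_len)  # randint is inclusive at both ends
    return list(range(offset, offset + seq_len))


# e.g. explicit position ids for one 12-token row, for a BERT-style model that accepts them
print(random_position_shift(seq_len=12, max_positions=512, rng=random.Random(0)))
```

Because the shift is resampled per batch row, the model sees each token at many absolute positions during fine-tuning, which is the intuition behind countering a positional skew in the training data.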
[COMMENTS]
Updated content of the preprint
[LINK]
http://arxiv.org/abs/2304.13567v4
[DATE]
2024-04-11 16:10:11+08:00
[CATEGORIES]
cs.CL
Introducing L2M3, A Multilingual Medical Large Language Model to Advance Health Equity in Low-Resource Regions
[AUTHORS]
Agasthya Gangavarapu
[ABSTRACT]
Addressing the imminent shortfall of 10 million health workers by 2030,
predominantly in Low- and Middle-Income Countries (LMICs), this paper
introduces an innovative approach that harnesses the power of Large Language
Models (LLMs) integrated with machine translation models. This solution is
engineered to meet the unique needs of Community Health Workers (CHWs),
overcoming language barriers, cultural sensitivities, and the limited
availability of medical dialog datasets. I have crafted a model that not only
boasts superior translation capabilities but also undergoes rigorous
fine-tuning on open-source datasets to ensure medical accuracy and is equipped
with comprehensive safety features to counteract the risks of misinformation.
Featuring a modular design, this approach is specifically structured for
swift adaptation across various linguistic and cultural contexts, utilizing
open-source components to significantly reduce healthcare operational costs.
This strategic innovation markedly improves the accessibility and quality of
healthcare services by providing CHWs with contextually appropriate medical
knowledge and diagnostic tools. This paper highlights the transformative impact
of this context-aware LLM, underscoring its crucial role in addressing the
global healthcare workforce deficit and propelling forward healthcare outcomes
in LMICs.
[LINK]
http://arxiv.org/abs/2404.08705v1
[DATE]
2024-04-11 15:39:22+08:00
[CATEGORIES]
cs.CL
cs.LG
Supervised Knowledge Makes Large Language Models Better In-context Learners
[AUTHORS]
Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang
[COMMENTS]
Accepted to ICLR 2024
[LINK]
http://arxiv.org/abs/2312.15918v2
[DATE]
2024-04-11 14:41:15+08:00
[CATEGORIES]
cs.CL
Leveraging Data Augmentation for Process Information Extraction
[AUTHORS]
Julian Neuberger, Leonie Doll, Benedict Engelmann, Lars Ackermann, Stefan Jablonski
[ABSTRACT]
Business Process Modeling projects often require formal process models as a
central component. High costs associated with the creation of such formal
process models motivated many different fields of research aimed at automated
generation of process models from readily available data. These include process
mining on event logs, and generating business process models from natural
language texts. Research in the latter field is regularly faced with the
problem of limited data availability, hindering both evaluation and development
of new techniques, especially learning-based ones.
To overcome this data scarcity issue, in this paper we investigate the
application of data augmentation for natural language text data. Data
augmentation methods are well established in machine learning for creating new,
synthetic data without human assistance. We find that many of these methods are
applicable to the task of business process information extraction, improving
the accuracy of extraction. Our study shows that data augmentation is an
important component in enabling machine learning methods for the task of
business process model generation from natural language text, where rule-based
systems are still mostly the state of the art. Simple data augmentation
techniques improved the F1 score of mention extraction by 2.9 percentage
points and the F1 score of relation extraction by 4.5. To better understand how
data augmentation alters human annotated texts, we analyze the resulting text,
visualizing and discussing the properties of augmented textual data.
We make all code and experimental results publicly available.
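Token-level synonym replacement is one of the simple augmentation techniques commonly used for text. For an extraction task the replacement has to keep the annotations aligned, which the sketch below guarantees by only ever swapping one token for one token. The synonym table, label scheme, and function name are hypothetical illustrations, not the paper's setup.

```python
import random

# Hypothetical toy synonym table; a real setup might draw on WordNet or embeddings.
SYNONYMS = {"checks": ["verifies", "inspects"], "sends": ["forwards", "dispatches"]}


def synonym_replace(tokens, labels, p, rng):
    """Replace each token that has a known synonym with probability p.

    Every replacement is a single token, so token/label alignment is preserved
    and the labels can be returned unchanged.
    """
    new_tokens = []
    for tok in tokens:
        options = SYNONYMS.get(tok.lower())
        if options and rng.random() < p:
            new_tokens.append(rng.choice(options))
        else:
            new_tokens.append(tok)
    return new_tokens, list(labels)


tokens = ["The", "clerk", "checks", "the", "invoice"]
labels = ["O", "B-Actor", "B-Activity", "O", "B-Data"]
print(synonym_replace(tokens, labels, p=0.5, rng=random.Random(13)))
```

Augmentations that change the token count (insertions, deletions, multi-word paraphrases) would additionally need to remap span boundaries, which is why this sketch sticks to one-for-one swaps.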
[COMMENTS]
Accepted at BPMDS 2024 (https://sites.google.com/view/bpmds/), to be
printed
[LINK]
http://arxiv.org/abs/2404.07501v1
[DATE]
2024-04-11 14:32:03+08:00
[CATEGORIES]
cs.CL
Interactive Prompt Debugging with Sequence Salience
[AUTHORS]
Ian Tenney, Ryan Mullins, Bin Du, Shree Pandya, Minsuk Kahng, Lucas Dixon
[ABSTRACT]
We present Sequence Salience, a visual tool for interactive prompt debugging
with input salience methods. Sequence Salience builds on widely used salience
methods for text classification and single-token prediction, and extends this
to a system tailored for debugging complex LLM prompts. Our system is
well-suited for long texts, and expands on previous work by 1) providing
controllable aggregation of token-level salience to the word, sentence, or
paragraph level, making salience over long inputs tractable; and 2) supporting
rapid iteration where practitioners can act on salience results, refine
prompts, and run salience on the new output. We include case studies showing
how Sequence Salience can help practitioners work with several complex
prompting strategies, including few-shot, chain-of-thought, and constitutional
principles. Sequence Salience is built on the Learning Interpretability Tool,
an open-source platform for ML model visualizations, and code, notebooks, and
tutorials are available at http://goo.gle/sequence-salience.
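Aggregating token-level salience to coarser units can be done by summing scores over the tokens of each word, sentence, or paragraph. The sketch below assumes summation and a precomputed token-to-segment mapping; whether Sequence Salience sums, averages, or normalizes is not stated in the abstract, and the function is not part of the Learning Interpretability Tool API.

```python
def aggregate_salience(token_scores, segment_ids):
    """Sum token-level salience scores into per-segment totals.

    segment_ids[i] names the word, sentence, or paragraph that token i belongs
    to; the returned dict keeps segments in order of first appearance.
    """
    if len(token_scores) != len(segment_ids):
        raise ValueError("need exactly one segment id per token")
    totals = {}
    for score, seg in zip(token_scores, segment_ids):
        totals[seg] = totals.get(seg, 0.0) + score
    return totals


# Four tokens, where tokens 0-1 form sentence 0 and tokens 2-3 form sentence 1.
print(aggregate_salience([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1]))
```

Switching granularity only means passing a different `segment_ids` mapping, so the same token scores can be viewed at word, sentence, or paragraph level without recomputing salience.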
[LINK]
http://arxiv.org/abs/2404.07498v1
[DATE]
2024-04-11 14:22:56+08:00
[CATEGORIES]
cs.CL
cs.LG
Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability
[AUTHORS]
Jinwei Lu, Yuanfeng Song, Haodi Zhang, Chen Zhang, Raymond Chi-Wing Wong
[ABSTRACT]
Text-to-Vis is an emerging task in the natural language processing (NLP) area
that aims to automatically generate data visualizations from natural language
questions (NLQs). Despite their progress, existing text-to-vis models often
heavily rely on lexical matching between words in the questions and tokens in
data schemas. This overreliance on lexical matching may lead to a diminished
level of model robustness against input variations. In this study, we
thoroughly examine the robustness of current text-to-vis models, an area that
has not previously been explored. In particular, we construct the first
robustness dataset nvBench-Rob, which contains diverse lexical and phrasal
variations based on the original text-to-vis benchmark nvBench. We then find
that the performance of existing text-to-vis models drops dramatically on this
new dataset, implying that these methods exhibit inadequate robustness
overall. Finally, we propose a novel framework based on Retrieval-Augmented
Generation (RAG) technique, named GRED, specifically designed to address input
perturbations in these two variants. The framework consists of three parts:
NLQ-Retrieval Generator, Visualization Query-Retrieval Retuner and
Annotation-based Debugger, which are used to tackle the challenges posed by
natural language variants, programming style differences and data schema
variants, respectively. Extensive experimental evaluations show that, compared
to the state-of-the-art model RGVisNet in the Text-to-Vis field, GRED performs
better in terms of model robustness, with a 32% increase in accuracy on the
proposed nvBench-Rob dataset.
[LINK]
http://arxiv.org/abs/2404.07135v2
[DATE]
2024-04-11 13:56:39+08:00
[CATEGORIES]
cs.CL
Augmenting Knowledge Graph Hierarchies Using Neural Transformers
[AUTHORS]
Sanat Sharma, Mayank Poddar, Jayant Kumar, Kosta Blank, Tracy King
[ABSTRACT]
Knowledge graphs are useful tools to organize, recommend and sort data.
Hierarchies in knowledge graphs provide significant benefit in improving
understanding and compartmentalization of the data within a knowledge graph.
This work leverages large language models to generate and augment hierarchies
in an existing knowledge graph. For small (<100,000 node) domain-specific KGs,
we find that a combination of few-shot prompting with one-shot generation works
well, while larger KGs may require cyclical generation. We present techniques
for augmenting hierarchies, which increased coverage by 98% for intents
and 99% for colors in our knowledge graph.
[COMMENTS]
European Conference on Information Retrieval 2024
[LINK]
http://arxiv.org/abs/2404.08020v1
[DATE]
2024-04-11 13:53:38+08:00
[CATEGORIES]
cs.CL
cs.LG
CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts
[AUTHORS]
Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera
[ABSTRACT]
Social media platforms play an essential role in crisis communication, but
analyzing crisis-related social media texts is challenging due to their
informal nature. Transformer-based pre-trained models like BERT and RoBERTa
have shown success in various NLP tasks, but they are not tailored for
crisis-related texts. Furthermore, general-purpose sentence encoders are used
to generate sentence embeddings, regardless of the textual complexities in
crisis-related texts. Advances in applications like text classification,
semantic search, and clustering contribute to the effective processing of
crisis-related texts, which is essential for emergency responders to gain a
comprehensive view of a crisis event, whether historical or real-time. To
address these gaps in crisis informatics literature, this study introduces
CrisisTransformers, an ensemble of pre-trained language models and sentence
encoders trained on an extensive corpus of over 15 billion word tokens from
tweets associated with more than 30 crisis events, including disease outbreaks,
natural disasters, conflicts, and other critical incidents. We evaluate
existing models and CrisisTransformers on 18 crisis-specific public datasets.
Our pre-trained models outperform strong baselines across all datasets in
classification tasks, and our best-performing sentence encoder improves the
state-of-the-art by 17.43% in sentence encoding tasks. Additionally, we
investigate the impact of model initialization on convergence and evaluate the
significance of domain-specific models in generating semantically meaningful
sentence embeddings. The models are publicly available at:
https://huggingface.co/crisistransformers
[LINK]
http://arxiv.org/abs/2309.05494v3
[DATE]
2024-04-11 13:25:17+08:00
[CATEGORIES]
cs.CL
RoT: Enhancing Large Language Models with Reflection on Search Trees
[AUTHORS]
Wenyang Hui, Chengyue Jiang, Yan Wang, Kewei Tu
[ABSTRACT]
Large language models (LLMs) have demonstrated impressive capability in
reasoning and planning when integrated with tree-search-based prompting
methods. However, since these methods ignore the previous search experiences,
they often make the same mistakes in the search process. To address this issue,
we introduce Reflection on search Trees (RoT), an LLM reflection framework
designed to improve the performance of tree-search-based prompting methods. It
uses a strong LLM to summarize guidelines from previous tree search experiences
to enhance the ability of a weak LLM. The guidelines are instructions for
solving the task through tree search that can prevent the weak LLM from
repeating mistakes made in past search processes. In addition, we propose a
novel state selection method, which identifies critical information from
historical search processes to help RoT generate more specific and meaningful
guidelines. In our extensive experiments, we find that RoT significantly
improves the performance of LLMs in reasoning or planning tasks with various
tree-search-based prompting methods (e.g., BFS and MCTS). Non-tree-search-based
prompting methods such as Chain-of-Thought (CoT) can also benefit from RoT
guidelines since RoT can provide task-specific knowledge collected from the
search experience.
[COMMENTS]
9 pages main
[LINK]
http://arxiv.org/abs/2404.05449v2
[DATE]
2024-04-11 13:21:00+08:00
[CATEGORIES]
cs.CL
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
[AUTHORS]
Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, Bowen Zhang, Xiaojiang Peng
[ABSTRACT]
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task
3 on multimodal emotion cause analysis in conversations. We propose a novel
Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction
(MER-MCE) framework that integrates text, audio, and visual modalities using
specialized emotion encoders. Our approach sets itself apart from
top-performing teams by leveraging modality-specific features for enhanced
emotion understanding and causality inference. Experimental evaluation
demonstrates the advantages of our multimodal approach, with our submission
achieving a competitive weighted F1 score of 0.3435, ranking third with a
margin of only 0.0339 behind the 1st team and 0.0025 behind the 2nd team.
Project: https://github.com/MIPS-COLT/MER-MCE.git
[COMMENTS]
Ranked 3rd in SemEval ‘24 Task 3 with F1 of 0.3435, close to 1st &
2nd by 0.0339 & 0.0025
[LINK]
http://arxiv.org/abs/2404.00511v3
[DATE]
2024-04-11 13:14:35+08:00
[CATEGORIES]
cs.CL
Structure-aware Fine-tuning for Code Pre-trained Models
[AUTHORS]
Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao
[ABSTRACT]
Over the past few years, we have witnessed remarkable advancements in Code
Pre-trained Models (CodePTMs). These models achieved excellent representation
capabilities by designing structure-based pre-training tasks for code. However,
enhancing the absorption of structural knowledge when fine-tuning CodePTMs
remains a significant challenge. To fill this gap, in this paper, we
present Structure-aware Fine-tuning (SAT), a novel structure-enhanced and
plug-and-play fine-tuning method for CodePTMs. We first propose a structure
loss to quantify the difference between the information learned by CodePTMs and
the knowledge extracted from code structure. Specifically, we use the attention
scores extracted from Transformer layers as the learned structural information,
and the shortest path length between leaves in abstract syntax trees as the
structural knowledge. Subsequently, multi-task learning is introduced to
improve the performance of fine-tuning. Experiments conducted on four
pre-trained models and two generation tasks demonstrate the effectiveness of
our proposed method as a plug-and-play solution. Furthermore, we observe that
SAT benefits CodePTMs more when training data is limited.
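The structural knowledge SAT uses is the shortest path length between leaves of an abstract syntax tree. As an illustration only, the sketch below computes that matrix for Python source with the standard `ast` module via breadth-first search. It skips context and operator nodes (such as `Load` or `Add`), which CPython may share between several parents, so that only genuine tree edges are counted. The paper's own parser, languages, and leaf definition may differ.

```python
import ast
from collections import deque

# Node types that CPython may share between parents; skipping them keeps the graph a tree.
SHARED_NODES = (ast.expr_context, ast.boolop, ast.operator, ast.unaryop, ast.cmpop)


def leaf_distance_matrix(source: str):
    """Pairwise shortest-path lengths (in tree edges) between the leaves of a Python AST."""
    adjacency, leaves = {}, []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, SHARED_NODES):
            continue
        children = [c for c in ast.iter_child_nodes(node) if not isinstance(c, SHARED_NODES)]
        adjacency.setdefault(node, [])
        for child in children:
            adjacency[node].append(child)                 # parent -> child edge
            adjacency.setdefault(child, []).append(node)  # child -> parent edge
        if not children:
            leaves.append(node)

    distances = []
    for start in leaves:  # breadth-first search from every leaf
        seen = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adjacency[cur]:
                if nxt not in seen:
                    seen[nxt] = seen[cur] + 1
                    queue.append(nxt)
        distances.append([seen[leaf] for leaf in leaves])

    def label(n):
        if isinstance(n, ast.Name):
            return n.id
        if isinstance(n, ast.Constant):
            return repr(n.value)
        return type(n).__name__

    return [label(leaf) for leaf in leaves], distances


labels, dist = leaf_distance_matrix("x = a + 1")
print(labels)  # ['x', 'a', '1']
print(dist)    # [[0, 3, 3], [3, 0, 2], [3, 2, 0]]
```

In SAT this kind of matrix serves as the target against which attention scores are compared in the structure loss; the sketch stops at computing the distances.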
[COMMENTS]
Accepted by COLING 2024
[LINK]
http://arxiv.org/abs/2404.07471v1
[DATE]
2024-04-11 12:24:48+08:00
[CATEGORIES]
cs.CL
Scalable Language Model with Generalized Continual Learning
[AUTHORS]
Bohao Peng, Zhuotao Tian, Shu Liu, Mingchang Yang, Jiaya Jia
[ABSTRACT]
Continual learning has gained increasing importance as it facilitates the
acquisition and refinement of scalable knowledge and skills in language models.
However, existing methods typically encounter strict limitations and challenges
in real-world scenarios, such as reliance on experience replay, optimization
constraints, and inference task-ID. In this study, we introduce the Scalable
Language Model (SLM) to overcome these limitations within a more challenging
and generalized setting, representing a significant advancement toward
practical applications for continual learning. Specifically, we propose the
Joint Adaptive Re-Parameterization (JARe), integrated with Dynamic Task-related
Knowledge Retrieval (DTKR), to enable adaptive adjustment of language models
based on specific downstream tasks. This approach leverages the task
distribution within the vector space, aiming to achieve a smooth and effortless
continual learning process. Our method demonstrates state-of-the-art
performance on diverse backbones and benchmarks, achieving effective continual
learning in both full-set and few-shot scenarios with minimal forgetting.
Moreover, while prior research primarily focused on a single task type such as
classification, our study goes further, using the large language model LLaMA-2
to explore the effects across diverse domains and task types, such
that a single language model can be decently scaled to broader applications.
[COMMENTS]
The Twelfth International Conference on Learning Representations
[LINK]
http://arxiv.org/abs/2404.07470v1
[DATE]
2024-04-11 12:22:15+08:00
[CATEGORIES]
cs.CL
Behavior Trees Enable Structured Programming of Language Model Agents
[AUTHORS]
Richard Kelley
[ABSTRACT]
Language models trained on internet-scale data sets have shown an impressive
ability to solve problems in Natural Language Processing and Computer Vision.
However, experience is showing that these models are frequently brittle in
unexpected ways, and require significant scaffolding to ensure that they
operate correctly in the larger systems that comprise “language-model agents.”
In this paper, we argue that behavior trees provide a unifying framework for
combining language models with classical AI and traditional programming. We
introduce Dendron, a Python library for programming language model agents using
behavior trees. We demonstrate the approach embodied by Dendron in three case
studies: building a chat agent, a camera-based infrastructure inspection agent
for use on a mobile robot or vehicle, and an agent that has been built to
satisfy safety constraints that it did not receive through instruction tuning
or RLHF.
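A behavior tree composes small nodes that report success or failure: a Sequence runs its children until one fails, and a Fallback runs them until one succeeds. The sketch below is a minimal, hypothetical illustration of that idea in plain Python, not Dendron's API. The language-model call and the safety check are stubs, and the RUNNING status that practical behavior-tree libraries also support is omitted for brevity. It mirrors the abstract's safety-constraint case study only loosely, by gating a reply behind a check.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Action:
    """Leaf node: runs a callable on the shared blackboard; True means success."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def tick(self, blackboard):
        return Status.SUCCESS if self.fn(blackboard) else Status.FAILURE


class Sequence:
    """Ticks children left to right and fails as soon as one child fails."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    """Ticks children left to right and succeeds as soon as one child succeeds."""

    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            if child.tick(blackboard) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def is_safe(bb):
    return "forbidden" not in bb["input"]  # stand-in for a real safety classifier


def reply(bb):
    bb["output"] = f"echo: {bb['input']}"  # stand-in for a language-model call
    return True


def refuse(bb):
    bb["output"] = "Sorry, I can't help with that."
    return True


agent = Fallback(
    Sequence(Action("is_safe", is_safe), Action("reply", reply)),
    Action("refuse", refuse),
)

board = {"input": "hello"}
agent.tick(board)
print(board["output"])  # echo: hello
```

The constraint lives in the tree structure rather than in the model's weights: if `is_safe` fails, the Sequence fails and the Fallback moves on to `refuse`, regardless of what the model would have said.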
[LINK]
http://arxiv.org/abs/2404.07439v1
[DATE]
2024-04-11 10:44:13+08:00
[CATEGORIES]
cs.CL
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
[AUTHORS]
Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen
[ABSTRACT]
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged
moderate-sized large language models (LLMs) highlights the potential of
building smaller yet powerful LLMs. However, the cost of training such
models from scratch on trillions of tokens remains high. In this work, we study
structured pruning as an effective means to develop smaller LLMs from
pre-trained, larger models. Our approach employs two key techniques: (1)
targeted structured pruning, which prunes a larger model to a specified target
shape by removing layers, heads, and intermediate and hidden dimensions in an
end-to-end manner, and (2) dynamic batch loading, which dynamically updates the
composition of sampled data in each training batch based on varying losses
across different domains. We demonstrate the efficacy of our approach by
presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B
and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art
open-source models of equivalent sizes, such as Pythia, INCITE, OpenLLaMA and
the concurrent TinyLlama models, on a wide range of downstream and instruction
tuning evaluations, while requiring only 3% of compute compared to training
such models from scratch. This work provides compelling evidence that
leveraging existing LLMs with structured pruning is a far more cost-effective
approach for building competitive small-scale LLMs.
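The abstract does not give the update rule behind dynamic batch loading. Below is a minimal sketch of the idea. It assumes each domain has a reference loss and that sampling proportions get an exponentiated update driven by the clipped excess loss. Both are assumptions, and the function name is hypothetical.

```python
import math
from typing import List


def update_domain_weights(weights: List[float], losses: List[float],
                          ref_losses: List[float]) -> List[float]:
    """One dynamic batch loading step (illustrative sketch, not the authors' code).

    Domains whose current loss is still above an assumed reference loss are
    sampled more often in later batches; the result is renormalized.
    """
    # Excess loss per domain, clipped at zero so converged domains are not penalized.
    deltas = [max(loss - ref, 0.0) for loss, ref in zip(losses, ref_losses)]
    # Multiplicative (exponentiated) reweighting of the sampling proportions.
    unnormalized = [w * math.exp(d) for w, d in zip(weights, deltas)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical two-domain example: domain 0 lags its reference loss by 1.0 nat.
new_weights = update_domain_weights([0.5, 0.5], [2.0, 1.0], [1.0, 1.0])
print([round(w, 3) for w in new_weights])  # [0.731, 0.269]
```

In this sketch, a domain whose loss is already at or below its reference keeps its relative weight. Only lagging domains are boosted.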
[COMMENTS]
The code and models are available at
https://github.com/princeton-nlp/LLM-Shearing
[LINK]
http://arxiv.org/abs/2310.06694v2
[DATE]
2024-04-11 09:18:06+08:00
[CATEGORIES]
cs.CL
cs.LG
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
[AUTHORS]
Yiran Wu, Tianwei Yue, Shaokun Zhang, Chi Wang, Qingyun Wu
[ABSTRACT]
Using Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that
require a sequence of actions and dynamic interaction with tools and external
environments, is a notable trend. In this paper, we propose StateFlow, a
novel LLM-based task-solving paradigm that conceptualizes complex task-solving
processes as state machines. In StateFlow, we distinguish between “process
grounding” (via state and state transitions) and “sub-task solving” (through
actions within a state), enhancing control and interpretability of the
task-solving procedure. A state represents the status of a running process. The
transitions between states are controlled by heuristic rules or decisions made
by the LLM, allowing for a dynamic and adaptive progression. Upon entering a
state, a series of actions is executed, involving not only calling LLMs guided
by different prompts, but also the utilization of external tools as needed. Our
results show that StateFlow significantly enhances LLMs’ efficiency. For
instance, StateFlow achieves 13% and 28% higher success rates than ReAct on the
InterCode SQL and ALFWorld benchmarks, respectively, at 5x and 3x lower cost.
We also show that StateFlow can be combined with iterative refining methods
like Reflexion to further improve performance.
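The split between "process grounding" (states and transitions) and "sub-task solving" (actions inside a state) can be sketched as a small state-machine driver. The state names, the toy actions and the transition rules below are all hypothetical. The actions stand in for LLM prompts or tool calls, and this is not the StateFlow implementation.

```python
from typing import Callable, Dict, List

Action = Callable[[dict], None]      # work done inside a state (LLM call, tool call, ...)
Transition = Callable[[dict], str]   # heuristic rule or LLM decision picking the next state


def run_stateflow(actions: Dict[str, Action], transitions: Dict[str, Transition],
                  start: str, end: str, ctx: dict, max_steps: int = 20) -> List[str]:
    """Drive a task as a state machine and return the sequence of visited states."""
    state, trace = start, [start]
    for _ in range(max_steps):
        if state == end:
            break
        actions[state](ctx)              # sub-task solving within the current state
        state = transitions[state](ctx)  # process grounding: move to the next state
        trace.append(state)
    return trace


# Hypothetical toy task: the "Solve" step needs two attempts before "Verify" passes.
def init(ctx: dict) -> None:
    ctx["attempts"] = 0


def solve(ctx: dict) -> None:
    ctx["attempts"] += 1  # stand-in for prompting an LLM or running a tool


def verify(ctx: dict) -> None:
    ctx["ok"] = ctx["attempts"] >= 2  # stand-in for checking the tool output


actions = {"Init": init, "Solve": solve, "Verify": verify}
transitions = {
    "Init": lambda ctx: "Solve",
    "Solve": lambda ctx: "Verify",
    "Verify": lambda ctx: "End" if ctx["ok"] else "Solve",  # heuristic rule
}
trace = run_stateflow(actions, transitions, "Init", "End", {})
print(trace)  # ['Init', 'Solve', 'Verify', 'Solve', 'Verify', 'End']
```

Keeping transitions separate from actions makes the control flow inspectable: the returned trace shows which states the run went through.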
[LINK]
http://arxiv.org/abs/2403.11322v3
[DATE]
2024-04-11 07:04:48+08:00
[CATEGORIES]
cs.CL
Analyzing the Performance of Large Language Models on Code Summarization
[AUTHORS]
Rajarshi Haldar, Julia Hockenmaier
[ABSTRACT]
Large language models (LLMs) such as Llama 2 perform very well on tasks that
involve both natural language and source code, particularly code summarization
and code generation. We show that for the task of code summarization, the
performance of these models on individual examples often depends on the amount
of (subword) token overlap between the code and the corresponding reference
natural language descriptions in the dataset. This token overlap arises because
the reference descriptions in standard datasets (corresponding to docstrings in
large code bases) are often highly similar to the names of the functions they
describe. We also show that this token overlap occurs largely in the function
names of the code and compare the relative performance of these models after
removing function names versus removing code structure. We also show that using
multiple evaluation metrics like BLEU and BERTScore gives us very little
additional insight since these metrics are highly correlated with each other.
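The paper measures overlap on the (subword) tokens of the models. The sketch below approximates that with a simple identifier splitter, which is an assumption. It shows how the overlap between a function name and its reference description can be computed. Both helper names are hypothetical.

```python
import re
from typing import Set


def split_identifier(name: str) -> Set[str]:
    """Split snake_case / camelCase identifiers into lowercase word pieces."""
    pieces = []
    for part in re.split(r"_+", name):
        # Capitalized or lowercase words, all-caps acronyms, and digit runs.
        pieces += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
    return {p.lower() for p in pieces}


def name_overlap(func_name: str, docstring: str) -> float:
    """Fraction of distinct docstring words that also appear in the function name."""
    doc_words = {w.lower() for w in re.findall(r"[A-Za-z0-9]+", docstring)}
    if not doc_words:
        return 0.0
    return len(doc_words & split_identifier(func_name)) / len(doc_words)


print(name_overlap("getUserName", "Get the user name."))        # 0.75
print(name_overlap("parse_json_file", "Read a configuration."))  # 0.0
```

A high score means a model could reproduce much of the reference just by copying words from the function name. This mirrors the effect the abstract describes.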
[LINK]
http://arxiv.org/abs/2404.08018v1
[DATE]
2024-04-11 06:42:18+08:00
[CATEGORIES]
cs.CL
Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI
[AUTHORS]
Sahil Garg, Anderson Schneider, Anant Raj, Kashif Rasul, Yuriy Nevmyvaka, Sneihil Gopal, Amit Dhurandhar, Guillermo Cecchi, Irina Rish
[ABSTRACT]
Building on the remarkable achievements in generative sampling of natural
images, we propose an innovative challenge, potentially overly ambitious, which
involves generating samples of entire multivariate time series that resemble
images. However, the statistical challenge lies in the small sample size,
sometimes consisting of a few hundred subjects. This issue is especially
problematic for deep generative models that follow the conventional approach of
generating samples from a canonical distribution and then decoding or denoising
them to match the true data distribution. In contrast, our method is grounded
in information theory and aims to implicitly characterize the distribution of
images, particularly the (global and local) dependency structure between
pixels. We achieve this by empirically estimating its KL-divergence in the dual
form with respect to the respective marginal distribution. This enables us to
perform generative sampling directly in the optimized 1-D dual divergence
space. Specifically, in the dual space, training samples representing the data
distribution are embedded in the form of various clusters between two end
points. In theory, any sample embedded between those two end points is
in-distribution w.r.t. the data distribution. Our key idea for generating novel
samples of images is to interpolate between the clusters via a walk as per
gradients of the dual function w.r.t. the data dimensions. In addition to the
data efficiency gained from direct sampling, we propose an algorithm that
offers a significant reduction in sample complexity for estimating the
divergence of the data distribution with respect to the marginal distribution.
We provide strong theoretical guarantees along with an extensive empirical
evaluation using many real-world datasets from diverse domains, establishing
the superiority of our approach w.r.t. state-of-the-art deep learning methods.
[LINK]
http://arxiv.org/abs/2404.07377v1
[DATE]
2024-04-11 06:35:06+08:00
[CATEGORIES]
cs.LG
cs.CL
LLMs in Biomedicine: A study on clinical Named Entity Recognition
[AUTHORS]
Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlolah Mohaghegh, Mozhdeh Rouhsedaghat, Kai-Wei Chang
[ABSTRACT]
Large Language Models (LLMs) demonstrate remarkable versatility in various
NLP tasks but encounter distinct challenges in biomedicine due to medical
language complexities and data scarcity. This paper investigates the
application of LLMs in the medical domain by exploring strategies to enhance
their performance for the Named-Entity Recognition (NER) task. Specifically,
our study reveals the importance of meticulously designed prompts in
biomedicine. Strategic selection of in-context examples yields a notable
improvement, with a ~15-20% increase in F1 score across all benchmark
datasets for few-shot clinical NER. Additionally, our findings suggest that
integrating external resources through prompting strategies can bridge the gap
between general-purpose LLM proficiency and the specialized demands of medical
NER. Leveraging a medical knowledge base, our proposed method inspired by
Retrieval-Augmented Generation (RAG) can boost the F1 score of LLMs for
zero-shot clinical NER. We will release the code upon publication.
[LINK]
http://arxiv.org/abs/2404.07376v1
[DATE]
2024-04-11 06:26:26+08:00
[CATEGORIES]
cs.CL
Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring
[AUTHORS]
Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff
[ABSTRACT]
Self-guided mental health interventions, such as “do-it-yourself” tools to
learn and practice coping strategies, show great promise to improve access to
mental health care. However, these interventions are often cognitively
demanding and emotionally triggering, creating accessibility barriers that
limit their wide-scale implementation and adoption. In this paper, we study how
human-language model interaction can support self-guided mental health
interventions. We take cognitive restructuring, an evidence-based therapeutic
technique to overcome negative thinking, as a case study. In an IRB-approved
randomized field study on a large mental health website with 15,531
participants, we design and evaluate a system that uses language models to
support people through various steps of cognitive restructuring. Our findings
reveal that our system positively impacts emotional intensity for 67% of
participants and helps 65% overcome negative thoughts. Although adolescents
report relatively worse outcomes, we find that tailored interventions that
simplify language model generations improve overall effectiveness and equity.
[COMMENTS]
CHI 2024 Camera Ready
[LINK]
http://arxiv.org/abs/2310.15461v2
[DATE]
2024-04-11 05:59:58+08:00
[CATEGORIES]
cs.CL
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
[AUTHORS]
Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
[ABSTRACT]
We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for
evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM)
task. This benchmark focuses on syntax-aware completions of program structures
such as code blocks and conditional expressions, and includes 17,720 examples
from multiple programming languages, sourced from recent code submissions after
April 2022 to minimize data contamination. SAFIM provides a robust framework
with various prompt designs and novel syntax-aware post-processing techniques,
facilitating accurate and fair comparisons across LLMs. Our comprehensive
evaluation of 15 LLMs shows that FIM pretraining not only enhances FIM
proficiency but also improves Left-to-Right (L2R) inference using LLMs. Our
findings challenge conventional beliefs and suggest that pretraining methods
and data quality have more impact than model size. SAFIM thus serves as a
foundational platform for future research in effective pretraining strategies
for code LLMs. The evaluation toolkit and dataset are available at
https://github.com/gonglinyuan/safim, and the leaderboard is available at
https://safimbenchmark.com.
[LINK]
http://arxiv.org/abs/2403.04814v2
[DATE]
2024-04-11 04:26:31+08:00
[CATEGORIES]
cs.CL
cs.LG
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
[AUTHORS]
Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
[ABSTRACT]
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon
the RWKV (RWKV-4) architecture. Our architectural design advancements include
multi-headed matrix-valued states and a dynamic recurrence mechanism that
improve expressivity while maintaining the inference efficiency characteristics
of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a
fast tokenizer based on greedy matching for enhanced multilinguality. We
trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two
Finch models with 1.6 and 3.1 billion parameters and find that they achieve
competitive performance across a wide variety of benchmarks. We release all our
models on HuggingFace under the Apache 2.0 license.
Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
[LINK]
http://arxiv.org/abs/2404.05892v2
[DATE]
2024-04-11 03:34:38+08:00
[CATEGORIES]
cs.CL
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models
[AUTHORS]
Emmy Liu, Graham Neubig, Jacob Andreas
[ABSTRACT]
Modern language models (LMs) can learn to perform new tasks in different
ways: in instruction following, the target task is described explicitly in
natural language; in few-shot prompting, the task is specified implicitly with
a small number of examples; in instruction inference, LMs are presented with
in-context examples and are then prompted to generate a natural language task
description before making predictions. Each of these procedures may be thought
of as invoking a different form of reasoning: instruction following involves
deductive reasoning, few-shot prompting involves inductive reasoning, and
instruction inference involves abductive reasoning. How do these different
capabilities relate? Across four LMs (from the gpt and llama families) and two
learning problems (involving arithmetic functions and machine translation) we
find a strong dissociation between the different types of reasoning: LMs can
sometimes learn effectively from few-shot prompts even when they are unable to
explain their own prediction rules; conversely, they sometimes infer useful
task descriptions while completely failing to learn from human-generated
descriptions of the same task. Our results highlight the non-systematic nature
of reasoning even in some of today’s largest LMs, and underscore the fact that
very different learning mechanisms may be invoked by seemingly similar
prompting procedures.
[LINK]
http://arxiv.org/abs/2404.03028v2
[DATE]
2024-04-11 03:03:00+08:00
[CATEGORIES]
cs.CL
We’re Calling an Intervention: Taking a Closer Look at Language Model Adaptation to Different Types of Linguistic Variation
[AUTHORS]
Aarohi Srivastava, David Chiang
[COMMENTS]
Preprint. Under review
[LINK]
http://arxiv.org/abs/2404.07304v1
[DATE]
2024-04-11 02:56:53+08:00
[CATEGORIES]
cs.CL
Is Your LLM Outdated? Benchmarking LLMs & Alignment Algorithms for Time-Sensitive Knowledge
[AUTHORS]
Seyed Mahed Mousavi, Simone Alghisi, Giuseppe Riccardi
[ABSTRACT]
We study the appropriateness of Large Language Models (LLMs) as knowledge
repositories. We focus on the challenge of keeping LLMs’ factual knowledge up to
date over time. Motivated by the lack of studies on identifying outdated
knowledge within LLMs, we design and develop a dynamic benchmark with
up-to-date ground truth answers for each target factual question. We evaluate
eighteen open-source and closed-source state-of-the-art LLMs on time-sensitive
knowledge retrieved in real-time from Wikidata. We select time-sensitive domain
facts in politics, sports, and organizations, and estimate the recency of the
information learned by the model during pre-training/fine-tuning. As a second
contribution, we evaluate the effectiveness of knowledge editing methods for
aligning LLMs with up-to-date factual knowledge and compare their performance
with Retrieval Augmented Generation. The dynamic benchmark is designed to be
used as-is to assess LLMs’ up-to-dateness, as well as to be extended to other
domains by sharing the code, the dataset, as well as evaluation and
visualization scripts.
[LINK]
http://arxiv.org/abs/2404.08700v1
[DATE]
2024-04-11 02:08:59+08:00
[CATEGORIES]
cs.CL
The Impact of Depth on Compositional Generalization in Transformer Language Models
[AUTHORS]
Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen
[ABSTRACT]
To process novel sentences, language models (LMs) must generalize
compositionally – combine familiar elements in new ways. What aspects of a
model’s structure promote compositional generalization? Focusing on
transformers, we test the hypothesis, motivated by theoretical and empirical
work, that deeper transformers generalize more compositionally. Simply adding
layers increases the total number of parameters; to address this confound
between depth and size, we construct three classes of models which trade off
depth for width such that the total number of parameters is kept constant (41M,
134M and 374M parameters). We pretrain all models as LMs and fine-tune them on
tasks that test for compositional generalization. We report three main
conclusions: (1) after fine-tuning, deeper models generalize more
compositionally than shallower models do, but the benefit of additional layers
diminishes rapidly; (2) within each family, deeper models show better language
modeling performance, but returns are similarly diminishing; (3) the benefits
of depth for compositional generalization cannot be attributed solely to better
performance on language modeling. Because model latency is approximately linear
in the number of layers, these results lead us to the recommendation that, with
a given total parameter budget, transformers can be made shallower than is
typical without sacrificing performance.
[COMMENTS]
Accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2310.19956v2
[DATE]
2024-04-11 02:06:14+08:00
[CATEGORIES]
cs.CL
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
[AUTHORS]
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
[ABSTRACT]
This work introduces an efficient method to scale Transformer-based Large
Language Models (LLMs) to infinitely long inputs with bounded memory and
computation. A key component in our proposed approach is a new attention
technique dubbed Infini-attention. The Infini-attention incorporates a
compressive memory into the vanilla attention mechanism and builds in both
masked local attention and long-term linear attention mechanisms in a single
Transformer block. We demonstrate the effectiveness of our approach on
long-context language modeling benchmarks, 1M sequence length passkey context
block retrieval and 500K length book summarization tasks with 1B and 8B LLMs.
Our approach introduces minimal bounded memory parameters and enables fast
streaming inference for LLMs.
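The abstract does not state the memory equations. Below is a minimal sketch of a bounded compressive memory. It assumes a linear-attention-style update with an ELU+1 feature map and leaves out the learned gate that mixes memory output with local attention. Treat it as an illustration of constant-size memory and retrieval, not as the paper's implementation. The class and method names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def elu_plus_one(x: float) -> float:
    """Positive feature map sigma(x) = ELU(x) + 1 (assumed here)."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size associative memory: its size does not grow with sequence length."""

    def __init__(self, d_key: int, d_value: int) -> None:
        self.M = [[0.0] * d_value for _ in range(d_key)]  # d_key x d_value matrix
        self.z = [0.0] * d_key                             # normalization vector

    def update(self, keys: List[Vector], values: List[Vector]) -> None:
        """Fold a segment into memory: M += sigma(K)^T V, z += sum of sigma(k)."""
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj

    def retrieve(self, q: Vector) -> Vector:
        """Read with a query: sigma(q) M / (sigma(q) . z)."""
        sq = [elu_plus_one(x) for x in q]
        denom = sum(a * b for a, b in zip(sq, self.z))
        d_value = len(self.M[0])
        if denom == 0.0:  # nothing stored yet
            return [0.0] * d_value
        return [sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom
                for j in range(d_value)]


mem = CompressiveMemory(d_key=2, d_value=3)
mem.update([[0.5, -1.0]], [[1.0, 2.0, 3.0]])
print([round(x, 6) for x in mem.retrieve([0.3, 0.7])])  # [1.0, 2.0, 3.0]
```

With one stored pair, any query returns that value exactly, because the normalizer cancels the key-query similarity. With many pairs, retrieval becomes a similarity-weighted blend. Either way, the memory stays `d_key x d_value` however many segments are folded in.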
[COMMENTS]
9 pages, 4 figures, 4 tables
[LINK]
http://arxiv.org/abs/2404.07143v1
[DATE]
2024-04-11 00:18:42+08:00
[CATEGORIES]
cs.CL
cs.LG
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
[AUTHORS]
Jie Ou, Yueming Chen, Wenhong Tian
[ABSTRACT]
While Large Language Models (LLMs) have shown remarkable abilities, they are
hindered by significant resource consumption and considerable latency due to
autoregressive processing. In this study, we introduce Adaptive N-gram Parallel
Decoding (ANPD), an innovative and lossless approach that accelerates inference
by allowing the simultaneous generation of multiple tokens. ANPD incorporates a
two-stage approach: it begins with a rapid drafting phase that employs an
N-gram module, which adapts based on the current interactive context, followed
by a verification phase, during which the original LLM assesses and confirms
the proposed tokens. Consequently, ANPD preserves the integrity of the LLM’s
original output while enhancing processing speed. We further leverage a
multi-level architecture for the N-gram module to enhance the precision of the
initial draft, consequently reducing inference latency. ANPD eliminates the
need for retraining or extra GPU memory, making it an efficient and
plug-and-play enhancement. In our experiments, models such as LLaMA and its
fine-tuned variants have shown speed improvements up to 3.67x, validating the
effectiveness of our proposed ANPD.
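Below is a minimal sketch of a draft-then-verify loop. It assumes greedy decoding and a single-level N-gram drafter built from the current context. ANPD's adaptive, multi-level N-gram module is not reproduced. `next_token` is a hypothetical stand-in for the LLM. A real system would verify all drafted positions in one forward pass rather than one call per token.

```python
from typing import Callable, Dict, List, Tuple

NextToken = Callable[[List[int]], int]  # stand-in for the LLM's greedy next-token choice


def build_ngram(tokens: List[int], n: int) -> Dict[Tuple[int, ...], int]:
    """Map each (n-1)-token context seen so far to the token that followed it."""
    table = {}
    for i in range(len(tokens) - n + 1):
        table[tuple(tokens[i:i + n - 1])] = tokens[i + n - 1]  # latest occurrence wins
    return table


def draft(tokens: List[int], table: Dict[Tuple[int, ...], int], n: int, k: int) -> List[int]:
    """Propose up to k tokens by chaining N-gram lookups (cheap, no LLM call)."""
    context, proposal = list(tokens), []
    for _ in range(k):
        key = tuple(context[-(n - 1):])
        if key not in table:
            break
        context.append(table[key])
        proposal.append(table[key])
    return proposal


def generate(prompt: List[int], next_token: NextToken, steps: int,
             n: int = 2, k: int = 4) -> List[int]:
    assert n >= 2, "the drafter needs at least one token of context"
    tokens, target = list(prompt), len(prompt) + steps
    while len(tokens) < target:
        proposal = draft(tokens, build_ngram(tokens, n), n, k)
        accepted: List[int] = []
        for tok in proposal:  # verification: keep the prefix the LLM agrees with
            if tok != next_token(tokens + accepted):
                break
            accepted.append(tok)
        accepted.append(next_token(tokens + accepted))  # LLM's own token guarantees progress
        tokens.extend(accepted)
    return tokens[:target]


def greedy(prompt: List[int], next_token: NextToken, steps: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens


def toy_model(seq: List[int]) -> int:
    return (seq[-1] + 1) % 3  # hypothetical deterministic "LLM"


print(generate([0], toy_model, 10) == greedy([0], toy_model, 10))  # True
```

Every emitted token is either confirmed by the model or produced by it. That is why the output matches plain greedy decoding, which is the sense in which such decoding is lossless. The speedup comes from accepting several drafted tokens per verification round.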
[LINK]
http://arxiv.org/abs/2404.08698v1
[DATE]
2024-04-11 00:11:09+08:00
[CATEGORIES]
cs.CL
cs.LG
SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction
[AUTHORS]
Ying Chen, Jiajing Xie, Yuxiang Lin, Yuhang Song, Wenxian Yang, Rongshan Yu
[ABSTRACT]
Multi-modal learning that combines pathological images with genomic data has
significantly enhanced the accuracy of survival prediction. Nevertheless,
existing methods have not fully utilized the inherent hierarchical structure
within both whole slide images (WSIs) and transcriptomic data, from which
better intra-modal representations and inter-modal integration could be
derived. Moreover, many existing studies attempt to improve multi-modal
representations through attention mechanisms, which inevitably lead to high
complexity when processing high-dimensional WSIs and transcriptomic data.
Recently, a structured state space model named Mamba emerged as a promising
approach for its superior performance in modeling long sequences with low
complexity. In this study, we propose Mamba with multi-grained multi-modal
interaction (SurvMamba) for survival prediction. SurvMamba is implemented with
a Hierarchical Interaction Mamba (HIM) module that facilitates efficient
intra-modal interactions at different granularities, thereby capturing more
detailed local features as well as rich global representations. In addition, an
Interaction Fusion Mamba (IFM) module is used for cascaded inter-modal
interactive fusion, yielding more comprehensive features for survival
prediction. Comprehensive evaluations on five TCGA datasets demonstrate that
SurvMamba outperforms other existing methods in terms of performance and
computational cost.
[LINK]
http://arxiv.org/abs/2404.08027v1
[DATE]
2024-04-11 23:58:12+08:00
[CATEGORIES]
cs.LG
Inferring Change Points in High-Dimensional Linear Regression via Approximate Message Passing
[AUTHORS]
Gabriel Arpino, Xiaoqi Liu, Ramji Venkataramanan
[ABSTRACT]
We consider the problem of localizing change points in high-dimensional
linear regression. We propose an Approximate Message Passing (AMP) algorithm
for estimating both the signals and the change point locations. Assuming
Gaussian covariates, we give an exact asymptotic characterization of its
estimation performance in the limit where the number of samples grows
proportionally to the signal dimension. Our algorithm can be tailored to
exploit any prior information on the signal, noise, and change points. It also
enables uncertainty quantification in the form of an efficiently computable
approximate posterior distribution, whose asymptotic form we characterize
exactly. We validate our theory via numerical experiments, and demonstrate the
favorable performance of our estimators on both synthetic data and images.
[COMMENTS]
24 pages, 8 figures
[LINK]
http://arxiv.org/abs/2404.07864v1
[DATE]
2024-04-11 23:57:12+08:00
[CATEGORIES]
cs.LG
Streaming detection of significant delay changes in public transport systems
[AUTHORS]
Przemysław Wrona, Maciej Grzenda, Marcin Luckner
[ABSTRACT]
Public transport systems are expected to reduce pollution and contribute to
sustainable development. However, disruptions in public transport such as
delays may negatively affect mobility choices. To quantify delays, aggregated
data from vehicle locations systems are frequently used. However, delays
observed at individual stops are caused inter alia by fluctuations in running
times and propagation of delays occurring in other locations. Hence, in this
work, we propose both a method for detecting significant delays and a reference
architecture, relying on stream processing engines, in which the method is
implemented. The method can complement the calculation of delays defined as
deviation from schedules. This provides both online rather than batch
identification of significant and repetitive delays, and resilience to the
limited quality of location data. The method we propose can be used with
different change detectors, such as ADWIN, applied to location data stream
shuffled to individual edges of a transport graph. It can detect in an online
manner at which edges statistically significant delays are observed and at
which edges delays arise and are reduced. Detections can be used to model
mobility choices and quantify the impact of repetitive rather than random
disruptions on feasible trips with multimodal trip modelling engines. The
evaluation performed with the public transport data of over 2000 vehicles
confirms the merits of the method and reveals that a limited-size subgraph of a
transport system graph causes statistically significant delays.
[COMMENTS]
This preprint has not undergone peer review or any post-submission
improvements or corrections. The Version of Record of this contribution is
published in Computational Science - ICCS 2022. Lecture Notes in Computer
Science, vol 13353. Springer, Cham, and is available online at
https://doi.org/10.1007/978-3-031-08760-8_41
[LINK]
http://arxiv.org/abs/2404.07860v1
[DATE]
2024-04-11 23:54:20+08:00
[CATEGORIES]
cs.LG
Overparameterized Multiple Linear Regression as Hyper-Curve Fitting
[AUTHORS]
E. Atza, N. Budko
[ABSTRACT]
The paper shows that the application of the fixed-effect multiple linear
regression model to an overparameterized dataset is equivalent to fitting the
data with a hyper-curve parameterized by a single scalar parameter. This
equivalence allows for a predictor-focused approach, where each predictor is
described by a function of the chosen parameter. It is proven that a linear
model will produce exact predictions even in the presence of nonlinear
dependencies that violate the model assumptions. Parameterization in terms of
the dependent variable and the monomial basis in the predictor function space
are applied here to both synthetic and experimental data. The hyper-curve
approach is especially suited for the regularization of problems with noise in
predictor variables and can be used to remove noisy and “improper” predictors
from the model.
[LINK]
http://arxiv.org/abs/2404.07849v1
[DATE]
2024-04-11 23:43:11+08:00
[CATEGORIES]
cs.LG
Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution
[AUTHORS]
Handi Deng, Yucheng Zhou, Jiaxuan Xiang, Liujie Gu, Yan Luo, Hai Feng, Mingyuan Liu, Cheng Ma
[ABSTRACT]
Foundation models have rapidly evolved and have achieved significant
accomplishments in computer vision tasks. Specifically, the prompt mechanism
conveniently allows users to integrate image prior information into the model,
making it possible to apply models without any training. Therefore, we propose
a method based on foundation models and zero training to solve the tasks of
photoacoustic (PA) image segmentation. We employed the segment anything model
(SAM) by setting simple prompts and integrating the model’s outputs with prior
knowledge of the imaged objects to accomplish various tasks, including: (1)
removing the skin signal in three-dimensional PA image rendering; (2) dual
speed-of-sound reconstruction, and (3) segmentation of finger blood vessels.
Through these demonstrations, we have concluded that deep learning can be
directly applied in PA imaging without the requirement for network design and
training. This potentially allows for a hands-on, convenient approach to
achieving efficient and accurate segmentation of PA images. This letter serves
as a comprehensive tutorial, facilitating the mastery of the technique through
the provision of code and sample datasets.
[LINK]
http://arxiv.org/abs/2404.07833v1
[DATE]
2024-04-11 23:18:34+08:00
[CATEGORIES]
cs.LG
On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning
[AUTHORS]
Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo
[ABSTRACT]
The use of Potential Based Reward Shaping (PBRS) has shown great promise in
the ongoing research effort to tackle sample inefficiency in Reinforcement
Learning (RL). However, the choice of the potential function is critical for
this technique to be effective. Additionally, RL techniques are usually
constrained to use a finite horizon for computational limitations. This
introduces a bias when using PBRS, thus adding an additional layer of
complexity. In this paper, we leverage abstractions to automatically produce a
“good” potential function. We analyse the bias induced by finite horizons in
the context of PBRS, producing novel insights. Finally, to assess sample
efficiency and performance impact, we evaluate our approach on four
environments including a goal-oriented navigation task and three Arcade
Learning Environments (ALE) games demonstrating that we can reach the same
level of performance as CNN-based solutions with a simple fully-connected
network.
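Potential-based reward shaping in its standard form adds F(s, s') = γΦ(s') − Φ(s) to each reward. On a truncated trajectory the shaping terms telescope. The shaped discounted return therefore differs from the original by γ^T Φ(s_T) − Φ(s_0). The leftover γ^T Φ(s_T) term depends on where the episode is cut off, which is one way a finite horizon can bias PBRS. Φ below is a hypothetical distance-to-goal potential, not the abstraction-derived potential from the paper.

```python
from typing import Callable, List

Potential = Callable[[int], float]


def shaped_rewards(states: List[int], rewards: List[float],
                   phi: Potential, gamma: float) -> List[float]:
    """Add the PBRS term gamma * phi(s') - phi(s) to each transition's reward."""
    assert len(states) == len(rewards) + 1, "need s_0..s_T for T rewards"
    return [r + gamma * phi(s_next) - phi(s)
            for r, s, s_next in zip(rewards, states, states[1:])]


def discounted_return(rewards: List[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))


GOAL = 5


def phi(s: int) -> float:
    return -abs(GOAL - s)  # hypothetical potential: negative distance to the goal


states, rewards, gamma = [0, 1, 2, 3], [0.0, 0.0, 0.0], 0.9
T = len(rewards)
gap = (discounted_return(shaped_rewards(states, rewards, phi, gamma), gamma)
       - discounted_return(rewards, gamma))
print(round(gap, 3), round(gamma ** T * phi(states[-1]) - phi(states[0]), 3))  # 3.542 3.542
```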
[LINK]
http://arxiv.org/abs/2404.07826v1
[DATE]
2024-04-11 23:09:49+08:00
[CATEGORIES]
cs.LG
Group Decision-Making among Privacy-Aware Agents
[AUTHORS]
Marios Papachristou, M. Amin Rahimian
[ABSTRACT]
How can individuals exchange information to learn from each other despite
their privacy needs and security concerns? For example, consider individuals
deliberating a contentious topic and being concerned about divulging their
private experiences. Preserving individual privacy and enabling efficient
social learning are both important desiderata but seem fundamentally at odds
with each other and very hard to reconcile. We reconcile them by controlling information
leakage using rigorous statistical guarantees that are based on differential
privacy (DP). Our agents use log-linear rules to update their beliefs after
communicating with their neighbors. Adding DP randomization noise to beliefs
provides communicating agents with plausible deniability with regard to their
private information and their network neighborhoods. We consider two learning
environments: one for distributed maximum-likelihood estimation given a finite
number of private signals and another for online learning from an infinite,
intermittent signal stream. Noisy information aggregation in the finite case
leads to interesting tradeoffs between rejecting low-quality states and making
sure all high-quality states are accepted in the algorithm output. Our results
flesh out the nature of the trade-offs in both cases between the quality of the
group decision outcomes, learning accuracy, communication cost, and the level
of privacy protections that the agents are afforded.
[LINK]
http://arxiv.org/abs/2402.08156v4
[DATE]
2024-04-11 22:59:39+08:00
[CATEGORIES]
cs.LG
Post-Hoc Reversal: Are We Selecting Models Prematurely?
[AUTHORS]
Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Chase Lipton
[ABSTRACT]
Trained models are often composed with post-hoc transforms such as
temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to
improve performance, robustness, uncertainty estimation, etc. However, such
transforms are typically applied only after the base models have already been
finalized by standard means. In this paper, we challenge this practice with an
extensive empirical study. In particular, we demonstrate a phenomenon that we
call post-hoc reversal, where performance trends are reversed after applying
these post-hoc transforms. This phenomenon is especially prominent in
high-noise settings. For example, while base models overfit badly early in
training, both conventional ensembling and SWA favor base models trained for
more epochs. Post-hoc reversal can also suppress the appearance of double
descent and mitigate mismatches between test loss and test error seen in base
models. Based on our findings, we propose post-hoc selection, a simple
technique whereby post-hoc metrics inform model development decisions such as
early stopping, checkpointing, and broader hyperparameter choices. Our
experimental analyses span real-world vision, language, tabular and graph
datasets from domains like satellite imaging, language modeling, census
prediction and social network analysis. On an LLM instruction tuning dataset,
post-hoc selection results in > 1.5x MMLU improvement compared to naive
selection. Code is available at
https://github.com/rishabh-ranjan/post-hoc-reversal.
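As a minimal illustration of one transform named above, temperature scaling (TS), the Python sketch below fits a single temperature on held-out logits by grid search over negative log-likelihood (NLL). It then ranks checkpoints by their post-TS loss rather than their raw loss, which is the spirit of post-hoc selection. The function names, the temperature grid, and the use of NLL as the selection metric are assumptions for illustration. This is not the authors' code, which is at the repository linked above.

```python
import numpy as np


def nll(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 400)) -> float:
    """Temperature scaling: choose the T > 0 minimising validation NLL of logits / T."""
    losses = [nll(logits / t, labels) for t in grid]
    return float(grid[int(np.argmin(losses))])


def post_hoc_select(checkpoint_logits, labels) -> int:
    """Hypothetical post-hoc selection rule: return the index of the checkpoint
    whose validation NLL is lowest *after* temperature scaling."""
    scores = []
    for logits in checkpoint_logits:
        t = fit_temperature(logits, labels)
        scores.append(nll(logits / t, labels))
    return int(np.argmin(scores))
```

For example, logits that are three times too sharp should yield a fitted temperature near 3. Naive selection would instead compare `nll(logits, labels)` across checkpoints before any transform is applied.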
[COMMENTS]
9 pages + references + appendix, 7 figures
[LINK]
http://arxiv.org/abs/2404.07815v1
[DATE]
2024-04-11 22:58:19+08:00
[CATEGORIES]
cs.LG
Enhancing Data Efficiency and Feature Identification for Lithium-Ion Battery Lifespan Prediction by Deciphering Interpretation of Temporal Patterns and Cyclic Variability Using Attention-Based Models
[AUTHORS]
Jaewook Lee, Seongmin Heo, Jay H. Lee
[ABSTRACT]
Accurately predicting the lifespan of lithium-ion batteries is crucial for
optimizing operational strategies and mitigating risks. While numerous studies
have aimed at predicting battery lifespan, few have examined the
interpretability of their models or how such insights could improve
predictions. Addressing this gap, we introduce three innovative models that
integrate shallow attention layers into a foundational model from our previous
work, which combined elements of recurrent and convolutional neural networks.
Utilizing a well-known public dataset, we showcase our methodology’s
effectiveness. Temporal attention is applied to identify critical timesteps and
highlight differences among test cell batches, particularly underscoring the
significance of the “rest” phase. Furthermore, by applying cyclic attention via
self-attention to context vectors, our approach effectively identifies key
cycles, enabling us to strategically decrease the input size for quicker
predictions. Employing both single- and multi-head attention mechanisms, we
have systematically minimized the required input from 100 to 50 and then to 30
cycles, refining this process based on cyclic attention scores. Our refined
model exhibits strong regression capabilities, accurately forecasting the
initiation of rapid capacity fade with an average deviation of only 58 cycles
by analyzing just the initial 30 cycles of easily accessible input data.
[LINK]
http://arxiv.org/abs/2311.10792v3
[DATE]
2024-04-11 22:24:24+08:00
[CATEGORIES]
cs.LG
Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting
[AUTHORS]
Nathaniel Dean, Dilip Sarkar
[ABSTRACT]
Overparameterized deep neural networks (DNNs), if not sufficiently
regularized, are susceptible to overfitting their training examples and not
generalizing well to test data. To discourage overfitting, researchers have
developed multicomponent loss functions that reduce intra-class feature
correlation and maximize inter-class feature distance in one or more layers of
the network. By analyzing the penultimate feature layer activations output by a
DNN’s feature extraction section prior to the linear classifier, we find that
modified forms of the intra-class feature covariance and inter-class prototype
separation are key components of a fundamental Chebyshev upper bound on the
probability of misclassification, which we designate the Chebyshev Prototype
Risk (CPR). While previous approaches’ covariance loss terms scale
quadratically with the number of network features, our CPR bound indicates that
an approximate covariance loss in log-linear time is sufficient to reduce the
bound and is scalable to large architectures. We implement the terms of the CPR
bound into our Explicit CPR (exCPR) loss function and observe from empirical
results on multiple datasets and network architectures that our training
algorithm reduces overfitting and improves upon previous approaches in many
settings. Our code is available at
https://github.com/Deano1718/Regularization_exCPR .
[COMMENTS]
17 pages, 2 figures
[LINK]
http://arxiv.org/abs/2404.07083v2
[DATE]
2024-04-11 22:21:32+08:00
[CATEGORIES]
cs.LG
Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts for Language-Guided Robot Manipulation
[AUTHORS]
Namasivayam Kalithasan, Sachit Sachdeva, Himanshu Gaurav Singh, Divyanshu Aggarwal, Gurarmaan Singh Panjeta, Vishal Bindal, Arnav Tuli, Rohan Paul, Parag Singla
[ABSTRACT]
Our goal is to build embodied agents that can learn inductively generalizable
spatial concepts in a continual manner, e.g., constructing a tower of a given
height. Existing work suffers from two limitations: (a) Liang et al. (2023)
and their multi-modal extensions rely heavily on prior knowledge and are not
grounded in the demonstrations; (b) Liu et al. (2023) lack the ability to
generalize due to their purely neural approach. A key challenge is to achieve a
fine balance between symbolic representations which have the capability to
generalize, and neural representations that are physically grounded. In
response, we propose a neuro-symbolic approach by expressing inductive concepts
as symbolic compositions over grounded neural concepts. Our key insight is to
decompose the concept learning problem into the following steps: 1) Sketch:
obtain a programmatic representation of the given instruction; 2) Plan:
perform model-based RL over the sequence of grounded neural action concepts to
learn a grounded plan; 3) Generalize: abstract out a generic (lifted) Python
program to facilitate generalizability. Continual learning is achieved by
interspersing learning of grounded neural concepts with higher level symbolic
constructs. Our experiments demonstrate that our approach significantly
outperforms existing baselines in terms of its ability to learn novel concepts
and generalize inductively.
[LINK]
http://arxiv.org/abs/2404.07774v1
[DATE]
2024-04-11 22:09:41+08:00
[CATEGORIES]
cs.LG
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
[AUTHORS]
Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang
[ABSTRACT]
Diffusion models, a powerful and universal generative AI technology, have
achieved tremendous success in computer vision, audio, reinforcement learning,
and computational biology. In these applications, diffusion models provide
flexible high-dimensional data modeling, and act as a sampler for generating
new samples under active guidance towards task-desired properties. Despite the
significant empirical success, the theory of diffusion models remains limited,
potentially slowing down principled methodological innovations for further
harnessing and improving diffusion models. In this paper, we review emerging
applications of diffusion models, with a focus on how they generate samples under
various controls. Next, we survey the existing theories of diffusion models,
covering their statistical properties and sampling capabilities. We adopt a
progressive approach, beginning with unconditional diffusion models and
connecting to conditional counterparts. Further, we review a new avenue in
high-dimensional structured optimization through conditional diffusion models,
where searching for solutions is reformulated as a conditional sampling problem
and solved by diffusion models. Lastly, we discuss future directions for
diffusion models. The purpose of this paper is to provide a well-rounded
theoretical exposition that stimulates forward-looking theories and methods for
diffusion models.
[LINK]
http://arxiv.org/abs/2404.07771v1
[DATE]
2024-04-11 22:07:25+08:00
[CATEGORIES]
cs.LG
Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models – Technical Challenges and Implications for Monitoring and Verification
[AUTHORS]
Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann
[ABSTRACT]
Novel deep-learning (DL) architectures have reached a level where they can
generate digital media, including photorealistic images, that are difficult to
distinguish from real data. These technologies have already been used to
generate training data for Machine Learning (ML) models, and large
text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving
remarkable results in realistic high-resolution image generation. Given these
developments, issues of data authentication in monitoring and verification
deserve a careful and systematic analysis: How realistic are synthetic images?
How easily can they be generated? How useful are they for ML researchers, and
what is their potential for Open Science? In this work, we use novel DL models
to explore how synthetic satellite images can be created using conditioning
mechanisms. We investigate the challenges of synthetic satellite image
generation and evaluate the results based on authenticity and state-of-the-art
metrics. Furthermore, we investigate how synthetic data can alleviate the lack
of data in the context of ML methods for remote sensing. Finally, we discuss
implications of synthetic satellite imagery in the context of monitoring and
verification.
[COMMENTS]
https://resources.inmm.org/annual-meeting-proceedings/generating-synthetic-satellite-imagery-deep-learning-text-image-models
[LINK]
http://arxiv.org/abs/2404.07754v1
[DATE]
2024-04-11 22:00:20+08:00
[CATEGORIES]
cs.LG
3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces
[AUTHORS]
Xuanming Cao, Chengyu Tao, Juan Du
[ABSTRACT]
The surface quality inspection of manufacturing parts based on 3D point cloud
data has attracted increasing attention in recent years, because a 3D point
cloud can capture the entire surface of a manufacturing part, unlike previous
practices that focus on only a few key product characteristics. However,
achieving accurate 3D anomaly detection is challenging, due to the complex
surfaces of manufacturing parts and the difficulty of collecting sufficient
anomaly samples. To address these challenges, we propose a novel untrained
anomaly detection method based on 3D point cloud data for complex manufacturing
parts, which can achieve accurate anomaly detection on a single sample without
training data. In the proposed framework, we transform an input sample into two
sets of profiles along different directions. Based on one set of the profiles,
a novel segmentation module is devised to segment the complex surface into
multiple basic and simple components. In each component, another set of
profiles, which have similar shapes, can be modeled as a low-rank
matrix. Thus, accurate 3D anomaly detection can be achieved by using Robust
Principal Component Analysis (RPCA) on these low-rank matrices. Extensive
numerical experiments on different types of parts show that our method achieves
promising results compared with the benchmark methods.
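The final step above relies on Robust Principal Component Analysis. The sketch below is a generic illustration of that step only. It does not reproduce the 3D-CSAD profile extraction or segmentation. It solves principal component pursuit, minimising ||L||_* + lam·||S||_1 subject to L + S = M, with a standard augmented-Lagrangian iteration. The result splits a matrix M into a low-rank part L and a sparse part S, whose large entries would flag anomalies. The defaults (lam = 1/sqrt(max(m, n)), the fixed penalty mu, the tolerance) are common textbook choices assumed here, not values from the paper.

```python
import numpy as np


def shrink(x: np.ndarray, tau: float) -> np.ndarray:
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt


def rpca(m_mat: np.ndarray, lam=None, tol=1e-7, max_iter=2000):
    """Principal component pursuit: min ||L||_* + lam*||S||_1 s.t. L + S = M."""
    rows, cols = m_mat.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(rows, cols))
    mu = rows * cols / (4.0 * np.abs(m_mat).sum())  # assumed fixed penalty
    s_mat = np.zeros_like(m_mat)
    y_mat = np.zeros_like(m_mat)  # Lagrange multiplier for L + S = M
    norm_m = np.linalg.norm(m_mat)
    for _ in range(max_iter):
        l_mat = svd_threshold(m_mat - s_mat + y_mat / mu, 1.0 / mu)
        s_mat = shrink(m_mat - l_mat + y_mat / mu, lam / mu)
        residual = m_mat - l_mat - s_mat
        y_mat += mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return l_mat, s_mat
```

As a toy usage example, consider a hypothetical rank-one matrix with a few large entries injected at known positions. Here `rpca` should place those entries in S and the smooth structure in L.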
[LINK]
http://arxiv.org/abs/2404.07748v1
[DATE]
2024-04-11 21:46:05+08:00
[CATEGORIES]
cs.LG
A Deep Learning Method for Simultaneous Denoising and Missing Wedge Reconstruction in Cryogenic Electron Tomography
[AUTHORS]
Simon Wiedemann, Reinhard Heckel
[ABSTRACT]
Cryogenic electron tomography is a technique for imaging biological samples
in 3D. A microscope collects a series of 2D projections of the sample, and the
goal is to reconstruct the 3D density of the sample called the tomogram.
Reconstruction is difficult as the 2D projections are noisy and cannot be
recorded from all directions, resulting in a missing wedge of information.
Tomograms conventionally reconstructed with filtered back-projection suffer
from noise and strong artifacts due to the missing wedge. Here, we propose a
deep-learning approach for simultaneous denoising and missing wedge
reconstruction called DeepDeWedge. The algorithm requires no ground truth data
and is based on fitting a neural network to the 2D projections using a
self-supervised loss. DeepDeWedge performs better than CryoCARE and IsoNet,
which are state-of-the-art methods for denoising and missing wedge
reconstruction, and performs similarly to, and in some cases better than, the
combination of the two methods. At the same time, DeepDeWedge is simpler than this two-step
approach, as it does denoising and missing wedge reconstruction simultaneously
rather than sequentially.
[LINK]
http://arxiv.org/abs/2311.05539v2
[DATE]
2024-04-11 21:39:18+08:00
[CATEGORIES]
cs.LG
Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits
[AUTHORS]
Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha
[ABSTRACT]
We study the generalized linear contextual bandit problem within the
requirements of limited adaptivity. In this paper, we present two algorithms,
B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited
adaptivity models: batch learning with stochastic contexts and rare policy
switches with adversarial contexts. For both these models, we establish
essentially tight regret bounds. Notably, in the obtained bounds, we manage to
eliminate a dependence on a key parameter $\kappa$, which captures the
non-linearity of the underlying reward model. For our batch learning algorithm
B-GLinCB, with $\Omega\left( \log{\log T} \right)$ batches, the regret scales
as $\tilde{O}(\sqrt{T})$. Further, we establish that our rarely switching
algorithm RS-GLinCB updates its policy at most $\tilde{O}(\log^2 T)$ times and
achieves a regret of $\tilde{O}(\sqrt{T})$. Our approach for removing the
dependence on $\kappa$ for generalized linear contextual bandits might be of
independent interest.
[COMMENTS]
31 pages
[LINK]
http://arxiv.org/abs/2404.06831v2
[DATE]
2024-04-11 21:38:13+08:00
[CATEGORIES]
cs.LG
Monte Carlo Tree Search with Boltzmann Exploration
[AUTHORS]
Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
[ABSTRACT]
Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound
applied to Trees (UCT), are instrumental to automated planning techniques.
However, UCT can be slow to explore an optimal action when it initially appears
inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the
maximum entropy principle into an MCTS approach, utilising Boltzmann policies
to sample actions, naturally encouraging more exploration. In this paper, we
highlight a major limitation of MENTS: optimal actions for the maximum entropy
objective do not necessarily correspond to optimal actions for the original
objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and
Decaying ENtropy Tree-Search (DENTS), that address these limitations and
preserve the benefits of Boltzmann policies, such as allowing actions to be
sampled faster by using the Alias method. Our empirical analysis shows that our
algorithms achieve consistently high performance across several benchmark domains,
including the game of Go.
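Both algorithms sample actions from Boltzmann (softmax) policies, and the Alias method is cited as a way to make that sampling fast. The sketch below is a generic illustration of the idea, not the BTS/DENTS implementation. It turns hypothetical action values into a Boltzmann distribution at an assumed temperature, builds Vose's alias table in O(n) time, and then draws each action in O(1) time.

```python
import math
import random


def boltzmann(values, temperature):
    """Softmax of values / temperature, computed stably."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def build_alias(probs):
    """Vose's alias method: O(n) construction of (prob, alias) tables."""
    n = len(probs)
    prob = [0.0] * n
    alias = [0] * n
    scaled = [p * n for p in probs]
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        lo = small.pop()
        hi = large.pop()
        prob[lo] = scaled[lo]
        alias[lo] = hi
        scaled[hi] = scaled[hi] + scaled[lo] - 1.0  # hi donates mass to fill lo's column
        (small if scaled[hi] < 1.0 else large).append(hi)
    for i in large + small:  # leftovers (up to rounding) fill their whole column
        prob[i] = 1.0
    return prob, alias


def sample_alias(prob, alias, rng):
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The table costs O(n) time to build. It only needs rebuilding when the action values at a node change, so repeated draws from the same policy are cheap.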
[COMMENTS]
Camera ready version of NeurIPS2023 paper
[LINK]
http://arxiv.org/abs/2404.07732v1
[DATE]
2024-04-11 21:25:35+08:00
[CATEGORIES]
cs.LG
Realistic Continual Learning Approach using Pre-trained Models
[AUTHORS]
Nadia Nasri, Carlos Gutiérrez-Álvarez, Sergio Lafuente-Arroyo, Saturnino Maldonado-Bascón, Roberto J. López-Sastre
[ABSTRACT]
Continual learning (CL) is crucial for evaluating how well learning solutions
adapt while retaining knowledge. Our research addresses the challenge of
catastrophic forgetting, where models lose proficiency in previously learned
tasks as they acquire new ones. While numerous solutions have been proposed,
existing experimental setups often rely on idealized class-incremental learning
scenarios. We introduce Realistic Continual Learning (RealCL), a novel CL
paradigm where class distributions across tasks are random, departing from
structured setups.
We also present CLARE (Continual Learning Approach with pRE-trained models
for RealCL scenarios), a pre-trained model-based solution designed to integrate
new knowledge while preserving past learning. Our contributions include
pioneering RealCL as a generalization of traditional CL setups, proposing CLARE
as an adaptable approach for RealCL tasks, and conducting extensive experiments
demonstrating its effectiveness across various RealCL scenarios. Notably, CLARE
outperforms existing models on RealCL benchmarks, highlighting its versatility
and robustness in unpredictable learning environments.
[LINK]
http://arxiv.org/abs/2404.07729v1
[DATE]
2024-04-11 21:19:46+08:00
[CATEGORIES]
cs.LG
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
[AUTHORS]
Tuomas Kynkäänniemi, Miika Aittala, Tero Karras, Samuli Laine, Timo Aila, Jaakko Lehtinen
[ABSTRACT]
Guidance is a crucial technique for extracting the best performance out of
image-generating diffusion models. Traditionally, a constant guidance weight
has been applied throughout the sampling chain of an image. We show that
guidance is clearly harmful toward the beginning of the chain (high noise
levels), largely unnecessary toward the end (low noise levels), and only
beneficial in the middle. We thus restrict it to a specific range of noise
levels, improving both the inference speed and result quality. This limited
guidance interval significantly improves the record FID on ImageNet-512, from
1.81 to 1.40. We show that it is quantitatively and qualitatively beneficial
across different sampler parameters, network architectures, and datasets,
including the large-scale setting of Stable Diffusion XL. We thus suggest
exposing the guidance interval as a hyperparameter in all diffusion models that
use guidance.
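The core change is small enough to sketch. Below, classifier-free guidance is written in the common form D = D_uncond + w·(D_cond − D_uncond), where w = 1 means no guidance. The guided estimate is used only while the current noise level sigma lies inside an interval, and the plain conditional output is returned everywhere else. The function name, the half-open interval convention, and the toy arrays are illustrative assumptions, not the paper's code or settings.

```python
import numpy as np


def guided_denoise(d_cond, d_uncond, sigma, weight, sigma_lo, sigma_hi):
    """Classifier-free guidance applied only for sigma_lo < sigma <= sigma_hi.

    d_cond / d_uncond: denoiser outputs with and without the condition.
    weight: guidance weight; weight == 1 reproduces the conditional output.
    """
    if sigma_lo < sigma <= sigma_hi:
        return d_uncond + weight * (d_cond - d_uncond)
    # Outside the interval: no guidance (equivalent to weight == 1).
    return d_cond
```

Outside the interval, a sampler built this way would not need the unconditional output at all. Skipping that network evaluation is presumably where the inference-speed gain mentioned above comes from.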
[LINK]
http://arxiv.org/abs/2404.07724v1
[DATE]
2024-04-11 21:16:47+08:00
[CATEGORIES]
cs.LG
AdvNF: Reducing Mode Collapse in Conditional Normalising Flows using Adversarial Learning
[AUTHORS]
Vikas Kanaujia, Mathias S. Scheurer, Vipul Arora
[ABSTRACT]
Deep generative models complement Markov chain Monte Carlo (MCMC) methods for
efficiently sampling from high-dimensional distributions. Among these methods,
explicit generators, such as Normalising Flows (NFs), in combination with the
Metropolis-Hastings algorithm have been extensively applied to obtain unbiased
samples from target distributions. We systematically study central problems in
conditional NFs, such as high variance, mode collapse and data efficiency. We
propose adversarial training for NFs to ameliorate these problems. Experiments
are conducted with low-dimensional synthetic datasets and XY spin models in two
spatial dimensions.
[COMMENTS]
29 pages, submitted to SciPost Physics
[LINK]
http://arxiv.org/abs/2401.15948v2
[DATE]
2024-04-11 21:07:04+08:00
[CATEGORIES]
cs.LG
Deep Learning for Satellite Image Time Series Analysis: A Review
[AUTHORS]
Lynn Miller, Charlotte Pelletier, Geoffrey I. Webb
[ABSTRACT]
Earth observation (EO) satellite missions have been providing detailed images
about the state of the Earth and its land cover for over 50 years. Long-term
missions, such as NASA’s Landsat, Terra, and Aqua satellites, and more
recently, ESA’s Sentinel missions, record images of the entire world every
few days. Although single images provide point-in-time data, repeated images of
the same area, or satellite image time series (SITS), provide information about
the changing state of vegetation and land use. These SITS are useful for
modeling dynamic processes and seasonal changes such as plant phenology. They
have potential benefits for many aspects of land and natural resource
management, including applications in agricultural, forest, water, and disaster
management, urban planning, and mining. However, SITS are complex, incorporating
information from the temporal,
spatial, and spectral dimensions. Therefore, deep learning methods are often
deployed as they can analyze these complex relationships. This review presents
a summary of the state-of-the-art methods of modelling environmental,
agricultural, and other Earth observation variables from SITS data using deep
learning methods. We aim to provide a resource for remote sensing experts
interested in using deep learning techniques to enhance Earth observation
models with temporal information.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2404.03936v2
[DATE]
2024-04-11 21:02:58+08:00
[CATEGORIES]
cs.LG
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
[AUTHORS]
Shiming Chen, Wenjin Hou, Salman Khan, Fahad Shahbaz Khan
[ABSTRACT]
Zero-shot learning (ZSL) recognizes unseen classes by conducting
visual-semantic interactions to transfer semantic knowledge from seen classes
to unseen ones, supported by semantic information (e.g., attributes). However,
existing ZSL methods simply extract visual features using a pre-trained network
backbone (i.e., a CNN or ViT). Lacking the guidance of semantic information,
these backbones fail to learn matched visual-semantic correspondences for
representing semantic-related visual features, resulting in undesirable
visual-semantic interactions. To tackle this issue, we propose a progressive
semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT mainly considers two properties throughout the network: i) explicitly
discovering the semantic-related visual representations, and ii) discarding the
semantic-unrelated
visual information. Specifically, we first introduce semantic-embedded token
learning to improve the visual-semantic correspondences via semantic
enhancement and discover the semantic-related visual tokens explicitly with
semantic-guided token attention. Then, we fuse visual tokens with low
semantic-visual correspondence to discard the semantic-unrelated visual
information for visual enhancement. These two operations are integrated into
various encoders to progressively learn semantic-related visual representations
for accurate visual-semantic interactions in ZSL. Extensive experiments
show that our ZSLViT achieves significant performance gains on three popular
benchmark datasets, i.e., CUB, SUN, and AWA2.
[COMMENTS]
Accepted to CVPR’24
[LINK]
http://arxiv.org/abs/2404.07713v1
[DATE]
2024-04-11 20:59:38+08:00
[CATEGORIES]
cs.LG
Interactive Ontology Matching with Cost-Efficient Learning
[AUTHORS]
Bin Cheng, Jonathan Fürst, Tobias Jacobs, Celia Garrido-Hidalgo
[ABSTRACT]
The creation of high-quality ontologies is crucial for data integration and
knowledge-based reasoning, particularly in the context of the rising data
economy. However, automatic ontology matchers are often bound to the heuristics
they are based on, leaving many matches unidentified. Interactive ontology
matching systems involving human experts have been introduced, but they do not
solve the fundamental issue of flexibly finding additional matches outside the
scope of the implemented heuristics, even though this capability is in high
demand in industrial settings. Active machine learning methods appear to be a promising
path towards a flexible interactive ontology matcher. However, off-the-shelf
active learning mechanisms suffer from low query efficiency due to extreme
class imbalance, resulting in a last-mile problem where high human effort is
required to identify the remaining matches.
To address the last-mile problem, this work introduces DualLoop, an active
learning method tailored to ontology matching. DualLoop offers three main
contributions: (1) an ensemble of tunable heuristic matchers, (2) a short-term
learner with a novel query strategy adapted to highly imbalanced data, and (3)
long-term learners to explore potential matches by creating and tuning new
heuristics. We evaluated DualLoop on three datasets of varying sizes and
domains. Compared to existing active learning methods, we consistently achieved
better F1 scores and recall, reducing the expected query cost spent on finding
90% of all matches by over 50%. Compared to traditional interactive ontology
matchers, we are able to find additional, last-mile matches. Finally, we detail
the successful deployment of our approach within an actual product and report
its operational performance results within the Architecture, Engineering, and
Construction (AEC) industry sector, showcasing its practical value and
efficiency.
[LINK]
http://arxiv.org/abs/2404.07663v1
[DATE]
2024-04-11 19:53:14+08:00
[CATEGORIES]
cs.LG
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
[AUTHORS]
Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low
[ABSTRACT]
Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft
constraints, train with a composite loss function that contains multiple
training point types: different types of collocation points chosen during
training to enforce each PDE and initial/boundary conditions, and experimental
points which are usually costly to obtain via experiments or simulations.
Training PINNs using this loss function is challenging as it typically requires
selecting large numbers of points of different types, each with different
training dynamics. Unlike past works that focused on the selection of either
collocation or experimental points, this work introduces PINN Adaptive
ColLocation and Experimental points selection (PINNACLE), the first algorithm
that jointly optimizes the selection of all training point types, while
automatically adjusting the proportion of collocation point types as training
progresses. PINNACLE uses information on the interaction among training point
types, which had not been considered before, based on an analysis of PINN
training dynamics via the Neural Tangent Kernel (NTK). We theoretically show
that the criterion used by PINNACLE is related to the PINN generalization
error, and empirically demonstrate that PINNACLE is able to outperform existing
point selection methods for forward, inverse, and transfer learning problems.
[COMMENTS]
Accepted to 12th International Conference on Learning Representations
(ICLR 2024), 36 pages
[LINK]
http://arxiv.org/abs/2404.07662v1
[DATE]
2024-04-11 19:51:46+08:00
[CATEGORIES]
cs.LG
Robust performance metrics for imbalanced classification problems
[AUTHORS]
Hajo Holzmann, Bernhard Klar
[ABSTRACT]
We show that established performance metrics in binary classification, such
as the F-score, the Jaccard similarity coefficient or Matthews’ correlation
coefficient (MCC), are not robust to class imbalance in the sense that if the
proportion of the minority class tends to $0$, the true positive rate (TPR) of
the Bayes classifier under these metrics tends to $0$ as well. Thus, in
imbalanced classification problems, these metrics favour classifiers which
ignore the minority class. To alleviate this issue we introduce robust
modifications of the F-score and the MCC for which, even in strongly imbalanced
settings, the TPR is bounded away from $0$. We numerically illustrate the
behaviour of the various performance metrics in simulations as well as on a
credit default data set. We also discuss connections to the ROC and
precision-recall curves and give recommendations on how to combine their usage
with performance metrics.
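For reference, the standard (non-robust) metrics discussed above can be computed from confusion-matrix counts, as in the sketch below. It does not implement the authors' robust modifications of the F-score and MCC, whose exact form this abstract does not give. The zero-denominator conventions (returning 0.0) are an assumption.

```python
import math


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """TPR, F1, Jaccard and MCC from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"tpr": tpr, "f1": f1, "jaccard": jaccard, "mcc": mcc}
```

A classifier that never predicts the minority class has tp = 0, so its TPR is 0. The concern raised above is that the classifiers these metrics favour drift toward this regime as the minority proportion tends to 0.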
[LINK]
http://arxiv.org/abs/2404.07661v1
[DATE]
2024-04-11 19:50:05+08:00
[CATEGORIES]
cs.LG
Efficient Online Unlearning via Hessian-Free Recollection of Individual Data Statistics
[AUTHORS]
Xinbao Qiao, Meng Zhang, Ming Tang, Ermin Wei
[ABSTRACT]
Machine unlearning strives to uphold the data owners’ right to be forgotten
by enabling models to selectively forget specific data. Recent methods suggest
that one approach to data forgetting is to precompute and store statistics
carrying second-order information to improve computational and memory
efficiency. However, they rely on restrictive assumptions, and their computation
and storage costs suffer from the curse of model parameter dimensionality,
making them challenging to apply to most deep neural networks. In this work, we
propose a Hessian-free online unlearning method. We propose to maintain a
statistical vector for each data point, computed through affine stochastic
recursion approximation of the difference between retrained and learned models.
Our proposed algorithm achieves near-instantaneous online unlearning as it only
requires a vector addition operation. By recollecting the stored statistics
for the data to be forgotten, the proposed method significantly reduces the
unlearning runtime. Experimental studies demonstrate that the proposed scheme
surpasses existing results by orders of magnitude in terms of time and memory
costs, while also enhancing accuracy.
[COMMENTS]
25 pages, 8 figures
[LINK]
http://arxiv.org/abs/2404.01712v2
[DATE]
2024-04-11 19:42:34+08:00
[CATEGORIES]
cs.LG
Risk Estimation in a Markov Cost Process: Lower and Upper Bounds
[AUTHORS]
Gugan Thoppe, L. A. Prashanth, Sanjay Bhat
[ABSTRACT]
We tackle the problem of estimating risk measures of the infinite-horizon
discounted cost within a Markov cost process. The risk measures we study
include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR).
First, we show that estimating any of these risk measures with
$\epsilon$-accuracy, either in expected or high-probability sense, requires at
least $\Omega(1/\epsilon^2)$ samples. Then, using a truncation scheme, we
derive an upper bound for the CVaR and variance estimation. This bound matches
our lower bound up to logarithmic factors. Finally, we discuss an extension of
our estimation scheme that covers more general risk measures satisfying a
certain continuity criterion, e.g., spectral risk measures, utility-based
shortfall risk. To the best of our knowledge, our work is the first to provide
lower and upper bounds for estimating any risk measure beyond the mean within a
Markovian setting. Our lower bounds also extend to the infinite-horizon
discounted costs’ mean. Even in that case, our lower bound of
$\Omega(1/\epsilon^2)$ improves upon the existing $\Omega(1/\epsilon)$ bound
[13].
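The abstract does not spell out the estimator, so the sketch below is only a generic illustration under stated assumptions. It assumes a finite Markov cost process given by a hypothetical transition matrix `P` and cost vector, with per-step costs bounded by `c_max`. Each infinite-horizon trajectory is truncated at a horizon T chosen so that the discarded tail, at most γ^T·c_max/(1 − γ), is below eps. Plug-in VaR and CVaR (Rockafellar-Uryasev form) are then computed from the samples. It is not claimed to be the authors' truncation scheme.

```python
import math
import random


def truncation_horizon(gamma: float, c_max: float, eps: float) -> int:
    """Smallest T with gamma**T * c_max / (1 - gamma) <= eps (tail bound)."""
    return max(1, math.ceil(math.log(eps * (1 - gamma) / c_max) / math.log(gamma)))


def sample_discounted_cost(P, cost, start, gamma, horizon, rng):
    """One truncated discounted-cost sample from a finite Markov cost process."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        total += discount * cost[state]
        discount *= gamma
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return total


def var_cvar(samples, alpha):
    """Empirical VaR (alpha-quantile) and CVaR (Rockafellar-Uryasev form)."""
    xs = sorted(samples)
    n = len(xs)
    var = xs[max(0, math.ceil(alpha * n) - 1)]
    cvar = var + sum(max(x - var, 0.0) for x in xs) / ((1 - alpha) * n)
    return var, cvar
```

In use, one would draw many samples with `sample_discounted_cost` and pass them to `var_cvar`. The truncation error is controlled by eps, and the sampling error shrinks with the number of samples, consistent with the Ω(1/ε²) sample requirement stated above.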
[LINK]
http://arxiv.org/abs/2310.11389v2
[DATE]
2024-04-11 18:18:34+08:00
[CATEGORIES]
cs.LG
The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health
[AUTHORS]
M. Jaleed Khan, Ioana Duta, Beth Albert, William Cooke, Manu Vatish, Gabriel Davis Jones
[ABSTRACT]
The rapid advancement of Artificial Intelligence (AI) in healthcare presents
a unique opportunity for advancements in obstetric care, particularly through
the analysis of cardiotocography (CTG) for fetal monitoring. However, the
effectiveness of such technologies depends upon the availability of large,
high-quality datasets that are suitable for machine learning. This paper
introduces the Oxford Maternity (OxMat) dataset, the world’s largest curated
dataset of CTGs, featuring raw time series CTG data and extensive clinical data
for both mothers and babies, which is ideally placed for machine learning. The
OxMat dataset addresses the critical gap in women’s health data by providing
over 177,211 unique CTG recordings from 51,036 pregnancies, carefully curated
and reviewed since 1991. The dataset also comprises over 200 antepartum,
intrapartum and postpartum clinical variables, ensuring near-complete data for
crucial outcomes such as stillbirth and acidaemia. While this dataset also
covers the intrapartum stage, around 94% of the constituent CTGs are
antepartum. This allows for a unique focus on the underserved antepartum
period, in which early detection of at-risk fetuses can significantly improve
health outcomes. Our comprehensive review of existing datasets reveals the
limitations of current datasets: primarily, their lack of sufficient volume,
detailed clinical data and antepartum data. The OxMat dataset lays a foundation
for future AI-driven prenatal care, offering a robust resource for developing
and testing algorithms aimed at improving maternal and fetal health outcomes.
[LINK]
http://arxiv.org/abs/2404.08024v1
[DATE]
2024-04-11 17:52:39+08:00
[CATEGORIES]
cs.LG
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
[AUTHORS]
Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess
[ABSTRACT]
We investigate whether Deep Reinforcement Learning (Deep RL) is able to
synthesize sophisticated and safe movement skills for a low-cost, miniature
humanoid robot that can be composed into complex behavioral strategies in
dynamic environments. We used Deep RL to train a humanoid robot with 20
actuated joints to play a simplified one-versus-one (1v1) soccer game. The
resulting agent exhibits robust and dynamic movement skills such as rapid fall
recovery, walking, turning, kicking and more; and it transitions between them
in a smooth, stable, and efficient manner. The agent’s locomotion and tactical
behavior adapt to specific game contexts in ways that would be impractical to
manually design. The agent also developed a basic strategic understanding of
the game, and learned, for instance, to anticipate ball movements and to block
opponent shots. Our agent was trained in simulation and transferred to real
robots zero-shot. We found that a combination of sufficiently high-frequency
control, targeted dynamics randomization, and perturbations during training in
simulation enabled good-quality transfer. Although the robots are inherently
fragile, basic regularization of the behavior during training led the robots to
learn safe and effective movements while still performing in a dynamic and
agile way – well beyond what is intuitively expected from the robot. Indeed,
in experiments, they walked 181% faster, turned 302% faster, took 63% less time
to get up, and kicked a ball 34% faster than a scripted baseline, while
efficiently combining the skills to achieve the longer-term objectives.
[COMMENTS]
Project website: https://sites.google.com/view/op3-soccer
[LINK]
http://arxiv.org/abs/2304.13653v2
[DATE]
2024-04-11 17:50:07+08:00
[CATEGORIES]
cs.LG
Cell-Free Multi-User MIMO Equalization via In-Context Learning
[AUTHORS]
Matteo Zecchin, Kai Yu, Osvaldo Simeone
[ABSTRACT]
Large pre-trained sequence models, such as transformers, excel as few-shot
learners capable of in-context learning (ICL). In ICL, a model is trained to
adapt its operation to a new task based on limited contextual information,
typically in the form of a few training examples for the given task. Previous
work has explored the use of ICL for channel equalization in single-user
multiple-input multiple-output (MIMO) systems. In this work, we demonstrate
that ICL can also be used to tackle the problem of multi-user equalization in
cell-free MIMO systems with limited fronthaul capacity. In this scenario, a
task is defined by channel statistics, signal-to-noise ratio, and modulation
schemes. The context encompasses the users’ pilot sequences, the corresponding
quantized received signals, and the current received data signal. We propose
and evaluate different prompt design strategies that also encompass
large-scale fading and modulation information. Experiments demonstrate that
ICL-based equalization provides estimates with lower mean squared error than
the linear minimum mean squared error equalizer, especially in the
presence of limited fronthaul capacity and pilot contamination.
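For reference, the linear minimum mean squared error (LMMSE) baseline mentioned above has a standard closed form. The sketch below assumes the textbook setting y = Hx + n with zero-mean, unit-power, uncorrelated symbols, white Gaussian noise of variance σ², and perfect channel knowledge; the cell-free, quantized-fronthaul baseline used in the paper may be configured differently.

```latex
\hat{\mathbf{x}}_{\mathrm{LMMSE}}
  = \left(\mathbf{H}^{\mathsf{H}}\mathbf{H} + \sigma^{2}\mathbf{I}\right)^{-1}\mathbf{H}^{\mathsf{H}}\mathbf{y},
\qquad
\mathbb{E}\left\|\mathbf{x}-\hat{\mathbf{x}}_{\mathrm{LMMSE}}\right\|^{2}
  = \operatorname{tr}\left[\left(\mathbf{I} + \sigma^{-2}\mathbf{H}^{\mathsf{H}}\mathbf{H}\right)^{-1}\right]
```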
[LINK]
http://arxiv.org/abs/2404.05538v2
[DATE]
2024-04-11 17:45:13+08:00
[CATEGORIES]
cs.LG
On adversarial training and the 1 Nearest Neighbor classifier
[AUTHORS]
Amir Hagai, Yair Weiss
[ABSTRACT]
The ability to fool deep learning classifiers with tiny perturbations of the
input has led to the development of adversarial training, in which the loss on
adversarial examples is minimized in addition to the loss on the training
examples. While adversarial training improves the robustness of the learned
classifiers, the procedure is computationally expensive, sensitive to
hyperparameters and may still leave the classifier vulnerable to other types of
small perturbations. In this paper we analyze the adversarial robustness of the
1 Nearest Neighbor (1NN) classifier and compare its performance to adversarial
training. We prove that, under reasonable assumptions, the 1NN classifier will
be robust to any small image perturbation of the training images and will
give high adversarial accuracy on test images as the number of training
examples goes to infinity. In experiments with 45 different binary image
classification problems taken from CIFAR10, we find that 1NN outperforms TRADES
(a powerful adversarial training algorithm) in terms of average adversarial
accuracy. In additional experiments with 69 pretrained robust models for
CIFAR10, we find that 1NN outperforms almost all of them in terms of robustness
to perturbations that are only slightly different from those seen during
training. Taken together, our results suggest that modern adversarial training
methods still fall short of the robustness of the simple 1NN classifier. Our
code can be found at
https://github.com/amirhagai/On-Adversarial-Training-And-The-1-Nearest-Neighbor-Classifier
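A minimal sketch of why a 1NN classifier can be robust to small perturbations: if the nearest training point with the predicted label is at distance d_same and the nearest point with any other label is at distance d_other, then no perturbation of Euclidean norm below (d_other - d_same)/2 can change the prediction, because each distance moves by at most the perturbation norm. For a training image itself, d_same = 0. The function names are hypothetical and this is an illustration, not the authors' code.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Label of the training point closest to `query` in Euclidean distance."""
    best_index = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best_index]


def robustness_radius(train_x, train_y, query):
    """Norm below which no perturbation of `query` can change the 1NN label.

    Moving the query by a vector of norm eps changes every distance by at
    most eps (triangle inequality), so the label is preserved whenever
    d_same + eps < d_other - eps, i.e. eps < (d_other - d_same) / 2.
    """
    label = nearest_neighbor_predict(train_x, train_y, query)
    d_same = min(math.dist(x, query) for x, y in zip(train_x, train_y) if y == label)
    d_other = min(
        (math.dist(x, query) for x, y in zip(train_x, train_y) if y != label),
        default=math.inf,  # only one class present: nothing to flip to
    )
    return (d_other - d_same) / 2
```

For example, with training points (0, 0), (0, 1) labeled 0 and (3, 0), (3, 1) labeled 1, the training point (0, 0) is protected up to radius 1.5, while the off-sample query (1, 0) is protected only up to 0.5.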
[LINK]
http://arxiv.org/abs/2404.06313v2
[DATE]
2024-04-11 17:27:12+08:00
[CATEGORIES]
cs.LG
Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization
[AUTHORS]
Olatunji Mumini Omisore, Toluwanimi Akinyemi, Anh Nguyen, Lei Wang
[ABSTRACT]
Although robot-assisted cardiovascular catheterization is commonly performed
for intervention of cardiovascular diseases, more studies are needed to support
the procedure with automated tool segmentation, which can aid surgeons in tool
tracking and visualization during intervention. Learning-based segmentation has
recently achieved state-of-the-art performance; however, generating
ground-truth signals for fully-supervised methods is labor-intensive and
time-consuming for interventionists. In this study, a weakly-supervised learning
method with multi-lateral pseudo labeling is proposed for tool segmentation in
cardiac angiograms. The method includes a modified U-Net model with one encoder
and multiple lateral-branched decoders that produce pseudo labels as
supervision signals under different perturbations. The pseudo labels are
self-generated through a mixed loss function and shared consistency in the
decoders. We trained the model end-to-end with weakly-annotated data obtained
during robotic cardiac catheterization. Experiments with the proposed model
show that training on weakly annotated data yields performance close to that
obtained with fully annotated data. Compared to three existing weakly-supervised
methods, our approach yielded higher segmentation performance across three
different cardiac angiogram datasets. In an ablation study, we showed consistent
performance under different parameters. Thus, we offer a less expensive method for real-time tool
segmentation and tracking during robot-assisted cardiac catheterization.
[LINK]
http://arxiv.org/abs/2404.07594v1
[DATE]
2024-04-11 17:23:44+08:00
[CATEGORIES]
cs.LG
Diffusion posterior sampling for simulation-based inference in tall data settings
[AUTHORS]
Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues
[ABSTRACT]
Determining which parameters of a non-linear model could best describe a set
of experimental data is a fundamental problem in science and it has gained much
traction lately with the rise of complex large-scale simulators (a.k.a.
black-box simulators). The likelihood of such models is typically intractable,
which is why classical MCMC methods cannot be used. Simulation-based inference
(SBI) stands out in this context by only requiring a dataset of simulations to
train deep generative models capable of approximating the posterior
distribution that relates input parameters to a given observation. In this
work, we consider a tall data extension in which multiple observations are
available and one wishes to leverage their shared information to better infer
the parameters of the model. The method we propose is built upon recent
developments from the flourishing score-based diffusion literature and allows
us to estimate the tall data posterior distribution simply using information
from the score network trained on individual observations. We compare our
method to recently proposed competing approaches on various numerical
experiments and demonstrate its superiority in terms of numerical stability and
computational cost.
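Why a score network trained on single observations can be reused for several observations follows from Bayes' rule, assuming the observations x_1, ..., x_n are conditionally independent given θ. This identity holds for the noise-free posterior; how the method handles the noised scores at intermediate diffusion times is not reproduced here.

```latex
p(\theta \mid x_{1:n}) \propto p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)
  \propto p(\theta)^{1-n} \prod_{i=1}^{n} p(\theta \mid x_i)
\quad\Longrightarrow\quad
\nabla_\theta \log p(\theta \mid x_{1:n})
  = (1-n)\,\nabla_\theta \log p(\theta) + \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i)
```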
[COMMENTS]
38 pages, 20 figures, 3 tables, 11 appendices
[LINK]
http://arxiv.org/abs/2404.07593v1
[DATE]
2024-04-11 17:23:36+08:00
[CATEGORIES]
cs.LG
Generating Comprehensive Lithium Battery Charging Data with Generative AI
[AUTHORS]
Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He
[ABSTRACT]
In optimizing performance and extending the lifespan of lithium batteries,
accurate state prediction is pivotal. Traditional regression and classification
methods have achieved some success in battery state prediction. However, the
efficacy of these data-driven approaches heavily relies on the availability and
quality of public datasets. Additionally, generating electrochemical data
predominantly through battery experiments is a lengthy and costly process,
making it challenging to acquire high-quality electrochemical data. This
difficulty, coupled with data incompleteness, significantly impacts prediction
accuracy. Addressing these challenges, this study introduces the End of Life
(EOL) and Equivalent Cycle Life (ECL) as conditions for generative AI models.
By integrating an embedding layer into the CVAE model, we developed the Refined
Conditional Variational Autoencoder (RCVAE). Through preprocessing data into a
quasi-video format, our study achieves an integrated synthesis of
electrochemical data, including voltage, current, temperature, and charging
capacity, which is then processed by the RCVAE model. Coupled with customized
training and inference algorithms, this model can generate specific
electrochemical data for EOL and ECL under supervised conditions. This method
provides users with a comprehensive electrochemical dataset, pioneering a new
research domain for the artificial synthesis of lithium battery data.
Furthermore, based on the detailed synthetic data, various battery state
indicators can be calculated, offering new perspectives and possibilities for
lithium battery performance prediction.
[LINK]
http://arxiv.org/abs/2404.07577v1
[DATE]
2024-04-11 17:08:45+08:00
[CATEGORIES]
cs.LG
Pathology-genomic fusion via biologically informed cross-modality graph learning for survival analysis
[AUTHORS]
Zeyu Zhang, Yuanshen Zhao, Jingxian Duan, Yaou Liu, Hairong Zheng, Dong Liang, Zhenyu Zhang, Zhi-Cheng Li
[ABSTRACT]
The diagnosis and prognosis of cancer are typically based on multi-modal
clinical data, including histology images and genomic data, due to the complex
pathogenesis and high heterogeneity. Despite the advancements in digital
pathology and high-throughput genome sequencing, establishing effective
multi-modal fusion models for survival prediction and revealing the potential
association between histopathology and transcriptomics remains challenging. In
this paper, we propose Pathology-Genome Heterogeneous Graph (PGHG) that
integrates whole slide images (WSI) and bulk RNA-Seq expression data with
a heterogeneous graph neural network for cancer survival analysis. The PGHG
consists of a biological knowledge-guided representation learning network and a
pathology-genome heterogeneous graph. The representation learning network
uses biological prior knowledge of intra-modal and inter-modal data
associations to guide feature extraction. The node features of each
modality are updated through an attention-based graph learning strategy. Unimodal
features and bi-modal fused features are extracted via an attention pooling module
and then used for survival prediction. We evaluate the model on low-grade
gliomas, glioblastoma, and kidney renal papillary cell carcinoma datasets from
the Cancer Genome Atlas (TCGA) and the First Affiliated Hospital of Zhengzhou
University (FAHZU). Extensive experimental results demonstrate that the
proposed method outperforms both unimodal and other multi-modal fusion models.
To demonstrate the model's interpretability, we also visualize the attention
heatmaps of pathological images and use the integrated gradients algorithm to
identify important tissue structures, biological pathways and key genes.
[LINK]
http://arxiv.org/abs/2404.08023v1
[DATE]
2024-04-11 17:07:40+08:00
[CATEGORIES]
cs.LG
Differentially Private Reinforcement Learning with Self-Play
[AUTHORS]
Dan Qiao, Yu-Xiang Wang
[ABSTRACT]
We study the problem of multi-agent reinforcement learning (multi-agent RL)
with differential privacy (DP) constraints. This is well-motivated by various
real-world applications involving sensitive data, where it is critical to
protect users’ private information. We first extend the definitions of Joint DP
(JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where
both definitions ensure trajectory-wise privacy protection. Then we design a
provably efficient algorithm based on optimistic Nash value iteration and
privatization of Bernstein-type bonuses. The algorithm is able to satisfy JDP
and LDP requirements when instantiated with appropriate privacy mechanisms.
Furthermore, for both notions of DP, our regret bound generalizes the best
known result under the single-agent RL case, while our regret could also reduce
to the best known result for multi-agent RL without privacy constraints. To the
best of our knowledge, these are the first results towards
understanding trajectory-wise privacy protection in multi-agent RL.
[COMMENTS]
32 pages
[LINK]
http://arxiv.org/abs/2404.07559v1
[DATE]
2024-04-11 16:42:51+08:00
[CATEGORIES]
cs.LG
Boosting Digital Safeguards: Blending Cryptography and Steganography
[AUTHORS]
Anamitra Maiti, Subham Laha, Rishav Upadhaya, Soumyajit Biswas, Vikas Chaudhary, Biplab Kar, Nikhil Kumar, Jaydip Sen
[ABSTRACT]
In today’s digital age, the internet is essential for communication and the
sharing of information, creating a critical need for sophisticated data
security measures to prevent unauthorized access and exploitation. Cryptography
encrypts messages into a ciphertext that is incomprehensible to unauthorized
readers, thus safeguarding data during its transmission. Steganography, on the
other hand, originates from the Greek term for “covered writing” and involves
the art of hiding data within another medium, thereby facilitating covert
communication by making the message invisible. The proposed approach takes
advantage of the latest advancements in Artificial Intelligence (AI) and Deep
Learning (DL), especially through the application of Generative Adversarial
Networks (GANs), to improve upon traditional steganographic methods. By
embedding encrypted data within another medium, our method ensures that the
communication remains hidden from prying eyes. The application of GANs enables
a smart, secure system that utilizes the inherent sensitivity of neural
networks to slight alterations in data, enhancing the protection against
detection. By merging the encryption techniques of cryptography with the hiding
capabilities of steganography, and augmenting these with the strengths of AI,
we introduce a comprehensive security system designed to maintain both the
privacy and integrity of information. This system is crafted not just to
prevent unauthorized access or modification of data, but also to keep the
existence of the data hidden. This fusion of technologies tackles the core
challenges of data security in the current era of open digital communication,
presenting an advanced solution with the potential to transform the landscape
of information security.
[COMMENTS]
This report pertains to the Capstone Project done by Group 3 of the
Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The
report consists of 36 pages and includes 11 figures and 5 tables
[LINK]
http://arxiv.org/abs/2404.05985v2
[DATE]
2024-04-11 16:21:27+08:00
[CATEGORIES]
cs.LG
Random Forests for time-fixed and time-dependent predictors: The DynForest R package
[AUTHORS]
Anthony Devaux, Cécile Proust-Lima, Robin Genuer
[ABSTRACT]
The R package DynForest implements random forests for predicting a
continuous, a categorical or a (multiple causes) time-to-event outcome based on
time-fixed and time-dependent predictors. The main originality of DynForest is
that it handles time-dependent predictors that can be endogenous (i.e.,
impacted by the outcome process), measured with error and measured at
subject-specific times. At each recursive step of the tree building process,
the time-dependent predictors are internally summarized into individual
features on which the split can be done. This is achieved using flexible linear
mixed models (via the R package lcmm) whose specification is defined by
the user. DynForest returns the mean for a continuous outcome, the majority-vote
category for a categorical outcome, or the cumulative incidence function over
time for a survival outcome. DynForest also computes
variable importance and minimal depth to inform on the most predictive
variables or groups of variables. This paper aims to guide the user with
step-by-step examples for fitting random forests using DynForest.
[LINK]
http://arxiv.org/abs/2302.02670v2
[DATE]
2024-04-11 16:14:05+08:00
[CATEGORIES]
cs.LG
Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities
[AUTHORS]
Rocco Caprio, Juan Kuntz, Samuel Power, Adam M. Johansen
[ABSTRACT]
We prove non-asymptotic error bounds for particle gradient descent
(PGD) (Kuntz et al., 2023), a recently introduced algorithm for maximum
likelihood estimation of large latent variable models obtained by discretizing
a gradient flow of the free energy. We begin by showing that, for models
satisfying a condition generalizing both the log-Sobolev and the
Polyak–Łojasiewicz inequalities (LSI and PŁI, respectively), the flow
converges exponentially fast to the set of minimizers of the free energy. We
achieve this by extending a result well-known in the optimal transport
literature (that the LSI implies the Talagrand inequality) and its counterpart
in the optimization literature (that the PŁI implies the so-called quadratic
growth condition), and applying it to our new setting. We also generalize the
Bakry–Émery Theorem and show that the LSI/PŁI generalization holds for
models with strongly concave log-likelihoods. For such models, we further
control PGD’s discretization error, obtaining non-asymptotic error bounds.
While we are motivated by the study of PGD, we believe that the inequalities
and results we extend may be of independent interest.
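To make the algorithm concrete, here is a minimal sketch of the PGD update as we understand it from Kuntz et al. (2023): the parameter θ takes a gradient step on the log joint density averaged over a cloud of particles, while each particle takes an unadjusted Langevin step in the latent variable. The toy model (x ~ N(θ, 1), y | x ~ N(x, 1)) and all names are hypothetical choices for illustration. For this model the marginal is y ~ N(θ, 2), so the maximum likelihood estimate is θ = y, which the sketch should approach.

```python
import math
import random


def pgd_toy(y, theta0=0.0, num_particles=200, step=0.1, num_steps=1000, seed=0):
    """Particle gradient descent on a hypothetical toy latent variable model.

    Model: x ~ N(theta, 1), y | x ~ N(x, 1), so
    log p_theta(x, y) = -(x - theta)^2 / 2 - (y - x)^2 / 2 + const.
    """
    rng = random.Random(seed)
    theta = theta0
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for _ in range(num_steps):
        # theta-gradient of log p_theta(x, y), averaged over the particle cloud
        grad_theta = sum(x - theta for x in particles) / num_particles
        # Langevin step per particle: x-gradient is (theta - x) + (y - x)
        new_particles = [
            x + step * ((theta - x) + (y - x)) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
            for x in particles
        ]
        # simultaneous update: both steps use the previous theta and particles
        theta = theta + step * grad_theta
        particles = new_particles
    return theta
```

With `pgd_toy(1.5)` the returned θ should land close to 1.5, up to Monte Carlo noise from the finite particle cloud.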
[LINK]
http://arxiv.org/abs/2403.02004v2
[DATE]
2024-04-11 15:54:55+08:00
[CATEGORIES]
cs.LG
Bayesian Federated Model Compression for Communication and Computation Efficiency
[AUTHORS]
Chengyu Xia, Danny H. K. Tsang, Vincent K. N. Lau
[ABSTRACT]
In this paper, we investigate Bayesian model compression in federated
learning (FL) to construct sparse models that can achieve both communication
and computation efficiencies. We propose a decentralized Turbo variational
Bayesian inference (D-Turbo-VBI) FL framework in which we first propose a
hierarchical sparse prior to promote a clustered sparse structure in the weight
matrix. Then, by carefully integrating message passing and VBI with a
decentralized turbo framework, we propose the D-Turbo-VBI algorithm which can
(i) reduce both upstream and downstream communication overhead during federated
training, and (ii) reduce the computational complexity during local inference.
Additionally, we establish the convergence properties of the proposed
D-Turbo-VBI algorithm. Simulation results show that our proposed algorithm
significantly outperforms the baselines in reducing communication overhead during
federated training and the computational complexity of the final model.
[LINK]
http://arxiv.org/abs/2404.07532v1
[DATE]
2024-04-11 15:51:30+08:00
[CATEGORIES]
cs.LG
S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
[AUTHORS]
Zhen Long, Qiyuan Wang, Yazhou Ren, Yipeng Liu, Ce Zhu
[ABSTRACT]
Anchor-based large-scale multi-view clustering has attracted considerable
attention for its effectiveness in handling massive datasets. However, current
methods mainly seek the consensus embedding feature for clustering by exploring
global correlations between anchor graphs or projection matrices. In this paper,
we propose a simple yet efficient scalable multi-view tensor clustering
(S^2MVTC) approach, where our focus is on learning correlations of embedding
features within and across views. Specifically, we first construct the
embedding feature tensor by stacking the embedding features of different views
into a tensor and rotating it. Additionally, we build a novel tensor
low-frequency approximation (TLFA) operator, which incorporates graph
similarity into embedding feature learning, efficiently achieving smooth
representation of embedding features within different views. Furthermore,
consensus constraints are applied to embedding features to ensure inter-view
semantic consistency. Experimental results on six large-scale multi-view
datasets demonstrate that S^2MVTC significantly outperforms state-of-the-art
algorithms in terms of clustering performance and CPU execution time,
especially when handling massive data. The code of S^2MVTC is publicly
available at https://github.com/longzhen520/S2MVTC.
[COMMENTS]
Accepted by CVPR2024
[LINK]
http://arxiv.org/abs/2403.09107v2
[DATE]
2024-04-11 15:42:43+08:00
[CATEGORIES]
cs.LG
GNN-based Probabilistic Supply and Inventory Predictions in Supply Chain Networks
[AUTHORS]
Hyung-il Ahn, Young Chol Song, Santiago Olivar, Hershel Mehta, Naveen Tewari
[ABSTRACT]
Successful supply chain optimization must mitigate imbalances between supply
and demand over time. While accurate demand prediction is essential for supply
planning, it alone does not suffice. The key to successful supply planning for
optimal and viable execution lies in maximizing predictability for both demand
and supply throughout an execution horizon. Therefore, enhancing the accuracy
of supply predictions is imperative to create an attainable supply plan that
matches demand without overstocking or understocking. However, in complex
supply chain networks with numerous nodes and edges, accurate supply
predictions are challenging due to dynamic node interactions, cascading supply
delays, resource availability, production and logistic capabilities.
Consequently, supply executions often deviate from their initial plans. To
address this, we present the Graph-based Supply Prediction (GSP) probabilistic
model. Our attention-based graph neural network (GNN) model predicts supplies,
inventory, and imbalances using graph-structured historical data, demand
forecasting, and original supply plan inputs. The experiments, conducted using
historical data from a global consumer goods company’s large-scale supply
chain, demonstrate that GSP significantly improves supply and inventory
prediction accuracy, potentially offering supply plan corrections to optimize
executions.
[LINK]
http://arxiv.org/abs/2404.07523v1
[DATE]
2024-04-11 15:36:00+08:00
[CATEGORIES]
cs.LG
Remembering Transformer for Continual Learning
[AUTHORS]
Yuwei Sun, Jun Sakuma, Ryota Kanai
[ABSTRACT]
Neural networks encounter the challenge of Catastrophic Forgetting (CF) in
continual learning, where new task knowledge interferes with previously learned
knowledge. We propose Remembering Transformer, inspired by the brain’s
Complementary Learning Systems (CLS), to tackle this issue. Remembering
Transformer employs a mixture-of-adapters and a generative model-based routing
mechanism to alleviate CF by dynamically routing task data to relevant
adapters. Our approach achieves new state-of-the-art performance on various
vision continual learning tasks with high parameter efficiency.
[LINK]
http://arxiv.org/abs/2404.07518v1
[DATE]
2024-04-11 15:22:14+08:00
[CATEGORIES]
cs.LG
Generative Probabilistic Planning for Optimizing Supply Chain Networks
[AUTHORS]
Hyung-il Ahn, Santiago Olivar, Hershel Mehta, Young Chol Song
[ABSTRACT]
Supply chain networks in enterprises are typically composed of complex
topological graphs involving various types of nodes and edges, accommodating
numerous products with considerable demand and supply variability. However, as
supply chain networks expand in size and complexity, traditional supply chain
planning methods (e.g., those found in heuristic rule-based and operations
research-based systems) tend to become locally optimal or lack computational
scalability, resulting in substantial imbalances between supply and demand
across nodes in the network. This paper introduces a novel Generative AI
technique, which we call Generative Probabilistic Planning (GPP). GPP generates
dynamic supply action plans that are globally optimized across all network
nodes over the time horizon for changing objectives like maximizing profits or
service levels, factoring in time-varying probabilistic demand, lead time, and
production conditions. GPP leverages attention-based graph neural networks
(GNN), offline deep reinforcement learning (Offline RL), and policy simulations
to train generative policy models and create optimal plans through
probabilistic simulations, effectively accounting for various uncertainties.
Our experiments using historical data from a global consumer goods company with
complex supply chain networks demonstrate that GPP accomplishes
objective-adaptable, probabilistically resilient, and dynamic planning for
supply chain networks, leading to significant improvements in performance and
profitability for enterprises. Our work plays a pivotal role in shaping the
trajectory of AI adoption within the supply chain domain.
[LINK]
http://arxiv.org/abs/2404.07511v1
[DATE]
2024-04-11 15:06:58+08:00
[CATEGORIES]
cs.LG
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
[AUTHORS]
Shuren Qi, Yushu Zhang, Chao Wang, Zhihua Xia, Xiaochun Cao, Jian Weng
[ABSTRACT]
Developing robust and interpretable vision systems is a crucial step towards
trustworthy artificial intelligence. In this regard, a promising paradigm
considers embedding task-required invariant structures, e.g., geometric
invariance, in the fundamental image representation. However, such invariant
representations typically exhibit limited discriminability, limiting their
applications in larger-scale trustworthy vision tasks. For this open problem,
we conduct a systematic investigation of hierarchical invariance, exploring
this topic from theoretical, practical, and application perspectives. At the
theoretical level, we show how to construct over-complete invariants with a
Convolutional Neural Network (CNN)-like hierarchical architecture yet in a
fully interpretable manner. The general blueprint, specific definitions,
invariant properties, and numerical implementations are provided. At the
practical level, we discuss how to customize this theoretical framework to a
given task. With the over-completeness, discriminative features w.r.t. the task
can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We
demonstrate the above arguments with accuracy, invariance, and efficiency
results on texture, digit, and parasite classification experiments.
Furthermore, at the application level, our representations are explored in
real-world forensics tasks on adversarial perturbations and Artificial
Intelligence Generated Content (AIGC). Such applications reveal that the
proposed strategy not only realizes the theoretically promised invariance, but
also exhibits competitive discriminability even in the era of deep learning.
For robust and interpretable vision tasks at larger scales, hierarchical
invariant representation can be considered as an effective alternative to
traditional CNNs and invariants.
[LINK]
http://arxiv.org/abs/2402.15430v2
[DATE]
2024-04-11 14:40:12+08:00
[CATEGORIES]
cs.LG
Generating Counterfactual Explanations Using Cardinality Constraints
[AUTHORS]
Rubén Ruiz-Torrubiano
[ABSTRACT]
Providing explanations about how machine learning algorithms work and/or make
particular predictions is one of the main tools that can be used to improve
their trustworthiness, fairness and robustness. Among the most intuitive types of
explanations are counterfactuals, which are examples that differ from a given
point only in the prediction target and some set of features, presenting which
features need to be changed in the original example to flip the prediction for
that example. However, such counterfactuals can differ from the original example
in many features, making their interpretation difficult. In this
paper, we propose to explicitly add a cardinality constraint to counterfactual
generation limiting how many features can be different from the original
example, thus providing more interpretable and easily understandable
counterfactuals.
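To illustrate what the cardinality constraint means, here is a brute-force sketch that searches for a counterfactual changing at most `max_changes` features, preferring fewer changes first and, among those, the smallest L1 change. The classifier, the per-feature candidate values and the function name are hypothetical. Exhaustive enumeration grows combinatorially with the number of features, so this is only for small examples and is not the paper's method.

```python
from itertools import combinations, product


def counterfactual_with_cardinality(x, predict, candidate_values, max_changes):
    """Find a counterfactual for `x` that changes at most `max_changes` features.

    x: list of numeric feature values
    predict: function mapping a feature list to a class label
    candidate_values: per-feature list of values that feature may take
    Returns (counterfactual, changed_indices), or None if none exists.
    """
    original = predict(x)
    # Try 1 changed feature, then 2, ...: the first size that works is minimal.
    for k in range(1, max_changes + 1):
        best, best_cost = None, float("inf")
        for idx in combinations(range(len(x)), k):
            # Exclude the current value so every chosen feature really changes.
            options = [[v for v in candidate_values[i] if v != x[i]] for i in idx]
            for values in product(*options):
                cf = list(x)
                for i, v in zip(idx, values):
                    cf[i] = v
                if predict(cf) != original:
                    cost = sum(abs(cf[i] - x[i]) for i in idx)  # L1 change
                    if cost < best_cost:
                        best, best_cost = (cf, idx), cost
        if best is not None:
            return best
    return None
```

For a linear rule such as `predict = lambda f: int(2*f[0] + f[1] - f[2] > 3)` and `x = [0, 1, 0]`, a budget of one change yields `([2, 1, 0], (0,))`: only raising the first feature to 2 flips the label.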
[LINK]
http://arxiv.org/abs/2404.07502v1
[DATE]
2024-04-11 14:33:19+08:00
[CATEGORIES]
cs.LG
VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning
[AUTHORS]
Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao
[ABSTRACT]
Trajectory similarity search plays an essential role in autonomous driving,
as it enables vehicles to analyze the information and characteristics of
different trajectories to make informed decisions and navigate safely in
dynamic environments. Existing work on the trajectory similarity search task
primarily utilizes sequence-processing algorithms or Recurrent Neural Networks
(RNNs), which suffer from the inevitable issues of complicated architecture and
heavy training costs. Considering the intricate connections between
trajectories, using Graph Neural Networks (GNNs) for data modeling is feasible.
However, most methods directly use existing mathematical graph structures as
the input instead of constructing specific graphs from certain vehicle
trajectory data. This ignores such data’s unique and dynamic characteristics.
To bridge such a research gap, we propose VeTraSS – an end-to-end pipeline for
Vehicle Trajectory Similarity Search. Specifically, VeTraSS models the original
trajectory data into multi-scale graphs, and generates comprehensive embeddings
through a novel multi-layer attention-based GNN. The learned embeddings can be
used for searching similar vehicle trajectories. Extensive experiments on the
Porto and Geolife datasets demonstrate the effectiveness of VeTraSS: our
model outperforms existing work and achieves state-of-the-art performance. This
demonstrates the potential of VeTraSS for trajectory analysis and safe
navigation in self-driving vehicles in the real world.
[LINK]
http://arxiv.org/abs/2404.08021v1
[DATE]
2024-04-11 14:19:55+08:00
[CATEGORIES]
cs.LG
Model predictive control-based value estimation for efficient reinforcement learning
[AUTHORS]
Qizhen Wu, Kexin Liu, Lei Chen
[ABSTRACT]
In practice, reinforcement learning is limited primarily by the number of
interactions with virtual environments it requires. This is challenging
because, for many learning methods, a locally optimal strategy is unlikely to be
obtained with only a few attempts. Here, we
design an improved reinforcement learning method based on model predictive
control that models the environment through a data-driven approach. Based on
the learned environment model, it performs multi-step prediction to estimate
the value function and optimize the policy. The method demonstrates higher
learning efficiency, faster convergence of the policy toward a locally
optimal value, and a smaller required capacity for the experience replay
buffer. Experimental results, both in classic databases and in a dynamic
obstacle avoidance scenario for an unmanned aerial vehicle, validate the
proposed approaches.
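The abstract does not spell out its estimator, so the following is only a generic sketch of multi-step (H-step) model-based value estimation of the kind described: roll a learned dynamics model forward under the current policy, sum the discounted predicted rewards, and bootstrap with a value function at the horizon. All function names and the toy model are hypothetical.

```python
def model_based_value(state, policy, model, reward_fn, value_fn, horizon, gamma=0.99):
    """H-step value estimate from a (learned) dynamics model.

    V(s_0) ~= sum_{k=0}^{H-1} gamma^k r(s_k, a_k) + gamma^H V(s_H),
    where a_k = policy(s_k) and s_{k+1} = model(s_k, a_k).
    """
    total = 0.0
    discount = 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)
        total += discount * reward_fn(s, a)
        s = model(s, a)  # predicted next state from the learned model
        discount *= gamma
    # bootstrap the tail of the return with the current value estimate
    return total + discount * value_fn(s)
```

With a toy 1-D model `s + a`, a constant action of 1, a reward of 1 per step, `horizon=3` and `gamma=0.5`, the rewards contribute 1 + 0.5 + 0.25 = 1.75 before bootstrapping.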
[LINK]
http://arxiv.org/abs/2310.16646v2
[DATE]
2024-04-11 14:08:45+08:00
[CATEGORIES]
cs.LG
Characterizing the Influence of Topology on Graph Learning Tasks
[AUTHORS]
Kailong Wu, Yule Xie, Jiaxin Ding, Yuxiang Ren, Luoyi Fu, Xinbing Wang, Chenghu Zhou
[ABSTRACT]
Graph neural networks (GNNs) have achieved remarkable success in a wide range
of tasks by encoding features combined with topology to create effective
representations. However, how graph topology influences the performance of
learning models on downstream tasks is still not well understood. In this
paper, we propose a
metric, TopoInf, which characterizes the influence of graph topology by
measuring the level of compatibility between the topological information of
graph data and downstream task objectives. We provide analysis based on the
decoupled GNNs on the contextual stochastic block model to demonstrate the
effectiveness of the metric. Through extensive experiments, we demonstrate that
TopoInf is an effective metric for measuring topological influence on
corresponding tasks and can be further leveraged to enhance graph learning.
[LINK]
http://arxiv.org/abs/2404.07493v1
[DATE]
2024-04-11 14:04:06+08:00
[CATEGORIES]
cs.LG
Deep Reinforcement Learning for Traveling Purchaser Problems
[AUTHORS]
Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Yuli Zhang
[ABSTRACT]
The traveling purchaser problem (TPP) is an important combinatorial
optimization problem with broad applications. Due to the coupling between
routing and purchasing, existing works on TPPs commonly address route
construction and purchase planning simultaneously, which, however, leads to
exact methods with high computational cost and heuristics with sophisticated
design but limited performance. In sharp contrast, we propose a novel approach
based on deep reinforcement learning (DRL), which addresses route construction
and purchase planning separately, while evaluating and optimizing the solution
from a global perspective. The key components of our approach include a
bipartite graph representation for TPPs to capture the market-product
relations, and a policy network that extracts information from the bipartite
graph and uses it to sequentially construct the route. One significant benefit
of our framework is that we can efficiently construct the route using the
policy network, and once the route is determined, the associated purchasing
plan can be easily derived through linear programming, while, leveraging DRL,
we can train the policy network to optimize the global solution objective.
Furthermore, by introducing a meta-learning strategy, the policy network can be
trained stably on large-sized TPP instances, and generalize well across
instances of varying sizes and distributions, even to much larger instances
that are never seen during training. Experiments on various synthetic TPP
instances and the TPPLIB benchmark demonstrate that our DRL-based approach can
significantly outperform well-established TPP heuristics, reducing the
optimality gap by 40%-90%, and also showing an advantage in runtime, especially
on large-sized instances.
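Once a route is fixed, the purchasing sub-problem becomes much easier, which is why it can be delegated to linear programming. The sketch below rests on an illustrative assumption, not the authors' formulation: each market sells each product at a fixed unit price up to a supply cap, and products are bought independently. Under that assumption the LP separates by product, and buying from the cheapest visited market first is optimal for each product. The names `purchase_plan`, `plan_cost`, `prices`, `supply`, and `demand` are hypothetical.

```python
from typing import Dict, List, Optional

Plan = Dict[str, Dict[str, float]]


def purchase_plan(
    route: List[str],
    prices: Dict[str, Dict[str, float]],  # prices[market][product], unit price
    supply: Dict[str, Dict[str, float]],  # supply[market][product], available quantity
    demand: Dict[str, float],             # demand[product], required quantity
) -> Optional[Plan]:
    """Cheapest purchasing plan restricted to the markets on `route`,
    or None if the route cannot satisfy the demand."""
    plan: Plan = {m: {} for m in route}
    for product, need in demand.items():
        # Markets on the route that sell this product, cheapest first.
        offers = sorted(
            (prices[m][product], m) for m in route if product in prices.get(m, {})
        )
        for _, market in offers:
            if need <= 0:
                break
            qty = min(need, supply[market][product])
            if qty > 0:
                plan[market][product] = qty
                need -= qty
        if need > 0:
            return None  # infeasible: demand exceeds supply along this route
    return plan


def plan_cost(plan: Plan, prices: Dict[str, Dict[str, float]]) -> float:
    """Total purchase cost of a plan (travel cost is handled elsewhere)."""
    return sum(prices[m][p] * q for m, items in plan.items() for p, q in items.items())
```

With supply caps shared across products (or other coupling constraints) the problem no longer separates, and a general LP solver would be needed instead of this greedy rule.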
[LINK]
http://arxiv.org/abs/2404.02476v2
[DATE]
2024-04-11 14:00:27+08:00
[CATEGORIES]
cs.LG
Robust Knowledge Adaptation for Dynamic Graph Neural Networks
[AUTHORS]
Hanjie Li, Changsheng Li, Kaituo Feng, Ye Yuan, Guoren Wang, Hongyuan Zha
[ABSTRACT]
Graph-structured data are often dynamic in nature. Recent years have witnessed
increasing attention to dynamic graph neural networks for modelling graph data.
However, almost all existing approaches
operate under the assumption that, upon the establishment of a new link, the
embeddings of the neighboring nodes should undergo updates to learn temporal
dynamics. Nevertheless, these approaches face the following limitation: If the
node introduced by a new connection contains noisy information, propagating its
knowledge to other nodes becomes unreliable and may even lead to the collapse
of the model. In this paper, we propose Ada-DyGNN: a robust knowledge
Adaptation framework via reinforcement learning for Dynamic Graph Neural
Networks. In contrast to previous approaches, which update the embeddings of
the neighbor nodes immediately after adding a new link, Ada-DyGNN adaptively
determines which nodes should be updated. Considering that the decision to
update the embedding of one neighbor node can significantly impact other
neighbor nodes, we conceptualize the node update selection as a sequential
decision problem and employ reinforcement learning to address it effectively.
By this means, we can adaptively propagate knowledge to other nodes for
learning robust node embedding representations. To the best of our knowledge,
our approach constitutes the first attempt to explore robust knowledge
adaptation via reinforcement learning specifically tailored for dynamic graph
neural networks. Extensive experiments on three benchmark datasets demonstrate
that Ada-DyGNN achieves state-of-the-art performance. In addition, we
conduct experiments by introducing different degrees of noise into the dataset,
quantitatively and qualitatively illustrating the robustness of Ada-DyGNN.
[COMMENTS]
14 pages, 6 figures
[LINK]
http://arxiv.org/abs/2207.10839v2
[DATE]
2024-04-11 13:46:09+08:00
[CATEGORIES]
cs.LG
Predictive Modelling of Air Quality Index (AQI) Across Diverse Cities and States of India using Machine Learning: Investigating the Influence of Punjab’s Stubble Burning on AQI Variability
[AUTHORS]
Kamaljeet Kaur Sidhu, Habeeb Balogun, Kazeem Oluwakemi Oseni
[ABSTRACT]
Air pollution is a common and serious problem that cannot be ignored because of
its harmful impacts on human health. To address this issue proactively, people
should be aware of their surroundings, that is, the environment in which they
live. With this motive, this research predicts the AQI from the concentrations
of different air pollutants in the atmosphere. The dataset used for this
research was taken from the official website of CPCB and contains air pollutant
concentrations from 22 different monitoring stations in different cities of
Delhi, Haryana, and Punjab. The data were checked for null values and outliers;
most importantly, such values must be correctly understood and imputed rather
than ignored or imputed incorrectly. The time series data used in this research
were tested for stationarity using the Dickey-Fuller test. Further, different
ML models (CatBoost, XGBoost, Random Forest, and an SVM regressor), the time
series model SARIMAX, and the deep learning model LSTM were used to predict
AQI. Model performance was evaluated with MSE, RMSE, MAE, and R2. Random Forest
was observed to perform better than the other models.
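The four evaluation metrics named above have standard definitions; the plain-Python sketch below computes them for a pair of equal-length sequences. The function name and the sample AQI values in the usage check are hypothetical, and R2 is left undefined (NaN) when the true values are constant.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE, MAE and R2 (coefficient of determination) for point forecasts."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```

For example, true AQI values `[100, 150, 200]` against predictions `[110, 140, 200]` give MSE = 200/3, MAE = 20/3 and R2 = 1 - 200/5000 = 0.96.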
[LINK]
http://arxiv.org/abs/2404.08702v1
[DATE]
2024-04-11 13:03:40+08:00
[CATEGORIES]
cs.LG
The Optimal Choice of Hypothesis Is the Weakest, Not the Shortest
[AUTHORS]
Michael Timothy Bennett
[COMMENTS]
Published at the 16th Conference on Artificial General Intelligence,
Stockholm, 2023
[LINK]
http://arxiv.org/abs/2301.12987v4
[DATE]
2024-04-11 13:02:10+08:00
[CATEGORIES]
cs.LG
LLaGA: Large Language and Graph Assistant
[AUTHORS]
Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang
[ABSTRACT]
Graph Neural Networks (GNNs) have driven advances in graph-structured
data analysis. Recently, the rise of Large Language Models (LLMs) like GPT-4
has heralded a new era in deep learning. However, their application to graph
data poses distinct challenges due to the inherent difficulty of translating
graph structures to language. To this end, we introduce the Large Language and
Graph Assistant (LLaGA), an innovative model that effectively integrates LLM
capabilities to handle the complexities of graph-structured data. LLaGA retains
the general-purpose nature of LLMs while adapting graph data into a format
compatible with LLM input. LLaGA achieves this by reorganizing graph nodes to
structure-aware sequences and then mapping these into the token embedding space
through a versatile projector. LLaGA excels in versatility, generalizability
and interpretability, allowing it to perform consistently well across different
datasets and tasks, extend its ability to unseen datasets or tasks, and provide
explanations for graphs. Our extensive experiments across popular graph
benchmarks show that LLaGA delivers outstanding performance across four
datasets and three tasks using a single model, surpassing state-of-the-art
graph models in both supervised and zero-shot scenarios. Our code is available
at https://github.com/VITA-Group/LLaGA.
[LINK]
http://arxiv.org/abs/2402.08170v3
[DATE]
2024-04-11 13:01:12+08:00
[CATEGORIES]
cs.LG
LUCF-Net: Lightweight U-shaped Cascade Fusion Network for Medical Image Segmentation
[AUTHORS]
Songkai Sun, Qingshan She, Yuliang Ma, Rihui Li, Yingchun Zhang
[ABSTRACT]
In this study, the performance of existing U-shaped neural network
architectures for medical image segmentation was enhanced by adding
Transformers. Although Transformer architectures are powerful at extracting
global information, their ability to capture local information is limited due
to their high complexity. To address this challenge, we proposed a new lightweight
U-shaped cascade fusion network (LUCF-Net) for medical image segmentation. It
utilized an asymmetrical structural design and incorporated both local and
global modules to enhance its capacity for local and global modeling.
Additionally, a multi-layer cascade fusion decoding network was designed to
further bolster the network’s information fusion capabilities. Validation
results achieved on multi-organ datasets in CT format, cardiac segmentation
datasets in MRI format, and dermatology datasets in image format demonstrated
that the proposed model outperformed other state-of-the-art methods in handling
local-global information, achieving an improvement of 1.54% in Dice coefficient
and 2.6 mm in Hausdorff distance on multi-organ segmentation. Furthermore, as a
network that combines Convolutional Neural Network and Transformer
architectures, it achieves competitive segmentation performance with only 6.93
million parameters and 6.6 GFLOPs (billions of floating-point operations),
without the need for pre-training. In summary, the proposed method demonstrated enhanced
performance while retaining a simpler model design compared to other
Transformer-based segmentation networks.
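The Dice coefficient reported above measures overlap between a predicted mask and the ground truth, 2|P ∩ G| / (|P| + |G|). A minimal sketch on flattened binary masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a common convention assumed here, not one stated in the abstract.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A per-organ score on a multi-organ dataset would apply this to each label's mask separately and then average.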
[LINK]
http://arxiv.org/abs/2404.07473v1
[DATE]
2024-04-11 12:54:42+08:00
[CATEGORIES]
cs.LG
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
[AUTHORS]
Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, Rajesh Jayaram, Silvio Lattanzi
[ABSTRACT]
Multi-dimensional Scaling (MDS) is a family of methods for embedding an
$n$-point metric into low-dimensional Euclidean space. We study the
Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities
$\{d_{i,j}\}_{i,j \in [n]}$ over $n$ points, the goal is to find an embedding
$\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes
$$\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right]$$
Kamada-Kawai provides a more relaxed measure of the quality of a
low-dimensional metric embedding than the traditional bi-Lipschitz-ness measure
studied in theoretical computer science; this is advantageous because strong
hardness-of-approximation results are known for the latter, whereas Kamada-Kawai
nontrivial approximation algorithms. Despite its popularity, our theoretical
understanding of MDS is limited. Recently, Demaine, Hesterberg, Koehler, Lynch,
and Urschel (arXiv:2109.11505) gave the first approximation algorithm with
provable guarantees for Kamada-Kawai in the constant-$k$ regime, with cost
$\text{OPT} +\epsilon$ in $n^2 2^{\text{poly}(\Delta/\epsilon)}$ time, where
$\Delta$ is the aspect ratio of the input. In this work, we give the first
approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$:
we achieve a solution with cost $\tilde{O}(\log \Delta)\,\text{OPT}^{\Omega(1)}+\epsilon$
in time $n^{O(1)}2^{\text{poly}(\log(\Delta)/\epsilon)}$.
Our approach is based on a novel analysis of a conditioning-based rounding
scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the
geometry of low-dimensional Euclidean space, allowing us to avoid an
exponential dependence on the aspect ratio. We believe our geometry-aware
treatment of the Sherali-Adams Hierarchy is an important step towards
developing general-purpose techniques for efficient metric optimization
algorithms.
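As a concrete reading of the Kamada-Kawai objective, the sketch below evaluates the cost of one fixed embedding in plain Python. It assumes the expectation is taken uniformly over distinct pairs $i \ne j$ (pairs with $i = j$ would divide by $d_{i,i} = 0$) and that a positive dissimilarity is supplied for each unordered pair; the function name is hypothetical. It only scores a candidate embedding and does not implement the paper's Sherali-Adams rounding scheme.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple


def kamada_kawai_cost(
    points: Sequence[Sequence[float]],       # x_1, ..., x_n in R^k
    dissim: Dict[Tuple[int, int], float],    # d_{i,j} for i < j, assumed positive
) -> float:
    """Average of (1 - ||x_i - x_j|| / d_{i,j})^2 over all distinct pairs."""
    pairs = list(itertools.combinations(range(len(points)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    total = 0.0
    for i, j in pairs:
        dist = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
        total += (1.0 - dist / dissim[(i, j)]) ** 2
    return total / len(pairs)
```

Averaging over unordered pairs gives the same value as averaging over ordered pairs $i \ne j$, because each term is symmetric in $i$ and $j$.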
[COMMENTS]
Extended exposition
[LINK]
http://arxiv.org/abs/2311.17840v2
[DATE]
2024-04-11 12:23:42+08:00
[CATEGORIES]
cs.LG
GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis
[AUTHORS]
Dmitry Petrov, Pradyumn Goyal, Vikas Thamizharasan, Vladimir G. Kim, Matheus Gadelha, Melinos Averkiou, Siddhartha Chaudhuri, Evangelos Kalogerakis
[ABSTRACT]
We introduce GEM3D – a new deep, topology-aware generative model of 3D
shapes. The key ingredient of our method is a neural skeleton-based
representation encoding information on both shape topology and geometry.
Through a denoising diffusion probabilistic model, our method first generates
skeleton-based representations following the Medial Axis Transform (MAT), then
generates surfaces through a skeleton-driven neural implicit formulation. The
neural implicit takes into account the topological and geometric information
stored in the generated skeleton representations to yield surfaces that are
more topologically and geometrically accurate compared to previous neural field
formulations. We discuss applications of our method in shape synthesis and
point cloud reconstruction tasks, and evaluate our method both qualitatively
and quantitatively. We demonstrate significantly more faithful surface
reconstruction and diverse shape generation results compared to the
state-of-the-art, also involving challenging scenarios of reconstructing and
synthesizing structurally complex, high-genus shape surfaces from Thingi10K and
ShapeNet.
[COMMENTS]
Webpage: https://lodurality.github.io/GEM3D/ – Cond. accept. to
SIGGRAPH 2024 (conf. track) – Changes (based on reviews): changed style to
sigconf; rearranged figures for readability; added missing citations; fixed
misaligned centers in Fig. 3; added failure cases (Fig. 10); rewrote
discussion; added categories averages to Tab. 8; added Tab. 10 with model
capacities
[LINK]
http://arxiv.org/abs/2402.16994v2
[DATE]
2024-04-11 11:44:49+08:00
[CATEGORIES]
cs.LG
Representation Learning of Tangled Key-Value Sequence Data for Early Classification
[AUTHORS]
Tao Duan, Junzhou Zhao, Shuo Zhang, Jing Tao, Pinghui Wang
[ABSTRACT]
Key-value sequence data has become ubiquitous and naturally appears in a
variety of real-world applications, ranging from the user-product purchasing
sequences in e-commerce, to network packet sequences forwarded by routers in
networking. Classifying these key-value sequences is important in many
scenarios such as user profiling and malicious application identification. In
many time-sensitive scenarios, besides the requirement of classifying a
key-value sequence accurately, it is also desired to classify a key-value
sequence early, in order to respond fast. However, these two goals are
conflicting in nature, and it is challenging to achieve them simultaneously. In
this work, we formulate a novel tangled key-value sequence early classification
problem, where a tangled key-value sequence is a mixture of several concurrent
key-value sequences with different keys. The goal is to classify each
individual key-value sequence sharing the same key both accurately and early. To
address this problem, we propose a novel method, Key-Value sequence Early
Co-classification (KVEC), which leverages both inner- and inter-correlations of
items in a tangled key-value sequence through key correlation and value
correlation to learn a better sequence representation. Meanwhile, a time-aware
halting policy decides when to stop the ongoing key-value sequence and classify
it based on current sequence representation. Experiments on both real-world and
synthetic datasets demonstrate that our method outperforms the state-of-the-art
baselines significantly. KVEC improves the prediction accuracy by up to $4.7 -
17.5\%$ under the same prediction earliness condition, and improves the
harmonic mean of accuracy and earliness by up to $3.7 - 14.0\%$.
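The abstract reports a harmonic mean of accuracy and earliness but does not define earliness. The sketch below assumes a convention used in some early time-series classification work: earliness is the fraction of the sequence observed before the halting policy classifies (lower means earlier), and the harmonic mean pairs accuracy with 1 - earliness so that higher is better for both terms. The function name and this definition are assumptions, not KVEC's stated metric.

```python
def harmonic_mean_acc_earliness(accuracy: float, earliness: float) -> float:
    """Harmonic mean of accuracy and (1 - earliness); both inputs in [0, 1].

    Assumption: earliness = fraction of the sequence observed before halting,
    so 1 - earliness rewards classifying sooner.
    """
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= earliness <= 1.0):
        raise ValueError("accuracy and earliness must lie in [0, 1]")
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0.0:
        return 0.0
    return 2.0 * accuracy * timeliness / (accuracy + timeliness)
```

Like any harmonic mean, this score is dominated by the weaker of the two terms, so a classifier cannot score well by being very accurate but very late, or very early but inaccurate.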
[COMMENTS]
12 pages, 31 figures, Accepted by ICDE2024
[LINK]
http://arxiv.org/abs/2404.07454v1
[DATE]
2024-04-11 11:23:15+08:00
[CATEGORIES]
cs.LG
Graph Attention Network for Lane-Wise and Topology-Invariant Intersection Traffic Simulation
[AUTHORS]
Nooshin Yousefzadeh, Rahul Sengupta, Yashaswi Karnati, Anand Rangarajan, Sanjay Ranka
[ABSTRACT]
Traffic congestion has significant economic, environmental, and social
ramifications. Intersection traffic flow dynamics are influenced by numerous
factors. While microscopic traffic simulators are valuable tools, they are
computationally intensive and challenging to calibrate. Moreover, existing
machine-learning approaches struggle to provide lane-specific waveforms or
adapt to intersection topology and traffic patterns. In this study, we propose
two efficient and accurate “Digital Twin” models for intersections, leveraging
Graph Attention Neural Networks (GAT). These attentional graph auto-encoder
digital twins capture temporal, spatial, and contextual aspects of traffic
within intersections, incorporating various influential factors such as
high-resolution loop detector waveforms, signal state records, driving
behaviors, and turning-movement counts. Trained on diverse counterfactual
scenarios across multiple intersections, our models generalize well, enabling
the estimation of detailed traffic waveforms for any intersection approach and
exit lanes. Multi-scale error metrics demonstrate that our models perform
comparably to microsimulations. The primary application of our study lies in
traffic signal optimization, a pivotal area in transportation systems research.
These lightweight digital twins can seamlessly integrate into corridor and
network signal timing optimization frameworks. Furthermore, our study’s
applications extend to lane reconfiguration, driving behavior analysis, and
facilitating informed decisions regarding intersection safety and efficiency
enhancements. A promising avenue for future research involves extending this
approach to urban freeway corridors and integrating it with measures of
effectiveness metrics.
[COMMENTS]
T-TIS Journal, 12 pages, 8 figures, 4 tables
[LINK]
http://arxiv.org/abs/2404.07446v1
[DATE]
2024-04-11 11:02:06+08:00
[CATEGORIES]
cs.LG
Elementary Analysis of Policy Gradient Methods
[AUTHORS]
Jiacai Liu, Wenye Li, Ke Wei
[ABSTRACT]
Projected policy gradient under the simplex parameterization, policy gradient
and natural policy gradient under the softmax parameterization, are fundamental
algorithms in reinforcement learning. There has been a flurry of recent
activity studying these algorithms from a theoretical perspective. Despite
this, their convergence behavior is still not fully understood, even given
access to exact policy evaluations. In this paper, we focus on the discounted
MDP setting and conduct a systematic study of the aforementioned policy
optimization methods. Several novel results are presented, including 1) global
linear convergence of projected policy gradient for any constant step size, 2)
sublinear convergence of softmax policy gradient for any constant step size, 3)
global linear convergence of softmax natural policy gradient for any constant
step size, 4) global linear convergence of entropy regularized softmax policy
gradient for a wider range of constant step sizes than existing results, 5)
tight local linear convergence rate of entropy regularized natural policy
gradient, and 6) a new and concise local quadratic convergence rate of soft
policy iteration without the assumption on the stationary distribution under
the optimal policy. New and elementary analysis techniques have been developed
to establish these results.
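Projected policy gradient under the simplex parameterization keeps, for each state, a probability vector over actions and projects each gradient step back onto the probability simplex. The sketch below shows only that projection (the standard sort-based Euclidean projection) and a single per-state update; the gradient vector is supplied by the caller, since computing exact policy gradients requires the MDP itself. Function names are hypothetical, and this is not the paper's analysis.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {x : x_i >= 0, sum_i x_i = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the largest k satisfying this condition fixes the shift
    return [max(x - theta, 0.0) for x in v]


def projected_policy_gradient_step(
    policy_row: Sequence[float],  # pi(.|s) for one state s
    grad_row: Sequence[float],    # hypothetical gradient of V w.r.t. pi(.|s)
    step_size: float,
) -> List[float]:
    """One projected gradient ascent step for a single state's action distribution."""
    return project_to_simplex([p + step_size * g for p, g in zip(policy_row, grad_row)])
```

For instance, starting from the uniform policy `[0.5, 0.5]` with gradient `[1.0, 0.0]` and step size 1, the unprojected point `[1.5, 0.5]` is projected to the deterministic policy `[1.0, 0.0]`, which illustrates why large constant step sizes can move projected policy gradient to a vertex of the simplex in one step.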
[LINK]
http://arxiv.org/abs/2404.03372v2
[DATE]
2024-04-11 10:59:07+08:00
[CATEGORIES]
cs.LG
Tensor Decomposition Based Attention Module for Spiking Neural Networks
[AUTHORS]
Haoyu Deng, Ruijie Zhu, Xuerui Qiu, Yule Duan, Malu Zhang, Liangjian Deng
[ABSTRACT]
The attention mechanism has been proven to be an effective way to improve
spiking neural networks (SNNs). However, although current SNN input data flows
are split into tensors for processing on GPUs, none of the previous works
consider the properties of tensors when implementing an attention module. This
inspires us to rethink current SNNs from the perspective of tensor-relevant
theories. Using tensor decomposition, we design the projected full attention
(PFA) module, which demonstrates excellent results with linearly growing
parameters. Specifically, PFA is composed of the linear projection of spike
tensor (LPST) module and the attention map composing
(AMC) module. In LPST, we start by compressing the original spike tensor into
three projected tensors using a single property-preserving strategy with
learnable parameters for each dimension. Then, in AMC, we exploit the inverse
procedure of the tensor decomposition process to combine the three tensors into
the attention map using a so-called connecting factor. To validate the
effectiveness of the proposed PFA module, we integrate it into the widely used
VGG and ResNet architectures for classification tasks. Our method achieves
state-of-the-art performance on both static and dynamic benchmark datasets,
surpassing the existing SNN models with Transformer-based and CNN-based
backbones.
[COMMENTS]
Accepted by Knowledge-Based Systems
[LINK]
http://arxiv.org/abs/2310.14576v2
[DATE]
2024-04-11 10:57:21+08:00
[CATEGORIES]
cs.LG
1-bit Quantized On-chip Hybrid Diffraction Neural Network Enabled by Authentic All-optical Fully-connected Architecture
[AUTHORS]
Yu Shao, Haiqi Gao, Yipeng Chen, Yujie liu, Junren Wen, Haidong He, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang
[ABSTRACT]
Optical Diffraction Neural Networks (DNNs), a subset of Optical Neural
Networks (ONNs), show promise in mirroring the prowess of electronic networks.
This study introduces the Hybrid Diffraction Neural Network (HDNN), a novel
architecture that incorporates matrix multiplication into DNNs, synergizing the
benefits of conventional ONNs with those of DNNs to surmount the modulation
limitations inherent in optical diffraction neural networks. Utilizing a
single phase modulation layer and an amplitude modulation layer, the trained
neural network demonstrated remarkable accuracies of 96.39% and 89% in digit
recognition tasks in simulation and experiment, respectively. Additionally, we
develop the Binning Design (BD) method, which effectively mitigates the
constraints imposed by sampling intervals on diffraction units, substantially
streamlining experimental procedures. Furthermore, we propose an on-chip HDNN
that not only employs a beam-splitting phase modulation layer for enhanced
integration level but also significantly relaxes device fabrication
requirements, replacing metasurfaces with relief surfaces designed by 1-bit
quantization. In addition, we conceptualized an all-optical HDNN-assisted lesion
detection network, achieving detection outcomes that were 100% aligned with
simulation predictions. This work not only advances the performance of DNNs but
also streamlines the path towards industrial optical neural network production.
[LINK]
http://arxiv.org/abs/2404.07443v1
[DATE]
2024-04-11 10:54:17+08:00
[CATEGORIES]
cs.LG
The Sample Complexity of Gradient Descent in Stochastic Convex Optimization
[AUTHORS]
Roi Livni
[ABSTRACT]
We analyze the sample complexity of full-batch Gradient Descent (GD) in the
setup of non-smooth Stochastic Convex Optimization. We show that the
generalization error of GD, with a common choice of hyper-parameters, can be
$\tilde \Theta(d/m + 1/\sqrt{m})$, where $d$ is the dimension and $m$ is the
sample size. This matches the sample complexity of \emph{worst-case} empirical
risk minimizers. That means that, in contrast with other algorithms, GD has no
advantage over naive ERMs. Our bound follows from a new generalization bound
that depends on both the dimension as well as the learning rate and number of
iterations. Our bound also shows that, for general hyper-parameters, when the
dimension is strictly larger than the number of samples, $T=\Omega(1/\epsilon^4)$
iterations are necessary to avoid overfitting. This resolves an open problem by
Schlisserman et al. '23 and Amir et al. '21, and improves over previous lower
bounds that demonstrated that the sample size must be at least the square root of
the dimension.
[LINK]
http://arxiv.org/abs/2404.04931v2
[DATE]
2024-04-11 10:32:43+08:00
[CATEGORIES]
cs.LG
Data-Driven Portfolio Management for Motion Pictures Industry: A New Data-Driven Optimization Methodology Using a Large Language Model as the Expert
[AUTHORS]
Mohammad Alipour-Vaezi, Kwok-Leung Tsui
[ABSTRACT]
Portfolio management is one of the unresolved problems of the Motion
Pictures Industry (MPI). To design an optimal portfolio for an MPI distributor,
it is essential to predict the box office of each project. Moreover, for an
accurate box office prediction, it is critical to consider the effect of the
celebrities involved in each MPI project, which was not possible with any
previous expert-based method. Additionally, the asymmetric characteristic of
MPI data decreases the performance of any predictive algorithm. In this paper,
first, the fame score of the celebrities is determined using a large language
model. Then, to tackle the asymmetric character of MPI’s data, projects are
classified. Furthermore, the box office prediction takes place for each class
of projects. Finally, using a hybrid multi-attribute decision-making technique,
the preferability of each project for the distributor is calculated, and
benefiting from a bi-objective optimization model, the optimal portfolio is
designed.
[LINK]
http://arxiv.org/abs/2404.07434v1
[DATE]
2024-04-11 10:23:30+08:00
[CATEGORIES]
cs.LG
Deep Temporal Graph Clustering
[AUTHORS]
Meng Liu, Yue Liu, Ke Liang, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu
[ABSTRACT]
Deep graph clustering has recently received significant attention due to its
ability to enhance the representation learning capabilities of models in
unsupervised scenarios. Nevertheless, deep clustering for temporal graphs,
which could capture crucial dynamic interaction information, has not been fully
explored. As a result, in many clustering-oriented real-world scenarios,
temporal graphs can only be processed as static graphs. This not only causes
the loss of dynamic information but also incurs a heavy computational
cost. To solve the problem, we propose a general framework for deep
Temporal Graph Clustering called TGC, which introduces deep clustering
techniques to suit the interaction sequence-based batch-processing pattern of
temporal graphs. In addition, we discuss differences between temporal graph
clustering and static graph clustering at several levels. To verify the
superiority of the proposed framework TGC, we conduct extensive experiments.
The experimental results show that temporal graph clustering enables more
flexibility in finding a balance between time and space requirements, and our
framework can effectively improve the performance of existing temporal graph
learning methods. The code is available at
https://github.com/MGitHubL/Deep-Temporal-Graph-Clustering.
[LINK]
http://arxiv.org/abs/2305.10738v3
[DATE]
2024-04-11 10:21:26+08:00
[CATEGORIES]
cs.LG
From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution
[AUTHORS]
Bernard J. Koch, David Peterson
[ABSTRACT]
Over the past decade, AI research has focused heavily on building ever-larger
deep learning models. This approach has simultaneously unlocked incredible
achievements in science and technology, and hindered AI from overcoming
long-standing limitations with respect to explainability, ethical harms, and
environmental efficiency. Drawing on qualitative interviews and computational
analyses, our three-part history of AI research traces the creation of this
“epistemic monoculture” back to a radical reconceptualization of scientific
progress that began in the late 1980s. In the first era of AI research
(1950s-late 1980s), researchers and patrons approached AI as a “basic” science
that would advance through autonomous exploration and organic assessments of
progress (e.g., peer-review, theoretical consensus). The failure of this
approach led to a retrenchment of funding in the 1980s. Amid this “AI Winter,”
an intervention by the U.S. government reoriented the field towards measurable
progress on tasks of military and commercial interest. A new evaluation system
called “benchmarking” provided an objective way to quantify progress on tasks
by focusing exclusively on increasing predictive accuracy on example datasets.
Distilling science down to verifiable metrics clarified the roles of
scientists, allowed the field to rapidly integrate talent, and provided clear
signals of significance and progress. But history has also revealed a tradeoff
to this streamlined approach to science: the consolidation around external
interests and inherent conservatism of benchmarking has disincentivized
exploration beyond scaling monoculture. In the discussion, we explain how AI’s
monoculture offers a compelling challenge to the belief that basic,
exploration-driven research is needed for scientific progress. Implications for
the spread of AI monoculture to other sciences in the era of generative AI are
also discussed.
[LINK]
http://arxiv.org/abs/2404.06647v2
[DATE]
2024-04-11 10:09:23+08:00
[CATEGORIES]
cs.LG
Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
[AUTHORS]
Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
[ABSTRACT]
Adversarial ranking attacks have gained increasing attention due to their
success in probing vulnerabilities, and, hence, enhancing the robustness, of
neural ranking models. Conventional attack methods employ perturbations at a
single granularity, e.g., word or sentence level, to target documents. However,
limiting perturbations to a single level of granularity may reduce the
flexibility of adversarial examples, thereby diminishing the potential threat
of the attack. Therefore, we focus on generating high-quality adversarial
examples by incorporating multi-granular perturbations. Achieving this
objective involves tackling a combinatorial explosion problem, which requires
identifying an optimal combination of perturbations across all possible levels
of granularity, positions, and textual pieces. To address this challenge, we
transform the multi-granular adversarial attack into a sequential
decision-making process, where perturbations in the next attack step build on
the perturbed document in the current attack step. Since the attack process can
only access the final state without direct intermediate signals, we use
reinforcement learning to perform multi-granular attacks. During the
reinforcement learning process, two agents work cooperatively to identify
multi-granular vulnerabilities as attack targets and organize perturbation
candidates into a final perturbation sequence. Experimental results show that
our attack method surpasses prevailing baselines in both attack effectiveness
and imperceptibility.
[COMMENTS]
Accepted by SIGIR2024
[LINK]
http://arxiv.org/abs/2404.01574v2
[DATE]
2024-04-11 10:00:12+08:00
[CATEGORIES]
cs.LG
AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent
[AUTHORS]
Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg
[ABSTRACT]
Encouraged by the remarkable achievements of language and vision foundation
models, developing generalist robotic agents through imitation learning, using
large demonstration datasets, has become a prominent area of interest in robot
learning. The efficacy of imitation learning is heavily reliant on the quantity
and quality of the demonstration datasets. In this study, we aim to scale up
demonstrations in a data-efficient way to facilitate the learning of generalist
robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion),
a general framework designed to improve multi-task policy learning by actively
and continually expanding the demonstration dataset. AdaDemo strategically
collects new demonstrations to address identified weaknesses in the existing
policy, maximizing data efficiency. Through a comprehensive
evaluation on a total of 22 tasks across two robotic manipulation benchmarks
(RLBench and Adroit), we demonstrate AdaDemo’s capability to progressively
improve policy performance by guiding the generation of high-quality
demonstration datasets in a data-efficient manner.
[LINK]
http://arxiv.org/abs/2404.07428v1
[DATE]
2024-04-11 09:59:29+08:00
[CATEGORIES]
cs.LG
Learning Chemotherapy Drug Action via Universal Physics-Informed Neural Networks
[AUTHORS]
Lena Podina, Ali Ghodsi, Mohammad Kohandel
[ABSTRACT]
Quantitative systems pharmacology (QSP) is widely used to assess drug effects
and toxicity before a drug enters clinical trials. However, significant
manual distillation of the literature is needed in order to construct a QSP
model. Parameters may need to be fit, and simplifying assumptions of the model
need to be made. In this work, we apply Universal Physics-Informed Neural
Networks (UPINNs) to learn unknown components of various differential equations
that model chemotherapy pharmacodynamics. We learn three commonly employed
chemotherapeutic drug actions (log-kill, Norton-Simon, and E_max) from
synthetic data. Then, we use the UPINN method to fit the parameters for several
synthetic datasets simultaneously. Finally, we learn the net proliferation rate
in a model of doxorubicin (a chemotherapeutic) pharmacodynamics. Although these
are only toy examples, we highlight the usefulness of UPINNs in learning
unknown terms in pharmacodynamic and pharmacokinetic models.
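To make the three drug actions concrete, the sketch below writes each one as a kill term in a tumor-growth ODE, dN/dt = g(N) - kill(N, C), and integrates it with forward Euler at a constant drug concentration C. The logistic growth law, the exact functional forms, and all parameter names and values (r, k_cap, k, delta, e_max_, ec50) are illustrative assumptions rather than the paper's equations, and no UPINN is trained; in the UPINN setting, a term like kill(N, C) is the unknown component a network would learn.

```python
def logistic_growth(n, r=0.5, k_cap=1.0):
    """Assumed net growth term g(N) = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k_cap)


def log_kill(n, c, k=0.3):
    # Log-kill: kill rate proportional to concentration times cell count.
    return k * c * n


def norton_simon(n, c, delta=0.3):
    # Norton-Simon: kill rate proportional to the tumor's growth rate.
    return delta * c * logistic_growth(n)


def e_max(n, c, e_max_=0.6, ec50=0.5):
    # E_max: saturating response in concentration, scaled by cell count.
    return e_max_ * c / (ec50 + c) * n


def simulate(kill, c=1.0, n0=0.1, dt=0.01, steps=1000):
    """Forward-Euler integration of dN/dt = g(N) - kill(N, c) with constant c."""
    n = n0
    for _ in range(steps):
        n += dt * (logistic_growth(n) - kill(n, c))
    return n


for name, action in [("log-kill", log_kill), ("Norton-Simon", norton_simon), ("E_max", e_max)]:
    print(f"{name}: untreated {simulate(action, c=0.0):.3f}, treated {simulate(action):.3f}")
```

With C = 0 every kill term vanishes, so all three models reduce to the same untreated growth curve; differences appear only under treatment.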
[LINK]
http://arxiv.org/abs/2404.08019v1
[DATE]
2024-04-11 09:30:05+08:00
[CATEGORIES]
cs.LG
Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals
[AUTHORS]
Daojun Liang, Haixia Zhang, Dongfeng Yuan, Bingzheng Zhang, Minggao Zhang
[ABSTRACT]
In this paper, we find that ubiquitous time series (TS) forecasting models
are prone to severe overfitting. To cope with this problem, we embrace a
de-redundancy approach to progressively reinstate the intrinsic values of TS
for future intervals. Specifically, we renovate the vanilla Transformer by
reorienting the information aggregation mechanism from addition to subtraction.
Then, we incorporate an auxiliary output branch into each block of the original
model to construct a highway leading to the ultimate prediction. The output of
subsequent modules in this branch will subtract the previously learned results,
enabling the model to learn the residuals of the supervision signal, layer by
layer. This design facilitates the learning-driven implicit progressive
decomposition of the input and output streams, empowering the model with
heightened versatility, interpretability, and resilience against overfitting.
Since all aggregations in the model are subtractions, the model is called
Minusformer. Extensive experiments demonstrate that the proposed method outperforms
existing state-of-the-art methods, yielding an average performance improvement
of 11.9% across various datasets.
[LINK]
http://arxiv.org/abs/2402.02332v2
[DATE]
2024-04-11 09:21:03+08:00
[CATEGORIES]
cs.LG
Semantically-correlated memories in a dense associative model
[AUTHORS]
Thomas F Burns
[ABSTRACT]
I introduce a novel associative memory model named Correlated Dense
Associative Memory (CDAM), which integrates both auto- and hetero-association
in a unified framework for continuous-valued memory patterns. Employing an
arbitrary graph structure to semantically link memory patterns, CDAM is
theoretically and numerically analysed, revealing four distinct dynamical
modes: auto-association, narrow hetero-association, wide hetero-association,
and neutral quiescence. Drawing inspiration from inhibitory modulation studies,
I employ anti-Hebbian learning rules to control the range of
hetero-association, extract multi-scale representations of community structures
in graphs, and stabilise the recall of temporal sequences. Experimental
demonstrations showcase CDAM’s efficacy in handling real-world data,
replicating a classical neuroscience experiment, performing image retrieval,
and simulating arbitrary finite automata.
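As a point of reference for CDAM, the sketch below implements only the auto-associative retrieval step of a standard continuous dense associative memory, the modern-Hopfield update xi <- X softmax(beta X^T xi). The graph linking memories, the hetero-associative terms, and the anti-Hebbian modulation described above are not reproduced, and beta, the pattern dimension, and the noise level are assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def retrieve(patterns, query, beta=8.0, steps=3):
    """Continuous dense associative memory update xi <- X softmax(beta X^T xi).

    patterns: (d, num_patterns) array whose columns are the stored memories.
    """
    xi = query.copy()
    for _ in range(steps):
        xi = patterns @ softmax(beta * (patterns.T @ xi))
    return xi


rng = np.random.default_rng(0)
d, num_patterns = 64, 10
patterns = rng.choice([-1.0, 1.0], size=(d, num_patterns)) / np.sqrt(d)  # unit-norm columns
query = patterns[:, 3] + rng.normal(scale=0.05, size=d)                   # noisy cue
xi = retrieve(patterns, query)
cos = xi @ patterns[:, 3] / (np.linalg.norm(xi) * np.linalg.norm(patterns[:, 3]))
print(round(float(cos), 3))
```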
[COMMENTS]
35 pages, 32 figures
[LINK]
http://arxiv.org/abs/2404.07123v2
[DATE]
2024-04-11 09:09:08+08:00
[CATEGORIES]
cs.LG
Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling
[AUTHORS]
Sourajit Saha, Tejas Gokhale
[ABSTRACT]
Downsampling operators break the shift invariance of convolutional neural
networks (CNNs) and this affects the robustness of features learned by CNNs
when dealing with even small pixel-level shifts. Through a large-scale
correlation analysis framework, we study shift invariance of CNNs by inspecting
existing downsampling operators in terms of their maximum-sampling bias (MSB),
and find that MSB is negatively correlated with shift invariance. Based on this
crucial insight, we propose a learnable pooling operator called Translation
Invariant Polyphase Sampling (TIPS) and two regularizations on the intermediate
feature maps of TIPS to reduce MSB and learn translation-invariant
representations. TIPS can be integrated into any CNN and can be trained
end-to-end with marginal computational overhead. Our experiments demonstrate
that TIPS results in consistent performance gains in terms of accuracy, shift
consistency, and shift fidelity on multiple benchmarks for image classification
and semantic segmentation compared to previous methods and also leads to
improvements in adversarial and distributional robustness. TIPS results in the
lowest MSB compared to all previous methods, thus explaining our strong
empirical results.
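The polyphase view that TIPS builds on can be sketched in a few lines: a stride-2 downsampling of a 2D map has four polyphase components, and plain strided subsampling always keeps the (0, 0) one, so a one-pixel shift moves the content into a different component and changes the output. The softmax-weighted mixture with fixed logits is a hypothetical stand-in for a learnable pooling operator; it is not the TIPS operator, and the MSB-reducing regularizers are not shown.

```python
import numpy as np


def polyphase_components(x):
    """Split a 2D map into its four stride-2 polyphase components.

    Order: offsets (0, 0), (0, 1), (1, 0), (1, 1) along (rows, columns).
    """
    return np.stack([x[i::2, j::2] for i in (0, 1) for j in (0, 1)])


def strided_subsample(x):
    # Plain stride-2 subsampling always keeps the (0, 0) component.
    return x[::2, ::2]


def mixed_polyphase(x, logits):
    """Softmax-weighted mix of the components (hypothetical, not TIPS itself)."""
    comps = polyphase_components(x)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, comps, axes=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
x_shift = np.roll(x, shift=1, axis=1)  # circular one-pixel horizontal shift

# After the shift, component (0, 1) of x_shift equals component (0, 0) of x,
# but plain subsampling still picks (0, 0), so its output changes.
print(np.allclose(strided_subsample(x), strided_subsample(x_shift)))
print(np.allclose(polyphase_components(x_shift)[1], polyphase_components(x)[0]))
```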
[LINK]
http://arxiv.org/abs/2404.07410v1
[DATE]
2024-04-11 08:49:38+08:00
[CATEGORIES]
cs.LG
Incremental Randomized Smoothing Certification
[AUTHORS]
Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, Sasa Misailovic
[ABSTRACT]
Randomized smoothing-based certification is an effective approach for
obtaining robustness certificates of deep neural networks (DNNs) against
adversarial attacks. This method constructs a smoothed DNN model and certifies
its robustness through statistical sampling, but it is computationally
expensive, especially when certifying with a large number of samples.
Furthermore, when the smoothed model is modified (e.g., quantized or pruned),
certification guarantees may not hold for the modified DNN, and recertifying
from scratch can be prohibitively expensive.
We present the first approach for incremental robustness certification for
randomized smoothing, IRS. We show how to reuse the certification guarantees
for the original smoothed model to certify an approximated model with very few
samples. IRS significantly reduces the computational cost of certifying
modified DNNs while maintaining strong robustness guarantees. We experimentally
demonstrate the effectiveness of our approach, showing up to 3x certification
speedup over applying randomized smoothing to the approximate model from
scratch.
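For background, the sketch below shows the non-incremental certification step that IRS aims to avoid repeating, in the style of standard Gaussian randomized smoothing (Cohen et al., 2019): sample noisy copies of the input, take the majority class, lower-bound its probability p_A, and certify an L2 radius sigma * Phi^{-1}(p_A). Two simplifications are assumptions of this sketch: one sample set serves both class selection and estimation (which weakens the statistical guarantee), and a Hoeffding bound replaces the Clopper-Pearson interval. The incremental reuse of guarantees that defines IRS is not shown.

```python
import math
from statistics import NormalDist

import numpy as np


def certify(base_classifier, x, sigma=0.5, n=1000, alpha=0.001, rng=None):
    """Monte Carlo certification of a Gaussian-smoothed classifier.

    Returns (predicted_class, certified_l2_radius), or (None, 0.0) to abstain.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    preds = np.array([base_classifier(x + eps) for eps in noise])
    classes, counts = np.unique(preds, return_counts=True)
    top = int(np.argmax(counts))
    # One-sided Hoeffding lower confidence bound on p_A at level alpha.
    p_lower = counts[top] / n - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0
    return int(classes[top]), sigma * NormalDist().inv_cdf(p_lower)


def toy_classifier(z):
    # Class 1 if the first coordinate is positive; decision boundary at z[0] = 0.
    return int(z[0] > 0)


label, radius = certify(toy_classifier, np.array([2.0, 0.0]))
print(label, round(radius, 3))
```

The certified radius is conservative: the toy input lies at distance 2 from the decision boundary, while the certificate is smaller because of the finite-sample bound.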
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2305.19521v2
[DATE]
2024-04-11 08:38:29+08:00
[CATEGORIES]
cs.LG
Learning the Positions in CountSketch
[AUTHORS]
Yi Li, Honghao Lin, Simin Liu, Ali Vakilian, David P. Woodruff
[ABSTRACT]
We consider sketching algorithms which first compress data by multiplication
with a random sketch matrix, and then apply the sketch to quickly solve an
optimization problem, e.g., low-rank approximation and regression. In the
learning-based sketching paradigm proposed by Indyk et al. (2019), the
sketch matrix is found by choosing a random sparse matrix, e.g., CountSketch,
and then the values of its non-zero entries are updated by running gradient
descent on a training data set. Despite the growing body of work on this
paradigm, a noticeable omission is that the locations of the non-zero entries
of previous algorithms were fixed, and only their values were learned. In this
work, we propose the first learning-based algorithms that also optimize the
locations of the non-zero entries. Our first proposed algorithm is based on a
greedy algorithm. However, one drawback of the greedy algorithm is its slower
training time. We fix this issue and propose approaches for learning a
sketching matrix for both low-rank approximation and Hessian approximation for
second order optimization. The latter is helpful for a range of constrained
optimization problems, such as LASSO and matrix estimation with a nuclear norm
constraint. Both approaches achieve good accuracy with a fast running time.
Moreover, our experiments suggest that our algorithm can still reduce the error
significantly even if we only have a very limited number of training matrices.
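The object being learned can be illustrated with a random CountSketch matrix S, which has exactly one nonzero entry (+1 or -1) per column, used for sketch-and-solve least-squares regression (solve min ||S(Ax - b)|| instead of min ||Ax - b||). The row index of each column's nonzero is the "location" that earlier learned-sketch methods kept fixed and that this work also optimizes; here both locations and values are random, and the problem sizes are arbitrary assumptions. A dense S is used for brevity, whereas practical implementations apply CountSketch in time proportional to the number of nonzeros of A.

```python
import numpy as np


def countsketch(m, n, rng):
    """Random m x n CountSketch matrix: one +/-1 entry per column.

    rows[i] is the location of column i's nonzero; signs are its values.
    """
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    s = np.zeros((m, n))
    s[rows, np.arange(n)] = signs
    return s


rng = np.random.default_rng(0)
n, d, m = 2000, 10, 200
a = rng.normal(size=(n, d))
b = a @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

s = countsketch(m, n, rng)
# Sketch-and-solve: a 200 x 10 least-squares problem instead of 2000 x 10.
x_sketch, *_ = np.linalg.lstsq(s @ a, s @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(a, b, rcond=None)
ratio = np.linalg.norm(a @ x_sketch - b) / np.linalg.norm(a @ x_exact - b)
print(f"residual ratio (sketched / exact): {ratio:.3f}")
```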
[COMMENTS]
Corrected the proof of Theorem 5.1. arXiv admin note: text overlap
with arXiv:2007.09890
[LINK]
http://arxiv.org/abs/2306.06611v2
[DATE]
2024-04-11 08:31:28+08:00
[CATEGORIES]
cs.LG
Learning to Predict 3D Rotational Dynamics from Images of a Rigid Body with Unknown Mass Distribution
[AUTHORS]
Justice Mason, Christine Allen-Blanchette, Nicholas Zolman, Elizabeth Davison, Naomi Ehrich Leonard
[ABSTRACT]
In many real-world settings, image observations of freely rotating 3D rigid
bodies may be available when low-dimensional measurements are not. However, the
high-dimensionality of image data precludes the use of classical estimation
techniques to learn the dynamics. The usefulness of standard deep learning
methods is also limited, because an image of a rigid body reveals nothing about
the distribution of mass inside the body, which, together with initial angular
velocity, is what determines how the body will rotate. We present a
physics-based neural network model to estimate and predict 3D rotational
dynamics from image sequences. We achieve this using a multi-stage prediction
pipeline that maps individual images to a latent representation homeomorphic to
SO(3), computes angular velocities from latent pairs, and predicts
future latent states using the Hamiltonian equations of motion. We demonstrate
the efficacy of our approach on new rotating rigid-body datasets of sequences
of synthetic images of rotating objects, including cubes, prisms and
satellites, with unknown uniform and non-uniform mass distributions. Our model
outperforms competing baselines on our datasets, producing better qualitative
predictions and reducing the error observed for the state-of-the-art
Hamiltonian Generative Network by a factor of 2.
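For a torque-free rigid body, the Hamiltonian equations of motion written in body coordinates reduce to Euler's equations for the angular momentum Pi: dPi/dt = Pi x Omega with Omega = J^{-1} Pi, where the inertia tensor J is what the mass distribution determines. The sketch integrates these equations with RK4 for an assumed diagonal J and checks the two quantities the exact flow conserves; it illustrates only the physics, not the paper's image encoder or learned latent space.

```python
import numpy as np


def euler_rhs(pi, inertia):
    """Torque-free Euler equations: dPi/dt = Pi x Omega with Omega = J^{-1} Pi."""
    omega = np.linalg.solve(inertia, pi)
    return np.cross(pi, omega)


def rk4_step(pi, inertia, dt):
    k1 = euler_rhs(pi, inertia)
    k2 = euler_rhs(pi + 0.5 * dt * k1, inertia)
    k3 = euler_rhs(pi + 0.5 * dt * k2, inertia)
    k4 = euler_rhs(pi + dt * k3, inertia)
    return pi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


inertia = np.diag([1.0, 2.0, 3.0])  # assumed non-uniform mass distribution
traj = [np.array([0.1, 1.0, 0.1])]  # spin close to the intermediate axis
for _ in range(5000):
    traj.append(rk4_step(traj[-1], inertia, dt=0.001))
traj = np.array(traj)

# The exact flow conserves |Pi| and the kinetic energy 0.5 * Pi . J^{-1} Pi.
norms = np.linalg.norm(traj, axis=1)
energy = 0.5 * np.sum(traj * np.linalg.solve(inertia, traj.T).T, axis=1)
print(norms.max() - norms.min(), energy.max() - energy.min())
```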
[COMMENTS]
Previously appeared as arXiv:2209.11355v2, which was submitted as a
replacement by accident. arXiv admin note: text overlap with arXiv:2209.11355
[LINK]
http://arxiv.org/abs/2308.14666v2
[DATE]
2024-04-11 07:39:38+08:00
[CATEGORIES]
cs.LG
Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits
[AUTHORS]
Chenhui Deng, Zichao Yue, Cunxi Yu, Gokce Sarar, Ryan Carey, Rajeev Jain, Zhiru Zhang
[ABSTRACT]
While graph neural networks (GNNs) have gained popularity for learning
circuit representations in various electronic design automation (EDA) tasks,
they face challenges in scalability when applied to large graphs and exhibit
limited generalizability to new designs. These limitations make them less
practical for addressing large-scale, complex circuit problems. In this work we
propose HOGA, a novel attention-based model for learning circuit
representations in a scalable and generalizable manner. HOGA first computes
hop-wise features per node prior to model training. Subsequently, the hop-wise
features are solely used to produce node representations through a gated
self-attention module, which adaptively learns important features among
different hops without involving the graph topology. As a result, HOGA is
adaptive to various structures across different circuits and can be efficiently
trained in a distributed manner. To demonstrate the efficacy of HOGA, we
consider two representative EDA tasks: quality of results (QoR) prediction and
functional reasoning. Our experimental results indicate that (1) HOGA reduces
estimation error over conventional GNNs by 46.76% for predicting QoR after
logic synthesis; (2) HOGA improves reasoning accuracy by 10.0% over GNNs for
identifying functional blocks on unseen gate-level netlists after complex
technology mapping; and (3) the training time for HOGA decreases almost
linearly with an increase in computing resources.
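The precomputation step can be sketched as follows: for each node, stack its features after 0, 1, ..., K rounds of propagation with a normalized adjacency matrix, so that a per-node model (in HOGA, the gated self-attention module) can later work on each node's (K + 1) x d feature matrix without touching the graph, which is what makes distributed training straightforward. The symmetric normalization with self-loops is an assumption borrowed from common GNN practice, not necessarily HOGA's choice.

```python
import numpy as np


def hop_wise_features(adj, x, num_hops):
    """Precompute [X, A_hat X, ..., A_hat^K X] for every node.

    A_hat = D^{-1/2} (A + I) D^{-1/2} is an assumed normalization.
    Returns an array of shape (num_nodes, num_hops + 1, feat_dim).
    """
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    feats = [x]
    for _ in range(num_hops):
        feats.append(a_hat @ feats[-1])
    return np.stack(feats, axis=1)


# Path graph 0 - 1 - 2 - 3 with one-hot node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
hops = hop_wise_features(adj, x, num_hops=2)
print(hops.shape)
```

Node 0 sees no signal from node 2 at hop 1 but does at hop 2, which is the per-hop structure the attention module can weigh.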
[COMMENTS]
Published as a conference paper at Design Automation Conference (DAC)
2024
[LINK]
http://arxiv.org/abs/2403.01317v4
[DATE]
2024-04-11 07:31:08+08:00
[CATEGORIES]
cs.LG
Improving Multi-Center Generalizability of GAN-Based Fat Suppression using Federated Learning
[AUTHORS]
Pranav Kulkarni, Adway Kanhere, Harshita Kukreja, Vivian Zhang, Paul H. Yi, Vishwa S. Parekh
[ABSTRACT]
Generative Adversarial Network (GAN)-based synthesis of fat-suppressed (FS)
MRIs from non-FS proton density sequences has the potential to accelerate
acquisition of knee MRIs. However, GANs trained on single-site data have poor
generalizability to external data. We show that federated learning can improve
multi-center generalizability of GANs for synthesizing FS MRIs, while
facilitating privacy-preserving multi-institutional collaborations.
[COMMENTS]
5 pages, 2 figures
[LINK]
http://arxiv.org/abs/2404.07374v1
[DATE]
2024-04-11 06:16:20+08:00
[CATEGORIES]
cs.LG
Synthesizing Neural Network Controllers with Closed-Loop Dissipativity Guarantees
[AUTHORS]
Neelay Junnarkar, Murat Arcak, Peter Seiler
[ABSTRACT]
In this paper, a method is presented to synthesize neural network controllers
such that the feedback system of plant and controller is dissipative,
certifying performance requirements such as L2 gain bounds. The class of plants
considered is that of linear time-invariant (LTI) systems interconnected with
an uncertainty, including nonlinearities treated as an uncertainty for
convenience of analysis. The uncertainty of the plant and the nonlinearities of
the neural network are both described using integral quadratic constraints
(IQCs). First, a dissipativity condition is derived for uncertain LTI systems.
Second, this condition is used to construct a linear matrix inequality (LMI)
which can be used to synthesize neural network controllers. Finally, this
convex condition is used in a projection-based training method to synthesize
neural network controllers with dissipativity guarantees. Numerical examples on
an inverted pendulum and a flexible rod on a cart are provided to demonstrate
the effectiveness of this approach.
[COMMENTS]
Submitted to the journal Automatica, 14 pages, 7 figures
[LINK]
http://arxiv.org/abs/2404.07373v1
[DATE]
2024-04-11 06:15:28+08:00
[CATEGORIES]
cs.LG
Gradient Networks
[AUTHORS]
Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
[ABSTRACT]
Directly parameterizing and learning gradients of functions has widespread
significance, with specific applications in optimization, generative modeling,
and optimal transport. This paper introduces gradient networks (GradNets):
novel neural network architectures that parameterize gradients of various
function classes. GradNets exhibit specialized architectural constraints that
ensure correspondence to gradient functions. We provide a comprehensive GradNet
design framework that includes methods for transforming GradNets into monotone
gradient networks (mGradNets), which are guaranteed to represent gradients of
convex functions. We establish the approximation capabilities of the proposed
GradNet and mGradNet. Our results demonstrate that these networks universally
approximate the gradients of (convex) functions. Furthermore, these networks
can be customized to correspond to specific spaces of (monotone) gradient
functions, including gradients of transformed sums of (convex) ridge functions.
Our analysis leads to two distinct GradNet architectures, GradNet-C and
GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and
mGradNet-M. Our empirical results show that these architectures offer efficient
parameterizations and outperform popular methods in gradient field learning
tasks.
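A minimal example of a monotone gradient map is F(x) = W^T sigmoid(Wx + b): since the logistic sigmoid is the derivative of softplus, F is exactly the gradient of the convex function sum_i softplus(w_i . x + b_i), a sum of convex ridge functions. This single-layer form is an illustration assumed for this sketch, not the paper's GradNet-C or GradNet-M architecture. The check at the end verifies the monotonicity <F(x) - F(y), x - y> >= 0 that gradients of convex functions satisfy.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_map(x, w, b):
    """F(x) = W^T sigmoid(W x + b): gradient of sum_i softplus(w_i . x + b_i)."""
    return w.T @ sigmoid(w @ x + b)


def potential(x, w, b):
    # Convex potential: a sum of softplus ridge functions, softplus(z) = log(1 + e^z).
    return float(np.sum(np.logaddexp(0.0, w @ x + b)))


rng = np.random.default_rng(0)
w, b = rng.normal(size=(16, 3)), rng.normal(size=16)
x, y = rng.normal(size=3), rng.normal(size=3)

# Monotonicity, as expected for the gradient of a convex function.
print(np.dot(grad_map(x, w, b) - grad_map(y, w, b), x - y) >= 0)
```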
[LINK]
http://arxiv.org/abs/2404.07361v1
[DATE]
2024-04-11 05:36:59+08:00
[CATEGORIES]
cs.LG
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
[AUTHORS]
Daniel Platnick, Sourena Khanzadeh, Alireza Sadeghian, Richard Anthony Valenzano
[ABSTRACT]
Microplastic particle ingestion or inhalation by humans is a problem of
growing concern. Unfortunately, current research methods that use machine
learning to understand their potential harms are obstructed by a lack of
available data. Deep learning techniques in particular are challenged by such
domains where only small or imbalanced data sets are available. Overcoming this
challenge often involves oversampling underrepresented classes or augmenting
the existing data to improve model performance. This paper proposes GANsemble:
a two-module framework connecting data augmentation with conditional generative
adversarial networks (cGANs) to generate class-conditioned synthetic data.
First, the data chooser module automates augmentation strategy selection by
searching for the best data augmentation strategy. Next, the cGAN module uses
this strategy to train a cGAN for generating enhanced synthetic data. We
experiment with the GANsemble framework on a small and imbalanced microplastics
data set. A Microplastic-cGAN (MPcGAN) algorithm is introduced, and baselines
for synthetic microplastics (SYMP) data are established in terms of Frechet
Inception Distance (FID) and Inception Scores (IS). We also provide a synthetic
microplastics filter (SYMP-Filter) algorithm to increase the quality of
generated SYMP. Additionally, we show the best amount of oversampling with
augmentation to fix class imbalance in small microplastics data sets. To our
knowledge, this study is the first application of generative AI to
synthetically create microplastics data.
[COMMENTS]
Accepted to the 37th Canadian Artificial Intelligence Conference
(2024), 12 pages, 4 figures
[LINK]
http://arxiv.org/abs/2404.07356v1
[DATE]
2024-04-11 05:23:13+08:00
[CATEGORIES]
cs.LG
Addressing the Abstraction and Reasoning Corpus via Procedural Example Generation
[AUTHORS]
Michael Hodel
[ABSTRACT]
This work presents code to procedurally generate examples for the ARC
training tasks. For each of the 400 tasks, an example generator following the
transformation logic of the original examples was created. In effect, the
assumed underlying distribution of examples for any given task was reverse
engineered by implementing a means to sample from it. An attempt was made to
cover as large a space of possible examples as reasonable for each task. That
is, whenever the original examples of a given task were limited in their
diversity, e.g., by keeping the grid dimensions, the set of symbols, or the
number of objects constant or within tight bounds, even though the
transformation does not require it, such constraints were lifted. Having access
to not just a few examples per task, as is the case for ARC, but instead very
many should enable a wide range of experiments that may be important stepping
stones towards making leaps on the benchmark.
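In the spirit of the generators described above, the sketch below samples examples for a hypothetical "mirror each row" task, randomizing the grid size (ARC grids are at most 30 x 30 cells and use 10 colors) and the palette because the transformation does not constrain them. The task, function name, and sampling choices are illustrative and are not taken from the released code.

```python
import random


def generate_mirror_example(rng, max_size=30, num_colors=10):
    """Sample one input/output pair for a hypothetical 'mirror each row' task.

    Grid size and palette are randomized, since the transformation does not
    depend on them.
    """
    height = rng.randint(1, max_size)
    width = rng.randint(1, max_size)
    palette = rng.sample(range(num_colors), k=rng.randint(2, num_colors))
    grid = [[rng.choice(palette) for _ in range(width)] for _ in range(height)]
    output = [row[::-1] for row in grid]
    return {"input": grid, "output": output}


rng = random.Random(0)
examples = [generate_mirror_example(rng) for _ in range(1000)]
print(len({(len(e["input"]), len(e["input"][0])) for e in examples}), "distinct grid shapes")
```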
[LINK]
http://arxiv.org/abs/2404.07353v1
[DATE]
2024-04-11 05:16:59+08:00
[CATEGORIES]
cs.LG
A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos
[AUTHORS]
Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci
[ABSTRACT]
Eye-tracking applications that utilize the human gaze in video understanding
tasks have become increasingly important. To effectively automate the process
of video analysis based on eye-tracking data, it is important to accurately
replicate human gaze behavior. However, this task presents significant
challenges due to the inherent complexity and ambiguity of human gaze patterns.
In this work, we introduce a novel method for simulating human gaze behavior.
Our approach uses a transformer-based reinforcement learning algorithm to train
an agent that acts as a human observer, with the primary role of watching
videos and simulating human gaze behavior. We employed an eye-tracking dataset
gathered from videos generated by the VirtualHome simulator, with a primary
focus on activity recognition. Our experimental results demonstrate the
effectiveness of our gaze prediction method by highlighting its capability to
replicate human gaze behavior and its applicability for downstream tasks where
real human gaze is used as input.
[COMMENTS]
2024 Symposium on Eye Tracking Research and Applications (ETRA24),
Glasgow, United Kingdom
[LINK]
http://arxiv.org/abs/2404.07351v1
[DATE]
2024-04-11 05:14:33+08:00
[CATEGORIES]
cs.LG
Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention
[AUTHORS]
Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci
[ABSTRACT]
Humans utilize their gaze to concentrate on essential information while
perceiving and interpreting intentions in videos. Incorporating human gaze into
computational algorithms can significantly enhance model performance in video
understanding tasks. In this work, we address a challenging and innovative task
in video understanding: predicting the actions of an agent in a video based on
a partial video. We introduce the Gaze-guided Action Anticipation algorithm,
which establishes a visual-semantic graph from the video input. Our method
utilizes a Graph Neural Network to recognize the agent’s intention and predict
the action sequence to fulfill this intention. To assess the efficiency of our
approach, we collect a dataset containing household activities generated in the
VirtualHome environment, accompanied by human gaze data of viewing videos. Our
method outperforms state-of-the-art techniques, achieving a 7% improvement in
accuracy for 18-class intention recognition. This highlights the efficiency of
our method in learning important features from human gaze data.
[COMMENTS]
2024 Symposium on Eye Tracking Research and Applications (ETRA24),
Glasgow, United Kingdom
[LINK]
http://arxiv.org/abs/2404.07347v1
[DATE]
2024-04-11 05:03:23+08:00
[CATEGORIES]
cs.LG
A Modified Depolarization Approach for Efficient Quantum Machine Learning
[AUTHORS]
Bikram Khanal, Pablo Rivas
[ABSTRACT]
Quantum Computing in the Noisy Intermediate-Scale Quantum (NISQ) era has
shown promising applications in machine learning, optimization, and
cryptography. Despite the progress, challenges persist due to system noise,
errors, and decoherence that complicate the simulation of quantum systems. The
depolarization channel is a standard tool for simulating a quantum system’s
noise. However, modeling such noise for practical applications is
computationally expensive when we have limited hardware resources, as is the
case in the NISQ era. We propose a modified representation for a single-qubit
depolarization channel with two Kraus operators based only on X and Z Pauli
matrices. Our approach reduces the computational complexity from six to four
matrix multiplications per execution of a channel. Experiments on a Quantum
Machine Learning (QML) model on the Iris dataset across various circuit depths
and depolarization rates validate that our approach maintains the model’s
accuracy while improving efficiency. This simplified noise model enables more
scalable simulations of quantum circuits under depolarization, advancing
capabilities in the NISQ era.
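For reference, the standard single-qubit depolarizing channel in Kraus form uses the identity plus all three Pauli matrices. The identity term is only a scalar rescaling, and each Pauli term K rho K^dagger needs two matrix products, six in total, which appears consistent with the six-to-four count above for a two-operator (X and Z only) representation. The sketch implements only the standard channel under one common parameterization (p = 1 is fully depolarizing); the paper's modified Kraus operators are not reproduced.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarize(rho, p):
    """Standard single-qubit depolarizing channel in Kraus form.

    rho -> (1 - 3p/4) rho + (p/4)(X rho X + Y rho Y + Z rho Z) = (1 - p) rho + p I/2.
    """
    out = (1 - 3 * p / 4) * rho  # identity Kraus term: scalar rescale only
    for pauli in (X, Y, Z):
        k = np.sqrt(p / 4) * pauli
        out = out + k @ rho @ k.conj().T  # two matrix products per Pauli term
    return out


rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
print(np.round(depolarize(rho, 0.2).real, 3))
```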
[LINK]
http://arxiv.org/abs/2404.07330v1
[DATE]
2024-04-11 04:17:40+08:00
[CATEGORIES]
cs.LG
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
[AUTHORS]
Nate Rahn, Pierluca D’Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
[ABSTRACT]
Deep reinforcement learning agents for continuous control are known to
exhibit significant instability in their performance over time. In this work,
we provide a fresh perspective on these behaviors by studying the return
landscape: the mapping between a policy and a return. We find that popular
algorithms traverse noisy neighborhoods of this landscape, in which a single
update to the policy parameters leads to a wide range of returns. By taking a
distributional view of these returns, we map the landscape, characterizing
failure-prone regions of policy space and revealing a hidden dimension of
policy quality. We show that the landscape exhibits surprising structure by
finding simple paths in parameter space which improve the stability of a
policy. To conclude, we develop a distribution-aware procedure which finds such
paths, navigating away from noisy neighborhoods in order to improve the
robustness of a policy. Taken together, our results provide new insight into
the optimization, evaluation, and design of agents.
[COMMENTS]
NeurIPS 2023 Accepted Paper. The first two authors contributed
equally
[LINK]
http://arxiv.org/abs/2309.14597v3
[DATE]
2024-04-11 03:54:28+08:00
[CATEGORIES]
cs.LG
Rethinking Perceptual Metrics for Medical Image Translation
[AUTHORS]
Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski
[ABSTRACT]
Modern medical image translation methods use generative models for tasks such
as the conversion of CT images to MRI. Evaluating these methods typically
relies on some chosen downstream task in the target domain, such as
segmentation. On the other hand, task-agnostic metrics, such as the network
feature-based perceptual metrics (e.g., FID) common in image translation for
general computer vision, are attractive. In this paper, we investigate
evaluation metrics for medical image translation on two medical image
translation tasks (GE breast MRI to Siemens breast MRI and lumbar spine MRI to
CT), tested on various state-of-the-art translation methods. We show that
perceptual metrics do not generally correlate with segmentation metrics because
they extend poorly to the anatomical constraints of this sub-field, with FID
being especially inconsistent. However, we find that the lesser-used
pixel-level sliced Wasserstein distance (SWD) metric may be useful for subtle intra-modality translation. Our
results demonstrate the need for further research into helpful metrics for
medical image translation.
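A pixel-level SWD can be sketched by treating each image as a set of small pixel patches, projecting both sets onto random unit directions, and comparing the resulting 1D distributions, whose 2-Wasserstein distance reduces to comparing sorted values. The 3 x 3 patch size, 128 projections, and order-2 distance are assumptions for this sketch; the exact SWD configuration used in the paper is not given in the abstract.

```python
import numpy as np


def sliced_wasserstein(a, b, num_projections=128, rng=None):
    """Monte Carlo sliced 2-Wasserstein distance between equal-size point sets.

    a, b: arrays of shape (n, d). Each random unit direction reduces the sets to
    1D samples, whose 2-Wasserstein distance is computed by sorting.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    dirs = rng.normal(size=(num_projections, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_a = np.sort(a @ dirs.T, axis=0)  # shape (n, num_projections)
    proj_b = np.sort(b @ dirs.T, axis=0)
    return float(np.sqrt(np.mean((proj_a - proj_b) ** 2)))


def pixel_patches(img, size=3):
    # All overlapping size x size patches, flattened to vectors.
    windows = np.lib.stride_tricks.sliding_window_view(img, (size, size))
    return windows.reshape(-1, size * size)


rng = np.random.default_rng(0)
img_a = rng.random((32, 32))
img_b = np.clip(img_a + 0.1, 0.0, 1.0)  # a brightness-shifted copy
print(sliced_wasserstein(pixel_patches(img_a), pixel_patches(img_b)))
```

Because the patches are compared as unordered sets, the metric ignores where in the image a patch occurs, which is one reason it behaves differently from feature-based metrics such as FID.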
[LINK]
http://arxiv.org/abs/2404.07318v1
[DATE]
2024-04-11 03:39:43+08:00
[CATEGORIES]
cs.LG
Structured Reinforcement Learning for Media Streaming at the Wireless Edge
[AUTHORS]
Archana Bura, Sarat Chandra Bobbili, Shreyas Rameshkumar, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai
[ABSTRACT]
Media streaming is the dominant application over wireless edge (access)
networks. The increasing softwarization of such networks has led to efforts at
intelligent control, wherein application-specific actions may be dynamically
taken to enhance the user experience. The goal of this work is to develop and
demonstrate learning-based policies for optimal decision making to determine
which clients to dynamically prioritize in a video streaming setting. We
formulate the policy design question as a constrained Markov decision problem
(CMDP), and observe that by using a Lagrangian relaxation we can decompose it
into single-client problems. Further, the optimal policy takes a threshold form
in the video buffer length, which enables us to design an efficient constrained
reinforcement learning (CRL) algorithm to learn it. Specifically, we show that
a natural policy gradient (NPG) based algorithm that is derived using the
structure of our problem converges to the globally optimal policy. We then
develop a simulation environment for training, and a real-world intelligent
controller attached to a WiFi access point for evaluation. We empirically show
that the structured learning approach enables fast learning. Furthermore, such
a structured policy can be easily deployed due to low computational complexity,
leading to policy execution taking only about 15 μs. Using YouTube streaming
experiments in a resource constrained scenario, we demonstrate that the CRL
approach can increase QoE by over 30%.
[COMMENTS]
15 pages, 14 figures
[LINK]
http://arxiv.org/abs/2404.07315v1
[DATE]
2024-04-11 03:25:51+08:00
[CATEGORIES]
cs.LG
Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length
[AUTHORS]
Katerina Hlavackova-Schindler, Anna Melnykova, Irene Tubikanec
[ABSTRACT]
Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used
to model various real-life phenomena: earthquakes, operations on stock markets,
neuronal activity, virus propagation and many others. In this paper, we focus
on MHPs with exponential decay kernels and estimate connectivity graphs, which
represent the Granger causal relations between their components. We approach
this inference problem by proposing an optimization criterion and model
selection algorithm based on the minimum message length (MML) principle. MML
compares Granger causal models using the Occam’s razor principle in the
following way: even when models have a comparable goodness-of-fit to the
observed data, the one generating the most concise explanation of the data is
preferred. While most state-of-the-art methods using lasso-type penalization
tend to overfit in scenarios with short time horizons, the proposed
MML-based method achieves high F1 scores in these settings. We conduct a
numerical study comparing the proposed algorithm to other related classical and
state-of-the-art methods, where we achieve the highest F1 scores in specific
sparse graph settings. We also illustrate the proposed method on G7 sovereign bond
data and obtain causal connections, which are in agreement with the expert
knowledge available in the literature.
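To make the estimated object concrete: in an exponential-kernel MHP, each component's intensity is a baseline rate plus exponentially decaying excitation from past events, and the Granger-causal graph is the support of the excitation matrix, since component j Granger-causes component i exactly when the kernel from j to i is nonzero. The kernel parameterization alpha[i, j] * beta * exp(-beta * (t - s)) with a single shared decay beta, and all numbers below, are assumptions; the MML criterion itself is not sketched.

```python
import numpy as np


def intensity(t, events, mu, alpha, beta):
    """Conditional intensities of an exponential-kernel multivariate Hawkes process.

    lambda_i(t) = mu_i + sum_j alpha[i, j] * sum_{s in events[j], s < t} beta * exp(-beta * (t - s))
    Component j Granger-causes component i exactly when alpha[i, j] != 0.
    """
    lam = np.array(mu, dtype=float)
    for j, times in enumerate(events):
        past = np.array([s for s in times if s < t], dtype=float)
        lam += alpha[:, j] * np.sum(beta * np.exp(-beta * (t - past)))
    return lam


mu = np.array([0.2, 0.1])
alpha = np.array([[0.3, 0.5],
                  [0.0, 0.4]])  # alpha[1, 0] = 0: no edge 0 -> 1 in the causal graph
beta = 1.5
events = [[0.5, 1.2], [0.8]]   # event times of components 0 and 1
print(intensity(2.0, events, mu, alpha, beta))
```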
[COMMENTS]
26 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.02027v2
[DATE]
2024-04-11 03:03:58+08:00
[CATEGORIES]
cs.LG
Transfer Learning via Latent Dependency Factor for Estimating PM 2.5
[AUTHORS]
Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu
[ABSTRACT]
Air pollution, especially particulate matter 2.5 (PM 2.5), is a pressing
concern for public health and is difficult to estimate in developing countries
(data-poor regions) due to a lack of ground sensors. Transfer learning models
can be leveraged to solve this problem, as they use alternate data sources to
gain knowledge (i.e., data from data-rich regions). However, current transfer
learning methodologies do not account for dependencies between the source and
the target domains. We recognize this transfer problem as spatial transfer
learning and propose a new feature named Latent Dependency Factor (LDF) that
captures spatial and semantic dependencies of both domains and is subsequently
added to the datasets. We generate LDF using a novel two-stage autoencoder
model that learns from clusters of similar source and target domain data. Our
experiments show that transfer models using LDF have a 19.34% improvement
over the best-performing baselines. We additionally support our experiments
with qualitative results.
[LINK]
http://arxiv.org/abs/2404.07308v1
[DATE]
2024-04-11 03:01:44+08:00
[CATEGORIES]
cs.LG
An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning
[AUTHORS]
Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt
[ABSTRACT]
Various tasks in data science are modeled utilizing the variational
regularization approach, where manually selecting regularization parameters
presents a challenge. The difficulty gets exacerbated when employing
regularizers involving a large number of hyperparameters. To overcome this
challenge, bilevel learning can be employed to learn such parameters from data.
However, neither exact function values nor exact gradients with respect to the
hyperparameters are attainable, necessitating methods that only rely on inexact
evaluation of such quantities. State-of-the-art inexact gradient-based methods
a priori select a sequence of the required accuracies and cannot identify an
appropriate step size since the Lipschitz constant of the hypergradient is
unknown. In this work, we propose an algorithm with backtracking line search
that only relies on inexact function evaluations and hypergradients and show
convergence to a stationary point. Furthermore, the proposed algorithm
determines the required accuracy dynamically instead of requiring it to be
selected manually beforehand. Our numerical experiments demonstrate the
efficiency and
feasibility of our approach for hyperparameter estimation on a range of
relevant problems in imaging and data science such as total variation and field
of experts denoising and multinomial logistic regression. In particular, the
results show that the algorithm is robust to its own hyperparameters, such as
the initial accuracies and step size.
[LINK]
http://arxiv.org/abs/2308.10098v2
[DATE]
2024-04-11 02:49:08+08:00
[CATEGORIES]
cs.LG
ONNXPruner: ONNX-Based General Model Pruning Adapter
[AUTHORS]
Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao
[ABSTRACT]
Recent advancements in model pruning have focused on developing new
algorithms and improving upon benchmarks. However, the practical application of
these algorithms across various models and platforms remains a significant
challenge. To address this challenge, we propose ONNXPruner, a versatile
pruning adapter designed for models in the ONNX format. ONNXPruner streamlines the
adaptation process across diverse deep learning frameworks and hardware
platforms. A novel aspect of ONNXPruner is its use of node association trees,
which automatically adapt to various model architectures. These trees clarify
the structural relationships between nodes, guiding the pruning process,
particularly highlighting the impact on interconnected nodes. Furthermore, we
introduce a tree-level evaluation method. By leveraging node association trees,
this method allows for a comprehensive analysis beyond traditional single-node
evaluations, enhancing pruning performance without the need for extra
operations. Experiments across multiple models and datasets confirm
ONNXPruner’s strong adaptability and increased efficacy. Our work aims to
advance the practical application of model pruning.
[LINK]
http://arxiv.org/abs/2404.08016v1
[DATE]
2024-04-11 02:36:25+08:00
[CATEGORIES]
cs.LG
Certifying almost all quantum states with few single-qubit measurements
[AUTHORS]
Hsin-Yuan Huang, John Preskill, Mehdi Soleimanifar
[ABSTRACT]
Certifying that an n-qubit state synthesized in the lab is close to the
target state is a fundamental task in quantum information science. However,
existing rigorous protocols either require deep quantum circuits or
exponentially many single-qubit measurements. In this work, we prove that
almost all n-qubit target states, including those with exponential circuit
complexity, can be certified from only O(n^2) single-qubit measurements. This
result is established by a new technique that relates certification to the
mixing time of a random walk. Our protocol has applications for benchmarking
quantum systems, for optimizing quantum circuits to generate a desired target
state, and for learning and verifying neural networks, tensor networks, and
various other representations of quantum states using only single-qubit
measurements. We show that such verified representations can be used to
efficiently predict highly non-local properties that would otherwise require an
exponential number of measurements. We demonstrate these applications in
numerical experiments with up to 120 qubits, and observe advantage over
existing methods such as cross-entropy benchmarking (XEB).
[COMMENTS]
63 pages, 5 figures
[LINK]
http://arxiv.org/abs/2404.07281v1
[DATE]
2024-04-11 02:21:11+08:00
[CATEGORIES]
cs.LG
Elucidating the Exposure Bias in Diffusion Models
[AUTHORS]
Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul
[ABSTRACT]
Diffusion models have demonstrated impressive generative capabilities, but
their \textit{exposure bias} problem, described as the input mismatch between
training and sampling, lacks in-depth exploration. In this paper, we
systematically investigate the exposure bias problem in diffusion models by
first analytically modelling the sampling distribution, based on which we then
identify the prediction error at each sampling step as the root cause of the
exposure bias issue. Furthermore, we discuss potential solutions to this issue
and propose an intuitive metric for it. Along with the elucidation of exposure
bias, we propose a simple, yet effective, training-free method called Epsilon
Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly
moves the sampling trajectory closer to the vector field learned in the
training phase by scaling down the network output, mitigating the input
mismatch between training and sampling. Experiments on various diffusion
frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our
method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler,
obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code
is available at \url{https://github.com/forever208/ADM-ES} and
\url{https://github.com/forever208/EDM-ES}.
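The abstract describes Epsilon Scaling as scaling down the network's noise prediction at sampling time, without retraining. A minimal sketch, assuming a deterministic DDIM sampler and a constant factor `lam` (the paper may use a timestep-dependent schedule; the function name and signature are hypothetical):

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, lam=1.0):
    """One deterministic DDIM update with Epsilon Scaling.

    The predicted noise is divided by lam (lam > 1 scales the network output
    down); lam = 1 recovers the standard DDIM step. A constant lam is an
    assumption of this sketch.
    """
    eps = [e / lam for e in eps_pred]
    # Predict the clean sample x_0 from x_t and the (scaled) noise estimate.
    x0_pred = [(x - math.sqrt(1 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
               for x, e in zip(x_t, eps)]
    # Re-noise the x_0 estimate to the previous (less noisy) timestep.
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1 - alpha_bar_prev) * e
            for x0, e in zip(x0_pred, eps)]
```

Because the change is confined to the noise estimate, it can wrap an existing sampler's model call, which is consistent with the method being training-free.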
[COMMENTS]
ICLR 2024
[LINK]
http://arxiv.org/abs/2308.15321v6
[DATE]
2024-04-11 02:13:00+08:00
[CATEGORIES]
cs.LG
Predicting Side Effect of Drug Molecules using Recurrent Neural Networks
[AUTHORS]
Collin Beaudoin, Koustubh Phalak, Swaroop Ghosh
[ABSTRACT]
Identification and verification of molecular properties such as side effects
is one of the most important and time-consuming steps in the process of
molecule synthesis. For example, failure to identify side effects before
submission to regulatory groups can cost millions of dollars and months of
additional research to the companies. Failure to identify side effects during
the regulatory review can also cost lives. The complexity and expense of this
task have made it a candidate for a machine learning-based solution. Prior
approaches rely on complex model designs and excessive parameter counts for
side effect predictions. We believe reliance on complex models only shifts the
difficulty away from chemists rather than alleviating the issue. Implementing
large models is also expensive without prior access to high-performance
computers. We propose a heuristic approach that allows for the utilization of
simple neural networks, specifically the recurrent neural network, with a 98+%
reduction in the number of required parameters compared to available large
language models while still obtaining results nearly identical to those of top-performing
models.
[COMMENTS]
6 pages, 4 figures, 2 tables
[LINK]
http://arxiv.org/abs/2305.10473v2
[DATE]
2024-04-11 02:07:20+08:00
[CATEGORIES]
cs.LG
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity
[AUTHORS]
Vahid Balazadeh, Keertana Chidambaram, Viet Nguyen, Rahul G. Krishnan, Vasilis Syrgkanis
[ABSTRACT]
We study the problem of online sequential decision-making given auxiliary
demonstrations from experts who made their decisions based on unobserved
contextual information. These demonstrations can be viewed as solving related
but slightly different tasks than what the learner faces. This setting arises
in many application domains, such as self-driving cars, healthcare, and
finance, where expert demonstrations are made using contextual information,
which is not recorded in the data available to the learning agent. We model the
problem as a zero-shot meta-reinforcement learning setting with an unknown task
distribution and a Bayesian regret minimization objective, where the unobserved
tasks are encoded as parameters with an unknown prior. We propose the
Experts-as-Priors algorithm (ExPerior), a non-parametric empirical Bayes
approach that utilizes the principle of maximum entropy to establish an
informative prior over the learner’s decision-making problem. This prior
enables the application of any Bayesian approach for online decision-making,
such as posterior sampling. We demonstrate that our strategy surpasses existing
behaviour cloning and online algorithms for multi-armed bandits and
reinforcement learning, showcasing the utility of our approach in leveraging
expert demonstrations across different decision-making setups.
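The abstract notes that an informative prior lets any Bayesian online method, such as posterior sampling, be applied. A minimal sketch of Beta-Bernoulli posterior (Thompson) sampling for a multi-armed bandit, where the prior pseudo-counts come from a hypothetical `expert_prior` helper that simply counts expert choices. This is NOT ExPerior's maximum-entropy empirical Bayes construction, only a stand-in to show where such a prior plugs in:

```python
import random


def expert_prior(expert_choices, n_arms, strength=1.0):
    """Hypothetical prior: arms chosen often by experts get optimistic pseudo-counts."""
    counts = [expert_choices.count(a) for a in range(n_arms)]
    return [(1 + strength * c, 1.0) for c in counts]


def thompson_sampling(true_probs, prior, rounds, seed=0):
    """Beta-Bernoulli posterior sampling; prior holds (alpha, beta) per arm."""
    rng = random.Random(seed)
    alpha = [a for a, _ in prior]
    beta = [b for _, b in prior]
    pulls = [0] * len(true_probs)
    for _ in range(rounds):
        # Draw a plausible success rate for each arm from its posterior, act greedily.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

For example, `thompson_sampling([0.2, 0.5, 0.8], expert_prior([2, 2, 1, 2], 3), 2000)` starts out favouring arm 2 and keeps refining that belief from observed rewards.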
[LINK]
http://arxiv.org/abs/2404.07266v1
[DATE]
2024-04-11 02:00:17+08:00
[CATEGORIES]
cs.LG
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
[AUTHORS]
Zewei Zhang, Huan Liu, Jun Chen, Xiangyu Xu
[ABSTRACT]
In this paper, we introduce GoodDrag, a novel approach to improve the
stability and image quality of drag editing. Unlike existing methods that
struggle with accumulated perturbations and often result in distortions,
GoodDrag introduces an AlDD framework that alternates between drag and
denoising operations within the diffusion process, effectively improving the
fidelity of the result. We also propose an information-preserving motion
supervision operation that maintains the original features of the starting
point for precise manipulation and artifact reduction. In addition, we
contribute to the benchmarking of drag editing by introducing a new dataset,
Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy
Index and Gemini Score, utilizing Large Multimodal Models. Extensive
experiments demonstrate that the proposed GoodDrag compares favorably against
the state-of-the-art approaches both qualitatively and quantitatively. The
project page is https://gooddrag.github.io.
[LINK]
http://arxiv.org/abs/2404.07206v1
[DATE]
2024-04-11 01:59:59+08:00
[CATEGORIES]
cs.LG
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
[AUTHORS]
Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang
[ABSTRACT]
In solving partial differential equations (PDEs), Fourier Neural Operators
(FNOs) have exhibited notable effectiveness compared to Convolutional Neural
Networks (CNNs). This paper presents clear empirical evidence through spectral
analysis to elucidate the superiority of FNO over CNNs: FNO is significantly
more capable of learning low frequencies. This empirical evidence also unveils
FNO’s distinct low-frequency bias, which limits FNO’s effectiveness in learning
high-frequency information from PDE data. To tackle this challenge, we
introduce SpecBoost, an ensemble learning framework that employs multiple FNOs
to better capture high-frequency information. Specifically, a secondary FNO is
utilized to learn the overlooked high-frequency information from the prediction
residual of the initial FNO. Experiments demonstrate that SpecBoost noticeably
enhances FNO’s prediction accuracy on diverse PDE applications, achieving up
to a 71% improvement.
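The core idea of SpecBoost, a second model trained on the prediction residual of the first, is a two-stage residual ensemble. A minimal sketch that swaps the two FNOs for deliberately simple stand-ins (a linear fit for the low-frequency stage and a single known-frequency sinusoid for the high-frequency stage); all names and the toy data are hypothetical:

```python
import math


def fit_linear(xs, ys):
    """Closed-form least-squares fit y ~ a*x + b (stand-in for the first model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return lambda x: a * x + (my - a * mx)


def fit_sinusoid(xs, rs, k):
    """Least-squares amplitude c for r ~ c*sin(k*x) (stand-in for the second model)."""
    num = sum(r * math.sin(k * x) for x, r in zip(xs, rs))
    den = sum(math.sin(k * x) ** 2 for x in xs)
    c = num / den
    return lambda x: c * math.sin(k * x)


def mse(pred, xs, ys):
    return sum((pred(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def residual_boost(xs, ys, k):
    """Stage 2 is fit only to what stage 1 failed to explain."""
    first = fit_linear(xs, ys)
    residual = [y - first(x) for x, y in zip(xs, ys)]
    second = fit_sinusoid(xs, residual, k)
    return first, (lambda x: first(x) + second(x))


xs = [i / 100 for i in range(101)]
ys = [2 * x + 1 + 0.5 * math.sin(40 * x) for x in xs]
first, boosted = residual_boost(xs, ys, k=40)
print(mse(first, xs, ys), mse(boosted, xs, ys))
```

The first stage cannot represent the high-frequency term at all, so it remains in the residual, which is exactly the signal the second stage is trained on.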
[LINK]
http://arxiv.org/abs/2404.07200v1
[DATE]
2024-04-11 01:58:04+08:00
[CATEGORIES]
cs.LG
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
[AUTHORS]
Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
[ABSTRACT]
We introduce RealmDreamer, a technique for generating general
forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D
Gaussian Splatting representation to match complex text prompts. We initialize
these splats using state-of-the-art text-to-image generators,
lifting their samples into 3D, and computing the occlusion volume. We then
optimize this representation across multiple views as a 3D inpainting task with
image-conditional diffusion models. To learn correct geometric structure, we
incorporate a depth diffusion model by conditioning on the samples from the
inpainting model, giving rich geometric structure. Finally, we finetune the
model using sharpened samples from image generators. Notably, our technique
does not require video or multi-view data and can synthesize a variety of
high-quality 3D scenes in different styles, consisting of multiple objects. Its
generality additionally allows 3D synthesis from a single image.
[COMMENTS]
Project Page: https://realmdreamer.github.io/
[LINK]
http://arxiv.org/abs/2404.07199v1
[DATE]
2024-04-11 01:57:41+08:00
[CATEGORIES]
cs.LG
Zero-shot Logical Query Reasoning on any Knowledge Graph
[AUTHORS]
Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, Zhaocheng Zhu
[ABSTRACT]
Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond
simple KG completion and aims at answering compositional queries composed of
multiple projections and logical operations. Existing CLQA methods that learn
parameters bound to certain entity or relation vocabularies can only be applied
to the graph they were trained on, so they must be retrained, at substantial
cost, before being deployed on a new graph. Here we present UltraQuery, an inductive
reasoning model that can zero-shot answer logical queries on any KG. The core
idea of UltraQuery is to derive both projections and logical operations as
vocabulary-independent functions which generalize to new entities and relations
in any KG. With the projection operation initialized from a pre-trained
inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is
only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in
the zero-shot inference mode shows competitive or better query answering
performance than best available baselines and sets a new state of the art on 14
of them.
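Logical operators that act only on per-entity scores, never on entity or relation IDs, are one way to obtain the vocabulary independence described above. A minimal sketch, assuming product fuzzy logic over score vectors in [0, 1]; the projection step, which UltraQuery initializes from a pre-trained inductive KG reasoning model, is replaced here by given score vectors:

```python
def fuzzy_and(a, b):
    """Product t-norm: conjunction of two entity-score vectors in [0, 1]."""
    return [x * y for x, y in zip(a, b)]


def fuzzy_or(a, b):
    """Probabilistic-sum t-conorm: disjunction."""
    return [x + y - x * y for x, y in zip(a, b)]


def fuzzy_not(a):
    """Standard fuzzy negation."""
    return [1 - x for x in a]


# Hypothetical scores for 4 entities from two projection branches of a query.
branch_1 = [0.9, 0.2, 0.7, 0.0]
branch_2 = [0.8, 0.9, 0.1, 0.5]
answer = fuzzy_and(branch_1, fuzzy_not(branch_2))  # "in branch 1 but not branch 2"
```

Because none of these functions has learnable parameters tied to a particular graph, they apply unchanged to entity sets of any size.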
[LINK]
http://arxiv.org/abs/2404.07198v1
[DATE]
2024-04-11 01:56:07+08:00
[CATEGORIES]
cs.LG
VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification
[AUTHORS]
Florian Sestak, Lisa Schneckenreiter, Johannes Brandstetter, Sepp Hochreiter, Andreas Mayr, Günter Klambauer
[ABSTRACT]
Being able to identify regions within or around proteins, to which ligands
can potentially bind, is an essential step to develop new drugs. Binding site
identification methods can now profit from the availability of large amounts of
3D structures in protein structure databases or from AlphaFold predictions.
Current binding site identification methods heavily rely on graph neural
networks (GNNs), usually designed to output E(3)-equivariant predictions. Such
methods turned out to be very beneficial for physics-related tasks like binding
energy or motion trajectory prediction. However, the performance of GNNs at
binding site identification is still limited, potentially due to the lack of
dedicated nodes that model hidden geometric entities, such as binding pockets.
In this work, we extend E(n)-Equivariant Graph Neural Networks (EGNNs) by
adding virtual nodes and applying an extended message passing scheme. The
virtual nodes in these graphs are dedicated quantities to learn representations
of binding sites, which leads to improved predictive performance. In our
experiments, we show that our proposed method VN-EGNN sets a new
state-of-the-art at locating binding site centers on COACH420, HOLO4K and
PDBbind2020.
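For readers unfamiliar with EGNNs, the E(n)-equivariant coordinate update moves each node along the difference vectors to its neighbours, weighted by a function of squared distances. A minimal sketch of that generic update on a fully connected graph, where a fixed scalar function `phi` stands in for the learned edge MLP; virtual nodes and VN-EGNN's extended message-passing scheme are not modelled here:

```python
def egnn_coordinate_update(coords, phi, c=None):
    """One EGNN-style coordinate update on a fully connected graph.

    x_i <- x_i + C * sum_{j != i} (x_i - x_j) * phi(||x_i - x_j||^2)
    `phi` stands in for the learned edge network; C defaults to 1 / (n - 1).
    """
    n = len(coords)
    if n < 2:
        return [list(p) for p in coords]
    if c is None:
        c = 1.0 / (n - 1)
    updated = []
    for i, xi in enumerate(coords):
        delta = [0.0] * len(xi)
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = phi(sum(d * d for d in diff))  # depends only on an invariant
            delta = [acc + d * w for acc, d in zip(delta, diff)]
        updated.append([a + c * d for a, d in zip(xi, delta)])
    return updated
```

Since the update uses only relative positions and distances, translating or rotating the input point cloud translates or rotates the output in the same way, which is the equivariance property the abstract relies on.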
[LINK]
http://arxiv.org/abs/2404.07194v1
[DATE]
2024-04-11 01:50:29+08:00
[CATEGORIES]
cs.LG
Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection
[AUTHORS]
Jared M. Ping, Ken J. Nixon
[ABSTRACT]
Advances in Tiny Machine Learning (TinyML) have bolstered the creation of
smart industry solutions, including smart agriculture, healthcare and smart
cities. Whilst related research contributes to enabling TinyML solutions on
constrained hardware, there is a need to amplify real-world applications by
optimising energy consumption in battery-powered systems. The work presented
extends and contributes to TinyML research by optimising battery-powered
image-based anomaly detection Internet of Things (IoT) systems. Whilst previous
work in this area has enabled on-device inferencing and training, there has yet
to be an investigation into optimising the management
of such capabilities using machine learning approaches, such as Reinforcement
Learning (RL), to improve the deployment battery life of such systems. Using
modelled simulations, the battery life effects of an RL algorithm are
benchmarked against static and dynamic optimisation approaches, with the
foundation laid for a hardware benchmark to follow. It is shown that using RL
within a TinyML-enabled IoT system to optimise the system operations, including
cloud anomaly processing and on-device training, improves battery life by
22.86% and 10.86% compared to static and dynamic optimisation approaches,
respectively. The proposed solution can be deployed to
resource-constrained hardware, given its low memory footprint of 800 B, which
could be further reduced. This further facilitates the real-world deployment of
such systems, including key sectors such as smart agriculture.
[COMMENTS]
Accepted as a full paper by the tinyML Research Symposium 2024
[LINK]
http://arxiv.org/abs/2403.05106v2
[DATE]
2024-04-11 01:39:53+08:00
[CATEGORIES]
cs.LG
Scaling Laws for Data Filtering – Data Curation cannot be Compute Agnostic
[AUTHORS]
Sachin Goyal, Pratyush Maini, Zachary C. Lipton, Aditi Raghunathan, J. Zico Kolter
[ABSTRACT]
Vision-language models (VLMs) are trained for thousands of GPU hours on
carefully curated web datasets. In recent times, data curation has gained
prominence with several works developing strategies to retain ‘high-quality’
subsets of ‘raw’ scraped data. For instance, the LAION public dataset retained
only 10% of the total crawled data. However, these strategies are typically
developed agnostic of the available compute for training. In this paper, we
first demonstrate that making filtering decisions independent of training
compute is often suboptimal: the limited high-quality data rapidly loses its
utility when repeated, eventually requiring the inclusion of ‘unseen’ but
‘lower-quality’ data. To address this quality-quantity tradeoff
($\texttt{QQT}$), we introduce neural scaling laws that account for the
non-homogeneous nature of web data, an angle ignored in existing literature.
Our scaling laws (i) characterize the $\textit{differing}$ ‘utility’ of various
quality subsets of web data; (ii) account for how utility diminishes for a data
point at its ‘nth’ repetition; and (iii) formulate the mutual interaction of
various data pools when combined, enabling the estimation of model performance
on a combination of multiple data pools without ever jointly training on them.
Our key message is that data curation $\textit{cannot}$ be agnostic of the
total compute that a model will be trained for. Our scaling laws allow us to
curate the best possible pool for achieving top performance on Datacomp at
various compute budgets, carving out a Pareto frontier for data curation. Code
is available at https://github.com/locuslab/scaling_laws_data_filtering.
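The quality-quantity tradeoff can be illustrated with a toy model in which each repetition of a data pool is worth less than the last. A minimal sketch, assuming per-sample utility halves every `tau` repetitions; the functional form, the pool names, and all numbers are hypothetical and not the fitted scaling laws from the paper:

```python
def pool_utility(b, tau, pool_size, budget):
    """Total utility from training on `budget` samples drawn from one pool.

    Assumption: the k-th pass (k = 0, 1, ...) over the pool is worth
    b * 0.5 ** (k / tau) per sample, i.e. utility halves every `tau` passes.
    """
    full_passes, remainder = divmod(budget, pool_size)
    total = sum(pool_size * b * 0.5 ** (k / tau) for k in range(full_passes))
    # A partial final pass is credited at the next repetition's utility.
    total += remainder * b * 0.5 ** (full_passes / tau)
    return total


def best_pool(pools, budget):
    """Name of the pool with the highest total utility for this compute budget."""
    return max(pools, key=lambda name: pool_utility(*pools[name], budget=budget))


# (utility b, half-life tau, pool size) for two hypothetical filtered pools.
pools = {"small_hq": (1.0, 1.0, 10), "large_lq": (0.6, 1.0, 100)}
```

Under these made-up numbers, `best_pool(pools, 10)` picks the small high-quality pool, while `best_pool(pools, 100)` picks the larger lower-quality one, mirroring the claim that the best filtering decision depends on the compute budget.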
[COMMENTS]
Published at CVPR 2024
[LINK]
http://arxiv.org/abs/2404.07177v1
[DATE]
2024-04-11 01:27:54+08:00
[CATEGORIES]
cs.LG
Deep Learning for Inertial Sensor Alignment
[AUTHORS]
Maxim Freydin, Niv Sfaradi, Nimrod Segol, Areej Eweida, Barak Or
[ABSTRACT]
Accurate alignment of a fixed mobile device equipped with inertial sensors
inside a moving vehicle is important for navigation, activity recognition, and
other applications. Accurate estimation of the device mounting angle is
required to rotate the inertial measurement from the sensor frame to the moving
platform frame to standardize measurements and improve the performance of the
target task. In this work, a data-driven approach using deep neural networks
(DNNs) is proposed to learn the yaw mounting angle of a smartphone equipped
with an inertial measurement unit (IMU) and strapped to a car. The proposed
model uses only the accelerometer and gyroscope readings from an IMU as input
and, in contrast to existing solutions, does not require global position inputs
from global navigation satellite systems (GNSS). To train the model in a
supervised manner, IMU data is collected for training and validation with the
sensor mounted at a known yaw mounting angle, and a range of ground truth
labels is generated by applying a random rotation in a bounded range to the
measurements. The trained model is tested on data with real rotations and
shows performance similar to that on synthetic rotations. The trained model is deployed
on an Android device and evaluated in real-time to test the accuracy of the
estimated yaw mounting angle. The model is shown to find the mounting angle at
an accuracy of 8 degrees within 5 seconds, and 4 degrees within 27 seconds. An
experiment is conducted to compare the proposed model with an existing
off-the-shelf solution.
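The label-generation step described above, rotating recorded IMU data by a random yaw within a bounded range, amounts to a rotation about the vertical axis. A minimal sketch, assuming the z axis is vertical and that the label equals the applied rotation angle (the paper's axis and sign conventions may differ; function names are hypothetical):

```python
import math
import random


def rotate_yaw(vec, yaw_deg):
    """Rotate a 3-axis reading (x, y, z) about the z axis by yaw_deg degrees."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    x, y, z = vec
    return (c * x - s * y, s * x + c * y, z)


def augment(accel, gyro, max_yaw_deg, rng=None):
    """Create one synthetic example: rotated accel/gyro windows plus a yaw label."""
    rng = rng or random.Random()
    yaw = rng.uniform(-max_yaw_deg, max_yaw_deg)
    return ([rotate_yaw(a, yaw) for a in accel],
            [rotate_yaw(g, yaw) for g in gyro],
            yaw)
```

Applying the same rotation to both sensors keeps the accelerometer and gyroscope windows physically consistent with a single mounting angle.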
[COMMENTS]
9 Pages, Preprint. Accepted IEEE
[LINK]
http://arxiv.org/abs/2212.11120v2
[DATE]
2024-04-11 01:15:23+08:00
[CATEGORIES]
cs.LG
A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks
[AUTHORS]
Neel Mishra, Bamdev Mishra, Pratik Jawanpuria, Pawan Kumar
[ABSTRACT]
A novel first-order method is proposed for training generative adversarial
networks (GANs). It modifies the Gauss-Newton method to approximate the min-max
Hessian and uses the Sherman-Morrison inversion formula to calculate the
inverse. The method corresponds to a fixed-point method that ensures necessary
contraction. To evaluate its effectiveness, numerical experiments are conducted
on various datasets commonly used in image generation tasks, such as MNIST,
Fashion MNIST, CIFAR10, FFHQ, and LSUN. Our method is capable of generating
high-fidelity images with greater diversity across multiple datasets. It also
achieves the highest inception score for CIFAR10 among all compared methods,
including state-of-the-art second-order methods. Additionally, its execution
time is comparable to that of first-order min-max methods.
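The Sherman-Morrison formula is what keeps such a method first-order in cost: a scaled identity plus a rank-one term can be inverted in O(n) without forming a matrix. A minimal sketch, assuming a `lam * I + u u^T` curvature approximation (a stand-in chosen for illustration, not the paper's exact min-max Hessian approximation; `preconditioned_step` is hypothetical):

```python
def sherman_morrison_solve(lam, u, v):
    """Solve (lam*I + u u^T) x = v in O(n).

    Sherman-Morrison: (lam*I + u u^T)^{-1} = I/lam - u u^T / (lam * (lam + u^T u)).
    """
    if lam <= 0:
        raise ValueError("lam must be positive for the system to be invertible")
    uu = sum(ui * ui for ui in u)
    uv = sum(ui * vi for ui, vi in zip(u, v))
    coeff = uv / (lam * (lam + uu))
    return [vi / lam - coeff * ui for ui, vi in zip(u, v)]


def matvec(lam, u, x):
    """Compute (lam*I + u u^T) x, useful for checking the solve."""
    ux = sum(ui * xi for ui, xi in zip(u, x))
    return [lam * xi + ui * ux for ui, xi in zip(u, x)]


def preconditioned_step(grad, lam, lr):
    """Hypothetical update: precondition the gradient by (lam*I + g g^T)^{-1}."""
    direction = sherman_morrison_solve(lam, grad, grad)
    return [-lr * d for d in direction]
```

With `u = v = g`, the direction simplifies to `g / (lam + ||g||^2)`, so large gradients are automatically damped.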
[COMMENTS]
accepted in IJCNN 2023, 9 pages
[LINK]
http://arxiv.org/abs/2404.07172v1
[DATE]
2024-04-11 01:08:46+08:00
[CATEGORIES]
cs.LG
Worst-Case Convergence Time of ML Algorithms via Extreme Value Theory
[AUTHORS]
Saeid Tizpaz-Niari, Sriram Sankaranarayanan
[ABSTRACT]
This paper leverages the statistics of extreme values to predict the
worst-case convergence times of machine learning algorithms. Timing is a
critical non-functional property of ML systems, and providing the worst-case
convergence times is essential to guarantee the availability of ML and its
services. However, timing properties such as worst-case convergence times
(WCCT) are difficult to verify since (1) they are not encoded in the syntax or
semantics of underlying programming languages of AI, (2) their evaluations
depend on both algorithmic implementations and underlying systems, and (3)
their measurements involve uncertainty and noise. Therefore, prevalent formal
methods and statistical models fail to provide rich information on the amounts
and likelihood of WCCT.
Our key observation is that the timing information we seek represents the
extreme tail of execution times. Therefore, extreme value theory (EVT), a
statistical discipline that focuses on understanding and predicting the
distribution of extreme values in the tail of outcomes, provides an ideal
framework to model and analyze WCCT in the training and inference phases of ML
paradigm. Building upon the mathematical tools from EVT, we propose a practical
framework to predict the worst-case timing properties of ML. Over a set of
linear ML training algorithms, we show that EVT achieves better accuracy for
predicting WCCTs than relevant statistical methods such as the Bayesian factor.
On the set of larger machine learning training algorithms and deep neural
network inference, we show the feasibility and usefulness of EVT models to
accurately predict WCCTs, their expected return periods, and their likelihood.
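A common EVT workflow is to take block maxima of measured times, fit an extreme-value distribution, and read off return levels. A minimal sketch, assuming a Gumbel (type I) fit by the method of moments; the paper's exact distribution family and fitting procedure are not given here, and the function names are hypothetical:

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def block_maxima(samples, block_size):
    """Maximum of each complete block of consecutive samples."""
    return [max(samples[i:i + block_size])
            for i in range(0, len(samples) - block_size + 1, block_size)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: mean = mu + gamma*beta, var = (pi*beta)^2 / 6."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi  # scale
    mu = statistics.mean(maxima) - EULER_GAMMA * beta          # location
    return mu, beta


def return_level(mu, beta, period):
    """Time exceeded on average once every `period` blocks."""
    if period <= 1:
        raise ValueError("return period must exceed 1")
    return mu - beta * math.log(-math.log(1 - 1 / period))
```

For example, with convergence times (in seconds) `[1.0, 1.2, 0.9, 1.5, 1.1, 1.3, 2.0, 1.0, 1.4, 1.6, 1.2, 1.1]` and blocks of 3, the fitted model yields a 100-block return level above the 10-block one, quantifying how rare a given worst case is expected to be.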
[COMMENTS]
In 3rd International Conference on AI Engineering: Software
Engineering for AI (CAIN 2024)
[LINK]
http://arxiv.org/abs/2404.07170v1
[DATE]
2024-04-11 01:05:12+08:00
[CATEGORIES]
cs.LG
Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System
[AUTHORS]
Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu
[ABSTRACT]
Machine Learning (ML) training on large-scale datasets is a very expensive
and time-consuming workload. Processor-centric architectures (e.g., CPU, GPU)
commonly used for modern ML training workloads are limited by the data movement
bottleneck, which arises from repeatedly accessing the training dataset. As a
result, processor-centric systems suffer from performance degradation and high
energy consumption. Processing-In-Memory (PIM) is a promising solution to
alleviate the data movement bottleneck by placing the computation mechanisms
inside or near memory.
Our goal is to understand the capabilities and characteristics of popular
distributed optimization algorithms on real-world PIM architectures to
accelerate data-intensive ML training workloads. To this end, we 1) implement
several representative centralized distributed optimization algorithms on
UPMEM’s real-world general-purpose PIM system, 2) rigorously evaluate these
algorithms for ML training on large-scale datasets in terms of performance,
accuracy, and scalability, 3) compare to conventional CPU and GPU baselines,
and 4) discuss implications for future PIM hardware and the need to shift to an
algorithm-hardware codesign perspective to accommodate decentralized
distributed optimization algorithms.
Our results demonstrate three major findings: 1) Modern general-purpose PIM
architectures can be a viable alternative to state-of-the-art CPUs and GPUs for
many memory-bound ML training workloads, when operations and datatypes are
natively supported by PIM hardware, 2) the importance of carefully choosing the
optimization algorithm that best fits PIM, and 3) contrary to popular belief,
contemporary PIM architectures do not scale approximately linearly with the
number of nodes for many data-intensive ML training workloads. To facilitate
future research, we aim to open-source our complete codebase.
[LINK]
http://arxiv.org/abs/2404.07164v1
[DATE]
2024-04-11 01:00:04+08:00
[CATEGORIES]
cs.LG
Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning
[AUTHORS]
Thomas Chen
[ABSTRACT]
We consider the scenario of supervised learning in Deep Learning (DL)
networks, and exploit the arbitrariness of choice in the Riemannian metric
relative to which the gradient descent flow can be defined (a general fact of
differential geometry). In the standard approach to DL, the gradient flow on
the space of parameters (weights and biases) is defined with respect to the
Euclidean metric. Here instead, we choose the gradient flow with respect to the
Euclidean metric in the output layer of the DL network. This naturally induces
two modified versions of the gradient descent flow in the parameter space, one
adapted for the overparametrized setting, and the other for the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform
exponential convergence rate; one thereby obtains an a priori stopping time for
any prescribed proximity to the global minimum. We point out relations of the
latter to sub-Riemannian geometry. Moreover, we generalize the above framework
to the situation in which the rank condition does not hold; in particular, we
show that local equilibria can only exist if a rank loss occurs, and that
generically, they are not isolated points, but elements of a critical
submanifold of parameter space.
[COMMENTS]
AMS Latex, 20 pages. Significantly edited and extended, abstract
changed
[LINK]
http://arxiv.org/abs/2311.15487v4
[DATE]
2024-04-11 00:55:52+08:00
[CATEGORIES]
cs.LG
A Large-Scale Exploration of $μ$-Transfer
[AUTHORS]
Lucas Lingle
[ABSTRACT]
Large neural network models have become a mainstay of natural language
processing and computer vision, yet their initialization and learning rates are
set in a largely heuristic fashion, potentially varying from paper to paper and
one model size to the next. The $\mu$-Parameterization ($\mu$P) offers a
potential solution to these challenges, yielding scaling rules for model
initialization and learning rates, and reportedly enabling zero-shot
hyperparameter transfer from small to large models in a variety of cases.
Despite the evident promise, the $\mu$P scaling rules are not yet widely
adopted, perhaps due to higher implementation complexity, many variations, or
complex theoretical background. This work investigates $\mu$P empirically,
focusing on the ubiquitous transformer architecture, and aims to answer a
simple question: does $\mu$-Transfer yield optimal learning rates in practice?
From models with 2M to 10B parameters, we show that $\mu$-Transfer works as
intended for the majority of important cases, but also identify some surprising
cases where it may not.
Our experiment codebase is available at
https://github.com/lucaslingle/mu_transformer/
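In practice, mu-Transfer means tuning a base learning rate on a narrow proxy model and rescaling it per parameter group as width grows. A minimal sketch of one common muP rule for Adam, assuming hidden (matrix-like) weights get their learning rate scaled by `base_width / width` while embeddings and vector-like parameters keep the base rate; the output layer is omitted because muP formulations treat it differently, and the group names are hypothetical:

```python
def mup_adam_lrs(base_lr, base_width, width):
    """Per-group Adam learning rates transferred from a tuned base width.

    Assumption: hidden weight matrices scale as base_width / width; embeddings,
    biases and normalization gains keep base_lr. Other muP variants differ.
    """
    ratio = base_width / width
    return {
        "embedding": base_lr,
        "hidden": base_lr * ratio,
        "vector_like": base_lr,  # biases and normalization gains
    }


# Tune base_lr at width 256, then reuse it unchanged at width 1024.
lrs = mup_adam_lrs(base_lr=1e-3, base_width=256, width=1024)
```

The question the paper studies is whether `base_lr` tuned at the small width remains optimal after this rescaling at much larger widths.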
[COMMENTS]
Improved formatting and added comparison with SP
[LINK]
http://arxiv.org/abs/2404.05728v2
[DATE]
2024-04-11 00:55:37+08:00
[CATEGORIES]
cs.LG
Exploring Physiological Responses in Virtual Reality-based Interventions for Autism Spectrum Disorder: A Data-Driven Investigation
[AUTHORS]
Gianpaolo Alvari, Ersilia Vallefuoco, Melanie Cristofolini, Elio Salvadori, Marco Dianti, Alessia Moltani, Davide Dal Castello, Paola Venuti, Cesare Furlanello
[ABSTRACT]
Virtual Reality (VR) has emerged as a promising tool for enhancing social
skills and emotional well-being in individuals with Autism Spectrum Disorder
(ASD). Through a technical exploration, this study employs a multiplayer
serious gaming environment within VR, engaging 34 individuals diagnosed with
ASD and employing high-precision biosensors for a comprehensive view of the
participants’ arousal and responses during the VR sessions. Participants were
subjected to a series of 3 virtual scenarios designed in collaboration with
stakeholders and clinical experts to promote socio-cognitive skills and
emotional regulation in a controlled and structured virtual environment. We
combined the framework with wearable non-invasive sensors for bio-signal
acquisition, focusing on the collection of heart rate variability and
respiratory patterns to monitor participants’ behaviors. Further, behavioral
assessments were conducted using observation and semi-structured interviews,
with the data analyzed in conjunction with physiological measures to identify
correlations and explore digital-intervention efficacy. Preliminary analysis
revealed significant correlations between physiological responses and
behavioral outcomes, indicating the potential of physiological feedback to
enhance VR-based interventions for ASD. The study demonstrated the feasibility
of using real-time data to adapt virtual scenarios, suggesting a promising
avenue to support personalized therapy. The integration of quantitative
physiological feedback into digital platforms represents a forward step in the
personalized intervention for ASD. By leveraging real-time data to adjust
therapeutic content, this approach promises to enhance the efficacy and
engagement of digital-based therapies.
[COMMENTS]
19 pages, 6 figures
[LINK]
http://arxiv.org/abs/2404.07159v1
[DATE]
2024-04-11 00:50:07+08:00
[CATEGORIES]
cs.LG
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach
[AUTHORS]
Yixuan Sun, Ololade Sowunmi, Romain Egele, Sri Hari Krishna Narayanan, Luke Van Roekel, Prasanna Balaprakash
[ABSTRACT]
Training an effective deep learning model to learn ocean processes involves
careful choices of various hyperparameters. We leverage advanced search
algorithms for multiobjective optimization in DeepHyper, a scalable
hyperparameter optimization software, to streamline the development of neural
networks tailored for ocean modeling. The focus is on optimizing Fourier neural
operators (FNOs), a data-driven model capable of simulating complex ocean
behaviors. Selecting the correct model and tuning the hyperparameters are
challenging tasks, requiring much effort to ensure model accuracy. DeepHyper
allows efficient exploration of hyperparameters associated with data
preprocessing, the FNO architecture, and various model
training strategies. We aim to obtain an optimal set of hyperparameters leading
to the most performant model. Moreover, on top of the commonly used mean
squared error for model training, we propose adopting the negative anomaly
correlation coefficient as the additional loss term to improve model
performance and investigate the potential trade-off between the two terms. The
experimental results show that the optimal set of hyperparameters enhanced
model performance in single-timestep forecasting and greatly exceeded the
baseline configuration in the autoregressive rollout for long-horizon
forecasting up to 30 days. Utilizing DeepHyper, we demonstrate an approach to
enhance the use of FNOs in ocean dynamics forecasting, offering a scalable
solution with improved precision.
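As a rough illustration of the combined objective described above, the sketch below adds the negative anomaly correlation coefficient (ACC) to the mean squared error. It assumes the common uncentered ACC definition (cosine similarity between forecast and observed anomalies relative to a climatology) and a hypothetical weight `acc_weight`; the exact ACC variant and weighting used in the paper are not stated in this abstract.

```python
import math
from typing import Sequence


def anomaly_correlation(pred: Sequence[float], target: Sequence[float],
                        climatology: Sequence[float]) -> float:
    """Uncentered ACC: cosine similarity of predicted and observed anomalies."""
    pred_anom = [p - c for p, c in zip(pred, climatology)]
    target_anom = [t - c for t, c in zip(target, climatology)]
    num = sum(a * b for a, b in zip(pred_anom, target_anom))
    den = math.sqrt(sum(a * a for a in pred_anom) * sum(b * b for b in target_anom))
    return num / den if den > 0 else 0.0  # undefined for all-zero anomalies; use 0


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred: Sequence[float], target: Sequence[float],
                  climatology: Sequence[float], acc_weight: float = 1.0) -> float:
    # Adding the *negative* ACC means minimizing the loss pushes ACC toward 1.
    # acc_weight is a hypothetical trade-off knob, not a value from the paper.
    return mse(pred, target) - acc_weight * anomaly_correlation(pred, target, climatology)


# A perfect forecast has MSE 0 and ACC 1, so the combined loss is -1.0.
print(combined_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

Varying `acc_weight` is one simple way to probe the trade-off between the two terms that the abstract mentions.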
[LINK]
http://arxiv.org/abs/2404.05768v2
[DATE]
2024-04-11 00:41:49+08:00
[CATEGORIES]
cs.LG
How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models
[AUTHORS]
Unnseo Park, Venkatesh Sivaraman, Adam Perer
[COMMENTS]
6 pages, 3 figures; accepted workshop paper at Time Series for Health
@ ICLR 2024
[LINK]
http://arxiv.org/abs/2404.07148v1
[DATE]
2024-04-11 00:29:21+08:00
[CATEGORIES]
cs.LG
Local Causal Discovery for Estimating Causal Effects
[AUTHORS]
Shantanu Gupta, David Childers, Zachary C. Lipton
[ABSTRACT]
Even when the causal graph underlying our data is unknown, we can use
observational data to narrow down the possible values that an average treatment
effect (ATE) can take by (1) identifying the graph up to a Markov equivalence
class; and (2) estimating that ATE for each graph in the class. While the PC
algorithm can identify this class under strong faithfulness assumptions, it can
be computationally prohibitive. Fortunately, only the local graph structure
around the treatment is required to identify the set of possible ATE values, a
fact exploited by local discovery algorithms to improve computational
efficiency. In this paper, we introduce Local Discovery using Eager Collider
Checks (LDECC), a new local causal discovery algorithm that leverages
unshielded colliders to orient the treatment’s parents differently from
existing methods. We show that there exist graphs where LDECC exponentially
outperforms existing local discovery algorithms and vice versa. Moreover, we
show that LDECC and existing algorithms rely on different faithfulness
assumptions, leveraging this insight to weaken the assumptions for identifying
the set of possible ATE values.
[COMMENTS]
Accepted at CLeaR 2023
[LINK]
http://arxiv.org/abs/2302.08070v4
[DATE]
2024-04-11 00:22:16+08:00
[CATEGORIES]
cs.LG
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
[AUTHORS]
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe
[ABSTRACT]
In-context learning is a powerful emergent ability in transformer models.
Prior work in mechanistic interpretability has identified a circuit element
that may be critical for in-context learning – the induction head (IH), which
performs a match-and-copy operation. During training of large transformers on
natural language data, IHs emerge around the same time as a notable phase
change in the loss. Despite the robust evidence for IHs and this interesting
coincidence with the phase change, relatively little is known about the
diversity and emergence dynamics of IHs. Why is there more than one IH, and how
are they dependent on each other? Why do IHs appear all of a sudden, and what
are the subcircuits that enable them to emerge? We answer these questions by
studying IH emergence dynamics in a controlled setting by training on synthetic
data. In doing so, we develop and share a novel optogenetics-inspired causal
framework for modifying activations throughout training. Using this framework,
we delineate the diverse and additive nature of IHs. By clamping subsets of
activations throughout training, we then identify three underlying subcircuits
that interact to drive IH formation, yielding the phase change. Furthermore,
these subcircuits shed light on data-dependent properties of formation, such as
phase change timing, already showing the promise of this more in-depth
understanding of subcircuits that need to “go right” for an induction head.
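The match-and-copy operation attributed to induction heads can be written, in idealized symbolic form, as: find an earlier occurrence of the current token and predict the token that followed it. This is a behavioral sketch only, assuming a plain list of tokens; it does not model attention weights or the subcircuits studied in the paper.

```python
from typing import Hashable, Optional, Sequence


def induction_predict(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Idealized induction-head rule: [... A B ... A] -> predict B."""
    if not tokens:
        return None
    current = tokens[-1]
    # Match: scan backwards for the most recent earlier occurrence of the current token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy: return the token that followed that earlier occurrence.
            return tokens[i + 1]
    return None  # no earlier match, so there is nothing to copy


print(induction_predict(["the", "cat", "sat", "on", "the"]))  # cat
```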
[COMMENTS]
26 pages, 18 figures
[LINK]
http://arxiv.org/abs/2404.07129v1
[DATE]
2024-04-11 00:07:38+08:00
[CATEGORIES]
cs.LG
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
[AUTHORS]
Sara Kangaslahti, David Alvarez-Melis
[ABSTRACT]
As large language models (LLMs) have gained popularity for a variety of use
cases, making them adaptable and controllable has become increasingly
important, especially for user-facing applications. While the existing
literature on LLM adaptation primarily focuses on finding a model (or models)
that optimizes a single predefined objective, here we focus on the challenging
case where the model must dynamically adapt to diverse – and often changing –
user preferences. For this, we leverage adaptation methods based on linear
weight interpolation, casting them as continuous multi-domain interpolators
that produce models with specific prescribed generation characteristics
on-the-fly. Specifically, we use low-rank updates to fine-tune a base model to
various domains, yielding a set of anchor models with distinct
generation profiles. Then, we use the weight updates of these anchor models to
parametrize the entire (infinite) class of models contained within their convex
hull. We empirically show that varying the interpolation weights yields
predictable and consistent changes in the model outputs with respect to all of
the controlled attributes. We find that there is little entanglement between
most attributes and identify and discuss the pairs of attributes for which this
is not the case. Our results suggest that linearly interpolating between the
weights of fine-tuned models facilitates predictable, fine-grained control of
model outputs with respect to multiple stylistic characteristics
simultaneously.
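A minimal sketch of linear weight interpolation inside the convex hull of anchor updates, assuming each anchor contributes a LoRA-style low-rank update (the product of factors B and A) to a single weight matrix W0, and that the interpolation weights are non-negative and sum to 1. The helper names, the plain nested lists, and the two toy "anchors" are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def low_rank_update(b: Matrix, a: Matrix) -> Matrix:
    """Dense update B @ A from low-rank factors B (d x r) and A (r x k)."""
    rank = len(a)
    return [[sum(b[i][t] * a[t][j] for t in range(rank)) for j in range(len(a[0]))]
            for i in range(len(b))]


def interpolate(base: Matrix, updates: Sequence[Matrix],
                weights: Sequence[float], tol: float = 1e-9) -> Matrix:
    """W = W0 + sum_i w_i * dW_i, with the weights on the probability simplex."""
    if len(updates) != len(weights):
        raise ValueError("one weight per anchor update is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1 (convex hull)")
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for update, w in zip(updates, weights):
        for i, row in enumerate(update):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


base = [[0.0, 0.0], [0.0, 0.0]]
formal = low_rank_update([[1.0], [0.0]], [[1.0, 0.0]])  # hypothetical anchor 1
simple = low_rank_update([[0.0], [2.0]], [[0.0, 1.0]])  # hypothetical anchor 2
print(interpolate(base, [formal, simple], [0.25, 0.75]))  # [[0.25, 0.0], [0.0, 1.5]]
```

Sliding the weights between anchors changes the merged matrix linearly, which is the property the abstract relies on for fine-grained control.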
[COMMENTS]
20 pages, 22 figures
[LINK]
http://arxiv.org/abs/2404.07117v1
[DATE]
2024-04-10 23:55:07+08:00
[CATEGORIES]
cs.CL
cs.LG
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
[AUTHORS]
Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen
[ABSTRACT]
In-context learning (ICL) is now a common method for teaching large language
models (LLMs) new tasks: given labeled examples in the input context, the LLM
learns to perform the task without weight updates. Do models guided via ICL
infer the underlying structure of the task defined by the context, or do they
rely on superficial heuristics that only generalize to identically distributed
examples? We address this question using transformation tasks and an NLI task
that assess sensitivity to syntax, a requirement for robust language
understanding. We further investigate whether out-of-distribution
generalization can be improved via chain-of-thought prompting, where the model
is provided with a sequence of intermediate computation steps that illustrate
how the task ought to be performed. In experiments with models from the GPT,
PaLM, and Llama 2 families, we find large variance across LMs. The variance is
explained more by the composition of the pre-training corpus and supervision
methods than by model size; in particular, models pre-trained on code
generalize better, and benefit more from chain-of-thought prompting.
[COMMENTS]
Accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2311.07811v2
[DATE]
2024-04-10 23:38:33+08:00
[CATEGORIES]
cs.CL
Dynamic Generation of Personalities with Large Language Models
[AUTHORS]
Jianzhi Liu, Hexiang Gu, Tianyu Zheng, Liuyu Xiang, Huijia Wu, Jie Fu, Zhaofeng He
[ABSTRACT]
In the realm of mimicking human deliberation, large language models (LLMs)
show promising performance, thereby amplifying the importance of this research
area. Deliberation is influenced by both logic and personality. However,
previous studies predominantly focused on the logic of LLMs, neglecting the
exploration of personality aspects. In this work, we introduce Dynamic
Personality Generation (DPG), a dynamic personality generation method based on
Hypernetworks. Initially, we embed the Big Five personality theory into GPT-4
to form a personality assessment machine, enabling it to evaluate characters’
personality traits from dialogues automatically. We propose a new metric to
assess personality generation capability based on this evaluation method. Then,
we use this personality assessment machine to evaluate dialogues in script
data, resulting in a personality-dialogue dataset. Finally, we fine-tune DPG on
the personality-dialogue dataset. Experiments show that, after fine-tuning on
this dataset, DPG’s personality generation capability is stronger than that of
traditional fine-tuning methods and surpasses prompt-based GPT-4.
[LINK]
http://arxiv.org/abs/2404.07084v1
[DATE]
2024-04-10 23:17:17+08:00
[CATEGORIES]
cs.CL
Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study
[AUTHORS]
Alessandro Stolfo
[ABSTRACT]
We present an empirical study of groundedness in long-form question answering
(LFQA) by retrieval-augmented large language models (LLMs). In particular, we
evaluate whether every generated sentence is grounded in the retrieved
documents or the model’s pre-training data. Across 3 datasets and 4 model
families, our findings reveal that a significant fraction of generated
sentences are consistently ungrounded, even when those sentences contain
correct ground-truth answers. Additionally, we examine the impacts of factors
such as model size, decoding strategy, and instruction tuning on groundedness.
Our results show that while larger models tend to ground their outputs more
effectively, a significant portion of correct answers remains compromised by
hallucinations. This study provides novel insights into the groundedness
challenges in LFQA and underscores the necessity for more robust mechanisms in
LLMs to mitigate the generation of ungrounded content.
[COMMENTS]
NAACL 2024 (Findings)
[LINK]
http://arxiv.org/abs/2404.07060v1
[DATE]
2024-04-10 22:50:10+08:00
[CATEGORIES]
cs.CL
cs.LG
Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation
[AUTHORS]
Elisa Sanchez-Bayona, Rodrigo Agerri
[ABSTRACT]
Metaphors, although occasionally unperceived, are ubiquitous in our everyday
language. Thus, it is crucial for Language Models to be able to grasp the
underlying meaning of this kind of figurative language. In this work, we
present Meta4XNLI, a novel parallel dataset for the tasks of metaphor detection
and interpretation that contains metaphor annotations in both Spanish and
English. We investigate language models’ metaphor identification and
understanding abilities through a series of monolingual and cross-lingual
experiments by leveraging our proposed corpus. In order to comprehend how these
non-literal expressions affect models’ performance, we examine the results
and perform an error analysis. Additionally, parallel data offers many
potential opportunities to investigate metaphor transferability between these
languages and the impact of translation on the development of multilingual
annotated resources.
[LINK]
http://arxiv.org/abs/2404.07053v1
[DATE]
2024-04-10 22:44:48+08:00
[CATEGORIES]
cs.CL
cs.LG
A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media
[AUTHORS]
Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver, Senja Pollak
[COMMENTS]
The first authors have contributed equally. Accepted at LREC-COLING
[LINK]
http://arxiv.org/abs/2404.07036v1
[DATE]
2024-04-10 22:28:09+08:00
[CATEGORIES]
cs.CL
Characterizing and Classifying Developer Forum Posts with their Intentions
[AUTHORS]
Xingfang Wu, Eric Laufer, Heng Li, Foutse Khomh, Santhosh Srinivasan, Jayden Luo
[ABSTRACT]
With the rapid growth of the developer community, the number of posts on
online technical forums has increased sharply, making it difficult for
users to filter useful posts and find important information. Tags provide a
concise feature dimension for users to locate posts of interest and for
search engines to index the most relevant posts according to the queries.
However, most tags focus only on the technical perspective (e.g., programming
language, platform, tool). In most cases, forum posts in online developer
communities reveal the author’s intentions to solve a problem, ask for advice,
share information, etc. The modeling of the intentions of posts can provide an
extra dimension to the current tag taxonomy. By referencing previous studies
and learning from industrial perspectives, we create a refined taxonomy for the
intentions of technical forum posts. Through manual labeling and analysis on a
sampled post dataset extracted from online forums, we examine the relationship
between the composition of posts (code, error messages) and their intentions.
Furthermore, inspired by our manual study, we design a pre-trained
transformer-based model to automatically predict post intentions. The best
variant of our intention prediction framework, which achieves a Micro F1-score
of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787,
outperforms the state-of-the-art baseline approach. Our characterization and
automated classification of forum posts regarding their intentions may help
forum maintainers or third-party tool developers improve the organization and
retrieval of posts on technical forums. We have released our annotated dataset
and code in our supplementary material package.
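For reference, Top-k accuracy (as in the Top 1-3 figures reported above) counts a post as correct when its gold intention appears among the model's k highest-ranked predictions. A minimal sketch, using hypothetical intention labels rather than the paper's taxonomy:

```python
from typing import Hashable, Sequence


def top_k_accuracy(ranked_preds: Sequence[Sequence[Hashable]],
                   gold: Sequence[Hashable], k: int) -> float:
    """Fraction of examples whose gold label is in the first k ranked predictions."""
    if k < 1 or not gold or len(ranked_preds) != len(gold):
        raise ValueError("need k >= 1 and one ranking per gold label")
    hits = sum(label in preds[:k] for preds, label in zip(ranked_preds, gold))
    return hits / len(gold)


# Hypothetical intention labels, ranked by model confidence for three posts.
ranked = [["ask_advice", "solve_problem", "share_info"],
          ["solve_problem", "ask_advice", "share_info"],
          ["share_info", "solve_problem", "ask_advice"]]
gold = ["ask_advice", "ask_advice", "ask_advice"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, gold, k))
```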
[COMMENTS]
Journal of Empirical Software Engineering, 40 pages
[LINK]
http://arxiv.org/abs/2312.14279v2
[DATE]
2024-04-10 22:25:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Using Persuasive Writing Strategies to Explain and Detect Health Misinformation
[AUTHORS]
Danial Kamali, Joseph Romain, Huiyi Liu, Wei Peng, Jingbo Meng, Parisa Kordjamshidi
[COMMENTS]
Accepted at LREC-CoLING-2024
[LINK]
http://arxiv.org/abs/2211.05985v4
[DATE]
2024-04-10 22:13:29+08:00
[CATEGORIES]
cs.CL
cs.LG
Improving Language Model Reasoning with Self-motivated Learning
[AUTHORS]
Yunlong Feng, Yang Xu, Libo Qin, Yasheng Wang, Wanxiang Che
[COMMENTS]
Accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.07017v1
[DATE]
2024-04-10 22:05:44+08:00
[CATEGORIES]
cs.CL
A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification
[AUTHORS]
Yunlong Feng, Bohan Li, Libo Qin, Xiao Xu, Wanxiang Che
[COMMENTS]
Accepted at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2304.09820v2
[DATE]
2024-04-10 22:03:01+08:00
[CATEGORIES]
cs.CL
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
[AUTHORS]
Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita
[ABSTRACT]
We present the LM Transparency Tool (LM-TT), an open-source interactive
toolkit for analyzing the internal workings of Transformer-based language
models. Unlike previously existing tools that focus on isolated parts
of the decision-making process, our framework is designed to make the entire
prediction process transparent, and allows tracing back model behavior from the
top-layer representation to very fine-grained parts of the model. Specifically,
it (1) shows the important part of the whole input-to-output information flow,
(2) allows attributing any changes done by a model block to individual
attention heads and feed-forward neurons, and (3) allows interpreting the functions
of those heads or neurons. A crucial part of this pipeline is showing the
importance of specific model components at each step. As a result, we are able
to look at the roles of model components only in cases where they are important
for a prediction. Since knowing which components should be inspected is key for
analyzing large models where the number of these components is extremely high,
we believe our tool will greatly support the interpretability community both in
research settings and in practical applications.
[LINK]
http://arxiv.org/abs/2404.07004v1
[DATE]
2024-04-10 21:39:11+08:00
[CATEGORIES]
cs.CL
XNLIeu: a dataset for cross-lingual NLI in Basque
[AUTHORS]
Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa
[ABSTRACT]
XNLI is a popular Natural Language Inference (NLI) benchmark widely used to
evaluate cross-lingual Natural Language Understanding (NLU) capabilities across
languages. In this paper, we expand XNLI to include Basque, a low-resource
language that can greatly benefit from transfer-learning approaches. The new
dataset, dubbed XNLIeu, has been developed by first machine-translating the
English XNLI corpus into Basque, followed by a manual post-editing step. We
have conducted a series of experiments using mono- and multilingual LLMs to
assess a) the effect of professional post-editing on the MT system; b) the best
cross-lingual strategy for NLI in Basque; and c) whether the choice of the best
cross-lingual strategy is influenced by the fact that the dataset is built by
translation. The results show that post-editing is necessary and that the
translate-train cross-lingual strategy obtains better results overall, although
the gain is lower when tested on a dataset that has been built natively from
scratch. Our code and datasets are publicly available under open licenses.
[COMMENTS]
Accepted to NAACL 2024
[LINK]
http://arxiv.org/abs/2404.06996v1
[DATE]
2024-04-10 21:19:56+08:00
[CATEGORIES]
cs.CL
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
[AUTHORS]
Muhammad U. Nasir, Sam Earle, Julian Togelius, Steven James, Christopher Cleghorn
[ABSTRACT]
Large Language Models (LLMs) have emerged as powerful tools capable of
accomplishing a broad spectrum of tasks. Their abilities span numerous areas,
and one area where they have made a significant impact is in the domain of code
generation. Here, we propose using the coding abilities of LLMs to introduce
meaningful variations to code defining neural networks. Meanwhile,
Quality-Diversity (QD) algorithms are known to discover diverse and robust
solutions. By merging the code-generating abilities of LLMs with the diversity
and robustness of QD solutions, we introduce LLMatic, a Neural
Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS
directly through prompts, LLMatic uses a procedural approach,
leveraging QD for prompts and network architecture to create diverse and
high-performing networks. We test LLMatic on the CIFAR-10 and
NAS-Bench-201 benchmarks, demonstrating that it can produce competitive
networks while evaluating just 2,000 candidates, even without prior knowledge
of the benchmark domain or exposure to any previous top-performing models for
the benchmark. The open-source code is available at
https://github.com/umair-nasir14/LLMatic.
[COMMENTS]
Accepted to The Genetic and Evolutionary Computation Conference 2024
[LINK]
http://arxiv.org/abs/2306.01102v7
[DATE]
2024-04-10 21:18:37+08:00
[CATEGORIES]
cs.CL
Quality and Quantity of Machine Translation References for Automatic Metrics
[AUTHORS]
Vilém Zouhar, Ondřej Bojar
[ABSTRACT]
Automatic machine translation metrics typically rely on human translations to
determine the quality of system translations. Common wisdom in the field
dictates that the human references should be of very high quality. However,
there are no cost-benefit analyses that could be used to guide practitioners
who plan to collect references for machine translation evaluation. We find that
higher-quality references lead to better metric correlations with humans at the
segment-level. Having up to 7 references per segment and taking their average
(or maximum) helps all metrics. Interestingly, the references from vendors of
different qualities can be mixed together and improve metric success.
Higher-quality references, however, cost more to create, and we frame this as an
optimization problem: given a specific budget, what references should be
collected to maximize metric success. These findings can be used by evaluators
of shared tasks when references need to be created under a certain budget.
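A minimal sketch of scoring one system translation against several references and aggregating by mean or maximum, as described above. The unigram-overlap F1 here is only a toy stand-in chosen for illustration; any real segment-level metric with the same `(hypothesis, reference) -> float` shape could be passed as `metric`.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy stand-in metric: F1 of overlapping whitespace tokens."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hyp: str, refs: Sequence[str],
                          metric: Callable[[str, str], float] = unigram_f1,
                          aggregate: str = "mean") -> float:
    """Score a hypothesis against each reference, then average or take the max."""
    if not refs:
        raise ValueError("at least one reference is required")
    scores = [metric(hyp, ref) for ref in refs]
    if aggregate == "mean":
        return mean(scores)
    if aggregate == "max":
        return max(scores)
    raise ValueError("aggregate must be 'mean' or 'max'")


refs = ["the cat sat", "a dog ran"]
print(multi_reference_score("the cat sat", refs))                   # 0.5
print(multi_reference_score("the cat sat", refs, aggregate="max"))  # 1.0
```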
[LINK]
http://arxiv.org/abs/2401.01283v5
[DATE]
2024-04-10 21:05:42+08:00
[CATEGORIES]
cs.CL
Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning
[AUTHORS]
Peipei Liu, Gaosheng Wang, Ying Tong, Jian Liang, Zhenquan Ding, Hongsong Zhu
[ABSTRACT]
Few-shot named entity recognition can identify new types of named entities
based on a few labeled examples. Previous methods employing token-level or
span-level metric learning suffer from a heavy computational burden and a large
number of negative sample spans. In this paper, we propose the Hybrid
Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning
(MsFNER), which splits general NER into two stages: entity-span detection
and entity classification. MsFNER involves three processes: training,
finetuning, and inference. In the training process, we separately train and
obtain the best entity-span detection model and entity classification model
on the source domain using meta-learning, where we create a contrastive
learning module to enhance entity representations for entity
classification. During finetuning, we finetune both models on the support
dataset of the target domain. In the inference process, for the unlabeled data,
we first detect the entity spans and then jointly classify them with the
entity classification model and the KNN. We conduct experiments on the open
FewNERD dataset, and the results demonstrate the advantages of MsFNER.
[LINK]
http://arxiv.org/abs/2404.06970v1
[DATE]
2024-04-10 20:31:09+08:00
[CATEGORIES]
cs.CL
Charles Translator: A Machine Translation System between Ukrainian and Czech
[AUTHORS]
Martin Popel, Lucie Poláková, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomáš Krabač, Jaroslava Hlaváčová, Mariia Anisimova, Tereza Chlaňová
[ABSTRACT]
We present Charles Translator, a machine translation system between Ukrainian
and Czech, developed as part of a society-wide effort to mitigate the impact of
the Russian-Ukrainian war on individuals and society. The system was developed
in the spring of 2022 with the help of many language data providers in order to
quickly meet the demand for such a service, which was not available at the time
in the required quality. The translator was later implemented as an online web
interface and as an Android app with speech input, both featuring
Cyrillic-Latin script transliteration. Unlike other available systems, which
use English as a pivot, the system translates directly and thus takes advantage
of the typological similarity of the two languages. It uses the block
back-translation method, which allows for efficient use of monolingual training
data. The paper describes the development process (including data collection
and implementation) and evaluation, mentions several use cases, and outlines
possibilities for the further development of the system for educational
purposes.
[LINK]
http://arxiv.org/abs/2404.06964v1
[DATE]
2024-04-10 20:22:32+08:00
[CATEGORIES]
cs.CL
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
[AUTHORS]
Yijin Liu, Fandong Meng, Jie Zhou
[ABSTRACT]
Recently, dynamic computation methods have shown notable acceleration for
Large Language Models (LLMs) by skipping several layers of computations through
elaborate heuristics or additional predictors. However, in the decoding process
of existing approaches, different samples are assigned different computational
budgets, which cannot guarantee a stable and precise acceleration effect.
Furthermore, existing approaches generally skip multiple contiguous layers at
the bottom or top of the layers, leading to a drastic change in the model’s
layer-wise representations and thus a consequent performance degradation.
Therefore, we propose a Unified Layer Skipping strategy, which selects the
number of layers to skip computation based solely on the target speedup ratio,
and then skips the corresponding number of intermediate layer computations in a
balanced manner. Since the Unified Layer Skipping strategy is independent of
input samples, it naturally supports popular acceleration techniques such as
batch decoding and KV caching, thus demonstrating more practicality for
real-world applications. Experimental results on two common tasks, i.e.,
machine translation and text summarization, indicate that given a target
speedup ratio, the Unified Layer Skipping strategy significantly enhances both
the inference performance and the actual model throughput over existing dynamic
approaches.
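The sketch below is one hypothetical way to derive an input-independent skipping schedule from a target speedup ratio. It assumes decoding cost is proportional to the number of executed layers, always keeps the bottom and top layers, and spreads the kept layers evenly so that the skipped ones are intermediate and balanced; the paper's exact selection rule may differ. Because the schedule depends only on `num_layers` and `speedup`, every sample in a batch runs the same layers, which is why batch decoding and KV caching remain straightforward.

```python
from typing import List


def layers_to_execute(num_layers: int, speedup: float) -> List[int]:
    """Hypothetical input-independent schedule: indices of layers to run."""
    if num_layers < 2 or speedup < 1.0:
        raise ValueError("need at least 2 layers and a speedup >= 1")
    # Assumption: cost ~ number of executed layers, so keep about L / speedup.
    keep = max(2, round(num_layers / speedup))
    if keep >= num_layers:
        return list(range(num_layers))
    # Evenly spaced positions from the bottom (0) to the top (L - 1) layer;
    # the spacing exceeds 1, so rounded indices never collide.
    step = (num_layers - 1) / (keep - 1)
    return sorted({round(i * step) for i in range(keep)})


print(layers_to_execute(12, 2.0))  # [0, 2, 4, 7, 9, 11]
```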
[COMMENTS]
12 pages, codes at https://github.com/Adaxry/Unified_Layer_Skipping
[LINK]
http://arxiv.org/abs/2404.06954v1
[DATE]
2024-04-10 20:12:07+08:00
[CATEGORIES]
cs.CL
GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism
[AUTHORS]
Shuzhou Yuan, Michael Färber
[ABSTRACT]
Pretrained Language Models (PLMs) benefit from external knowledge stored in
graph structures for various downstream tasks. However, bridging the modality
gap between graph structures and text remains a significant challenge.
Traditional methods like linearizing graphs for PLMs lose vital graph
connectivity, whereas Graph Neural Networks (GNNs) require cumbersome processes
for integration into PLMs. In this work, we propose a novel graph-guided
self-attention mechanism, GraSAME. GraSAME seamlessly incorporates token-level
structural information into PLMs without necessitating additional alignment or
concatenation efforts. As an end-to-end, lightweight multimodal module, GraSAME
follows a multi-task learning strategy and effectively bridges the gap between
graph and textual modalities, facilitating dynamic interactions between GNNs
and PLMs. Our experiments on the graph-to-text generation task demonstrate that
GraSAME outperforms baseline models and achieves results comparable to
state-of-the-art (SOTA) models on WebNLG datasets. Furthermore, compared to
SOTA models, GraSAME eliminates the need for extra pre-training tasks to adjust
graph inputs and reduces the number of trainable parameters by over 100
million.
[COMMENTS]
NAACL 2024 Findings
[LINK]
http://arxiv.org/abs/2404.06911v1
[DATE]
2024-04-10 19:03:57+08:00
[CATEGORIES]
cs.CL
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
[AUTHORS]
Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi
[ABSTRACT]
Despite the successes of large language models (LLMs), they exhibit
significant drawbacks, particularly when processing long contexts. Their
inference cost scales quadratically with respect to sequence length, making it
expensive for deployment in some real-world text processing applications, such
as retrieval-augmented generation (RAG). Additionally, LLMs also exhibit the
“distraction phenomenon,” where irrelevant context in the prompt degrades
output quality. To address these drawbacks, we propose a novel RAG prompting
methodology, superposition prompting, which can be directly applied to
pre-trained transformer-based LLMs without the need for fine-tuning. At a high
level, superposition prompting allows the LLM to process input documents in
parallel prompt paths, discarding paths once they are deemed irrelevant. We
demonstrate the capability of our method to simultaneously enhance time
efficiency across a variety of question-answering benchmarks using multiple
pre-trained LLMs. Furthermore, our technique significantly improves accuracy
when the retrieved context is large relative to the context the model was trained
on. For example, our approach facilitates a 93x reduction in compute time
while improving accuracy by 43% on the NaturalQuestions-Open dataset with the
MPT-7B instruction-tuned model over naive RAG.
[LINK]
http://arxiv.org/abs/2404.06910v1
[DATE]
2024-04-10 19:03:17+08:00
[CATEGORIES]
cs.CL
cs.LG
Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models
[AUTHORS]
Feihu Jiang, Chuan Qin, Kaichun Yao, Chuyu Fang, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong
[ABSTRACT]
Efficient knowledge management plays a pivotal role in augmenting both the
operational efficiency and the innovative capacity of businesses and
organizations. By indexing knowledge through vectorization, a variety of
knowledge retrieval methods have emerged, significantly enhancing the efficacy
of knowledge management systems. Recently, rapid advancements in generative
natural language processing technologies have paved the way for generating precise
and coherent answers after retrieving relevant documents tailored to user
queries. However, for enterprise knowledge bases, assembling extensive training
data from scratch for knowledge retrieval and generation is a formidable
challenge due to the privacy and security policies of private data, frequently
entailing substantial costs. To address the challenge above, in this paper, we
propose EKRG, a novel Retrieval-Generation framework based on large language
models (LLMs), expertly designed to enable question-answering for Enterprise
Knowledge bases with limited annotation costs. Specifically, for the retrieval
process, we first introduce an instruction-tuning method using an LLM to
generate sufficient document-question pairs for training a knowledge retriever.
This method, through carefully designed instructions, efficiently generates
diverse questions for enterprise knowledge bases, encompassing both
fact-oriented and solution-oriented knowledge. Additionally, we develop a
relevance-aware teacher-student learning strategy to further enhance the
efficiency of the training process. For the generation process, we propose a
novel chain of thought (CoT) based fine-tuning method to empower the LLM-based
generator to adeptly respond to user questions using retrieved documents.
Finally, extensive experiments on real-world datasets have demonstrated the
effectiveness of our proposed framework.
[COMMENTS]
DASFAA 2024 Accepted
[LINK]
http://arxiv.org/abs/2404.08695v1
[DATE]
2024-04-10 18:38:17+08:00
[CATEGORIES]
cs.CL
Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata
[AUTHORS]
Jinghong Chen, Weizhe Lin, Jingbiao Mei, Bill Byrne
[ABSTRACT]
The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model
that performs well in Neural Machine Translation. Two issues prevent its
application to general Natural Language Generation (NLG) tasks: frequent
Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity
names. We introduce Control-DAG, a constrained decoding algorithm for our
Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length
control. We show that Control-DAG significantly enhances DA-T5 on the Schema
Guided Dialogue and the DART datasets, establishing strong NAR results for
Task-Oriented Dialogue and Data-to-Text NLG.
[COMMENTS]
11 pages. NAACL 2024
[LINK]
http://arxiv.org/abs/2404.06854v1
[DATE]
2024-04-10 17:28:14+08:00
[CATEGORIES]
cs.CL
Simpler becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?
[AUTHORS]
Miriam Anschütz, Edoardo Mosca, Georg Groh
[ABSTRACT]
Text simplification seeks to improve readability while retaining the original
content and meaning. Our study investigates whether pre-trained classifiers
also maintain such coherence by comparing their predictions on both original
and simplified inputs. We conduct experiments using 11 pre-trained models,
including BERT and OpenAI’s GPT-3.5, across six datasets spanning three
languages. Additionally, we conduct a detailed analysis of the correlation
between prediction change rates and simplification types/strengths. Our
findings reveal alarming inconsistencies across all languages and models. If
not promptly addressed, simplified inputs can be easily exploited to craft
zero-iteration model-agnostic adversarial attacks with success rates of up to
50%.
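The prediction change rate analyzed above can be illustrated with a minimal sketch, assuming it is the fraction of paired inputs whose predicted label differs between the original and the simplified version; the function name and the toy labels are hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence


def prediction_change_rate(original_preds: Sequence[str],
                           simplified_preds: Sequence[str]) -> float:
    """Fraction of paired inputs whose predicted label changes after simplification."""
    if len(original_preds) != len(simplified_preds):
        raise ValueError("prediction lists must be aligned pairwise")
    if not original_preds:
        return 0.0
    changed = sum(o != s for o, s in zip(original_preds, simplified_preds))
    return changed / len(original_preds)


# Toy sentiment labels (invented, not data from the paper).
orig = ["pos", "neg", "pos", "neu"]
simp = ["pos", "pos", "neg", "neu"]
print(prediction_change_rate(orig, simp))  # 0.5
```

A coherent classifier would give a rate near zero, since simplification is meant to preserve meaning.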
[COMMENTS]
Published at DeTermIt! Workshop at LREC-COLING 2024
[LINK]
http://arxiv.org/abs/2404.06838v1
[DATE]
2024-04-10 17:02:33+08:00
[CATEGORIES]
cs.CL
Emotion-cause pair extraction method based on multi-granularity information and multi-module interaction
[AUTHORS]
Mingrui Fu, Weijiang Li
[ABSTRACT]
Emotion-cause pair extraction aims to extract pairs of emotion clauses and
cause clauses. On the one hand, existing methods do not fully account for the
relationship between the two auxiliary tasks of emotion extraction and cause
extraction. On the other hand, existing two-stage models suffer from error
propagation. In addition, existing models do not adequately address the
positional imbalance of samples induced by emotion and cause clauses. To solve
these problems, we propose an end-to-end multi-task model (MM-ECPE) based on
shared interaction between GRU, knowledge graph, and transformer modules.
Furthermore, building on MM-ECPE, and to let the encoder layer better handle the
imbalanced distribution of distances between emotion clauses and other clauses,
we propose MM-ECPE(BERT), an emotion-cause pair extraction model with a novel
encoding based on BERT, a sentiment lexicon, and a position-aware interaction
module layer. The model first models the interaction between different tasks
through a multi-level sharing module, mining the information shared among
emotion-cause pair extraction, emotion extraction, and cause extraction.
Second, to address the imbalanced distribution of emotion clauses and cause
clauses, suitable labels are screened out according to knowledge-graph path
length, and task-specific features are constructed so that the model can focus
on extracting pairs with corresponding emotion-cause relationships.
Experimental results on the ECPE benchmark dataset show that the proposed model
achieves good performance, especially on position-imbalanced samples.
[LINK]
http://arxiv.org/abs/2404.06812v1
[DATE]
2024-04-10 16:00:26+08:00
[CATEGORIES]
cs.CL
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation
[AUTHORS]
Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun
[ABSTRACT]
The rapid development of large language models has led to the widespread
adoption of Retrieval-Augmented Generation (RAG), which integrates external
knowledge to alleviate knowledge bottlenecks and mitigate hallucinations.
However, the existing RAG paradigm inevitably suffers from the impact of flawed
information introduced during the retrieval phase, thereby diminishing the
reliability and correctness of the generated outcomes. In this paper, we
propose Credibility-aware Generation (CAG), a universally applicable framework
designed to mitigate the impact of flawed information in RAG. At its core, CAG
aims to equip models with the ability to discern and process information based
on its credibility. To this end, we propose an innovative data transformation
framework that generates data based on credibility, thereby effectively
endowing models with the capability of CAG. Furthermore, to accurately evaluate
the models’ capabilities of CAG, we construct a comprehensive benchmark
covering three critical real-world scenarios. Experimental results demonstrate
that our model can effectively understand and utilize credibility for
generation, significantly outperform other models with retrieval augmentation,
and exhibit resilience against the disruption caused by noisy documents,
thereby maintaining robust performance. Moreover, our model supports customized
credibility, offering a wide range of potential applications.
[COMMENTS]
Our code, benchmark, and models are available at
https://github.com/panruotong/CAG
[LINK]
http://arxiv.org/abs/2404.06809v1
[DATE]
2024-04-10 15:56:26+08:00
[CATEGORIES]
cs.CL
Retrieval Augmented Generation using Engineering Design Knowledge
[AUTHORS]
L Siddharth, Jianxi Luo
[ABSTRACT]
Large language models (LLMs) need to adopt Retrieval-Augmented Generation
(RAG) to generate factual responses that are better suited to knowledge-based
applications in the design process. We present a data-driven method to identify
explicit facts of the form head entity :: relationship :: tail entity from
patented artefact descriptions. We train RoBERTa Transformer-based sequence
classification models using our proprietary dataset of 44,227 sentences. Upon
classifying tokens in a sentence as entities or relationships, our method uses
another classifier to identify specific relationship tokens for a given pair of
entities. We compare the performances against linear classifiers and Graph
Neural Networks (GNNs) that both incorporate BERT Transformer-based token
embeddings to predict associations among the entities and relationships. We
apply our method to 4,870 fan-system-related patents and populate a knowledge
base comprising around 3 million facts. Using the knowledge base, we
demonstrate retrieving generalisable and specific domain knowledge for
contextualising LLMs.
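Facts of the form head entity :: relationship :: tail entity lend themselves to a simple entity-indexed store for retrieval. The sketch below parses such strings and returns every fact mentioning an entity, rendered as plain text for an LLM prompt. The `Fact`/`KnowledgeBase` names, the exact-match lookup, and the example facts are hypothetical illustrations; the authors extract facts with trained classifiers, which this sketch does not reproduce.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Fact(NamedTuple):
    head: str
    relation: str
    tail: str


def parse_fact(line: str) -> Fact:
    """Parse a 'head :: relationship :: tail' string into a Fact."""
    parts = [p.strip() for p in line.split("::")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed fact: {line!r}")
    return Fact(*parts)


class KnowledgeBase:
    def __init__(self) -> None:
        # Index every fact under each entity it mentions (head and tail).
        self._by_entity: Dict[str, List[Fact]] = defaultdict(list)

    def add(self, fact: Fact) -> None:
        self._by_entity[fact.head.lower()].append(fact)
        if fact.tail.lower() != fact.head.lower():
            self._by_entity[fact.tail.lower()].append(fact)

    def retrieve(self, entity: str) -> List[Fact]:
        """Exact (case-insensitive) entity lookup; returns a copy of the list."""
        return list(self._by_entity.get(entity.lower(), []))


def to_context(facts: List[Fact]) -> str:
    """Render retrieved facts as prompt lines in the same triple format."""
    return "\n".join(f"{f.head} :: {f.relation} :: {f.tail}" for f in facts)


# Invented example facts, not taken from the patent knowledge base.
kb = KnowledgeBase()
for line in ["impeller :: mounted on :: shaft", "shaft :: driven by :: motor"]:
    kb.add(parse_fact(line))
print(to_context(kb.retrieve("shaft")))
```

A real system would likely add fuzzy or embedding-based entity matching; exact lookup keeps the sketch short.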
[LINK]
http://arxiv.org/abs/2307.06985v6
[DATE]
2024-04-10 15:51:22+08:00
[CATEGORIES]
cs.CL
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
[AUTHORS]
Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen
[COMMENTS]
NAACL 2024
[LINK]
http://arxiv.org/abs/2404.06480v2
[DATE]
2024-04-10 15:40:56+08:00
[CATEGORIES]
cs.CL
DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space
[AUTHORS]
Jianxiang Xiang, Zhenhua Liu, Haodong Liu, Yin Bai, Jia Cheng, Wenliang Chen
[ABSTRACT]
In real-life conversations, content is diverse, and the one-to-many problem
(a single context admits many valid responses) requires diverse generation.
Previous studies attempted to introduce discrete or Gaussian-based continuous
latent variables to address the one-to-many problem, but the resulting
diversity is limited. Recently, diffusion models have made breakthroughs in
computer vision, and some attempts have been made to apply them in natural
language processing. In this paper, we propose DiffusionDialog, a novel
approach that enhances the diversity of dialogue generation with the help of a
diffusion model. In our approach, we introduce continuous latent variables into
the diffusion model. The challenge of using latent variables in dialog tasks is
to build both an effective prior over the latent space and an inference process
that obtains the proper latent given the context. By combining the encoder and
a latent-based diffusion model, we encode the response’s latent representation
in a continuous space as the prior, instead of a fixed Gaussian distribution or
simple discrete ones. We then infer the latent by denoising step by step with
the diffusion model. The experimental results show that our model greatly
enhances the diversity of dialog responses while maintaining coherence.
Furthermore, we find that our diffusion model achieves high inference
efficiency, which is the main challenge of applying diffusion models in natural
language processing.
[COMMENTS]
LREC-COLING 2024 camera ready
[LINK]
http://arxiv.org/abs/2404.06760v1
[DATE]
2024-04-10 13:56:46+08:00
[CATEGORIES]
cs.CL
Language Generation in the Limit
[AUTHORS]
Jon Kleinberg, Sendhil Mullainathan
[ABSTRACT]
Although current large language models are complex, the most basic
specifications of the underlying language generation problem itself are simple
to state: given a finite set of training samples from an unknown language,
produce valid new strings from the language that don’t already appear in the
training data. Here we ask what we can conclude about language generation using
only this specification, without further assumptions. In particular, suppose
that an adversary enumerates the strings of an unknown target language L that
is known only to come from one of a possibly infinite list of candidates. A
computational agent is trying to learn to generate from this language; we say
that the agent generates from L in the limit if after some finite point in the
enumeration of L, the agent is able to produce new elements that come
exclusively from L and that have not yet been presented by the adversary. Our
main result is that there is an agent that is able to generate in the limit for
every countable list of candidate languages. This contrasts dramatically with
negative results due to Gold and Angluin in a well-studied model of language
learning where the goal is to identify an unknown language from samples; the
difference between these results suggests that identifying a language is a
fundamentally different problem than generating from it.
[COMMENTS]
24 pages, 2 figures
[LINK]
http://arxiv.org/abs/2404.06757v1
[DATE]
2024-04-10 13:53:25+08:00
[CATEGORIES]
cs.CL
cs.LG
FPT: Feature Prompt Tuning for Few-shot Readability Assessment
[AUTHORS]
Ziyang Wang, Sanwoo Lee, Hsiu-Yuan Huang, Yunfang Wu
[ABSTRACT]
Prompt-based methods have achieved promising results in most few-shot text
classification tasks. However, for readability assessment tasks, traditional
prompt methods lack crucial linguistic knowledge, which has already been proven
to be essential. Moreover, previous studies on utilizing linguistic features
have shown non-robust performance in few-shot settings and may even impair
model performance. To address these issues, we propose a novel prompt-based
tuning framework that incorporates rich linguistic knowledge, called Feature
Prompt Tuning (FPT). Specifically, we extract linguistic features from the text
and embed them into trainable soft prompts. Further, we devise a new loss
function to calibrate the similarity ranking order between categories.
Experimental results demonstrate that our proposed method FPT not only exhibits
a significant performance improvement over the prior best prompt-based tuning
approaches, but also surpasses the previous leading methods that incorporate
linguistic features. Also, our proposed model significantly outperforms the
large language model gpt-3.5-turbo-16k in most cases. Our proposed method
establishes a new architecture for prompt tuning that sheds light on how
linguistic features can be easily adapted to linguistics-related tasks.
[COMMENTS]
NAACL-2024 main conference
[LINK]
http://arxiv.org/abs/2404.02772v2
[DATE]
2024-04-10 12:25:09+08:00
[CATEGORIES]
cs.CL
Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision
[AUTHORS]
Yingbo Ma, Suraj Kolla, Zhenhong Hu, Dhruv Kaliraman, Victoria Nolan, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Jeremy A. Balch, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel
[ABSTRACT]
Modern electronic health records (EHRs) hold immense promise in tracking
personalized patient health trajectories through sequential deep learning,
owing to their extensive breadth, scale, and temporal granularity. Nonetheless,
how to effectively leverage multiple modalities from EHRs poses significant
challenges, given their complex characteristics such as high dimensionality,
multimodality, sparsity, varied recording frequencies, and temporal
irregularities. To this end, this paper introduces a novel multimodal
contrastive learning framework, specifically focusing on medical time series
and clinical notes. To tackle the challenge of sparsity and irregular time
intervals in medical time series, the framework integrates temporal
cross-attention transformers with a dynamic embedding and tokenization scheme
for learning multimodal feature representations. To harness the interconnected
relationships between medical time series and clinical notes, the framework
employs a global contrastive loss, aligning a patient’s multimodal feature
representations with the corresponding discharge summaries. Since discharge
summaries uniquely pertain to individual patients and represent a holistic view
of the patient’s hospital stay, machine learning models are led to learn
discriminative multimodal features via global contrasting. Extensive
experiments with a real-world EHR dataset demonstrated that our framework
outperformed state-of-the-art approaches on the exemplar task of predicting the
occurrence of nine postoperative complications for more than 120,000 major
inpatient surgeries using multimodal data from the UF Health system, split among
three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health
Jacksonville-North).
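To make the idea of global contrasting concrete, here is a minimal sketch of a symmetric InfoNCE-style loss (the CLIP-style formulation) in which patient i's multimodal embedding should match discharge summary i within a batch. This is an assumed formulation: the abstract does not give the exact loss, and the function names, the temperature of 0.1, and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: Vector, index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at `index`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def global_contrastive_loss(patient: List[Vector], summary: List[Vector],
                            temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: patient i should match discharge summary i in the batch."""
    if len(patient) != len(summary) or not patient:
        raise ValueError("need equally sized, non-empty batches")
    p = [_normalize(v) for v in patient]
    s = [_normalize(v) for v in summary]
    n = len(p)
    # sim[i][j]: temperature-scaled cosine similarity of patient i and summary j.
    sim = [[sum(a * b for a, b in zip(p[i], s[j])) / temperature for j in range(n)]
           for i in range(n)]
    patient_to_summary = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    summary_to_patient = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
                              for i in range(n)) / n
    return 0.5 * (patient_to_summary + summary_to_patient)


aligned = global_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = global_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Because each summary belongs to exactly one patient, the other summaries in the batch act as negatives, which is what pushes the encoders toward discriminative patient-level features.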
[COMMENTS]
12 pages, 3 figures. arXiv admin note: text overlap with
arXiv:2403.04012
[LINK]
http://arxiv.org/abs/2404.06723v1
[DATE]
2024-04-10 12:19:59+08:00
[CATEGORIES]
cs.LG
cs.CL
Apollonion: Profile-centric Dialog Agent
[AUTHORS]
Shangyu Chen, Zibo Zhao, Yuanyuan Zhao, Xiang Li
[ABSTRACT]
The emergence of Large Language Models (LLMs) has spurred innovation in the
development of dialog agents. Specifically, a well-trained LLM, acting as a
central processing unit, is capable of providing fluent and reasonable
responses to users’ requests. In addition, auxiliary tools such as external
knowledge retrieval, personalized characters for vivid responses, and
short/long-term memory for ultra-long context management have been developed,
completing the usage experience of LLM-based dialog agents. However, these
techniques do not solve the issue of personalization from the user’s
perspective: agents respond in the same fashion to different users, without
considering their features, such as habits, interests, and past experience.
In other words, current implementations of dialog agents fail at “knowing the
user”. The capacity to describe and represent users well remains
underdeveloped. In this work, we propose a framework for dialog agents that
incorporates user profiling (initialization and update): the user’s queries
and responses are analyzed and organized into a structured user profile, which
is later used to provide personal and more precise responses. In addition, we
propose a series of evaluation protocols for personalization: to what extent
the response is personal to different users.
The framework is named Apollonion, inspired by the inscription “Know
Yourself” in the temple of Apollo (also known as the Apollonion) in Ancient
Greece. Few works have incorporated personalization into LLMs; Apollonion is
pioneering work on guiding LLMs’ responses to meet individual needs through
dialog agents, with a set of evaluation methods for measuring
personalization.
[LINK]
http://arxiv.org/abs/2404.08692v1
[DATE]
2024-04-10 11:32:41+08:00
[CATEGORIES]
cs.CL
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
[AUTHORS]
Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
[ABSTRACT]
Recent advancements in zero-shot text-to-speech (TTS) modeling have led to
significant strides in generating high-fidelity and diverse speech. However,
dialogue generation, along with achieving human-like naturalness in speech,
continues to be a challenge in the field. In this paper, we introduce CoVoMix:
Conversational Voice Mixture Generation, a novel model for zero-shot,
human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix is
capable of first converting dialogue text into multiple streams of discrete
tokens, with each token stream representing semantic information for individual
talkers. These token streams are then fed into a flow-matching based acoustic
model to generate mixed mel-spectrograms. Finally, the speech waveforms are
produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of
metrics for measuring the effectiveness of dialogue modeling and generation.
Our experimental results show that CoVoMix can generate dialogues that are not
only human-like in their naturalness and coherence but also involve multiple
talkers engaging in multiple rounds of conversation. These dialogues, generated
within a single channel, are characterized by seamless speech transitions,
including overlapping speech, and appropriate paralinguistic behaviors such as
laughter. Audio samples are available at https://aka.ms/covomix.
[LINK]
http://arxiv.org/abs/2404.06690v1
[DATE]
2024-04-10 10:32:58+08:00
[CATEGORIES]
cs.CL
cs.LG
MiniLLM: Knowledge Distillation of Large Language Models
[AUTHORS]
Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
[ABSTRACT]
Knowledge Distillation (KD) is a promising technique for reducing the high
computational demand of large language models (LLMs). However, previous KD
methods are primarily applied to white-box classification models or to training
small models to imitate black-box model APIs like ChatGPT. How to effectively
distill the knowledge of white-box LLMs into small models remains
under-explored, a question that grows more important with the proliferation of
open-source LLMs. In this work, we propose a KD approach that distills LLMs into smaller
language models. We first replace the forward Kullback-Leibler divergence (KLD)
objective in the standard KD approaches with reverse KLD, which is more
suitable for KD on generative language models, to prevent the student model
from overestimating the low-probability regions of the teacher distribution.
Then, we derive an effective optimization approach to learn this objective. The
student models are named MiniLLM. Extensive experiments in the
instruction-following setting show that MiniLLM generates more precise
responses with higher overall quality, lower exposure bias, better calibration,
and higher long-text generation performance than the baselines. Our method is
scalable for different model families with 120M to 13B parameters. Our code,
data, and model checkpoints can be found in
https://github.com/microsoft/LMOps/tree/main/minillm.
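The motivation for replacing forward KLD with reverse KLD can be seen on a toy example. Writing p for the teacher and q for the student, forward KD minimizes KL(p || q) while the reverse objective minimizes KL(q || p). The sketch below uses invented three-token distributions, not the MiniLLM training code (which, per the abstract, also requires a dedicated optimization approach): forward KL prefers the student that spreads mass onto the teacher's low-probability token, whereas reverse KL prefers the student that stays on a high-probability mode.

```python
import math
from typing import Sequence


def kl_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """KL(a || b) for discrete distributions; terms with a_i == 0 contribute 0."""
    total = 0.0
    for x, y in zip(a, b):
        if x == 0.0:
            continue
        if y == 0.0:
            return math.inf  # a puts mass where b has none
        total += x * math.log(x / y)
    return total


def forward_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Standard KD objective KL(p || q): penalizes students that miss teacher mass."""
    return kl_divergence(teacher, student)


def reverse_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Reverse objective KL(q || p): penalizes student mass where the teacher is unlikely."""
    return kl_divergence(student, teacher)


# Toy next-token distributions over a 3-token vocabulary (invented numbers).
teacher = [0.49, 0.49, 0.02]          # third token is a low-probability region
student_spread = [0.34, 0.33, 0.33]   # overestimates the low-probability token
student_mode = [0.90, 0.08, 0.02]     # concentrates on one high-probability mode

print(forward_kl(teacher, student_spread) < forward_kl(teacher, student_mode))  # True
print(reverse_kl(teacher, student_mode) < reverse_kl(teacher, student_spread))  # True
```

For a generative student with limited capacity, the reverse objective's reluctance to place mass where the teacher assigns little probability is the property the abstract appeals to.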
[COMMENTS]
Published as a conference paper in ICLR 2024
[LINK]
http://arxiv.org/abs/2306.08543v4
[DATE]
2024-04-10 10:30:19+08:00
[CATEGORIES]
cs.CL
Subspace Representations for Soft Set Operations and Sentence Similarities
[AUTHORS]
Yoichi Ishibashi, Sho Yokoi, Katsuhito Sudoh, Satoshi Nakamura
[COMMENTS]
Accepted at NAACL 2024
[LINK]
http://arxiv.org/abs/2210.13034v4
[DATE]
2024-04-10 10:16:55+08:00
[CATEGORIES]
cs.CL
cs.LG
Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology
[AUTHORS]
Shashi Kant Gupta, Aditya Basu, Bradley Taylor, Anai Kothari, Hrituraj Singh
[ABSTRACT]
Retrieving information from EHR systems is essential for answering specific
questions about patient journeys and improving the delivery of clinical care.
Despite this fact, most EHR systems still rely on keyword-based searches. With
the advent of generative large language models (LLMs), retrieving information
can lead to better search and summarization capabilities. Such retrievers can
also feed Retrieval-augmented generation (RAG) pipelines to answer any query.
However, retrieving information from the real-world clinical data contained
within EHR systems to solve several downstream use cases is challenging due to
the difficulty of creating query-document support pairs. We provide a
blueprint for creating such datasets in an affordable manner using large
language models. Our method results in a retriever that is 30-50 F1 points
better than proprietary counterparts such as Ada and Mistral for oncology data
elements. We further compare our model, called Onco-Retriever, against a
fine-tuned PubMedBERT model. We conduct an extensive manual evaluation
on real-world EHR data along with latency analysis of the different models and
provide a path forward for healthcare organizations to build domain-specific
retrievers.
[COMMENTS]
18 pages
[LINK]
http://arxiv.org/abs/2404.06680v1
[DATE]
2024-04-10 10:02:34+08:00
[CATEGORIES]
cs.CL
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
[AUTHORS]
Kehua Feng, Keyan Ding, Kede Ma, Zhihua Wang, Qiang Zhang, Huajun Chen
[ABSTRACT]
The past years have witnessed a proliferation of large language models
(LLMs). Yet, automated and unbiased evaluation of LLMs is challenging due to
the inaccuracy of standard metrics in reflecting human preferences and the
inefficiency in sampling informative and diverse test examples. While human
evaluation remains the gold standard, it is expensive and time-consuming,
especially when dealing with a large number of testing samples. To address this
problem, we propose a sample-efficient human evaluation method based on MAximum
Discrepancy (MAD) competition. MAD automatically selects a small set of
informative and diverse instructions, each adapted to two LLMs, whose responses
are subject to three-alternative forced choice by human subjects. The pairwise
comparison results are then aggregated into a global ranking using the Elo
rating system. We select eight representative LLMs and compare them in terms of
four skills: knowledge understanding, mathematical reasoning, writing, and
coding. Experimental results show that the proposed method achieves a reliable
and sensible ranking of LLMs’ capabilities, identifies their relative strengths
and weaknesses, and offers valuable insights for further LLM advancement.
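The aggregation step above (pairwise human judgments turned into a global ranking with Elo) can be sketched with the standard sequential Elo update. The K-factor of 32, the initial rating of 1000, the mapping of the three forced-choice outcomes to scores 1.0 / 0.0 / 0.5 (treating the third option as a tie), and the model names are all assumptions for illustration, not details from the paper.

```python
from typing import Dict, Iterable, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ranking(comparisons: Iterable[Tuple[str, str, float]],
                k: float = 32.0, initial: float = 1000.0) -> List[Tuple[str, float]]:
    """comparisons: (model_a, model_b, score_a) with score_a in {1.0, 0.5, 0.0}."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # Zero-sum update: whatever A gains, B loses.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical judgments: 1.0 = A preferred, 0.0 = B preferred, 0.5 = tie.
judgments = [("model_x", "model_y", 1.0),
             ("model_y", "model_z", 0.5),
             ("model_x", "model_z", 1.0)]
for name, rating in elo_ranking(judgments):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order of comparisons; in practice one might average over several random orderings to reduce that sensitivity.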
[COMMENTS]
32 pages, 6 figures
[LINK]
http://arxiv.org/abs/2404.08008v1
[DATE]
2024-04-10 09:26:24+08:00
[CATEGORIES]
cs.LG
cs.CL
SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models
[AUTHORS]
Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu
[ABSTRACT]
Text-to-image (T2I) models, such as Stable Diffusion, have exhibited
remarkable performance in generating high-quality images from text descriptions
in recent years. However, text-to-image models may be tricked into generating
not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing
countermeasures mostly focus on filtering inappropriate inputs and outputs, or
suppressing improper text embeddings, which can block explicit NSFW-related
content (e.g., naked or sexy) but may still be vulnerable to adversarial
prompts: inputs that appear innocent but are ill-intended. In this paper, we
present SafeGen, a framework to mitigate unsafe content generation by
text-to-image models in a text-agnostic manner. The key idea is to eliminate
unsafe visual representations from the model regardless of the text input. In
this way, the text-to-image model is resistant to adversarial prompts since
unsafe visual representations are obstructed from within. Extensive experiments
conducted on four datasets demonstrate SafeGen’s effectiveness in mitigating
unsafe content generation while preserving the high fidelity of benign images.
SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.1%
sexual content removal performance. Furthermore, our constructed benchmark of
adversarial prompts provides a basis for future development and evaluation of
anti-NSFW-generation methods.
[LINK]
http://arxiv.org/abs/2404.06666v1
[DATE]
2024-04-10 08:26:08+08:00
[CATEGORIES]
cs.CL
Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection
[AUTHORS]
Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt
[ABSTRACT]
While reinforcement learning (RL) algorithms have been successfully applied
across numerous sequential decision-making problems, their generalization to
unforeseen testing environments remains a significant concern. In this paper,
we study the problem of out-of-distribution (OOD) detection in RL, which
focuses on identifying situations at test time that RL agents have not
encountered in their training environments. We first propose a clarification of
terminology for OOD detection in RL, which aligns it with the literature from
other machine learning domains. We then present new benchmark scenarios for OOD
detection, which introduce anomalies with temporal autocorrelation into
different components of the agent-environment loop. We argue that such
scenarios have been understudied in the current literature, despite their
relevance to real-world situations. Confirming our theoretical predictions, our
experimental results suggest that state-of-the-art OOD detectors are not able
to identify such anomalies. To address this problem, we propose a novel method
for OOD detection, which we call DEXTER (Detection via Extraction of Time
Series Representations). By treating environment observations as time series
data, DEXTER extracts salient time series features, and then leverages an
ensemble of isolation forest algorithms to detect anomalies. We find that
DEXTER can reliably identify anomalies across benchmark scenarios, exhibiting
superior performance compared to both state-of-the-art OOD detectors and
high-dimensional changepoint detectors adopted from statistics.
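The two stages described for DEXTER (time-series feature extraction followed by an ensemble of isolation forests) can be sketched from scratch. This is a hypothetical illustration, not the authors' implementation: the window length, the three features (mean, standard deviation, lag-1 autocorrelation), the forest sizes, and the choice to ensemble by averaging forests trained with different seeds are all assumptions. The isolation forest itself follows the standard random-split, path-length formulation.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def window_features(series: Sequence[float]) -> Point:
    """Mean, standard deviation and lag-1 autocorrelation of one observation window."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0.0:
        return [mean, 0.0, 0.0]
    lag1 = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / (n * var)
    return [mean, math.sqrt(var), lag1]


def _avg_path(n: int) -> float:
    """Expected path length of an unsuccessful BST search over n points (normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x: Point, node, depth: int = 0) -> float:
    if node[0] == "leaf":
        return depth + _avg_path(node[1])
    _, dim, split, left, right = node
    return _path_length(x, left if x[dim] < split else right, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 64, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data: List[Point]) -> "IsolationForest":
        m = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(max(m, 2)))
        self.trees = [_build(self.rng.sample(data, m), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        self.norm = _avg_path(m) or 1.0
        return self

    def score(self, x: Point) -> float:
        """Anomaly score in (0, 1]; shorter average paths give higher scores."""
        mean_path = sum(_path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / self.norm)


def ensemble_score(forests: List[IsolationForest], x: Point) -> float:
    """Average the scores of several forests (a simple stand-in for DEXTER's ensemble)."""
    return sum(f.score(x) for f in forests) / len(forests)


# Toy data: 200 windows of uncorrelated noise as "normal" behavior.
rng = random.Random(1)
normal_windows = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
train = [window_features(w) for w in normal_windows]
forests = [IsolationForest(seed=s).fit(train) for s in range(3)]

# A drifting, strongly autocorrelated window (hypothetical anomaly).
anomaly = window_features([5.0 + 0.1 * i for i in range(20)])
typical = sum(ensemble_score(forests, f) for f in train[:20]) / 20
print(ensemble_score(forests, anomaly) > typical)  # True
```

Summarizing each window with features such as lag-1 autocorrelation is what lets a point-wise detector notice temporally correlated anomalies that look unremarkable one observation at a time.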
[COMMENTS]
Accepted as a full paper to the 23rd International Conference on
Autonomous Agents and Multiagent Systems (AAMAS 2024)
[LINK]
http://arxiv.org/abs/2404.07099v1
[DATE]
2024-04-10 23:39:49+08:00
[CATEGORIES]
cs.LG
LaTiM: Longitudinal representation learning in continuous-time models to predict disease progression
[AUTHORS]
Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Hugo Le Boité, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Alireza Rezaei, Ikram Brahim, Gwenolé Quellec, Mathieu Lamard
[ABSTRACT]
This work proposes a novel framework for analyzing disease progression using
time-aware neural ordinary differential equations (NODE). We introduce a
“time-aware head” in a framework trained through self-supervised learning (SSL)
to leverage temporal information in latent space for data augmentation. This
approach effectively integrates NODEs with SSL, offering significant
performance improvements compared to traditional methods that lack explicit
temporal integration. We demonstrate the effectiveness of our strategy for
diabetic retinopathy progression prediction using the OPHDIAT database.
Compared to the baseline, all NODE architectures achieve statistically
significant improvements in area under the ROC curve (AUC) and Kappa metrics,
highlighting the efficacy of pre-training with SSL-inspired approaches.
Additionally, our framework promotes stable training for NODEs, whose training
instability is a commonly encountered challenge in time-aware modeling.
[COMMENTS]
Submitted to MICCAI 2024
[LINK]
http://arxiv.org/abs/2404.07091v1
[DATE]
2024-04-10 23:29:29+08:00
[CATEGORIES]
cs.LG
M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling
[AUTHORS]
Xudong Sun, Nutan Chen, Alexej Gossmann, Yu Xing, Carla Feistner, Emilio Dorigatt, Felix Drost, Daniele Scarcella, Lisa Beer, Carsten Marr
[ABSTRACT]
We address the online combinatorial choice of weight multipliers for
multi-objective optimization of many loss terms parameterized by neural networks
via a probabilistic graphical model (PGM) for the joint model parameter and
multiplier evolution process, with a hypervolume based likelihood promoting
multi-objective descent. The corresponding parameter and multiplier estimation
as a sequential decision process is then cast into an optimal control problem,
where the multi-objective descent goal is dispatched hierarchically into a
series of constrained optimization subproblems. The subproblem constraint
automatically adapts itself according to Pareto dominance and serves as the
setpoint for the low-level multiplier controller to schedule loss landscapes
via output feedback of each loss term. Our method is multiplier-free and
operates at the timescale of epochs, thus saving substantial computational
resources compared to full-training-cycle multiplier tuning. It also
circumvents the excessive memory requirements and heavy computational burden of
existing multi-objective deep learning methods. We applied it to domain
invariant variational auto-encoding with 6 loss terms on the PACS domain
generalization task, and observed robust performance across a range of
controller hyperparameters, as well as different multiplier initial conditions,
outperforming other multiplier scheduling methods. We offer a modular
implementation of our method that admits extension to custom definitions of
many loss terms.
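The subproblem constraint described above adapts according to Pareto dominance between loss vectors. The sketch below shows only that dominance relation (for minimization) and a non-dominated filter; how M-HOF-Opt turns dominance into a controller setpoint is the authors' design and is not reproduced here. The function names and the per-epoch loss values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if loss vector `a` Pareto-dominates `b` (minimization): no term worse, one strictly better."""
    if len(a) != len(b):
        raise ValueError("loss vectors must have the same number of terms")
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the loss vectors not dominated by any other vector in `points`."""
    # A vector never dominates itself, so no self-exclusion is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical per-epoch values of two loss terms (not results from the paper).
epoch_losses = [(1.0, 2.0), (0.8, 2.5), (1.2, 2.2)]
print(pareto_front(epoch_losses))  # [(1.0, 2.0), (0.8, 2.5)]
```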
[LINK]
http://arxiv.org/abs/2403.13728v2
[DATE]
2024-04-10 23:25:00+08:00
[CATEGORIES]
cs.LG
Understanding Video Transformers via Universal Concept Discovery
[AUTHORS]
Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov
[ABSTRACT]
This paper studies the problem of concept-based interpretability of
transformer representations for videos. Concretely, we seek to explain the
decision-making process of video transformers based on high-level,
spatiotemporal concepts that are automatically discovered. Prior research on
concept-based interpretability has concentrated solely on image-level tasks.
Comparatively, video models deal with the added temporal dimension, increasing
complexity and posing challenges in identifying dynamic concepts over time. In
this work, we systematically address these challenges by introducing the first
Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose
an efficient approach for unsupervised identification of units of video
transformer representations (concepts) and ranking their importance to the
output of a model. The resulting concepts are highly interpretable, revealing
spatio-temporal reasoning mechanisms and object-centric representations in
unstructured video models. Performing this analysis jointly over a diverse set
of supervised and self-supervised representations, we discover that some of
these mechanisms are universal in video transformers. Finally, we show that VTCD
can be used for fine-grained action recognition and video object segmentation.
[COMMENTS]
CVPR 2024 (Highlight)
[LINK]
http://arxiv.org/abs/2401.10831v3
[DATE]
2024-04-10 23:19:07+08:00
[CATEGORIES]
cs.LG
Is Learning in Biological Neural Networks based on Stochastic Gradient Descent? An analysis using stochastic processes
[AUTHORS]
Sören Christensen, Jan Kallsen
[ABSTRACT]
In recent years, there has been an intense debate about how learning in
biological neural networks (BNNs) differs from learning in artificial neural
networks. It is often argued that the updating of connections in the brain
relies only on local information, and therefore a stochastic gradient-descent
type optimization method cannot be used. In this paper, we study a stochastic
model for supervised learning in BNNs. We show that a (continuous) gradient
step occurs approximately when each learning opportunity is processed by many
local updates. This result suggests that stochastic gradient descent may indeed
play a role in optimizing BNNs.
[LINK]
http://arxiv.org/abs/2309.05102v3
[DATE]
2024-04-10 23:02:35+08:00
[CATEGORIES]
cs.LG
Towards Learning Stochastic Population Models by Gradient Descent
[AUTHORS]
Justin N. Kreikemeyer, Philipp Andelfinger, Adelinde M. Uhrmacher
[ABSTRACT]
Increasing effort is put into the development of methods for learning
mechanistic models from data. This task entails not only the accurate
estimation of parameters, but also a suitable model structure. Recent work on
the discovery of dynamical systems formulates this problem as a linear equation
system. Here, we explore several simulation-based optimization approaches,
which allow much greater freedom in the objective formulation and weaker
conditions on the available data. We show that even for relatively small
stochastic population models, simultaneous estimation of parameters and
structure poses major challenges for optimization procedures. In particular, we
investigate the application of the local stochastic gradient descent method,
commonly used for training machine learning models. We demonstrate accurate
estimation of models but find that enforcing the inference of parsimonious,
interpretable models drastically increases the difficulty. We give an outlook
on how this challenge can be overcome.
[COMMENTS]
5 pages, 2 figures, to appear in Proceedings of the ACM
SIGSIM-PADS’24
[LINK]
http://arxiv.org/abs/2404.07049v1
[DATE]
2024-04-10 22:38:58+08:00
[CATEGORIES]
cs.LG
PLAN: Variance-Aware Private Mean Estimation
[AUTHORS]
Martin Aumüller, Christian Janos Lebeda, Boel Nelson, Rasmus Pagh
[ABSTRACT]
Differentially private mean estimation is an important building block in
privacy-preserving algorithms for data analysis and machine learning. Though
the trade-off between privacy and utility is well understood in the worst case,
many datasets exhibit structure that could potentially be exploited to yield
better algorithms. In this paper we present $\textit{Private Limit Adapted
Noise}$ (PLAN), a family of differentially private algorithms for mean
estimation in the setting where inputs are independently sampled from a
distribution $\mathcal{D}$ over $\mathbf{R}^d$, with coordinate-wise standard
deviations $\boldsymbol{\sigma} \in \mathbf{R}^d$. Similar to mean estimation
under Mahalanobis distance, PLAN tailors the shape of the noise to the shape of
the data, but unlike previous algorithms the privacy budget is spent
non-uniformly over the coordinates. Under a concentration assumption on
$\mathcal{D}$, we show how to exploit skew in the vector $\boldsymbol{\sigma}$,
obtaining a (zero-concentrated) differentially private mean estimate with
$\ell_2$ error proportional to $\|\boldsymbol{\sigma}\|_1$. Previous work has
either not taken $\boldsymbol{\sigma}$ into account, or measured error in
Mahalanobis distance, in both cases resulting in $\ell_2$
error proportional to $\sqrt{d}\|\boldsymbol{\sigma}\|_2$, which can be up to a
factor $\sqrt{d}$ larger. To verify the effectiveness of PLAN, we empirically
evaluate accuracy on both synthetic and real-world data.
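To make the $\sqrt{d}$ gap concrete, the short Python sketch below compares only the two error scales, $\|\boldsymbol{\sigma}\|_1$ for PLAN and $\sqrt{d}\|\boldsymbol{\sigma}\|_2$ for the earlier approaches, ignoring constants and the dependence on the privacy parameter and sample size. It does not implement PLAN itself, and the function names are illustrative. By Cauchy-Schwarz, $\|\boldsymbol{\sigma}\|_1 \le \sqrt{d}\|\boldsymbol{\sigma}\|_2$, so the ratio lies between 1 (all deviations equal) and $\sqrt{d}$ (all spread concentrated in one coordinate).

```python
import math
from typing import Sequence


def l1_scale(sigma: Sequence[float]) -> float:
    """||sigma||_1: the error scale stated for PLAN (constants omitted)."""
    return sum(abs(s) for s in sigma)


def sqrt_d_l2_scale(sigma: Sequence[float]) -> float:
    """sqrt(d) * ||sigma||_2: the error scale of the earlier approaches."""
    d = len(sigma)
    return math.sqrt(d) * math.sqrt(sum(s * s for s in sigma))


def improvement_ratio(sigma: Sequence[float]) -> float:
    """How much larger the older error scale is; always in [1, sqrt(d)]."""
    return sqrt_d_l2_scale(sigma) / l1_scale(sigma)


uniform = [1.0] * 100          # no skew: ratio 1, nothing to exploit
skewed = [1.0] + [0.0] * 99    # extreme skew: ratio sqrt(100) = 10
```

The more skewed the coordinate-wise deviations, the closer the ratio gets to the $\sqrt{d}$ factor mentioned in the abstract.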
[LINK]
http://arxiv.org/abs/2306.08745v3
[DATE]
2024-04-10 22:30:58+08:00
[CATEGORIES]
cs.LG
Trajectory-Oriented Policy Optimization with Sparse Rewards
[AUTHORS]
Guojian Wang, Faguo Wu, Xiao Zhang
[ABSTRACT]
Mastering deep reinforcement learning (DRL) proves challenging in tasks
featuring scant rewards. These limited rewards merely signify whether the task
is partially or entirely accomplished, necessitating various exploration
actions before the agent garners meaningful feedback. Consequently, the
majority of existing DRL exploration algorithms struggle to acquire practical
policies within a reasonable timeframe. To address this challenge, we introduce
an approach leveraging offline demonstration trajectories for swifter and more
efficient online RL in environments with sparse rewards. Our pivotal insight
involves treating offline demonstration trajectories as guidance, rather than
mere imitation, allowing our method to learn a policy whose distribution of
state-action visitation marginally matches that of offline demonstrations. We
specifically introduce a novel trajectory distance relying on maximum mean
discrepancy (MMD) and cast policy optimization as a distance-constrained
optimization problem. We then illustrate that this optimization problem can be
streamlined into a policy-gradient algorithm, integrating rewards shaped by
insights from offline demonstrations. The proposed algorithm undergoes
evaluation across extensive discrete and continuous control tasks with sparse
and misleading rewards. The experimental findings demonstrate the significant
superiority of our proposed algorithm over baseline methods concerning diverse
exploration and the acquisition of an optimal policy.
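As an illustration of the kind of distance involved, here is a minimal Python sketch of a (biased) squared MMD estimate between two sets of state-action samples using a Gaussian RBF kernel. The kernel, the bandwidth, and the flattening of state-action pairs into vectors are assumptions made for illustration; the abstract does not specify how the paper's trajectory distance is constructed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 between two sets of samples."""
    def mean_kernel(us: List[Vector], vs: List[Vector]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical (state, action) pairs flattened into 2-D vectors.
demo = [(0.0, 1.0), (0.1, 1.0), (0.2, 0.9)]
agent_near = [(0.05, 1.0), (0.15, 0.95)]
agent_far = [(3.0, -1.0), (3.2, -1.1)]
```

A small value suggests that the agent's visitation samples resemble the demonstrations; in a distance-constrained formulation, a quantity like this would be kept below a threshold during policy optimization.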
[COMMENTS]
6 pages, 7 figures
[LINK]
http://arxiv.org/abs/2401.02225v3
[DATE]
2024-04-10 22:05:38+08:00
[CATEGORIES]
cs.LG
Data-Efficient Multimodal Fusion on a Single GPU
[AUTHORS]
Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, Maksims Volkovs
[ABSTRACT]
The goal of multimodal alignment is to learn a single latent space that is
shared between multimodal inputs. The most powerful models in this space have
been trained using massive datasets of paired inputs and large-scale
computational resources, making them prohibitively expensive to train in many
practical scenarios. We surmise that existing unimodal encoders pre-trained on
large amounts of unimodal data should provide an effective bootstrap to create
multimodal models from unimodal ones at much lower costs. We therefore propose
FuseMix, a multimodal augmentation scheme that operates on the latent spaces of
arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal
alignment, we achieve competitive performance – and in certain cases
outperform state-of-the-art methods – in both image-text and audio-text
retrieval, with orders of magnitude less compute and data: for example, we
outperform CLIP on the Flickr30K text-to-image retrieval task with
$\sim\!600\times$ fewer GPU days and $\sim\!80\times$ fewer image-text pairs.
Additionally, we show how our method can be applied to convert pre-trained
text-to-image generative models into audio-to-image ones. Code is available at:
https://github.com/layer6ai-labs/fusemix.
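A minimal Python sketch of what mixup-style augmentation in a latent space could look like follows. It assumes (this is not stated in the abstract) that both latents of a paired example are interpolated with the same partner and the same Beta-distributed coefficient, so the mixed pair stays semantically aligned. In practice the latents would come from frozen unimodal encoders; here they are plain lists, and the function name and `alpha` default are illustrative.

```python
import random
from typing import List, Optional, Tuple

Latent = List[float]


def fusemix(batch_x: List[Latent], batch_y: List[Latent], alpha: float = 1.0,
            rng: Optional[random.Random] = None) -> Tuple[List[Latent], List[Latent]]:
    """Mix paired latents from two modalities with a shared coefficient per pair."""
    rng = rng or random.Random()
    partners = list(range(len(batch_x)))
    rng.shuffle(partners)  # each example is mixed with a randomly chosen partner
    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # one coefficient shared by both modalities
        mixed_x.append([lam * a + (1.0 - lam) * b for a, b in zip(batch_x[i], batch_x[j])])
        mixed_y.append([lam * a + (1.0 - lam) * b for a, b in zip(batch_y[i], batch_y[j])])
    return mixed_x, mixed_y


images = [[0.0], [1.0], [2.0]]  # hypothetical 1-D image latents
texts = [[0.0], [1.0], [2.0]]   # hypothetical 1-D text latents
mx, my = fusemix(images, texts, rng=random.Random(0))
```

Sharing the partner and coefficient across modalities is the design choice that keeps each augmented image latent paired with a matching text latent; the mixed pairs would then presumably feed a contrastive alignment objective (not shown).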
[COMMENTS]
CVPR 2024 (Highlight)
[LINK]
http://arxiv.org/abs/2312.10144v4
[DATE]
2024-04-10 21:58:08+08:00
[CATEGORIES]
cs.LG
Multi-Agent Soft Actor-Critic with Global Loss for Autonomous Mobility-on-Demand Fleet Control
[AUTHORS]
Zeno Woywood, Jasper I. Wiltfang, Julius Luy, Tobias Enders, Maximilian Schiffer
[ABSTRACT]
We study a sequential decision-making problem for a profit-maximizing
operator of an Autonomous Mobility-on-Demand system. Optimizing a central
operator’s vehicle-to-request dispatching policy requires efficient and
effective fleet control strategies. To this end, we employ a multi-agent Soft
Actor-Critic algorithm combined with weighted bipartite matching. We propose a
novel vehicle-based algorithm architecture and adapt the critic’s loss function
to appropriately consider global actions. Furthermore, we extend our algorithm
to incorporate rebalancing capabilities. Through numerical experiments, we show
that our approach outperforms state-of-the-art benchmarks by up to 12.9% for
dispatching and up to 38.9% with integrated rebalancing.
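The dispatching step pairs vehicles with requests via weighted bipartite matching. The Python sketch below solves a tiny instance by exhaustive search, where `weights[v][r]` stands in for whatever per-vehicle value estimate the learned critic would provide (the numbers are hypothetical). A real fleet would use a polynomial-time assignment solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`) rather than enumeration.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def best_dispatch(weights: List[List[float]]) -> Tuple[float, List[Optional[int]]]:
    """Exhaustively find the vehicle-to-request assignment with maximal total weight.

    Each vehicle serves at most one request and vice versa; a vehicle may stay
    idle (value 0). Exponential time, so only suitable for tiny illustrations.
    """
    n_vehicles = len(weights)
    n_requests = len(weights[0]) if weights else 0
    # Pad the request slots with None so that vehicles can remain unassigned.
    slots: List[Optional[int]] = list(range(n_requests)) + [None] * n_vehicles
    best_value, best_assignment = 0.0, [None] * n_vehicles
    for choice in permutations(slots, n_vehicles):
        value = sum(weights[v][r] for v, r in enumerate(choice) if r is not None)
        if value > best_value:
            best_value, best_assignment = value, list(choice)
    return best_value, best_assignment


weights = [[5.0, 4.0],   # vehicle 0's value for requests 0 and 1
           [4.0, 0.0]]   # vehicle 1's value for requests 0 and 1
total, assignment = best_dispatch(weights)
```

Here a greedy rule (vehicle 0 takes its best request 0) would total 5.0, whereas the matching assigns vehicle 0 to request 1 and vehicle 1 to request 0 for a total of 8.0, which is why a global matching step is used on top of per-vehicle values.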
[LINK]
http://arxiv.org/abs/2404.06975v1
[DATE]
2024-04-10 21:49:20+08:00
[CATEGORIES]
cs.LG
Knowledge graphs for empirical concept retrieval
[AUTHORS]
Lenka Tětková, Teresa Karen Scheidt, Maria Mandrup Fogh, Ellen Marie Gaunby Jørgensen, Finn Årup Nielsen, Lars Kai Hansen
[ABSTRACT]
Concept-based explainable AI is promising as a tool to improve the
understanding of complex models at the premises of a given user, viz. as a
tool for personalized explainability. An important class of concept-based
explainability methods is constructed with empirically defined concepts,
indirectly defined through a set of positive and negative examples, as in the
TCAV approach (Kim et al., 2018). While it is appealing to the user to avoid
formal definitions of concepts and their operationalization, it can be
challenging to establish relevant concept datasets. Here, we address this
challenge using general knowledge graphs (such as Wikidata or WordNet)
for comprehensive concept definition and present a workflow for user-driven
data collection in both text and image domains. The concepts derived from
knowledge graphs are defined interactively, providing an opportunity for
personalization and ensuring that the concepts reflect the user’s intentions.
We test the retrieved concept datasets on two concept-based explainability
methods, namely concept activation vectors (CAVs) and concept activation
regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs
based on these empirical concept datasets provide robust and accurate
explanations. Importantly, we also find good alignment between the models’
representations of concepts and the structure of knowledge graphs, i.e., human
representations. This supports our conclusion that knowledge graph-based
concepts are relevant for XAI.
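For readers unfamiliar with CAVs, the sketch below shows the basic idea in plain Python: fit a linear probe that separates activations of concept examples from those of negative examples, take its unit-normalized weight vector as the concept direction, and score how often a model's gradients point along it. The activations and gradients are made-up 2-D vectors, logistic regression is one of several possible probes, and the function names are hypothetical; the CAR variant from the abstract is not shown.

```python
import math
from typing import List

Vec = List[float]


def train_cav(pos_acts: List[Vec], neg_acts: List[Vec],
              lr: float = 0.1, epochs: int = 500) -> Vec:
    """Logistic-regression probe (concept = 1, negative = 0); returns unit weight vector."""
    data = [(a, 1.0) for a in pos_acts] + [(a, 0.0) for a in neg_acts]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    return [wi / norm for wi in w]


def tcav_score(gradients: List[Vec], cav: Vec) -> float:
    """Fraction of inputs whose gradient has a positive component along the CAV."""
    hits = sum(1 for g in gradients if sum(gi * ci for gi, ci in zip(g, cav)) > 0)
    return hits / len(gradients)


pos = [[2.0, 0.1], [3.0, -0.2], [2.5, 0.0]]     # activations of concept examples
neg = [[-2.0, 0.1], [-3.0, 0.2], [-2.5, -0.1]]  # activations of negative examples
cav = train_cav(pos, neg)
```

In the knowledge-graph workflow described above, `pos` and `neg` would be activations of examples retrieved for a concept and for its negatives, and the gradients would be those of a class logit with respect to the same layer.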
[COMMENTS]
Preprint. Accepted to The 2nd World Conference on eXplainable
Artificial Intelligence
[LINK]
http://arxiv.org/abs/2404.07008v1
[DATE]
2024-04-10 21:47:22+08:00
[CATEGORIES]
cs.LG
L2MAC: Large Language Model Automatic Computer for Extensive Code Generation
[AUTHORS]
Samuel Holt, Max Ruiz Luyten, Mihaela van der Schaar
[ABSTRACT]
Transformer-based large language models (LLMs) are constrained by the fixed
context window of the underlying transformer architecture, hindering their
ability to produce long and coherent outputs. Memory-augmented LLMs are a
promising solution, but current approaches cannot handle long output generation
tasks since they (1) only focus on reading memory and reduce its evolution to
the concatenation of new memories or (2) use very specialized memories that
cannot adapt to other domains. This paper presents L2MAC, the first practical
LLM-based general-purpose stored-program automatic computer (von Neumann
architecture) framework, an LLM-based multi-agent system, for long and
consistent output generation. Its memory has two components: the instruction
registry, which is populated with a prompt program to solve the user-given
task, and a file store, which will contain the final and intermediate outputs.
Each instruction in turn is executed by a separate LLM agent, whose context is
managed by a control unit capable of precise memory reading and writing to
ensure effective interaction with the file store. These components enable L2MAC
to generate extensive outputs, bypassing the constraints of the finite context
window while producing outputs that fulfill a complex user-specified task. We
empirically demonstrate that L2MAC achieves state-of-the-art performance in
generating large codebases for system design tasks, significantly outperforming
other coding methods in implementing the detailed user-specified task; we show
that L2MAC works for general-purpose extensive text-based tasks, such as
writing an entire book; and we provide valuable insights into L2MAC’s
performance improvement over existing methods.
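The stored-program structure described above (an instruction registry executed step by step, a file store for outputs, one agent per instruction, and a control unit mediating memory access) can be sketched as a simple loop. This Python sketch is only an illustration of that control flow under stated assumptions: the agent interface and all names are hypothetical, a toy function stands in for the LLM, and L2MAC's prompt-program generation, context management, and output checking are omitted.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an agent gets one instruction plus a snapshot of the
# file store and returns the files it wants to create or overwrite.
Agent = Callable[[str, Dict[str, str]], Dict[str, str]]


def run_stored_program(instructions: List[str],
                       make_agent: Callable[[], Agent]) -> Dict[str, str]:
    """Execute instructions in order, each with a freshly created agent."""
    file_store: Dict[str, str] = {}
    for instruction in instructions:
        agent = make_agent()  # new agent per instruction, so no context carries over
        # The loop plays the control unit: the agent sees a copy of the store,
        # and only the writes it returns are committed.
        writes = agent(instruction, dict(file_store))
        file_store.update(writes)
    return file_store


def toy_agent_factory() -> Agent:
    """Stand-in for an LLM agent: writes one file named by the instruction's last word."""
    def agent(instruction: str, files: Dict[str, str]) -> Dict[str, str]:
        name = instruction.split()[-1]
        return {name: f"# {instruction}; existing files: {sorted(files)}"}
    return agent


store = run_stored_program(["write a.py", "write b.py"], toy_agent_factory)
```

Because state lives in the file store rather than in any single agent's context, the total output can grow beyond one context window, which is the point the abstract makes about bypassing the finite context.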
[COMMENTS]
Published in The Twelfth International Conference on Learning
Representations (ICLR), 2024. Copyright 2023 by the author(s)
[LINK]
http://arxiv.org/abs/2310.02003v5
[DATE]
2024-04-10 21:38:30+08:00
[CATEGORIES]
cs.LG
Prediction Horizon Requirements for Automated Driving: Optimizing Safety, Comfort, and Efficiency
[AUTHORS]
Manuel Muñoz Sánchez, Chris van der Ploeg, Robin Smit, Jos Elfring, Emilia Silvas, René van de Molengraft
[ABSTRACT]
Predicting the movement of other road users is beneficial for improving
automated vehicle (AV) performance. However, the relationship between the time
horizon associated with these predictions and AV performance remains unclear.
Despite the existence of numerous trajectory prediction algorithms, no studies
have been conducted on how varying prediction lengths affect AV safety and
other vehicle performance metrics, resulting in undefined horizon requirements
for prediction methods. Our study addresses this gap by examining the effects
of different prediction horizons on AV performance, focusing on safety,
comfort, and efficiency. Through multiple experiments using a state-of-the-art,
risk-based predictive trajectory planner, we simulated predictions with
horizons up to 20 seconds. Based on our simulations, we propose a framework for
specifying the minimum required and optimal prediction horizons based on
specific AV performance criteria and application needs. Our results indicate
that a horizon of 1.6 seconds is required to prevent collisions with crossing
pedestrians, horizons of 7-8 seconds yield the best efficiency, and horizons up
to 15 seconds improve passenger comfort. We conclude that prediction horizon
requirements are application-dependent, and recommend aiming for a prediction
horizon of 11.8 seconds as a general guideline for applications involving
crossing pedestrians.
[COMMENTS]
Submitted to IEEE Intelligent Vehicles Symposium. 9 pages. 10
figures. 6 tables
[LINK]
http://arxiv.org/abs/2402.03893v2
[DATE]
2024-04-10 21:34:24+08:00
[CATEGORIES]
cs.LG
Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations
[AUTHORS]
Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen, Zhiming Zheng
[ABSTRACT]
The sparsity of reward feedback remains a challenging problem in online deep
reinforcement learning (DRL). Previous approaches have utilized offline
demonstrations to achieve impressive results in multiple hard tasks. However,
these approaches place high demands on demonstration quality, and obtaining
expert-like actions is often costly and unrealistic. To tackle these problems,
we propose a simple and efficient algorithm called Policy Optimization with
Smooth Guidance (POSG), which leverages a small set of state-only
demonstrations (where only state information is included in demonstrations) to
indirectly make approximate and feasible long-term credit assignments and
facilitate exploration. Specifically, we first design a trajectory-importance
evaluation mechanism to determine the quality of the current trajectory against
demonstrations. Then, we introduce a guidance reward computation technique
based on trajectory importance to measure the impact of each state-action pair.
We theoretically analyze the performance improvement caused by smooth guidance
rewards and derive a new worst-case lower bound on the performance improvement.
Extensive results demonstrate POSG’s significant advantages in control
performance and convergence speed in four sparse-reward environments, including
the grid-world maze, Hopper-v4, HalfCheetah-v4, and Ant maze. We further
report specific metrics and quantitative results that demonstrate the
superiority of POSG.
[COMMENTS]
31 pages, 23 figures
[LINK]
http://arxiv.org/abs/2401.00162v2
[DATE]
2024-04-10 21:32:06+08:00
[CATEGORIES]
cs.LG
Agent-driven Generative Semantic Communication for Remote Surveillance
[AUTHORS]
Wanting Yang, Zehui Xiong, Yanli Yuan, Wenchao Jiang, Tony Q. S. Quek, Merouane Debbah
[ABSTRACT]
In the era of 6G, with compelling visions of intelligent transportation
systems and digital twins, remote surveillance is poised to become a ubiquitous
practice. The substantial data volume and frequent updates present challenges
in wireless networks. To address this, we propose a novel agent-driven
generative semantic communication (A-GSC) framework based on reinforcement
learning. In contrast to the existing research on semantic communication
(SemCom), which mainly focuses on semantic compression or semantic sampling, we
seamlessly cascade both together by jointly considering the intrinsic
attributes of source information and the contextual information regarding the
task. Notably, the introduction of the generative artificial intelligence (GAI)
enables the independent design of semantic encoders and decoders. In this work,
we develop an agent-assisted semantic encoder leveraging the knowledge-based
soft actor-critic algorithm, which can track the semantic changes, channel
condition, and sampling intervals, so as to perform adaptive semantic sampling.
Accordingly, we design a semantic decoder with both predictive and generative
capabilities, which consists of two tailored modules. Moreover, the
effectiveness of the designed models has been verified based on the dataset
generated from CDNet2014, and the performance gain of the overall A-GSC
framework in both energy saving and reconstruction accuracy has been
demonstrated.
[COMMENTS]
Under review with IEEE Transactions on Wireless Communications
[LINK]
http://arxiv.org/abs/2404.06997v1
[DATE]
2024-04-10 21:24:27+08:00
[CATEGORIES]
cs.LG
DREAM: Visual Decoding from Reversing Human Visual System
[AUTHORS]
Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue
[ABSTRACT]
In this work we present DREAM, an fMRI-to-image method for reconstructing
viewed images from brain activities, grounded on fundamental knowledge of the
human visual system. We craft reverse pathways that emulate the hierarchical
and parallel nature of how humans perceive the visual world. These tailored
pathways are specialized to decipher semantics, color, and depth cues from fMRI
data, mirroring the forward pathways from visual stimuli to fMRI recordings. To
do so, two components mimic the inverse processes within the human visual
system: the Reverse Visual Association Cortex (R-VAC) which reverses pathways
of this brain region, extracting semantics from fMRI data; the Reverse Parallel
PKM (R-PKM) component, which simultaneously predicts color and depth from fMRI
signals. The experiments indicate that our method outperforms the current
state-of-the-art models in terms of the consistency of appearance, structure,
and semantics. Code will be made publicly available to facilitate further
research in this field.
[COMMENTS]
Project Page: https://weihaox.github.io/DREAM
[LINK]
http://arxiv.org/abs/2310.02265v2
[DATE]
2024-04-10 20:54:12+08:00
[CATEGORIES]
cs.LG
TrajPRed: Trajectory Prediction with Region-based Relation Learning
[AUTHORS]
Chen Zhou, Ghassan AlRegib, Armin Parchami, Kunjan Singh
[ABSTRACT]
Forecasting human trajectories in traffic scenes is critical for safety
within mixed or fully autonomous systems. Human future trajectories are driven
by two major stimuli: social interactions and stochastic goals. Thus, reliable
forecasting needs to capture these two stimuli. Edge-based relation modeling
represents social interactions using pairwise correlations from precise
individual states. Nevertheless, edge-based relations can be vulnerable under
perturbations. To alleviate these issues, we propose a region-based relation
learning paradigm that models social interactions via region-wise dynamics of
joint states, i.e., the changes in the density of crowds. In particular,
region-wise agent joint information is encoded within convolutional feature
grids. Social relations are modeled by relating the temporal changes of local
joint information from a global perspective. We show that region-based
relations are less susceptible to perturbations. In order to account for the
stochastic individual goals, we exploit a conditional variational autoencoder
to realize multi-goal estimation and diverse future prediction. Specifically,
we perform variational inference via the latent distribution, which is
conditioned on the correlation between input states and associated target
goals. Sampling from the latent distribution enables the framework to reliably
capture the stochastic behavior in test data. We integrate multi-goal
estimation and region-based relation learning to model both stimuli (social
interactions and stochastic goals) in a single prediction framework. We evaluate our
framework on the ETH-UCY dataset and Stanford Drone Dataset (SDD). We show that
the diverse prediction better fits the ground truth when incorporating the
relation module. Our framework outperforms the state-of-the-art models on SDD
by $27.61\%$/$18.20\%$ in ADE/FDE, respectively.
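For readers unfamiliar with the metrics, ADE is the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon, and FDE is the distance at the final time step. A minimal Python sketch follows. The best-of-K variant (taking the minimum over K sampled futures) reflects a common convention for stochastic predictors such as CVAEs, but whether ADE and FDE are minimized jointly or separately varies between papers, so that detail is an assumption here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ade_fde(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Average and final displacement error for one predicted trajectory."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def best_of_k(preds: List[Sequence[Point]], truth: Sequence[Point]) -> Tuple[float, float]:
    """Min-over-K ADE and FDE, each minimized separately over the K samples."""
    scores = [ade_fde(p, truth) for p in preds]
    return min(s[0] for s in scores), min(s[1] for s in scores)


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # errors 0, 1, 2 -> ADE 1.0, FDE 2.0
```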
[LINK]
http://arxiv.org/abs/2404.06971v1
[DATE]
2024-04-10 20:31:43+08:00
[CATEGORIES]
cs.LG
Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study
[AUTHORS]
Hongru Du, Jianan Zhao, Yang Zhao, Shaochong Xu, Xihong Lin, Yiran Chen, Lauren M. Gardner, Hao Frank Yang
[ABSTRACT]
Forecasting the short-term spread of an ongoing disease outbreak is a
formidable challenge due to the complexity of contributing factors, some of
which can be characterized through interlinked, multi-modality variables such
as epidemiological time series data, viral biology, population demographics,
and the intersection of public policy and human behavior. Existing forecasting
model frameworks struggle with the multifaceted nature of relevant data and
robust results translation, which hinders their performances and the provision
of actionable insights for public health decision-makers. Our work introduces
PandemicLLM, a novel framework with multi-modal Large Language Models (LLMs)
that reformulates real-time forecasting of disease spread as a text reasoning
problem, with the ability to incorporate real-time, complex, non-numerical
information that was previously unattainable in traditional forecasting models.
This approach, through a unique AI-human cooperative prompt design and time
series representation learning, encodes multi-modal data for LLMs. The model is
applied to the COVID-19 pandemic, and trained to utilize textual public health
policies, genomic surveillance, spatial, and epidemiological time series data,
and is subsequently tested across all 50 states of the U.S. Empirically,
PandemicLLM is shown to be a high-performing pandemic forecasting framework
that effectively captures the impact of emerging variants and can provide
timely and accurate predictions. The proposed PandemicLLM opens avenues for
incorporating various pandemic-related data in heterogeneous formats and
exhibits performance benefits over existing models. This study illuminates the
potential of adapting LLMs and representation learning to enhance pandemic
forecasting, illustrating how AI innovations can strengthen pandemic responses
and crisis management in the future.
[COMMENTS]
35 pages, 10 figures
[LINK]
http://arxiv.org/abs/2404.06962v1
[DATE]
2024-04-10 20:22:03+08:00
[CATEGORIES]
cs.LG
Model-based deep reinforcement learning for accelerated learning from flow simulations
[AUTHORS]
Andre Weiner, Janis Geise
[ABSTRACT]
In recent years, deep reinforcement learning has emerged as a technique to
solve closed-loop flow control problems. Employing simulation-based
environments in reinforcement learning enables a priori end-to-end optimization
of the control system, provides a virtual testbed for safety-critical control
applications, and allows one to gain a deep understanding of the control
mechanisms. While reinforcement learning has been applied successfully in a
number of rather simple flow control benchmarks, a major bottleneck toward
real-world applications is the high computational cost and turnaround time of
flow simulations. In this contribution, we demonstrate the benefits of
model-based reinforcement learning for flow control applications. Specifically,
we optimize the policy by alternating between trajectories sampled from flow
simulations and trajectories sampled from an ensemble of environment models.
The model-based learning reduces the overall training time by up to $85\%$ for
the fluidic pinball test case. Even larger savings are expected for more
demanding flow simulations.
[LINK]
http://arxiv.org/abs/2402.16543v2
[DATE]
2024-04-10 20:01:43+08:00
[CATEGORIES]
cs.LG
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
[AUTHORS]
Fan Mo, Mohammad Malekzadeh, Soumyajit Chatterjee, Fahim Kawsar, Akhil Mathur
[ABSTRACT]
Federated learning (FL) in multidevice environments creates new opportunities
to learn from a vast and diverse amount of private data. Although personal
devices capture valuable data, their memory, computing, connectivity, and
battery resources are often limited. Since deep neural networks (DNNs) are the
typical machine learning models employed in FL, there are demands for
integrating ubiquitous constrained devices into the training process of DNNs.
In this paper, we develop an FL framework to incorporate on-device data
selection on such constrained devices, which allows partition-based training of
a DNN through collaboration between constrained devices and resourceful devices
of the same client. Evaluations on five benchmark DNNs and six benchmark
datasets across different modalities show that, on average, our framework
achieves ~19% higher accuracy and ~58% lower latency compared to the baseline
FL without our implemented strategies. We demonstrate the effectiveness of our
FL framework when dealing with imbalanced data, client participation
heterogeneity, and various mobility patterns. As a benchmark for the community,
our code is available at
https://github.com/dr-bell/data-centric-federated-learning
[COMMENTS]
Previous version (v3) was presented at ICLR 2023 Workshop on Machine
Learning for IoT: Datasets, Perception, and Understanding
[LINK]
http://arxiv.org/abs/2211.04175v4
[DATE]
2024-04-10 20:01:20+08:00
[CATEGORIES]
cs.LG
ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers
[AUTHORS]
Ioannis Romanelis, Vlassis Fotis, Konstantinos Moustakas, Adrian Munteanu
[ABSTRACT]
In this paper we delve into the properties of transformers, attained through
self-supervision, in the point cloud domain. Specifically, we evaluate the
effectiveness of Masked Autoencoding as a pretraining scheme, and explore
Momentum Contrast as an alternative. In our study we investigate the impact of
data quantity on the learned features, and uncover similarities in the
transformer’s behavior across domains. Through comprehensive visualizations, we
observe that the transformer learns to attend to semantically meaningful
regions, indicating that pretraining leads to a better understanding of the
underlying geometry. Moreover, we examine the finetuning process and its effect
on the learned representations. Based on that, we devise an unfreezing strategy
which consistently outperforms our baseline without introducing any other
modifications to the model or the training pipeline, and achieve
state-of-the-art results in the classification task among transformer models.
[LINK]
http://arxiv.org/abs/2306.10798v3
[DATE]
2024-04-10 19:42:22+08:00
[CATEGORIES]
cs.LG
Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning
[AUTHORS]
Boumediene Hamzi, Marcus Hutter, Houman Owhadi
[ABSTRACT]
Machine Learning (ML) and Algorithmic Information Theory (AIT) look at
complexity from different points of view. We explore the interface between AIT
and Kernel Methods (which are prevalent in ML) by adopting an AIT perspective on
the problem of learning kernels from data, in kernel ridge regression, through
the method of Sparse Kernel Flows. In particular, by looking at the differences
and commonalities between Minimal Description Length (MDL) and Regularization
in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is
the natural approach to adopt to learn kernels from data. This approach aligns
naturally with the MDL principle, offering a more robust theoretical basis than
the existing reliance on cross-validation. The study reveals that deriving
Sparse Kernel Flows does not require a statistical approach; instead, one can
directly engage with code-lengths and complexities, concepts central to AIT.
Thereby, this approach opens the door to reformulating algorithms in machine
learning using tools from AIT, with the aim of providing them a more solid
theoretical foundation.
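To give a flavor of the underlying criterion, the Python/NumPy sketch below computes the basic Kernel Flows loss for a Gaussian kernel, $\rho = 1 - \frac{y_s^\top K_s^{-1} y_s}{y^\top K^{-1} y}$, i.e. the relative loss in RKHS norm when the kernel interpolant uses only a subset $s$ of the data. The sparsity penalty over a dictionary of kernels that distinguishes Sparse Kernel Flows, and the MDL interpretation, are not shown; the kernel family, `gamma`, and the nugget are assumptions.

```python
import numpy as np


def gaussian_gram(X: np.ndarray, gamma: float) -> np.ndarray:
    """Gram matrix of the Gaussian kernel exp(-gamma * ||x - x'||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)


def kernel_flows_rho(X: np.ndarray, y: np.ndarray, subset, gamma: float,
                     nugget: float = 1e-8) -> float:
    """rho = 1 - (y_s^T K_s^{-1} y_s) / (y^T K^{-1} y) for a subset s of the points.

    rho near 0 means the interpolant barely changes when the other points are
    dropped, which Kernel Flows treats as a sign of a good kernel.
    """
    K = gaussian_gram(X, gamma) + nugget * np.eye(len(X))  # nugget for stability
    idx = np.asarray(subset)
    K_s = K[np.ix_(idx, idx)]
    y_s = y[idx]
    full_norm = y @ np.linalg.solve(K, y)
    sub_norm = y_s @ np.linalg.solve(K_s, y_s)
    return float(1.0 - sub_norm / full_norm)


X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
rho_values = {g: kernel_flows_rho(X, y, subset=[0, 2, 4, 6], gamma=g) for g in (20.0, 200.0)}
```

Learning a kernel then amounts to adjusting its parameters (here only `gamma`) to make $\rho$ small over random subsets.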
[COMMENTS]
An earlier version of this paper appeared at
https://www.researchgate.net/publication/371875631_A_note_on_learning_kernels_from_data_from_an_Algorithmic_Information_Theoretic_point_of_view.
arXiv admin note: text overlap with arXiv:2111.13037, arXiv:2007.05074
[LINK]
http://arxiv.org/abs/2311.12624v3
[DATE]
2024-04-10 19:35:14+08:00
[CATEGORIES]
cs.LG
fairret: a Framework for Differentiable Fairness Regularization Terms
[AUTHORS]
Maarten Buyl, MaryBeth Defrance, Tijl De Bie
[ABSTRACT]
Current fairness toolkits in machine learning only admit a limited range of
fairness definitions and have seen little integration with automatic
differentiation libraries, despite the central role these libraries play in
modern machine learning pipelines.
We introduce a framework of fairness regularization terms (fairrets) which
quantify bias as modular, flexible objectives that are easily integrated in
automatic differentiation pipelines. By employing a general definition of
fairness in terms of linear-fractional statistics, a wide class of fairrets can
be computed efficiently. Experiments show the behavior of their gradients and
their utility in enforcing fairness with minimal loss of predictive power
compared to baselines. Our contribution includes a PyTorch implementation of
the fairret framework.
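The notion of a linear-fractional statistic can be illustrated with demographic parity: each group's soft positive rate is $\sum_i p_i s_{ik} / \sum_i s_{ik}$, a ratio of two linear functions of the predicted probabilities $p_i$ (with $s_{ik}$ the indicator of group $k$). The plain-Python sketch below computes these rates and a simple absolute-deviation penalty. It is not the fairret package's API, and the specific penalty form is an assumption; in an automatic differentiation library the same arithmetic would yield gradients with respect to the predictions.

```python
from typing import Dict, Hashable, List


def group_positive_rates(probs: List[float], groups: List[Hashable]) -> Dict[Hashable, float]:
    """Soft positive rate per group: sum(p_i * s_ik) / sum(s_ik)."""
    rates = {}
    for g in set(groups):
        indicator = [1.0 if gi == g else 0.0 for gi in groups]
        rates[g] = sum(p * s for p, s in zip(probs, indicator)) / sum(indicator)
    return rates


def parity_penalty(probs: List[float], groups: List[Hashable]) -> float:
    """Sum over groups of |group rate - overall rate|; zero under demographic parity."""
    overall = sum(probs) / len(probs)
    return sum(abs(r - overall) for r in group_positive_rates(probs, groups).values())


# Hypothetical use in training: total_loss = task_loss + strength * parity_penalty(probs, groups)
```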
[COMMENTS]
Presented at ICLR 2024
[LINK]
http://arxiv.org/abs/2310.17256v2
[DATE]
2024-04-10 19:22:51+08:00
[CATEGORIES]
cs.LG
A tutorial on learning from preferences and choices with Gaussian Processes
[AUTHORS]
Alessio Benavoli, Dario Azzimonti
[ABSTRACT]
Preference modelling lies at the intersection of economics, decision theory,
machine learning and statistics. By understanding individuals’ preferences and
how they make choices, we can build products that closely match their
expectations, paving the way for more efficient and personalised applications
across a wide range of domains. The objective of this tutorial is to present a
cohesive and comprehensive framework for preference learning with Gaussian
Processes (GPs), demonstrating how to seamlessly incorporate rationality
principles (from economics and decision theory) into the learning process. By
suitably tailoring the likelihood function, this framework enables the
construction of preference learning models that encompass random utility
models, limits of discernment, and scenarios with multiple conflicting
utilities for both object- and label-preference. This tutorial builds upon
established research while simultaneously introducing some novel GP-based
models to address specific gaps in the existing literature.
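Many such models combine a GP prior on a latent utility $f$ with a likelihood for the observed choices. A minimal Python sketch of the classic pairwise probit likelihood, $P(a \succ b \mid f) = \Phi\big((f(a) - f(b))/(\sqrt{2}\,\sigma)\big)$, is shown below. The GP prior, approximate inference, and the tutorial's richer likelihoods (limits of discernment, multiple conflicting utilities) are not shown, and `noise_std` is an assumed parameter name.

```python
import math
from typing import List, Tuple


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_likelihood(f_a: float, f_b: float, noise_std: float = 1.0) -> float:
    """P(a preferred to b) when utilities carry independent N(0, noise_std^2) noise."""
    return std_normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_std))


def log_likelihood(utilities: List[float], comparisons: List[Tuple[int, int]],
                   noise_std: float = 1.0) -> float:
    """Log-likelihood of (winner, loser) index pairs given latent utilities."""
    return sum(math.log(preference_likelihood(utilities[winner], utilities[loser], noise_std))
               for winner, loser in comparisons)


comparisons = [(0, 1), (1, 2), (0, 2)]  # made-up data: item 0 beat item 1, etc.
consistent = log_likelihood([2.0, 0.0, -1.0], comparisons)
reversed_order = log_likelihood([-1.0, 0.0, 2.0], comparisons)
```

Utilities that agree with the observed choices receive a higher log-likelihood, which is what GP inference over $f$ exploits.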
[LINK]
http://arxiv.org/abs/2403.11782v3
[DATE]
2024-04-10 17:44:31+08:00
[CATEGORIES]
cs.LG
Beyond Random Inputs: A Novel ML-Based Hardware Fuzzing
[AUTHORS]
Mohamadreza Rostami, Marco Chilese, Shaza Zeitouni, Rahul Kande, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi
[ABSTRACT]
Modern computing systems heavily rely on hardware as the root of trust.
However, their increasing complexity has given rise to security-critical
vulnerabilities that cross-layer attacks can exploit. Traditional hardware
vulnerability detection methods, such as random regression and formal
verification, have limitations. Random regression, while scalable, is slow in
exploring hardware, and formal verification techniques often require substantial
manual effort and suffer from state explosion. Hardware fuzzing has emerged as an
effective approach to exploring and detecting security vulnerabilities in
large-scale designs like modern processors, outperforming traditional methods
in coverage, scalability, and efficiency. However, state-of-the-art
fuzzers struggle to achieve comprehensive coverage of intricate hardware
designs within a practical timeframe, often falling short of a 70% coverage
threshold. We propose a novel ML-based hardware fuzzer, ChatFuzz, to address
this challenge. Our approach leverages LLMs like ChatGPT to understand processor
language, focusing on machine codes and generating assembly code sequences. RL
is integrated to guide the input generation process by rewarding the inputs
using code coverage metrics. We use the open-source RISC-V-based RocketCore
processor as our testbed. ChatFuzz achieves a condition coverage rate of 75% in
just 52 minutes compared to a state-of-the-art fuzzer, which requires a lengthy
30-hour window to reach a similar condition coverage. Furthermore, our fuzzer
can attain 80% coverage when provided with a limited pool of 10 simulation
instances/licenses within a 130-hour window. During this time, it conducted a
total of 199K test cases, of which 6K produced discrepancies with the
processor’s golden model. Our analysis identified more than 10 unique
mismatches, including two new bugs in the RocketCore and discrepancies from the
RISC-V ISA Simulator.
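The abstract states that RL guides input generation by rewarding inputs with code-coverage metrics. A minimal sketch of such a reward, assuming it simply counts condition-coverage points that an input hits for the first time (ChatFuzz's actual reward shaping is not specified here), is shown below; `coverage_reward` and the coverage identifiers are hypothetical.

```python
from typing import Set


def coverage_reward(covered_by_input: Set[str], global_coverage: Set[str]) -> float:
    """Reward = number of coverage points this input hits for the first time.

    `covered_by_input` holds hypothetical condition-coverage identifiers
    (e.g. "cond_17:true") collected while simulating one generated test;
    `global_coverage` is the fuzzer's running set and is updated in place.
    """
    new_points = covered_by_input - global_coverage
    global_coverage |= new_points  # remember everything covered so far
    return float(len(new_points))


seen: Set[str] = set()
r1 = coverage_reward({"c1:T", "c1:F", "c2:T"}, seen)  # three new points
r2 = coverage_reward({"c1:T", "c3:F"}, seen)          # only "c3:F" is new
print(r1, r2)  # 3.0 1.0
```

Rewarding only newly covered points pushes the generator away from inputs that re-exercise already-explored logic; a real fuzzer would obtain the coverage sets from RTL simulation.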
[LINK]
http://arxiv.org/abs/2404.06856v1
[DATE]
2024-04-10 17:28:54+08:00
[CATEGORIES]
cs.LG
The Topos of Transformer Networks
[AUTHORS]
Mattia Jacopo Villani, Peter McBurney
[ABSTRACT]
The transformer neural network has significantly outshone all other neural
network architectures as the engine behind large language models. We provide a
theoretical analysis of the expressivity of the transformer architecture
through the lens of topos theory. From this viewpoint, we show that many common
neural network architectures, such as the convolutional, recurrent and graph
convolutional networks, can be embedded in a pretopos of piecewise-linear
functions, but that the transformer necessarily lives in its topos completion.
In particular, this suggests that the two network families instantiate
different fragments of logic: the former are first order, whereas transformers
are higher-order reasoners. Furthermore, we draw parallels with architecture
search and gradient descent, integrating our analysis in the framework of
cybernetic agents.
[LINK]
http://arxiv.org/abs/2403.18415v2
[DATE]
2024-04-10 17:24:16+08:00
[CATEGORIES]
cs.LG
Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation
[AUTHORS]
Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Jian-Jia Chen
[ABSTRACT]
Bringing high-level machine learning models to efficient and well-suited
machine implementations often involves a range of tools, e.g., code generators,
compilers, and optimizers. Along such tool chains, abstractions have to be
applied, which leads to suboptimal use of CPU registers. This is a shortcoming,
especially in resource constrained embedded setups. In this work, we present a
code generation approach for decision tree ensembles, which produces machine
assembly code within a single conversion step directly from the high-level
model representation. Specifically, we develop various approaches to
effectively allocate registers for the inference of decision tree ensembles.
Extensive evaluations of the proposed method are conducted in comparison to the
basic realization of C code from the high-level machine learning model and
subsequent compilation. The results show that the performance of decision tree
ensemble inference can be significantly improved (by up to $\approx1.6\times$),
if the methods are applied carefully to the appropriate scenario.
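For context, the baseline the paper compares against turns the trained model into C code that a compiler then translates. A minimal sketch of that style of code generation for a single tree, assuming a plain nested if/else layout (the authors' baseline generator may differ, and their register-allocating assembly generator is not reproduced), could be:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float


@dataclass
class Node:
    feature: int      # index into the input feature vector x
    threshold: float  # go left when x[feature] <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def emit_c(tree: Tree, indent: int = 1) -> str:
    """Recursively emit nested if/else C statements for one decision tree."""
    pad = "    " * indent
    if isinstance(tree, Leaf):
        return f"{pad}return {tree.value!r}f;\n"
    return (
        f"{pad}if (x[{tree.feature}] <= {tree.threshold!r}f) {{\n"
        + emit_c(tree.left, indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(tree.right, indent + 1)
        + f"{pad}}}\n"
    )


def emit_function(tree: Tree, name: str = "tree0") -> str:
    """Wrap the tree body in a C function taking a pointer to the features."""
    return f"float {name}(const float *x) {{\n" + emit_c(tree) + "}\n"


t = Node(0, 0.5, Leaf(1.0), Node(1, 2.5, Leaf(-1.0), Leaf(0.0)))
print(emit_function(t))
```

An ensemble would call one such function per tree and aggregate the results; how the compiler then maps comparisons and thresholds onto CPU registers is what the paper's direct assembly generation aims to improve.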
[LINK]
http://arxiv.org/abs/2404.06846v1
[DATE]
2024-04-10 17:17:22+08:00
[CATEGORIES]
cs.LG
An experimental evaluation of Deep Reinforcement Learning algorithms for HVAC control
[AUTHORS]
Antonio Manjavacas, Alejandro Campoy-Nieves, Javier Jiménez-Raboso, Miguel Molina-Solana, Juan Gómez-Romero
[ABSTRACT]
Heating, Ventilation, and Air Conditioning (HVAC) systems are a major driver
of energy consumption in commercial and residential buildings. Recent studies
have shown that Deep Reinforcement Learning (DRL) algorithms can outperform
traditional reactive controllers. However, DRL-based solutions are generally
designed for ad hoc setups and lack standardization for comparison. To fill
this gap, this paper provides a critical and reproducible evaluation, in terms
of comfort and energy consumption, of several state-of-the-art DRL algorithms
for HVAC control. The study examines the controllers’ robustness, adaptability,
and trade-off between optimization goals by using the Sinergym framework. The
results obtained confirm the potential of DRL algorithms, such as SAC and TD3,
in complex scenarios and reveal several challenges related to generalization
and incremental learning.
[COMMENTS]
Submitted to Artificial Intelligence Review. Under review
[LINK]
http://arxiv.org/abs/2401.05737v2
[DATE]
2024-04-10 17:06:41+08:00
[CATEGORIES]
cs.LG
SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection
[AUTHORS]
Mathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn
[ABSTRACT]
Detecting anomalies in images has become a well-explored problem in both
academia and industry. State-of-the-art algorithms are able to detect defects
in increasingly difficult settings and data modalities. However, most current
methods are not suited to address 3D objects captured from differing poses.
While solutions using Neural Radiance Fields (NeRFs) have been proposed, they
suffer from excessive computation requirements, which hinder real-world
usability. For this reason, we propose the novel 3D Gaussian splatting-based
framework SplatPose which, given multi-view images of a 3D object, accurately
estimates the pose of unseen views in a differentiable manner, and detects
anomalies in them. We achieve state-of-the-art results in both training and
inference speed, and detection performance, even when using less training data
than competing methods. We thoroughly evaluate our framework using the recently
proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly
detection (MAD) data set.
[COMMENTS]
Visual Anomaly and Novelty Detection 2.0 Workshop at CVPR 2024
[LINK]
http://arxiv.org/abs/2404.06832v1
[DATE]
2024-04-10 16:48:09+08:00
[CATEGORIES]
cs.LG
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
[AUTHORS]
Taegyun Kwon, Dasaem Jeong, Juhan Nam
[ABSTRACT]
In recent years, advancements in neural network designs and the availability
of large-scale labeled datasets have led to significant improvements in the
accuracy of piano transcription models. However, most previous work focused on
high-performance offline transcription, neglecting deliberate consideration of
model size. The goal of this work is to implement real-time inference for piano
transcription while ensuring both high performance and a lightweight model. To this
end, we propose novel architectures for convolutional recurrent neural
networks, redesigning an existing autoregressive piano transcription model.
First, we extend the acoustic module by adding a frequency-conditioned FiLM
layer to the CNN module to adapt the convolutional filters on the frequency
axis. Second, we improve note-state sequence modeling by using a pitchwise LSTM
that focuses on note-state transitions within a note. In addition, we augment
the autoregressive connection with an enhanced recursive context. Using these
components, we propose two types of models: one for high performance and the
other for high compactness. Through extensive experiments, we show that the
proposed models are comparable to state-of-the-art models in terms of note
accuracy on the MAESTRO dataset. We also investigate the effective model size
and real-time inference latency by gradually streamlining the architecture.
Finally, we conduct cross-data evaluation on unseen piano datasets and in-depth
analysis to elucidate the effect of the proposed components in the view of note
length and pitch range.
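FiLM modulates features with a learned scale and shift, y = gamma * x + beta. A minimal sketch of a frequency-conditioned variant, assuming gamma and beta are affine functions of the normalised frequency-bin position with one weight/bias pair per channel (the paper's exact parameterisation is not given in the abstract), is:

```python
from typing import List

Matrix = List[List[float]]  # indexed as [frequency_bin][channel]


def freq_film(x: Matrix, w_gamma: List[float], b_gamma: List[float],
              w_beta: List[float], b_beta: List[float]) -> Matrix:
    """Frequency-conditioned FiLM: y[f][c] = gamma_c(p) * x[f][c] + beta_c(p).

    p = f / (F - 1) is the normalised position of frequency bin f; gamma_c and
    beta_c are (hypothetically) affine in p, with one weight/bias per channel.
    """
    n_freq = len(x)
    out: Matrix = []
    for f, row in enumerate(x):
        p = f / (n_freq - 1) if n_freq > 1 else 0.0
        out.append([
            (w_gamma[c] * p + b_gamma[c]) * v + (w_beta[c] * p + b_beta[c])
            for c, v in enumerate(row)
        ])
    return out


x = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]  # 3 frequency bins, 2 channels
y = freq_film(x, w_gamma=[1.0, 0.0], b_gamma=[1.0, 1.0],
              w_beta=[0.0, 0.5], b_beta=[0.0, 0.0])
print(y)  # [[1.0, 2.0], [1.5, 2.25], [2.0, 2.5]]
```

With `w_gamma = w_beta = 0`, `b_gamma = 1` and `b_beta = 0` the layer is the identity, which is one convenient initialisation; nonzero weights let the same convolutional filters respond differently in low and high registers.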
[COMMENTS]
11 pages, 8 figures, preprint
[LINK]
http://arxiv.org/abs/2404.06818v1
[DATE]
2024-04-10 16:06:15+08:00
[CATEGORIES]
cs.LG
Generative Resident Separation and Multi-label Classification for Multi-person Activity Recognition
[AUTHORS]
Xi Chen, Julien Cumin, Fano Ramparany, Dominique Vaufreydaz
[ABSTRACT]
This paper presents two models to address the problem of multi-person
activity recognition using ambient sensors in a home. The first model, Seq2Res,
uses a sequence generation approach to separate sensor events from different
residents. The second model, BiGRU+Q2L, uses a Query2Label multi-label
classifier to predict multiple activities simultaneously. The performance of these
models is compared to a state-of-the-art model in different experimental
scenarios, using a state-of-the-art dataset of two residents in a home
instrumented with ambient sensors. These results lead to a discussion on the
advantages and drawbacks of resident separation and multi-label classification
for multi-person activity recognition.
[COMMENTS]
Context and Activity Modeling and Recognition (CoMoReA) Workshop at
IEEE International Conference on Pervasive Computing and Communications
(PerCom 2024), Mar 2024, Biarritz, France
[LINK]
http://arxiv.org/abs/2404.07245v1
[DATE]
2024-04-10 15:46:30+08:00
[CATEGORIES]
cs.LG
Extracting Clean and Balanced Subset for Noisy Long-tailed Classification
[AUTHORS]
Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan
[ABSTRACT]
Real-world datasets are usually class-imbalanced and corrupted by label
noise. To solve the joint issue of long-tailed distribution and label noise,
most previous works aim to design a noise detector to distinguish the
noisy and clean samples. Despite their effectiveness, they may be limited in
handling the joint issue effectively in a unified way. In this work, we develop
a novel pseudo labeling method using class prototypes from the perspective of
distribution matching, which can be solved with optimal transport (OT). By
setting a manually specified probability measure and using a learned transport
plan to pseudo-label the training samples, the proposed method can reduce the
side-effects of noisy and long-tailed data simultaneously. Then we introduce a
simple yet effective filter criterion by combining the observed labels and
pseudo labels to obtain a more balanced and less noisy subset for robust
model training. Extensive experiments demonstrate that our method can extract
this class-balanced subset with clean labels, which brings effective
performance gains for long-tailed classification with label noise.
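Pseudo-labelling by distribution matching can be illustrated with entropic optimal transport solved by Sinkhorn iterations. The sketch below assumes a sample-to-prototype cost matrix and a uniform (class-balanced) target measure over classes; the cost values, `eps`, and iteration count are illustrative choices, not the paper's settings.

```python
import math
from typing import List


def sinkhorn_pseudo_labels(cost: List[List[float]], col_marginal: List[float],
                           eps: float = 0.1, n_iter: int = 200) -> List[int]:
    """Entropic OT between N samples (uniform mass) and K classes (given mass).

    cost[i][k] could be, e.g., the distance from sample i's feature to class
    prototype k (an assumption here). Returns argmax_k of each row of the plan.
    """
    n, k = len(cost), len(col_marginal)
    row_marginal = [1.0 / n] * n
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iter):
        # Alternately rescale so that rows, then columns, match their marginals.
        u = [row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [col_marginal[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    return [max(range(k), key=lambda j: plan[i][j]) for i in range(n)]


cost = [[0.0, 1.0], [0.1, 0.9], [0.3, 0.5], [0.4, 0.45]]  # every sample is closest to class 0
labels = sinkhorn_pseudo_labels(cost, col_marginal=[0.5, 0.5])
print(labels)  # [0, 0, 1, 1]
```

Because the column marginal is fixed, the plan cannot pile all mass onto the head class, which is what counteracts the long-tailed bias: in this toy, every sample is cheapest to assign to class 0, yet the two least confident ones receive class 1.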
[LINK]
http://arxiv.org/abs/2404.06795v1
[DATE]
2024-04-10 15:34:37+08:00
[CATEGORIES]
cs.LG
A General Theory for Kernel Packets: from state space model to compactly supported basis
[AUTHORS]
Liang Ding, Rui Tuo
[ABSTRACT]
It is well known that the state space (SS) model formulation of a Gaussian
process (GP) can lower its training and prediction time both to $\mathcal{O}(n)$ for
$n$ data points. We prove that an $m$-dimensional SS model formulation of GP is
equivalent to a concept we introduce as the general right Kernel Packet (KP): a
transformation for the GP covariance $K$ such that
$\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$ holds for any $t \leq t_1$, $0 \leq j \leq m-1$,
and $m+1$ consecutive points $t_i$, where $D_t^{(j)}f(t)$ denotes
the $j$-th derivative acting on $t$. We extend this idea to the backward SS model
formulation, leading to the left KP for the next $m$ consecutive points:
$\sum_{i=0}^{m}b_iD_t^{(j)}K(t,t_{m+i})=0$ for any $t\geq t_{2m}$. By
combining both left and right KPs, we can prove that a suitable linear
combination of these covariance functions yields $m$ KP functions compactly
supported on $(t_0,t_{2m})$. KPs improve GP prediction time to
$\mathcal{O}(\log n)$ or $\mathcal{O}(1)$, enable broader applications
including GP’s derivatives and kernel multiplications, and can be generalized
to multi-dimensional additive and product kernels for scattered data.
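The compact-support claim can be checked numerically in the simplest case. Assuming the exponential (Matérn-1/2) kernel, whose SS representation has dimension m = 1, requiring the combination to vanish both to the left of the first point and to the right of the last of 2m + 1 = 3 points gives two linear equations in three coefficients, so one (m = 1) compactly supported function exists. This sketch covers only that case; for m = 1 only the j = 0 (no-derivative) condition applies.

```python
import math
from typing import Tuple


def exp_kernel(t: float, s: float) -> float:
    """Matérn-1/2 (exponential) covariance; its SS form has dimension m = 1."""
    return math.exp(-abs(t - s))


def kp_coefficients(t0: float, t1: float, t2: float) -> Tuple[float, float, float]:
    """Coefficients c with sum_i c_i K(t, t_i) = 0 for t <= t0 and for t >= t2.

    For t <= t0, K(t, t_i) = e^t * e^{-t_i}; for t >= t2, K(t, t_i) = e^{-t} * e^{t_i}.
    Both conditions are linear in c, so the cross product of the two
    coefficient vectors spans the null space of this 2x3 system.
    """
    u = [math.exp(-t) for t in (t0, t1, t2)]  # left-tail condition
    v = [math.exp(t) for t in (t0, t1, t2)]   # right-tail condition
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def kp(t: float, pts: Tuple[float, float, float]) -> float:
    """Evaluate the compactly supported combination of kernel columns at t."""
    c = kp_coefficients(*pts)
    return sum(ci * exp_kernel(t, ti) for ci, ti in zip(c, pts))


pts = (0.0, 1.0, 2.0)
for t in (-1.0, 0.5, 1.0, 1.5, 3.0):
    print(f"{t:5.1f}  {kp(t, pts): .6f}")
```

The printed values should be numerically zero at t = -1 and t = 3 and clearly nonzero inside (0, 2), i.e. the function is supported on $(t_0, t_{2m})$.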
[LINK]
http://arxiv.org/abs/2402.04022v4
[DATE]
2024-04-10 15:24:59+08:00
[CATEGORIES]
cs.LG
Neural Architecture Search via Two Constant Shared Weights Initialisations
[AUTHORS]
Ekaterina Gracheva
[ABSTRACT]
In recent years, zero-cost metrics have been gaining ground in neural architecture
search (NAS). These metrics allow finding the optimal neural network for a
given task faster and with a lower computational load than conventional NAS
methods. Equally important is that they also shed some light on the internal
workings of neural architectures. This paper presents a zero-cost metric that is
highly correlated with the train set accuracy across the NAS-Bench-101,
NAS-Bench-201 and NAS-Bench-NLP benchmark datasets. We evaluate a neural
architecture’s potential based on the outputs’ statistics after two constant
shared weights initialisations. For this, we only use an unlabelled mini-batch
of data. We observe that the dispersion of the outputs between two
initialisations positively correlates with trained accuracy. The correlation
further improves when we normalise dispersion by average output magnitude. The
resulting metric, epsilon, does not require gradient computation and frees
the NAS procedure from training hyperparameters, loss metrics and
human-labelled data. Our method is easy to integrate within existing NAS
algorithms and takes a fraction of a second to evaluate a single network. The
code supporting this study can be found on GitHub at
https://github.com/egracheva/epsinas.
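The epsilon metric needs only an unlabelled mini-batch and two forward passes with constant shared weights. The sketch below illustrates that workflow on a toy fully connected network; the normalisation (mean absolute difference divided by mean absolute output), the tanh toy network, and the constants `w1`, `w2` are assumptions, and the paper's exact definition may differ.

```python
import math
from typing import List


def forward_constant(x: List[float], widths: List[int], w: float) -> float:
    """Toy fully connected net in which every weight equals the shared constant w.

    No biases, tanh activations. With identical weights every unit in a layer
    receives the same pre-activation, so one scalar summarises each layer.
    """
    h = x
    for width in widths:
        pre = w * sum(h)
        h = [math.tanh(pre)] * width
    return h[0]


def epsilon_score(batch: List[List[float]], widths: List[int],
                  w1: float = 0.1, w2: float = 1.0) -> float:
    """Hypothetical epsilon-style score: dispersion between the outputs obtained
    with two constant initialisations, normalised by the mean output magnitude.
    Uses no labels and no gradients."""
    o1 = [forward_constant(x, widths, w1) for x in batch]
    o2 = [forward_constant(x, widths, w2) for x in batch]
    dispersion = sum(abs(a - b) for a, b in zip(o1, o2)) / len(batch)
    magnitude = sum(abs(a) + abs(b) for a, b in zip(o1, o2)) / (2 * len(batch))
    return dispersion / magnitude if magnitude > 0 else 0.0


batch = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.1, 0.0]]  # unlabelled inputs
for widths in ([4, 4, 1], [16, 1]):
    print(widths, round(epsilon_score(batch, widths), 4))
```

Scoring a candidate costs two forward passes on one mini-batch, which is why such metrics can rank many architectures in seconds.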
[LINK]
http://arxiv.org/abs/2302.04406v2
[DATE]
2024-04-10 15:12:31+08:00
[CATEGORIES]
cs.LG
Private Wasserstein Distance with Random Noises
[AUTHORS]
Wenqian Li, Haozhi Wang, Zhe Huang, Yan Pang
[ABSTRACT]
Wasserstein distance is a principled measure of data divergence from a
distributional standpoint. However, its application becomes challenging in the
context of data privacy, where sharing raw data is restricted. Prior attempts
have employed techniques like Differential Privacy or Federated optimization to
approximate Wasserstein distance. Nevertheless, these approaches often lack
accuracy and robustness against potential attacks. In this study, we investigate
the underlying triangular properties within the Wasserstein space, leading to a
straightforward solution named TriangleWad. This approach enables the
computation of Wasserstein distance between datasets stored across different
entities. Notably, TriangleWad is 20 times faster, keeps raw data information
truly invisible, and enhances resilience against attacks without sacrificing
estimation accuracy. Through comprehensive experimentation across various tasks
involving both image and text data, we demonstrate its superior performance and
generalization.
[LINK]
http://arxiv.org/abs/2404.06787v1
[DATE]
2024-04-10 14:58:58+08:00
[CATEGORIES]
cs.LG
Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data
[AUTHORS]
Yu Qiao, Chaoning Zhang, Apurba Adhikary, Choong Seon Hong
[ABSTRACT]
Federated learning (FL) is a privacy-preserving distributed framework for
collaborative model training on devices in edge networks. However, challenges
arise due to vulnerability to adversarial examples (AEs) and the
non-independent and identically distributed (non-IID) nature of data
distribution among devices, hindering the deployment of adversarially robust
and accurate learning models at the edge. While adversarial training (AT) is
commonly acknowledged as an effective defense strategy against adversarial
attacks in centralized training, we shed light on the adverse effects of
directly applying AT in FL, which can severely compromise accuracy, especially in
non-IID settings. Given this limitation, this paper proposes FatCC, which
incorporates local logit Calibration and global feature
Contrast into the vanilla federated adversarial training
(FAT) process from both logit and feature perspectives. This
approach can effectively enhance the federated system’s robust accuracy (RA)
and clean accuracy (CA). First, we propose logit calibration, where the logits
are calibrated during local adversarial updates, thereby improving adversarial
robustness. Second, FatCC introduces feature contrast, which involves a global
alignment term that aligns each local representation with unbiased global
features, thus further enhancing robustness and accuracy in federated
adversarial environments. Extensive experiments across multiple datasets
demonstrate that FatCC achieves comparable or superior performance gains in
both CA and RA compared to other baselines.
[LINK]
http://arxiv.org/abs/2404.06776v1
[DATE]
2024-04-10 14:35:25+08:00
[CATEGORIES]
cs.LG
An inclusive review on deep learning techniques and their scope in handwriting recognition
[AUTHORS]
Sukhdeep Singh, Sudhir Rohilla, Anuj Sharma
[ABSTRACT]
Deep learning refers to a category of machine learning algorithms that have
the capability to combine raw inputs into intermediate feature layers. These
deep learning algorithms have demonstrated great results in different fields.
In particular, deep learning has achieved human-level
performance across a number of domains in computer vision and pattern
recognition. To achieve state-of-the-art performance in diverse
domains, deep learning has used different architectures, and these architectures
use activation functions to perform various computations between hidden and
output layers. This paper presents a survey of existing
studies of deep learning in the handwriting recognition field. Although
recent progress indicates that deep learning methods have provided valuable
means for speeding up or providing accurate results in handwriting recognition,
the extensive literature survey conducted here finds
that deep learning has yet to revolutionize the field and must still resolve many of
its most pressing challenges, though promising advances have been
made on the prior state of the art. Additionally, the limited availability of
labelled training data poses problems in this domain. Nevertheless, the
present handwriting recognition survey foresees deep learning enabling changes
at both bench and bedside, with the potential to transform several domains such as
image processing, speech recognition, computer vision, machine translation,
robotics and control, medical imaging, medical information processing,
bio-informatics, natural language processing, cyber security, and many others.
[LINK]
http://arxiv.org/abs/2404.08011v1
[DATE]
2024-04-10 14:30:33+08:00
[CATEGORIES]
cs.LG
Unsupervised Learning for Solving the Travelling Salesman Problem
[AUTHORS]
Yimeng Min, Yiwei Bai, Carla P. Gomes
[ABSTRACT]
We propose UTSP, an unsupervised learning (UL) framework for solving the
Travelling Salesman Problem (TSP). We train a Graph Neural Network (GNN) using
a surrogate loss. The GNN outputs a heat map representing the probability for
each edge to be part of the optimal path. We then apply local search to
generate our final prediction based on the heat map. Our loss function consists
of two parts: one pushes the model to find the shortest path and the other
serves as a surrogate for the constraint that the route should form a
Hamiltonian Cycle. Experimental results show that UTSP outperforms the existing
data-driven TSP heuristics. Our approach is parameter efficient as well as data
efficient: the model takes $\sim$ 10\% of the number of parameters and $\sim$
0.2\% of training samples compared with reinforcement learning or supervised
learning methods.
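The abstract says local search turns the GNN heat map into the final tour but does not name the procedure. As an assumption, the sketch below decodes greedily along the highest-heat edges and then refines with standard 2-opt; the heat map is hand-made rather than produced by a trained GNN.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(tour: List[int], pts: List[Point]) -> float:
    """Length of the closed cycle visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def greedy_from_heatmap(heat: List[List[float]]) -> List[int]:
    """Start at city 0 and repeatedly follow the unvisited edge with the highest heat."""
    n = len(heat)
    tour, visited = [0], {0}
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: heat[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour


def two_opt(tour: List[int], pts: List[Point]) -> List[int]:
    """Plain 2-opt: keep reversing segments while that shortens the cycle."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                cand = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                if tour_length(cand, pts) < tour_length(best, pts) - 1e-12:
                    best, improved = cand, True
    return best


pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit square
heat = [  # hand-made "edge probabilities" standing in for GNN output
    [0.0, 0.1, 0.9, 0.1],
    [0.1, 0.0, 0.2, 0.8],
    [0.9, 0.7, 0.0, 0.1],
    [0.1, 0.8, 0.1, 0.0],
]
start = greedy_from_heatmap(heat)  # [0, 2, 1, 3], a self-crossing tour
best = two_opt(start, pts)         # [0, 1, 2, 3]
print(start, round(tour_length(start, pts), 3), best, tour_length(best, pts))
```

A good heat map makes the greedy start nearly optimal, so the local search has little left to fix; a poor one, as in this toy, leaves crossings for 2-opt to remove.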
[COMMENTS]
NeurIPS 2023 Camera-ready version fix typos in appendix
[LINK]
http://arxiv.org/abs/2303.10538v2
[DATE]
2024-04-10 13:59:10+08:00
[CATEGORIES]
cs.LG
CrimeAlarm: Towards Intensive Intent Dynamics in Fine-grained Crime Prediction
[AUTHORS]
Kaixi Hu, Lin Li, Qing Xie, Xiaohui Tao, Guandong Xu
[ABSTRACT]
Granularity and accuracy are two crucial factors for crime event prediction.
Within fine-grained event classification, multiple criminal intents may
alternately appear in preceding sequential events and progress differently in
subsequent ones. Such intensive intent dynamics make it hard for trained models to capture
unobserved intents, and thus lead to sub-optimal generalization performance,
especially when numerous potential events are intertwined. To capture
comprehensive criminal intents, this paper proposes a fine-grained sequential
crime prediction framework, CrimeAlarm, equipped with a novel mutual
distillation strategy inspired by curriculum learning. During the early
training phase, spot-shared criminal intents are captured through
high-confidence sequence samples. In the later phase, spot-specific intents are
gradually learned by increasing the contribution of low-confidence sequences.
Meanwhile, the output probability distributions are reciprocally learned
between prediction networks to model unobserved criminal intents. Extensive
experiments show that CrimeAlarm outperforms state-of-the-art methods in terms
of NDCG@5, with improvements of 4.51% on NYC16 and 7.73% on CHI18 in
accuracy measures.
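Results are reported in NDCG@5. Assuming each prediction has exactly one ground-truth crime category (so the ideal DCG is 1), the metric reduces to 1/log2(rank + 1) when the true category is ranked within the top five, as in this sketch with hypothetical category names:

```python
import math
from typing import List, Sequence


def ndcg_at_k(ranked: Sequence[str], target: str, k: int = 5) -> float:
    """NDCG@k with one relevant item: ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for rank, label in enumerate(ranked[:k], start=1):
        if label == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0  # true category not in the top k


def mean_ndcg(predictions: List[Sequence[str]], targets: List[str], k: int = 5) -> float:
    """Average NDCG@k over all predicted next events."""
    return sum(ndcg_at_k(p, t, k) for p, t in zip(predictions, targets)) / len(targets)


preds = [
    ["theft", "assault", "burglary", "robbery", "fraud"],
    ["assault", "burglary", "theft", "robbery", "fraud"],
    ["assault", "burglary", "robbery", "fraud", "arson", "theft"],
]
print(mean_ndcg(preds, ["theft", "theft", "theft"]))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```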
[COMMENTS]
Accepted by DASFAA 2024
[LINK]
http://arxiv.org/abs/2404.06756v1
[DATE]
2024-04-10 13:44:28+08:00
[CATEGORIES]
cs.LG
BONES: Near-Optimal Neural-Enhanced Video Streaming
[AUTHORS]
Lingdong Wang, Simran Singh, Jacob Chakareski, Mohammad Hajiesmaili, Ramesh K. Sitaraman
[ABSTRACT]
Accessing high-quality video content can be challenging due to insufficient
and unstable network bandwidth. Recent advances in neural enhancement have
shown promising results in improving the quality of degraded videos through
deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach
into video streaming, allowing users to download low-quality video segments and
then enhance them to obtain high-quality content without disrupting playback
of the video stream. We introduce BONES, an NES control algorithm that jointly
manages the network and computational resources to maximize the quality of
experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization
problem and solves it in an online manner with near-optimal performance, making
it the first NES algorithm to provide a theoretical performance guarantee.
Comprehensive experimental results indicate that BONES increases QoE by 5\% to
20\% over state-of-the-art algorithms with minimal overhead. Our code is
available at https://github.com/UMass-LIDS/bones.
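BONES is formulated as a Lyapunov optimization problem solved online. The generic drift-plus-penalty rule behind such methods picks, in each slot, the action that maximises V * utility - Q * cost for a virtual queue Q. The sketch below applies that textbook rule to a hypothetical action set; BONES's actual queues, objective, and actions are defined in the paper and are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    name: str
    quality: float    # hypothetical QoE contribution of this choice
    time_cost: float  # hypothetical download + enhancement time (s)


def choose_action(actions: List[Action], queue: float, v: float) -> Action:
    """Drift-plus-penalty: maximise v * quality - queue * time_cost in this slot."""
    return max(actions, key=lambda a: v * a.quality - queue * a.time_cost)


def update_queue(queue: float, time_cost: float, segment_duration: float) -> float:
    """Virtual queue that grows when a slot's work exceeds the playback it buys."""
    return max(queue + time_cost - segment_duration, 0.0)


actions = [
    Action("low", quality=1.0, time_cost=0.5),
    Action("high", quality=3.0, time_cost=1.5),
    Action("low+enhance", quality=2.5, time_cost=0.9),
]
for q in (0.0, 1.0, 5.0):
    print(q, choose_action(actions, queue=q, v=1.0).name)
```

A larger `v` favours QoE, a growing queue shifts choices toward cheaper actions, and in between the low-quality-plus-enhancement option wins, which is the kind of joint network/compute trade-off NES control has to make.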
[LINK]
http://arxiv.org/abs/2310.09920v2
[DATE]
2024-04-10 13:39:23+08:00
[CATEGORIES]
cs.LG
CGNSDE: Conditional Gaussian Neural Stochastic Differential Equation for Modeling Complex Systems and Data Assimilation
[AUTHORS]
Chuanqi Chen, Nan Chen, Jin-Long Wu
[ABSTRACT]
A new knowledge-based and machine learning hybrid modeling approach, called
conditional Gaussian neural stochastic differential equation (CGNSDE), is
developed to facilitate modeling complex dynamical systems and implementing
analytic formulae of the associated data assimilation (DA). In contrast to the
standard neural network predictive models, the CGNSDE is designed to
effectively tackle both forward prediction tasks and inverse state estimation
problems. The CGNSDE starts by using systematic causal inference via
information theory to build a simple knowledge-based nonlinear model that
nevertheless captures as much explainable physics as possible. Then, neural
networks are added to the knowledge-based model in a specific way, which
not only characterizes the remaining features that are challenging to model
with simple forms but also advances the use of analytic formulae to efficiently
compute the nonlinear DA solution. These analytic formulae are used as an
additional computationally affordable loss to train the neural networks,
which directly improves the DA accuracy. This DA loss function encourages the CGNSDE to
capture the interactions between state variables and thus advances its modeling
skills. With the DA loss, the CGNSDE is more capable of estimating extreme
events and quantifying the associated uncertainty. Furthermore, crucial
physical properties in many complex systems, such as the translation-invariant
local dependence of state variables, can significantly simplify the neural
network structures and facilitate applying the CGNSDE to high-dimensional
systems. Numerical experiments based on chaotic systems with intermittency and
strong non-Gaussian features indicate that the CGNSDE outperforms
knowledge-based regression models, and the DA loss further enhances the
modeling skills of the CGNSDE.
[LINK]
http://arxiv.org/abs/2404.06749v1
[DATE]
2024-04-10 13:32:03+08:00
[CATEGORIES]
cs.LG
Discovering Closed-Loop Failures of Vision-Based Controllers via Reachability Analysis
[AUTHORS]
Kaustav Chakraborty, Somil Bansal
[ABSTRACT]
Machine learning driven image-based controllers allow robotic systems to take
intelligent actions based on the visual feedback from their environment.
Understanding when these controllers might lead to system safety violations is
important for their integration in safety-critical applications and engineering
corrective safety measures for the system. Existing methods leverage
simulation-based testing (or falsification) to find the failures of
vision-based controllers, i.e., the visual inputs that lead to closed-loop
safety violations. However, these techniques do not scale well to scenarios
involving high-dimensional and complex visual inputs, such as RGB images. In
this work, we cast the problem of finding closed-loop vision failures as a
Hamilton-Jacobi (HJ) reachability problem. Our approach blends simulation-based
analysis with HJ reachability methods to compute an approximation of the
backward reachable tube (BRT) of the system, i.e., the set of unsafe states for
the system under vision-based controllers. Utilizing the BRT, we can tractably
and systematically find the system states and corresponding visual inputs that
lead to closed-loop failures. These visual inputs can be subsequently analyzed
to find the input characteristics that might have caused the failure. Besides
its scalability to high-dimensional visual inputs, an explicit computation of the
BRT allows the proposed approach to capture non-trivial system failures that
are difficult to expose via random simulations. We demonstrate our framework on
two case studies involving an RGB image-based neural network controller for (a)
autonomous indoor navigation, and (b) autonomous aircraft taxiing.
[LINK]
http://arxiv.org/abs/2211.02736v4
[DATE]
2024-04-10 12:51:33+08:00
[CATEGORIES]
cs.LG
A Copula Graphical Model for Multi-Attribute Data using Optimal Transport
[AUTHORS]
Qi Zhang, Bing Li, Lingzhou Xue
[ABSTRACT]
Motivated by modern data forms such as images and multi-view data, the
multi-attribute graphical model aims to explore the conditional independence
structure among vectors. Under the Gaussian assumption, the conditional
independence between vectors is characterized by blockwise zeros in the
precision matrix. To relax the restrictive Gaussian assumption, in this paper,
we introduce a novel semiparametric multi-attribute graphical model based on a
new copula named Cyclically Monotone Copula. This new copula treats the
distribution of the node vectors as multivariate marginals and transforms them
into Gaussian distributions based on the optimal transport theory. Since the
model allows the node vectors to have arbitrary continuous distributions, it is
more flexible than the classical Gaussian copula method that performs
coordinatewise Gaussianization. We establish the concentration inequalities of
the estimated covariance matrices and provide sufficient conditions for
selection consistency of the group graphical lasso estimator. For the setting
with high-dimensional attributes, a Projected Cyclically Monotone Copula
model is proposed to address the curse of dimensionality that arises from
solving high-dimensional optimal transport problems. Numerical results based on
synthetic and real data show the efficiency and flexibility of our methods.
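For comparison, the classical Gaussian copula approach Gaussianizes each coordinate separately, typically via normal scores of the ranks. A minimal sketch of that coordinatewise baseline (not the proposed Cyclically Monotone Copula, which transports whole node vectors jointly via optimal transport) is:

```python
from statistics import NormalDist
from typing import List


def normal_scores(column: List[float]) -> List[float]:
    """Coordinatewise Gaussianization: z_i = Phi^{-1}(rank_i / (n + 1)).

    Ties are broken by position; ranks start at 1 so the argument stays in (0, 1).
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# A heavily skewed coordinate becomes symmetric normal scores.
z = normal_scores([3.0, 100.0, 0.5, 7.0])
print([round(v, 3) for v in z])
```

Applying this to each attribute on its own preserves within-coordinate ranks but ignores dependence among the entries of a node vector, which is the limitation the vector-valued copula targets.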
[COMMENTS]
37 pages
[LINK]
http://arxiv.org/abs/2404.06735v1
[DATE]
2024-04-10 12:49:00+08:00
[CATEGORIES]
cs.LG
Enhancing Safety in Mixed Traffic: Learning-Based Modeling and Efficient Control of Autonomous and Human-Driven Vehicles
[AUTHORS]
Jie Wang, Yash Vardhan Pant, Lei Zhao, Michał Antkiewicz, Krzysztof Czarnecki
[ABSTRACT]
With the increasing presence of autonomous vehicles (AVs) on public roads,
developing robust control strategies to navigate the uncertainty of
human-driven vehicles (HVs) is crucial. This paper introduces an advanced
method for modeling HV behavior, combining a first-principles model with
Gaussian process (GP) learning to enhance velocity prediction accuracy and
provide a measurable uncertainty. We validated this innovative HV model using
real-world data from field experiments and applied it to develop a GP-enhanced
model predictive control (GP-MPC) strategy. This strategy aims to improve
safety in mixed vehicle platoons by integrating uncertainty assessment into
distance constraints. Comparative simulation studies with a conventional model
predictive control (MPC) approach demonstrated that our GP-MPC strategy ensures
more reliable safe distancing and fosters efficient vehicular dynamics,
achieving notably higher speeds within the platoon. By incorporating a sparse
GP technique in HV modeling and adopting a dynamic GP prediction within the MPC
framework, we significantly reduced the computation time of GP-MPC, making it
only 4.6% higher than that of the conventional MPC. This represents a
substantial improvement, making the process about 100 times faster than our
preliminary work without these approximations. Our findings underscore the
effectiveness of learning-based HV modeling in enhancing both safety and
operational efficiency in mixed-traffic environments, paving the way for more
harmonious AV-HV interactions.
[COMMENTS]
in IEEE Transactions on Intelligent Transportation Systems, 2024
[LINK]
http://arxiv.org/abs/2404.06732v1
[DATE]
2024-04-10 12:36:24+08:00
[CATEGORIES]
cs.LG
Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems
[AUTHORS]
Moise Blanchard
[ABSTRACT]
In this paper we provide oracle complexity lower bounds for finding a point
in a given set using a memory-constrained algorithm that has access to a
separation oracle. We assume that the set is contained within the unit
$d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This
setup is commonly referred to as the feasibility problem. We show that to solve
feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any
deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make
at least $1/(d^{0.01\delta }\epsilon^{2\frac{1-\delta}{1+1.01 \delta}-o(1)})$
oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized
algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}
\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$.
Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$
but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is
Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results
show that the oracle complexity for deterministic algorithms is always
polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in
$d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2
\ln1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln
1/\epsilon)$ queries.
[LINK]
http://arxiv.org/abs/2404.06720v1
[DATE]
2024-04-10 12:15:50+08:00
[CATEGORIES]
cs.LG
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
[AUTHORS]
Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji
[ABSTRACT]
Generative adversarial networks (GANs) learn a target probability
distribution by optimizing a generator and a discriminator with minimax
objectives. This paper addresses the question of whether such optimization
actually provides the generator with gradients that make its distribution close
to the target distribution. We derive metrizable conditions, i.e., sufficient
conditions for the discriminator to serve as a distance between the
distributions, by connecting the GAN formulation with the concept of sliced
optimal transport. Furthermore, by leveraging these theoretical results, we
propose a novel GAN training scheme, called slicing adversarial network (SAN).
With only simple modifications, a broad class of existing GANs can be converted
to SANs. Experiments on synthetic and image datasets support our theoretical
results and demonstrate SAN’s effectiveness compared with standard GANs. We
also apply SAN to StyleGAN-XL, which yields the state-of-the-art FID score
among GANs for class-conditional generation on ImageNet 256$\times$256. Our
implementation is available at https://ytakida.github.io/san.
[COMMENTS]
34 pages with 17 figures, accepted for publication in ICLR 2024
[LINK]
http://arxiv.org/abs/2301.12811v4
[DATE]
2024-04-10 12:03:06+08:00
[CATEGORIES]
cs.LG
Verification of Neural Reachable Tubes via Scenario Optimization and Conformal Prediction
[AUTHORS]
Albert Lin, Somil Bansal
[ABSTRACT]
Learning-based approaches for controlling safety-critical systems are rapidly
growing in popularity; thus, it is important to assure their performance and
safety. Hamilton-Jacobi (HJ) reachability analysis is a popular formal
verification tool for providing such guarantees, since it can handle general
nonlinear system dynamics, bounded adversarial system disturbances, and state
and input constraints. However, its computational and memory complexity scales
exponentially with the state dimension, making it intractable for large-scale
systems. To overcome this challenge, neural approaches, such as DeepReach, have
been used to synthesize reachable tubes and safety controllers for
high-dimensional systems. However, verifying these neural reachable tubes
remains challenging. In this work, we propose two verification methods, based
on robust scenario optimization and conformal prediction, to provide
probabilistic safety guarantees for neural reachable tubes. Our methods allow a
direct trade-off between resilience to outlier errors in the neural tube, which
are inevitable in a learning-based approach, and the strength of the
probabilistic safety guarantee. Furthermore, we show that split conformal
prediction, a widely used method in the machine learning community for
uncertainty quantification, reduces to a scenario-based approach, making the
two methods equivalent not only for verification of neural reachable tubes but
also more generally. To our knowledge, our proof is the first in the literature
to show a strong relationship between conformal prediction and scenario
optimization. Finally, we propose an outlier-adjusted verification approach
that uses the error distribution in neural reachable tubes to recover greater
safe volumes. We demonstrate the efficacy of the proposed approaches for the
high-dimensional problems of multi-vehicle collision avoidance and rocket
landing with no-go zones.
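For readers unfamiliar with split conformal prediction, here is a minimal Python sketch of its generic calibration step. It is not the authors' verification procedure for neural reachable tubes. Given n calibration scores, the threshold is the ceil((n + 1)(1 - alpha))-th smallest score, and a fresh exchangeable score falls at or below it with probability at least 1 - alpha. The function name `split_conformal_threshold` and the half-normal toy scores are illustrative assumptions.

```python
import math
import random


def split_conformal_threshold(calib_scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score.

    Assuming the calibration scores and a future test score are exchangeable,
    P(test_score <= threshold) >= 1 - alpha.
    """
    n = len(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only +inf gives the guarantee.
        return math.inf
    return sorted(calib_scores)[k - 1]


# Toy usage (hypothetical data): the scores stand in for a learned model's errors.
rng = random.Random(0)
calib = [abs(rng.gauss(0.0, 1.0)) for _ in range(1000)]
threshold = split_conformal_threshold(calib, alpha=0.1)

fresh = [abs(rng.gauss(0.0, 1.0)) for _ in range(20000)]
coverage = sum(s <= threshold for s in fresh) / len(fresh)
print(f"threshold={threshold:.3f}, empirical coverage={coverage:.3f}")
```

The (n + 1) in the quantile index is a finite-sample correction. It makes the coverage guarantee hold for any calibration-set size rather than only asymptotically.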
[COMMENTS]
Accepted to 6th Annual Learning for Dynamics & Control Conference.
arXiv admin note: text overlap with arXiv:2209.12336
[LINK]
http://arxiv.org/abs/2312.08604v2
[DATE]
2024-04-10 11:29:32+08:00
[CATEGORIES]
cs.LG
How to Craft Backdoors with Unlabeled Data Alone?
[AUTHORS]
Yifei Wang, Wenhan Ma, Yisen Wang
[ABSTRACT]
Relying only on unlabeled data, self-supervised learning (SSL) can learn rich
features in an economical and scalable way. As the workhorse for building
foundation models, SSL has recently received a lot of attention and found wide
application, which also raises security concerns, with backdoor attacks being a
major type of threat: if the released dataset is maliciously poisoned,
backdoored SSL models can behave badly when triggers are injected into test
samples. The goal of this work is to investigate this potential risk. We note
that existing backdoor attacks all require a considerable amount of \emph{labeled}
data, which may not be available for SSL. To circumvent this limitation, we
explore a more restrictive setting called no-label backdoors, in which only
unlabeled data are available and the key challenge is how to select the
proper poison set without using label information. We propose two
strategies for poison selection: clustering-based selection using pseudolabels,
and contrastive selection derived from the mutual information principle.
Experiments on CIFAR-10 and ImageNet-100 show that both no-label backdoors are
effective on many SSL methods and outperform random poisoning by a large
margin. Code will be available at https://github.com/PKU-ML/nlb.
[COMMENTS]
Accepted at ICLR 2024 Workshop on Data Problems for Foundation Models
(DPFM)
[LINK]
http://arxiv.org/abs/2404.06694v1
[DATE]
2024-04-10 10:54:18+08:00
[CATEGORIES]
cs.LG
Latent Chemical Space Searching for Plug-in Multi-objective Molecule Generation
[AUTHORS]
Ningfeng Liu, Jie Yu, Siyu Xiu, Xinfang Zhao, Siyu Lin, Bo Qiang, Ruqiu Zheng, Hongwei Jin, Liangren Zhang, Zhenming Liu
[ABSTRACT]
Molecular generation, an essential method for identifying new drug
structures, has been supported by advancements in machine learning and
computational technology. However, challenges remain in multi-objective
generation, model adaptability, and practical application in drug discovery. In
this study, we developed a versatile ‘plug-in’ molecular generation model that
incorporates multiple objectives related to target affinity, drug-likeness, and
synthesizability, facilitating its application in various drug development
contexts. We improved Particle Swarm Optimization (PSO) for drug discovery and,
through comparative experiments, identified PSO-ENP as the best-performing
variant for multi-objective molecular generation and optimization. The model
also incorporates a novel target-ligand affinity predictor, which enhances its
utility by supporting three-dimensional information and improving synthetic
feasibility. Case studies on generating and optimizing large, drug-like marine
natural products underscore PSO-ENP’s effectiveness and demonstrate its
considerable potential for practical drug discovery applications.
[LINK]
http://arxiv.org/abs/2404.06691v1
[DATE]
2024-04-10 10:37:24+08:00
[CATEGORIES]
cs.LG
Zipformer: A faster and better encoder for automatic speech recognition
[AUTHORS]
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
[ABSTRACT]
The Conformer has become the most popular encoder model for automatic speech
recognition (ASR). It adds convolution modules to a transformer to learn both
local and global dependencies. In this work we describe a faster, more
memory-efficient, and better-performing transformer, called Zipformer. Modeling
changes include: 1) a U-Net-like encoder structure in which middle stacks operate
at lower frame rates; 2) a reorganized block structure with more modules, within
which we re-use attention weights for efficiency; 3) a modified form of
LayerNorm, called BiasNorm, that retains some length information; and 4) new
activation functions, SwooshR and SwooshL, that work better than Swish. We also
propose a new optimizer, called ScaledAdam, which scales the update by each
tensor’s current scale to keep the relative change about the same, and also
explicitly learns the parameter scale. ScaledAdam achieves faster convergence and
better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and
WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer
over other state-of-the-art ASR models. Our code is publicly available at
https://github.com/k2-fsa/icefall.
[COMMENTS]
Published as a conference paper at ICLR 2024
[LINK]
http://arxiv.org/abs/2310.11230v4
[DATE]
2024-04-10 10:35:38+08:00
[CATEGORIES]
cs.LG
Causal Representation Learning from Multiple Distributions: A General Setting
[AUTHORS]
Kun Zhang, Shaoan Xie, Ignavier Ng, Yujia Zheng
[ABSTRACT]
In many problems, the measured variables (e.g., image pixels) are just
mathematical functions of the hidden causal variables (e.g., the underlying
concepts or objects). For the purpose of making predictions in changing
environments or making proper changes to the system, it is helpful to recover
the hidden causal variables $Z_i$ and their causal relations represented by
graph $\mathcal{G}_Z$. This problem has recently become known as causal
representation learning. This paper is concerned with a general, completely
nonparametric setting of causal representation learning from multiple
distributions (arising from heterogeneous data or nonstationary time series),
without assuming hard interventions behind distribution changes. We aim to
develop general solutions in this fundamental case; as a by-product, this helps
reveal the unique benefit offered by other assumptions such as parametric causal
models or hard interventions. We show that under the sparsity constraint on the
recovered graph over the latent variables and suitable sufficient change
conditions on the causal influences, interestingly, one can recover the
moralized graph of the underlying directed acyclic graph, and the recovered
latent variables and their relations are related to the underlying causal model
in a specific, nontrivial way. In some cases, each latent variable can even be
recovered up to component-wise transformations. Experimental results verify our
theoretical claims.
[LINK]
http://arxiv.org/abs/2402.05052v2
[DATE]
2024-04-10 10:08:29+08:00
[CATEGORIES]
cs.LG
Causal Unit Selection using Tractable Arithmetic Circuits
[AUTHORS]
Haiying Huang, Adnan Darwiche
[ABSTRACT]
The unit selection problem aims to find objects, called units, that optimize
a causal objective function which describes the objects’ behavior in a causal
context (e.g., selecting customers who are about to churn but would most likely
change their mind if encouraged). While early studies focused mainly on
bounding a specific class of counterfactual objective functions using data,
more recent work allows one to find optimal units exactly by reducing the
causal objective to a classical objective on a meta-model, and then applying a
variant of the classical Variable Elimination (VE) algorithm to the meta-model
– assuming a fully specified causal model is available. In practice, however,
finding optimal units using this approach can be very expensive because the
VE algorithm used must run in time exponential in the constrained treewidth of
the meta-model, which is larger and denser than the original model. We address
this computational challenge by introducing a new approach for unit selection
that is not necessarily limited by the constrained treewidth. We do this by
compiling the meta-model into a special class of tractable arithmetic circuits
that allows the computation of optimal units in time linear in the circuit
size. Finally, we present empirical results on random causal models showing
that the proposed method yields order-of-magnitude speedups for unit
selection.
[LINK]
http://arxiv.org/abs/2404.06681v1
[DATE]
2024-04-10 10:02:34+08:00
[CATEGORIES]
cs.LG
Topological Feature Search Method for Multichannel EEG: Application in ADHD classification
[AUTHORS]
Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue
[ABSTRACT]
In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity
Disorder (ADHD) using electroencephalography (EEG) has garnered attention from
researchers. EEG, known for its expediency and efficiency, plays a pivotal role
in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG
signals and inter-subject variability pose challenges to the diagnostic and
classification processes. Topological Data Analysis (TDA) offers a novel
perspective for ADHD classification, diverging from traditional time-frequency
domain features. Yet, conventional TDA models are restricted to single-channel
time series and are susceptible to noise, leading to the loss of topological
features in persistence diagrams. This paper presents an enhanced TDA approach
applicable to multi-channel EEG in ADHD. First, optimal input parameters
for multi-channel EEG are determined. Next, each channel’s EEG
undergoes phase space reconstruction (PSR), and k-Power Distance to Measure
(k-PDTM) is used to approximate ideal point clouds.
Then, the multi-dimensional time series are re-embedded, and TDA is applied to
obtain topological feature information. Gaussian function-based Multivariate
Kernel Density Estimation (MKDE) is applied to the merged persistence diagram
to filter out the desired topological feature mappings. Finally, the persistence
image (PI) method is used to extract topological features, and the influence of
various weighting functions on the results is discussed. The effectiveness of
our method is evaluated on the IEEE ADHD dataset. The accuracy, sensitivity,
and specificity reach 85.60%, 83.61%, and 88.33%, respectively. Our method
improves on traditional TDA methods and outperforms typical nonlinear
descriptors. These findings indicate that our method exhibits higher precision
and robustness.
[LINK]
http://arxiv.org/abs/2404.06676v1
[DATE]
2024-04-10 09:37:41+08:00
[CATEGORIES]
cs.LG
Toward Cross-Layer Energy Optimizations in Machine Learning Systems
[AUTHORS]
Jae-Won Chung, Mosharaf Chowdhury
[ABSTRACT]
The enormous energy consumption of machine learning (ML) and generative AI
workloads shows no sign of waning, taking a toll on operating costs, power
delivery, and environmental sustainability. Despite a long line of research on
energy-efficient hardware, our two recent works, Zeus and Perseus, show that
software plays a critical role in ML energy optimization. This is
especially true for large language models (LLMs) because their model sizes and,
therefore, energy demands are growing faster than hardware efficiency
improvements. Therefore, we advocate for a cross-layer approach for energy
optimizations in ML systems, where hardware provides architectural support that
pushes energy-efficient software further, while software leverages and
abstracts the hardware to develop techniques that bring hardware-agnostic
energy-efficiency gains.
[LINK]
http://arxiv.org/abs/2404.06675v1
[DATE]
2024-04-10 09:35:17+08:00
[CATEGORIES]
cs.LG
Leveraging Diffusion For Strong and High Quality Face Morphing Attacks
[AUTHORS]
Zander W. Blasingame, Chen Liu
[ABSTRACT]
Face morphing attacks seek to deceive a Face Recognition (FR) system by
presenting a morphed image that combines the biometric qualities of two
different identities, with the aim of triggering a false acceptance with one of
the two identities, thereby posing a significant threat to biometric
systems. The success of a morphing attack depends on the ability of the
morphed image to represent the biometric characteristics of both identities
that were used to create the image. We present a novel morphing attack that
uses a diffusion-based architecture to improve the visual fidelity of the image
and the ability of the morphing attack to represent characteristics from both
identities. We demonstrate the effectiveness of the proposed attack by
evaluating its visual fidelity via the Fréchet Inception Distance (FID). We
also conduct extensive experiments to measure the vulnerability of FR systems
to the proposed attack. The ability of a morphing attack detector to detect the
proposed attack is measured and compared against two state-of-the-art GAN-based
morphing attacks and two landmark-based attacks. Additionally, a novel
metric to measure the relative strength between different morphing attacks is
introduced and evaluated.
[COMMENTS]
Diffusion Morphs (DiM) paper. Accepted in IEEE TBIOM
[LINK]
http://arxiv.org/abs/2301.04218v4
[DATE]
2024-04-10 09:11:15+08:00
[CATEGORIES]
cs.LG
Stabilizing Estimates of Shapley Values with Control Variates
[AUTHORS]
Jeremy Goldwasser, Giles Hooker
[ABSTRACT]
Shapley values are among the most popular tools for explaining predictions of
black-box machine learning models. However, their high computational cost
motivates the use of sampling approximations, inducing a considerable degree of
uncertainty. To stabilize these model explanations, we propose ControlSHAP, an
approach based on the Monte Carlo technique of control variates. Our
methodology is applicable to any machine learning model and requires virtually
no extra computation or modeling effort. On several high-dimensional datasets,
we find it can produce dramatic reductions in the Monte Carlo variability of
Shapley estimates.
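ControlSHAP builds on the Monte Carlo technique of control variates. The minimal Python sketch below shows only that generic technique, not ControlSHAP itself. To estimate E[f(X)], it subtracts c * (sample mean of g(X) - E[g(X)]) for a correlated quantity g whose mean is known, using c = Cov(f, g) / Var(g). The toy target E[exp(X)] for X ~ Uniform(0, 1) (true value e - 1), the control g(X) = X, and the helper names are illustrative assumptions. The paper's choice of control for Shapley estimates is not reproduced here.

```python
import math
import random
import statistics


def control_variate_mean(f_vals, g_vals, g_mean):
    """Estimate E[f] from paired samples of f and g, where E[g] = g_mean is known."""
    n = len(f_vals)
    f_bar = statistics.fmean(f_vals)
    g_bar = statistics.fmean(g_vals)
    # Variance-minimizing coefficient c = Cov(f, g) / Var(g), estimated from the
    # same samples (this adds a small bias that vanishes as n grows).
    cov_fg = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / (n - 1)
    c = cov_fg / statistics.variance(g_vals)
    # Correct f_bar by the observed error in g's sample mean.
    return f_bar - c * (g_bar - g_mean)


def one_trial(rng, n):
    """Return (plain Monte Carlo, control-variate) estimates of E[exp(X)], X ~ U(0, 1)."""
    xs = [rng.random() for _ in range(n)]
    fs = [math.exp(x) for x in xs]
    return statistics.fmean(fs), control_variate_mean(fs, xs, g_mean=0.5)


rng = random.Random(0)
trials = [one_trial(rng, 1000) for _ in range(200)]
plain_var = statistics.variance([p for p, _ in trials])
cv_var = statistics.variance([c for _, c in trials])
print(f"true value={math.e - 1:.5f}")
print(f"variance across trials: plain={plain_var:.2e}, control variate={cv_var:.2e}")
```

Because exp(X) and X are strongly correlated on [0, 1], the control-variate estimates should vary far less across trials than the plain averages. How much the variance drops depends entirely on how closely the control tracks the target.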
[LINK]
http://arxiv.org/abs/2310.07672v3
[DATE]
2024-04-10 08:35:36+08:00
[CATEGORIES]
cs.LG
A Comprehensive Survey on Uncertainty Quantification for Deep Learning
[AUTHORS]
Wenchong He, Zhe Jiang
[ABSTRACT]
Deep neural networks (DNNs) have achieved tremendous success in making
accurate predictions for computer vision, natural language processing, as well
as science and engineering domains. However, it is also well-recognized that
DNNs sometimes make unexpected, incorrect, but overconfident predictions. This
can have serious consequences in high-stakes applications, such as autonomous
driving, medical diagnosis, and disaster response. Uncertainty quantification
(UQ) aims to estimate the confidence of DNN predictions beyond prediction
accuracy. In recent years, many UQ methods have been developed for DNNs. It is
of great practical value to systematically categorize these UQ methods and
compare their advantages and disadvantages. However, existing surveys mostly
focus on categorizing UQ methodologies from a neural network architecture
perspective or a Bayesian perspective and ignore the source of uncertainty that
each methodology can incorporate, making it difficult to select an appropriate
UQ method in practice. To fill the gap, this paper presents a systematic
taxonomy of UQ methods for DNNs based on the types of uncertainty sources (data
uncertainty versus model uncertainty). We summarize the advantages and
disadvantages of methods in each category. We show how our taxonomy of UQ
methodologies can potentially help guide the choice of UQ method in different
machine learning problems (e.g., active learning, robustness, and reinforcement
learning). We also identify current research gaps and propose several future
research directions.
[COMMENTS]
39 pages, 14 figures
[LINK]
http://arxiv.org/abs/2302.13425v4
[DATE]
2024-04-10 08:19:54+08:00
[CATEGORIES]
cs.LG