Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
[AUTHORS]
Gašper Beguš, Thomas Lu, Zili Wang
[ABSTRACT]
Computational models of syntax are predominantly text-based. Here we propose
that the most basic first step in the evolution of syntax can be modeled
directly from raw speech in a fully unsupervised way. We focus on one of the
most ubiquitous and elementary suboperations of syntax – concatenation. We
introduce spontaneous concatenation: a phenomenon where convolutional neural
networks (CNNs) trained on acoustic recordings of individual words start
generating outputs with two or even three words concatenated without ever
accessing data with multiple words in the input. We replicate this finding in
several independently trained models with different hyperparameters and
training data. Additionally, networks trained on two words learn to embed words
into novel unobserved word combinations. We also show that the concatenated
outputs contain precursors to compositionality. To our knowledge, this is a
previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on
raw speech and has implications both for our understanding of how these
architectures learn as well as for modeling syntax and its evolution in the
brain from raw acoustic inputs. We also propose a potential neural mechanism
called disinhibition that outlines a possible neural pathway towards
concatenation and compositionality and suggests our modeling is useful for
generating testable predictions for biological and artificial neural processing
of speech.
[LINK]
http://arxiv.org/abs/2305.01626v3
[DATE]
2024-11-21 02:30:49+08:00
[CATEGORIES]
cs.CL
Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models
[AUTHORS]
Chanseo Lee, Sonu Kumar, Kimon A. Vogt, Sam Meraj, Antonia Vogt
[ABSTRACT]
The increasing demand for multilingual capabilities in healthcare underscores
the need for AI models adept at processing diverse languages, particularly in
clinical documentation and decision-making. Arabic, with its complex
morphology, syntax, and diglossia, poses unique challenges for natural language
processing (NLP) in medical contexts. This case study evaluates Sporo AraSum, a
language model tailored for Arabic clinical documentation, against JAIS, the
leading Arabic NLP model. Using synthetic datasets and PDQI-9 metrics that we
modified to assess model performance in a different language, the study
assessed the models’ performance in summarizing
patient-physician interactions, focusing on accuracy, comprehensiveness,
clinical utility, and linguistic-cultural competence.
Results indicate that Sporo AraSum significantly outperforms JAIS in
AI-centric quantitative metrics and all qualitative attributes measured in our
modified version of the PDQI-9. AraSum’s architecture enables precise and
culturally sensitive documentation, addressing the linguistic nuances of Arabic
while mitigating risks of AI hallucinations. These findings suggest that Sporo
AraSum is better suited to meet the demands of Arabic-speaking healthcare
environments, offering a transformative solution for multilingual clinical
workflows. Future research should incorporate real-world data to further
validate these findings and explore broader integration into healthcare
systems.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2411.06713
[LINK]
http://arxiv.org/abs/2411.13518v1
[DATE]
2024-11-21 02:10:19+08:00
[CATEGORIES]
cs.CL
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
[AUTHORS]
Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui
[ABSTRACT]
One of the most striking findings in modern research on large language models
(LLMs) is that scaling up compute during training leads to better results.
However, less attention has been given to the benefits of scaling compute
during inference. This survey focuses on these inference-time approaches. We
explore three areas under a unified mathematical formalism: token-level
generation algorithms, meta-generation algorithms, and efficient generation.
Token-level generation algorithms, often called decoding algorithms, operate by
sampling a single token at a time or constructing a token-level search space
and then selecting an output. These methods typically assume access to a
language model’s logits, next-token distributions, or probability scores.
Meta-generation algorithms work on partial or full sequences, incorporating
domain knowledge, enabling backtracking, and integrating external information.
Efficient generation methods aim to reduce token costs and improve the speed of
generation. Our survey unifies perspectives from three research communities:
traditional natural language processing, modern LLMs, and machine learning
systems.
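As a rough illustration of the token-level generation step described in the abstract above, the following minimal sketch samples a single token index from a vector of logits, using temperature scaling and optional top-k truncation. The function name `sample_token`, its parameters, and the toy logits in the usage note are assumptions made for illustration; they are not taken from the survey.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one token index from raw logits (illustrative sketch)."""
    rng = rng or random.Random()
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rank token indices by logit, highest first.
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # Keep only the top_k highest-scoring tokens, if requested.
    if top_k is not None:
        indices = indices[:top_k]
    # Unnormalized softmax weights over the kept logits, scaled by temperature.
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(indices, weights=weights, k=1)[0]
```

With `top_k=1` the call reduces to greedy decoding, e.g. `sample_token([0.1, 2.0, -1.0], top_k=1)` always picks index 1; larger `top_k` or higher `temperature` values spread probability over more tokens.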
[LINK]
http://arxiv.org/abs/2406.16838v2
[DATE]
2024-11-21 01:57:26+08:00
[CATEGORIES]
cs.CL
cs.LG
Utilizing Large Language Models to Synthesize Product Desirability Datasets
[AUTHORS]
John D. Hastings, Sherri Weitl-Harms, Joseph Doty, Zachary L. Myers, Warren Thompson
[ABSTRACT]
This research explores the application of large language models (LLMs) to
generate synthetic datasets for Product Desirability Toolkit (PDT) testing, a
key component in evaluating user sentiment and product experience. Utilizing
gpt-4o-mini, a cost-effective alternative to larger commercial LLMs, three
methods, Word+Review, Review+Word, and Supply-Word, were each used to
synthesize 1000 product reviews. The generated datasets were assessed for
sentiment alignment, textual diversity, and data generation cost. Results
demonstrated high sentiment alignment across all methods, with Pearson
correlations ranging from 0.93 to 0.97. Supply-Word exhibited the highest
diversity and coverage of PDT terms, although with increased generation costs.
Despite minor biases toward positive sentiments, in situations with limited
test data, LLM-generated synthetic data offers significant advantages,
including scalability, cost savings, and flexibility in dataset production.
[COMMENTS]
9 pages, 2 figures, 6 tables
[LINK]
http://arxiv.org/abs/2411.13485v1
[DATE]
2024-11-21 01:35:21+08:00
[CATEGORIES]
cs.CL
cs.LG
PatentEdits: Framing Patent Novelty as Textual Entailment
[AUTHORS]
Ryan Lee, Alexander Spangher, Xuezhe Ma
[LINK]
http://arxiv.org/abs/2411.13477v1
[DATE]
2024-11-21 01:23:40+08:00
[CATEGORIES]
cs.CL
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
[AUTHORS]
Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso
[COMMENTS]
18 pages, 3 figures, an abridged version to appear in NeurIPS 2024
AFM Workshop
[LINK]
http://arxiv.org/abs/2411.13451v1
[DATE]
2024-11-21 00:54:15+08:00
[CATEGORIES]
cs.CL
cs.LG
WaterPark: A Robustness Assessment of Language Model Watermarking
[AUTHORS]
Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang
[ABSTRACT]
To mitigate the misuse of large language models (LLMs), such as
disinformation, automated phishing, and academic cheating, there is a pressing
need for the capability of identifying LLM-generated texts. Watermarking
emerges as one promising solution: it plants statistical signals into LLMs’
generative processes and subsequently verifies whether LLMs produce given
texts. Various watermarking methods (“watermarkers”) have been proposed; yet,
due to the lack of unified evaluation platforms, many critical questions remain
under-explored: i) What are the strengths/limitations of various watermarkers,
especially their attack robustness? ii) How do various design choices impact
their robustness? iii) How to optimally operate watermarkers in adversarial
environments?
To fill this gap, we systematize existing LLM watermarkers and watermark
removal attacks, mapping out their design spaces. We then develop WaterPark, a
unified platform that integrates 10 state-of-the-art watermarkers and 12
representative attacks. More importantly, leveraging WaterPark, we conduct a
comprehensive assessment of existing watermarkers, unveiling the impact of
various design choices on their attack robustness. For instance, a
watermarker’s resilience to increasingly intensive attacks hinges on its
context dependency. We further explore the best practices to operate
watermarkers in adversarial environments. For instance, using a generic
detector alongside a watermark-specific detector improves the security of
vulnerable watermarkers. We believe our study sheds light on current LLM
watermarking techniques while WaterPark serves as a valuable testbed to
facilitate future research.
[COMMENTS]
22 pages
[LINK]
http://arxiv.org/abs/2411.13425v1
[DATE]
2024-11-21 00:09:22+08:00
[CATEGORIES]
cs.CL
cs.LG
AI-generated Image Detection: Passive or Watermark?
[AUTHORS]
Moyang Guo, Yuepeng Hu, Zhengyuan Jiang, Zeyu Li, Amir Sadovnik, Arka Daw, Neil Gong
[ABSTRACT]
While text-to-image models offer numerous benefits, they also pose
significant societal risks. Detecting AI-generated images is crucial for
mitigating these risks. Detection methods can be broadly categorized into
passive and watermark-based approaches: passive detectors rely on artifacts
present in AI-generated images, whereas watermark-based detectors proactively
embed watermarks into such images. A key question is which type of detector
performs better in terms of effectiveness, robustness, and efficiency. However,
the current literature lacks a comprehensive understanding of this issue. In
this work, we aim to bridge that gap by developing ImageDetectBench, the first
comprehensive benchmark to compare the effectiveness, robustness, and
efficiency of passive and watermark-based detectors. Our benchmark includes
four datasets, each containing a mix of AI-generated and non-AI-generated
images. We evaluate five passive detectors and four watermark-based detectors
against eight types of common perturbations and three types of adversarial
perturbations. Our benchmark results reveal several interesting findings. For
instance, watermark-based detectors consistently outperform passive detectors,
both in the presence and absence of perturbations. Based on these insights, we
provide recommendations for detecting AI-generated images, e.g., when both
types of detectors are applicable, watermark-based detectors should be the
preferred choice.
[LINK]
http://arxiv.org/abs/2411.13553v1
[DATE]
2024-11-21 02:59:58+08:00
[CATEGORIES]
cs.LG
Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning
[AUTHORS]
Quyen Tran, Hoang Phan, Minh Le, Tuan Truong, Dinh Phung, Linh Ngo, Thien Nguyen, Nhat Ho, Trung Le
[ABSTRACT]
Drawing inspiration from human learning behaviors, this work proposes a novel
approach to mitigate catastrophic forgetting in Prompt-based Continual Learning
models by exploiting the relationships between continuously emerging class
data. We find that applying human habits of organizing and connecting
information can serve as an efficient strategy when training deep learning
models. Specifically, by building a hierarchical tree structure based on the
expanding set of labels, we gain fresh insights into the data, identifying
groups of similar classes that could easily cause confusion. Additionally, we delve
deeper into the hidden connections between classes by exploring the original
pretrained model’s behavior through an optimal transport-based approach. From
these insights, we propose a novel regularization loss function that encourages
models to focus more on challenging knowledge areas, thereby enhancing overall
performance. Experimentally, our method demonstrated significant superiority
over the most robust state-of-the-art models on various benchmarks.
[LINK]
http://arxiv.org/abs/2410.04327v2
[DATE]
2024-11-21 02:59:23+08:00
[CATEGORIES]
cs.LG
HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution
[AUTHORS]
Shoaib Meraj Sami, Md Mahedi Hasan, Jeremy Dawson, Nasser Nasrabadi
[ABSTRACT]
Although recent diffusion-based single-step super-resolution methods achieve
better performance as compared to SinSR, they are computationally complex. To
improve the performance of SinSR, we investigate preserving the high-frequency
detail features during super-resolution (SR) because the downgraded images lack
detailed information. For this purpose, we introduce a high-frequency
perceptual loss by utilizing an invertible neural network (INN) pretrained on
the ImageNet dataset. Different feature maps of pretrained INN produce
different high-frequency aspects of an image. During the training phase, we
enforce preservation of the high-frequency features of super-resolved and
ground truth (GT) images, which improves the SR image quality during inference.
Furthermore, we also utilize the Jensen-Shannon divergence between GT and SR
images in the pretrained DINO-v2 embedding space to match their distribution.
By introducing the $\textbf{h}igh$- $\textbf{f}requency$ preserving loss and
distribution matching constraint in the single-step $\textbf{diff}usion-based$
SR ($\textbf{HF-Diff}$), we achieve a state-of-the-art CLIPIQA score in the
benchmark RealSR, RealSet65, DIV2K-Val, and ImageNet datasets. Furthermore, the
experimental results in several datasets demonstrate that our high-frequency
perceptual loss yields better SR image quality than LPIPS and VGG-based
perceptual losses. Our code will be released at
https://github.com/shoaib-sami/HF-Diff.
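For readers unfamiliar with the divergence named in the abstract above, the following minimal sketch computes the Jensen-Shannon divergence between two discrete probability distributions. How the paper turns DINO-v2 embeddings into distributions is not stated in the abstract, so the inputs here are assumed to be already-normalized probability vectors; the function name `js_divergence` is likewise an illustrative assumption.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The divergence is symmetric, equals zero for identical distributions, and is bounded above by ln 2 when the natural logarithm is used.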
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2411.13548v1
[DATE]
2024-11-21 02:56:24+08:00
[CATEGORIES]
cs.LG
Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm
[AUTHORS]
Rushabh Solanki, Elliot Creager
[ABSTRACT]
The deployment of AI in consumer products is currently focused on the use of
so-called foundation models, large neural networks pre-trained on massive
corpora of digital records. This emphasis on scaling up datasets and
pre-training computation raises the risk of further consolidating the industry,
and enabling monopolistic (or oligopolistic) behavior. Judges and regulators
seeking to improve market competition may employ various remedies. This paper
explores dissolution – the breaking up of a monopolistic entity into smaller
firms – as one such remedy, focusing in particular on the technical challenges
and opportunities involved in the breaking up of large models and datasets. We
show how the framework of Conscious Data Contribution can enable user autonomy
during dissolution. Through a simulation study, we explore how
fine-tuning and the phenomenon of “catastrophic forgetting” could actually
prove beneficial as a type of machine unlearning that allows users to specify
which data they want used for what purposes.
[COMMENTS]
This paper appeared at the 2nd Workshop on Regulatable ML at NeurIPS
2024
[LINK]
http://arxiv.org/abs/2411.13546v1
[DATE]
2024-11-21 02:55:51+08:00
[CATEGORIES]
cs.LG
Identity Preserving 3D Head Stylization with Multiview Score Distillation
[AUTHORS]
Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Guzelant, Aysegul Dundar
[ABSTRACT]
3D head stylization transforms realistic facial features into artistic
representations, enhancing user engagement across gaming and virtual reality
applications. While 3D-aware generators have made significant advancements,
many 3D stylization methods primarily provide near-frontal views and struggle
to preserve the unique identities of original subjects, often resulting in
outputs that lack diversity and individuality. This paper addresses these
challenges by leveraging the PanoHead model, synthesizing images from a
comprehensive 360-degree perspective. We propose a novel framework that employs
negative log-likelihood distillation (LD) to enhance identity preservation and
improve stylization quality. By integrating multi-view grid score and mirror
gradients within the 3D GAN architecture and introducing a score rank weighing
technique, our approach achieves substantial qualitative and quantitative
improvements. Our findings not only advance the state of 3D head stylization
but also provide valuable insights into effective distillation processes
between diffusion models and GANs, focusing on the critical issue of identity
preservation. Please visit https://three-bee.github.io/head_stylization for
more visuals.
[COMMENTS]
https://three-bee.github.io/head_stylization
[LINK]
http://arxiv.org/abs/2411.13536v1
[DATE]
2024-11-21 02:37:58+08:00
[CATEGORIES]
cs.LG
Retrieval with Learned Similarities
[AUTHORS]
Bailu Ding, Jiaqi Zhai
[ABSTRACT]
Retrieval plays a fundamental role in recommendation systems, search, and
natural language processing (NLP) by efficiently finding relevant items from a
large corpus given a query. Dot products have been widely used as the
similarity function in such tasks, enabled by Maximum Inner Product Search
(MIPS) algorithms for efficient retrieval. However, state-of-the-art retrieval
algorithms have migrated to learned similarities. These advanced approaches
encompass multiple query embeddings, complex neural networks, direct item ID
decoding via beam search, and hybrid solutions. Unfortunately, we lack
efficient solutions for retrieval in these state-of-the-art setups. Our work
addresses this gap by investigating efficient retrieval techniques with
expressive learned similarity functions. We establish Mixture-of-Logits (MoL)
as a universal approximator of similarity functions, demonstrate that MoL’s
expressiveness can be realized empirically to achieve superior performance on
diverse retrieval scenarios, and propose techniques to retrieve the approximate
top-k results using MoL with tight error bounds. Through extensive
experimentation, we show that MoL, enhanced by our proposed mutual
information-based load balancing loss, sets new state-of-the-art results across
heterogeneous scenarios, including sequential retrieval models in
recommendation systems and finetuning language models for question answering;
and our approximate top-$k$ algorithms outperform baselines by up to 66x in
latency while achieving >.99 recall rate compared to exact algorithms.
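To make the idea of a learned similarity more concrete, the following minimal sketch shows one simplified reading of a mixture-of-logits score: a gated, weighted sum of component-level dot products. It assumes one query embedding and one item embedding per component and takes raw gate scores from the caller, normalizing them with a softmax; the paper's actual gating network and embedding pairing may differ, and the name `mol_similarity` is hypothetical.

```python
import math


def mol_similarity(query_embs, item_embs, gate_scores):
    """Gated sum of per-component dot products (illustrative sketch).

    query_embs, item_embs: lists of P vectors, one pair per component.
    gate_scores: P raw gating scores, normalized here with a softmax.
    """
    if not (len(query_embs) == len(item_embs) == len(gate_scores)):
        raise ValueError("all inputs must have one entry per component")
    # Component-level logits: one dot product per embedding pair.
    logits = [
        sum(a * b for a, b in zip(q, x))
        for q, x in zip(query_embs, item_embs)
    ]
    # Softmax turns the raw gate scores into weights that sum to 1.
    peak = max(gate_scores)
    exps = [math.exp(g - peak) for g in gate_scores]
    total = sum(exps)
    return sum((e / total) * logit for e, logit in zip(exps, logits))
```

With a single component the score reduces to an ordinary dot product, which is the setting that Maximum Inner Product Search handles directly.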
[COMMENTS]
21 pages, 3 figures. Our code and pre-trained model checkpoints are
available at https://github.com/bailuding/rails
[LINK]
http://arxiv.org/abs/2407.15462v3
[DATE]
2024-11-21 02:30:19+08:00
[CATEGORIES]
cs.LG
Delegating Data Collection in Decentralized Machine Learning
[AUTHORS]
Nivasini Ananthakrishnan, Stephen Bates, Michael I. Jordan, Nika Haghtalab
[ABSTRACT]
Motivated by the emergence of decentralized machine learning (ML) ecosystems,
we study the delegation of data collection. Taking the field of contract theory
as our starting point, we design optimal and near-optimal contracts that deal
with two fundamental information asymmetries that arise in decentralized ML:
uncertainty in the assessment of model quality and uncertainty regarding the
optimal performance of any model. We show that a principal can cope with such
asymmetry via simple linear contracts that achieve a 1-1/e fraction of the
optimal utility. To address the lack of a priori knowledge regarding the
optimal performance, we give a convex program that can adaptively and
efficiently compute the optimal contract. We also study linear contracts and
derive the optimal utility in the more complex setting of multiple
interactions.
[LINK]
http://arxiv.org/abs/2309.01837v3
[DATE]
2024-11-21 02:26:03+08:00
[CATEGORIES]
cs.LG
Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms
[AUTHORS]
Khashayar Khosravi, Renato Paes Leme, Chara Podimata, Apostolis Tsorvantzis
[ABSTRACT]
We propose a model for learning with bandit feedback while accounting for
deterministically evolving and unobservable states that we call Bandits with
Deterministically Evolving States ($B$-$DES$). The workhorse applications of
our model are learning for recommendation systems and learning for online ads.
In both cases, the reward that the algorithm obtains at each round is a
function of the short-term reward of the action chosen and how “healthy” the
system is (i.e., as measured by its state). For example, in recommendation
systems, the reward that the platform obtains from a user’s engagement with a
particular type of content depends not only on the inherent features of the
specific content, but also on how the user’s preferences have evolved as a
result of interacting with other types of content on the platform. Our general
model accounts for the different rate $\lambda \in [0,1]$ at which the state
evolves (e.g., how fast a user’s preferences shift as a result of previous
content consumption) and encompasses standard multi-armed bandits as a special
case. The goal of the algorithm is to minimize a notion of regret against the
best-fixed sequence of arms pulled, which is significantly harder to attain
compared to the standard benchmark of the best-fixed action in hindsight. We
present online learning algorithms for any possible value of the evolution rate
$\lambda$ and we show the robustness of our results to various model
misspecifications.
[LINK]
http://arxiv.org/abs/2307.11655v4
[DATE]
2024-11-21 02:25:16+08:00
[CATEGORIES]
cs.LG
Quantum Attention for Vision Transformers in High Energy Physics
[AUTHORS]
Alessandro Tesi, Gopal Ramesh Dahale, Sergei Gleyzer, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva
[ABSTRACT]
We present a novel hybrid quantum-classical vision transformer architecture
incorporating quantum orthogonal neural networks (QONNs) to enhance performance
and computational efficiency in high-energy physics applications. Building on
advancements in quantum vision transformers, our approach addresses limitations
of prior models by leveraging the inherent advantages of QONNs, including
stability and efficient parameterization in high-dimensional spaces. We
evaluate the proposed architecture using multi-detector jet images from CMS
Open Data, focusing on the task of distinguishing quark-initiated from
gluon-initiated jets. The results indicate that embedding quantum orthogonal
transformations within the attention mechanism can provide robust performance
while offering promising scalability for machine learning challenges associated
with the upcoming High Luminosity Large Hadron Collider. This work highlights
the potential of quantum-enhanced models to address the computational demands
of next-generation particle physics experiments.
[COMMENTS]
9 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.13520v1
[DATE]
2024-11-21 02:11:17+08:00
[CATEGORIES]
cs.LG
Procurement Auctions via Approximately Optimal Submodular Optimization
[AUTHORS]
Yuan Deng, Amin Karbasi, Vahab Mirrokni, Renato Paes Leme, Grigoris Velegkas, Song Zuo
[ABSTRACT]
We study procurement auctions, where an auctioneer seeks to acquire services
from strategic sellers with private costs. The quality of services is measured
by a submodular function known to the auctioneer. Our goal is to design
computationally efficient procurement auctions that (approximately) maximize
the difference between the quality of the acquired services and the total cost
of the sellers, while ensuring incentive compatibility (IC), individual
rationality (IR) for sellers, and non-negative surplus (NAS) for the
auctioneer.
Our contributions are twofold: (i) we provide an improved analysis of
existing algorithms for non-positive submodular function maximization, and (ii)
we design efficient frameworks that transform submodular optimization
algorithms into mechanisms that are IC, IR, NAS, and approximation-preserving.
These frameworks apply to both the offline setting, where all sellers’ bids and
services are available simultaneously, and the online setting, where sellers
arrive in an adversarial order, requiring the auctioneer to make irrevocable
decisions.
We also explore whether state-of-the-art submodular optimization algorithms
can be converted into descending auctions in adversarial settings, where the
schedule of descending prices is determined by an adversary. We show that a
submodular optimization algorithm satisfying bi-criteria $(1/2,
1)$-approximation in welfare can be effectively adapted to a descending
auction. Additionally, we establish a connection between descending auctions
and online submodular optimization.
Finally, we demonstrate the practical applications of our frameworks by
instantiating them with state-of-the-art submodular optimization algorithms and
empirically comparing their welfare performance on publicly available datasets
with thousands of sellers.
[LINK]
http://arxiv.org/abs/2411.13513v1
[DATE]
2024-11-21 02:06:55+08:00
[CATEGORIES]
cs.LG
Conformal Prediction for Hierarchical Data
[AUTHORS]
Guillaume Principato, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, Jean-Michel Poggi, Gilles Stoltz
[ABSTRACT]
Reconciliation has become an essential tool in multivariate point forecasting
for hierarchical time series. However, there is still a lack of understanding
of the theoretical properties of probabilistic Forecast Reconciliation
techniques. Meanwhile, Conformal Prediction is a general framework with growing
appeal that provides prediction sets with probabilistic guarantees in finite
sample. In this paper, we propose a first step towards combining Conformal
Prediction and Forecast Reconciliation by analyzing how including a
reconciliation step in the Split Conformal Prediction (SCP) procedure enhances
the resulting prediction sets. In particular, we show that the validity granted
by SCP remains while improving the efficiency of the prediction sets. We also
advocate a variation of the theoretical procedure for practical use. Finally,
we illustrate these results with simulations.
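As background for the abstract above, the following minimal sketch shows the standard Split Conformal Prediction recipe for a single scalar forecast, using absolute residuals as nonconformity scores. It does not include the reconciliation step the paper studies; the function name `split_conformal_interval` and its arguments are assumptions made for illustration.

```python
import math


def split_conformal_interval(calib_preds, calib_targets, new_pred, alpha=0.1):
    """Return a (lower, upper) interval around new_pred via split conformal prediction."""
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = sorted(abs(y - p) for p, y in zip(calib_preds, calib_targets))
    n = len(scores)
    # Finite-sample-corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)
```

Under exchangeability of the calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha, which is the finite-sample guarantee the abstract refers to.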
[COMMENTS]
14 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.13479v1
[DATE]
2024-11-21 01:26:26+08:00
[CATEGORIES]
cs.LG
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
[AUTHORS]
Mingyuan Zhou, Huangjie Zheng, Yi Gu, Zhendong Wang, Hai Huang
[ABSTRACT]
Score identity Distillation (SiD) is a data-free method that has achieved
SOTA performance in image generation by leveraging only a pretrained diffusion
model, without requiring any training data. However, its ultimate performance
is constrained by how accurately the pretrained model captures the true data
scores at different stages of the diffusion process. In this paper, we
introduce SiDA (SiD with Adversarial Loss), which not only enhances generation
quality but also improves distillation efficiency by incorporating real images
and adversarial loss. SiDA utilizes the encoder from the generator’s score
network as a discriminator, boosting its ability to distinguish between real
images and those generated by SiD. The adversarial loss is batch-normalized
within each GPU and then combined with the original SiD loss. This integration
effectively incorporates the average “fakeness” per GPU batch into the
pixel-based SiD loss, enabling SiDA to distill a single-step generator either
from scratch or by fine-tuning an existing one. SiDA converges significantly
faster than its predecessor when trained from scratch, and swiftly improves
upon the original model’s performance after an initial warmup period during
fine-tuning from a pre-distilled SiD generator. This one-step adversarial
distillation method establishes new benchmarks in generation performance when
distilling EDM diffusion models pretrained on CIFAR-10 (32x32) and ImageNet
(64x64), achieving an FID score of 1.110 on ImageNet 64x64. It sets record-low FID
scores when distilling EDM2 models trained on ImageNet (512x512), surpassing
even the largest teacher model, EDM2-XXL. SiDA records FID scores
of 2.156 for EDM2-XS, 1.669 for S, 1.488 for M, 1.413 for L, 1.379 for XL, and
1.366 for XXL, demonstrating significant improvements across all model sizes.
Our open-source code will be integrated into the SiD codebase.
[LINK]
http://arxiv.org/abs/2410.14919v3
[DATE]
2024-11-21 01:20:00+08:00
[CATEGORIES]
cs.LG
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
[AUTHORS]
Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk
[ABSTRACT]
This paper considers the learning of logical (Boolean) functions with a focus
on the generalization on the unseen (GOTU) setting, a strong case of
out-of-distribution generalization. This is motivated by the fact that the rich
combinatorial nature of data in certain reasoning tasks (e.g.,
arithmetic/logic) makes representative data sampling challenging, and learning
successfully under GOTU gives a first vignette of an ‘extrapolating’ or
‘reasoning’ learner. We study how different network architectures trained by
(S)GD perform under GOTU and provide both theoretical and experimental evidence
that for sparse functions and a class of network models including instances of
Transformers, random features models, and linear networks, a
min-degree-interpolator is learned on the unseen. More specifically, this means
an interpolator of the training data that has minimal Fourier mass on the
higher degree basis elements. These findings lead to two implications: (1) we
provide an explanation to the length generalization problem for Boolean
functions (e.g., Anil et al. 2022); (2) we introduce a curriculum learning
algorithm called Degree-Curriculum that learns monomials more efficiently by
incrementing supports. Finally, we discuss extensions to other models or
non-sparse regimes where the min-degree bias may still occur or fade, as well
as how it can be potentially corrected when undesirable.
[COMMENTS]
extended JMLR version of the original ICML 2023 paper
[LINK]
http://arxiv.org/abs/2301.13105v3
[DATE]
2024-11-21 01:16:01+08:00
[CATEGORIES]
cs.LG
Safe Exploitative Play with Untrusted Type Beliefs
[AUTHORS]
Tongxin Li, Tinashe Handina, Shaolei Ren, Adam Wierman
[ABSTRACT]
The combination of the Bayesian game and learning has a rich history, with
the idea of controlling a single agent in a system composed of multiple agents
with unknown behaviors given a set of types, each specifying a possible
behavior for the other agents. The idea is to plan an agent’s own actions with
respect to those types which it believes are most likely to maximize the
payoff. However, the type beliefs are often learned from past actions and
likely to be incorrect. With this perspective in mind, we consider an agent in
a game with type predictions of other components, and investigate the impact of
incorrect beliefs on the agent’s payoff. In particular, we formally define a
tradeoff between risk and opportunity by comparing the payoff obtained against
the optimal payoff, which is represented by a gap caused by trusting or
distrusting the learned beliefs. Our main results characterize the tradeoff by
establishing upper and lower bounds on the Pareto front for both normal-form
and stochastic Bayesian games, with numerical results provided.
[COMMENTS]
26 pages, NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.07679v2
[DATE]
2024-11-21 01:11:21+08:00
[CATEGORIES]
cs.LG
Sampling and Integration of Logconcave Functions by Algorithmic Diffusion
[AUTHORS]
Yunbum Kook, Santosh S. Vempala
[ABSTRACT]
We study the complexity of sampling, rounding, and integrating arbitrary
logconcave functions. Our new approach provides the first complexity
improvements in nearly two decades for general logconcave functions for all
three problems, and matches the best-known complexities for the special case of
uniform distributions on convex bodies. For the sampling problem, our output
guarantees are significantly stronger than previously known, and lead to a
streamlined analysis of statistical estimation based on dependent random
samples.
[COMMENTS]
60 pages, 1 figure
[LINK]
http://arxiv.org/abs/2411.13462v1
[DATE]
2024-11-21 01:10:24+08:00
[CATEGORIES]
cs.LG
CODES: Benchmarking Coupled ODE Surrogates
[AUTHORS]
Robin Janssen, Immanuel Sulzer, Tobias Buck
[ABSTRACT]
We introduce CODES, a benchmark for comprehensive evaluation of surrogate
architectures for coupled ODE systems. Besides standard metrics like mean
squared error (MSE) and inference time, CODES provides insights into surrogate
behaviour across multiple dimensions like interpolation, extrapolation, sparse
data, uncertainty quantification and gradient correlation. The benchmark
emphasizes usability through features such as integrated parallel training, a
web-based configuration generator, and pre-implemented baseline models and
datasets. Extensive documentation ensures sustainability and provides the
foundation for collaborative improvement. By offering a fair and multi-faceted
comparison, CODES helps researchers select the most suitable surrogate for
their specific dataset and application while deepening our understanding of
surrogate learning behaviour.
[COMMENTS]
13 pages, 10 figures, accepted for the Machine Learning and the
Physical Sciences workshop at NeurIPS 2024, source code available on GitHub
at https://github.com/robin-janssen/CODES-Benchmark
[LINK]
http://arxiv.org/abs/2410.20886v2
[DATE]
2024-11-21 00:47:44+08:00
[CATEGORIES]
cs.LG
SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers
[AUTHORS]
Hojjat Karami, David Atienza, Anisoara Ionescu
[ABSTRACT]
Generating synthetic Electronic Health Records (EHRs) offers significant
potential for data augmentation, privacy-preserving data sharing, and improving
machine learning model training. We propose a novel tokenization strategy
tailored for structured EHR data, which encompasses diverse data types such as
covariates, ICD codes, and irregularly sampled time series. Using a GPT-like
decoder-only transformer model, we demonstrate the generation of high-quality
synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we
benchmark the fidelity, utility, and privacy of the generated data against
state-of-the-art models.
[LINK]
http://arxiv.org/abs/2411.13428v1
[DATE]
2024-11-21 00:11:20+08:00
[CATEGORIES]
cs.LG
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
[AUTHORS]
Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre
[ABSTRACT]
Reinforcement learning (RL) is inherently rife with non-stationarity since
the states and rewards the agent observes during training depend on its
changing policy. Therefore, networks in deep RL must be capable of adapting to
new observations and fitting new targets. However, previous works have observed
that networks trained under non-stationarity exhibit an inability to continue
learning, termed loss of plasticity, and eventually a collapse in performance.
For off-policy deep value-based RL methods, this phenomenon has been correlated
with a decrease in representation rank and the ability to fit random targets,
termed capacity loss. Although this correlation has generally been attributed
to neural network learning under non-stationarity, the connection to
representation dynamics has not been carefully studied in on-policy policy
optimization methods. In this work, we empirically study representation
dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo
environments, revealing that PPO agents are also affected by feature rank
deterioration and capacity loss. We show that this is aggravated by stronger
non-stationarity, ultimately driving the actor’s performance to collapse,
regardless of the performance of the critic. We ask why the trust region,
specific to methods like PPO, cannot alleviate or prevent the collapse and find
a connection between representation collapse and the degradation of the trust
region, one exacerbating the other. Finally, we present Proximal Feature
Optimization (PFO), a novel auxiliary loss that, along with other
interventions, shows that regularizing the representation dynamics mitigates
the performance collapse of PPO agents.
[COMMENTS]
NeurIPS 2024 version. Code and run histories are available at
https://github.com/CLAIRE-Labo/no-representation-no-trust
[LINK]
http://arxiv.org/abs/2405.00662v3
[DATE]
2024-11-21 00:07:04+08:00
[CATEGORIES]
cs.LG
Heuristically Adaptive Diffusion-Model Evolutionary Strategy
[AUTHORS]
Benedikt Hartl, Yanbo Zhang, Hananel Hazan, Michael Levin
[ABSTRACT]
Diffusion Models represent a significant advancement in generative modeling,
employing a dual-phase process that first degrades domain-specific information
via Gaussian noise and restores it through a trainable model. This framework
enables pure noise-to-data generation and modular reconstruction of images or
videos. Concurrently, evolutionary algorithms employ optimization methods
inspired by biological principles to refine sets of numerical parameters
encoding potential solutions to rugged objective functions. Our research
reveals a fundamental connection between diffusion models and evolutionary
algorithms through their shared underlying generative mechanisms: both methods
generate high-quality samples via iterative refinement on random initial
distributions. By employing deep learning-based diffusion models as generative
models across diverse evolutionary tasks and iteratively refining diffusion
models with heuristically acquired databases, we can iteratively sample
potentially better-adapted offspring parameters, integrating them into
successive generations of the diffusion model. This approach achieves efficient
convergence toward high-fitness parameters while maintaining explorative
diversity. Diffusion models introduce enhanced memory capabilities into
evolutionary algorithms, retaining historical information across generations
and leveraging subtle data correlations to generate refined samples. We elevate
evolutionary algorithms from procedures with shallow heuristics to frameworks
with deep memory. By deploying classifier-free guidance for conditional
sampling at the parameter level, we achieve precise control over evolutionary
search dynamics to further specific genotypical, phenotypical, or
population-wide traits. Our framework marks a major heuristic and algorithmic
transition, offering increased flexibility, precision, and control in
evolutionary optimization processes.
[LINK]
http://arxiv.org/abs/2411.13420v1
[DATE]
2024-11-21 00:06:28+08:00
[CATEGORIES]
cs.LG
Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology
[AUTHORS]
Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib
[ABSTRACT]
The Balti language belongs to the Sino-Tibetan, specifically the
Tibeto-Burman, language family. It is understood, with variations, across
populations in India, China, Pakistan, Nepal, Tibet, Burma, and Bhutan,
influenced by local cultures and producing various dialects. Considering the
diverse cultural, socio-political, religious, and geographical impacts, it is
vital to take steps toward unifying the dialects on the basis of common roots,
lexica, and phonological perspectives. In the era of globalization
and the increasingly frequent developments in AI technology, understanding the
diversity and the efforts of dialect unification is important to understanding
commonalities and shortening the gaps impacted by unavoidable circumstances.
This article analyzes and examines how artificial intelligence (AI), in the
form of Large Language Models (LLMs), can assist in analyzing, documenting,
and standardizing the endangered Balti language, based on the efforts made in
different dialects so far.
[COMMENTS]
Accepted by IEEE conference ISCSLP 2024
[LINK]
http://arxiv.org/abs/2411.13409v1
[DATE]
2024-11-20 23:48:21+08:00
[CATEGORIES]
cs.CL
Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese
[AUTHORS]
Dat Van-Thanh Nguyen, Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
[LINK]
http://arxiv.org/abs/2411.13407v1
[DATE]
2024-11-20 23:46:48+08:00
[CATEGORIES]
cs.CL
On the Way to LLM Personalization: Learning to Remember User Conversations
[AUTHORS]
Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, Maartje ter Hoeve
[ABSTRACT]
Large Language Models (LLMs) have quickly become an invaluable assistant for
a variety of tasks. However, their effectiveness is constrained by their
ability to tailor responses to human preferences and behaviors via
personalization. Prior work in LLM personalization has largely focused on style
transfer or incorporating small factoids about the user, as knowledge injection
remains an open challenge. In this paper, we explore injecting knowledge of
prior conversations into LLMs to enable future work on less redundant,
personalized conversations. We identify two real-world constraints: (1)
conversations are sequential in time and must be treated as such during
training, and (2) per-user personalization is only viable in
parameter-efficient settings. To this aim, we propose PLUM, a pipeline
performing data augmentation for up-sampling conversations as question-answer
pairs, that are then used to finetune a low-rank adaptation adapter with a
weighted cross entropy loss. Even in this first exploration of the problem, we
perform competitively with baselines such as RAG, attaining an accuracy of
81.5% across 100 conversations.
[COMMENTS]
16 pages, 6 tables, 3 figures
[LINK]
http://arxiv.org/abs/2411.13405v1
[DATE]
2024-11-20 23:45:08+08:00
[CATEGORIES]
cs.CL
cs.LG
When Context Leads but Parametric Memory Follows in Large Language Models
[AUTHORS]
Yufei Tao, Adam Hiatt, Erik Haake, Antonie J. Jetter, Ameeta Agrawal
[COMMENTS]
Accepted by EMNLP 2024 Main Conference
[LINK]
http://arxiv.org/abs/2409.08435v3
[DATE]
2024-11-20 23:41:38+08:00
[CATEGORIES]
cs.CL
Executable QR codes with Machine Learning for Industrial Applications
[AUTHORS]
Stefano Scanzio, Francesco Velluto, Matteo Rosani, Lukasz Wisniewski, Gianluca Cena
[ABSTRACT]
Executable QR codes, also known as eQR codes or just sQRy, are a special kind
of QR codes that embed programs conceived to run on mobile devices like
smartphones. Since the program is directly encoded in binary form within the QR
code, it can be executed even when the reading device is not provided with
Internet access. The applications of this technology are manifold, and range
from smart user guides to advisory systems. The first programming language made
available for eQR is QRtree, which enables the implementation of decision trees
aimed, for example, at guiding the user in operating/maintaining complex
machinery or in reaching a specific location.
In this work, an additional language, which we term QRind, is proposed; it was
specifically devised for Industry. It permits integrating distinct
computational blocks into the QR code, e.g., machine learning models to enable
predictive maintenance and algorithms to ease machinery usage. QRind permits
the Industry 4.0/5.0 paradigms to be implemented, in part, also in those cases
where Internet is unavailable.
[COMMENTS]
preprint, 4 pages, 2024
[LINK]
http://arxiv.org/abs/2411.13400v1
[DATE]
2024-11-20 23:38:33+08:00
[CATEGORIES]
cs.CL
Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
[AUTHORS]
Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
[ABSTRACT]
Language Models (LMs) have become widely used in software engineering,
especially for tasks such as code generation, where they are referred to as
code LMs. These models have proven effective in generating code, making it
easier for developers to automate coding activities. However, research has
highlighted a significant limitation: despite their effectiveness, LMs often
produce code that is incorrect, buggy, or not fully functional. Updating these
models with limited data can be prohibitively challenging, yet it is essential
to maximize their utility. This may require hot-fix techniques (updating models
with limited data) to resolve. In this paper, we propose Model
Improvement via Neuron Targeting (MINT), a novel
approach for repairing code LMs. MINT leverages the semantic property of
language models to perform neuron-level repairs in a novel way. Further, by
analyzing the relationships between the model’s latent representations, the
incorrect outputs, and the desired outputs, MINT determines which
neurons are worth updating. This approach ensures that only the neurons crucial
to the model’s failure are targeted, avoiding unnecessary changes and allowing
for a more efficient and precise repair process. MINT is effective,
efficient, and reliable, capable of correcting a neural model by patching a
minimum number of neurons (usually one or two neurons). Our approach is
evaluated on three coding tasks: line-level code generation, shellcode
generation, and intent-to-bash translation. The experimental results
demonstrate that the proposed approach significantly outperforms the
state-of-the-art in both effectiveness and efficiency measures. In addition, we
analyze and discuss the side effects of model repair techniques, including the
balance between generalization and specificity, and the performance after
multiple repairs in succession.
[COMMENTS]
13 pages, 7 figures, 7 tables, under peer-review
[LINK]
http://arxiv.org/abs/2312.05356v5
[DATE]
2024-11-20 22:22:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Fact-Level Confidence Calibration and Self-Correction
[AUTHORS]
Yige Yuan, Bingbing Xu, Hexiang Tan, Fei Sun, Teng Xiao, Wei Li, Huawei Shen, Xueqi Cheng
[ABSTRACT]
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence
with the actual accuracy of their responses, enables them to self-evaluate the
correctness of their outputs. However, current calibration methods for LLMs
typically estimate two scalars to represent overall response confidence and
correctness, which is inadequate for long-form generation where the response
includes multiple atomic facts and may be partially confident and correct.
These methods also overlook the relevance of each fact to the query. To address
these challenges, we propose a Fact-Level Calibration framework that operates
at a finer granularity, calibrating confidence to relevance-weighted
correctness at the fact level. Furthermore, comprehensive analysis under the
framework inspired the development of Confidence-Guided Fact-level
Self-Correction (ConFix), which uses high-confidence facts within a
response as additional knowledge to improve low-confidence ones. Extensive
experiments across four datasets and six models demonstrate that ConFix
effectively mitigates hallucinations without requiring external knowledge
sources such as retrieval systems.
[COMMENTS]
Code is available at https://github.com/yuanyige/fact-calibration
[LINK]
http://arxiv.org/abs/2411.13343v1
[DATE]
2024-11-20 22:15:18+08:00
[CATEGORIES]
cs.CL
Combining Autoregressive and Autoencoder Language Models for Text Classification
[AUTHORS]
João Gonçalves
[ABSTRACT]
This paper presents CAALM-TC (Combining Autoregressive and Autoencoder
Language Models for Text Classification), a novel method that enhances text
classification by integrating autoregressive and autoencoder language models.
Autoregressive large language models such as OpenAI’s GPT, Meta’s Llama or
Microsoft’s Phi offer promising prospects for content analysis practitioners,
but they generally underperform supervised BERT-based models for text
classification. CAALM leverages autoregressive models to generate contextual
information based on input texts, which is then combined with the original text
and fed into an autoencoder model for classification. This hybrid approach
capitalizes on the extensive contextual knowledge of autoregressive models and
the efficient classification capabilities of autoencoders. Experimental results
on four benchmark datasets demonstrate that CAALM consistently outperforms
existing methods, particularly in tasks with smaller datasets and more abstract
classification objectives. The findings indicate that CAALM offers a scalable
and effective solution for automated content analysis in social science
research that minimizes sample size requirements.
[LINK]
http://arxiv.org/abs/2411.13282v1
[DATE]
2024-11-20 20:49:42+08:00
[CATEGORIES]
cs.CL
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
[AUTHORS]
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu
[ABSTRACT]
In this paper, we focus on monolithic Multimodal Large Language Models
(MLLMs) that integrate visual encoding and language decoding into a single LLM.
In particular, we identify that existing pre-training strategies for monolithic
MLLMs often suffer from unstable optimization or catastrophic forgetting. To
address this issue, our core idea is to embed a new visual parameter space into
a pre-trained LLM, thereby stably learning visual knowledge from noisy data
while freezing the LLM. Based on this principle, we present Mono-InternVL, a
novel monolithic MLLM that seamlessly integrates a set of visual experts via a
multimodal mixture-of-experts structure. Moreover, we propose an innovative
pre-training strategy to maximize the visual capability of Mono-InternVL,
namely Endogenous Visual Pre-training (EViP). In particular, EViP is designed
as a progressive learning process for visual experts, which aims to fully
exploit the visual knowledge from noisy data to high-quality data. To validate
our approach, we conduct extensive experiments on 16 benchmarks. Experimental
results confirm the superior performance of Mono-InternVL compared to existing
monolithic MLLMs on 13 of 16 multimodal benchmarks, e.g., +80 points over Emu3
on OCRBench. Compared to the modular baseline, i.e., InternVL-1.5,
Mono-InternVL still retains comparable multimodal performance while reducing
first-token latency by up to 67%. Code and model are released at
https://huggingface.co/OpenGVLab/Mono-InternVL-2B.
[LINK]
http://arxiv.org/abs/2410.08202v2
[DATE]
2024-11-20 20:15:08+08:00
[CATEGORIES]
cs.CL
Leveraging Prior Experience: An Expandable Auxiliary Knowledge Base for Text-to-SQL
[AUTHORS]
Zhibo Chu, Zichong Wang, Qitao Qin
[ABSTRACT]
Large Language Models (LLMs) exhibit impressive problem-solving skills across
many tasks, but they still underperform compared to humans in various
downstream applications, such as text-to-SQL. On the BIRD benchmark
leaderboard, human performance achieves an accuracy of 92.96%, whereas the
top-performing method reaches only 72.39%. Notably, these state-of-the-art
(SoTA) methods predominantly rely on in-context learning to simulate human-like
reasoning. However, they overlook a critical human skill: continual learning.
Inspired by the educational practice of maintaining mistake notebooks during
our formative years, we propose LPE-SQL (Leveraging Prior Experience: An
Expandable Auxiliary Knowledge Base for Text-to-SQL), a novel framework
designed to augment LLMs by enabling continual learning without requiring
parameter fine-tuning. LPE-SQL consists of four modules that i)
retrieve relevant entries, ii) generate SQL efficiently, iii)
generate the final result through a cross-consistency mechanism, and
iv) log successful and failed tasks along with their reasoning
processes or reflection-generated tips. Importantly, the core module of LPE-SQL
is the fourth one, while the other modules employ foundational methods,
allowing LPE-SQL to be easily integrated with SoTA technologies to further
enhance performance. Our experimental results demonstrate that this continual
learning approach yields substantial performance gains, with the smaller
Llama-3.1-70B model surpassing the performance of the larger
Llama-3.1-405B model using SoTA methods.
[LINK]
http://arxiv.org/abs/2411.13244v1
[DATE]
2024-11-20 20:03:17+08:00
[CATEGORIES]
cs.CL
BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework
[AUTHORS]
Xu Zou
[ABSTRACT]
Recently, generative pre-trained models have made significant strides,
particularly highlighted by the release of ChatGPT and GPT-4, which exhibit
superior cross-domain capabilities. However, these models still face challenges
on constrained writing tasks like poem generation under open-domain titles. In
response to this challenge, we introduce the Block Inverse Prompting (BIPro)
constrained generation framework. BIPro leverages two block inverse prompting
methods, revise and rewrite, that mimic the process of human text writing using
block generative models. It significantly improves the zero-shot generation
quality on the formidable constrained generation task of open-domain
traditional-form Chinese poem generation. Based on a less powerful block
generative model GLM-10B-Chinese, poems composed via BIPro without priming or
additional training outperform both the most advanced direct generative systems
like GPT-4 or GLM-4 and the best domain-specific systems such as Yusheng,
Shisanbai, or Baidu Poetry Helper in human evaluation by proficient poets.
Finally, BIPro considerably narrows the gap between AI-generated works and
short-listed human literary arts in another human evaluation, unveiling the
promising potential of block generative models in improving the quality of
constrained generation.
[LINK]
http://arxiv.org/abs/2411.13237v1
[DATE]
2024-11-20 19:56:56+08:00
[CATEGORIES]
cs.CL
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs
[AUTHORS]
Zhuofeng Li, Zixing Gou, Xiangnan Zhang, Zhongyuan Liu, Sirui Li, Yuntong Hu, Chen Ling, Zheng Zhang, Liang Zhao
[ABSTRACT]
Text-Attributed Graphs (TAGs) augment graph structures with natural language
descriptions, facilitating detailed depictions of data and their
interconnections across various real-world settings. However, existing TAG
datasets predominantly feature textual information only at the nodes, with
edges typically represented by mere binary or categorical attributes. This lack
of rich textual edge annotations significantly limits the exploration of
contextual relationships between entities, hindering deeper insights into
graph-structured data. To address this gap, we introduce Textual-Edge Graphs
Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of
benchmark textual-edge datasets featuring rich textual descriptions on nodes
and edges. The TEG-DB datasets are large-scale and encompass a wide range of
domains, from citation networks to social networks. In addition, we conduct
extensive benchmark experiments on TEG-DB to assess the extent to which current
techniques, including pre-trained language models, graph neural networks, and
their combinations, can utilize textual node and edge information. Our goal is
to elicit advancements in textual-edge graph research, specifically in
developing methodologies that exploit rich textual node and edge descriptions
to enhance graph analysis and provide deeper insights into complex real-world
networks. The entire TEG-DB project is publicly available as an open-source
repository on GitHub at
https://github.com/Zhuofeng-Li/TEG-Benchmark.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.10310v2
[DATE]
2024-11-20 19:47:58+08:00
[CATEGORIES]
cs.CL
AIDBench: A benchmark for evaluating the authorship identification capability of large language models
[AUTHORS]
Zichen Wen, Dadi Guo, Huishuai Zhang
[ABSTRACT]
As large language models (LLMs) rapidly advance and integrate into daily
life, the privacy risks they pose are attracting increasing attention. We focus
on a specific privacy risk where LLMs may help identify the authorship of
anonymous texts, which challenges the effectiveness of anonymity in real-world
systems such as anonymous peer review systems. To investigate these risks, we
present AIDBench, a new benchmark that incorporates several author
identification datasets, including emails, blogs, reviews, articles, and
research papers. AIDBench utilizes two evaluation methods: one-to-one
authorship identification, which determines whether two texts are from the same
author; and one-to-many authorship identification, which, given a query text
and a list of candidate texts, identifies the candidate most likely written by
the same author as the query text. We also introduce a Retrieval-Augmented
Generation (RAG)-based method to enhance the large-scale authorship
identification capabilities of LLMs, particularly when input lengths exceed the
models’ context windows, thereby establishing a new baseline for authorship
identification using LLMs. Our experiments with AIDBench demonstrate that LLMs
can correctly guess authorship at rates well above random chance, revealing new
privacy risks posed by these powerful models. The source code and data will be
made publicly available after acceptance.
[COMMENTS]
21 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.13226v1
[DATE]
2024-11-20 19:41:08+08:00
[CATEGORIES]
cs.CL
Neon: News Entity-Interaction Extraction for Enhanced Question Answering
[AUTHORS]
Sneha Singhania, Silviu Cucerzan, Allen Herring, Sujay Kumar Jauhar
[ABSTRACT]
Capturing fresh information in near real-time and using it to augment
existing large language models (LLMs) is essential to generate up-to-date,
grounded, and reliable output. This problem becomes particularly challenging
when LLMs are used for informational tasks in rapidly evolving fields, such as
Web search related to recent or unfolding events involving entities, where
generating temporally relevant responses requires access to up-to-the-hour news
sources. However, the information modeled by the parametric memory of LLMs is
often outdated, and Web results from prototypical retrieval systems may fail to
capture the latest relevant information and struggle to handle conflicting
reports in evolving news. To address this challenge, we present the NEON
framework, designed to extract emerging entity interactions – such as events
or activities – as described in news articles. NEON constructs an
entity-centric timestamped knowledge graph that captures such interactions,
thereby facilitating enhanced QA capabilities related to news events. Our
framework innovates by integrating open Information Extraction (openIE) style
tuples into LLMs to enable in-context retrieval-augmented generation. This
integration demonstrates substantial improvements in QA performance when
tackling temporal, entity-centric search queries. Through NEON, LLMs can
deliver more accurate, reliable, and up-to-date responses.
[LINK]
http://arxiv.org/abs/2411.12449v2
[DATE]
2024-11-20 18:06:05+08:00
[CATEGORIES]
cs.CL
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
[AUTHORS]
Hyun Ryu, Eric Kim
[ABSTRACT]
Efficient inference in large language models (LLMs) has become a critical
focus as their scale and complexity grow. Traditional autoregressive decoding,
while effective, suffers from computational inefficiencies due to its
sequential token generation process. Speculative decoding addresses this
bottleneck by introducing a two-stage framework: drafting and verification. A
smaller, efficient model generates a preliminary draft, which is then refined
by a larger, more sophisticated model. This paper provides a comprehensive
survey of speculative decoding methods, categorizing them into draft-centric
and model-centric approaches. We discuss key ideas associated with each method,
highlighting their potential for scaling LLM inference. This survey aims to
guide future research in optimizing speculative decoding and its integration
into real-world LLM applications.
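
The drafting-and-verification loop described above can be sketched in a few lines. This is a minimal illustration only, not a method from the survey: it assumes greedy decoding, models each "model" as a plain function from a token prefix to its next token, and uses hypothetical toy models (`toy_target`, `toy_draft`) in place of real LLMs. Under these assumptions the output matches plain greedy decoding of the target model.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # maps a prefix to its greedy next token


def greedy_decode(model: Model, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[Token],
                       max_new_tokens: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding with a draft window of k tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(out) < goal:
        # Drafting: the small model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # Verification: the target checks each proposed position in order.
        # A real system would score all positions in one batched forward pass.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # First mismatch: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every drafted token was accepted.
            out.extend(proposal)
    return out


# Hypothetical toy models: the target counts modulo 10; the draft agrees
# except after token 4, where it guesses wrong.
def toy_target(prefix: List[Token]) -> Token:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token]) -> Token:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], 12))
```

The speed-up in practice comes from verifying a whole draft window with a single call to the large model; the sequential verification loop here only shows the acceptance logic.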
[LINK]
http://arxiv.org/abs/2411.13157v1
[DATE]
2024-11-20 17:46:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control
[AUTHORS]
Yunkee Chae, Eunsik Shin, Hwang Suntae, Seungryeol Paik, Kyogu Lee
[ABSTRACT]
Lyrics generation presents unique challenges, particularly in achieving
precise syllable control while adhering to song form structures such as verses
and choruses. Conventional line-by-line approaches often lead to unnatural
phrasing, underscoring the need for more granular syllable management. We
propose a framework for lyrics generation that enables multi-level syllable
control at the word, phrase, line, and paragraph levels, aware of song form.
Our approach generates complete lyrics conditioned on input text and song form,
ensuring alignment with specified syllable constraints. Generated lyrics
samples are available at: https://tinyurl.com/lyrics9999
[LINK]
http://arxiv.org/abs/2411.13100v1
[DATE]
2024-11-20 15:57:58+08:00
[CATEGORIES]
cs.CL
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
[AUTHORS]
Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun
[ABSTRACT]
Fine-tuning is a crucial process for adapting large language models (LLMs) to
diverse applications. In certain scenarios, such as multi-tenant serving,
deploying multiple LLMs becomes necessary to meet complex demands. Recent
studies suggest decomposing a fine-tuned LLM into a base model and
corresponding delta weights, which are then compressed using low-rank or
low-bit approaches to reduce costs. In this work, we observe that existing
low-rank and low-bit compression methods can significantly harm the model
performance for task-specific fine-tuned LLMs (e.g., WizardMath for math
problems). Motivated by the long-tail distribution of singular values in the
delta weights, we propose a delta quantization approach using mixed precision.
This method employs higher-bit representation for singular vectors
corresponding to larger singular values. We evaluate our approach on various
fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs.
Experimental results demonstrate that our approach performs comparably to fully
fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a
considerable margin. Additionally, we show that our method is compatible with
various backbone LLMs, such as Llama-2, Llama-3, and Mistral, highlighting its
generalizability.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.08903v2
[DATE]
2024-11-20 15:42:38+08:00
[CATEGORIES]
cs.CL
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Enhanced Code Generation
[AUTHORS]
Bin Xu, Yiguan Lin, Yinghao Li, Yang Gao
[ABSTRACT]
Large language models demonstrate exceptional performance in simple code
generation tasks but still face challenges in tackling complex problems. These
challenges may stem from insufficient reasoning and problem decomposition
capabilities. To address this issue, we propose a reasoning-augmented data
generation process, SRA-MCTS, which guides the model to autonomously generate
high-quality intermediate reasoning paths. This creates a positive feedback
loop, enabling continuous improvement. Our method operates entirely through the
model itself without requiring additional supervision. By synthesizing natural
language reasoning paths and translating them into executable code, the
approach ensures analytical accuracy and enhances the success rate in solving
complex tasks. Experimental results show that, even without additional
supervisory signals, our method achieves performance improvements across
different model scales, demonstrating the significant potential of
self-improvement in small models. Furthermore, the method remains robust when
traditional Chain-of-Thought (CoT) approaches exhibit performance degradation,
with notable improvements observed in diversity metrics such as pass@10. We
encourage further exploration of reasoning processes within training data to
enhance the ability of language models to address complex problems.
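
For readers unfamiliar with pass@k, the sketch below shows one commonly used way to compute it. The abstract does not state how pass@10 is calculated, so this is an assumption: it uses the widely adopted unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given n generated samples of which c passed the tests."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k includes a correct one.
        return 1.0
    # Complement of the chance that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 20 samples per problem, 5 correct, evaluated at k = 1 and k = 10.
    print(pass_at_k(20, 5, 1), pass_at_k(20, 5, 10))
```

A per-benchmark score would then be the mean of this quantity over all problems.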
[LINK]
http://arxiv.org/abs/2411.11053v2
[DATE]
2024-11-20 15:34:47+08:00
[CATEGORIES]
cs.CL
Patience Is The Key to Large Language Model Reasoning
[AUTHORS]
Yijiong Yu
[ABSTRACT]
Recent advancements in the field of large language models, particularly
through the Chain of Thought (CoT) approach, have demonstrated significant
improvements in solving complex problems. However, existing models either tend
to sacrifice detailed reasoning for brevity due to user preferences, or require
extensive and expensive training data to learn complicated reasoning ability,
limiting their potential in solving complex tasks. To bridge this gap,
following the concept of test-time scaling, we propose a simple method that
encourages models to adopt a more patient reasoning style without the need to
introduce new knowledge or skills. Using a preference optimization
approach, we generate detailed reasoning processes as positive examples and
simple answers as negative examples, thereby training the model to favor
thoroughness in its responses. Our results demonstrate a performance increase
of up to 6.7% on GSM8k with training just on a lightweight dataset.
[COMMENTS]
The dataset and model are available at
https://huggingface.co/datasets/yuyijiong/patient-math-cot
[LINK]
http://arxiv.org/abs/2411.13082v1
[DATE]
2024-11-20 15:20:48+08:00
[CATEGORIES]
cs.CL
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
[AUTHORS]
Yang Cao
[ABSTRACT]
In this paper, we propose Singular Values and Orthonormal Regularized
Singular Vectors Adaptation, or SORSA, a novel PEFT method. Each SORSA adapter
consists of two main parts: trainable principal singular weights $W_p = U_p
\text{diag}(S_p) V^\top_p$, and frozen residual weights $W_r = U_r
\text{diag}(S_r) V^\top_r$. These parts are initialized by performing singular
value decomposition (SVD) on pre-trained weights. Moreover, we implement and
analyze an orthonormal regularizer, which we prove could decrease the condition
number of $W_p$ and make the optimization more efficient. SORSA adapters could
be merged during inference, thus eliminating any inference latency. We also
introduce a method to analyze the variation of the parameters by performing SVD
and discuss and analyze SORSA’s superiority in minimizing the alteration in the
SVD aspect. Overall, SORSA shows faster convergence than LoRA and PiSSA in
our experiments. On the GSM-8K benchmark, Llama 2 7B adapted using SORSA
achieved 56.03% accuracy, surpassing LoRA (42.30%), AdaLoRA (47.30%), Full FT
(49.05%), and PiSSA (53.07%). On the MATH benchmark, SORSA achieved 10.36%
accuracy, outperforming LoRA (5.50%), AdaLoRA (6.48%), Full FT (7.22%), and
PiSSA (7.44%). We conclude that SORSA offers a new perspective on
parameter-efficient fine-tuning, demonstrating remarkable performance.
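To make the adapter structure in this abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It builds a toy 2x2 weight from known orthonormal factors instead of running an SVD. It then splits the weight into a principal part $W_p$ and a residual part $W_r$, merges them back, and evaluates one common form of orthonormality penalty, $\|Q^\top Q - I\|_F^2$. The exact regularizer used by SORSA is not given in the abstract, so that form, the helper names, and all numbers are assumptions.

```python
import math

def low_rank_product(u_cols, s, v_cols):
    """Return U diag(S) V^T, with U and V given as lists of column vectors."""
    rows, cols = len(u_cols[0]), len(v_cols[0])
    return [[sum(s[k] * u_cols[k][i] * v_cols[k][j] for k in range(len(s)))
             for j in range(cols)] for i in range(rows)]

def ortho_penalty(cols):
    """Squared Frobenius norm of (Q^T Q - I), where Q has the given columns."""
    total = 0.0
    for i, a in enumerate(cols):
        for j, b in enumerate(cols):
            dot = sum(x * y for x, y in zip(a, b))
            total += (dot - (1.0 if i == j else 0.0)) ** 2
    return total

# Toy 2x2 "pre-trained" weight built from known orthonormal factors
# (two rotations), standing in for the result of an SVD.
u1, u2 = [math.cos(0.3), math.sin(0.3)], [-math.sin(0.3), math.cos(0.3)]
v1, v2 = [math.cos(1.1), math.sin(1.1)], [-math.sin(1.1), math.cos(1.1)]
W = low_rank_product([u1, u2], [3.0, 0.5], [v1, v2])

# Principal part (largest singular value, trainable) and residual (frozen).
W_p = low_rank_product([u1], [3.0], [v1])
W_r = low_rank_product([u2], [0.5], [v2])

# Merging for inference: the two parts fold back into one dense matrix.
merged = [[p + r for p, r in zip(row_p, row_r)] for row_p, row_r in zip(W_p, W_r)]

# The penalty is zero for orthonormal columns and grows once training
# pushes a singular vector away from unit norm.
drifted_u1 = [1.1 * x for x in u1]
print(ortho_penalty([u1, u2]), ortho_penalty([drifted_u1]))
```

Because the merged matrix has the same shape as the original weight, no extra computation is needed at inference time, which is the property the abstract refers to.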
[LINK]
http://arxiv.org/abs/2409.00055v5
[DATE]
2024-11-20 15:08:22+08:00
[CATEGORIES]
cs.LG
cs.CL
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning
[AUTHORS]
Gang Zhao, Ximing Zhang, Chenji Lu, Hui Zhao, Tianshu Wu, Pengjie Wang, Jian Xu, Bo Zheng
[ABSTRACT]
Effective query-item relevance modeling is pivotal for enhancing user
experience and safeguarding user satisfaction in e-commerce search systems.
Recently, benefiting from its vast inherent knowledge, the Large Language Model
(LLM) approach demonstrates strong performance and long-tail generalization
ability compared with previous neural-based specialized relevance learning
methods. Though promising, current LLM-based methods encounter the following
inadequacies in practice: First, the massive parameters and computational
demands make them difficult to deploy online. Second, distilling LLMs
to online models is a feasible direction, but the LLM relevance modeling is a
black box, and its rich intrinsic knowledge is difficult to extract and apply
online. To improve the interpretability of LLM and boost the performance of
online relevance models via LLM, we propose an Explainable LLM-driven
Multi-dimensional Distillation framework for e-commerce relevance learning,
which comprises two core components: (1) An Explainable LLM for relevance
modeling (ELLM-rele), which decomposes the relevance learning into intermediate
steps and models relevance learning as a Chain-of-Thought (CoT) reasoning,
thereby enhancing both interpretability and performance of LLM. (2) A
Multi-dimensional Knowledge Distillation (MKD) architecture that transfers the
knowledge of ELLM-rele to current deployable interaction-based and
representation-based student models from both the relevance score distribution
and CoT reasoning aspects. Through distilling the probabilistic and CoT
reasoning knowledge, MKD improves both the semantic interaction and long-tail
generalization abilities of student models. Extensive offline evaluations and
online experiments on Taobao search ad scene demonstrate that our proposed
framework significantly enhances e-commerce relevance learning performance and
user experience.
[COMMENTS]
Submitted to WWW 2025
[LINK]
http://arxiv.org/abs/2411.13045v1
[DATE]
2024-11-20 13:30:15+08:00
[CATEGORIES]
cs.CL
Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking
[AUTHORS]
Ming Dong, Yujing Chen, Miao Zhang, Hao Sun, Tingting He
[COMMENTS]
This paper is accepted by Findings of the Association for
Computational Linguistics: ACL 2024
[LINK]
http://arxiv.org/abs/2403.08492v3
[DATE]
2024-11-20 12:00:39+08:00
[CATEGORIES]
cs.CL
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
[AUTHORS]
Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang
[ABSTRACT]
In order to reduce the computational complexity of large language models,
great efforts have been made to improve the efficiency of transformer models
such as linear attention and flash-attention. However, the model size and
corresponding computational complexity are constantly scaled up in pursuit of
higher performance. In this work, we present MemoryFormer, a novel transformer
architecture which significantly reduces the computational complexity (FLOPs)
from a new perspective. We eliminate nearly all the computations of the
transformer model except for the necessary computation required by the
multi-head attention operation. This is made possible by utilizing an
alternative method for feature transformation to replace the linear projection
of fully-connected layers. Specifically, we first construct a group of
in-memory lookup tables that store a large amount of discrete vectors to
replace the weight matrix used in linear projection. We then use a hash
algorithm to retrieve a correlated subset of vectors dynamically based on the
input embedding. The retrieved vectors combined together will form the output
embedding, which provides an estimation of the result of matrix multiplication
operation in a fully-connected layer. Compared to conducting matrix
multiplication, retrieving data blocks from memory is a much cheaper operation
which requires little computations. We train MemoryFormer from scratch and
conduct extensive experiments on various benchmarks to demonstrate the
effectiveness of the proposed model.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12992v1
[DATE]
2024-11-20 10:41:53+08:00
[CATEGORIES]
cs.CL
Training Bilingual LMs with Data Constraints in the Targeted Language
[AUTHORS]
Skyler Seto, Maartje ter Hoeve, He Bai, Natalie Schluter, David Grangier
[ABSTRACT]
Large language models are trained on massive scrapes of the web, as required
by current scaling laws. Most progress is made for English, given its abundance
of high-quality pretraining data. For most other languages, however, such high
quality pretraining data is unavailable. In this work, we study how to boost
pretrained model performance in a data constrained target language by enlisting
data from an auxiliary language for which high quality data is available. We
study this by quantifying the performance gap between training with data in a
data-rich auxiliary language compared with training in the target language,
exploring the benefits of translation systems, studying the limitations of
model scaling for data constrained languages, and proposing new methods for
upsampling data from the auxiliary language. Our results show that stronger
auxiliary datasets result in performance gains without modification to the
model or training objective for close languages, and, in particular, that
performance gains due to the development of more information-rich English
pretraining datasets can extend to targeted language settings with limited
data.
[COMMENTS]
22 pages, 14 figures, 15 tables
[LINK]
http://arxiv.org/abs/2411.12986v1
[DATE]
2024-11-20 10:27:40+08:00
[CATEGORIES]
cs.CL
cs.LG
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models
[AUTHORS]
Luohe Shi, Yao Yao, Zuchao Li, Lefei Zhang, Hai Zhao
[ABSTRACT]
Large language models (LLMs) have rapidly advanced and demonstrated
impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient
Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to
downstream tasks. ICL typically constructs a few-shot learning scenario, either
manually or by setting up a Retrieval-Augmented Generation (RAG) system,
helping models quickly grasp domain knowledge or question-answering patterns
without changing model parameters. However, this approach involves trade-offs,
such as slower inference speed and increased space occupancy. PEFT assists the
model in adapting to tasks through minimal parameter modifications, but the
training process still demands high hardware requirements, even with a small
number of parameters involved. To address these challenges, we propose
Reference Trustable Decoding (RTD), a paradigm that allows models to quickly
adapt to new tasks without fine-tuning, maintaining low inference costs. RTD
constructs a reference datastore from the provided training examples and
optimizes the LLM’s final vocabulary distribution by flexibly selecting
suitable references based on the input, resulting in more trustable responses
and enabling the model to adapt to downstream tasks at a low cost. Experimental
evaluations on various LLMs using different benchmarks demonstrate that RTD
establishes a new paradigm for augmenting models to downstream tasks.
Furthermore, our method exhibits strong orthogonality with traditional methods,
allowing for concurrent usage. Our code can be found at
https://github.com/ShiLuohe/ReferenceTrustableDecoding
[COMMENTS]
Accepted by the Thirty-Eighth Annual Conference on Neural Information
Processing Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2409.20181v2
[DATE]
2024-11-20 10:10:16+08:00
[CATEGORIES]
cs.CL
Keep the Cost Down: A Review on Methods to Optimize LLM’s KV-Cache Consumption
[AUTHORS]
Luohe Shi, Hongyi Zhang, Yao Yao, Zuchao Li, Hai Zhao
[ABSTRACT]
Large Language Models (LLMs), epitomized by ChatGPT’s release in late 2022,
have revolutionized various industries with their advanced language
comprehension. However, their efficiency is challenged by the Transformer
architecture’s struggle with handling long texts. KV Cache has emerged as a
pivotal solution to this issue, converting the time complexity of token
generation from quadratic to linear, albeit with increased GPU memory overhead
proportional to conversation length. With the development of the LLM community
and academia, various KV Cache compression methods have been proposed. In this
review, we dissect the various properties of KV Cache and elaborate on various
methods currently used to optimize the KV Cache space usage of LLMs. These
methods span the pre-training phase, deployment phase, and inference phase, and
we summarize the commonalities and differences among these methods.
Additionally, we list some metrics for evaluating the long-text capabilities of
large language models, from both efficiency and capability perspectives. Our
review thus sheds light on the evolving landscape of LLM optimization, offering
insights into future advancements in this dynamic field. Links to the papers
mentioned in this review can be found in our GitHub repo
https://github.com/zcli-charlie/Awesome-KV-Cache.
[COMMENTS]
Published on the First Conference on Language Modeling (COLM 2024)
[LINK]
http://arxiv.org/abs/2407.18003v4
[DATE]
2024-11-20 10:04:10+08:00
[CATEGORIES]
cs.CL
Demystifying Large Language Models for Medicine: A Primer
[AUTHORS]
Qiao Jin, Nicholas Wan, Robert Leaman, Shubo Tian, Zhizheng Wang, Yifan Yang, Zifeng Wang, Guangzhi Xiong, Po-Ting Lai, Qingqing Zhu, Benjamin Hou, Maame Sarfo-Gyamfi, Gongbo Zhang, Aidan Gilson, Balu Bhasuran, Zhe He, Aidong Zhang, Jimeng Sun, Chunhua Weng, Ronald M. Summers, Qingyu Chen, Yifan Peng, Zhiyong Lu
[ABSTRACT]
Large language models (LLMs) represent a transformative class of AI tools
capable of revolutionizing various aspects of healthcare by generating
human-like responses across diverse contexts and adapting to novel tasks
following human instructions. Their potential application spans a broad range
of medical tasks, such as clinical documentation, matching patients to clinical
trials, and answering medical questions. In this primer paper, we propose an
actionable guideline to help healthcare professionals more efficiently utilize
LLMs in their work, along with a set of best practices. This approach consists
of several main phases, including formulating the task, choosing LLMs, prompt
engineering, fine-tuning, and deployment. We start with the discussion of
critical considerations in identifying healthcare tasks that align with the
core capabilities of LLMs and selecting models based on the selected task and
data, performance requirements, and model interface. We then review the
strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs
to specialized medical tasks. Deployment considerations, including regulatory
compliance, ethical guidelines, and continuous monitoring for fairness and
bias, are also discussed. By providing a structured step-by-step methodology,
this tutorial aims to equip healthcare professionals with the tools necessary
to effectively integrate LLMs into clinical practice, ensuring that these
powerful technologies are applied in a safe, reliable, and impactful manner.
[COMMENTS]
Under review
[LINK]
http://arxiv.org/abs/2410.18856v3
[DATE]
2024-11-20 09:04:33+08:00
[CATEGORIES]
cs.CL
Literature Meets Data: A Synergistic Approach to Hypothesis Generation
[AUTHORS]
Haokun Liu, Yangqiaoyu Zhou, Mingxuan Li, Chenfei Yuan, Chenhao Tan
[ABSTRACT]
AI holds promise for transforming scientific processes, including hypothesis
generation. Prior work on hypothesis generation can be broadly categorized into
theory-driven and data-driven approaches. While both have proven effective in
generating novel and plausible hypotheses, it remains an open question whether
they can complement each other. To address this, we develop the first method
that combines literature-based insights with data to perform LLM-powered
hypothesis generation. We apply our method on five different datasets and
demonstrate that integrating literature and data outperforms other baselines
(8.97% over few-shot, 15.75% over literature-based alone, and 3.37% over
data-driven alone). Additionally, we conduct the first human evaluation to
assess the utility of LLM-generated hypotheses in assisting human
decision-making on two challenging tasks: deception detection and AI generated
content detection. Our results show that human accuracy improves significantly
by 7.44% and 14.19% on these tasks, respectively. These findings suggest that
integrating literature-based and data-driven approaches provides a
comprehensive and nuanced framework for hypothesis generation and could open
new avenues for scientific inquiry.
[COMMENTS]
30 pages, 7 figures, code link:
https://github.com/ChicagoHAI/hypothesis-generation
[LINK]
http://arxiv.org/abs/2410.17309v2
[DATE]
2024-11-20 07:32:13+08:00
[CATEGORIES]
cs.CL
cs.LG
Loss-to-Loss Prediction: Scaling Laws for All Datasets
[AUTHORS]
David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham Kakade
[ABSTRACT]
While scaling laws provide a reliable methodology for predicting train loss
across compute scales for a single data distribution, less is known about how
these predictions should change as we change the distribution. In this paper,
we derive a strategy for predicting one loss from another and apply it to
predict across different pre-training datasets and from pre-training data to
downstream task data. Our predictions extrapolate well even at 20x the largest
FLOP budget used to fit the curves. More precisely, we find that there are
simple shifted power law relationships between (1) the train losses of two
models trained on two separate datasets when the models are paired by training
compute (train-to-train), (2) the train loss and the test loss on any
downstream distribution for a single model (train-to-test), and (3) the test
losses of two models trained on two separate train datasets (test-to-test). The
results hold up for pre-training datasets that differ substantially (some are
entirely code and others have no code at all) and across a variety of
downstream tasks. Finally, we find that in some settings these shifted power
law relationships can yield more accurate predictions than extrapolating
single-dataset scaling laws.
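The abstract describes "shifted power law" relationships between losses without spelling out the formula. A minimal sketch of one natural parametrization is $L_b = K\,(L_a - E_a)^{\kappa} + E_b$, where $E_a$ and $E_b$ play the role of irreducible losses. The functional form as written here, the parameter names, and all numeric values are assumptions for illustration, not fitted results from the paper.

```python
def shifted_power_law(loss_a: float, K: float, kappa: float,
                      E_a: float, E_b: float) -> float:
    """Predict loss on distribution B from loss on distribution A.

    Assumed form: L_b = K * (L_a - E_a) ** kappa + E_b.
    """
    if loss_a <= E_a:
        raise ValueError("loss_a must exceed the irreducible loss E_a")
    return K * (loss_a - E_a) ** kappa + E_b

# Hypothetical fitted parameters for a train-to-train mapping.
params = dict(K=0.8, kappa=1.2, E_a=1.7, E_b=1.9)
predictions = [shifted_power_law(l, **params) for l in (3.0, 2.5, 2.0)]
print(predictions)
```

As the loss on A approaches its floor `E_a`, the predicted loss on B approaches `E_b`, which is what distinguishes a shifted power law from a plain one.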
[LINK]
http://arxiv.org/abs/2411.12925v1
[DATE]
2024-11-20 07:23:16+08:00
[CATEGORIES]
cs.LG
cs.CL
Signformer is all you need: Towards Edge AI for Sign Language
[AUTHORS]
Eta Yang
[ABSTRACT]
Sign language translation, especially in the gloss-free paradigm, is confronting
a dilemma of impracticality and unsustainability due to growing
resource-intensive methodologies. Contemporary state-of-the-art (SOTA) methods have
significantly hinged on pretrained sophisticated backbones such as Large Language
Models (LLMs), embedding sources, or extensive datasets, inducing considerable
parametric and computational inefficiency for sustainable use in real-world
scenarios. Despite their success, following this research direction undermines
the overarching mission of this domain to create substantial value to bridge
hard-of-hearing and common populations. Committing to the prevailing trend of LLM
and Natural Language Processing (NLP) studies, we pursue a profound essential
change in architecture to achieve ground-up improvements without external aid
from pretrained models, prior knowledge transfer, or any NLP strategies
considered not-from-scratch.
Introducing Signformer, a from-scratch Feather-Giant transforming the area
towards Edge AI that redefines extremities of performance and efficiency with
LLM-competence and edgy-deployable compactness. In this paper, we present
nature analysis of sign languages to inform our algorithmic design and deliver
a scalable transformer pipeline with convolution and attention novelty. We
achieve a new 2nd place on the leaderboard with a parametric reduction of 467-1807x
against the best models as of 2024 and outcompete almost every other method in a
lighter configuration of 0.57 million parameters.
[COMMENTS]
Official Code at: https://github.com/EtaEnding/Signformer/tree/main
[LINK]
http://arxiv.org/abs/2411.12901v1
[DATE]
2024-11-20 06:27:53+08:00
[CATEGORIES]
cs.CL
cs.LG
Selective Attention: Enhancing Transformer through Principled Context Control
[AUTHORS]
Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak
[ABSTRACT]
The attention mechanism within the transformer architecture enables the model
to weigh and combine tokens based on their relevance to the query. While
self-attention has enjoyed major success, it notably treats all queries $q$ in
the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V,K$
are the value and key embeddings respectively. In this work, we argue that this
uniform treatment hinders the ability to control contextual sparsity and
relevance. As a solution, we introduce the $\textit{Selective Self-Attention}$
(SSA) layer that augments the softmax nonlinearity with a principled
temperature scaling strategy. By controlling temperature, SSA adapts the
contextual sparsity of the attention map to the query embedding and its
position in the context window. Through theory and experiments, we demonstrate
that this alleviates attention dilution, aids the optimization process, and
enhances the model’s ability to control softmax spikiness of individual
queries. We also incorporate temperature scaling for value embeddings and show
that it boosts the model’s ability to suppress irrelevant/noisy tokens.
Notably, SSA is a lightweight method which introduces less than 0.5% new
parameters through a weight-sharing strategy and can be fine-tuned on existing
LLMs. Extensive empirical evaluations demonstrate that SSA-equipped models
achieve a noticeable and consistent accuracy improvement on language modeling
benchmarks.
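The mapping $V^\top\text{softmax}(Kq)$ and the effect of temperature scaling can be illustrated with a small pure-Python sketch. How SSA derives the temperature from the query embedding and position is not specified in the abstract, so here `temperature` is simply a hypothetical argument, and the keys, values, and query are made-up toy vectors.

```python
import math

def softmax(scores, temperature=1.0):
    """Softmax of scores / temperature, shifted by the max for stability."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def attend(q, keys, values, temperature=1.0):
    """Compute V^T softmax(K q / temperature) for a single query q."""
    scores = [sum(k_i * q_i for k_i, q_i in zip(k, q)) for k in keys]
    weights = softmax(scores, temperature)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q = [1.0, 0.5]

_, w_default = attend(q, keys, values, temperature=1.0)
_, w_sharp = attend(q, keys, values, temperature=0.25)
print(w_default, w_sharp)
```

Lowering the temperature concentrates the weights on the highest-scoring key, which corresponds to the sparser, "spikier" attention map the abstract describes.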
[LINK]
http://arxiv.org/abs/2411.12892v1
[DATE]
2024-11-20 06:17:18+08:00
[CATEGORIES]
cs.LG
cs.CL
ProSec: Fortifying Code LLMs with Proactive Security Alignment
[AUTHORS]
Xiangzhe Xu, Zian Su, Jinyao Guo, Kaiyuan Zhang, Zhenting Wang, Xiangyu Zhang
[ABSTRACT]
Recent advances in code-specific large language models (LLMs) have greatly
enhanced code generation and refinement capabilities. However, the safety of
code LLMs remains under-explored, posing potential risks as insecure code
generated by these models may introduce vulnerabilities into real-world
systems. Previous work proposes to collect security-focused instruction-tuning
datasets from real-world vulnerabilities. It is constrained by the data sparsity
of vulnerable code, and has limited applicability in the iterative
post-training workflows of modern LLMs. In this paper, we propose ProSec, a
novel proactive security alignment approach designed to align code LLMs with
secure coding practices. ProSec systematically exposes the vulnerabilities in a
code LLM by synthesizing error-inducing coding scenarios from Common Weakness
Enumerations (CWEs), and generates fixes to vulnerable code snippets, allowing
the model to learn secure practices through advanced preference learning
objectives. The scenarios synthesized by ProSec trigger 25 times more
vulnerable code than a normal instruction-tuning dataset, resulting in a
security-focused alignment dataset 7 times larger than the previous work.
Experiments show that models trained with ProSec are 29.2% to 35.5% more secure
compared to previous work, with a marginal negative effect of less than 2
percentage points on the model’s utility.
[COMMENTS]
The first two authors contributed equally to this work
[LINK]
http://arxiv.org/abs/2411.12882v1
[DATE]
2024-11-20 06:00:01+08:00
[CATEGORIES]
cs.CL
AzSLD: Azerbaijani Sign Language Dataset for Fingerspelling, Word, and Sentence Translation with Baseline Software
[AUTHORS]
Nigar Alishzade, Jamaladdin Hasanov
[ABSTRACT]
Sign language processing technology development relies on extensive and
reliable datasets, instructions, and ethical guidelines. We present a
comprehensive Azerbaijani Sign Language Dataset (AzSLD) collected from diverse
sign language users and linguistic parameters to facilitate advancements in
sign recognition and translation systems and support the local sign language
community. The dataset was created within the framework of a vision-based AzSL
translation project. This study introduces the dataset as a summary of the
fingerspelling alphabet and sentence- and word-level sign language datasets.
The dataset was collected from signers of different ages, genders, and signing
styles, with videos recorded from two camera angles to capture each sign in
full detail. This approach ensures robust training and evaluation of gesture
recognition models. AzSLD contains 30,000 videos, each carefully annotated with
accurate sign labels and corresponding linguistic translations. The dataset is
accompanied by technical documentation and source code to facilitate its use in
training and testing. This dataset offers a valuable resource of labeled data
for researchers and developers working on sign language recognition,
translation, or synthesis. Ethical guidelines were strictly followed throughout
the project, with all participants providing informed consent for collecting,
publishing, and using the data.
[LINK]
http://arxiv.org/abs/2411.12865v1
[DATE]
2024-11-20 05:15:47+08:00
[CATEGORIES]
cs.CL
A Benchmark for Long-Form Medical Question Answering
[AUTHORS]
Pedram Hosseini, Jessica M. Sin, Bing Ren, Bryceton G. Thomas, Elnaz Nouri, Ali Farahanchi, Saeed Hassanpour
[ABSTRACT]
There is a lack of benchmarks for evaluating large language models (LLMs) in
long-form medical question answering (QA). Most existing medical QA evaluation
benchmarks focus on automatic metrics and multiple-choice questions. While
valuable, these benchmarks fail to fully capture or assess the complexities of
real-world clinical applications where LLMs are being deployed. Furthermore,
existing studies on evaluating long-form answer generation in medical QA are
primarily closed-source, lacking access to human medical expert annotations,
which makes it difficult to reproduce results and enhance existing baselines.
In this work, we introduce a new publicly available benchmark featuring
real-world consumer medical questions with long-form answer evaluations
annotated by medical doctors. We performed pairwise comparisons of responses
from various open and closed-source medical and general-purpose LLMs based on
criteria such as correctness, helpfulness, harmfulness, and bias. Additionally,
we performed a comprehensive LLM-as-a-judge analysis to study the alignment
between human judgments and LLMs. Our preliminary results highlight the strong
potential of open LLMs in medical QA compared to leading closed models. Code &
Data: https://github.com/lavita-ai/medical-eval-sphere
[COMMENTS]
AIM-FM: Advancements in Medical Foundation Models Workshop, 38th
Conference on Neural Information Processing Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2411.09834v2
[DATE]
2024-11-20 05:04:38+08:00
[CATEGORIES]
cs.CL
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
[AUTHORS]
Daniel P. Jeong, Saurabh Garg, Zachary C. Lipton, Michael Oberst
[COMMENTS]
This version was published at EMNLP 2024 Main Conference as a Long
Paper (Oral). See the extended version (arXiv:2411.08870) for additional
results on QA tasks based on clinical notes and evaluations in the supervised
fine-tuning regime
[LINK]
http://arxiv.org/abs/2411.04118v2
[DATE]
2024-11-20 04:51:58+08:00
[CATEGORIES]
cs.CL
cs.LG
SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus
[AUTHORS]
Stephanie M. Lukin, Claire Bonial, Matthew Marge, Taylor Hudson, Cory J. Hayes, Kimberly A. Pollard, Anthony Baker, Ashley N. Foots, Ron Artstein, Felix Gervits, Mitchell Abrams, Cassidy Henry, Lucia Donatelli, Anton Leuski, Susan G. Hill, David Traum, Clare R. Voss
[ABSTRACT]
We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a
multi-modal collection of human-robot dialogue in the task domain of
collaborative exploration. The corpus was constructed from multiple
Wizard-of-Oz experiments where human participants gave verbal instructions to a
remotely-located robot to move and gather information about its surroundings.
SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging
320 utterances per dialogue. The dialogues are aligned with the multi-modal
data streams available during the experiments: 5,785 images and 30 maps. The
corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR
to identify the speaker’s intent and meaning within an utterance, and with
Transactional Units and Relations to track relationships between utterances to
reveal patterns of the Dialogue Structure. We describe how the corpus and its
annotations have been used to develop autonomous human-robot systems and enable
research in open questions of how humans speak to robots. We release this
corpus to accelerate progress in autonomous, situated, human-robot dialogue,
especially in the context of navigation tasks where details about the
environment need to be discovered.
[COMMENTS]
14 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.12844v1
[DATE]
2024-11-20 04:18:55+08:00
[CATEGORIES]
cs.CL
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
[AUTHORS]
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li
[ABSTRACT]
Learning a reward model (RM) from human preferences has been an important
component in aligning large language models (LLMs). The canonical setup of
learning RMs from pairwise preference data is rooted in the classic
Bradley-Terry (BT) model that accepts binary feedback, i.e., the label being
either Response 1 is better than Response 2, or the opposite. Such a setup
inevitably discards potentially useful samples (such as “tied” between the two
responses) and loses more fine-grained information (such as “slightly better”).
In this paper, we propose a framework for learning RMs under ordinal feedback
which generalizes the case of binary preference feedback to any arbitrary
granularity. Specifically, we first identify a marginal unbiasedness condition,
which generalizes the assumption of the BT model in the existing binary
feedback setting. The condition validates itself via the sociological concept
of the wisdom of the crowd. Under the condition, we develop a natural
probability model for pairwise preference data under ordinal feedback and
analyze its properties. We prove the statistical benefits of ordinal feedback
in terms of reducing the Rademacher complexity compared to the case of binary
feedback. The proposed learning objective and the theory also extend to hinge
loss and direct policy optimization (DPO). In particular, the theoretical
analysis may be of independent interest when applying to a seemingly unrelated
problem of knowledge distillation to interpret the bias-variance trade-off
therein. The framework also sheds light on writing guidance for human
annotators. Our numerical experiments validate that fine-grained feedback leads
to better reward learning for both in-distribution and out-of-distribution
settings. Further experiments show that incorporating a certain proportion of
samples with tied preference boosts RM learning.
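One way to read the move from binary to ordinal feedback is as a Bradley-Terry cross-entropy with a soft label $z \in [0, 1]$ in place of a hard 0/1 label. The sketch below follows that reading. The loss form and the mapping from verbal labels to numbers are assumptions for illustration, not the paper's exact objective.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def ordinal_bt_loss(r1: float, r2: float, z: float) -> float:
    """Cross-entropy between soft label z and the BT probability sigmoid(r1 - r2)."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must lie in [0, 1]")
    delta = r1 - r2
    return -(z * log_sigmoid(delta) + (1.0 - z) * log_sigmoid(-delta))

# Hypothetical mapping from ordinal labels to soft targets; binary BT
# feedback uses only the two extremes.
LABELS = {"better": 1.0, "slightly better": 0.75, "tied": 0.5, "worse": 0.0}

tie_equal = ordinal_bt_loss(0.0, 0.0, LABELS["tied"])    # rewards agree with the tie
tie_apart = ordinal_bt_loss(1.0, 0.0, LABELS["tied"])    # rewards contradict the tie
print(tie_equal, tie_apart)
```

Under this loss a "tied" sample is no longer discarded: it pulls the two rewards together, since the loss is smallest when `r1 == r2`.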
[LINK]
http://arxiv.org/abs/2411.12843v1
[DATE]
2024-11-20 04:17:04+08:00
[CATEGORIES]
cs.LG
cs.CL
On the Implicit Relation Between Low-Rank Adaptation and Differential Privacy
[AUTHORS]
Saber Malekmohammadi, Golnoosh Farnadi
[ABSTRACT]
A significant approach in natural language processing involves large-scale
pre-training models on general domain data followed by their adaptation to
specific tasks or domains. As models grow in size, full fine-tuning all of
their parameters becomes increasingly impractical. To address this, some
methods for low-rank task adaptation of language models have been proposed,
e.g., LoRA and FLoRA. These methods keep the pre-trained model weights fixed
and incorporate trainable low-rank decomposition matrices into some layers of
the transformer architecture, called adapters. This approach significantly
reduces the number of trainable parameters required for downstream tasks
compared to full fine-tuning all parameters. In this work, we look at low-rank
adaptation from the lens of data privacy. We show theoretically that the
low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some
random noise into the batch gradients w.r.t. the adapter parameters, and we
quantify the variance of the injected noise. By establishing a Berry-Esseen
type bound on the total variation distance between distribution of the injected
noise and a Gaussian distribution with the same variance, we show that the
dynamics of low-rank adaptation is close to that of differentially private
fine-tuning of the adapters. Finally, using the Johnson-Lindenstrauss lemma, we
show that when augmented with gradient scaling, low-rank adaptation is very
close to performing the DPSGD algorithm with a fixed noise scale to fine-tune the
adapters. These theoretical findings suggest that unlike other existing
fine-tuning algorithms, low-rank adaptation provides privacy w.r.t. the
fine-tuning data implicitly.
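
As a rough illustration of the adapter structure described above, the following sketch freezes a weight matrix W and adds a trainable rank-r product B A on top of it. It is a minimal sketch in plain Python, not the authors' code: the helper names (matvec, lora_forward, trainable_params), the toy dimensions, and the scale argument are assumptions made for illustration. The zero initialization of B follows the usual LoRA convention, and the privacy analysis in the abstract is not reproduced here.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute (W + scale * B A) x without forming the product B A."""
    base = matvec(W, x)                  # frozen pre-trained path
    low_rank = matvec(B, matvec(A, x))   # trainable rank-r path
    return [b + scale * l for b, l in zip(base, low_rank)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, adapter count) for one weight matrix."""
    return d_out * d_in, rank * (d_in + d_out)


rng = random.Random(0)
d_out, d_in, rank = 6, 4, 2  # toy sizes, chosen only for this example
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, r x d_in
B = [[0.0] * rank for _ in range(d_out)]                              # trainable, d_out x r
x = [1.0, -2.0, 0.5, 3.0]

# With B initialized to zero, the adapted layer reproduces the frozen layer.
y_adapted = lora_forward(W, A, B, x)
y_frozen = matvec(W, x)

# Parameter counts for a hypothetical 1024 x 1024 layer with rank 8.
full, adapter = trainable_params(1024, 1024, 8)
```

For the hypothetical 1024 x 1024 layer, the adapter has 16,384 trainable parameters against 1,048,576 for full fine-tuning, which is the kind of reduction the abstract refers to.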
[LINK]
http://arxiv.org/abs/2409.17538v3
[DATE]
2024-11-20 04:10:18+08:00
[CATEGORIES]
cs.LG
cs.CL
Human-Robot Dialogue Annotation for Multi-Modal Common Ground
[AUTHORS]
Claire Bonial, Stephanie M. Lukin, Mitchell Abrams, Anthony Baker, Lucia Donatelli, Ashley Foots, Cory J. Hayes, Cassidy Henry, Taylor Hudson, Matthew Marge, Kimberly A. Pollard, Ron Artstein, David Traum, Clare R. Voss
[ABSTRACT]
In this paper, we describe the development of symbolic representations
annotated on human-robot dialogue data to make dimensions of meaning accessible
to autonomous systems participating in collaborative, natural language
dialogue, and to enable common ground with human partners. A particular
challenge for establishing common ground arises in remote dialogue (occurring
in disaster relief or search-and-rescue tasks), where a human and robot are
engaged in a joint navigation and exploration task of an unfamiliar
environment, but where the robot cannot immediately share high-quality visual
information due to communication constraints. Engaging in a dialogue
provides an effective way to communicate, while on-demand or lower-quality
visual information can be supplemented for establishing common ground. Within
this paradigm, we capture propositional semantics and the illocutionary force
of a single utterance within the dialogue through our Dialogue-AMR annotation,
an augmentation of Abstract Meaning Representation. We then capture patterns in
how different utterances within and across speaker floors relate to one another
in our development of a multi-floor Dialogue Structure annotation schema.
Finally, we begin to annotate and analyze the ways in which the visual
modalities provide contextual information to the dialogue for overcoming
disparities in the collaborators’ understanding of the environment. We conclude
by discussing the use-cases, architectures, and systems we have implemented
from our annotations that enable physical robots to autonomously engage with
humans in bi-directional dialogue and navigation.
[COMMENTS]
52 pages, 14 figures
[LINK]
http://arxiv.org/abs/2411.12829v1
[DATE]
2024-11-20 03:33:54+08:00
[CATEGORIES]
cs.CL
ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models
[AUTHORS]
Salma Kharrat, Fares Fourati, Marco Canini
[ABSTRACT]
The effectiveness of Large Language Models (LLMs) in solving tasks vastly
depends on the quality of the instructions, which often require fine-tuning
through extensive human effort. This highlights the need for automated
instruction optimization; however, this optimization is particularly
challenging when dealing with black-box LLMs, where model parameters and
gradients remain inaccessible. We propose ACING, a task-specific prompt
optimization approach framed as a stateless continuous-action Reinforcement
Learning (RL) problem, known as the continuum bandit setting. ACING leverages
an actor-critic-based method to optimize prompts, learning from
non-differentiable reward signals. We validate ACING by optimizing prompts for
ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline
methods, achieving a median score improvement of 10 percentage points.
Furthermore, ACING not only recovers but also surpasses human-crafted expert
instructions, achieving up to a 39 percentage point improvement against human
benchmarks.
[LINK]
http://arxiv.org/abs/2411.12736v1
[DATE]
2024-11-20 02:58:03+08:00
[CATEGORIES]
cs.CL
cs.LG
Information Theory of Meaningful Communication
[AUTHORS]
Doron Sivan, Misha Tsodyks
[ABSTRACT]
In Shannon’s seminal paper, the entropy of printed English, treated as a
stationary stochastic process, was estimated to be roughly 1 bit per character.
However, considered as a means of communication, language differs considerably
from its printed form: (i) the units of information are not characters or even
words but clauses, i.e., the shortest meaningful parts of speech; and (ii) what is
transmitted is principally the meaning of what is being said or written, while
the precise phrasing that was used to communicate the meaning is typically
ignored. In this study, we show that one can leverage recently developed large
language models to quantify information communicated in meaningful narratives
in terms of bits of meaning per clause.
[LINK]
http://arxiv.org/abs/2411.12728v1
[DATE]
2024-11-20 02:51:23+08:00
[CATEGORIES]
cs.CL
Scaling laws for nonlinear dynamical models of speech
[AUTHORS]
Sam Kirkham
[ABSTRACT]
The addition of a nonlinear restoring force to dynamical models of the speech
gesture significantly improves the empirical accuracy of model predictions, but
nonlinearity introduces challenges in selecting appropriate parameters and
numerical stability, especially when modelling variation in empirical data. We
address this issue by introducing simple numerical methods for parameterization
of nonlinear task dynamic models. We first illustrate the problem and then
outline solutions in the form of power laws that scale nonlinear stiffness
terms. We apply the scaling laws to a cubic model and show how they facilitate
interpretable simulations of the nonlinear gestural dynamics underpinning
speech production.
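
To make the kind of model under discussion concrete, the sketch below integrates a damped second-order system with an added cubic stiffness term. Everything specific here is an assumption for illustration: the equation form x'' = -b x' - k (x - T) + d (x - T)^3, the sign of the cubic term, unit mass, critical damping of the linear part, and all parameter values. The paper's power-law scaling of the stiffness terms is not given in the abstract and is not implemented.

```python
def simulate_gesture(x0, target, k, d, dt=0.001, duration=2.0):
    """Integrate x'' = -b x' - k (x - T) + d (x - T)^3 with semi-implicit Euler.

    The damping b is set to 2 * sqrt(k), i.e. critical damping of the linear
    part for unit mass (an assumption of this sketch).
    """
    b = 2.0 * k ** 0.5
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        e = x - target
        accel = -b * v - k * e + d * e ** 3
        v += accel * dt   # update velocity first ...
        x += v * dt       # ... then position, using the new velocity
        trajectory.append(x)
    return trajectory


def restoring_force_is_attractive(x0, target, k, d):
    """True if the net stiffness force at the start points toward the target.

    The net force is -e * (k - d * e^2) with e = x0 - target, so it pulls
    toward the target only while d * e^2 < k.
    """
    e = x0 - target
    return d * e * e < k


# Hypothetical parameter values, chosen only to show the qualitative effect.
linear = simulate_gesture(1.0, 0.0, k=100.0, d=0.0)
cubic = simulate_gesture(1.0, 0.0, k=100.0, d=50.0)
```

In this form the admissible size of d depends on the movement amplitude: if d * (x0 - T)^2 exceeds k, the net force points away from the target. That coupling between amplitude and nonlinear stiffness is one way to see why parameter selection becomes delicate once a nonlinear term is added.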
[LINK]
http://arxiv.org/abs/2411.12720v1
[DATE]
2024-11-20 02:38:01+08:00
[CATEGORIES]
cs.CL
Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs
[AUTHORS]
Ahmed Akib Jawad Karim, Muhammad Zawad Mahmud, Samiha Islam, Aznur Azam
[ABSTRACT]
In this research, we explored improvements in multi-class disease
classification using pre-trained language models on the
Medical-Abstracts-TC-Corpus, which spans five medical conditions. We excluded
non-cancer conditions and examined four specific diseases. We assessed four
LLMs: BioBERT, XLNet, and BERT, as well as a novel base model (LastBERT).
BioBERT, which was pre-trained
on medical data, demonstrated superior performance in medical text
classification (97% accuracy). Surprisingly, XLNet followed closely (96%
accuracy), demonstrating its generalizability across domains even though it was
not pre-trained on medical data. LastBERT, a custom model based on the lighter
version of BERT, also proved competitive with 87.10% accuracy (just under
BERT’s 89.33%). Our findings confirm the importance of specialized models such
as BioBERT and also support the case for more general solutions like
XLNet and well-tuned transformer architectures with fewer parameters (in this
case, LastBERT) in medical domain tasks.
[COMMENTS]
7 pages, 4 tables and 11 figures. Under review at an IEEE conference
[LINK]
http://arxiv.org/abs/2411.12712v1
[DATE]
2024-11-20 02:27:25+08:00
[CATEGORIES]
cs.CL
Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT?
[AUTHORS]
Ahmed Akib Jawad Karim, Kazi Hafiz Md Asad, Aznur Azam
[ABSTRACT]
The rapid spread of misinformation, particularly through online platforms,
underscores the urgent need for reliable detection systems. This study explores
the utilization of machine learning and natural language processing,
specifically Support Vector Machines (SVM) and BERT, to detect fake news.
We employ three distinct text vectorization methods for SVM: Term
Frequency Inverse Document Frequency (TF-IDF), Word2Vec, and Bag of Words (BoW),
evaluating their effectiveness in distinguishing between genuine and fake news.
Additionally, we compare these methods against the transformer large language
model, BERT. Our comprehensive approach includes detailed preprocessing steps,
rigorous model implementation, and thorough evaluation to determine the most
effective techniques. The results demonstrate that while BERT achieves superior
accuracy with 99.98% and an F1-score of 0.9998, the SVM model with a linear
kernel and BoW vectorization also performs exceptionally well, achieving 99.81%
accuracy and an F1-score of 0.9980. These findings highlight that, despite
BERT’s superior performance, SVM models with BoW and TF-IDF vectorization
methods come remarkably close, offering highly competitive performance with the
advantage of lower computational requirements.
[COMMENTS]
6 pages, 3 tables and 6 figures. Submitted to a conference
[LINK]
http://arxiv.org/abs/2411.12703v1
[DATE]
2024-11-20 02:15:46+08:00
[CATEGORIES]
cs.CL
Is Programming by Example solved by LLMs?
[AUTHORS]
Wen-Ding Li, Kevin Ellis
[ABSTRACT]
Programming-by-Examples (PBE) aims to generate an algorithm from input-output
examples. Such systems are practically and theoretically important: from an
end-user perspective, they are deployed to millions of people, and from an AI
perspective, PBE corresponds to a very general form of few-shot inductive
inference. Given the success of Large Language Models (LLMs) in code-generation
tasks, we investigate here the extent to which LLMs can be said to have
“solved” PBE. We experiment on classic domains such as lists and strings, and
an uncommon graphics programming domain not well represented in typical
pretraining data. We find that pretrained models are not effective at PBE, but
that they can be fine-tuned for much higher performance, provided the test
problems are in-distribution. We analyze empirically what causes these models
to succeed and fail, and take steps toward understanding how to achieve better
out-of-distribution generalization. Collectively these results suggest that
LLMs make strong progress toward solving the typical suite of PBE tasks,
potentially increasing the flexibility and applicability of PBE systems, while
also identifying ways in which LLMs still fall short.
[LINK]
http://arxiv.org/abs/2406.08316v3
[DATE]
2024-11-20 01:49:27+08:00
[CATEGORIES]
cs.CL
cs.LG
Enhanced Sign Language Translation between American Sign Language (ASL) and Indian Sign Language (ISL) Using LLMs
[AUTHORS]
Malay Kumar, S. Sarvajit Visagan, Tanish Sarang Mahajan, Anisha Natarajan
[ABSTRACT]
We present research that aims to bridge users of American Sign Language (ASL),
users of spoken language, and users of Indian Sign Language (ISL). We develop a
novel framework for Learner Systems that leverages state-of-the-art large
models to provide two key features: efficient real-time translation between
these two sign languages, and LLM capabilities for seamless translation to ISL.
This paper presents the full study and its implementation. The core of the
system is a sophisticated pipeline that begins with reclassification and
recognition of ASL gestures based on a strong Random Forest Classifier. Once
recognized, the ASL is translated into text, which can be more easily
processed. Natural Language Processing (NLP) techniques support our LLM
integration, in which LLMs convert the ASL text to ISL, conveying the intent of
the sentence or phrase. The final step is to synthesize the translated text
back into ISL gestures, creating an end-to-end translation experience using
RIFE-Net. The framework addresses key challenges such as automatically handling
gesture variability and overcoming the linguistic differences between ASL and
ISL. By automating the translation process, we hope to vastly improve
accessibility for sign language users, so that the communication gap between
ASL and ISL no longer creates barriers, and to bring these communities closer
together. We believe the same principles can be applied across a wide variety
of sign language dialects.
[LINK]
http://arxiv.org/abs/2411.12685v1
[DATE]
2024-11-20 01:45:12+08:00
[CATEGORIES]
cs.CL
Combining Induction and Transduction for Abstract Reasoning
[AUTHORS]
Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer M. Dunn, Hao Tang, Michelangelo Naim, Dat Nguyen, Wei-Long Zheng, Zenna Tavares, Yewen Pu, Kevin Ellis
[ABSTRACT]
When learning an input-output mapping from very few examples, is it better to
first infer a latent function that explains the examples, or is it better to
directly predict new test outputs, e.g. using a neural network? We study this
question on ARC, a highly diverse dataset of abstract reasoning tasks. We train
neural models for induction (inferring latent functions) and transduction
(directly predicting the test output for a given test input). Our models are
trained on synthetic data generated by prompting LLMs to produce Python code
specifying a function to be inferred, plus a stochastic subroutine for
generating inputs to that function. We find inductive and transductive models
solve very different problems, despite training on the same problems, and
despite sharing the same neural architecture.
[LINK]
http://arxiv.org/abs/2411.02272v3
[DATE]
2024-11-20 01:29:58+08:00
[CATEGORIES]
cs.LG
cs.CL
Neurosymbolic Graph Enrichment for Grounded World Models
[AUTHORS]
Stefano De Giorgis, Aldo Gangemi, Alessandro Russo
[ABSTRACT]
The development of artificial intelligence systems capable of understanding
and reasoning about complex real-world scenarios is a significant challenge. In
this work we present a novel approach to enhance and exploit LLM reactive
capability to address complex problems and interpret deeply contextual
real-world meaning. We introduce a method and a tool for creating a multimodal,
knowledge-augmented formal representation of meaning that combines the
strengths of large language models with structured semantic representations.
Our method begins with an image input, utilizing state-of-the-art large
language models to generate a natural language description. This description is
then transformed into an Abstract Meaning Representation (AMR) graph, which is
formalized and enriched with logical design patterns, and layered semantics
derived from linguistic and factual knowledge bases. The resulting graph is
then fed back into the LLM to be extended with implicit knowledge activated by
complex heuristic learning, including semantic implicatures, moral values,
embodied cognition, and metaphorical representations. By bridging the gap
between unstructured language models and formal semantic structures, our method
opens new avenues for tackling intricate problems in natural language
understanding and reasoning.
[LINK]
http://arxiv.org/abs/2411.12671v1
[DATE]
2024-11-20 01:23:55+08:00
[CATEGORIES]
cs.CL
Optimizing Airline Reservation Systems with Edge-Enabled Microservices: A Framework for Real-Time Data Processing and Enhanced User Responsiveness
[AUTHORS]
Biman Barua, M. Shamim Kaiser
[ABSTRACT]
The growing complexity of airline reservation operations calls for novel
approaches to developing quick, efficient, and adaptive reservation systems.
This paper outlines in detail a conceptual framework for implementing edge
computing microservices to address the shortcomings of traditional centralized
architectures. Specifically, edge computing allows activities such as seat
inventory checks, booking processes, and even confirmation to be performed
nearer to the user, thus reducing the overall response time and improving the
performance of the system. In addition, the framework aims to achieve high
system performance, including low latency, high throughput, and an improved
user experience. The major design components include distributed computing
microservices orchestrated by Kubernetes and a real-time message processing
system built on Kafka with elastic scaling. Other operational components
include Prometheus and Grafana, which are used to monitor and manage resources,
ensuring that all operational processes are optimized. Although this research
focuses on the design and theoretical formulation of the framework, its use is
expected to help transform service provision in the airline industry by
improving customer satisfaction, providing infrastructure that is cheap to
install, and efficiently supporting technology changes such as artificial
intelligence and Internet of Things embedded systems. This research addresses
the increasing demand for modern, well-distributed, real-time-centric systems
and also provides a basis for future case implementation and testing. As such,
the proposed architecture offers a market-ready, extensible solution to the
problems posed by existing airline reservation systems.
[COMMENTS]
22 pages, 11 figures
[LINK]
http://arxiv.org/abs/2411.12650v1
[DATE]
2024-11-20 00:58:15+08:00
[CATEGORIES]
cs.CL
HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection
[AUTHORS]
Theo King, Zekun Wu, Adriano Koshiyama, Emre Kazim, Philip Treleaven
[COMMENTS]
Accepted in NeurIPS 2024 SoLaR Workshop and Safety Gen AI Workshop
[LINK]
http://arxiv.org/abs/2409.11579v2
[DATE]
2024-11-20 00:39:57+08:00
[CATEGORIES]
cs.CL
Leveraging Virtual Reality and AI Tutoring for Language Learning: A Case Study of a Virtual Campus Environment with OpenAI GPT Integration with Unity 3D
[AUTHORS]
Adithya TG, Abhinavaram N, Gowri Srinivasa
[ABSTRACT]
This paper presents a new approach to learning multiple languages, with Hindi
as the language to be learnt in our case, that integrates virtual reality
environments with AI-enabled tutoring systems through OpenAI’s GPT API calls.
We have developed a scenario with a virtual campus environment built in Unity
that focuses on a detailed representation of the 11th floor of our university’s
building, where most of the cultural and technological activities take place.
Within this virtual environment, an AI tutor powered by OpenAI’s GPT model,
accessed through an API, moves around with the user. The tutor provides
language learning support in Hindi, as GPT is able to handle language
translation. Our approach mainly involves utilising speech-to-text,
text-to-text conversion, and text-to-speech capabilities to facilitate
real-time interaction between users and the AI tutor when an internet
connection is available. This research demonstrates the use of VR technology
combined with AI tutoring for immersive, interactive language learning
experiences.
[COMMENTS]
5 pages, 2 tables, 8 figures
[LINK]
http://arxiv.org/abs/2411.12619v1
[DATE]
2024-11-20 00:26:19+08:00
[CATEGORIES]
cs.CL
A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback
[AUTHORS]
Alireza Rashidi Laleh, Majid Nili Ahmadabadi
[ABSTRACT]
Reinforcement learning (RL) is one of the active fields in machine learning,
demonstrating remarkable potential in tackling real-world challenges. Despite
its promising prospects, this methodology has encountered issues and
challenges that hinder it from achieving the best performance. In particular,
these approaches perform poorly when navigating environments and
solving tasks with large observation spaces, often resulting in
sample-inefficiency and prolonged learning times. This issue, commonly referred
to as the curse of dimensionality, complicates decision-making for RL agents,
necessitating a careful balance between attention and decision-making. RL
agents, when augmented with human or large language models’ (LLMs) feedback,
may exhibit resilience and adaptability, leading to enhanced performance and
accelerated learning. Such feedback, conveyed through various modalities or
granularities including natural language, serves as a guide for RL agents,
aiding them in discerning relevant environmental cues and optimizing
decision-making processes. In this survey paper, our focus is twofold:
first, we examine human or LLM assistance, investigating the
ways in which these entities may collaborate with the RL agent in order to
foster optimal behavior and expedite learning; second, we delve into the
research papers dedicated to addressing the intricacies of environments
characterized by large observation spaces.
[LINK]
http://arxiv.org/abs/2411.13410v1
[DATE]
2024-11-20 23:52:03+08:00
[CATEGORIES]
cs.LG
ODTE – An ensemble of multi-class SVM-based oblique decision trees
[AUTHORS]
Ricardo Montañana, José A. Gámez, José M. Puerta
[ABSTRACT]
We propose ODTE, a new ensemble that uses oblique decision trees as base
classifiers. Additionally, we introduce STree, the base algorithm for growing
oblique decision trees, which leverages support vector machines to define
hyperplanes within the decision nodes. We embed a multiclass strategy –
one-vs-one or one-vs-rest – at the decision nodes, allowing the model to
directly handle non-binary classification tasks without the need to cluster
instances into two groups, as is common in other approaches from the
literature. In each decision node, only the best-performing SVM model – the
one that minimizes an impurity measure for the n-ary classification – is
retained, even if the learned SVM addresses a binary classification subtask. An
extensive experimental study involving 49 datasets and various state-of-the-art
algorithms for oblique decision tree ensembles has been conducted. Our results
show that ODTE ranks consistently above its competitors, achieving significant
performance gains when hyperparameters are carefully tuned. Moreover, the
oblique decision trees learned through STree are more compact than those
produced by other algorithms evaluated in our experiments.
[COMMENTS]
29 pages
[LINK]
http://arxiv.org/abs/2411.13376v1
[DATE]
2024-11-20 22:58:32+08:00
[CATEGORIES]
cs.LG
Explainable Finite-Memory Policies for Partially Observable Markov Decision Processes
[AUTHORS]
Muqsit Azeem, Debraj Chakraborty, Sudeep Kanav, Jan Kretinsky
[ABSTRACT]
Partially Observable Markov Decision Processes (POMDPs) are a fundamental
framework for decision-making under uncertainty and partial observability.
Since in general optimal policies may require infinite memory, they are hard to
implement and often render most problems undecidable. Consequently,
finite-memory policies are mostly considered instead. However, the algorithms
for computing them are typically very complex, and so are the resulting
policies. Facing the need for their explainability, we provide a representation
of such policies, both (i) in an interpretable formalism and (ii) typically of
smaller size, together yielding higher explainability. To that end, we combine
models of Mealy machines and decision trees; the latter describing simple,
stationary parts of the policies and the former describing how to switch among
them. We design a translation for policies of the finite-state-controller (FSC)
form from standard literature and show how our method smoothly generalizes to
other variants of finite-memory policies. Further, we identify specific
properties of recently used “attractor-based” policies, which allow us to
construct yet simpler and smaller representations. Finally, we illustrate the
higher explainability in a few case studies.
[COMMENTS]
Preprint – Under Review
[LINK]
http://arxiv.org/abs/2411.13365v1
[DATE]
2024-11-20 22:42:23+08:00
[CATEGORIES]
cs.LG
Random Representations Outperform Online Continually Learned Representations
[AUTHORS]
Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip H. S. Torr, Ozan Sener, Puneet K. Dokania
[ABSTRACT]
Continual learning has primarily focused on the issue of catastrophic
forgetting and the associated stability-plasticity tradeoffs. However, little
attention has been paid to the efficacy of continually learned representations,
as representations are learned alongside classifiers throughout the learning
process. Our primary contribution is empirically demonstrating that existing
online continually trained deep networks produce inferior representations
compared to simple pre-defined random transforms. Our approach projects raw
pixels using a fixed random transform, approximating an RBF-Kernel initialized
before any data is seen. We then train a simple linear classifier on top
without storing any exemplars, processing one sample at a time in an online
continual learning setting. This method, called RanDumb, significantly
outperforms state-of-the-art continually learned representations across all
standard online continual learning benchmarks. Our study reveals the
significant limitations of representation learning, particularly in
low-exemplar and online continual learning scenarios. Extending our
investigation to popular exemplar-free scenarios with pretrained models, we
find that training only a linear classifier on top of pretrained
representations surpasses most continual fine-tuning and prompt-tuning
strategies. Overall, our investigation challenges the prevailing assumptions
about effective representation learning in online continual learning. Our code
is available at https://github.com/drimpossible/RanDumb.
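
The pipeline described above (a fixed random transform approximating an RBF kernel, followed by a simple classifier updated one sample at a time with no stored exemplars) can be sketched as follows. This is a minimal illustration, not the released RanDumb code: it uses random Fourier features as the kernel approximation and an online nearest-class-mean rule as a stand-in for the simple linear classifier. The class names, the gamma value, the feature dimension, and the toy 2-D data are all assumptions.

```python
import math
import random


class RandomFourierFeatures:
    """Fixed random map z(x) with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""

    def __init__(self, in_dim, out_dim, gamma, seed=0):
        rng = random.Random(seed)
        std = math.sqrt(2.0 * gamma)  # spectral std for this RBF parameterization
        self.W = [[rng.gauss(0.0, std) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(out_dim)]
        self.scale = math.sqrt(2.0 / out_dim)

    def __call__(self, x):
        return [self.scale * math.cos(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]


class OnlineNearestClassMean:
    """Keeps one running mean per class; no samples are stored."""

    def __init__(self):
        self.means = {}
        self.counts = {}

    def update(self, z, label):
        n = self.counts.get(label, 0) + 1
        mean = self.means.get(label, [0.0] * len(z))
        self.means[label] = [m + (zi - m) / n for m, zi in zip(mean, z)]
        self.counts[label] = n

    def predict(self, z):
        def sq_dist(mean):
            return sum((zi - m) ** 2 for zi, m in zip(z, mean))
        return min(self.means, key=lambda c: sq_dist(self.means[c]))


# The random transform is created before any data is seen and never trained.
features = RandomFourierFeatures(in_dim=2, out_dim=200, gamma=0.5)
clf = OnlineNearestClassMean()

# Toy stream: class 0 clusters near (0, 0), class 1 near (3, 3).
rng = random.Random(1)
for _ in range(100):
    label = rng.randrange(2)
    centre = 3.0 * label
    x = [centre + rng.gauss(0, 0.3), centre + rng.gauss(0, 0.3)]
    clf.update(features(x), label)  # one sample at a time, then discarded
```

Only the per-class means and counts change during the stream, so memory use is independent of the number of samples seen.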
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2402.08823v3
[DATE]
2024-11-20 22:33:10+08:00
[CATEGORIES]
cs.LG
Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions
[AUTHORS]
Mai Elkady, Thu Bui, Bruno Ribeiro, David I. Inouye
[ABSTRACT]
There has been a growing excitement that implicit graph generative models
could be used to design or discover new molecules for medicine or material
design. Because these molecules have not been discovered, they naturally lie in
unexplored or scarcely supported regions of the distribution of known
molecules. However, prior evaluation methods for implicit graph generative
models have focused on validating statistics computed from the thick support
(e.g., mean and variance of a graph property). Therefore, there is a mismatch
between the goal of generating novel graphs and the evaluation methods. To
address this evaluation gap, we design a novel evaluation method called
Vertical Validation (VV) that systematically creates thin support regions
during the train-test splitting procedure and then reweights generated samples
so that they can be compared to the held-out test data. This procedure can be
seen as a generalization of the standard train-test procedure except that the
splits are dependent on sample features. We demonstrate that our method can be
used to perform model selection if performance on thin support regions is the
desired goal. As a side benefit, we also show that our approach can better
detect overfitting as exemplified by memorization.
[COMMENTS]
Accepted to UAI 2024
[LINK]
http://arxiv.org/abs/2411.13358v1
[DATE]
2024-11-20 22:29:59+08:00
[CATEGORIES]
cs.LG
Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications
[AUTHORS]
Mehdi Letafati, Samad Ali, Matti Latva-aho
[ABSTRACT]
In this paper, conditional denoising diffusion probabilistic models (DDPMs)
are proposed to enhance the data transmission and reconstruction over wireless
channels. The underlying mechanism of DDPM is to decompose the data generation
process over the so-called “denoising” steps. Inspired by this, the key idea is
to leverage the generative prior of diffusion models in learning a
“noisy-to-clean” transformation of the information signal to help enhance data
reconstruction. The proposed scheme could be beneficial for communication
scenarios in which prior knowledge of the information content is available,
e.g., in multimedia transmission. Hence, instead of employing complicated
channel codes that reduce the information rate, one can exploit diffusion
priors for reliable data reconstruction, especially under extreme channel
conditions due to low signal-to-noise ratio (SNR), or hardware-impaired
communications. The proposed DDPM-assisted receiver is tailored for the
scenario of wireless image transmission using the MNIST dataset. Our numerical
results highlight the reconstruction performance of our scheme compared to
conventional digital communication, as well as the deep neural network
(DNN)-based benchmark. It is also shown that more than 10 dB improvement in the
reconstruction could be achieved in low SNR regimes, without the need to reduce
the information rate for error correction.
[COMMENTS]
arXiv admin note: substantial text overlap with arXiv:2309.08568
[LINK]
http://arxiv.org/abs/2310.19460v3
[DATE]
2024-11-20 22:24:25+08:00
[CATEGORIES]
cs.LG
Verifying Machine Unlearning with Explainable AI
[AUTHORS]
Àlex Pujol Vidal, Anders S. Johansen, Mohammad N. S. Jahromi, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund
[ABSTRACT]
We investigate the effectiveness of Explainable AI (XAI) in verifying Machine
Unlearning (MU) within the context of harbor front monitoring, focusing on data
privacy and regulatory compliance. With the increasing need to adhere to
privacy legislation such as the General Data Protection Regulation (GDPR),
traditional methods of retraining ML models for data deletions prove
impractical due to their complexity and resource demands. MU offers a solution
by enabling models to selectively forget specific learned patterns without full
retraining. We explore various removal techniques, including data relabeling
and model perturbation. Then, we leverage attribution-based XAI to discuss the
effects of unlearning on model performance. Our proof-of-concept introduces
feature importance as an innovative verification step for MU, expanding beyond
traditional metrics and demonstrating the techniques’ ability to reduce reliance on
undesired patterns. Additionally, we propose two novel XAI-based metrics,
Heatmap Coverage (HC) and Attention Shift (AS), to evaluate the effectiveness
of these methods. This approach not only highlights how XAI can complement MU
by providing effective verification, but also sets the stage for future
research to enhance their joint integration.
[COMMENTS]
ICPRW2024
[LINK]
http://arxiv.org/abs/2411.13332v1
[DATE]
2024-11-20 21:57:32+08:00
[CATEGORIES]
cs.LG
Revisiting Discrete Soft Actor-Critic
[AUTHORS]
Haibin Zhou, Tong Wei, Zichuan Lin, junyou li, Junliang Xing, Yuanchun Shi, Li Shen, Chao Yu, Deheng Ye
[ABSTRACT]
We study the adaptation of Soft Actor-Critic (SAC), which is considered a
state-of-the-art reinforcement learning (RL) algorithm, from continuous action
space to discrete action space. We revisit vanilla discrete SAC and provide an
in-depth understanding of its Q value underestimation and performance
instability issues when applied to discrete settings. We thereby propose Stable
Discrete SAC (SDSAC), an algorithm that leverages entropy-penalty and double
average Q-learning with Q-clip to address these issues. Extensive experiments
on typical benchmarks with discrete action space, including Atari games and a
large-scale MOBA game, show the efficacy of our proposed method. Our code is
at: https://github.com/coldsummerday/SD-SAC.git.
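
For context, the vanilla discrete-action variant that the abstract revisits can be written with exact expectations over actions, since the policy is a categorical distribution. The sketch below shows that baseline only; it is not the proposed SDSAC, and the entropy-penalty, double average Q-learning, and Q-clip components are not reproduced because the abstract does not specify them. Function names and the example numbers are assumptions.

```python
import math


def softmax(logits):
    """Turn action logits into a categorical policy."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(probs):
    """Shannon entropy (in nats) of a categorical policy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_state_value(probs, q_values, alpha):
    """V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    return sum(p * (q - alpha * math.log(p))
               for p, q in zip(probs, q_values) if p > 0.0)


def soft_q_target(reward, done, next_probs, next_q1, next_q2, alpha,
                  discount=0.99):
    """Vanilla target: element-wise minimum of two critics, then soft value."""
    next_q = [min(a, b) for a, b in zip(next_q1, next_q2)]
    next_v = soft_state_value(next_probs, next_q, alpha)
    return reward + (0.0 if done else discount * next_v)


# Hypothetical two-action example.
probs = softmax([0.0, 0.0])                       # uniform policy
value = soft_state_value(probs, [1.0, 3.0], 1.0)  # expected Q plus alpha * entropy
target = soft_q_target(1.0, False, probs, [1.0, 3.0], [2.0, 2.0],
                       alpha=1.0, discount=0.9)
```

Taking the element-wise minimum of two critics in the target is the standard clipped double-Q choice carried over from continuous SAC; it biases targets downward, which is one plausible reading of the Q value underestimation the abstract reports.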
[COMMENTS]
Accepted by Transactions on Machine Learning Research (TMLR)
[LINK]
http://arxiv.org/abs/2209.10081v4
[DATE]
2024-11-20 21:52:42+08:00
[CATEGORIES]
cs.LG
Are Large Language Models Memorizing Bug Benchmarks?
[AUTHORS]
Daniel Ramos, Claudia Mamede, Kush Jain, Paulo Canelas, Catarina Gamboa, Claire Le Goues
[ABSTRACT]
Large Language Models (LLMs) have become integral to various software
engineering tasks, including code generation, bug detection, and repair. To
evaluate model performance in these domains, numerous bug benchmarks containing
real-world bugs from software projects have been developed. However, a growing
concern within the software engineering community is that these benchmarks may
not reliably reflect true LLM performance due to the risk of data leakage.
Despite this concern, limited research has been conducted to quantify the
impact of potential leakage.
In this paper, we systematically evaluate popular LLMs to assess their
susceptibility to data leakage from widely used bug benchmarks. To identify
potential leakage, we use multiple metrics, including a study of benchmark
membership within commonly used training datasets, as well as analyses of
negative log-likelihood and n-gram accuracy. Our findings show that certain
models, in particular codegen-multi, exhibit significant evidence of
memorization in widely used benchmarks like Defects4J, while newer models
trained on larger datasets like LLaMa 3.1 exhibit limited signs of leakage.
These results highlight the need for careful benchmark selection and the
adoption of robust metrics to adequately assess model capabilities.
[COMMENTS]
pre-print
[LINK]
http://arxiv.org/abs/2411.13323v1
[DATE]
2024-11-20 21:46:04+08:00
[CATEGORIES]
cs.LG
Scaling Laws for Online Advertisement Retrieval
[AUTHORS]
Yunli Wang, Zixuan Yang, Zhen Zhang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Peng Jiang, Kun Gai
[ABSTRACT]
The scaling law is a notable property of neural network models and has
significantly propelled the development of large language models. Scaling laws
hold great promise in guiding model design and resource allocation. Recent
research increasingly shows that scaling laws are not limited to NLP tasks or
Transformer architectures; they also apply to domains such as recommendation.
However, there is still a lack of literature on scaling law research in online
advertisement retrieval systems. This may be because 1) identifying the scaling
law for resource cost and online revenue is often expensive in both time and
training resources for large-scale industrial applications, and 2) varying
settings for different systems prevent the scaling law from being applied
across various scenarios. To address these issues, we propose a lightweight
paradigm to identify the scaling law of online revenue and machine cost for a
certain online advertisement retrieval scenario with a low experimental cost.
Specifically, we focus on a sole factor (FLOPs) and propose an offline metric
named R/R* that exhibits a high linear correlation with online revenue for
retrieval models. We estimate the machine cost offline via a simulation
algorithm. Thus, we can transform most online experiments into low-cost offline
experiments. We conduct comprehensive experiments to verify the effectiveness
of our proposed metric R/R* and to identify the scaling law in the online
advertisement retrieval system of Kuaishou. With the scaling law, we
demonstrate practical applications for ROI-constrained model design and
multi-scenario resource allocation in the Kuaishou advertising system. To the
best of our knowledge, this is the first work to study scaling laws for online
advertisement retrieval in real-world systems, showing the great potential of
scaling laws in advertising system optimization.
[COMMENTS]
10 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.13322v1
[DATE]
2024-11-20 21:44:59+08:00
[CATEGORIES]
cs.LG
Locally Adaptive One-Class Classifier Fusion with Dynamic $\ell_p$-Norm Constraints for Robust Anomaly Detection
[AUTHORS]
Sepehr Nourmohammadi, Arda Sarp Yenicesu, Shervin Rahimzadeh Arashloo, Ozgur S. Oguz
[ABSTRACT]
This paper presents a novel approach to one-class classifier fusion through
locally adaptive learning with dynamic $\ell_p$-norm constraints. We introduce a
framework that dynamically adjusts fusion weights based on local data
characteristics, addressing fundamental challenges in ensemble-based anomaly
detection. Our method incorporates an interior-point optimization technique
that significantly improves computational efficiency compared to traditional
Frank-Wolfe approaches, achieving up to 19-fold speed improvements in complex
scenarios. The framework is extensively evaluated on standard UCI benchmark
datasets and specialized temporal sequence datasets, demonstrating superior
performance across diverse anomaly types. Statistical validation through
Skillings-Mack tests confirms our method’s significant advantages over existing
approaches, with consistent top rankings in both pure and non-pure learning
scenarios. The framework’s ability to adapt to local data patterns while
maintaining computational efficiency makes it particularly valuable for
real-time applications where rapid and accurate anomaly detection is crucial.
[LINK]
http://arxiv.org/abs/2411.06406v2
[DATE]
2024-11-20 21:39:23+08:00
[CATEGORIES]
cs.LG
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime
[AUTHORS]
Haoyu Geng, Hang Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan
[ABSTRACT]
Predictive combinatorial optimization, where the parameters of combinatorial
optimization (CO) are unknown at the decision-making time, is the precise
modeling of many real-world applications, including energy cost-aware
scheduling and budget allocation on advertising. Tackling such a problem
usually involves a prediction model and a CO solver. These two modules are
integrated into the predictive CO pipeline following one of two design
principles: “Predict-then-Optimize (PtO)”, which learns predictions by
supervised training and subsequently solves CO using the predicted
coefficients, and “Predict-and-Optimize (PnO)”, which directly optimizes
towards the ultimate decision quality and claims to yield better decisions than
traditional PtO approaches. However, a systematic benchmark of both approaches
is lacking, including the specific design choices at the module level, as well
as an evaluation dataset that covers representative real-world scenarios. To
this end, we develop a
modular framework to benchmark 11 existing PtO/PnO methods on 8 problems,
including a new industrial dataset for combinatorial advertising that will be
released. Our study shows that PnO approaches are better than PtO on 7 out of 8
benchmarks, but there is no silver bullet found for the specific design choices
of PnO. A comprehensive categorization of current approaches and integration of
typical scenarios are provided under a unified benchmark. Therefore, this paper
could serve as a comprehensive benchmark for future PnO approach development
and also offer fast prototyping for application-focused development. The code
is available at https://github.com/Thinklab-SJTU/PredictiveCO-Benchmark.
[COMMENTS]
NeurIPS 2024 Datasets and Benchmarks Track
[LINK]
http://arxiv.org/abs/2311.07633v5
[DATE]
2024-11-20 21:20:45+08:00
[CATEGORIES]
cs.LG
Lifted Model Construction without Normalisation: A Vectorised Approach to Exploit Symmetries in Factor Graphs
[AUTHORS]
Malte Luttermann, Ralf Möller, Marcel Gehrke
[ABSTRACT]
Lifted probabilistic inference exploits symmetries in a probabilistic model
to allow for tractable probabilistic inference with respect to domain sizes of
logical variables. We found that the current state-of-the-art algorithm to
construct a lifted representation in the form of a parametric factor graph misses
symmetries between factors that are exchangeable but scaled differently,
thereby leading to a less compact representation. In this paper, we propose a
generalisation of the advanced colour passing (ACP) algorithm, which is the
state of the art to construct a parametric factor graph. Our proposed algorithm
allows for potentials of factors to be scaled arbitrarily and efficiently
detects more symmetries than the original ACP algorithm. By detecting strictly
more symmetries than ACP, our algorithm significantly reduces online query
times for probabilistic inference when the resulting model is applied, which we
also confirm in our experiments.
[COMMENTS]
Accepted to the Proceedings of the 3rd Learning on Graphs Conference
(LoG 2024)
[LINK]
http://arxiv.org/abs/2411.11730v2
[DATE]
2024-11-20 21:01:18+08:00
[CATEGORIES]
cs.LG
DATTA: Domain-Adversarial Test-Time Adaptation for Cross-Domain WiFi-Based Human Activity Recognition
[AUTHORS]
Julian Strohmayer, Rafael Sterzinger, Matthias Wödlinger, Martin Kampel
[ABSTRACT]
Cross-domain generalization is an open problem in WiFi-based sensing due to
variations in environments, devices, and subjects, causing domain shifts in
channel state information. To address this, we propose Domain-Adversarial
Test-Time Adaptation (DATTA), a novel framework combining domain-adversarial
training (DAT), test-time adaptation (TTA), and weight resetting to facilitate
adaptation to unseen target domains and to prevent catastrophic forgetting.
DATTA is integrated into a lightweight, flexible architecture optimized for
speed. We conduct a comprehensive evaluation of DATTA, including an ablation
study on all key components using publicly available data, and verify its
suitability for real-time applications such as human activity recognition. When
combining a SotA video-based variant of TTA with WiFi-based DAT and comparing
it to DATTA, our method achieves an 8.1% higher F1-Score. The PyTorch
implementation of DATTA is publicly available at:
https://github.com/StrohmayerJ/DATTA.
[LINK]
http://arxiv.org/abs/2411.13284v1
[DATE]
2024-11-20 20:52:36+08:00
[CATEGORIES]
cs.LG
3D-Aware Instance Segmentation and Tracking in Egocentric Videos
[AUTHORS]
Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman
[ABSTRACT]
Egocentric videos present unique challenges for 3D scene understanding due to
rapid camera motion, frequent object occlusions, and limited object visibility.
This paper introduces a novel approach to instance segmentation and tracking in
first-person video that leverages 3D awareness to overcome these obstacles. Our
method integrates scene geometry, 3D object centroid tracking, and instance
segmentation to create a robust framework for analyzing dynamic egocentric
scenes. By incorporating spatial and temporal cues, we achieve superior
performance compared to state-of-the-art 2D approaches. Extensive evaluations
on the challenging EPIC Fields dataset demonstrate significant improvements
across a range of tracking and segmentation consistency metrics. Specifically,
our method outperforms the next best performing approach by $7$ points in
Association Accuracy (AssA) and $4.5$ points in IDF1 score, while reducing the
number of ID switches by $73\%$ to $80\%$ across various object categories.
Leveraging our tracked instance segmentations, we showcase downstream
applications in 3D object reconstruction and amodal video object segmentation
in these egocentric settings.
[COMMENTS]
Camera-ready for ACCV 2024. More experiments added
[LINK]
http://arxiv.org/abs/2408.09860v2
[DATE]
2024-11-20 20:51:25+08:00
[CATEGORIES]
cs.LG
Transformers with Sparse Attention for Granger Causality
[AUTHORS]
Riya Mahesh, Rahul Vashisht, Chandrashekar Lakshminarayanan
[ABSTRACT]
Temporal causal analysis means understanding the underlying causes behind
observed variables over time. Deep learning based methods such as transformers
are increasingly used to capture temporal dynamics and causal relationships
beyond mere correlations. Recent works suggest self-attention weights of
transformers as a useful indicator of causal links. We leverage this to propose
a novel modification to the self-attention module to establish causal links
between the variables of multivariate time-series data with varying lag
dependencies. Our Sparse Attention Transformer captures causal relationships
using a two-fold approach - performing temporal attention first followed by
attention between the variables across the time steps masking them individually
to compute Granger Causality indices. The key novelty in our approach is the
ability of the model to assert importance and pick the most significant past
time instances for its prediction task, rather than being manually fed a fixed
time lag value. We demonstrate the effectiveness of our approach via extensive
experimentation on several synthetic benchmark datasets. Furthermore, we
compare the performance of our model with the traditional Vector Autoregression
based Granger Causality method that assumes fixed lag length.
[LINK]
http://arxiv.org/abs/2411.13264v1
[DATE]
2024-11-20 20:34:06+08:00
[CATEGORIES]
cs.LG
PDE-CNNs: Axiomatic Derivations and Applications
[AUTHORS]
Gijs Bellaard, Sei Sakata, Bart M. N. Smets, Remco Duits
[ABSTRACT]
PDE-based Group Convolutional Neural Networks (PDE-G-CNNs) use solvers of
evolution PDEs as substitutes for the conventional components in G-CNNs.
PDE-G-CNNs can offer several benefits simultaneously: fewer parameters,
inherent equivariance, better accuracy, and data efficiency.
In this article we focus on Euclidean equivariant PDE-G-CNNs where the
feature maps are two-dimensional throughout. We call this variant of the
framework a PDE-CNN.
From a machine learning perspective, we list several practically desirable
axioms and derive from these which PDEs should be used in a PDE-CNN, this being
our main contribution. Our approach to geometric learning via PDEs is inspired
by the axioms of scale-space theory, which we generalize by introducing
semifield-valued signals.
Our theory reveals new PDEs that can be used in PDE-CNNs and we
experimentally examine what impact these have on the accuracy of PDE-CNNs. We
also confirm for small networks that PDE-CNNs offer fewer parameters, increased
accuracy, and better data efficiency when compared to CNNs.
[LINK]
http://arxiv.org/abs/2403.15182v3
[DATE]
2024-11-20 20:22:53+08:00
[CATEGORIES]
cs.LG
On lower bounds of the density of planar periodic sets without unit distances
[AUTHORS]
Alexander Tolmachev
[ABSTRACT]
Determining the maximal density $m_1(\mathbb{R}^2)$ of planar sets without
unit distances is a fundamental problem in combinatorial geometry. This paper
investigates lower bounds for this quantity. We introduce a novel approach to
estimating $m_1(\mathbb{R}^2)$ by reformulating the problem as a Maximal
Independent Set (MIS) problem on graphs constructed from a flat torus, focusing
on periodic sets with respect to two non-collinear vectors. Our experimental
results, supported by theoretical justification of the proposed method,
demonstrate that for a sufficiently wide range of parameters this approach does
not improve the known lower bound $0.22936 \le m_1(\mathbb{R}^2)$. The best
discrete sets found are approximations of Croft’s construction. In addition,
several open-source software packages for the MIS problem are compared on this
task.
[COMMENTS]
21 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.13248v1
[DATE]
2024-11-20 20:07:19+08:00
[CATEGORIES]
cs.LG
ViSTa Dataset: Do vision-language models understand sequential tasks?
[AUTHORS]
Evžen Wybitul, Evan Ryan Gunter, Mikhail Seleznyov
[ABSTRACT]
Using vision-language models (VLMs) as reward models in reinforcement
learning holds promise for reducing costs and improving safety. So far, VLM
reward models have only been used for goal-oriented tasks, where the agent must
reach a particular final outcome. We explore VLMs’ potential to supervise tasks
that cannot be scored by the final state alone. To this end, we introduce
ViSTa, a dataset for evaluating Vision-based understanding of Sequential Tasks.
ViSTa comprises over 4,000 videos with step-by-step descriptions in virtual
home, Minecraft, and real-world environments. Its novel hierarchical structure
– basic single-step tasks composed into more and more complex sequential tasks
– allows a fine-grained understanding of how well VLMs can judge tasks with
varying complexity. To illustrate this, we use ViSTa to evaluate
state-of-the-art VLMs, including CLIP, ViCLIP, and GPT-4o. We find that, while
they are all good at object recognition, they fail to understand sequential
tasks, with only GPT-4o achieving non-trivial performance.
[LINK]
http://arxiv.org/abs/2411.13211v1
[DATE]
2024-11-20 19:19:22+08:00
[CATEGORIES]
cs.LG
Engagement-Driven Content Generation with Large Language Models
[AUTHORS]
Erica Coppolillo, Marco Minici, Federico Cinus, Francesco Bonchi, Giuseppe Manco
[ABSTRACT]
Large Language Models (LLMs) exhibit significant persuasion capabilities in
one-on-one interactions, but their influence within social networks remains
underexplored. This study investigates the potential social impact of LLMs in
these environments, where interconnected users and complex opinion dynamics
pose unique challenges. In particular, we address the following research
question: can LLMs learn to generate meaningful content that maximizes user
engagement on social networks?
To answer this question, we define a pipeline to guide the LLM-based content
generation which employs reinforcement learning with simulated feedback. In our
framework, the reward is based on an engagement model borrowed from the
literature on opinion dynamics and information propagation. Moreover, we force
the text generated by the LLM to be aligned with a given topic and to satisfy a
minimum fluency requirement.
Using our framework, we analyze the capabilities and limitations of LLMs in
tackling the given task, specifically considering the relative positions of the
LLM as an agent within the social network and the distribution of opinions in
the network on the given topic. Our findings show the full potential of LLMs in
creating social engagement. Notable properties of our approach are that the
learning procedure is adaptive to the opinion distribution of the underlying
network and agnostic to the specifics of the engagement model, which is
embedded as a plug-and-play component. In this regard, our approach can be
easily refined for more complex engagement tasks and interventions in
computational social science.
The code used for the experiments is publicly available at
https://anonymous.4open.science/r/EDCG/.
[LINK]
http://arxiv.org/abs/2411.13187v1
[DATE]
2024-11-20 18:40:08+08:00
[CATEGORIES]
cs.LG
How Much Data is Enough? Optimization of Data Collection for Artifact Detection in EEG Recordings
[AUTHORS]
Lu Wang-Nöth, Philipp Heiler, Hai Huang, Daniel Lichtenstern, Alexandra Reichenbach, Luis Flacke, Linus Maisch, Helmut Mayer
[ABSTRACT]
Objective. Electroencephalography (EEG) is a widely used neuroimaging
technique known for its cost-effectiveness and user-friendliness. However,
various artifacts, particularly biological artifacts like Electromyography
(EMG) signals, lead to a poor signal-to-noise ratio, limiting the precision of
analyses and applications. The currently reported EEG data cleaning performance
largely depends on the data used for validation, and in the case of machine
learning approaches, also on the data used for training. The data are typically
gathered either by recruiting subjects to perform specific artifact tasks or by
integrating existing datasets. Prevailing approaches, however, tend to rely on
intuitive, concept-oriented data collection with minimal justification for the
selection of artifacts and their quantities. Given the substantial costs
associated with biological data collection and the pressing need for effective
data utilization, we propose an optimization procedure for data-oriented data
collection design using deep learning-based artifact detection. Approach. We
apply a binary classification between artifact epochs (time intervals
containing artifacts) and non-artifact epochs (time intervals containing no
artifact) using three different neural architectures. Our aim is to minimize
data collection efforts while preserving the cleaning efficiency. Main results.
We were able to reduce the number of artifact tasks from twelve to three and
decrease repetitions of isometric contraction tasks from ten to three or
sometimes even just one. Significance. Our work addresses the need for
effective data utilization in biological data collection, offering a systematic
and dynamic quantitative approach. By providing clear justifications for the
choices of artifacts and their quantity, we aim to guide future studies toward
more effective and economical data collection in EEG and EMG research.
[COMMENTS]
Several changes of wording. Caption of figure 10 corrected
[LINK]
http://arxiv.org/abs/2411.11886v2
[DATE]
2024-11-20 18:38:55+08:00
[CATEGORIES]
cs.LG
Operator learning without the adjoint
[AUTHORS]
Nicolas Boullé, Diana Halikias, Samuel E. Otto, Alex Townsend
[ABSTRACT]
There is a mystery at the heart of operator learning: how can one recover a
non-self-adjoint operator from data without probing the adjoint? Current
practical approaches suggest that one can accurately recover an operator while
only using data generated by the forward action of the operator without access
to the adjoint. However, naively, it seems essential to sample the action of
the adjoint. In this paper, we partially explain this mystery by proving that
without querying the adjoint, one can approximate a family of non-self-adjoint
infinite-dimensional compact operators via projection onto a Fourier basis. We
then apply the result to recovering Green’s functions of elliptic partial
differential operators and derive an adjoint-free sample complexity bound.
While existing theory justifies low sample complexity in operator learning,
ours is the first adjoint-free analysis that attempts to close the gap between
theory and practice.
[COMMENTS]
54 pages, 5 figures, to appear in Journal of Machine Learning
Research
[LINK]
http://arxiv.org/abs/2401.17739v2
[DATE]
2024-11-20 18:38:29+08:00
[CATEGORIES]
cs.LG
Regional Ocean Forecasting with Hierarchical Graph Neural Networks
[AUTHORS]
Daniel Holmberg, Emanuela Clementi, Teemu Roos
[ABSTRACT]
Accurate ocean forecasting systems are vital for understanding marine
dynamics, which play a crucial role in environmental management and climate
adaptation strategies. Traditional numerical solvers, while effective, are
computationally expensive and time-consuming. Recent advancements in machine
learning have revolutionized weather forecasting, offering fast and
energy-efficient alternatives. Building on these advancements, we introduce
SeaCast, a neural network designed for high-resolution, medium-range ocean
forecasting. SeaCast employs a graph-based framework to effectively handle the
complex geometry of ocean grids and integrates external forcing data tailored
to the regional ocean context. Our approach is validated through experiments at
a high spatial resolution using the operational numerical model of the
Mediterranean Sea provided by the Copernicus Marine Service, along with both
numerical and data-driven atmospheric forcings.
[COMMENTS]
28 pages, 35 figures. Accepted to the Tackling Climate Change with
Machine Learning workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.11807v2
[DATE]
2024-11-20 18:33:05+08:00
[CATEGORIES]
cs.LG
A Unified Analysis for Finite Weight Averaging
[AUTHORS]
Peng Wang, Li Shen, Zerui Tao, Yan Sun, Guodong Zheng, Dacheng Tao
[ABSTRACT]
Methods that average the iterates of Stochastic Gradient Descent (SGD), such
as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and
LAtest Weight Averaging (LAWA), have achieved empirical success in training
deep learning models. In particular, with a finite weight averaging method,
LAWA can attain faster convergence and better generalization. However, its
theoretical explanation is still underexplored, since there are fundamental
differences between finite and infinite settings. In this work, we first
generalize SGD and LAWA as Finite Weight Averaging (FWA) and explain its
advantages compared to SGD from the perspective of optimization and
generalization. The first key challenge is the inapplicability of traditional
methods in the sense of expectation or optimal values for infinite-dimensional
settings in analyzing FWA’s convergence. Second, the cumulative gradients
introduced by FWA further complicate the generalization analysis, especially
making it more difficult to discuss them under different assumptions. Extending
the final
iteration convergence analysis to the FWA, this paper, under a convexity
assumption, establishes a convergence bound
$\mathcal{O}(\log\left(\frac{T}{k}\right)/\sqrt{T})$, where $k\in[1, T/2]$ is a
constant representing the last $k$ iterations. Compared to SGD with
$\mathcal{O}(\log(T)/\sqrt{T})$, we prove theoretically that FWA has a faster
convergence rate and explain the effect of the number of average points. In the
generalization analysis, we find a recursive representation for bounding the
cumulative gradient using mathematical induction. We provide bounds for
constant and decay learning rates and the convex and non-convex cases to show
the good generalization performance of FWA. Finally, experimental results on
several benchmarks verify our theoretical results.
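
As a rough illustration of the finite-averaging idea described above, the
following Python sketch runs plain SGD on a one-dimensional objective and
returns both the last iterate and the mean of the last $k$ iterates. It is a
minimal sketch under stated assumptions, not the authors' implementation: the
function name, the Gaussian gradient noise, and the quadratic test objective
are all hypothetical choices made for this example.

```python
import random


def sgd_with_finite_averaging(grad, x0, lr, steps, k, noise=0.0, seed=0):
    """Run 1-D SGD; return (last iterate, mean of the last k iterates).

    `grad` is the exact gradient; Gaussian noise with standard deviation
    `noise` is added to mimic stochastic gradients (an assumption made
    for this illustration only).
    """
    rng = random.Random(seed)
    x = x0
    history = []
    for _ in range(steps):
        g = grad(x) + noise * rng.gauss(0.0, 1.0)
        x = x - lr * g
        history.append(x)
    tail = history[-k:]  # the last k iterates
    return x, sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical objective f(x) = 0.5 * x**2, whose gradient is x.
    last, averaged = sgd_with_finite_averaging(
        grad=lambda x: x, x0=8.0, lr=0.1, steps=200, k=20, noise=1.0
    )
    print("last iterate:", last)
    print("average of last 20 iterates:", averaged)
```

With `k=1` the average reduces to the last iterate, so plain SGD is the
special case of averaging over a window of size one.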
[COMMENTS]
34 pages
[LINK]
http://arxiv.org/abs/2411.13169v1
[DATE]
2024-11-20 18:08:22+08:00
[CATEGORIES]
cs.LG
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding
[AUTHORS]
Nabeel Seedat, Caterina Tozzi, Andrea Hita Ardiaca, Mihaela van der Schaar, James Weatherall, Adam Taylor
[ABSTRACT]
The reuse of historical clinical trial data has significant potential to
accelerate medical research and drug development. However, interoperability
challenges, particularly with missing medical codes, hinder effective data
integration across studies. While Large Language Models (LLMs) offer a
promising solution for automated coding without labeled data, current
approaches face challenges on complex coding tasks. We introduce ALIGN, a novel
compositional LLM-based system for automated, zero-shot medical coding. ALIGN
follows a three-step process: (1) diverse candidate code generation; (2)
self-evaluation of codes and (3) confidence scoring and uncertainty estimation
enabling human deferral to ensure reliability. We evaluate ALIGN on harmonizing
medication terms into Anatomical Therapeutic Chemical (ATC) and medical history
terms into Medical Dictionary for Regulatory Activities (MedDRA) codes
extracted from 22 immunology trials. ALIGN outperformed the LLM baselines,
while also providing capabilities for trustworthy deployment. For MedDRA
coding, ALIGN achieved high accuracy across all levels, matching RAG and
excelling at the most specific levels (87-90% for HLGT). For ATC coding, ALIGN
demonstrated superior performance, particularly at lower hierarchy levels (ATC
Level 4), with 72-73% overall accuracy and 86-89% accuracy for common
medications, outperforming baselines by 7-22%. ALIGN’s uncertainty-based
deferral improved accuracy by 17%, to 90%, at a 30% deferral rate, notably
enhancing performance on uncommon medications. ALIGN achieves this
cost-efficiently at $0.0007 and $0.02 per code for GPT-4o-mini and GPT-4o,
reducing barriers to clinical adoption. ALIGN advances automated medical coding
for clinical trial data, contributing to enhanced data interoperability and
reusability, positioning it as a promising tool to improve clinical research
and accelerate drug development.
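
To make the idea of uncertainty-based deferral concrete, here is a minimal
Python sketch of how accuracy on the retained items can be computed once the
least confident predictions are handed to a human. It is an illustration under
stated assumptions rather than ALIGN's actual procedure: the function name, the
"defer the lowest-confidence fraction" rule, and the toy data are hypothetical.

```python
def accuracy_with_deferral(confidences, correct, defer_fraction):
    """Accuracy on the items kept after deferring the least confident ones.

    confidences    -- one confidence score per prediction (higher = surer)
    correct        -- 1 if the prediction was right, 0 otherwise
    defer_fraction -- share of items sent to a human (count is rounded down)
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n_defer = int(len(order) * defer_fraction)
    kept = order[n_defer:]  # indices of the most confident predictions
    if not kept:
        return None  # everything was deferred
    return sum(correct[i] for i in kept) / len(kept)


if __name__ == "__main__":
    # Toy data, invented for this example.
    conf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    hits = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
    print("no deferral: ", accuracy_with_deferral(conf, hits, 0.0))
    print("30% deferral:", accuracy_with_deferral(conf, hits, 0.3))
```

Accuracy on the kept items rises only when low confidence actually tracks
errors, which is why the quality of the confidence score matters.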
[LINK]
http://arxiv.org/abs/2411.13163v1
[DATE]
2024-11-20 17:59:12+08:00
[CATEGORIES]
cs.LG
Long Term Memory: The Foundation of AI Self-Evolution
[AUTHORS]
Xun Jiang, Feng Li, Han Zhao, Jiaying Wang, Jun Shao, Shihao Xu, Shu Zhang, Weiling Chen, Xavier Tang, Yize Chen, Mengyue Wu, Weizhi Ma, Mengdi Wang, Tianqiao Chen
[ABSTRACT]
Large language models (LLMs) like GPTs, trained on vast datasets, have
demonstrated impressive capabilities in language understanding, reasoning, and
planning, achieving human-level performance in various tasks. Most studies
focus on enhancing these models by training on ever-larger datasets to build
more powerful foundation models. While training stronger models is important,
enabling models to evolve during inference is equally crucial, a process we
refer to as AI self-evolution. Unlike large-scale training, self-evolution may
rely on limited data or interactions. Inspired by the columnar organization of
the human cerebral cortex, we hypothesize that AI models could develop
cognitive abilities and build internal representations through iterative
interactions with their environment. To achieve this, models need long-term
memory (LTM) to store and manage processed interaction data. LTM supports
self-evolution by representing diverse experiences across environments and
agents. In this report, we explore AI self-evolution and its potential to
enhance models during inference. We examine LTM’s role in lifelong learning,
allowing models to evolve based on accumulated interactions. We outline the
structure of LTM and the systems needed for effective data retention and
representation. We also classify approaches for building personalized models
with LTM data and show how these models achieve self-evolution through
interaction. Using LTM, our multi-agent framework OMNE achieved first place on
the GAIA benchmark, demonstrating LTM’s potential for AI self-evolution.
Finally, we present a roadmap for future research, emphasizing the importance
of LTM for advancing AI technology and its practical applications.
[COMMENTS]
56 pages, 13 figures
[LINK]
http://arxiv.org/abs/2410.15665v3
[DATE]
2024-11-20 17:08:14+08:00
[CATEGORIES]
cs.LG
Derivatives of Stochastic Gradient Descent in parametric optimization
[AUTHORS]
Franck Iutzeler, Edouard Pauwels, Samuel Vaiter
[ABSTRACT]
We consider stochastic optimization problems where the objective depends on
some parameter, as commonly found in hyperparameter optimization for instance.
We investigate the behavior of the derivatives of the iterates of Stochastic
Gradient Descent (SGD) with respect to that parameter and show that they are
driven by an inexact SGD recursion on a different objective function, perturbed
by the convergence of the original SGD. This enables us to establish that the
derivatives of SGD converge to the derivative of the solution mapping in terms
of mean squared error whenever the objective is strongly convex. Specifically,
we demonstrate that with constant step-sizes, these derivatives stabilize
within a noise ball centered at the solution derivative, and that with
vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates.
Additionally, we prove exponential convergence in the interpolation regime. Our
theoretical findings are illustrated by numerical experiments on synthetic
tasks.
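
The convergence of the iterate derivatives can be seen on a toy problem. The
Python sketch below is not taken from the paper; it assumes the hypothetical
strongly convex objective $f(x; \theta, \xi) = \frac{1}{2}(x - \theta - \xi)^2$
with zero-mean noise $\xi$, whose solution mapping is $x^*(\theta) = \theta$
with derivative 1. Differentiating the SGD update with respect to $\theta$
gives a second recursion that is run alongside the iterates.

```python
import random


def sgd_and_derivative(theta, x0, lr, steps, noise=0.1, seed=0):
    """Run SGD on 0.5*(x - theta - xi)**2 and track d = dx_k/dtheta."""
    rng = random.Random(seed)
    x, d = x0, 0.0  # x0 does not depend on theta, so d starts at 0
    for _ in range(steps):
        xi = noise * rng.gauss(0.0, 1.0)
        grad_x = x - theta - xi  # stochastic gradient with respect to x
        # Differentiate x <- x - lr * grad_x with respect to theta:
        # d(grad_x)/dtheta = d - 1, so d <- d - lr * (d - 1).
        d = d - lr * (d - 1.0)
        x = x - lr * grad_x
    return x, d


if __name__ == "__main__":
    x, d = sgd_and_derivative(theta=2.0, x0=0.0, lr=0.1, steps=200)
    print("iterate x_k:", x)          # noisy, stays near theta = 2
    print("derivative dx_k/dtheta:", d)  # approaches 1
```

In this special case the noise does not depend on $\theta$, so the derivative
recursion is deterministic and satisfies $d_k = 1 - (1 - \alpha)^k$ for step
size $\alpha$; in general the derivative sequence is itself stochastic.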
[LINK]
http://arxiv.org/abs/2405.15894v2
[DATE]
2024-11-20 17:04:29+08:00
[CATEGORIES]
cs.LG
ZNorm: Z-Score Gradient Normalization Accelerating Skip-Connected Network Training without Architectural Modification
[AUTHORS]
Juyoung Yun
[ABSTRACT]
The rapid advancements in deep learning necessitate better training methods
for deep neural networks (DNNs). As models grow in complexity, vanishing and
exploding gradients impede performance, particularly in skip-connected
architectures like Deep Residual Networks. We propose Z-Score Normalization for
Gradient Descent (ZNorm), an innovative technique that adjusts only the
gradients without modifying the network architecture to accelerate training and
improve model performance. ZNorm normalizes the overall gradients, providing
consistent gradient scaling across layers, effectively reducing the risks of
vanishing and exploding gradients and achieving superior performance. Extensive
experiments on CIFAR-10 and medical datasets confirm that ZNorm consistently
outperforms existing methods under the same experimental settings. In medical
imaging applications, ZNorm significantly enhances tumor prediction and
segmentation accuracy, underscoring its practical utility. These findings
highlight ZNorm’s potential as a robust and versatile tool for enhancing the
training and effectiveness of deep neural networks, especially in
skip-connected architectures, across various applications.
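
The abstract does not spell out the normalization formula, so the following
Python sketch shows only one plausible reading of z-score gradient
normalization: subtract the mean of a gradient vector and divide by its
standard deviation. The function name, the population standard deviation, and
the `eps` stabilizer are assumptions of this example, not details confirmed by
the source.

```python
import math


def zscore_normalize(grad, eps=1e-8):
    """Return (g - mean) / (std + eps) for each entry of a gradient vector.

    Uses the population standard deviation; `eps` guards against division
    by zero when all entries are equal (both are assumptions here).
    """
    n = len(grad)
    mean = sum(grad) / n
    var = sum((g - mean) ** 2 for g in grad) / n
    std = math.sqrt(var)
    return [(g - mean) / (std + eps) for g in grad]


if __name__ == "__main__":
    small = zscore_normalize([1e-6, 2e-6, 3e-6])  # tiny gradient
    large = zscore_normalize([1e3, 2e3, 3e3])     # large gradient
    print(small)
    print(large)
```

After normalization the tiny and the large gradient land on nearly the same
scale, which is the kind of consistent gradient scaling the abstract describes.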
[LINK]
http://arxiv.org/abs/2408.01215v5
[DATE]
2024-11-20 16:54:05+08:00
[CATEGORIES]
cs.LG
Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration
[AUTHORS]
Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu
[ABSTRACT]
The deep unfolding approach has attracted significant attention in computer
vision tasks, as it connects conventional image processing modeling approaches
with more recent deep learning techniques. Specifically, by
establishing a direct correspondence between algorithm operators at each
implementation step and network modules within each layer, one can rationally
construct an almost “white box” network architecture with high
interpretability. In this architecture, only the predefined component of the
proximal operator, known as a proximal network, needs manual configuration,
enabling the network to automatically extract intrinsic image priors in a
data-driven manner. In current deep unfolding methods, such a proximal network
is generally designed as a CNN architecture, whose necessity has been proven by
a recent theory. That is, CNN structure substantially delivers the
translational invariant image prior, which is the most universally possessed
structural prior across various types of images. However, standard CNN-based
proximal networks have essential limitations in capturing the rotation symmetry
prior, another universal structural prior underlying general images. This
leaves a large room for further performance improvement in deep unfolding
approaches. To address this issue, this study proposes a
high-accuracy rotation equivariant proximal network that effectively embeds
rotation symmetry priors into the deep unfolding framework. In particular, we
deduce, for the first time, the theoretical equivariant error for such a
designed proximal network with arbitrary layers under arbitrary rotation
degrees. This analysis should be the most refined theoretical conclusion for
such error evaluation to date and is also indispensable for supporting the
rationale behind such networks with intrinsic interpretability requirements.
[COMMENTS]
Published in TPAMI 2024
[LINK]
http://arxiv.org/abs/2312.15701v2
[DATE]
2024-11-20 16:44:06+08:00
[CATEGORIES]
cs.LG
Select High-Level Features: Efficient Experts from a Hierarchical Classification Network
[AUTHORS]
André Kelm, Niels Hannemann, Bruno Heberle, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop
[ABSTRACT]
This study introduces a novel expert generation method that dynamically
reduces task and computational complexity without compromising predictive
performance. It is based on a new hierarchical classification network topology
that combines sequential processing of generic low-level features with
parallelism and nesting of high-level features. This structure allows for the
innovative extraction technique: the ability to select only high-level features
of task-relevant categories. In certain cases, it is possible to skip almost
all unneeded high-level features, which can significantly reduce the inference
cost and is highly beneficial in resource-constrained conditions. We believe
this method paves the way for future network designs that are lightweight and
adaptable, making them suitable for a wide range of applications, from compact
edge devices to large-scale clouds. In terms of dynamic inference, our
methodology can exclude up to 88.7% of parameters and perform 73.4% fewer
giga-multiply-accumulate (GMAC) operations; analysis against comparative
baselines shows an average reduction of 47.6% in parameters and 5.8% in GMACs
across the cases we evaluated.
[COMMENTS]
This two-page paper was accepted for a poster presentation at the 5th
ICLR 2024 Workshop on Practical ML for Limited/Low Resource Settings
(PML4LRS)
[LINK]
http://arxiv.org/abs/2403.05601v2
[DATE]
2024-11-20 16:42:04+08:00
[CATEGORIES]
cs.LG
Virtual Staining of Label-Free Tissue in Imaging Mass Spectrometry
[AUTHORS]
Yijie Zhang, Luzhe Huang, Nir Pillar, Yuzhu Li, Lukasz G. Migas, Raf Van de Plas, Jeffrey M. Spraggins, Aydogan Ozcan
[ABSTRACT]
Imaging mass spectrometry (IMS) is a powerful tool for untargeted, highly
multiplexed molecular mapping of tissue in biomedical research. IMS offers a
means of mapping the spatial distributions of molecular species in biological
tissue with unparalleled chemical specificity and sensitivity. However, most
IMS platforms are not able to achieve microscopy-level spatial resolution and
lack cellular morphological contrast, necessitating subsequent histochemical
staining, microscopic imaging and advanced image registration steps to enable
molecular distributions to be linked to specific tissue features and cell
types. Here, we present a virtual histological staining approach that enhances
spatial resolution and digitally introduces cellular morphological contrast
into mass spectrometry images of label-free human tissue using a diffusion
model. Blind testing on human kidney tissue demonstrated that the virtually
stained images of label-free samples closely match their histochemically
stained counterparts (with Periodic Acid-Schiff staining), showing high
concordance in identifying key renal pathology structures despite utilizing IMS
data with 10-fold larger pixel size. Additionally, our approach employs an
optimized noise sampling technique during the diffusion model’s inference
process to reduce variance in the generated images, yielding reliable and
repeatable virtual staining. We believe this virtual staining method will
significantly expand the applicability of IMS in life sciences and open new
avenues for mass spectrometry-based biomedical research.
[COMMENTS]
33 Pages, 6 Figures
[LINK]
http://arxiv.org/abs/2411.13120v1
[DATE]
2024-11-20 16:30:11+08:00
[CATEGORIES]
cs.LG
Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders
[AUTHORS]
Charles O’Neill, David Klindt
[ABSTRACT]
A recent line of work has shown promise in using sparse autoencoders (SAEs)
to uncover interpretable features in neural network representations. However,
the simple linear-nonlinear encoding mechanism in SAEs limits their ability to
perform accurate sparse inference. In this paper, we investigate sparse
inference and learning in SAEs through the lens of sparse coding. Specifically,
we show that SAEs perform amortised sparse inference with a computationally
restricted encoder and, using compressed sensing theory, we prove that this
mapping is inherently insufficient for accurate sparse inference, even in
solvable cases. Building on this theory, we empirically explore conditions
where more sophisticated sparse inference methods outperform traditional SAE
encoders. Our key contribution is the decoupling of the encoding and decoding
processes, which allows for a comparison of various sparse encoding strategies.
We evaluate these strategies on two dimensions: alignment with true underlying
sparse features and correct inference of sparse codes, while also accounting
for computational costs during training and inference. Our results reveal that
substantial performance gains can be achieved with minimal increases in compute
cost. We demonstrate that this generalises to SAEs applied to large language
models (LLMs), where advanced encoders achieve similar interpretability. This
work opens new avenues for understanding neural network representations and
offers important implications for improving the tools we use to analyse the
activations of large language models.
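The contrast between a one-shot encoder and iterative sparse inference can be made concrete with a small sketch. It assumes a fixed toy dictionary `D`, an untrained tied-weight encoder `ReLU(D^T x)` standing in for an SAE encoder, and ISTA as the iterative method; all names and numbers are illustrative and none come from the paper.

```python
def matvec(D, z):
    """Compute D z for a dictionary D stored as a list of rows."""
    return [sum(d * zj for d, zj in zip(row, z)) for row in D]

def rmatvec(D, r):
    """Compute D^T r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]

def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    return max(abs(v) - t, 0.0) * (1 if v > 0 else -1)

def lasso_objective(D, x, z, lam):
    """0.5 * ||x - D z||^2 + lam * ||z||_1."""
    r = [xi - yi for xi, yi in zip(x, matvec(D, z))]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(zj) for zj in z)

def one_shot_encoder(D, x):
    """Linear map followed by ReLU: a single pass with no iteration."""
    return [max(v, 0.0) for v in rmatvec(D, x)]

def ista(D, x, lam, steps=200):
    """Iterative shrinkage-thresholding for the lasso objective above."""
    # Step 1 / ||D||_F^2 is a safe choice: it never exceeds 1 / ||D||_2^2.
    step = 1.0 / sum(d * d for row in D for d in row)
    z = [0.0] * len(D[0])
    for _ in range(steps):
        residual = [yi - xi for xi, yi in zip(x, matvec(D, z))]  # D z - x
        grad = rmatvec(D, residual)
        z = [soft_threshold(zj - step * gj, step * lam) for zj, gj in zip(z, grad)]
    return z

# Toy case: x is generated by the first atom alone, but the third atom is
# correlated with it, so the one-shot encoder also activates the third atom.
D = [[1.0, 0.0, 0.7],
     [0.0, 1.0, 0.7]]
x = [1.0, 0.0]
lam = 0.05
z_one_shot = one_shot_encoder(D, x)
z_ista = ista(D, x, lam)
```

Comparing `lasso_objective` for `z_one_shot` and `z_ista` shows the gap that a single linear-nonlinear pass leaves on this toy problem, at the price of 200 extra iterations.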
[LINK]
http://arxiv.org/abs/2411.13117v1
[DATE]
2024-11-20 16:21:53+08:00
[CATEGORIES]
cs.LG
Extended Neural Contractive Dynamical Systems: On Multiple Tasks and Riemannian Safety Regions
[AUTHORS]
Hadi Beik Mohammadi, Søren Hauberg, Georgios Arvanitidis, Gerhard Neumann, Leonel Rozo
[ABSTRACT]
Stability guarantees are crucial when ensuring that a fully autonomous robot
does not take undesirable or potentially harmful actions. We recently proposed
the Neural Contractive Dynamical Systems (NCDS), which is a neural network
architecture that guarantees contractive stability. With this,
learning-from-demonstrations approaches can trivially provide stability
guarantees. However, our early work left several unanswered questions, which we
here address. Beyond providing an in-depth explanation of NCDS, this paper
extends the framework with more careful regularization, a conditional variant
of the framework for handling multiple tasks, and an uncertainty-driven
approach to latent obstacle avoidance. Experiments verify that the developed
system has the flexibility of ordinary neural networks while providing the
stability guarantees needed for autonomous robotics.
[COMMENTS]
arXiv admin note: substantial text overlap with arXiv:2401.09352
[LINK]
http://arxiv.org/abs/2411.11405v2
[DATE]
2024-11-20 16:20:35+08:00
[CATEGORIES]
cs.LG
Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning
[AUTHORS]
Zhi Luo, Xiyuan Yang, Pan Zhou, Di Wang
[ABSTRACT]
Manipulating the interaction trajectories between the intelligent agent and
the environment can control the agent’s training and behavior, exposing the
potential vulnerabilities of reinforcement learning (RL). For example, in
Cyber-Physical Systems (CPS) controlled by RL, the attacker can manipulate the
actions of the adopted RL to other actions during the training phase, which
will lead to bad consequences. Existing work has studied action-manipulation
attacks in tabular settings, where the states and actions are discrete.
Continuous action spaces are common in many emerging RL applications, such as
autonomous driving; however, action-manipulation attacks in this setting have
not yet been thoroughly investigated. In this paper, we consider this crucial
problem in both white-box and black-box scenarios. Specifically, utilizing the
knowledge derived exclusively from trajectories, we propose a black-box attack
algorithm named LCBT, which uses the Monte Carlo tree search method for
efficient action searching and manipulation. Additionally, we demonstrate that
for an agent whose dynamic regret is sub-linearly related to the total number
of steps, LCBT can teach the agent to converge to target policies with only
sublinear attack cost, i.e., $O\left(\mathcal{R}(T) + MH^3K^E\log
(MT)\right)(0<E<1)$, where $H$ is the number of steps per episode, $K$ is the
total number of episodes, $T=KH$ is the total number of steps, $M$ is the
number of subspaces divided in the state space, and $\mathcal{R}(T)$ is the
bound of the RL algorithm’s regret. We apply our proposed attack methods to
three aggressive algorithms (DDPG, PPO, and TD3) in continuous settings, and
the results show promising attack performance.
[LINK]
http://arxiv.org/abs/2411.13116v1
[DATE]
2024-11-20 16:20:29+08:00
[CATEGORIES]
cs.LG
TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection
[AUTHORS]
Mengxuan Li, Ke Liu, Hongyang Chen, Jiajun Bu, Hongwei Wang, Haishuai Wang
[ABSTRACT]
Time series anomaly detection aims to identify unusual patterns in data or
deviations from systems’ expected behavior. Reconstruction-based methods,
which learn point-wise representations via unsupervised learning, are the
mainstream approach to this task. However, the unlabeled anomaly points in training data
may cause these reconstruction-based methods to learn and reconstruct anomalous
data, resulting in the challenge of capturing normal patterns. In this paper,
we propose a time series anomaly detection method based on implicit neural
representation (INR) reconstruction, named TSINR, to address this challenge.
Owing to spectral bias, TSINR prioritizes low-frequency
signals and performs worse on high-frequency abnormal data.
Specifically, we adopt INR to parameterize time series data as a continuous
function and employ a transformer-based architecture to predict the INR of
given data. As a result, the proposed TSINR method achieves the advantage of
capturing the temporal continuity and thus is more sensitive to discontinuous
anomaly data. In addition, we further design a novel form of INR continuous
function to learn inter- and intra-channel information, and leverage a
pre-trained large language model to amplify the intense fluctuations in
anomalies. Extensive experiments demonstrate that TSINR achieves superior
overall performance on both univariate and multivariate time series anomaly
detection benchmarks compared to other state-of-the-art reconstruction-based
methods. Our code is available.
[COMMENTS]
Accepted by SIGKDD 2025
[LINK]
http://arxiv.org/abs/2411.11641v2
[DATE]
2024-11-20 16:04:43+08:00
[CATEGORIES]
cs.LG
DRL-Based Optimization for AoI and Energy Consumption in C-V2X Enabled IoV
[AUTHORS]
Zheng Zhang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief
[ABSTRACT]
To address communication latency issues, the Third Generation Partnership
Project (3GPP) has defined Cellular-Vehicle to Everything (C-V2X) technology,
which includes Vehicle-to-Vehicle (V2V) communication for direct
vehicle-to-vehicle communication. However, this method requires vehicles to
autonomously select communication resources based on the Semi-Persistent
Scheduling (SPS) protocol, which may lead to collisions due to different
vehicles sharing the same communication resources, thereby affecting
communication effectiveness. Non-Orthogonal Multiple Access (NOMA) is
considered a potential solution for handling large-scale vehicle communication,
as it can enhance the Signal-to-Interference-plus-Noise Ratio (SINR) by
employing Successive Interference Cancellation (SIC), thereby reducing the
negative impact of communication collisions. When evaluating vehicle
communication performance, traditional metrics such as reliability and
transmission delay present certain contradictions. Introducing the new metric
Age of Information (AoI) provides a more comprehensive evaluation of
the communication system. Additionally, to ensure service quality, user terminals
need to possess high computational capabilities, which may lead to increased
energy consumption, necessitating a trade-off between communication energy
consumption and effectiveness. Given the complexity and dynamics of
communication systems, Deep Reinforcement Learning (DRL) serves as an
intelligent learning method capable of learning optimal strategies in dynamic
environments. Therefore, this paper analyzes the effects of multi-priority
queues and NOMA on AoI in the C-V2X vehicular communication system and proposes
an energy consumption and AoI optimization method based on DRL. Finally,
through comparative simulations with baseline methods, the proposed approach
demonstrates its advantages in terms of energy consumption and AoI.
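As a rough illustration of the Age of Information metric named in this abstract, the sketch below uses the common definition: the age at time t is t minus the generation time of the freshest update received so far. The slotted time model, the `aoi_trace` helper, and the sample updates are assumptions for illustration only and do not reflect the paper's system model.

```python
def aoi_trace(updates, horizon):
    """Return the AoI at each integer slot 0..horizon-1.

    updates: list of (generated_at, received_at) pairs, in slots.
    """
    trace = []
    for t in range(horizon):
        # Assumption: the receiver starts with a sample generated at slot 0.
        freshest = 0
        for generated_at, received_at in updates:
            if received_at <= t and generated_at > freshest:
                freshest = generated_at
        trace.append(t - freshest)
    return trace

# Two updates: generated at slots 2 and 5, delivered at slots 3 and 7.
updates = [(2, 3), (5, 7)]
trace = aoi_trace(updates, 10)
average_aoi = sum(trace) / len(trace)
```

The age drops only when a fresher update is delivered, so the metric reflects both transmission delay and how often updates get through.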
[COMMENTS]
This paper has been submitted to IEEE Journal. The source code has
been released at:
https://github.com/qiongwu86/DRL-Based-Optimization-for-Information-of-Age-and-Energy-Consumption-in-C-V2X-Enabled-IoV
[LINK]
http://arxiv.org/abs/2411.13104v1
[DATE]
2024-11-20 15:59:35+08:00
[CATEGORIES]
cs.LG
Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective
[AUTHORS]
Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Jianxin Liao
[ABSTRACT]
Time series forecasting has played a pivotal role across various industries,
including finance, transportation, energy, healthcare, and climate. Due to the
abundant seasonal information they contain, timestamps possess the potential to
offer robust global guidance for forecasting techniques. However, existing
works primarily focus on local observations, with timestamps being treated
merely as an optional supplement that remains underutilized. When data gathered
from the real world is polluted, the absence of global information will damage
the robust prediction capability of these algorithms. To address these
problems, we propose a novel framework named GLAFF. Within this framework, the
timestamps are modeled individually to capture the global dependencies. Working
as a plugin, GLAFF adaptively adjusts the combined weights for global and local
information, enabling seamless collaboration with any time series forecasting
backbone. Extensive experiments conducted on nine real-world datasets
demonstrate that GLAFF significantly enhances the average performance of widely
used mainstream forecasting models by 12.5%, surpassing the previous
state-of-the-art method by 5.5%.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2409.18696v3
[DATE]
2024-11-20 15:51:18+08:00
[CATEGORIES]
cs.LG
Incremental Label Distribution Learning with Scalable Graph Convolutional Networks
[AUTHORS]
Ziqi Jia, Xiaoyang Qu, Chenghao Liu, Jianzong Wang
[ABSTRACT]
Label Distribution Learning (LDL) is an effective approach for handling label
ambiguity, as it can analyze all labels at once and indicate the extent to
which each label describes a given sample. Most existing LDL methods consider
the number of labels to be static. However, in various LDL-specific contexts
(e.g., disease diagnosis), the label count grows over time (such as the
discovery of new diseases), a factor that existing methods overlook. Learning
samples with new labels directly means learning all labels at once, thus
wasting more time on the old labels and even risking overfitting the old
labels. At the same time, learning new labels by the LDL model means
reconstructing the inter-label relationships. How to make use of constructed
relationships is also a crucial challenge. To tackle these challenges, we
introduce Incremental Label Distribution Learning (ILDL), analyze its key
issues regarding training samples and inter-label relationships, and propose
Scalable Graph Label Distribution Learning (SGLDL) as a practical framework for
implementing ILDL. Specifically, in SGLDL, we develop a New-label-aware
Gradient Compensation Loss to speed up the learning of new labels and represent
inter-label relationships as a graph to reduce the time required to reconstruct
inter-label relationships. Experimental results on the classical LDL dataset
show the clear advantages of our algorithms and illustrate the importance of
a dedicated design for the ILDL problem.
[COMMENTS]
Accepted by the 26th IEEE International Conference on High
Performance Computing and Communications (HPCC2024)
[LINK]
http://arxiv.org/abs/2411.13097v1
[DATE]
2024-11-20 15:49:51+08:00
[CATEGORIES]
cs.LG
Is Knowledge Power? On the (Im)possibility of Learning from Strategic Interactions
[AUTHORS]
Nivasini Ananthakrishnan, Nika Haghtalab, Chara Podimata, Kunhe Yang
[ABSTRACT]
When learning in strategic environments, a key question is whether agents can
overcome uncertainty about their preferences to achieve outcomes they could
have achieved absent any uncertainty. Can they do this solely through
interactions with each other? We focus this question on the ability of agents
to attain the value of their Stackelberg optimal strategy and study the impact
of information asymmetry. We study repeated interactions in fully strategic
environments where players’ actions are decided based on learning algorithms
that take into account their observed histories and knowledge of the game. We
study the pure Nash equilibria (PNE) of a meta-game where players choose these
algorithms as their actions. We demonstrate that if one player has perfect
knowledge about the game, then any initial informational gap persists. That is,
while there is always a PNE in which the informed agent achieves her
Stackelberg value, there is a game where no PNE of the meta-game allows the
partially informed player to achieve her Stackelberg value. On the other hand,
if both players start with some uncertainty about the game, the quality of
information alone does not determine which agent can achieve her Stackelberg
value. In this case, the concept of information asymmetry becomes nuanced and
depends on the game’s structure. Overall, our findings suggest that repeated
strategic interactions alone cannot facilitate learning effectively enough to
earn an uninformed player her Stackelberg value.
[LINK]
http://arxiv.org/abs/2408.08272v2
[DATE]
2024-11-20 15:35:07+08:00
[CATEGORIES]
cs.LG
Omnipredicting Single-Index Models with Multi-Index Models
[AUTHORS]
Lunjia Hu, Kevin Tian, Chutong Yang
[ABSTRACT]
Recent work on supervised learning [GKR+22] defined the notion of
omnipredictors, i.e., predictor functions $p$ over features that are
simultaneously competitive for minimizing a family of loss functions
$\mathcal{L}$ against a comparator class $\mathcal{C}$. Omniprediction requires
approximating the Bayes-optimal predictor beyond the loss minimization
paradigm, and has generated significant interest in the learning theory
community. However, even for basic settings such as agnostically learning
single-index models (SIMs), existing omnipredictor constructions require
impractically-large sample complexities and runtimes, and output complex,
highly-improper hypotheses.
Our main contribution is a new, simple construction of omnipredictors for
SIMs. We give a learner outputting an omnipredictor that is
$\varepsilon$-competitive on any matching loss induced by a monotone, Lipschitz
link function, when the comparator class is bounded linear predictors. Our
algorithm requires $\approx \varepsilon^{-4}$ samples and runs in nearly-linear
time, and its sample complexity improves to $\approx \varepsilon^{-2}$ if link
functions are bi-Lipschitz. This significantly improves upon the only prior
known construction, due to [HJKRR18, GHK+23], which used $\gtrsim
\varepsilon^{-10}$ samples.
We achieve our construction via a new, sharp analysis of the classical
Isotron algorithm [KS09, KKKS11] in the challenging agnostic learning setting,
of potential independent interest. Previously, Isotron was known to properly
learn SIMs in the realizable setting, as well as constant-factor competitive
hypotheses under the squared loss [ZWDD24]. As they are based on Isotron, our
omnipredictors are multi-index models with $\approx \varepsilon^{-2}$
prediction heads, bringing us closer to the tantalizing goal of proper
omniprediction for general loss families and comparators.
[LINK]
http://arxiv.org/abs/2411.13083v1
[DATE]
2024-11-20 15:20:49+08:00
[CATEGORIES]
cs.LG
Learning to Optimize for Mixed-Integer Non-linear Programming
[AUTHORS]
Bo Tang, Elias B. Khalil, Ján Drgoňa
[ABSTRACT]
Mixed-integer non-linear programs (MINLPs) arise in various domains, such as
energy systems and transportation, but are notoriously difficult to solve.
Recent advances in machine learning have led to remarkable successes in
optimization tasks, an area broadly known as learning to optimize. This
approach includes using predictive models to generate solutions for
optimization problems with continuous decision variables, thereby avoiding the
need for computationally expensive optimization algorithms. However, applying
learning to MINLPs remains challenging primarily due to the presence of integer
decision variables, which complicate gradient-based learning. To address this
limitation, we propose two differentiable correction layers that generate
integer outputs while preserving gradient information. Combined with a soft
penalty for constraint violation, our framework can tackle both the integrality
and non-linear constraints in a MINLP. Experiments on three problem classes
with convex/non-convex objective/constraints and integer/mixed-integer
variables show that the proposed learning-based approach consistently produces
high-quality solutions for parametric MINLPs extremely quickly. As problem size
increases, traditional exact solvers and heuristic methods struggle to find
feasible solutions, whereas our approach continues to deliver reliable results.
Our work extends the scope of learning-to-optimize to MINLP, paving the way for
integrating integer constraints into deep learning models. Our code is
available at https://github.com/pnnl/L2O-pMINLP.
[LINK]
http://arxiv.org/abs/2410.11061v4
[DATE]
2024-11-20 15:03:40+08:00
[CATEGORIES]
cs.LG
Learning the Market: Sentiment-Based Ensemble Trading Agents
[AUTHORS]
Andrew Ye, James Xu, Vidyut Veedgav, Yi Wang, Yifan Yu, Daniel Yan, Ryan Chen, Vipin Chaudhary, Shuai Xu
[ABSTRACT]
We propose and study the integration of sentiment analysis and deep
reinforcement learning ensemble algorithms for stock trading by evaluating
strategies capable of dynamically altering their active agent given the
concurrent market environment. In particular, we design a simple-yet-effective
method for extracting financial sentiment and combine this with improvements on
existing trading agents, resulting in a strategy that effectively considers
both qualitative market factors and quantitative stock data. We show that our
approach results in a strategy that is profitable, robust, and risk-minimal -
outperforming the traditional ensemble strategy as well as single agent
algorithms and market metrics. Our findings suggest that the conventional
practice of switching and reevaluating agents in an ensemble every fixed number
of months is sub-optimal, and that a dynamic sentiment-based framework
unlocks substantial additional performance. Furthermore, as we have designed our
algorithm with simplicity and efficiency in mind, we hypothesize that the
transition of our method from historical evaluation towards real-time trading
with live data will be relatively simple.
[LINK]
http://arxiv.org/abs/2402.01441v2
[DATE]
2024-11-20 14:59:55+08:00
[CATEGORIES]
cs.LG
Surface Flux Transport Modeling using Physics Informed Neural Networks
[AUTHORS]
Jithu J Athalathil, Bhargav Vaidya, Sayan Kundu, Vishal Upendran, Mark C. M. Cheung
[ABSTRACT]
Studying the magnetic field properties on the solar surface is crucial for
understanding the solar and heliospheric activities, which in turn shape space
weather in the solar system. Surface Flux Transport (SFT) modeling helps us to
simulate and analyse the transport and evolution of magnetic flux on the solar
surface, providing valuable insights into the mechanisms responsible for solar
activity. In this work, we demonstrate the use of machine learning techniques
in solving magnetic flux transport, making it accurate. We have developed a
novel Physics-Informed Neural Networks (PINN)-based model to study the
evolution of Bipolar Magnetic Regions (BMRs) using SFT in a one-dimensional
azimuthally averaged setting and also in two dimensions. We demonstrate the efficiency
and computational feasibility of our PINN-based model by comparing its
performance and accuracy with that of a numerical model implemented using the
Runge-Kutta Implicit-Explicit (RK-IMEX) scheme. The mesh-independent PINN
method can be used to reproduce the observed polar magnetic field with better
flux conservation. This advancement is important for accurately reproducing
observed polar magnetic fields, thereby providing insights into the strength of
future solar cycles. This work paves the way for more efficient and accurate
simulations of solar magnetic flux transport and showcases the applicability of
PINN in solving advection-diffusion equations with a particular focus on
heliophysics.
[LINK]
http://arxiv.org/abs/2409.01744v2
[DATE]
2024-11-20 14:56:31+08:00
[CATEGORIES]
cs.LG
A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings
[AUTHORS]
Xiachong Lin, Arian Prabowo, Imran Razzak, Hao Xue, Matthew Amos, Sam Behrens, Flora D. Salim
[ABSTRACT]
The increasing demand for sustainable energy solutions has driven the
integration of digitalized buildings into the power grid, leveraging
Internet-of-Things (IoT) technologies to enhance energy efficiency and
operational performance. Despite their potential, effectively utilizing IoT
point data within deep-learning frameworks presents significant challenges,
primarily due to its inherent heterogeneity. This study investigates the
diverse dimensions of IoT data heterogeneity in both intra-building and
inter-building contexts, examining their implications for predictive modeling.
A benchmarking analysis of state-of-the-art time series models highlights their
performance on this complex dataset. The results emphasize the critical need
for multi-modal data integration, domain-informed modeling, and automated data
engineering pipelines. Additionally, the study advocates for collaborative
efforts to establish high-quality public datasets, which are essential for
advancing intelligent and sustainable energy management systems in digitalized
buildings.
[COMMENTS]
4 figures, 1 table, 9 pages
[LINK]
http://arxiv.org/abs/2405.14267v2
[DATE]
2024-11-20 14:50:50+08:00
[CATEGORIES]
cs.LG
Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles
[AUTHORS]
Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester
[COMMENTS]
Accepted at the Self-Supervised Learning Workshop and the Unifying
Representations in Neural Models Workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.13073v1
[DATE]
2024-11-20 14:50:50+08:00
[CATEGORIES]
cs.LG
Towards Data Valuation via Asymmetric Data Shapley
[AUTHORS]
Xi Zheng, Xiangyu Chang, Ruoxi Jia, Yong Tan
[ABSTRACT]
As data emerges as a vital driver of technological and economic advancements,
a key challenge is accurately quantifying its value in algorithmic
decision-making. The Shapley value, a well-established concept from cooperative
game theory, has been widely adopted to assess the contribution of individual
data sources in supervised machine learning. However, its symmetry axiom
assumes all players in the cooperative game are homogeneous, which overlooks
the complex structures and dependencies present in real-world datasets. To
address this limitation, we extend the traditional data Shapley framework to
asymmetric data Shapley, making it flexible enough to incorporate inherent
structures within the datasets for structure-aware data valuation. We also
introduce an efficient $k$-nearest neighbor-based algorithm for its exact
computation. We demonstrate the practical applicability of our framework across
various machine learning tasks and data market contexts. The code is available
at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
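To make the role of the symmetry axiom concrete, the sketch below computes Shapley values by brute force over all orderings and then reweights those orderings. The `weight` argument, the toy `utility`, and the rule that "a" must precede "b" are hypothetical illustrations of how asymmetry can enter. This is not the paper's $k$-nearest neighbor-based algorithm.

```python
from itertools import permutations

def shapley(players, value, weight=lambda order: 1.0):
    """Weighted average of marginal contributions over all orderings.

    With the default uniform weight this is the classical Shapley value.
    A non-uniform weight over orderings yields an asymmetric variant.
    """
    totals = {p: 0.0 for p in players}
    norm = 0.0
    for order in permutations(players):
        w = weight(order)
        if w == 0:
            continue
        norm += w
        seen = []
        for p in order:
            before = value(frozenset(seen))
            seen.append(p)
            totals[p] += w * (value(frozenset(seen)) - before)
    return {p: total / norm for p, total in totals.items()}

def utility(coalition):
    """Toy utility: "a" and "b" are duplicates, "c" adds distinct information."""
    score = 0.0
    if "a" in coalition or "b" in coalition:
        score += 1.0
    if "c" in coalition:
        score += 1.0
    return score

def a_before_b(order):
    """Hypothetical structure: only orderings where "a" precedes "b" count."""
    return 1.0 if order.index("a") < order.index("b") else 0.0

symmetric = shapley(["a", "b", "c"], utility)
asymmetric = shapley(["a", "b", "c"], utility, weight=a_before_b)
```

Under uniform weights the duplicated points "a" and "b" split their shared credit equally, whereas the ordering constraint assigns that credit to "a" alone. In both cases the values still sum to the utility of the full dataset.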
[LINK]
http://arxiv.org/abs/2411.00388v2
[DATE]
2024-11-20 14:27:46+08:00
[CATEGORIES]
cs.LG
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
[AUTHORS]
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
[ABSTRACT]
Dramatic increases in the capabilities of neural network models in recent
years are driven by scaling model size, training data, and corresponding
computational resources. To develop the exceedingly large networks required in
modern applications, such as large language models (LLMs), model training is
distributed across tens of thousands of hardware accelerators (e.g. GPUs),
requiring orchestration of computation and communication across large computing
clusters. In this work, we demonstrate that careful consideration of hardware
configuration and parallelization strategy is critical for effective (i.e.
compute- and cost-efficient) scaling of model size, training data, and total
computation. We conduct an extensive empirical study of the performance of
large-scale LLM training workloads across model size, hardware configurations,
and distributed parallelization strategies. We demonstrate that: (1) beyond
certain scales, overhead incurred from certain distributed communication
strategies leads parallelization strategies previously thought to be
sub-optimal to become preferable; and (2) scaling the total number of
accelerators for large model training quickly yields diminishing returns even
when hardware and parallelization strategies are properly optimized, implying
poor marginal performance per additional unit of power or GPU-hour.
[LINK]
http://arxiv.org/abs/2411.13055v1
[DATE]
2024-11-20 14:05:11+08:00
[CATEGORIES]
cs.LG
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond
[AUTHORS]
Lijun Zhang, Haomin Bai, Peng Zhao, Tianbao Yang, Zhi-Hua Zhou
[ABSTRACT]
This paper investigates group distributionally robust optimization (GDRO)
with the goal of learning a model that performs well over $m$ different
distributions. First, we formulate GDRO as a stochastic convex-concave
saddle-point problem, which is then solved by stochastic mirror descent (SMD)
with $m$ samples in each iteration, attaining a nearly optimal sample
complexity. To reduce the number of samples required in each round from $m$ to
1, we cast GDRO as a two-player game, where one player conducts SMD and the
other executes an online algorithm for non-oblivious multi-armed bandits,
maintaining the same sample complexity. Next, we extend GDRO to address
scenarios involving imbalanced data and heterogeneous distributions. In the
first scenario, we introduce a weighted variant of GDRO, enabling
distribution-dependent convergence rates that rely on the number of samples
from each distribution. We design two strategies to meet the sample budget: one
integrates non-uniform sampling into SMD, and the other employs the stochastic
mirror-prox algorithm with mini-batches, both of which deliver faster rates for
distributions with more samples. In the second scenario, we propose to optimize
the average top-$k$ risk instead of the maximum risk, thereby mitigating the
impact of outlier distributions. Similar to the case of vanilla GDRO, we
develop two stochastic approaches: one uses $m$ samples per iteration via SMD,
and the other consumes $k$ samples per iteration through an online algorithm
for non-oblivious combinatorial semi-bandits.
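The difference between the maximum risk and the average top-$k$ risk described above can be illustrated with a minimal sketch. The per-distribution risk values and the function names below are hypothetical and only show how one outlier distribution dominates the maximum but is damped by the top-$k$ average; this is not the paper's stochastic algorithm.

```python
def max_risk(risks):
    """Worst-case risk over all distributions (the vanilla GDRO objective)."""
    return max(risks)


def average_top_k_risk(risks, k):
    """Mean of the k largest per-distribution risks."""
    if not 1 <= k <= len(risks):
        raise ValueError("k must be between 1 and the number of distributions")
    top = sorted(risks, reverse=True)[:k]
    return sum(top) / k


# Hypothetical risks for m = 4 distributions; the last one is an outlier.
group_risks = [0.10, 0.12, 0.11, 0.95]

worst = max_risk(group_risks)                  # driven entirely by the outlier
top2 = average_top_k_risk(group_risks, k=2)    # outlier averaged with the next worst
```

With k = 1 the average top-$k$ risk coincides with the maximum risk, and with k = m it becomes the plain average over distributions.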
[LINK]
http://arxiv.org/abs/2302.09267v5
[DATE]
2024-11-20 13:58:10+08:00
[CATEGORIES]
cs.LG
On-device Content-based Recommendation with Single-shot Embedding Pruning: A Cooperative Game Perspective
[AUTHORS]
Hung Vinh Tran, Tong Chen, Guanhua Ye, Quoc Viet Hung Nguyen, Kai Zheng, Hongzhi Yin
[ABSTRACT]
Content-based Recommender Systems (CRSs) play a crucial role in shaping user
experiences in e-commerce, online advertising, and personalized
recommendations. However, due to the vast amount of categorical features, the
embedding tables used in CRS models pose a significant storage bottleneck for
real-world deployment, especially on resource-constrained devices. To address
this problem, various embedding pruning methods have been proposed, but most
existing ones require expensive retraining steps for each target parameter
budget, leading to enormous computation costs. In reality, this computation
cost is a major hurdle in real-world applications with diverse storage
requirements, such as federated learning and streaming settings. In this paper,
we propose Shapley Value-guided Embedding Reduction (Shaver) as our response.
With Shaver, we view the problem from a cooperative game perspective, and
quantify each embedding parameter’s contribution with Shapley values to
facilitate contribution-based parameter pruning. To address the inherently high
computation costs of Shapley values, we propose an efficient and unbiased
method to estimate Shapley values of a CRS’s embedding parameters. Moreover, in
the pruning stage, we put forward a field-aware codebook to mitigate the
information loss in the traditional zero-out treatment. Through extensive
experiments on three real-world datasets, Shaver has demonstrated competitive
performance with lightweight recommendation models across various parameter
budgets. The source code is available at
https://anonymous.4open.science/r/shaver-E808
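For readers unfamiliar with Shapley values, the following is a minimal sketch of the exact definition on a toy three-player game. It enumerates every player ordering, which is only feasible for tiny games; it is not Shaver's efficient estimator, and the `toy_value` function is a hypothetical stand-in for a model-quality score.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical value function: additive scores plus a bonus for a and b together."""
    base = {"a": 1.0, "b": 2.0, "c": 0.0}
    bonus = 1.0 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
```

The values sum to the value of the full coalition, and the player that never changes the score receives zero, which is the property that makes a contribution-based pruning criterion plausible.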
[LINK]
http://arxiv.org/abs/2411.13052v1
[DATE]
2024-11-20 13:56:31+08:00
[CATEGORIES]
cs.LG
Universal Online Convex Optimization Meets Second-order Bounds
[AUTHORS]
Lijun Zhang, Yibo Wang, Guanghui Wang, Jinfeng Yi, Tianbao Yang
[ABSTRACT]
Recently, several universal methods have been proposed for online convex
optimization that attain minimax rates for multiple types of convex functions
simultaneously. However, they need to design and optimize one surrogate loss
for each type of function, making it difficult to exploit the structure of the
problem and utilize existing algorithms. In this paper, we propose a simple
strategy for universal online convex optimization, which avoids these
limitations. The key idea is to construct a set of experts to process the
original online functions, and deploy a meta-algorithm over the linearized
losses to aggregate predictions from experts. Specifically, the meta-algorithm
is required to yield a second-order bound with excess losses, so that it can
leverage strong convexity and exponential concavity to control the meta-regret.
In this way, our strategy inherits the theoretical guarantee of any expert
designed for strongly convex functions and exponentially concave functions, up
to a double logarithmic factor. As a result, we can plug in off-the-shelf
online solvers as black-box experts to deliver problem-dependent regret bounds.
For general convex functions, it maintains the minimax optimality and also
achieves a small-loss bound. Furthermore, we extend our universal strategy to
online composite optimization, where the loss function comprises a time-varying
function and a fixed regularizer. To deal with the composite loss functions, we
employ a meta-algorithm based on the optimistic online learning framework,
which not only possesses a second-order bound, but also can utilize estimations
for upcoming loss functions. With appropriate configurations, we demonstrate
that the additional regularizer does not contribute to the meta-regret, thus
maintaining the universality in the composite setting.
[LINK]
http://arxiv.org/abs/2105.03681v3
[DATE]
2024-11-20 13:53:38+08:00
[CATEGORIES]
cs.LG
Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN
[AUTHORS]
Rahul Mishra, Arnav Bhavsar
[ABSTRACT]
In this study, we tackle a modern research challenge within the field of
perceptual brain decoding, which revolves around synthesizing images from EEG
signals using an adversarial deep learning framework. The specific objective is
to recreate images belonging to various object categories by leveraging EEG
recordings obtained while subjects view those images. To achieve this, we
employ a Transformer-encoder based EEG encoder to produce EEG encodings, which
serve as inputs to the generator component of the GAN network. Alongside the
adversarial loss, we also incorporate perceptual loss to enhance the quality of
the generated images.
[LINK]
http://arxiv.org/abs/2402.10115v2
[DATE]
2024-11-20 13:35:03+08:00
[CATEGORIES]
cs.LG
Receiver-Centric Generative Semantic Communications
[AUTHORS]
Xunze Liu, Yifei Sun, Zhaorui Wang, Lizhao You, Haoyuan Pan, Fangxin Wang, Shuguang Cui
[ABSTRACT]
This paper investigates semantic communications between a transmitter and a
receiver, where original data, such as videos of interest to the receiver, is
stored at the transmitter. Although significant progress has been made in
semantic communications, a fundamental design problem is that the semantic
information is extracted based on certain criteria at the transmitter alone,
without considering the receiver’s specific information needs. As a result,
critical information of primary concern to the receiver may be lost. In such
cases, the semantic transmission becomes meaningless to the receiver, as all
received information is irrelevant to its interests. To solve this problem,
this paper presents a receiver-centric generative semantic communication
system, where each transmission is initialized by the receiver. Specifically,
the receiver first sends its request for the desired semantic information to
the transmitter at the start of each transmission. Then, the transmitter
extracts the required semantic information accordingly. A key challenge is how
the transmitter understands the receiver’s requests for semantic information
and extracts the required semantic information in a reasonable and robust
manner. We address this challenge by designing a well-structured framework and
leveraging off-the-shelf generative AI products, such as GPT-4, along with
several specialized tools for detection and estimation. Evaluation results
demonstrate the feasibility and effectiveness of the proposed new semantic
communication system.
[COMMENTS]
Demo video has been made available at: https://goo.su/dUnAT
[LINK]
http://arxiv.org/abs/2411.03127v2
[DATE]
2024-11-20 13:13:36+08:00
[CATEGORIES]
cs.LG
SparseDM: Toward Sparse Efficient Diffusion Models
[AUTHORS]
Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu
[ABSTRACT]
Diffusion models have been extensively used in data generation tasks and are
recognized as one of the best generative models. However, their time-consuming
deployment, long inference time, and requirements on large memory limit their
application on mobile devices. In this paper, we propose a method based on the
improved Straight-Through Estimator to improve the deployment efficiency of
diffusion models. Specifically, we add sparse masks to the Convolution and
Linear layers in a pre-trained diffusion model, then design progressive
sparsity for model training in the fine-tuning stage, and switch the inference
mask on and off, which supports a flexible choice of sparsity during inference
according to the FID and MACs requirements. Experiments on four datasets
conducted on a state-of-the-art Transformer-based diffusion model demonstrate
that our method reduces MACs by $50\%$ while increasing FID by only 1.5 on
average. Under other MACs conditions, the FID is also lower by 1$\sim$137
compared to other methods.
[LINK]
http://arxiv.org/abs/2404.10445v3
[DATE]
2024-11-20 12:51:59+08:00
[CATEGORIES]
cs.LG
Probably Approximately Precision and Recall Learning
[AUTHORS]
Lee Cohen, Yishay Mansour, Shay Moran, Han Shao
[ABSTRACT]
Precision and Recall are foundational metrics in machine learning where both
accurate predictions and comprehensive coverage are essential, such as in
recommender systems and multi-label learning. In these tasks, balancing
precision (the proportion of relevant items among those predicted) and recall
(the proportion of relevant items successfully predicted) is crucial. A key
challenge is that one-sided feedback–where only positive examples are observed
during training–is inherent in many practical problems. For instance, in
recommender systems like YouTube, training data only consists of videos that a
user has actively selected, while unselected items remain unseen. Despite this
lack of negative feedback in training, avoiding undesirable recommendations at
test time is essential.
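The two definitions given in parentheses above can be made concrete with a minimal sketch. The item names are hypothetical, and returning 1.0 for an empty set is an assumed convention, not something the abstract specifies.

```python
def precision_recall(predicted, relevant):
    """Precision: share of predicted items that are relevant.
    Recall: share of relevant items that were predicted."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    # Assumed convention: an empty denominator yields 1.0.
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical recommendation lists for one user.
precision, recall = precision_recall(
    predicted={"a", "b", "c", "d"},
    relevant={"a", "b", "e"},
)
```

Here two of four predictions are relevant and two of three relevant items are recovered, so the two metrics differ even though they share the same numerator.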
We introduce a PAC learning framework where each hypothesis is represented by
a graph, with edges indicating positive interactions, such as between users and
items. This framework subsumes the classical binary and multi-class PAC
learning models as well as multi-label learning with partial feedback, where
only a single random correct label per example is observed, rather than all
correct labels.
Our work uncovers a rich statistical and algorithmic landscape, with nuanced
boundaries on what can and cannot be learned. Notably, classical methods like
Empirical Risk Minimization fail in this setting, even for simple hypothesis
classes with only two hypotheses. To address these challenges, we develop novel
algorithms that learn exclusively from positive data, effectively minimizing
both precision and recall losses. Specifically, in the realizable setting, we
design algorithms that achieve optimal sample complexity guarantees. In the
agnostic case, we show that it is impossible to achieve additive error
guarantees–as is standard in PAC learning–and instead obtain meaningful
multiplicative approximations.
[LINK]
http://arxiv.org/abs/2411.13029v1
[DATE]
2024-11-20 12:21:07+08:00
[CATEGORIES]
cs.LG
A Theory for Compressibility of Graph Transformers for Transductive Learning
[AUTHORS]
Hamed Shirzad, Honghao Lin, Ameya Velingker, Balaji Venkatachalam, David Woodruff, Danica Sutherland
[ABSTRACT]
Transductive tasks on graphs differ fundamentally from typical supervised
machine learning tasks, as the independent and identically distributed (i.i.d.)
assumption does not hold among samples. Instead, all train/test/validation
samples are present during training, making them more akin to a semi-supervised
task. These differences make the analysis of the models substantially different
from other models. Recently, Graph Transformers have significantly improved
results on these datasets by overcoming long-range dependency problems.
However, the quadratic complexity of full Transformers has driven the community
to explore more efficient variants, such as those with sparser attention
patterns. While the attention matrix has been extensively discussed, the hidden
dimension or width of the network has received less attention. In this work, we
establish some theoretical bounds on how and under what conditions the hidden
dimension of these networks can be compressed. Our results apply to both sparse
and dense variants of Graph Transformers.
[LINK]
http://arxiv.org/abs/2411.13028v1
[DATE]
2024-11-20 12:20:17+08:00
[CATEGORIES]
cs.LG
Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support System
[AUTHORS]
Chollette Olisah, Lyndon Smith, Melvyn Smith, Lawrence Morolake, Osi Ojukwu
[ABSTRACT]
Crop yield prediction has been modeled on the assumption that there is no
interaction between weather and soil variables. However, this paper argues that
an interaction exists, and it can be finely modelled using the Kendall
Correlation coefficient. Given the nonlinearity of the interaction between
weather and soil variables, a deep neural network regressor (DNNR) is carefully
designed with consideration to the depth, number of neurons of the hidden
layers, and the hyperparameters with their optimizations. Additionally, a new
metric, the average of absolute root squared error (ARSE) is proposed to
combine the strengths of root mean square error (RMSE) and mean absolute error
(MAE). With the ARSE metric, the proposed DNNR(s), optimised random forest
regressor (RFR) and the extreme gradient boosting regressor (XGBR) achieved
impressively small yield errors, 0.0172 t/ha, and 0.0243 t/ha, 0.0001 t/ha, and
0.001 t/ha, respectively. However, with changes to the explanatory
variables to ensure generalizability to unforeseen data, the DNNR(s) performed
best. Further analysis reveals that a strong interaction does exist between
weather and soil variables. Precisely, yield is observed to increase when
precipitation is reduced and silt increased, and vice-versa. However, the
degree of decrease or increase is not quantified in this paper. Contrary to
existing yield models targeted towards agricultural policies and global food
security, the goal of the proposed corn yield model is to empower the
smallholder farmer to farm smartly and intelligently, thus the prediction model
is integrated into a mobile application that includes education, and a
farmer-to-market access module.
[COMMENTS]
30 Pages, 11 Figures, 3 Tables
[LINK]
http://arxiv.org/abs/2401.03768v3
[DATE]
2024-11-20 11:55:04+08:00
[CATEGORIES]
cs.LG
Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI
[AUTHORS]
Yaşar Utku Alçalar, Merve Gülle, Mehmet Akçakaya
[ABSTRACT]
Physics-driven deep learning (PD-DL) approaches have become popular for
improved reconstruction of fast magnetic resonance imaging (MRI) scans. Even
though PD-DL offers higher acceleration rates compared to existing clinical
fast MRI techniques, its use has been limited outside specialized MRI
centers. One impediment to its deployment is the difficulty of
generalizing to pathologies or population groups that are not
well-represented in training sets. This has been noted in several studies, and
fine-tuning on target populations to improve reconstruction has been suggested.
However, current approaches for PD-DL training require access to raw k-space
measurements, which is typically only available at specialized MRI centers that
have research agreements for such data access. This is especially an issue for
rural and underserved areas, where commercial MRI scanners only provide access
to a final reconstructed image. To tackle these challenges, we propose
Compressibility-inspired Unsupervised Learning via Parallel Imaging Fidelity
(CUPID) for high-quality PD-DL training, using only routine clinical
reconstructed images exported from an MRI scanner. CUPID evaluates the goodness
of the output with a compressibility-based approach, while ensuring that the
output stays consistent with the clinical parallel imaging reconstruction
through well-designed perturbations. Our results show that CUPID achieves
similar quality compared to well-established PD-DL training strategies that
require raw k-space data access, while outperforming conventional compressed
sensing (CS) and state-of-the-art generative methods. We also demonstrate its
effectiveness in a zero-shot training setup for retrospectively and
prospectively sub-sampled acquisitions, attesting to its minimal training
burden.
[LINK]
http://arxiv.org/abs/2411.13022v1
[DATE]
2024-11-20 11:53:41+08:00
[CATEGORIES]
cs.LG
Deriving Activation Functions via Integration
[AUTHORS]
Allen Hao Huang
[ABSTRACT]
Activation functions play a crucial role in introducing non-linearities to
deep neural networks. We propose a novel approach to designing activation
functions by focusing on their gradients and deriving the corresponding
functions through integration. Our work introduces the Expanded Integral of the
Exponential Linear Unit (xIELU), a trainable piecewise activation function
derived by integrating trainable affine transformations applied on the ELU
activation function. xIELU combines two key gradient properties: a trainable
and linearly increasing gradient for positive inputs, similar to ReLU$^2$, and
a trainable negative gradient flow for negative inputs, akin to xSiLU.
Conceptually, xIELU can be viewed as extending ReLU$^2$ to effectively handle
negative inputs. In experiments with 1.1B parameter Llama models trained on
126B tokens of FineWeb Edu, xIELU achieves lower perplexity compared to both
ReLU$^2$ and SwiGLU when matched for the same compute cost and parameter count.
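The idea of deriving an activation by integrating a chosen gradient can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single pair of scalars `a` and `b` for an affine transformation of ELU (with ELU's alpha fixed to 1), whereas the abstract describes separate trainable behavior for positive and negative inputs, so it is not the exact xIELU parameterization.

```python
import math


def elu(x):
    """ELU with alpha = 1: x for x > 0, exp(x) - 1 otherwise."""
    return x if x > 0 else math.expm1(x)


def affine_elu_gradient(x, a, b):
    """The gradient we choose to design: an affine transformation of ELU."""
    return a * elu(x) + b


def integrated_activation(x, a, b):
    """Closed-form integral of affine_elu_gradient from 0 to x.

    x > 0:  integral of (a*t + b)            = a*x^2/2 + b*x
    x <= 0: integral of (a*(exp(t)-1) + b)   = a*(exp(x) - 1 - x) + b*x
    """
    if x > 0:
        return 0.5 * a * x * x + b * x
    return a * (math.expm1(x) - x) + b * x


def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to check the closed form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

The positive branch is quadratic, so its gradient grows linearly as in ReLU$^2$, while the negative branch keeps a nonzero gradient instead of being clipped to zero.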
[LINK]
http://arxiv.org/abs/2411.13010v1
[DATE]
2024-11-20 11:24:21+08:00
[CATEGORIES]
cs.LG
MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification
[AUTHORS]
Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
[ABSTRACT]
We present MERLOT, a scalable mixture-of-experts (MoE) based refinement of a
distilled large language model optimized for encrypted traffic classification.
By applying model distillation techniques in a teacher-student paradigm,
compact models derived from GPT-2-base retain high classification accuracy
while minimizing computational costs. These models function as specialized
experts in an MoE architecture, dynamically assigned via a gating network.
Unlike generation-based methods, our approach directly classifies encrypted
traffic using the final decoder token with contextual feature embedding as
input. Experiments on 10 datasets show superior or competitive performance over
the state-of-the-art models while significantly reducing resource demands,
underscoring its effectiveness and robustness.
[LINK]
http://arxiv.org/abs/2411.13004v1
[DATE]
2024-11-20 11:01:41+08:00
[CATEGORIES]
cs.LG
Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition
[AUTHORS]
Liangqi Yuan, Yuan Wei, Jia Li
[ABSTRACT]
With the emphasis on healthcare, early childhood education, and fitness,
non-invasive measurement and recognition methods have received more attention.
Pressure sensing has been extensively studied because of its advantages of
simple structure, easy access, visualization application, and harmlessness.
This paper introduces a Smart Pressure e-Mat (SPeM) system based on
piezoresistive material, Velostat, for human monitoring applications, including
recognition of sleeping postures, sports, and yoga. After a subsystem scans the
e-mat readings and processes the signal, it generates a pressure image stream.
Deep neural networks (DNNs) are used to fit and train the pressure image stream
and recognize the corresponding human behavior. Four sleeping postures and 13
dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) are
used as a preliminary validation of the proposed SPeM system. The SPeM system
achieves high accuracies in both applications, demonstrating the high accuracy
and generalizability of the models. Compared with other pressure sensor-based
systems, SPeM possesses more flexible applications and commercial application
prospects, with reliable, robust, and repeatable properties.
[LINK]
http://arxiv.org/abs/2305.11367v2
[DATE]
2024-11-20 10:47:25+08:00
[CATEGORIES]
cs.LG
Eliminating Ratio Bias for Gradient-based Simulated Parameter Estimation
[AUTHORS]
Zehao Li, Yijie Peng
[ABSTRACT]
This article addresses the challenge of parameter calibration in stochastic
models where the likelihood function is not analytically available. We propose
a gradient-based simulated parameter estimation framework, leveraging a
multi-time scale algorithm that tackles the issue of ratio bias in both maximum
likelihood estimation and posterior density estimation problems. Additionally,
we introduce a nested simulation optimization structure, providing theoretical
analyses including strong convergence, asymptotic normality, convergence rate,
and budget allocation strategies for the proposed algorithm. The framework is
further extended to neural network training, offering a novel perspective on
stochastic approximation in machine learning. Numerical experiments show that
our algorithm can improve the estimation accuracy and save computational costs.
[LINK]
http://arxiv.org/abs/2411.12995v1
[DATE]
2024-11-20 10:46:15+08:00
[CATEGORIES]
cs.LG
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
[AUTHORS]
Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel J. Kochenderfer
[COMMENTS]
Accepted as a Spotlight Poster to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12990v1
[DATE]
2024-11-20 10:38:24+08:00
[CATEGORIES]
cs.LG
QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models
[AUTHORS]
Zhumazhan Balapanov, Vanessa Matvei, Olivia Holmberg, Edward Magongo, Jonathan Pei, Kevin Zhu
[COMMENTS]
Accepted to NeurIPS 2024 workshop on Neural Compression
[LINK]
http://arxiv.org/abs/2410.10318v2
[DATE]
2024-11-20 10:37:27+08:00
[CATEGORIES]
cs.LG
On Diffusion Models for Multi-Agent Partial Observability: Shared Attractors, Error Bounds, and Composite Flow
[AUTHORS]
Tonghan Wang, Heng Dong, Yanchen Jiang, David C. Parkes, Milind Tambe
[ABSTRACT]
Multiagent systems grapple with partial observability (PO), and the
decentralized POMDP (Dec-POMDP) model highlights the fundamental nature of this
challenge. Whereas recent approaches to addressing PO have appealed to deep
learning models, providing a rigorous understanding of how these models and
their approximation errors affect agents’ handling of PO and their interactions
remains a challenge. In addressing this challenge, we investigate reconstructing
global states from local action-observation histories in Dec-POMDPs using
diffusion models. We first find that diffusion models conditioned on local
history represent possible states as stable fixed points. In collectively
observable (CO) Dec-POMDPs, individual diffusion models conditioned on agents’
local histories share a unique fixed point corresponding to the global state,
while in non-CO settings, the shared fixed points yield a distribution of
possible states given joint history. We further find that, with deep learning
approximation errors, fixed points can deviate from true states and the
deviation is negatively correlated with the Jacobian rank. Inspired by this
low-rank property, we bound the deviation by constructing a surrogate linear
regression model that approximates the local behavior of diffusion models. With
this bound, we propose a composite diffusion process iterating over agents with
theoretical convergence guarantees to the true state.
[LINK]
http://arxiv.org/abs/2410.13953v2
[DATE]
2024-11-20 10:05:31+08:00
[CATEGORIES]
cs.LG
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes
[AUTHORS]
Sihan Zeng, Thinh T. Doan, Justin Romberg
[ABSTRACT]
We consider a discounted cost constrained Markov decision process (CMDP)
policy optimization problem, in which an agent seeks to maximize a discounted
cumulative reward subject to a number of constraints on discounted cumulative
utilities. To solve this constrained optimization program, we study an online
actor-critic variant of a classic primal-dual method where the gradients of
both the primal and dual functions are estimated using samples from a single
trajectory generated by the underlying time-varying Markov processes. This
online primal-dual natural actor-critic algorithm maintains and iteratively
updates three variables: a dual variable (or Lagrangian multiplier), a primal
variable (or actor), and a critic variable used to estimate the gradients of
both primal and dual variables. These variables are updated simultaneously but
on different time scales (using different step sizes) and they are all
intertwined with each other. Our main contribution is to derive a finite-time
analysis for the convergence of this algorithm to the global optimum of a CMDP
problem. Specifically, we show that with a proper choice of step sizes the
optimality gap and constraint violation converge to zero in expectation at a
rate $\mathcal{O}(1/K^{1/6})$, where K is the number of iterations. To our
knowledge, this paper is the first to study the finite-time complexity of an
online primal-dual actor-critic method for solving a CMDP problem. We also
validate the effectiveness of this algorithm through numerical simulations.
[LINK]
http://arxiv.org/abs/2110.11383v3
[DATE]
2024-11-20 09:59:39+08:00
[CATEGORIES]
cs.LG
Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations
[AUTHORS]
Runlong Yu, Chonghao Qiu, Robert Ladwig, Paul C. Hanson, Yiqun Xie, Yanhua Li, Xiaowei Jia
[ABSTRACT]
This paper introduces a \textit{Process-Guided Learning (Pril)} framework
that integrates physical models with recurrent neural networks (RNNs) to
enhance the prediction of dissolved oxygen (DO) concentrations in lakes, which
is crucial for sustaining water quality and ecosystem health. Unlike
traditional RNNs, which may deliver high accuracy but often lack physical
consistency and broad applicability, the \textit{Pril} method incorporates
differential DO equations for each lake layer, modeling it as a first-order
linear solution using a forward Euler scheme with a daily timestep. However,
this method is sensitive to numerical instabilities. When drastic fluctuations
occur, the numerical integration is neither mass-conservative nor stable.
Especially during stratified conditions, exogenous fluxes into each layer cause
significant within-day changes in DO concentrations. To address this challenge,
we further propose an \textit{Adaptive Process-Guided Learning (April)} model,
which dynamically adjusts timesteps from daily to sub-daily intervals with the
aim of mitigating the discrepancies caused by variations in entrainment fluxes.
\textit{April} uses a generator-discriminator architecture to identify days
with significant DO fluctuations and employs a multi-step Euler scheme with
sub-daily timesteps to effectively manage these variations. We have tested our
methods on a wide range of lakes in the Midwestern USA, and demonstrated robust
capability in predicting DO concentrations even with limited training data.
While primarily focused on aquatic ecosystems, this approach is broadly
applicable to diverse scientific and engineering disciplines that utilize
process-based models, such as power engineering, climate science, and
biomedicine.
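The numerical instability of a daily forward Euler step, and the effect of sub-daily steps, can be sketched on a toy first-order linear equation. The relaxation equation dC/dt = k (C_eq - C) and all parameter values are hypothetical stand-ins for the lake-layer DO equations; the sketch uses a fixed number of steps per day and does not include April's generator-discriminator selection of which days to refine.

```python
def euler_relaxation(c0, c_eq, k, days, steps_per_day):
    """Forward Euler for dC/dt = k * (c_eq - C), with time measured in days."""
    dt = 1.0 / steps_per_day
    c = c0
    for _ in range(days * steps_per_day):
        c += dt * k * (c_eq - c)
    return c


# Hypothetical fast exchange rate: k * dt > 2 makes the daily step unstable.
k, c0, c_eq, days = 2.5, 2.0, 8.0, 10

daily = euler_relaxation(c0, c_eq, k, days, steps_per_day=1)
sub_daily = euler_relaxation(c0, c_eq, k, days, steps_per_day=4)
```

Each Euler step multiplies the distance to equilibrium by (1 - k * dt). With a daily step that factor is -1.5, so the error grows and flips sign every day; with four steps per day it is 0.375 and the solution settles at the equilibrium concentration.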
[LINK]
http://arxiv.org/abs/2411.12973v1
[DATE]
2024-11-20 09:58:20+08:00
[CATEGORIES]
cs.LG
A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction
[AUTHORS]
Yuan Yuan, Jingtao Ding, Chonghua Han, Depeng Jin, Yong Li
[ABSTRACT]
Urban spatio-temporal flow prediction, encompassing traffic flows and crowd
flows, is crucial for optimizing city infrastructure and managing traffic and
emergency responses. Traditional approaches have relied on separate models
tailored to either grid-based data, representing cities as uniform cells, or
graph-based data, modeling cities as networks of nodes and edges. In this
paper, we build UniFlow, a foundational model for general urban flow prediction
that unifies both grid-based and graph-based data. We first design a multi-view
spatio-temporal patching mechanism to standardize different data into a
consistent sequential format and then introduce a spatio-temporal transformer
architecture to capture complex correlations and dynamics. To leverage shared
spatio-temporal patterns across different data types and facilitate effective
cross-learning, we propose SpatioTemporal Memory Retrieval Augmentation
(ST-MRA). By creating structured memory modules to store shared spatio-temporal
patterns, ST-MRA enhances predictions through adaptive memory retrieval.
Extensive experiments demonstrate that UniFlow outperforms existing models in
both grid-based and graph-based flow prediction, excelling particularly in
scenarios with limited data availability, showcasing its superior performance
and broad applicability. The datasets and code implementation have been
released on https://github.com/YuanYuan98/UniFlow.
[LINK]
http://arxiv.org/abs/2411.12972v1
[DATE]
2024-11-20 09:54:52+08:00
[CATEGORIES]
cs.LG
On adaptivity and minimax optimality of two-sided nearest neighbors
[AUTHORS]
Tathagata Sadhukhan, Manit Paul, Raaz Dwivedi
[ABSTRACT]
Nearest neighbor (NN) algorithms have been extensively used for missing data
problems in recommender systems and sequential decision-making systems. Prior
theoretical analysis has established favorable guarantees for NN when the
underlying data is sufficiently smooth and the missingness probabilities are
lower bounded. Here we analyze NN with non-smooth non-linear functions with
vast amounts of missingness. In particular, we consider matrix completion
settings where the entries of the underlying matrix follow a latent non-linear
factor model, with the non-linearity belonging to a Hölder function class that
is less smooth than Lipschitz. Our results establish the following favorable
properties for a suitable two-sided NN: (1) The mean squared error (MSE) of NN
adapts to the smoothness of the non-linearity, (2) under certain regularity
conditions, the NN error rate matches the rate obtained by an oracle equipped
with the knowledge of both the row and column latent factors, and finally (3)
NN’s MSE is non-trivial for a wide range of settings even when several matrix
entries might be missing deterministically. We support our theoretical findings
via extensive numerical simulations and a case study with data from a mobile
health study, HeartSteps.
[COMMENTS]
29 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.12965v1
[DATE]
2024-11-20 09:40:53+08:00
[CATEGORIES]
cs.LG
Quantum neural networks form Gaussian processes
[AUTHORS]
Diego García-Martín, Martin Larocca, M. Cerezo
[ABSTRACT]
It is well known that artificial neural networks initialized from independent
and identically distributed priors converge to Gaussian processes in the limit
of a large number of neurons per hidden layer. In this work we prove an
analogous result for Quantum Neural Networks (QNNs). Namely, we show that the
outputs of certain models based on Haar random unitary or orthogonal deep QNNs
converge to Gaussian processes in the limit of large Hilbert space dimension
$d$. The derivation of this result is more nuanced than in the classical case
due to the role played by the input states, the measurement observable, and the
fact that the entries of unitary matrices are not independent. Then, we show
that the efficiency of predicting measurements at the output of a QNN using
Gaussian process regression depends on the observable’s bodyness. Furthermore,
our theorems imply that the concentration of measure phenomenon in Haar random
QNNs is worse than previously thought, as we prove that expectation values and
gradients concentrate as $\mathcal{O}\left(\frac{1}{e^d \sqrt{d}}\right)$.
Finally, we discuss how our results improve our understanding of concentration
in $t$-designs.
[COMMENTS]
14+37 pages, 4+6 figures
[LINK]
http://arxiv.org/abs/2305.09957v3
[DATE]
2024-11-20 09:12:04+08:00
[CATEGORIES]
cs.LG
FengWu-W2S: A deep learning model for seamless weather-to-subseasonal forecast of global atmosphere
[AUTHORS]
Fenghua Ling, Kang Chen, Jiye Wu, Tao Han, Jing-Jia Luo, Wanli Ouyang, Lei Bai
[ABSTRACT]
Seamless forecasting that produces warning information across a continuum of
timescales from a single system is a long-standing pursuit for
weather-climate services. While the rapid advancement of deep learning has
induced revolutionary changes in the classical forecasting field, current efforts
are still focused on building separate AI models for weather and climate
forecasts. To explore the seamless forecasting ability based on one AI model,
we propose FengWu-Weather to Subseasonal (FengWu-W2S), which builds on the
FengWu global weather forecast model and incorporates an ocean-atmosphere-land
coupling structure along with a diverse perturbation strategy. FengWu-W2S can
generate 6-hourly atmospheric forecasts extending up to 42 days in an
autoregressive and seamless manner. Our hindcast results demonstrate that
FengWu-W2S reliably predicts atmospheric conditions 3-6 weeks ahead,
enhancing predictive capabilities for global surface air temperature,
precipitation, geopotential height and intraseasonal signals such as the
Madden-Julian Oscillation (MJO) and North Atlantic Oscillation (NAO). Moreover,
our ablation experiments on forecast error growth from daily to seasonal
timescales reveal potential pathways for developing an AI-based integrated system
for seamless weather-climate forecasting in the future.
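
The autoregressive rollout mentioned above can be illustrated with a minimal sketch. Only the 6-hour step and the 42-day horizon come from the abstract; `rollout` and `toy_step` are hypothetical names, and the scalar "state" is a stand-in for a full atmospheric field, not the FengWu-W2S model.

```python
from typing import Callable, List

STEP_HOURS = 6      # forecast interval stated in the abstract
HORIZON_DAYS = 42   # maximum lead time stated in the abstract


def rollout(step: Callable[[float], float], state: float, n_steps: int) -> List[float]:
    """Feed each prediction back in as the next input (autoregression)."""
    trajectory = []
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


def toy_step(x: float) -> float:
    """Hypothetical one-step model: a damped scalar update (assumption)."""
    return 0.5 * x


# Number of 6-hourly steps needed to cover 42 days.
n_steps = HORIZON_DAYS * 24 // STEP_HOURS
forecast = rollout(toy_step, 1.0, n_steps)
```

Because every step consumes the previous output, errors compound with lead time, which is why the abstract's ablation on forecast error growth matters for long horizons.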
[COMMENTS]
23 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.10191v2
[DATE]
2024-11-20 09:10:15+08:00
[CATEGORIES]
cs.LG
Machine learned reconstruction of tsunami dynamics from sparse observations
[AUTHORS]
Edward McDugald, Arvind Mohan, Darren Engwirda, Agnese Marcato, Javier Santos
[ABSTRACT]
We investigate the use of the Senseiver, a transformer neural network
designed for sparse sensing applications, to estimate full-field surface height
measurements of tsunami waves from sparse observations. The model is trained on
a large ensemble of simulated data generated via a shallow water equations
solver, which we show to be a faithful reproduction of the underlying dynamics
by comparison to historical events. We train the model on a dataset consisting
of 8 tsunami simulations whose epicenters correspond to historical USGS
earthquake records, and where the model inputs are restricted to measurements
obtained at actively deployed buoy locations. We test the Senseiver on a
dataset consisting of 8 simulations not included in training, demonstrating its
capability for extrapolation. The results show remarkable resolution of fine
scale phase and amplitude features from the true field, provided that at least
a few of the sensors have obtained a non-zero signal. Throughout, we discuss
which forecasting techniques can be improved by this method, and suggest ways
in which the flexibility of the architecture can be leveraged to incorporate
arbitrary remote sensing data (e.g., HF Radar and satellite measurements) as well
as investigate optimal sensor placements.
[LINK]
http://arxiv.org/abs/2411.12948v1
[DATE]
2024-11-20 08:42:40+08:00
[CATEGORIES]
cs.LG
Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity
[AUTHORS]
Wassim El Ahmar, Dhanvin Kolhatkar, Farzan Nowruzi, Robert Laganiere
[ABSTRACT]
Multiple Object Tracking (MOT) in thermal imaging presents unique challenges
due to the lack of visual features and the complexity of motion patterns. This
paper introduces an innovative approach to improve MOT in the thermal domain by
developing a novel box association method that utilizes both thermal object
identity and motion similarity. Our method merges thermal feature sparsity and
dynamic object tracking, enabling more accurate and robust MOT performance.
Additionally, we present a new dataset comprised of a large-scale collection of
thermal and RGB images captured in diverse urban environments, serving as both
a benchmark for our method and a new resource for thermal imaging. We conduct
extensive experiments to demonstrate the superiority of our approach over
existing methods, showing significant improvements in tracking accuracy and
robustness under various conditions. Our findings suggest that incorporating
thermal identity with motion data enhances MOT performance. The newly collected
dataset and source code are available at https://github.com/wassimea/thermalMOT
[COMMENTS]
Workshop on Towards a Complete Analysis of People, part of the
European Conference on Computer Vision (ECCV) 2024
[LINK]
http://arxiv.org/abs/2411.12943v1
[DATE]
2024-11-20 08:27:01+08:00
[CATEGORIES]
cs.LG
On the relationship between Koopman operator approximations and neural ordinary differential equations for data-driven time-evolution predictions
[AUTHORS]
Jake Buzhardt, C. Ricardo Constante-Amores, Michael D. Graham
[ABSTRACT]
This work explores the relationship between state space methods and Koopman
operator-based methods for predicting the time-evolution of nonlinear dynamical
systems. We demonstrate that extended dynamic mode decomposition with
dictionary learning (EDMD-DL), when combined with a state space projection, is
equivalent to a neural network representation of the nonlinear discrete-time
flow map on the state space. We highlight how this projection step introduces
nonlinearity into the evolution equations, enabling significantly improved
EDMD-DL predictions. With this projection, EDMD-DL leads to a nonlinear
dynamical system on the state space, which can be represented in either
discrete or continuous time. This system has a natural structure for neural
networks, where the state is first expanded into a high dimensional feature
space followed by a linear mapping which represents the discrete-time map or
the vector field as a linear combination of these features. Inspired by these
observations, we implement several variations of neural ordinary differential
equations (ODEs) and EDMD-DL, developed by combining different aspects of their
respective model structures and training procedures. We evaluate these methods
using numerical experiments on chaotic dynamics in the Lorenz system and a
nine-mode model of turbulent shear flow, showing comparable performance across
methods in terms of short-time trajectory prediction, reconstruction of
long-time statistics, and prediction of rare events. We also show that these
methods provide comparable performance to a non-Markovian approach in terms of
prediction of extreme events.
[LINK]
http://arxiv.org/abs/2411.12940v1
[DATE]
2024-11-20 08:18:46+08:00
[CATEGORIES]
cs.LG
A community palm model
[AUTHORS]
Nicholas Clinton, Andreas Vollrath, Remi D’annunzio, Desheng Liu, Henry B. Glick, Adrià Descals, Alicia Sullivan, Oliver Guinan, Jacob Abramowitz, Fred Stolle, Chris Goodman, Tanya Birch, David Quinn, Olga Danylo, Tijs Lips, Daniel Coelho, Enikoe Bihari, Bryce Cronkite-Ratcliff, Ate Poortinga, Atena Haghighattalab, Evan Notman, Michael DeWitt, Aaron Yonas, Gennadii Donchyts, Devaja Shah, David Saah, Karis Tenneson, Nguyen Hanh Quyen, Megha Verma, Andrew Wilcox
[ABSTRACT]
Palm oil production has been identified as one of the major drivers of
deforestation for tropical countries. To meet supply chain objectives,
commodity producers and other stakeholders need timely information on land
cover dynamics in their supply shed. However, such data are difficult to obtain
from suppliers who may lack digital geographic representations of their supply
sheds and production locations. Here we present a “community model,” a machine
learning model trained on pooled data sourced from many different stakeholders,
to produce a map of palm probability at global scale. Advantages of this
method include the use of varied inputs and the ability to easily update the
model as new training data become available and to run it for any year in which
input imagery is available. Inclusion of diverse data sources in one
probability map can help establish a shared understanding across stakeholders
on the presence and absence of a land cover or commodity (in this case oil
palm). The model predictors are annual composites built from publicly available
satellite imagery provided by Sentinel-1, Sentinel-2, and ALOS-2, and terrain
data from JAXA (AW3D30) and Copernicus (GLO-30). We provide map outputs as the
probability of palm in a given pixel, to reflect the uncertainty of the
underlying state (palm or not palm). This version of the model has an estimated
global accuracy of 92% (at a 0.5 probability threshold) on an independent
test set. The model and the resulting oil palm probability map products are
useful for accurately identifying the geographic footprint of palm cultivation.
Used in conjunction with timely deforestation information, this palm model is
useful for understanding the risk of continued oil palm plantation expansion in
sensitive forest areas.
[COMMENTS]
v03
[LINK]
http://arxiv.org/abs/2405.09530v2
[DATE]
2024-11-20 08:10:28+08:00
[CATEGORIES]
cs.LG
Improving Low-Fidelity Models of Li-ion Batteries via Hybrid Sparse Identification of Nonlinear Dynamics
[AUTHORS]
Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Prashanth Ramesh, Marcello Canova
[ABSTRACT]
Accurate modeling of lithium-ion (Li-ion) batteries is essential for
enhancing the safety and efficiency of electric vehicles and renewable energy
systems. This paper presents a data-inspired approach for improving the
fidelity of reduced-order Li-ion battery models. The proposed method combines a
Genetic Algorithm with Sequentially Thresholded Ridge Regression (GA-STRidge)
to identify and compensate for discrepancies between a low-fidelity model (LFM)
and data generated either from testing or a high-fidelity model (HFM). The
hybrid model, combining physics-based and data-driven methods, is tested across
different driving cycles to demonstrate the ability to significantly reduce the
voltage prediction error compared to the baseline LFM, while preserving
computational efficiency. The model robustness is also evaluated under various
operating conditions, showing low prediction errors and high Pearson
correlation coefficients for terminal voltage in unseen environments.
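
As a rough illustration of the regression component named above, the following sketch implements sequentially thresholded ridge regression in its commonly described form: fit a ridge model, zero out coefficients below a threshold, and refit on the surviving terms. It does not cover the Genetic Algorithm part, and it is not the authors' implementation; the function names, the ridge weight `lam`, and the `threshold` value are all assumptions.

```python
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge(X, y, cols, lam):
    """Ridge fit restricted to the listed feature columns."""
    A = [[sum(row[p] * row[q] for row in X) + (lam if p == q else 0.0)
          for q in cols] for p in cols]
    b = [sum(row[p] * yi for row, yi in zip(X, y)) for p in cols]
    return solve(A, b)


def stridge(X, y, lam=1e-6, threshold=0.1, max_iter=10):
    """Alternate ridge fits with hard thresholding of small coefficients."""
    n_features = len(X[0])
    active = list(range(n_features))
    coef = [0.0] * n_features
    for _ in range(max_iter):
        if not active:
            break
        weights = ridge(X, y, active, lam)
        coef = [0.0] * n_features
        for j, w in zip(active, weights):
            coef[j] = w
        kept = [j for j in active if abs(coef[j]) >= threshold]
        if kept == active:
            break  # support is stable, so the last fit is final
        active = kept
    # Anything outside the final support is reported as exactly zero.
    return [c if j in active else 0.0 for j, c in enumerate(coef)]
```

In a hybrid model of the kind the abstract describes, the columns of `X` would be candidate correction terms and `y` the discrepancy between the low-fidelity model and the reference data; that mapping is an interpretation, not something the abstract spells out.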
[COMMENTS]
6 pages
[LINK]
http://arxiv.org/abs/2411.12935v1
[DATE]
2024-11-20 08:00:11+08:00
[CATEGORIES]
cs.LG
Automata Learning from Preference and Equivalence Queries
[AUTHORS]
Eric Hsiung, Joydeep Biswas, Swarat Chaudhuri
[ABSTRACT]
Active automata learning from membership and equivalence queries is a
foundational problem with numerous applications. We propose a novel variant of
the active automata learning problem: actively learn finite automata using
preference queries – i.e., queries about the relative position of two
sequences in a total order – instead of membership queries. Our solution is
REMAP, a novel algorithm which leverages a symbolic observation table along
with unification and constraint solving to navigate a space of symbolic
hypotheses (each representing a set of automata), and uses
satisfiability-solving to construct a concrete automaton from a symbolic
hypothesis. REMAP is guaranteed to correctly infer the minimal automaton with
polynomial query complexity under exact equivalence queries, and achieves
PAC-identification ($\varepsilon$-approximate, with high probability) of the
minimal automaton using sampling-based equivalence queries. Our empirical
evaluations of REMAP on the task of learning reward machines for two
reinforcement learning domains indicate REMAP scales to large automata and is
effective at learning correct automata from consistent teachers, under both
exact and sampling-based equivalence queries.
[COMMENTS]
29 pages, 12 figures
[LINK]
http://arxiv.org/abs/2308.09301v2
[DATE]
2024-11-20 07:57:24+08:00
[CATEGORIES]
cs.LG
LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits
[AUTHORS]
Dimple Vijay Kochar, Hanrui Wang, Anantha Chandrakasan, Xin Zhang
[ABSTRACT]
Traditional approaches for designing analog circuits are time-consuming and
require significant human expertise. Existing automation efforts using methods
like Bayesian Optimization (BO) and Reinforcement Learning (RL) are sub-optimal
and costly to generalize across different topologies and technology nodes. In
our work, we introduce a novel approach, LEDRO, utilizing Large Language Models
(LLMs) in conjunction with optimization techniques to iteratively refine the
design space for analog circuit sizing. LEDRO is highly generalizable compared
to other RL and BO baselines, eliminating the need for design annotation or
model training for different topologies or technology nodes. We conduct a
comprehensive evaluation of our proposed framework and baselines on 22 different
Op-Amp topologies across four FinFET technology nodes. Results demonstrate the
superior performance of LEDRO as it outperforms our best baseline by an average
of 13% FoM improvement with 2.15x speed-up on low complexity Op-Amps and 48%
FoM improvement with 1.7x speed-up on high complexity Op-Amps. This highlights
LEDRO’s effective performance, efficiency, and generalizability.
[LINK]
http://arxiv.org/abs/2411.12930v1
[DATE]
2024-11-20 07:43:25+08:00
[CATEGORIES]
cs.LG
Enhancing Deep Learning-Driven Multi-Coil MRI Reconstruction via Self-Supervised Denoising
[AUTHORS]
Asad Aali, Marius Arvinte, Sidharth Kumar, Yamin I. Arefeen, Jonathan I. Tamir
[ABSTRACT]
We examine the effect of incorporating self-supervised denoising as a
pre-processing step for training deep learning (DL) based reconstruction
methods on data corrupted by Gaussian noise. K-space data employed for training
are typically multi-coil and inherently noisy. Although DL-based reconstruction
methods trained on fully sampled data can enable high reconstruction quality,
obtaining large, noise-free datasets is impractical. We leverage Generalized
Stein’s Unbiased Risk Estimate (GSURE) for denoising. We evaluate two DL-based
reconstruction methods: Diffusion Probabilistic Models (DPMs) and Model-Based
Deep Learning (MoDL). We evaluate the impact of denoising on the performance of
these DL-based methods in solving accelerated multi-coil magnetic resonance
imaging (MRI) reconstruction. The experiments were carried out on T2-weighted
brain and fat-suppressed proton-density knee scans. We observed that
self-supervised denoising enhances the quality and efficiency of MRI
reconstructions across various scenarios. Specifically, employing denoised
images rather than noisy counterparts when training DL networks results in
lower normalized root mean squared error (NRMSE), higher structural similarity
index measure (SSIM) and peak signal-to-noise ratio (PSNR) across different SNR
levels, including 32dB, 22dB, and 12dB for T2-weighted brain data, and 24dB,
14dB, and 4dB for fat-suppressed knee data. Overall, we showed that denoising
is an essential pre-processing technique capable of improving the efficacy of
DL-based MRI reconstruction methods under diverse conditions. By refining the
quality of input data, denoising can enable the training of more effective DL
networks, potentially bypassing the need for noise-free reference MRI scans.
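
Two of the metrics named above can be written down in a few lines. This is a sketch under stated assumptions: NRMSE is normalized here by the L2 norm of the reference, and PSNR uses the peak magnitude of the reference; both conventions vary between papers, and the abstract does not say which ones the authors use. Images are flattened to plain lists for simplicity.

```python
import math
from typing import Sequence


def nrmse(est: Sequence[float], ref: Sequence[float]) -> float:
    """Error norm divided by reference norm (one common normalization)."""
    err = math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)))
    return err / math.sqrt(sum(r * r for r in ref))


def psnr_db(est: Sequence[float], ref: Sequence[float]) -> float:
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    mse = sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)
    if mse == 0.0:
        return math.inf  # identical signals
    peak = max(abs(r) for r in ref)
    return 10.0 * math.log10(peak ** 2 / mse)
```

Lower NRMSE and higher PSNR both indicate a reconstruction closer to the reference, which is the direction of improvement the abstract reports.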
[LINK]
http://arxiv.org/abs/2411.12919v1
[DATE]
2024-11-20 07:17:09+08:00
[CATEGORIES]
cs.LG
Trojan Cleansing with Neural Collapse
[AUTHORS]
Xihe Gu, Greg Fields, Yaman Jandali, Tara Javidi, Farinaz Koushanfar
[ABSTRACT]
Trojan attacks are sophisticated training-time attacks on neural networks
that embed backdoor triggers which force the network to produce a specific
output on any input which includes the trigger. With the increasing relevance
of deep networks which are too large to train with personal resources and which
are trained on data too large to thoroughly audit, these training-time attacks
pose a significant risk. In this work, we connect trojan attacks to Neural
Collapse, a phenomenon wherein the final feature representations of
over-parameterized neural networks converge to a simple geometric structure. We
provide experimental evidence that trojan attacks disrupt this convergence for
a variety of datasets and architectures. We then use this disruption to design
a lightweight, broadly generalizable mechanism for cleansing trojan attacks
from a wide variety of different network architectures and experimentally
demonstrate its efficacy.
[LINK]
http://arxiv.org/abs/2411.12914v1
[DATE]
2024-11-20 06:57:40+08:00
[CATEGORIES]
cs.LG
MLDGG: Meta-Learning for Domain Generalization on Graphs
[AUTHORS]
Qin Tian, Chen Zhao, Minglai Shao, Wenjun Wang, Yujie Lin, Dong Li
[ABSTRACT]
Domain generalization on graphs aims to develop models with robust
generalization capabilities, ensuring effective performance on the testing set
despite disparities between testing and training distributions. However,
existing methods often rely on static encoders directly applied to the target
domain, constraining their adaptability. In contrast to conventional
methodologies, which concentrate on developing specific generalized models, our
framework, MLDGG, endeavors to achieve adaptable generalization across diverse
domains by integrating cross-multi-domain meta-learning with structure learning
and semantic identification. Initially, it introduces a generalized structure
learner to mitigate the adverse effects of task-unrelated edges, enhancing the
comprehensiveness of representations learned by Graph Neural Networks (GNNs)
while capturing shared structural information across domains. Subsequently, a
representation learner is designed to disentangle domain-invariant semantic and
domain-specific variation information in node embedding by leveraging causal
reasoning for semantic identification, further enhancing generalization. In the
context of meta-learning, meta-parameters for both learners are optimized to
facilitate knowledge transfer and enable effective adaptation to graphs through
fine-tuning within the target domains, where target graphs are inaccessible
during training. Our empirical results demonstrate that MLDGG surpasses
baseline methods, showcasing its effectiveness in three different distribution
shift settings.
[COMMENTS]
Accepted in KDD 2025 (research track)
[LINK]
http://arxiv.org/abs/2411.12913v1
[DATE]
2024-11-20 06:57:38+08:00
[CATEGORIES]
cs.LG
Problem-dependent convergence bounds for randomized linear gradient compression
[AUTHORS]
Thomas Flynn, Patrick Johnstone, Shinjae Yoo
[ABSTRACT]
In distributed optimization, the communication of model updates can be a
performance bottleneck. Consequently, gradient compression has been proposed as
a means of increasing optimization throughput. In general, due to information
loss, compression introduces a penalty on the number of iterations needed to
reach a solution. In this work, we investigate how the iteration penalty
depends on the interaction between compression and problem structure, in the
context of non-convex stochastic optimization. We focus on linear compression
schemes, where compression and decompression can be modeled as multiplication
with a random matrix. We consider several distributions of matrices, among them
random orthogonal matrices and matrices with random Gaussian entries. We find
that in each case, the impact of compression on convergence can be quantified
in terms of the norm of the Hessian of the objective, using a norm defined by
the compression scheme. The analysis reveals that in certain cases, compression
performance is related to low-rank structure or other spectral properties of
the problem. In these cases, our bounds predict that the penalty introduced by
compression is significantly reduced compared to worst-case bounds that only
consider the compression level, ignoring problem data. We verify the
theoretical findings on several optimization problems, including fine-tuning an
image classification model.
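
The linear compression scheme described above can be sketched as follows, using a matrix with Gaussian entries (one of the distributions the abstract names). Compression multiplies the gradient by a random k-by-d matrix S; decompression multiplies by the transpose. The 1/k scaling, which makes the reconstruction unbiased for standard normal entries, and all function names are assumptions for illustration, not the paper's notation.

```python
import random
from typing import List

Matrix = List[List[float]]


def gaussian_matrix(k: int, d: int, rng: random.Random) -> Matrix:
    """k x d matrix with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def compress(S: Matrix, g: List[float]) -> List[float]:
    """Send S g (k numbers) instead of g (d numbers)."""
    return [sum(s * x for s, x in zip(row, g)) for row in S]


def decompress(S: Matrix, c: List[float]) -> List[float]:
    """Reconstruct with S^T c / k; unbiased because E[S^T S] = k I."""
    k, d = len(S), len(S[0])
    return [sum(S[i][j] * c[i] for i in range(k)) / k for j in range(d)]


def mean_reconstruction(g: List[float], k: int, trials: int, seed: int = 0) -> List[float]:
    """Average many independent reconstructions to check unbiasedness."""
    rng = random.Random(seed)
    total = [0.0] * len(g)
    for _ in range(trials):
        S = gaussian_matrix(k, len(g), rng)
        g_hat = decompress(S, compress(S, g))
        total = [t + x for t, x in zip(total, g_hat)]
    return [t / trials for t in total]
```

A single reconstruction is noisy, and that noise is the source of the iteration penalty the abstract analyzes; only the average over many draws recovers the original gradient.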
[COMMENTS]
15 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.12898v1
[DATE]
2024-11-20 06:26:42+08:00
[CATEGORIES]
cs.LG
Tree Species Classification using Machine Learning and 3D Tomographic SAR – a case study in Northern Europe
[AUTHORS]
Colverd Grace, Schade Laura, Takami Jumpei, Bot Karol, Gallego Joseph
[ABSTRACT]
Tree species classification plays an important role in nature conservation,
forest inventories, forest management, and the protection of endangered
species. Over the past four decades, remote sensing technologies have been
extensively utilized for tree species classification, with Synthetic Aperture
Radar (SAR) emerging as a key technique. In this study, we employed TomoSense,
a 3D tomographic dataset, which utilizes a stack of single-look complex (SLC)
images, a byproduct of SAR, captured at different incidence angles to generate
a three-dimensional representation of the terrain. Our research focuses on
evaluating multiple tabular machine-learning models using the height
information derived from the tomographic image intensities to classify eight
distinct tree species. The SLC data and tomographic imagery were analyzed
across different polarimetric configurations and geosplit configurations. We
investigated the impact of these variations on classification accuracy,
comparing the performance of various tabular machine-learning models and
optimizing them using Bayesian optimization. Additionally, we incorporated a
proxy for actual tree height using point cloud data from Light Detection and
Ranging (LiDAR) to provide height statistics associated with the model’s
predictions. This comparison offers insights into the reliability of
tomographic data in predicting tree species classification based on height.
[LINK]
http://arxiv.org/abs/2411.12897v1
[DATE]
2024-11-20 06:25:26+08:00
[CATEGORIES]
cs.LG
Transfer Learning on Transformers for Building Energy Consumption Forecasting – A Comparative Study
[AUTHORS]
Robert Spencer, Surangika Ranathunga, Mikael Boulic, Andries van Heerden, Teo Susnjak
[ABSTRACT]
This study investigates the application of Transfer Learning (TL) on
Transformer architectures to enhance building energy consumption forecasting.
Transformers are a relatively new deep learning architecture, which has served
as the foundation for groundbreaking technologies such as ChatGPT. While TL has
been studied in the past, prior studies considered either one data-centric TL
strategy or used older deep learning models such as Recurrent Neural Networks
or Convolutional Neural Networks. Here, we carry out an extensive empirical
study on six different data-centric TL strategies and analyse their performance
under varying feature spaces. In addition to the vanilla Transformer
architecture, we also experiment with Informer and PatchTST, specifically
designed for time series forecasting. We use 16 datasets from the Building Data
Genome Project 2 to create building energy consumption forecasting models.
Experimental results reveal that while TL is generally beneficial, especially
when the target domain has no data, careful selection of the exact TL strategy
should be made to gain the maximum benefit. This decision largely depends on
the feature space properties such as the recorded weather features. We also
note that PatchTST outperforms the other two Transformer variants (vanilla
Transformer and Informer). Our findings advance building energy consumption
forecasting using advanced approaches like TL and Transformer architectures.
[LINK]
http://arxiv.org/abs/2410.14107v2
[DATE]
2024-11-20 06:19:12+08:00
[CATEGORIES]
cs.LG
NPGPT: Natural Product-Like Compound Generation with GPT-based Chemical Language Models
[AUTHORS]
Koh Sakano, Kairi Furui, Masahito Ohue
[ABSTRACT]
Natural products are substances produced by organisms in nature and often
possess biological activity and structural diversity. Drug development based on
natural products has been common for many years. However, the intricate
structures of these compounds present challenges in terms of structure
determination and synthesis, particularly compared to the efficiency of
high-throughput screening of synthetic compounds. In recent years, deep
learning-based methods have been applied to the generation of molecules. In
this study, we trained chemical language models on a natural product dataset
and generated natural product-like compounds. The results showed that the
distribution of the compounds generated was similar to that of natural
products. We also evaluated the effectiveness of the generated compounds as
drug candidates. Our method can be used to explore the vast chemical space and
reduce the time and cost of drug discovery of natural products.
[LINK]
http://arxiv.org/abs/2411.12886v1
[DATE]
2024-11-20 06:08:26+08:00
[CATEGORIES]
cs.LG
Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit
[AUTHORS]
Seok-Jin Kim, Min-hwan Oh
[ABSTRACT]
We study the performance guarantees of exploration-free greedy algorithms for
the linear contextual bandit problem. We introduce a novel condition, named the
Local Anti-Concentration (LAC) condition, which enables a greedy
bandit algorithm to achieve provable efficiency. We show that the LAC condition
is satisfied by a broad class of distributions, including Gaussian,
exponential, uniform, Cauchy, and Student’s $t$ distributions, along with other
exponential family distributions and their truncated variants. This
significantly expands the class of distributions under which greedy algorithms
can perform efficiently. Under our proposed LAC condition, we prove that the
cumulative expected regret of the greedy algorithm for the linear contextual
bandit is bounded by $O(\operatorname{poly} \log T)$. Our results establish the
widest range of distributions known to date that allow a sublinear regret bound
for greedy algorithms, further achieving a sharp poly-logarithmic regret.
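
For readers unfamiliar with the setting, here is a minimal sketch of an exploration-free greedy algorithm for a linear contextual bandit: at each round it plays the arm whose context maximizes the current ridge estimate of the reward, with no exploration bonus. Contexts are drawn from a Gaussian, one of the distributions the abstract lists. The ridge weight of 1, the noise level, the number of arms, and all names are assumptions; this is not the authors' code and makes no claim about their regret constants.

```python
import random
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(M: List[List[float]], v: List[float]) -> List[float]:
    return [dot(row, v) for row in M]


def sherman_morrison(A_inv: List[List[float]], x: List[float]) -> List[List[float]]:
    """Return (A + x x^T)^{-1} from A^{-1}, for symmetric A."""
    Ax = mat_vec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    d = len(x)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)] for i in range(d)]


def greedy_bandit(theta: List[float], n_arms: int, horizon: int,
                  seed: int = 0, noise: float = 0.1) -> Tuple[List[float], List[float]]:
    """Run the greedy policy; return cumulative regret and the final estimate."""
    rng = random.Random(seed)
    d = len(theta)
    A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # ridge weight 1
    b = [0.0] * d
    regret, total = [], 0.0
    for _ in range(horizon):
        contexts = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_arms)]
        theta_hat = mat_vec(A_inv, b)
        # Greedy choice: highest estimated reward, no exploration term.
        arm = max(range(n_arms), key=lambda a: dot(contexts[a], theta_hat))
        x = contexts[arm]
        reward = dot(x, theta) + rng.gauss(0.0, noise)
        total += max(dot(c, theta) for c in contexts) - dot(x, theta)
        regret.append(total)
        A_inv = sherman_morrison(A_inv, x)
        b = [bi + reward * xi for bi, xi in zip(b, x)]
    return regret, mat_vec(A_inv, b)
```

The randomness of the contexts is what supplies the diversity that an explicit exploration step would otherwise provide, which is the intuition behind distributional conditions such as LAC.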
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12878v1
[DATE]
2024-11-20 05:53:06+08:00
[CATEGORIES]
cs.LG
LatentQGAN: A Hybrid QGAN with Classical Convolutional Autoencoder
[AUTHORS]
Alexis Vieloszynski, Soumaya Cherkaoui, Ola Ahmad, Jean-Frédéric Laprade, Oliver Nahman-Lévesque, Abdallah Aaraba, Shengrui Wang
[ABSTRACT]
Quantum machine learning consists in taking advantage of quantum computations
to generate classical data. A potential application of quantum machine learning
is to harness the power of quantum computers for generating classical data, a
process essential to a multitude of applications such as enriching training
datasets, anomaly detection, and risk management in finance. Given the success
of Generative Adversarial Networks in classical image generation, the
development of their quantum versions has been actively pursued. However,
existing implementations on quantum computers often face significant
challenges, such as scalability and training convergence issues. To address
these issues, we propose LatentQGAN, a novel quantum model that uses a hybrid
quantum-classical GAN coupled with an autoencoder. Although it was initially
designed for image generation, the LatentQGAN approach holds potential for
broader application across various practical data generation tasks.
Experimental outcomes on both classical simulators and noisy intermediate scale
quantum computers have demonstrated significant performance enhancements over
existing quantum methods, alongside a significant reduction in quantum
resources overhead.
[COMMENTS]
This paper was accepted for publication at the 10th IEEE World Forum
on Internet of Things (IEEE WFIoT2024), in the session SS - QIoT-1: Special
Session - Quantum Internet of Things (QIoT)-1, November 10th, from 14:00 to
15:30 EST
[LINK]
http://arxiv.org/abs/2409.14622v4
[DATE]
2024-11-20 05:44:26+08:00
[CATEGORIES]
cs.LG
Puppet-CNN: Input-Adaptive Convolutional Neural Networks with Model Compression using Ordinary Differential Equation
[AUTHORS]
Yucheng Xing, Xin Wang
[ABSTRACT]
Convolutional Neural Networks (CNNs) have been applied to more and more
scenarios due to their excellent performance in many machine learning tasks,
especially with deep and complex structures. However, as the network goes
deeper, more parameters need to be stored and optimized. Besides, almost all
common CNN models adopt a “train-and-use” strategy where the structure is
pre-defined and the kernel parameters are fixed after training, with the
same structure and set of parameters used for all data without considering the
content complexity. In this paper, we propose a new CNN framework, named
Puppet-CNN, which contains two modules: a puppet module
and a puppeteer module. The puppet module is a CNN model used to
actually process the input data just like other works, but its depth and
kernels are generated by the puppeteer module (realized with Ordinary
Differential Equation (ODE)) based on the input complexity each time. By
recurrently generating kernel parameters in the puppet module, we can take
advantage of the dependence among kernels of different convolutional layers to
significantly reduce the size of CNN model by only storing and training the
parameters of the much smaller puppeteer ODE module. Through experiments on
several datasets, our method has proven to be superior to traditional
CNNs in both performance and efficiency. The model size can be reduced by more
than 10 times.
[LINK]
http://arxiv.org/abs/2411.12876v1
[DATE]
2024-11-20 05:44:21+08:00
[CATEGORIES]
cs.LG
Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification
[AUTHORS]
Meryem Altin Karagoz, O. Ufuk Nalbantoglu, Geoffrey C. Fox
[ABSTRACT]
Deep learning has proven very promising for interpreting MRI in brain tumor
diagnosis. However, deep learning models suffer from a scarcity of brain MRI
datasets for effective training. Self-supervised learning (SSL) models provide
data-efficient and remarkable solutions to limited dataset problems. Therefore,
this paper introduces a generative SSL model for brain tumor classification in
two stages. The first stage is designed to pre-train a Residual Vision
Transformer (ResViT) model for MRI synthesis as a pretext task. The second
stage includes fine-tuning a ResViT-based classifier model as a downstream
task. Accordingly, we aim to leverage local features via CNN and global
features via ViT, employing a hybrid CNN-transformer architecture for ResViT in
pretext and downstream tasks. Moreover, synthetic MRI images are utilized to
balance the training set. The proposed model is evaluated on the public BraTS 2023,
Figshare, and Kaggle datasets. Furthermore, we compare the proposed model with
various deep learning models, including A-UNet, ResNet-9, pix2pix, pGAN for MRI
synthesis, and ConvNeXtTiny, ResNet101, DenseNet12, Residual CNN, ViT for
classification. According to the results, pretraining the proposed model on the
MRI dataset is superior to pretraining on the ImageNet dataset.
Overall, the proposed model attains the highest accuracy, achieving 90.56% on
the BraTS dataset with T1 sequence, 98.53% on the Figshare, and 98.47% on the
Kaggle brain tumor datasets. As a result, the proposed model demonstrates a
robust, effective, and successful approach to handling insufficient dataset
challenges in MRI analysis by incorporating SSL, fine-tuning, data
augmentation, and combining CNN and ViT.
[LINK]
http://arxiv.org/abs/2411.12874v1
[DATE]
2024-11-20 05:42:57+08:00
[CATEGORIES]
cs.LG
From Text to Pose to Image: Improving Diffusion Model Control and Quality
[AUTHORS]
Clément Bonnett, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru
[ABSTRACT]
In the last two years, text-to-image diffusion models have become extremely
popular. As their quality and usage increase, a major concern has been the need
for better output control. In addition to prompt engineering, one effective
method to improve the controllability of diffusion models has been to condition
them on additional modalities such as image style, depth map, or keypoints.
This forms the basis of ControlNets or Adapters. When attempting to apply these
methods to control human poses in outputs of text-to-image diffusion models,
two main challenges have arisen. The first challenge is generating poses
following a wide range of semantic text descriptions, for which previous
methods involved searching for a pose within a dataset of (caption, pose)
pairs. The second challenge is conditioning image generation on a specified
pose while maintaining both high aesthetic quality and high pose fidelity. In this article,
we fix these two main issues by introducing a text-to-pose (T2P) generative
model alongside a new sampling algorithm, and a new pose adapter that
incorporates more pose keypoints for higher pose fidelity. Together, these two
new state-of-the-art models enable, for the first time, a generative
text-to-pose-to-image framework for higher pose control in diffusion models. We
release all models and the code used for the experiments at
https://github.com/clement-bonnet/text-to-pose.
[COMMENTS]
Published at the NeurIPS 2024 Workshop on Compositional Learning:
Perspectives, Methods, and Paths Forward
[LINK]
http://arxiv.org/abs/2411.12872v1
[DATE]
2024-11-20 05:34:50+08:00
[CATEGORIES]
cs.LG
Towards a framework on tabular synthetic data generation: a minimalist approach: theory, use cases, and limitations
[AUTHORS]
Yueyang Shen, Agus Sudjianto, Arun Prakash R, Anwesha Bhattacharyya, Maorong Rao, Yaqun Wang, Joel Vaughan, Nengfeng Zhou
[ABSTRACT]
We propose and study a minimalist approach towards synthetic tabular data
generation. The model consists of a minimalistic unsupervised SparsePCA encoder
(with a contingent clustering step or log transformation to handle nonlinearity)
and an XGBoost decoder, which is SOTA for structured data regression and
classification tasks. We study and contrast the methodologies with
(variational) autoencoders in several toy low dimensional scenarios to derive
necessary intuitions. The framework is applied to high dimensional simulated
credit scoring data which parallels real-life financial applications. We
applied the method to robustness testing to demonstrate practical use cases.
The case study result suggests that the method provides an alternative to raw
and quantile perturbation for model robustness testing. We show that the method
is simple, guarantees interpretability all the way through, does not
require extra tuning, and provides unique benefits.
[LINK]
http://arxiv.org/abs/2411.10982v2
[DATE]
2024-11-20 05:20:47+08:00
[CATEGORIES]
cs.LG
Pretraining a Neural Operator in Lower Dimensions
[AUTHORS]
AmirPouya Hemmasian, Amir Barati Farimani
[ABSTRACT]
There has recently been increasing attention towards developing foundational
neural Partial Differential Equation (PDE) solvers and neural operators through
large-scale pretraining. However, unlike vision and language models that make
use of abundant and inexpensive (unlabeled) data for pretraining, these neural
solvers usually rely on simulated PDE data, which can be costly to obtain,
especially for high-dimensional PDEs. In this work, we aim to Pretrain neural
PDE solvers on Lower Dimensional PDEs (PreLowD) where data collection is the
least expensive. We evaluated the effectiveness of this pretraining strategy in
similar PDEs in higher dimensions. We use the Factorized Fourier Neural
Operator (FFNO) due to having the necessary flexibility to be applied to PDE
data of arbitrary spatial dimensions and reuse trained parameters in lower
dimensions. In addition, our work sheds light on the effect of the fine-tuning
configuration to make the most of this pretraining strategy. Code is available
at https://github.com/BaratiLab/PreLowD.
[LINK]
http://arxiv.org/abs/2407.17616v2
[DATE]
2024-11-20 05:08:20+08:00
[CATEGORIES]
cs.LG
CDI: Copyrighted Data Identification in Diffusion Models
[AUTHORS]
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic
[ABSTRACT]
Diffusion Models (DMs) benefit from large and diverse datasets for their
training. Since this data is often scraped from the Internet without permission
from the data owners, this raises concerns about copyright and intellectual
property protections. While (illicit) use of data is easily detected for
training samples perfectly re-created by a DM at inference time, it is much
harder for data owners to verify if their data was used for training when the
outputs from the suspect DM are not close replicas. Conceptually, membership
inference attacks (MIAs), which detect if a given data point was used during
training, present themselves as a suitable tool to address this challenge.
However, we demonstrate that existing MIAs are not strong enough to reliably
determine the membership of individual images in large, state-of-the-art DMs.
To overcome this limitation, we propose CDI, a framework for data owners to
identify whether their dataset was used to train a given DM. CDI relies on
dataset inference techniques, i.e., instead of using the membership signal from
a single data point, CDI leverages the fact that most data owners, such as
providers of stock photography, visual media companies, or even individual
artists, own datasets with multiple publicly exposed data points which might
all be included in the training of a given DM. By selectively aggregating
signals from existing MIAs and using new handcrafted methods to extract
features for these datasets, feeding them to a scoring model, and applying
rigorous statistical testing, CDI allows data owners with as little as 70 data
points to identify with a confidence of more than 99% whether their data was
used to train a given DM. Thereby, CDI represents a valuable tool for data
owners to claim illegitimate use of their copyrighted data.
[COMMENTS]
Code available at
https://github.com/sprintml/copyrighted_data_identification
[LINK]
http://arxiv.org/abs/2411.12858v1
[DATE]
2024-11-20 05:02:09+08:00
[CATEGORIES]
cs.LG
Integrating Secondary Structures Information into Triangular Spatial Relationships (TSR) for Advanced Protein Classification
[AUTHORS]
Poorya Khajouie, Titli Sarkar, Krishna Rauniyar, Li Chen, Wu Xu, Vijay Raghavan
[ABSTRACT]
Protein structures represent the key to deciphering biological functions. The
more detailed form of similarity among these proteins is sometimes overlooked
by the conventional structural comparison methods. In contrast, further
advanced methods, such as Triangular Spatial Relationship (TSR), have been
demonstrated to make finer differentiations. Still, the classical
implementation of TSR does not provide for the integration of secondary
structure information, which is important for a more detailed understanding of
the folding pattern of a protein. To overcome these limitations, we developed
the SSE-TSR approach. The proposed method integrates secondary structure
elements (SSEs) into TSR-based protein representations. This allows an enriched
representation of protein structures by considering 18 different combinations
of helix, strand, and coil arrangements. Our results show that using SSEs
improves the accuracy and reliability of protein classification to varying
degrees. We worked with two large protein datasets of 9.2K and 7.8K samples,
respectively. We applied the SSE-TSR approach and used a neural network model
for classification. Interestingly, introducing SSEs improved performance
statistics for Dataset 1, with accuracy moving from 96.0% to 98.3%. For Dataset
2, where the performance statistics were already good, further small
improvements were found with the introduction of SSE, giving an accuracy of
99.5% compared to 99.4%. These results show that SSE integration can
dramatically improve TSR key discrimination, with significant benefits in
datasets with low initial accuracies and only incremental gains in those with
high baseline performance. Thus, SSE-TSR is a powerful bioinformatics tool that
improves protein classification and understanding of protein function and
interaction.
[LINK]
http://arxiv.org/abs/2411.12853v1
[DATE]
2024-11-20 04:50:16+08:00
[CATEGORIES]
cs.LG
Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks
[AUTHORS]
Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pregowska, Tomasz P. Michalak
[ABSTRACT]
Graph Neural Networks (GNNs) are recognized as potent tools for processing
real-world data organized in graph structures. Especially inductive GNNs, which
allow for the processing of graph-structured data without relying on predefined
graph structures, are becoming increasingly important in a wide range of
applications. As such, these networks become attractive targets for
model-stealing attacks where an adversary seeks to replicate the functionality
of the targeted network. Significant efforts have been devoted to developing
model-stealing attacks that extract models trained on images and texts.
However, little attention has been given to stealing GNNs trained on graph
data. This paper identifies a new method of performing unsupervised
model-stealing attacks against inductive GNNs, utilizing graph contrastive
learning and spectral graph augmentations to efficiently extract information
from the targeted model. The new type of attack is thoroughly evaluated on six
datasets and the results show that our approach outperforms the current
state-of-the-art by Shen et al. (2021). In particular, our attack surpasses the
baseline across all benchmarks, attaining superior fidelity and downstream
accuracy of the stolen model while necessitating fewer queries directed toward
the target model.
[COMMENTS]
Accepted at ECAI - 27th European Conference on Artificial
Intelligence
[LINK]
http://arxiv.org/abs/2405.12295v4
[DATE]
2024-11-20 04:37:54+08:00
[CATEGORIES]
cs.LG
mDAE : modified Denoising AutoEncoder for missing data imputation
[AUTHORS]
Mariette Dupuy, Marie Chavent, Remi Dubois
[ABSTRACT]
This paper introduces a methodology based on Denoising AutoEncoder (DAE) for
missing data imputation. The proposed methodology, called mDAE hereafter,
results from a modification of the loss function and a straightforward
procedure for choosing the hyper-parameters. An ablation study on several
UCI Machine Learning Repository datasets shows the benefit of using this modified
loss function and an overcomplete structure, in terms of Root Mean Squared
Error (RMSE) of reconstruction. This numerical study is completed by comparing
the mDAE methodology with eight other methods (four standard and four more
recent). A criterion called Mean Distance to Best (MDB) is proposed to measure
how well a method performs globally across all datasets. This criterion is defined
as the mean (over the datasets) of the distances between the RMSE of the
considered method and the RMSE of the best method. According to this criterion,
the mDAE methodology was consistently ranked among the top methods (along with
SoftImput and missForest), while the four more recent methods were
systematically ranked last. The Python code of the numerical study will be
available on GitHub so that results can be reproduced or generalized with other
datasets and methods.
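
The MDB criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the "distance" is the plain difference between a method's RMSE and the lowest RMSE on the same dataset, and the method names and RMSE values are made up for the example.

```python
def mean_distance_to_best(rmse):
    """Compute MDB per method.

    rmse: dict mapping method name -> list of RMSE values, one per dataset
    (all lists must be in the same dataset order). Hypothetical input format.
    """
    methods = list(rmse)
    n_datasets = len(next(iter(rmse.values())))
    # Lowest RMSE reached by any method on each dataset.
    best = [min(rmse[m][j] for m in methods) for j in range(n_datasets)]
    # Mean over datasets of (method RMSE - best RMSE); 0 means "always best".
    return {
        m: sum(rmse[m][j] - best[j] for j in range(n_datasets)) / n_datasets
        for m in methods
    }


# Illustrative (invented) RMSE values for three methods on two datasets.
example_rmse = {"A": [0.5, 0.25], "B": [0.75, 0.25], "C": [1.0, 0.75]}
mdb = mean_distance_to_best(example_rmse)
```

A lower MDB means the method stays closer to the best performer across datasets, which is how the ranking in the abstract should be read.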
[LINK]
http://arxiv.org/abs/2411.12847v1
[DATE]
2024-11-20 04:31:53+08:00
[CATEGORIES]
cs.LG
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts
[AUTHORS]
Deyu Zou, Shikun Liu, Siqi Miao, Victor Fung, Shiyu Chang, Pan Li
[ABSTRACT]
Geometric deep learning (GDL) has gained significant attention in scientific
fields, for its proficiency in modeling data with intricate geometric
structures. However, very few works have delved into its capability of tackling
the distribution shift problem, a prevalent challenge in many applications. To
bridge this gap, we propose GeSS, a comprehensive benchmark designed for
evaluating the performance of GDL models in scientific scenarios with
distribution shifts. Our evaluation datasets cover diverse scientific domains
from particle physics and materials science to biochemistry, and encapsulate a
broad spectrum of distribution shifts including conditional, covariate, and
concept shifts. Furthermore, we study three levels of information access from
the out-of-distribution (OOD) test data, including no OOD information, only
unlabeled OOD data, and OOD data with a few labels. Overall, our benchmark
results in 30 different experiment settings, and evaluates 3 GDL backbones and
11 learning algorithms in each setting. A thorough analysis of the evaluation
results is provided, poised to illuminate insights for GDL researchers and
domain practitioners who are to use GDL in their applications.
[COMMENTS]
Code and data are available at https://github.com/Graph-COM/GESS
[LINK]
http://arxiv.org/abs/2310.08677v2
[DATE]
2024-11-20 04:01:28+08:00
[CATEGORIES]
cs.LG
Copula-Linked Parallel ICA: A Method for Coupling Structural and Functional MRI brain Networks
[AUTHORS]
Oktay Agcaoglu, Rogers F. Silva, Deniz Alacam, Sergey Plis, Tulay Adali, Vince Calhoun
[ABSTRACT]
Different brain imaging modalities offer unique insights into brain function
and structure. Combining them enhances our understanding of neural mechanisms.
Prior multimodal studies fusing functional MRI (fMRI) and structural MRI (sMRI)
have shown the benefits of this approach. Since sMRI lacks temporal data,
existing fusion methods often compress fMRI temporal information into summary
measures, sacrificing rich temporal dynamics. Motivated by the observation that
covarying networks are identified in both sMRI and resting-state fMRI, we
developed a novel fusion method, by combining deep learning frameworks, copulas
and independent component analysis (ICA), named copula linked parallel ICA
(CLiP-ICA). This method estimates independent sources for each modality and
links the spatial sources of fMRI and sMRI using a copula-based model for more
flexible integration of temporal and spatial data. We tested CLiP-ICA using
data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Our results
showed that CLiP-ICA effectively captures both strongly and weakly linked sMRI
and fMRI networks, including the cerebellum, sensorimotor, visual, cognitive
control, and default mode networks. It revealed more meaningful components and
fewer artifacts, addressing the long-standing issue of optimal model order in
ICA. CLiP-ICA also detected complex functional connectivity patterns across
stages of cognitive decline, with cognitively normal subjects generally showing
higher connectivity in sensorimotor and visual networks compared to patients
with Alzheimer’s disease, along with patterns suggesting potential compensatory
mechanisms.
[COMMENTS]
25 pages, 10 figures, journal article
[LINK]
http://arxiv.org/abs/2410.19774v2
[DATE]
2024-11-20 03:56:20+08:00
[CATEGORIES]
cs.LG
Benchmarking Positional Encodings for GNNs and Graph Transformers
[AUTHORS]
Florian Grötschla, Jiaqing Xie, Roger Wattenhofer
[ABSTRACT]
Recent advances in Graph Neural Networks (GNNs) and Graph Transformers (GTs)
have been driven by innovations in architectures and Positional Encodings
(PEs), which are critical for augmenting node features and capturing graph
topology. PEs are essential for GTs, where topological information would
otherwise be lost without message-passing. However, PEs are often tested
alongside novel architectures, making it difficult to isolate their effect on
established models. To address this, we present a comprehensive benchmark of
PEs in a unified framework that includes both message-passing GNNs and GTs. We
also establish theoretical connections between MPNNs and GTs and introduce a
sparsified GRIT attention mechanism to examine the influence of global
connectivity. Our findings demonstrate that previously untested combinations of
GNN architectures and PEs can outperform existing methods and offer a more
comprehensive picture of the state-of-the-art. To support future research and
experimentation in our framework, we make the code publicly available.
[LINK]
http://arxiv.org/abs/2411.12732v1
[DATE]
2024-11-20 02:57:01+08:00
[CATEGORIES]
cs.LG
Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration
[AUTHORS]
Yuanjie Shi, Subhankar Ghosh, Taha Belkhouja, Janardhan Rao Doppa, Yan Yan
[ABSTRACT]
Conformal prediction (CP) is an emerging uncertainty quantification framework
that allows us to construct a prediction set to cover the true label with a
pre-specified marginal or conditional probability. Although the valid coverage
guarantee has been extensively studied for classification problems, CP often
produces large prediction sets which may not be practically useful. This issue
is exacerbated in the setting of class-conditional coverage on
classification tasks with many and/or imbalanced classes. This paper proposes
the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the
prediction set sizes to achieve class-conditional coverage, where the valid
coverage holds for each class. In contrast to the standard class-conditional CP
(CCP) method that uniformly thresholds the class-wise conformity score for each
class, the augmented label rank calibration step allows RC3P to selectively
iterate this class-wise thresholding subroutine only for a subset of classes
whose class-wise top-k error is small. We prove that, agnostic to the classifier
and data distribution, RC3P achieves class-wise coverage. We also show that
RC3P reduces the size of prediction sets compared to the CCP method.
Comprehensive experiments on multiple real-world datasets demonstrate that RC3P
achieves class-wise coverage and a 26.25% reduction in prediction set sizes on
average.
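
For orientation, the baseline class-conditional CP (CCP) step that the abstract contrasts RC3P with can be sketched as follows. This is a generic split-conformal illustration, not the RC3P algorithm and not the authors' implementation; it assumes non-conformity scores (lower means the label fits better), and the function names are hypothetical.

```python
import math


def classwise_thresholds(cal_scores, cal_labels, alpha):
    """Calibrate one threshold per class.

    cal_scores[i] is the non-conformity score of calibration example i
    evaluated at its true label cal_labels[i] (assumed input format).
    """
    by_class = {}
    for score, label in zip(cal_scores, cal_labels):
        by_class.setdefault(label, []).append(score)
    thresholds = {}
    for label, scores in by_class.items():
        scores.sort()
        n = len(scores)
        # Conformal rank with the usual finite-sample correction.
        k = math.ceil((n + 1) * (1 - alpha))
        # Too few calibration points: fall back to the trivial threshold.
        thresholds[label] = scores[k - 1] if k <= n else math.inf
    return thresholds


def prediction_set(scores_per_class, thresholds):
    """Keep every class whose score does not exceed its class-wise threshold."""
    return {
        label
        for label, score in scores_per_class.items()
        if score <= thresholds.get(label, math.inf)
    }
```

Because every class gets its own threshold, rare classes with few calibration points tend to receive loose thresholds, which is one way the large prediction sets mentioned in the abstract can arise.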
[LINK]
http://arxiv.org/abs/2406.06818v4
[DATE]
2024-11-20 02:56:07+08:00
[CATEGORIES]
cs.LG
Testing classical properties from quantum data
[AUTHORS]
Matthias C. Caro, Preksha Naik, Joseph Slote
[ABSTRACT]
Many properties of Boolean functions can be tested far more efficiently than
the function can be learned. However, this advantage often disappears when
testers are limited to random samples–a natural setting for data
science–rather than queries. In this work we investigate the quantum version
of this scenario: quantum algorithms that test properties of a function $f$
solely from quantum data in the form of copies of the function state for $f$.
For three well-established properties, we show that the speedup lost when
restricting classical testers to samples can be recovered by testers that use
quantum data. For monotonicity testing, we give a quantum algorithm that uses
$\tilde{\mathcal{O}}(n^2)$ function state copies as compared to the
$2^{\Omega(\sqrt{n})}$ samples required classically. We also present
$\mathcal{O}(1)$-copy testers for symmetry and triangle-freeness, comparing
favorably to classical lower bounds of $\Omega(n^{1/4})$ and $\Omega(n)$
samples respectively. These algorithms are time-efficient and necessarily
include techniques beyond the Fourier sampling approaches applied to earlier
testing problems.
These results make the case for a general study of the advantages afforded by
quantum data for testing. We contribute to this project by complementing our
upper bounds with a lower bound of $\Omega(1/\varepsilon)$ for monotonicity
testing from quantum data in the proximity regime
$\varepsilon\leq\mathcal{O}(n^{-3/2})$. This implies a strict separation
between testing monotonicity from quantum data and from quantum queries–where
$\tilde{\mathcal{O}}(n)$ queries suffice when $\varepsilon=\Theta(n^{-3/2})$.
We also exhibit a testing problem that can be solved from $\mathcal{O}(1)$
classical queries but requires $\Omega(2^{n/2})$ function state copies,
complementing a separation of the same magnitude in the opposite direction
derived from the Forrelation problem.
[COMMENTS]
38 + 14 pages, 2 tables, 2 figures
[LINK]
http://arxiv.org/abs/2411.12730v1
[DATE]
2024-11-20 02:52:55+08:00
[CATEGORIES]
cs.LG
LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport
[AUTHORS]
Lianghao Cao, Joshua Chen, Michael Brennan, Thomas O’Leary-Roseberry, Youssef Marzouk, Omar Ghattas
[ABSTRACT]
We present LazyDINO, a transport map variational inference method for fast,
scalable, and efficiently amortized solutions of high-dimensional nonlinear
Bayesian inverse problems with expensive parameter-to-observable (PtO) maps.
Our method consists of an offline phase in which we construct a
derivative-informed neural surrogate of the PtO map using joint samples of the
PtO map and its Jacobian. During the online phase, when given observational
data, we seek rapid posterior approximation using surrogate-driven training of
a lazy map [Brennan et al., NeurIPS, (2020)], i.e., a structure-exploiting
transport map with low-dimensional nonlinearity. The trained lazy map then
produces approximate posterior samples or density evaluations. Our surrogate
construction is optimized for amortized Bayesian inversion using lazy map
variational inference. We show that (i) the derivative-based reduced basis
architecture [O’Leary-Roseberry et al., Comput. Methods Appl. Mech. Eng., 388
(2022)] minimizes the upper bound on the expected error in surrogate posterior
approximation, and (ii) the derivative-informed training formulation
[O’Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] minimizes the expected
error due to surrogate-driven transport map optimization. Our numerical results
demonstrate that LazyDINO is highly efficient in cost amortization for Bayesian
inversion. We observe one to two orders of magnitude reduction of offline cost
for accurate posterior approximation, compared to simulation-based amortized
inference via conditional transport and conventional surrogate-driven
transport. In particular, LazyDINO outperforms Laplace approximation
consistently using fewer than 1000 offline samples, while other amortized
inference methods struggle and sometimes fail at 16,000 offline samples.
[LINK]
http://arxiv.org/abs/2411.12726v1
[DATE]
2024-11-20 02:48:00+08:00
[CATEGORIES]
cs.LG
Heuristic-Free Multi-Teacher Learning
[AUTHORS]
Huy Thong Nguyen, En-Hung Chu, Lenord Melvix, Jazon Jiao, Chunglin Wen, Benjamin Louie
[ABSTRACT]
We introduce Teacher2Task, a novel framework for multi-teacher learning that
eliminates the need for manual aggregation heuristics. Existing multi-teacher
methods typically rely on such heuristics to combine predictions from multiple
teachers, often resulting in sub-optimal aggregated labels and the propagation
of aggregation errors. Teacher2Task addresses these limitations by introducing
teacher-specific input tokens and reformulating the training process. Instead
of relying on aggregated labels, the framework transforms the training data,
consisting of ground truth labels and annotations from N teachers, into N+1
distinct tasks: N auxiliary tasks that predict the labeling styles of the N
individual teachers, and one primary task that focuses on the ground truth
labels. This approach, drawing upon principles from multiple learning
paradigms, demonstrates strong empirical results across a range of
architectures, modalities, and tasks.
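
The data transformation described above (one example with N teacher annotations becoming N+1 task-specific examples) can be sketched as follows. This is only an illustration under stated assumptions: the token strings `<teacher_i>` and `<ground_truth>` and the function name are hypothetical, and the abstract does not specify how the teacher-specific tokens are actually encoded.

```python
def to_tasks(text, ground_truth, teacher_labels):
    """Expand one annotated example into N auxiliary tasks plus one primary task.

    teacher_labels: list of N labels, one per teacher (assumed input format).
    Returns a list of (model_input, target) pairs of length N + 1.
    """
    # Auxiliary tasks: predict what teacher i would label this input.
    examples = [
        (f"<teacher_{i}> {text}", label)
        for i, label in enumerate(teacher_labels, start=1)
    ]
    # Primary task: predict the ground-truth label.
    examples.append((f"<ground_truth> {text}", ground_truth))
    return examples


tasks = to_tasks("a photo of a cat", "cat", ["cat", "dog", "cat"])
```

No label aggregation happens anywhere in this sketch: disagreeing teachers simply yield different targets under different input tokens.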
[LINK]
http://arxiv.org/abs/2411.12724v1
[DATE]
2024-11-20 02:45:16+08:00
[CATEGORIES]
cs.LG
GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval
[AUTHORS]
Dong Liu, Roger Waleffe, Meng Jiang, Shivaram Venkataraman
[ABSTRACT]
In our recent research, we have developed a framework called GraphSnapShot,
which has proven to be a useful tool for graph learning acceleration.
GraphSnapShot is a framework for fast caching, storage, retrieval, and computation
for graph learning. It can quickly store and update the local topology of a graph
structure and allows us to track patterns in the structure of graph networks,
much like taking snapshots of the graphs. In experiments, GraphSnapShot is
efficient: it achieves up to 30% training acceleration and 73% memory
reduction for lossless graph ML training compared to current baselines such as
dgl. This technique is particularly useful for large dynamic graph learning tasks
such as social media analysis and recommendation systems to process complex
relationships between entities.
The code for GraphSnapShot is publicly available at
https://github.com/NoakLiu/GraphSnapShot.
[LINK]
http://arxiv.org/abs/2406.17918v3
[DATE]
2024-11-20 02:24:03+08:00
[CATEGORIES]
cs.LG
Regulating Chatbot Output via Inter-Informational Competition
[AUTHORS]
Jiawei Zhang
[ABSTRACT]
The advent of ChatGPT has sparked over a year of regulatory frenzy. However,
few existing studies have rigorously questioned the assumption that, if left
unregulated, AI chatbots’ output would inflict tangible, severe real harm on
human affairs. Most researchers have overlooked the critical possibility that
the information market itself can effectively mitigate these risks and, as a
result, they tend to use regulatory tools to address the issue directly. This
Article develops a yardstick for reevaluating both AI-related content risks and
corresponding regulatory proposals by focusing on inter-informational
competition among various outlets. The decades-long history of regulating
information and communications technologies indicates that regulators tend to
err too much on the side of caution and to put forward excessive regulatory
measures when encountering the uncertainties brought about by new technologies.
In fact, a trove of empirical evidence has demonstrated that market competition
among information outlets can effectively mitigate most risks and that
overreliance on regulation is not only unnecessary but detrimental, as well.
This Article argues that sufficient competition among chatbots and other
information outlets in the information marketplace can sufficiently mitigate
and even resolve most content risks posed by generative AI technologies. This
renders certain loudly advocated regulatory strategies, like mandatory
prohibitions, licensure, curation of datasets, and notice-and-response regimes,
truly unnecessary and even toxic to desirable competition and innovation
throughout the AI industry. Ultimately, the ideas that I advance in this
Article should pour some much-needed cold water on the regulatory frenzy over
generative AI and steer the issue back to a rational track.
[COMMENTS]
50-page legal Article, forthcoming in Northwestern Journal of
Technology and Intellectual Property
[LINK]
http://arxiv.org/abs/2403.11046v2
[DATE]
2024-11-20 02:18:04+08:00
[CATEGORIES]
cs.LG
KTO: Model Alignment as Prospect Theoretic Optimization
[AUTHORS]
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
[COMMENTS]
ICML 2024
[LINK]
http://arxiv.org/abs/2402.01306v4
[DATE]
2024-11-20 02:12:45+08:00
[CATEGORIES]
cs.LG
Learning multivariate Gaussians with imperfect advice
[AUTHORS]
Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis
[ABSTRACT]
We revisit the problem of distribution learning within the framework of
learning-augmented algorithms. In this setting, we explore the scenario where a
probability distribution is provided as potentially inaccurate advice on the
true, unknown distribution. Our objective is to develop learning algorithms
whose sample complexity decreases as the quality of the advice improves,
thereby surpassing standard learning lower bounds when the advice is
sufficiently accurate.
Specifically, we demonstrate that this outcome is achievable for the problem
of learning a multivariate Gaussian distribution $N(\boldsymbol{\mu},
\boldsymbol{\Sigma})$ in the PAC learning setting. Classically, in the
advice-free setting, $\tilde{\Theta}(d^2/\varepsilon^2)$ samples are sufficient
and, in the worst case, necessary to learn $d$-dimensional Gaussians up to TV distance
$\varepsilon$ with constant probability. When we are additionally given a
parameter $\tilde{\boldsymbol{\Sigma}}$ as advice, we show that
$\tilde{O}(d^{2-\beta}/\varepsilon^2)$ samples suffice whenever $\|
\tilde{\boldsymbol{\Sigma}}^{-1/2} \boldsymbol{\Sigma}
\tilde{\boldsymbol{\Sigma}}^{-1/2} - \boldsymbol{I_d} \|_1 \leq \varepsilon
d^{1-\beta}$ (where $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm) for any
$\beta > 0$, yielding a polynomial improvement over the advice-free setting.
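
As a rough worked example of the size of this improvement (ignoring constants and logarithmic factors, and with $\beta = 1/2$ and $d = 10^4$ chosen purely for illustration), the ratio of the two sample-complexity bounds is:

```latex
\[
  \frac{d^{2}/\varepsilon^{2}}{d^{2-\beta}/\varepsilon^{2}} = d^{\beta},
  \qquad\text{so for } \beta = \tfrac12,\ d = 10^{4}:\quad
  d^{\beta} = \left(10^{4}\right)^{1/2} = 100 .
\]
```

That is, advice accurate enough to satisfy the stated condition with $\beta = 1/2$ would cut the sample requirement by roughly a factor of $100$ in this illustrative regime.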
[LINK]
http://arxiv.org/abs/2411.12700v1
[DATE]
2024-11-20 02:08:01+08:00
[CATEGORIES]
cs.LG
RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
[AUTHORS]
Jonas Eschmann, Dario Albani, Giuseppe Loianno
[ABSTRACT]
Deep Reinforcement Learning (RL) can yield capable agents and control
policies in several domains but is commonly plagued by prohibitively long
training times. Additionally, in the case of continuous control problems, the
applicability of learned policies on real-world embedded devices is limited due
to the lack of real-time guarantees and portability of existing libraries. To
address these challenges, we present RLtools, a dependency-free, header-only,
pure C++ library for deep supervised and reinforcement learning. Its novel
architecture allows RLtools to be used on a wide variety of platforms, from HPC
clusters through workstations and laptops to smartphones, smartwatches, and
microcontrollers. Specifically, due to the tight integration of the RL
algorithms with simulation environments, RLtools can solve popular RL problems
up to 76 times faster than other popular RL frameworks. We also benchmark the
inference on a diverse set of microcontrollers and show that in most cases our
optimized implementation is by far the fastest. Finally, RLtools enables the
first-ever demonstration of training a deep RL algorithm directly on a
microcontroller, giving rise to the field of TinyRL. The source code as well as
documentation and live demos are available through our project page at
https://rl.tools.
[COMMENTS]
Project page: https://rl.tools
[LINK]
http://arxiv.org/abs/2306.03530v4
[DATE]
2024-11-20 01:41:00+08:00
[CATEGORIES]
cs.LG
IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose
[AUTHORS]
Fei Ren, Chao Ren, Tianyi Lyu
[ABSTRACT]
This study proposes the IoT-Enhanced Pose Optimization Network (IE-PONet) for
high-precision 3D pose estimation and motion optimization of track and field
athletes. IE-PONet integrates C3D for spatiotemporal feature extraction,
OpenPose for real-time keypoint detection, and Bayesian optimization for
hyperparameter tuning. Experimental results on NTURGB+D and FineGYM datasets
demonstrate superior performance, with AP$^{p50}$ scores of 90.5 and 91.0, and
mAP scores of 74.3 and 74.0, respectively. Ablation studies confirm the
essential roles of each module in enhancing model accuracy. IE-PONet provides a
robust tool for athletic performance analysis and optimization, offering
precise technical insights for training and injury prevention. Future work will
focus on further model optimization, multimodal data integration, and
developing real-time feedback mechanisms to enhance practical applications.
[COMMENTS]
17 pages
[LINK]
http://arxiv.org/abs/2411.12676v1
[DATE]
2024-11-20 01:29:59+08:00
[CATEGORIES]
cs.LG
Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis
[AUTHORS]
Kazi Hasibul Kabir, Md. Zahiruddin Aqib, Sharmin Sultana, Shamim Akhter
[ABSTRACT]
Monitoring agricultural activities is important to ensure food security.
Remote sensing plays a significant role for large-scale continuous monitoring
of cultivation activities. Time series remote sensing data were used for the
generation of the cropping pattern. Classification algorithms are used to
classify crop patterns and map agricultural land use. Some conventional
classification methods including support vector machine (SVM) and decision
trees were applied for crop pattern recognition. However, in this paper, we
propose Deep Neural Network (DNN) based classification to improve the
performance of crop pattern recognition and make a comparative analysis with
two (2) other machine learning approaches including Naive Bayes and Random
Forest.
[COMMENTS]
Published in ICNTET2018: International Conference on New Trends in
Engineering & Technology Tirupathi Highway, Tiruvallur Dist Chennai, India,
September 7-8, 2018
[LINK]
http://arxiv.org/abs/2411.12667v1
[DATE]
2024-11-20 01:19:20+08:00
[CATEGORIES]
cs.LG
PoM: Efficient Image and Video Generation with the Polynomial Mixer
[AUTHORS]
David Picard, Nicolas Dufour
[ABSTRACT]
Diffusion models based on Multi-Head Attention (MHA) have become ubiquitous
for generating high-quality images and videos. However, encoding an image or a
video as a sequence of patches results in costly attention patterns, as the
requirements both in terms of memory and compute grow quadratically. To
alleviate this problem, we propose a drop-in replacement for MHA called the
Polynomial Mixer (PoM) that has the benefit of encoding the entire sequence
into an explicit state. PoM has a linear complexity with respect to the number
of tokens. This explicit state also allows us to generate frames in a
sequential fashion, minimizing memory and compute requirements, while still
being able to train in parallel. We show the Polynomial Mixer is a universal
sequence-to-sequence approximator, just like regular MHA. We adapt several
Diffusion Transformers (DiT) for generating images and videos with PoM
replacing MHA, and we obtain high-quality samples while using fewer
computational resources. The code is available at
https://github.com/davidpicard/HoMM.
[LINK]
http://arxiv.org/abs/2411.12663v1
[DATE]
2024-11-20 01:16:31+08:00
[CATEGORIES]
cs.LG
TransDreamer: Reinforcement Learning with Transformer World Models
[AUTHORS]
Chang Chen, Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn
[ABSTRACT]
The Dreamer agent provides various benefits of Model-Based Reinforcement
Learning (MBRL) such as sample efficiency, reusable knowledge, and safe
planning. However, its world model and policy networks inherit the limitations
of recurrent neural networks and thus an important question is how an MBRL
framework can benefit from the recent advances of transformers and what the
challenges are in doing so. In this paper, we propose a transformer-based MBRL
agent, called TransDreamer. We first introduce the Transformer State-Space
Model, a world model that leverages a transformer for dynamics predictions. We
then share this world model with a transformer-based policy network and obtain
stability in training a transformer-based RL agent. In experiments, we apply
the proposed model to 2D visual RL and 3D first-person visual RL tasks both
requiring long-range memory access for memory-based reasoning. We show that the
proposed model outperforms Dreamer in these complex tasks.
[COMMENTS]
Deep RL Workshop NeurIPS 2021
[LINK]
http://arxiv.org/abs/2202.09481v2
[DATE]
2024-11-20 00:55:55+08:00
[CATEGORIES]
cs.LG
PyAWD: A Library for Generating Large Synthetic Datasets of Acoustic Wave Propagation with Devito
[AUTHORS]
Pascal Tribel, Gianluca Bontempi
[ABSTRACT]
Seismic data is often sparse and unevenly distributed due to the high costs
and logistical challenges associated with deploying physical seismometers,
limiting the application of Machine Learning (ML) in earthquake analysis. To
address this gap, we introduce PyAWD, a Python library designed to generate
high-resolution synthetic datasets simulating spatio-temporal acoustic wave
propagation in both two-dimensional and three-dimensional heterogeneous media.
By allowing fine control over parameters such as wave speed, external forces,
spatial and temporal discretization, and media composition, PyAWD enables the
creation of ML-scale datasets that capture the complexity of seismic wave
behavior. We illustrate the library’s potential with an epicenter retrieval
task, showcasing its suitability for designing complex, accurate seismic
problems that support advanced ML approaches in the absence or lack of dense
real-world data.
[LINK]
http://arxiv.org/abs/2411.12636v1
[DATE]
2024-11-20 00:49:58+08:00
[CATEGORIES]
cs.LG
log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling
[AUTHORS]
Xiao Hu, Ziqi Chen, Bo Peng, Daniel Adu-Ampratwum, Xia Ning
[ABSTRACT]
Accurate prediction of chemical reaction yields is crucial for optimizing
organic synthesis, potentially reducing time and resources spent on
experimentation. With the rise of artificial intelligence (AI), there is
growing interest in leveraging AI-based methods to accelerate yield predictions
without conducting in vitro experiments. We present log-RRIM, an innovative
graph transformer-based framework designed for predicting chemical reaction
yields. Our approach implements a unique local-to-global reaction
representation learning strategy. This approach initially captures detailed
molecule-level information and then models and aggregates intermolecular
interactions, ensuring that the impact of molecular fragments of varying sizes on
yield is accurately accounted for. Another key feature of log-RRIM is its
integration of a cross-attention mechanism that focuses on the interplay
between reagents and reaction centers. This design reflects a fundamental
principle in chemical reactions: the crucial role of reagents in influencing
bond-breaking and formation processes, which ultimately affect reaction yields.
log-RRIM outperforms existing methods in our experiments, especially for medium
to high-yielding reactions, proving its reliability as a predictor. Its
advanced modeling of reactant-reagent interactions and sensitivity to small
molecular fragments make it a valuable tool for reaction planning and
optimization in chemical synthesis. The data and code of log-RRIM are
accessible through https://github.com/ninglab/Yield_log_RRIM.
[COMMENTS]
18 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.03320v3
[DATE]
2024-11-20 00:49:12+08:00
[CATEGORIES]
cs.LG
Instant Policy: In-Context Imitation Learning via Graph Diffusion
[AUTHORS]
Vitalis Vosylius, Edward Johns
[ABSTRACT]
Following the impressive capabilities of in-context learning with large
transformers, In-Context Imitation Learning (ICIL) is a promising opportunity
for robotics. We introduce Instant Policy, which learns new tasks instantly
(without further training) from just one or two demonstrations, achieving ICIL
through two key components. First, we introduce inductive biases through a
graph representation and model ICIL as a graph generation problem with a
learned diffusion process, enabling structured reasoning over demonstrations,
observations, and actions. Second, we show that such a model can be trained
using pseudo-demonstrations - arbitrary trajectories generated in simulation -
as a virtually infinite pool of training data. Simulated and real experiments
show that Instant Policy enables rapid learning of various everyday robot
tasks. We also show how it can serve as a foundation for cross-embodiment and
zero-shot transfer to language-defined tasks. Code and videos are available at
https://www.robot-learning.uk/instant-policy.
[COMMENTS]
Code and videos are available on our project webpage at
https://www.robot-learning.uk/instant-policy
[LINK]
http://arxiv.org/abs/2411.12633v1
[DATE]
2024-11-20 00:45:52+08:00
[CATEGORIES]
cs.LG
Exploring the Manifold of Neural Networks Using Diffusion Geometry
[AUTHORS]
Elliott Abel, Peyton Crevasse, Yvan Grinspan, Selma Mazioud, Folu Ogundipe, Kristof Reimann, Ellie Schueler, Andrew J. Steindl, Ellen Zhang, Dhananjay Bhaskar, Siddharth Viswanath, Yanlei Zhang, Tim G. J. Rudner, Ian Adelstein, Smita Krishnaswamy
[ABSTRACT]
Drawing motivation from the manifold hypothesis, which posits that most
high-dimensional data lies on or near low-dimensional manifolds, we apply
manifold learning to the space of neural networks. We learn manifolds where
datapoints are neural networks by introducing a distance between the hidden
layer representations of the neural networks. These distances are then fed to
the non-linear dimensionality reduction algorithm PHATE to create a manifold of
neural networks. We characterize this manifold using features of the
representation, including class separation, hierarchical cluster structure,
spectral entropy, and topological structure. Our analysis reveals that
high-performing networks cluster together in the manifold, displaying
consistent embedding patterns across all these features. Finally, we
demonstrate the utility of this approach for guiding hyperparameter
optimization and neural architecture search by sampling from the manifold.
[LINK]
http://arxiv.org/abs/2411.12626v1
[DATE]
2024-11-20 00:34:45+08:00
[CATEGORIES]
cs.LG
A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation
[AUTHORS]
Jiaqi Yang, Nitish Mehta, Xiaoling Hu, Chao Chen, Chia-Ling Tsai
[ABSTRACT]
Accurate segmentation of Optical Coherence Tomography (OCT) images is crucial
for diagnosing and monitoring retinal diseases. However, the labor-intensive
nature of pixel-level annotation limits the scalability of supervised learning
with large datasets. Weakly Supervised Semantic Segmentation (WSSS) provides a
promising alternative by leveraging image-level labels. In this study, we
propose a novel WSSS approach that integrates structural guidance with
text-driven strategies to generate high-quality pseudo labels, significantly
improving segmentation performance. In terms of visual information, our method
employs two processing modules that exchange raw image features and structural
features from OCT images, guiding the model to identify where lesions are
likely to occur. In terms of textual information, we utilize large-scale
pretrained models from cross-domain sources to implement label-informed textual
guidance and synthetic descriptive integration with two textual processing
modules that combine local semantic features with consistent synthetic
descriptions. By fusing these visual and textual components within a multimodal
framework, our approach enhances lesion localization accuracy. Experimental
results on three OCT datasets demonstrate that our method achieves
state-of-the-art performance, highlighting its potential to improve diagnostic
accuracy and efficiency in medical imaging.
[COMMENTS]
21 pages, 9 figures, 8 tables
[LINK]
http://arxiv.org/abs/2411.12615v1
[DATE]
2024-11-20 00:20:27+08:00
[CATEGORIES]
cs.LG
Reward driven workflows for unsupervised explainable analysis of phases and ferroic variants from atomically resolved imaging data
[AUTHORS]
Kamyar Barakati, Yu Liu, Chris Nelson, Maxim A. Ziatdinov, Xiaohang Zhang, Ichiro Takeuchi, Sergei V. Kalinin
[ABSTRACT]
Rapid progress in aberration corrected electron microscopy necessitates
development of robust methods for the identification of phases, ferroic
variants, and other pertinent aspects of materials structure from imaging data.
While unsupervised methods for clustering and classification are widely used
for these tasks, their performance can be sensitive to hyperparameter selection
in the analysis workflow. In this study, we explore the effects of descriptors
and hyperparameters on the capability of unsupervised ML methods to distill
local structural information, exemplified by discovery of polarization and
lattice distortion in Sm-doped BiFeO3 (BFO) thin films. We demonstrate that a
reward-driven approach can be used to optimize these key hyperparameters across
the full workflow, where rewards were designed to reflect domain wall
continuity and straightness, ensuring that the analysis aligns with the
material’s physical behavior. This approach allows us to discover local
descriptors that are best aligned with the specific physical behavior,
providing insight into the fundamental physics of materials. We further extend
the reward driven workflows to disentangle structural factors of variation via
an optimized variational autoencoder (VAE). Finally, the importance of
well-defined rewards was explored as a quantifiable measure of success of the
workflow.
[COMMENTS]
19 pages, 6 figures
[LINK]
http://arxiv.org/abs/2411.12612v1
[DATE]
2024-11-20 00:18:20+08:00
[CATEGORIES]
cs.LG
Improving Multi-task Learning via Seeking Task-based Flat Regions
[AUTHORS]
Hoang Phan, Lam Tran, Quyen Tran, Ngoc N. Tran, Tuan Truong, Nhat Ho, Dinh Phung, Trung Le
[ABSTRACT]
Multi-Task Learning (MTL) is a widely-used and powerful learning paradigm for
training deep neural networks that allows learning more than one objective with a
single backbone. Compared to training tasks separately, MTL significantly
reduces computational costs, improves data efficiency, and potentially enhances
model performance by leveraging knowledge across tasks. Hence, it has been
adopted in a variety of applications, ranging from computer vision to natural
language processing and speech recognition. Among them, there is an emerging
line of work in MTL that focuses on manipulating the task gradient to derive an
ultimate gradient descent direction to benefit all tasks. Despite achieving
impressive results on many benchmarks, directly applying these approaches
without using appropriate regularization techniques might lead to suboptimal
solutions on real-world problems. In particular, standard training that
minimizes the empirical loss on the training data can easily suffer from
overfitting to low-resource tasks or be spoiled by noisy-labeled ones, which
can cause negative transfer between tasks and an overall performance drop. To
alleviate such problems, we propose to leverage a recently introduced training
method, named Sharpness-aware Minimization, which can enhance model
generalization ability on single-task learning. Accordingly, we present a novel
MTL training methodology, encouraging the model to find task-based flat minima
for coherently improving its generalization capability on all tasks. Finally,
we conduct comprehensive experiments on a variety of applications to
demonstrate the merit of our proposed approach over existing gradient-based MTL
methods, as suggested by our developed theory.
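The Sharpness-aware Minimization update that the abstract builds on can be sketched on a toy problem. This is a minimal single-task illustration, not the authors' multi-task method: the names `sam_step` and `quad_grad`, the quadratic loss, and the values of `lr` and `rho` are all assumptions made for the example.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update: take the gradient at a nearby 'worst-case' point.

    w       -- list of floats (current parameters)
    grad_fn -- function returning the loss gradient at a parameter list
    lr, rho -- illustrative step size and perturbation radius (assumed values)
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Already at a stationary point; no ascent direction to follow.
        return list(w)
    # Ascent step: move a distance rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point to the original w.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def quad_grad(w):
    """Gradient of the toy loss L(w) = sum(w_i ** 2)."""
    return [2.0 * wi for wi in w]

new_w = sam_step([1.0, 0.0], quad_grad)
```

A multi-task variant would apply such a perturbation per task before combining gradients; how the paper does this is not specified in the abstract, so it is left out here.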
[COMMENTS]
35 pages, 17 figures, 7 tables
[LINK]
http://arxiv.org/abs/2211.13723v3
[DATE]
2024-11-20 00:17:58+08:00
[CATEGORIES]
cs.LG
STREAM: A Universal State-Space Model for Sparse Geometric Data
[AUTHORS]
Mark Schöne, Yash Bhisikar, Karan Bania, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney, David Kappel
[ABSTRACT]
Handling sparse and unstructured geometric data, such as point clouds or
event-based vision, is a pressing challenge in the field of machine vision.
Recently, sequence models such as Transformers and state-space models entered
the domain of geometric data. These methods require specialized preprocessing
to create a sequential view of a set of points. Furthermore, prior works
involving sequence models iterate geometric data with either uniform or learned
step sizes, implicitly relying on the model to infer the underlying geometric
structure. In this work, we propose to encode geometric structure explicitly
into the parameterization of a state-space model. State-space models are based
on linear dynamics governed by a one-dimensional variable such as time or a
spatial coordinate. We exploit this dynamic variable to inject relative
differences of coordinates into the step size of the state-space model. The
resulting geometric operation computes interactions between all pairs of N
points in O(N) steps. Our model deploys the Mamba selective state-space model
with a modified CUDA kernel to efficiently map sparse geometric data to modern
hardware. The resulting sequence model, which we call STREAM, achieves
competitive results on a range of benchmarks from point-cloud classification to
event-based vision and audio classification. STREAM demonstrates a powerful
inductive bias for sparse geometric data by improving the PointMamba baseline
when trained from scratch on the ModelNet40 and ScanObjectNN point cloud
analysis datasets. It further achieves, for the first time, 100% test accuracy
on all 11 classes of the DVS128 Gestures dataset.
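The claim that pairwise interactions between N points can be computed in O(N) steps can be illustrated with a scalar linear recurrence whose step size is the coordinate difference between consecutive points. This is a simplified sketch under stated assumptions (one sorted coordinate, a fixed exponential decay, hypothetical function names); it is not the Mamba-based parameterization or CUDA kernel used in STREAM.

```python
import math

def ssm_scan(coords, values, decay=1.0):
    """O(N) scan: h_i = exp(-decay * (t_i - t_{i-1})) * h_{i-1} + x_i.

    coords must be sorted in ascending order (assumption of this sketch).
    """
    h = 0.0
    prev = None
    out = []
    for t, x in zip(coords, values):
        delta = 0.0 if prev is None else t - prev  # step size from coordinates
        h = math.exp(-decay * delta) * h + x
        out.append(h)
        prev = t
    return out

def pairwise_reference(coords, values, decay=1.0):
    """O(N^2) reference: explicit sum over all earlier points j <= i."""
    return [
        sum(math.exp(-decay * (coords[i] - coords[j])) * values[j]
            for j in range(i + 1))
        for i in range(len(coords))
    ]

coords = [0.0, 0.5, 2.0]
values = [1.0, 2.0, -1.0]
fast = ssm_scan(coords, values)
slow = pairwise_reference(coords, values)
```

Both functions return the same values up to floating-point error, because the product of the per-step decay factors equals the decay over the full coordinate gap between any two points.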
[LINK]
http://arxiv.org/abs/2411.12603v1
[DATE]
2024-11-20 00:06:32+08:00
[CATEGORIES]
cs.LG
Hypergraph $p$-Laplacian equations for data interpolation and semi-supervised learning
[AUTHORS]
Kehan Shi, Martin Burger
[ABSTRACT]
Hypergraph learning with $p$-Laplacian regularization has attracted a lot of
attention due to its flexibility in modeling higher-order relationships in
data. This paper focuses on its fast numerical implementation, which is
challenging due to the non-differentiability of the objective function and the
non-uniqueness of the minimizer. We derive a hypergraph $p$-Laplacian equation
from the subdifferential of the $p$-Laplacian regularization. A simplified
equation that is mathematically well-posed and computationally efficient is
proposed as an alternative. Numerical experiments verify that the simplified
$p$-Laplacian equation suppresses spiky solutions in data interpolation and
improves classification accuracy in semi-supervised learning. The remarkably
low computational cost enables further applications.
[COMMENTS]
16 pages
[LINK]
http://arxiv.org/abs/2411.12601v1
[DATE]
2024-11-20 00:05:35+08:00
[CATEGORIES]
cs.LG
Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent
[AUTHORS]
Leonidas Askianakis
[ABSTRACT]
Grasping by a robot in unstructured environments is deemed a critical
challenge because of the requirement for effective adaptation to a wide
variation in object geometries, material properties, and other environmental
factors. In this paper, we propose a novel framework for robotic grasping based
on the idea of compressing high-dimensional target and gripper features in a
common latent space using a set of autoencoders. Our approach simplifies
grasping by using three autoencoders dedicated to the target, the gripper, and
a third one that fuses their latent representations. This allows the RL agent
to achieve higher learning rates at the initial stages of exploration of a new
environment, as well as at non-zero shot grasp attempts. The agent explores the
latent space of the third autoencoder for better quality grasp without explicit
reconstruction of objects. By implementing the PoWER algorithm into the RL
training process, updates on the agent’s policy will be made through the
perturbation in the reward-weighted latent space. The successful exploration
efficiently constrains both position and pose integrity for feasible executions
of grasps. We evaluate our system on a diverse set of objects, demonstrating
the high success rate in grasping with minimum computational overhead. We found
that our approach enhances the adaptation of the RL agent by more than 35% in
simulation experiments.
[COMMENTS]
Submitted for review at IEEE ICRA 2025
[LINK]
http://arxiv.org/abs/2411.08566v2
[DATE]
2024-11-20 00:03:58+08:00
[CATEGORIES]
cs.LG
GNNAS-Dock: Budget Aware Algorithm Selection with Graph Neural Networks for Molecular Docking
[AUTHORS]
Yiliang Yuan, Mustafa Misir
[ABSTRACT]
Molecular docking is a major element in drug discovery and design. It enables
the prediction of ligand-protein interactions by simulating the binding of
small molecules to proteins. Despite the availability of numerous docking
algorithms, no single algorithm consistently outperforms the others
across a diverse set of docking scenarios. This paper introduces GNNAS-Dock, a
novel Graph Neural Network (GNN)-based automated algorithm selection system for
molecular docking in blind docking situations. GNNs are accommodated to process
the complex structural data of both ligands and proteins. They benefit from the
inherent graph-like properties to predict the performance of various docking
algorithms under different conditions. The present study pursues two main
objectives: 1) predict the performance of each candidate docking algorithm, in
terms of Root Mean Square Deviation (RMSD), thereby identifying the most
accurate method for specific scenarios; and 2) choose the best computationally
efficient docking algorithm for each docking case, aiming to reduce the time
required for docking while maintaining high accuracy. We validate our approach
on the PDBBind 2020 refined set, which contains about 5,300 pairs of protein-ligand
complexes.
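For reference, the RMSD metric named in the first objective can be computed as follows. This is a minimal sketch, assuming the predicted and reference poses list the same atoms in the same order and ignoring alignment and symmetry handling; the function name `rmsd` and the toy coordinates are illustrative and not taken from GNNAS-Dock.

```python
import math

def rmsd(pred, ref):
    """Root Mean Square Deviation between two equally ordered coordinate lists.

    pred, ref -- sequences of (x, y, z) tuples, one per atom.
    """
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and the same length")
    # Sum of squared coordinate differences over all atoms and all axes.
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pred))

# Toy example: a single atom displaced by the vector (3, 4, 0).
example = rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])
```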
[LINK]
http://arxiv.org/abs/2411.12597v1
[DATE]
2024-11-20 00:01:54+08:00
[CATEGORIES]
cs.LG
Large Language Models for Combinatorial Optimization of Design Structure Matrix
[AUTHORS]
Shuo Jiang, Min Xie, Jianxi Luo
[ABSTRACT]
Combinatorial optimization (CO) is essential for improving efficiency and
performance in engineering applications. As complexity increases with larger
problem sizes and more intricate dependencies, identifying the optimal solution
becomes challenging. When it comes to real-world engineering problems,
algorithms based on pure mathematical reasoning are limited and incapable of
capturing the contextual nuances necessary for optimization. This study explores
the potential of Large Language Models (LLMs) in solving engineering CO
problems by leveraging their reasoning power and contextual knowledge. We
propose a novel LLM-based framework that integrates network topology and domain
knowledge to optimize the sequencing of Design Structure Matrix (DSM), a common
CO problem. Our experiments on various DSM cases demonstrate that the proposed
method achieves faster convergence and higher solution quality than benchmark
methods. Moreover, results show that incorporating contextual domain knowledge
significantly improves performance regardless of the choice of LLM. These findings
highlight the potential of LLMs in tackling complex real-world CO problems by
combining semantic and mathematical reasoning. This approach paves the way for
a new paradigm in real-world combinatorial optimization.
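To make the DSM sequencing problem concrete, the sketch below scores a candidate ordering by counting feedback dependencies, a commonly used objective for DSM sequencing. The abstract does not state the exact objective or matrix convention used in the paper, so both are assumptions here: `dsm[i][j] == 1` is taken to mean that element i depends on element j, and `feedback_count` is a hypothetical helper name.

```python
def feedback_count(dsm, order):
    """Count dependencies that point 'backwards' under a given ordering.

    dsm   -- square 0/1 matrix; dsm[i][j] == 1 means element i depends on j
             (assumed convention for this sketch)
    order -- list giving the execution sequence of element indices
    """
    position = {element: idx for idx, element in enumerate(order)}
    n = len(dsm)
    # A feedback mark occurs when i needs j, but j is scheduled after i.
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and dsm[i][j] and position[j] > position[i]
    )

# Toy DSM: element 0 depends on 1, and element 1 depends on 2.
dsm = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
naive = feedback_count(dsm, [0, 1, 2])     # both dependencies are feedback
improved = feedback_count(dsm, [2, 1, 0])  # dependencies are all satisfied
```

An optimizer, whether a heuristic or an LLM-guided search, would propose orderings and keep those with fewer feedback marks.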
[LINK]
http://arxiv.org/abs/2411.12571v1
[DATE]
2024-11-19 23:39:51+08:00
[CATEGORIES]
cs.CL
Plurals: A System for Guiding LLMs Via Simulated Social Ensembles
[AUTHORS]
Joshua Ashkinaze, Emily Fry, Narendra Edara, Eric Gilbert, Ceren Budak
[ABSTRACT]
Recent debates raised concerns that language models may favor certain
viewpoints. But what if the solution is not to aim for a ‘view from nowhere’
but rather to leverage different viewpoints? We introduce Plurals, a system and
Python library for pluralistic AI deliberation. Plurals consists of Agents
(LLMs, optionally with personas) which deliberate within customizable
Structures, with Moderators overseeing deliberation. Plurals is a generator of
simulated social ensembles. Plurals integrates with government datasets to
create nationally representative personas, includes deliberation templates
inspired by deliberative democracy, and allows users to customize both
information-sharing structures and deliberation behavior within Structures. Six
case studies demonstrate fidelity to theoretical constructs and efficacy. Three
randomized experiments show simulated focus groups produced output resonant
with an online sample of the relevant audiences (chosen over zero-shot
generation in 75% of trials). Plurals is both a paradigm and a concrete system
for pluralistic AI. The Plurals library is available at
https://github.com/josh-ashkinaze/plurals and will be continually updated.
[LINK]
http://arxiv.org/abs/2409.17213v5
[DATE]
2024-11-19 23:37:57+08:00
[CATEGORIES]
cs.CL
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
[AUTHORS]
Riccardo Grazzi, Julien Siems, Jörg K. H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil
[ABSTRACT]
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and
DeltaNet have emerged as efficient alternatives to Transformers in large
language modeling, offering linear scaling with sequence length and improved
training efficiency. However, LRNNs struggle to perform state-tracking, which
may impair performance in tasks such as code evaluation or tracking a chess
game. Even parity, the simplest state-tracking task, which non-linear RNNs like
LSTM handle effectively, cannot be solved by current LRNNs. Recently, Sarrof et
al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity
stems from restricting the value range of their diagonal state-transition
matrices to $[0, 1]$ and that incorporating negative values can resolve this
issue. We extend this result to non-diagonal LRNNs, which have recently shown
promise in models such as DeltaNet. We prove that finite precision LRNNs with
state-transition matrices having only positive eigenvalues cannot solve parity,
while complex eigenvalues are needed to count modulo $3$. Notably, we also
prove that LRNNs can learn any regular language when their state-transition
matrices are products of identity minus vector outer product matrices, each
with eigenvalues in the range $[-1, 1]$. Our empirical results confirm that
extending the eigenvalue range of models like Mamba and DeltaNet to include
negative values not only enables them to solve parity but consistently improves
their performance on state-tracking tasks. Furthermore, pre-training LRNNs with
an extended eigenvalue range for language modeling achieves comparable
performance and stability while showing promise on code and math data. Our work
enhances the expressivity of modern LRNNs, broadening their applicability
without changing the cost of training or inference.
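The role of negative eigenvalues in parity can be illustrated with a one-dimensional linear recurrence. This is a toy sketch, assuming a scalar state and a hand-set, input-dependent transition value of +1 or -1; the function name `parity_linear_rnn` is hypothetical and the code is not an implementation of Mamba or DeltaNet.

```python
def parity_linear_rnn(bits):
    """Track parity with the linear recurrence h_t = a(x_t) * h_{t-1}.

    The transition a(x_t) is -1 when the input bit is 1 and +1 otherwise,
    so the sign of h flips once per 1-bit. With transitions restricted to
    [0, 1], h could never change sign, and parity could not be read off
    from the sign of the state in this way.
    """
    h = 1.0
    for x in bits:
        a = -1.0 if x == 1 else 1.0  # input-dependent eigenvalue in [-1, 1]
        h = a * h
    # h == +1 means an even number of ones, h == -1 means an odd number.
    return 0 if h > 0 else 1

odd = parity_linear_rnn([1, 0, 1, 1])   # three ones
even = parity_linear_rnn([1, 1, 0, 0])  # two ones
```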
[LINK]
http://arxiv.org/abs/2411.12537v1
[DATE]
2024-11-19 22:35:38+08:00
[CATEGORIES]
cs.LG
cs.CL
Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN
[AUTHORS]
Zhilun Zhou, Jingyang Fan, Yu Liu, Fengli Xu, Depeng Jin, Yong Li
[ABSTRACT]
The fast development of location-based social networks (LBSNs) has led to
significant changes in society, resulting in popular studies of using LBSN data
for socioeconomic prediction, e.g., regional population and commercial activity
estimation. Existing studies design various graphs to model heterogeneous LBSN
data, and further apply graph representation learning methods for socioeconomic
prediction. However, these approaches heavily rely on heuristic ideas and
expertise to extract task-relevant knowledge from diverse data, which may not
be optimal for specific tasks. Additionally, they tend to overlook the inherent
relationships between different indicators, limiting the prediction accuracy.
Motivated by the remarkable abilities of large language models (LLMs) in
commonsense reasoning, embedding, and multi-agent collaboration, in this work,
we synergize LLM agents and knowledge graph for socioeconomic prediction. We
first construct a location-based knowledge graph (LBKG) to integrate
multi-sourced LBSN data. Then we leverage the reasoning power of an LLM agent to
identify relevant meta-paths in the LBKG for each type of socioeconomic
prediction task, and design a semantic-guided attention module for knowledge
fusion with meta-paths. Moreover, we introduce a cross-task communication
mechanism to further enhance performance by enabling knowledge sharing across
tasks at both LLM agent and KG levels. On the one hand, the LLM agents for
different tasks collaborate to generate more diverse and comprehensive
meta-paths. On the other hand, the embeddings from different tasks are
adaptively merged for better socioeconomic prediction. Experiments on two
datasets demonstrate the effectiveness of the synergistic design between LLM
and KG, providing insights for information sharing across socioeconomic
prediction tasks.
[LINK]
http://arxiv.org/abs/2411.00028v2
[DATE]
2024-11-19 22:29:32+08:00
[CATEGORIES]
cs.CL
cs.LG
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
[AUTHORS]
Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao
[ABSTRACT]
Large language models are usually fine-tuned to align with human preferences.
However, fine-tuning a large language model can be challenging. In this work,
we introduce $\textit{weak-to-strong search}$, framing the alignment of a large
language model as a test-time greedy search to maximize the log-probability
difference between small tuned and untuned models while sampling from the
frozen large model. This method serves both as (1) a compute-efficient model
up-scaling strategy that avoids directly tuning the large model and as (2) an
instance of weak-to-strong generalization that enhances a strong model with
weak test-time guidance. Empirically, we demonstrate the flexibility of
weak-to-strong search across different tasks. In controlled-sentiment
generation and summarization, we use tuned and untuned $\texttt{gpt2}$s to
improve the alignment of large models without additional training. Crucially,
in a more difficult instruction-following benchmark, AlpacaEval 2.0, we show
that reusing off-the-shelf small models (e.g., $\texttt{zephyr-7b-beta}$ and
its untuned version) can improve the length-controlled win rates of both
white-box and black-box large models against $\texttt{gpt-4-turbo}$ (e.g.,
$34.4\% \rightarrow 37.9\%$ for $\texttt{Llama-3-70B-Instruct}$ and $16.0\%
\rightarrow 20.1\%$ for $\texttt{gpt-3.5-turbo-instruct}$), despite the small
models’ low win rates $\approx 10.0\%$.
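The selection criterion described in the abstract, maximizing the log-probability difference between a small tuned and a small untuned model over candidates sampled from the large model, can be sketched for a single selection step. Everything below is illustrative: `weak_to_strong_select` is a hypothetical name, the candidates stand in for samples from a frozen large model, and the log-probabilities are made-up toy numbers rather than outputs of real models.

```python
def weak_to_strong_select(candidates, logp_tuned, logp_untuned):
    """Pick the candidate with the largest tuned-minus-untuned log-probability.

    candidates   -- continuations assumed to be sampled from the large model
    logp_tuned   -- function: candidate -> log-prob under the small tuned model
    logp_untuned -- function: candidate -> log-prob under the small untuned model
    """
    return max(candidates, key=lambda c: logp_tuned(c) - logp_untuned(c))

# Toy log-probabilities standing in for two small language models.
tuned_scores = {"great movie": -1.0, "bad movie": -3.0}
untuned_scores = {"great movie": -2.0, "bad movie": -1.5}

best = weak_to_strong_select(
    ["great movie", "bad movie"],
    tuned_scores.__getitem__,
    untuned_scores.__getitem__,
)
```

In a full search this step would be repeated over successive chunks of text, extending the chosen continuation each time; the chunk size and search width are details this sketch does not model.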
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2405.19262v3
[DATE]
2024-11-19 21:27:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Bias Free Sentiment Analysis
[AUTHORS]
Hubert Plisiecki
[ABSTRACT]
This paper introduces the Semantic Propagation Graph Neural Network (SProp
GNN), a machine learning sentiment analysis (SA) architecture that relies
exclusively on syntactic structures and word-level emotional cues to predict
emotions in text. By semantically blinding the model to information about
specific words, it is robust to biases such as political or gender bias that
have been plaguing previous machine learning-based SA systems. The SProp GNN
shows performance superior to lexicon-based alternatives such as VADER and
EmoAtlas on two different prediction tasks, and across two languages.
Additionally, it approaches the accuracy of transformer-based models while
significantly reducing bias in emotion prediction tasks. By offering improved
explainability and reducing bias, the SProp GNN bridges the methodological gap
between interpretable lexicon approaches and powerful, yet often opaque, deep
learning models, offering a robust tool for fair and effective emotion analysis
in understanding human behavior through text.
[LINK]
http://arxiv.org/abs/2411.12493v1
[DATE]
2024-11-19 21:23:53+08:00
[CATEGORIES]
cs.CL
Regular-pattern-sensitive CRFs for Distant Label Interactions
[AUTHORS]
Sean Papay, Roman Klinger, Sebastian Pado
[ABSTRACT]
Linear-chain conditional random fields (CRFs) are a common model component
for sequence labeling tasks when modeling the interactions between different
labels is important. However, the Markov assumption limits linear-chain CRFs to
only directly modeling interactions between adjacent labels. Weighted
finite-state transducers (FSTs) are a related approach which can be made to
model distant label-label interactions, but exact label inference is
intractable for these models in the general case, and the task of selecting an
appropriate automaton structure for the desired interaction types poses a
practical challenge. In this work, we present regular-pattern-sensitive CRFs
(RPCRFs), a method of enriching standard linear-chain CRFs with the ability to
learn long-distance label interactions which occur in user-specified patterns.
This approach allows users to write regular-expression label patterns concisely
specifying which types of interactions the model should take into account,
allowing the model to learn from data whether and in which contexts these
patterns occur. The result can be interpreted alternatively as a CRF augmented
with additional, non-local potentials, or as a finite-state transducer whose
structure is defined by a set of easily-interpretable patterns. Critically,
unlike the general case for FSTs (and for non-chain CRFs), exact training and
inference are tractable for many pattern sets. In this work, we detail how an
RPCRF can be automatically constructed from a set of user-specified patterns,
and demonstrate the model’s effectiveness on synthetic data, showing how
different types of patterns can capture different nonlocal dependency
structures in label sequences.
[LINK]
http://arxiv.org/abs/2411.12484v1
[DATE]
2024-11-19 21:08:03+08:00
[CATEGORIES]
cs.LG
cs.CL
NMT-Obfuscator Attack: Ignore a sentence in translation with only one word
[AUTHORS]
Sahar Sadrizadeh, César Descalzo, Ljiljana Dolamic, Pascal Frossard
[ABSTRACT]
Neural Machine Translation systems are used in diverse applications due to
their impressive performance. However, recent studies have shown that these
systems are vulnerable to carefully crafted small perturbations to their
inputs, known as adversarial attacks. In this paper, we propose a new type of
adversarial attack against NMT models. In this attack, we find a word to be
added between two sentences such that the second sentence is ignored and not
translated by the NMT model. The word added between the two sentences is such
that the whole adversarial text is natural in the source language. This type of
attack can be harmful in practical scenarios since the attacker can hide
malicious information in the automatic translation made by the target NMT
model. Our experiments show that different NMT models and translation tasks are
vulnerable to this type of attack. Our attack can successfully force the NMT
models to ignore the second part of the input in the translation in more than
50% of all cases while being able to maintain low perplexity for the whole
input.
[LINK]
http://arxiv.org/abs/2411.12473v1
[DATE]
2024-11-19 20:55:22+08:00
[CATEGORIES]
cs.CL
Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation
[AUTHORS]
Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
[ABSTRACT]
With the development and proliferation of large, complex, black-box models
for solving many natural language processing (NLP) tasks, there is also an
increasing necessity of methods to stress-test these models and provide some
degree of interpretability or explainability. While counterfactual examples are
useful in this regard, automated generation of counterfactuals is a data- and
resource-intensive process. Such methods depend on models such as pre-trained
language models that are then fine-tuned on auxiliary, often task-specific
datasets, that may be infeasible to build in practice, especially for new tasks
and data domains. Therefore, in this work we explore the possibility of
leveraging large language models (LLMs) for zero-shot counterfactual generation
in order to stress-test NLP models. We propose a structured pipeline to
facilitate this generation, and we hypothesize that the instruction-following
and textual understanding capabilities of recent LLMs can be effectively
leveraged for generating high quality counterfactuals in a zero-shot manner,
without requiring any training or fine-tuning. Through comprehensive
experiments on a variety of proprietary and open-source LLMs, along with
various downstream tasks in NLP, we explore the efficacy of LLMs as zero-shot
counterfactual generators in evaluating and explaining black-box NLP models.
[COMMENTS]
Longer version of short paper accepted at IEEE BigData 2024 (Main
Track)
[LINK]
http://arxiv.org/abs/2405.04793v2
[DATE]
2024-11-19 18:59:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Child Speech Recognition in Human-Robot Interaction: Problem Solved?
[AUTHORS]
Ruben Janssens, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, Tony Belpaeme
[ABSTRACT]
Automated Speech Recognition shows superhuman performance for adult English
speech on a range of benchmarks, but disappoints when fed children’s speech.
This has long stood in the way of child-robot interaction. Recent evolutions in
data-driven speech recognition, including the availability of Transformer
architectures and unprecedented volumes of training data, might mean a
breakthrough for child speech recognition and social robot applications aimed
at children. We revisit a study on child speech recognition from 2017 and show
that indeed performance has increased, with newcomer OpenAI Whisper doing
markedly better than leading commercial cloud services. Performance improves
even more in highly structured interactions when priming models with specific
phrases. While transcription is not perfect yet, the best model recognises
60.3% of sentences correctly barring small grammatical differences, with
sub-second transcription time running on a local GPU, showing potential for
usable autonomous child-robot speech interactions.
[COMMENTS]
Submitted to 2024 International Conference on Social Robotics
[LINK]
http://arxiv.org/abs/2404.17394v2
[DATE]
2024-11-19 18:27:37+08:00
[CATEGORIES]
cs.CL
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
[AUTHORS]
Ishneet Sukhvinder Singh, Ritvik Aggarwal, Ibrahim Allahverdiyev, Muhammad Taha, Aslihan Akalin, Kevin Zhu, Sean O’Brien
[ABSTRACT]
Retrieval-Augmented Generation (RAG) systems using large language models
(LLMs) often generate inaccurate responses due to the retrieval of irrelevant
or loosely related information. Existing methods, which operate at the document
level, fail to effectively filter out such content. We propose LLM-driven chunk
filtering, ChunkRAG, a framework that enhances RAG systems by evaluating and
filtering retrieved information at the chunk level. Our approach employs
semantic chunking to divide documents into coherent sections and utilizes
LLM-based relevance scoring to assess each chunk’s alignment with the user’s
query. By filtering out less pertinent chunks before the generation phase, we
significantly reduce hallucinations and improve factual accuracy. Experiments
show that our method outperforms existing RAG models, achieving higher accuracy
on tasks requiring precise information retrieval. This advancement enhances the
reliability of RAG systems, making them particularly beneficial for
applications like fact-checking and multi-hop reasoning.
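The chunk-level filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `filter_chunks`, `overlap_score`, and the 0.5 threshold are hypothetical names and values, and the word-overlap scorer is only a stand-in for the LLM-based relevance scoring the abstract mentions.

```python
import re
from typing import Callable, List


def filter_chunks(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep only chunks whose relevance score reaches the threshold."""
    return [chunk for chunk in chunks if score_fn(query, chunk) >= threshold]


def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in scorer: share of query words found in the chunk."""
    query_words = set(re.findall(r"\w+", query.lower()))
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)
```

In a real pipeline the scorer would be replaced by a call to a language model, and only the surviving chunks would be passed on to the generation phase.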
[LINK]
http://arxiv.org/abs/2410.19572v4
[DATE]
2024-11-19 18:00:41+08:00
[CATEGORIES]
cs.CL
RedPajama: an Open Dataset for Training Large Language Models
[AUTHORS]
Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang
[ABSTRACT]
Large language models are increasingly becoming a cornerstone technology in
artificial intelligence, the sciences, and society as a whole, yet the optimal
strategies for dataset composition and filtering remain largely elusive. Many
of the top-performing models lack transparency in their dataset curation and
model development processes, posing an obstacle to the development of fully
open language models. In this paper, we identify three core data-related
challenges that must be addressed to advance open-source language models. These
include (1) transparency in model development, including the data curation
process, (2) access to large quantities of high-quality data, and (3)
availability of artifacts and metadata for dataset curation and analysis. To
address these challenges, we release RedPajama-V1, an open reproduction of the
LLaMA training dataset. In addition, we release RedPajama-V2, a massive
web-only dataset consisting of raw, unfiltered text data together with quality
signals and metadata. Together, the RedPajama datasets comprise over 100
trillion tokens spanning multiple domains and, with their quality signals,
facilitate the filtering of data, aiming to inspire the development of numerous
new datasets. To date, these datasets have already been used in the training of
strong language models used in production, such as Snowflake Arctic,
Salesforce’s XGen and AI2’s OLMo. To provide insight into the quality of
RedPajama, we present a series of analyses and ablation studies with
decoder-only language models with up to 1.6B parameters. Our findings
demonstrate how quality signals for web data can be effectively leveraged to
curate high-quality subsets of the dataset, underscoring the potential of
RedPajama to advance the development of transparent and high-performing
language models at scale.
[COMMENTS]
38th Conference on Neural Information Processing Systems (NeurIPS
2024) Track on Datasets and Benchmarks
[LINK]
http://arxiv.org/abs/2411.12372v1
[DATE]
2024-11-19 17:35:28+08:00
[CATEGORIES]
cs.CL
cs.LG
Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
[AUTHORS]
Ming Li, Jike Zhong, Chenxin Li, Liuzhuozheng Li, Nie Lin, Masashi Sugiyama
[COMMENTS]
EMNLP 2024 Main Conference
[LINK]
http://arxiv.org/abs/2409.16718v2
[DATE]
2024-11-19 17:27:37+08:00
[CATEGORIES]
cs.CL
cs.LG
Re-Reading Improves Reasoning in Large Language Models
[AUTHORS]
Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou, Shuai Ma
[COMMENTS]
EMNLP 2024 Main
[LINK]
http://arxiv.org/abs/2309.06275v4
[DATE]
2024-11-19 17:06:33+08:00
[CATEGORIES]
cs.CL
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
[AUTHORS]
Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler
[ABSTRACT]
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language
Models (LLMs) by enabling the retrieval of documents into the LLM context to
provide more accurate and relevant responses. Existing RAG solutions do not
focus on queries that may require fetching multiple documents with
substantially different contents. Such queries occur frequently, but are
challenging because the embeddings of these documents may be distant in the
embedding space, making it hard to retrieve them all. This paper introduces
Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a
simple yet powerful idea: leveraging activations of Transformer’s multi-head
attention layer, instead of the decoder layer, as keys for fetching
multi-aspect documents. The driving motivation is that different attention
heads can learn to capture different data aspects. Harnessing the corresponding
activations results in embeddings that represent various facets of data items
and queries, improving the retrieval accuracy for complex queries. We provide
an evaluation methodology and metrics, multi-aspect datasets that we release
online, and real-world use cases to demonstrate MRAG’s effectiveness, showing
improvements of up to 20% in relevance over standard RAG baselines. MRAG can be
seamlessly integrated with existing RAG frameworks and benchmarking tools like
RAGAS as well as different classes of data stores.
[LINK]
http://arxiv.org/abs/2406.05085v2
[DATE]
2024-11-19 16:46:34+08:00
[CATEGORIES]
cs.CL
Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration
[AUTHORS]
Kangxi Wu, Liang Pang, Huawei Shen, Xueqi Cheng
[COMMENTS]
Accepted to the EMNLP 2024 main
[LINK]
http://arxiv.org/abs/2410.01285v2
[DATE]
2024-11-19 16:08:38+08:00
[CATEGORIES]
cs.CL
Divide-or-Conquer? Which Part Should You Distill Your LLM?
[AUTHORS]
Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
[COMMENTS]
Findings of the Association for Computational Linguistics: EMNLP 2024
[LINK]
http://arxiv.org/abs/2402.15000v3
[DATE]
2024-11-19 15:46:16+08:00
[CATEGORIES]
cs.CL
cs.LG
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
[AUTHORS]
Dongyoung Go, Taesun Whang, Chanhee Lee, Hwayeon Kim, Sunghoon Park, Seunghwan Ji, Dongchan Kim, Young-Bum Kim
[ABSTRACT]
The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large
Language Models (MLLMs) has expanded the scope of multimodal query resolution.
However, current systems struggle with intent understanding, information
retrieval, and safety filtering, limiting their effectiveness. This paper
introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a
novel multimodal search pipeline that addresses these challenges through a
multi-stage framework comprising image context enrichment, intent refinement,
contextual query generation, external API integration, and relevance-based
filtering. CUE-M incorporates a robust safety framework combining image-based,
text-based, and multimodal classifiers, dynamically adapting to instance- and
category-specific risks. Evaluations on a multimodal Q&A dataset and a public
safety benchmark demonstrate that CUE-M outperforms baselines in accuracy,
knowledge integration, and safety, advancing the capabilities of multimodal
retrieval systems.
[COMMENTS]
Preprint. Under review
[LINK]
http://arxiv.org/abs/2411.12287v1
[DATE]
2024-11-19 15:16:48+08:00
[CATEGORIES]
cs.CL
A Review on Generative AI Models for Synthetic Medical Text, Time Series, and Longitudinal Data
[AUTHORS]
Mohammad Loni, Fatemeh Poursalim, Mehdi Asadi, Arash Gharehbaghi
[ABSTRACT]
This paper presents the results of a novel scoping review on the practical
models for generating three different types of synthetic health records (SHRs):
medical text, time series, and longitudinal data. The innovative aspects of the
review, which incorporate study objectives, data modality, and research
methodology of the reviewed studies, uncover the importance and the scope of
the topic for the digital medicine context. In total, 52 publications met the
eligibility criteria for generating medical time series (22), longitudinal data
(17), and medical text (13). Privacy preservation was found to be the main
research objective of the studied papers, along with class imbalance, data
scarcity, and data imputation as the other objectives. The adversarial
network-based, probabilistic, and large language models exhibited superiority
for generating synthetic longitudinal data, time series, and medical texts,
respectively. Finding a reliable performance measure to quantify SHR
re-identification risk is the major research gap of the topic.
[COMMENTS]
27 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.12274v1
[DATE]
2024-11-19 14:53:54+08:00
[CATEGORIES]
cs.LG
cs.CL
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service
[AUTHORS]
Raphael Merx, Hanna Suominen, Adérito José Guterres Correia, Trevor Cohn, Ekaterina Vylomova
[ABSTRACT]
The impact of machine translation (MT) on low-resource languages remains
poorly understood. In particular, observational studies of actual usage
patterns are scarce. Such studies could provide valuable insights into user
needs and behaviours, complementing survey-based methods. Here we present an
observational analysis of real-world MT usage for Tetun, the lingua franca of
Timor-Leste, using server logs from a widely-used MT service with over 70,000
monthly active users. Our analysis of 100,000 translation requests reveals
patterns that challenge assumptions based on existing corpora. We find that
users, many of them students on mobile devices, typically translate short texts
into Tetun across diverse domains including science, healthcare, and daily
life. This contrasts sharply with available Tetun corpora, which are dominated
by news articles covering government and social issues. Our results suggest
that MT systems for languages like Tetun should prioritise translating into the
low-resource language, handling brief inputs effectively, and covering a wide
range of domains relevant to educational contexts. More broadly, this study
demonstrates how observational analysis can inform low-resource language
technology development, by grounding research in practical community needs.
[LINK]
http://arxiv.org/abs/2411.12262v1
[DATE]
2024-11-19 14:21:51+08:00
[CATEGORIES]
cs.CL
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages
[AUTHORS]
S. Tamang, D. J. Bora
[ABSTRACT]
Large Language Models (LLMs) based on transformer architectures have
revolutionized a variety of domains, with tokenization playing a pivotal role
in their pre-processing and fine-tuning stages. In multilingual models,
particularly those tailored for Indic languages, effective tokenization is
crucial for optimizing performance. This paper presents a comprehensive
evaluation of tokenizers used by 12 LLMs across all 22 official languages of
India, with a focus on comparing the efficiency of their tokenization
processes. We employed the Normalized Sequence Length (NSL) as a key metric in
our analysis. Our findings reveal that the SUTRA tokenizer outperforms all
other models, including several Indic-specific models, excelling in 14
languages. Notable insights include the SUTRA tokenizer’s superior handling of
Indic languages, GPT-4o’s advancement over its predecessor GPT-4 in processing
Indian languages, and the limited performance of Project Indus in certain
languages. This study underscores the critical importance of developing
targeted tokenization strategies for multilingual and Indic-centric models,
laying the groundwork for future improvements in tokenizer design to enhance
linguistic coverage and model efficiency.
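The abstract names the NSL metric without defining it. As a rough sketch, assuming NSL is the average ratio of a tokenizer's encoded length to that of a baseline tokenizer on the same texts (lower meaning more compact encodings), it could be computed as below. The two toy tokenizers are hypothetical stand-ins, not any of the evaluated models.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def normalized_sequence_length(
    tokenize: Tokenizer, baseline: Tokenizer, texts: List[str]
) -> float:
    """Average per-text ratio of token counts (candidate / baseline)."""
    ratios = []
    for text in texts:
        base_len = len(baseline(text))
        if base_len == 0:
            continue  # skip texts the baseline encodes to nothing
        ratios.append(len(tokenize(text)) / base_len)
    if not ratios:
        raise ValueError("no text produced a non-empty baseline encoding")
    return sum(ratios) / len(ratios)


# Hypothetical toy tokenizers, for illustration only.
def char_tokenizer(text: str) -> List[str]:
    return list(text)


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()
```

Under this assumed definition, a value below 1 means the candidate tokenizer produces shorter sequences than the baseline on average.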
[LINK]
http://arxiv.org/abs/2411.12240v1
[DATE]
2024-11-19 13:37:17+08:00
[CATEGORIES]
cs.CL
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
[AUTHORS]
Zongmeng Zhang, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li
[COMMENTS]
Findings of the Association for Computational Linguistics: EMNLP 2024
[LINK]
http://arxiv.org/abs/2411.12235v1
[DATE]
2024-11-19 13:19:53+08:00
[CATEGORIES]
cs.CL
Refusal in LLMs is an Affine Function
[AUTHORS]
Thomas Marshall, Adam Scherlis, Nora Belrose
[ABSTRACT]
We propose affine concept editing (ACE) as an approach for steering language
models’ behavior by intervening directly in activations. We begin with an
affine decomposition of model activation vectors and show that prior methods
for steering model behavior correspond to subsets of terms of this
decomposition. We then provide a derivation of ACE and use it to control
refusal behavior on ten different models, including Llama 3 70B. ACE combines
affine subspace projection and activation addition to reliably control the
model’s refusal responses across prompt types. We evaluate the results using
LLM-based scoring on a collection of harmful and harmless prompts. Our
experiments demonstrate that ACE consistently achieves more precise control
over model behavior than existing methods and generalizes to models where
directional ablation via affine subspace projection alone produces incoherent
outputs. Code for reproducing our results is available at
https://github.com/EleutherAI/steering-llama3 .
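One way to read "affine subspace projection plus activation addition" is sketched below in plain Python. This is an assumption-laden illustration rather than the authors' code: the function name, the choice of a single concept direction `r`, the reference point `mu` (for example, a mean activation on harmless prompts), and the strength `alpha` are all hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def affine_concept_edit(
    v: Sequence[float],
    r: Sequence[float],
    mu: Sequence[float],
    alpha: float,
) -> List[float]:
    """Set v's component along r, measured from mu, to alpha.

    Step 1 (affine projection): remove the part of (v - mu) along r.
    Step 2 (activation addition): add alpha times the unit direction.
    """
    norm = dot(r, r) ** 0.5
    if norm == 0:
        raise ValueError("direction r must be non-zero")
    u = [x / norm for x in r]  # unit concept direction
    coeff = dot([x - m for x, m in zip(v, mu)], u)
    return [x - coeff * ui + alpha * ui for x, ui in zip(v, u)]
```

With `mu` set to the zero vector and `alpha` set to 0, this sketch reduces to plain directional ablation, which is the case the abstract says can produce incoherent outputs on some models.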
[COMMENTS]
added plots for results from additional models
[LINK]
http://arxiv.org/abs/2411.09003v2
[DATE]
2024-11-19 12:53:47+08:00
[CATEGORIES]
cs.LG
cs.CL
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
[AUTHORS]
Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua
[ABSTRACT]
The recent advancements in large language models (LLMs) and pre-trained
vision models have accelerated the development of vision-language large models
(VLLMs), enhancing the interaction between visual and linguistic modalities.
Despite their notable success across various domains, VLLMs face challenges in
modality alignment, which can lead to issues like hallucinations and unsafe
content generation. Current alignment techniques often rely on coarse feedback
and external datasets, limiting scalability and performance. In this paper, we
propose FiSAO (Fine-Grained Self-Alignment Optimization), a novel
self-alignment method that utilizes the model’s own visual encoder as a
fine-grained verifier to improve vision-language alignment without the need for
additional data. By leveraging token-level feedback from the vision encoder,
FiSAO significantly improves vision-language alignment, even surpassing
traditional preference tuning methods that require additional data. Through
both theoretical analysis and experimental validation, we demonstrate that
FiSAO effectively addresses the misalignment problem in VLLMs, marking the
first instance of token-level rewards being applied to such models.
[COMMENTS]
23 pages
[LINK]
http://arxiv.org/abs/2410.14148v3
[DATE]
2024-11-19 11:08:34+08:00
[CATEGORIES]
cs.CL
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
[AUTHORS]
Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua
[ABSTRACT]
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong
reasoning abilities across multiple modalities, achieving significant
breakthroughs in various real-world applications. Despite this great success,
the safety guardrail of LVLMs may not cover the unforeseen domains introduced
by the visual modality. Existing studies primarily focus on eliciting LVLMs to
generate harmful responses via carefully crafted image-based jailbreaks
designed to bypass alignment defenses. In this study, we reveal that a safe
image can be exploited to achieve the same jailbreak consequence when combined
with additional safe images and prompts. This stems from two fundamental
properties of LVLMs: universal reasoning capabilities and safety snowball
effect. Building on these insights, we propose Safety Snowball Agent (SSA), a
novel agent-based framework leveraging agents’ autonomous and tool-using
abilities to jailbreak LVLMs. SSA operates through two principal stages: (1)
initial response generation, where tools generate or retrieve jailbreak images
based on potential harmful intents, and (2) harmful snowballing, where refined
subsequent prompts induce progressively harmful outputs. Our experiments
demonstrate that SSA can use nearly any image to induce LVLMs to produce
unsafe content, achieving high jailbreak success rates against the latest
LVLMs. Unlike prior works that exploit alignment flaws, SSA leverages the
inherent properties of LVLMs, presenting a profound challenge for enforcing
safety in generative multimodal systems. Our code is available at
https://github.com/gzcch/Safety_Snowball_Agent.
[LINK]
http://arxiv.org/abs/2411.11496v2
[DATE]
2024-11-19 11:01:43+08:00
[CATEGORIES]
cs.CL
Multi-LoRA Composition for Image Generation
[AUTHORS]
Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen
[ABSTRACT]
Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models
for the accurate rendition of specific elements like distinct characters or
unique styles in generated images. Nonetheless, existing methods face
challenges in effectively composing multiple LoRAs, especially as the number of
LoRAs to be integrated grows, thus hindering the creation of complex imagery.
In this paper, we study multi-LoRA composition through a decoding-centric
perspective. We present two training-free methods: LoRA Switch, which
alternates between different LoRAs at each denoising step, and LoRA Composite,
which simultaneously incorporates all LoRAs to guide more cohesive image
synthesis. To evaluate the proposed approaches, we establish ComposLoRA, a new
comprehensive testbed as part of this research. It features a diverse range of
LoRA categories with 480 composition sets. Utilizing an evaluation framework
based on GPT-4V, our findings demonstrate a clear improvement in performance
with our methods over the prevalent baseline, particularly evident when
increasing the number of LoRAs in a composition. The code, benchmarks, LoRA
weights, and all evaluation details are available on our project website:
https://maszhongming.github.io/Multi-LoRA-Composition.
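The LoRA Switch idea of alternating between LoRAs across denoising steps can be illustrated with a simple schedule. This is a sketch under assumptions: the abstract only says the method alternates at each step, so the round-robin order and the function name `lora_switch_schedule` are hypothetical, and no diffusion model is involved here.

```python
from typing import List


def lora_switch_schedule(lora_names: List[str], num_steps: int) -> List[str]:
    """Return which LoRA is active at each denoising step (round-robin)."""
    if not lora_names:
        raise ValueError("at least one LoRA is required")
    return [lora_names[step % len(lora_names)] for step in range(num_steps)]
```

For example, with two LoRAs named "character" and "style" and five steps, the schedule cycles character, style, character, style, character.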
[COMMENTS]
Transactions on Machine Learning Research (TMLR), 2024
[LINK]
http://arxiv.org/abs/2402.16843v2
[DATE]
2024-11-19 10:52:45+08:00
[CATEGORIES]
cs.CL
cs.LG
A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation
[AUTHORS]
Jiajing Chen, Shuo Wang, Zhen Qi, Zhenhong Zhang, Chihang Wang, Hongye Zheng
[ABSTRACT]
This research introduces a novel text generation model that combines BERT’s
semantic interpretation strengths with GPT-4’s generative capabilities,
establishing a high standard in generating coherent, contextually accurate
language. Through the combined architecture, the model enhances semantic depth
and maintains smooth, human-like text flow, overcoming limitations seen in
prior models. Experimental benchmarks reveal that BERT-GPT-4 surpasses
traditional models, including GPT-3, T5, BART, Transformer-XL, and CTRL, in key
metrics like Perplexity and BLEU, showcasing its superior natural language
generation performance. By fully utilizing contextual information, this hybrid
model generates text that is not only logically coherent but also aligns
closely with human language patterns, providing an advanced solution for text
generation tasks. This research highlights the potential of integrating
semantic understanding with advanced generative models, contributing new
insights for NLP, and setting a foundation for broader applications of
large-scale generative architectures in areas such as automated writing,
question-answer systems, and adaptive conversational agents.
[LINK]
http://arxiv.org/abs/2411.12157v1
[DATE]
2024-11-19 09:41:56+08:00
[CATEGORIES]
cs.CL
CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements
[AUTHORS]
Zhu Liu, Zhen Hu, Ying Liu
[ABSTRACT]
We present the results of our system for the CoMeDi Shared Task, which
predicts majority votes (Subtask 1) and annotator disagreements (Subtask 2).
Our approach combines model ensemble strategies with MLP-based and
threshold-based methods trained on pretrained language models. Treating
individual models as virtual annotators, we simulate the annotation process by
designing aggregation measures that incorporate continuous similarity scores
and discrete classification labels to capture both majority and disagreement.
Additionally, we employ anisotropy removal techniques to enhance performance.
Experimental results demonstrate the effectiveness of our methods, particularly
for Subtask 2. Notably, we find that continuous similarity scores, even within
the same model, align better with human disagreement patterns compared to
aggregated discrete labels.
[COMMENTS]
8 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.12147v1
[DATE]
2024-11-19 08:50:06+08:00
[CATEGORIES]
cs.CL
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
[AUTHORS]
Tiberiu Musat
[ABSTRACT]
In this paper, I introduce the retrieval problem, a simple reasoning task
that can be solved only by transformers with a minimum number of layers. The
task has an adjustable difficulty that can further increase the required number
of layers to any arbitrary value. I demonstrate that large language models can
solve the task under different prompting formulations without any fine-tuning.
To understand how transformers solve the retrieval problem, I train several
transformers on a minimal formulation. I find that successful learning occurs
only under the presence of an implicit curriculum. I uncover the learned
mechanisms by studying the attention maps in the trained transformers. I also
study the training process, uncovering that attention heads always emerge in a
specific sequence.
[LINK]
http://arxiv.org/abs/2411.12118v1
[DATE]
2024-11-19 07:12:13+08:00
[CATEGORIES]
cs.LG
cs.CL
Mitigating Gender Bias in Contextual Word Embeddings
[AUTHORS]
Navya Yarrabelly, Vinay Damodaran, Feng-Guang Su
[ABSTRACT]
Word embeddings have been shown to produce remarkable results in tackling a
vast majority of NLP-related tasks. Unfortunately, word embeddings also capture
the stereotypical biases that are prevalent in society, affecting the
predictive performance of the embeddings when used in downstream tasks. While
various techniques have been proposed (Bolukbasi et al., 2016; Zhao et al., 2018)
and criticized (Gonen and Goldberg, 2019) for static embeddings, very little work
has focused on mitigating bias in contextual embeddings. In this paper, we
propose a novel objective function for MLM (Masked-Language Modeling) which
largely mitigates the gender bias in contextual embeddings and also preserves
the performance for downstream tasks. Since previous works on measuring bias in
contextual embeddings lack normative reasoning, we also propose novel
evaluation metrics that are straightforward and aligned with our motivations
in debiasing. We also propose new methods for debiasing static embeddings and
provide empirical proof via extensive analysis and experiments, as to why the
main source of bias in static embeddings stems from the presence of
stereotypical names rather than gendered words themselves. All experiments and
embeddings studied are in English, unless otherwise
specified (Bender, 2011).
[LINK]
http://arxiv.org/abs/2411.12074v1
[DATE]
2024-11-19 05:36:44+08:00
[CATEGORIES]
cs.CL
cs.LG
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
[AUTHORS]
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin
[ABSTRACT]
Large language models (LLMs) have shown impressive performance on language
tasks but face challenges when deployed on resource-constrained devices due to
their extensive parameters and reliance on dense multiplications, resulting in
high memory demands and latency bottlenecks. Shift-and-add reparameterization
offers a promising solution by replacing costly multiplications with
hardware-friendly primitives in both the attention and multi-layer perceptron
(MLP) layers of an LLM. However, current reparameterization techniques require
training from scratch or full parameter fine-tuning to restore accuracy, which
is resource-intensive for LLMs. To address this, we propose accelerating
pretrained LLMs through post-training shift-and-add reparameterization,
creating efficient multiplication-free models, dubbed ShiftAddLLM.
Specifically, we quantize each weight matrix into binary matrices paired with
group-wise scaling factors. The associated multiplications are reparameterized
into (1) shifts between activations and scaling factors and (2) queries and
adds according to the binary matrices. To reduce accuracy loss, we present a
multi-objective optimization method to minimize both weight and output
activation reparameterization errors. Additionally, based on varying
sensitivity across layers to reparameterization, we develop an automated bit
allocation strategy to further reduce memory usage and latency. Experiments on
five LLM families and eight tasks consistently validate the effectiveness of
ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points
at comparable or lower latency compared to the most competitive quantized LLMs
at 3 and 2 bits, respectively, and more than 80% memory and energy reductions
over the original LLMs. Codes and models are available at
https://github.com/GATECH-EIC/ShiftAddLLM.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.05981v4
[DATE]
2024-11-19 04:18:32+08:00
[CATEGORIES]
cs.LG
cs.CL
ByteScience: Bridging Unstructured Scientific Literature and Structured Data with Auto Fine-tuned Large Language Model in Token Granularity
[AUTHORS]
Tong Xie, Hanzhi Zhang, Shaozhou Wang, Yuwei Wan, Imran Razzak, Chunyu Kit, Wenjie Zhang, Bram Hoex
[ABSTRACT]
Natural Language Processing (NLP) is widely used to summarize long contexts
into structured information. However, extracting structured knowledge from
scientific text with NLP models remains a challenge because of its
domain-specific nature, complex data preprocessing, and the granularity of
multi-layered device-level information. To address this, we
introduce ByteScience, a non-profit cloud-based auto fine-tuned Large Language
Model (LLM) platform, which is designed to extract structured scientific data
and synthesize new scientific knowledge from vast scientific corpora. The
platform capitalizes on DARWIN, an open-source, fine-tuned LLM dedicated to
natural science. The platform was built on Amazon Web Services (AWS) and
provides an automated, user-friendly workflow for custom model development and
data extraction. The platform achieves remarkable accuracy with only a small
amount of well-annotated articles. This innovative tool streamlines the
transition from the scientific literature to structured knowledge and data and
benefits advances in natural informatics.
[LINK]
http://arxiv.org/abs/2411.12000v1
[DATE]
2024-11-19 03:36:26+08:00
[CATEGORIES]
cs.CL
Bi-Mamba: Towards Accurate 1-Bit State Space Models
[AUTHORS]
Shengkun Tang, Liqun Ma, Haonan Li, Mingjie Sun, Zhiqiang Shen
[ABSTRACT]
The typical selective state-space model (SSM) of Mamba addresses several
limitations of Transformers, such as quadratic computational complexity with
sequence length and significant inference-time memory requirements due to the
key-value cache. However, the growing size of Mamba models continues to pose
training and deployment challenges and raises environmental concerns due to
considerable energy consumption. In this work, we introduce Bi-Mamba, a
scalable and powerful 1-bit Mamba architecture designed for more efficient
large language models with multiple sizes (780M, 1.3B, and 2.7B). Bi-Mamba
models are trained from scratch on the same data volume as regular LLM pretraining, using
an autoregressive distillation loss. Extensive experimental results on language
modeling demonstrate that Bi-Mamba achieves performance comparable to its
full-precision counterparts (e.g., FP16 or BF16) and much better accuracy than
post-training-binarization (PTB) Mamba baselines, while significantly reducing
memory footprint and energy consumption compared to the original Mamba model.
Our study pioneers a new linear computational complexity LLM framework under
low-bit representation and facilitates the future design of specialized
hardware tailored for efficient 1-bit Mamba-based LLMs.
[LINK]
http://arxiv.org/abs/2411.11843v1
[DATE]
2024-11-19 02:59:15+08:00
[CATEGORIES]
cs.CL
Watermark-based Detection and Attribution of AI-Generated Content
[AUTHORS]
Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong
[ABSTRACT]
Several companies have deployed watermark-based detection to identify
AI-generated content. However, attribution–the ability to trace back to the
user of a generative AI (GenAI) service who created a given piece of
AI-generated content–remains largely unexplored despite its growing
importance. In this work, we aim to bridge this gap by conducting the first
systematic study on watermark-based, user-level attribution of AI-generated
content. Our key idea is to assign a unique watermark to each user of the GenAI
service and embed this watermark into the AI-generated content created by that
user. Attribution is then performed by identifying the user whose watermark
best matches the one extracted from the given content. This approach, however,
faces a key challenge: How should watermarks be selected for users to maximize
attribution performance? To address the challenge, we first theoretically
derive lower bounds on detection and attribution performance through rigorous
probabilistic analysis for any given set of user watermarks. Then, we select
watermarks for users to maximize these lower bounds, thereby optimizing
detection and attribution performance. Our theoretical and empirical results
show that watermark-based attribution inherits both the accuracy and
(non-)robustness properties of the underlying watermark. Specifically,
attribution remains highly accurate when the watermarked AI-generated content
is either not post-processed or subjected to common post-processing such as
JPEG compression, as well as black-box adversarial post-processing with limited
query budgets.
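
The attribution step described in this abstract (pick the user whose watermark best matches the extracted one) can be sketched as follows. Bitwise accuracy as the matching score and the detection threshold of 0.75 are assumptions made for illustration; the function names are hypothetical.

```python
def bitwise_accuracy(a, b):
    """Fraction of positions where two equal-length bit strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def attribute(extracted, user_watermarks, threshold=0.75):
    """Return the best-matching user, or None if no match clears the threshold.

    `threshold` plays the role of a detection threshold (assumed value):
    content whose best match is too weak is treated as not watermarked.
    """
    best_user, best_score = None, 0.0
    for user, watermark in user_watermarks.items():
        score = bitwise_accuracy(extracted, watermark)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= threshold else None
```

Under this scoring, watermarks that are far apart in Hamming distance are harder to confuse after post-processing flips some bits, which is the intuition behind selecting user watermarks carefully.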
[LINK]
http://arxiv.org/abs/2404.04254v2
[DATE]
2024-11-19 02:35:06+08:00
[CATEGORIES]
cs.CL
cs.LG
CNMBert: A Model For Hanyu Pinyin Abbreviation to Character Conversion Task
[AUTHORS]
Zishuo Feng, Feng Cao
[ABSTRACT]
The task of converting Hanyu Pinyin abbreviations to Chinese characters
represents a significant branch within the domain of Chinese Spelling
Correction (CSC). This task is typically one of text-length alignment; however,
due to the limited informational content in pinyin abbreviations, achieving
accurate conversion is challenging. In this paper, we propose CNMBert, which
stands for zh-CN Pinyin Multi-mask Bert Model, as a solution to this issue.
CNMBert surpasses few-shot GPT models, achieving a 59.63% MRR on a
10,424-sample Hanyu Pinyin abbreviation test dataset.
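
For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages 1/rank of the correct answer over all test items. The sketch below uses the standard definition and scores an item as 0 when the gold answer is absent from the candidate list. The function name and the example data in the usage note are illustrative only, not the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_candidates, gold_answers):
    """Average of 1/rank of the gold answer; 0 if it is not in the list."""
    total = 0.0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        if gold in candidates:
            total += 1.0 / (candidates.index(gold) + 1)
    return total / len(gold_answers)
```

For example, if the gold expansion is ranked first for one abbreviation, second for another, and missing for a third, the MRR is (1 + 1/2 + 0) / 3 = 0.5.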
[COMMENTS]
9 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.11770v1
[DATE]
2024-11-19 01:50:34+08:00
[CATEGORIES]
cs.CL
Drowning in Documents: Consequences of Scaling Reranker Inference
[AUTHORS]
Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov
[LINK]
http://arxiv.org/abs/2411.11767v1
[DATE]
2024-11-19 01:46:32+08:00
[CATEGORIES]
cs.CL
cs.LG
Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking
[AUTHORS]
German Gritsai, Anastasia Voznyuk, Ildar Khabutdinov, Andrey Grabovoy
[ABSTRACT]
The paper describes a system designed by the Advacheck team to recognise
machine-generated and human-written texts in the monolingual subtask of the GenAI
Detection Task 1 competition. Our system is a multi-task architecture
with a shared Transformer encoder between several classification heads. One head
is responsible for binary classification between human-written and
machine-generated texts, while the other heads are auxiliary multiclass
classifiers for texts of different domains from particular datasets. As
multiclass heads were trained to distinguish the domains presented in the data,
they provide a better understanding of the samples. This approach led us to
achieve first place in the official ranking with an 83.07% macro F1-score on
the test set, surpassing the baseline by 10%. We further study the resulting
system through ablation, error, and representation analyses, finding that
multi-task learning outperforms the single-task mode and that simultaneous
tasks form a cluster structure in the embedding space.
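
The macro F1-score reported in this abstract is the unweighted mean of per-class F1 scores. A minimal sketch of the standard definition follows; it is not the competition's official scorer, and the function name is hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class counts equally, a detector cannot reach a high macro F1 by favouring the majority class.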
[LINK]
http://arxiv.org/abs/2411.11736v1
[DATE]
2024-11-19 01:03:30+08:00
[CATEGORIES]
cs.CL
FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models
[AUTHORS]
Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Kai Chen, Qiang Yang
[ABSTRACT]
By adapting Large Language Models (LLMs) to domain-specific tasks or
enriching them with domain-specific knowledge, we can fully harness the
capabilities of LLMs. Nonetheless, a gap persists in achieving simultaneous
mutual enhancement between the server’s LLM and the downstream clients’ Small
Language Models (SLMs). To address this, we propose FedCoLLM, a novel and
parameter-efficient federated framework designed for co-tuning LLMs and SLMs.
This approach is aimed at adaptively transferring server-side LLMs’ knowledge to
clients’ SLMs while simultaneously enriching the LLMs with domain insights from
the clients. To accomplish this, FedCoLLM utilizes lightweight adapters in
conjunction with SLMs, facilitating knowledge exchange between server and
clients in a manner that respects data privacy while also minimizing
computational and communication overhead. Our evaluation of FedCoLLM, utilizing
various public LLMs and SLMs across a range of NLP text generation tasks,
reveals that the performance of clients’ SLMs experiences notable improvements
with the assistance of the LLMs. Simultaneously, the LLMs enhanced via FedCoLLM
achieve comparable performance to that obtained through direct fine-tuning on
clients’ data.
[LINK]
http://arxiv.org/abs/2411.11707v1
[DATE]
2024-11-19 00:34:58+08:00
[CATEGORIES]
cs.CL
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
[AUTHORS]
Jinhao Jiang, Zhipeng Chen, Yingqian Min, Jie Chen, Xiaoxue Cheng, Jiapeng Wang, Yiru Tang, Haoxiang Sun, Jia Deng, Wayne Xin Zhao, Zheng Liu, Dong Yan, Jian Xie, Zhongyuan Wang, Ji-Rong Wen
[ABSTRACT]
Recently, test-time scaling has garnered significant attention from the
research community, largely due to the substantial advancements of the o1 model
released by OpenAI. By allocating more computational resources during the
inference phase, large language models (LLMs) can extensively explore the
solution space by generating more thought tokens or diverse solutions, thereby
producing more accurate responses. However, developing an o1-like reasoning
approach is challenging, and researchers have been making various attempts to
advance this open area of research. In this paper, we present a preliminary
exploration into enhancing the reasoning abilities of LLMs through
reward-guided tree search algorithms. This framework is implemented by
integrating the policy model, reward model, and search algorithm. It is
primarily constructed around a tree search algorithm, where the policy model
navigates a dynamically expanding tree guided by a specially trained reward
model. We thoroughly explore various design considerations necessary for
implementing this framework and provide a detailed report of the technical
aspects. To assess the effectiveness of our approach, we focus on mathematical
reasoning tasks and conduct extensive evaluations on four challenging datasets,
which show that the framework significantly enhances the reasoning abilities of
LLMs.
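
The interplay of policy model, reward model, and search algorithm described in this abstract can be sketched with a plain best-first search. This is a deliberately simple stand-in, not the report's algorithm: `propose_toy` plays the role of the policy model, `reward_toy` the role of the reward model, and the arithmetic puzzle (reach 10 from 1 using "+1" and "*2") is a hypothetical toy task.

```python
import heapq

def tree_search(root, propose, reward, is_solution, budget=50):
    """Expand the highest-reward node first until a solution or budget runs out."""
    counter = 0  # tie-breaker so the heap never has to compare states
    frontier = [(-reward(root), counter, root, [])]
    while frontier and budget > 0:
        _, _, state, path = heapq.heappop(frontier)
        if is_solution(state):
            return path
        budget -= 1  # one expansion consumes one unit of test-time compute
        for action, child in propose(state):
            counter += 1
            heapq.heappush(frontier, (-reward(child), counter, child, path + [action]))
    return None

TARGET = 10  # toy goal, assumed for illustration

def propose_toy(n):
    """Stand-in for a policy model: propose candidate next steps."""
    return [("+1", n + 1), ("*2", n * 2)]

def reward_toy(n):
    """Stand-in for a reward model: closer to the target scores higher."""
    return -abs(TARGET - n)
```

Calling `tree_search(1, propose_toy, reward_toy, lambda n: n == TARGET)` returns a list of actions leading from 1 to 10. Raising `budget` corresponds to spending more inference-time compute on exploration.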
[COMMENTS]
LLM;Complex Reasoning;Math
[LINK]
http://arxiv.org/abs/2411.11694v1
[DATE]
2024-11-19 00:15:17+08:00
[CATEGORIES]
cs.CL
Learning the Simplicity of Scattering Amplitudes
[AUTHORS]
Clifford Cheung, Aurélien Dersy, Matthew D. Schwartz
[ABSTRACT]
The simplification and reorganization of complex expressions lies at the core
of scientific progress, particularly in theoretical high-energy physics. This
work explores the application of machine learning to a particular facet of this
challenge: the task of simplifying scattering amplitudes expressed in terms of
spinor-helicity variables. We demonstrate that an encoder-decoder transformer
architecture achieves impressive simplification capabilities for expressions
composed of handfuls of terms. Lengthier expressions are implemented in an
additional embedding network, trained using contrastive learning, which
isolates subexpressions that are more likely to simplify. The resulting
framework is capable of reducing expressions with hundreds of terms - a regular
occurrence in quantum field theory calculations - to vastly simpler equivalent
expressions. Starting from lengthy input expressions, our networks can generate
the Parke-Taylor formula for five-point gluon scattering, as well as new
compact expressions for five-point amplitudes involving scalars and gravitons.
An interactive demonstration can be found at
https://spinorhelicity.streamlit.app .
[COMMENTS]
25+15 pages, 9+6 figures, v2: typos correction and extended the
introduction, conclusion, sections 2.2, 2.4 and appendix F
[LINK]
http://arxiv.org/abs/2408.04720v2
[DATE]
2024-11-19 23:57:07+08:00
[CATEGORIES]
cs.LG
Identifying Differential Patient Care Through Inverse Intent Inference
[AUTHORS]
Hyewon Jeong, Siddharth Nayak, Taylor Killian, Sanjat Kanjilal
[ABSTRACT]
Sepsis is a life-threatening condition defined by end-organ dysfunction due
to a dysregulated host response to infection. Although the Surviving Sepsis
Campaign has launched and has been releasing sepsis treatment guidelines to
unify and normalize the care for sepsis patients, it has been reported in
numerous studies that disparities in care exist across the trajectory of
patient stay in the emergency department and intensive care unit. Here, we
apply a number of reinforcement learning techniques including behavioral
cloning, imitation learning, and inverse reinforcement learning, to learn the
optimal policy in the management of septic patient subgroups using expert
demonstrations. Then we estimate the counterfactual optimal policies by
applying the model to another subset of unseen medical populations and identify
the difference in cure by comparing it to the real policy. Our data comes from
the sepsis cohort of MIMIC-IV and the clinical data warehouses of the Mass
General Brigham healthcare system. The ultimate objective of this work is to
use the optimal learned policy function to estimate the counterfactual
treatment policy and identify deviations across sub-populations of interest. We
hope this approach would help us identify any disparities in care and also
changes in cure in response to the publication of national sepsis treatment
guidelines.
[LINK]
http://arxiv.org/abs/2411.07372v2
[DATE]
2024-11-19 23:53:51+08:00
[CATEGORIES]
cs.LG
Combinatorial Logistic Bandits
[AUTHORS]
Xutong Liu, Xiangxiang Dai, Xuchuang Wang, Mohammad Hajiesmaili, John C. S. Lui
[ABSTRACT]
We introduce a novel framework called combinatorial logistic bandits (CLogB),
where in each round, a subset of base arms (called the super arm) is selected,
with the outcome of each base arm being binary and its expectation following a
logistic parametric model. The feedback is governed by a general arm triggering
process. Our study covers CLogB with reward functions satisfying two smoothness
conditions, capturing application scenarios such as online content delivery,
online learning to rank, and dynamic channel allocation. We first propose a
simple yet efficient algorithm, CLogUCB, utilizing a variance-agnostic
exploration bonus. Under the 1-norm triggering probability modulated (TPM)
smoothness condition, CLogUCB achieves a regret bound of
$\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors,
$d$ is the dimension of the feature vector, $\kappa$ represents the
nonlinearity of the logistic model, and $K$ is the maximum number of base arms
a super arm can trigger. This result improves on prior work by a factor of
$\tilde{O}(\sqrt{\kappa})$. We then enhance CLogUCB with a variance-adaptive
version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$
under the same 1-norm TPM condition, improving by another
$\tilde{O}(\sqrt{\kappa})$ factor. VA-CLogUCB shows even greater promise under
the stronger triggering probability and variance modulated (TPVM) condition,
achieving a leading $\tilde{O}(d\sqrt{T})$ regret, thus removing the additional
dependency on the action-size $K$. Furthermore, we enhance the computational
efficiency of VA-CLogUCB by eliminating the nonconvex optimization process when
the context feature map is time-invariant while maintaining the tight
$\tilde{O}(d\sqrt{T})$ regret. Finally, experiments on synthetic and real-world
datasets demonstrate the superior performance of our algorithms compared to
benchmark algorithms.
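
To make the three regret bounds in this abstract concrete, the sketch below evaluates their leading terms for given problem sizes. It drops all logarithmic factors and constants hidden by $\tilde{O}$, so the numbers indicate relative scaling only. The function name and the example values are hypothetical.

```python
import math

def regret_orders(d, kappa, K, T):
    """Leading-order terms of the three bounds, ignoring logs and constants."""
    return {
        "CLogUCB (1-norm TPM)": d * math.sqrt(kappa * K * T),
        "VA-CLogUCB (1-norm TPM)": d * math.sqrt(K * T),
        "VA-CLogUCB (TPVM)": d * math.sqrt(T),
    }
```

For instance, with d = 10, kappa = 100, K = 4, and T = 10000, the three terms are 20000, 2000, and 1000: the variance-adaptive version gains a factor of sqrt(kappa) = 10, and the TPVM condition removes a further sqrt(K) = 2.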
[COMMENTS]
Accepted in ACM SIGMETRICS 2025
[LINK]
http://arxiv.org/abs/2410.17075v2
[DATE]
2024-11-19 23:50:58+08:00
[CATEGORIES]
cs.LG
Can Agents Spontaneously Form a Society? Introducing a Novel Architecture for Generative Multi-Agents to Elicit Social Emergence
[AUTHORS]
H. Zhang, J. Yin, M. Jiang, C. Su
[ABSTRACT]
Generative agents have demonstrated impressive capabilities in specific
tasks, but most of these frameworks focus on independent tasks and lack
attention to social interactions. We introduce a generative agent architecture
called ITCMA-S, which includes a basic framework for individual agents and a
framework called LTRHA that supports social interactions among multi-agents.
This architecture enables agents to identify and filter out behaviors that are
detrimental to social interactions, guiding them to choose more favorable
actions. We designed a sandbox environment to simulate the natural evolution of
social relationships among multiple identity-less agents for experimental
evaluation. The results showed that ITCMA-S performed well on multiple
evaluation indicators, demonstrating its ability to actively explore the
environment, recognize new agents, and acquire new information through
continuous actions and dialogue. Observations show that as agents establish
connections with each other, they spontaneously form cliques with internal
hierarchies around a selected leader and organize collective activities.
[COMMENTS]
13 pages, 8 figures
[LINK]
http://arxiv.org/abs/2409.06750v2
[DATE]
2024-11-19 23:44:30+08:00
[CATEGORIES]
cs.LG
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information
[AUTHORS]
Simone Martino, Domiziano Doria, Chiara Lionello, Matteo Becchi, Giovanni M. Pavan
[ABSTRACT]
Reconstructing the physical complexity of many-body dynamical systems can be
challenging. Starting from the trajectories of their constitutive units (raw
data), typical approaches require selecting appropriate descriptors to convert
them into time-series, which are then analyzed to extract interpretable
information. However, identifying the most effective descriptor is often
non-trivial. Here, we report a data-driven approach to compare the efficiency
of various descriptors in extracting information from noisy trajectories and
translating it into physically relevant insights. As a prototypical system with
non-trivial internal complexity, we analyze molecular dynamics trajectories of
an atomistic system where ice and water coexist in equilibrium near the
solid/liquid transition temperature. We compare general and specific
descriptors often used in aqueous systems: number of neighbors, molecular
velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and
Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from
the fifth neighbor ($d_5$). Using Onion Clustering – an efficient unsupervised
method for single-point time-series analysis – we assess the maximum
extractable information for each descriptor and rank them via a
high-dimensional metric. Our results show that advanced descriptors like SOAP
and LENS outperform classical ones due to higher signal-to-noise ratios.
Nonetheless, even simple descriptors can rival or exceed advanced ones after
local signal denoising. For example, $d_5$, initially among the weakest,
becomes the most effective at resolving the system’s non-local dynamical
complexity after denoising. This work highlights the critical role of noise in
information extraction from molecular trajectories and offers a data-driven
approach to identify optimal descriptors for systems with characteristic
internal complexity.
[COMMENTS]
19 pages, 5 figures + 3 in supporting information (at the bottom of
the manuscript)
[LINK]
http://arxiv.org/abs/2411.12570v1
[DATE]
2024-11-19 23:39:25+08:00
[CATEGORIES]
cs.LG
Approximating Families of Sharp Solutions to Fisher’s Equation with Physics-Informed Neural Networks
[AUTHORS]
Franz M. Rohrhofer, Stefan Posch, Clemens Gößnitzer, Bernhard C. Geiger
[ABSTRACT]
This paper employs physics-informed neural networks (PINNs) to solve Fisher’s
equation, a fundamental reaction-diffusion system with both simplicity and
significance. The focus is on investigating Fisher’s equation under conditions
of large reaction rate coefficients, where solutions exhibit steep traveling
waves that often present challenges for traditional numerical methods. To
address these challenges, a residual weighting scheme is introduced in the
network training to mitigate the difficulties associated with standard PINN
approaches. Additionally, a specialized network architecture designed to
capture traveling wave solutions is explored. The paper also assesses the
ability of PINNs to approximate a family of solutions by generalizing across
multiple reaction rate coefficients. The proposed method demonstrates high
effectiveness in solving Fisher’s equation with large reaction rate
coefficients and shows promise for meshfree solutions of generalized
reaction-diffusion systems.
[COMMENTS]
15 pages, 6 figures
[LINK]
http://arxiv.org/abs/2402.08313v2
[DATE]
2024-11-19 23:29:44+08:00
[CATEGORIES]
cs.LG
Stream-Based Active Learning for Process Monitoring
[AUTHORS]
Christian Capezza, Antonio Lepore, Kamran Paynabar
[ABSTRACT]
Statistical process monitoring (SPM) methods are essential tools in quality
management to check the stability of industrial processes, i.e., to dynamically
classify the process state as in control (IC), under normal operating
conditions, or out of control (OC), otherwise. Traditional SPM methods are
based on unsupervised approaches, which are popular because in most industrial
applications the true OC states of the process are not explicitly known. This
hampered the development of supervised methods that could instead take
advantage of process data containing labels on the true process state, although
they still need improvement in dealing with class imbalance, as OC states are
rare in high-quality processes, and the dynamic recognition of unseen classes,
e.g., the number of possible OC states. This article presents a novel
stream-based active learning strategy for SPM that enhances partially hidden
Markov models to deal with data streams. The ultimate goal is to optimize
labeling resources constrained by a limited budget and dynamically update the
possible OC states. The proposed method’s performance in classifying the true
state of the process is assessed through a simulation and a case study on the
SPM of a resistance spot welding process in the automotive industry, which
motivated this research.
[LINK]
http://arxiv.org/abs/2411.12563v1
[DATE]
2024-11-19 23:27:54+08:00
[CATEGORIES]
cs.LG
Partially Unitary Learning
[AUTHORS]
Mikhail Gennadievich Belov, Vladislav Gennadievich Malyshkin
[ABSTRACT]
The problem of an optimal mapping between Hilbert spaces $IN$ of
$\left|\psi\right\rangle$ and $OUT$ of $\left|\phi\right\rangle$ based on a set
of wavefunction measurements (within a phase) $\psi_l \to \phi_l$, $l=1\dots
M$, is formulated as an optimization problem maximizing the total fidelity
$\sum_{l=1}^{M} \omega^{(l)}
\left|\langle\phi_l|\mathcal{U}|\psi_l\rangle\right|^2$ subject to probability
preservation constraints on $\mathcal{U}$ (partial unitarity). The constructed
operator $\mathcal{U}$ can be considered as an $IN$ to $OUT$ quantum channel;
it is a partially unitary rectangular matrix (an isometry) of dimension
$\dim(OUT) \times \dim(IN)$ transforming operators as $A^{OUT}=\mathcal{U}
A^{IN} \mathcal{U}^{\dagger}$. An iterative algorithm for finding the global
maximum of this optimization problem is developed, and its application to a
number of problems is demonstrated. A software product implementing the
algorithm is available from the authors.
[COMMENTS]
A working algorithm implementing Partially Unitary Learning
arXiv:2212.14810 has been developed and generalized. See arXiv:2407.04406 for
further generalization to density matrix mappings
[LINK]
http://arxiv.org/abs/2405.10263v2
[DATE]
2024-11-19 23:27:27+08:00
[CATEGORIES]
cs.LG
On Size and Hardness Generalization in Unsupervised Learning for the Travelling Salesman Problem
[AUTHORS]
Yimeng Min, Carla P. Gomes
[ABSTRACT]
We study the generalization capability of Unsupervised Learning in solving
the Travelling Salesman Problem (TSP). We use a Graph Neural Network (GNN)
trained with a surrogate loss function to generate an embedding for each node.
We use these embeddings to construct a heat map that indicates the likelihood
of each edge being part of the optimal route. We then apply local search to
generate our final predictions. Our investigation explores how different
training instance sizes, embedding dimensions, and distributions influence the
outcomes of Unsupervised Learning methods. Our results show that training with
larger instance sizes and increasing embedding dimensions can build a more
effective representation, enhancing the model’s ability to solve TSP.
Furthermore, in evaluating generalization across different distributions, we
first determine the hardness of various distributions and explore how different
hardnesses affect the final results. Our findings suggest that models trained
on harder instances exhibit better generalization capabilities, highlighting
the importance of selecting appropriate training instances in solving TSP using
Unsupervised Learning.
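
The final local-search step mentioned in this abstract can be illustrated with 2-opt, a common choice for TSP. Whether the authors use exactly this variant is an assumption, and the sketch starts from an arbitrary tour rather than one decoded from a learned heat map. The function names are hypothetical.

```python
import math

def tour_length(tour, points):
    """Length of the closed tour visiting `points` in the order given by `tour`."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def two_opt(tour, points):
    """Reverse segments of the tour while doing so shortens it."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] replaces two edges with two others.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, points) < tour_length(best, points) - 1e-12:
                    best, improved = candidate, True
    return best
```

On the four corners of a unit square, the self-crossing tour `[0, 2, 1, 3]` is repaired to a tour of length 4. Recomputing the full tour length for every candidate keeps the sketch short but is slow on large instances.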
[LINK]
http://arxiv.org/abs/2403.20212v2
[DATE]
2024-11-19 23:23:29+08:00
[CATEGORIES]
cs.LG
UMGAD: Unsupervised Multiplex Graph Anomaly Detection
[AUTHORS]
Xiang Li, Jianpeng Qi, Zhongying Zhao, Guanjie Zheng, Lei Cao, Junyu Dong, Yanwei Yu
[ABSTRACT]
Graph anomaly detection (GAD) is a critical task in graph machine learning,
with the primary objective of identifying anomalous nodes that deviate
significantly from the majority. This task is widely applied in various
real-world scenarios, including fraud detection and social network analysis.
However, existing GAD methods still face two major challenges: (1) They are
often limited to detecting anomalies in single-type interaction graphs and
struggle with multiple interaction types in multiplex heterogeneous graphs; (2)
In unsupervised scenarios, selecting appropriate anomaly score thresholds
remains a significant challenge for accurate anomaly detection. To address the
above challenges, we propose a novel Unsupervised Multiplex Graph Anomaly
Detection method, named UMGAD. We first learn multi-relational correlations
among nodes in multiplex heterogeneous graphs and capture anomaly information
during node attribute and structure reconstruction through graph-masked
autoencoder (GMAE). Then, to further weaken the influence of noise and
redundant information on abnormal information extraction, we generate
attribute-level and subgraph-level augmented-view graphs respectively, and
perform attribute and structure reconstruction through GMAE. Finally, we learn
to optimize node attributes and structural features through contrastive
learning between original-view and augmented-view graphs to improve the model’s
ability to capture anomalies. Meanwhile, we also propose a new anomaly score
threshold selection strategy, which allows the model to be independent of the
ground truth in real unsupervised scenarios. Extensive experiments on four
datasets show that UMGAD significantly outperforms state-of-the-art
methods, achieving average improvements of 13.48% in AUC and 11.68% in Macro-F1
across all datasets.
[LINK]
http://arxiv.org/abs/2411.12556v1
[DATE]
2024-11-19 23:15:45+08:00
[CATEGORIES]
cs.LG
Machine Learning Algorithms to Assess Site Closure Time Frames for Soil and Groundwater Contamination
[AUTHORS]
Vu-Anh Le, Haruko Murakami Wainwright, Hansell Gonzalez-Raymat, Carol Eddy-Dilek
[ABSTRACT]
Monitored Natural Attenuation (MNA) is gaining prominence as an effective
method for managing soil and groundwater contamination due to its
cost-efficiency and minimal environmental disruption. Despite its benefits, MNA
necessitates extensive groundwater monitoring to ensure that contaminant levels
decrease to meet safety standards. This study expands the capabilities of
PyLEnM, a Python package designed for long-term environmental monitoring, by
incorporating new algorithms to enhance its predictive and analytical
functionalities. We introduce methods to estimate the timeframe required for
contaminants like Sr-90 and I-129 to reach regulatory safety standards using
linear regression and to forecast future contaminant levels with the
Bidirectional Long Short-Term Memory (Bi-LSTM) networks. Additionally, Random
Forest regression is employed to identify factors influencing the time to reach
safety standards. Our methods are illustrated using data from the Savannah
River Site (SRS) F-Area, where preliminary findings reveal a notable downward
trend in contaminant levels, with variability linked to initial concentrations
and groundwater flow dynamics. The Bi-LSTM model effectively predicts
contaminant concentrations for the next four years, demonstrating the potential
of advanced time series analysis to improve MNA strategies and reduce reliance
on manual groundwater sampling. The code, along with its usage instructions,
validation, and requirements, is available at:
https://github.com/csplevuanh/pylenm_extension.
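
The idea of estimating a closure time frame with linear regression can be sketched as follows. The sketch assumes that concentrations decay roughly exponentially, so the line is fitted to log-concentration against time. That modelling choice, the function names, and the data in the usage note are all assumptions made for illustration; none of this is PyLEnM's API.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def years_to_threshold(years, concentrations, threshold):
    """Time at which the fitted log-linear trend crosses `threshold`.

    Returns None when the fitted trend is not decreasing, because the
    threshold is then never reached under this model.
    """
    slope, intercept = fit_line(years, [math.log(c) for c in concentrations])
    if slope >= 0:
        return None
    return (math.log(threshold) - intercept) / slope
```

With synthetic data following 100 * exp(-0.1 * t), a threshold of 10 is reached at t = ln(10) / 0.1, roughly 23 years after the first sample.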
[COMMENTS]
The paper will be withdrawn to fix some work issues with the sections
on Bi-LSTM models
[LINK]
http://arxiv.org/abs/2411.10214v2
[DATE]
2024-11-19 23:09:10+08:00
[CATEGORIES]
cs.LG
S3TU-Net: Structured Convolution and Superpixel Transformer for Lung Nodule Segmentation
[AUTHORS]
Yuke Wu, Xiang Liu, Yunyu Shi, Xinyi Chen, Zhenglei Wang, YuQing Xu, Shuo Hong Wang
[ABSTRACT]
The irregular and challenging characteristics of lung adenocarcinoma nodules
in computed tomography (CT) images complicate staging diagnosis, making
accurate segmentation critical for clinicians to extract detailed lesion
information. In this study, we propose a segmentation model, S3TU-Net, which
integrates multi-dimensional spatial connectors and a superpixel-based visual
transformer. S3TU-Net is built on a multi-view CNN-Transformer hybrid
architecture, incorporating superpixel algorithms, structured weighting, and
spatial shifting techniques to achieve superior segmentation performance. The
model leverages structured convolution blocks (DWF-Conv/D2BR-Conv) to extract
multi-scale local features while mitigating overfitting. To enhance multi-scale
feature fusion, we introduce the S2-MLP Link, integrating spatial shifting and
attention mechanisms at the skip connections. Additionally, the residual-based
superpixel visual transformer (RM-SViT) effectively merges global and local
features by employing sparse correlation learning and multi-branch attention to
capture long-range dependencies, with residual connections enhancing stability
and computational efficiency. Experimental results on the LIDC-IDRI dataset
demonstrate that S3TU-Net achieves a DSC, precision, and IoU of 89.04%, 90.73%,
and 90.70%, respectively. Compared to recent methods, S3TU-Net improves DSC by
4.52% and sensitivity by 3.16%, with other metrics showing an approximate 2%
increase. In addition to comparison and ablation studies, we validated the
generalization ability of our model on the EPDB private dataset, achieving a
DSC of 86.40%.
[LINK]
http://arxiv.org/abs/2411.12547v1
[DATE]
2024-11-19 23:00:18+08:00
[CATEGORIES]
cs.LG
MaIL: Improving Imitation Learning with Mamba
[AUTHORS]
Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann
[ABSTRACT]
This work presents Mamba Imitation Learning (MaIL), a novel imitation
learning (IL) architecture that provides an alternative to state-of-the-art
(SoTA) Transformer-based policies. MaIL leverages Mamba, a state-space model
designed to selectively focus on key features of the data. While Transformers
are highly effective in data-rich environments due to their dense attention
mechanisms, they can struggle with smaller datasets, often leading to
overfitting or suboptimal representation learning. In contrast, Mamba’s
architecture enhances representation learning efficiency by focusing on key
features and reducing model complexity. This approach mitigates overfitting and
enhances generalization, even when working with limited data. Extensive
evaluations on the LIBERO benchmark demonstrate that MaIL consistently
outperforms Transformers on all LIBERO tasks with limited data and matches
their performance when the full dataset is available. Additionally, MaIL’s
effectiveness is validated through its superior performance in three real robot
experiments. Our code is available at https://github.com/ALRhub/MaIL.
[LINK]
http://arxiv.org/abs/2406.08234v2
[DATE]
2024-11-19 22:44:36+08:00
[CATEGORIES]
cs.LG
Multistep Consistency Models
[AUTHORS]
Jonathan Heek, Emiel Hoogeboom, Tim Salimans
[ABSTRACT]
Diffusion models are relatively easy to train but require many steps to
generate samples. Consistency models are far more difficult to train, but
generate samples in a single step.
In this paper we propose Multistep Consistency Models: A unification between
Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that
can interpolate between a consistency model and a diffusion model: a trade-off
between sampling speed and sampling quality. Specifically, a 1-step consistency
model is a conventional consistency model whereas an $\infty$-step consistency
model is a diffusion model.
Multistep Consistency Models work really well in practice. By increasing the
sample budget from a single step to 2-8 steps, we can train models more easily
that generate higher quality samples, while retaining much of the sampling
speed benefits. Notable results are 1.4 FID on ImageNet 64 in 8 steps and 2.1
FID on ImageNet 128 in 8 steps with consistency distillation, using simple
losses without adversarial training. We also show that our method scales to a
text-to-image diffusion model, generating samples that are close to the quality
of the original model.
[LINK]
http://arxiv.org/abs/2403.06807v3
[DATE]
2024-11-19 22:31:02+08:00
[CATEGORIES]
cs.LG
Robust Pareto Set Identification with Contaminated Bandit Feedback
[AUTHORS]
İlter Onat Korkmaz, Efe Eren Ceyani, Kerem Bozgan, Cem Tekin
[ABSTRACT]
We consider the Pareto set identification (PSI) problem in multi-objective
multi-armed bandits (MO-MAB) with contaminated reward observations. At each arm
pull, with some fixed probability, the true reward samples are replaced with
the samples from an arbitrary contamination distribution chosen by an
adversary. We consider $(\alpha, \delta)$-PAC PSI and propose a sample
median-based multi-objective adaptive elimination algorithm that returns an
$(\alpha, \delta)$-PAC Pareto set upon termination with a sample complexity
bound that depends on the contamination probability. As the contamination
probability decreases, we recover the well-known sample complexity results in
MO-MAB. We compare the proposed algorithm with a mean-based method from MO-MAB
literature, as well as an extended version that uses median estimators, on
several PSI problems under adversarial corruptions, including review bombing
and diabetes management. Our numerical results support our theoretical findings
and demonstrate that robust algorithm design is crucial for accurate PSI under
contaminated reward observations.
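The contrast between mean-based and median-based estimates under contamination can be illustrated with a minimal sketch. It is not the elimination algorithm from the paper: it only computes per-objective sample statistics for three hypothetical arms (the reward values, `pareto_set`, and `estimate` are illustrative assumptions) and shows how a single corrupted pull can change the estimated Pareto set.

```python
from statistics import mean, median

def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_set(estimates):
    """Indices of arms whose estimated reward vector is not dominated by another arm."""
    return [i for i, u in enumerate(estimates)
            if not any(dominates(v, u) for j, v in enumerate(estimates) if j != i)]

def estimate(samples, stat):
    """Apply a statistic (mean or median) to each objective of a list of reward vectors."""
    return tuple(stat(column) for column in zip(*samples))

# Hypothetical two-objective reward samples; the last pull of arm 2 is corrupted.
arms = [
    [(1.0, 0.2), (1.1, 0.1), (0.9, 0.3), (1.0, 0.2), (1.0, 0.2)],
    [(0.2, 1.0), (0.1, 1.1), (0.3, 0.9), (0.2, 1.0), (0.2, 1.0)],
    [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.1, 0.1), (9.0, 9.0)],
]

pareto_by_median = pareto_set([estimate(s, median) for s in arms])
pareto_by_mean = pareto_set([estimate(s, mean) for s in arms])
```

In this toy data the median estimates keep arms 0 and 1 as the Pareto set, while the mean estimates are pulled far enough by the corrupted sample that arm 2 appears to dominate both.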
[LINK]
http://arxiv.org/abs/2206.02666v2
[DATE]
2024-11-19 22:30:22+08:00
[CATEGORIES]
cs.LG
Data Pruning in Generative Diffusion Models
[AUTHORS]
Rania Briq, Jiangtao Wang, Steffan Kesselheim
[ABSTRACT]
Data pruning is the problem of identifying a core subset that is most
beneficial to training and discarding the remainder. While pruning strategies
are well studied for discriminative models like those used in classification,
little research has gone into their application to generative models.
Generative models aim to estimate the underlying distribution of the data, so
presumably they should benefit from larger datasets. In this work we aim to
shed light on the accuracy of this statement, specifically to answer the question
of whether data pruning for generative diffusion models could have a positive
impact. Contrary to intuition, we show that eliminating redundant or noisy data
in large datasets is beneficial particularly when done strategically. We
experiment with several pruning methods including recent state-of-the-art methods,
and evaluate over CelebA-HQ and ImageNet datasets. We demonstrate that a simple
clustering method outperforms other sophisticated and computationally demanding
methods. We further exhibit how we can leverage clustering to balance skewed
datasets in an unsupervised manner to allow fair sampling for underrepresented
populations in the data distribution, which is a crucial problem in generative
models.
[LINK]
http://arxiv.org/abs/2411.12523v1
[DATE]
2024-11-19 22:13:25+08:00
[CATEGORIES]
cs.LG
Asymptotic and Non-Asymptotic Convergence of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis
[AUTHORS]
Ruinan Jin, Xiaoyu Wang, Baoxiang Wang
[ABSTRACT]
Adaptive optimizers have emerged as powerful tools in deep learning,
dynamically adjusting the learning rate based on iterative gradients. These
adaptive methods have significantly succeeded in various deep learning tasks,
outperforming stochastic gradient descent (SGD). However, despite AdaGrad’s
status as a cornerstone of adaptive optimization, its theoretical analysis has
not adequately addressed key aspects such as asymptotic convergence and
non-asymptotic convergence rates in non-convex optimization scenarios. This
study aims to provide a comprehensive analysis of AdaGrad, filling the existing
gaps in the literature. We introduce an innovative stopping time technique from
probability theory, which allows us to establish the stability of AdaGrad
under mild conditions for the first time. We further derive the asymptotically
almost sure and mean-square convergence for AdaGrad. In addition, we
demonstrate the near-optimal non-asymptotic convergence rate measured by the
average-squared gradients in expectation, which is stronger than the existing
high-probability results. The techniques developed in this work are potentially
of independent interest for future research on other adaptive stochastic
algorithms.
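For readers unfamiliar with the optimizer under analysis, the following is a minimal sketch of the standard coordinate-wise AdaGrad update on a toy quadratic. Whether this matches the exact variant analyzed in the paper is an assumption, and the objective, step size, and function names are illustrative only.

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Coordinate-wise AdaGrad: each step is scaled by the accumulated squared gradients."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x

def quadratic_grad(x):
    """Gradient of the toy objective f(x) = x[0]**2 + 10 * x[1]**2 (illustrative only)."""
    return [2.0 * x[0], 20.0 * x[1]]

x_final = adagrad(quadratic_grad, [3.0, -2.0])
```

Because each coordinate is normalized by its own gradient history, the badly scaled second coordinate does not need a separately tuned learning rate in this example.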
[COMMENTS]
50 pages
[LINK]
http://arxiv.org/abs/2409.05023v2
[DATE]
2024-11-19 21:57:39+08:00
[CATEGORIES]
cs.LG
Transformer Neural Processes – Kernel Regression
[AUTHORS]
Daniel Jenson, Jhonathan Navott, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman
[ABSTRACT]
Stochastic processes model various natural phenomena from disease
transmission to stock prices, but simulating and quantifying their uncertainty
can be computationally challenging. For example, modeling a Gaussian Process
with standard statistical methods incurs an $\mathcal{O}(n^3)$ penalty, and
even using state-of-the-art Neural Processes (NPs) incurs an $\mathcal{O}(n^2)$
penalty due to the attention mechanism. We introduce the Transformer Neural
Process - Kernel Regression (TNP-KR), a new architecture that incorporates a
novel transformer block we call a Kernel Regression Block (KRBlock), which
reduces the computational complexity of attention in transformer-based Neural
Processes (TNPs) from $\mathcal{O}((n_C+n_T)^2)$ to $\mathcal{O}(n_C^2+n_Cn_T)$ by
eliminating masked computations, where $n_C$ is the number of context points and
$n_T$ is the number of test points, and a fast attention variant
that further reduces all attention calculations to $\mathcal{O}(n_C)$ in space
and time complexity. In benchmarks spanning such tasks as meta-regression,
Bayesian optimization, and image completion, we demonstrate that the full
variant matches the performance of state-of-the-art methods while training
faster and scaling two orders of magnitude higher in number of test points, and
the fast variant nearly matches that performance while scaling to millions of
both test and context points on consumer hardware.
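The stated complexity reduction can be made concrete with a small arithmetic sketch. It only counts query-key pairs implied by the two expressions in the abstract; the function names and the example sizes are assumptions, and nothing here reproduces the KRBlock itself.

```python
def full_attention_pairs(n_context, n_test):
    """Pair count for attention over all context and test points: (n_C + n_T)^2."""
    return (n_context + n_test) ** 2

def krblock_attention_pairs(n_context, n_test):
    """Pair count when test points attend only to context points: n_C^2 + n_C * n_T."""
    return n_context ** 2 + n_context * n_test

# Hypothetical sizes: few context points, many test points.
n_context, n_test = 100, 10_000
full = full_attention_pairs(n_context, n_test)
reduced = krblock_attention_pairs(n_context, n_test)
savings_factor = full / reduced
```

With these sizes the pair count drops from 102,010,000 to 1,010,000, a factor of 101, which illustrates why the saving grows with the number of test points.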
[LINK]
http://arxiv.org/abs/2411.12502v1
[DATE]
2024-11-19 21:40:49+08:00
[CATEGORIES]
cs.LG
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus
[AUTHORS]
Terufumi Morishita, Gaku Morio, Atsuki Yamaguchi, Yasuhiro Sogawa
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12498v1
[DATE]
2024-11-19 21:31:53+08:00
[CATEGORIES]
cs.LG
Variational Bayesian Bow tie Neural Networks with Shrinkage
[AUTHORS]
Alisa Sheinkman, Sara Wade
[ABSTRACT]
Despite the dominant role of deep models in machine learning, limitations
persist, including overconfident predictions, susceptibility to adversarial
attacks, and underestimation of variability in predictions. The Bayesian
paradigm provides a natural framework to overcome such issues and has become
the gold standard for uncertainty estimation with deep models, also providing
improved accuracy and a framework for tuning critical hyperparameters. However,
exact Bayesian inference is challenging, typically involving variational
algorithms that impose strong independence and distributional assumptions.
Moreover, existing methods are sensitive to the architectural choice of the
network. We address these issues by constructing a relaxed version of the
standard feed-forward rectified neural network, and employing Polya-Gamma data
augmentation tricks to render a conditionally linear and Gaussian model.
Additionally, we use sparsity-promoting priors on the weights of the neural
network for data-driven architectural design. To approximate the posterior, we
derive a variational inference algorithm that avoids distributional assumptions
and independence across layers and is a faster alternative to the usual Markov
Chain Monte Carlo schemes.
[LINK]
http://arxiv.org/abs/2411.11132v2
[DATE]
2024-11-19 21:13:58+08:00
[CATEGORIES]
cs.LG
S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction
[AUTHORS]
Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi
[ABSTRACT]
We address the video prediction task by putting forth a novel model that
combines (i) a novel hierarchical residual learning vector quantized
variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive
spatiotemporal predictive model (AST-PM). We refer to this approach as a
sequential hierarchical residual learning vector quantized variational
autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE
at modeling still images with a parsimonious representation, combined with the
AST-PM’s ability to handle spatiotemporal information, S-HR-VQVAE can better
deal with major challenges in video prediction. These include learning
spatiotemporal information, handling high dimensional data, combating blurry
prediction, and implicit modeling of physical characteristics. Extensive
experimental results on four challenging tasks, namely KTH Human Action,
TrafficBJ, Human3.6M, and Kitti, demonstrate that our model compares favorably
against state-of-the-art video prediction techniques both in quantitative and
qualitative evaluations despite a much smaller model size. Finally, we boost
S-HR-VQVAE by proposing a novel training method to jointly estimate the
HR-VQVAE and AST-PM parameters.
[COMMENTS]
12 pages, 6 figures, 5 tables. Accepted for publication in IEEE
Transactions on Multimedia on 2024-11-19
[LINK]
http://arxiv.org/abs/2307.06701v3
[DATE]
2024-11-19 21:09:06+08:00
[CATEGORIES]
cs.LG
Comparing Prior and Learned Time Representations in Transformer Models of Timeseries
[AUTHORS]
Natalia Koliou, Tatiana Boura, Stasinos Konstantopoulos, George Meramveliotakis, George Kosmadakis
[ABSTRACT]
What sets timeseries analysis apart from other machine learning exercises is
that time representation becomes a primary aspect of the experiment setup, as
it must adequately represent the temporal relations that are relevant for the
application at hand. In the work described here we study two different
variations of the Transformer architecture: one where we use the fixed time
representation proposed in the literature and one where the time representation
is learned from the data. Our experiments use data from predicting the energy
output of solar panels, a task that exhibits known periodicities (daily and
seasonal) that are straightforward to encode in the fixed time representation.
Our results indicate that even in an experiment where the phenomenon is
well-understood, it is difficult to encode prior knowledge due to side-effects
that are difficult to mitigate. We conclude that research work is needed to
work the human into the learning loop in ways that improve the robustness and
trust-worthiness of the network.
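As an illustration of what encoding known periodicities can look like, here is a minimal sketch of a generic sinusoidal time encoding with a daily and a yearly period. It is an assumption for illustration and not necessarily the fixed representation used in the paper; the function name and the period lengths are hypothetical choices.

```python
import math

def fixed_time_features(hour_of_day, day_of_year):
    """Map a timestamp to sine/cosine pairs for the daily and seasonal cycles."""
    daily = 2.0 * math.pi * hour_of_day / 24.0
    seasonal = 2.0 * math.pi * day_of_year / 365.25
    return (math.sin(daily), math.cos(daily), math.sin(seasonal), math.cos(seasonal))

midnight = fixed_time_features(0.0, 10.0)
next_midnight = fixed_time_features(24.0, 10.0)
morning = fixed_time_features(6.0, 10.0)
```

The sine/cosine pair places times that are close on the cycle close in feature space, so hour 0 and hour 24 receive the same daily features.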
[COMMENTS]
Presented at the AI in Natural Sciences and Technology (AINST) track
of the 13th Conference on Artificial Intelligence (SETN 2024), 11-13
September 2024, Piraeus, Greece
[LINK]
http://arxiv.org/abs/2411.12476v1
[DATE]
2024-11-19 20:56:43+08:00
[CATEGORIES]
cs.LG
AI Flow at the Network Edge
[AUTHORS]
Jiawei Shao, Xuelong Li
[ABSTRACT]
Recent advancements in large language models (LLMs) and their multimodal
variants have led to remarkable progress across various domains, demonstrating
impressive capabilities and unprecedented potential. In the era of ubiquitous
connectivity, leveraging communication networks to distribute intelligence is a
transformative concept, envisioning AI-powered services accessible at the
network edge. However, pushing large models from the cloud to
resource-constrained environments faces critical challenges. Model inference on
low-end devices leads to excessive latency and performance bottlenecks, while
raw data transmission over limited bandwidth networks causes high communication
overhead. This article presents AI Flow, a framework that streamlines the
inference process by jointly leveraging the heterogeneous resources available
across devices, edge nodes, and cloud servers, making intelligence flow across
networks. To facilitate cooperation among multiple computational nodes, the
proposed framework explores a paradigm shift in the design of communication
network systems from transmitting information flow to intelligence flow, where
the goal of communications is task-oriented and folded into the inference
process. Experimental results demonstrate the effectiveness of the proposed
framework through an image captioning use case, showcasing the ability to
reduce response latency while maintaining high-quality captions. This article
serves as a position paper for identifying the motivation, challenges, and
principles of AI Flow.
[LINK]
http://arxiv.org/abs/2411.12469v1
[DATE]
2024-11-19 20:51:17+08:00
[CATEGORIES]
cs.LG
Mixed-Output Gaussian Process Latent Variable Models
[AUTHORS]
James Odgers, Ruby Sedgwick, Chrysoula Kappatou, Ruth Misener, Sarah Filippi
[ABSTRACT]
This work develops a Bayesian non-parametric approach to signal separation
where the signals may vary according to latent variables. Our key contribution
is to augment Gaussian Process Latent Variable Models (GPLVMs) for the case
where each data point comprises the weighted sum of a known number of pure
component signals, observed across several input locations. Our framework
allows arbitrary non-linear variations in the signals while being able to
incorporate useful priors for the linear weights, such as summing-to-one. Our
contributions are particularly relevant to spectroscopy, where changing
conditions may cause the underlying pure component signals to vary from sample
to sample. To demonstrate the applicability to both spectroscopy and other
domains, we consider several applications: a near-infrared spectroscopy dataset
with varying temperatures, a simulated dataset for identifying flow
configuration through a pipe, and a dataset for determining the type of rock
from its reflectance.
[LINK]
http://arxiv.org/abs/2402.09122v2
[DATE]
2024-11-19 20:40:43+08:00
[CATEGORIES]
cs.LG
Wavelets Are All You Need for Autoregressive Image Generation
[AUTHORS]
Wael Mattar, Idan Levy, Nir Sharon, Shai Dekel
[ABSTRACT]
In this paper, we take a new approach to autoregressive image generation that
is based on two main ingredients. The first is wavelet image coding, which
allows us to tokenize the visual details of an image from coarse to fine details
by ordering the information starting with the most significant bits of the most
significant wavelet coefficients. The second is a variant of a language
transformer whose architecture is re-designed and optimized for token sequences
in this ‘wavelet language’. The transformer learns the significant statistical
correlations within a token sequence, which are the manifestations of
well-known correlations between the wavelet subbands at various resolutions. We
show experimental results with conditioning on the generation process.
[COMMENTS]
17 pages, 11 figures
[LINK]
http://arxiv.org/abs/2406.19997v2
[DATE]
2024-11-19 20:28:19+08:00
[CATEGORIES]
cs.LG
Empirical Privacy Evaluations of Generative and Predictive Machine Learning Models – A review and challenges for practice
[AUTHORS]
Flavio Hafner, Chang Sun
[ABSTRACT]
Synthetic data generators, when trained using privacy-preserving techniques
like differential privacy, promise to produce synthetic data with formal
privacy guarantees, facilitating the sharing of sensitive data. However, it is
crucial to empirically assess the privacy risks associated with the generated
synthetic data before deploying generative technologies. This paper outlines
the key concepts and assumptions underlying empirical privacy evaluation in
machine learning-based generative and predictive models. Then, this paper
explores the practical challenges for privacy evaluations of generative models
for use cases with millions of training records, such as data from statistical
agencies and healthcare providers. Our findings indicate that methods designed
to verify the correct operation of the training algorithm are effective for
large datasets, but they often assume an adversary that is unrealistic in many
scenarios. Based on the findings, we highlight a crucial trade-off between the
computational feasibility of the evaluation and the level of realism of the
assumed threat model. Finally, we conclude with ideas and suggestions for
future research.
[LINK]
http://arxiv.org/abs/2411.12451v1
[DATE]
2024-11-19 20:19:28+08:00
[CATEGORIES]
cs.LG
Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures
[AUTHORS]
Prashanti Anderson, Mitali Bafna, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer
[ABSTRACT]
We develop a new approach for clustering non-spherical (i.e., arbitrary
component covariances) Gaussian mixture models via a subroutine, based on the
sum-of-squares method, that finds a low-dimensional separation-preserving
projection of the input data. Our method gives a non-spherical analog of the
classical dimension reduction, based on singular value decomposition, that
forms a key component of the celebrated spherical clustering algorithm of
Vempala and Wang [VW04] (in addition to several other applications).
As applications, we obtain an algorithm to (1) cluster an arbitrary
total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians
with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and
$\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation
separated mixture of $k$ Gaussians with identical but arbitrary unknown
covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and
$n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight
of the input mixture, and $f$ does not depend on the dimension $d$. Our
algorithms naturally extend to tolerating a dimension-independent fraction of
arbitrary outliers. Before this work, the techniques in the state-of-the-art
non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and
samples for clustering such mixtures.
Our results may come as a surprise in the context of the $d^{\Omega(k)}$
statistical query lower bound [DKS17] for clustering non-spherical Gaussian
mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost
algorithms for the problem, our results show that the lower bounds can in fact
be circumvented for a remarkably general class of Gaussian mixtures.
[COMMENTS]
64 pages
[LINK]
http://arxiv.org/abs/2411.12438v1
[DATE]
2024-11-19 19:58:51+08:00
[CATEGORIES]
cs.LG
STRisk: A Socio-Technical Approach to Assess Hacking Breaches Risk
[AUTHORS]
Hicham Hammouchi, Narjisse Nejjari, Ghita Mezzour, Mounir Ghogho, Houda Benbrahim
[ABSTRACT]
Data breaches have begun to take on new dimensions and their prediction is
becoming of great importance to organizations. Prior work has addressed this
issue mainly from a technical perspective and neglected other interfering
aspects such as the social media dimension. To fill this gap, we propose STRisk
which is a predictive system where we expand the scope of the prediction task
by bringing into play the social media dimension. We study over 3800 US
organizations including both victim and non-victim organizations. For each
organization, we design a profile composed of a variety of externally measured
technical indicators and social factors. In addition, to account for unreported
incidents, we consider the non-victim sample to be noisy and propose a noise
correction approach to correct mislabeled organizations. We then build several
machine learning models to predict whether an organization is at risk of
experiencing a hacking breach. By exploiting both technical and social features,
we achieve an Area Under the Curve (AUC) score exceeding 98%, which is 12% higher
than the AUC achieved using only technical features. Furthermore, our feature
importance analysis reveals that open ports and expired certificates are the
best technical predictors, while spreadability and agreeability are the best
social predictors.
[LINK]
http://arxiv.org/abs/2411.12435v1
[DATE]
2024-11-19 19:52:10+08:00
[CATEGORIES]
cs.LG
Rethinking cluster-conditioned diffusion models for label-free image synthesis
[AUTHORS]
Nikolas Adaloglou, Tim Kaiser, Felix Michels, Markus Kollmann
[ABSTRACT]
Diffusion-based image generation models can enhance image quality when
conditioned on ground truth labels. Here, we conduct a comprehensive
experimental study on image-level conditioning for diffusion models using
cluster assignments. We investigate how individual clustering determinants,
such as the number of clusters and the clustering method, impact image
synthesis across three different datasets. Given the optimal number of clusters
with respect to image synthesis, we show that cluster-conditioning can achieve
state-of-the-art performance, with an FID of 1.67 for CIFAR10 and 2.17 for
CIFAR100, along with a strong increase in training sample efficiency. We
further propose a novel empirical method to estimate an upper bound for the
optimal number of clusters. Unlike existing approaches, we find no significant
association between clustering performance and the corresponding
cluster-conditional FID scores. The code is available at
https://github.com/HHU-MMBS/cedm-official-wavc2025.
[COMMENTS]
Accepted in WAVC2025 (21 pages, 15 figures). Code is available at
https://github.com/HHU-MMBS/cedm-official-wavc2025
[LINK]
http://arxiv.org/abs/2403.00570v2
[DATE]
2024-11-19 19:00:38+08:00
[CATEGORIES]
cs.LG
Interpretable Fusion Analytics Framework for fMRI Connectivity: Self-Attention Mechanism and Latent Space Item-Response Model
[AUTHORS]
Jeong-Jae Kim, Yeseul Jeon, SuMin Yu, Junggu Choi, Sanghoon Han
[ABSTRACT]
There have been several attempts to use deep learning based on brain fMRI
signals to classify cognitive impairment diseases. However, deep learning is a
hidden black box model that makes it difficult to interpret the process of
classification. To address this issue, we propose a novel analytical framework
that interprets the classification result from deep learning processes. We
first derive the region of interest (ROI) functional connectivity network (FCN)
by embedding functions based on their similar signal patterns. Then, using the
self-attention equipped deep learning model, we classify diseases based on
their FCN. Finally, in order to interpret the classification results, we employ
a latent space item-response interaction network model to identify the
significant functions that exhibit distinct connectivity patterns when compared
to other diseases. The application of this proposed framework to the four types
of cognitive impairment shows that our approach is valid for determining the
significant ROI functions.
[COMMENTS]
This submission is a duplicate of another manuscript from our
research group [arXiv preprint arXiv:2401.09028] due to a misunderstanding in
communication among co-authors
[LINK]
http://arxiv.org/abs/2207.01581v2
[DATE]
2024-11-19 18:28:29+08:00
[CATEGORIES]
cs.LG
Off-policy estimation with adaptively collected data: the power of online learning
[AUTHORS]
Jeonghwan Lee, Cong Ma
[ABSTRACT]
We consider estimation of a linear functional of the treatment effect using
adaptively collected data. This task finds a variety of applications including
the off-policy evaluation (OPE) in contextual bandits, and estimation
of the average treatment effect (ATE) in causal inference. While a
certain class of augmented inverse propensity weighting (AIPW)
estimators enjoys desirable asymptotic properties including the semi-parametric
efficiency, much less is known about their non-asymptotic theory with
adaptively collected data. To fill in the gap, we first establish generic upper
bounds on the mean-squared error of the class of AIPW estimators that crucially
depends on a sequentially weighted error between the treatment effect and its
estimates. Motivated by this, we also propose a general reduction scheme that
allows one to produce a sequence of estimates for the treatment effect via
online learning to minimize the sequentially weighted estimation error. To
illustrate this, we provide three concrete instantiations in (i)
the tabular case; (ii) the case of linear function approximation;
and (iii) the case of general function approximation for the
outcome model. We then provide a local minimax lower bound to show the
instance-dependent optimality of the AIPW estimator using no-regret
online learning algorithms.
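For orientation, the following is a minimal sketch of the textbook AIPW estimate of the average treatment effect with known propensities. It does not reproduce the paper's adaptive setting, where propensities depend on the history and the outcome estimates are produced sequentially by online learning; the record layout and the function name are assumptions.

```python
def aipw_ate(records):
    """Average of (mu1 - mu0) plus an inverse-propensity-weighted residual correction.

    Each record is (a, y, e, mu0, mu1): treatment indicator, observed outcome,
    propensity of treatment, and outcome-model predictions under control/treatment.
    """
    total = 0.0
    for a, y, e, mu0, mu1 in records:
        correction = a * (y - mu1) / e - (1 - a) * (y - mu0) / (1.0 - e)
        total += (mu1 - mu0) + correction
    return total / len(records)

# Hypothetical data: one treated unit (y = 3) and one control unit (y = 1), propensity 0.5.
perfect_model = [(1, 3.0, 0.5, 1.0, 3.0), (0, 1.0, 0.5, 1.0, 3.0)]
zero_model = [(1, 3.0, 0.5, 0.0, 0.0), (0, 1.0, 0.5, 0.0, 0.0)]

ate_perfect = aipw_ate(perfect_model)
ate_zero = aipw_ate(zero_model)
```

Both calls return 2.0 on this toy data: with an accurate outcome model the correction term vanishes, and with an uninformative one the estimator falls back to inverse propensity weighting.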
[COMMENTS]
37 pages. Accepted to the 38th Annual Conference on Neural
Information Processing Systems (NeurIPS 2024), Vancouver, British Columbia,
Canada
[LINK]
http://arxiv.org/abs/2411.12786v1
[DATE]
2024-11-19 18:18:27+08:00
[CATEGORIES]
cs.LG
Signaling and Social Learning in Swarms of Robots
[AUTHORS]
Leo Cazenille, Maxime Toquebiau, Nicolas Lobato-Dauzier, Alessia Loi, Loona Macabre, Nathanael Aubert-Kato, Anthony Genot, Nicolas Bredeche
[ABSTRACT]
This paper investigates the role of communication in improving coordination
within robot swarms, focusing on a paradigm where learning and execution occur
simultaneously in a decentralized manner. We highlight the role communication
can play in addressing the credit assignment problem (individual contribution
to the overall performance), and how communication can itself be shaped by it. We propose a
taxonomy of existing and future works on communication, focusing on information
selection and physical abstraction as principal axes for classification: from
low-level lossless compression with raw signal extraction and processing to
high-level lossy compression with structured communication models. The paper
reviews current research from evolutionary robotics, multi-agent (deep)
reinforcement learning, language models, and biophysics models to outline the
challenges and opportunities of communication in a collective of robots that
continuously learn from one another through local message exchanges,
illustrating a form of social learning.
[COMMENTS]
17 pages, 3 Figures
[LINK]
http://arxiv.org/abs/2411.11616v2
[DATE]
2024-11-19 18:11:04+08:00
[CATEGORIES]
cs.LG
Non-IID data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions
[AUTHORS]
Daniel M. Jimenez G., David Solans, Mikko Heikkila, Andrea Vitaletti, Nicolas Kourtellis, Aris Anagnostopoulos, Ioannis Chatzigiannakis
[ABSTRACT]
Recent advances in machine learning have highlighted Federated Learning (FL)
as a promising approach that enables multiple distributed users (so-called
clients) to collectively train ML models without sharing their private data.
While this privacy-preserving method shows potential, it struggles when data
across clients is not independent and identically distributed (non-IID).
The latter remains an unsolved challenge that can result in poorer model
performance and slower training times. Despite the significance of non-IID data
in FL, there is a lack of consensus among researchers about its classification
and quantification. This systematic review aims to fill that gap by providing a
detailed taxonomy for non-IID data, partition protocols, and metrics to
quantify data heterogeneity. Additionally, we describe popular solutions to
address non-IID data and standardized frameworks employed in FL with
heterogeneous data. Based on our state-of-the-art review, we present key
lessons learned and suggest promising future research directions.
[LINK]
http://arxiv.org/abs/2411.12377v1
[DATE]
2024-11-19 17:53:28+08:00
[CATEGORIES]
cs.LG
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
[AUTHORS]
Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Artem Agarkov, Viacheslav Sinii, Sergey Kolesnikov
[ABSTRACT]
Inspired by the diversity and depth of XLand and the simplicity and
minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and
grid-world environments for meta-reinforcement learning research. Written in
JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run
on GPU or TPU accelerators, democratizing large-scale experimentation with
limited resources. Along with the environments, XLand-MiniGrid provides
pre-sampled benchmarks with millions of unique tasks of varying difficulty and
easy-to-use baselines that allow users to quickly start training adaptive
agents. In addition, we have conducted a preliminary analysis of scaling and
generalization, showing that our baselines are capable of reaching millions of
steps per second during training and validating that the proposed benchmarks
are challenging. XLand-MiniGrid is open-source and available at
https://github.com/dunnolab/xland-minigrid.
[COMMENTS]
Neural Information Processing Systems (NeurIPS 2024) Track on
Datasets and Benchmarks. Source code at
https://github.com/dunnolab/xland-minigrid
[LINK]
http://arxiv.org/abs/2312.12044v4
[DATE]
2024-11-19 17:52:55+08:00
[CATEGORIES]
cs.LG
Smoke and Mirrors in Causal Downstream Tasks
[AUTHORS]
Riccardo Cadei, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello
[ABSTRACT]
Machine Learning and AI have the potential to transform data-driven
scientific discovery, enabling accurate predictions for several scientific
phenomena. As many scientific questions are inherently causal, this paper looks
at the causal inference task of treatment effect estimation, where the outcome
of interest is recorded in high-dimensional observations in a Randomized
Controlled Trial (RCT). Despite being the simplest possible causal setting and
a perfect fit for deep learning, we theoretically find that many common choices
in the literature may lead to biased estimates. To test the practical impact of
these considerations, we recorded ISTAnt, the first real-world benchmark for
causal inference downstream tasks on high-dimensional observations as an RCT
studying how garden ants (Lasius neglectus) respond to microparticles applied
onto their colony members by hygienic grooming. Comparing 6 480 models
fine-tuned from state-of-the-art visual backbones, we find that the sampling
and modeling choices significantly affect the accuracy of the causal estimate,
and that classification accuracy is not a proxy thereof. We further validated
the analysis, repeating it on a synthetically generated visual data set
controlling the causal model. Our results suggest that future benchmarks should
carefully consider real downstream scientific questions, especially causal
ones. Further, we highlight guidelines for representation learning methods to
help answer causal questions in the sciences.
[LINK]
http://arxiv.org/abs/2405.17151v3
[DATE]
2024-11-19 17:48:17+08:00
[CATEGORIES]
cs.LG
Diffusion-Based Semantic Segmentation of Lumbar Spine MRI Scans of Lower Back Pain Patients
[AUTHORS]
Maria Monzon, Thomas Iff, Ender Konukoglu, Catherine R. Jutzeler
[ABSTRACT]
This study introduces a diffusion-based framework for robust and accurate
segmentation of vertebrae, intervertebral discs (IVDs), and the spinal canal from
Magnetic Resonance Imaging (MRI) scans of patients with low back pain (LBP),
regardless of whether the scans are T1-weighted or T2-weighted. The results showed that
SpineSegDiff achieved comparable or better results than non-diffusion state-of-the-art
models in the identification of degenerated IVDs. Our findings highlight the
potential of diffusion models to improve LBP diagnosis and management through
precise spine MRI analysis.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 5 pages
[LINK]
http://arxiv.org/abs/2411.10755v2
[DATE]
2024-11-19 17:30:44+08:00
[CATEGORIES]
cs.LG
Ultra-Sparse Memory Network
[AUTHORS]
Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo, Xun Zhou
[ABSTRACT]
It is widely acknowledged that the performance of Transformer models is
exponentially related to their number of parameters and computational
complexity. While approaches like Mixture of Experts (MoE) decouple parameter
count from computational complexity, they still face challenges in inference
due to high memory access costs. This work introduces UltraMem, which incorporates a
large-scale, ultra-sparse memory layer to address these limitations. Our
approach significantly reduces inference latency while maintaining model
performance. We also investigate the scaling laws of this new architecture,
demonstrating that it not only exhibits favorable scaling properties but also
outperforms traditional models. In our experiments, we train networks with up
to 20 million memory slots. The results show that our method achieves
state-of-the-art inference speed and model performance within a given
computational budget.
[COMMENTS]
10 pages, 6 figures
[LINK]
http://arxiv.org/abs/2411.12364v1
[DATE]
2024-11-19 17:24:34+08:00
[CATEGORIES]
cs.LG
PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium
[AUTHORS]
Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang
[ABSTRACT]
We consider the non-convex non-concave objective function in two-player
zero-sum continuous games. The existence of pure Nash equilibrium requires
stringent conditions, posing a major challenge for this problem. To circumvent
this difficulty, we examine the problem of identifying a mixed Nash
equilibrium, where strategies are randomized and characterized by probability
distributions over continuous domains. To this end, we propose PArticle-based
Primal-dual ALgorithm (PAPAL) tailored for a weakly entropy-regularized min-max
optimization over probability distributions. This algorithm employs the
stochastic movements of particles to represent the updates of random strategies
for the $\epsilon$-mixed Nash equilibrium. We offer a comprehensive convergence
analysis of the proposed algorithm, demonstrating its effectiveness. In
contrast to prior research that attempted to update particle importance without
movements, PAPAL is the first implementable particle-based algorithm
accompanied by non-asymptotic quantitative convergence results, running time,
and sample complexity guarantees. Our framework contributes novel insights into
the particle-based algorithms for continuous min-max optimization in the
general non-convex non-concave setting.
[COMMENTS]
Published in Journal of Machine Learning Research 25 (2024) 1-48
[LINK]
http://arxiv.org/abs/2303.00970v3
[DATE]
2024-11-19 16:55:53+08:00
[CATEGORIES]
cs.LG
Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation
[AUTHORS]
Rüveyda Yilmaz, Kaan Keven, Yuli Wu, Johannes Stegmaier
[ABSTRACT]
Automated cell segmentation in microscopy images is essential for biomedical
research, yet conventional methods are labor-intensive and prone to error.
While deep learning-based approaches have proven effective, they often require
large annotated datasets, which are scarce due to the challenges of manual
annotation. To overcome this, we propose a novel framework for synthesizing
densely annotated 2D and 3D cell microscopy images using cascaded diffusion
models. Our method synthesizes 2D and 3D cell masks from sparse 2D annotations
using multi-level diffusion models and NeuS, a 3D surface reconstruction
approach. Following that, a pretrained 2D Stable Diffusion model is finetuned
to generate realistic cell textures and the final outputs are combined to form
cell populations. We show that training a segmentation model with a combination
of our synthetic data and real data improves cell segmentation performance by
up to 9% across multiple datasets. Additionally, the FID scores indicate that
the synthetic data closely resembles real data. The code for our proposed
approach will be available at
https://github.com/ruveydayilmaz0/cascaded_diffusion.
[LINK]
http://arxiv.org/abs/2411.11515v2
[DATE]
2024-11-19 16:50:38+08:00
[CATEGORIES]
cs.LG
Graph as a feature: improving node classification with non-neural graph-aware logistic regression
[AUTHORS]
Simon Delarue, Thomas Bonald, Tiphaine Viard
[ABSTRACT]
Graph Neural Networks (GNNs) and their message passing framework, which
leverages both structural and feature information, have become a standard
method for solving graph-based machine learning problems. However, these
approaches still struggle to generalise well beyond datasets that exhibit
strong homophily, where nodes of the same class tend to connect. This
limitation has led to the development of complex neural architectures that pose
challenges in terms of efficiency and scalability. In response to these
limitations, we focus on simpler and more scalable approaches and introduce
Graph-aware Logistic Regression (GLR), a non-neural model designed for node
classification tasks. Unlike traditional graph algorithms that use only a
fraction of the information accessible to GNNs, our proposed model
simultaneously leverages both node features and the relationships between
entities. However, instead of relying on message passing, our approach encodes
each node’s relationships as an additional feature vector, which is then
combined with the node’s self attributes. Extensive experimental results,
conducted within a rigorous evaluation framework, show that our proposed GLR
approach outperforms both foundational and sophisticated state-of-the-art GNN
models in node classification tasks. Going beyond the traditional limited
benchmarks, our experiments indicate that GLR increases generalisation ability
while reaching performance gains in computation time up to two orders of
magnitude compared to its best neural competitor.
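The core idea, treating each node's relationships as extra features for a non-neural classifier, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the relationship encoding is taken to be the node's adjacency row, the classifier is a plain binary logistic regression fitted by gradient descent, and the six-node heterophilous graph and all function names are hypothetical.

```python
import math


def graph_aware_features(adjacency, features):
    """Concatenate each node's own attributes with its adjacency row."""
    return [list(x) + list(a) for x, a in zip(features, adjacency)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the logistic loss, without regularization."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]


# Toy heterophilous graph: nodes 0-2 are class 0, nodes 3-5 are class 1,
# and every edge links nodes of different classes.
adjacency = [
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
features = [[0.5]] * 6  # deliberately uninformative node attribute
labels = [0, 0, 0, 1, 1, 1]

X = graph_aware_features(adjacency, features)
w, b = fit_logistic(X, labels)
train_predictions = predict(w, b, X)
print(train_predictions)
```

The node attribute carries no class signal here, so any separation the model achieves on these training nodes comes from the adjacency columns alone. The example only checks the fit on the training nodes and says nothing about generalization.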
[LINK]
http://arxiv.org/abs/2411.12330v1
[DATE]
2024-11-19 16:32:14+08:00
[CATEGORIES]
cs.LG
Attributed Graph Clustering in Collaborative Settings
[AUTHORS]
Rui Zhang, Xiaoyang Hou, Zhihua Tian, Jian Liu, Qingbiao Wu, Kui Ren
[ABSTRACT]
Graph clustering is an unsupervised machine learning method that partitions
the nodes in a graph into different groups. Despite achieving significant
progress in exploiting both attributed and structured data information, graph
clustering methods often face practical challenges related to data isolation.
Moreover, the absence of collaborative methods for graph clustering limits
their effectiveness.
In this paper, we propose a collaborative graph clustering framework for
attributed graphs, supporting attributed graph clustering over vertically
partitioned data with different participants holding distinct features of the
same data. Our method leverages a novel technique that reduces the sample
space, improving the efficiency of the attributed graph clustering method.
Furthermore, we compare our method to its centralized counterpart under a
proximity condition, demonstrating that the successful local results of each
participant contribute to the overall success of the collaboration.
We fully implement our approach and evaluate its utility and efficiency by
conducting experiments on four public datasets. The results demonstrate that
our method achieves comparable accuracy levels to centralized attributed graph
clustering methods. Our collaborative graph clustering framework provides an
efficient and effective solution for graph clustering challenges related to
data isolation.
[COMMENTS]
16 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.12329v1
[DATE]
2024-11-19 16:30:22+08:00
[CATEGORIES]
cs.LG
Fair Generalized Linear Mixed Models
[AUTHORS]
Jan Pablo Burgard, João Vitor Pamplona
[ABSTRACT]
When using machine learning for automated prediction, it is important to
account for fairness in the prediction. Fairness in machine learning aims to
ensure that biases in the data and model inaccuracies do not lead to
discriminatory decisions. E.g., predictions from fair machine learning models
should not discriminate against sensitive variables such as sexual orientation
and ethnicity. The training data is often obtained from social surveys. In
social surveys, the data collection process is often stratified sampling,
e.g., due to cost restrictions. In stratified samples, the assumption of
independence between the observations is not fulfilled. Hence, if the machine
learning models do not account for the strata correlations, the results may be
biased. The bias is especially high in cases where the strata assignment is
correlated with the variable of interest. We present in this paper an algorithm
that can handle both problems simultaneously, and we demonstrate the impact of
stratified sampling on the quality of fair machine learning predictions in a
reproducible simulation study.
[COMMENTS]
25 pages, 12 figures. arXiv admin note: text overlap with
arXiv:2405.06433
[LINK]
http://arxiv.org/abs/2405.09273v5
[DATE]
2024-11-19 16:17:57+08:00
[CATEGORIES]
cs.LG
TFG: Unified Training-Free Guidance for Diffusion Models
[AUTHORS]
Haotian Ye, Haowei Lin, Jiaqi Han, Minkai Xu, Sheng Liu, Yitao Liang, Jianzhu Ma, James Zou, Stefano Ermon
[ABSTRACT]
Given an unconditional diffusion model and a predictor for a target property
of interest (e.g., a classifier), the goal of training-free guidance is to
generate samples with desirable target properties without additional training.
Existing methods, though effective in various individual applications, often
lack theoretical grounding and rigorous testing on extensive benchmarks. As a
result, they could even fail on simple tasks, and applying them to a new
problem becomes unavoidably difficult. This paper introduces a novel
algorithmic framework encompassing existing methods as special cases, unifying
the study of training-free guidance into the analysis of an algorithm-agnostic
design space. Via theoretical and empirical investigation, we propose an
efficient and effective hyper-parameter searching strategy that can be readily
applied to any downstream task. We systematically benchmark across 7 diffusion
models on 16 tasks with 40 targets, and improve performance by 8.5% on average.
Our framework and benchmark offer a solid foundation for conditional generation
in a training-free manner.
[LINK]
http://arxiv.org/abs/2409.15761v2
[DATE]
2024-11-19 16:12:46+08:00
[CATEGORIES]
cs.LG
C$^{2}$INet: Realizing Incremental Trajectory Prediction with Prior-Aware Continual Causal Intervention
[AUTHORS]
Xiaohe Li, Feilong Huang, Zide Fan, Fangli Mou, Leilei Lin, Yingyan Hou, Lijie Wen
[ABSTRACT]
Trajectory prediction for multi-agents in complex scenarios is crucial for
applications like autonomous driving. However, existing methods often overlook
environmental biases, which leads to poor generalization. Additionally,
hardware constraints limit the use of large-scale data across environments, and
continual learning settings exacerbate the challenge of catastrophic
forgetting. To address these issues, we propose the Continual Causal
Intervention (C$^{2}$INet) method for generalizable multi-agent trajectory
prediction within a continual learning framework. Using variational inference,
we align the environment-related prior with the posterior estimator of confounding
factors in the latent space, thereby intervening in causal correlations that
affect trajectory representation. Furthermore, we store optimal variational
priors across various scenarios using a memory queue, ensuring continuous
debiasing during incremental task training. The proposed C$^{2}$INet enhances
adaptability to diverse tasks while preserving previous task information to
prevent catastrophic forgetting. It also incorporates pruning strategies to
mitigate overfitting. Comparative evaluations on three real and synthetic
complex datasets against state-of-the-art methods demonstrate that our proposed
method consistently achieves reliable prediction performance, effectively
mitigating confounding factors unique to different scenarios. This highlights
the practical value of our method for real-world applications.
[LINK]
http://arxiv.org/abs/2411.12313v1
[DATE]
2024-11-19 16:01:20+08:00
[CATEGORIES]
cs.LG
A semi-supervised learning using over-parameterized regression
[AUTHORS]
Katsuyuki Hagiwara
[ABSTRACT]
Semi-supervised learning (SSL) is an important theme in machine learning, in
which we have a few labeled samples and many unlabeled samples. In this paper,
for SSL in a regression problem, we consider a method of incorporating
information on unlabeled samples into kernel functions. As a typical
implementation, we employ Gaussian kernels whose centers are labeled and
unlabeled input samples. Since the number of coefficients is larger than the
number of labeled samples in this setting, this is an over-parameterized
regression problem. Ridge regression is a typical estimation method under this
setting. In this paper, alternatively, we consider applying the minimum norm
least squares (MNLS), which is known as a helpful tool for understanding deep
learning behavior while it may not be application oriented. Then, in applying
the MNLS for SSL, we established several methods based on feature
extraction/dimension reduction in the SVD (singular value decomposition)
representation of a Gram type matrix that appears in the over-parameterized
regression problem. The methods are thresholding according to singular value
magnitude with cross validation, hard-thresholding with cross validation,
universal thresholding and bridge thresholding methods. The first one is
equivalent to a method using a well-known low rank approximation of a Gram type
matrix. We refer to these methods as SVD regression methods. In the experiments
for real data, depending on datasets, clear superiority of the proposed SVD
regression methods over ridge regression methods was observed. And, depending
on datasets, incorporation of information on unlabeled input samples into
kernels was found to be clearly effective.
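For orientation, the standard SVD forms of the estimators mentioned above can be written out. The notation is an assumption made here, not the paper's: $\Phi$ is the $n \times m$ Gram-type matrix with $n$ labeled samples and $m > n$ kernel centers, $y$ is the vector of labels, $r$ is the rank of $\Phi$, and $\tau$ is a threshold on singular value magnitude. The last expression shows only the simplest truncation rule; the other thresholding rules listed in the abstract are not reproduced.

```latex
\Phi = \sum_{j=1}^{r} \sigma_j u_j v_j^{\top}, \qquad
\hat{\beta}_{\mathrm{MNLS}} = \Phi^{+} y
  = \sum_{j=1}^{r} \frac{u_j^{\top} y}{\sigma_j}\, v_j, \qquad
\hat{\beta}_{\mathrm{ridge}}(\lambda)
  = \sum_{j=1}^{r} \frac{\sigma_j}{\sigma_j^{2} + \lambda}\,(u_j^{\top} y)\, v_j, \qquad
\hat{\beta}_{\tau}
  = \sum_{j:\,\sigma_j > \tau} \frac{u_j^{\top} y}{\sigma_j}\, v_j .
```

Ridge regression shrinks every component smoothly, whereas truncation keeps a component at full MNLS weight or drops it, which is equivalent to using a low rank approximation of $\Phi$.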
[LINK]
http://arxiv.org/abs/2409.04001v2
[DATE]
2024-11-19 15:44:51+08:00
[CATEGORIES]
cs.LG
Emergence of Implicit World Models from Mortal Agents
[AUTHORS]
Kazuya Horibe, Naoto Yoshida
[COMMENTS]
Accepted as a 1-page tiny paper in the Intrinsically Motivated
Open-ended Learning workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12304v1
[DATE]
2024-11-19 15:43:30+08:00
[CATEGORIES]
cs.LG
A Hybrid Data-Driven Multi-Stage Deep Learning Framework for Enhanced Nuclear Reactor Power Prediction
[AUTHORS]
James Daniell, Kazuma Kobayashi, Ayodeji Alajo, Syed Bahauddin Alam
[ABSTRACT]
The accurate and efficient modeling of nuclear reactor transients is crucial
for ensuring safe and optimal reactor operation. Traditional physics-based
models, while valuable, can be computationally intensive and may not fully
capture the complexities of real-world reactor behavior. This paper introduces
a novel multi-stage deep learning framework that addresses these limitations,
offering a faster and more robust solution for predicting the final
steady-state power of reactor transients. By leveraging a combination of
feed-forward neural networks with both classification and regression stages,
and training on a unique dataset that integrates real-world measurements of
reactor power and controls state from the Missouri University of Science and
Technology Reactor (MSTR) with noise-enhanced simulated data, our approach
achieves remarkable accuracy (96% classification, 2.3% MAPE). The incorporation
of simulated data with noise significantly improves the model’s generalization
capabilities, mitigating the risk of overfitting. This innovative solution not
only enables rapid and precise prediction of reactor behavior but also has the
potential to revolutionize nuclear reactor operations, facilitating enhanced
safety protocols, optimized performance, and streamlined decision-making
processes.
[LINK]
http://arxiv.org/abs/2211.13157v3
[DATE]
2024-11-19 15:10:34+08:00
[CATEGORIES]
cs.LG
Learning general Gaussian mixtures with efficient score matching
[AUTHORS]
Sitan Chen, Vasilis Kontonis, Kulin Shah
[ABSTRACT]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components: we only
require that the covariance matrices have bounded condition number and that the
means and covariances lie in a ball of bounded radius. We give an algorithm
that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture,
runs in sample-polynomial time, and constructs a sampler whose output
distribution is $\varepsilon$-far from the unknown mixture in total variation.
Prior works for this problem either (i) required exponential runtime in the
dimension $d$, (ii) placed strong assumptions on the instance (e.g., spherical
covariances or clusterability), or (iii) had doubly exponential dependence on
the number of components $k$.
Our approach departs from commonly used techniques for this problem like the
method of moments. Instead, we leverage a recently developed reduction, based
on diffusion models, from distribution learning to a supervised learning task
called score matching. We give an algorithm for the latter by proving a
structural result showing that the score function of a Gaussian mixture can be
approximated by a piecewise-polynomial function, and there is an efficient
algorithm for finding it. To our knowledge, this is the first example of
diffusion models achieving a state-of-the-art theoretical guarantee for an
unsupervised learning task.
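The score function that the structural result approximates has a simple closed form for a known mixture. The sketch below covers only the one-dimensional case with invented parameters, and it computes the exact score of the data distribution; it does not implement the piecewise-polynomial approximation or the diffusion-based reduction from the paper.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def mixture_score(x, weights, means, variances):
    """d/dx log p(x): responsibility-weighted sum of per-component scores."""
    density = mixture_pdf(x, weights, means, variances)
    return sum(
        (w * normal_pdf(x, m, v) / density) * (m - x) / v
        for w, m, v in zip(weights, means, variances)
    )


# Invented two-component mixture for illustration.
weights = [0.3, 0.7]
means = [-1.0, 2.0]
variances = [0.5, 1.5]

x0 = 0.3
score_at_x0 = mixture_score(x0, weights, means, variances)
print(score_at_x0)
```

Each component contributes its own Gaussian score, (mean - x) / variance, weighted by the posterior probability that x came from that component.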
[COMMENTS]
57 pages
[LINK]
http://arxiv.org/abs/2404.18893v2
[DATE]
2024-11-19 15:08:17+08:00
[CATEGORIES]
cs.LG
Bullion: A Column Store for Machine Learning
[AUTHORS]
Gang Liao, Ye Liu, Jianjun Chen, Daniel J. Abadi
[ABSTRACT]
The past two decades have witnessed significant success in applying columnar
storage to data warehousing and analytics. However, the rapid growth of machine
learning poses new challenges. This paper presents Bullion, a columnar storage
system tailored for machine learning workloads. Bullion addresses the
complexities of data compliance, optimizes the encoding of long sequence sparse
features, efficiently manages wide-table projections, introduces feature
quantization in storage, enables quality-aware sequential reads for multimodal
training data, and provides a comprehensive cascading encoding framework that
unifies diverse encoding schemes through modular, composable interfaces. By
aligning with the evolving requirements of ML applications, Bullion facilitates
the application of columnar storage and processing to modern application
scenarios such as those within advertising, recommendation systems, and
Generative AI.
Preliminary experimental results and theoretical analysis demonstrate
Bullion’s improved ability to deliver strong performance in the face of the
unique demands of machine learning workloads compared to existing columnar
storage solutions. Bullion significantly reduces I/O costs for deletion
compliance, achieves substantial storage savings with its optimized encoding
scheme for sparse features, and improves metadata parsing speed for wide-table
projections. These advancements enable Bullion to become an important component
in the future of machine learning infrastructure, enabling organizations to
efficiently manage and process the massive volumes of data required for
training and inference in modern AI applications.
[LINK]
http://arxiv.org/abs/2404.08901v3
[DATE]
2024-11-19 15:04:06+08:00
[CATEGORIES]
cs.LG
libcll: an Extendable Python Toolkit for Complementary-Label Learning
[AUTHORS]
Nai-Xuan Ye, Tan-Ha Mai, Hsiu-Hsuan Wang, Wei-I Lin, Hsuan-Tien Lin
[ABSTRACT]
Complementary-label learning (CLL) is a weakly supervised learning paradigm
for multiclass classification, where only complementary labels – indicating
classes an instance does not belong to – are provided to the learning
algorithm. Despite CLL’s increasing popularity, previous studies highlight two
main challenges: (1) inconsistent results arising from varied assumptions on
complementary label generation, and (2) high barriers to entry due to the lack
of a standardized evaluation platform across datasets and algorithms. To
address these challenges, we introduce libcll, an extensible Python
toolkit for CLL research. libcll provides a universal interface that
supports a wide range of generation assumptions, both synthetic and real-world
datasets, and key CLL algorithms. The toolkit is designed to mitigate
inconsistencies and streamline the research process, with easy installation,
comprehensive usage guides, and quickstart tutorials that facilitate efficient
adoption and implementation of CLL techniques. Extensive ablation studies
conducted with libcll demonstrate its utility in generating valuable
insights to advance future CLL research.
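To make the notion of a complementary label concrete, here is a minimal sketch of one common generation assumption, in which the complementary label is drawn uniformly from the classes the instance does not belong to. It uses only the Python standard library and is not the libcll API; the function name and the toy labels are hypothetical.

```python
import random


def uniform_complementary_labels(true_labels, num_classes, seed=0):
    """For each instance, draw uniformly one class that is NOT its true class."""
    rng = random.Random(seed)
    complementary = []
    for y in true_labels:
        candidates = [c for c in range(num_classes) if c != y]
        complementary.append(rng.choice(candidates))
    return complementary


true_labels = [0, 1, 2, 3, 1, 0]
complementary_labels = uniform_complementary_labels(true_labels, num_classes=4)
print(complementary_labels)
```

Other generation assumptions replace the uniform draw with a class-dependent distribution, which is one source of the inconsistent results the abstract mentions.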
[COMMENTS]
10 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.12276v1
[DATE]
2024-11-19 14:56:24+08:00
[CATEGORIES]
cs.LG
Taming Generative Diffusion Prior for Universal Blind Image Restoration
[AUTHORS]
Siwei Tu, Weidong Yang, Ben Fei
[ABSTRACT]
Diffusion models have been widely utilized for image restoration. However,
previous blind image restoration methods still need to assume the type of
degradation model while leaving the parameters to be optimized, limiting their
real-world applications. Therefore, we aim to tame generative diffusion prior
for universal blind image restoration dubbed BIR-D, which utilizes an
optimizable convolutional kernel to simulate the degradation model and
dynamically update the parameters of the kernel in the diffusion steps,
enabling it to achieve blind image restoration results even in various complex
situations. Besides, based on mathematical reasoning, we have provided an
empirical formula for choosing the adaptive guidance scale, eliminating the
need for a grid search for the optimal parameter. Experimentally, our BIR-D has
demonstrated superior practicality and versatility over off-the-shelf
unsupervised methods across various tasks both on real-world and synthetic
datasets, qualitatively and quantitatively. BIR-D is able to fulfill
multi-guidance blind image restoration. Moreover, BIR-D can also restore images
that undergo multiple and complicated degradations, demonstrating the practical
applications.
[COMMENTS]
15 pages, 12 figures, 8 tables
[LINK]
http://arxiv.org/abs/2408.11287v2
[DATE]
2024-11-19 14:36:59+08:00
[CATEGORIES]
cs.LG
Variational Graph Autoencoder for Heterogeneous Information Networks with Missing and Inaccurate Attributes
[AUTHORS]
Yige Zhao, Jianxiang Yu, Yao Cheng, Chengcheng Yu, Yiding Liu, Xiang Li, Shuaiqiang Wang
[ABSTRACT]
Heterogeneous Information Networks (HINs), which consist of various types of
nodes and edges, have recently demonstrated excellent performance in graph
mining. However, most existing heterogeneous graph neural networks (HGNNs)
ignore the problems of missing attributes, inaccurate attributes and scarce
labels for nodes, which limits their expressiveness. In this paper, we propose
a generative self-supervised model GraMI to address these issues
simultaneously. Specifically, GraMI first initializes all the nodes in the
graph with a low-dimensional representation matrix. After that, based on the
variational graph autoencoder framework, GraMI learns both node-level and
attribute-level embeddings in the encoder, which can provide fine-grained
semantic information to construct node attributes. In the decoder, GraMI
reconstructs both links and attributes. Instead of directly reconstructing raw
features for attributed nodes, GraMI generates the initial low-dimensional
representation matrix for all the nodes, based on which raw features of
attributed nodes are further reconstructed to leverage accurate attributes. In
this way, GraMI can not only complete informative features for non-attributed
nodes, but rectify inaccurate ones for attributed nodes. Finally, we conduct
extensive experiments to show the superiority of GraMI in tackling HINs with
missing and inaccurate attributes.
[COMMENTS]
Accepted by KDD 2025
[LINK]
http://arxiv.org/abs/2311.07929v3
[DATE]
2024-11-19 14:34:03+08:00
[CATEGORIES]
cs.LG
On the Accuracy and Precision of Moving Averages to Estimate Wi-Fi Link Quality
[AUTHORS]
Gianluca Cena, Gabriele Formis, Matteo Rosani, Stefano Scanzio
[ABSTRACT]
The radio spectrum is characterized by a noticeable variability, which
impairs performance and determinism of every wireless communication technology.
To counteract this aspect, mechanisms like Minstrel are customarily employed in
real Wi-Fi devices, and the adoption of machine learning for optimization is
envisaged in next-generation Wi-Fi 8. All these approaches require
communication quality to be monitored at runtime.
In this paper, the effectiveness of simple techniques based on moving
averages to estimate wireless link quality is analyzed, to assess their
advantages and weaknesses. Results can be used, e.g., as a baseline when
studying how artificial intelligence can be employed to mitigate
unpredictability of wireless networks by providing reliable estimates about
current spectrum conditions.
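The abstract does not say which moving averages are analyzed, so the sketch below shows two common choices as an assumption: a simple moving average over a fixed window and an exponential moving average. Each transmission outcome is encoded as 1 (delivered) or 0 (failed); the sequence, window size, and smoothing factor are invented.

```python
from collections import deque


def simple_moving_average(outcomes, window):
    """After each outcome, the mean of the last `window` outcomes seen so far."""
    recent = deque(maxlen=window)
    estimates = []
    for outcome in outcomes:
        recent.append(outcome)
        estimates.append(sum(recent) / len(recent))
    return estimates


def exponential_moving_average(outcomes, alpha):
    """After each outcome, alpha * outcome + (1 - alpha) * previous estimate."""
    estimates = []
    estimate = None
    for outcome in outcomes:
        if estimate is None:
            estimate = float(outcome)  # initialize with the first observation
        else:
            estimate = alpha * outcome + (1 - alpha) * estimate
        estimates.append(estimate)
    return estimates


outcomes = [1, 1, 0, 1, 0, 0, 1, 1]
sma = simple_moving_average(outcomes, window=4)
ema = exponential_moving_average(outcomes, alpha=0.5)
print(sma)
print(ema)
```

A longer window or smaller alpha reduces the variance of the estimate but makes it slower to react when spectrum conditions change.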
[COMMENTS]
preprint, 8 pages, 2024
[LINK]
http://arxiv.org/abs/2411.12265v1
[DATE]
2024-11-19 14:28:58+08:00
[CATEGORIES]
cs.LG
The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing
[AUTHORS]
Yang Xu, Yihong Gu, Cong Fang
[ABSTRACT]
Models are expected to engage in invariance learning, which involves
distinguishing the core relations that remain consistent across varying
environments to ensure the predictions are safe, robust and fair. While
existing works consider specific algorithms to realize invariance learning, we
show that a model has the potential to learn invariance through standard training
procedures. In other words, this paper studies the implicit bias of Stochastic
Gradient Descent (SGD) over heterogeneous data and shows that the implicit bias
drives the model learning towards an invariant solution. We call this phenomenon
implicit invariance learning. Specifically, we theoretically investigate
the multi-environment low-rank matrix sensing problem where in each
environment, the signal comprises (i) a lower-rank invariant part shared across
all environments; and (ii) a significantly varying environment-dependent
spurious component. The key insight is that, by simply employing large-step-size,
large-batch SGD sequentially in each environment without any explicit
regularization, the oscillation caused by heterogeneity can provably prevent
the model from learning spurious signals. The model reaches the invariant solution after
certain iterations. In contrast, a model learned using pooled SGD over all data
would simultaneously learn both the invariant and spurious signals. Overall, we
unveil another implicit bias that is a result of the symbiosis between the
heterogeneity of data and modern algorithms, which is, to the best of our
knowledge, first in the literature.
[LINK]
http://arxiv.org/abs/2403.01420v3
[DATE]
2024-11-19 14:10:32+08:00
[CATEGORIES]
cs.LG
Restructuring Tractable Probabilistic Circuits
[AUTHORS]
Honghua Zhang, Benjie Wang, Marcelo Arenas, Guy Van den Broeck
[ABSTRACT]
Probabilistic circuits (PCs) are a unifying representation for probabilistic
models that support tractable inference. Numerous applications of PCs like
controllable text generation depend on the ability to efficiently multiply two
circuits. Existing multiplication algorithms require that the circuits respect
the same structure, i.e., variable scopes decompose according to the same
vtree. In this work, we propose and study the task of restructuring
structured(-decomposable) PCs, that is, transforming a structured PC such that
it conforms to a target vtree. We propose a generic approach for this problem
and show that it leads to novel polynomial-time algorithms for multiplying
circuits respecting different vtrees, as well as a practical depth-reduction
algorithm that preserves structured decomposability. Our work opens up new
avenues for tractable PC inference, suggesting the possibility of training with
less restrictive PC structures while enabling efficient inference by changing
their structures at inference time.
[LINK]
http://arxiv.org/abs/2411.12256v1
[DATE]
2024-11-19 14:10:22+08:00
[CATEGORIES]
cs.LG
Error-Feedback Model for Output Correction in Bilateral Control-Based Imitation Learning
[AUTHORS]
Hiroshi Sato, Masashi Konosu, Sho Sakaino, Toshiaki Tsuji
[ABSTRACT]
In recent years, imitation learning using neural networks has enabled robots
to perform flexible tasks. However, since neural networks operate in a
feedforward structure, they do not possess a mechanism to compensate for output
errors. To address this limitation, we developed a feedback mechanism to
correct these errors. By employing a hierarchical structure for neural networks
comprising lower and upper layers, the lower layer was controlled to follow the
upper layer. Additionally, using a multi-layer perceptron in the lower layer,
which lacks an internal state, enhanced the error feedback. In the
character-writing task, this model demonstrated improved accuracy in writing
previously untrained characters.
Through autonomous control with error feedback, we confirmed that the lower
layer could effectively track the output of the upper layer. This study
represents a promising step toward integrating neural networks with control
theories.
[LINK]
http://arxiv.org/abs/2411.12255v1
[DATE]
2024-11-19 14:09:09+08:00
[CATEGORIES]
cs.LG
Distributionally robust self-supervised learning for tabular data
[AUTHORS]
Shantanu Ghosh, Tiankang Xie, Mikhail Kuznetsov
[ABSTRACT]
Machine learning (ML) models trained using Empirical Risk Minimization (ERM)
often exhibit systematic errors on specific subpopulations of tabular data,
known as error slices. Learning robust representations in the presence of error
slices is challenging, especially in self-supervised settings during the
feature reconstruction phase, due to high cardinality features and the
complexity of constructing error sets. Traditional robust representation
learning methods are largely focused on improving worst group performance in
supervised setting in computer vision, leaving a gap in approaches tailored for
tabular data. We address this gap by developing a framework to learn robust
representation in tabular data during self-supervised pre-training. Our
approach utilizes an encoder-decoder model trained with Masked Language
Modeling (MLM) loss to learn robust latent representations. This paper applies
the Just Train Twice (JTT) and Deep Feature Reweighting (DFR) methods during
the pre-training phase for tabular data. These methods fine-tune the ERM
pre-trained model by up-weighting error-prone samples or creating balanced
datasets for specific categorical features. This results in specialized models
for each feature, which are then used in an ensemble approach to enhance
downstream classification performance. This methodology improves robustness
across slices, thus enhancing overall generalization performance. Extensive
experiments across various datasets demonstrate the efficacy of our approach.
The code is available at
https://github.com/amazon-science/distributionally-robust-self-supervised-learning-for-tabular-data.
[COMMENTS]
TRL Workshop@NeurIPS2024
[LINK]
http://arxiv.org/abs/2410.08511v3
[DATE]
2024-11-19 13:45:14+08:00
[CATEGORIES]
cs.LG
Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise
[AUTHORS]
Tao Sun, Xinwang Liu, Kun Yuan
[ABSTRACT]
This paper investigates the roles of gradient normalization and clipping in
ensuring the convergence of Stochastic Gradient Descent (SGD) under
heavy-tailed noise. While existing approaches consider gradient clipping
indispensable for SGD convergence, we theoretically demonstrate that gradient
normalization alone without clipping is sufficient to ensure convergence.
Furthermore, we establish that combining gradient normalization with clipping
offers significantly improved convergence rates compared to using either
technique in isolation, notably as gradient noise diminishes. With these
results, our work provides the first theoretical evidence demonstrating the
benefits of gradient normalization in SGD under heavy-tailed noise. Finally, we
introduce an accelerated SGD variant incorporating gradient normalization and
clipping, further enhancing convergence rates under heavy-tailed noise.
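The two operations contrasted above can be sketched as follows. This is a minimal illustration, not the authors' algorithm or their accelerated variant; the function names, the `eps` stabilizer, and the threshold `tau` are assumptions introduced here.

```python
import math

def l2_norm(g):
    """Euclidean norm of a gradient given as a list of floats."""
    return math.sqrt(sum(x * x for x in g))

def normalize(g, eps=1e-12):
    """Gradient normalization: rescale g to (approximately) unit norm.
    eps is an assumed stabilizer to avoid division by zero."""
    n = l2_norm(g)
    return [x / (n + eps) for x in g]

def clip(g, tau):
    """Gradient clipping: shrink g only if its norm exceeds tau (assumed threshold)."""
    n = l2_norm(g)
    scale = min(1.0, tau / n) if n > 0 else 1.0
    return [x * scale for x in g]

def sgd_step(w, g, lr, mode="normalize", tau=1.0):
    """One SGD update using either a normalized or a clipped stochastic gradient."""
    if mode == "normalize":
        d = normalize(g)
    elif mode == "clip":
        d = clip(g, tau)
    else:
        raise ValueError("mode must be 'normalize' or 'clip'")
    return [wi - lr * di for wi, di in zip(w, d)]

# A heavy-tailed outlier gradient moves the weights by at most lr in norm.
w_next = sgd_step([0.0, 0.0], [3e6, 4e6], lr=0.1, mode="normalize")
```

Normalization always rescales the step direction to unit length, whereas clipping leaves small gradients untouched and only shrinks large ones.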
[LINK]
http://arxiv.org/abs/2410.16561v3
[DATE]
2024-11-19 13:34:33+08:00
[CATEGORIES]
cs.LG
Adapting Amidst Degradation: Cross Domain Li-ion Battery Health Estimation via Physics-Guided Test-Time Training
[AUTHORS]
Yuyuan Feng, Guosheng Hu, Xiaodong Li, Zhihong Zhang
[ABSTRACT]
Health modeling of lithium-ion batteries (LIBs) is crucial for safe and
efficient energy management and carries significant socio-economic
implications. Although Machine Learning (ML)-based State of Health (SOH)
estimation methods have made significant progress in accuracy, the scarcity of
high-quality LIB data remains a major obstacle. Existing transfer learning
methods for cross-domain LIB SOH estimation have significantly alleviated the
labeling burden of target LIB data; however, they still require sufficient
unlabeled target data (UTD) for effective adaptation to the target domain.
Collecting this UTD is challenging due to the time-consuming nature of
degradation experiments. To address this issue, we introduce a practical
Test-Time Training framework, BatteryTTT, which adapts the model continually
using each UTD collected amidst degradation, thereby significantly reducing
data collection time. To fully utilize each UTD, BatteryTTT integrates the
inherent physical laws of modern LIBs into self-supervised learning, termed
Physics-Guided Test-Time Training. Additionally, we explore the potential of
large language models (LLMs) in battery sequence modeling by evaluating their
performance in SOH estimation through model reprogramming and prefix prompt
adaptation. The combination of BatteryTTT and LLM modeling, termed GPT4Battery,
achieves state-of-the-art generalization results across current LIB benchmarks.
Furthermore, we demonstrate the practical value and scalability of our approach
by deploying it in our real-world battery management system (BMS) for 300Ah
large-scale energy storage LIBs.
[LINK]
http://arxiv.org/abs/2402.00068v3
[DATE]
2024-11-19 13:08:44+08:00
[CATEGORIES]
cs.LG
Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification
[AUTHORS]
Mingsen Du, Meng Chen, Yongjian Li, Xiuxin Zhang, Jiahui Gao, Cun Ji, Shoushui Wei
[ABSTRACT]
Multivariate time series (MTS) data is generated by multiple sensors
across various domains such as engineering applications, health monitoring, and
the internet of things, and is characterized by temporal changes and high
dimensionality. Over the past few years, many studies have
explored the long-range dependencies and similarities in MTS. However,
long-range dependencies are difficult to model due to their temporal changes,
and high dimensionality makes it difficult to obtain similarities effectively
and efficiently. To address these issues, we propose contrast
similarity-aware dual-pathway Mamba for MTS node classification (CS-DPMamba).
Firstly, to obtain the dynamic similarity of each sample, we use a
temporal contrast learning module to acquire MTS representations. We then
construct a similarity matrix between MTS representations using Fast Dynamic
Time Warping (FastDTW). Secondly, we apply the DPMamba to consider the
bidirectional nature of MTS, allowing us to better capture long-range and
short-range dependencies within the data. Finally, we utilize the
Kolmogorov-Arnold Network enhanced Graph Isomorphism Network to complete the
information interaction in the matrix and MTS node classification task. By
comprehensively considering the long-range dependencies and dynamic similarity
features, we achieved precise MTS node classification. We conducted experiments
on multiple University of East Anglia (UEA) MTS datasets, which encompass
diverse application scenarios. Our results demonstrate the superiority of our
method through both supervised and semi-supervised experiments on the MTS
classification task.
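To make the pairwise-comparison step concrete, here is a small sketch of classic dynamic time warping over univariate sequences. It uses the exact quadratic-time recurrence rather than the FastDTW approximation named in the abstract, and the function names and the absolute-difference local cost are assumptions for illustration only.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two numeric sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                 cost[i][j - 1],      # b advances, a[i-1] repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between all sequences in `series`."""
    k = len(series)
    mat = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            mat[i][j] = mat[j][i] = dtw_distance(series[i], series[j])
    return mat

dist = pairwise_dtw([[1, 2, 3], [1, 2, 2, 3], [0, 0]])
```

The result is a distance matrix; how distances are converted into similarities or graph edges is not specified in the abstract and is left open here.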
[COMMENTS]
Submitted to Knowledge-Based Systems on Nov 17, 2024
[LINK]
http://arxiv.org/abs/2411.12222v1
[DATE]
2024-11-19 12:32:41+08:00
[CATEGORIES]
cs.LG
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning
[AUTHORS]
Jiapeng Zhu, Zichen Ding, Jianxiang Yu, Jiaqi Tan, Xiang Li, Weining Qian
[ABSTRACT]
The advent of the “pre-train, prompt” paradigm has recently extended its
generalization ability and data efficiency to graph representation learning,
following its achievements in Natural Language Processing (NLP). Initial graph
prompt tuning approaches tailored specialized prompting functions for Graph
Neural Network (GNN) models pre-trained with specific strategies, such as edge
prediction, thus limiting their applicability. In contrast, another pioneering
line of research has explored universal prompting via adding prompts to the
input graph’s feature space, thereby removing the reliance on specific
pre-training strategies. However, the necessity to add feature prompts to all
nodes remains an open question. Motivated by findings from prompt tuning
research in the NLP domain, which suggest that highly capable pre-trained
models need less conditioning signal to achieve desired behaviors, we advocate
for strategically incorporating necessary and lightweight feature prompts to
certain graph nodes to enhance downstream task performance. This introduces a
combinatorial optimization problem, requiring a policy to decide 1) which nodes
to prompt and 2) what specific feature prompts to attach. We then address the
problem by framing the prompt incorporation process as a sequential
decision-making problem and propose our method, RELIEF, which employs
Reinforcement Learning (RL) to optimize it. At each step, the RL agent selects
a node (discrete action) and determines the prompt content (continuous action),
aiming to maximize cumulative performance gain. Extensive experiments on graph
and node-level tasks with various pre-training strategies in few-shot scenarios
demonstrate that our RELIEF outperforms fine-tuning and other prompt-based
approaches in classification performance and data efficiency.
[COMMENTS]
Accepted by SIGKDD 2025
[LINK]
http://arxiv.org/abs/2408.03195v2
[DATE]
2024-11-19 12:21:54+08:00
[CATEGORIES]
cs.LG
DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning
[AUTHORS]
Kichang Lee, Yujin Shin, Jonghyuk Yun, Jun Han, JeongGil Ko
[ABSTRACT]
Federated Learning (FL) enables collaborative model training across
distributed devices while preserving local data privacy, making it ideal for
mobile and embedded systems. However, the decentralized nature of FL also opens
vulnerabilities to model poisoning attacks, particularly backdoor attacks,
where adversaries implant trigger patterns to manipulate model predictions. In
this paper, we propose DeTrigger, a scalable and efficient backdoor-robust
federated learning framework that leverages insights from adversarial attack
methodologies. By employing gradient analysis with temperature scaling,
DeTrigger detects and isolates backdoor triggers, allowing for precise model
weight pruning of backdoor activations without sacrificing benign model
knowledge. Extensive evaluations across four widely used datasets demonstrate
that DeTrigger achieves up to 251x faster detection than traditional methods
and mitigates backdoor attacks by up to 98.9%, with minimal impact on global
model accuracy. Our findings establish DeTrigger as a robust and scalable
solution to protect federated learning environments against sophisticated
backdoor threats.
[COMMENTS]
14 pages
[LINK]
http://arxiv.org/abs/2411.12220v1
[DATE]
2024-11-19 12:12:14+08:00
[CATEGORIES]
cs.LG
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
[AUTHORS]
Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen
[ABSTRACT]
To enhance the controllability of text-to-image diffusion models, existing
efforts like ControlNet incorporated image-based conditional controls. In this
paper, we reveal that existing methods still face significant challenges in
generating images that align with the image conditional controls. To this end,
we propose ControlNet++, a novel approach that improves controllable generation
by explicitly optimizing pixel-level cycle consistency between generated images
and conditional controls. Specifically, for an input conditional control, we
use a pre-trained discriminative reward model to extract the corresponding
condition of the generated images, and then optimize the consistency loss
between the input conditional control and extracted condition. A
straightforward implementation would be generating images from random noises
and then calculating the consistency loss, but such an approach requires
storing gradients for multiple sampling timesteps, leading to considerable time
and memory costs. To address this, we introduce an efficient reward strategy
that deliberately disturbs the input images by adding noise, and then uses the
single-step denoised images for reward fine-tuning. This avoids the extensive
costs associated with image sampling, allowing for more efficient reward
fine-tuning. Extensive experiments show that ControlNet++ significantly
improves controllability under various conditional controls. For example, it
achieves improvements over ControlNet by 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE,
respectively, for segmentation mask, line-art edge, and depth conditions. All
the code, models, demo and organized data have been open sourced on our GitHub
repository.
[COMMENTS]
Camera Ready Version. Project Page:
https://liming-ai.github.io/ControlNet_Plus_Plus Code & Data:
https://github.com/liming-ai/ControlNet_Plus_Plus
[LINK]
http://arxiv.org/abs/2404.07987v4
[DATE]
2024-11-19 11:23:20+08:00
[CATEGORIES]
cs.LG
Hierarchical Spatio-Temporal Uncertainty Quantification for Distributed Energy Adoption
[AUTHORS]
Wenbin Zhou, Shixiang Zhu, Feng Qiu, Xuan Wu
[ABSTRACT]
The rapid deployment of distributed energy resources (DER) has introduced
significant spatio-temporal uncertainties in power grid management,
necessitating accurate multilevel forecasting methods. However, existing
approaches often produce overly conservative uncertainty intervals at
individual spatial units and fail to properly capture uncertainties when
aggregating predictions across different spatial scales. This paper presents a
novel hierarchical spatio-temporal model based on the conformal prediction
framework to address these challenges. Our approach generates circuit-level DER
growth predictions and efficiently aggregates them to the substation level
while maintaining statistical validity through a tailored non-conformity score.
Applied to a decade of DER installation data from a local utility network, our
method demonstrates superior performance over existing approaches, particularly
in reducing prediction interval widths while maintaining coverage.
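For readers unfamiliar with the underlying framework, the following is a minimal sketch of generic split conformal prediction with an absolute-residual score. It does not reproduce the paper's tailored non-conformity score or its hierarchical aggregation; all function names here are hypothetical.

```python
import math

def absolute_residual_scores(y_true, y_pred):
    """Non-conformity scores on a held-out calibration set (assumed score: |y - y_hat|)."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]

def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    Returns infinity when the calibration set is too small for the requested level."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def prediction_interval(y_hat, q):
    """Symmetric interval around a new point prediction."""
    return (y_hat - q, y_hat + q)

# Calibrate on nine held-out residuals, then wrap a new prediction of 10.
q = conformal_quantile([1, 2, 3, 4, 5, 6, 7, 8, 9], alpha=0.5)
interval = prediction_interval(10, q)
```

Under exchangeability of calibration and test points, intervals built this way cover the true value with probability at least 1 - alpha; narrower intervals at the same coverage are what the abstract reports as the improvement.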
[LINK]
http://arxiv.org/abs/2411.12193v1
[DATE]
2024-11-19 11:18:31+08:00
[CATEGORIES]
cs.LG
Constant Rate Schedule: Constant-Rate Distributional Change for Efficient Training and Sampling in Diffusion Models
[AUTHORS]
Shuntaro Okada, Kenji Doi, Ryota Yoshihashi, Hirokatsu Kataoka, Tomohiro Tanaka
[ABSTRACT]
We propose a noise schedule that ensures a constant rate of change in the
probability distribution of diffused data throughout the diffusion process. To
obtain this noise schedule, we measure the rate of change in the probability
distribution of the forward process and use it to determine the noise schedule
before training diffusion models. The functional form of the noise schedule is
automatically determined and tailored to each dataset and type of diffusion
model. We evaluate the effectiveness of our noise schedule on unconditional and
class-conditional image generation tasks using the LSUN
(bedroom/church/cat/horse), ImageNet, and FFHQ datasets. Through extensive
experiments, we confirmed that our noise schedule broadly improves the
performance of the diffusion models regardless of the dataset, sampler, number
of function evaluations, or type of diffusion model.
[COMMENTS]
33 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.12188v1
[DATE]
2024-11-19 11:02:39+08:00
[CATEGORIES]
cs.LG
Testability of Instrumental Variables in Additive Nonlinear, Non-Constant Effects Models
[AUTHORS]
Xichen Guo, Zheng Li, Biwei Huang, Yan Zeng, Zhi Geng, Feng Xie
[ABSTRACT]
We address the issue of the testability of instrumental variables derived
from observational data. Most existing testable implications are centered on
scenarios where the treatment is a discrete variable, e.g., instrumental
inequality (Pearl, 1995), or where the effect is assumed to be constant, e.g.,
instrumental variables condition based on the principle of independent
mechanisms (Burauel, 2023). However, treatments can often be continuous
variables, such as drug dosages or nutritional content levels, and non-constant
effects may occur in many real-world scenarios. In this paper, we consider an
additive nonlinear, non-constant effects model with unmeasured confounders, in
which treatments can be either discrete or continuous, and propose an
Auxiliary-based Independence Test (AIT) condition to test whether a variable is
a valid instrument. We first show that if the candidate instrument is valid,
then the AIT condition holds. Moreover, we illustrate the implications of the
AIT condition and demonstrate that, in certain conditions, AIT conditions are
necessary and sufficient to detect all invalid IVs. We also extend the AIT
condition to include covariates and introduce a practical testing algorithm.
Experimental results on both synthetic and three different real-world datasets
show the effectiveness of our proposed condition.
[LINK]
http://arxiv.org/abs/2411.12184v1
[DATE]
2024-11-19 10:56:45+08:00
[CATEGORIES]
cs.LG
Beyond Perceptual Distances: Rethinking Disparity Assessment for Out-of-Distribution Detection with Diffusion Models
[AUTHORS]
Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang
[ABSTRACT]
Out-of-Distribution (OoD) detection aims to determine whether a given sample is
from the training distribution of the classifier-under-protection, i.e.,
In-Distribution (InD), or from OoD. Diffusion Models (DMs) have recently been
utilized in OoD detection by using the perceptual distances between the given
image and its DM generation. DM-based methods bring fresh insights to the
field, yet remain under-explored.
In this work, we point out two main limitations in DM-based OoD detection
methods: (i) the perceptual metrics on the disparities between the given sample
and its generation are devised only at human-perceived levels, ignoring the
abstract or high-level patterns that help better reflect the intrinsic
disparities in distribution; (ii) only the raw image contents are taken to
measure the disparities, while other representations, i.e., the features and
probabilities from the classifier-under-protection, are easy to access at hand
but are ignored. To this end, our proposed detection framework goes beyond the
perceptual distances and looks into the deep representations from the
classifier-under-protection with our novel metrics devised correspondingly,
leading to more informative disparity assessments between InD and OoD. An
anomaly-removal strategy is integrated to remove the abnormal OoD information
in the generation, further enhancing the distinctiveness of disparities. Our
work has demonstrated state-of-the-art detection performances among DM-based
methods in extensive experiments.
[LINK]
http://arxiv.org/abs/2409.10094v2
[DATE]
2024-11-19 10:55:22+08:00
[CATEGORIES]
cs.LG
Action-Attentive Deep Reinforcement Learning for Autonomous Alignment of Beamlines
[AUTHORS]
Siyu Wang, Shengran Dai, Jianhui Jiang, Shuang Wu, Yufei Peng, Junbin Zhang
[ABSTRACT]
Synchrotron radiation sources play a crucial role in fields such as materials
science, biology, and chemistry. The beamline, a key subsystem of the
synchrotron, modulates and directs the radiation to the sample for analysis.
However, the alignment of beamlines is a complex and time-consuming process,
primarily carried out manually by experienced engineers. Even minor
misalignments in optical components can significantly affect the beam’s
properties, leading to suboptimal experimental outcomes. Although current
automated methods, such as Bayesian optimization (BO) and reinforcement learning
(RL), enhance performance, limitations remain. The
relationship between the current and target beam properties, crucial for
determining the adjustment, is not fully considered. Additionally, the physical
characteristics of optical elements are overlooked, such as the need to adjust
specific devices to control the output beam’s spot size or position. This paper
addresses the alignment of beamlines by modeling it as a Markov Decision
Process (MDP) and training an intelligent agent using RL. The agent calculates
adjustment values based on the current and target beam states, executes
actions, and iterates until optimal parameters are achieved. A policy network
with action attention is designed to improve decision-making by considering
both state differences and the impact of optical components. Experiments on two
simulated beamlines demonstrate that our algorithm outperforms existing
methods, with ablation studies highlighting the effectiveness of the action
attention-based policy network.
[COMMENTS]
17 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.12183v1
[DATE]
2024-11-19 10:50:11+08:00
[CATEGORIES]
cs.LG
Diffusion-Inspired Cold Start with Sufficient Prior in Computerized Adaptive Testing
[AUTHORS]
Haiping Ma, Aoqing Xia, Changqian Wang, Hai Wang, Xingyi Zhang
[ABSTRACT]
Computerized Adaptive Testing (CAT) aims to select the most appropriate
questions based on the examinee’s ability and is widely used in online
education. However, existing CAT systems often lack initial understanding of
the examinee’s ability, requiring random probing questions. This can lead to
poorly matched questions, extending the test duration and negatively impacting
the examinee’s mindset, a phenomenon referred to as the Cold Start with
Insufficient Prior (CSIP) task. This issue occurs because CAT systems do not
effectively utilize the abundant prior information about the examinee available
from other courses on online platforms. These response records, due to the
commonality of cognitive states across different knowledge domains, can provide
valuable prior information for the target domain. However, no prior work has
explored solutions for the CSIP task. In response to this gap, we propose the
Diffusion Cognitive States TransfeR Framework (DCSR), a novel domain transfer
framework based on Diffusion Models (DMs) to address the CSIP task.
Specifically, we construct a cognitive state transition bridge between domains,
guided by the common cognitive states of examinees, encouraging the model to
reconstruct the initial ability state in the target domain. To enrich the
expressive power of the generated data, we analyze the causal relationships in
the generation process from a causal perspective. Redundant and extraneous
cognitive states can lead to limited transfer and negative transfer effects.
Our DCSR can seamlessly apply the generated initial ability states in the
target domain to existing question selection algorithms, thus improving the
cold start performance of the CAT system. Extensive experiments conducted on
five real-world datasets demonstrate that DCSR significantly outperforms
existing baseline methods in addressing the CSIP task.
[COMMENTS]
Accepted by KDD2025
[LINK]
http://arxiv.org/abs/2411.12182v1
[DATE]
2024-11-19 10:48:58+08:00
[CATEGORIES]
cs.LG
A universal approximation theorem for nonlinear resistive networks
[AUTHORS]
Benjamin Scellier, Siddhartha Mishra
[ABSTRACT]
Resistor networks have recently attracted interest as analog computing
platforms for machine learning, particularly due to their compatibility with
the Equilibrium Propagation training framework. In this work, we explore the
computational capabilities of these networks. We prove that electrical networks
consisting of voltage sources, linear resistors, diodes, and voltage-controlled
voltage sources (VCVS) can approximate any continuous function to arbitrary
precision. Central to our proof is a method for translating a ReLU neural
network into an approximately equivalent electrical network comprising these
four elements. Our proof relies on two assumptions: (a) circuit elements are
ideal, and (b) variable resistor conductances and VCVS amplification factors
can take any value (arbitrarily small or large). Our findings provide insights
that could guide the development of universal self-learning electrical
networks.
[LINK]
http://arxiv.org/abs/2312.15063v2
[DATE]
2024-11-19 10:40:19+08:00
[CATEGORIES]
cs.LG
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
[AUTHORS]
Prakhar Ganesh, Usman Gohar, Lu Cheng, Golnoosh Farnadi
[COMMENTS]
To appear at AFME@NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.11101v2
[DATE]
2024-11-19 10:39:53+08:00
[CATEGORIES]
cs.LG
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks
[AUTHORS]
Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu
[ABSTRACT]
Deep reinforcement learning (DRL) has achieved remarkable success in various
research domains. However, its reliance on neural networks results in a lack of
transparency, which limits its practical applications. To achieve
explainability, decision trees have emerged as a popular and promising
alternative to neural networks. Nonetheless, due to their limited
expressiveness, traditional decision trees struggle with high-dimensional
long-horizon continuous control tasks. In this paper, we propose SkillTree, a
novel framework that reduces complex continuous action spaces into discrete
skill spaces. Our hierarchical approach integrates a differentiable decision
tree within the high-level policy to generate skill embeddings, which
subsequently guide the low-level policy in executing skills. By making skill
decisions explainable, we achieve skill-level explainability, enhancing the
understanding of the decision-making process in complex tasks. Experimental
results demonstrate that our method achieves performance comparable to
skill-based neural networks in complex robotic arm control domains.
Furthermore, SkillTree offers explanations at the skill level, thereby
increasing the transparency of the decision-making process.
[LINK]
http://arxiv.org/abs/2411.12173v1
[DATE]
2024-11-19 10:35:14+08:00
[CATEGORIES]
cs.LG
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
[AUTHORS]
Jiawei Du, Xin Zhang, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou
[ABSTRACT]
The sharp increase in data-related expenses has motivated research into
condensing datasets while retaining the most informative features. Dataset
distillation has thus recently come to the fore. This paradigm generates
synthetic datasets that are representative enough to replace the original
dataset in training a neural network. To avoid redundancy in these synthetic
datasets, it is crucial that each element contains unique features and remains
diverse from others during the synthesis stage. In this paper, we provide a
thorough theoretical and empirical analysis of diversity within synthesized
datasets. We argue that enhancing diversity can improve the parallelizable yet
isolated synthesizing approach. Specifically, we introduce a novel method that
employs dynamic and directed weight adjustment techniques to modulate the
synthesis process, thereby maximizing the representativeness and diversity of
each synthetic instance. Our method ensures that each batch of synthetic data
mirrors the characteristics of a large, varying subset of the original dataset.
Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet,
and ImageNet-1K, demonstrate the superior performance of our method,
highlighting its effectiveness in producing diverse and representative
synthetic datasets with minimal computational expense. Our code is available at
https://github.com/AngusDujw/Diversity-Driven-Synthesis.
[LINK]
http://arxiv.org/abs/2409.17612v3
[DATE]
2024-11-19 10:05:56+08:00
[CATEGORIES]
cs.LG
UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning
[AUTHORS]
Yuan Yuan, Chonghua Han, Jingtao Ding, Depeng Jin, Yong Li
[ABSTRACT]
The urban environment is characterized by complex spatio-temporal dynamics
arising from diverse human activities and interactions. Effectively modeling
these dynamics is essential for understanding and optimizing urban systems. In
this work, we introduce UrbanDiT, a foundation model for open-world urban
spatio-temporal learning that successfully scales up diffusion transformers in
this field. UrbanDiT pioneers a unified model that integrates diverse
spatio-temporal data sources and types while learning universal spatio-temporal
patterns across different cities and scenarios. This allows the model to unify
both multi-data and multi-task learning, and effectively support a wide range
of spatio-temporal applications. Its key innovation lies in the elaborated
prompt learning framework, which adaptively generates both data-driven and
task-specific prompts, guiding the model to deliver superior performance across
various urban applications. UrbanDiT offers three primary advantages: 1) It
unifies diverse data types, such as grid-based and graph-based data, into a
sequential format, allowing it to capture spatio-temporal dynamics across diverse
scenarios of different cities; 2) With masking strategies and task-specific
prompts, it supports a wide range of tasks, including bi-directional
spatio-temporal prediction, temporal interpolation, spatial extrapolation, and
spatio-temporal imputation; and 3) It generalizes effectively to open-world
scenarios, with its powerful zero-shot capabilities outperforming nearly all
baselines with training data. These features allow UrbanDiT to achieve
state-of-the-art performance in different domains such as transportation
traffic, crowd flows, taxi demand, bike usage, and cellular traffic, across
multiple cities and tasks. UrbanDiT sets up a new benchmark for foundation
models in the urban spatio-temporal domain.
[LINK]
http://arxiv.org/abs/2411.12164v1
[DATE]
2024-11-19 10:01:07+08:00
[CATEGORIES]
cs.LG
Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning
[AUTHORS]
Luis Mandl, Somdatta Goswami, Lena Lambers, Tim Ricken
[ABSTRACT]
The deep operator network (DeepONet) is a popular neural operator
architecture that has shown promise in solving partial differential equations
(PDEs) by using deep neural networks to map between infinite-dimensional
function spaces. In the absence of labeled datasets, we utilize the PDE
residual loss to learn the physical system, an approach known as
physics-informed DeepONet. This method faces significant computational
challenges, primarily due to the curse of dimensionality, as the computational
cost increases exponentially with finer discretization. In this paper, we
introduce the Separable DeepONet framework to address these challenges and
improve scalability for high-dimensional PDEs. Our approach involves a
factorization technique where sub-networks handle individual one-dimensional
coordinates, thereby reducing the number of forward passes and the size of the
Jacobian matrix. By using forward-mode automatic differentiation, we further
optimize the computational cost related to the Jacobian matrix. As a result,
our modifications lead to a linear scaling of computational cost with
discretization density, making Separable DeepONet suitable for high-dimensional
PDEs. We validate the effectiveness of the separable architecture through three
benchmark PDE models: the viscous Burgers equation, Biot’s consolidation
theory, and a parametrized heat equation. In all cases, our proposed framework
achieves comparable or improved accuracy while significantly reducing
computational time compared to conventional DeepONet. These results demonstrate
the potential of Separable DeepONet in efficiently solving complex,
high-dimensional PDEs, advancing the field of physics-informed machine
learning.
[COMMENTS]
23 Pages, 9 Figures and 1 Table
[LINK]
http://arxiv.org/abs/2407.15887v3
[DATE]
2024-11-19 09:30:14+08:00
[CATEGORIES]
cs.LG
Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning
[AUTHORS]
Younggyo Seo, Pieter Abbeel
[ABSTRACT]
Training reinforcement learning (RL) agents on robotic tasks typically
requires a large number of training samples. This is because training data
often consists of noisy trajectories, whether from exploration or
human-collected demonstrations, making it difficult to learn value functions
that understand the effect of taking each action. On the other hand, recent
behavior-cloning (BC) approaches have shown that predicting a sequence of
actions enables policies to effectively approximate noisy, multi-modal
distributions of expert demonstrations. Can we use a similar idea for improving
RL on robotic tasks? In this paper, we introduce a novel RL algorithm that
learns a critic network that outputs Q-values over a sequence of actions. By
explicitly training the value functions to learn the consequence of executing a
series of current and future actions, our algorithm allows for learning useful
value functions from noisy trajectories. We study our algorithm across various
setups with sparse and dense rewards, and with or without demonstrations,
spanning mobile bi-manual manipulation, whole-body control, and tabletop
manipulation tasks from BiGym, HumanoidBench, and RLBench. We find that, by
learning the critic network with action sequences, our algorithm outperforms
various RL and BC baselines, in particular on challenging humanoid control
tasks.
[COMMENTS]
17 Pages. Website: https://younggyo.me/cqn-as/
[LINK]
http://arxiv.org/abs/2411.12155v1
[DATE]
2024-11-19 09:23:52+08:00
[CATEGORIES]
cs.LG
Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds
[AUTHORS]
Arda Güçlü, Subhonmesh Bose
[ABSTRACT]
We propose and analyze TRAiL (Tangential Randomization in Linear Bandits), a
computationally efficient regret-optimal forced exploration algorithm for
linear bandits on action sets that are sublevel sets of strongly convex
functions. TRAiL estimates the governing parameter of the linear bandit problem
through a standard regularized least squares and perturbs the reward-maximizing
action corresponding to said point estimate along the tangent plane of the
convex compact action set before projecting back to it. Exploiting
concentration results for matrix martingales, we prove that TRAiL ensures an
$\Omega(\sqrt{T})$ growth in the inference quality, measured via the minimum
eigenvalue of the design (regressor) matrix, with high probability over a
$T$-length period. We build on this result to obtain an $\mathcal{O}(\sqrt{T}
\log(T))$ upper bound on cumulative regret with probability at least $ 1 - 1/T$
over $T$ periods, and compare TRAiL to other popular algorithms for linear
bandits. Then, we characterize an $\Omega(\sqrt{T})$ minimax lower bound for
any algorithm on the expected regret that covers a wide variety of
action/parameter sets and noise processes. Our analysis not only expands the
realm of lower-bounds in linear bandits significantly, but as a byproduct,
yields a trade-off between regret and inference quality. Specifically, we prove
that any algorithm with an $\mathcal{O}(T^\alpha)$ expected regret growth must
have an $\Omega(T^{1-\alpha})$ asymptotic growth in expected inference quality.
Our experiments on the $L^p$ unit ball as action sets reveal how this relation
can be violated, but only in the short-run, before returning to respect the
bound asymptotically. In effect, regret-minimizing algorithms must have just
the right rate of inference – too fast or too slow inference will incur
sub-optimal regret growth.
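
The estimate-perturb-project step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the action set is the unit $L^2$ ball, and the function names, the regularizer `lam`, and the perturbation size `scale` are all hypothetical choices.

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Regularized least-squares estimate of the bandit parameter."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def tangential_action(theta_hat, scale, rng):
    """Greedy action on the unit L2 ball, perturbed along its tangent plane."""
    greedy = theta_hat / np.linalg.norm(theta_hat)   # reward-maximizing point for theta_hat
    noise = rng.standard_normal(theta_hat.shape)
    tangent = noise - (noise @ greedy) * greedy      # strip the component normal to the ball
    tangent /= np.linalg.norm(tangent)
    perturbed = greedy + scale * tangent
    # Euclidean projection back onto the unit ball.
    return greedy, tangent, perturbed / max(1.0, np.linalg.norm(perturbed))

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.25])             # assumed ground truth for the demo
X = rng.standard_normal((50, 3))
y = X @ theta_true + 0.1 * rng.standard_normal(50)
theta_hat = ridge_estimate(X, y)
greedy, tangent, action = tangential_action(theta_hat, scale=0.1, rng=rng)
```

The perturbation is orthogonal to the greedy action, so it adds exploration in directions the greedy choice would never probe while changing the expected reward only to second order in `scale`.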
[COMMENTS]
42 pages, 6 Figures
[LINK]
http://arxiv.org/abs/2411.12154v1
[DATE]
2024-11-19 09:08:13+08:00
[CATEGORIES]
cs.LG
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
[AUTHORS]
Shuijing Liu, Haochen Xia, Fatemeh Cheraghi Pouria, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell
[ABSTRACT]
We study the problem of robot navigation in dense and interactive crowds with
environmental constraints such as corridors and furniture. Previous methods
fail to consider all types of interactions among agents and obstacles, leading
to unsafe and inefficient robot paths. In this article, we leverage a
graph-based representation of crowded and constrained scenarios and propose a
structured framework to learn robot navigation policies with deep reinforcement
learning. We first split the representations of different components in the
environment and propose a heterogeneous spatio-temporal (st) graph to model
distinct interactions among humans, robots, and obstacles. Based on the
heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network
architecture with different components to capture heterogeneous interactions
among entities through space and time. HEIGHT utilizes attention mechanisms to
prioritize important interactions and a recurrent network to track changes in
the dynamic scene over time, encouraging the robot to avoid collisions
adaptively. Through extensive simulation and real-world experiments, we
demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of
success and efficiency in challenging navigation scenarios. Furthermore, we
demonstrate that our pipeline achieves better zero-shot generalization
capability than previous works when the densities of humans and obstacles
change. More videos are available at
https://sites.google.com/view/crowdnav-height/home.
[LINK]
http://arxiv.org/abs/2411.12150v1
[DATE]
2024-11-19 08:56:35+08:00
[CATEGORIES]
cs.LG
Self-supervised denoising of visual field data improves detection of glaucoma progression
[AUTHORS]
Sean Wu, Jun Yu Chen, Vahid Mohammadzadeh, Sajad Besharati, Jaewon Lee, Kouros Nouri-Mahdavi, Joseph Caprioli, Zhe Fei, Fabien Scalzo
[ABSTRACT]
Perimetric measurements provide insight into a patient’s peripheral vision
and day-to-day functioning and are the main outcome measure for identifying
progression of visual damage from glaucoma. However, visual field data can be
noisy, exhibiting high variance, especially with increasing damage. In this
study, we demonstrate the utility of self-supervised deep learning in denoising
visual field data from over 4000 patients to enhance its signal-to-noise ratio
and its ability to detect true glaucoma progression. We deployed both a
variational autoencoder (VAE) and a masked autoencoder to determine which
self-supervised model best smooths the visual field data while reconstructing
salient features that are less noisy and more predictive of worsening disease.
Our results indicate that including a categorical p-value at every visual field
location improves the smoothing of visual field data. Masked autoencoders led
to cleaner denoised data than previous methods, such as variational
autoencoders. A 4.7% increase in detection of progressing eyes with pointwise
linear regression (PLR) was observed. The masked and variational autoencoders’
smoothed data predicted glaucoma progression 2.3 months earlier when p-values
were included compared to when they were not. The faster prediction of time to
progression (TTP) and the higher percentage progression detected support our
hypothesis that masking out visual field elements during training while
including p-values at each location would improve the task of detection of
visual field progression. Our study has clinically relevant implications
regarding masking when training neural networks to denoise visual field data,
resulting in earlier and more accurate detection of glaucoma progression. This
denoising model can be integrated into future models for visual field analysis
to enhance detection of glaucoma progression.
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2411.12146v1
[DATE]
2024-11-19 08:50:01+08:00
[CATEGORIES]
cs.LG
Visualizing Loss Functions as Topological Landscape Profiles
[AUTHORS]
Caleb Geniesse, Jiaqing Chen, Tiankai Xie, Ge Shi, Yaoqing Yang, Dmitriy Morozov, Talita Perciano, Michael W. Mahoney, Ross Maciejewski, Gunther H. Weber
[ABSTRACT]
In machine learning, a loss function measures the difference between model
predictions and ground-truth (or target) values. For neural network models,
visualizing how this loss changes as model parameters are varied can provide
insights into the local structure of the so-called loss landscape (e.g.,
smoothness) as well as global properties of the underlying model (e.g.,
generalization performance). While various methods for visualizing the loss
landscape have been proposed, many approaches limit sampling to just one or two
directions, ignoring potentially relevant information in this extremely
high-dimensional space. This paper introduces a new representation based on
topological data analysis that enables the visualization of higher-dimensional
loss landscapes. After describing this new topological landscape profile
representation, we show how the shape of loss landscapes can reveal new details
about model performance and learning dynamics, highlighting several use cases,
including image segmentation (e.g., UNet) and scientific machine learning
(e.g., physics-informed neural networks). Through these examples, we provide
new insights into how loss landscapes vary across distinct hyperparameter
spaces: we find that the topology of the loss landscape is simpler for
better-performing models; and we observe greater variation in the shape of loss
landscapes near transitions from low to high model performance.
[LINK]
http://arxiv.org/abs/2411.12136v1
[DATE]
2024-11-19 08:28:14+08:00
[CATEGORIES]
cs.LG
SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems
[AUTHORS]
Yusuf Umut Ciftci, Darren Chiu, Zeyuan Feng, Gaurav S. Sukhatme, Somil Bansal
[ABSTRACT]
Behavior cloning (BC) is a widely-used approach in imitation learning, where
a robot learns a control policy by observing an expert supervisor. However, the
learned policy can make errors and might lead to safety violations, which
limits its utility in safety-critical robotics applications. While prior
works have tried improving a BC policy via additional real or synthetic action
labels, adversarial training, or runtime filtering, none of them explicitly
focus on reducing the BC policy’s safety violations during training time. We
propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning
policies. SAFE-GIL deliberately injects adversarial disturbance in the system
during data collection to guide the expert towards safety-critical states. This
disturbance injection simulates potential policy errors that the system might
encounter during the test time. By ensuring that training more closely
replicates expert behavior in safety-critical states, our approach results in
safer policies despite policy errors during the test time. We further develop a
reachability-based method to compute this adversarial disturbance. We compare
SAFE-GIL with various behavior cloning techniques and online safety-filtering
methods in three domains: autonomous ground navigation, aircraft taxiing, and
aerial navigation on a quadrotor testbed. Our method demonstrates a significant
reduction in safety failures, particularly in low data regimes where the
likelihood of learning errors, and therefore safety violations, is higher. See
our website here: https://y-u-c.github.io/safegil/
[LINK]
http://arxiv.org/abs/2404.05249v2
[DATE]
2024-11-19 08:05:02+08:00
[CATEGORIES]
cs.LG
Fine-Grained Uncertainty Quantification via Collisions
[AUTHORS]
Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon
[ABSTRACT]
We propose a new approach for fine-grained uncertainty quantification (UQ)
using a collision matrix. For a classification problem involving $K$ classes,
the $K\times K$ collision matrix $S$ measures the inherent (aleatoric)
difficulty in distinguishing between each pair of classes. In contrast to
existing UQ methods, the collision matrix gives a much more detailed picture of
the difficulty of classification. We discuss several possible downstream
applications of the collision matrix, establish its fundamental mathematical
properties, as well as show its relationship with existing UQ methods,
including the Bayes error rate. We also address the new problem of estimating
the collision matrix using one-hot labeled data. We propose a series of
innovative techniques to estimate $S$. First, we learn a contrastive binary
classifier which takes two inputs and determines if they belong to the same
class. We then show that this contrastive classifier (which is PAC learnable)
can be used to reliably estimate the Gramian matrix of $S$, defined as
$G=S^TS$. Finally, we show that under very mild assumptions, $G$ can be used to
uniquely recover $S$, a new result on stochastic matrices which could be of
independent interest. Experimental results are also presented to validate our
methods on several datasets.
[LINK]
http://arxiv.org/abs/2411.12127v1
[DATE]
2024-11-19 07:41:27+08:00
[CATEGORIES]
cs.LG
Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration
[AUTHORS]
Wei Ji, Li Li, Zheqi Lv, Wenqiao Zhang, Mengze Li, Zhen Wan, Wenqiang Lei, Roger Zimmermann
[ABSTRACT]
In our increasingly interconnected world, where intelligent devices
continually amass copious personalized multi-modal data, a pressing need arises
to deliver high-quality, personalized device-aware services. However, this
endeavor presents a multifaceted challenge to prevailing artificial
intelligence (AI) systems primarily rooted in the cloud. As these systems
grapple with shifting data distributions between the cloud and devices, the
traditional approach of fine-tuning-based adaptation (FTA) suffers from two
issues: the costly and time-consuming data annotation required by FTA and the
looming risk of model overfitting. To surmount these challenges, we introduce a
Universal On-Device Multi-modal Model Adaptation Framework, revolutionizing
on-device model adaptation by striking a balance between efficiency and
effectiveness. The framework features the Fast Domain Adaptor (FDA) hosted in
the cloud, providing tailored parameters for the Lightweight Multi-modal Model
on devices. To enhance adaptability across multi-modal tasks, the AnchorFrame
Distribution Reasoner (ADR) minimizes communication costs. Our contributions,
encapsulated in the Cloud-Device Collaboration Multi-modal Parameter Generation
(CDC-MMPG) framework, represent a pioneering solution for on-Device Multi-modal
Model Adaptation (DMMA). Extensive experiments validate the efficiency and
effectiveness of our method, particularly in video question answering and
retrieval tasks, driving forward the integration of intelligent devices into
our daily lives.
[LINK]
http://arxiv.org/abs/2406.01601v3
[DATE]
2024-11-19 07:06:46+08:00
[CATEGORIES]
cs.LG
Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning
[AUTHORS]
Brian B. Moser, Federico Raue, Tobias C. Nauen, Stanislav Frolov, Andreas Dengel
[ABSTRACT]
Dataset distillation has gained significant interest in recent years, yet
existing approaches typically distill from the entire dataset, potentially
including non-beneficial samples. We introduce a novel “Prune First, Distill
After” framework that systematically prunes datasets via loss-based sampling
prior to distillation. By leveraging pruning before classical distillation
techniques and generative priors, we create a representative core-set that
leads to enhanced generalization for unseen architectures, a significant
challenge of current distillation methods. More specifically, our proposed
framework significantly boosts distilled quality, achieving up to a 5.2
percentage points accuracy increase even with substantial dataset pruning,
i.e., removing 80% of the original dataset prior to distillation. Overall, our
experimental results highlight the advantages of our easy-sample prioritization
and cross-architecture robustness, paving the way for more effective and
high-quality dataset distillation.
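
The pruning step that precedes distillation can be sketched as below. This is an assumption-laden illustration: it supposes per-sample losses are already available from some pretrained classifier, and it uses a deterministic keep-the-lowest-loss rule rather than whatever sampling scheme the paper actually uses. `prune_by_loss` and `keep_fraction` are hypothetical names.

```python
import numpy as np

def prune_by_loss(losses, keep_fraction=0.2):
    """Return the (sorted) indices of the lowest-loss, i.e. easiest, samples."""
    losses = np.asarray(losses, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(losses))))
    keep = np.argsort(losses, kind="stable")[:n_keep]
    return np.sort(keep)

# Toy per-sample losses; keeping 20% corresponds to removing 80% of the data.
losses = [0.9, 0.1, 0.5, 0.05, 2.0, 0.3, 1.2, 0.7, 0.2, 1.5]
kept = prune_by_loss(losses, keep_fraction=0.2)
```

Only the samples indexed by `kept` would then be passed to the distillation method.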
[LINK]
http://arxiv.org/abs/2411.12115v1
[DATE]
2024-11-19 06:51:44+08:00
[CATEGORIES]
cs.LG
GD doesn’t make the cut: Three ways that non-differentiability affects neural network training
[AUTHORS]
Siddharth Krishna Kumar
[ABSTRACT]
This paper critically examines the fundamental distinctions between gradient
methods applied to non-differentiable functions (NGDMs) and classical gradient
descents (GDs) for differentiable functions, revealing significant gaps in
current deep learning optimization theory. We demonstrate that NGDMs exhibit
markedly different convergence properties compared to GDs, strongly challenging
the applicability of extensive neural network convergence literature based on
$L$-smoothness to non-smooth neural networks. Our analysis reveals paradoxical
behavior of NGDM solutions for $L_{1}$-regularized problems, where increasing
regularization counterintuitively leads to larger $L_{1}$ norms of optimal
solutions. This finding calls into question widely adopted $L_{1}$ penalization
techniques for network pruning. We further challenge the common assumption that
optimization algorithms like RMSProp behave similarly in differentiable and
non-differentiable contexts. Expanding on the Edge of Stability phenomenon, we
demonstrate its occurrence in a broader class of functions, including Lipschitz
continuous convex differentiable functions. This finding raises important
questions about its relevance and interpretation in non-convex,
non-differentiable neural networks, particularly those using ReLU activations.
Our work identifies critical misunderstandings of NGDMs in influential
literature, stemming from an overreliance on strong smoothness assumptions.
These findings necessitate a reevaluation of optimization dynamics in deep
learning, emphasizing the crucial need for more nuanced theoretical foundations
in analyzing these complex systems.
[LINK]
http://arxiv.org/abs/2401.08426v6
[DATE]
2024-11-19 06:26:15+08:00
[CATEGORIES]
cs.LG
QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
[AUTHORS]
Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić
[ABSTRACT]
We propose Quantum-informed Tensor Adaptation (QuanTA), a novel,
easy-to-implement, fine-tuning method with no inference overhead for
large-scale pre-trained language models. By leveraging quantum-inspired methods
derived from quantum circuit structures, QuanTA enables efficient high-rank
fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA), whose low-rank
approximation may fail for complicated downstream tasks. Our approach is
theoretically supported by the universality theorem and the rank representation
theorem to achieve efficient high-rank adaptations. Experiments demonstrate
that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning,
and scalability compared to traditional methods. Furthermore, QuanTA shows
superior performance with fewer trainable parameters compared to other
approaches and can be designed to integrate with existing fine-tuning
algorithms for further improvement, providing a scalable and efficient solution
for fine-tuning large language models and advancing state-of-the-art in natural
language processing.
[LINK]
http://arxiv.org/abs/2406.00132v3
[DATE]
2024-11-19 06:24:16+08:00
[CATEGORIES]
cs.LG
BALI: Learning Neural Networks via Bayesian Layerwise Inference
[AUTHORS]
Richard Kurle, Alexej Klushyn, Ralf Herbrich
[ABSTRACT]
We introduce a new method for learning Bayesian neural networks, treating
them as a stack of multivariate Bayesian linear regression models. The main
idea is to infer the layerwise posterior exactly if we know the target outputs
of each layer. We define these pseudo-targets as the layer outputs from the
forward pass, updated by the backpropagated gradients of the objective
function. The resulting layerwise posterior is a matrix-normal distribution
with a Kronecker-factorized covariance matrix, which can be efficiently
inverted. Our method extends to the stochastic mini-batch setting using an
exponential moving average over natural-parameter terms, thus gradually
forgetting older data. The method converges in few iterations and performs as
well as or better than leading Bayesian neural network methods on various
regression, classification, and out-of-distribution detection benchmarks.
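
A rough sketch of the mini-batch idea for a single layer: a Bayesian linear regression posterior whose sufficient statistics are tracked with an exponential moving average. It assumes an isotropic Gaussian prior and unit noise, and it ignores the output-side covariance factor; the class name, `prior_precision`, and `decay` are hypothetical and not taken from the paper.

```python
import numpy as np

class EMALinearPosterior:
    """Bayesian linear regression for one layer with EMA-tracked statistics."""

    def __init__(self, d_in, d_out, prior_precision=1.0, decay=0.9):
        self.prior = prior_precision * np.eye(d_in)
        self.decay = decay
        self.xtx = np.zeros((d_in, d_in))    # EMA of X^T X
        self.xty = np.zeros((d_in, d_out))   # EMA of X^T Y

    def update(self, X, Y):
        # Older mini-batches are down-weighted geometrically, i.e. gradually forgotten.
        self.xtx = self.decay * self.xtx + (1 - self.decay) * (X.T @ X)
        self.xty = self.decay * self.xty + (1 - self.decay) * (X.T @ Y)

    def posterior(self):
        precision = self.prior + self.xtx
        mean = np.linalg.solve(precision, self.xty)   # posterior mean of the weights
        return mean, np.linalg.inv(precision)         # input-side covariance factor

rng = np.random.default_rng(0)
W_true = rng.standard_normal((4, 2))
layer = EMALinearPosterior(d_in=4, d_out=2, prior_precision=1e-6)
for _ in range(50):
    X = rng.standard_normal((64, 4))
    layer.update(X, X @ W_true)      # noise-free pseudo-targets, for the demo only
mean, row_cov = layer.posterior()
```

Because the moving average is normalized, the effective sample size here is roughly one mini-batch; how the statistics are scaled against the prior is a design choice this sketch does not try to settle.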
[LINK]
http://arxiv.org/abs/2411.12102v1
[DATE]
2024-11-19 06:18:34+08:00
[CATEGORIES]
cs.LG
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
[AUTHORS]
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C. S. Lui, Wei Chen
[ABSTRACT]
We introduce a novel framework of combinatorial multi-armed bandits (CMAB)
with multivariant and probabilistically triggering arms (CMAB-MT), where the
outcome of each arm is a $d$-dimensional multivariant random variable and the
feedback follows a general arm triggering process. Compared with existing CMAB
works, CMAB-MT not only enhances the modeling power but also allows improved
results by leveraging distinct statistical properties for multivariant random
variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering
probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm
built upon this condition. Our framework can include many important problems as
applications, such as episodic reinforcement learning (RL) and probabilistic
maximum coverage for goods distribution, all of which meet the above smoothness
condition and achieve matching or improved regret bounds compared to existing
works. Through our new framework, we build the first connection between the
episodic RL and CMAB literature, by offering a new angle to solve the episodic
RL through the lens of CMAB, which may encourage more interactions between
these two important directions.
[LINK]
http://arxiv.org/abs/2406.01386v2
[DATE]
2024-11-19 06:18:03+08:00
[CATEGORIES]
cs.LG
T-GAE: Transferable Graph Autoencoder for Network Alignment
[AUTHORS]
Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro
[ABSTRACT]
Network alignment is the task of establishing one-to-one correspondences
between the nodes of different graphs. Although finding a plethora of
applications in high-impact domains, this task is known to be NP-hard in its
general form. Existing optimization algorithms do not scale up as the size of
the graphs increases. Although they reduce the matching complexity,
current GNN approaches fit a deep neural network on each graph and require
retraining on unseen samples, which is inefficient in both time and memory. To tackle
both challenges we propose T-GAE, a transferable graph autoencoder framework
that leverages transferability and stability of GNNs to achieve efficient
network alignment on out-of-distribution graphs without retraining. We prove
that GNN-generated embeddings can achieve more accurate alignment compared to
classical spectral methods. Our experiments on real-world benchmarks
demonstrate that T-GAE outperforms the state-of-the-art optimization method and
the best GNN approach by up to 38.7% and 50.8%, respectively, while
reducing training time by 90% when matching large-scale out-of-distribution
networks. We conduct ablation studies to highlight the effectiveness of
the proposed encoder architecture and training objective in enhancing the
expressiveness of GNNs to match perturbed graphs. T-GAE is also proved to be
flexible to utilize matching algorithms of different complexities. Our code is
available at https://github.com/Jason-Tree/T-GAE.
[LINK]
http://arxiv.org/abs/2310.03272v4
[DATE]
2024-11-19 06:05:04+08:00
[CATEGORIES]
cs.LG
Molecule Generation with Fragment Retrieval Augmentation
[AUTHORS]
Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie
[ABSTRACT]
Fragment-based drug discovery, in which molecular fragments are assembled
into new molecules with desirable biochemical properties, has achieved great
success. However, many fragment-based molecule generation methods show limited
exploration beyond the existing fragments in the database as they only
reassemble or slightly modify the given ones. To tackle this problem, we
propose a new fragment-based molecule generation framework with retrieval
augmentation, namely Fragment Retrieval-Augmented Generation (f-RAG). f-RAG is
based on a pre-trained molecular generative model that proposes additional
fragments from input fragments to complete and generate a new molecule. Given a
fragment vocabulary, f-RAG retrieves two types of fragments: (1) hard
fragments, which serve as building blocks that will be explicitly included in
the newly generated molecule, and (2) soft fragments, which serve as reference
to guide the generation of new fragments through a trainable fragment injection
module. To extrapolate beyond the existing fragments, f-RAG updates the
fragment vocabulary with generated fragments via an iterative refinement
process which is further enhanced with post-hoc genetic fragment modification.
f-RAG can achieve an improved exploration-exploitation trade-off by maintaining
a pool of fragments and expanding it with novel and high-quality fragments
through a strong generative prior.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.12078v1
[DATE]
2024-11-19 05:43:52+08:00
[CATEGORIES]
cs.LG
Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning
[AUTHORS]
Arundhati S. Shanbhag, Brian B. Moser, Tobias C. Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel
[ABSTRACT]
Diffusion models, known for their generative capabilities, have recently
shown unexpected potential in image classification tasks by using Bayes’
theorem. However, most diffusion classifiers require evaluating all class
labels for a single classification, leading to significant computational costs
that can hinder their application in large-scale scenarios. To address this, we
present a Hierarchical Diffusion Classifier (HDC) that exploits the inherent
hierarchical label structure of a dataset. By progressively pruning irrelevant
high-level categories and refining predictions only within relevant
subcategories, i.e., leaf nodes, HDC reduces the total number of class
evaluations. As a result, HDC can accelerate inference by up to 60% while
maintaining and, in some cases, improving classification accuracy. Our work
enables a new control mechanism of the trade-off between speed and precision,
making diffusion-based classification more viable for real-world applications,
particularly in large-scale image classification tasks.
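
The pruning idea can be sketched as a top-down walk over a label tree that only scores the children of surviving nodes. Everything here is illustrative: the tree, the `score` callable (standing in for a per-class diffusion error, lower is better), and the `keep` parameter are assumptions, not the paper's actual procedure.

```python
def hierarchical_classify(tree, score, root, keep=1):
    """Descend a label tree, evaluating only children of surviving nodes.

    tree:  dict mapping a node to its list of children (leaves are absent or empty)
    score: callable node -> float, lower is better
    Returns the best leaf found and the number of score evaluations used.
    """
    frontier, leaves, cache = [root], [], {}
    while frontier:
        children = [c for node in frontier for c in tree.get(node, [])]
        for c in children:
            cache[c] = score(c)                     # one expensive evaluation per candidate
        survivors = sorted(children, key=cache.get)[:keep]
        leaves += [c for c in survivors if not tree.get(c)]
        frontier = [c for c in survivors if tree.get(c)]
    return min(leaves, key=cache.get), len(cache)

label_tree = {
    "root": ["animal", "vehicle", "plant"],
    "animal": ["cat", "dog", "bird"],
    "vehicle": ["car", "truck", "bus"],
    "plant": ["oak", "rose", "fern"],
}
errors = {"animal": 0.2, "vehicle": 0.8, "plant": 0.6,
          "cat": 0.3, "dog": 0.1, "bird": 0.4,
          "car": 0.5, "truck": 0.6, "bus": 0.7,
          "oak": 0.9, "rose": 0.8, "fern": 0.7}
label, n_evals = hierarchical_classify(label_tree, errors.__getitem__, "root", keep=1)
```

In this toy tree the walk scores 6 nodes, whereas a flat classifier would score all 9 leaf classes; raising `keep` trades some of that saving for robustness to a wrong choice at a coarse level.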
[LINK]
http://arxiv.org/abs/2411.12073v1
[DATE]
2024-11-19 05:34:05+08:00
[CATEGORIES]
cs.LG
Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution
[AUTHORS]
Brian B. Moser, Stanislav Frolov, Tobias C. Nauen, Federico Raue, Andreas Dengel
[ABSTRACT]
Large-scale, pre-trained Text-to-Image (T2I) diffusion models have gained
significant popularity in image generation tasks and have shown unexpected
potential in image Super-Resolution (SR). However, most existing T2I diffusion
models are trained with a resolution limit of 512x512, making scaling beyond
this resolution an unresolved but necessary challenge for image SR. In this
work, we introduce a novel approach that, for the first time, enables these
models to generate 2K, 4K, and even 8K images without any additional training.
Our method leverages MultiDiffusion, which distributes the generation across
multiple diffusion paths to ensure global coherence at larger scales, and local
degradation-aware prompt extraction, which guides the T2I model to reconstruct
fine local structures according to its low-resolution input. These innovations
unlock higher resolutions, allowing T2I diffusion models to be applied to image
SR tasks without limitation on resolution.
[LINK]
http://arxiv.org/abs/2411.12072v1
[DATE]
2024-11-19 05:32:49+08:00
[CATEGORIES]
cs.LG
Theoretical Corrections and the Leveraging of Reinforcement Learning to Enhance Triangle Attack
[AUTHORS]
Nicole Meng, Caleb Manicke, David Chen, Yingjie Lao, Caiwen Ding, Pengyu Hong, Kaleel Mahmood
[ABSTRACT]
Adversarial examples represent a serious issue for the application of machine
learning models in many sensitive domains. For generating adversarial examples,
decision based black-box attacks are one of the most practical techniques as
they only require query access to the model. One of the most recently proposed
state-of-the-art decision based black-box attacks is Triangle Attack (TA). In
this paper, we offer a high-level description of TA and explain potential
theoretical limitations. We then propose a new decision based black-box attack,
Triangle Attack with Reinforcement Learning (TARL). Our new attack addresses
the limits of TA by leveraging reinforcement learning. This creates an attack
that can achieve similar, if not better, attack accuracy than TA with half as
many queries on state-of-the-art classifiers and defenses across ImageNet and
CIFAR-10.
[LINK]
http://arxiv.org/abs/2411.12071v1
[DATE]
2024-11-19 05:31:24+08:00
[CATEGORIES]
cs.LG
Autoassociative Learning of Structural Representations for Modeling and Classification in Medical Imaging
[AUTHORS]
Zuzanna Buchnajzer, Kacper Dobek, Stanisław Hapke, Daniel Jankowski, Krzysztof Krawiec
[ABSTRACT]
Deep learning architectures based on convolutional neural networks tend to
rely on continuous, smooth features. While this characteristic provides
significant robustness and proves useful in many real-world tasks, it is
strikingly incompatible with the physical characteristic of the world, which,
at the scale in which humans operate, comprises crisp objects, typically
representing well-defined categories. This study proposes a class of
neurosymbolic systems that learn by reconstructing the observed images in terms
of visual primitives and are thus forced to form high-level, structural
explanations of them. When applied to the task of diagnosing abnormalities in
histological imaging, the method proved superior to a conventional deep
learning architecture in terms of classification accuracy, while being more
transparent.
[COMMENTS]
16 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.12070v1
[DATE]
2024-11-19 05:29:50+08:00
[CATEGORIES]
cs.LG
Contextual Combinatorial Bandits with Probabilistically Triggered Arms
[AUTHORS]
Xutong Liu, Jinhang Zuo, Siwei Wang, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman, Wei Chen
[ABSTRACT]
We study contextual combinatorial bandits with probabilistically triggered
arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide
range of applications, such as contextual cascading bandits and contextual
influence maximization bandits. Under the triggering probability modulated
(TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel
analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a
potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the
dimension of contexts, $p_{\min}$ is the minimum positive probability that any
arm can be triggered, and batch-size $K$ is the maximum number of arms that can
be triggered per round. Under the variance modulated (VM) or triggering
probability and variance modulated (TPVM) conditions, we propose a new
variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound
$\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a
valuable by-product, our analysis technique and variance-adaptive algorithm can
be applied to the CMAB-T and C$^2$MAB setting, improving existing results there
as well. We also include experiments that demonstrate the improved performance
of our algorithms compared with benchmark algorithms on synthetic and
real-world datasets.
[COMMENTS]
The 40th International Conference on Machine Learning (ICML), 2023
[LINK]
http://arxiv.org/abs/2303.17110v3
[DATE]
2024-11-19 05:26:10+08:00
[CATEGORIES]
cs.LG
The Statistical Accuracy of Neural Posterior and Likelihood Estimation
[AUTHORS]
David T. Frazier, Ryan Kelly, Christopher Drovandi, David J. Warne
[ABSTRACT]
Neural posterior estimation (NPE) and neural likelihood estimation (NLE) are
machine learning approaches that provide accurate posterior, and likelihood,
approximations in complex modeling scenarios, and in situations where
conducting amortized inference is a necessity. While such methods have shown
significant promise across a range of diverse scientific applications, the
statistical accuracy of these methods is so far unexplored. In this manuscript,
we give, for the first time, an in-depth exploration of the statistical
behavior of NPE and NLE. We prove that these methods have similar theoretical
guarantees to common statistical methods like approximate Bayesian computation
(ABC) and Bayesian synthetic likelihood (BSL). While NPE and NLE methods are
just as accurate as ABC and BSL, we prove that this accuracy can often be
achieved at a vastly reduced computational cost, and will therefore deliver
more attractive approximations than ABC and BSL in certain problems. We verify
our results theoretically and in several examples from the literature.
[LINK]
http://arxiv.org/abs/2411.12068v1
[DATE]
2024-11-19 05:25:32+08:00
[CATEGORIES]
cs.LG
Interpretation of High-Dimensional Regression Coefficients by Comparison with Linearized Compressing Features
[AUTHORS]
Joachim Schaeffer, Jinwook Rhyu, Robin Droop, Rolf Findeisen, Richard Braatz
[ABSTRACT]
Linear regression is often deemed inherently interpretable; however,
challenges arise for high-dimensional data. We focus on further understanding
how linear regression approximates nonlinear responses from high-dimensional
functional data, motivated by predicting cycle life for lithium-ion batteries.
We develop a linearization method to derive feature coefficients, which we
compare with the closest regression coefficients of the path of regression
solutions. We showcase the methods on battery data case studies where a single
nonlinear compressing feature, $g\colon \mathbb{R}^p \to \mathbb{R}$, is used
to construct a synthetic response, $\mathbf{y} \in \mathbb{R}$. This unifying
view of linear regression and compressing features for high-dimensional
functional data helps to understand (1) how regression coefficients are shaped
in the highly regularized domain and how they relate to linearized feature
coefficients and (2) how the shape of regression coefficients changes as a
function of regularization to approximate nonlinear responses by exploiting
local structures.
[COMMENTS]
This manuscript is a short communication. 9 pages, 4 figures
[LINK]
http://arxiv.org/abs/2411.12060v1
[DATE]
2024-11-19 04:59:38+08:00
[CATEGORIES]
cs.LG
Mirror and Preconditioned Gradient Descent in Wasserstein Space
[AUTHORS]
Clément Bonet, Théo Uscidda, Adam David, Pierre-Cyril Aubin-Frankowski, Anna Korba
[ABSTRACT]
As the problem of minimizing functionals on the Wasserstein space encompasses
many applications in machine learning, different optimization algorithms on
$\mathbb{R}^d$ have received counterparts on the Wasserstein space.
We focus here on lifting two explicit algorithms: mirror descent and
preconditioned gradient descent. These algorithms have been introduced to
better capture the geometry of the function to minimize and are provably
convergent under appropriate (namely relative) smoothness and convexity
conditions. Adapting these notions to the Wasserstein space, we prove
guarantees of convergence of some Wasserstein-gradient-based discrete-time
schemes for new pairings of objective functionals and regularizers. The
difficulty here is to carefully select along which curves the functionals
should be smooth and convex. We illustrate the advantages of adapting the
geometry induced by the regularizer on ill-conditioned optimization tasks, and
showcase the improvement of choosing different discrepancies and geometries in
a computational biology task of aligning single-cells.
[COMMENTS]
Accepted as Spotlight at Conference on Neural Information Processing
Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2406.08938v2
[DATE]
2024-11-19 04:56:37+08:00
[CATEGORIES]
cs.LG
Learning Personalized Treatment Decisions in Precision Medicine: Disentangling Treatment Assignment Bias in Counterfactual Outcome Prediction and Biomarker Identification
[AUTHORS]
Michael Vollenweider, Manuel Schürch, Chiara Rohrer, Gabriele Gut, Michael Krauthammer, Andreas Wicki
[ABSTRACT]
Precision medicine has the potential to tailor treatment decisions to
individual patients using machine learning (ML) and artificial intelligence
(AI), but it faces significant challenges due to complex biases in clinical
observational data and the high-dimensional nature of biological data. This
study models various types of treatment assignment biases using mutual
information and investigates their impact on ML models for counterfactual
prediction and biomarker identification. Unlike traditional counterfactual
benchmarks that rely on fixed treatment policies, our work focuses on modeling
different characteristics of the underlying observational treatment policy in
distinct clinical settings. We validate our approach through experiments on toy
datasets, semi-synthetic data from The Cancer Genome Atlas (TCGA), and real-world
biological outcomes from drug and CRISPR screens. By incorporating empirical
biological mechanisms, we create a more realistic benchmark that reflects the
complexities of real-world data. Our analysis reveals that different biases
lead to varying model performances, with some biases, especially those
unrelated to outcome mechanisms, having minimal effect on prediction accuracy.
This highlights the crucial need to account for specific biases in clinical
observational data in counterfactual ML model development, ultimately enhancing
the personalization of treatment decisions in precision medicine.
[COMMENTS]
9 pages, 5 figures, ML4H conference 2024
[LINK]
http://arxiv.org/abs/2410.00509v2
[DATE]
2024-11-19 04:55:24+08:00
[CATEGORIES]
cs.LG
Higher Order Graph Attention Probabilistic Walk Networks
[AUTHORS]
Thomas Bailie, Yun Sing Koh, Karthik Mukkavilli
[ABSTRACT]
Graphs inherently capture dependencies between nodes or variables through
their topological structure, with paths between any two nodes indicating a
sequential dependency on the nodes traversed. Message Passing Neural Networks
(MPNNs) leverage these latent relationships embedded in graph structures, and
have become widely adopted across diverse applications. However, many existing
methods predominantly rely on local information within the $1$-hop
neighborhood. This approach has notable limitations; for example, $1$-hop
aggregation schemes inherently lose long-distance information, and are limited
in expressive power as defined by the $k$-Weisfeiler-Leman ($k$-WL) isomorphism
test. To address these issues, we propose the Higher Order Graphical Attention
(HoGA) module, which assigns weights to variable-length paths sampled based on
feature-vector diversity, effectively reconstructing the $k$-hop neighborhood.
HoGA represents higher-order relationships as a robust form of self-attention,
applicable to any single-hop attention mechanism. In empirical studies,
applying HoGA to existing attention-based models consistently leads to
significant accuracy improvements on benchmark node classification datasets.
Furthermore, we observe that the performance degradation typically associated
with additional message-passing steps may be mitigated.
[LINK]
http://arxiv.org/abs/2411.12052v1
[DATE]
2024-11-19 04:46:02+08:00
[CATEGORIES]
cs.LG
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
[AUTHORS]
Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque
[ABSTRACT]
We present a framework to recognize Parkinson’s disease (PD) from speech
recordings of an English pangram utterance, collected using a web application
in diverse recording settings and environments, including participants’ homes. Our dataset
includes a global cohort of 1306 participants, including 392 diagnosed with PD.
Leveraging the diversity of the dataset, spanning various demographic
properties (such as age, sex, and ethnicity), we used deep learning embeddings
derived from semi-supervised models such as Wav2Vec 2.0, WavLM, and ImageBind
representing the speech dynamics associated with PD. Our novel fusion model for
PD classification, which aligns different speech embeddings into a cohesive
feature space, demonstrated superior performance over standard
concatenation-based fusion models and other baselines (including models built
on traditional acoustic features). In a randomized data split configuration,
the model achieved an Area Under the Receiver Operating Characteristic Curve
(AUROC) of 88.94% and an accuracy of 85.65%. Rigorous statistical analysis
confirmed that our model performs equitably across various demographic
subgroups in terms of sex, ethnicity, and age, and remains robust regardless of
disease duration. Furthermore, our model, when tested on two entirely unseen
test datasets collected from clinical settings and from a PD care center,
maintained AUROC scores of 82.12% and 78.44%, respectively. This affirms the
model’s robustness and its potential to enhance accessibility and health
equity in real-world applications.
[COMMENTS]
31 pages, 6 figures, and 8 tables
[LINK]
http://arxiv.org/abs/2405.17206v2
[DATE]
2024-11-19 04:43:37+08:00
[CATEGORIES]
cs.LG
Fast Convergence of Softmax Policy Mirror Ascent
[AUTHORS]
Reza Asad, Reza Babanezhad, Issam Laradji, Nicolas Le Roux, Sharan Vaswani
[ABSTRACT]
Natural policy gradient (NPG) is a common policy optimization algorithm and
can be viewed as mirror ascent in the space of probabilities. Recently, Vaswani
et al. [2021] introduced a policy gradient method that corresponds to mirror
ascent in the dual space of logits. We refine this algorithm, removing its need
for a normalization across actions and analyze the resulting method (referred
to as SPMA). For tabular MDPs, we prove that SPMA with a constant step-size
matches the linear convergence of NPG and achieves a faster convergence than
constant step-size (accelerated) softmax policy gradient. To handle large
state-action spaces, we extend SPMA to use a log-linear policy
parameterization. Unlike that for NPG, generalizing SPMA to the linear function
approximation (FA) setting does not require compatible function approximation.
Unlike MDPO, a practical generalization of NPG, SPMA with linear FA only
requires solving convex softmax classification problems. We prove that SPMA
achieves linear convergence to the neighbourhood of the optimal value function.
We extend SPMA to handle non-linear FA and evaluate its empirical performance
on the MuJoCo and Atari benchmarks. Our results demonstrate that SPMA
consistently achieves similar or better performance compared to MDPO, PPO and
TRPO.
[LINK]
http://arxiv.org/abs/2411.12042v1
[DATE]
2024-11-19 04:27:13+08:00
[CATEGORIES]
cs.LG
Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster
[AUTHORS]
J. Alex Hurt, Anes Ouadou, Mariam Alshehri, Grant J. Scott
[ABSTRACT]
Throughout the scientific computing space, deep learning algorithms have
shown excellent performance in a wide range of applications. As these deep
neural networks (DNNs) continue to mature, the necessary compute required to
train them has continued to grow. Today, modern DNNs require millions of FLOPs
and days to weeks of training to generate a well-trained model. The training
times required for DNNs are oftentimes a bottleneck in DNN research for a
variety of deep learning applications, and as such, accelerating and scaling
DNN training enables more robust and accelerated research. To that end, in this
work, we explore utilizing the NRP Nautilus HyperCluster to automate and scale
deep learning model training for three separate applications of DNNs, including
overhead object detection, burned area segmentation, and deforestation
detection. In total, 234 deep neural models are trained on Nautilus, for a
total time of 4,040 hours.
[LINK]
http://arxiv.org/abs/2411.12038v1
[DATE]
2024-11-19 04:19:49+08:00
[CATEGORIES]
cs.LG
Prediction-Guided Active Experiments
[AUTHORS]
Ruicheng Ao, Hongyu Chen, David Simchi-Levi
[ABSTRACT]
In this work, we introduce a new framework for active experimentation, the
Prediction-Guided Active Experiment (PGAE), which leverages predictions from an
existing machine learning model to guide sampling and experimentation.
Specifically, at each time step, an experimental unit is sampled according to a
designated sampling distribution, and the actual outcome is observed based on
an experimental probability. Otherwise, only a prediction for the outcome is
available. We begin by analyzing the non-adaptive case, where full information
on the joint distribution of the predictor and the actual outcome is assumed.
For this scenario, we derive an optimal experimentation strategy by minimizing
the semi-parametric efficiency bound for the class of regular estimators. We
then introduce an estimator that meets this efficiency bound, achieving
asymptotic optimality. Next, we move to the adaptive case, where the predictor
is continuously updated with newly sampled data. We show that the adaptive
version of the estimator remains efficient and attains the same semi-parametric
bound under certain regularity assumptions. Finally, we validate PGAE’s
performance through simulations and a semi-synthetic experiment using data from
the US Census Bureau. The results underscore the PGAE framework’s effectiveness
and superiority compared to other existing methods.
[COMMENTS]
25 pages, 11 figures
[LINK]
http://arxiv.org/abs/2411.12036v1
[DATE]
2024-11-19 04:16:24+08:00
[CATEGORIES]
cs.LG
SportsNGEN: Sustained Generation of Realistic Multi-player Sports Gameplay
[AUTHORS]
Lachlan Thorpe, Lewis Bawden, Karanjot Vendal, John Bronskill, Richard E. Turner
[ABSTRACT]
We present a transformer decoder based sports simulation engine, SportsNGEN,
trained on sports player and ball tracking sequences, that is capable of
generating sustained gameplay and accurately mimicking the decision making of
real players. By training on a large database of professional tennis tracking
data, we demonstrate that simulations produced by SportsNGEN can be used to
predict the outcomes of rallies, determine the best shot choices at any point,
and evaluate counterfactual or what-if scenarios to inform coaching decisions
and elevate broadcast coverage. By combining the generated simulations with a
shot classifier and logic to start and end rallies, the system is capable of
simulating an entire tennis match. We evaluate SportsNGEN by comparing
statistics of the simulations with those of real matches between the same
players. We show that the model output sampling parameters are crucial to
simulation realism and that SportsNGEN is probabilistically well-calibrated to
real data. In addition, a generic version of SportsNGEN can be customized to a
specific player by fine-tuning on the subset of match data that includes that
player. Finally, we show qualitative results indicating the same approach works
for football.
[LINK]
http://arxiv.org/abs/2403.12977v3
[DATE]
2024-11-19 04:09:57+08:00
[CATEGORIES]
cs.LG
Machine Learning Evaluation Metric Discrepancies across Programming Languages and Their Components: Need for Standardization
[AUTHORS]
Mohammad R. Salmanpour, Morteza Alizadeh, Ghazal Mousavi, Saba Sadeghi, Sajad Amiri, Mehrdad Oveisi, Arman Rahmim, Ilker Hacihaliloglu
[ABSTRACT]
This study evaluates metrics for tasks such as classification, regression,
clustering, correlation analysis, statistical tests, segmentation, and
image-to-image (I2I) translation. Metrics were compared across Python
libraries, R packages, and Matlab functions to assess their consistency and
highlight discrepancies. The findings underscore the need for a unified roadmap
to standardize metrics, ensuring reliable and reproducible ML evaluations
across platforms. This study examined a wide range of evaluation metrics across
various tasks and found only some to be consistent across platforms, such as
(i) Accuracy, Balanced Accuracy, Cohen’s Kappa, F-beta Score, MCC, Geometric
Mean, AUC, and Log Loss in binary classification; (ii) Accuracy, Cohen’s Kappa,
and F-beta Score in multi-class classification; (iii) MAE, MSE, RMSE, MAPE,
Explained Variance, Median AE, MSLE, and Huber in regression; (iv)
Davies-Bouldin Index and Calinski-Harabasz Index in clustering; (v) Pearson,
Spearman, Kendall’s Tau, Mutual Information, Distance Correlation, Percbend,
Shepherd, and Partial Correlation in correlation analysis; (vi) Paired t-test,
Chi-Square Test, ANOVA, Kruskal-Wallis Test, Shapiro-Wilk Test, Welch’s t-test,
and Bartlett’s test in statistical tests; (vii) Accuracy, Precision, and Recall
in 2D segmentation; (viii) Accuracy in 3D segmentation; (ix) MAE, MSE, RMSE,
and R-Squared in 2D-I2I translation; and (x) MAE, MSE, and RMSE in 3D-I2I
translation. Given observation of discrepancies in a number of metrics (e.g.
precision, recall and F1 score in binary classification, WCSS in clustering,
multiple statistical tests, and IoU in segmentation, amongst multiple metrics),
this study concludes that ML evaluation metrics require standardization and
recommends that future research use consistent metrics for different tasks to
effectively compare ML techniques and solutions.
[COMMENTS]
This paper is 12 pages with 1 table and 10 figures
[LINK]
http://arxiv.org/abs/2411.12032v1
[DATE]
2024-11-19 04:07:31+08:00
[CATEGORIES]
cs.LG
The Generalization Error of Machine Learning Algorithms
[AUTHORS]
Samir M. Perlaza, Xinying Zou
[ABSTRACT]
In this paper, the method of gaps, a technique for deriving closed-form
expressions in terms of information measures for the generalization error of
machine learning algorithms, is introduced. The method relies on two central
observations: $(a)$ the generalization error is an average of the variation of
the expected empirical risk with respect to changes on the probability measure
(used for expectation); and $(b)$ these variations, also referred to as gaps,
exhibit closed-form expressions in terms of information measures. The
expectation of the empirical risk can be either with respect to a measure on
the models (with a fixed dataset) or with respect to a measure on the datasets
(with a fixed model), which results in two variants of the method of gaps. The
first variant, which focuses on the gaps of the expected empirical risk with
respect to a measure on the models, appears to be the most general, as no
assumptions are made on the distribution of the datasets. The second variant
develops under the assumption that datasets are made of independent and
identically distributed data points. All existing exact expressions for the
generalization error of machine learning algorithms can be obtained with the
proposed method. Also, this method allows obtaining numerous new exact
expressions, which improve the understanding of the generalization error,
establish connections with other areas in statistics (e.g., hypothesis testing),
and might potentially guide algorithm design.
[COMMENTS]
Submitted to the IEEE Transactions on Information Theory. November 18,
2024
[LINK]
http://arxiv.org/abs/2411.12030v1
[DATE]
2024-11-19 04:05:51+08:00
[CATEGORIES]
cs.LG
On the Efficiency of ERM in Feature Learning
[AUTHORS]
Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu
[ABSTRACT]
Given a collection of feature maps indexed by a set $\mathcal{T}$, we study
the performance of empirical risk minimization (ERM) on regression problems
with square loss over the union of the linear classes induced by these feature
maps. This setup aims at capturing the simplest instance of feature learning,
where the model is expected to jointly learn from the data an appropriate
feature map and a linear predictor. We start by studying the asymptotic
quantiles of the excess risk of sequences of empirical risk minimizers.
Remarkably, we show that when the set $\mathcal{T}$ is not too large and when
there is a unique optimal feature map, these quantiles coincide, up to a factor
of two, with those of the excess risk of the oracle procedure, which knows a
priori this optimal feature map and deterministically outputs an empirical risk
minimizer from the associated optimal linear class. We complement this
asymptotic result with a non-asymptotic analysis that quantifies the decaying
effect of the global complexity of the set $\mathcal{T}$ on the excess risk of
ERM, and relates it to the size of the sublevel sets of the suboptimality of
the feature maps. As an application of our results, we obtain new guarantees on
the performance of the best subset selection procedure in sparse linear
regression under general assumptions.
[COMMENTS]
23 pages, 0 figures
[LINK]
http://arxiv.org/abs/2411.12029v1
[DATE]
2024-11-19 04:05:05+08:00
[CATEGORIES]
cs.LG
Regret-Free Reinforcement Learning for LTL Specifications
[AUTHORS]
Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani
[ABSTRACT]
Reinforcement learning (RL) is a promising method to learn optimal control
policies for systems with unknown dynamics. In particular, synthesizing
controllers for safety-critical systems based on high-level specifications,
such as those expressed in temporal languages like linear temporal logic (LTL),
presents a significant challenge in control systems research. Current RL-based
methods designed for LTL tasks typically offer only asymptotic guarantees,
which provide no insight into the transient performance during the learning
phase. While running an RL algorithm, it is crucial to assess how close we are
to achieving optimal behavior if we stop learning.
In this paper, we present the first regret-free online algorithm for learning
a controller that addresses the general class of LTL specifications over Markov
decision processes (MDPs) with a finite set of states and actions. We begin by
proposing a regret-free learning algorithm to solve infinite-horizon
reach-avoid problems. For general LTL specifications, we show that the
synthesis problem can be reduced to a reach-avoid problem when the graph
structure is known. Additionally, we provide an algorithm for learning the
graph structure, assuming knowledge of a minimum transition probability, which
operates independently of the main regret-free algorithm.
[LINK]
http://arxiv.org/abs/2411.12019v1
[DATE]
2024-11-19 04:01:45+08:00
[CATEGORIES]
cs.LG
SynCoTrain: A Dual Classifier PU-learning Framework for Synthesizability Prediction
[AUTHORS]
Sasan Amariamir, Janine George, Philipp Benner
[ABSTRACT]
Material discovery is a cornerstone of modern science, driving advancements
in diverse disciplines from biomedical technology to climate solutions.
Predicting synthesizability, a critical factor in realizing novel materials,
remains a complex challenge due to the limitations of traditional heuristics
and thermodynamic proxies. While stability metrics such as formation energy
offer partial insights, they fail to account for kinetic factors and
technological constraints that influence synthesis outcomes. These challenges
are further compounded by the scarcity of negative data, as failed synthesis
attempts are often unpublished or context-specific.
We present SynCoTrain, a semi-supervised machine learning model designed to
predict the synthesizability of materials. SynCoTrain employs a co-training
framework leveraging two complementary graph convolutional neural networks:
SchNet and ALIGNN. By iteratively exchanging predictions between classifiers,
SynCoTrain mitigates model bias and enhances generalizability. Our approach
uses Positive and Unlabeled (PU) Learning to address the absence of explicit
negative data, iteratively refining predictions through collaborative learning.
The model demonstrates robust performance, achieving high recall on internal
and leave-out test sets. By focusing on oxide crystals, a well-characterized
material family with extensive experimental data, we establish SynCoTrain as a
reliable tool for predicting synthesizability while balancing dataset
variability and computational efficiency. This work highlights the potential of
co-training to advance high-throughput materials discovery and generative
research, offering a scalable solution to the challenge of synthesizability
prediction.
[LINK]
http://arxiv.org/abs/2411.12011v1
[DATE]
2024-11-19 03:53:19+08:00
[CATEGORIES]
cs.LG
Active learning for efficient discovery of optimal gene combinations in the combinatorial perturbation space
[AUTHORS]
Jason Qin, Hans-Hermann Wessels, Carlos Fernandez-Granda, Yuhan Hao
[ABSTRACT]
The advancement of novel combinatorial CRISPR screening technologies enables
the identification of synergistic gene combinations on a large scale. This is
crucial for developing novel and effective combination therapies, but the
combinatorial space makes exhaustive experimentation infeasible. We introduce
NAIAD, an active learning framework that efficiently discovers optimal gene
pairs capable of driving cells toward desired cellular phenotypes. NAIAD
leverages single-gene perturbation effects and adaptive gene embeddings that
scale with the training data size, mitigating overfitting in small-sample
learning while capturing complex gene interactions as more data is collected.
Evaluated on four CRISPR combinatorial perturbation datasets totaling over
350,000 genetic interactions, NAIAD, trained on small datasets, outperforms
existing models by up to 40% relative to the second-best. NAIAD’s
recommendation system prioritizes gene pairs with the maximum predicted
effects, resulting in the highest marginal gain in each AI-experiment round and
accelerating discovery with fewer CRISPR experimental iterations. Our NAIAD
framework (https://github.com/NeptuneBio/NAIAD) improves the identification of
novel, effective gene combinations, enabling more efficient CRISPR library
design and offering promising applications in genomics research and therapeutic
development.
[LINK]
http://arxiv.org/abs/2411.12010v1
[DATE]
2024-11-19 03:49:51+08:00
[CATEGORIES]
cs.LG
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
[AUTHORS]
Eli M Carrami, Sahand Sharifzadeh
[ABSTRACT]
Understanding protein structure and function is crucial in biology. However,
current computational methods are often task-specific and resource-intensive.
To address this, we propose zero-shot Protein Question Answering (PQA), a task
designed to answer a wide range of protein-related queries without
task-specific training. The success of PQA hinges on high-quality datasets and
robust evaluation strategies, both of which are lacking in current research.
Existing datasets suffer from biases, noise, and lack of evolutionary context,
while current evaluation methods fail to accurately assess model performance.
We introduce the Pika framework to overcome these limitations. Pika comprises a
curated, debiased dataset tailored for PQA and a biochemically relevant
benchmarking strategy. We also propose multimodal large language models as a
strong baseline for PQA, leveraging their natural language processing and
knowledge. This approach promises a more flexible and efficient way to explore
protein properties, advancing protein research. Our comprehensive PQA
framework, Pika, including dataset, code, and model checkpoints, is openly
accessible on github.com/EMCarrami/Pika, promoting wider research in the field.
[LINK]
http://arxiv.org/abs/2402.13653v2
[DATE]
2024-11-19 03:32:06+08:00
[CATEGORIES]
cs.LG
Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms
[AUTHORS]
Johannes Hertrich
[ABSTRACT]
Kernel-based methods are heavily used in machine learning. However, they
suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In
this paper, we propose an approximation procedure, which reduces this
complexity to $O(N)$. Our approach is based on two ideas. First, we prove that
any radial kernel with an analytic basis function can be represented as a sliced
version of some one-dimensional kernel and derive an analytic formula for the
one-dimensional counterpart. It turns out that the relation between one- and
$d$-dimensional kernels is given by a generalized Riemann-Liouville fractional
integral. Hence, we can reduce the $d$-dimensional kernel summation to a
one-dimensional setting. Second, for solving these one-dimensional problems
efficiently, we apply fast Fourier summations on non-equispaced data, a sorting
algorithm or a combination of both. Due to its practical importance we pay
special attention to the Gaussian kernel, where we show a dimension-independent
error bound and represent its one-dimensional counterpart via a closed-form
Fourier transform. We provide a run time comparison and error estimate of our
fast kernel summations.
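
The one-dimensional step mentioned above (a sorting algorithm for the sliced
problem) can be illustrated with a minimal sketch. This is not the paper's
implementation: it assumes the absolute-value kernel |x - y| purely for
illustration, and the function names are hypothetical. It shows how sorting
plus prefix sums gives the same result as the direct quadratic-cost sum.

```python
import numpy as np

def abs_kernel_sum_sorted(x, w):
    """Return s[i] = sum_j w[j] * |x[i] - x[j]| in O(N log N) via sorting.

    Illustrative assumption: the 1D kernel is |x - y|; other kernels need
    a different fast 1D summation.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    # Prefix sums over the sorted points.
    cw = np.cumsum(ws)         # cw[k]  = sum_{j<=k} ws[j]
    cwx = np.cumsum(ws * xs)   # cwx[k] = sum_{j<=k} ws[j] * xs[j]
    total_w, total_wx = cw[-1], cwx[-1]
    # Points to the left contribute ws[j] * (xs[k] - xs[j]),
    # points to the right contribute ws[j] * (xs[j] - xs[k]).
    s_sorted = xs * cw - cwx + (total_wx - cwx) - xs * (total_w - cw)
    s = np.empty_like(s_sorted)
    s[order] = s_sorted        # undo the sort
    return s

def abs_kernel_sum_naive(x, w):
    """Direct O(N^2) reference implementation of the same sum."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return (np.abs(x[:, None] - x[None, :]) * w[None, :]).sum(axis=1)
```

Both functions should agree up to floating-point error on the same inputs; the
sorted version only replaces the pairwise loop with a sort and two cumulative
sums.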
[LINK]
http://arxiv.org/abs/2401.08260v3
[DATE]
2024-11-19 03:24:17+08:00
[CATEGORIES]
cs.LG
Transmission Line Outage Probability Prediction Under Extreme Events Using Peter-Clark Bayesian Structural Learning
[AUTHORS]
Xiaolin Chen, Qiuhua Huang, Yuqi Zhou
[ABSTRACT]
Recent years have seen a notable increase in the frequency and intensity of
extreme weather events. With a rising number of power outages caused by these
events, accurate prediction of power line outages is essential for safe and
reliable operation of power grids. The Bayesian network is a probabilistic
model that is very effective for predicting line outages under weather-related
uncertainties. However, most existing studies in this area offer general risk
assessments, but fall short of providing specific outage probabilities. In this
work, we introduce a novel approach for predicting transmission line outage
probabilities using a Bayesian network combined with Peter-Clark (PC)
structural learning. Our approach not only enables precise outage probability
calculations, but also demonstrates better scalability and robust performance,
even with limited data. Case studies using data from BPA and NOAA show the
effectiveness of this approach, while comparisons with several existing methods
further highlight its advantages.
[LINK]
http://arxiv.org/abs/2411.11980v1
[DATE]
2024-11-19 03:10:49+08:00
[CATEGORIES]
cs.LG
Pairwise Markov Chains for Volatility Forecasting
[AUTHORS]
Elie Azeraf
[ABSTRACT]
The Pairwise Markov Chain (PMC) is a probabilistic graphical model extending
the well-known Hidden Markov Model. This model, although highly effective for
many tasks, has been scarcely utilized for continuous value prediction. This is
mainly due to the issue of modeling observations inherent in generative
probabilistic models. In this paper, we introduce a new algorithm for
prediction with the PMC. On the one hand, this algorithm allows circumventing
the feature problem, thus fully exploiting the capabilities of the PMC. On the
other hand, it enables the PMC to extend any predictive model by introducing
hidden states, updated at each time step, and allowing the introduction of
non-stationarity for any model. We apply the PMC with its new algorithm for
volatility forecasting, which we compare to the highly popular GARCH(1,1) and
feedforward neural models across numerous pairs. This is particularly relevant
given the regime changes that we can observe in volatility. For each scenario,
our algorithm enhances the performance of the extended model, demonstrating the
value of our approach.
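
For reference, the GARCH(1,1) baseline named above follows the standard
variance recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1].
The sketch below is a minimal illustration of that recursion only, not the
paper's code: it assumes zero-mean returns, starts the recursion at the
unconditional variance (a common but arbitrary choice), and uses a hypothetical
function name and parameter values.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model with zero-mean returns.

    Returns a list of length len(returns) + 1; the last entry is the
    one-step-ahead variance forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    # Assumption: initialise at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical parameters and returns, for illustration only.
forecast = garch11_variances([2.0, 0.0], omega=0.5, alpha=0.25, beta=0.25)[-1]
```

The volatility forecast is the square root of the final variance; fitting
omega, alpha, and beta (typically by maximum likelihood) is outside this sketch.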
[COMMENTS]
14 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.11838v1
[DATE]
2024-11-19 02:56:46+08:00
[CATEGORIES]
cs.LG
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
[AUTHORS]
Katie Kang, Amrith Setlur, Dibya Ghosh, Jacob Steinhardt, Claire Tomlin, Sergey Levine, Aviral Kumar
[ABSTRACT]
Despite the remarkable capabilities of modern large language models (LLMs),
the mechanisms behind their problem-solving abilities remain elusive. In this
work, we aim to better understand how the learning dynamics of LLM finetuning
shapes downstream generalization. Our analysis focuses on reasoning tasks,
whose problem structure allows us to distinguish between memorization (the
exact replication of reasoning steps from the training data) and performance
(the correctness of the final solution). We find that a model’s generalization
behavior can be effectively characterized by a training metric we call
pre-memorization train accuracy: the accuracy of model samples on training
queries before they begin to copy the exact reasoning steps from the training
set. On the dataset level, this metric is able to reliably predict test
accuracy, achieving $R^2$ of around or exceeding 0.9 across various models
(Llama3 8B, Gemma2 9B), datasets (GSM8k, MATH), and training configurations. On
a per-example level, this metric is also indicative of whether individual model
predictions are robust to perturbations in the training query. By connecting a
model’s learning behavior to its generalization, pre-memorization train
accuracy can guide targeted improvements to training strategies. We focus on
data curation as an example, and show that prioritizing examples with low
pre-memorization accuracy leads to 1.5-2x improvements in data efficiency
compared to i.i.d. data scaling, and outperforms other standard data curation
techniques.
[LINK]
http://arxiv.org/abs/2411.07681v2
[DATE]
2024-11-19 02:49:59+08:00
[CATEGORIES]
cs.LG
KAN/MultKAN with Physics-Informed Spline fitting (KAN-PISF) for ordinary/partial differential equation discovery of nonlinear dynamic systems
[AUTHORS]
Ashish Pal, Satish Nagarajaiah
[ABSTRACT]
Machine learning for scientific discovery is becoming increasingly popular
because of its ability to extract and recognize the nonlinear characteristics
from the data. The black-box nature of deep learning methods poses difficulties
in interpreting the identified model. There is a dire need to interpret the
machine learning models to develop a physical understanding of dynamic systems.
An interpretable form of neural network called Kolmogorov-Arnold networks (KAN)
or Multiplicative KAN (MultKAN) offers critical features that help recognize
the nonlinearities in the governing ordinary/partial differential equations
(ODE/PDE) of various dynamic systems and find their equation structures. In
this study, an equation discovery framework is proposed that includes i)
sequentially regularized derivatives for denoising (SRDD) algorithm to denoise
the measured data to obtain accurate derivatives, ii) KAN to identify the
equation structure and suggest relevant nonlinear functions that are used to
create a small overcomplete library of functions, and iii) physics-informed
spline fitting (PISF) algorithm to filter the excess functions from the library
and converge to the correct equation. The framework was tested on the forced
Duffing oscillator, Van der Pol oscillator (stiff ODE), Burgers’ equation, and
Bouc-Wen model (coupled ODE). The proposed method converged to the true
equation for the first three systems. It provided an approximate model for the
Bouc-Wen model that could acceptably capture the hysteresis response. Using KAN
maintains low complexity, which helps the user interpret the results throughout
the process and avoid the black-box-type nature of machine learning methods.
[LINK]
http://arxiv.org/abs/2411.11801v1
[DATE]
2024-11-19 02:14:51+08:00
[CATEGORIES]
cs.LG
Competing Bandits in Decentralized Large Contextual Matching Markets
[AUTHORS]
Satush Parikh, Soumya Basu, Avishek Ghosh, Abishek Sankararaman
[ABSTRACT]
Sequential learning in a multi-agent resource constrained matching market has
received significant interest in the past few years. We study decentralized
learning in two-sided matching markets where the demand side (aka players or
agents) competes for a ‘large’ supply side (aka arms) with potentially
time-varying preferences, to obtain a stable match. Despite a long line of work
in the recent past, existing learning algorithms such as Explore-Then-Commit or
Upper-Confidence-Bound remain inefficient for this problem. In particular, the
per-agent regret achieved by these algorithms scales linearly with the number
of arms, $K$. Motivated by the linear contextual bandit framework, we assume
that for each agent an arm-mean can be represented by a linear function of a
known feature vector and an unknown (agent-specific) parameter.
Moreover, our setup captures the essence of a dynamic (non-stationary)
matching market where the preferences over arms change over time. Our proposed
algorithms achieve instance-dependent logarithmic regret, scaling independently
of the number of arms, $K$.
[LINK]
http://arxiv.org/abs/2411.11794v1
[DATE]
2024-11-19 02:08:05+08:00
[CATEGORIES]
cs.LG
A Potential Game Perspective in Federated Learning
[AUTHORS]
Kang Liu, Ziqi Wang, Enrique Zuazua
[ABSTRACT]
Federated learning (FL) is an emerging paradigm for training machine learning
models across distributed clients. Traditionally, in FL settings, a central
server assigns training efforts (or strategies) to clients. However, from a
market-oriented perspective, clients may independently choose their training
efforts based on rational self-interest. To explore this, we propose a
potential game framework where each client’s payoff is determined by their
individual efforts and the rewards provided by the server. The rewards are
influenced by the collective efforts of all clients and can be modulated
through a reward factor. Our study begins by establishing the existence of Nash
equilibria (NEs), followed by an investigation of uniqueness in homogeneous
settings. We demonstrate a significant improvement in clients’ training efforts
at a critical reward factor, identifying it as the optimal choice for the
server. Furthermore, we prove the convergence of the best-response algorithm to
compute NEs for our FL game. Finally, we apply the training efforts derived
from specific NEs to a real-world FL scenario, validating the effectiveness of
the identified optimal reward factor.
[LINK]
http://arxiv.org/abs/2411.11793v1
[DATE]
2024-11-19 02:06:44+08:00
[CATEGORIES]
cs.LG
Parallelly Tempered Generative Adversarial Networks
[AUTHORS]
Jinwon Sohn, Qifan Song
[ABSTRACT]
A generative adversarial network (GAN) has been a representative backbone
model in generative artificial intelligence (AI) because of its powerful
performance in capturing intricate data-generating processes. However, GAN
training is notoriously unstable, with the instability usually
characterized by the occurrence of mode collapse. Through the lens of
gradients’ variance, this work particularly analyzes the training instability
and inefficiency in the presence of mode collapse by linking it to
multimodality in the target distribution. To ease the raised training issues
from severe multimodality, we introduce a novel GAN training framework that
leverages a series of tempered distributions produced via convex interpolation.
With our newly developed GAN objective function, the generator can learn all
the tempered distributions simultaneously, conceptually resonating with the
parallel tempering in Statistics. Our simulation studies demonstrate the
superiority of our approach over existing popular training strategies in both
image and tabular data synthesis. We theoretically analyze that such
significant improvement can arise from reducing the variance of gradient
estimates by using the tempered distributions. Finally, we further develop a
variant of the proposed framework aimed at generating fair synthetic data, which
is one of the growing interests in the field of trustworthy AI.
[LINK]
http://arxiv.org/abs/2411.11786v1
[DATE]
2024-11-19 02:01:13+08:00
[CATEGORIES]
cs.LG
Learning-Based Pricing and Matching for Two-Sided Queues
[AUTHORS]
Zixian Yang, Lei Ying
[ABSTRACT]
We consider a dynamic system with multiple types of customers and servers.
Each type of waiting customer or server joins a separate queue, forming a
bipartite graph with customer-side queues and server-side queues. The platform
can match the servers and customers if their types are compatible. The matched
pairs then leave the system. The platform will charge a customer a price
according to their type when they arrive and will pay a server a price
according to their type. The arrival rate of each queue is determined by the
price according to some unknown demand or supply functions. Our goal is to
design pricing and matching algorithms to maximize the profit of the platform
with unknown demand and supply functions, while keeping queue lengths of both
customers and servers below a predetermined threshold. This system can be used
to model two-sided markets such as ride-sharing markets with passengers and
drivers. The difficulties of the problem include simultaneous learning and
decision making, and the tradeoff between maximizing profit and minimizing
queue length. We use a longest-queue-first matching algorithm and propose a
learning-based pricing algorithm, which combines gradient-free stochastic
projected gradient ascent with bisection search. We prove that our proposed
algorithm yields a sublinear regret $\tilde{O}(T^{5/6})$ and anytime
queue-length bound $\tilde{O}(T^{1/6})$, where $T$ is the time horizon. We
further establish a tradeoff between the regret bound and the queue-length
bound: $\tilde{O}(T^{1-\gamma})$ versus $\tilde{O}(T^{\gamma})$ for $\gamma \in
(0, 1/6].$
[COMMENTS]
60 pages, 8 figures
[LINK]
http://arxiv.org/abs/2403.11093v2
[DATE]
2024-11-19 01:58:35+08:00
[CATEGORIES]
cs.LG
Robust Subgraph Learning by Monitoring Early Training Representations
[AUTHORS]
Sepideh Neshatfar, Salimeh Yasaei Sekeh
[ABSTRACT]
Graph neural networks (GNNs) have attracted significant attention for their
outstanding performance in graph learning and node classification tasks.
However, their vulnerability to adversarial attacks, particularly through
susceptible nodes, poses a challenge in decision-making. The need for robust
graph summarization is evident in adversarial challenges resulting from the
propagation of attacks throughout the entire graph. In this paper, we address
both performance and adversarial robustness in graph input by introducing the
novel technique SHERD (Subgraph Learning Hale through Early Training
Representation Distances). SHERD leverages information from layers of a
partially trained graph convolutional network (GCN) to detect susceptible nodes
during adversarial attacks using standard distance metrics. The method
identifies “vulnerable (bad)” nodes and removes such nodes to form a robust
subgraph while maintaining node classification performance. Through our
experiments, we demonstrate the increased performance of SHERD in enhancing
robustness by comparing the network’s performance on original and subgraph
inputs against various baselines alongside existing adversarial attacks. Our
experiments across multiple datasets, including citation datasets such as Cora,
Citeseer, and Pubmed, as well as microanatomical tissue structures of cell
graphs in the placenta, highlight that SHERD not only achieves substantial
improvement in robust performance but also outperforms several baselines in
terms of node classification accuracy and computational complexity.
[LINK]
http://arxiv.org/abs/2403.09901v2
[DATE]
2024-11-19 01:43:31+08:00
[CATEGORIES]
cs.LG
Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
[AUTHORS]
Xutong Liu, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen
[ABSTRACT]
In this paper, we study the combinatorial semi-bandits (CMAB) and focus on
reducing the dependency of the batch-size $K$ in the regret bound, where $K$ is
the total number of arms that can be pulled or triggered in each round. First,
for the setting of CMAB with probabilistically triggered arms (CMAB-T), we
discover a novel (directional) triggering probability and variance modulated
(TPVM) condition that can replace the previously-used smoothness condition for
various applications, such as cascading bandits, online network exploration and
online influence maximization. Under this new condition, we propose a BCUCB-T
algorithm with variance-aware confidence intervals and conduct regret analysis
which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret
bound, significantly improving the regret bounds for the above applications.
Second, for the setting of non-triggering CMAB with independent arms, we
propose a SESCB algorithm that leverages the non-triggering version of the
TPVM condition and completely removes the dependency on $K$ in the leading
regret. As a valuable by-product, the regret analysis used in this paper can
improve several existing results by a factor of $O(\log K)$. Finally,
experimental evaluations show our superior performance compared with benchmark
algorithms in different applications.
[LINK]
http://arxiv.org/abs/2208.14837v3
[DATE]
2024-11-19 01:30:05+08:00
[CATEGORIES]
cs.LG
Revitalizing Electoral Trust: Enhancing Transparency and Efficiency through Automated Voter Counting with Machine Learning
[AUTHORS]
Mir Faris, Syeda Aynul Karim, Md. Juniadul Islam
[ABSTRACT]
In order to address issues with manual vote counting during election
procedures, this study intends to examine the viability of using advanced image
processing techniques for automated voter counting. The study aims to shed
light on how automated systems that utilize cutting-edge technologies like
OpenCV, CVZone, and the MOG2 algorithm could greatly increase the effectiveness
and openness of electoral operations. The empirical findings demonstrate how
automated voter counting can enhance voting processes and rebuild public
confidence in election outcomes, particularly in places where trust is low. The
study also emphasizes how rigorous metrics, such as the F1 score, should be
used to systematically compare the accuracy of automated systems against manual
counting methods. This methodology enables a detailed comprehension of the
differences in performance between automated and human counting techniques by
providing a nuanced assessment. The incorporation of said measures serves to
reinforce an extensive assessment structure, guaranteeing the legitimacy and
dependability of automated voting systems inside the electoral sphere.
[COMMENTS]
13 Pages, 4 Figures
[LINK]
http://arxiv.org/abs/2411.11740v1
[DATE]
2024-11-19 01:10:14+08:00
[CATEGORIES]
cs.LG
Introducing Milabench: Benchmarking Accelerators for AI
[AUTHORS]
Pierre Delaunay, Xavier Bouthillier, Olivier Breuleux, Satya Ortiz-Gagné, Olexa Bilaniuk, Fabrice Normandin, Arnaud Bergeron, Bruno Carrez, Guillaume Alain, Soline Blanc, Frédéric Osterrath, Joseph Viviano, Roger Creus-Castanyer, Darshan Patil, Rabiul Awal, Le Zhang
[ABSTRACT]
AI workloads, particularly those driven by deep learning, are introducing
novel usage patterns to high-performance computing (HPC) systems that are not
comprehensively captured by standard HPC benchmarks. As one of the largest
academic research centers dedicated to deep learning, Mila identified the need
to develop a custom benchmarking suite to address the diverse requirements of
its community, which consists of over 1,000 researchers. This report introduces
Milabench, the resulting benchmarking suite. Its design was informed by an
extensive literature review encompassing 867 papers, as well as surveys
conducted with Mila researchers. This rigorous process led to the selection of
26 primary benchmarks tailored for procurement evaluations, alongside 16
optional benchmarks for in-depth analysis. We detail the design methodology,
the structure of the benchmarking suite, and provide performance evaluations
using GPUs from NVIDIA, AMD, and Intel. The Milabench suite is open source and
can be accessed at github.com/mila-iqia/milabench.
[LINK]
http://arxiv.org/abs/2411.11940v1
[DATE]
2024-11-19 01:07:08+08:00
[CATEGORIES]
cs.LG
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure
[AUTHORS]
Xiang Li, Yixiang Dai, Qing Qu
[ABSTRACT]
In this work, we study the generalizability of diffusion models by looking
into the hidden properties of the learned score functions, which are
essentially a series of deep denoisers trained on various noise levels. We
observe that as diffusion models transition from memorization to
generalization, their corresponding nonlinear diffusion denoisers exhibit
increasing linearity. This discovery leads us to investigate the linear
counterparts of the nonlinear diffusion models, which are a series of linear
models trained to match the function mappings of the nonlinear diffusion
denoisers. Surprisingly, these linear denoisers are approximately the optimal
denoisers for a multivariate Gaussian distribution characterized by the
empirical mean and covariance of the training dataset. This finding implies
that diffusion models have an inductive bias towards capturing and utilizing
the Gaussian structure (covariance information) of the training dataset for
data generation. We empirically demonstrate that this inductive bias is a
unique property of diffusion models in the generalization regime, which becomes
increasingly evident when the model’s capacity is relatively small compared to
the training dataset size. In the case that the model is highly
overparameterized, this inductive bias emerges during the initial training
phases before the model fully memorizes its training data. Our study provides
crucial insights into understanding the notable strong generalization
phenomenon recently observed in real-world diffusion models.
[LINK]
http://arxiv.org/abs/2410.24060v3
[DATE]
2024-11-19 01:04:09+08:00
[CATEGORIES]
cs.LG
Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
[AUTHORS]
Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Bo Du, Dacheng Tao
[ABSTRACT]
Aligning diffusion models with downstream objectives is essential for their
practical applications. However, standard alignment methods often struggle with
step generalization when directly applied to few-step diffusion models, leading
to inconsistent performance across different denoising step scenarios. To
address this, we introduce Stepwise Diffusion Policy Optimization (SDPO), a
novel alignment method tailored for few-step diffusion models. Unlike prior
approaches that rely on a single sparse reward from only the final step of each
denoising trajectory for trajectory-level optimization, SDPO incorporates dense
reward feedback at every intermediate step. By learning the differences in
dense rewards between paired samples, SDPO facilitates stepwise optimization of
few-step diffusion models, ensuring consistent alignment across all denoising
steps. To promote stable and efficient training, SDPO introduces an online
reinforcement learning framework featuring several novel strategies designed to
effectively exploit the stepwise granularity of dense rewards. Experimental
results demonstrate that SDPO consistently outperforms prior methods in
reward-based alignment across diverse step configurations, underscoring its
robust step generalization capabilities. Code is available at
https://github.com/ZiyiZhang27/sdpo.
[LINK]
http://arxiv.org/abs/2411.11727v1
[DATE]
2024-11-19 00:57:41+08:00
[CATEGORIES]
cs.LG
Joint Diffusion models in Continual Learning
[AUTHORS]
Paweł Skierś, Kamil Deja
[ABSTRACT]
In this work, we introduce JDCL - a new method for continual learning with
generative rehearsal based on joint diffusion models. Neural networks suffer
from catastrophic forgetting defined as abrupt loss in the model’s performance
when retrained with additional data coming from a different distribution.
Generative-replay-based continual learning methods try to mitigate this issue
by retraining a model with a combination of new and rehearsal data sampled from
a generative model. In this work, we propose to extend this idea by combining a
continually trained classifier with a diffusion-based generative model into a
single, jointly optimized neural network. We show that such shared
parametrization, combined with the knowledge distillation technique, allows for
stable adaptation to new tasks without catastrophic forgetting. We evaluate our
approach on several benchmarks, where it outperforms recent state-of-the-art
generative replay techniques. Additionally, we extend our method to the
semi-supervised continual learning setup, where it outperforms competing
buffer-based replay techniques, and evaluate, in a self-supervised manner, the
quality of trained representations.
[LINK]
http://arxiv.org/abs/2411.08224v2
[DATE]
2024-11-19 00:48:06+08:00
[CATEGORIES]
cs.LG
Exploring Eye Tracking to Detect Cognitive Load in Complex Virtual Reality Training
[AUTHORS]
Mahsa Nasri, Mehmet Kosa, Leanne Chukoskie, Mohsen Moghaddam, Casper Harteveld
[ABSTRACT]
Virtual Reality (VR) has been a beneficial training tool in fields such as
advanced manufacturing. However, users may experience a high cognitive load due
to various factors, such as the use of VR hardware or tasks within the VR
environment. Studies have shown that eye-tracking has the potential to detect
cognitive load, but in the context of VR and complex spatiotemporal tasks
(e.g., assembly and disassembly), it remains relatively unexplored. Here, we
present an ongoing study to detect users’ cognitive load using an
eye-tracking-based machine learning approach. We developed a VR training system
for cold spray and tested it with 22 participants, obtaining 19 valid
eye-tracking datasets and NASA-TLX scores. We applied Multi-Layer Perceptron
(MLP) and Random Forest (RF) models to compare the accuracy of predicting
cognitive load (i.e., NASA-TLX) using pupil dilation and fixation duration. Our
preliminary analysis demonstrates the feasibility of using eye tracking to
detect cognitive load in complex spatiotemporal VR experiences and motivates
further exploration.
[LINK]
http://arxiv.org/abs/2411.12771v1
[DATE]
2024-11-19 00:44:19+08:00
[CATEGORIES]
cs.LG
DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection
[AUTHORS]
Sheng Yan, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, Jianhua Tao, Zhao Lv
[ABSTRACT]
At a cocktail party, humans exhibit an impressive ability to direct their
attention. The auditory attention detection (AAD) approach seeks to identify
the attended speaker by analyzing brain signals, such as EEG signals. However,
current AAD algorithms overlook the spatial distribution information within EEG
signals and lack the ability to capture long-range latent dependencies,
limiting the model’s ability to decode brain activity. To address these issues,
this paper proposes a dual attention refinement network with spatiotemporal
construction for AAD, named DARNet, which consists of the spatiotemporal
construction module, dual attention refinement module, and feature fusion &
classifier module. Specifically, the spatiotemporal construction module aims to
construct more expressive spatiotemporal feature representations, by capturing
the spatial distribution characteristics of EEG signals. The dual attention
refinement module aims to extract different levels of temporal patterns in EEG
signals and enhance the model’s ability to capture long-range latent
dependencies. The feature fusion & classifier module aims to aggregate
temporal patterns and dependencies from different levels and obtain the final
classification results. The experimental results indicate that compared to the
state-of-the-art models, DARNet achieves an average classification accuracy
improvement of 5.9% for 0.1s, 4.6% for 1s, and 3.9% for 2s on the DTU
dataset. While maintaining excellent classification performance, DARNet
significantly reduces the number of required parameters. Compared to the
state-of-the-art models, DARNet reduces the parameter count by 91%. Code is
available at: https://github.com/fchest/DARNet.git.
[LINK]
http://arxiv.org/abs/2410.11181v2
[DATE]
2024-11-19 00:25:53+08:00
[CATEGORIES]
cs.LG
Partial Information Decomposition for Data Interpretability and Feature Selection
[AUTHORS]
Charles Westphal, Stephen Hailes, Mirco Musolesi
[ABSTRACT]
In this paper, we introduce Partial Information Decomposition of Features
(PIDF), a new paradigm for simultaneous data interpretability and feature
selection. Contrary to traditional methods that assign a single importance
value, our approach is based on three metrics per feature: the mutual
information shared with the target variable, the feature’s contribution to
synergistic information, and the amount of this information that is redundant.
In particular, we develop a novel procedure based on these three metrics, which
reveals not only how features are correlated with the target but also the
additional and overlapping information provided by considering them in
combination with other features. We extensively evaluate PIDF using both
synthetic and real-world data, demonstrating its potential applications and
effectiveness, by considering case studies from genetics and neuroscience.
[LINK]
http://arxiv.org/abs/2405.19212v3
[DATE]
2024-11-19 00:22:41+08:00
[CATEGORIES]
cs.LG
Robust Reinforcement Learning under Diffusion Models for Data with Jumps
[AUTHORS]
Chenyang Jiang, Donggyu Kim, Alejandra Quintos, Yazhen Wang
[ABSTRACT]
Reinforcement Learning (RL) has proven effective in solving complex
decision-making tasks across various domains, but challenges remain in
continuous-time settings, particularly when state dynamics are governed by
stochastic differential equations (SDEs) with jump components. In this paper,
we address this challenge by introducing the Mean-Square Bipower Variation
Error (MSBVE) algorithm, which enhances robustness and convergence in scenarios
involving significant stochastic noise and jumps. We first revisit the
Mean-Square TD Error (MSTDE) algorithm, commonly used in continuous-time RL,
and highlight its limitations in handling jumps in state dynamics. The proposed
MSBVE algorithm minimizes the mean-square quadratic variation error, offering
improved performance over MSTDE in environments characterized by SDEs with
jumps. Simulations and formal proofs demonstrate that the MSBVE algorithm
reliably estimates the value function in complex settings, surpassing MSTDE’s
performance when faced with jump processes. These findings underscore the
importance of alternative error metrics to improve the resilience and
effectiveness of RL algorithms in continuous-time frameworks.
[LINK]
http://arxiv.org/abs/2411.11697v1
[DATE]
2024-11-19 00:17:34+08:00
[CATEGORIES]
cs.LG
Value Imprint: A Technique for Auditing the Human Values Embedded in RLHF Datasets
[AUTHORS]
Ike Obi, Rohan Pant, Srishti Shekhar Agrawal, Maham Ghazanfar, Aaron Basiletti
[ABSTRACT]
LLMs are increasingly fine-tuned using RLHF datasets to align them with human
preferences and values. However, very limited research has investigated which
specific human values are operationalized through these datasets. In this
paper, we introduce Value Imprint, a framework for auditing and classifying the
human values embedded within RLHF datasets. To investigate the viability of
this framework, we conducted three case study experiments by auditing the
Anthropic/hh-rlhf, OpenAI WebGPT Comparisons, and Alpaca GPT-4-LLM datasets to
examine the human values embedded within them. Our analysis involved a
two-phase process. During the first phase, we developed a taxonomy of human
values through an integrated review of prior works from philosophy, axiology,
and ethics. Then, we applied this taxonomy to annotate 6,501 RLHF preferences.
During the second phase, we employed the labels generated from the annotation
as ground truth data for training a transformer-based machine learning model to
audit and classify the three RLHF datasets. Through this approach, we
discovered that information-utility values, including Wisdom/Knowledge and
Information Seeking, were the most dominant human values within all three RLHF
datasets. In contrast, prosocial and democratic values, including Well-being,
Justice, and Human/Animal Rights, were the least represented human values.
These findings have significant implications for developing language models
that align with societal values and norms. We contribute our datasets to
support further research in this area.
[LINK]
http://arxiv.org/abs/2411.11937v1
[DATE]
2024-11-19 00:12:24+08:00
[CATEGORIES]
cs.LG
Learning Differentiable Surrogate Losses for Structured Prediction
[AUTHORS]
Junjie Yang, Matthieu Labeau, Florence d’Alché-Buc
[ABSTRACT]
Structured prediction involves learning to predict complex structures rather
than simple scalar values. The main challenge arises from the non-Euclidean
nature of the output space, which generally requires relaxing the problem
formulation. Surrogate methods build on kernel-induced losses or more
generally, loss functions admitting an Implicit Loss Embedding, and convert the
original problem into a regression task followed by a decoding step. However,
designing effective losses for objects with complex structures presents
significant challenges and often requires domain-specific expertise. In this
work, we introduce a novel framework in which a structured loss function,
parameterized by neural networks, is learned directly from output training data
through Contrastive Learning, prior to addressing the supervised surrogate
regression problem. As a result, the differentiable loss not only enables the
learning of neural networks due to the finite dimension of the surrogate space
but also allows for the prediction of new structures of the output data via a
decoding strategy based on gradient descent. Numerical experiments on
supervised graph prediction problems show that our approach achieves similar or
even better performance than methods based on a pre-defined kernel.
[LINK]
http://arxiv.org/abs/2411.11682v1
[DATE]
2024-11-19 00:07:47+08:00
[CATEGORIES]
cs.LG
Retrieval-Augmented Personalization for Multimodal Large Language Models
[AUTHORS]
Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue
[ABSTRACT]
The development of large language models (LLMs) has significantly enhanced
the capabilities of multimodal LLMs (MLLMs) as general assistants. However,
lack of user-specific knowledge still restricts their application in people’s
daily lives. In this paper, we introduce the Retrieval Augmented Personalization
(RAP) framework for MLLMs’ personalization. Starting from a general MLLM, we
turn it into a personalized assistant in three steps. (a) Remember: We design a
key-value database to store user-related information, e.g., user’s name, avatar
and other attributes. (b) Retrieve: When the user initiates a conversation, RAP
will retrieve relevant information from the database using a multimodal
retriever. (c) Generate: The input query and retrieved concepts’ information
are fed into MLLMs to generate personalized, knowledge-augmented responses.
Unlike previous methods, RAP allows real-time concept editing via updating the
external database. To further improve generation quality and alignment with
user-specific information, we design a pipeline for data collection and create
a specialized dataset for personalized training of MLLMs. Based on the dataset,
we train a series of MLLMs as personalized multimodal assistants. By
pretraining on a large-scale dataset, RAP-MLLMs can generalize to infinite visual
concepts without additional finetuning. Our models demonstrate outstanding
flexibility and generation quality across a variety of tasks, such as
personalized image captioning, question answering and visual recognition. The
code, data and models are available at https://github.com/Hoar012/RAP-MLLM.
[LINK]
http://arxiv.org/abs/2410.13360v2
[DATE]
2024-11-18 23:35:14+08:00
[CATEGORIES]
cs.CL
cs.LG
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
[AUTHORS]
Emanuela Boros, Maud Ehrmann
[ABSTRACT]
This paper investigates the presence of OCR-sensitive neurons within the
Transformer architecture and their influence on named entity recognition (NER)
performance on historical documents. By analysing neuron activation patterns in
response to clean and noisy text inputs, we identify and then neutralise
OCR-sensitive neurons to improve model performance. Based on two open access
large language models (Llama2 and Mistral), experiments demonstrate the
existence of OCR-sensitive regions and show improvements in NER performance on
historical newspapers and classical commentaries, highlighting the potential of
targeted neuron modulation to improve models’ performance on noisy text.
[LINK]
http://arxiv.org/abs/2409.16934v3
[DATE]
2024-11-18 23:22:32+08:00
[CATEGORIES]
cs.CL
Chapter 7 Review of Data-Driven Generative AI Models for Knowledge Extraction from Scientific Literature in Healthcare
[AUTHORS]
Leon Kopitar, Primoz Kocbek, Lucija Gosak, Gregor Stiglic
[ABSTRACT]
This review examines the development of abstractive NLP-based text
summarization approaches and compares them to existing techniques for
extractive summarization. A brief history of text summarization from the 1950s
to the introduction of pre-trained language models such as Bidirectional
Encoder Representations from Transformers (BERT) and Generative Pre-trained
Transformers (GPT) is presented. In total, 60 studies were identified in
PubMed and Web of Science, of which 29 were excluded and 24 were read and
evaluated for eligibility, resulting in the use of seven studies for further
analysis. This chapter also includes a section of examples, among them a
comparison between GPT-3 and state-of-the-art GPT-4 solutions in
scientific text summarisation. Natural language processing has not yet reached
its full potential in the generation of brief textual summaries. As there are
acknowledged concerns that must be addressed, we can expect gradual
introduction of such models in practice.
[COMMENTS]
16 pages, 5 figures, 1 table
[LINK]
http://arxiv.org/abs/2411.11635v1
[DATE]
2024-11-18 23:13:47+08:00
[CATEGORIES]
cs.CL
Federated Incremental Named Entity Recognition
[AUTHORS]
Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dong Yu
[ABSTRACT]
Federated Named Entity Recognition (FNER) boosts model training within each
local client by aggregating the model updates of decentralized local clients,
without sharing their private data. However, existing FNER methods assume fixed
entity types and local clients in advance, leading to their ineffectiveness in
practical applications. In a more realistic scenario, local clients receive new
entity types continuously, while new local clients collecting novel data may
irregularly join the global FNER training. This challenging setup, referred to
here as Federated Incremental NER, causes the global model to suffer from
heterogeneous forgetting of old entity types from both intra-client and
inter-client perspectives. To overcome these challenges, we propose a
Local-Global Forgetting Defense (LGFD) model. Specifically, to address
intra-client forgetting, we develop a structural knowledge distillation loss to
retain the latent space’s feature structure and a pseudo-label-guided
inter-type contrastive loss to enhance discriminative capability over different
entity types, effectively preserving previously learned knowledge within local
clients. To tackle inter-client forgetting, we propose a task switching monitor
that can automatically identify new entity types under privacy protection and
store the latest old global model for knowledge distillation and
pseudo-labeling. Experiments demonstrate significant improvement of our LGFD
model over comparison methods.
[COMMENTS]
Under Review
[LINK]
http://arxiv.org/abs/2411.11623v1
[DATE]
2024-11-18 22:53:53+08:00
[CATEGORIES]
cs.CL
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
[AUTHORS]
Philipp Allgeuer, Kyra Ahrens, Stefan Wermter
[ABSTRACT]
We introduce NOVIC, an innovative real-time uNconstrained Open Vocabulary
Image Classifier that uses an autoregressive transformer to generatively output
classification labels as language. Leveraging the extensive knowledge of CLIP
models, NOVIC harnesses the embedding space to enable zero-shot transfer from
pure text to images. Traditional CLIP models, despite their ability for open
vocabulary classification, require an exhaustive prompt of potential class
labels, restricting their application to images of known content or context. To
address this, we propose an “object decoder” model that is trained on a
large-scale 92M-target dataset of templated object noun sets and LLM-generated
captions to always output the object noun in question. This effectively inverts
the CLIP text encoder and allows textual object labels from essentially the
entire English language to be generated directly from image-derived embedding
vectors, without requiring any a priori knowledge of the potential content of
an image, and without any label biases. The trained decoders are tested on a
mix of manually and web-curated datasets, as well as standard image
classification benchmarks, and achieve fine-grained prompt-free prediction
scores of up to 87.5%, a strong result considering the model must work for any
conceivable image and without any contextual clues.
[COMMENTS]
Published at WACV 2025
[LINK]
http://arxiv.org/abs/2407.11211v3
[DATE]
2024-11-18 22:43:38+08:00
[CATEGORIES]
cs.CL
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
[AUTHORS]
Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West
[ABSTRACT]
A central question in multilingual language modeling is whether large
language models (LLMs) develop a universal concept representation, disentangled
from specific languages. In this paper, we address this question by analyzing
latent representations (latents) during a word translation task in
transformer-based LLMs. We strategically extract latents from a source
translation prompt and insert them into the forward pass on a target
translation prompt. By doing so, we find that the output language is encoded in
the latent at an earlier layer than the concept to be translated. Building on
this insight, we conduct two key experiments. First, we demonstrate that we can
change the concept without changing the language and vice versa through
activation patching alone. Second, we show that patching with the mean over
latents across different languages does not impair and instead improves the
models’ performance in translating the concept. Our results provide evidence
for the existence of language-agnostic concept representations within the
investigated models.
[COMMENTS]
12 pages, 10 figures, previous version published under the title “How
Do Llamas Process Multilingual Text? A Latent Exploration through Activation
Patching” at the ICML 2024 mechanistic interpretability workshop at
https://openreview.net/forum?id=0ku2hIm4BS
[LINK]
http://arxiv.org/abs/2411.08745v2
[DATE]
2024-11-18 22:41:38+08:00
[CATEGORIES]
cs.CL
BertaQA: How Much Do Language Models Know About Local Culture?
[AUTHORS]
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
[COMMENTS]
NeurIPS Datasets & Benchmarks 2024
[LINK]
http://arxiv.org/abs/2406.07302v2
[DATE]
2024-11-18 22:40:54+08:00
[CATEGORIES]
cs.CL
cs.LG
OASIS: Open Agents Social Interaction Simulations on One Million Agents
[AUTHORS]
Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao
[ABSTRACT]
There has been a growing interest in enhancing rule-based agent-based models
(ABMs) for social media platforms (i.e., X, Reddit) with more realistic
large language model (LLM) agents, thereby allowing for a more nuanced study of
complex systems. As a result, several LLM-based ABMs have been proposed in the
past year. While they hold promise, each simulator is specifically designed to
study a particular scenario, making it time-consuming and resource-intensive to
explore other phenomena using the same ABM. Additionally, these models simulate
only a limited number of agents, whereas real-world social media platforms
involve millions of users. To this end, we propose OASIS, a generalizable and
scalable social media simulator. OASIS is designed based on real-world social
media platforms, incorporating dynamically updated environments (i.e.,
dynamic social networks and post information), diverse action spaces
(i.e., following, commenting), and recommendation systems (i.e.,
interest-based and hot-score-based). Additionally, OASIS supports large-scale
user simulations, capable of modeling up to one million users. With these
features, OASIS can be easily extended to different social media platforms to
study large-scale group phenomena and behaviors. We replicate various social
phenomena, including information spreading, group polarization, and herd
effects across X and Reddit platforms. Moreover, we provide observations of
social phenomena at different agent group scales. We observe that a larger
agent group scale leads to stronger group dynamics and to more diverse and
helpful agent opinions. These findings demonstrate OASIS’s potential as a
powerful tool for studying complex systems in digital environments.
[LINK]
http://arxiv.org/abs/2411.11581v1
[DATE]
2024-11-18 21:57:35+08:00
[CATEGORIES]
cs.CL
Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach
[AUTHORS]
Gideon Yoffe, Nachum Dershowitz, Ariel Vishne, Barak Sober
[ABSTRACT]
Stylometry aims to distinguish authors by analyzing literary traits assumed
to reflect semi-conscious choices distinct from elements like genre or theme.
However, these components often overlap, complicating text classification based
solely on feature distributions. While some literary properties, such as
thematic content, are likely to manifest as correlations between adjacent text
units, others, like authorial style, may be independent thereof. We introduce a
hypothesis-testing approach to evaluate the influence of sequentially
correlated literary properties on text classification, aiming to determine when
these correlations drive classification. Using a multivariate binary
distribution, our method models sequential correlations between text units as a
stochastic process, assessing the likelihood of clustering across varying
adjacency scales. This enables us to examine whether classification is
dominated by sequentially correlated properties or remains independent. In
experiments on a diverse English prose corpus, our analysis integrates
traditional and neural embeddings within supervised and unsupervised
frameworks. Results demonstrate that our approach effectively identifies when
textual classification is not primarily influenced by sequentially correlated
literary properties, particularly in cases where texts differ in authorial
style or genre rather than by a single author within a similar genre.
[LINK]
http://arxiv.org/abs/2411.04950v3
[DATE]
2024-11-18 21:15:59+08:00
[CATEGORIES]
cs.CL
Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
[AUTHORS]
Runchuan Zhu, Zhipeng Ma, Jiang Wu, Junyuan Gao, Jiaqi Wang, Dahua Lin, Conghui He
[ABSTRACT]
Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs)
to refuse to answer unknown questions. By modifying responses of unknown
questions in the training data to refusal responses such as “I don’t know”,
RAIT enhances the reliability of LLMs and reduces their hallucination.
Generally, RAIT modifies training samples based on the correctness of the
initial LLM’s response. However, this crude approach can cause LLMs to
excessively refuse to answer questions they could have answered correctly, a
problem we call over-refusal. In this paper, we explore two primary causes of
over-refusal: Static conflict occurs when similar samples within the LLM’s
feature space receive differing supervision signals (original vs. modified “I
don’t know”). Dynamic conflict, on the other hand, emerges as the LLM’s
knowledge evolves during SFT, allowing it to answer questions that were
previously unanswerable. Yet, these now-answerable training samples still
retain the original “I don’t know” supervision signals based on the initial LLM
state, resulting in inconsistencies. These conflicts cause the trained LLM to
misclassify known questions as unknown, resulting in over-refusal. To address
this issue, we introduce Certainty Represented Knowledge Flow for Refusal-Aware
Instruction Tuning (CRaFT). CRaFT centers on two main contributions: First, we
additionally incorporate response certainty to selectively filter and modify
data, reducing static conflicts. Second, we implement preliminary rehearsal
training to characterize changes in the LLM’s knowledge state, which helps
mitigate dynamic conflicts during the fine-tuning process. We conducted
extensive experiments on open-ended question answering and multiple-choice
question tasks. Experimental results show that CRaFT can improve LLMs’ overall
performance during the RAIT process. Source code and training data will be
released on GitHub.
[COMMENTS]
Equal contribution: Runchuan Zhu, Zhipeng Ma, Jiang Wu; Corresponding
author: Conghui He
[LINK]
http://arxiv.org/abs/2410.06913v2
[DATE]
2024-11-18 21:15:41+08:00
[CATEGORIES]
cs.CL
A Complete Survey on LLM-based AI Chatbots
[AUTHORS]
Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang
[ABSTRACT]
The past few decades have witnessed an upsurge in data, forming the
foundation for data-hungry, learning-based AI technology. Conversational
agents, often referred to as AI chatbots, rely heavily on such data to train
large language models (LLMs) and generate new content (knowledge) in response
to user prompts. With the advent of OpenAI’s ChatGPT, LLM-based chatbots have
set new standards in the AI community. This paper presents a complete survey of
the evolution and deployment of LLM-based chatbots in various sectors. We first
summarize the development of foundational chatbots, followed by the evolution
of LLMs, and then provide an overview of LLM-based chatbots currently in use
and those in the development phase. Recognizing AI chatbots as tools for
generating new knowledge, we explore their diverse applications across various
industries. We then discuss the open challenges, considering how the data used
to train the LLMs and the misuse of the generated knowledge can cause several
issues. Finally, we explore the future outlook to augment their efficiency and
reliability in numerous applications. By addressing key milestones and the
present-day context of LLM-based chatbots, our survey invites readers to delve
deeper into this realm, reflecting on how their next generation will reshape
conversational AI.
[COMMENTS]
23 pages, 10 figures
[LINK]
http://arxiv.org/abs/2406.16937v2
[DATE]
2024-11-18 20:36:13+08:00
[CATEGORIES]
cs.CL
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
[AUTHORS]
Xinyan Guan, Yanjiang Liu, Xinyu Lu, Boxi Cao, Ben He, Xianpei Han, Le Sun, Jie Lou, Bowen Yu, Yaojie Lu, Hongyu Lin
[LINK]
http://arxiv.org/abs/2411.11504v1
[DATE]
2024-11-18 20:04:52+08:00
[CATEGORIES]
cs.CL
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding
[AUTHORS]
Ukyo Honda, Tatsushi Oka, Peinan Zhang, Masato Mita
[ABSTRACT]
Recent models for natural language understanding are inclined to exploit
simple patterns in datasets, commonly known as shortcuts. These shortcuts hinge
on spurious correlations between labels and latent features existing in the
training data. At inference time, shortcut-dependent models are likely to
generate erroneous predictions under distribution shifts, particularly when
some latent features are no longer correlated with the labels. To avoid this,
previous studies have trained models to eliminate the reliance on shortcuts. In
this study, we explore a different direction: pessimistically aggregating the
predictions of a mixture-of-experts, assuming each expert captures relatively
different latent features. The experimental results demonstrate that our
post-hoc control over the experts significantly enhances the model’s robustness
to the distribution shift in shortcuts. Besides, we show that our approach has
some practical advantages. We also analyze our model and provide results to
support the assumption.
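One plausible reading of "pessimistic" aggregation, sketched below, is to score each class by the lowest probability any expert assigns to it and then pick the class with the best worst-case score. This min-then-argmax rule is an assumption chosen for illustration; the abstract does not state the exact aggregation rule the authors use.

```python
def pessimistic_aggregate(expert_probs):
    """Aggregate expert predictions by their worst-case class scores.

    expert_probs: list of per-expert probability lists, one entry per class.
    Returns (predicted_class, worst_case_scores).
    """
    n_classes = len(expert_probs[0])
    # For every class, keep the lowest probability assigned by any expert.
    worst = [min(probs[c] for probs in expert_probs) for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: worst[c])
    return predicted, worst


# Three hypothetical experts on a two-class problem.
experts = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
label, scores = pessimistic_aggregate(experts)
```

In this toy case simple averaging would favour class 1, while the worst-case rule picks class 0, because one expert gives class 1 very little support.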
[COMMENTS]
21 pages, 5 figures (the layout differs from the MIT Press
publication version)
[LINK]
http://arxiv.org/abs/2406.12060v3
[DATE]
2024-11-18 19:51:38+08:00
[CATEGORIES]
cs.CL
cs.LG
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
[AUTHORS]
Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, Weipeng Chen, Ji-Rong Wen
[ABSTRACT]
Transformer-based large language models (LLMs) typically have a limited
context window, resulting in significant performance degradation when
processing text beyond the length of the context window. Many methods have
been proposed to extend the context window and achieve length extrapolation of
LLMs, but there is still a lack of in-depth interpretation of these approaches.
In this study, we explore the positional information within and beyond the
context window for deciphering the underlying mechanism of LLMs. By using a
mean-based decomposition method, we disentangle positional vectors from hidden
states of LLMs and analyze their formation and effect on attention.
Furthermore, when texts exceed the context window, we analyze the change of
positional vectors in two settings, i.e., direct extrapolation and context
window extension. Based on our findings, we design two training-free context
window extension methods, positional vector replacement and attention window
extension. Experimental results show that our methods can effectively extend
the context window length.
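A minimal sketch of what a mean-based decomposition could look like: averaging the hidden states at each position over many samples gives a per-position vector in which sample-specific content tends to cancel out. The nested-list layout and the function name are assumptions for illustration, not the authors' code.

```python
def positional_vectors(hidden):
    """Average hidden states over samples, separately for each position.

    hidden[s][t] is the hidden-state vector (list of floats) of sample s at
    position t; all samples are assumed to have the same length and width.
    """
    n_samples = len(hidden)
    n_positions = len(hidden[0])
    dim = len(hidden[0][0])
    return [
        [sum(hidden[s][t][d] for s in range(n_samples)) / n_samples
         for d in range(dim)]
        for t in range(n_positions)
    ]


# Two toy samples, two positions, hidden size 2.
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
pos_vecs = positional_vectors(toy)
```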
[COMMENTS]
Accepted by NeurIPS 2024 as a spotlight
[LINK]
http://arxiv.org/abs/2405.18009v2
[DATE]
2024-11-18 19:15:56+08:00
[CATEGORIES]
cs.CL
cs.LG
Re-examining learning linear functions in context
[AUTHORS]
Omar Naim, Guilhem Fouilhé, Nicholas Asher
[ABSTRACT]
In-context learning (ICL) is an attractive method of solving a wide range of
problems. Inspired by Garg et al. (2022), we look closely at ICL in a variety
of train and test settings for several transformer models of different sizes
trained from scratch. Our study complements prior work by pointing out several
systematic failures of these models to generalize to data not in the training
distribution, thereby showing some limitations of ICL. We find that models
adopt a strategy for this task that is very different from standard solutions.
[LINK]
http://arxiv.org/abs/2411.11465v1
[DATE]
2024-11-18 18:58:46+08:00
[CATEGORIES]
cs.LG
cs.CL
Towards Evaluating Large Language Models for Graph Query Generation
[AUTHORS]
Siraj Munir, Alessandro Aldini
[ABSTRACT]
Large Language Models (LLMs) are revolutionizing the landscape of Generative
Artificial Intelligence (GenAI), with innovative LLM-backed solutions emerging
rapidly. However, when applied to database technologies, specifically query
generation for graph databases and Knowledge Graphs (KGs), LLMs still face
significant challenges. While research on LLM-driven query generation for
Structured Query Language (SQL) exists, similar systems for graph databases
remain underdeveloped. This paper presents a comparative study addressing the
challenge of generating queries in Cypher, a powerful language for interacting
with graph databases, using open-access LLMs. We rigorously evaluate several LLM
agents (OpenAI ChatGPT 4o, Claude Sonnet 3.5, Google Gemini Pro 1.5, and a
locally deployed Llama 3.1 8B) using a designed few-shot learning prompt and
Retrieval Augmented Generation (RAG) backed by Chain-of-Thoughts (CoT)
reasoning. Our empirical analysis of query generation accuracy reveals that
Claude Sonnet 3.5 outperforms its counterparts in this specific domain.
Further, we highlight promising future research directions to address the
identified limitations and advance LLM-driven query generation for graph
databases.
[COMMENTS]
Paper accepted; to be presented at CSCI2024 in December 2024 and
later published in Springer LNCS
[LINK]
http://arxiv.org/abs/2411.08449v2
[DATE]
2024-11-18 17:57:04+08:00
[CATEGORIES]
cs.CL
Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts
[AUTHORS]
Xianzhen Luo, Qingfu Zhu, Zhiming Zhang, Libo Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che
[ABSTRACT]
Program of Thoughts (PoT) is an approach characterized by its executable
intermediate steps, which ensure the accuracy of the logical calculations in
the reasoning process. Currently, PoT primarily uses Python. However, relying
solely on a single language may result in suboptimal solutions and overlook the
potential benefits of other programming languages. In this paper, we conduct
comprehensive experiments on the programming languages used in PoT and find
that no single language consistently delivers optimal performance across all
tasks and models. The effectiveness of each language varies depending on the
specific scenarios. Inspired by this, we propose a task- and model-agnostic
approach called MultiPoT, which harnesses strength and diversity from various
languages. Experimental results reveal that it significantly outperforms Python
Self-Consistency. Furthermore, it achieves comparable or superior performance
compared to the best monolingual PoT in almost all tasks across all models. In
particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT
(gpt-3.5-turbo-0701).
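To make the multilingual idea concrete, the sketch below combines the answers obtained by running generated programs in several languages through a simple majority vote. The voting rule, the language names, and the answer values are assumptions for illustration; the abstract does not describe how MultiPoT integrates the languages.

```python
from collections import Counter


def vote(answers_by_language):
    """Return the most common non-missing answer across languages.

    answers_by_language maps a language name to the answer its program
    produced, or None when execution failed.
    """
    counts = Counter(a for a in answers_by_language.values() if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical execution results for one question.
results = {"python": "42", "r": "42", "java": "41", "cpp": None}
final_answer = vote(results)
```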
[COMMENTS]
Accepted by EMNLP 2024. Code and data are released at
https://github.com/Luowaterbi/MultiPoT
[LINK]
http://arxiv.org/abs/2402.10691v4
[DATE]
2024-11-18 17:53:03+08:00
[CATEGORIES]
cs.CL
Membership Inference Attack against Long-Context Large Language Models
[AUTHORS]
Zixiong Wang, Gaoyang Liu, Yang Yang, Chen Wang
[ABSTRACT]
Recent advances in Large Language Models (LLMs) have enabled them to overcome
their context window limitations, and demonstrate exceptional retrieval and
reasoning capacities on longer context. Question-answering systems augmented
with Long-Context Language Models (LCLMs) can automatically search massive
external data and incorporate it into their contexts, enabling faithful
predictions and reducing issues such as hallucinations and knowledge staleness.
Existing studies targeting LCLMs mainly concentrate on addressing the so-called
lost-in-the-middle problem or improving the inference efficiency, leaving their
privacy risks largely unexplored. In this paper, we aim to bridge this gap and
argue that integrating all information into the long context makes it a
repository of sensitive information, which often contains private data such as
medical records or personal identities. We further investigate the membership
privacy within LCLMs’ external context, with the aim of determining whether a
given document or sequence is included in the LCLMs’ context. Our basic idea is
that if a document lies in the context, it will exhibit a low generation loss
or a high degree of semantic similarity to the contents generated by LCLMs. We
propose, for the first time, six membership inference attack (MIA) strategies
tailored for LCLMs and conduct extensive experiments on various popular models.
Empirical results demonstrate that our attacks can accurately infer membership
status in most cases, e.g., 90.66% attack F1-score on Multi-document QA
datasets with LongChat-7b-v1.5-32k, highlighting significant risks of
membership leakage within LCLMs’ input contexts. Furthermore, we examine the
underlying reasons why LCLMs are susceptible to revealing such membership
information.
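The loss-based intuition stated in this abstract can be sketched as a simple threshold test: a document whose average per-token generation loss is low is guessed to be in the context. The threshold value, the loss numbers, and the function name are hypothetical, and this is only one simplified form of such an attack, not any of the six strategies the authors propose.

```python
from statistics import mean


def infer_membership(token_losses, threshold):
    """Guess membership from generation loss.

    token_losses: per-token losses of the candidate document under the
    target model (hypothetical values here).
    Returns True when the mean loss falls below the chosen threshold.
    """
    return mean(token_losses) < threshold


# Toy losses: the first document is reproduced easily, the second is not.
in_context_guess = infer_membership([0.2, 0.1, 0.3], threshold=1.0)
out_of_context_guess = infer_membership([2.1, 1.8, 2.5], threshold=1.0)
```

In practice the threshold would have to be calibrated on documents with known membership status.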
[LINK]
http://arxiv.org/abs/2411.11424v1
[DATE]
2024-11-18 17:50:54+08:00
[CATEGORIES]
cs.CL
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
[AUTHORS]
Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Boxing Chen, Hao Yang, Bei Li, Tong Xiao, Jingbo Zhu
[ABSTRACT]
With contributions from the open-source community, a vast amount of
instruction tuning (IT) data has emerged. Given the significant resource
allocation required for training and evaluating models, it is advantageous to
have an efficient method for selecting high-quality IT data. However, existing
methods for instruction data selection have limitations such as relying on
fragile external APIs, being affected by biases in GPT models, or reducing the
diversity of the selected instruction dataset. In this paper, we propose an
industrial-friendly, expert-aligned and diversity-preserved instruction data
selection method: Clustering and Ranking (CaR). CaR employs a two-step process:
first, it ranks instruction pairs using a high-accuracy (84.25%) scoring model
aligned with expert preferences; second, it preserves dataset diversity through
clustering. In our experiment, CaR efficiently selected a mere 1.96% of
Alpaca’s IT data, yet the resulting AlpaCaR model surpassed Alpaca’s
performance by an average of 32.1% in GPT-4 evaluations. Moreover, we find that
data selection is a consistent paradigm whether the pre-trained model is more
capable or the model parameters are scaled up. Our approach employs compact models
with 550M parameters and incurs just 11.2% of the financial outlay of current
methods, enhancing its industrial deployability.
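The two-step process described in this abstract can be sketched as follows: keep the globally top-scoring items, then add the best items from each cluster so that every cluster stays represented. The tuple layout, the parameter names, and the exact way the two steps are combined are assumptions for illustration; quality scores and cluster labels are taken as given.

```python
from collections import defaultdict


def select_instructions(items, top_n, per_cluster):
    """Select instruction ids by ranking, then by per-cluster coverage.

    items: list of (item_id, quality_score, cluster_label) tuples.
    """
    ranked = sorted(items, key=lambda it: it[1], reverse=True)
    # Step 1: keep the top_n items by quality score.
    selected = [it[0] for it in ranked[:top_n]]
    chosen = set(selected)
    # Step 2: add the best per_cluster items of every cluster.
    by_cluster = defaultdict(list)
    for it in ranked:
        by_cluster[it[2]].append(it)
    for label in sorted(by_cluster):
        for it in by_cluster[label][:per_cluster]:
            if it[0] not in chosen:
                chosen.add(it[0])
                selected.append(it[0])
    return selected


# Hypothetical scored and clustered instruction pairs.
pool = [("a", 0.9, 0), ("b", 0.8, 0), ("c", 0.3, 1), ("d", 0.2, 1)]
subset = select_instructions(pool, top_n=1, per_cluster=1)
```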
[COMMENTS]
Accepted by EMNLP2024
[LINK]
http://arxiv.org/abs/2402.18191v3
[DATE]
2024-11-18 17:26:51+08:00
[CATEGORIES]
cs.CL
Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond
[AUTHORS]
Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Yue Zhang, Ren Wang, Xiaoshuang Shi, Kaidi Xu
[ABSTRACT]
Uncertainty estimation is crucial for the reliability of safety-critical
human and artificial intelligence (AI) interaction systems, particularly in the
domain of healthcare engineering. However, a robust and general uncertainty
measure for free-form answers has not been well-established in open-ended
medical question-answering (QA) tasks, where generative inequality introduces a
large number of irrelevant words and sequences within the generated set for
uncertainty quantification (UQ), which can lead to biases. This paper
introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at
both the word and sequence levels, considering semantic relevance. WSE
quantifies uncertainty in a way that is more closely aligned with the
reliability of LLMs during uncertainty quantification (UQ). We compare WSE with
six baseline methods on five free-form medical QA datasets, utilizing seven
popular large language models (LLMs). Experimental results demonstrate that WSE
exhibits superior performance in UQ under two standard criteria for correctness
evaluation. Additionally, in terms of real-world medical QA applications, the
performance of LLMs is significantly enhanced (e.g., a 6.36% improvement in
model accuracy on the COVID-QA dataset) by employing responses with lower
uncertainty that are identified by WSE as final answers, without any additional
task-specific fine-tuning or architectural modifications.
[COMMENTS]
Accepted by Engineering Applications of Artificial Intelligence
[LINK]
http://arxiv.org/abs/2402.14259v2
[DATE]
2024-11-18 17:19:25+08:00
[CATEGORIES]
cs.CL
cs.LG
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
[AUTHORS]
Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, Xiaofeng Zhu
[ABSTRACT]
Uncertainty quantification (UQ) in natural language generation (NLG) tasks
remains an open challenge, exacerbated by the closed-source nature of the
latest large language models (LLMs). This study investigates applying conformal
prediction (CP), which can transform any heuristic uncertainty notion into
rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We
introduce a novel uncertainty measure based on self-consistency theory, and
then develop a conformal uncertainty criterion by integrating the uncertainty
condition aligned with correctness into the CP algorithm. Empirical evaluations
indicate that our uncertainty measure outperforms prior state-of-the-art
methods. Furthermore, we achieve strict control over the correctness coverage
rate utilizing 7 popular LLMs on 4 free-form NLG datasets, spanning
general-purpose and medical scenarios. Additionally, the small size of the
calibrated prediction sets further highlights the efficiency of our method in
providing trustworthy guarantees for practical open-ended NLG applications.
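For readers unfamiliar with conformal prediction, the sketch below shows the standard split-conformal recipe: calibrate a threshold on held-out nonconformity scores, then keep every candidate answer whose score does not exceed it. It uses generic scores and hypothetical answer names; the self-consistency-based uncertainty measure and the correctness-aligned criterion from the abstract are not reproduced here.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, no finite threshold gives the guarantee, so
    infinity is returned and every candidate will be kept.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep candidates whose nonconformity score is within the threshold."""
    return [c for c, s in candidate_scores.items() if s <= threshold]


# Hypothetical calibration scores and candidate answers.
tau = conformal_threshold([0.3, 0.1, 0.2], alpha=0.5)
kept = prediction_set({"answer_x": 0.1, "answer_y": 0.5}, tau)
```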
[COMMENTS]
Accepted by EMNLP 2024 Findings
[LINK]
http://arxiv.org/abs/2407.00499v3
[DATE]
2024-11-18 16:33:35+08:00
[CATEGORIES]
cs.CL
cs.LG
MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models
[AUTHORS]
Harshita Sharma, Valentina Salvatelli, Shaury Srivastav, Kenza Bouzid, Shruthi Bannur, Daniel C. Castro, Maximilian Ilse, Sam Bond-Taylor, Mercy Prasanna Ranjit, Fabian Falck, Fernando Pérez-García, Anton Schwaighofer, Hannah Richardson, Maria Teodora Wetscherek, Stephanie L. Hyland, Javier Alvarez-Valle
[ABSTRACT]
There is growing interest in applying AI to radiology report generation,
particularly for chest X-rays (CXRs). This paper investigates whether
incorporating pixel-level information through segmentation masks can improve
fine-grained image interpretation of multimodal large language models (MLLMs)
for radiology report generation. We introduce MAIRA-Seg, a segmentation-aware
MLLM framework designed to utilize semantic segmentation masks alongside CXRs
for generating radiology reports. We train expert segmentation models to obtain
mask pseudolabels for radiology-specific structures in CXRs. Subsequently,
building on the architectures of MAIRA, a CXR-specialised model for report
generation, we integrate a trainable segmentation tokens extractor that
leverages these mask pseudolabels, and employ mask-aware prompting to generate
draft radiology reports. Our experiments on the publicly available MIMIC-CXR
dataset show that MAIRA-Seg outperforms non-segmentation baselines. We also
investigate set-of-marks prompting with MAIRA and find that MAIRA-Seg
consistently demonstrates comparable or superior performance. The results
confirm that using segmentation masks enhances the nuanced reasoning of MLLMs,
potentially contributing to better clinical outcomes.
[COMMENTS]
Accepted as Proceedings Paper at ML4H 2024
[LINK]
http://arxiv.org/abs/2411.11362v1
[DATE]
2024-11-18 16:13:22+08:00
[CATEGORIES]
cs.CL
Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data
[AUTHORS]
Liana Patel, Siddharth Jha, Parth Asawa, Melissa Pan, Carlos Guestrin, Matei Zaharia
[ABSTRACT]
The semantic capabilities of language models (LMs) have the potential to
enable rich analytics and reasoning over vast knowledge corpora. Unfortunately,
existing systems lack high-level abstractions to perform bulk semantic queries
across large corpora. We introduce semantic operators, a declarative
programming interface that extends the relational model with composable
AI-based operations for bulk semantic queries (e.g., filtering, sorting,
joining or aggregating records using natural language criteria). Each operator
can be implemented and optimized in multiple ways, opening a rich space for
execution plans similar to relational operators. We implement our operators in
LOTUS, an open source query engine with a DataFrame API. Furthermore, we
develop several novel optimizations that take advantage of the declarative
nature of semantic operators to accelerate semantic filtering, clustering and
join operators by up to $400\times$ while offering statistical accuracy
guarantees. We demonstrate LOTUS’ effectiveness on real AI applications
including fact-checking, extreme multi-label classification, and search. We
show that the semantic operator model is expressive, capturing state-of-the-art
AI pipelines in a few operator calls, and making it easy to express new
pipelines that achieve up to $180\%$ higher quality. Overall, LOTUS queries
match or exceed the accuracy of state-of-the-art AI pipelines for each task
while running up to 28$\times$ faster. LOTUS is publicly available at
https://github.com/stanford-futuredata/lotus.
[LINK]
http://arxiv.org/abs/2407.11418v2
[DATE]
2024-11-18 16:01:24+08:00
[CATEGORIES]
cs.CL
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
[AUTHORS]
Nay Myat Min, Long H. Pham, Yige Li, Jun Sun
[ABSTRACT]
Recent studies reveal that Large Language Models (LLMs) are susceptible to
backdoor attacks, where adversaries embed hidden triggers that manipulate model
responses. Existing backdoor defense methods are primarily designed for vision
or classification tasks, and are thus ineffective for text generation tasks,
leaving LLMs vulnerable. We introduce Internal Consistency Regularization
(CROW), a novel defense using consistency regularization finetuning to address
layer-wise inconsistencies caused by backdoor triggers. CROW leverages the
intuition that clean models exhibit smooth, consistent transitions in hidden
representations across layers, whereas backdoored models show noticeable
fluctuation when triggered. By enforcing internal consistency through
adversarial perturbations and regularization, CROW neutralizes backdoor effects
without requiring clean reference models or prior trigger knowledge, relying
only on a small set of clean data. This makes it practical for deployment
across various LLM architectures. Experimental results demonstrate that CROW
consistently achieves significant reductions in attack success rates across
diverse backdoor strategies and tasks, including negative sentiment, targeted
refusal, and code injection, on models such as Llama-2 (7B, 13B), CodeLlama
(7B, 13B) and Mistral-7B, while preserving the model’s generative capabilities.
[LINK]
http://arxiv.org/abs/2411.12768v1
[DATE]
2024-11-18 15:52:12+08:00
[CATEGORIES]
cs.CL
cs.LG
The why, what, and how of AI-based coding in scientific research
[AUTHORS]
Tonghe Zhuang, Zhicheng Lin
[ABSTRACT]
Computer programming (coding) is indispensable for researchers across
disciplines, yet it remains challenging to learn and time-consuming to carry
out. Generative AI, particularly large language models (LLMs), has the
potential to transform coding into intuitive conversations, but best practices
and effective workflows are only emerging. We dissect AI-based coding through
three key lenses: the nature and role of LLMs in coding (why), six types of
coding assistance they provide (what), and a five-step workflow in action with
practical implementation strategies (how). Additionally, we address the
limitations and future outlook of AI in coding. By offering actionable
insights, this framework helps to guide researchers in effectively leveraging
AI to enhance coding practices and education, accelerating scientific progress.
[COMMENTS]
23 pages, 7 figures, 3 boxes
[LINK]
http://arxiv.org/abs/2410.02156v2
[DATE]
2024-11-18 15:36:36+08:00
[CATEGORIES]
cs.CL
Mitigating Knowledge Conflicts in Language Model-Driven Question Answering
[AUTHORS]
Han Cao, Zhaoyang Zhang, Xiangtian Li, Chufan Wu, Hansong Zhang, Wenqing Zhang
[ABSTRACT]
Knowledge-aware sequence-to-sequence generation tasks such as document
question answering and abstract summarization typically require two types of
knowledge: encoded parametric knowledge and retrieved contextual information.
Previous work shows that improper correlation between parametric knowledge and
answers in the training set can cause the model to ignore input information at
test time, resulting in undesirable model behaviour such as over-stability and
hallucination. In this work, we argue that hallucination could be mitigated via
explicit correlation between input source and generated content. We focus on a
typical example of hallucination, entity-based knowledge conflicts in question
answering, where correlation of entities and their description at training time
hinders model behaviour during inference.
[LINK]
http://arxiv.org/abs/2411.11344v1
[DATE]
2024-11-18 15:33:10+08:00
[CATEGORIES]
cs.CL
Targeted Efficient Fine-tuning: Optimizing Parameter Updates with Data-Driven Sample Selection
[AUTHORS]
Ming Dong, Kang Xue, Bolong Zheng, Tingting He
[ABSTRACT]
Fine-tuning all parameters of Large Language Models (LLMs) is computationally
expensive. Parameter-Efficient Fine-Tuning (PEFT) methods address this by
selectively fine-tuning specific parameters. Most PEFT methods center on
selecting or introducing a set of parameters to be fine-tuned. However, few
methods consider the impact of data samples on parameter selection.
Representative data-driven methods include the FISH Mask based method, which
randomly selects a portion of data samples as a basis when selecting
parameters. However, this random data sample selection method cannot select
optimal parameters for unstable data distributions. In this work, we introduce
a data-centric approach and propose
the Iterative Range Decreasing (IRD) algorithm to optimize the sample-parameter
pair selection in FISH Mask. IRD iteratively refines the selection by
identifying subsets of samples and parameters exhibiting higher Fisher
information. We demonstrate the effectiveness and rationality of the proposed
strategy by conducting experiments on the GLUE benchmark. Experimental results show
our strategy optimizes the parameter selection and achieves preferable
performance over some typical baseline methods.
[LINK]
http://arxiv.org/abs/2403.08484v2
[DATE]
2024-11-18 15:32:16+08:00
[CATEGORIES]
cs.CL
Enhancing High-order Interaction Awareness in LLM-based Recommender Model
[AUTHORS]
Xinfeng Wang, Jin Cui, Fumiyo Fukumoto, Yoshimi Suzuki
[ABSTRACT]
Large language models (LLMs) have demonstrated prominent reasoning
capabilities in recommendation tasks by transforming them into text-generation
tasks. However, existing approaches either disregard or ineffectively model the
user-item high-order interactions. To this end, this paper presents an enhanced
LLM-based recommender (ELMRec). We enhance whole-word embeddings to
substantially enhance LLMs’ interpretation of graph-constructed interactions
for recommendations, without requiring graph pre-training. This finding may
inspire endeavors to incorporate rich knowledge graphs into LLM-based
recommenders via whole-word embedding. We also found that LLMs often recommend
items based on users’ earlier interactions rather than recent ones, and present
a reranking solution. Our ELMRec outperforms state-of-the-art (SOTA) methods in
both direct and sequential recommendations.
[COMMENTS]
Long paper accepted to EMNLP 2024 Main. 16 pages
[LINK]
http://arxiv.org/abs/2409.19979v3
[DATE]
2024-11-18 14:28:01+08:00
[CATEGORIES]
cs.CL
Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation
[AUTHORS]
Peng Shu, Junhao Chen, Zhengliang Liu, Hui Wang, Zihao Wu, Tianyang Zhong, Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Yifan Zhou, Constance Owl, Xiaoming Zhai, Ninghao Liu, Claudio Saunt, Tianming Liu
[ABSTRACT]
Large Language Models (LLMs) have demonstrated remarkable success across a
wide range of tasks and domains. However, their performance in low-resource
language translation, particularly when translating into these languages,
remains underexplored. This gap poses significant challenges, as linguistic
barriers hinder the cultural preservation and development of minority
communities. To address this issue, this paper introduces a novel
retrieval-based method that enhances translation quality for low-resource
languages by focusing on key terms, which involves translating keywords and
retrieving corresponding examples from existing data. To evaluate the
effectiveness of this method, we conducted experiments translating from English
into three low-resource languages: Cherokee, a critically endangered indigenous
language of North America; Tibetan, a historically and culturally significant
language in Asia; and Manchu, a language with few remaining speakers. Our
comparison with the zero-shot performance of GPT-4o and LLaMA 3.1 405B
highlights the significant challenges these models face when translating into
low-resource languages. In contrast, our retrieval-based method shows promise
in improving both word-level accuracy and overall semantic understanding by
leveraging existing resources more effectively.
[LINK]
http://arxiv.org/abs/2411.11295v1
[DATE]
2024-11-18 13:41:27+08:00
[CATEGORIES]
cs.CL
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
[AUTHORS]
Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang
[ABSTRACT]
Recent methodologies in LLM self-training mostly rely on LLM generating
responses and filtering those with correct output answers as training data.
This approach often yields a low-quality fine-tuning training set (e.g.,
incorrect plans or intermediate reasoning). In this paper, we develop a
reinforced self-training approach, called ReST-MCTS*, based on integrating
process reward guidance with tree search MCTS for collecting higher-quality
reasoning traces as well as per-step value to train policy and reward models.
ReST-MCTS* circumvents the per-step manual annotation typically used to train
process rewards by tree-search-based reinforcement learning: Given oracle final
correct answers, ReST-MCTS* is able to infer the correct process rewards by
estimating the probability this step can help lead to the correct answer. These
inferred rewards serve dual purposes: they act as value targets for further
refining the process reward model and also facilitate the selection of
high-quality traces for policy model self-training. We first show that the
tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior
LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same
search budget. We then show that by using traces searched by this tree-search
policy as training data, we can continuously enhance the three language models
for multiple iterations, and outperform other self-training algorithms such as
ReST$^\text{EM}$ and Self-Rewarding LM. We release all code at
https://github.com/THUDM/ReST-MCTS.
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.03816v3
[DATE]
2024-11-18 13:36:16+08:00
[CATEGORIES]
cs.CL
SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models
[AUTHORS]
Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, Jie Tang
[COMMENTS]
Accepted to NeurIPS D&B Track 2024
[LINK]
http://arxiv.org/abs/2401.07950v3
[DATE]
2024-11-18 13:30:50+08:00
[CATEGORIES]
cs.CL
Matching Patients to Clinical Trials with Large Language Models
[AUTHORS]
Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Fangyuan Chen, Changlin Gong, Dara Bracken-Clarke, Elisabetta Xue, Yifan Yang, Jimeng Sun, Zhiyong Lu
[ABSTRACT]
Patient recruitment is challenging for clinical trials. We introduce
TrialGPT, an end-to-end framework for zero-shot patient-to-trial matching with
large language models. TrialGPT comprises three modules: it first performs
large-scale filtering to retrieve candidate trials (TrialGPT-Retrieval); then
predicts criterion-level patient eligibility (TrialGPT-Matching); and finally
generates trial-level scores (TrialGPT-Ranking). We evaluate TrialGPT on three
cohorts of 183 synthetic patients with over 75,000 trial annotations.
TrialGPT-Retrieval can recall over 90% of relevant trials using less than 6% of
the initial collection. Manual evaluations on 1,015 patient-criterion pairs
show that TrialGPT-Matching achieves an accuracy of 87.3% with faithful
explanations, close to the expert performance. The TrialGPT-Ranking scores are
highly correlated with human judgments and outperform the best-competing models
by 43.8% in ranking and excluding trials. Furthermore, our user study reveals
that TrialGPT can reduce the screening time by 42.6% in patient recruitment.
Overall, these results have demonstrated promising opportunities for
patient-to-trial matching with TrialGPT.
[COMMENTS]
Nature Communications
[LINK]
http://arxiv.org/abs/2307.15051v5
[DATE]
2024-11-18 11:55:02+08:00
[CATEGORIES]
cs.CL
Suicide Risk Assessment on Social Media with Semi-Supervised Learning
[AUTHORS]
Max Lovitt, Haotian Ma, Song Wang, Yifan Peng
[ABSTRACT]
With social media communities increasingly becoming places where suicidal
individuals post and congregate, natural language processing presents an
exciting avenue for the development of automated suicide risk assessment
systems. However, past efforts suffer from a lack of labeled data and class
imbalances within the available labeled data. To accommodate this task’s
imperfect data landscape, we propose a semi-supervised framework that leverages
labeled (n=500) and unlabeled (n=1,500) data and expands upon the self-training
algorithm with a novel pseudo-label acquisition process designed to handle
imbalanced datasets. To further ensure pseudo-label quality, we manually verify
a subset of the pseudo-labeled data that was not predicted unanimously across
multiple trials of pseudo-label generation. We test various models to serve as
the backbone for this framework, ultimately deciding that RoBERTa performs the
best. Ultimately, by leveraging partially validated pseudo-labeled data in
addition to ground-truth labeled data, we substantially improve our model’s
ability to assess suicide risk from social media posts.
[COMMENTS]
Accepted for publication in the 2024 IEEE International Conference on
Big Data
[LINK]
http://arxiv.org/abs/2411.12767v1
[DATE]
2024-11-18 10:43:05+08:00
[CATEGORIES]
cs.CL
A Theoretical Understanding of Self-Correction through In-context Alignment
[AUTHORS]
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2405.18634v2
[DATE]
2024-11-18 10:42:23+08:00
[CATEGORIES]
cs.LG
cs.CL
Capturing Sparks of Abstraction for the ARC Challenge
[AUTHORS]
Martin Andrews
[ABSTRACT]
Excellent progress has been made recently in solving ARC Challenge problems.
However, it seems that new techniques may be required to push beyond 60%
accuracy. Even commercial Large Language Models (LLMs) struggle to ‘understand’
many of the problems (when given the input and output grids), which makes
discovering solutions by LLM-led program search somewhat futile.
In this work, LLM ‘understanding’ is attempted from a stronger starting
position: an LLM is given complete solutions to tasks in code, and then asked
to explain how the task is being solved at various levels of abstraction.
Specifically, the LLM was given code solutions implemented in arc-dsl-llm (an
LLM-legible version of Hodel’s arc-dsl) to obtain: (a) commented code; (b) code
refactored into reusable functional chunks; (c) problem solution steps; and (d)
high-level problem-solving tactics.
We demonstrate that ‘Sparks of Abstraction’ can be extracted from the LLM
output - in a form that could be used in downstream tasks with Local LLMs
eligible to enter the ARC Prize.
Both the arc-dsl-llm DSL framework (with the re-engineered solutions) and the
Gemini LLM-generated data (along with the generation code) are made Open
Source.
[COMMENTS]
Submitted as a paper entry for the 2024 ARC Prize
[LINK]
http://arxiv.org/abs/2411.11206v1
[DATE]
2024-11-18 07:40:00+08:00
[CATEGORIES]
cs.CL
cs.LG
Debiasing Watermarks for Large Language Models via Maximal Coupling
[AUTHORS]
Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang
[ABSTRACT]
Watermarking language models is essential for distinguishing between human
and machine-generated text and thus maintaining the integrity and
trustworthiness of digital communication. We present a novel green/red list
watermarking approach that partitions the token set into “green” and “red”
lists, subtly increasing the generation probability for green tokens. To
correct token distribution bias, our method employs maximal coupling, using a
uniform coin flip to decide whether to apply bias correction, with the result
embedded as a pseudorandom watermark signal. Theoretical analysis confirms this
approach’s unbiased nature and robust detection capabilities. Experimental
results show that it outperforms prior techniques by preserving text quality
while maintaining high detectability, and it demonstrates resilience to
targeted modifications aimed at improving text quality. This research provides
a promising watermarking solution for language models, balancing effective
detection with minimal impact on text quality.
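
The green/red list step described in this abstract can be illustrated with a minimal sketch. It covers only the baseline biasing idea (partition the vocabulary, then raise the probability of green tokens); it does not implement the paper's maximal-coupling correction. The hashing scheme, the `demo-key` secret, the `fraction` and `delta` parameters, and the toy logits are all assumptions made for illustration.

```python
import hashlib
import math

def green_list(vocab_size, prev_token, key="demo-key", fraction=0.5):
    """Pseudorandomly pick a 'green' subset of token ids, seeded by the previous token.

    The keyed SHA-256 ranking is a hypothetical choice, not the paper's scheme.
    """
    ranked = sorted(
        range(vocab_size),
        key=lambda tok: hashlib.sha256(f"{key}:{prev_token}:{tok}".encode()).digest(),
    )
    return set(ranked[: int(vocab_size * fraction)])

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def biased_probs(logits, green, delta=2.0):
    """Add delta to the logits of green tokens, then renormalize."""
    shifted = [x + delta if i in green else x for i, x in enumerate(logits)]
    return softmax(shifted)

# Toy next-token logits over an 8-token vocabulary (made-up values).
logits = [0.1, 1.2, -0.3, 0.8, 0.0, 0.5, -1.0, 0.3]
green = green_list(len(logits), prev_token=3)
base = softmax(logits)
marked = biased_probs(logits, green)
green_mass_before = sum(base[i] for i in green)
green_mass_after = sum(marked[i] for i in green)
```

In this sketch the shift in green-token mass is exactly the distribution bias that a debiasing scheme would need to remove.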
[LINK]
http://arxiv.org/abs/2411.11203v1
[DATE]
2024-11-18 07:36:37+08:00
[CATEGORIES]
cs.CL
cs.LG
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
[AUTHORS]
Ruosen Li, Ziming Luo, Xinya Du
[ABSTRACT]
Hallucinations in large language models (LLMs) pose significant challenges in
tasks requiring complex multi-step reasoning, such as mathematical
problem-solving. Existing approaches primarily detect the presence of
hallucinations but lack a nuanced understanding of their types and
manifestations. In this paper, we first introduce a comprehensive taxonomy that
categorizes the common hallucinations in mathematical reasoning tasks into six
types: fabrication, factual inconsistency, context inconsistency, instruction
inconsistency, logical inconsistency, and logical error. We then propose FG-PRM
(Fine-Grained Process Reward Model), an augmented model designed to detect and
mitigate hallucinations in a fine-grained, step-level manner. To address the
limitations of manually labeling training data, we propose an automated method
for generating fine-grained hallucination data using LLMs. By injecting
hallucinations into reasoning steps of correct solutions, we create a diverse
and balanced synthetic dataset for training FG-PRM, which consists of six
specialized Process Reward Models (PRMs), each tailored to detect a specific
hallucination type. Our FG-PRM demonstrates superior performance across two key
tasks: 1) Fine-grained hallucination detection: classifying hallucination types
for each reasoning step; and 2) Verification: ranking multiple LLM-generated
outputs to select the most accurate solution, mitigating reasoning
hallucinations. Our experiments show that FG-PRM outperforms ChatGPT-3.5 and
Claude-3 on fine-grained hallucination detection and substantially boosts the
performance of LLMs on GSM8K and MATH benchmarks.
[LINK]
http://arxiv.org/abs/2410.06304v2
[DATE]
2024-11-18 07:22:18+08:00
[CATEGORIES]
cs.CL
You can remove GPT2’s LayerNorm by fine-tuning
[AUTHORS]
Stefan Heimersheim
[ABSTRACT]
The LayerNorm (LN) layer in GPT-style transformer models has long been a
hindrance to mechanistic interpretability. LN is a crucial component required
to stabilize the training of large language models, and LN or the similar
RMSNorm have been used in practically all large language models based on the
transformer architecture. The non-linear nature of the LN layers is a hindrance
for mechanistic interpretability as it hinders interpretation of the residual
stream, and makes it difficult to decompose the model into circuits. Some
researchers have gone so far as to name “reasons interpretability researchers
hate layer norm.”
In this paper we show that it is possible to remove the LN layers from a
pre-trained GPT2-small model by fine-tuning on a fraction (500M tokens) of the
training data. We demonstrate that this LN-free model achieves similar
performance to the original model on the OpenWebText and ThePile datasets
(-0.05 cross-entropy loss), and the Hellaswag benchmark (-0.5% accuracy). We
provide our implementation at https://github.com/ApolloResearch/gpt2_noLN, and
fine-tuned GPT2-small models at
https://huggingface.co/apollo-research/gpt2_noLN.
Our work not only provides a simplified model for mechanistic
interpretability research, but also provides evidence that the LN layers, at
inference time, do not play a crucial role in transformer models.
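
The non-linearity that this abstract refers to can be checked numerically with a small sketch. It assumes a bare LayerNorm without the learned gain and bias, and the input vectors are made up; it is not code from the linked repository.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Two hypothetical residual-stream contributions.
a = [1.0, 2.0, 4.0]
b = [0.5, -1.0, 3.0]

# LN of the sum versus the sum of the LNs: a linear map would make these equal.
ln_of_sum = layer_norm([p + q for p, q in zip(a, b)])
sum_of_ln = [p + q for p, q in zip(layer_norm(a), layer_norm(b))]
additivity_gap = max(abs(p - q) for p, q in zip(ln_of_sum, sum_of_ln))

# Scaling the input leaves the output (almost) unchanged, so LN is not homogeneous either.
scale_gap = max(abs(p - q) for p, q in zip(layer_norm([2 * v for v in a]), layer_norm(a)))
```

Because `additivity_gap` is far from zero, the contribution of one component to the normalized output depends on every other component, which is one way to see why per-component attribution through LN is awkward.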
[COMMENTS]
Presented at the Attributing Model Behavior at Scale (ATTRIB) and
Interpretable AI: Past, Present, and Future workshops at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2409.13710v2
[DATE]
2024-11-18 06:32:53+08:00
[CATEGORIES]
cs.CL
cs.LG
DocNet: Semantic Structure in Inductive Bias Detection Models
[AUTHORS]
Jessica Zhu, Iain Cruickshank, Michel Cukier
[ABSTRACT]
News will have biases so long as people have opinions. It is increasingly
important for informed citizens to be able to identify bias as social media
becomes the primary entry point for news and partisan differences increase. If
people know the biases of the news they are consuming, they will be able to
take action to avoid polarizing echo chambers. In this paper, we explore an
often overlooked aspect of bias detection in documents: the semantic structure
of news articles. We present DocNet, a novel, inductive, and low-resource
document embedding and bias detection model that outperforms large language
models. We also demonstrate that the semantic structure of news articles from
opposing partisan sides, as represented in document-level graph embeddings,
has significant similarities. These results can be used to advance bias
detection in low-resource environments. Our code, data, and the corresponding
datasheet are made available at: https://anonymous.4open.science/r/DocNet/.
[LINK]
http://arxiv.org/abs/2406.10965v2
[DATE]
2024-11-18 01:30:24+08:00
[CATEGORIES]
cs.CL
ReasoningRank: Teaching Student Models to Rank through Reasoning-Based Knowledge Distillation
[AUTHORS]
Yuelyu Ji, Zhuochun Li, Rui Meng, Daqing He
[ABSTRACT]
Reranking documents based on their relevance to a given query is a critical
task in information retrieval. Traditional reranking methods often lack
transparency and rely on proprietary models, hindering reproducibility and
interpretability. We propose Reason-to-Rank (R2R), a novel open-source
reranking approach that enhances transparency by generating two types of
reasoning: direct relevance reasoning, which explains how a document addresses
the query, and comparison reasoning, which justifies the relevance of one
document over another. We leverage large language models (LLMs) as teacher
models to generate these explanations and distill this knowledge into smaller,
openly available student models. Our student models are trained to generate
meaningful reasoning and rerank documents, achieving competitive performance
across multiple datasets, including MSMARCO and BRIGHT. Experiments demonstrate
that R2R not only improves reranking accuracy but also provides valuable
insights into the decision-making process. By offering a structured and
interpretable solution with openly accessible resources, R2R aims to bridge the
gap between effectiveness and transparency in information retrieval, fostering
reproducibility and further research in the field.
[LINK]
http://arxiv.org/abs/2410.05168v2
[DATE]
2024-11-18 01:26:23+08:00
[CATEGORIES]
cs.CL
Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives
[AUTHORS]
Xinliang Frederick Zhang, Nick Beauchamp, Lu Wang
[ABSTRACT]
Reasoning about time and temporal relations is an integral aspect of human
cognition, essential for perceiving the world and navigating our experiences.
Though large language models (LLMs) have demonstrated impressive performance in
many reasoning tasks, temporal reasoning remains challenging due to its
intrinsic complexity. In this work, we first study an essential task of
temporal reasoning – temporal graph generation, to unveil LLMs’ inherent,
global reasoning capabilities. We show that this task presents great challenges
even for the most powerful LLMs, such as GPT-3.5/4. We also notice a
significant performance gap for small models (<10B), which lag behind LLMs by 50%.
Next, we study how to close this gap with a budget constraint, e.g., not using
model finetuning. We propose a new prompting technique tailored for temporal
reasoning, Narrative-of-Thought (NoT), that first converts the event set to a
Python class, then prompts a small model to generate a temporally grounded
narrative, guiding the final generation of a temporal graph. Extensive
experiments showcase the efficacy of NoT in improving various metrics. Notably,
NoT attains the highest F1 on the Schema-11 evaluation set, while securing an
overall F1 on par with GPT-3.5. NoT also achieves the best structural
similarity across the board, even compared with GPT-3.5/4. Our code is
available at https://github.com/launchnlp/NoT.
[COMMENTS]
EMNLP’24 Findings
[LINK]
http://arxiv.org/abs/2410.05558v2
[DATE]
2024-11-18 01:00:11+08:00
[CATEGORIES]
cs.CL
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
[AUTHORS]
Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen
[ABSTRACT]
Multimodal large language models (MLLMs) have shown impressive success across
modalities such as image, video, and audio in a variety of understanding and
generation tasks. However, current MLLMs are surprisingly poor at understanding
webpage screenshots and generating their corresponding HTML code. To address
this problem, we propose $\texttt{Web2Code}$, a benchmark consisting of a new
large-scale webpage-to-code dataset for instruction tuning and an evaluation
framework for the webpage understanding and HTML code translation abilities of
MLLMs. For dataset construction, we leverage pretrained LLMs to enhance
existing webpage-to-code datasets as well as generate a diverse pool of new
webpages rendered into images. Specifically, the inputs are webpage images and
instructions, while the responses are the webpage’s HTML code. We further
include diverse natural language QA pairs about the webpage content in the
responses to enable a more comprehensive understanding of the web content. To
evaluate model performance in these tasks, we develop an evaluation framework
for testing MLLMs’ abilities in webpage understanding and web-to-code
generation. Extensive experiments show that our proposed dataset is beneficial
not only to our proposed tasks but also in the general visual domain. We hope
our work will contribute to the development of general MLLMs suitable for
web-based content generation and task automation. Our data and code are
available at https://github.com/MBZUAI-LLM/web2code.
[COMMENTS]
NeurIPS 2024 Datasets and Benchmarks Camera-ready Version. Website at
https://mbzuai-llm.github.io/webpage2code/
[LINK]
http://arxiv.org/abs/2406.20098v2
[DATE]
2024-11-18 00:11:00+08:00
[CATEGORIES]
cs.CL
Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition
[AUTHORS]
Haoyi Jia, Abhilasha Dave, Julia Gonski, Ryan Herbst
[ABSTRACT]
To fully exploit the physics potential of current and future high energy
particle colliders, machine learning (ML) can be implemented in detector
electronics for intelligent data processing and acquisition. The implementation
of ML in real-time at colliders requires very low latencies that are
unachievable with a software-based approach, requiring optimization and
synthesis of ML algorithms for deployment on hardware. An analysis of neural
network inference efficiency is presented, focusing on the application of
collider trigger algorithms in field programmable gate arrays (FPGAs).
Trade-offs are evaluated between two frameworks, the SLAC Neural Network
Library (SNL) and hls4ml, in terms of resources and latency for different model
sizes. Results highlight the strengths and limitations of each approach,
offering valuable insights for optimizing real-time neural network deployments
at colliders. This work aims to guide researchers and engineers in selecting
the most suitable hardware and software configurations for real-time,
resource-constrained environments.
[COMMENTS]
12 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.11678v1
[DATE]
2024-11-18 23:59:30+08:00
[CATEGORIES]
cs.LG
Few-shot Model Extraction Attacks against Sequential Recommender Systems
[AUTHORS]
Hui Zhang, Fu Liu
[ABSTRACT]
Among adversarial attacks against sequential recommender systems, model
extraction attacks represent a method to attack sequential recommendation
models without prior knowledge. Existing research has primarily concentrated on
the adversary’s execution of black-box attacks through data-free model
extraction. However, a significant gap remains in the literature concerning the
development of surrogate models by adversaries with access to few-shot raw data
(10\% or even less). That is, the challenge of how to construct a surrogate model
with high functional similarity within the context of few-shot data scenarios
remains an issue that requires resolution. This study addresses this gap by
introducing a novel few-shot model extraction framework against sequential
recommenders, which is designed to construct a superior surrogate model with
the utilization of few-shot data. The proposed few-shot model extraction
framework is comprised of two components: an autoregressive augmentation
generation strategy and a bidirectional repair loss-facilitated model
distillation procedure. Specifically, to generate synthetic data that closely
approximate the distribution of raw data, the autoregressive augmentation
generation strategy integrates a probabilistic interaction sampler to extract
inherent dependencies and a synthesis determinant signal module to characterize
user behavioral patterns. Subsequently, a bidirectional repair loss, which targets
the discrepancies between the recommendation lists, is designed as an auxiliary
loss to rectify erroneous predictions from surrogate models, transferring
knowledge from the victim model to the surrogate model effectively. Experiments
on three datasets show that the proposed few-shot model extraction framework
yields superior surrogate models.
[LINK]
http://arxiv.org/abs/2411.11677v1
[DATE]
2024-11-18 23:57:14+08:00
[CATEGORIES]
cs.LG
Efficient and Robust Continual Graph Learning for Graph Classification in Biology
[AUTHORS]
Ding Zhang, Jane Downer, Can Chen, Ren Wang
[ABSTRACT]
Graph classification is essential for understanding complex biological
systems, where molecular structures and interactions are naturally represented
as graphs. Traditional graph neural networks (GNNs) perform well on static
tasks but struggle in dynamic settings due to catastrophic forgetting. We
present Perturbed and Sparsified Continual Graph Learning (PSCGL), a robust and
efficient continual graph learning framework for graph data classification,
specifically targeting biological datasets. We introduce a perturbed sampling
strategy to identify critical data points that contribute to model learning and
a motif-based graph sparsification technique to reduce storage needs while
maintaining performance. Additionally, our PSCGL framework inherently defends
against graph backdoor attacks, which is crucial for applications in sensitive
biological contexts. Extensive experiments on biological datasets demonstrate
that PSCGL not only retains knowledge across tasks but also enhances the
efficiency and robustness of graph classification models in biology.
[LINK]
http://arxiv.org/abs/2411.11668v1
[DATE]
2024-11-18 23:47:37+08:00
[CATEGORIES]
cs.LG
Modulating Language Model Experiences through Frictions
[AUTHORS]
Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt
[COMMENTS]
NeurIPS Workshop on Behavioral ML; non-archival
[LINK]
http://arxiv.org/abs/2407.12804v2
[DATE]
2024-11-18 23:41:24+08:00
[CATEGORIES]
cs.LG
Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence
[AUTHORS]
Vansh Bansal, Saptarshi Roy, Purnamrita Sarkar, Alessandro Rinaldo
[ABSTRACT]
Diffusion models have emerged as a powerful tool for image generation and
denoising. Typically, generative models learn a trajectory between the starting
noise distribution and the target data distribution. Recently, Liu et al.
(2023b) designed a novel alternative generative model, Rectified Flow (RF),
which aims to learn straight flow trajectories from noise to data using a
sequence of convex optimization problems with close ties to optimal transport.
If the trajectory is curved, one must use many Euler discretization steps or
novel strategies, such as exponential integrators, to achieve a satisfactory
generation quality. In contrast, RF has been shown to theoretically straighten
the trajectory through successive rectifications, reducing the number of
function evaluations (NFEs) while sampling. It has also been shown empirically
that RF may improve the straightness in two rectifications if one can solve the
underlying optimization problem within a sufficiently small error. In this
paper, we make two key theoretical contributions: 1) we provide the first
theoretical analysis of the Wasserstein distance between the sampling
distribution of RF and the target distribution. Our error rate is characterized
by the number of discretization steps and a new formulation of straightness
stronger than that in the original work. 2) under a mild regularity assumption,
we show that for a rectified flow from a Gaussian to any general target
distribution with finite first moment (e.g. mixture of Gaussians), two
rectifications are sufficient to achieve a straight flow, which is in line with
the previous empirical findings. Additionally, we also present empirical
results on both simulated and real datasets to validate our theoretical
findings.
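As a rough illustration of why straightness reduces the number of Euler steps (a toy sketch, not the paper's analysis), the snippet below integrates a one-dimensional ODE with forward Euler. The two velocity fields, `straight_velocity` and `curved_velocity`, are hypothetical examples chosen only so that the exact endpoints are known in closed form.

```python
import math


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt)
    return x


def straight_velocity(x, t):
    # Toy straight flow: each particle travels on the line
    # x_t = x0 + t * (x1 - x0) with endpoint x1 = 2 * x0 + 1,
    # so its velocity is constant along its own path.
    return (x + 1.0) / (1.0 + t)


def curved_velocity(x, t):
    # Toy curved flow: the velocity changes along the path,
    # with exact endpoint x(1) = e * x0.
    return x


# Straight flow: a single Euler step already lands on the endpoint 2 * x0 + 1.
one_step_straight = euler_sample(straight_velocity, 1.0, 1)

# Curved flow: the error shrinks only as the number of steps grows.
error_1_step = abs(euler_sample(curved_velocity, 1.0, 1) - math.e)
error_100_steps = abs(euler_sample(curved_velocity, 1.0, 100) - math.e)
```

In this toy setting the straight field needs one function evaluation, while the curved field needs many steps to reach comparable accuracy.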
[LINK]
http://arxiv.org/abs/2410.14949v2
[DATE]
2024-11-18 23:35:52+08:00
[CATEGORIES]
cs.LG
Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction
[AUTHORS]
Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jarvi Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Zhaofeng He, Jie Fu
[ABSTRACT]
Developing a generalist agent is a longstanding objective in artificial
intelligence. Previous efforts utilizing extensive offline datasets from
various tasks demonstrate remarkable performance in multitasking scenarios
within Reinforcement Learning. However, these works encounter challenges in
extending their capabilities to new tasks. Recent approaches integrate textual
guidance or visual trajectory into decision networks to provide task-specific
contextual cues, representing a promising direction. However, it is observed
that relying solely on textual guidance or visual trajectory is insufficient
for accurately conveying the contextual information of tasks. This paper
explores enhanced forms of task guidance for agents, enabling them to
comprehend gameplay instructions, thereby facilitating a “read-to-play”
capability. Drawing inspiration from the success of multimodal instruction
tuning in visual tasks, we treat the visual-based RL task as a long-horizon
vision task and construct a set of multimodal game instructions to incorporate
instruction tuning into a decision transformer. Experimental results
demonstrate that incorporating multimodal game instructions significantly
enhances the decision transformer’s multitasking and generalization
capabilities.
[LINK]
http://arxiv.org/abs/2402.04154v7
[DATE]
2024-11-18 23:31:52+08:00
[CATEGORIES]
cs.LG
Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Linear Regression
[AUTHORS]
Zelin He, Ying Sun, Jingyuan Liu, Runze Li
[ABSTRACT]
We consider the transfer learning problem in the high dimensional linear
regression setting, where the feature dimension is larger than the sample size.
To learn transferable information, which may vary across features or the source
samples, we propose an adaptive transfer learning method that can detect and
aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans)
transferable structures. We achieve this by employing a fused-penalty, coupled
with weights that can adapt according to the transferable structure. To choose
the weight, we propose a theoretically informed, data-driven procedure,
enabling F-AdaTrans to selectively fuse the transferable signals with the
target while filtering out non-transferable signals, and S-AdaTrans to obtain
the optimal combination of information transferred from each source sample. We
show that, with appropriately chosen weights, F-AdaTrans achieves a convergence
rate close to that of an oracle estimator with a known transferable structure,
and S-AdaTrans recovers existing near-minimax optimal rates as a special case.
The effectiveness of the proposed method is validated using both simulation and
real data, demonstrating favorable performance compared to the existing
methods.
[LINK]
http://arxiv.org/abs/2403.13565v2
[DATE]
2024-11-18 23:30:16+08:00
[CATEGORIES]
cs.LG
No-regret Exploration in Shuffle Private Reinforcement Learning
[AUTHORS]
Shaojie Bai, Mohammad Sadegh Talebi, Chengcheng Zhao, Peng Cheng, Jiming Chen
[ABSTRACT]
Differential privacy (DP) has recently been introduced into episodic
reinforcement learning (RL) to formally address user privacy concerns in
personalized services. Previous work mainly focuses on two trust models of DP:
the central model, where a central agent is responsible for protecting users’
sensitive data, and the (stronger) local model, where the protection occurs
directly on the user side. However, they either require a trusted central agent
or incur a significantly higher privacy cost, making them unsuitable for many
scenarios. This work introduces a trust model stronger than the central model
but with a lower privacy cost than the local model, leveraging the emerging
shuffle model of privacy. We present the first generic algorithm for
episodic RL under the shuffle model, where a trusted shuffler randomly permutes
a batch of users’ data before sending it to the central agent. We then
instantiate the algorithm using our proposed shuffle Privatizer, relying on a
shuffle private binary summation mechanism. Our analysis shows that the
algorithm achieves a near-optimal regret bound comparable to that of the
centralized model and significantly outperforms the local model in terms of
privacy cost.
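The shuffle pipeline described above (local randomization, a shuffler that permutes a batch, and a central analyzer) can be sketched for binary summation as follows. This is a generic randomized-response construction written for illustration; it is not the paper's shuffle Privatizer, and the names `local_randomizer`, `shuffler`, `estimate_sum`, and the parameter `keep_prob` are assumptions. No privacy parameter is calibrated here.

```python
import random


def local_randomizer(bit, keep_prob, rng):
    """Report the true bit with probability keep_prob, else a uniform random bit."""
    if rng.random() < keep_prob:
        return bit
    return rng.randint(0, 1)


def shuffler(reports, rng):
    """Return the reports in a uniformly random order, hiding who sent which one."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def estimate_sum(shuffled_reports, keep_prob):
    """Debias the noisy sum using E[report] = keep_prob * bit + (1 - keep_prob) / 2."""
    n = len(shuffled_reports)
    return (sum(shuffled_reports) - (1 - keep_prob) * n / 2) / keep_prob


rng = random.Random(0)
bits = [1] * 300 + [0] * 700          # hypothetical user data, true sum is 300
reports = [local_randomizer(b, 0.8, rng) for b in bits]
estimate = estimate_sum(shuffler(reports, rng), 0.8)
```

The analyzer only ever sees the permuted batch, so the order of the reports carries no information about individual users; the sum, and therefore the estimate, is unaffected by the permutation.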
[LINK]
http://arxiv.org/abs/2411.11647v1
[DATE]
2024-11-18 23:24:11+08:00
[CATEGORIES]
cs.LG
Scalable spectral representations for multi-agent reinforcement learning in network MDPs
[AUTHORS]
Zhaolin Ren, Runyu Zhang, Bo Dai, Na Li
[ABSTRACT]
Network Markov Decision Processes (MDPs), a popular model for multi-agent
control, pose a significant challenge to efficient learning due to the
exponential growth of the global state-action space with the number of agents.
In this work, utilizing the exponential decay property of network dynamics, we
first derive scalable spectral local representations for network MDPs, which
induce a network linear subspace for the local $Q$-function of each agent.
Building on these local spectral representations, we design a scalable
algorithmic framework for continuous state-action network MDPs, and provide
end-to-end guarantees for the convergence of our algorithm. Empirically, we
validate the effectiveness of our scalable representation-based approach on two
benchmark problems, and demonstrate the advantages of our approach over generic
function approximation approaches to representing the local $Q$-functions.
[COMMENTS]
Updated title, corrected an issue with an author’s name
[LINK]
http://arxiv.org/abs/2410.17221v2
[DATE]
2024-11-18 23:21:40+08:00
[CATEGORIES]
cs.LG
Thermodynamic Transferability in Coarse-Grained Force Fields using Graph Neural Networks
[AUTHORS]
Emily Shinkle, Aleksandra Pachalieva, Riti Bahl, Sakib Matin, Brendan Gifford, Galen T. Craven, Nicholas Lubbers
[ABSTRACT]
Coarse-graining is a molecular modeling technique in which an atomistic
system is represented in a simplified fashion that retains the most significant
system features that contribute to a target output, while removing the degrees
of freedom that are less relevant. This reduction in model complexity allows
coarse-grained molecular simulations to reach increased spatial and temporal
scales compared to corresponding all-atom models. A core challenge in
coarse-graining is to construct a force field that represents the interactions
in the new representation in a way that preserves the atomistic-level
properties. Many approaches to building coarse-grained force fields have
limited transferability between different thermodynamic conditions as a result
of averaging over internal fluctuations at a specific thermodynamic state
point. Here, we use a graph-convolutional neural network architecture, the
Hierarchically Interacting Particle Neural Network with Tensor Sensitivity
(HIP-NN-TS), to develop a highly automated training pipeline for coarse-grained
force fields which allows for studying the transferability of coarse-grained
models based on the force-matching approach. We show that this approach not
only yields highly accurate force fields, but also that these force fields are
more transferable through a variety of thermodynamic conditions. These results
illustrate the potential of machine learning techniques such as graph neural
networks to improve the construction of transferable coarse-grained force
fields.
[COMMENTS]
Post-referee revisions. Accepted by Journal of Chemical Theory and
Computation (JCTC). 46 pages, 10 figures + TOC figure + SI (19 pages, 6
figures)
[LINK]
http://arxiv.org/abs/2406.12112v2
[DATE]
2024-11-18 23:21:31+08:00
[CATEGORIES]
cs.LG
Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators
[AUTHORS]
Yan Shuo Tan, Jason M. Klusowski, Krishnakumar Balasubramanian
[ABSTRACT]
Models based on recursive adaptive partitioning such as decision trees and
their ensembles are popular for high-dimensional regression as they can
potentially avoid the curse of dimensionality. Because empirical risk
minimization (ERM) is computationally infeasible, these models are typically
trained using greedy algorithms. Although effective in many cases, these
algorithms have been empirically observed to get stuck at local optima. We
explore this phenomenon in the context of learning sparse regression functions
over $d$ binary features, showing that when the true regression function $f^*$
does not satisfy Abbe et al. (2022)’s Merged Staircase Property (MSP), greedy
training requires $\exp(\Omega(d))$ samples to achieve low estimation error.
Conversely, when $f^*$ does satisfy MSP, greedy training can attain small
estimation error with only $O(\log d)$ samples. This dichotomy mirrors that of
two-layer neural networks trained with stochastic gradient descent (SGD) in the
mean-field regime, thereby establishing a head-to-head comparison between
SGD-trained neural networks and greedy recursive partitioning estimators.
Furthermore, ERM-trained recursive partitioning estimators achieve low
estimation error with $O(\log d)$ samples irrespective of whether $f^*$
satisfies MSP, thereby demonstrating a statistical-computational trade-off for
greedy training. Our proofs are based on a novel interpretation of greedy
recursive partitioning using stochastic process theory and a coupling technique
that may be of independent interest.
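The intuition behind greedy training getting stuck can be seen in a small sketch (an illustration only, not the paper's proof or its exact setting). For an XOR target over binary features, no single split reduces the variance, so a greedy criterion has nothing to guide it; for a staircase-like target, the first feature gives a clear gain. The helper names and the two toy targets are assumptions made for this example.

```python
from itertools import product


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(points, targets, feature):
    """Variance reduction obtained by splitting on one binary feature."""
    left = [y for x, y in zip(points, targets) if x[feature] == 0]
    right = [y for x, y in zip(points, targets) if x[feature] == 1]
    n = len(targets)
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
    return variance(targets) - weighted


d = 4
points = list(product([0, 1], repeat=d))    # all 2^d binary inputs

# Toy target with no single informative feature: x0 XOR x1.
xor_targets = [x[0] ^ x[1] for x in points]
# Toy staircase-like target: x0 + x0 * x1.
staircase_targets = [x[0] + x[0] * x[1] for x in points]

xor_gains = [impurity_decrease(points, xor_targets, j) for j in range(d)]
staircase_gains = [impurity_decrease(points, staircase_targets, j) for j in range(d)]
best_staircase_feature = max(range(d), key=lambda j: staircase_gains[j])
```

Every entry of `xor_gains` is zero, so the relevant features look no better than the irrelevant ones, whereas `staircase_gains` singles out feature 0 as the first split.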
[LINK]
http://arxiv.org/abs/2411.04394v2
[DATE]
2024-11-18 23:18:54+08:00
[CATEGORIES]
cs.LG
Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation
[AUTHORS]
Hanieh Shojaei Miandashti, Qianqian Zou, Claus Brenner
[ABSTRACT]
Reliable deep learning models require not only accurate predictions but also
well-calibrated confidence estimates to ensure dependable uncertainty
estimation. This is crucial in safety-critical applications like autonomous
driving, which depend on rapid and precise semantic segmentation of LiDAR point
clouds for real-time 3D scene understanding. In this work, we introduce a
sampling-free approach for estimating well-calibrated confidence values for
classification tasks, achieving alignment with true classification accuracy and
significantly reducing inference time compared to sampling-based methods. Our
evaluation using the Adaptive Calibration Error (ACE) metric for LiDAR semantic
segmentation shows that our approach maintains well-calibrated confidence
values while achieving increased processing speed compared to a sampling
baseline. Additionally, reliability diagrams reveal that our method produces
underconfident rather than overconfident predictions, an advantage for
safety-critical applications. Our sampling-free approach offers well-calibrated
and time-efficient predictions for LiDAR scene semantic segmentation.
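For readers unfamiliar with adaptive calibration metrics, the following is a simplified sketch of an adaptive-binning calibration error: samples are sorted by confidence, split into equal-mass bins, and the gap between accuracy and mean confidence is averaged over bins. This top-label variant and the function name `adaptive_calibration_error` are assumptions for illustration; the abstract does not specify the exact ACE configuration used in the paper.

```python
def adaptive_calibration_error(confidences, correct, num_bins):
    """Average |accuracy - mean confidence| over equal-mass confidence bins."""
    # Sort by confidence only; the sort is stable, so ties keep input order.
    pairs = sorted(zip(confidences, correct), key=lambda pair: pair[0])
    n = len(pairs)
    gaps = []
    for b in range(num_bins):
        # Equal-mass bins: each bin holds roughly n / num_bins samples.
        chunk = pairs[b * n // num_bins:(b + 1) * n // num_bins]
        if not chunk:
            continue
        mean_conf = sum(conf for conf, _ in chunk) / len(chunk)
        accuracy = sum(1 for _, ok in chunk if ok) / len(chunk)
        gaps.append(abs(accuracy - mean_conf))
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical predictions: half of them are correct in both cases.
outcomes = [True, False, True, False]
calibrated = adaptive_calibration_error([0.5, 0.5, 0.5, 0.5], outcomes, 2)
overconfident = adaptive_calibration_error([1.0, 1.0, 1.0, 1.0], outcomes, 2)
```

A model that reports 0.5 confidence while being right half the time scores zero error here, whereas reporting full confidence on the same outcomes is penalized.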
[LINK]
http://arxiv.org/abs/2411.11935v1
[DATE]
2024-11-18 23:13:20+08:00
[CATEGORIES]
cs.LG
DEFT: Efficient Fine-Tuning of Diffusion Models by Learning the Generalised $h$-transform
[AUTHORS]
Alexander Denker, Francisco Vargas, Shreyas Padhy, Kieran Didi, Simon Mathis, Vincent Dutordoir, Riccardo Barbano, Emile Mathieu, Urszula Julia Komorowska, Pietro Lio
[ABSTRACT]
Generative modelling paradigms based on denoising diffusion processes have
emerged as a leading candidate for conditional sampling in inverse problems. In
many real-world applications, we often have access to large, expensively
trained unconditional diffusion models, which we aim to exploit for improving
conditional sampling. Most recent approaches are motivated heuristically and
lack a unifying framework, obscuring connections between them. Further, they
often suffer from issues such as being very sensitive to hyperparameters, being
expensive to train or needing access to weights hidden behind a closed API. In
this work, we unify conditional training and sampling using the mathematically
well-understood Doob’s h-transform. This new perspective allows us to unify
many existing methods under a common umbrella. Under this framework, we propose
DEFT (Doob’s h-transform Efficient FineTuning), a new approach for conditional
generation that simply fine-tunes a very small network to quickly learn the
conditional $h$-transform, while keeping the larger unconditional network
unchanged. DEFT is much faster than existing baselines while achieving
state-of-the-art performance across a variety of linear and non-linear
benchmarks. On image reconstruction tasks, we achieve speedups of up to
1.6$\times$, while having the best perceptual quality on natural images and
reconstruction performance on medical images. Further, we also provide initial
experiments on protein motif scaffolding and outperform reconstruction guidance
methods.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2312.09236
[LINK]
http://arxiv.org/abs/2406.01781v3
[DATE]
2024-11-18 23:11:11+08:00
[CATEGORIES]
cs.LG
ST-Tree with Interpretability for Multivariate Time Series Classification
[AUTHORS]
Mingsen Du, Yanxuan Wei, Yingxia Tang, Xiangwei Zheng, Shoushui Wei, Cun Ji
[ABSTRACT]
Multivariate time series classification is of great importance in practical
applications and is a challenging task. Deep neural network models
such as Transformers exhibit high accuracy in multivariate time series
classification but lack interpretability and fail to provide insights into the
decision-making process. On the other hand, traditional approaches based on
decision tree classifiers offer clear decision processes but relatively lower
accuracy. Swin Transformer (ST) addresses these issues by leveraging
self-attention mechanisms to capture both fine-grained local patterns and
global patterns. It can also model multi-scale feature representation learning,
thereby providing a more comprehensive representation of time series features.
To tackle the aforementioned challenges, we propose ST-Tree with
interpretability for multivariate time series classification. Specifically, the
ST-Tree model combines ST as the backbone network with an additional neural
tree model. This integration allows us to fully leverage the advantages of ST
in learning time series context while providing interpretable decision
processes through the neural tree. This enables researchers to gain clear
insights into the model’s decision-making process and extract meaningful
interpretations. Through experimental evaluations on 10 UEA datasets, we
demonstrate that the ST-Tree model improves accuracy in multivariate time
series classification tasks and provides interpretability through visualizing
the decision-making process across different datasets.
[COMMENTS]
Submitted on May 15, 2024, major revisions on Aug 31, 2024
[LINK]
http://arxiv.org/abs/2411.11620v1
[DATE]
2024-11-18 22:49:12+08:00
[CATEGORIES]
cs.LG
On the physics of nested Markov models: a generalized probabilistic theory perspective
[AUTHORS]
Xingjian Zhang, Yuhao Wang
[ABSTRACT]
Determining potential probability distributions with a given causal graph is
vital for causality studies. To bypass the difficulty in characterizing latent
variables in a Bayesian network, the nested Markov model provides an elegant
algebraic approach by listing exactly all the equality constraints on the
observed variables. However, this algebraically motivated causal model
comprises distributions outside Bayesian networks, and its physical
interpretation remains vague. In this work, we inspect the nested Markov model
through the lens of generalized probabilistic theory, an axiomatic framework to
describe general physical theories. We prove that all the equality constraints
defining the nested Markov model hold valid theory-independently. Yet, we show
this model generally contains distributions not implementable even within such
relaxed physical theories subjected to merely the relativity principles and
mild probabilistic rules. To interpret the origin of such a gap, we establish a
new causal model that defines valid distributions as projected from a
high-dimensional Bell-type causal structure. The new model unveils inequality
constraints induced by relativity principles, or equivalently high-dimensional
conditional independences, which are absent in the nested Markov model.
Nevertheless, we also notice that the restrictions on states and measurements
introduced by the generalized probabilistic theory framework can pose
additional inequality constraints beyond the new causal model. As a by-product,
we discover a new causal structure exhibiting strict gaps between the
distribution sets of a Bayesian network, generalized probabilistic theories,
and the nested Markov model. We anticipate our results will enlighten further
explorations on the unification of algebraic and physical perspectives of
causality.
[COMMENTS]
21 pages, 5 figures, 5 tables; Comments are welcome!
[LINK]
http://arxiv.org/abs/2411.11614v1
[DATE]
2024-11-18 22:40:58+08:00
[CATEGORIES]
cs.LG
Feature Selection for Network Intrusion Detection
[AUTHORS]
Charles Westphal, Stephen Hailes, Mirco Musolesi
[ABSTRACT]
Network Intrusion Detection (NID) remains a key area of research within the
information security community, while also being relevant to Machine Learning
(ML) practitioners. The latter generally aim to detect attacks using network
features, which have been extracted from raw network data typically using
dimensionality reduction methods, such as principal component analysis (PCA).
However, PCA is not able to assess the relevance of features for the task at
hand. Consequently, the features available are of varying quality, with some
being entirely non-informative. From this, two major drawbacks arise. Firstly,
trained and deployed models have to process large amounts of unnecessary data,
therefore draining potentially costly resources. Secondly, the noise caused by
the presence of irrelevant features can, in some cases, impede a model’s
ability to detect an attack. In order to deal with these challenges, we present
Feature Selection for Network Intrusion Detection (FSNID), a novel
information-theoretic method that facilitates the exclusion of non-informative
features when detecting network intrusions. The proposed method is based on
function approximation using a neural network, which enables a version of our
approach that incorporates a recurrent layer. Consequently, this version
uniquely enables the integration of temporal dependencies. Through an extensive
set of experiments, we demonstrate that the proposed method selects a
significantly reduced feature set, while maintaining NID performance. Code will
be made available upon publication.
[LINK]
http://arxiv.org/abs/2411.11603v1
[DATE]
2024-11-18 22:25:55+08:00
[CATEGORIES]
cs.LG
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
[AUTHORS]
Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler
[ABSTRACT]
It has been a long-standing research goal to endow robot hands with
human-level dexterity. Bi-manual robot piano playing constitutes a task that
combines challenges from dynamic tasks, such as generating fast yet precise
motions, with slower but contact-rich manipulation problems. Although
reinforcement learning based approaches have shown promising results in
single-task performance, these methods struggle in a multi-song setting. Our
work aims to close this gap and, thereby, enable imitation learning approaches
for robot piano playing at scale. To this end, we introduce the Robot Piano 1
Million (RP1M) dataset, containing bi-manual robot piano playing motion data of
more than one million trajectories. We formulate finger placements as an
optimal transport problem, thus enabling automatic annotation of vast amounts
of unlabeled songs. Benchmarking existing imitation learning approaches shows
that such approaches reach state-of-the-art robot piano playing performance by
leveraging RP1M.
[COMMENTS]
Accepted by Conference on Robot Learning (CoRL) 2024. Project
Website: https://rp1m.github.io/
[LINK]
http://arxiv.org/abs/2408.11048v2
[DATE]
2024-11-18 22:14:22+08:00
[CATEGORIES]
cs.LG
Generative Spatio-temporal GraphNet for Transonic Wing Pressure Distribution Forecasting
[AUTHORS]
Gabriele Immordino, Andrea Vaiuso, Andrea Da Ronch, Marcello Righi
[ABSTRACT]
This study presents a framework for predicting unsteady transonic wing
pressure distributions, integrating an autoencoder architecture with graph
convolutional networks and graph-based temporal layers to model time
dependencies. The framework compresses high-dimensional pressure distribution
data into a lower-dimensional latent space using an autoencoder, ensuring
efficient data representation while preserving essential features. Within this
latent space, graph-based temporal layers are employed to predict future wing
pressures based on past data, effectively capturing temporal dependencies and
improving predictive accuracy. This combined approach leverages the strengths
of autoencoders for dimensionality reduction, graph convolutional networks for
handling unstructured grid data, and temporal layers for modeling time-based
sequences. The effectiveness of the proposed framework is validated through its
application to the Benchmark Super Critical Wing test case, achieving accuracy
comparable to computational fluid dynamics, while significantly reducing
prediction time. This framework offers a scalable, computationally efficient
solution for the aerodynamic analysis of unsteady phenomena.
[LINK]
http://arxiv.org/abs/2411.11592v1
[DATE]
2024-11-18 22:10:20+08:00
[CATEGORIES]
cs.LG
Robust Causal Analysis of Linear Cyclic Systems With Hidden Confounders
[AUTHORS]
Boris Lorbeer
[ABSTRACT]
We live in a world full of complex systems which we need to improve our
understanding of. To accomplish this, purely probabilistic investigations are
often not enough. They are only the first step and must be followed by learning
the system’s underlying mechanisms. This is what the discipline of causality is
concerned with. Many of those complex systems contain feedback loops which
means that our methods have to allow for cyclic causal relations. Furthermore,
systems are rarely sufficiently isolated, which means that there are usually
hidden confounders, i.e., unmeasured variables that each causally affects more
than one measured variable. Finally, data is often distorted by contaminating
processes, and we need to apply methods that are robust against such
distortions. That’s why we consider the robustness of LLC, see \cite{llc}, one
of the few causal analysis methods that can deal with cyclic models with hidden
confounders. Following a theoretical analysis of LLC’s robustness properties,
we also provide robust extensions of LLC. To facilitate reproducibility and
further research in this field, we make the source code publicly available.
[COMMENTS]
18 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.11590v1
[DATE]
2024-11-18 22:09:01+08:00
[CATEGORIES]
cs.LG
PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning
[AUTHORS]
Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu
[ABSTRACT]
Designing generalizable agents capable of adapting to diverse embodiments has
attracted significant attention in Reinforcement Learning (RL), which is
critical for deploying RL agents in various real-world applications. Previous
Cross-Embodiment RL approaches have focused on transferring knowledge across
embodiments within specific tasks. These methods often result in knowledge
tightly coupled with those tasks and fail to adequately capture the distinct
characteristics of different embodiments. To address this limitation, we
introduce the notion of Cross-Embodiment Unsupervised RL (CEURL), which
leverages unsupervised learning to enable agents to acquire embodiment-aware
and task-agnostic knowledge through online interactions within reward-free
environments. We formulate CEURL as a novel Controlled Embodiment Markov
Decision Process (CE-MDP) and systematically analyze CEURL’s pre-training
objectives under CE-MDP. Based on these analyses, we develop a novel algorithm
Pre-trained Embodiment-Aware Control (PEAC) for handling CEURL, incorporating
an intrinsic reward function specifically designed for cross-embodiment
pre-training. PEAC not only provides an intuitive optimization strategy for
cross-embodiment pre-training but also can integrate flexibly with existing
unsupervised RL methods, facilitating cross-embodiment exploration and skill
discovery. Extensive experiments in both simulated (e.g., DMC and Robosuite)
and real-world environments (e.g., legged locomotion) demonstrate that PEAC
significantly improves adaptation performance and cross-embodiment
generalization, demonstrating its effectiveness in overcoming the unique
challenges of CEURL. The project page and code are available at
https://yingchengyang.github.io/ceurl.
[COMMENTS]
NeurIPS24
[LINK]
http://arxiv.org/abs/2405.14073v2
[DATE]
2024-11-18 22:06:10+08:00
[CATEGORIES]
cs.LG
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures
[AUTHORS]
Christina Giannoula, Peiming Yang, Ivan Fernandez, Jiacheng Yang, Sankeerth Durvasula, Yu Xin Li, Mohammad Sadrosadati, Juan Gomez Luna, Onur Mutlu, Gennady Pekhimenko
[ABSTRACT]
Graph Neural Networks (GNNs) are emerging ML models for analyzing
graph-structured data. GNN execution involves both compute-intensive and
memory-intensive kernels; the latter dominate the total execution
time, being significantly bottlenecked by data movement between memory and
processors. Processing-In-Memory (PIM) systems can alleviate this data movement
bottleneck by placing simple processors near or inside memory arrays. In
this work, we introduce PyGim, an efficient ML library that accelerates GNNs on
real PIM systems. We propose intelligent parallelization techniques for
memory-intensive kernels of GNNs tailored for real PIM systems, and develop
a handy Python API for them. We provide hybrid GNN execution, in which the
compute-intensive and memory-intensive kernels are executed in
processor-centric and memory-centric computing systems, respectively. We
extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using
emerging GNN models, and demonstrate that it outperforms its state-of-the-art
CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource
utilization than CPU and GPU systems. Our work provides useful recommendations
for software, system and hardware designers. PyGim is publicly available at
https://github.com/CMU-SAFARI/PyGim.
[LINK]
http://arxiv.org/abs/2402.16731v6
[DATE]
2024-11-18 22:05:29+08:00
[CATEGORIES]
cs.LG
Hybrid Data-Driven SSM for Interpretable and Label-Free mmWave Channel Prediction
[AUTHORS]
Yiyong Sun, Jiajun He, Zhidi Lin, Wenqiang Pu, Feng Yin, Hing Cheung So
[ABSTRACT]
Accurate prediction of mmWave time-varying channels is essential for
mitigating the issue of channel aging in complex scenarios owing to high user
mobility. Existing channel prediction methods have limitations: classical
model-based methods often struggle to track highly nonlinear channel dynamics
due to limited expert knowledge, while emerging data-driven methods typically
require substantial labeled data for effective training and often lack
interpretability. To address these issues, this paper proposes a novel hybrid
method that integrates a data-driven neural network into a conventional
model-based workflow based on a state-space model (SSM), implicitly tracking
complex channel dynamics from data without requiring precise expert knowledge.
Additionally, a novel unsupervised learning strategy is developed to train the
embedded neural network solely with unlabeled data. Theoretical analyses and
ablation studies are conducted to interpret the enhanced benefits gained from
the hybrid integration. Numerical simulations based on the 3GPP mmWave channel
model corroborate the superior prediction accuracy of the proposed method,
compared to state-of-the-art methods that are either purely model-based or
data-driven. Furthermore, extensive experiments validate its robustness against
various challenging factors, including, among others, severe channel variations
and high noise levels.
[LINK]
http://arxiv.org/abs/2411.11576v1
[DATE]
2024-11-18 21:54:44+08:00
[CATEGORIES]
cs.LG
Data-driven model reconstruction for nonlinear wave dynamics
[AUTHORS]
Ekaterina Smolina, Lev Smirnov, Daniel Leykam, Franco Nori, Daria Smirnova
[ABSTRACT]
The use of machine learning to predict wave dynamics is a topic of growing
interest, but commonly-used deep learning approaches suffer from a lack of
interpretability of the trained models. Here we present an interpretable
machine learning framework for analyzing the nonlinear evolution dynamics of
optical wavepackets in complex wave media. We use sparse regression to reduce
microscopic discrete lattice models to simpler effective continuum models which
can accurately describe the dynamics of the wavepacket envelope. We apply our
approach to valley-Hall domain walls in honeycomb photonic lattices of
laser-written waveguides with Kerr-type nonlinearity and different boundary
shapes. The reconstructed equations accurately reproduce the linear dispersion
and nonlinear effects including self-steepening and self-focusing. This scheme
is proven free of the a priori limitations imposed by the underlying hierarchy
of scales traditionally employed in asymptotic analytical methods. It
represents a powerful interpretable machine learning technique of interest for
advancing design capabilities in photonics and framing the complex
interaction-driven dynamics in various topological materials.
[COMMENTS]
6 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.11556v1
[DATE]
2024-11-18 21:17:10+08:00
[CATEGORIES]
cs.LG
Hierarchical-Graph-Structured Edge Partition Models for Learning Evolving Community Structure
[AUTHORS]
Xincan Yu, Sikun Yang
[ABSTRACT]
We propose a novel dynamic network model to capture evolving latent
communities within temporal networks. To achieve this, we decompose each
observed dynamic edge between vertices using a Poisson-gamma edge partition
model, assigning each vertex to one or more latent communities through
nonnegative vertex-community memberships. Specifically, hierarchical
transition kernels are employed to model the interactions between these latent
communities in the observed temporal network. A hierarchical graph prior is
placed on the transition structure of the latent communities, allowing us to
model how they evolve and interact over time. Consequently, our dynamic network
enables the inferred community structure to merge, split, and interact with one
another, providing a comprehensive understanding of complex network dynamics.
Experiments on various real-world network datasets demonstrate that the
proposed model not only effectively uncovers interpretable latent structures
but also surpasses other state-of-the-art dynamic network models in the tasks
of link prediction and community detection.
[LINK]
http://arxiv.org/abs/2411.11536v1
[DATE]
2024-11-18 20:48:15+08:00
[CATEGORIES]
cs.LG
SeqProFT: Applying LoRA Finetuning for Sequence-only Protein Property Predictions
[AUTHORS]
Shuo Zhang, Jian K. Liu
[ABSTRACT]
Protein language models (PLMs) are capable of learning the relationships
between protein sequences and functions by treating amino acid sequences as
textual data in a self-supervised manner. However, fine-tuning these models
typically demands substantial computational resources and time, with results
that may not always be optimized for specific tasks. To overcome these
challenges, this study employs the LoRA method to perform end-to-end
fine-tuning of the ESM-2 model specifically for protein property prediction
tasks, utilizing only sequence information. Additionally, a multi-head
attention mechanism is integrated into the downstream network to combine
sequence features with contact map information, thereby enhancing the model’s
comprehension of protein sequences. Experimental results on extensive
classification and regression tasks demonstrate that the fine-tuned model
achieves strong performance and faster convergence.
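
As a rough illustration of the low-rank update behind LoRA, the sketch below adds a scaled product of two small matrices to a frozen weight matrix. The helper names, the initialization scale, and the use of plain Python lists are assumptions made for illustration; this is not the SeqProFT or ESM-2 code.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_init(d_out, d_in, rank, seed=0):
    """A gets small random values, B starts at zero, so the update starts at zero."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return A, B


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / rank) * B @ A; W itself stays frozen."""
    scale = alpha / len(A)
    delta = matmul(B, A)
    return [
        [W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
        for i in range(len(W))
    ]


def trainable_params(d_out, d_in, rank):
    """Number of trainable entries in A and B combined."""
    return rank * (d_in + d_out)
```

For a 4 x 6 layer with rank 2, only 20 entries are trained instead of 24; the saving grows quickly with layer size, which is the source of the reduced fine-tuning cost.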
[LINK]
http://arxiv.org/abs/2411.11530v1
[DATE]
2024-11-18 20:40:39+08:00
[CATEGORIES]
cs.LG
Unpicking Data at the Seams: VAEs, Disentanglement and Independent Components
[AUTHORS]
Carl Allen
[ABSTRACT]
Disentanglement, or identifying salient statistically independent factors of
the data, is of interest in many areas of machine learning and statistics, with
relevance to synthetic data generation with controlled properties, robust
classification of features, parsimonious encoding, and a greater understanding
of the generative process underlying the data. Disentanglement arises in
several generative paradigms, including Variational Autoencoders (VAEs),
Generative Adversarial Networks and diffusion models. Particular progress has
recently been made in understanding disentanglement in VAEs, where the choice
of diagonal posterior covariance matrices is suggested to promote mutual
orthogonality between columns of the decoder’s Jacobian. We continue this
thread to show how this linear independence translates to statistical
independence, completing the chain in understanding how the VAE’s objective
identifies independent components of, or disentangles, the data.
[LINK]
http://arxiv.org/abs/2410.22559v2
[DATE]
2024-11-18 20:36:04+08:00
[CATEGORIES]
cs.LG
Preempting Text Sanitization Utility in Resource-Constrained Privacy-Preserving LLM Interactions
[AUTHORS]
Robin Carpentier, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Dali Kaafar
[ABSTRACT]
Individuals have been increasingly interacting with online Large Language
Models (LLMs), both in their work and personal lives. These interactions raise
privacy issues as the LLMs are typically hosted by third parties who can gather
a variety of sensitive information about users and their companies. Text
Sanitization techniques have been proposed in the literature and can be used to
sanitize user prompts before sending them to the LLM. However, sanitization has
an impact on the downstream task performed by the LLM, and often to such an
extent that it leads to unacceptable results for the user. This is not just a
minor annoyance: it has clear monetary consequences, as LLM services charge on
a per-use basis, and it wastes a great amount of computing resources. We propose
an architecture leveraging a Small Language Model (SLM) at the user-side to
help estimate the impact of sanitization on a prompt before it is sent to the
LLM, thus preventing resource losses.
Our evaluation of this architecture revealed a significant problem with text
sanitization based on Differential Privacy, on which we want to draw the
attention of the community for further investigation.
[LINK]
http://arxiv.org/abs/2411.11521v1
[DATE]
2024-11-18 20:31:22+08:00
[CATEGORIES]
cs.LG
A Pre-Trained Graph-Based Model for Adaptive Sequencing of Educational Documents
[AUTHORS]
Jean Vassoyan, Anan Schütt, Jill-Jênn Vie, Arun-Balajiee Lekshmi-Narayanan, Elisabeth André, Nicolas Vayatis
[ABSTRACT]
Massive Open Online Courses (MOOCs) have greatly contributed to making
education more accessible. However, many MOOCs maintain a rigid,
one-size-fits-all structure that fails to address the diverse needs and
backgrounds of individual learners. Learning path personalization aims to
address this limitation, by tailoring sequences of educational content to
optimize individual student learning outcomes. Existing approaches, however,
often require either massive student interaction data or extensive expert
annotation, limiting their broad application. In this study, we introduce a
novel data-efficient framework for learning path personalization that operates
without expert annotation. Our method employs a flexible recommender system
pre-trained with reinforcement learning on a dataset of raw course materials.
Through experiments on semi-synthetic data, we show that this pre-training
stage substantially improves data-efficiency in a range of adaptive learning
scenarios featuring new educational materials. This opens up new perspectives
for the design of foundation models for adaptive learning.
[COMMENTS]
NeurIPS 2024 Workshop on Large Foundation Models for Educational
Assessment (FM-Assess), Dec 2024, Vancouver, Canada
[LINK]
http://arxiv.org/abs/2411.11520v1
[DATE]
2024-11-18 20:29:06+08:00
[CATEGORIES]
cs.LG
Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information
[AUTHORS]
Sutanu Gayen, Sanket Kale, Sayantan Sen
[ABSTRACT]
Learning high-dimensional distributions is a significant challenge in machine
learning and statistics. Classical research has mostly concentrated on
asymptotic analysis of such data under suitable assumptions. While existing
works [Bhattacharyya et al.: SICOMP 2023, Daskalakis et al.: STOC 2021, Choo et
al.: ALT 2024] focus on discrete distributions, the current work addresses the
tree structure learning problem for Gaussian distributions, providing efficient
algorithms with solid theoretical guarantees. This is crucial as real-world
distributions are often continuous and differ from the discrete scenarios
studied in prior works.
In this work, we design a conditional mutual information tester for Gaussian
random variables that can test whether two Gaussian random variables are
independent, or their conditional mutual information is at least $\varepsilon$,
for some parameter $\varepsilon \in (0,1)$ using
$\mathcal{O}(\varepsilon^{-1})$ samples which we show to be near-optimal. In
contrast, an additive estimation would require $\Omega(\varepsilon^{-2})$
samples. Our upper bound technique uses linear regression on a pair of suitably
transformed random variables. Importantly, we show that the chain rule of
conditional mutual information continues to hold for the estimated
(conditional) mutual information. As an application of such a mutual
information tester, we give an efficient $\varepsilon$-approximate
structure-learning algorithm for an $n$-variate Gaussian tree model that takes
$\widetilde{\Theta}(n\varepsilon^{-1})$ samples which we again show to be
near-optimal. In contrast, when the underlying Gaussian model is not known to
be tree-structured, we show that $\widetilde{\Theta}(n^2\varepsilon^{-2})$
samples are necessary and sufficient to output an $\varepsilon$-approximate
tree structure. We perform extensive experiments that corroborate our
theoretical convergence bounds.
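
For intuition about the quantity being tested: for two jointly Gaussian variables with correlation coefficient $\rho$, the mutual information is $-\frac{1}{2}\ln(1-\rho^2)$ nats. The sketch below is a simple plug-in estimate from the sample correlation. It is not the regression-based tester described in the abstract, and the function names are illustrative.

```python
import math


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi_from_rho(rho):
    """Mutual information (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def plug_in_mi(xs, ys):
    """Plug-in estimate: substitute the sample correlation for rho."""
    return gaussian_mi_from_rho(sample_correlation(xs, ys))
```

Independence corresponds to $\rho = 0$ and hence zero mutual information, so the testing problem amounts to separating that case from mutual information of at least $\varepsilon$.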
[COMMENTS]
47 pages, 16 figures, abstract shortened as per arXiv criteria
[LINK]
http://arxiv.org/abs/2411.11516v1
[DATE]
2024-11-18 20:25:34+08:00
[CATEGORIES]
cs.LG
A Modular Open Source Framework for Genomic Variant Calling
[AUTHORS]
Ankita Vaishnobi Bisoi, Bharath Ramsundar
[ABSTRACT]
Variant calling is a fundamental task in genomic research, essential for
detecting genetic variations such as single nucleotide polymorphisms (SNPs) and
insertions or deletions (indels). This paper presents an enhancement to
DeepChem, a widely used open-source drug discovery framework, through the
integration of DeepVariant. In particular, we introduce a variant calling
pipeline that leverages DeepVariant’s convolutional neural network (CNN)
architecture to improve the accuracy and reliability of variant detection. The
implemented pipeline includes stages for realignment of sequencing reads,
candidate variant detection, and pileup image generation, followed by variant
classification using a modified Inception v3 model. Our work adds a modular and
extensible variant calling framework to the DeepChem framework and enables
future work integrating DeepChem’s drug discovery infrastructure more tightly
with bioinformatics pipelines.
[LINK]
http://arxiv.org/abs/2411.11513v1
[DATE]
2024-11-18 20:21:48+08:00
[CATEGORIES]
cs.LG
Structure learning with Temporal Gaussian Mixture for model-based Reinforcement Learning
[AUTHORS]
Théophile Champion, Marek Grześ, Howard Bowman
[ABSTRACT]
Model-based reinforcement learning refers to a set of approaches capable of
sample-efficient decision making, which create an explicit model of the
environment. This model can subsequently be used for learning optimal policies.
In this paper, we propose a temporal Gaussian Mixture Model composed of a
perception model and a transition model. The perception model extracts discrete
(latent) states from continuous observations using a variational Gaussian
mixture likelihood. Importantly, our model constantly monitors the collected
data searching for new Gaussian components, i.e., the perception model performs
a form of structure learning (Smith et al., 2020; Friston et al., 2018; Neacsu
et al., 2022) as it learns the number of Gaussian components in the mixture.
Additionally, the transition model learns the temporal transition between
consecutive time steps by taking advantage of the Dirichlet-categorical
conjugacy. Both the perception and transition models are able to forget part of
the data points, while integrating the information they provide within the
prior, which ensures fast variational inference. Finally, decision making is
performed with a variant of Q-learning which is able to learn Q-values from
beliefs over states. Empirically, we have demonstrated the model’s ability to
learn the structure of several mazes: the model discovered the number of states
and the transition probabilities between these states. Moreover, using its
learned Q-values, the agent was able to successfully navigate from the starting
position to the maze’s exit.
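
The Dirichlet-categorical conjugacy mentioned above means that the posterior over each row of the transition matrix is obtained by adding observed transition counts to the prior parameters. A minimal sketch follows, assuming discrete states indexed from 0; the function names are illustrative, and the forgetting mechanism from the abstract is not modelled.

```python
def update_transition_counts(alpha, state_sequence):
    """Add counts of consecutive state pairs to the Dirichlet parameters.

    alpha[s][s_next] is the prior parameter for the transition s -> s_next.
    Returns a new matrix and leaves the prior untouched.
    """
    posterior = [row[:] for row in alpha]
    for s, s_next in zip(state_sequence, state_sequence[1:]):
        posterior[s][s_next] += 1.0
    return posterior


def expected_transition_matrix(alpha):
    """Posterior mean of each row: parameters normalized to sum to one."""
    return [[a / sum(row) for a in row] for row in alpha]


prior = [[1.0, 1.0], [1.0, 1.0]]
posterior = update_transition_counts(prior, [0, 1, 1, 0])
mean_transitions = expected_transition_matrix(posterior)
```

Because the update is just an addition, it can be applied one observation at a time, which suits an agent that collects data while it acts.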
[LINK]
http://arxiv.org/abs/2411.11511v1
[DATE]
2024-11-18 20:16:03+08:00
[CATEGORIES]
cs.LG
A survey and taxonomy of loss functions in machine learning
[AUTHORS]
Lorenzo Ciampiconi, Adam Elwood, Marco Leonardi, Ashraf Mohamed, Alessandro Rozza
[ABSTRACT]
Most state-of-the-art machine learning techniques revolve around the
optimisation of loss functions. Defining appropriate loss functions is
therefore critical to successfully solving problems in this field. In this
survey, we present a comprehensive overview of the most widely used loss
functions across key applications, including regression, classification,
generative modeling, ranking, and energy-based modeling. We introduce 43
distinct loss functions, structured within an intuitive taxonomy that clarifies
their theoretical foundations, properties, and optimal application contexts.
This survey is intended as a resource for undergraduate, graduate, and Ph.D.
students, as well as researchers seeking a deeper understanding of loss
functions.
[LINK]
http://arxiv.org/abs/2301.05579v2
[DATE]
2024-11-18 20:01:29+08:00
[CATEGORIES]
cs.LG
Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models
[AUTHORS]
Muhammad Saad Zia, Ashiq Anjum, Lu Liu, Anthony Conway, Anasol Pena Rios
[ABSTRACT]
Physics Informed Machine Learning has emerged as a popular approach in
modelling and simulation for digital twins to generate accurate models of
processes and behaviours of real-world systems. However, despite their success
in generating accurate and reliable models, the existing methods either use
simple regularizations in loss functions to offer limited physics integration
or are too specific in architectural definitions to be generalized to a wide
variety of physical systems. This paper presents a generic approach based on a
novel physics-encoded residual neural network architecture to combine
data-driven and physics-based analytical models to address these limitations.
Our method combines physics blocks as mathematical operators from physics-based
models with learning blocks comprising feed-forward layers. Intermediate
residual blocks are incorporated for stable gradient flow as they train on
physical system observation data. This way, the model learns to comply with the
geometric and kinematic aspects of the physical system. Compared to
conventional neural network-based methods, our method improves generalizability
with substantially low data requirements and model complexity in terms of
parameters, especially in scenarios where prior physics knowledge is either
elementary or incomplete. We investigate our approach in two application
domains. The first is a basic robotic motion model using Euler Lagrangian
equations of motion as physics prior. The second application is a complex
scenario of a steering model for a self-driving vehicle in a simulation. In
both applications, our method outperforms both conventional neural
network-based approaches as well as state-of-the-art Physics Informed Machine Learning
methods.
[LINK]
http://arxiv.org/abs/2411.11497v1
[DATE]
2024-11-18 19:58:20+08:00
[CATEGORIES]
cs.LG
Alien Recombination: Exploring Concept Blends Beyond Human Cognitive Availability in Visual Art
[AUTHORS]
Alejandro Hernandez, Levin Brinkmann, Ignacio Serna, Nasim Rahaman, Hassan Abu Alhaija, Hiromu Yakura, Mar Canet Sola, Bernhard Schölkopf, Iyad Rahwan
[ABSTRACT]
While AI models have demonstrated remarkable capabilities in constrained
domains like game strategy, their potential for genuine creativity in
open-ended domains like art remains debated. We explore this question by
examining how AI can transcend human cognitive limitations in visual art
creation. Our research hypothesizes that visual art contains a vast unexplored
space of conceptual combinations, constrained not by inherent incompatibility,
but by cognitive limitations imposed by artists’ cultural, temporal,
geographical and social contexts.
To test this hypothesis, we present the Alien Recombination method, a novel
approach utilizing fine-tuned large language models to identify and generate
concept combinations that lie beyond human cognitive availability. The system
models and deliberately counteracts human availability bias, the tendency to
rely on immediately accessible examples, to discover novel artistic
combinations.
This system not only produces combinations that have never been attempted
before within our dataset but also identifies and generates combinations that
are cognitively unavailable to all artists in the domain. Furthermore, we
translate these combinations into visual representations, enabling the
exploration of subjective perceptions of novelty. Our findings suggest that
cognitive unavailability is a promising metric for optimizing artistic novelty,
outperforming mere temperature scaling without additional evaluation
criteria. This approach uses generative models to connect previously
unconnected ideas, providing new insight into the potential of framing
AI-driven creativity as a combinatorial problem.
[COMMENTS]
NeurIPS 2024 Workshop on Creativity & Generative AI, 13 pages, 11
figures
[LINK]
http://arxiv.org/abs/2411.11494v1
[DATE]
2024-11-18 19:55:38+08:00
[CATEGORIES]
cs.LG
Pursuing Overall Welfare in Federated Learning through Sequential Decision Making
[AUTHORS]
Seok-Ju Hahn, Gi-Soo Kim, Junghye Lee
[COMMENTS]
Accepted at ICML 2024; added missing but important references, fixed
typos
[LINK]
http://arxiv.org/abs/2405.20821v2
[DATE]
2024-11-18 19:06:59+08:00
[CATEGORIES]
cs.LG
Physics meets Topology: Physics-informed topological neural networks for learning rigid body dynamics
[AUTHORS]
Amaury Wei, Olga Fink
[ABSTRACT]
Rigid body interactions are fundamental to numerous scientific disciplines,
but remain challenging to simulate due to their abrupt nonlinear nature and
sensitivity to complex, often unknown environmental factors. These challenges
call for adaptable learning-based methods capable of capturing complex
interactions beyond explicit physical models and simulations. While graph
neural networks can handle simple scenarios, they struggle with complex scenes
and long-term predictions. We introduce a novel framework for modeling rigid
body dynamics and learning collision interactions, addressing key limitations
of existing graph-based methods. Our approach extends the traditional
representation of meshes by incorporating higher-order topology complexes,
offering a physically consistent representation. Additionally, we propose a
physics-informed message-passing neural architecture, embedding physical laws
directly in the model. Our method demonstrates superior accuracy, even during
long rollouts, and exhibits strong generalization to unseen scenarios.
Importantly, this work addresses the challenge of multi-entity dynamic
interactions, with applications spanning diverse scientific and engineering
domains.
[COMMENTS]
17 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.11467v1
[DATE]
2024-11-18 19:03:15+08:00
[CATEGORIES]
cs.LG
PALMS: Parallel Adaptive Lasso with Multi-directional Signals for Latent Networks Reconstruction
[AUTHORS]
Zhaoyu Xing, Wei Zhong
[ABSTRACT]
Large-scale networks exist in many fields and play an important role in
real-world dynamics. However, the networks are usually latent and expensive to
detect, which becomes the main challenge for many applications and empirical
analyses. Several statistical methods have been proposed to infer the edges, but
the complexity of the algorithms makes them hard to apply to large-scale networks.
In this paper, we propose a general distributed and parallel computing
framework for network reconstruction methods via compressive sensing techniques,
to make them feasible for inferring super-large networks in practice.
Combined with the CALMS we proposed, for estimators that enjoy additional
theoretical properties, such as consistency and asymptotic normality, we
prove that the approximate estimation utilizing the distributed algorithm
preserves these theoretical results.
[COMMENTS]
48 pages
[LINK]
http://arxiv.org/abs/2411.11464v1
[DATE]
2024-11-18 18:58:16+08:00
[CATEGORIES]
cs.LG
ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures
[AUTHORS]
Salim Rukhsar, Anil Kumar Tiwari
[ABSTRACT]
Electroencephalography (EEG) is a widely used tool for diagnosing brain
disorders due to its high temporal resolution, non-invasive nature, and
affordability. Manual analysis of EEG is labor-intensive and requires
expertise, making automatic EEG interpretation crucial for reducing workload
and accurately assessing seizures. In epilepsy diagnosis, prolonged EEG
monitoring generates extensive data, often spanning hours, days, or even weeks.
While machine learning techniques for automatic EEG interpretation have
advanced significantly in recent decades, there remains a gap in their ability to
efficiently analyze large datasets with a balance of accuracy and computational
efficiency. To address the challenges mentioned above, an Attention Recurrent
Neural Network (ARNN) is proposed that can process a large amount of data
efficiently and accurately. This ARNN cell recurrently applies attention layers
along a sequence, has linear complexity in the sequence length, and
leverages parallel computation by processing multi-channel EEG signals rather
than single-channel signals. In this architecture, the attention layer is a
computational unit that efficiently applies self-attention and cross-attention
mechanisms to compute a recurrent function over a wide number of state vectors
and input signals. This framework is inspired in part by the attention layer
and long short-term memory (LSTM) cells, but it scales this typical cell up by
several orders to parallelize for multi-channel EEG signals. It inherits the
advantages of attention layers and LSTM gates while avoiding their respective
drawbacks. The model’s effectiveness is evaluated through extensive experiments
with heterogeneous datasets, including the CHB-MIT and the UPenn and Mayo Clinic
datasets.
[COMMENTS]
11 pages, 7 figures, Journal Paper
[LINK]
http://arxiv.org/abs/2403.03276v2
[DATE]
2024-11-18 18:46:04+08:00
[CATEGORIES]
cs.LG
Upside-Down Reinforcement Learning for More Interpretable Optimal Control
[AUTHORS]
Juan Cardenas-Cartagena, Massimiliano Falzari, Marco Zullich, Matthia Sabatelli
[ABSTRACT]
Model-Free Reinforcement Learning (RL) algorithms either learn how to map
states to expected rewards or search for policies that can maximize a certain
performance function. Model-Based algorithms, instead, aim to learn an
approximation of the underlying model of the RL environment and then use it in
combination with planning algorithms. Upside-Down Reinforcement Learning (UDRL)
is a novel learning paradigm that aims to learn how to predict actions from
states and desired commands. This task is formulated as a Supervised Learning
problem and has successfully been tackled by Neural Networks (NNs). In this
paper, we investigate whether function approximation algorithms other than NNs
can also be used within a UDRL framework. Our experiments, performed over
several popular optimal control benchmarks, show that tree-based methods like
Random Forests and Extremely Randomized Trees can perform just as well as NNs
with the significant benefit of resulting in policies that are inherently more
interpretable than NNs, therefore paving the way for more transparent, safe,
and robust RL.
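
To make the supervised formulation concrete, the sketch below turns one recorded episode into training pairs whose inputs are a state plus a command (desired return and remaining horizon) and whose target is the action taken. It assumes commands are measured to the end of the episode, which is a simplification, and the function name is illustrative rather than taken from the paper.

```python
def udrl_training_examples(episode):
    """Build (features, action) pairs from one episode.

    episode: list of (state, action, reward) tuples in time order.
    features: (state, return obtained from this step onward, steps remaining).
    """
    horizon = len(episode)
    # Accumulate rewards backwards to get the return-to-go at every step.
    returns_to_go = [0.0] * horizon
    running = 0.0
    for t in range(horizon - 1, -1, -1):
        running += episode[t][2]
        returns_to_go[t] = running
    return [
        ((state, returns_to_go[t], horizon - t), action)
        for t, (state, action, _reward) in enumerate(episode)
    ]


examples = udrl_training_examples(
    [("s0", "a0", 1.0), ("s1", "a1", 2.0), ("s2", "a2", 3.0)]
)
```

Any supervised learner that maps such features to actions can then be fitted to these pairs, which is why tree-based models can stand in for neural networks in this setting.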
[LINK]
http://arxiv.org/abs/2411.11457v1
[DATE]
2024-11-18 18:44:20+08:00
[CATEGORIES]
cs.LG
Evaluating Synthetic Activations composed of SAE Latents in GPT-2
[AUTHORS]
Giorgi Giglemiani, Nora Petrova, Chatrik Singh Mangat, Jett Janiak, Stefan Heimersheim
[ABSTRACT]
Sparse Auto-Encoders (SAEs) are commonly employed in mechanistic
interpretability to decompose the residual stream into monosemantic SAE
latents. Recent work demonstrates that perturbing a model’s activations at an
early layer results in a step-function-like change in the model’s final layer
activations. Furthermore, the model’s sensitivity to this perturbation differs
between model-generated (real) activations and random activations. In our
study, we assess model sensitivity in order to compare real activations to
synthetic activations composed of SAE latents. Our findings indicate that
synthetic activations closely resemble real activations when we control for the
sparsity and cosine similarity of the constituent SAE latents. This suggests
that real activations cannot be explained by a simple “bag of SAE latents”
lacking internal structure, and instead suggests that SAE latents possess
significant geometric and statistical properties. Notably, we observe that our
synthetic activations exhibit less pronounced activation plateaus compared to
those typically surrounding real activations.
[COMMENTS]
Presented at the Attributing Model Behavior at Scale (ATTRIB)
workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2409.15019v2
[DATE]
2024-11-18 18:35:37+08:00
[CATEGORIES]
cs.LG
Characterizing stable regions in the residual stream of LLMs
[AUTHORS]
Jett Janiak, Jacek Karwowski, Chatrik Singh Mangat, Giorgi Giglemiani, Nora Petrova, Stefan Heimersheim
[COMMENTS]
Presented at the Scientific Methods for Understanding Deep Learning
(SciForDL) workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2409.17113v4
[DATE]
2024-11-18 18:32:32+08:00
[CATEGORIES]
cs.LG
Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting
[AUTHORS]
Hongjun Wang, Jiyuan Chen, Lingyu Zhang, Renhe Jiang, Xuan Song
[ABSTRACT]
Spatiotemporal Graph Neural Networks (ST-GNNs) and Transformers have shown
significant promise in traffic forecasting by effectively modeling temporal and
spatial correlations. However, rapid urbanization in recent years has led to
dynamic shifts in traffic patterns and travel demand, posing major challenges
for accurate long-term traffic prediction. The generalization capability of
ST-GNNs in extended temporal scenarios and cross-city applications remains
largely unexplored. In this study, we evaluate state-of-the-art models on an
extended traffic benchmark and observe substantial performance degradation in
existing ST-GNNs over time, which we attribute to their limited inductive
capabilities. Our analysis reveals that this degradation stems from an
inability to adapt to evolving spatial relationships within urban environments.
To address this limitation, we reconsider the design of adaptive embeddings and
propose a Principal Component Analysis (PCA) embedding approach that enables
models to adapt to new scenarios without retraining. We incorporate PCA
embeddings into existing ST-GNN and Transformer architectures, achieving marked
improvements in performance. Notably, PCA embeddings allow for flexibility in
graph structures between training and testing, enabling models trained on one
city to perform zero-shot predictions on other cities. This adaptability
demonstrates the potential of PCA embeddings in enhancing the robustness and
generalization of spatiotemporal models.
[LINK]
http://arxiv.org/abs/2411.11448v1
[DATE]
2024-11-18 18:30:34+08:00
[CATEGORIES]
cs.LG
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
[AUTHORS]
Daniel J. Lee, Stefan Heimersheim
[ABSTRACT]
Sensitive directions experiments attempt to understand the computational
features of Language Models (LMs) by measuring how much the next token
prediction probabilities change by perturbing activations along specific
directions. We extend the sensitive directions work by introducing an improved
baseline for perturbation directions. We demonstrate that KL divergence for
Sparse Autoencoder (SAE) reconstruction errors is no longer pathologically
high compared to the improved baseline. We also show that feature directions
uncovered by SAEs have varying impacts on model outputs depending on the SAE’s
sparsity, with lower L0 SAE feature directions exerting a greater influence.
Additionally, we find that end-to-end SAE features do not exhibit stronger
effects on model outputs compared to traditional SAEs.
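
The measurement described above can be sketched on a toy model: perturb an activation vector along a unit-normalized direction and compute the KL divergence between the next-token distributions before and after. The linear readout used here stands in for the rest of the network and is purely an assumption for illustration, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def next_token_probs(readout, activation):
    """Toy model: logits are a linear readout of the activation."""
    logits = [sum(w * x for w, x in zip(row, activation)) for row in readout]
    return softmax(logits)


def sensitivity(readout, activation, direction, scale):
    """KL between output distributions before and after the perturbation."""
    norm = math.sqrt(sum(d * d for d in direction))
    perturbed = [a + scale * d / norm for a, d in zip(activation, direction)]
    return kl_divergence(
        next_token_probs(readout, activation),
        next_token_probs(readout, perturbed),
    )
```

Sweeping `scale` for different directions (for example SAE feature directions versus a baseline) gives the sensitivity curves that such experiments compare.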
[COMMENTS]
Presented at the Attributing Model Behavior at Scale (ATTRIB) and
Scientific Methods for Understanding Deep Learning (SciForDL) workshops at
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.12555v2
[DATE]
2024-11-18 18:20:35+08:00
[CATEGORIES]
cs.LG
FuXi Weather: A data-to-forecast machine learning system for global weather
[AUTHORS]
Xiuyu Sun, Xiaohui Zhong, Xiaoze Xu, Yuanqing Huang, Hao Li, J. David Neelin, Deliang Chen, Jie Feng, Wei Han, Libo Wu, Yuan Qi
[ABSTRACT]
Weather forecasting traditionally relies on numerical weather prediction
(NWP) systems that integrate global observational systems, data assimilation
(DA), and forecasting models. Despite steady improvements in forecast accuracy
over recent decades, further advances are increasingly constrained by high
computational costs, the underutilization of vast observational datasets, and
the challenges of obtaining finer resolution. These limitations, alongside the
uneven distribution of observational networks, result in global disparities in
forecast accuracy, leaving some regions vulnerable to extreme weather. Recent
advances in machine learning present a promising alternative, providing more
efficient and accurate forecasts using the same initial conditions as NWP.
However, current machine learning models still depend on the initial conditions
generated by NWP systems, which require extensive computational resources and
expertise. Here we introduce FuXi Weather, a machine learning weather
forecasting system that assimilates data from multiple satellites. Operating on
a 6-hourly DA and forecast cycle, FuXi Weather generates reliable and accurate
10-day global weather forecasts at a spatial resolution of $0.25^\circ$. FuXi
Weather is the first system to achieve all-grid, all-surface, all-channel, and
all-sky DA and forecasting, extending skillful forecast lead times beyond those
of the European Centre for Medium-range Weather Forecasts (ECMWF)
high-resolution forecasts (HRES) while using significantly fewer observations.
FuXi Weather consistently outperforms ECMWF HRES in observation-sparse regions,
such as central Africa, demonstrating its potential to improve forecasts where
observational infrastructure is limited.
[COMMENTS]
73 pages
[LINK]
http://arxiv.org/abs/2408.05472v2
[DATE]
2024-11-18 18:19:17+08:00
[CATEGORIES]
cs.LG
Bayesian optimization of atomic structures with prior probabilities from universal interatomic potentials
[AUTHORS]
Peder Lyngby, Casper Larsen, Karsten Wedel Jacobsen
[ABSTRACT]
The optimization of atomic structures plays a pivotal role in understanding
and designing materials with desired properties. However, conventional
computational methods often struggle with the formidable task of navigating the
vast potential energy surface, especially in high-dimensional spaces with
numerous local minima. Recent advancements in machine learning-driven surrogate
models offer a promising avenue for alleviating this computational burden. In
this study, we propose a novel approach that combines the strengths of
universal machine learning potentials with a Bayesian approach using Gaussian
processes. By using the machine learning potentials as priors for the Gaussian
process, the Gaussian process has to learn only the difference between the
machine learning potential and the target energy surface calculated for example
by density functional theory. This turns out to improve the speed by which the
global optimal structure is identified across diverse systems for a
well-behaved machine learning potential. The approach is tested on periodic
bulk materials, surface structures, and a cluster.
[LINK]
http://arxiv.org/abs/2408.15590v2
[DATE]
2024-11-18 18:17:25+08:00
[CATEGORIES]
cs.LG
BONE: a unifying framework for Bayesian online learning in non-stationary environments
[AUTHORS]
Gerardo Duran-Martin, Leandro Sánchez-Betancourt, Alexander Y. Shestopaloff, Kevin Murphy
[ABSTRACT]
We propose a unifying framework for methods that perform Bayesian online
learning in non-stationary environments. We call the framework BONE, which
stands for (B)ayesian (O)nline learning in (N)on-stationary (E)nvironments.
BONE provides a common structure to tackle a variety of problems, including
online continual learning, prequential forecasting, and contextual bandits. The
framework requires specifying three modelling choices: (i) a model for
measurements (e.g., a neural network), (ii) an auxiliary process to model
non-stationarity (e.g., the time since the last changepoint), and (iii) a
conditional prior over model parameters (e.g., a multivariate Gaussian). The
framework also requires two algorithmic choices, which we use to carry out
approximate inference under this framework: (i) an algorithm to estimate
beliefs (posterior distribution) about the model parameters given the auxiliary
variable, and (ii) an algorithm to estimate beliefs about the auxiliary
variable. We show how this modularity allows us to write many different
existing methods as instances of BONE; we also use this framework to propose a
new method. We then experimentally compare existing methods with our proposed
new method on several datasets; we provide insights into the situations that
make one method more suitable than another for a given task.
[LINK]
http://arxiv.org/abs/2411.10153v2
[DATE]
2024-11-18 18:16:14+08:00
[CATEGORIES]
cs.LG
Implicit Regularization for Multi-label Feature Selection
[AUTHORS]
Dou El Kefel Mansouri, Khalid Benabdeslem, Seif-Eddine Benkabou
[ABSTRACT]
In this paper, we address the problem of feature selection in the context of
multi-label learning, by using a new estimator based on implicit regularization
and label embedding. Unlike the sparse feature selection methods that use a
penalized estimator with explicit regularization terms such as $l_{2,1}$-norm,
MCP or SCAD, we propose a simple alternative method via Hadamard product
parameterization. In order to guide the feature selection process, a latent
semantic of multi-label information method is adopted, as a label embedding.
Experimental results on some known benchmark datasets suggest that the proposed
estimator suffers much less from extra bias, and may lead to benign
overfitting.
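
As a rough illustration of the Hadamard product parameterization mentioned above, the sketch below writes the weight vector as w = u * v (elementwise) and runs plain gradient descent on a least-squares loss with no explicit penalty term. The function name, step size, initialization scale, and single-output loss are assumptions made for illustration; the paper's actual estimator, its multi-label setting, and its label embedding are not reproduced here.

```python
import numpy as np

def hadamard_gd(X, y, steps=2000, lr=0.05, init=1e-3):
    """Gradient descent on (1/2n)||X(u*v) - y||^2 with w = u * v.

    Hypothetical helper: no explicit sparsity penalty is used; any
    sparsity comes from the small initialization and the factorization.
    """
    n, d = X.shape
    u = np.zeros(d)          # u starts at zero so that w can take either sign
    v = np.full(d, init)     # small positive initialization (assumed value)
    for _ in range(steps):
        w = u * v
        grad_w = X.T @ (X @ w - y) / n      # gradient with respect to w
        # Chain rule: dL/du = grad_w * v and dL/dv = grad_w * u
        u, v = u - lr * grad_w * v, v - lr * grad_w * u
    return u * v
```

Coordinates whose gradient stays at zero never leave zero under this parameterization, which is the intuition behind using it as a penalty-free route to sparse solutions.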
[COMMENTS]
11 pages, 7 figures. Currently under review at the TPAMI
journal
[LINK]
http://arxiv.org/abs/2411.11436v1
[DATE]
2024-11-18 18:08:05+08:00
[CATEGORIES]
cs.LG
Interpretable Machine Learning for Survival Analysis
[AUTHORS]
Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright
[ABSTRACT]
With the spread and rapid advancement of black box machine learning models,
the field of interpretable machine learning (IML) or explainable artificial
intelligence (XAI) has become increasingly important over the last decade. This
is particularly relevant for survival analysis, where the adoption of IML
techniques promotes transparency, accountability and fairness in sensitive
areas, such as clinical decision making processes, the development of targeted
therapies, interventions or in other medical or healthcare related contexts.
More specifically, explainability can uncover a survival model’s potential
biases and limitations and provide more mathematically sound ways to understand
how and which features are influential for prediction or constitute risk
factors. However, the lack of readily available IML methods may have deterred
medical practitioners and policy makers in public health from leveraging the
full potential of machine learning for predicting time-to-event data. We
present a comprehensive review of the limited existing amount of work on IML
methods for survival analysis within the context of the general IML taxonomy.
In addition, we formally detail how commonly used IML methods, such as
individual conditional expectation (ICE), partial dependence plots (PDP),
accumulated local effects (ALE), different feature importance measures or
Friedman’s H-interaction statistics can be adapted to survival outcomes. An
application of several IML methods to real data on under-5 year
mortality of Ghanaian children from the Demographic and Health Surveys (DHS)
Program serves as a tutorial or guide for researchers, on how to utilize the
techniques in practice to facilitate understanding of model decisions or
predictions.
[LINK]
http://arxiv.org/abs/2403.10250v2
[DATE]
2024-11-18 18:06:01+08:00
[CATEGORIES]
cs.LG
Non-convex Stochastic Composite Optimization with Polyak Momentum
[AUTHORS]
Yuan Gao, Anton Rodomanov, Sebastian U. Stich
[ABSTRACT]
The stochastic proximal gradient method is a powerful generalization of the
widely used stochastic gradient descent (SGD) method and has found numerous
applications in Machine Learning. However, it is notoriously known that this
method fails to converge in non-convex settings where the stochastic noise is
significant (i.e. when only small or bounded batch sizes are used). In this
paper, we focus on the stochastic proximal gradient method with Polyak
momentum. We prove this method attains an optimal convergence rate for
non-convex composite optimization problems, regardless of batch size.
Additionally, we rigorously analyze the variance reduction effect of the Polyak
momentum in the composite optimization setting and we show the method also
converges when the proximal step can only be solved inexactly. Finally, we
provide numerical experiments to validate our theoretical results.
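
One common way to write a stochastic proximal gradient step with a Polyak-style momentum (moving-average) gradient estimator is sketched below. The L1 regularizer, the soft-thresholding proximal operator, the parameter values, and the `grad_fn(x, rng)` callback signature are assumptions chosen for illustration; the abstract does not specify the paper's exact update rule or constants.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sgd_momentum(grad_fn, x0, step=0.1, beta=0.1, l1=0.01,
                      iters=500, seed=0):
    """Proximal SGD where the raw stochastic gradient is replaced by a
    momentum average m. grad_fn is a hypothetical user-supplied callback
    returning a (possibly noisy) gradient of the smooth part at x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x, rng)
        m = (1.0 - beta) * m + beta * g          # momentum averaging of gradients
        x = soft_threshold(x - step * m, step * l1)  # proximal step on the L1 term
    return x
```

The averaging step is where the variance reduction discussed in the abstract would enter: each update uses a smoothed gradient rather than a single noisy sample.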
[LINK]
http://arxiv.org/abs/2403.02967v3
[DATE]
2024-11-18 17:44:08+08:00
[CATEGORIES]
cs.LG
Centaur: a foundation model of human cognition
[AUTHORS]
Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K. Eckstein, Noémi Éltető, Thomas L. Griffiths, Susanne Haridi, Akshay K. Jagadish, Li Ji-An, Alexander Kipnis, Sreejan Kumar, Tobias Ludwig, Marvin Mathony, Marcelo Mattar, Alireza Modirshanechi, Surabhi S. Nath, Joshua C. Peterson, Milena Rmus, Evan M. Russek, Tankred Saanum, Natalia Scharfenberg, Johannes A. Schubert, Luca M. Schulze Buschoff, Nishad Singhi, Xin Sui, Mirko Thalmann, Fabian Theis, Vuong Truong, Vishaal Udandarao, Konstantinos Voudouris, Robert Wilson, Kristin Witte, Shuchen Wu, Dirk Wulff, Huadong Xiong, Eric Schulz
[ABSTRACT]
Establishing a unified theory of cognition has been a major goal of
psychology. While there have been previous attempts to instantiate such
theories by building computational models, we currently do not have one model
that captures the human mind in its entirety. Here we introduce Centaur, a
computational model that can predict and simulate human behavior in any
experiment expressible in natural language. We derived Centaur by finetuning a
state-of-the-art language model on a novel, large-scale data set called
Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial
data from over 60,000 participants performing over 10,000,000 choices in 160
experiments. Centaur not only captures the behavior of held-out participants
better than existing cognitive models, but also generalizes to new cover
stories, structural task modifications, and entirely new domains. Furthermore,
we find that the model’s internal representations become more aligned with
human neural activity after finetuning. Taken together, Centaur is the first
real candidate for a unified model of human cognition. We anticipate that it
will have a disruptive impact on the cognitive sciences, challenging the
existing paradigm for developing computational models.
[LINK]
http://arxiv.org/abs/2410.20268v2
[DATE]
2024-11-18 17:39:18+08:00
[CATEGORIES]
cs.LG
Temporal and Spatial Reservoir Ensembling Techniques for Liquid State Machines
[AUTHORS]
Anmol Biswas, Sharvari Ashok Medhe, Raghav Singhal, Udayan Ganguly
[ABSTRACT]
Reservoir computing (RC) is a class of computational methods, such as Echo
State Networks (ESN) and Liquid State Machines (LSM), that describe a generic method
to perform pattern recognition and temporal analysis with any non-linear
system. This is enabled by Reservoir Computing being a shallow network model
with only Input, Reservoir, and Readout layers where input and reservoir
weights are not learned (only the readout layer is trained). LSM is a special
case of Reservoir computing inspired by the organization of neurons in the
brain and generally refers to spike-based Reservoir computing approaches. LSMs
have been successfully used to showcase decent performance on some neuromorphic
vision and speech datasets but a common problem associated with LSMs is that
since the model is more-or-less fixed, the main way to improve the performance
is by scaling up the Reservoir size, but that only gives diminishing returns
despite a tremendous increase in model size and computation. In this paper, we
propose two approaches for effectively ensembling LSM models - Multi-Length
Scale Reservoir Ensemble (MuLRE) and Temporal Excitation Partitioned Reservoir
Ensemble (TEPRE) and benchmark them on Neuromorphic-MNIST (N-MNIST), Spiking
Heidelberg Digits (SHD), and DVSGesture datasets, which are standard
neuromorphic benchmarks. We achieve 98.1% test accuracy on N-MNIST with a
3600-neuron LSM model which is higher than any prior LSM-based approach and
77.8% test accuracy on the SHD dataset which is on par with a standard
Recurrent Spiking Neural Network trained by Backprop Through Time (BPTT). We
also propose receptive field-based input weights to the Reservoir to work
alongside the Multi-Length Scale Reservoir ensemble model for vision tasks.
Thus, we introduce effective means of scaling up the performance of LSM models
and evaluate them against relevant neuromorphic benchmarks.
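
The reservoir computing setup described above (fixed input and reservoir weights, trained readout only) can be sketched with a small rate-based echo state network. This is not the spiking LSM or the MuLRE/TEPRE ensembles from the paper; the reservoir size, leak rate, spectral radius, ridge strength, and toy sine task are all assumptions for illustration.

```python
import numpy as np

def run_reservoir(inputs, W_in, W_res, leak=0.3):
    """Drive a fixed (untrained) leaky-tanh reservoir and collect its states."""
    state = np.zeros(W_res.shape[0])
    states = []
    for u in inputs:
        pre = W_in @ u + W_res @ state
        state = (1.0 - leak) * state + leak * np.tanh(pre)
        states.append(state)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the readout, the only trained part of the model."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

rng = np.random.default_rng(0)
n_in, n_res = 1, 50
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))          # fixed input weights
W_res = rng.normal(size=(n_res, n_res))                     # fixed recurrent weights
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))     # rescale spectral radius to 0.9

t = np.linspace(0.0, 8.0 * np.pi, 400)
inputs = np.sin(t)[:, None]            # toy input signal
targets = np.sin(t + 0.3)[:, None]     # toy target: phase-shifted copy

states = run_reservoir(inputs, W_in, W_res)
W_out = train_readout(states, targets)
predictions = states @ W_out
```

Because only `W_out` is fitted, training reduces to one linear solve, which is why scaling such models usually means enlarging or ensembling the reservoir rather than training it.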
[LINK]
http://arxiv.org/abs/2411.11414v1
[DATE]
2024-11-18 17:35:22+08:00
[CATEGORIES]
cs.LG
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
[AUTHORS]
Yunong Liu, Cristobal Eyzaguirre, Manling Li, Shubh Khanna, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Weiyu Liu, Jiajun Wu
[ABSTRACT]
Shape assembly is a ubiquitous task in daily life, integral for constructing
complex 3D structures like IKEA furniture. While significant progress has been
made in developing autonomous agents for shape assembly, existing datasets have
not yet tackled the 4D grounding of assembly instructions in videos, essential
for a holistic understanding of assembly in 3D space over time. We introduce
IKEA Video Manuals, a dataset that features 3D models of furniture parts,
instructional manuals, assembly videos from the Internet, and most importantly,
annotations of dense spatio-temporal alignments between these data modalities.
To demonstrate the utility of IKEA Video Manuals, we present five applications
essential for shape assembly: assembly plan generation, part-conditioned
segmentation, part-conditioned pose estimation, video object segmentation, and
furniture assembly based on instructional video manuals. For each application,
we provide evaluation metrics and baseline methods. Through experiments on our
annotated data, we highlight many challenges in grounding assembly instructions
in videos to improve shape assembly, including handling occlusions, varying
viewpoints, and extended assembly sequences.
[COMMENTS]
NeurIPS 2024 Datasets and Benchmarks Track
[LINK]
http://arxiv.org/abs/2411.11409v1
[DATE]
2024-11-18 17:30:05+08:00
[CATEGORIES]
cs.LG
The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
[AUTHORS]
Xikang Yang, Xuehai Tang, Jizhong Han, Songlin Hu
[ABSTRACT]
The widespread deployment of large language models (LLMs) across various
domains has showcased their immense potential while exposing significant safety
vulnerabilities. A major concern is ensuring that LLM-generated content aligns
with human values. Existing jailbreak techniques reveal how this alignment can
be compromised through specific prompts or adversarial suffixes. In this study,
we introduce a new threat: LLMs’ bias toward authority. While this inherent
bias can improve the quality of outputs generated by LLMs, it also introduces a
potential vulnerability, increasing the risk of producing harmful content.
Notably, this bias shows up as varying levels of trust given to different
types of authoritative information in harmful queries. For example, queries
about malware development tend to favor trust in GitHub. To better reveal these risks in LLMs, we
propose DarkCite, an adaptive authority citation matcher and generator designed
for a black-box setting. DarkCite matches optimal citation types to specific
risk types and generates authoritative citations relevant to harmful
instructions, enabling more effective jailbreak attacks on aligned LLMs. Our
experiments show that DarkCite achieves a higher attack success rate (e.g.,
Llama-2 at 76% versus 68%) than previous methods. To counter this risk, we
propose an authenticity and harm verification defense strategy, raising the
average defense pass rate (DPR) from 11% to 74%. More importantly, the ability
to link citations to the content they encompass has become a foundational
function in LLMs, amplifying the influence of LLMs’ bias toward authority.
[LINK]
http://arxiv.org/abs/2411.11407v1
[DATE]
2024-11-18 17:28:58+08:00
[CATEGORIES]
cs.LG
Integrating GNN and Neural ODEs for Estimating Non-Reciprocal Two-Body Interactions in Mixed-Species Collective Motion
[AUTHORS]
Masahito Uwamichi, Simon K. Schnyder, Tetsuya J. Kobayashi, Satoshi Sawai
[ABSTRACT]
Analyzing the motion of multiple biological agents, be it cells or individual
animals, is pivotal for the understanding of complex collective behaviors. With
the advent of advanced microscopy, detailed images of complex tissue formations
involving multiple cell types have become more accessible in recent years.
However, deciphering the underlying rules that govern cell movements is far
from trivial. Here, we present a novel deep learning framework for estimating
the underlying equations of motion from observed trajectories, a pivotal step
in decoding such complex dynamics. Our framework integrates graph neural
networks with neural differential equations, enabling effective prediction of
two-body interactions based on the states of the interacting entities. We
demonstrate the efficacy of our approach through two numerical experiments.
First, we used simulated data from a toy model to tune the hyperparameters.
Based on the obtained hyperparameters, we then applied this approach to a more
complex model with non-reciprocal forces that mimic the collective dynamics of
the cells of slime molds. Our results show that the proposed method can
accurately estimate the functional forms of two-body interactions – even when
they are nonreciprocal – thereby precisely replicating both individual and
collective behaviors within these systems.
[COMMENTS]
Accepted at NeurIPS 2024. Some contents are omitted due to arXiv’s
storage limit. Please refer to the full paper at OpenReview (NeurIPS 2024) or
https://github.com/MasahitoUWAMICHI/collectiveMotionNN
[LINK]
http://arxiv.org/abs/2405.16503v2
[DATE]
2024-11-18 17:28:57+08:00
[CATEGORIES]
cs.LG
Bridging the Resource Gap: Deploying Advanced Imitation Learning Models onto Affordable Embedded Platforms
[AUTHORS]
Haizhou Ge, Ruixiang Wang, Zhu-ang Xu, Hongrui Zhu, Ruichen Deng, Yuhang Dong, Zeyu Pang, Guyue Zhou, Junyu Zhang, Lu Shi
[ABSTRACT]
Advanced imitation learning with structures like the transformer is
increasingly demonstrating its advantages in robotics. However, deploying these
large-scale models on embedded platforms remains a major challenge. In this
paper, we propose a pipeline that facilitates the migration of advanced
imitation learning algorithms to edge devices. The process is achieved via an
efficient model compression method and a practical asynchronous parallel method,
Temporal Ensemble with Dropped Actions (TEDA), that enhances the smoothness of
operations. To show the efficiency of the proposed pipeline, large-scale
imitation learning models are trained on a server and deployed on an edge
device to complete various manipulation tasks.
[COMMENTS]
Accepted by the 2024 IEEE International Conference on Robotics and
Biomimetics (IEEE ROBIO 2024)
[LINK]
http://arxiv.org/abs/2411.11406v1
[DATE]
2024-11-18 17:28:11+08:00
[CATEGORIES]
cs.LG
The GECo algorithm for Graph Neural Networks Explanation
[AUTHORS]
Salvatore Calderaro, Domenico Amato, Giosuè Lo Bosco, Riccardo Rizzo, Filippo Vella
[ABSTRACT]
Graph Neural Networks (GNNs) are powerful models that can manage complex data
sources and their interconnection links. One of GNNs’ main drawbacks is their
lack of interpretability, which limits their application in sensitive fields.
In this paper, we introduce a new methodology involving graph communities to
address the interpretability of graph classification problems. The proposed
method, called GECo, exploits the idea that if a community is a subset of graph
nodes densely connected, this property should play a role in graph
classification. This is reasonable, especially if we consider the
message-passing mechanism, which is the basic mechanism of GNNs. GECo analyzes
the contribution to the classification result of the communities in the graph,
building a mask that highlights graph-relevant structures. GECo is tested for
Graph Convolutional Networks on six artificial and four real-world graph
datasets and is compared to the main explainability methods such as
PGMExplainer, PGExplainer, GNNExplainer, and SubgraphX using four different
metrics. The obtained results outperform the other methods for artificial graph
datasets and most real-world datasets.
[LINK]
http://arxiv.org/abs/2411.11391v1
[DATE]
2024-11-18 17:08:30+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks on Graph Databases
[AUTHORS]
Dmytro Lopushanskyy, Borun Shi
[ABSTRACT]
Training graph neural networks on large datasets has long been a challenge.
Traditional approaches include efficiently representing the whole graph
in-memory, designing parameter efficient and sampling-based models, and graph
partitioning in a distributed setup. Separately, graph databases with native
graph storage and query engines have been developed, which enable time and
resource efficient graph analytics workloads. We show how to directly train a
GNN on a graph DB, by retrieving minimal data into memory and sampling using
the query engine. Our experiments show resource advantages for single-machine
and distributed training. Our approach opens up a new way of scaling GNNs as
well as a new application area for graph DBs.
[COMMENTS]
14 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.11375v1
[DATE]
2024-11-18 16:39:24+08:00
[CATEGORIES]
cs.LG
Continual Task Learning through Adaptive Policy Self-Composition
[AUTHORS]
Shengchao Hu, Yuhang Zhou, Ziqing Fan, Jifeng Hu, Li Shen, Ya Zhang, Dacheng Tao
[ABSTRACT]
Training a generalizable agent to continually learn a sequence of tasks from
offline trajectories is a natural requirement for long-lived agents, yet
remains a significant challenge for current offline reinforcement learning (RL)
algorithms. Specifically, an agent must be able to rapidly adapt to new tasks
using newly collected trajectories (plasticity), while retaining knowledge from
previously learned tasks (stability). However, systematic analyses of this
setting are scarce, and it remains unclear whether conventional continual
learning (CL) methods are effective in continual offline RL (CORL) scenarios.
In this study, we develop the Offline Continual World benchmark and demonstrate
that traditional CL methods struggle with catastrophic forgetting, primarily
due to the unique distribution shifts inherent to CORL scenarios. To address
this challenge, we introduce CompoFormer, a structure-based continual
transformer model that adaptively composes previous policies via a meta-policy
network. Upon encountering a new task, CompoFormer leverages semantic
correlations to selectively integrate relevant prior policies alongside newly
trained parameters, thereby enhancing knowledge sharing and accelerating the
learning process. Our experiments reveal that CompoFormer outperforms
conventional CL methods, particularly in longer task sequences, showcasing a
promising balance between plasticity and stability.
[COMMENTS]
21 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.11364v1
[DATE]
2024-11-18 16:20:21+08:00
[CATEGORIES]
cs.LG
Spatio-Temporal Jump Model for Urban Thermal Comfort Monitoring
[AUTHORS]
Federico P. Cortese, Antonio Pievatolo
[ABSTRACT]
Thermal comfort is essential for well-being in urban spaces, especially as
cities face increasing heat from urbanization and climate change. Existing
thermal comfort models usually overlook temporal dynamics alongside spatial
dependencies. We address this problem by introducing a spatio-temporal jump
model that clusters data with persistence across both spatial and temporal
dimensions. This framework enhances interpretability, minimizes abrupt state
changes, and easily handles missing data. We validate our approach through
extensive simulations, demonstrating its accuracy in recovering the true
underlying partition. When applied to hourly environmental data gathered from a
set of weather stations located across the city of Singapore, our proposal
identifies meaningful thermal comfort regimes, demonstrating its effectiveness
in dynamic urban settings and suitability for real-world monitoring. The
comparison of these regimes with feedback on thermal preference indicates the
potential of an unsupervised approach to avoid extensive surveys.
[LINK]
http://arxiv.org/abs/2411.09726v2
[DATE]
2024-11-18 15:50:45+08:00
[CATEGORIES]
cs.LG
Zero-Shot Load Forecasting with Large Language Models
[AUTHORS]
Wenlong Liao, Zhe Yang, Mengshuo Jia, Christian Rehtanz, Jiannong Fang, Fernando Porté-Agel
[ABSTRACT]
Deep learning models have shown strong performance in load forecasting, but
they generally require large amounts of data for model training before being
applied to new scenarios, which limits their effectiveness in data-scarce
scenarios. Inspired by the great success of pre-trained language models (LLMs)
in natural language processing, this paper proposes a zero-shot load
forecasting approach using an advanced LLM framework denoted as the Chronos
model. By utilizing its extensive pre-trained knowledge, the Chronos model
enables accurate load forecasting in data-scarce scenarios without the need for
extensive data-specific training. Simulation results across five real-world
datasets demonstrate that the Chronos model significantly outperforms nine
popular baseline models for both deterministic and probabilistic load
forecasting with various forecast horizons (e.g., 1 to 48 hours), even though
the Chronos model is neither tailored nor fine-tuned to these specific load
datasets. Notably, Chronos reduces root mean squared error (RMSE), continuous
ranked probability score (CRPS), and quantile score (QS) by approximately
7.34%-84.30%, 19.63%-60.06%, and 22.83%-54.49%, respectively, compared to
baseline models. These results highlight the superiority and flexibility of the
Chronos model, positioning it as an effective solution in data-scarce
scenarios.
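
Two of the metrics named above can be computed as in the sketch below: RMSE for point forecasts and a quantile score, taken here to be the standard pinball loss. Whether the paper uses exactly this definition of the quantile score is an assumption, and the load values in the usage lines are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of a point forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pinball_loss(y_true, y_quantile, q):
    """Average pinball (quantile) loss for a forecast of the q-th quantile."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_quantile, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# Hypothetical hourly load values and forecasts
actual = [10.0, 20.0]
forecast = [12.0, 18.0]
point_error = rmse(actual, forecast)
q90_error = pinball_loss(actual, forecast, q=0.9)
```

For a high quantile such as q = 0.9, the pinball loss penalizes under-forecasting more heavily than over-forecasting, which is what makes it suitable for scoring probabilistic forecasts.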
[COMMENTS]
21 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.11350v1
[DATE]
2024-11-18 15:39:46+08:00
[CATEGORIES]
cs.LG
Modeling Multivariable High-resolution 3D Urban Microclimate Using Localized Fourier Neural Operator
[AUTHORS]
Shaoxiang Qin, Dongxue Zhan, Dingyang Geng, Wenhui Peng, Geng Tian, Yurong Shi, Naiping Gao, Xue Liu, Liangzhu Leon Wang
[ABSTRACT]
Accurate urban microclimate analysis with wind velocity and temperature is
vital for energy-efficient urban planning, supporting carbon reduction,
enhancing public health and comfort, and advancing the low-altitude economy.
However, traditional computational fluid dynamics (CFD) simulations that couple
velocity and temperature are computationally expensive. Recent machine learning
advancements offer promising alternatives for accelerating urban microclimate
simulations. The Fourier neural operator (FNO) has shown efficiency and
accuracy in predicting single-variable velocity magnitudes in urban wind
fields. Yet, for multivariable high-resolution 3D urban microclimate
prediction, FNO faces three key limitations: blurry output quality, high GPU
memory demand, and substantial data requirements. To address these issues, we
propose a novel localized Fourier neural operator (Local-FNO) model that
employs local training, geometry encoding, and patch overlapping. Local-FNO
provides accurate predictions for rapidly changing turbulence in urban
microclimate over 60 seconds, four times the average turbulence integral time
scale, with an average error of 0.35 m/s in velocity and 0.30 °C in
temperature. It also accurately captures turbulent heat flux represented by the
velocity-temperature correlation. In a 2 km by 2 km domain, Local-FNO resolves
turbulence patterns down to a 10 m resolution. It provides high-resolution
predictions with 150 million feature dimensions on a single 32 GB GPU at nearly
50 times the speed of a CFD solver. Compared to FNO, Local-FNO achieves a 23.9%
reduction in prediction error and a 47.3% improvement in turbulent fluctuation
correlation.
[LINK]
http://arxiv.org/abs/2411.11348v1
[DATE]
2024-11-18 15:38:25+08:00
[CATEGORIES]
cs.LG
SAD-TIME: a Spatiotemporal-fused network for depression detection with Automated multi-scale Depth-wise and TIME-interval-related common feature extractor
[AUTHORS]
Han-Guang Wang, Hui-Rang Hou, Li-Cheng Jin, Chen-Yang Xu, Zhong-Yi Zhang, Qing-Hao Meng
[ABSTRACT]
Background and Objective: Depression is a severe mental disorder, and
accurate diagnosis is pivotal to the cure and rehabilitation of people with
depression. However, the current questionnaire-based diagnostic methods could
bring subjective biases and may be denied by subjects. In search of a more
objective means of diagnosis, researchers have begun to experiment with deep
learning-based methods for identifying depressive disorders in recent years.
Methods: In this study, a novel Spatiotemporal-fused network with Automated
multi-scale Depth-wise and TIME-interval-related common feature extractor
(SAD-TIME) is proposed. SAD-TIME incorporates an automated nodes’ common
features extractor (CFE), a spatial sector (SpS), a modified temporal sector
(TeS), and a domain adversarial learner (DAL). The CFE includes a multi-scale
depth-wise 1D-convolutional neural network and a time-interval embedding
generator, where the unique information of each channel is preserved. The SpS
fuses the functional connectivity with the distance-based connectivity
containing spatial position of EEG electrodes. A multi-head-attention graph
convolutional network is also applied in the SpS to fuse the features from
different EEG channels. The TeS is based on long short-term memory and graph
transformer networks, where the temporal information of different time-windows
is fused. Moreover, the DAL is used after the SpS to obtain the
domain-invariant feature. Results: Experimental results under tenfold
cross-validation show that the proposed SAD-TIME method achieves 92.00% and
94.00% depression classification accuracies on two datasets, respectively, in
cross-subject mode. Conclusion: SAD-TIME is a robust depression detection
model, where the automatically generated features, the SpS and the TeS assist the
classification performance with the fusion of the innate spatiotemporal
information in the EEG signals.
[COMMENTS]
21 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.08521v2
[DATE]
2024-11-18 15:29:38+08:00
[CATEGORIES]
cs.LG
Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits
[AUTHORS]
Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh
[ABSTRACT]
Variational quantum circuits (VQCs) hold promise for quantum machine learning
on noisy intermediate-scale quantum (NISQ) devices. While tensor-train networks
(TTNs) can enhance VQC representation and generalization, the resulting hybrid
model, TTN-VQC, faces optimization challenges due to the Polyak-Lojasiewicz
(PL) condition. To mitigate this challenge, we introduce Pre+TTN-VQC, a
pre-trained TTN model combined with a VQC. Our theoretical analysis, grounded
in two-stage empirical risk minimization, provides an upper bound on the
transfer learning risk. It demonstrates the approach’s advantages in overcoming
the optimization challenge while maintaining TTN-VQC’s generalization
capability. We validate our findings through experiments on quantum dot and
handwritten digit classification using simulated and actual NISQ environments.
[COMMENTS]
In submission
[LINK]
http://arxiv.org/abs/2306.03741v4
[DATE]
2024-11-18 15:26:43+08:00
[CATEGORIES]
cs.LG
A Hybrid Loss Framework for Decomposition-based Time Series Forecasting Methods: Balancing Global and Component Errors
[AUTHORS]
Ronghui Han, Duanyu Feng, Hongyu Du, Hao Wang
[ABSTRACT]
Accurate time series forecasting, predicting future values based on past
data, is crucial for diverse industries. Many current time series methods
decompose time series into multiple sub-series, applying different model
architectures and training with an end-to-end overall loss for forecasting.
However, this raises a question: does this overall loss prioritize the
importance of critical sub-series within the decomposition for better
performance? To investigate this, we conduct a study on the impact of overall
loss on existing time series methods with sequence decomposition. Our findings
reveal that overall loss may introduce bias in model learning, hindering the
learning of the prioritization of more significant sub-series and limiting the
forecasting performance. To address this, we propose a hybrid loss framework
combining the global and component losses. This framework introduces component
losses for each sub-series alongside the original overall loss. It employs a
dual min-max algorithm to dynamically adjust weights between the overall loss
and component losses, and within component losses. This enables current time
series methods to achieve better performance by focusing on more
critical sub-series while still maintaining a low overall loss. We integrate
our loss framework into several time series methods and evaluate the
performance on multiple datasets. Results show an average improvement of 0.5-2%
over existing methods without any modifications to the model architectures.
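
The idea of combining an overall loss with per-component losses can be sketched as below. The fixed weights, the MSE criterion, and the function name are assumptions for illustration; the paper's dual min-max algorithm for adjusting the weights dynamically is not described in the abstract and is not implemented here.

```python
import numpy as np

def hybrid_loss(pred_components, true_components,
                overall_weight=0.5, component_weights=None):
    """Weighted sum of the overall forecast MSE and per-component MSEs.

    Inputs have shape (num_components, horizon); the full forecast is the
    sum of the components. Weights are fixed here (assumed), not learned.
    """
    pred = np.asarray(pred_components, dtype=float)
    true = np.asarray(true_components, dtype=float)
    k = pred.shape[0]
    if component_weights is None:
        component_weights = np.full(k, 1.0 / k)
    component_weights = np.asarray(component_weights, dtype=float)
    overall = np.mean((pred.sum(axis=0) - true.sum(axis=0)) ** 2)
    per_component = np.mean((pred - true) ** 2, axis=1)
    return float(overall_weight * overall
                 + (1.0 - overall_weight) * (component_weights @ per_component))

# Two components whose errors cancel in the sum: the overall loss alone is zero
pred = [[1.0, 1.0], [0.0, 0.0]]
true = [[0.0, 0.0], [1.0, 1.0]]
loss = hybrid_loss(pred, true)
```

The usage lines show the motivating case: component errors that cancel in the summed forecast are invisible to the overall loss but still contribute to the hybrid loss.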
[LINK]
http://arxiv.org/abs/2411.11340v1
[DATE]
2024-11-18 15:15:23+08:00
[CATEGORIES]
cs.LG
Adaptive AI-Driven Material Synthesis: Towards Autonomous 2D Materials Growth
[AUTHORS]
Leonardo Sabattini, Annalisa Coriolano, Corneel Casert, Stiven Forti, Edward S. Barnard, Fabio Beltram, Massimiliano Pontil, Stephen Whitelam, Camilla Coletti, Antonio Rossi
[ABSTRACT]
Two-dimensional (2D) materials are poised to revolutionize current
solid-state technology with their extraordinary properties. Yet, the primary
challenge remains their scalable production. While there have been significant
advancements, much of the scientific progress has depended on the exfoliation
of materials, a method that poses severe challenges for large-scale
applications. With the advent of artificial intelligence (AI) in materials
science, innovative synthesis methodologies are now on the horizon. This study
explores the forefront of autonomous materials synthesis using an artificial
neural network (ANN) trained by evolutionary methods, focusing on the efficient
production of graphene. Our approach demonstrates that a neural network can
iteratively and autonomously learn a time-dependent protocol for the efficient
growth of graphene, without requiring pretraining on what constitutes an
effective recipe. Evaluation criteria are based on the proximity of the Raman
signature to that of monolayer graphene: higher scores are granted to outcomes
whose spectrum more closely resembles that of an ideal continuous monolayer
structure. This feedback mechanism allows for iterative refinement of the ANN’s
time-dependent synthesis protocols, progressively improving sample quality.
Through the advancement and application of AI methodologies, this work makes a
substantial contribution to the field of materials engineering, fostering a new
era of innovation and efficiency in the synthesis process.
[LINK]
http://arxiv.org/abs/2410.10885v2
[DATE]
2024-11-18 14:57:23+08:00
[CATEGORIES]
cs.LG
Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation
[AUTHORS]
Zhihong Liu, Long Qian, Zeyang Liu, Lipeng Wan, Xingyu Chen, Xuguang Lan
[ABSTRACT]
Decision Transformer (DT) can learn effective policy from offline datasets by
converting the offline reinforcement learning (RL) into a supervised sequence
modeling task, where the trajectory elements are generated auto-regressively
conditioned on the return-to-go (RTG). However, the sequence modeling learning
approach tends to learn policies that converge on the sub-optimal trajectories
within the dataset, for lack of bridging data to move to better trajectories,
even if the condition is set to the highest RTG. To address this issue, we
introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the
trajectories of the dataset with branches generated by a diffusion model. The
trajectory branch is generated based on the segment of the trajectory within
the dataset, and leads to trajectories with higher returns. We concatenate the
generated branch with the trajectory segment as an expansion of the
trajectory. After expanding, DT has more opportunities to learn policies to move
to better trajectories, preventing it from converging to the sub-optimal
trajectories. Empirically, after processing with BG, DT outperforms
state-of-the-art sequence modeling methods on D4RL benchmark, demonstrating the
effectiveness of adding branches to the dataset without further modifications.
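The return-to-go (RTG) that conditions DT at each step is the sum of rewards from that step to the end of the trajectory. A minimal Python sketch of that quantity, assuming undiscounted returns; the helper name `returns_to_go` is hypothetical and not taken from the paper:

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Return rtg[t] = rewards[t] + rewards[t+1] + ... for every step t.

    Assumption: undiscounted returns, as commonly used for RTG conditioning.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so that each entry is the suffix sum of the rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg
```

Under this definition, a generated branch that ends with higher rewards raises the RTG of every step in the segment it is concatenated to.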
[LINK]
http://arxiv.org/abs/2411.11327v1
[DATE]
2024-11-18 14:44:14+08:00
[CATEGORIES]
cs.LG
Dataset Distillers Are Good Label Denoisers In the Wild
[AUTHORS]
Lechao Cheng, Kaifeng Chen, Jiyang Li, Shengeng Tang, Shufei Zhang, Meng Wang
[ABSTRACT]
Learning from noisy data has become essential for adapting deep learning
models to real-world applications. Traditional methods often involve first
evaluating the noise and then applying strategies such as discarding noisy
samples, re-weighting, or re-labeling. However, these methods can fall into a
vicious cycle when the initial noise evaluation is inaccurate, leading to
suboptimal performance. To address this, we propose a novel approach that
leverages dataset distillation for noise removal. This method avoids the
feedback loop common in existing techniques and enhances training efficiency,
while also providing strong privacy protection through offline processing. We
rigorously evaluate three representative dataset distillation methods (DATM,
DANCE, and RCIG) under various noise conditions, including symmetric noise,
asymmetric noise, and real-world natural noise. Our empirical findings reveal
that dataset distillation effectively serves as a denoising tool in random
noise scenarios but may struggle with structured asymmetric noise patterns,
which can be absorbed into the distilled samples. Additionally, clean but
challenging samples, such as those from tail classes in imbalanced datasets,
may undergo lossy compression during distillation. Despite these challenges,
our results highlight that dataset distillation holds significant promise for
robust model training, especially in high-privacy environments where noise is
prevalent.
[LINK]
http://arxiv.org/abs/2411.11924v1
[DATE]
2024-11-18 14:26:41+08:00
[CATEGORIES]
cs.LG
Machine Vision-Based Assessment of Fall Color Changes and its Relationship with Leaf Nitrogen Concentration
[AUTHORS]
Achyut Paudel, Jostan Brown, Priyanka Upadhyaya, Atif Bilal Asad, Safal Kshetri, Joseph R. Davidson, Cindy Grimm, Ashley Thompson, Bernardita Sallato, Matthew D. Whiting, Manoj Karkee
[ABSTRACT]
Apple (\textit{Malus domestica} Borkh.) trees are deciduous, shedding leaves
each year. This process is preceded by a gradual change in leaf color from
green to yellow as chlorophyll is degraded prior to abscission. The initiation
and rate of this color change are affected by many factors including leaf
nitrogen (N) concentration. We predict that leaf color during this transition
may be indicative of the nitrogen status of apple trees. This study assesses a
machine vision-based system for quantifying the change in leaf color and its
correlation with leaf nitrogen content. An image dataset was collected in color
and 3D over five weeks in the fall of 2021 and 2023 at a commercial orchard
using a ground vehicle-based stereovision sensor. Trees in the foreground were
segmented from the point cloud using color and depth thresholding methods.
Then, to estimate the proportion of yellow leaves per canopy, the color
information of the segmented canopy area was quantified using a custom-defined
metric, \textit{yellowness index} (a normalized ratio of yellow to green
foliage in the tree) that varied from -1 to +1 (-1 being completely green and
+1 being completely yellow). Both K-means-based methods and gradient boosting
methods were used to estimate the \textit{yellowness index}. The gradient
boosting based method proposed in this study was better than the K-means-based
method (both in terms of computational time and accuracy), achieving an $R^2$
of 0.72 in estimating the \textit{yellowness index}. The metric was able to
capture the gradual color transition from green to yellow over the study
duration. Trees with lower leaf nitrogen showed the color transition to yellow
earlier than the trees with higher nitrogen.
Keywords: Fruit Tree Nitrogen Management, Machine Vision, Point Cloud
Segmentation, Precision Nitrogen Management
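The abstract describes the yellowness index only as a normalized ratio of yellow to green foliage ranging from -1 to +1. One form consistent with that description is (Y - G) / (Y + G), where Y and G are counts of yellow and green canopy pixels. The sketch below assumes that form; the exact definition used in the study may differ, and the function name is hypothetical:

```python
def yellowness_index(yellow_pixels: int, green_pixels: int) -> float:
    """Normalized yellow-vs-green ratio in [-1, +1].

    Assumed form: (Y - G) / (Y + G). Returns -1 for a fully green canopy
    and +1 for a fully yellow canopy.
    """
    total = yellow_pixels + green_pixels
    if total == 0:
        raise ValueError("canopy has no yellow or green pixels")
    return (yellow_pixels - green_pixels) / total
```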
[LINK]
http://arxiv.org/abs/2404.14653v3
[DATE]
2024-11-18 14:03:47+08:00
[CATEGORIES]
cs.LG
Recurrent Stochastic Configuration Networks with Incremental Blocks
[AUTHORS]
Gang Dang, Dianhui Wang
[ABSTRACT]
Recurrent stochastic configuration networks (RSCNs) have shown promise in
modelling nonlinear dynamic systems with order uncertainty due to their
advantages of easy implementation, less human intervention, and strong
approximation capability. This paper develops the original RSCNs with block
increments, termed block RSCNs (BRSCNs), to further enhance the learning
capacity and efficiency of the network. BRSCNs can simultaneously add multiple
reservoir nodes (subreservoirs) during the construction. Each subreservoir is
configured with a unique structure in the light of a supervisory mechanism,
ensuring the universal approximation property. The reservoir feedback matrix is
appropriately scaled to guarantee the echo state property of the network.
Furthermore, the output weights are updated online using a projection
algorithm, and the persistent excitation conditions that facilitate parameter
convergence are also established. Numerical results over a time series
prediction, a nonlinear system identification task, and two industrial data
predictive analyses demonstrate that the proposed BRSCN performs favourably in
terms of modelling efficiency, learning, and generalization performance,
highlighting their significant potential for coping with complex dynamics.
[LINK]
http://arxiv.org/abs/2411.11303v1
[DATE]
2024-11-18 13:58:47+08:00
[CATEGORIES]
cs.LG
Accelerating spherical K-means clustering for large-scale sparse document data
[AUTHORS]
Kazuo Aoyama, Kazumi Saito
[ABSTRACT]
This paper presents an accelerated spherical K-means clustering algorithm for
large-scale and high-dimensional sparse document data sets. We design an
algorithm working in an architecture-friendly manner (AFM), which is a
procedure of suppressing performance-degradation factors such as the numbers of
instructions, branch mispredictions, and cache misses in CPUs of a modern
computer system. For the AFM operation, we leverage unique universal
characteristics (UCs) of a data-object and a cluster’s mean set, which are
skewed distributions on data relationships such as Zipf’s law and a
feature-value concentration phenomenon. The UCs indicate that most of the
multiplications for similarity calculations are executed on terms with high
document frequencies (df), and that most of the similarity between an
object-feature vector and a mean-feature vector is obtained from
multiplications involving a few high mean-feature values. Our proposed
algorithm applies an inverted-index data structure to a mean set, extracts the
specific region with high-df terms and high mean-feature values in the
mean-inverted index using two newly introduced structural parameters, and
exploits the index divided into three parts for efficient pruning. The
algorithm determines the two structural parameters by minimizing the
approximate number of multiplications, which is related to the number of
instructions, reduces branch mispredictions by sharing the index structure
including the two parameters with all the objects, and suppresses cache misses
by keeping the frequently used data of the foregoing specific region in the
caches, so that it works in the AFM. We experimentally
demonstrate that our algorithm efficiently achieves superior speed performance
in large-scale documents compared with algorithms using the state-of-the-art
techniques.
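The core idea of applying an inverted index to the mean set can be sketched in a few lines of Python. This is only the baseline assignment step, assuming unit-normalized sparse vectors so that the inner product equals cosine similarity. It omits the two structural parameters, the three-part index split, and the pruning described in the abstract, and all names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # term id -> feature value
MeanIndex = Dict[int, List[Tuple[int, float]]]  # term id -> (cluster, value)


def build_mean_inverted_index(means: List[SparseVec]) -> MeanIndex:
    """Map each term to the clusters whose mean has a non-zero value for it."""
    index = defaultdict(list)
    for cluster_id, mean in enumerate(means):
        for term, value in mean.items():
            if value != 0.0:
                index[term].append((cluster_id, value))
    return dict(index)


def assign(doc: SparseVec, index: MeanIndex, n_clusters: int) -> int:
    """Return the cluster whose mean has the largest inner product with doc."""
    sims = [0.0] * n_clusters
    # Only terms present in the document trigger multiplications.
    for term, doc_value in doc.items():
        for cluster_id, mean_value in index.get(term, []):
            sims[cluster_id] += doc_value * mean_value
    return max(range(n_clusters), key=sims.__getitem__)
```

Because the posting list of a high-df term is long, most multiplications happen on those terms, which is the skew the abstract exploits.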
[COMMENTS]
28 pages, 23 figures
[LINK]
http://arxiv.org/abs/2411.11300v1
[DATE]
2024-11-18 13:50:58+08:00
[CATEGORIES]
cs.LG
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
[AUTHORS]
Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin
[ABSTRACT]
In tabular prediction tasks, tree-based models combined with automated
feature engineering methods often outperform deep learning approaches that rely
on learned representations. While these feature engineering techniques are
effective, they typically depend on a pre-defined search space and primarily
use validation scores for feature selection, thereby missing valuable insights
from previous experiments. To address these limitations, we propose a novel
tabular learning framework that utilizes large language models (LLMs), termed
Optimizing Column feature generator with decision Tree reasoning (OCTree). Our
key idea is to leverage the reasoning capabilities of LLMs to identify
effective feature generation rules without manually specifying the search space
and provide language-based reasoning information highlighting past experiments
as feedback for iterative rule improvements. We use decision trees to convey
this reasoning information, as they can be easily represented in natural
language, effectively providing knowledge from prior experiments (i.e., the
impact of the generated features on performance) to the LLMs. Our empirical
results demonstrate that OCTree consistently enhances the performance of
various prediction models across diverse benchmarks, outperforming competing
automated feature engineering methods. Code is available at
https://github.com/jaehyun513/OCTree.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.08527v2
[DATE]
2024-11-18 13:47:10+08:00
[CATEGORIES]
cs.LG
Steering Language Model Refusal with Sparse Autoencoders
[AUTHORS]
Kyle O’Brien, David Majercak, Xavier Fernandes, Richard Edgar, Jingya Chen, Harsha Nori, Dean Carignan, Eric Horvitz, Forough Poursabzi-Sangdeh
[ABSTRACT]
Responsible practices for deploying language models include guiding models to
recognize and refuse answering prompts that are considered unsafe, while
complying with safe prompts. Achieving such behavior typically requires
updating model weights, which is costly and inflexible. We explore
opportunities to steer model activations at inference time, which does not
require updating weights. Using sparse autoencoders, we identify and steer
features in Phi-3 Mini that mediate refusal behavior. We find that feature
steering can improve Phi-3 Mini’s robustness to jailbreak attempts across
various harms, including challenging multi-turn attacks. However, we discover
that feature steering can adversely affect overall performance on benchmarks.
These results suggest that identifying steerable mechanisms for refusal via
sparse autoencoders is a promising approach for enhancing language model
safety, but that more research is needed to mitigate feature steering’s adverse
effects on performance.
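Steering a sparse-autoencoder feature at inference time is often done by clamping the feature to a target value and adding the resulting change back into the activation along the feature's decoder direction. The sketch below illustrates that generic recipe in plain Python. It is an assumption for illustration, not necessarily the procedure used in the paper, and every name in it is hypothetical:

```python
from typing import List


def steer_activation(
    x: List[float],
    encoder_row: List[float],
    encoder_bias: float,
    decoder_dir: List[float],
    target: float,
) -> List[float]:
    """Clamp one SAE feature to `target` and return the edited activation.

    Assumes a ReLU autoencoder: feature = ReLU(encoder_row . x + encoder_bias).
    """
    pre_activation = sum(w * xi for w, xi in zip(encoder_row, x)) + encoder_bias
    current = max(0.0, pre_activation)
    # Shift the activation along the decoder direction by the clamp difference.
    delta = target - current
    return [xi + delta * di for xi, di in zip(x, decoder_dir)]
```

No model weights change: only the activation vector passed to later layers is edited.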
[LINK]
http://arxiv.org/abs/2411.11296v1
[DATE]
2024-11-18 13:47:02+08:00
[CATEGORIES]
cs.LG
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
[AUTHORS]
Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler
[ABSTRACT]
Despite recent advances, developing general-purpose universal denoising and
artifact-removal networks remains largely an open problem: Given fixed network
weights, one inherently trades off specialization at one task (e.g., removing
Poisson noise) for performance at another (e.g., removing speckle noise). In
addition, training such a network is challenging due to the curse of
dimensionality: As one increases the dimensions of the specification-space
(i.e., the number of parameters needed to describe the noise distribution) the
number of unique specifications one needs to train for grows exponentially.
Uniformly sampling this space will result in a network that does well at very
challenging problem specifications but poorly at easy problem specifications,
where even large errors will have a small effect on the overall mean squared
error.
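The exponential growth of the specification space is easy to see by enumerating a grid. In the sketch below, each noise parameter is discretized into a fixed list of levels (the levels themselves are illustrative assumptions), so d parameters with k levels each yield k**d specifications:

```python
from itertools import product
from typing import List, Sequence, Tuple


def specification_grid(levels_per_dim: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Enumerate every combination of noise-parameter levels."""
    return list(product(*levels_per_dim))


# Hypothetical levels for three noise parameters, e.g. Poisson, Gaussian, speckle.
levels = [0.0, 0.5, 1.0]
grid = specification_grid([levels, levels, levels])
print(len(grid))  # 3 levels in 3 dimensions give 3**3 combinations
```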
In this work we propose training denoising networks using an
adaptive-sampling/active-learning strategy. Our work improves upon a recently
proposed universal denoiser training strategy by extending these results to
higher dimensions and by incorporating a polynomial approximation of the true
specification-loss landscape. This approximation allows us to reduce training
times by almost two orders of magnitude. We test our method on simulated joint
Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training
strategy, a single blind, generalist denoiser network can achieve peak
signal-to-noise ratios within a uniform bound of specialized denoiser networks
across a large range of operating conditions. We also capture a small dataset
of images with varying amounts of joint Poisson-Gaussian-Speckle noise and
demonstrate that a universal denoiser trained using our adaptive-sampling
strategy outperforms uniformly trained baselines.
[COMMENTS]
IEEE TIP 2024
[LINK]
http://arxiv.org/abs/2310.20064v2
[DATE]
2024-11-18 13:09:42+08:00
[CATEGORIES]
cs.LG
Dual-Frequency Filtering Self-aware Graph Neural Networks for Homophilic and Heterophilic Graphs
[AUTHORS]
Yachao Yang, Yanfeng Sun, Jipeng Guo, Junbin Gao, Shaofan Wang, Fujiao Ju, Baocai Yin
[ABSTRACT]
Graph Neural Networks (GNNs) have excelled in handling graph-structured data,
attracting significant research interest. However, two primary challenges have
emerged: interference between topology and attributes distorting node
representations, and the low-pass filtering nature of most GNNs leading to the
oversight of valuable high-frequency information in graph signals. These issues
are particularly pronounced in heterophilic graphs. To address these
challenges, we propose Dual-Frequency Filtering Self-aware Graph Neural
Networks (DFGNN). DFGNN integrates low-pass and high-pass filters to extract
smooth and detailed topological features, using frequency-specific constraints
to minimize noise and redundancy in the respective frequency bands. The model
dynamically adjusts filtering ratios to accommodate both homophilic and
heterophilic graphs. Furthermore, DFGNN mitigates interference by aligning
topological and attribute representations through dynamic correspondences
between their respective frequency bands, enhancing overall model performance
and expressiveness. Extensive experiments conducted on benchmark datasets
demonstrate that DFGNN outperforms state-of-the-art methods in classification
performance, highlighting its effectiveness in handling both homophilic and
heterophilic graphs.
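A common way to realize low-pass and high-pass graph filters is with the symmetrically normalized adjacency matrix for smoothing and the normalized Laplacian for detail. The plain-Python sketch below assumes those textbook filters; DFGNN's actual filters and constraints may differ, and the function names are hypothetical:

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def low_pass(adj: Matrix, x: List[float]) -> List[float]:
    """Smooth the signal by averaging over neighbors: D^{-1/2} A D^{-1/2} x."""
    a_hat = normalized_adjacency(adj)
    return [sum(a * xj for a, xj in zip(row, x)) for row in a_hat]


def high_pass(adj: Matrix, x: List[float]) -> List[float]:
    """Keep the differences to neighbors: (I - D^{-1/2} A D^{-1/2}) x."""
    smooth = low_pass(adj, x)
    return [xi - si for xi, si in zip(x, smooth)]
```

On a two-node graph, a constant signal passes the low-pass filter unchanged and is removed by the high-pass filter, while an alternating signal is amplified by the high-pass filter.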
[COMMENTS]
11 pages, 17 figures
[LINK]
http://arxiv.org/abs/2411.11284v1
[DATE]
2024-11-18 12:57:05+08:00
[CATEGORIES]
cs.LG
Multi-Hyperbolic Space-based Heterogeneous Graph Attention Network
[AUTHORS]
Jongmin Park, Seunghoon Han, Jong-Ryul Lee, Sungsu Lim
[ABSTRACT]
To leverage the complex structures within heterogeneous graphs, recent
studies on heterogeneous graph embedding use a hyperbolic space, characterized
by a constant negative curvature and exponentially increasing space, which
aligns with the structural properties of heterogeneous graphs. However, despite
heterogeneous graphs inherently possessing diverse power-law structures, most
hyperbolic heterogeneous graph embedding models use a single hyperbolic space
for the entire heterogeneous graph, which may not effectively capture the
diverse power-law structures within the heterogeneous graph. To address this
limitation, we propose Multi-hyperbolic Space-based heterogeneous Graph
Attention Network (MSGAT), which uses multiple hyperbolic spaces to effectively
capture diverse power-law structures within heterogeneous graphs. We conduct
comprehensive experiments to evaluate the effectiveness of MSGAT. The
experimental results demonstrate that MSGAT outperforms state-of-the-art
baselines in various graph machine learning tasks, effectively capturing the
complex structures of heterogeneous graphs.
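Using multiple hyperbolic spaces amounts to measuring distances under different curvatures. As an illustration, the sketch below computes the geodesic distance in a Poincaré ball of curvature -c. The choice of the Poincaré ball model is an assumption here (the abstract does not name the model), and the function name is hypothetical:

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float], c: float = 1.0) -> float:
    """Geodesic distance in the Poincare ball with curvature -c (c > 0)."""

    def sq_norm(w: Sequence[float]) -> float:
        return sum(x * x for x in w)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - c * sq_norm(u)) * (1.0 - c * sq_norm(v))
    if denom <= 0.0:
        raise ValueError("points must lie strictly inside the ball of radius 1/sqrt(c)")
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)
```

Distances grow rapidly as points approach the boundary of the ball, which is what lets hyperbolic embeddings accommodate power-law, tree-like structure.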
[COMMENTS]
Accepted in IEEE ICDM 2024
[LINK]
http://arxiv.org/abs/2411.11283v1
[DATE]
2024-11-18 12:55:26+08:00
[CATEGORIES]
cs.LG
Coupled Integral PINN for conservation law
[AUTHORS]
Yeping Wang, Shihao Yang
[ABSTRACT]
The Physics-Informed Neural Network (PINN) is an innovative approach to solve
a diverse array of partial differential equations (PDEs) leveraging the power
of neural networks. This is achieved by minimizing the residual loss associated
with the explicit physical information, usually coupled with data derived from
initial and boundary conditions. However, a challenge arises in the context of
nonlinear conservation laws where derivatives are undefined at shocks, leading
to solutions that deviate from the true physical phenomena. To solve this
issue, the physical solution must be extracted from the weak formulation of the
PDE and is typically further bounded by entropy conditions. Within the
numerical framework, finite volume methods (FVM) are employed to address
conservation laws. These methods resolve the integral form of conservation laws
and delineate the shock characteristics. Inspired by the principles underlying
FVM, this paper introduces a novel Coupled Integrated PINN methodology that
involves fitting the integral solutions of equations using additional neural
networks. This technique not only augments the conventional PINN’s capability
in modeling shock waves, but also eliminates the need for spatial and temporal
discretization. As such, it bypasses the complexities of numerical integration
and reconstruction associated with non-convex fluxes. Finally, we show that the
proposed new Integrated PINN performs well on conservation laws and outperforms
the vanilla PINN when tackling challenging shock problems, using examples of
Burgers’ equation, the Buckley-Leverett equation, and the Euler system.
[LINK]
http://arxiv.org/abs/2411.11276v1
[DATE]
2024-11-18 12:32:42+08:00
[CATEGORIES]
cs.LG
Incorporating Arbitrary Matrix Group Equivariance into KANs
[AUTHORS]
Lexiang Hu, Yisen Wang, Zhouchen Lin
[ABSTRACT]
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific
domains thanks to spline activation functions, becoming an alternative to
Multi-Layer Perceptrons (MLPs). However, spline functions may not respect
symmetry in tasks, which is crucial prior knowledge in machine learning.
Previously, equivariant networks embed symmetry into their architectures,
achieving better performance in specific applications. Among these, Equivariant
Multi-Layer Perceptrons (EMLP) introduce arbitrary matrix group equivariance
into MLPs, providing a general framework for constructing equivariant networks
layer by layer. In this paper, we propose Equivariant Kolmogorov-Arnold
Networks (EKAN), a method for incorporating matrix group equivariance into
KANs, aiming to broaden their applicability to more fields. First, we construct
gated spline basis functions, which form the EKAN layer together with
equivariant linear weights. We then define a lift layer to align the input
space of EKAN with the feature space of the dataset, thereby building the
entire EKAN architecture. Compared with baseline models, EKAN achieves higher
accuracy with smaller datasets or fewer parameters on symmetry-related tasks,
such as particle scattering and the three-body problem, often reducing test MSE
by several orders of magnitude. Even in non-symbolic formula scenarios, such as
top quark tagging with three jet constituents, EKAN achieves comparable results
with EMLP using only $26\%$ of the parameters, while KANs do not outperform
MLPs as expected.
[LINK]
http://arxiv.org/abs/2410.00435v2
[DATE]
2024-11-18 12:28:09+08:00
[CATEGORIES]
cs.LG
Effective Predictive Modeling for Emergency Department Visits and Evaluating Exogenous Variables Impact: Using Explainable Meta-learning Gradient Boosting
[AUTHORS]
Mehdi Neshat, Michael Phipps, Nikhil Jha, Danial Khojasteh, Michael Tong, Amir Gandomi
[ABSTRACT]
Over an extensive duration, administrators and clinicians have endeavoured to
predict Emergency Department (ED) visits with precision, aiming to optimise
resource distribution. Despite the proliferation of diverse AI-driven models
tailored for precise prognostication, this task persists as a formidable
challenge, besieged by constraints such as restrained generalisability,
susceptibility to overfitting and underfitting, scalability issues, and complex
fine-tuning hyper-parameters. In this study, we introduce a novel Meta-learning
Gradient Booster (Meta-ED) approach for precisely forecasting daily ED visits
and leveraging a comprehensive dataset of exogenous variables, including
socio-demographic characteristics, healthcare service use, chronic diseases,
diagnosis, and climate parameters spanning 23 years from Canberra Hospital in
ACT, Australia. The proposed Meta-ED consists of four foundational
learners (Catboost, Random Forest, Extra Tree, and lightGBoost) alongside a
dependable top-level learner, a Multi-Layer Perceptron (MLP), which combines the
unique capabilities of the varied base models (sub-learners). Our study assesses
the efficacy of the Meta-ED model through an extensive comparative analysis
involving 23 models. The evaluation outcomes reveal a notable superiority of
Meta-ED over the other models in accuracy at 85.7% (95% CI: 85.4%, 86.0%) and
across a spectrum of 10 evaluation metrics. Notably, when compared with
prominent techniques, XGBoost, Random Forest (RF), AdaBoost, LightGBoost, and
Extra Tree (ExT), Meta-ED showcases substantial accuracy enhancements of 58.6%,
106.3%, 22.3%, 7.0%, and 15.7%, respectively. Furthermore, incorporating
weather-related features demonstrates a 3.25% improvement in the prediction
accuracy of visitors’ numbers. The encouraging outcomes of our study underscore
Meta-ED as a foundation model for the precise prediction of daily ED visitors.
[LINK]
http://arxiv.org/abs/2411.11275v1
[DATE]
2024-11-18 12:23:20+08:00
[CATEGORIES]
cs.LG
ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses
[AUTHORS]
Oliver Watt-Meyer, Brian Henn, Jeremy McGibbon, Spencer K. Clark, Anna Kwa, W. Andre Perkins, Elynn Wu, Lucas Harris, Christopher S. Bretherton
[ABSTRACT]
Existing machine learning models of weather variability are not formulated to
enable assessment of their response to varying external boundary conditions
such as sea surface temperature and greenhouse gases. Here we present ACE2 (Ai2
Climate Emulator version 2) and its application to reproducing atmospheric
variability over the past 80 years on timescales from days to decades. ACE2 is
a 450M-parameter autoregressive machine learning emulator, operating with
6-hour temporal resolution, 1° horizontal resolution and eight vertical
layers. It exactly conserves global dry air mass and moisture and can be
stepped forward stably for arbitrarily many steps with a throughput of about
1500 simulated years per wall clock day. ACE2 generates emergent phenomena such
as tropical cyclones, the Madden Julian Oscillation, and sudden stratospheric
warmings. Furthermore, it accurately reproduces the atmospheric response to El
Niño variability and global trends of temperature over the past 80 years.
However, its sensitivities to separately changing sea surface temperature and
carbon dioxide are not entirely realistic.
[COMMENTS]
31 pages, 23 figures
[LINK]
http://arxiv.org/abs/2411.11268v1
[DATE]
2024-11-18 11:57:07+08:00
[CATEGORIES]
cs.LG
Optimal and Fair Encouragement Policy Evaluation and Learning
[AUTHORS]
Angela Zhou
[ABSTRACT]
In consequential domains, it is often impossible to compel individuals to
take treatment, so that optimal policy rules are merely suggestions in the
presence of human non-adherence to treatment recommendations. Under
heterogeneity, covariates may predict take-up of treatment and final outcome,
but differently. While optimal treatment rules optimize causal outcomes across
the population, access parity constraints or other fairness considerations on
who receives treatment can be important. For example, in social services, a
persistent puzzle is the gap in take-up of beneficial services among those who
may benefit from them the most. We study causal identification and robust
estimation of optimal treatment rules, including under potential violations of
positivity. We consider fairness constraints such as demographic parity in
treatment take-up, and other constraints, via constrained optimization. Our
framework can be extended to handle algorithmic recommendations under an
often-reasonable covariate-conditional exclusion restriction, using our
robustness checks for lack of positivity in the recommendation. We develop a
two-stage algorithm for solving over parametrized policy classes under general
constraints to obtain variance-sensitive regret bounds. We illustrate the
methods in three case studies based on data from reminders of SNAP benefits
recertification, randomized encouragement to enroll in insurance, and from
pretrial supervised release with electronic monitoring. While the specific
remedy to inequities in algorithmic allocation is context-specific, it requires
studying both take-up of decisions and downstream outcomes of them.
[COMMENTS]
Updated with major new case study on SNAP recertification benefits
[LINK]
http://arxiv.org/abs/2309.07176v3
[DATE]
2024-11-18 11:40:52+08:00
[CATEGORIES]
cs.LG
GROOT: Effective Design of Biological Sequences with Limited Experimental Data
[AUTHORS]
Thanh V. T. Tran, Nhat Khang Ngo, Viet Anh Nguyen, Truong Son Hy
[ABSTRACT]
Latent space optimization (LSO) is a powerful method for designing discrete,
high-dimensional biological sequences that maximize expensive black-box
functions, such as wet lab experiments. This is accomplished by learning a
latent space from available data and using a surrogate model to guide
optimization algorithms toward optimal outputs. However, existing methods
struggle when labeled data is limited, as training the surrogate model with few
labeled data points can lead to subpar outputs, offering no advantage over the
training data itself. We address this challenge by introducing GROOT, a
Graph-based Latent Smoothing for Biological Sequence Optimization. In
particular, GROOT generates pseudo-labels for neighbors sampled around the
training latent embeddings. These pseudo-labels are then refined and smoothed
by Label Propagation. Additionally, we theoretically and empirically justify
our approach, demonstrating GROOT’s ability to extrapolate to regions beyond the
training set while maintaining reliability within an upper bound of their
expected distances from the training regions. We evaluate GROOT on various
biological sequence design tasks, including protein optimization (GFP and AAV)
and three tasks with exact oracles from Design-Bench. The results demonstrate
that GROOT matches or surpasses existing methods without requiring access to
black-box oracles or vast amounts of labeled data, highlighting its
practicality and effectiveness. We release our code at
https://anonymous.4open.science/r/GROOT-D554
[LINK]
http://arxiv.org/abs/2411.11265v1
[DATE]
2024-11-18 11:38:42+08:00
[CATEGORIES]
cs.LG
Graph Retention Networks for Dynamic Graphs
[AUTHORS]
Qian Chang, Xia Li, Xiufeng Cheng
[ABSTRACT]
In this work, we propose the Graph Retention Network (GRN) as a unified
architecture for deep learning on dynamic graphs. The GRN extends the core
computational mechanism of retention to dynamic graph data as graph retention, which empowers
the model with three key computational paradigms that enable training
parallelism, $O(1)$ low-cost inference, and long-term batch training. This
architecture achieves an optimal balance of effectiveness, efficiency, and
scalability. Extensive experiments conducted on benchmark datasets present the
superior performance of the GRN in both edge-level prediction and node-level
classification tasks. Our architecture achieves cutting-edge results while
maintaining lower training latency, reduced GPU memory consumption, and up to
an 86.7x improvement in inference throughput compared to baseline models. The
GRNs have demonstrated strong potential to become a widely adopted architecture
for dynamic graph learning tasks. Code will be available at
https://github.com/Chandler-Q/GraphRetentionNet.
[LINK]
http://arxiv.org/abs/2411.11259v1
[DATE]
2024-11-18 11:28:11+08:00
[CATEGORIES]
cs.LG
DecoR: Deconfounding Time Series with Robust Regression
[AUTHORS]
Felix Schur, Jonas Peters
[ABSTRACT]
Causal inference on time series data is a challenging problem, especially in
the presence of unobserved confounders. This work focuses on estimating the
causal effect between two time series that are confounded by a third,
unobserved time series. Assuming spectral sparsity of the confounder, we show
how in the frequency domain this problem can be framed as an adversarial
outlier problem. We introduce Deconfounding by Robust regression (DecoR), a
novel approach that estimates the causal effect using robust linear regression
in the frequency domain. Considering two different robust regression
techniques, we first improve existing bounds on the estimation error for such
techniques. Crucially, our results do not require distributional assumptions on
the covariates. We can therefore use them in time series settings. Applying
these results to DecoR, we prove, under suitable assumptions, upper bounds for
the estimation error of DecoR that imply consistency. We demonstrate DecoR’s
effectiveness through experiments on both synthetic and real-world data from
Earth system science. The simulation experiments furthermore suggest that DecoR
is robust with respect to model misspecification.
[COMMENTS]
27 pages, 7 figures
[LINK]
http://arxiv.org/abs/2406.07005v2
[DATE]
2024-11-18 11:02:13+08:00
[CATEGORIES]
cs.LG
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
[AUTHORS]
Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
[ABSTRACT]
Grokking has been actively explored to reveal the mystery of delayed
generalization, and identifying interpretable representations and algorithms
inside the grokked models is a suggestive hint to understanding its mechanism.
Grokking on modular addition has been known to implement Fourier representation
and its calculation circuits with trigonometric identities in Transformers.
Considering the periodicity in modular arithmetic, the natural question is to
what extent these explanations and interpretations hold for the grokking on
other modular operations beyond addition. For a closer look, we first
hypothesize that any modular operations can be characterized with distinctive
Fourier representation or internal circuits, grokked models obtain common
features transferable among similar operations, and mixing datasets with
similar operations promotes grokking. Then, we extensively examine them by
learning Transformers on complex modular arithmetic tasks, including
polynomials. Our Fourier analysis and novel progress measure for modular
arithmetic, Fourier Frequency Density and Fourier Coefficient Ratio,
characterize distinctive internal representations of grokked models per modular
operation; for instance, polynomials often result in the superposition of the
Fourier components seen in elementary arithmetic, but clear patterns do not
emerge in challenging non-factorizable polynomials. In contrast, our ablation
study on the pre-grokked models reveals that the transferability among the
models grokked with each operation can be only limited to specific
combinations, such as from elementary arithmetic to linear expressions.
Moreover, some multi-task mixtures may lead to co-grokking – where grokking
simultaneously happens for all the tasks – and accelerate generalization,
while others may not find optimal solutions. We provide empirical steps towards
the interpretability of internal circuits.
[COMMENTS]
Published at Transactions on Machine Learning Research (TMLR), Code:
https://github.com/frt03/grok_mod_poly
[LINK]
http://arxiv.org/abs/2402.16726v3
[DATE]
2024-11-18 10:56:27+08:00
[CATEGORIES]
cs.LG
A Fair Loss Function for Network Pruning
[AUTHORS]
Robbie Meyer, Alexander Wong
[COMMENTS]
[v1] Trustworthy and Socially Responsible Machine Learning (TSRML
2022) workshop co-located with NeurIPS 2022
[LINK]
http://arxiv.org/abs/2211.10285v2
[DATE]
2024-11-18 10:50:46+08:00
[CATEGORIES]
cs.LG
SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion
[AUTHORS]
Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan
[ABSTRACT]
Multivariate time series forecasting plays a crucial role in various fields
such as finance, traffic management, energy, and healthcare. Recent studies
have highlighted the advantages of channel independence to resist distribution
drift but neglect channel correlations, limiting further enhancements. Several
methods utilize mechanisms like attention or mixer to address this by capturing
channel correlations, but they either introduce excessive complexity or rely
too heavily on the correlation to achieve satisfactory results under
distribution drifts, particularly with a large number of channels. Addressing
this gap, this paper presents an efficient MLP-based model, the Series-cOre
Fused Time Series forecaster (SOFTS), which incorporates a novel STar
Aggregate-Redistribute (STAR) module. Unlike traditional approaches that manage
channel interactions through distributed structures, e.g., attention,
STAR employs a centralized strategy to improve efficiency and reduce reliance
on the quality of each channel. It aggregates all series to form a global core
representation, which is then dispatched and fused with individual series
representations to facilitate channel interactions effectively. SOFTS achieves
superior performance over existing state-of-the-art methods with only linear
complexity. The broad applicability of the STAR module across different
forecasting models is also demonstrated empirically. For further research and
development, we have made our code publicly available at
https://github.com/Secilia-Cxy/SOFTS.
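The aggregate-then-redistribute idea described above can be sketched in a few lines. This is an illustration only: mean pooling as the aggregator and concatenation as the fusion step are assumptions made here, and `star_sketch` is a hypothetical name, not the API of the linked repository.

```python
def star_sketch(series_reprs):
    """Fuse each channel representation with one shared core vector.

    series_reprs: list of C per-channel vectors (lists of floats), all of length d.
    Returns C vectors of length 2 * d.
    """
    num_channels = len(series_reprs)
    dim = len(series_reprs[0])
    # Aggregate: a single pass over the channels yields one global core.
    # Mean pooling is an assumption of this sketch.
    core = [sum(vec[j] for vec in series_reprs) / num_channels for j in range(dim)]
    # Redistribute: every channel is fused (here, concatenated) with the same core.
    return [vec + core for vec in series_reprs]
```

Because every channel interacts only with the shared core, the cost grows linearly with the number of channels, in contrast to pairwise schemes such as attention.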
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2404.14197v3
[DATE]
2024-11-18 10:41:59+08:00
[CATEGORIES]
cs.LG
Physics-informed Machine Learning for Battery Pack Thermal Management
[AUTHORS]
Zheng Liu, Yuan Jiang, Yumeng Li, Pingfeng Wang
[ABSTRACT]
With the popularity of electric vehicles, the demand for lithium-ion
batteries is increasing. Temperature significantly influences the performance
and safety of batteries. Battery thermal management systems can effectively
control the temperature of batteries; therefore, the performance and safety can
be ensured. However, the development process of battery thermal management
systems is time-consuming and costly because data-driven models need extensive
training datasets, which in turn require enormous computational cost for finite
element analysis. Therefore, a new approach to constructing surrogate models is
needed in the era of AI. Physics-informed machine learning enforces the
physical laws in surrogate models, making it the perfect candidate for
estimating battery pack temperature distribution. In this study, we first
developed a 21700 battery pack indirect liquid cooling system with cold plates
on the top and bottom with thermal paste surrounding the battery cells. Then,
the simplified finite element model was built based on experiment results. Due
to the high coolant flow rate, the cold plates can be considered as constant
temperature boundaries, while battery cells are the heat sources. The
physics-informed convolutional neural network served as a surrogate model to
estimate the temperature distribution of the battery pack. The loss function
was constructed considering the heat conduction equation based on the finite
difference method. The physics-informed loss function helped the convergence of
the training process with less data. As a result, the physics-informed
convolutional neural network showed more than 15 percent improvement in
accuracy compared to the data-driven method with the same training data.
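A finite-difference physics loss of the kind described above can be sketched as follows. The abstract does not give the exact equation, so this sketch assumes steady-state 2D heat conduction with a source term, k * laplacian(T) + q = 0, on a uniform grid; the function name, the conductivity `k`, and the spacing `h` are hypothetical.

```python
def heat_residual_loss(temperature, source, k=1.0, h=1.0):
    """Mean squared residual of k * laplacian(T) + q = 0 on interior grid nodes.

    temperature, source: 2D lists of equal shape (at least 3 x 3).
    """
    rows, cols = len(temperature), len(temperature[0])
    if rows < 3 or cols < 3:
        raise ValueError("grid must be at least 3 x 3 to have interior nodes")
    total, count = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point stencil approximation of the Laplacian.
            lap = (temperature[i + 1][j] + temperature[i - 1][j]
                   + temperature[i][j + 1] + temperature[i][j - 1]
                   - 4.0 * temperature[i][j]) / (h * h)
            residual = k * lap + source[i][j]
            total += residual * residual
            count += 1
    return total / count
```

In training, a term like this would be added to the data loss on the network's predicted temperature field, penalizing predictions that violate the conduction equation even where no labels exist.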
[LINK]
http://arxiv.org/abs/2411.09915v2
[DATE]
2024-11-18 10:27:04+08:00
[CATEGORIES]
cs.LG
Mirror Descent on Reproducing Kernel Banach Spaces
[AUTHORS]
Akash Kumar, Mikhail Belkin, Parthe Pandit
[ABSTRACT]
Recent advances in machine learning have led to increased interest in
reproducing kernel Banach spaces (RKBS) as a more general framework that
extends beyond reproducing kernel Hilbert spaces (RKHS). These works have
resulted in the formulation of representer theorems under several regularized
learning schemes. However, little is known about an optimization method that
encompasses these results in this setting. This paper addresses a learning
problem on Banach spaces endowed with a reproducing kernel, focusing on
efficient optimization within RKBS. To tackle this challenge, we propose an
algorithm based on mirror descent (MDA). Our approach involves an iterative
method that employs gradient steps in the dual space of the Banach space using
the reproducing kernel.
We analyze the convergence properties of our algorithm under various
assumptions and establish two types of results: first, we identify conditions
under which a linear convergence rate is achievable, akin to optimization in
the Euclidean setting, and provide a proof of the linear rate; second, we
demonstrate a standard convergence rate in a constrained setting. Moreover, to
instantiate this algorithm in practice, we introduce a novel family of RKBSs
with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a
kernel.
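As a finite-dimensional analogue of taking gradient steps in the dual space, the sketch below runs one mirror descent step with the mirror map psi(x) = 0.5 * ||x||_p^2, whose inverse is the same map with the conjugate exponent q = p / (p - 1). It is not the RKBS algorithm of the paper; the function names are hypothetical and 1 < p < infinity is assumed.

```python
import math


def dual_map(x, p):
    """Gradient of psi(x) = 0.5 * ||x||_p**2, assuming 1 < p < infinity."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        return [0.0 for _ in x]
    scale = norm ** (2.0 - p)
    return [math.copysign(abs(v) ** (p - 1.0), v) * scale for v in x]


def mirror_step(x, grad, eta, p):
    """One mirror descent step: map to the dual, step there, map back."""
    q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1
    dual = dual_map(x, p)
    dual = [d - eta * g for d, g in zip(dual, grad)]
    # The inverse of the p-norm dual map is the q-norm dual map.
    return dual_map(dual, q)
```

With p = 2 both maps reduce to the identity and the step becomes plain gradient descent, which gives a quick sanity check.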
[COMMENTS]
42 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.11242v1
[DATE]
2024-11-18 10:18:32+08:00
[CATEGORIES]
cs.LG
Reliable Learning of Halfspaces under Gaussian Marginals
[AUTHORS]
Ilias Diakonikolas, Lisheng Ren, Nikos Zarifis
[ABSTRACT]
We study the problem of PAC learning halfspaces in the reliable agnostic
model of Kalai et al. (2012). The reliable PAC model captures learning
scenarios where one type of error is costlier than the others. Our main
positive result is a new algorithm for reliable learning of Gaussian halfspaces
on $\mathbb{R}^d$ with sample and computational complexity
$d^{O(\log(\min\{1/\alpha, 1/\epsilon\}))} \min(2^{\log(1/\epsilon)^{O(\log(1/\alpha))}}, 2^{\mathrm{poly}(1/\epsilon)})$,
where $\epsilon$ is the excess error and $\alpha$ is the bias of the optimal
halfspace. We complement
our upper bound with a Statistical Query lower bound suggesting that the
$d^{\Omega(\log (1/\alpha))}$ dependence is best possible. Conceptually, our
results imply a strong computational separation between reliable agnostic
learning and standard agnostic learning of halfspaces in the Gaussian setting.
[LINK]
http://arxiv.org/abs/2411.11238v1
[DATE]
2024-11-18 10:13:11+08:00
[CATEGORIES]
cs.LG
Autoregressive Action Sequence Learning for Robotic Manipulation
[AUTHORS]
Xinyu Zhang, Yuhan Liu, Haonan Chang, Liam Schramm, Abdeslam Boularias
[ABSTRACT]
Designing a universal policy architecture that performs well across diverse
robots and task configurations remains a key challenge. In this work, we
address this by representing robot actions as sequential data and generating
actions through autoregressive sequence modeling. Existing autoregressive
architectures generate end-effector waypoints sequentially as word tokens in
language modeling, which are limited to low-frequency control tasks. Unlike
language, robot actions are heterogeneous and often include continuous values
– such as joint positions, 2D pixel coordinates, and end-effector poses –
which are not easily suited for language-based modeling. Based on this insight,
we introduce a straightforward enhancement: we extend causal transformers’
single-token prediction to support predicting a variable number of tokens in a
single step through our Chunking Causal Transformer (CCT). This enhancement
enables robust performance across diverse tasks of various control frequencies,
greater efficiency by having fewer autoregression steps, and leads to a hybrid
action sequence design by mixing different types of actions and using a
different chunk size for each action type. Based on CCT, we propose the
Autoregressive Policy (ARP) architecture, which solves manipulation tasks by
generating hybrid action sequences. We evaluate ARP across diverse robotic
manipulation environments, including Push-T, ALOHA, and RLBench, and show that
ARP, as a universal architecture, outperforms the environment-specific
state-of-the-art in all tested benchmarks, while being more efficient in
computation and parameter sizes. Videos of our real robot demonstrations, all
source code and the pretrained models of ARP can be found at
http://github.com/mlzxy/arp.
[LINK]
http://arxiv.org/abs/2410.03132v3
[DATE]
2024-11-18 10:06:46+08:00
[CATEGORIES]
cs.LG
Noise Filtering Benchmark for Neuromorphic Satellites Observations
[AUTHORS]
Sami Arja, Alexandre Marcireau, Nicholas Owen Ralph, Saeed Afshar, Gregory Cohen
[ABSTRACT]
Event cameras capture sparse, asynchronous brightness changes which offer
high temporal resolution, high dynamic range, low power consumption, and sparse
data output. These advantages make them ideal for Space Situational Awareness,
particularly in detecting resident space objects moving within a telescope’s
field of view. However, the output from event cameras often includes
substantial background activity noise, which is known to be more prevalent in
low-light conditions. This noise can overwhelm the sparse events generated by
satellite signals, making detection and tracking more challenging. Existing
noise-filtering algorithms struggle in these scenarios because they are
typically designed for denser scenes, where losing some signal is acceptable.
This limitation hinders the application of event cameras in complex, real-world
environments where signals are extremely sparse. In this paper, we propose new
event-driven noise-filtering algorithms specifically designed for very sparse
scenes. We categorise the algorithms into logical-based and learning-based
approaches and benchmark their performance against 11 state-of-the-art
noise-filtering algorithms, evaluating how effectively they remove noise and
hot pixels while preserving the signal. Their performance was quantified by
measuring signal retention and noise removal accuracy, with results reported
using ROC curves across the parameter space. Additionally, we introduce a new
high-resolution satellite dataset with ground truth from a real-world platform
under various noise conditions, which we have made publicly available. Code,
dataset, and trained weights are available at
https://github.com/samiarja/dvs_sparse_filter.
[COMMENTS]
17 pages, 8 figures, 1 table
[LINK]
http://arxiv.org/abs/2411.11233v1
[DATE]
2024-11-18 10:02:24+08:00
[CATEGORIES]
cs.LG
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
[AUTHORS]
Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré
[ABSTRACT]
Retrieval pipelines, an integral component of many machine learning
systems, perform poorly in domains where documents are long (e.g., 10K tokens or
more) and where identifying the relevant document requires synthesizing
information across the entire text. Developing long-context retrieval encoders
suitable for these domains raises three challenges: (1) how to evaluate
long-context retrieval performance, (2) how to pretrain a base language model
to represent both short contexts (corresponding to queries) and long contexts
(corresponding to documents), and (3) how to fine-tune this model for retrieval
under the batch size limitations imposed by GPU memory constraints. To address
these challenges, we first introduce LoCoV1, a novel 12-task benchmark
constructed to measure long-context retrieval where chunking is not possible or
not effective. We next present the M2-BERT retrieval encoder, an 80M-parameter
state-space encoder model built from the Monarch Mixer architecture, capable of
scaling to documents up to 32K tokens long. We describe a pretraining data
mixture which allows this encoder to process both short and long context
sequences, and a finetuning approach that adapts this base model to retrieval
with only single-sample batches. Finally, we validate the M2-BERT retrieval
encoder on LoCoV1, finding that it outperforms competitive Transformer-based
models by at least 23.3 points, despite containing upwards of 90x fewer
parameters.
[COMMENTS]
International Conference on Machine Learning (ICML) 2024
[LINK]
http://arxiv.org/abs/2402.07440v3
[DATE]
2024-11-18 09:52:49+08:00
[CATEGORIES]
cs.LG
Phenome-wide causal proteomics enhance systemic lupus erythematosus flare prediction: A study in Asian populations
[AUTHORS]
Liying Chen, Ou Deng, Ting Fang, Mei Chen, Xvfeng Zhang, Ruichen Cong, Dingqi Lu, Runrun Zhang, Qun Jin, Xinchang Wang
[ABSTRACT]
Objective: Systemic lupus erythematosus (SLE) is a complex autoimmune disease
characterized by unpredictable flares. This study aimed to develop a novel
proteomics-based risk prediction model specifically for Asian SLE populations
to enhance personalized disease management and early intervention. Methods: A
longitudinal cohort study was conducted over 48 weeks, including 139 SLE
patients monitored every 12 weeks. Patients were classified into flare (n = 53)
and non-flare (n = 86) groups. Baseline plasma samples underwent
data-independent acquisition (DIA) proteomics analysis, and phenome-wide
Mendelian randomization (PheWAS) was performed to evaluate causal relationships
between proteins and clinical predictors. Logistic regression (LR) and random
forest (RF) models were used to integrate proteomic and clinical data for flare
risk prediction. Results: Five proteins (SAA1, B4GALT5, GIT2, NAA15, and RPIA)
were significantly associated with SLE Disease Activity Index-2K (SLEDAI-2K)
scores and 1-year flare risk, implicating key pathways such as B-cell receptor
signaling and platelet degranulation. SAA1 demonstrated causal effects on
flare-related clinical markers, including hemoglobin and red blood cell counts.
A combined model integrating clinical and proteomic data achieved the highest
predictive accuracy (AUC = 0.769), surpassing individual models. SAA1 was
highlighted as a priority biomarker for rapid flare discrimination. Conclusion:
The integration of proteomic and clinical data significantly improves flare
prediction in Asian SLE patients. The identification of key proteins and their
causal relationships with flare-related clinical markers provides valuable
insights for proactive SLE management and personalized therapeutic approaches.
[LINK]
http://arxiv.org/abs/2411.11915v1
[DATE]
2024-11-18 09:50:36+08:00
[CATEGORIES]
cs.LG
Heterogeneity-Aware Cooperative Federated Edge Learning with Adaptive Computation and Communication Compression
[AUTHORS]
Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong
[ABSTRACT]
Motivated by the drawbacks of cloud-based federated learning (FL),
cooperative federated edge learning (CFEL) has been proposed to improve
efficiency for FL over mobile edge networks, where multiple edge servers
collaboratively coordinate the distributed model training across a large number
of edge devices. However, CFEL faces critical challenges arising from dynamic
and heterogeneous device properties, which slow down the convergence and
increase resource consumption. This paper proposes a heterogeneity-aware CFEL
scheme called Heterogeneity-Aware Cooperative Edge-based Federated
Averaging (HCEF) that aims to maximize the model accuracy while minimizing the
training time and energy consumption via adaptive computation and communication
compression in CFEL. By theoretically analyzing how local update frequency and
gradient compression affect the convergence error bound in CFEL, we develop an
efficient online control algorithm for HCEF to dynamically determine local
update frequencies and compression ratios for heterogeneous devices.
Experimental results show that compared with prior schemes, the proposed HCEF
scheme can maintain higher model accuracy while reducing training latency and
improving energy efficiency simultaneously.
[COMMENTS]
20 pages, 8 figures, accepted by IEEE Transactions on Mobile
Computing
[LINK]
http://arxiv.org/abs/2409.04022v3
[DATE]
2024-11-18 09:46:40+08:00
[CATEGORIES]
cs.LG
Generalization ability and Vulnerabilities to adversarial perturbations: Two sides of the same coin
[AUTHORS]
Jung Hoon Lee, Sujith Vijayan
[ABSTRACT]
Deep neural networks (DNNs), the agents of deep learning (DL), require a
massive number of parallel/sequential operations, which makes it difficult to
comprehend them and impedes proper diagnosis. Without better knowledge of DNNs’
internal process, deploying DNNs in high-stakes domains may lead to
catastrophic failures. Therefore, to build more reliable DNNs/DL, it is
imperative that we gain insights into their underlying decision-making process.
Here, we use the self-organizing map (SOM) to analyze DL models’ internal codes
associated with DNNs’ decision-making. Our analyses suggest that shallow layers
close to the input layer map onto homogeneous codes and that deep layers close
to the output layer transform these homogeneous codes in shallow layers to
diverse codes. We also found evidence indicating that homogeneous codes may
underlie DNNs’ vulnerabilities to adversarial perturbations.
[COMMENTS]
19 pages, 12 main figures, 4 supplemental figures, 2 tables
[LINK]
http://arxiv.org/abs/2205.10952v4
[DATE]
2024-11-18 09:40:09+08:00
[CATEGORIES]
cs.LG
Don’t Be So Positive: Negative Step Sizes in Second-Order Methods
[AUTHORS]
Betty Shea, Mark Schmidt
[ABSTRACT]
The value of second-order methods lies in the use of curvature information.
Yet, this information is costly to extract and once obtained, valuable negative
curvature information is often discarded so that the method is globally
convergent. This limits the effectiveness of second-order methods in modern
machine learning. In this paper, we show that second-order and
second-order-like methods are promising optimizers for neural networks provided
that we add one ingredient: negative step sizes. We show that under very
general conditions, methods that produce ascent directions are globally
convergent when combined with a Wolfe line search that allows both positive and
negative step sizes. We experimentally demonstrate that using negative step
sizes is often more effective than common Hessian modification methods.
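A one-dimensional sketch may help show why a negative step size is useful: under negative curvature the Newton direction points uphill, and a line search that also tries negative steps can still make progress along it. The sketch uses a simple Armijo backtracking rule rather than the Wolfe line search analyzed in the paper, and all names and constants are hypothetical.

```python
def signed_backtracking(f, x, grad, direction, alpha0=1.0, shrink=0.5,
                        c1=1e-4, max_iter=30):
    """Backtracking search over both positive and negative step sizes (1-D)."""
    slope = grad * direction  # directional derivative along `direction`
    fx = f(x)
    alpha = alpha0
    for _ in range(max_iter):
        for step in (alpha, -alpha):
            # Armijo sufficient decrease; step * slope < 0 means moving downhill.
            if step * slope < 0 and f(x + step * direction) <= fx + c1 * step * slope:
                return step
        alpha *= shrink
    return 0.0


def double_well(x):
    """Toy objective with negative curvature near x = 0."""
    return x ** 4 / 4.0 - x ** 2 / 2.0
```

At x = 0.5 the gradient is -0.375 and the second derivative is -0.25, so the Newton direction -0.375 / 0.25 = -1.5 is an ascent direction; the search then returns a negative step and the objective still decreases.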
[LINK]
http://arxiv.org/abs/2411.11224v1
[DATE]
2024-11-18 09:27:44+08:00
[CATEGORIES]
cs.LG
Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting
[AUTHORS]
Bong Gyun Kang, Dongjun Lee, HyunGi Kim, DoHyun Chung, Sungroh Yoon
[ABSTRACT]
Sequence modeling faces challenges in capturing long-range dependencies
across diverse tasks. Recent linear and transformer-based forecasters have
shown superior performance in time series forecasting. However, they are
constrained by their inherent inability to effectively address long-range
dependencies in time series data, primarily due to using fixed-size inputs for
prediction. Furthermore, they typically sacrifice essential temporal
correlation among consecutive training samples by shuffling them into
mini-batches. To overcome these limitations, we introduce a fast and effective
Spectral Attention mechanism, which preserves temporal correlations among
samples and facilitates the handling of long-range information while
maintaining the base model structure. Spectral Attention preserves long-period
trends through a low-pass filter and allows gradients to flow between
samples. Spectral Attention can be seamlessly integrated into most sequence
models, allowing models with fixed-sized look-back windows to capture
long-range dependencies over thousands of steps. Through extensive experiments
on 11 real-world time series datasets using 7 recent forecasting models, we
consistently demonstrate the efficacy of our Spectral Attention mechanism,
achieving state-of-the-art results.
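The low-pass filtering idea can be illustrated with an exponential moving average carried across consecutive samples. Whether the paper uses exactly this filter is not stated in the abstract, so the recursion, the function name, and the smoothing factor `alpha` are assumptions of this sketch.

```python
def ema_low_pass(values, alpha):
    """Exponential moving average: a simple low-pass filter over a sequence.

    alpha in [0, 1); larger alpha keeps more of the past (longer memory).
    """
    smoothed = []
    state = values[0]
    for v in values:
        # The state summarizes all earlier samples, so information can persist
        # beyond any fixed-size look-back window.
        state = alpha * state + (1.0 - alpha) * v
        smoothed.append(state)
    return smoothed
```

For example, filtering the step signal [0.0, 1.0, 1.0] with alpha = 0.5 gives [0.0, 0.5, 0.75]: the output follows the slow trend while damping abrupt changes.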
[COMMENTS]
Co-first Author: Bong Gyun Kang, Dongjun Lee. Accepted to NeurIPS
2024
[LINK]
http://arxiv.org/abs/2410.20772v2
[DATE]
2024-11-18 09:20:49+08:00
[CATEGORIES]
cs.LG
Data Driven Automatic Electrical Machine Preliminary Design with Artificial Intelligence Expert Guidance
[AUTHORS]
Yiwei Wang, Tao Yang, Hailin Huang, Tianjie Zou, Jincai Li, Nuo Chen, Zhuoran Zhang
[ABSTRACT]
This paper presents a data-driven electrical machine design (EMD) framework
using a wound-rotor synchronous generator (WRSG) as a design example. Unlike
traditional preliminary EMD processes that heavily rely on expertise, this
framework leverages an artificial-intelligence-based expert database to
provide preliminary designs directly from user specifications. Initial data is
generated using 2D finite element (FE) machine models by sweeping fundamental
design variables including machine length and diameter, enabling scalable
machine geometry, with machine performance recorded for each design. This
data trains a Metamodel of Optimal Prognosis (MOP)-based surrogate model, which
maps design variables to key performance indicators (KPIs). Once trained,
guided by metaheuristic algorithms, the surrogate model can generate thousands
of geometrically scalable designs, covering a wide power range, forming an AI
expert database to guide future preliminary design. The framework is validated
with a 30kVA WRSG design case. A prebuilt WRSG database, covering power from 10
to 60kVA, is validated by FE simulation. Design No.1138 is selected from the
database and compared with a conventional design. Results show No.1138 achieves a
higher power density of 2.21 kVA/kg in just 5 seconds, compared to 2.02 kVA/kg
obtained using the traditional method, which takes several days. The developed AI
expert database also serves as a high-quality data source for further
developing AI models for automatic electrical machine design.
[LINK]
http://arxiv.org/abs/2411.11221v1
[DATE]
2024-11-18 09:18:18+08:00
[CATEGORIES]
cs.LG
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
[AUTHORS]
Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica
[ABSTRACT]
Efficient deployment of large language models, particularly Mixture of
Experts (MoE), on resource-constrained platforms presents significant
challenges, especially in terms of computational efficiency and memory
utilization. The MoE architecture, renowned for its ability to increase model
capacity without a proportional increase in inference cost, greatly reduces the
token generation latency compared with dense models. However, the large model
size makes MoE models inaccessible to individuals without high-end GPUs. In
this paper, we propose a high-throughput MoE batch inference system that
significantly outperforms past work. MoE-Lightning introduces a novel
CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high
resource utilization, and a performance model, HRM, based on a Hierarchical
Roofline Model we introduce to help find policies with higher throughput than
existing systems. MoE-Lightning can achieve up to 10.3x higher throughput than
state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a
single T4 GPU (16GB). When the theoretical system throughput is bounded by the
GPU memory, MoE-Lightning can reach the throughput upper bound with 2-3x less
CPU memory, significantly increasing resource utilization. MoE-Lightning also
supports efficient batch inference for much larger MoEs (e.g., Mixtral 8x22B
and DBRX) on multiple low-cost GPUs (e.g., 2-4 T4).
[LINK]
http://arxiv.org/abs/2411.11217v1
[DATE]
2024-11-18 09:06:12+08:00
[CATEGORIES]
cs.LG
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies
[AUTHORS]
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
[ABSTRACT]
The widespread adoption of deep learning across various industries has
introduced substantial challenges, particularly in terms of model
explainability and security. The inherent complexity of deep learning models,
while contributing to their effectiveness, also renders them susceptible to
adversarial attacks. Among these, backdoor attacks are especially concerning,
as they involve surreptitiously embedding specific triggers within training
data, causing the model to exhibit aberrant behavior when presented with input
containing the triggers. Such attacks often exploit vulnerabilities in
outsourced processes, compromising model integrity without affecting
performance on clean (trigger-free) input data. In this paper, we present a
comprehensive review of existing mitigation strategies designed to counter
backdoor attacks in image recognition. We provide an in-depth analysis of the
theoretical foundations, practical efficacy, and limitations of these
approaches. In addition, we conduct an extensive benchmarking of sixteen
state-of-the-art approaches against eight distinct backdoor attacks, utilizing
three datasets, four model architectures, and three poisoning ratios. Our
results, derived from 122,236 individual experiments, indicate that while many
approaches provide some level of protection, their performance can vary
considerably. Furthermore, when compared to two seminal approaches, most newer
approaches do not demonstrate substantial improvements in overall performance
or consistency across diverse settings. Drawing from these findings, we propose
potential directions for developing more effective and generalizable defensive
mechanisms in the future.
[LINK]
http://arxiv.org/abs/2411.11200v1
[DATE]
2024-11-18 07:30:01+08:00
[CATEGORIES]
cs.LG
Stealing Training Graphs from Graph Neural Networks
[AUTHORS]
Minhua Lin, Enyan Dai, Junjie Xu, Jinyuan Jia, Xiang Zhang, Suhang Wang
[ABSTRACT]
Graph Neural Networks (GNNs) have shown promising results in modeling graphs
in various tasks. The training of GNNs, especially on specialized tasks such as
bioinformatics, demands extensive expert annotations, which are expensive and
usually contain sensitive information of data providers. The trained GNN models
are often shared for deployment in the real world. As neural networks can
memorize the training samples, the model parameters of GNNs have a high risk of
leaking private training data. Our theoretical analysis shows the strong
connections between trained GNN parameters and the training graphs used,
confirming the training graph leakage issue. However, explorations into
training data leakage from trained GNNs are rather limited. Therefore, we
investigate a novel problem of stealing graphs from trained GNNs. To obtain
high-quality graphs that resemble the target training set, a graph diffusion
model with diffusion noise optimization is deployed as a graph generator.
Furthermore, we propose a selection method that effectively leverages GNN model
parameters to identify training graphs from samples generated by the graph
diffusion model. Extensive experiments on real-world datasets demonstrate the
effectiveness of the proposed framework in stealing training graphs from the
trained GNN.
[COMMENTS]
To appear in KDD 2025
[LINK]
http://arxiv.org/abs/2411.11197v1
[DATE]
2024-11-18 07:15:36+08:00
[CATEGORIES]
cs.LG
Accelerating Quantum Emitter Characterization with Latent Neural Ordinary Differential Equations
[AUTHORS]
Andrew H. Proppe, Kin Long Kelvin Lee, Weiwei Sun, Chantalle J. Krajewska, Oliver Tye, Moungi G. Bawendi
[ABSTRACT]
Deep neural network models can be used to learn complex dynamics from data
and reconstruct sparse or noisy signals, thereby accelerating and augmenting
experimental measurements. Evaluating the quantum optical properties of
solid-state single-photon emitters is a time-consuming task that typically
requires interferometric photon correlation experiments, such as photon
correlation Fourier spectroscopy (PCFS), which measures time-resolved single
emitter lineshapes. Here, we demonstrate a latent neural ordinary differential
equation model that can forecast a complete and noise-free PCFS experiment from
a small subset of noisy correlation functions. By encoding measured photon
correlations into an initial value problem, the NODE can be propagated to an
arbitrary number of interferometer delay times. We demonstrate this with 10
noisy photon correlation functions that are used to extrapolate an entire
de-noised interferogram of up to 200 stage positions, enabling up to a 20-fold
speedup in experimental acquisition time from $\sim$3 hours to 10 minutes. Our
work presents a new approach to greatly accelerate the experimental
characterization of novel quantum emitter materials using deep learning.
[LINK]
http://arxiv.org/abs/2411.11191v1
[DATE]
2024-11-18 06:46:31+08:00
[CATEGORIES]
cs.LG
Evolution of SAE Features Across Layers in LLMs
[AUTHORS]
Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim
[ABSTRACT]
Sparse Autoencoders for transformer-based language models are typically
defined independently per layer. In this work we analyze statistical
relationships between features in adjacent layers to understand how features
evolve through a forward pass. We provide a graph visualization interface for
features and their most similar next-layer neighbors
(https://stefanhex.com/spar-2024/feature-browser/), and build communities of
related features across layers. We find that a considerable number of features
are passed through from a previous layer, some features can be expressed as
quasi-boolean combinations of previous features, and some features become more
specialized in later layers.
[COMMENTS]
Presented at the Attributing Model Behavior at Scale (ATTRIB)
workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.08869v2
[DATE]
2024-11-18 06:45:45+08:00
[CATEGORIES]
cs.LG
Evaluating Representations with Readout Model Switching
[AUTHORS]
Yazhe Li, Jorg Bornschein, Marcus Hutter
[ABSTRACT]
Although much of the success of Deep Learning builds on learning good
representations, a rigorous method to evaluate their quality is lacking. In
this paper, we treat the evaluation of representations as a model selection
problem and propose to use the Minimum Description Length (MDL) principle to
devise an evaluation metric. Contrary to the established practice of limiting
the capacity of the readout model, we design a hybrid discrete and
continuous-valued model space for the readout models and employ a switching
strategy to combine their predictions. The MDL score takes model complexity as
well as data efficiency into account. As a result, the most appropriate model
for the specific task and representation will be chosen, making it a unified
measure for comparison. The proposed metric can be efficiently computed with an
online method and we present results for pre-trained vision encoders of various
architectures (ResNet and ViT) and objective functions (supervised and
self-supervised) on a range of downstream tasks. We compare our methods with
accuracy-based approaches and show that the latter are inconsistent when
multiple readout models are used. Finally, we discuss important properties
revealed by our evaluations such as model scaling, preferred readout model, and
data efficiency.
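The online computation of an MDL score can be illustrated with prequential coding: each label is encoded using a predictor fitted only on earlier labels, and the code lengths are summed. The sketch below uses a toy Laplace-smoothed count predictor in place of the paper's readout models and switching strategy; the function name is hypothetical.

```python
import math


def prequential_code_length(labels, num_classes):
    """Total bits needed to encode labels one at a time, predicting before seeing."""
    counts = [0] * num_classes
    bits = 0.0
    for t, y in enumerate(labels):
        # Laplace-smoothed probability from the t labels observed so far.
        prob = (counts[y] + 1) / (t + num_classes)
        bits += -math.log2(prob)
        counts[y] += 1
    return bits
```

A predictor that learns quickly from few examples pays fewer bits early on, which is how a code length of this kind reflects data efficiency as well as final accuracy.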
[LINK]
http://arxiv.org/abs/2302.09579v2
[DATE]
2024-11-18 06:26:11+08:00
[CATEGORIES]
cs.LG
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
[AUTHORS]
Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu
[ABSTRACT]
Language models trained on diverse datasets unlock generalization by
in-context learning. Reinforcement Learning (RL) policies can achieve a similar
effect by meta-learning within the memory of a sequence model. However, meta-RL
research primarily focuses on adapting to minor variations of a single task. It
is difficult to scale towards more general behavior without confronting
challenges in multi-task optimization, and few solutions are compatible with
meta-RL’s goal of learning from large training sets of unlabeled tasks. To
address this challenge, we revisit the idea that multi-task RL is bottlenecked
by imbalanced training losses created by uneven return scales across different
tasks. We build upon recent advancements in Transformer-based (in-context)
meta-RL and evaluate a simple yet scalable solution where both an agent’s actor
and critic objectives are converted to classification terms that decouple
optimization from the current scale of returns. Large-scale comparisons in
Meta-World ML45, Multi-Game Procgen, Multi-Task POPGym, Multi-Game Atari, and
BabyAI find that this design unlocks significant progress in online multi-task
adaptation and memory problems without explicit task labels.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.11188v1
[DATE]
2024-11-18 06:25:40+08:00
[CATEGORIES]
cs.LG
Blockchain for Large Language Model Security and Safety: A Holistic Survey
[AUTHORS]
Caleb Geren, Amanda Board, Gaby G. Dagher, Tim Andersen, Jun Zhuang
[ABSTRACT]
With the growing development and deployment of large language models (LLMs)
in both industrial and academic fields, their security and safety concerns have
become increasingly critical. However, recent studies indicate that LLMs face
numerous vulnerabilities, including data poisoning, prompt injections, and
unauthorized data exposure, which conventional methods have struggled to
address fully. In parallel, blockchain technology, known for its data
immutability and decentralized structure, offers a promising foundation for
safeguarding LLMs. In this survey, we aim to comprehensively assess how to
leverage blockchain technology to enhance LLMs’ security and safety. Besides,
we propose a new taxonomy of blockchain for large language models (BC4LLMs) to
systematically categorize related works in this emerging field. Our analysis
includes novel frameworks and definitions to delineate security and safety in
the context of BC4LLMs, highlighting potential research directions and
challenges at this intersection. Through this study, we aim to stimulate
targeted advancements in blockchain-integrated LLM security.
[COMMENTS]
Accepted to SIGKDD Explorations, to appear Dec 2024
[LINK]
http://arxiv.org/abs/2407.20181v2
[DATE]
2024-11-18 06:23:45+08:00
[CATEGORIES]
cs.LG
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
[AUTHORS]
Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala
[ABSTRACT]
Understanding the mechanisms through which neural networks extract statistics
from input-label pairs through feature learning is one of the most important
unsolved problems in supervised learning. Prior works demonstrated that the
gram matrices of the weights (the neural feature matrices, NFM) and the average
gradient outer products (AGOP) become correlated during training, in a
statement known as the neural feature ansatz (NFA). Through the NFA, the
authors introduce mapping with the AGOP as a general mechanism for neural
feature learning. However, these works do not provide a theoretical explanation
for this correlation or its origins. In this work, we further clarify the
nature of this correlation, and explain its emergence. We show that this
correlation is equivalent to alignment between the left singular structure of
the weight matrices and the newly defined pre-activation tangent features at
each layer. We further establish that the alignment is driven by the
interaction of weight changes induced by SGD with the pre-activation features,
and analyze the resulting dynamics analytically at early times in terms of
simple statistics of the inputs and labels. We prove the derivative alignment
occurs almost surely in specific high dimensional settings. Finally, we
introduce a simple optimization rule motivated by our analysis of the centered
correlation which dramatically increases the NFA correlations at any given
layer and improves the quality of features learned.
[LINK]
http://arxiv.org/abs/2402.05271v4
[DATE]
2024-11-18 06:18:40+08:00
[CATEGORIES]
cs.LG
Mixing Neural Networks and Exponential Moving Averages for Predicting Wireless Links Behavior
[AUTHORS]
Gabriele Formis, Stefano Scanzio, Lukasz Wisniewski, Gianluca Cena
[ABSTRACT]
Predicting the behavior of a wireless link in terms of, e.g., the frame
delivery ratio, is a critical task for optimizing the performance of wireless
industrial communication systems. This is because industrial applications are
typically characterized by stringent dependability and end-to-end latency
requirements, which are adversely affected by channel quality degradation.
In this work, we studied two neural network models for Wi-Fi link quality
prediction in dense indoor environments. Experimental results show that their
accuracy outperforms conventional methods based on exponential moving averages,
due to their ability to capture complex patterns about communications,
including the effects of shadowing and multipath propagation, which are
particularly pronounced in industrial scenarios. This highlights the potential
of neural networks for predicting spectrum behavior in challenging operating
conditions, and suggests that they can be exploited to improve determinism and
dependability of wireless communications, fostering their adoption in the
industry.
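
As a point of reference for the conventional baseline mentioned above, here is a minimal sketch of an exponential moving average (EMA) predictor for the frame delivery ratio. The function name `ema_predict`, the smoothing factor `alpha`, and the initial estimate are illustrative assumptions, not the authors' configuration.

```python
def ema_predict(outcomes, alpha=0.1, initial=1.0):
    """One-step-ahead EMA predictions of the frame delivery ratio.

    outcomes: iterable of 1 (frame delivered) or 0 (frame lost).
    alpha:    smoothing factor in (0, 1]; hypothetical default.
    initial:  estimate used before any frame is observed (assumption).
    """
    estimate = initial
    predictions = []
    for delivered in outcomes:
        # The prediction is issued before the current outcome is known.
        predictions.append(estimate)
        # Blend the new observation into the running estimate.
        estimate = alpha * delivered + (1.0 - alpha) * estimate
    return predictions, estimate
```

A larger `alpha` reacts faster to changes in channel quality but yields noisier estimates; this trade-off is what a learned predictor would have to beat.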
[COMMENTS]
preprint, 6 pages, 2024
[LINK]
http://arxiv.org/abs/2411.11185v1
[DATE]
2024-11-18 06:13:07+08:00
[CATEGORIES]
cs.LG
Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits
[AUTHORS]
Hoang-Quan Nguyen, Xuan Bac Nguyen, Samuel Yen-Chi Chen, Hugh Churchill, Nicholas Borys, Samee U. Khan, Khoa Luu
[ABSTRACT]
Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading
strategy to utilize near-term quantum advantages in multiple problems,
including machine learning and combinatorial optimization. When applied to
specific tasks, the parameters in the quantum circuits are trained to minimize
the target function. Although there have been comprehensive studies to improve
the performance of the PQCs on practical tasks, the errors caused by the
quantum noise downgrade the performance when running on real quantum computers.
In particular, when the quantum state is transformed through multiple quantum
circuit layers, the effect of the quantum noise accumulates and the state
moves closer to the maximally mixed state or complete noise. This paper
studies the relationship between the quantum noise and the diffusion model.
Then, we propose a novel diffusion-inspired learning approach to mitigate the
quantum noise in the PQCs and reduce the error for specific tasks. Through our
experiments, we illustrate the efficiency of the learning strategy and achieve
state-of-the-art performance on classification tasks in the quantum noise
scenarios.
[LINK]
http://arxiv.org/abs/2406.00843v2
[DATE]
2024-11-18 06:10:15+08:00
[CATEGORIES]
cs.LG
F$^3$OCUS – Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics
[AUTHORS]
Pramit Saha, Felix Wagner, Divyanshu Mishra, Can Peng, Anshul Thakur, David Clifton, Konstantinos Kamnitsas, J. Alison Noble
[ABSTRACT]
Effective training of large Vision-Language Models (VLMs) on
resource-constrained client devices in Federated Learning (FL) requires the
usage of parameter-efficient fine-tuning (PEFT) strategies. To this end, we
demonstrate the impact of two factors, viz., a client-specific layer
importance score that selects the most important VLM layers for fine-tuning and
an inter-client layer diversity score that encourages diverse layer selection
across clients for optimal VLM layer selection. We first theoretically motivate
and leverage the principal eigenvalue magnitude of layerwise Neural Tangent
Kernels and show its effectiveness as a client-specific layer importance score.
Next, we propose a novel layer updating strategy dubbed F$^3$OCUS that jointly
optimizes the layer importance and diversity factors by employing a data-free,
multi-objective, meta-heuristic optimization on the server. We explore 5
different meta-heuristic algorithms and compare their effectiveness for
selecting model layers and adapter layers towards PEFT-FL. Furthermore, we
release a new MedVQA-FL dataset involving overall 707,962 VQA triplets and 9
modality-specific clients and utilize it to train and evaluate our method.
Overall, we conduct more than 10,000 client-level experiments on 6
Vision-Language FL task settings involving 58 medical image datasets and 4
different VLM architectures of varying sizes to demonstrate the effectiveness
of the proposed method.
[LINK]
http://arxiv.org/abs/2411.11912v1
[DATE]
2024-11-18 05:54:57+08:00
[CATEGORIES]
cs.LG
Robust Defense Against Extreme Grid Events Using Dual-Policy Reinforcement Learning Agents
[AUTHORS]
Benjamin M. Peter, Mert Korkali
[ABSTRACT]
Reinforcement learning (RL) agents are powerful tools for managing power
grids. They use large amounts of data to inform their actions and receive
rewards or penalties as feedback to learn favorable responses for the system.
Once trained, these agents can efficiently make decisions that would be too
computationally complex for a human operator. This ability is especially
valuable in decarbonizing power networks, where the demand for RL agents is
increasing. These agents are well suited to control grid actions since the
action space is constantly growing due to uncertainties in renewable
generation, microgrid integration, and cybersecurity threats. To assess the
efficacy of RL agents in response to an adverse grid event, we use the Grid2Op
platform for agent training. We employ a proximal policy optimization (PPO)
algorithm in conjunction with graph neural networks (GNNs). By simulating
agents’ responses to grid events, we assess their performance in avoiding grid
failure for as long as possible. The performance of an agent is expressed
concisely through its reward function, which helps the agent learn the best
ways to reconfigure a grid’s topology amidst certain events. To model
multi-actor scenarios that threaten modern power networks, particularly those
resulting from cyberattacks, we integrate an opponent that acts iteratively
against a given agent. This interplay between the RL agent and opponent is
utilized in N-k contingency screening, providing a novel alternative to the
traditional security assessment.
[COMMENTS]
6 pages, 5 figures, submitted to the 2025 Texas Power and Energy
Conference (TPEC)
[LINK]
http://arxiv.org/abs/2411.11180v1
[DATE]
2024-11-18 05:30:48+08:00
[CATEGORIES]
cs.LG
Learning-Augmented Priority Queues
[AUTHORS]
Ziyad Benomar, Christian Coester
[COMMENTS]
Accepted as a conference paper at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.04793v2
[DATE]
2024-11-18 05:13:54+08:00
[CATEGORIES]
cs.LG
Modularity aided consistent attributed graph clustering via coarsening
[AUTHORS]
Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar
[ABSTRACT]
Graph clustering is an important unsupervised learning technique for
partitioning graphs with attributes and detecting communities. However, current
methods struggle to accurately capture true community structures and
intra-cluster relations, be computationally efficient, and identify smaller
communities. We address these challenges by integrating coarsening and
modularity maximization, effectively leveraging both adjacency and node
features to enhance clustering accuracy. We propose a loss function
incorporating log-determinant, smoothness, and modularity components using a
block majorization-minimization technique, resulting in superior clustering
outcomes. The method is theoretically consistent under the Degree-Corrected
Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and
complete label recovery. Our provably convergent and time-efficient algorithm
seamlessly integrates with graph neural networks (GNNs) and variational graph
autoencoders (VGAEs) to learn enhanced node features and deliver exceptional
clustering performance. Extensive experiments on benchmark datasets demonstrate
its superiority over existing state-of-the-art methods for both attributed and
non-attributed graphs.
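
To make the modularity term concrete, the following is a minimal sketch of the standard Newman modularity score for a fixed partition of an undirected graph. It illustrates only the quantity being maximized, not the authors' loss or optimization procedure; the function name `modularity` is an assumption.

```python
import numpy as np


def modularity(adj, labels):
    """Newman modularity Q of a node partition.

    adj:    symmetric (n, n) adjacency matrix of an undirected graph.
    labels: length-n sequence of community ids, one per node.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    # Expected edge weight between i and j under the configuration model.
    expected = np.outer(degrees, degrees) / two_m
    # Only pairs of nodes in the same community contribute.
    same_community = labels[:, None] == labels[None, :]
    return float(((adj - expected) * same_community).sum() / two_m)
```

Higher values indicate that more edges fall inside communities than a degree-preserving random graph would predict.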
[COMMENTS]
The first two authors contributed equally to this work
[LINK]
http://arxiv.org/abs/2407.07128v2
[DATE]
2024-11-18 05:05:17+08:00
[CATEGORIES]
cs.LG
Learning the Sherrington-Kirkpatrick Model Even at Low Temperature
[AUTHORS]
Gautam Chandrasekaran, Adam Klivans
[ABSTRACT]
We consider the fundamental problem of learning the parameters of an
undirected graphical model or Markov Random Field (MRF) in the setting where
the edge weights are chosen at random. For Ising models, we show that a
multiplicative-weight update algorithm due to Klivans and Meka learns the
parameters in polynomial time for any inverse temperature
$\beta \leq \sqrt{\log n}$.
This immediately yields an algorithm for learning the Sherrington-Kirkpatrick
(SK) model beyond the high-temperature regime of $\beta < 1$. Prior work breaks
down at $\beta = 1$ and requires heavy machinery from statistical physics or
functional inequalities. In contrast, our analysis is relatively simple and
uses only subgaussian concentration.
Our results extend to MRFs of higher order (such as pure $p$-spin models),
where even results in the high-temperature regime were not known.
[LINK]
http://arxiv.org/abs/2411.11174v1
[DATE]
2024-11-18 05:02:12+08:00
[CATEGORIES]
cs.LG
Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses
[AUTHORS]
Andrew Lowy, Meisam Razaviyayn
[ABSTRACT]
This paper studies federated learning (FL), especially cross-silo FL, with
data from people who do not trust the server or other silos. In this setting,
each silo (e.g. hospital) has data from different people (e.g. patients) and
must maintain the privacy of each person’s data (e.g. medical record), even if
the server or other silos act as adversarial eavesdroppers. This requirement
motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP),
which requires silo i’s communications to satisfy record/item-level
differential privacy (DP). ISRL-DP ensures that the data of each person (e.g.
patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different
from well-studied privacy notions. Central and user-level DP assume that people
trust the server/other silos. On the other end of the spectrum, local DP
assumes that people do not trust anyone at all (even their own silo). Sitting
between central and local DP, ISRL-DP makes the realistic assumption (in
cross-silo FL) that people trust their own silo, but not the server or other
silos. In this work, we provide tight (up to logarithms) upper and lower bounds
for ISRL-DP FL with convex/strongly convex loss functions and homogeneous
(i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for
smooth losses with arbitrary heterogeneous silo data distributions, via an
accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for
ISRL-DP federated empirical risk minimization, and use acceleration to attain
the optimal bounds in fewer rounds of communication than the state-of-the-art.
Finally, with a secure “shuffler” to anonymize silo messages (but without a
trusted server), our algorithm attains the optimal central DP rates under more
practical trust assumptions. Numerical experiments show favorable
privacy-accuracy tradeoffs for our algorithm in classification and regression
tasks.
[COMMENTS]
ICLR 2023
[LINK]
http://arxiv.org/abs/2106.09779v9
[DATE]
2024-11-18 04:55:34+08:00
[CATEGORIES]
cs.LG
Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification
[AUTHORS]
Jing Li
[ABSTRACT]
The proper use of model evaluation metrics is important for model evaluation
and model selection in binary classification tasks. This study investigates how
consistent different metrics are at evaluating models across data of different
prevalence while the relationships between different variables and the sample
size are kept constant. Analyzing 156 data scenarios, 18 model evaluation
metrics and five commonly used machine learning models as well as a naive
random guess model, I find that evaluation metrics that are less influenced by
prevalence offer more consistent evaluation of individual models and more
consistent ranking of a set of models. In particular, Area Under the ROC Curve
(AUC) which takes all decision thresholds into account when evaluating models
has the smallest variance in evaluating individual models and smallest variance
in ranking of a set of models. A close threshold analysis using all possible
thresholds for all metrics further supports the hypothesis that considering all
decision thresholds helps reduce the variance in model evaluation with respect
to prevalence change in data. The results have significant implications for
model evaluation and model selection in binary classification tasks.
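
The following minimal sketch shows one common way to compute AUC: as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name `auc` is illustrative; this is not the study's evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 class labels.
    scores: sequence of model scores, higher meaning "more positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

Because the statistic is an average over positive/negative pairs, replicating every negative example (which changes prevalence) leaves the value unchanged, which gives some intuition for the consistency reported above.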
[LINK]
http://arxiv.org/abs/2408.10193v2
[DATE]
2024-11-18 04:33:56+08:00
[CATEGORIES]
cs.LG
From Optimization to Sampling via Lyapunov Potentials
[AUTHORS]
August Y. Chen, Karthik Sridharan
[ABSTRACT]
We study the problem of sampling from high-dimensional distributions using
Langevin Dynamics, a natural and popular variant of Gradient Descent where at
each step, appropriately scaled Gaussian noise is added. The similarities
between Langevin Dynamics and Gradient Flow and Gradient Descent lead to the
natural question: if the distribution’s log-density can be optimized from all
initializations via Gradient Flow and Gradient Descent, given oracle access to
the gradients, can we efficiently sample from the distribution using
discrete-time Langevin Dynamics? We answer this question in the affirmative for
distributions that are unimodal in a particular sense, at low but appropriate
temperature levels natural in the context of both optimization and real-world
applications, under mild regularity assumptions on the measure and the
convergence rate of Gradient Flow. We do so by using the results of De Sa,
Kale, Lee, Sekhari, and Sridharan (2022) that the success of optimization
implies particular geometric properties involving a Lyapunov
Potential. These geometric properties from optimization in turn give us strong
quantitative control over isoperimetric constants of the measure. As a
corollary, we show we can efficiently sample from several new natural and
interesting classes of non-log-concave densities, an important setting where we
have relatively few examples. Another corollary is efficient discrete-time
sampling results for log-concave measures satisfying milder regularity
conditions than smoothness, results similar to the work of Lehec (2023).
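
For readers unfamiliar with the update rule, here is a minimal sketch of discrete-time (unadjusted) Langevin Dynamics: a gradient descent step plus scaled Gaussian noise. The function name, the inverse temperature parameter `beta`, and the step-size choice are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def langevin_sample(grad_f, x0, step, beta, n_steps, rng):
    """Run discrete-time Langevin Dynamics targeting exp(-beta * f).

    grad_f:  callable returning the gradient of f at a point.
    x0:      initial point (array-like).
    step:    step size (assumed small enough for stability).
    beta:    inverse temperature; large beta means low temperature.
    rng:     a numpy.random.Generator supplying the Gaussian noise.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient descent step plus noise scaled by sqrt(2 * step / beta).
        x = x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * noise
    return x
```

As `beta` grows, the noise term vanishes and the iteration reduces to plain Gradient Descent, which is the link between optimization and sampling that the abstract builds on.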
[COMMENTS]
37 pages. Results and presentation significantly improved. More
examples added
[LINK]
http://arxiv.org/abs/2410.02979v2
[DATE]
2024-11-18 04:31:40+08:00
[CATEGORIES]
cs.LG
RPN 2: On Interdependence Function Learning Towards Unifying and Advancing CNN, RNN, GNN, and Transformer
[AUTHORS]
Jiawei Zhang
[ABSTRACT]
This paper builds upon our previous work on the Reconciled Polynomial Network
(RPN). The original RPN model was designed under the assumption of input data
independence, presuming the independence among both individual instances within
data batches and attributes in each data instance. However, this assumption
often proves invalid for function learning tasks involving complex,
interdependent data such as language, images, time series, and graphs. Ignoring
such data interdependence may inevitably lead to significant performance
degradation.
To overcome these limitations, we introduce the new Reconciled Polynomial
Network (version 2), namely RPN 2, in this paper. By incorporating data and
structural interdependence functions, RPN 2 explicitly models data
interdependence via new component functions in its architecture.
This enhancement not only significantly improves RPN 2’s learning performance
but also substantially expands its unifying potential, enabling it to encompass
a broader range of contemporary dominant backbone models within its canonical
representation. These backbones include, but are not limited to, convolutional
neural networks (CNNs), recurrent neural networks (RNNs), graph neural networks
(GNNs), and Transformers. Our analysis reveals that the fundamental
distinctions among these backbone models primarily stem from their diverse
approaches to defining the interdependence functions. Furthermore, this unified
representation opens up new opportunities for designing innovative
architectures with the potential to surpass the performance of these dominant
backbones.
[COMMENTS]
105 pages, 37 figures, 6 tables, preprint version
[LINK]
http://arxiv.org/abs/2411.11162v1
[DATE]
2024-11-18 03:45:26+08:00
[CATEGORIES]
cs.LG
MPLite: Multi-Aspect Pretraining for Mining Clinical Health Records
[AUTHORS]
Eric Yang, Pengfei Hu, Xiaoxue Han, Yue Ning
[ABSTRACT]
The adoption of digital systems in healthcare has resulted in the
accumulation of vast electronic health records (EHRs), offering valuable data
for machine learning methods to predict patient health outcomes. However,
single-visit records of patients are often neglected in the training process
due to the lack of annotations of next-visit information, thereby limiting the
predictive and expressive power of machine learning models. In this paper, we
present a novel framework MPLite that utilizes Multi-aspect Pretraining with
Lab results through a light-weight neural network to enhance medical concept
representation and predict future health outcomes of individuals. By
incorporating both structured medical data and additional information from lab
results, our approach fully leverages patient admission records. We design a
pretraining module that predicts medical codes based on lab results, ensuring
robust prediction by fusing multiple aspects of features. Our experimental
evaluation using both MIMIC-III and MIMIC-IV datasets demonstrates improvements
over existing models in diagnosis prediction and heart failure prediction
tasks, achieving a higher weighted-F1 and recall with MPLite. This work reveals
the potential of integrating diverse aspects of data to advance predictive
modeling in healthcare.
[LINK]
http://arxiv.org/abs/2411.11161v1
[DATE]
2024-11-18 03:43:10+08:00
[CATEGORIES]
cs.LG
Mixture of Experts Meets Prompt-Based Continual Learning
[AUTHORS]
Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Van Ngo, Nhat Ho
[COMMENTS]
Accepted to NeurIPS 2024, 30 pages
[LINK]
http://arxiv.org/abs/2405.14124v3
[DATE]
2024-11-18 03:36:09+08:00
[CATEGORIES]
cs.LG
ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition
[AUTHORS]
Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan
[ABSTRACT]
Transformer models have demonstrated remarkable success in many domains such
as natural language processing (NLP) and computer vision. With the growing
interest in transformer-based architectures, they are now utilized for gesture
recognition. So, we also explore and devise a novel ConvMixFormer architecture
for dynamic hand gestures. Self-attention in transformers scales quadratically
with the length of the sequential data, which makes these models
computationally complex and heavy. We have considered this drawback of the
transformer and designed a resource-efficient model that replaces the
self-attention in the transformer with the simple convolutional layer-based
token mixer. The computational cost and the parameters used for the
convolution-based mixer are comparatively less than the quadratic
self-attention. Convolution-mixer helps the model capture the local spatial
features that self-attention struggles to capture due to their sequential
processing nature. Further, an efficient gate mechanism is employed instead of
a conventional feed-forward network in the transformer to help the model
control the flow of features within different stages of the proposed model.
This design uses nearly half as many learnable parameters as the vanilla
transformer, which helps in fast and efficient training. The proposed method is
evaluated on NVidia Dynamic Hand Gesture and Briareo datasets and our model has
achieved state-of-the-art results on single and multimodal inputs. We have also
shown the parameter efficiency of the proposed ConvMixFormer model compared to
other methods. The source code is available at
https://github.com/mallikagarg/ConvMixFormer.
[LINK]
http://arxiv.org/abs/2411.07118v2
[DATE]
2024-11-18 02:58:41+08:00
[CATEGORIES]
cs.LG
TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data
[AUTHORS]
Suiyao Chen, Jing Wu, Yunxiao Wang, Cheng Ji, Tianpei Xie, Daniel Cociorva, Michael Sharps, Cecile Levasseur, Hakan Brunzell
[ABSTRACT]
Representation learning is a fundamental aspect of modern artificial
intelligence, driving substantial improvements across diverse applications.
While self-supervised contrastive learning has led to significant advancements
in fields like computer vision and natural language processing, its adaptation
to tabular data presents unique challenges. Traditional approaches often
prioritize optimizing model architecture and loss functions but may overlook
the crucial task of constructing meaningful positive and negative sample pairs
from various perspectives like feature interactions, instance-level patterns
and batch-specific contexts. To address these challenges, we introduce TabDeco,
a novel method that leverages attention-based encoding strategies across both
rows and columns and employs a contrastive learning framework to effectively
disentangle feature representations at multiple levels, including features,
instances and data batches. With the innovative feature decoupling hierarchies,
TabDeco consistently surpasses existing deep learning methods and leading
gradient boosting algorithms, including XGBoost, CatBoost, and LightGBM,
across various benchmark tasks, underscoring its effectiveness in advancing
tabular data representation learning.
[LINK]
http://arxiv.org/abs/2411.11148v1
[DATE]
2024-11-18 02:42:46+08:00
[CATEGORIES]
cs.LG
CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning
[AUTHORS]
Depeng Chen, Xiao Liu, Jie Cui, Hong Zhong
[ABSTRACT]
Since a machine learning model is often trained on a limited data set, the
model is trained multiple times on the same data sample, which causes the model
to memorize most of the training set data. Membership Inference Attacks (MIAs)
exploit this feature to determine whether a data sample is used for training a
machine learning model. However, in realistic scenarios, it is difficult for
the adversary to obtain enough qualified samples that mark accurate identity
information, especially since most samples are non-members in real world
applications. To address this limitation, in this paper, we propose a new
attack method called CLMIA, which uses unsupervised contrastive learning to
train an attack model without using extra membership status information.
Meanwhile, in CLMIA, we require only a small amount of data with known
membership status to fine-tune the attack model. Experimental results
demonstrate that CLMIA performs better than existing attack methods for
different datasets and model structures, especially with data with less marked
identity information. In addition, we experimentally find that the attack
performs differently for different proportions of labeled identity information
for member and non-member data. More analysis proves that our attack method
performs better with less labeled identity information, which applies to more
realistic scenarios.
[LINK]
http://arxiv.org/abs/2411.11144v1
[DATE]
2024-11-18 02:25:01+08:00
[CATEGORIES]
cs.LG
Smooth Non-Stationary Bandits
[AUTHORS]
Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier
[ABSTRACT]
In many applications of online decision making, the environment is
non-stationary and it is therefore crucial to use bandit algorithms that handle
changes. Most existing approaches are designed to protect against non-smooth
changes, constrained only by total variation or Lipschitzness over time.
However, in practice, environments often change smoothly, so such
algorithms may incur higher-than-necessary regret. We study a non-stationary
bandits problem where each arm’s mean reward sequence can be embedded into a
$\beta$-Hölder function, i.e., a function that is $(\beta-1)$-times
Lipschitz-continuously differentiable. The non-stationarity becomes more smooth
as $\beta$ increases. When $\beta=1$, this corresponds to the non-smooth
regime, where \cite{besbes2014stochastic} established a minimax regret of
$\tilde \Theta(T^{2/3})$. We show the first separation between the smooth
(i.e., $\beta\ge 2$) and non-smooth (i.e., $\beta=1$) regimes by presenting a
policy with $\tilde O(k^{4/5} T^{3/5})$ regret on any $k$-armed, $2$-Hölder
instance. We complement this result by showing that the minimax regret on the
$\beta$-Hölder family of instances is $\Omega(T^{(\beta+1)/(2\beta+1)})$ for
any integer $\beta\ge 1$. This matches our upper bound for $\beta=2$ up to
logarithmic factors. Furthermore, we validated the effectiveness of our policy
through a comprehensive numerical study using real-world click-through rate
data.
[COMMENTS]
Accepted by ICML 2023
[LINK]
http://arxiv.org/abs/2301.12366v3
[DATE]
2024-11-18 02:03:40+08:00
[CATEGORIES]
cs.LG
Leveraging Bi-Focal Perspectives and Granular Feature Integration for Accurate Reliable Early Alzheimer’s Detection
[AUTHORS]
Pandiyaraju V, Shravan Venkatraman, Abeshek A, Pavan Kumar S, Aravintakshan S A, Kannan A
[ABSTRACT]
Alzheimer’s disease (AD) is the most common neurodegenerative disorder,
diagnosed in millions of patients annually. Current medical practice still faces
challenges in the exact diagnosis and classification of AD through neuroimaging
data. Traditional CNNs can extract a good amount of low-level information in an
image but fail to extract high-level minuscule particles, which is a
significant challenge in detecting AD from MRI scans. To overcome this, we
propose a novel Granular Feature Integration method to combine information
extraction at different scales combined with an efficient information flow,
enabling the model to capture both broad and fine-grained features
simultaneously. We also propose a Bi-Focal Perspective mechanism to highlight
the subtle neurofibrillary tangles and amyloid plaques in the MRI scans,
ensuring that critical pathological markers are accurately identified. Our
model achieved an F1-Score of 99.31%, precision of 99.24%, and recall of
99.51%. These scores prove that our model is significantly better than the
state-of-the-art (SOTA) CNNs in existence.
[COMMENTS]
14 pages, 12 figures, 6 tables
[LINK]
http://arxiv.org/abs/2407.10921v3
[DATE]
2024-11-18 01:55:19+08:00
[CATEGORIES]
cs.LG
Sketch ‘n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra
[AUTHORS]
Alex Lavaee
[ABSTRACT]
We present Sketch ‘n Solve, an open-source Python package that implements
efficient randomized numerical linear algebra (RandNLA) techniques for solving
large-scale least squares problems. While sketch-and-solve algorithms have
demonstrated theoretical promise, their practical adoption has been limited by
the lack of robust, user-friendly implementations. Our package addresses this
gap by providing an optimized implementation built on NumPy and SciPy,
featuring both dense and sparse sketching operators with a clean API. Through
extensive benchmarking, we demonstrate that our implementation achieves up to
50x speedup over traditional LSQR while maintaining high accuracy, even for
ill-conditioned matrices. The package shows particular promise for applications
in machine learning optimization, signal processing, and scientific computing.
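
The sketch-and-solve idea can be illustrated in a few lines of NumPy: compress the tall system with a random matrix, then solve the much smaller least squares problem. This is a minimal sketch using a dense Gaussian sketching operator; the function name and signature are hypothetical and do not reflect the package's actual API.

```python
import numpy as np


def sketch_and_solve(A, b, sketch_rows, rng):
    """Approximate argmin_x ||A x - b||_2 via a Gaussian sketch.

    A:           (m, n) matrix with m much larger than n.
    b:           length-m right-hand side.
    sketch_rows: number of rows in the sketch (assumed >= n).
    rng:         a numpy.random.Generator.
    """
    m, _ = A.shape
    # Dense Gaussian sketching operator, scaled so E[S.T @ S] = I.
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    # Solve the compressed (sketch_rows x n) problem instead of the full one.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

The dense Gaussian sketch is the simplest choice to reason about; sparse operators such as those the abstract mentions make the product `S @ A` cheaper to form.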
[LINK]
http://arxiv.org/abs/2409.14309v2
[DATE]
2024-11-18 01:51:30+08:00
[CATEGORIES]
cs.LG
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
[AUTHORS]
Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang
[LINK]
http://arxiv.org/abs/2411.11911v1
[DATE]
2024-11-18 00:36:09+08:00
[CATEGORIES]
cs.LG
Taming the Long Tail in Human Mobility Prediction
[AUTHORS]
Xiaohang Xu, Renhe Jiang, Chuang Yang, Zipei Fan, Kaoru Sezaki
[ABSTRACT]
With the popularity of location-based services, human mobility prediction
plays a key role in enhancing personalized navigation, optimizing
recommendation systems, and facilitating urban mobility and planning. This
involves predicting a user’s next POI (point-of-interest) visit using their
past visit history. However, the uneven distribution of visitations over time
and space, namely the long-tail problem in spatial distribution, makes it
difficult for AI models to predict those POIs that are less visited by humans.
In light of this issue, we propose the Long-Tail Adjusted Next POI Prediction
(LoTNext) framework for mobility prediction, combining a Long-Tailed Graph
Adjustment module to reduce the impact of the long-tailed nodes in the user-POI
interaction graph and a novel Long-Tailed Loss Adjustment module to adjust the loss
through logit score and sample weight adjustment strategies. Also, we employ an
auxiliary prediction task to enhance generalization and accuracy. Our
experiments with two real-world trajectory datasets demonstrate that LoTNext
significantly surpasses existing state-of-the-art works. Our code is available
at https://github.com/Yukayo/LoTNext.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.14970v3
[DATE]
2024-11-18 00:09:12+08:00
[CATEGORIES]
cs.LG
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
[AUTHORS]
Christoph Leiter, Steffen Eger
[ABSTRACT]
Large language models (LLMs) have revolutionized NLP research. Notably,
in-context learning enables their use as evaluation metrics for natural
language generation, making them particularly advantageous in low-resource
scenarios and time-restricted applications. In this work, we introduce PrExMe,
a large-scale Prompt Exploration for Metrics, where we evaluate more than 720
prompt templates for open-source LLM-based metrics on machine translation (MT)
and summarization datasets, totalling over 6.6M evaluations. This extensive
comparison (1) benchmarks recent open-source LLMs as metrics and (2) explores
the stability and variability of different prompting strategies. We discover
that, on the one hand, there are scenarios for which prompts are stable. For
instance, some LLMs show idiosyncratic preferences and favor grading generated
texts with textual labels while others prefer to return numeric scores. On the
other hand, the stability of prompts and model rankings can be susceptible to
seemingly innocuous changes. For example, changing the requested output format
from “0 to 100” to “-1 to +1” can strongly affect the rankings in our
evaluation. Our study contributes to understanding the impact of different
prompting approaches on LLM-based metrics for MT and summarization evaluation,
highlighting the most stable prompting patterns and potential limitations.
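
One common way to quantify how much a model ranking changes between two prompt variants is a rank correlation such as Kendall's tau. The sketch below is illustrative only: the abstract does not state which stability measure is used, and the scores are hypothetical values, not results from the paper.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same items (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical metric quality scores for four models under two output formats.
scores_0_to_100 = [71.0, 64.0, 58.0, 40.0]
scores_minus1_to_1 = [0.42, 0.55, 0.10, -0.30]
tau = kendall_tau(scores_0_to_100, scores_minus1_to_1)
```

A value of 1 means the two formats rank the models identically. Here the first two models swap places, so five of the six pairs agree and one disagrees, giving a tau of about 0.67.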
[COMMENTS]
EMNLP 2024 main; camera-ready
[LINK]
http://arxiv.org/abs/2406.18528v2
[DATE]
2024-11-17 23:09:54+08:00
[CATEGORIES]
cs.CL
Towards Explainable Evaluation Metrics for Machine Translation
[AUTHORS]
Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger
[ABSTRACT]
Unlike classical lexical overlap metrics such as BLEU, most current
evaluation metrics for machine translation (for example, COMET or BERTScore)
are based on black-box large language models. They often achieve strong
correlations with human judgments, but recent research indicates that the
lower-quality classical metrics remain dominant, one of the potential reasons
being that their decision processes are more transparent. To foster more
widespread acceptance of novel high-quality metrics, explainability thus
becomes crucial. In this concept paper, we identify key properties as well as
key goals of explainable machine translation metrics and provide a
comprehensive synthesis of recent techniques, relating them to our established
goals and properties. In this context, we also discuss the latest
state-of-the-art approaches to explainable metrics based on generative models
such as ChatGPT and GPT4. Finally, we contribute a vision of next-generation
approaches, including natural language explanations. We hope that our work can
help catalyze and guide future research on explainable evaluation metrics and,
mediately, also contribute to better and more transparent machine translation
systems.
[COMMENTS]
Published at JMLR 3/24. We released an earlier preprint of this paper
under a different title (arXiv:2203.11131)
[LINK]
http://arxiv.org/abs/2306.13041v2
[DATE]
2024-11-17 22:17:08+08:00
[CATEGORIES]
cs.CL
cs.LG
Beyond Human-Like Processing: Large Language Models Perform Equivalently on Forward and Backward Scientific Text
[AUTHORS]
Xiaoliang Luo, Michael Ramscar, Bradley C. Love
[ABSTRACT]
The impressive performance of large language models (LLMs) has led to their
consideration as models of human language processing. Instead, we suggest that
the success of LLMs arises from the flexibility of the transformer learning
architecture. To evaluate this conjecture, we trained LLMs on scientific texts
that were either in a forward or backward format. Despite backward text being
inconsistent with the structure of human languages, we found that LLMs
performed equally well in either format on a neuroscience benchmark, eclipsing
human expert performance for both forward and backward orders. Our results are
consistent with the success of transformers across diverse domains, such as
weather prediction and protein design. This widespread success is attributable
to LLMs’ ability to extract predictive patterns from any sufficiently
structured input. Given their generality, we suggest caution in interpreting
LLMs’ success in linguistic tasks as evidence for human-like mechanisms.
[LINK]
http://arxiv.org/abs/2411.11061v1
[DATE]
2024-11-17 20:48:24+08:00
[CATEGORIES]
cs.CL
FastDraft: How to Train Your Draft
[AUTHORS]
Ofir Zafrir, Igor Margulis, Dorin Shteyman, Guy Boudoukh
[COMMENTS]
ENLSP NeurIPS Workshop 2024
[LINK]
http://arxiv.org/abs/2411.11055v1
[DATE]
2024-11-17 20:32:44+08:00
[CATEGORIES]
cs.CL
BianCang: A Traditional Chinese Medicine Large Language Model
[AUTHORS]
Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang
[ABSTRACT]
The rise of large language models (LLMs) has driven significant progress in
medical applications, including traditional Chinese medicine (TCM). However,
current medical LLMs struggle with TCM diagnosis and syndrome differentiation
due to substantial differences between TCM and modern medical theory, and the
scarcity of specialized, high-quality corpora. This paper addresses these
challenges by proposing BianCang, a TCM-specific LLM, using a two-stage
training process that first injects domain-specific knowledge and then aligns
it through targeted stimulation. To enhance diagnostic and differentiation
capabilities, we constructed pre-training corpora, instruction-aligned datasets
based on real hospital records, and the ChP-TCM dataset derived from the
Pharmacopoeia of the People’s Republic of China. We compiled extensive TCM and
medical corpora for continuous pre-training and supervised fine-tuning,
building a comprehensive dataset to refine the model’s understanding of TCM.
Evaluations across 11 test sets involving 29 models and 4 tasks demonstrate the
effectiveness of BianCang, offering valuable insights for future research.
Code, datasets, and models are available at
https://github.com/QLU-NLP/BianCang.
[LINK]
http://arxiv.org/abs/2411.11027v1
[DATE]
2024-11-17 18:17:01+08:00
[CATEGORIES]
cs.CL
Safely Learning with Private Data: A Federated Learning Framework for Large Language Model
[AUTHORS]
JiaYing Zheng, HaiNan Zhang, LingXiang Wang, WangJie Qiu, HongWei Zheng, ZhiMing Zheng
[ABSTRACT]
Private data, being larger and of higher quality than public data, can greatly
improve large language models (LLMs). However, due to privacy concerns, this
data is often dispersed in multiple silos, making its secure utilization for
LLM training a challenge. Federated learning (FL) is an ideal solution for
training models with distributed private data, but traditional frameworks like
FedAvg are unsuitable for LLM due to their high computational demands on
clients. An alternative, split learning, offloads most training parameters to
the server while training embedding and output layers locally, making it more
suitable for LLM. Nonetheless, it faces significant challenges in security and
efficiency. Firstly, the gradients of embeddings are prone to attacks, leading
to potential reverse engineering of private data. Furthermore, the server’s
limitation of handling only one client’s training request at a time hinders
parallel training, severely impacting training efficiency. In this paper, we
propose a Federated Learning framework for LLM, named FL-GLM, which prevents
data leakage caused by both server-side and peer-client attacks while improving
training efficiency. Specifically, we first place the input block and output
block on the local client to prevent embedding gradient attacks from the server.
Secondly, we employ key-encryption during client-server communication to
prevent reverse engineering attacks from peer-clients. Lastly, we employ
optimization methods like client-batching or server-hierarchical, adopting
different acceleration methods based on the actual computational capabilities
of the server. Experimental results on NLU and generation tasks demonstrate
that FL-GLM achieves metrics comparable to the centralized ChatGLM model,
validating the effectiveness of our federated learning framework.
[LINK]
http://arxiv.org/abs/2406.14898v3
[DATE]
2024-11-17 16:45:44+08:00
[CATEGORIES]
cs.CL
FiSTECH: Financial Style Transfer to Enhance Creativity without Hallucinations in LLMs
[AUTHORS]
Sohini Roychowdhury, Marko Krema, Brian Moore, Xingjian Lai, Dike Effedua, Bharat Jethwani
[ABSTRACT]
Recent trends in Generative AI have emerged towards fine-tuning foundational
large language models (LLMs) to create domain-specific LLMs for automation and
chatbot-like applications. Specialized applications for analytics-heavy domains
such as financial report generation require specific writing styles that
comprise compound and creative sentences with minimized hallucinations. In this
work, we explore the self-corrective auto-regressive qualities of LLMs to learn
creativity in writing styles with minimal prompting. We propose a novel
two-stage fine-tuning (FT) strategy wherein in the first stage public domain
financial reports are used to train for writing styles while allowing the LLM
to hallucinate. In the second stage the examples of hallucinations are manually
corrected and further used to fine-tune the LLM. The finally trained LLM learns
to generate specific financial report sections using minimal instructions and
tabular data inputs while ensuring low fine-tuning costs. Our proposed
two-stage fine-tuning boosts the accuracy of financial question answering
twofold while reducing hallucinations by over 50%. Also, the fine-tuned model
has lower perplexity, improved ROUGE, TER and BLEU scores, higher creativity
and knowledge density with lower uncertainty and cross entropy than base LLMs.
Thus, the proposed framework can be generalized to train creativity in LLMs by
first allowing them to hallucinate.
[COMMENTS]
10 pages, 14 figures, 5 tables, conference
[LINK]
http://arxiv.org/abs/2408.05365v4
[DATE]
2024-11-17 15:22:31+08:00
[CATEGORIES]
cs.CL
A Comprehensive Study of Knowledge Editing for Large Language Models
[AUTHORS]
Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen
[ABSTRACT]
Large Language Models (LLMs) have shown extraordinary capabilities in
understanding and generating text that closely mirrors human communication.
However, a primary limitation lies in the significant computational demands
during training, arising from their extensive parameterization. This challenge
is further intensified by the dynamic nature of the world, necessitating
frequent updates to LLMs to correct outdated information or integrate new
knowledge, thereby ensuring their continued relevance. Note that many
applications demand continual model adjustments post-training to address
deficiencies or undesirable behaviors. There is an increasing interest in
efficient, lightweight methods for on-the-fly model modifications. To this end,
recent years have seen a burgeoning in the techniques of knowledge editing for
LLMs, which aim to efficiently modify LLMs’ behaviors within specific domains
while preserving overall performance across various inputs. In this paper, we
first define the knowledge editing problem and then provide a comprehensive
review of cutting-edge approaches. Drawing inspiration from educational and
cognitive research theories, we propose a unified categorization criterion that
classifies knowledge editing methods into three groups: resorting to external
knowledge, merging knowledge into the model, and editing intrinsic knowledge.
Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive
empirical evaluation of representative knowledge editing approaches.
Additionally, we provide an in-depth analysis of knowledge location, which can
give a deeper understanding of the knowledge structures inherent within LLMs.
Finally, we discuss several potential applications of knowledge editing,
outlining its broad and impactful implications.
[COMMENTS]
Ongoing work (v5): we have updated the Table 4 results after
optimizing certain methods (related to AdaLoRA) and fixing computational bugs
(related to ROME and MEMIT) in the EasyEdit. These improvements have led to
better results than before. We will continue updating this paper and welcome
everyone to discuss and exchange ideas
[LINK]
http://arxiv.org/abs/2401.01286v5
[DATE]
2024-11-17 14:50:44+08:00
[CATEGORIES]
cs.CL
cs.LG
Fox-1 Technical Report
[AUTHORS]
Zijian Hu, Jipeng Zhang, Rui Pan, Zhaozhuo Xu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Dimitris Stripelis, Yuhang Yao, Salman Avestimehr, Chaoyang He, Tong Zhang
[ABSTRACT]
We present Fox-1, a series of small language models (SLMs) consisting of
Fox-1-1.6B and Fox-1-1.6B-Instruct-v0.1. These models are pre-trained on 3
trillion tokens of web-scraped document data and fine-tuned with 5 billion
tokens of instruction-following and multi-turn conversation data. Aiming to
improve the pre-training efficiency, Fox-1-1.6B model introduces a novel
3-stage data curriculum across all the training data with 2K-8K sequence
length. In architecture design, Fox-1 features a deeper layer structure, an
expanded vocabulary, and utilizes Grouped Query Attention (GQA), offering a
performant and efficient architecture compared to other SLMs. Fox-1 achieves
better or on-par performance in various benchmarks compared to StableLM-2-1.6B,
Gemma-2B, Qwen1.5-1.8B, and OpenELM1.1B, with competitive inference speed and
throughput. The model weights have been released under the Apache 2.0 license,
where we aim to promote the democratization of LLMs and make them fully
accessible to the whole open-source community.
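
As background on Grouped Query Attention, the mechanism can be sketched in a few lines of NumPy: several query heads share one key/value head, which shrinks the key/value cache. The head counts, sequence length, and head dimension below are illustrative assumptions, not Fox-1's actual configuration, and the function name is hypothetical.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d); n_q_heads % n_kv_heads == 0."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    # Each key/value head is shared by `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 5, 16))   # 8 query heads (assumed)
k = rng.standard_normal((2, 5, 16))   # 2 shared key heads (assumed)
v = rng.standard_normal((2, 5, 16))   # 2 shared value heads (assumed)
out = grouped_query_attention(q, k, v)
```

With 8 query heads and 2 key/value heads, only a quarter of the keys and values of standard multi-head attention need to be stored, while the output keeps one slice per query head.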
[COMMENTS]
Base model is available at
https://huggingface.co/tensoropera/Fox-1-1.6B and the instruction-tuned
version is available at
https://huggingface.co/tensoropera/Fox-1-1.6B-Instruct-v0.1
[LINK]
http://arxiv.org/abs/2411.05281v2
[DATE]
2024-11-17 13:40:44+08:00
[CATEGORIES]
cs.CL
cs.LG
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents
[AUTHORS]
Qiang Sun, Yuanyi Luo, Sirui Li, Wenxiao Zhang, Wei Liu
[ABSTRACT]
Multimodal conversational agents are highly desirable because they offer
natural and human-like interaction. However, there is a lack of comprehensive
end-to-end solutions to support collaborative development and benchmarking.
While proprietary systems like GPT-4o and Gemini demonstrate impressive
integration of audio, video, and text with response times of 200-250ms,
challenges remain in balancing latency, accuracy, cost, and data privacy. To
better understand and quantify these issues, we developed OpenOmni, an
open-source, end-to-end pipeline benchmarking tool that integrates advanced
technologies such as Speech-to-Text, Emotion Detection, Retrieval Augmented
Generation, and Large Language Models, along with the ability to integrate
customized models. OpenOmni supports local and cloud deployment, ensuring data
privacy and supporting latency and accuracy benchmarking. This flexible
framework allows researchers to customize the pipeline, focusing on real
bottlenecks and facilitating rapid proof-of-concept development. OpenOmni can
significantly enhance applications like indoor assistance for visually impaired
individuals, advancing human-computer interaction. Our demonstration video is
available at https://www.youtube.com/watch?v=zaSiT3clWqY, a demo is available via
https://openomni.ai4wa.com, code is available via
https://github.com/AI4WA/OpenOmniFramework.
[COMMENTS]
Published in Proceedings of the 2024 Conference on Empirical Methods
in Natural Language Processing: System Demonstrations (EMNLP 2024). Best Demo
Paper Award at EMNLP 2024
[LINK]
http://arxiv.org/abs/2408.03047v2
[DATE]
2024-11-17 10:53:34+08:00
[CATEGORIES]
cs.CL
Inter-linguistic Phonetic Composition (IPC): A Theoretical and Computational Approach to Enhance Second Language Pronunciation
[AUTHORS]
Jisang Park, Minu Kim, DaYoung Hong, Jongha Lee
[COMMENTS]
10 pages, 6 Figures, submitted to ACL ARR October 2024 for NAACL 2025
[LINK]
http://arxiv.org/abs/2411.10927v1
[DATE]
2024-11-17 09:15:58+08:00
[CATEGORIES]
cs.CL
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
[AUTHORS]
Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith
[ABSTRACT]
In multilingual settings, non-Latin scripts and low-resource languages are
usually disadvantaged in terms of language models’ utility, efficiency, and
cost. Specifically, previous studies have reported multiple modeling biases
that the current tokenization algorithms introduce to non-Latin script
languages, the main one being over-segmentation. In this work, we propose
MAGNET, multilingual adaptive gradient-based tokenization, to reduce
over-segmentation via adaptive gradient-based subword tokenization. MAGNET
learns to predict segment boundaries between byte tokens in a sequence via
sub-modules within the model, which act as internal boundary predictors
(tokenizers). Previous gradient-based tokenization methods aimed for uniform
compression across sequences by integrating a single boundary predictor during
training and optimizing it end-to-end through stochastic reparameterization
alongside the next token prediction objective. However, this approach still
results in over-segmentation for non-Latin script languages in multilingual
settings. In contrast, MAGNET offers a customizable architecture where
byte-level sequences are routed through language-script-specific predictors,
each optimized for its respective language script. This modularity enforces
equitable segmentation granularity across different language scripts compared
to previous methods. Through extensive experiments, we demonstrate that in
addition to reducing segmentation disparities, MAGNET also enables faster
language modelling and improves downstream utility.
[LINK]
http://arxiv.org/abs/2407.08818v2
[DATE]
2024-11-17 08:41:01+08:00
[CATEGORIES]
cs.CL
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
[AUTHORS]
Sizhe Wang, Yongqi Tong, Hengyuan Zhang, Dawei Li, Xin Zhang, Tianlong Chen
[ABSTRACT]
Reinforcement Learning with Human Feedback (RLHF) is the key to the success
of large language models (LLMs) in recent years. In this work, we first
introduce the concepts of knowledge breadth and knowledge depth, which measure
the comprehensiveness and depth of an LLM or knowledge source respectively. We
reveal that the imbalance in the number of prompts and responses can lead to a
potential disparity in breadth and depth learning within alignment tuning
datasets by showing that even a simple uniform method for balancing the number
of instructions and responses can lead to significant improvements. Building on
this, we further propose Balanced Preference Optimization (BPO), designed to
dynamically augment the knowledge depth of each sample. BPO is motivated by the
observation that the usefulness of knowledge varies across samples,
necessitating tailored learning of knowledge depth. To achieve this, we
introduce gradient-based clustering, estimating the knowledge informativeness
and usefulness of each augmented sample based on the model’s optimization
direction. Our experimental results across various benchmarks demonstrate that
BPO outperforms other baseline methods in alignment tuning while maintaining
training efficiency. Furthermore, we conduct a detailed analysis of each
component of BPO, providing guidelines for future research in preference data
optimization.
[LINK]
http://arxiv.org/abs/2411.10914v1
[DATE]
2024-11-17 07:53:27+08:00
[CATEGORIES]
cs.CL
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
[AUTHORS]
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
[ABSTRACT]
Inference of Large Language Models (LLMs) across computer clusters has become
a focal point of research in recent times, with many acceleration techniques
taking inspiration from CPU speculative execution. These techniques reduce
bottlenecks associated with memory bandwidth, but also increase end-to-end
latency per inference run, requiring high speculation acceptance rates to
improve performance. Combined with a variable rate of acceptance across tasks,
speculative inference techniques can result in reduced performance.
Additionally, pipeline-parallel designs require many user requests to maintain
maximum utilization. As a remedy, we propose PipeInfer, a pipelined speculative
acceleration technique to reduce inter-token latency and improve system
utilization for single-request scenarios while also improving tolerance to low
speculation acceptance rates and low-bandwidth interconnects. PipeInfer
exhibits up to a 2.15x improvement in generation speed over standard
speculative inference. PipeInfer achieves its improvement through Continuous
Asynchronous Speculation and Early Inference Cancellation, the former improving
latency and generation speed by running single-token inference simultaneously
with several speculative runs, while the latter improves speed and latency by
skipping the computation of invalidated runs, even in the middle of inference.
[COMMENTS]
11 pages, submitted to SC24 conference
[LINK]
http://arxiv.org/abs/2407.11798v2
[DATE]
2024-11-17 07:19:51+08:00
[CATEGORIES]
cs.CL
cs.LG
Self-Attention Limits Working Memory Capacity of Transformer-Based Models
[AUTHORS]
Dongyu Gong, Hantao Zhang
[ABSTRACT]
Recent work on Transformer-based large language models (LLMs) has revealed
striking limits in their working memory capacity, similar to what has been
found in human behavioral studies. Specifically, these models’ performance
drops significantly on N-back tasks as N increases. However, there is still a
lack of mechanistic interpretability as to why this phenomenon would arise.
Inspired by the executive attention theory from behavioral sciences, we
hypothesize that the self-attention mechanism within Transformer-based models
might be responsible for their working memory capacity limits. To test this
hypothesis, we train vanilla decoder-only transformers to perform N-back tasks
and find that attention scores gradually aggregate to the N-back positions over
training, suggesting that the model masters the task by learning a strategy to
pay attention to the relationship between the current position and the N-back
position. Critically, we find that the total entropy of the attention score
matrix increases as N increases, suggesting that the dispersion of attention
scores might be the cause of the capacity limit observed in N-back tasks. Our
findings thus offer insights into the shared role of attention in both human
and artificial intelligence. Moreover, the limitations of the self-attention
mechanism revealed in the current study could inform future efforts to design
more powerful model architectures with enhanced working memory capacity and
cognitive capabilities.
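
To make the entropy argument concrete, the sketch below computes a total entropy for an attention matrix by summing the Shannon entropy of each softmax-normalized row. This row-wise definition is an assumption made for illustration, since the abstract does not spell out the exact formula, and the two score matrices are toy inputs.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def total_attention_entropy(scores):
    """Sum of Shannon entropies (in nats) over the rows of softmax(scores)."""
    total = 0.0
    for row in scores:
        probs = softmax(row)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total

focused = [[8.0, 0.0, 0.0, 0.0]] * 4   # nearly all weight on one position
diffuse = [[0.0, 0.0, 0.0, 0.0]] * 4   # weight spread uniformly

h_focused = total_attention_entropy(focused)
h_diffuse = total_attention_entropy(diffuse)
```

Attention concentrated on a single position gives low entropy, while dispersed attention gives high entropy (at most log 4 per row for four positions), which is the sense in which rising entropy signals more dispersed attention scores.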
[COMMENTS]
10 pages, 12 figures
[LINK]
http://arxiv.org/abs/2409.10715v2
[DATE]
2024-11-17 04:50:11+08:00
[CATEGORIES]
cs.CL
Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks
[AUTHORS]
Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf
[ABSTRACT]
Transformer-based NLP models are powerful but have high computational costs
that limit deployment. Finetuned encoder-decoder models are popular in
specialized domains and can outperform larger more generalized decoder-only
models, such as GPT-4. We introduce a new configuration for encoder-decoder
models that improves efficiency on structured output and decomposable tasks
where multiple outputs are required for a single shared input. Our method,
prompt-in-decoder (PiD), encodes the input once and decodes the output in
parallel, boosting both training and inference efficiency by avoiding duplicate
input encoding and increasing the operational intensity (ratio of the number of
arithmetic operations to memory accesses) of the decoding process by sharing the input
key-value cache. We achieve computation reduction that roughly scales with the
number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models
for dialogue state tracking, summarization, and question-answering tasks, with
comparable or better performance.
[COMMENTS]
18 pages
[LINK]
http://arxiv.org/abs/2403.13112v3
[DATE]
2024-11-17 04:39:46+08:00
[CATEGORIES]
cs.CL
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
[AUTHORS]
Md. Nazmus Sadat Samin, Jawad Ibn Ahad, Tanjila Ahmed Medha, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman
[ABSTRACT]
This study focuses on recognizing Bangladeshi dialects and converting diverse
Bengali accents into standardized formal Bengali speech. Dialects, often
referred to as regional languages, are distinctive variations of a language
spoken in a particular location and are identified by their phonetics,
pronunciations, and lexicon. Subtle changes in pronunciation and intonation are
also influenced by geographic location, educational attainment, and
socioeconomic status. Dialect standardization is needed to ensure effective
communication, educational consistency, access to technology, economic
opportunities, and the preservation of linguistic resources while respecting
cultural diversity. Because Bangla is the fifth most spoken language, with around 55
distinct dialects spoken by 160 million people, addressing its dialects is
crucial for developing inclusive communication tools. However, limited research
exists due to a lack of comprehensive datasets and the challenges of handling
diverse dialects. With the advancement in multilingual Large Language Models
(mLLMs), emerging possibilities have been created to address the challenges of
dialectal Automated Speech Recognition (ASR) and Machine Translation (MT). This
study presents an end-to-end pipeline for converting dialectal Noakhali speech
to standard Bangla speech. This investigation includes constructing a
large-scale diverse dataset of dialectal speech signals, which is used to tailor the
fine-tuning of the ASR model and the LLM for transcribing the dialect speech to
dialect text and translating the dialect text to standard Bangla text. Our
experiments demonstrated that fine-tuning the Whisper ASR model achieved a CER
of 0.8% and WER of 1.5%, while the BanglaT5 model attained a BLEU score of
41.6% for dialect-to-standard text translation.
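
For reference, character error rate (CER) and word error rate (WER) are conventionally computed as an edit distance divided by the reference length. The sketch below uses plain Levenshtein distance and whitespace tokenization. It is a generic illustration, not the paper's evaluation script, and it assumes a non-empty reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

example_wer = wer("the cat sat on the mat", "the cat sat on mat")
example_cer = cer("abcd", "abed")
```

In the first example one of six reference words is missing, so the WER is 1/6. In the second, one of four characters is substituted, so the CER is 0.25.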
[COMMENTS]
Accepted in 2024 IEEE International Conference on Big Data (IEEE
BigData)
[LINK]
http://arxiv.org/abs/2411.10879v1
[DATE]
2024-11-17 04:20:15+08:00
[CATEGORIES]
cs.CL
cs.LG
Empowering Meta-Analysis: Leveraging Large Language Models for Scientific Synthesis
[AUTHORS]
Jawad Ibn Ahad, Rafeed Mohammad Sultan, Abraham Kaikobad, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman
[ABSTRACT]
This study investigates the automation of meta-analysis in scientific
documents using large language models (LLMs). Meta-analysis is a robust
statistical method that synthesizes the findings of multiple studies (support
articles) to provide a comprehensive understanding. We know that a meta-article
provides a structured analysis of several articles. However, conducting
meta-analysis by hand is labor-intensive, time-consuming, and susceptible to
human error, highlighting the need for automated pipelines to streamline the
process. Our research introduces a novel approach that fine-tunes the LLM on
extensive scientific datasets to address challenges in big data handling and
structured data extraction. We automate and optimize the meta-analysis process
by integrating Retrieval Augmented Generation (RAG). Tailored through prompt
engineering and a new loss metric, Inverse Cosine Distance (ICD), designed for
fine-tuning on large contextual datasets, LLMs efficiently generate structured
meta-analysis content. Human evaluation then assesses relevance and provides
information on model performance in key metrics. This research demonstrates
that fine-tuned models outperform non-fine-tuned models, with fine-tuned LLMs
generating 87.6% relevant meta-analysis abstracts. The relevance of the
context, based on human evaluation, shows a reduction in irrelevancy from 4.56%
to 1.9%. These experiments were conducted in a low-resource environment,
highlighting the study’s contribution to enhancing the efficiency and
reliability of meta-analysis automation.
[COMMENTS]
Accepted in 2024 IEEE International Conference on Big Data (IEEE
BigData)
[LINK]
http://arxiv.org/abs/2411.10878v1
[DATE]
2024-11-17 04:18:57+08:00
[CATEGORIES]
cs.CL
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
[AUTHORS]
Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
[ABSTRACT]
In this study, we tackle a growing concern around the safety and ethical use
of large language models (LLMs). Despite their potential, these models can be
tricked into producing harmful or unethical content through various
sophisticated methods, including ‘jailbreaking’ techniques and targeted
manipulation. Our work zeroes in on a specific issue: to what extent LLMs can
be led astray by asking them to generate responses that are instruction-centric
such as a pseudocode, a program or a software snippet as opposed to vanilla
text. To investigate this question, we introduce TechHazardQA, a dataset
containing complex queries which should be answered in both text and
instruction-centric formats (e.g., pseudocodes), aimed at identifying triggers
for unethical responses. We query a series of LLMs – Llama-2-13b, Llama-2-7b,
Mistral-V2 and Mistral 8X7B – and ask them to generate both text and
instruction-centric responses. For evaluation we report the harmfulness score
metric as well as judgements from GPT-4 and humans. Overall, we observe that
asking LLMs to produce instruction-centric responses increases unethical
response generation by ~2-38% across the models. As an additional objective, we
investigate the impact of model editing using the ROME technique, which further
increases the propensity for generating undesirable content. In particular,
asking edited LLMs to generate instruction-centric responses further increases
the unethical response generation by ~3-16% across the different models.
[COMMENTS]
Accepted at AAAI Conference on Web and Social Media (ICWSM) 2025.
[LINK]
http://arxiv.org/abs/2402.15302v5
[DATE]
2024-11-17 03:21:32+08:00
[CATEGORIES]
cs.CL
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding
[AUTHORS]
Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Yonas Chanie, Bontu Fufa Balcha, Negasi Haile Abadi, Henok Biadglign Ademtew, Mulubrhan Abebe Nerea, Debela Desalegn Yadeta, Derartu Dagne Geremew, Assefa Atsbiha tesfau, Philipp Slusallek, Thamar Solorio, Dietrich Klakow
[ABSTRACT]
With the rapid development of evaluation datasets to assess LLMs'
understanding across a wide range of subjects and domains, identifying a
suitable language understanding benchmark has become increasingly challenging.
In this work, we explore LLM evaluation challenges for low-resource language
understanding and introduce ProverbEval, an LLM evaluation benchmark for
low-resource languages based on proverbs to focus on low-resource language
understanding in culture-specific scenarios. We benchmark various LLMs and
explore factors that create variability in the benchmarking process. We
observed performance variances of up to 50%, depending on the order in which
answer choices were presented in multiple-choice tasks. Native language proverb
descriptions significantly improve tasks such as proverb generation,
contributing to improved outcomes. Additionally, monolingual evaluations
consistently outperformed their cross-lingual counterparts. We argue special
attention must be given to the order of choices, choice of prompt language,
task variability, and generation tasks when creating LLM evaluation benchmarks.
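The sensitivity to answer order described above can be probed with a small harness. The following is only a minimal sketch, not the benchmark's own code: `accuracy_over_orderings` and `first_choice_model` are hypothetical names, and the "model" is a stand-in callable that receives a question and a list of choices and returns the choice it picks.

```python
from itertools import permutations


def accuracy_over_orderings(answer_fn, question, choices, correct):
    """Accuracy of answer_fn when the same choices are shown in every order."""
    hits = 0
    total = 0
    for ordering in permutations(choices):
        picked = answer_fn(question, list(ordering))
        hits += int(picked == correct)
        total += 1
    return hits / total


def first_choice_model(question, choices):
    # Hypothetical, fully position-biased stand-in: always picks the first option.
    return choices[0]


score = accuracy_over_orderings(
    first_choice_model, "Which option is correct?", ["a", "b", "c", "d"], "a"
)
```

A model that ignored position would score the same under every ordering, so a gap between orderings points to position bias rather than to understanding.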
[LINK]
http://arxiv.org/abs/2411.05049v2
[DATE]
2024-11-17 02:58:35+08:00
[CATEGORIES]
cs.CL
Investigating Annotator Bias in Large Language Models for Hate Speech Detection
[AUTHORS]
Amit Das, Zheng Zhang, Najib Hasan, Souvika Sarkar, Fatemeh Jamshidi, Tathagata Bhattacharya, Mostafa Rahgouy, Nilanjana Raychawdhary, Dongji Feng, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals
[COMMENTS]
Accepted at NeurIPS Safe Generative AI Workshop, 2024
[LINK]
http://arxiv.org/abs/2406.11109v5
[DATE]
2024-11-17 02:56:32+08:00
[CATEGORIES]
cs.CL
cs.LG
Vocabulary Transfer for Biomedical Texts: Add Tokens if You Can Not Add Data
[AUTHORS]
Priyanka Singh, Vladislav D. Mosin, Ivan P. Yamshchikov
[ABSTRACT]
Working within specific NLP subdomains presents significant challenges,
primarily due to a persistent deficit of data. Stringent privacy concerns and
limited data accessibility often drive this shortage. Additionally, the medical
domain demands high accuracy, where even marginal improvements in model
performance can have profound impacts. In this study, we investigate the
potential of vocabulary transfer to enhance model performance in biomedical NLP
tasks. Specifically, we focus on vocabulary extension, a technique that
involves expanding the target vocabulary to incorporate domain-specific
biomedical terms. Our findings demonstrate that vocabulary extension leads to
measurable improvements in both downstream model performance and inference
time.
[LINK]
http://arxiv.org/abs/2208.02554v3
[DATE]
2024-11-17 01:49:57+08:00
[CATEGORIES]
cs.CL
Pseudo Replay-based Class Continual Learning for Online New Category Anomaly Detection in Advanced Manufacturing
[AUTHORS]
Yuxuan Li, Tianxin Xie, Chenang Liu, Zhangyue Shi
[ABSTRACT]
The incorporation of advanced sensors and machine learning techniques has
enabled modern manufacturing enterprises to perform data-driven
classification-based anomaly detection based on the sensor data collected in
manufacturing processes. However, one critical challenge is that newly
presented defect categories may manifest as the manufacturing process continues,
resulting in monitoring performance deterioration of previously trained machine
learning models. Hence, there is an increasing need for empowering machine
learning models to learn continually. Among all continual learning methods,
memory-based continual learning has the best performance but faces the
constraints of data storage capacity. To address this issue, this paper
develops a novel pseudo replay-based continual learning framework by
integrating class incremental learning and oversampling-based data generation.
Without storing all the data, the developed framework could generate
high-quality data representing previous classes to train the machine learning
model incrementally when a new anomaly category occurs. In addition, it could
even enhance the monitoring performance since it also effectively improves the
data quality. The effectiveness of the proposed framework is validated in three
case studies, which leverage supervised classification problems for anomaly
detection. The experimental results show that the developed method is very
promising in detecting novel anomalies while maintaining good performance on
the previous task and brings more flexibility in model architecture.
[LINK]
http://arxiv.org/abs/2312.02491v3
[DATE]
2024-11-17 23:22:02+08:00
[CATEGORIES]
cs.LG
Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning
[AUTHORS]
Ting Zhu, Yue Jin, Jeremie Houssineau, Giovanni Montana
[ABSTRACT]
In decentralized multi-agent reinforcement learning, agents learning in
isolation can lead to relative over-generalization (RO), where optimal joint
actions are undervalued in favor of suboptimal ones. This hinders effective
coordination in cooperative tasks, as agents tend to choose actions that are
individually rational but collectively suboptimal. To address this issue, we
introduce MaxMax Q-Learning (MMQ), which employs an iterative process of
sampling and evaluating potential next states, selecting those with maximal
Q-values for learning. This approach refines approximations of ideal state
transitions, aligning more closely with the optimal joint policy of
collaborating agents. We provide theoretical analysis supporting MMQ’s
potential and present empirical evaluations across various environments
susceptible to RO. Our results demonstrate that MMQ frequently outperforms
existing baselines, exhibiting enhanced convergence and sample efficiency.
[COMMENTS]
Published in Transactions on Machine Learning Research (11/2024)
[LINK]
http://arxiv.org/abs/2411.11099v1
[DATE]
2024-11-17 23:00:39+08:00
[CATEGORIES]
cs.LG
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume
[AUTHORS]
Ping Guo, Cheng Gong, Xi Lin, Zhiyuan Yang, Qingfu Zhang
[ABSTRACT]
The escalating threat of adversarial attacks on deep learning models,
particularly in security-critical fields, has underscored the need for robust
deep learning systems. Conventional robustness evaluations have relied on
adversarial accuracy, which measures a model’s performance under a specific
perturbation intensity. However, this singular metric does not fully
encapsulate the overall resilience of a model against varying degrees of
perturbation. To address this gap, we propose a new metric termed adversarial
hypervolume, assessing the robustness of deep learning models comprehensively
over a range of perturbation intensities from a multi-objective optimization
standpoint. This metric allows for an in-depth comparison of defense mechanisms
and recognizes the trivial improvements in robustness afforded by less potent
defensive strategies. Additionally, we adopt a novel training algorithm that
enhances adversarial robustness uniformly across various perturbation
intensities, in contrast to methods narrowly focused on optimizing adversarial
accuracy. Our extensive empirical studies validate the effectiveness of the
adversarial hypervolume metric, demonstrating its ability to reveal subtle
differences in robustness that adversarial accuracy overlooks. This research
contributes a new measure of robustness and establishes a standard for
assessing and benchmarking the resilience of current and future defensive
models against adversarial threats.
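To illustrate why a single-intensity accuracy can hide differences, the sketch below summarises robustness as the normalised area under an accuracy-versus-perturbation curve. This is a simplified stand-in and an assumption on our part: the paper's adversarial hypervolume is defined from a multi-objective viewpoint and may differ in detail. The function name and the toy numbers are hypothetical.

```python
def robustness_area(epsilons, accuracies):
    """Trapezoidal area under accuracy vs. perturbation size, divided by the range."""
    if len(epsilons) != len(accuracies) or len(epsilons) < 2:
        raise ValueError("need at least two matching (epsilon, accuracy) points")
    area = 0.0
    for i in range(1, len(epsilons)):
        width = epsilons[i] - epsilons[i - 1]
        if width <= 0:
            raise ValueError("epsilons must be strictly increasing")
        area += 0.5 * width * (accuracies[i] + accuracies[i - 1])
    return area / (epsilons[-1] - epsilons[0])


# Two toy models with identical accuracy at the largest perturbation (0.3)
# but different behaviour at the intermediate one.
eps = [0.0, 1.0, 2.0]
model_a = robustness_area(eps, [0.9, 0.5, 0.3])
model_b = robustness_area(eps, [0.9, 0.7, 0.3])
```

Both toy models look identical if only the accuracy at the largest perturbation is reported, while the area-based summary separates them.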
[LINK]
http://arxiv.org/abs/2403.05100v2
[DATE]
2024-11-17 22:42:51+08:00
[CATEGORIES]
cs.LG
An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces
[AUTHORS]
Alex Beeson, David Ireland, Giovanni Montana
[ABSTRACT]
Expanding reinforcement learning (RL) to offline domains generates promising
prospects, particularly in sectors where data collection poses substantial
challenges or risks. Pivotal to the success of transferring RL offline is
mitigating overestimation bias in value estimates for state-action pairs absent
from data. Whilst numerous approaches have been proposed in recent years, these
tend to focus primarily on continuous or small-scale discrete action spaces.
Factorised discrete action spaces, on the other hand, have received relatively
little attention, despite many real-world problems naturally having
factorisable actions. In this work, we undertake a formative investigation into
offline reinforcement learning in factorisable action spaces. Using
value-decomposition as formulated in DecQN as a foundation, we present the case
for a factorised approach and conduct an extensive empirical evaluation of
several offline techniques adapted to the factorised setting. In the absence of
established benchmarks, we introduce a suite of our own comprising datasets of
varying quality and task complexity. Advocating for reproducible research and
innovation, we make all datasets available for public use alongside our code
base.
[COMMENTS]
Published in Transactions on Machine Learning Research (11/2024)
[LINK]
http://arxiv.org/abs/2411.11088v1
[DATE]
2024-11-17 22:31:14+08:00
[CATEGORIES]
cs.LG
MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions
[AUTHORS]
Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong
[ABSTRACT]
Multi-modal learning has shown exceptional performance in various tasks,
especially in medical applications, where it integrates diverse medical
information for comprehensive diagnostic evidence. However, several challenges
remain in multi-modal learning: (1) heterogeneity between modalities, (2)
uncertainty in missing modalities, (3) influence of intrinsic noise, and (4)
interpretability of fusion results. This paper introduces the
Modal-Domain Attention (MDA) model to address the above challenges. MDA
constructs linear relationships between modalities through continuous
attention. Due to its ability to adaptively allocate dynamic attention to
different modalities, MDA can reduce attention to low-correlation data, missing
modalities, or modalities with inherent noise, thereby maintaining SOTA
performance across various tasks on multiple public datasets. Furthermore, our
observations on the contribution of different modalities indicate that MDA
aligns with established clinical diagnostic imaging gold standards and holds
promise as a reference for pathologies where these standards are not yet
clearly defined. The code and dataset will be available.
[LINK]
http://arxiv.org/abs/2406.10569v3
[DATE]
2024-11-17 22:08:23+08:00
[CATEGORIES]
cs.LG
Reinforcement Learning for Financial Index Tracking
[AUTHORS]
Xianhua Peng, Chenyin Gong, Xue Dong He
[ABSTRACT]
We propose the first discrete-time infinite-horizon dynamic formulation of
the financial index tracking problem under both return-based tracking error and
value-based tracking error. The formulation overcomes the limitations of
existing models by incorporating the intertemporal dynamics of market
information variables not limited to prices, allowing exact calculation of
transaction costs, accounting for the tradeoff between overall tracking error
and transaction costs, allowing effective use of data in a long time period,
etc. The formulation also allows novel decision variables of cash injection or
withdrawal. We propose to solve the portfolio rebalancing equation using a Banach
fixed point iteration, which allows accurate calculation of the transaction
costs specified as nonlinear functions of trading volumes in practice. We
propose an extension of a deep reinforcement learning (RL) method to solve the
dynamic formulation. Our RL method resolves the issue of data limitation
resulting from the availability of a single sample path of financial data by a
novel training scheme. A comprehensive empirical study based on a 17-year-long
testing set demonstrates that the proposed method outperforms a benchmark
method in terms of tracking accuracy and has the potential for earning extra
profit through a cash withdrawal strategy.
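The fixed point idea can be sketched on a toy problem. The abstract does not state the rebalancing equation, so everything below is an assumption for illustration: post-trade wealth W' must satisfy W' = W - cost(trades(W')), where the trades move current dollar holdings to target weights of W', and `trade_cost` is a hypothetical nonlinear cost function.

```python
def trade_cost(volume, linear=0.001, power=0.0001):
    """Hypothetical nonlinear transaction cost for a trade of a given dollar volume."""
    return linear * volume + power * volume ** 1.5


def total_cost(wealth_after, holdings, weights):
    """Cost of moving current holdings to target weights of the post-trade wealth."""
    return sum(
        trade_cost(abs(w * wealth_after - h)) for h, w in zip(holdings, weights)
    )


def rebalance(holdings, weights, tol=1e-12, max_iter=1000):
    """Iterate W' <- W - cost(W') until the post-trade wealth stops changing."""
    wealth_before = sum(holdings)
    wealth_after = wealth_before
    for _ in range(max_iter):
        updated = wealth_before - total_cost(wealth_after, holdings, weights)
        if abs(updated - wealth_after) < tol:
            return updated
        wealth_after = updated
    raise RuntimeError("fixed point iteration did not converge")


holdings = [60.0, 40.0]
weights = [0.5, 0.5]
wealth_post = rebalance(holdings, weights)
```

The iteration converges here because the cost changes only slightly with the post-trade wealth, so the update map is a contraction in this toy setting.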
[COMMENTS]
75 pages, 15 figures, and 13 tables
[LINK]
http://arxiv.org/abs/2308.02820v2
[DATE]
2024-11-17 20:53:03+08:00
[CATEGORIES]
cs.LG
Asymmetrical estimator for training encapsulated deep photonic neural networks
[AUTHORS]
Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng
[ABSTRACT]
Photonic neural networks (PNNs) are fast in-propagation and high bandwidth
paradigms that aim to popularize reproducible NN acceleration with higher
efficiency and lower cost. However, the training of PNNs is known to be a
challenge, where the device-to-device and system-to-system variations create
imperfect knowledge of the PNN. Despite backpropagation (BP)-based training
algorithms often being the industry standard for their robustness, generality,
and fast gradient convergence for digital training, existing PNN-BP methods
rely heavily on the accurate intermediate state extraction for a deep PNN
(DPNN). These information accesses truncate the photonic signal propagation,
bottlenecking DPNN’s operation speed and increasing the system construction
cost. Here, we introduce the asymmetrical training (AT) method, tailored for
encapsulated DPNNs, where the signal is preserved in the analogue photonic
domain for the entire structure. AT’s minimum information readout for training
bypasses analogue-digital interfaces wherever possible for fast operation and
minimum system footprint. AT’s error tolerance and generality aim to promote
PNN acceleration in a widened operational scenario despite the fabrication
variations and imperfect controls. We demonstrated AT for encapsulated DPNN
with integrated photonic chips, repeatably enhancing the performance from
in-silico BP for different network structures and datasets.
[COMMENTS]
22 pages, 6 figures
[LINK]
http://arxiv.org/abs/2405.18458v3
[DATE]
2024-11-17 20:33:25+08:00
[CATEGORIES]
cs.LG
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
[AUTHORS]
Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons
[COMMENTS]
Advances in Neural Information Processing Systems 37 (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2402.17747v5
[DATE]
2024-11-17 20:18:45+08:00
[CATEGORIES]
cs.LG
Privacy and Copyright Protection in Generative AI: A Lifecycle Perspective
[AUTHORS]
Dawen Zhang, Boming Xia, Yue Liu, Xiwei Xu, Thong Hoang, Zhenchang Xing, Mark Staples, Qinghua Lu, Liming Zhu
[ABSTRACT]
The advent of Generative AI has marked a significant milestone in artificial
intelligence, demonstrating remarkable capabilities in generating realistic
images, texts, and data patterns. However, these advancements come with
heightened concerns over data privacy and copyright infringement, primarily due
to the reliance on vast datasets for model training. Traditional approaches
like differential privacy, machine unlearning, and data poisoning only offer
fragmented solutions to these complex issues. Our paper delves into the
multifaceted challenges of privacy and copyright protection within the data
lifecycle. We advocate for integrated approaches that combine technical
innovation with ethical foresight, holistically addressing these concerns by
investigating and devising solutions that are informed by the lifecycle
perspective. This work aims to catalyze a broader discussion and inspire
concerted efforts towards data privacy and copyright integrity in Generative
AI.
[COMMENTS]
Accepted by 2024 IEEE/ACM 3rd International Conference on AI
Engineering - Software Engineering for AI (CAIN)
[LINK]
http://arxiv.org/abs/2311.18252v3
[DATE]
2024-11-17 20:09:49+08:00
[CATEGORIES]
cs.LG
Generating medical screening questionnaires through analysis of social media data
[AUTHORS]
Ortal Ashkenazi, Elad Yom-Tov, Liron Vardi David
[ABSTRACT]
Screening questionnaires are used in medicine as a diagnostic aid. Creating
them is a long and expensive process, which could potentially be improved
through analysis of social media posts related to symptoms and behaviors prior
to diagnosis. Here we show a preliminary investigation into the feasibility of
generating screening questionnaires for a given medical condition from social
media postings. The method first identifies a cohort of relevant users through
their posts in dedicated patient groups and a control group of users who
reported similar symptoms but did not report being diagnosed with the condition
of interest. Posts made prior to diagnosis are used to generate decision rules
to differentiate between the different groups, by clustering symptoms mentioned
by these users and training a decision tree to differentiate between the two
groups. We validate the generated rules by correlating them with scores given
by medical doctors to matching hypothetical cases. We demonstrate the proposed
method by creating questionnaires for three conditions (endometriosis, lupus,
and gout) using the data of several hundreds of users from Reddit. These
questionnaires were then validated by medical doctors. The average Pearson
correlations between the latter’s scores and the decision rules were 0.58
(endometriosis), 0.40 (lupus) and 0.27 (gout). Our results suggest that the
process of questionnaire generation can be, at least partly, automated. These
questionnaires are advantageous in that they are based on real-world experience
but are currently lacking in their ability to capture the context, duration,
and timing of symptoms.
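For readers unfamiliar with the validation statistic, Pearson's correlation between two score lists can be computed as in the minimal sketch below. The function name and the example scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rule scores and doctor scores for four hypothetical cases.
rule_scores = [1.0, 2.0, 3.0, 4.0]
doctor_scores = [2.0, 4.0, 6.0, 8.0]
r = pearson(rule_scores, doctor_scores)
```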
[LINK]
http://arxiv.org/abs/2411.11048v1
[DATE]
2024-11-17 19:57:18+08:00
[CATEGORIES]
cs.LG
Knowledge-enhanced Transformer for Multivariate Long Sequence Time-series Forecasting
[AUTHORS]
Shubham Tanaji Kakde, Rony Mitra, Jasashwi Mandal, Manoj Kumar Tiwari
[ABSTRACT]
Multivariate Long Sequence Time-series Forecasting (LSTF) has been a critical
task across various real-world applications. Recent advancements focus on the
application of transformer architectures attributable to their ability to
capture temporal patterns effectively over extended periods. However, these
approaches often overlook the inherent relationships and interactions between
the input variables that could be drawn from their characteristic properties.
In this paper, we aim to bridge this gap by integrating information-rich
Knowledge Graph Embeddings (KGE) with state-of-the-art transformer-based
architectures. We introduce a novel approach that encapsulates conceptual
relationships among variables within a well-defined knowledge graph, forming
dynamic and learnable KGEs for seamless integration into the transformer
architecture. We investigate the influence of this integration into seminal
architectures such as PatchTST, Autoformer, Informer, and Vanilla Transformer.
Furthermore, we thoroughly investigate the performance of these
knowledge-enhanced architectures along with their original implementations for
long forecasting horizons and demonstrate significant improvement in the
benchmark results. This enhancement empowers transformer-based architectures to
address the inherent structural relation between variables. Our
knowledge-enhanced approach improves the accuracy of multivariate LSTF by
capturing complex temporal and relational dynamics across multiple domains. To
substantiate the validity of our model, we conduct comprehensive experiments
using Weather and Electric Transformer Temperature (ETT) datasets.
[COMMENTS]
9 pages, 4 figures, 4 tables
[LINK]
http://arxiv.org/abs/2411.11046v1
[DATE]
2024-11-17 19:53:54+08:00
[CATEGORIES]
cs.LG
Efficient Federated Unlearning with Adaptive Differential Privacy Preservation
[AUTHORS]
Yu Jiang, Xindi Tong, Ziyao Liu, Huanyi Ye, Chee Wei Tan, Kwok-Yan Lam
[ABSTRACT]
Federated unlearning (FU) offers a promising solution to effectively address
the need to erase the impact of specific clients’ data on the global model in
federated learning (FL), thereby granting individuals the “Right to be
Forgotten”. The most straightforward approach to achieve unlearning is to train
the model from scratch, excluding clients who request data removal, but it is
resource-intensive. Current state-of-the-art FU methods extend traditional FL
frameworks by leveraging stored historical updates, enabling more efficient
unlearning than training from scratch. However, the use of stored updates
introduces significant privacy risks. Adversaries with access to these updates
can potentially reconstruct clients’ local data, a well-known vulnerability in
the privacy domain. While privacy-enhanced techniques exist, their applications
to FU scenarios that balance unlearning efficiency with privacy protection
remain underexplored. To address this gap, we propose FedADP, a method designed
to achieve both efficiency and privacy preservation in FU. Our approach
incorporates an adaptive differential privacy (DP) mechanism, carefully
balancing privacy and unlearning performance through a novel budget allocation
strategy tailored for FU. FedADP also employs a dual-layered selection process,
focusing on global models with significant changes and client updates closely
aligned with the global model, reducing storage and communication costs.
Additionally, a novel calibration method is introduced to facilitate effective
unlearning. Extensive experimental results demonstrate that FedADP effectively
manages the trade-off between unlearning efficiency and privacy protection.
[LINK]
http://arxiv.org/abs/2411.11044v1
[DATE]
2024-11-17 19:45:15+08:00
[CATEGORIES]
cs.LG
FedUHB: Accelerating Federated Unlearning via Polyak Heavy Ball Method
[AUTHORS]
Yu Jiang, Chee Wei Tan, Kwok-Yan Lam
[ABSTRACT]
Federated learning facilitates collaborative machine learning, enabling
multiple participants to collectively develop a shared model while preserving
the privacy of individual data. The growing importance of the “right to be
forgotten” calls for effective mechanisms to facilitate data removal upon
request. In response, federated unlearning (FU) has been developed to
efficiently eliminate the influence of specific data from the model. Current FU
methods primarily rely on approximate unlearning strategies, which seek to
balance data removal efficacy with computational and communication costs, but
often fail to completely erase data influence. To address these limitations, we
propose FedUHB, a novel exact unlearning approach that leverages the Polyak
heavy ball optimization technique, a first-order method, to achieve rapid
retraining. In addition, we introduce a dynamic stopping mechanism to optimize
the termination of the unlearning process. Our extensive experiments show that
FedUHB not only enhances unlearning efficiency but also preserves robust model
performance after unlearning. Furthermore, the dynamic stopping mechanism
effectively reduces the number of unlearning iterations, conserving both
computational and communication resources. FedUHB proves to be an effective
and efficient solution for exact data removal in federated learning settings.
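The Polyak heavy ball update itself is standard: x_{k+1} = x_k - lr * grad(x_k) + momentum * (x_k - x_{k-1}). The sketch below applies it to a one-dimensional toy objective. It illustrates only the optimizer, not FedUHB's federated retraining or its stopping rule, and the step size and momentum values are assumptions.

```python
def heavy_ball(grad, x0, lr=0.1, momentum=0.5, steps=200):
    """Minimise a scalar function with Polyak's heavy ball iteration."""
    x_prev = x0
    x = x0
    for _ in range(steps):
        # A gradient step plus a momentum term built from the previous displacement.
        x_next = x - lr * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_star = heavy_ball(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The momentum term reuses the previous step, which is what makes the method first-order while typically converging faster than plain gradient descent on well-conditioned problems.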
[LINK]
http://arxiv.org/abs/2411.11039v1
[DATE]
2024-11-17 19:08:49+08:00
[CATEGORIES]
cs.LG
EfQAT: An Efficient Framework for Quantization-Aware Training
[AUTHORS]
Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi
[ABSTRACT]
Quantization-aware training (QAT) schemes have been shown to achieve
near-full precision accuracy. They accomplish this by training a quantized
model for multiple epochs. This is computationally expensive, mainly because of
the full precision backward pass. On the other hand, post-training quantization
(PTQ) schemes do not involve training and are therefore computationally cheap,
but they usually result in a significant accuracy drop. We address these
challenges by proposing EfQAT, which generalizes both schemes by optimizing
only a subset of the parameters of a quantized model. EfQAT starts by applying
a PTQ scheme to a pre-trained model and only updates the most critical network
parameters while freezing the rest, accelerating the backward pass. We
demonstrate the effectiveness of EfQAT on various CNNs and Transformer-based
models using different GPUs. Specifically, we show that EfQAT is significantly
more accurate than PTQ with little extra compute. Furthermore, EfQAT can
accelerate the QAT backward pass by 1.44-1.64x while retaining most of the
accuracy.
[COMMENTS]
12 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.11038v1
[DATE]
2024-11-17 19:06:36+08:00
[CATEGORIES]
cs.LG
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
[AUTHORS]
Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev
[ABSTRACT]
Rigorous software testing is crucial for developing and maintaining
high-quality code, making automated test generation a promising avenue for both
improving software quality and boosting the effectiveness of code generation
methods. However, while code generation with Large Language Models (LLMs) is an
extraordinarily active research area, test generation remains relatively
unexplored. We address this gap and investigate the capability of LLM-based
Code Agents to formalize user issues into test cases. To this end, we propose a
novel benchmark based on popular GitHub repositories, containing real-world
issues, ground-truth bug-fixes, and golden tests. We find that LLMs generally
perform surprisingly well at generating relevant test cases, with Code Agents
designed for code repair exceeding the performance of systems designed
specifically for test generation. Further, as test generation is a similar but
more structured task than code generation, it allows for a more fine-grained
analysis using issue reproduction rate and coverage changes, providing a dual
metric for analyzing systems designed for code repair. Finally, we find that
generated tests are an effective filter for proposed code fixes, doubling the
precision of SWE-Agent. We release all data and code at
https://github.com/logic-star-ai/SWT-Bench
[COMMENTS]
20 pages, 14 figures, 7 tables
[LINK]
http://arxiv.org/abs/2406.12952v2
[DATE]
2024-11-17 17:40:36+08:00
[CATEGORIES]
cs.LG
Localized Schrödinger Bridge Sampler
[AUTHORS]
Georg A. Gottwald, Sebastian Reich
[ABSTRACT]
We consider the problem of sampling from an unknown distribution for which
only a sufficiently large number of training samples are available. In this
paper, we build on previous work combining Schrödinger bridges and plug &
play Langevin samplers. A key bottleneck of these approaches is the exponential
dependence of the required training samples on the dimension, $d$, of the
ambient state space. We propose a localization strategy which exploits
conditional independence of conditional expectation values. Localization thus
replaces a single high-dimensional Schrödinger bridge problem by $d$
low-dimensional Schrödinger bridge problems over the available training
samples. In this context, a connection to multi-head self-attention transformer
architectures is established. As for the original Schrödinger bridge sampling
approach, the localized sampler is stable and geometrically ergodic. The sampler
also naturally extends to conditional sampling and to Bayesian inference. We
demonstrate the performance of our proposed scheme through experiments on a
high-dimensional Gaussian problem, on a temporal stochastic process, and on a
stochastic subgrid-scale parametrization conditional sampling problem. We also
extend the idea of localization to plug & play Langevin samplers using
kernel-based denoising in combination with Tweedie’s formula.
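For reference, Tweedie's formula in its standard Gaussian form is given below. The notation ($x$ for the clean sample, $y$ for the noisy observation, $\sigma$ for the noise level, $p_\sigma$ for the density of $y$) is assumed here and is not taken from the paper.

```latex
\[
  \mathbb{E}[x \mid y] \;=\; y + \sigma^{2}\, \nabla_{y} \log p_{\sigma}(y),
  \qquad y = x + \sigma \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I).
\]
```

In words, the denoised estimate is the noisy point shifted along the score of the noisy density, which is why a kernel-based estimate of that density can serve as a denoiser.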
[LINK]
http://arxiv.org/abs/2409.07968v3
[DATE]
2024-11-17 17:17:17+08:00
[CATEGORIES]
cs.LG
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
[AUTHORS]
Taehyun Hwang, Min-hwan Oh
[ABSTRACT]
We study model-based reinforcement learning (RL) for episodic Markov decision
processes (MDP) whose transition probability is parametrized by an unknown
transition core with features of state and action. Despite much recent progress
in analyzing algorithms in the linear MDP setting, the understanding of more
general transition models remains very limited. In this paper, we establish a
provably efficient RL algorithm for the MDP whose state transition is given by
a multinomial logistic model. To balance the exploration-exploitation
trade-off, we propose an upper confidence bound-based algorithm. We show that
our proposed algorithm achieves an $\tilde{O}(d \sqrt{H^3 T})$ regret bound, where
$d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the
total number of steps. To the best of our knowledge, this is the first
model-based RL algorithm with multinomial logistic function approximation with
provable guarantees. We also comprehensively evaluate our proposed algorithm
numerically and show that it consistently outperforms the existing methods,
hence achieving both provable efficiency and superior practical performance.
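A multinomial logistic transition model can be sketched as a softmax over feature scores. The form below is an assumption for illustration: each reachable next state has a feature vector for the current state-action pair, `theta` plays the role of the transition core, and the function name is hypothetical.

```python
from math import exp


def mnl_transition_probs(features, theta):
    """Softmax transition probabilities over next states.

    features maps each reachable next state to its feature vector for the
    current (state, action) pair; theta is the parameter vector.
    """
    scores = {
        state: sum(f * t for f, t in zip(phi, theta))
        for state, phi in features.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    weights = {state: exp(value - top) for state, value in scores.items()}
    norm = sum(weights.values())
    return {state: weight / norm for state, weight in weights.items()}


probs = mnl_transition_probs(
    {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [0.5, 0.5]},
    theta=[0.4, -0.2],
)
```

With a parameter vector of dimension $d$, the model stays compact even when the number of next states is large, which is the quantity the regret bound above depends on.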
[COMMENTS]
Accepted in AAAI 2023 (Main Technical Track)
[LINK]
http://arxiv.org/abs/2212.13540v2
[DATE]
2024-11-17 17:17:02+08:00
[CATEGORIES]
cs.LG
Beyond Normal: Learning Spatial Density Models of Node Mobility
[AUTHORS]
Wanxin Gao, Ioanis Nikolaidis, Janelle Harms
[ABSTRACT]
Learning models of complex spatial density functions, representing the
steady-state density of mobile nodes moving on a two-dimensional terrain, can
assist in network design and optimization problems, e.g., by accelerating the
computation of the density function during a parameter sweep. We address the
question of applicability for off-the-shelf mixture density network models for
the description of mobile node density over a disk. We propose the use of
Möbius distributions to retain symmetric spatial relations, yet be flexible
enough to capture changes as one radially traverses the disk. The mixture
models for Möbius versus Gaussian distributions are compared and the benefits
of choosing Möbius distributions become evident, yet we also observe that
learning mixtures of Möbius distributions is a fragile process, when using
current tools, compared to learning mixtures of Gaussians.
[LINK]
http://arxiv.org/abs/2411.10997v1
[DATE]
2024-11-17 16:10:39+08:00
[CATEGORIES]
cs.LG
ERATTA: Extreme RAG for Table To Answers with Large Language Models
[AUTHORS]
Sohini Roychowdhury, Marko Krema, Anvar Mahammad, Brian Moore, Arijit Mukherjee, Punit Prakashchandra
[ABSTRACT]
Large language models (LLMs) with retrieval-augmented generation (RAG) have
been the optimal choice for scalable generative AI solutions in the recent
past. Although RAG implemented with AI agents (agentic-RAG) has been recently
popularized, it suffers from unstable cost and unreliable performance for
Enterprise-level data-practices. Most existing use-cases that incorporate RAG
with LLMs have been either generic or extremely domain specific, thereby
questioning the scalability and generalizability of RAG-LLM approaches. In this
work, we propose a unique LLM-based system where multiple LLMs can be invoked
to enable data authentication, user-query routing, data-retrieval and custom
prompting for question-answering capabilities from Enterprise-data tables. The
source tables here are highly fluctuating and large in size and the proposed
framework enables structured responses in under 10 seconds per query.
Additionally, we propose a five metric scoring module that detects and reports
hallucinations in the LLM responses. Our proposed system and scoring metrics
achieve >90% confidence scores across hundreds of user queries in the
sustainability, financial health and social media domains. Extensions to the
proposed extreme RAG architectures can enable heterogeneous source querying
using LLMs.
[COMMENTS]
5 pages, 4 tables, IEEE Big Data, 2024
[LINK]
http://arxiv.org/abs/2405.03963v4
[DATE]
2024-11-17 15:23:40+08:00
[CATEGORIES]
cs.LG
Program Evaluation with Remotely Sensed Outcomes
[AUTHORS]
Ashesh Rambachan, Rahul Singh, Davide Viviano
[ABSTRACT]
While traditional program evaluations typically rely on surveys to measure
outcomes, certain economic outcomes such as living standards or environmental
quality may be infeasible or costly to collect. As a result, recent empirical
work estimates treatment effects using remotely sensed variables (RSVs), such
as mobile phone activity or satellite images, instead of ground-truth outcome
measurements. Common practice predicts the economic outcome from the RSV, using
an auxiliary sample of labeled RSVs, and then uses such predictions as the
outcome in the experiment. We prove that this approach leads to biased
estimates of treatment effects when the RSV is a post-outcome variable. We
nonparametrically identify the treatment effect, using an assumption that
reflects the logic of recent empirical research: the conditional distribution
of the RSV remains stable across both samples, given the outcome and treatment.
Our results do not require researchers to know or consistently estimate the
relationship between the RSV, outcome, and treatment, which is typically
mis-specified with unstructured data. We form a representation of the RSV for
downstream causal inference by predicting the outcome and predicting the
treatment, with better predictions leading to more precise causal estimates. We
re-evaluate the efficacy of a large-scale public program in India, showing that
the program’s measured effects on local consumption and poverty can be
replicated using satellite
[LINK]
http://arxiv.org/abs/2411.10959v1
[DATE]
2024-11-17 12:43:04+08:00
[CATEGORIES]
cs.LG
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
[AUTHORS]
Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen
[ABSTRACT]
Although quantization for linear layers has been widely used, its application
to accelerate the attention process remains limited. SageAttention utilizes
8-bit matrix multiplication, 16-bit matrix multiplication with 16-bit
accumulator, and precision-enhancing methods, implementing an accurate and 2x
speedup kernel compared to FlashAttention2. To further enhance the efficiency
of attention computation while maintaining precision, we propose
SageAttention2, which utilizes significantly faster 4-bit matrix multiplication
(Matmul) alongside additional precision-enhancing techniques. First, we propose
to quantize matrices $(Q, K)$ to INT4 in a warp-level granularity and quantize
matrices $(\widetilde P, V)$ to FP8. Second, we propose a method to smooth $Q$
and $V$, enhancing the accuracy of attention with INT4 $QK$ and FP8 $PV$.
Third, we analyze the quantization accuracy across timesteps and layers, then
propose an adaptive quantization method to ensure the end-to-end metrics over
various models. The operations per second (OPS) of SageAttention2 surpass
FlashAttention2 and xformers by about 3x and 5x on RTX4090, respectively.
Comprehensive experiments confirm that our approach incurs negligible
end-to-end metrics loss across diverse models, including those for large
language processing, image generation, and video generation. The codes are
available at https://github.com/thu-ml/SageAttention.
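
As a rough illustration of the INT4 step described in this abstract, the sketch below applies generic symmetric quantization to one block of values that share a single scale. It is not the SageAttention2 kernel: the function names `quantize_int4` and `dequantize` are hypothetical, the block here merely stands in for whatever group of rows shares a scale, and the smoothing of $Q$ and $V$ and the FP8 path are not shown.

```python
def quantize_int4(block):
    """Symmetric quantization of a block (list of float rows) to integers in [-7, 7].

    One scale is shared by the whole block; this is an assumption made for
    illustration, not the granularity used by any particular kernel.
    """
    max_abs = max(abs(x) for row in block for x in row)
    # Guard against an all-zero block, where any scale works.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [[max(-7, min(7, round(x / scale))) for x in row] for row in block]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to floats using the shared scale."""
    return [[v * scale for v in row] for row in q]


if __name__ == "__main__":
    codes, scale = quantize_int4([[3.5, -1.0], [0.0, 0.5]])
    print(codes, scale, dequantize(codes, scale))
```

The rounding error per entry is at most half the scale, so a block with one large outlier loses precision on all its other entries; that is the usual motivation for smoothing before quantizing.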
[LINK]
http://arxiv.org/abs/2411.10958v1
[DATE]
2024-11-17 12:35:49+08:00
[CATEGORIES]
cs.LG
IMPaCT GNN: Imposing invariance with Message Passing in Chronological split Temporal Graphs
[AUTHORS]
Sejun Park, Joo Young Park, Hyunwoo Park
[ABSTRACT]
This paper addresses domain adaptation challenges in graph data resulting
from chronological splits. In a transductive graph learning setting, where each
node is associated with a timestamp, we focus on the task of Semi-Supervised
Node Classification (SSNC), aiming to classify recent nodes using labels of
past nodes. Temporal dependencies in node connections create domain shifts,
causing significant performance degradation when applying models trained on
historical data to recent data. Given the practical relevance of this
scenario, addressing domain adaptation in chronological split data is crucial,
yet underexplored. We propose Imposing invariance with Message Passing in
Chronological split Temporal Graphs (IMPaCT), a method that imposes invariant
properties based on realistic assumptions derived from temporal graph
structures. Unlike traditional domain adaptation approaches which rely on
unverifiable assumptions, IMPaCT explicitly accounts for the characteristics of
chronological splits. IMPaCT is further supported by rigorous mathematical
analysis, including a derivation of an upper bound of the generalization error.
Experimentally, IMPaCT achieves a 3.8% performance improvement over the current
SOTA method on the ogbn-mag graph dataset. Additionally, we introduce the
Temporal Stochastic Block Model (TSBM), which replicates temporal graphs under
varying conditions, demonstrating the applicability of our methods to general
spatial GNNs.
[COMMENTS]
11 pages (without appendix), 35 pages (with appendix), 14 figures
[LINK]
http://arxiv.org/abs/2411.10957v1
[DATE]
2024-11-17 12:23:25+08:00
[CATEGORIES]
cs.LG
Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process
[AUTHORS]
Sandipp Krishnan Ravi, Yigitcan Comlek, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Wei Chen, Liping Wang
[ABSTRACT]
With the advent of artificial intelligence and machine learning, various
domains of science and engineering communities have leveraged data-driven
surrogates to model complex systems through fusing numerous sources of
information (data) from published papers, patents, open repositories, or other
resources. However, not much attention has been paid to the differences in
quality and comprehensiveness of the known and unknown underlying physical
parameters of the information sources, which could have downstream implications
during system optimization. Additionally, existing methods cannot fuse
multi-source data into a single predictive model. Towards resolving this issue,
a multi-source data fusion framework based on Latent Variable Gaussian Process
(LVGP) is proposed. The individual data sources are tagged as a characteristic
categorical variable that is mapped into a physically interpretable latent
space, allowing the development of source-aware data fusion modeling.
Additionally, a dissimilarity metric based on the latent variables of LVGP is
introduced to study and understand the differences in the sources of data. The
proposed approach is demonstrated on and analyzed through two mathematical and
two materials science case studies. From the case studies, it is observed that
compared to using single-source and source-unaware machine learning models, the
proposed multi-source data fusion framework can provide better predictions for
sparse-data problems.
[COMMENTS]
27 Pages, 10 Figures, 5 Supplementary Figures, 2 Supplementary Tables
[LINK]
http://arxiv.org/abs/2402.04146v4
[DATE]
2024-11-17 11:59:54+08:00
[CATEGORIES]
cs.LG
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
[AUTHORS]
Yijun Dong, Hoang Phan, Xiang Pan, Qi Lei
[ABSTRACT]
We revisit data selection in a modern context of finetuning from a
fundamental perspective. Extending the classical wisdom of variance
minimization in low dimensions to high-dimensional finetuning, our
generalization analysis unveils the importance of additionally reducing bias
induced by low-rank approximation. Inspired by the variance-bias tradeoff in
high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a
scalable data selection scheme with two stages. (i) First, the bias is
controlled using gradient sketching that explores the finetuning parameter
space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the
variance is reduced over $\mathcal{S}$ via moment matching between the original
and selected datasets. Theoretically, we show that gradient sketching is fast
and provably accurate: selecting $n$ samples by reducing variance over
$\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$,
independent of the parameter dimension. Empirically, we concretize the
variance-bias balance via synthetic experiments and demonstrate the
effectiveness of SkMM for finetuning in real vision tasks.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2407.06120v3
[DATE]
2024-11-17 11:02:19+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks for Financial Fraud Detection: A Review
[AUTHORS]
Dawei Cheng, Yao Zou, Sheng Xiang, Changjun Jiang
[ABSTRACT]
The landscape of financial transactions has grown increasingly complex due to
the expansion of global economic integration and advancements in information
technology. This complexity poses greater challenges in detecting and managing
financial fraud. This review explores the role of Graph Neural Networks (GNNs)
in addressing these challenges by proposing a unified framework that
categorizes existing GNN methodologies applied to financial fraud detection.
Specifically, by examining a series of detailed research questions, this review
delves into the suitability of GNNs for financial fraud detection, their
deployment in real-world scenarios, and the design considerations that enhance
their effectiveness. This review reveals that GNNs are exceptionally adept at
capturing complex relational patterns and dynamics within financial networks,
significantly outperforming traditional fraud detection methods. Unlike
previous surveys that often overlook the specific potentials of GNNs or address
them only superficially, our review provides a comprehensive, structured
analysis, distinctly focusing on the multifaceted applications and deployments
of GNNs in financial fraud detection. This review not only highlights the
potential of GNNs to improve fraud detection mechanisms but also identifies
current gaps and outlines future research directions to enhance their
deployment in financial systems. Through a structured review of over 100
studies, this review paper contributes to the understanding of GNN applications
in financial fraud detection, offering insights into their adaptability and
potential integration strategies.
[COMMENTS]
17 Pages, 2 Figures
[LINK]
http://arxiv.org/abs/2411.05815v2
[DATE]
2024-11-17 11:01:05+08:00
[CATEGORIES]
cs.LG
UQE: A Query Engine for Unstructured Databases
[AUTHORS]
Hanjun Dai, Bethany Yixin Wang, Xingchen Wan, Bo Dai, Sherry Yang, Azade Nova, Pengcheng Yin, Phitchaya Mangpo Phothilimthana, Charles Sutton, Dale Schuurmans
[ABSTRACT]
Analytics on structured data is a mature field with many successful methods.
However, most real world data exists in unstructured form, such as images and
conversations. We investigate the potential of Large Language Models (LLMs) to
enable unstructured data analytics. In particular, we propose a new Universal
Query Engine (UQE) that directly interrogates and draws insights from
unstructured data collections. This engine accepts queries in a Universal Query
Language (UQL), a dialect of SQL that provides full natural language
flexibility in specifying conditions and operators. The new engine leverages
the ability of LLMs to conduct analysis of unstructured data, while also
allowing us to exploit advances in sampling and optimization techniques to
achieve efficient and accurate query execution. In addition, we borrow
techniques from classical compiler theory to better orchestrate the workflow
between sampling methods and foundation model calls. We demonstrate the
efficiency of UQE on data analytics across different modalities, including
images, dialogs and reviews, across a range of useful query types, including
conditional aggregation, semantic retrieval and abstraction aggregation.
[LINK]
http://arxiv.org/abs/2407.09522v2
[DATE]
2024-11-17 10:22:36+08:00
[CATEGORIES]
cs.LG
4+3 Phases of Compute-Optimal Neural Scaling Laws
[AUTHORS]
Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington
[ABSTRACT]
We consider the solvable neural scaling model with three parameters: data
complexity, target complexity, and model-parameter-count. We use this neural
scaling model to derive new predictions about the compute-limited,
infinite-data scaling law regime. To train the neural scaling model, we run
one-pass stochastic gradient descent on a mean-squared loss. We derive a
representation of the loss curves which holds over all iteration counts and
improves in accuracy as the model parameter count grows. We then analyze the
compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in
the data-complexity/target-complexity phase-plane. The phase boundaries are
determined by the relative importance of model capacity, optimizer noise, and
embedding of the features. We furthermore derive, with mathematical proof and
extensive numerical evidence, the scaling-law exponents in all of these phases,
in particular computing the optimal model-parameter-count as a function of
floating point operation budget.
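
To make the training procedure named in this abstract concrete, here is a minimal sketch of one-pass stochastic gradient descent on a mean-squared loss, where each sample is drawn once and never revisited. It uses a plain noise-free linear regression with Gaussian inputs as a stand-in; the paper's solvable scaling model, its complexity parameters, and its phase analysis are not reproduced. The names `one_pass_sgd` and `squared_error` and the default learning rate are assumptions made for illustration.

```python
import random


def one_pass_sgd(target, num_samples, lr=0.05, seed=0):
    """Fit weights w to y = target . x using each fresh sample exactly once."""
    rng = random.Random(seed)
    dim = len(target)
    w = [0.0] * dim
    for _ in range(num_samples):
        # Draw a new sample; it is used for one update and then discarded.
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(t * xi for t, xi in zip(target, x))
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient of 0.5 * residual**2 with respect to w is residual * x.
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def squared_error(w, target):
    """Squared Euclidean distance between the learned and true weights."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5, 0.0, 3.0]
    for n in (0, 50, 500):
        print(n, squared_error(one_pass_sgd(target, n), target))
```

In this setting the number of samples consumed equals the number of gradient steps, so the compute spent grows with both the parameter count and the amount of data seen.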
[LINK]
http://arxiv.org/abs/2405.15074v2
[DATE]
2024-11-17 09:57:32+08:00
[CATEGORIES]
cs.LG
Improved AutoEncoder with LSTM module and KL divergence
[AUTHORS]
Wei Huang, Bingyang Zhang, Kaituo Zhang, Hua Gao, Rongchun Wan
[ABSTRACT]
The task of anomaly detection is to separate anomalous data from normal data
in the dataset. Models such as deep convolutional autoencoder (CAE) network and
deep supporting vector data description (SVDD) model have been universally
employed and have demonstrated significant success in detecting anomalies.
However, the over-reconstruction ability of the CAE network for anomalous data can
easily lead to a high false negative rate in detecting anomalous data. On the
other hand, the deep SVDD model has the drawback of feature collapse, which
leads to a decrease of detection accuracy for anomalies. To address these
problems, we propose the Improved AutoEncoder with LSTM module and
Kullback-Leibler divergence (IAE-LSTM-KL) model in this paper. An LSTM network
is added after the encoder to memorize feature representations of normal data.
Meanwhile, the phenomenon of feature collapse can also be mitigated by
penalizing the featured input to SVDD module via KL divergence. The efficacy of
the IAE-LSTM-KL model is validated through experiments on both synthetic and
real-world datasets. Experimental results show that the IAE-LSTM-KL model yields
higher detection accuracy for anomalies. In addition, it is also found that the
IAE-LSTM-KL model demonstrates enhanced robustness to contaminated outliers in
the dataset. All code may be found at
https://github.com/crazyn2/IAE-LSTM-KL_codes
[LINK]
http://arxiv.org/abs/2404.19247v2
[DATE]
2024-11-17 09:41:12+08:00
[CATEGORIES]
cs.LG
Constrained Diffusion with Trust Sampling
[AUTHORS]
William Huang, Yifeng Jiang, Tom Van Wouwe, C. Karen Liu
[ABSTRACT]
Diffusion models have demonstrated significant promise in various generative
tasks; however, they often struggle to satisfy challenging constraints. Our
approach addresses this limitation by rethinking training-free loss-guided
diffusion from an optimization perspective. We formulate a series of
constrained optimizations throughout the inference process of a diffusion
model. In each optimization, we allow the sample to take multiple steps along
the gradient of the proxy constraint function until we can no longer trust the
proxy, according to the variance at each diffusion level. Additionally, we
estimate the state manifold of the diffusion model to allow for early termination
when the sample starts to wander away from the state manifold at each diffusion
step. Trust sampling effectively balances between following the unconditional
diffusion model and adhering to the loss guidance, enabling more flexible and
accurate constrained generation. We demonstrate the efficacy of our method
through extensive experiments on complex tasks, and in drastically different
domains of images and 3D motion generation, showing significant improvements
over existing methods in terms of generation quality. Our implementation is
available at https://github.com/will-s-h/trust-sampling.
[COMMENTS]
18 pages, 6 figures, NeurIPS
[LINK]
http://arxiv.org/abs/2411.10932v1
[DATE]
2024-11-17 09:34:57+08:00
[CATEGORIES]
cs.LG
Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder
[AUTHORS]
Cheng Zeng, Zulqarnain Khan, Nathan L. Post
[ABSTRACT]
Inverse materials design has proven successful in accelerating novel material
discovery. Many inverse materials design methods use unsupervised learning
where a latent space is learned to offer a compact description of materials
representations. A latent space learned this way is likely to be entangled, in
terms of the target property and other properties of the materials. This makes
the inverse design process ambiguous. Here, we present a semi-supervised
learning approach based on a disentangled variational autoencoder to learn a
probabilistic relationship between features, latent variables and target
properties. This approach is data efficient because it combines all labelled
and unlabelled data in a coherent manner, and it uses expert-informed prior
distributions to improve model robustness even with limited labelled data. It
is in essence interpretable, as the learnable target property is disentangled
out of the other properties of the materials, and an extra layer of
interpretability can be provided by a post-hoc analysis of the classification
head of the model. We demonstrate this new approach on an experimental
high-entropy alloy dataset with chemical compositions as input and single-phase
formation as the single target property. High-entropy alloys were chosen as
example materials because of the vast chemical space of their possible
combinations of compositions and atomic configurations. While a single property
is used in this work, the disentangled model can be extended to
inverse design of materials with multiple target properties.
[LINK]
http://arxiv.org/abs/2409.06740v2
[DATE]
2024-11-17 09:11:44+08:00
[CATEGORIES]
cs.LG
Video Diffusion Models: A Survey
[AUTHORS]
Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter
[ABSTRACT]
Diffusion generative models have recently become a powerful technique for
creating and modifying high-quality, coherent video content. This survey
provides a comprehensive overview of the critical components of diffusion
models for video generation, including their applications, architectural
design, and temporal dynamics modeling. The paper begins by discussing the core
principles and mathematical formulations, then explores various architectural
choices and methods for maintaining temporal consistency. A taxonomy of
applications is presented, categorizing models based on input modalities such
as text prompts, images, videos, and audio signals. Advancements in
text-to-video generation are discussed to illustrate the state-of-the-art
capabilities and limitations of current approaches. Additionally, the survey
summarizes recent developments in training and evaluation practices, including
the use of diverse video and image datasets and the adoption of various
evaluation metrics to assess model performance. The survey concludes with an
examination of ongoing challenges, such as generating longer videos and
managing computational costs, and offers insights into potential future
directions for the field. By consolidating the latest research and
developments, this survey aims to serve as a valuable resource for researchers
and practitioners working with video diffusion models. Website:
https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
[COMMENTS]
https://github.com/ndrwmlnk/Awesome-Video-Diffusion-Models
[LINK]
http://arxiv.org/abs/2405.03150v2
[DATE]
2024-11-17 08:40:09+08:00
[CATEGORIES]
cs.LG
Distributed solar generation forecasting using attention-based deep neural networks for cloud movement prediction
[AUTHORS]
Maneesha Perera, Julian De Hoog, Kasun Bandara, Saman Halgamuge
[ABSTRACT]
Accurate forecasts of distributed solar generation are necessary to reduce
negative impacts resulting from the increased uptake of distributed solar
photovoltaic (PV) systems. However, the high variability of solar generation
over short time intervals (seconds to minutes) caused by cloud movement makes
this forecasting task difficult. To address this, using cloud images, which
capture the second-to-second changes in cloud cover affecting solar generation,
has shown promise. Recently, deep neural networks with “attention” that focus
on important regions of an image have been applied with success in many
computer vision applications. However, their use for forecasting cloud movement
has not yet been extensively explored. In this work, we propose an
attention-based convolutional long short-term memory network to forecast cloud
movement and apply an existing self-attention-based method previously proposed
for video prediction to forecast cloud movement. We investigate and discuss the
impact of cloud forecasts from attention-based methods towards forecasting
distributed solar generation, compared to cloud forecasts from
non-attention-based methods. We further provide insights into the different
solar forecast performances that can be achieved for high and low altitude
clouds. We find that for clouds at high altitudes, the cloud predictions
obtained using attention-based methods result in solar forecast skill score
improvements of 5.86% or more compared to non-attention-based methods.
[LINK]
http://arxiv.org/abs/2411.10921v1
[DATE]
2024-11-17 08:37:35+08:00
[CATEGORIES]
cs.LG
Generating Compositional Scenes via Text-to-image RGBA Instance Generation
[AUTHORS]
Alessandro Fontanella, Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Sarah Parisot
[ABSTRACT]
Text-to-image diffusion generative models can generate high quality images at
the cost of tedious prompt engineering. Controllability can be improved by
introducing layout conditioning; however, existing methods lack layout editing
ability and fine-grained control over object attributes. The concept of
multi-layer generation holds great potential to address these limitations;
however, generating image instances concurrently to scene composition limits
control over fine-grained object attributes, relative positioning in 3D space
and scene manipulation abilities. In this work, we propose a novel multi-stage
generation paradigm that is designed for fine-grained control, flexibility and
interactivity. To ensure control over instance attributes, we devise a novel
training paradigm to adapt a diffusion model to generate isolated scene
components as RGBA images with transparency information. To build complex
images, we employ these pre-generated instances and introduce a multi-layer
composite generation process that smoothly assembles components in realistic
scenes. Our experiments show that our RGBA diffusion model is capable of
generating diverse and high quality instances with precise control over object
attributes. Through multi-layer composition, we demonstrate that our approach
allows building and manipulating images from highly complex prompts with
fine-grained control over object appearance and location, granting a higher
degree of control than competing methods.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10913v1
[DATE]
2024-11-17 07:44:14+08:00
[CATEGORIES]
cs.LG
Constructing accurate machine-learned potentials and performing highly efficient atomistic simulations to predict structural and thermal properties
[AUTHORS]
Junlan Liu, Qian Yin, Mengshu He, Jun Zhou
[ABSTRACT]
The $\text{Cu}_7\text{P}\text{S}_6$ compound has garnered significant
attention due to its potential in thermoelectric applications. In this study,
we introduce a neuroevolution potential (NEP), trained on a dataset generated
from ab initio molecular dynamics (AIMD) simulations, using the moment tensor
potential (MTP) as a reference. The low root mean square errors (RMSEs) for
total energy and atomic forces demonstrate the high accuracy and
transferability of both the MTP and NEP. We further calculate the phonon
density of states (DOS) and radial distribution function (RDF) using both
machine learning potentials, comparing the results to density functional theory
(DFT) calculations. While the MTP potential offers slightly higher accuracy,
the NEP achieves a remarkable 41-fold increase in computational speed. These
findings provide detailed microscopic insights into the dynamics and rapid
Cu-ion diffusion, paving the way for future studies on Cu-based solid
electrolytes and their applications in energy devices.
[LINK]
http://arxiv.org/abs/2411.10911v1
[DATE]
2024-11-17 07:16:59+08:00
[CATEGORIES]
cs.LG
Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs
[AUTHORS]
Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu
[ABSTRACT]
Reinforcement learning algorithms are usually stated without theoretical
guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan
(COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely,
LSVI-UCB) for the setting of linear Markov decision processes, and provided
theoretical guarantees regarding its running time and regret. In real-world
scenarios, however, the space usage of this algorithm can be prohibitive due to
a utilized linear regression step. We propose and analyze two modifications of
LSVI-UCB, which alternate periods of learning and not-learning, to reduce space
and time usage while maintaining sublinear regret. We show experimentally, on
synthetic data and real-world benchmarks, that our algorithms achieve low space
usage and running time, while not significantly sacrificing regret.
[COMMENTS]
27 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.10906v1
[DATE]
2024-11-17 06:51:52+08:00
[CATEGORIES]
cs.LG
Clover: Closed-Loop Verifiable Code Generation
[AUTHORS]
Chuyue Sun, Ying Sheng, Oded Padon, Clark Barrett
[ABSTRACT]
The use of large language models for code generation is a rapidly growing
trend in software development. However, without effective methods for ensuring
the correctness of generated code, this trend could lead to undesirable
outcomes. In this paper, we introduce a new approach for addressing this
challenge: the Clover paradigm, short for Closed-Loop Verifiable Code
Generation, which uses consistency checking to provide a strong filter for
incorrect code. Clover performs consistency checks among code, docstrings, and
formal annotations. The checker is implemented using a novel integration of
formal verification tools and large language models. We provide a theoretical
analysis to support our thesis that Clover should be effective at consistency
checking. We also empirically investigate its performance on a hand-designed
dataset (CloverBench) featuring annotated Dafny programs at a textbook level of
difficulty. Experimental results show that for this dataset: (i) LLMs are
reasonably successful at automatically generating formal specifications; and
(ii) our consistency checker achieves a promising acceptance rate (up to 87%)
for correct instances while maintaining zero tolerance for adversarial
incorrect ones (no false positives). Clover also discovered 6 incorrect
programs in the existing human-written dataset MBPP-DFY-50.
[COMMENTS]
add appendix
[LINK]
http://arxiv.org/abs/2310.17807v4
[DATE]
2024-11-17 05:57:49+08:00
[CATEGORIES]
cs.LG
Watermarking Generative Categorical Data
[AUTHORS]
Bochao Gu, Hengzhi He, Guang Cheng
[ABSTRACT]
In this paper, we propose a novel statistical framework for watermarking
generative categorical data. Our method systematically embeds pre-agreed secret
signals by splitting the data distribution into two components and modifying
one distribution based on a deterministic relationship with the other, ensuring
the watermark is embedded at the distribution-level. To verify the watermark,
we introduce an insertion inverse algorithm and detect its presence by
measuring the total variation distance between the inverse-decoded data and the
original distribution. Unlike previous categorical watermarking methods, which
primarily focus on embedding watermarks into a given dataset, our approach
operates at the distribution-level, allowing for verification from a
statistical distributional perspective. This makes it particularly well-suited
for the modern paradigm of synthetic data generation, where the underlying data
distribution, rather than specific data points, is of primary importance. The
effectiveness of our method is demonstrated through both theoretical analysis
and empirical validation.
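
The detection statistic mentioned in this abstract, the total variation distance between two categorical distributions, can be sketched as follows. This shows only the distance computation on empirical frequencies; the watermark embedding and the insertion inverse algorithm are not shown, and the helper names `empirical_distribution` and `total_variation` are hypothetical.

```python
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each category in a non-empty list of samples."""
    counts = Counter(samples)
    n = len(samples)
    return {category: c / n for category, c in counts.items()}


def total_variation(p, q):
    """Total variation distance: half the L1 distance over the joint support."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


if __name__ == "__main__":
    reference = {"a": 0.5, "b": 0.5}
    observed = empirical_distribution(["a", "a", "a", "b"])
    print(total_variation(observed, reference))
```

A distance near zero means the observed frequencies match the reference distribution, while a distance of one means the two distributions place their mass on disjoint categories.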
[LINK]
http://arxiv.org/abs/2411.10898v1
[DATE]
2024-11-17 05:57:45+08:00
[CATEGORIES]
cs.LG
Neuc-MDS: Non-Euclidean Multidimensional Scaling Through Bilinear Forms
[AUTHORS]
Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Hongbin Sun, Cheng Xin
[ABSTRACT]
We introduce Non-Euclidean-MDS (Neuc-MDS), an extension of classical
Multidimensional Scaling (MDS) that accommodates non-Euclidean and non-metric
inputs. The main idea is to generalize the standard inner product to symmetric
bilinear forms to utilize the negative eigenvalues of dissimilarity Gram
matrices. Neuc-MDS efficiently optimizes the choice of (both positive and
negative) eigenvalues of the dissimilarity Gram matrix to reduce STRESS, the
sum of squared pairwise error. We provide an in-depth error analysis and proofs
of the optimality in minimizing lower bounds of STRESS. We demonstrate
Neuc-MDS’s ability to address limitations of classical MDS raised by prior
research, and test it on various synthetic and real-world datasets in
comparison with both linear and non-linear dimension reduction methods.
[COMMENTS]
Accepted to 38th Conference on Neural Information Processing Systems
(NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2411.10889v1
[DATE]
2024-11-17 05:09:38+08:00
[CATEGORIES]
cs.LG
A Survey of Graph Unlearning
[AUTHORS]
Anwar Said, Yuying Zhao, Tyler Derr, Mudassir Shabbir, Waseem Abbas, Xenofon Koutsoukos
[ABSTRACT]
Graph unlearning emerges as a crucial advancement in the pursuit of
responsible AI, providing the means to remove sensitive data traces from
trained models, thereby upholding the right to be forgotten. It is evident that
graph machine learning exhibits sensitivity to data privacy and adversarial
attacks, necessitating the application of graph unlearning techniques to
address these concerns effectively. In this comprehensive survey paper, we
present the first systematic review of graph unlearning approaches,
encompassing a diverse array of methodologies and offering a detailed taxonomy
and up-to-date literature overview to facilitate the understanding of
researchers new to this field. To ensure clarity, we provide lucid explanations
of the fundamental concepts and evaluation measures used in graph unlearning,
catering to a broader audience with varying levels of expertise. Delving into
potential applications, we explore the versatility of graph unlearning across
various domains, including but not limited to social networks, adversarial
settings, recommender systems, and resource-constrained environments like the
Internet of Things, illustrating its potential impact in safeguarding data
privacy and enhancing AI systems’ robustness. Finally, we shed light on
promising research directions, encouraging further progress and innovation
within the domain of graph unlearning. By laying a solid foundation and
fostering continued progress, this survey seeks to inspire researchers to
further advance the field of graph unlearning, thereby instilling confidence in
the ethical growth of AI systems and reinforcing the responsible application of
machine learning techniques in various domains.
[COMMENTS]
22 page review paper on graph unlearning
[LINK]
http://arxiv.org/abs/2310.02164v3
[DATE]
2024-11-17 04:51:30+08:00
[CATEGORIES]
cs.LG
Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network
[AUTHORS]
Raúl de la Fuente, Luciano Radrigan, Anibal S Morales
[ABSTRACT]
Mining machinery operating in variable environments faces high wear and
unpredictable stress, challenging Predictive Maintenance (PdM). This paper
introduces the Edge Sensor Network for Predictive Maintenance (ESN-PdM), a
hierarchical inference framework across edge devices, gateways, and cloud
services for real-time condition monitoring. The system dynamically adjusts
inference locations–on-device, on-gateway, or on-cloud–based on trade-offs
among accuracy, latency, and battery life, leveraging Tiny Machine Learning
(TinyML) techniques for model optimization on resource-constrained devices.
Performance evaluations showed that on-sensor and on-gateway inference modes
achieved over 90% classification accuracy, while cloud-based inference reached
99%. On-sensor inference reduced power consumption by approximately 44%,
enabling up to 104 hours of operation. Latency was lowest for on-device
inference (3.33 ms), increasing when offloading to the gateway (146.67 ms) or
cloud (641.71 ms). The ESN-PdM framework provides a scalable, adaptive solution
for reliable anomaly detection and PdM, crucial for maintaining machinery
uptime in remote environments. By balancing accuracy, latency, and energy
consumption, this approach advances PdM frameworks for industrial applications.
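
The trade-off among accuracy, latency, and battery life described above can be sketched as a small selection rule. This is only an illustration, not the authors' ESN-PdM logic: the `Tier` record, the `choose_tier` function, the `battery_low` flag, and the use of 90.0 as a stand-in for "over 90%" are all assumptions. Only the latencies and the 99% cloud accuracy are taken from the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    accuracy: float    # classification accuracy in percent
    latency_ms: float  # inference latency in milliseconds


# Ordered from the sensor outward. The 90.0 values are placeholders for
# "over 90%"; the latencies and the 99% figure come from the abstract.
TIERS = [
    Tier("on-device", 90.0, 3.33),
    Tier("on-gateway", 90.0, 146.67),
    Tier("on-cloud", 99.0, 641.71),
]


def choose_tier(max_latency_ms: float, min_accuracy: float,
                battery_low: bool = False) -> Optional[Tier]:
    """Pick an inference location under a latency budget and accuracy floor.

    Hypothetical rule: when the battery is low, prefer the qualifying tier
    closest to the sensor (the abstract reports lower power use for
    on-sensor inference); otherwise prefer the most accurate tier.
    """
    candidates = [t for t in TIERS
                  if t.latency_ms <= max_latency_ms
                  and t.accuracy >= min_accuracy]
    if not candidates:
        return None
    if battery_low:
        return candidates[0]
    return max(candidates, key=lambda t: (t.accuracy, -t.latency_ms))
```

With a generous latency budget the rule picks the cloud tier; a tight budget or a low battery pushes inference back onto the device.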
[COMMENTS]
This work has been submitted to the IEEE for possible publication
[LINK]
http://arxiv.org/abs/2411.07168v2
[DATE]
2024-11-17 03:41:25+08:00
[CATEGORIES]
cs.LG
See-Saw Generative Mechanism for Scalable Recursive Code Generation with Generative AI
[AUTHORS]
Ruslan Idelfonso Magaña Vsevolodovna
[ABSTRACT]
The generation of complex, large-scale code projects using generative AI
models presents challenges due to token limitations, dependency management, and
iterative refinement requirements. This paper introduces the See-Saw generative
mechanism, a novel methodology for dynamic and recursive code generation. The
proposed approach alternates between main code updates and dependency
generation to ensure alignment and functionality. By dynamically optimizing
token usage and incorporating key elements of the main code into the generation
of dependencies, the method enables efficient and scalable code generation for
projects requiring hundreds of interdependent files. The mechanism ensures that
all code components are synchronized and functional, enabling scalable and
efficient project generation. Experimental validation demonstrates the method’s
capability to manage dependencies effectively while maintaining coherence and
minimizing computational overhead.
[COMMENTS]
18 pages, 4 figures
[LINK]
http://arxiv.org/abs/2411.10861v1
[DATE]
2024-11-17 02:54:56+08:00
[CATEGORIES]
cs.LG
ReLU’s Revival: On the Entropic Overload in Normalization-Free Large Language Models
[AUTHORS]
Nandan Kumar Jha, Brandon Reagen
[ABSTRACT]
LayerNorm is a critical component in modern large language models (LLMs) for
stabilizing training and ensuring smooth optimization. However, it introduces
significant challenges in mechanistic interpretability, outlier feature
suppression, faithful signal propagation, and computational and communication
complexity of private inference. This work explores desirable activation
functions in normalization-free decoder-only LLMs. Contrary to the conventional
preference for the GELU in transformer-based models, our empirical findings
demonstrate an opposite trend – ReLU significantly outperforms GELU in
LayerNorm-free models, leading to an 8.2% perplexity improvement. We
discover a key issue with GELU, where early layers experience entropic
overload, leading to the under-utilization of the representational capacity of
attention heads. This highlights that smoother activations like GELU are
ill-suited for LayerNorm-free architectures, whereas ReLU’s geometrical
properties – specialization in input space and intra-class selectivity – lead
to improved learning dynamics and better information retention in the absence
of LayerNorm. This study offers key insights for optimizing transformer
architectures where LayerNorm introduces significant challenges. The code and
implementation are available at
https://github.com/Nandan91/relu-revival-normfree
[COMMENTS]
Accepted to NeurIPS 2024 Workshop on Attributing Model Behavior at
Scale (Camera-ready version)
[LINK]
http://arxiv.org/abs/2410.09637v3
[DATE]
2024-11-17 01:59:35+08:00
[CATEGORIES]
cs.LG
Verifiably Robust Conformal Prediction
[AUTHORS]
Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti
[ABSTRACT]
Conformal Prediction (CP) is a popular uncertainty quantification method that
provides distribution-free, statistically valid prediction sets, assuming that
training and test data are exchangeable. In such a case, CP’s prediction sets
are guaranteed to cover the (unknown) true test output with a user-specified
probability. Nevertheless, this guarantee is violated when the data is
subjected to adversarial attacks, which often result in a significant loss of
coverage. Recently, several approaches have been put forward to recover CP
guarantees in this setting. These approaches leverage variations of randomised
smoothing to produce conservative sets which account for the effect of the
adversarial perturbations. They are, however, limited in that they only support
$\ell^2$-bounded perturbations and classification tasks. This paper introduces
VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages
recent neural network verification methods to recover coverage guarantees under
adversarial attacks. Our VRCP method is the first to support perturbations
bounded by arbitrary norms including $\ell^1$, $\ell^2$, and $\ell^\infty$, as
well as regression tasks. We evaluate and compare our approach on image
classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and regression tasks
for deep reinforcement learning environments. In every case, VRCP achieves
above nominal coverage and yields significantly more efficient and informative
prediction regions than the SotA.
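
For readers unfamiliar with the baseline, the coverage guarantee mentioned above comes from a calibration quantile. The sketch below shows vanilla split conformal prediction for classification, not VRCP itself. The function names and the choice of `1 - p(label)` as the nonconformity score are assumptions made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial
        # (all-labels) set carries the guarantee.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_set(class_probs, qhat):
    """Keep every label whose score 1 - p(label) does not exceed qhat."""
    return {label for label, p in class_probs.items() if 1.0 - p <= qhat}


# Calibration scores: 1 - (probability assigned to the true label).
calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
qhat = conformal_quantile(calibration_scores, alpha=0.2)
labels = prediction_set({"cat": 0.60, "dog": 0.35, "bird": 0.05}, qhat)
```

Under exchangeability, sets built this way cover the true label with probability at least `1 - alpha`; an adversarial perturbation of the test input breaks that assumption, which is the gap the abstract addresses.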
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2405.18942v3
[DATE]
2024-11-17 01:51:33+08:00
[CATEGORIES]
cs.LG
Reinforcement Learning for Sequence Design Leveraging Protein Language Models
[AUTHORS]
Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Riashat Islam, Derek Nowrouzezahrai, Samira Ebrahimi Kahou
[ABSTRACT]
Protein sequence design, determined by amino acid sequences, is essential to
protein engineering problems in drug discovery. Prior approaches have resorted
to evolutionary strategies or Monte-Carlo methods for protein design, but often
fail to exploit the structure of the combinatorial search space or to generalize
to unseen sequences. In the context of discrete black box optimization over
large search spaces, learning a mutation policy to generate novel sequences
with reinforcement learning is appealing. Recent advances in protein language
models (PLMs) trained on large corpora of protein sequences offer a potential
solution to this problem by scoring proteins according to their biological
plausibility (such as the TM-score). In this work, we propose to use PLMs as a
reward function to generate new sequences. Yet the PLM can be computationally
expensive to query due to its large size. To this end, we propose an
alternative paradigm where optimization can be performed on scores from a
smaller proxy model that is periodically finetuned, jointly while learning the
mutation policy. We perform extensive experiments on various sequence lengths
to benchmark RL-based approaches, and provide comprehensive evaluations along
biological plausibility and diversity of the protein. Our experimental results
include favorable evaluations of the proposed sequences, along with high
diversity scores, demonstrating that RL is a strong candidate for biological
sequence design. Finally, we provide a modular open source implementation that can
be easily integrated in most RL training loops, with support for replacing the
reward model with other PLMs, to spur further research in this domain. The code
for all experiments is provided in the supplementary material.
[COMMENTS]
22 pages, 7 figures, 4 tables
[LINK]
http://arxiv.org/abs/2407.03154v2
[DATE]
2024-11-17 01:48:19+08:00
[CATEGORIES]
cs.LG
When is an Embedding Model More Promising than Another?
[AUTHORS]
Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida
[ABSTRACT]
Embedders play a central role in machine learning, projecting any object into
numerical representations that can, in turn, be leveraged to perform various
downstream tasks. The evaluation of embedding models typically depends on
domain-specific empirical approaches utilizing downstream tasks, primarily
because of the lack of a standardized framework for comparison. However,
acquiring adequately large and representative datasets for conducting these
assessments is not always viable and can prove to be prohibitively expensive
and time-consuming. In this paper, we present a unified approach to evaluate
embedders. First, we establish theoretical foundations for comparing embedding
models, drawing upon the concepts of sufficiency and informativeness. We then
leverage these concepts to devise a tractable comparison criterion (information
sufficiency), leading to a task-agnostic and self-supervised ranking procedure.
We demonstrate experimentally that our approach aligns closely with the
capability of embedding models to facilitate various downstream tasks in both
natural language processing and molecular biology. This effectively offers
practitioners a valuable tool for prioritizing model trials.
[LINK]
http://arxiv.org/abs/2406.07640v2
[DATE]
2024-11-17 01:01:02+08:00
[CATEGORIES]
cs.LG
Bayesian inverse Navier-Stokes problems: joint flow field reconstruction and parameter learning
[AUTHORS]
Alexandros Kontogiannis, Scott V. Elgersma, Andrew J. Sederman, Matthew P. Juniper
[ABSTRACT]
We formulate and solve a Bayesian inverse Navier-Stokes (N-S) problem that
assimilates velocimetry data in order to jointly reconstruct a 3D flow field
and learn the unknown N-S parameters, including the boundary position. By
hardwiring a generalised N-S problem, and regularising its unknown parameters
using Gaussian prior distributions, we learn the most likely parameters in a
collapsed search space. The most likely flow field reconstruction is then the
N-S solution that corresponds to the learned parameters. We develop the method
in the variational setting and use a stabilised Nitsche weak form of the N-S
problem that permits the control of all N-S parameters. To regularise the
inferred geometry, we use a viscous signed distance field (vSDF) as an
auxiliary variable, which is given as the solution of a viscous Eikonal
boundary value problem. We devise an algorithm that solves this inverse
problem, and numerically implement it using an adjoint-consistent stabilised
cut-cell finite element method. We then use this method to reconstruct magnetic
resonance velocimetry (flow-MRI) data of a 3D steady laminar flow through a
physical model of an aortic arch for two different Reynolds numbers and
signal-to-noise ratio (SNR) levels (low/high). We find that the method can
accurately i) reconstruct the low SNR data by filtering out the noise/artefacts
and recovering flow features that are obscured by noise, and ii) reproduce the
high SNR data without overfitting. Although the framework that we develop
applies to 3D steady laminar flows in complex geometries, it readily extends to
time-dependent laminar and Reynolds-averaged turbulent flows, as well as
non-Newtonian (e.g. viscoelastic) fluids.
[LINK]
http://arxiv.org/abs/2406.18464v2
[DATE]
2024-11-17 00:57:04+08:00
[CATEGORIES]
cs.LG
Adaptive Learning of Design Strategies over Non-Hierarchical Multi-Fidelity Models via Policy Alignment
[AUTHORS]
Akash Agrawal, Christopher McComb
[ABSTRACT]
Multi-fidelity Reinforcement Learning (RL) frameworks significantly enhance
the efficiency of engineering design by leveraging analysis models with varying
levels of accuracy and computational costs. The prevailing methodologies,
characterized by transfer learning, human-inspired strategies, control variate
techniques, and adaptive sampling, predominantly depend on a structured
hierarchy of models. However, this reliance on a model hierarchy overlooks the
heterogeneous error distributions of models across the design space, extending
beyond mere fidelity levels. This work proposes ALPHA (Adaptively Learned
Policy with Heterogeneous Analyses), a novel multi-fidelity RL framework to
efficiently learn a high-fidelity policy by adaptively leveraging an arbitrary
set of non-hierarchical, heterogeneous, low-fidelity models alongside a
high-fidelity model. Specifically, low-fidelity policies and their experience
data are dynamically used for efficient targeted learning, guided by their
alignment with the high-fidelity policy. The effectiveness of ALPHA is
demonstrated in analytical test optimization and octocopter design problems,
utilizing two low-fidelity models alongside a high-fidelity one. The results
highlight ALPHA’s adaptive capability to dynamically utilize models across time
and design space, eliminating the need for scheduling models as required in a
hierarchical framework. Furthermore, the adaptive agents find more direct paths
to high-performance solutions, showing superior convergence behavior compared
to hierarchical agents.
[COMMENTS]
48 pages, 20 figures
[LINK]
http://arxiv.org/abs/2411.10841v1
[DATE]
2024-11-17 00:54:33+08:00
[CATEGORIES]
cs.LG
LoRA Unlearns More and Retains More (Student Abstract)
[AUTHORS]
Atharv Mittal
[ABSTRACT]
Due to increasing privacy regulations and regulatory compliance, Machine
Unlearning (MU) has become essential. The goal of unlearning is to remove
information related to a specific class from a model. Traditional approaches
achieve exact unlearning by retraining the model on the remaining dataset, but
incur high computational costs. This has driven the development of more
efficient unlearning techniques, including model sparsification techniques,
which boost computational efficiency, but degrade the model’s performance on
the remaining classes. To mitigate these issues, we propose a novel method,
PruneLoRA, which introduces a new MU paradigm termed prune first, then adapt,
then unlearn. LoRA (Hu et al. 2022) reduces the need for large-scale parameter
updates by applying low-rank updates to the model. We leverage LoRA to
selectively modify a subset of the pruned model’s parameters, thereby reducing
the computational cost and memory requirements and improving the model’s ability
to retain performance on the remaining classes. Experimental results across
various metrics show that our method outperforms other approximate MU
methods and bridges the gap between exact and approximate unlearning. Our code
is available at https://github.com/vlgiitr/LoRA-Unlearn.
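
The low-rank update that LoRA applies can be illustrated with plain lists. This sketch covers only the generic LoRA forward pass and parameter count, not the PruneLoRA pipeline. The helper names (`matvec`, `lora_forward`, `trainable_params`) and the `scale` argument are assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector))
            for row in matrix]


def lora_forward(x, w, a, b, scale=1.0):
    """Compute (W + scale * B A) x without forming the full update B A.

    Shapes: W is d_out x d_in (frozen), B is d_out x r and A is r x d_in
    (trainable), with rank r much smaller than d_out and d_in.
    """
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))
    return [u + scale * v for u, v in zip(base, low_rank)]


def trainable_params(d_out, d_in, rank):
    """Number of trainable entries in B and A combined."""
    return rank * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight (identity, for readability)
a = [[1.0, 1.0]]              # r x d_in with r = 1
b = [[2.0], [0.0]]            # d_out x r
y = lora_forward([3.0, 4.0], w, a, b)
```

For a 1024 x 1024 layer at rank 8, `trainable_params(1024, 1024, 8)` gives 16,384 trainable entries against 1,048,576 in the full matrix, which is the source of the computational and memory savings the abstract refers to.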
[COMMENTS]
AAAI-25 Student Abstract
[LINK]
http://arxiv.org/abs/2411.11907v1
[DATE]
2024-11-17 00:47:57+08:00
[CATEGORIES]
cs.LG
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
[AUTHORS]
Zihao Li, Yuan Cao, Cheng Gao, Yihan He, Han Liu, Jason M. Klusowski, Jianqing Fan, Mengdi Wang
[ABSTRACT]
Transformers have achieved great success in recent years. Interestingly,
transformers have shown particularly strong in-context learning capability –
even without fine-tuning, they are still able to solve unseen tasks well purely
based on task-specific prompts. In this paper, we study the capability of
one-layer transformers in learning one of the most classical nonparametric
estimators, the one-nearest neighbor prediction rule. Under a theoretical
framework where the prompt contains a sequence of labeled training data and
unlabeled test data, we show that, although the loss function is nonconvex, a
single softmax attention layer trained with gradient descent can
successfully learn to behave like a one-nearest neighbor classifier. Our result
gives a concrete example of how transformers can be trained to implement
nonparametric machine learning algorithms, and sheds light on the role of
softmax attention in transformer models.
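
The link between softmax attention and the one-nearest neighbor rule can be seen in a few lines. This is a sketch of the general idea, not the paper's construction or training setup. Using negative squared distance as the attention score and a hand-set sharpness `beta` are assumptions made here.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax_attention_predict(query, keys, labels, beta):
    """Attention-weighted average of labels, scores = -beta * distance^2."""
    scores = [-beta * squared_distance(query, key) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, labels)) / total


def one_nn_predict(query, keys, labels):
    """Return the label of the training point closest to the query."""
    dists = [squared_distance(query, key) for key in keys]
    return labels[dists.index(min(dists))]


keys = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
labels = [1.0, -1.0, 1.0]
query = (0.8, 0.1)

soft = softmax_attention_predict(query, keys, labels, beta=0.0)
sharp = softmax_attention_predict(query, keys, labels, beta=50.0)
nearest = one_nn_predict(query, keys, labels)
```

With `beta = 0` the layer averages all labels uniformly. As `beta` grows, nearly all attention mass falls on the closest key and the output approaches the one-nearest neighbor label.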
[LINK]
http://arxiv.org/abs/2411.10830v1
[DATE]
2024-11-17 00:12:42+08:00
[CATEGORIES]
cs.LG
Information Anxiety in Large Language Models
[AUTHORS]
Prasoon Bajpai, Sarah Masud, Tanmoy Chakraborty
[ABSTRACT]
Large Language Models (LLMs) have demonstrated strong performance as
knowledge repositories, enabling models to understand user queries and generate
accurate and context-aware responses. Extensive evaluation setups have
corroborated the positive correlation between the retrieval capability of LLMs
and the frequency of entities in their pretraining corpus. We take the
investigation further by conducting a comprehensive analysis of the internal
reasoning and retrieval mechanisms of LLMs. Our work focuses on three critical
dimensions - the impact of entity popularity, the models’ sensitivity to
lexical variations in query formulation, and the progression of hidden state
representations across LLM layers. Our preliminary findings reveal that popular
questions facilitate early convergence of internal states toward the correct
answer. However, as the popularity of a query increases, retrieved attributes
across lexical variations become increasingly dissimilar and less accurate.
Interestingly, we find that LLMs struggle to disentangle facts, grounded in
distinct relations, from their parametric memory when dealing with highly
popular subjects. Through a case study, we explore these latent strains within
LLMs when processing highly popular queries, a phenomenon we term information
anxiety. The emergence of information anxiety in LLMs underscores the
risk of adversarial injection in the form of linguistic variations and calls for a more
holistic evaluation of frequently occurring entities.
[LINK]
http://arxiv.org/abs/2411.10813v1
[DATE]
2024-11-16 22:28:33+08:00
[CATEGORIES]
cs.CL
Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation?
[AUTHORS]
Tiantian Feng, Anfeng Xu, Rimita Lahiri, Helen Tager-Flusberg, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan
[COMMENTS]
GenAI for Health Workshop, NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10761v1
[DATE]
2024-11-16 17:36:56+08:00
[CATEGORIES]
cs.CL
Chain-of-Programming (CoP) : Empowering Large Language Models for Geospatial Code Generation
[AUTHORS]
Shuyang Hou, Haoyue Jiao, Zhangxiao Shen, Jianyuan Liang, Anqi Zhao, Xiaopu Zhang, Jianxun Wang, Huayi Wu
[ABSTRACT]
With the rapid growth of interdisciplinary demands for geospatial modeling
and the rise of large language models (LLMs), geospatial code generation
technology has seen significant advancements. However, existing LLMs often face
challenges in the geospatial code generation process due to incomplete or
unclear user requirements and insufficient knowledge of specific platform
syntax rules, leading to the generation of non-executable code, a phenomenon
known as “code hallucination.” To address this issue, this paper proposes a
Chain of Programming (CoP) framework, which decomposes the code generation
process into five steps: requirement analysis, algorithm design, code
implementation, code debugging, and code annotation. The framework incorporates
a shared information pool, knowledge base retrieval, and user feedback
mechanisms, forming an end-to-end code generation flow from requirements to
code without the need for model fine-tuning. Based on a geospatial problem
classification framework and evaluation benchmarks, the CoP strategy
significantly improves the logical clarity, syntactical correctness, and
executability of the generated code, with improvements ranging from 3.0% to
48.8%. Comparative and ablation experiments further validate the superiority of
the CoP strategy over other optimization approaches and confirm the rationality
and necessity of its key components. Through case studies on building data
visualization and fire data analysis, this paper demonstrates the application
and effectiveness of CoP in various geospatial scenarios. The CoP framework
offers a systematic, step-by-step approach to LLM-based geospatial code
generation tasks, significantly enhancing code generation performance in
geospatial tasks and providing valuable insights for code generation in other
vertical domains.
[LINK]
http://arxiv.org/abs/2411.10753v1
[DATE]
2024-11-16 17:20:35+08:00
[CATEGORIES]
cs.CL
Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
[AUTHORS]
Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
[ABSTRACT]
Transformers excel at in-context learning (ICL) – learning from
demonstrations without parameter updates – but how they do so remains a
mystery. Recent work suggests that Transformers may internally run Gradient
Descent (GD), a first-order optimization method, to perform ICL. In this paper,
we instead demonstrate that Transformers learn to approximate second-order
optimization methods for ICL. For in-context linear regression, Transformers
share a convergence rate similar to that of Iterative Newton’s Method, both
exponentially faster than GD. Empirically, predictions from successive
Transformer layers closely match different iterations of Newton’s Method
linearly, with each middle layer roughly computing 3 iterations; thus,
Transformers and Newton’s method converge at roughly the same rate. In
contrast, Gradient Descent converges exponentially more slowly. We also show
that Transformers can learn in-context on ill-conditioned data, a setting where
Gradient Descent struggles but Iterative Newton succeeds. Finally, to
corroborate our empirical findings, we prove that Transformers can implement
$k$ iterations of Newton’s method with $k + \mathcal{O}(1)$ layers.
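
The gap between first-order and second-order convergence can be checked numerically on a tiny least-squares problem. The sketch below uses a standard Newton iteration for the matrix inverse, M <- 2M - M S M (often called Newton-Schulz). Whether this matches the paper's exact formulation of Iterative Newton is an assumption, as are the example matrix, step size, and helper names.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def newton_inverse(s, steps):
    """Approximate S^{-1} with the iteration M <- 2M - M S M."""
    n = len(s)
    norm_1 = max(sum(abs(s[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(x) for x in row) for row in s)
    alpha = 1.0 / (norm_1 * norm_inf)  # standard convergent initial scale
    m = [[alpha * s[j][i] for j in range(n)] for i in range(n)]  # alpha * S^T
    for _ in range(steps):
        msm = matmul(matmul(m, s), m)
        m = [[2 * m[i][j] - msm[i][j] for j in range(n)] for i in range(n)]
    return m


def gradient_descent(s, b, steps, lr):
    """Minimise 0.5 * w^T S w - b^T w starting from w = 0."""
    w = [0.0] * len(b)
    for _ in range(steps):
        grad = [g - b_i for g, b_i in zip(matvec(s, w), b)]
        w = [w_i - lr * g for w_i, g in zip(w, grad)]
    return w


S = [[4.0, 1.0], [1.0, 3.0]]   # plays the role of X^T X
b = [1.0, 2.0]                 # plays the role of X^T y
exact = [1.0 / 11.0, 7.0 / 11.0]

newton_w = matvec(newton_inverse(S, steps=8), b)
gd_w = gradient_descent(S, b, steps=8, lr=0.2)
newton_err = max(abs(u - v) for u, v in zip(newton_w, exact))
gd_err = max(abs(u - v) for u, v in zip(gd_w, exact))
```

The Newton residual is squared at every step, whereas the gradient descent error shrinks only by a constant factor per step, so after the same eight steps `newton_err` is far smaller than `gd_err`.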
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2310.17086v3
[DATE]
2024-11-16 16:20:24+08:00
[CATEGORIES]
cs.LG
cs.CL
Comparison of Multilingual and Bilingual Models for Satirical News Detection of Arabic and English
[AUTHORS]
Omar W. Abdalla, Aditya Joshi, Rahat Masood, Salil S. Kanhere
[ABSTRACT]
Satirical news is real news combined with a humorous comment or exaggerated
content, and it often mimics the format and style of real news. However,
satirical news is often misunderstood as misinformation, especially by
individuals from different cultural and social backgrounds. This research
addresses the challenge of distinguishing satire from truthful news by
leveraging multilingual satire detection methods in English and Arabic. We
explore both zero-shot and chain-of-thought (CoT) prompting using two language
models, Jais-chat(13B) and LLaMA-2-chat(7B). Our results show that CoT
prompting offers a significant advantage for the Jais-chat model over the
LLaMA-2-chat model. Specifically, Jais-chat achieved the best performance, with
an F1-score of 80% in English when using CoT prompting. These results
highlight the importance of structured reasoning in CoT, which enhances
contextual understanding and is vital for complex tasks like satire detection.
[COMMENTS]
ALTA 2024 (Selected for publication)
[LINK]
http://arxiv.org/abs/2411.10730v1
[DATE]
2024-11-16 15:49:15+08:00
[CATEGORIES]
cs.CL
A Regularized LSTM Method for Detecting Fake News Articles
[AUTHORS]
Tanjina Sultana Camelia, Faizur Rahman Fahim, Md. Musfique Anwar
[ABSTRACT]
Nowadays, the rapid diffusion of fake news poses a significant problem, as it
can spread misinformation and confusion. This paper aims to develop an advanced
machine learning solution for detecting fake news articles. Leveraging a
comprehensive dataset of news articles, including 23,502 fake news articles and
21,417 accurate news articles, we implemented and evaluated three
machine-learning models. Our dataset, curated from diverse sources, provides
rich textual content categorized into title, text, subject, and date features.
These features are essential for training robust classification models to
distinguish between fake and authentic news articles. The initial model
employed a Long Short-Term Memory (LSTM) network, achieving an accuracy of 94%.
The second model improved upon this by incorporating additional regularization
techniques and fine-tuning hyperparameters, resulting in a 97% accuracy. The
final model combined the strengths of previous architectures with advanced
optimization strategies, achieving a peak accuracy of 98%. These results
demonstrate the effectiveness of our approach in identifying fake news with
high precision. Implementing these models showcases significant advancements in
natural language processing and machine learning techniques, contributing
valuable tools for combating misinformation. Our work highlights the potential
for deploying such models in real-world applications, providing a reliable
method for automated fake news detection and enhancing the credibility of news
dissemination.
[COMMENTS]
6 pages, 7 figures, 2024 IEEE International Conference on Signal
Processing, Information, Communication and Systems (SPICSCON)
[LINK]
http://arxiv.org/abs/2411.10713v1
[DATE]
2024-11-16 13:54:36+08:00
[CATEGORIES]
cs.LG
cs.CL
Large Language Models are Null-Shot Learners
[AUTHORS]
Pittawat Taveekitworachai, Febri Abdullah, Ruck Thawonmas
[COMMENTS]
28 pages; v2: added Gemini Pro results, error analysis, and a
discussion on confabulation; v3: see its extended version, an EMNLP 2024
paper, at https://aclanthology.org/2024.emnlp-main.740/
[LINK]
http://arxiv.org/abs/2401.08273v3
[DATE]
2024-11-16 12:23:20+08:00
[CATEGORIES]
cs.CL
cs.LG
Structured Dialogue System for Mental Health: An LLM Chatbot Leveraging the PM+ Guidelines
[AUTHORS]
Yixiang Chen, Xinyu Zhang, Jinran Wang, Xurong Xie, Nan Yan, Hui Chen, Lan Wang
[ABSTRACT]
The Structured Dialogue System, referred to as SuDoSys, is an innovative
Large Language Model (LLM)-based chatbot designed to provide psychological
counseling. SuDoSys leverages the World Health Organization (WHO)’s Problem
Management Plus (PM+) guidelines to deliver stage-aware multi-turn dialogues.
Existing methods for employing an LLM in multi-turn psychological counseling
typically involve direct fine-tuning using generated dialogues, often
neglecting the dynamic stage shifts of counseling sessions. Unlike previous
approaches, SuDoSys considers the different stages of counseling and stores
essential information throughout the counseling process, ensuring coherent and
directed conversations. The system employs an LLM, a stage-aware instruction
generator, a response unpacker, a topic database, and a stage controller to
maintain dialogue flow. In addition, we propose a novel technique that
simulates counseling clients to interact with the evaluated system and evaluate
its performance automatically. When assessed using both objective and
subjective evaluations, SuDoSys demonstrates its effectiveness in generating
logically coherent responses. The system’s code and program scripts for
evaluation are open-sourced.
[COMMENTS]
Accepted to the 16th International Conference on Social Robotics (ICSR
2024)
[LINK]
http://arxiv.org/abs/2411.10681v1
[DATE]
2024-11-16 11:12:17+08:00
[CATEGORIES]
cs.CL
A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery
[AUTHORS]
Grace Sng, Yanming Zhang, Klaus Mueller
[ABSTRACT]
The increasing use of large language models (LLMs) in causal discovery as a
substitute for human domain experts highlights the need for optimal model
selection. This paper presents the first hallucination survey of popular LLMs
for causal discovery. We show that hallucinations exist when using LLMs in
causal discovery so the choice of LLM is important. We propose using Retrieval
Augmented Generation (RAG) to reduce hallucinations when quality data is
available. Additionally, we introduce a novel method employing multiple LLMs
with an arbiter in a debate to audit edges in causal graphs, achieving a
comparable reduction in hallucinations to RAG.
[LINK]
http://arxiv.org/abs/2411.12759v1
[DATE]
2024-11-16 11:06:39+08:00
[CATEGORIES]
cs.CL
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
[AUTHORS]
Zichun Yu, Spandan Das, Chenyan Xiong
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.06046v2
[DATE]
2024-11-16 10:59:22+08:00
[CATEGORIES]
cs.CL
cs.LG
IntentGPT: Few-shot Intent Discovery with Large Language Models
[AUTHORS]
Juan A. Rodriguez, Nicholas Botzer, David Vazquez, Christopher Pal, Marco Pedersoli, Issam Laradji
[ABSTRACT]
In today’s digitally driven world, dialogue systems play a pivotal role in
enhancing user interactions, from customer service to virtual assistants. In
these dialogues, it is important to identify users’ goals automatically to
resolve their needs promptly. This has necessitated the integration of models
that perform Intent Detection. However, users’ intents are diverse and dynamic,
making it challenging to maintain a fixed set of predefined intents. As a
result, a more practical approach is to develop a model capable of identifying
new intents as they emerge. We address the challenge of Intent Discovery, an
area that has drawn significant attention in recent research efforts. Existing
methods need to train on a substantial amount of data for correctly identifying
new intents, demanding significant human effort. To overcome this, we introduce
IntentGPT, a novel training-free method that effectively prompts Large Language
Models (LLMs) such as GPT-4 to discover new intents with minimal labeled data.
IntentGPT comprises an In-Context Prompt Generator, which generates
informative prompts for In-Context Learning, an Intent Predictor for
classifying and discovering user intents from utterances, and a
Semantic Few-Shot Sampler that selects relevant few-shot examples and
a set of known intents to be injected into the prompt. Our experiments show
that IntentGPT outperforms previous methods that require extensive
domain-specific data and fine-tuning, in popular benchmarks, including CLINC
and BANKING, among others.
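
The few-shot selection step described above can be sketched as a similarity ranking. This is a guess at the general shape of such a sampler, not IntentGPT's implementation. The function names, the cosine-similarity ranking, and the toy two-dimensional embeddings are all hypothetical; a real system would obtain embeddings from a sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sample_few_shot(query_vec, pool, k):
    """Return the k labeled examples most similar to the query.

    `pool` holds (utterance, intent, embedding) triples.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[2]),
                    reverse=True)
    return [(utterance, intent) for utterance, intent, _ in ranked[:k]]


# Toy embeddings, invented for this example only.
pool = [
    ("check my balance", "balance", (1.0, 0.0)),
    ("book a flight", "book_flight", (0.0, 1.0)),
    ("how much money do I have", "balance", (0.9, 0.1)),
]
shots = sample_few_shot((1.0, 0.05), pool, k=2)
```

The selected pairs would then be formatted into the prompt alongside the set of known intents.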
[COMMENTS]
ICLR 2024 Workshop on LLM Agents
[LINK]
http://arxiv.org/abs/2411.10670v1
[DATE]
2024-11-16 10:16:59+08:00
[CATEGORIES]
cs.CL
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
[AUTHORS]
Ruosen Li, Ruochen Li, Barry Wang, Xinya Du
[ABSTRACT]
To evaluate Large Language Models (LLMs) for question answering (QA),
traditional methods typically focus on assessing single-turn responses to given
questions. However, this approach doesn’t capture the dynamic nature of
human-AI interactions, where humans actively seek information through
conversation. Recent works in human-computer interaction (HCI) have employed
human evaluators to conduct interactions and evaluations, but they are often
prohibitively expensive and time-consuming to scale. We introduce an automatic
evaluation framework IQA-EVAL to achieve Interactive Question Answering
Evaluations. More specifically, we introduce an LLM-based Evaluation Agent (LEA)
that can: (1) simulate human behaviors to generate interactions with IQA
models; (2) automatically evaluate the generated interactions. Moreover, we
propose assigning personas to LEAs to better simulate groups of real human
evaluators. We show that: (1) our evaluation framework with GPT-4 (or Claude)
as the backbone model achieves a high correlation with human evaluations on the
IQA task; (2) assigning personas to LEA to better represent the crowd further
significantly improves correlations. Finally, we use our automatic metric to
evaluate five recent representative LLMs with over 1000 questions from complex
and ambiguous question answering tasks, which comes with a substantial cost of
$5k if evaluated by humans.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2408.13545v2
[DATE]
2024-11-16 10:08:31+08:00
[CATEGORIES]
cs.CL
SAM Decoding: Speculative Decoding via Suffix Automaton
[AUTHORS]
Yuxuan Hu, Ke Wang, Jing Zhang, Cuiping Li, Hong Chen
[ABSTRACT]
Large Language Models (LLMs) have revolutionized natural language processing
by unifying tasks into text generation, yet their large parameter sizes and
autoregressive nature limit inference speed. SAM-Decoding addresses this by
introducing a novel retrieval-based speculative decoding method that uses a
suffix automaton for efficient and accurate draft generation. Unlike n-gram
matching used by the existing method, SAM-Decoding finds the longest suffix
match in the generating text and the text corpus, achieving an average time complexity
of $O(1)$ per generation step. SAM-Decoding constructs static and dynamic
suffix automatons for the text corpus and input prompts, respectively, enabling
fast and precise draft generation. Meanwhile, it is designed as an approach
that can be combined with existing methods, allowing SAM-Decoding to adaptively
select a draft generation strategy based on the matching length, thus
increasing the inference speed of the LLM. When combined with Token Recycling,
evaluations show SAM-Decoding outperforms existing model-free methods,
achieving a speedup of $2.27\times$ over autoregressive decoding on Spec-Bench.
When combined with EAGLE2, it reaches a speedup of $2.49\times$, surpassing all
current approaches. Our code is available at
https://github.com/hyx1999/SAM-Decoding.
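The longest-suffix-match idea described in this abstract can be illustrated with a textbook suffix automaton. The following is a minimal sketch only, not the authors' implementation: the class name `SuffixAutomaton` and the helpers `step` and `longest_suffix_match` are hypothetical, and the sketch covers only the matching step, not draft generation or the static/dynamic automaton split.

```python
class SuffixAutomaton:
    """Textbook suffix automaton over a fixed corpus (a string or token list)."""

    def __init__(self, corpus):
        self.next = [{}]    # outgoing transitions of each state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest substring in each state
        self.last = 0
        for symbol in corpus:
            self._extend(symbol)

    def _extend(self, symbol):
        cur = len(self.next)
        self.next.append({})
        self.link.append(-1)
        self.length.append(self.length[self.last] + 1)
        p = self.last
        while p != -1 and symbol not in self.next[p]:
            self.next[p][symbol] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][symbol]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split state q by cloning it with a shorter maximal length.
                clone = len(self.next)
                self.next.append(dict(self.next[q]))
                self.link.append(self.link[q])
                self.length.append(self.length[p] + 1)
                while p != -1 and self.next[p].get(symbol) == q:
                    self.next[p][symbol] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def step(self, state, matched, symbol):
        """Consume one symbol; return the new state and suffix-match length."""
        while state != 0 and symbol not in self.next[state]:
            state = self.link[state]
            matched = self.length[state]
        if symbol in self.next[state]:
            return self.next[state][symbol], matched + 1
        return 0, 0


def longest_suffix_match(automaton, generated):
    """Length of the longest suffix of `generated` that occurs in the corpus."""
    state, matched = 0, 0
    for symbol in generated:
        state, matched = automaton.step(state, matched, symbol)
    return matched


if __name__ == "__main__":
    sam = SuffixAutomaton("abcabd")
    print(longest_suffix_match(sam, "xxabd"))
```

Because each call to `step` only follows suffix links or a single transition, the match can be updated incrementally as each new token is generated, which is the property behind the amortized constant cost per step mentioned in the abstract.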
[COMMENTS]
13 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.10666v1
[DATE]
2024-11-16 10:02:49+08:00
[CATEGORIES]
cs.CL
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach
[AUTHORS]
Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, Adriano Soares Koshiyama
[ABSTRACT]
Stereotype detection is a challenging and subjective task, as certain
statements, such as “Black people like to play basketball,” may not appear
overtly toxic but still reinforce racial stereotypes. With the increasing
prevalence of large language models (LLMs) in human-facing artificial
intelligence (AI) applications, detecting these types of biases is essential.
However, LLMs risk perpetuating and amplifying stereotypical outputs derived
from their training data. A reliable stereotype detector is crucial for
benchmarking bias, monitoring model input and output, filtering training data,
and ensuring fairer model behavior in downstream applications. This paper
introduces the Multi-Grain Stereotype (MGS) dataset, consisting of 51,867
instances across gender, race, profession, religion, and other stereotypes,
curated from multiple existing datasets. We evaluate various machine learning
approaches to establish baselines and fine-tune language models of different
architectures and sizes, presenting a suite of stereotype multiclass
classifiers trained on the MGS dataset. Given the subjectivity of stereotypes,
explainability is essential to align model learning with human understanding of
stereotypes. We employ explainable AI (XAI) tools, including SHAP, LIME, and
BertViz, to assess whether the model’s learned patterns align with human
intuitions about stereotypes. Additionally, we develop stereotype elicitation
prompts and benchmark the presence of stereotypes in text generation tasks
using popular LLMs, employing the best-performing stereotype classifiers.
[COMMENTS]
Under review as a conference paper at ARR October 2024
[LINK]
http://arxiv.org/abs/2404.01768v2
[DATE]
2024-11-16 08:54:09+08:00
[CATEGORIES]
cs.CL
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration
[AUTHORS]
Xin Guan, Nathaniel Demchak, Saloni Gupta, Ze Wang, Ediz Ertekin Jr., Adriano Koshiyama, Emre Kazim, Zekun Wu
[COMMENTS]
Submitted to COLING 2025 Main Conference
[LINK]
http://arxiv.org/abs/2409.11149v3
[DATE]
2024-11-16 08:28:03+08:00
[CATEGORIES]
cs.CL
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
[AUTHORS]
Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li
[ABSTRACT]
The emergence and growing popularity of multimodal large language models
(MLLMs) have significant potential to enhance various aspects of daily life,
from improving communication to facilitating learning and problem-solving.
Mobile phones, as essential daily companions, represent the most effective and
accessible deployment platform for MLLMs, enabling seamless integration into
everyday tasks. However, deploying MLLMs on mobile phones presents challenges
due to limitations in memory size and computational capability, making it
difficult to achieve smooth and real-time processing without extensive
optimization. In this paper, we present BlueLM-V-3B, an algorithm and system
co-design approach specifically tailored for the efficient deployment of MLLMs
on mobile platforms. To be specific, we redesign the dynamic resolution scheme
adopted by mainstream MLLMs and implement system optimization for
hardware-aware deployment to optimize model inference on mobile phones.
BlueLM-V-3B boasts the following key highlights: (1) Small Size: BlueLM-V-3B
features a language model with 2.7B parameters and a vision encoder with 400M
parameters. (2) Fast Speed: BlueLM-V-3B achieves a generation speed of 24.4
token/s on the MediaTek Dimensity 9300 processor with 4-bit LLM weight
quantization. (3) Strong Performance: BlueLM-V-3B has attained the highest
average score of 66.1 on the OpenCompass benchmark among models with $\leq$ 4B
parameters and surpassed a series of models with much larger parameter sizes
(e.g., MiniCPM-V-2.6, InternVL2-8B).
[COMMENTS]
21 pages
[LINK]
http://arxiv.org/abs/2411.10640v1
[DATE]
2024-11-16 08:14:51+08:00
[CATEGORIES]
cs.CL
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
[AUTHORS]
Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi
[ABSTRACT]
Data availability across domains often follows a long-tail distribution: a
few domains have abundant data, while most face data scarcity. This
imbalance poses challenges in training language models uniformly across all
domains. In our study, we focus on multilingual settings, where data sizes vary
significantly between high- and low-resource languages. Common strategies to
address this include upsampling low-resource languages (Temperature Sampling)
or upweighting their loss (Scalarization). Although often considered
equivalent, this assumption has not been proven, which motivates our study.
Through both theoretical and empirical analysis, we identify the conditions
under which these approaches are equivalent and when they diverge.
Specifically, we demonstrate that these two methods are equivalent under full
gradient descent, but this equivalence breaks down with stochastic gradient
descent. Empirically, we observe that Temperature Sampling converges more
quickly but is prone to overfitting. We argue that this faster convergence is
likely due to the lower variance in gradient estimations, as shown
theoretically. Based on these insights, we propose Cooldown, a strategy that
reduces sampling temperature during training, accelerating convergence without
overfitting to low-resource languages. Our method is competitive with existing
data re-weighting and offers computational efficiency.
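As a rough illustration of Temperature Sampling, the sketch below uses the common formulation in which a language with n_i examples is sampled with probability proportional to n_i^(1/tau). The abstract does not state the exact formula or the Cooldown schedule, so the function names, the linear schedule, and the default temperatures are all assumptions made for illustration.

```python
def sampling_probs(sizes, tau):
    """Temperature sampling: p_i is proportional to n_i ** (1 / tau).

    tau = 1 reproduces the data proportions; larger tau flattens the
    distribution and so upsamples low-resource languages.
    """
    scaled = [n ** (1.0 / tau) for n in sizes]
    total = sum(scaled)
    return [s / total for s in scaled]


def cooldown_tau(step, total_steps, tau_start=5.0, tau_end=1.0):
    """Hypothetical linear schedule that lowers the temperature over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return tau_start + frac * (tau_end - tau_start)


if __name__ == "__main__":
    sizes = [900, 100]  # made-up example counts for two languages
    for step in (0, 50, 100):
        tau = cooldown_tau(step, 100)
        print(step, tau, sampling_probs(sizes, tau))
```

Early in training the high temperature gives the smaller language a larger share of each batch; as the temperature falls towards 1, sampling returns to the raw data proportions, which is the intuition behind reducing overfitting to low-resource languages.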
[COMMENTS]
19 pages
[LINK]
http://arxiv.org/abs/2410.04579v4
[DATE]
2024-11-16 05:33:18+08:00
[CATEGORIES]
cs.CL
cs.LG
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
[AUTHORS]
Pepijn de Reus, Ana Oprescu, Jelle Zuidema
[ABSTRACT]
This study examines quantisation and pruning strategies to reduce energy
consumption in code Large Language Models (LLMs) inference. Using StarCoder2,
we observe increased energy demands with quantization due to lower throughput
and some accuracy losses. Conversely, pruning reduces energy usage but impairs
performance. The results highlight challenges and trade-offs in LLM model
compression. We suggest future work on hardware-optimized quantization to
enhance efficiency with minimal loss in accuracy.
[LINK]
http://arxiv.org/abs/2411.12758v1
[DATE]
2024-11-16 05:28:19+08:00
[CATEGORIES]
cs.CL
On the Shortcut Learning in Multilingual Neural Machine Translation
[AUTHORS]
Wenxuan Wang, Wenxiang Jiao, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu
[ABSTRACT]
In this study, we revisit the commonly-cited off-target issue in multilingual
neural machine translation (MNMT). By carefully designing experiments on
different MNMT scenarios and models, we attribute the off-target issue to the
overfitting of the shortcuts of (non-centric, centric) language mappings.
Specifically, the learned shortcuts bias MNMT to mistakenly translate
non-centric languages into the centric language instead of the expected
non-centric language for zero-shot translation. Analyses on learning dynamics
show that the shortcut learning generally occurs in the later stage of model
training, and multilingual pretraining accelerates and aggravates the shortcut
learning. Based on these observations, we propose a simple and effective
training strategy to eliminate the shortcuts in MNMT models by leveraging the
forgetting nature of model training. The only difference from the standard
training is that we remove the training instances that may induce the shortcut
learning in the later stage of model training. Without introducing any
additional data and computational costs, our approach can consistently and
significantly improve the zero-shot translation performance by alleviating the
shortcut learning for different MNMT models and benchmarks.
[COMMENTS]
Accepted by Neurocomputing 2024
[LINK]
http://arxiv.org/abs/2411.10581v1
[DATE]
2024-11-16 05:09:36+08:00
[CATEGORIES]
cs.CL
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
[AUTHORS]
Md Zarif Hossain, Ahmed Imteaj
[ABSTRACT]
Vision-language models (VLMs) have made significant strides in recent
times, especially in multimodal tasks, yet they remain susceptible to adversarial
attacks on their vision components. To address this, we propose Sim-CLIP, an
unsupervised adversarial fine-tuning method that enhances the robustness of the
widely-used CLIP vision encoder against such attacks while maintaining semantic
richness and specificity. By employing a Siamese architecture with cosine
similarity loss, Sim-CLIP learns semantically meaningful and attack-resilient
visual representations without requiring large batch sizes or momentum
encoders. Our results demonstrate that VLMs enhanced with Sim-CLIP’s fine-tuned
CLIP encoder exhibit significantly enhanced robustness against adversarial
attacks, while preserving semantic meaning of the perturbed images. Notably,
Sim-CLIP does not require additional training or fine-tuning of the VLM itself;
replacing the original vision encoder with our fine-tuned Sim-CLIP suffices to
provide robustness. This work underscores the significance of reinforcing
foundational models like CLIP to safeguard the reliability of downstream VLM
applications, paving the way for more secure and effective multimodal systems.
[LINK]
http://arxiv.org/abs/2407.14971v2
[DATE]
2024-11-16 05:09:28+08:00
[CATEGORIES]
cs.CL
cs.LG
Hysteresis Activation Function for Efficient Inference
[AUTHORS]
Moshe Kimhi, Idan Kashani, Avi Mendelson, Chaim Baskin
[COMMENTS]
Accepted to 4th NeurIPS Efficient Natural Language and Speech
Processing Workshop (ENLSP-IV 2024)
[LINK]
http://arxiv.org/abs/2411.10573v1
[DATE]
2024-11-16 04:46:58+08:00
[CATEGORIES]
cs.LG
cs.CL
Efficient Alignment of Large Language Models via Data Sampling
[AUTHORS]
Amrit Khera, Rajat Ghosh, Debojyoti Dutta
[ABSTRACT]
LLM alignment ensures that large language models behave safely and
effectively by aligning their outputs with human values, goals, and intentions.
Aligning LLMs requires huge amounts of data, computation, and time. Moreover,
curating data with human feedback is expensive and takes time. Recent research
depicts the benefit of data engineering in the fine-tuning and pre-training
paradigms to bring down such costs. However, alignment differs from the
afore-mentioned paradigms and it is unclear if data efficient alignment is
feasible. In this work, we first aim to understand how the performance of LLM
alignment scales with data. We find that LLM alignment performance follows
an exponential plateau pattern, which tapers off after a rapid initial increase.
Based on this, we identify data subsampling as a viable method to reduce
resources required for alignment. Further, we propose an information
theory-based methodology for efficient alignment by identifying a small,
high-quality subset, thereby reducing the computation and time required by alignment.
We evaluate the proposed methodology over multiple datasets and compare the
results. We find that the model aligned using our proposed methodology
outperforms other sampling methods and performs comparably to the model aligned
with the full dataset while using less than 10% of the data, leading to savings
of more than 90% in costs and resources and to faster LLM alignment.
[LINK]
http://arxiv.org/abs/2411.10545v1
[DATE]
2024-11-16 03:36:15+08:00
[CATEGORIES]
cs.LG
cs.CL
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
[AUTHORS]
Priyansh Bhatnagar, Linfeng Wen, Mingu Kang
[ABSTRACT]
Extensive efforts have been made to boost the performance in the domain of
language models by introducing various attention-based transformers. However,
the inclusion of linear layers with large dimensions contributes to significant
computational and memory overheads. The escalating computational demands of
these models necessitate the development of various compression techniques to
ensure their deployment on devices, particularly in resource-constrained
environments. In this paper, we propose a novel compression methodology that
dynamically determines the rank of each layer using a soft thresholding
mechanism, which clips the singular values with a small magnitude in a
differentiable form. This approach automates the decision-making process to
identify the optimal degree of compression for each layer. We have successfully
applied the proposed technique to attention-based architectures, including BERT
for discriminative tasks and GPT2 and TinyLlama for generative tasks.
Additionally, we have validated our method on Mamba, a recently proposed
state-space model. Our experiments demonstrate that the proposed technique
achieves a speed-up of 1.33X to 1.72X in the encoder/decoder with a 50%
reduction in total parameters.
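To make the soft-thresholding idea concrete, the sketch below applies the standard soft-threshold operator, max(sigma - tau, 0), to a list of singular values and counts how many survive. This is an assumption about the general form only: the abstract does not give the exact operator or how the threshold is learned, and the function names and example values are hypothetical.

```python
def soft_threshold(singular_values, tau):
    """Shrink each singular value by tau and clip at zero."""
    return [max(s - tau, 0.0) for s in singular_values]


def effective_rank(singular_values, tau):
    """Number of singular values that remain non-zero after thresholding."""
    return sum(1 for s in soft_threshold(singular_values, tau) if s > 0.0)


if __name__ == "__main__":
    sigmas = [5.0, 2.0, 0.5, 0.1]  # made-up singular values of one layer
    print(soft_threshold(sigmas, 1.0))
    print(effective_rank(sigmas, 1.0))
```

Unlike a hard cut-off at a fixed rank, this operator is piecewise linear in tau, so a per-layer threshold can in principle be adjusted by gradient descent, which is what lets the rank of each layer be determined during training.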
[LINK]
http://arxiv.org/abs/2411.10543v1
[DATE]
2024-11-16 03:29:51+08:00
[CATEGORIES]
cs.LG
cs.CL
Does Prompt Formatting Have Any Impact on LLM Performance?
[AUTHORS]
Jia He, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X Wang, Sadid Hasan
[ABSTRACT]
In the realm of Large Language Models (LLMs), prompt optimization is crucial
for model performance. Although previous research has explored aspects like
rephrasing prompt contexts, using various prompting techniques (like in-context
learning and chain-of-thought), and ordering few-shot examples, our
understanding of LLM sensitivity to prompt templates remains limited.
Therefore, this paper examines the impact of different prompt templates on LLM
performance. We formatted the same contexts into various human-readable
templates, including plain text, Markdown, JSON, and YAML, and evaluated their
impact across tasks like natural language reasoning, code generation, and
translation using OpenAI’s GPT models. Experiments show that GPT-3.5-turbo’s
performance varies by up to 40\% in a code translation task depending on the
prompt template, while larger models like GPT-4 are more robust to these
variations. Our analysis highlights the need to reconsider the use of fixed
prompt templates, as different formats can significantly affect model
performance.
[COMMENTS]
Submitted to NAACL 2025
[LINK]
http://arxiv.org/abs/2411.10541v1
[DATE]
2024-11-16 03:26:38+08:00
[CATEGORIES]
cs.CL
cs.LG
“On the goals of linguistic theory”: Revisiting Chomskyan theories in the era of AI
[AUTHORS]
Eva Portelance, Masoud Jasbi
[ABSTRACT]
Theoretical linguistics seeks to explain what human language is, and why.
Linguists and cognitive scientists have proposed different theoretical models
of what language is, as well as cognitive factors that shape it, and allow
humans to ‘produce’, ‘understand’, and ‘acquire’ natural languages. However,
humans may no longer be the only ones learning to ‘generate’, ‘parse’, and
‘learn’ natural language: artificial intelligence (AI) models such as large
language models are proving to have impressive linguistic capabilities. Many
are thus questioning what role, if any, such models should play in helping
theoretical linguistics reach its ultimate research goals. In this paper, we
propose to answer this question, by reiterating the tenets of generative
linguistics, a leading school of thought in the field, and by considering how
AI models as theories of language relate to each of these important concepts.
Specifically, we consider three foundational principles, finding roots in the
early works of Noam Chomsky: (1) levels of theoretical adequacy; (2) procedures
for linguistic theory development; (3) language learnability and Universal
Grammar. In our discussions of each principle, we give special attention to two
types of AI models: neural language models and neural grammar induction models.
We will argue that such models, in particular neural grammar induction models,
do have a role to play, but that this role is largely modulated by the stance
one takes regarding each of these three guiding principles.
[LINK]
http://arxiv.org/abs/2411.10533v1
[DATE]
2024-11-16 03:09:22+08:00
[CATEGORIES]
cs.CL
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
[AUTHORS]
Yuhan Fu, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Xirong Li
[ABSTRACT]
Multimodal Large Language Models (MLLMs) are known to hallucinate, which
limits their practical applications. Recent works have attempted to apply
Direct Preference Optimization (DPO) to enhance the performance of MLLMs, but
have shown inconsistent improvements in mitigating hallucinations. To address
this issue more effectively, we introduce Hallucination-targeted Direct
Preference Optimization (HDPO) to reduce hallucinations in MLLMs. Unlike
previous approaches, our method tackles hallucinations from their diverse forms
and causes. Specifically, we develop three types of preference pair data
targeting the following causes of MLLM hallucinations: (1) insufficient visual
capabilities, (2) long context generation, and (3) multimodal conflicts.
Experimental results demonstrate that our method achieves superior performance
across multiple hallucination evaluation datasets, surpassing most
state-of-the-art (SOTA) methods and highlighting the potential of our approach.
Ablation studies and in-depth analyses further confirm the effectiveness of our
method and suggest the potential for further improvements through scaling up.
[LINK]
http://arxiv.org/abs/2411.10436v1
[DATE]
2024-11-16 02:56:01+08:00
[CATEGORIES]
cs.CL
Towards Automatic Evaluation of Task-Oriented Dialogue Flows
[AUTHORS]
Mehrnoosh Mirtaheri, Nikhil Varghese, Chandra Khatri, Amol Kelkar
[ABSTRACT]
Task-oriented dialogue systems rely on predefined conversation schemes
(dialogue flows) often represented as directed acyclic graphs. These flows can
be manually designed or automatically generated from previously recorded
conversations. Due to variations in domain expertise or reliance on different
sets of prior conversations, these dialogue flows can manifest in significantly
different graph structures. Despite their importance, there is no standard
method for evaluating the quality of dialogue flows. We introduce FuDGE (Fuzzy
Dialogue-Graph Edit Distance), a novel metric that evaluates dialogue flows by
assessing their structural complexity and representational coverage of the
conversation data. FuDGE measures how well individual conversations align with
a flow and, consequently, how well a set of conversations is represented by the
flow overall. Through extensive experiments on manually configured flows and
flows generated by automated techniques, we demonstrate the effectiveness of
FuDGE and its evaluation framework. By standardizing and optimizing dialogue
flows, FuDGE enables conversational designers and automated techniques to
achieve higher levels of efficiency and automation.
[LINK]
http://arxiv.org/abs/2411.10416v1
[DATE]
2024-11-16 02:35:00+08:00
[CATEGORIES]
cs.CL
Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
[AUTHORS]
Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti
[ABSTRACT]
We introduce Llama Guard 3 Vision, a multimodal LLM-based safeguard for
human-AI conversations that involve image understanding: it can be used to
safeguard content for both multimodal LLM inputs (prompt classification) and
outputs (response classification). Unlike the previous text-only Llama Guard
versions (Inan et al., 2023; Llama Team, 2024b,a), it is specifically designed
to support image reasoning use cases and is optimized to detect harmful
multimodal (text and image) prompts and text responses to these prompts. Llama
Guard 3 Vision is fine-tuned on Llama 3.2-Vision and demonstrates strong
performance on the internal benchmarks using the MLCommons taxonomy. We also
test its robustness against adversarial attacks. We believe that Llama Guard 3
Vision serves as a good starting point to build more capable and robust content
moderation tools for human-AI conversation with multimodal capabilities.
[LINK]
http://arxiv.org/abs/2411.10414v1
[DATE]
2024-11-16 02:34:07+08:00
[CATEGORIES]
cs.CL
Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning
[AUTHORS]
Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vin Howe, David Wingate
[ABSTRACT]
Sparse Autoencoders (SAEs) are a promising approach for extracting neural
network representations by learning a sparse and overcomplete decomposition of
the network’s internal activations. However, SAEs are traditionally trained
considering only activation values and not the effect those activations have on
downstream computations. This limits the information available to learn
features, and biases the autoencoder towards neglecting features which are
represented with small activation values but strongly influence model outputs.
To address this, we introduce Gradient SAEs (g-SAEs), which modify the
$k$-sparse autoencoder architecture by augmenting the TopK activation function
to rely on the gradients of the input activation when selecting the $k$
elements. For a given sparsity level, g-SAEs produce reconstructions that are
more faithful to original network performance when propagated through the
network. Additionally, we find evidence that g-SAEs learn latents that are on
average more effective at steering models in arbitrary contexts. By considering
the downstream effects of activations, our approach leverages the dual nature
of neural network features as both $\textit{representations}$, retrospectively,
and $\textit{actions}$, prospectively. While previous methods have approached
the problem of feature discovery primarily focused on the former aspect, g-SAEs
represent a step towards accounting for the latter as well.
[COMMENTS]
9 pages, 8 figures. Submitted to NAACL 2025
[LINK]
http://arxiv.org/abs/2411.10397v1
[DATE]
2024-11-16 02:03:52+08:00
[CATEGORIES]
cs.LG
cs.CL
KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
[AUTHORS]
Kibeom Nam
[COMMENTS]
Work in Progress, DMLR@ICML 2024
[LINK]
http://arxiv.org/abs/2407.00342v4
[DATE]
2024-11-16 01:59:10+08:00
[CATEGORIES]
cs.CL
Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer
[AUTHORS]
Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen
[ABSTRACT]
Approaches to improving multilingual language understanding often struggle
with significant performance gaps between high-resource and low-resource
languages. While there are efforts to align the languages in a single latent
space to mitigate such gaps, how different input-level representations
influence such gaps has not been investigated, particularly with phonemic
inputs. We hypothesize that the performance gaps are affected by representation
discrepancies between these languages, and revisit the use of phonemic
representations as a means to mitigate these discrepancies. To demonstrate the
effectiveness of phonemic representations, we present experiments on three
representative cross-lingual tasks on 12 languages in total. The results show
that phonemic representations exhibit higher similarities between languages
compared to orthographic representations, and that they consistently outperform
the grapheme-based baseline model on languages that are relatively low-resourced.
We present quantitative evidence from three cross-lingual tasks that
demonstrates the effectiveness of phonemic representations, which is further
justified by a theoretical analysis of the cross-lingual performance gap.
[COMMENTS]
Accepted to the 4th Multilingual Representation Learning (MRL)
Workshop (co-located with EMNLP 2024)
[LINK]
http://arxiv.org/abs/2402.14279v3
[DATE]
2024-11-16 01:11:08+08:00
[CATEGORIES]
cs.CL
The Silicon Ceiling: Auditing GPT’s Race and Gender Biases in Hiring
[AUTHORS]
Lena Armstrong, Abbey Liu, Stephen MacNeil, Danaë Metaxa
[ABSTRACT]
Large language models (LLMs) are increasingly being introduced in workplace
settings, with the goals of improving efficiency and fairness. However,
concerns have arisen regarding these models’ potential to reflect or exacerbate
social biases and stereotypes. This study explores the potential impact of LLMs
on hiring practices. To do so, we conduct an AI audit of race and gender biases
in one commonly-used LLM, OpenAI’s GPT-3.5, taking inspiration from the history
of traditional offline resume audits. We conduct two studies using names with
varied race and gender connotations: resume assessment (Study 1) and resume
generation (Study 2). In Study 1, we ask GPT to score resumes with 32 different
names (4 names for each combination of the 2 gender and 4 racial groups) and
two anonymous options across 10 occupations and 3 evaluation tasks (overall
rating, willingness to interview, and hireability). We find that the model
reflects some biases based on stereotypes. In Study 2, we prompt GPT to create
resumes (10 for each name) for fictitious job candidates. When generating
resumes, GPT reveals underlying biases; women’s resumes had occupations with
less experience, while Asian and Hispanic resumes had immigrant markers, such
as non-native English and non-U.S. education and work experiences. Our findings
contribute to a growing body of literature on LLM biases, particularly in
workplace contexts.
[LINK]
http://arxiv.org/abs/2405.04412v3
[DATE]
2024-11-16 00:53:18+08:00
[CATEGORIES]
cs.CL
Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding
[AUTHORS]
Huming Qiu, Guanxu Chen, Mi Zhang, Min Yang
[ABSTRACT]
In recent years, text-to-image (T2I) generation models have made significant
progress in generating high-quality images that align with text descriptions.
However, these models also face the risk of unsafe generation, potentially
producing harmful content that violates usage policies, such as explicit
material. Existing safe generation methods typically focus on suppressing
inappropriate content by erasing undesired concepts from visual
representations, while neglecting to sanitize the textual representation.
Although these methods help mitigate the risk of misuse to a certain extent,
their robustness remains insufficient when dealing with adversarial attacks.
Given that semantic consistency between input text and output image is a
fundamental requirement for T2I models, we identify that textual
representations (i.e., prompt embeddings) are likely the primary source of
unsafe generation. To this end, we propose a vision-agnostic safe generation
framework, Embedding Sanitizer (ES), which focuses on erasing inappropriate
concepts from prompt embeddings and uses the sanitized embeddings to guide the
model for safe generation. ES is applied to the output of the text encoder as a
plug-and-play module, enabling seamless integration with different T2I models
as well as other safeguards. In addition, ES’s unique scoring mechanism assigns
a score to each token in the prompt to indicate its potential harmfulness, and
dynamically adjusts the sanitization intensity to balance defensive performance
and generation quality. Through extensive evaluation on five prompt benchmarks,
our approach achieves state-of-the-art robustness by sanitizing the source
(prompt embedding) of unsafe generation compared to nine baseline methods. It
significantly outperforms existing safeguards in terms of interpretability and
controllability while maintaining generation quality.
[LINK]
http://arxiv.org/abs/2411.10329v1
[DATE]
2024-11-16 00:29:02+08:00
[CATEGORIES]
cs.CL
Emotion Detection in Reddit: Comparative Study of Machine Learning and Deep Learning Techniques
[AUTHORS]
Maliheh Alaeddini
[ABSTRACT]
Emotion detection is pivotal in human communication, as it significantly
influences behavior, relationships, and decision-making processes. This study
concentrates on text-based emotion detection by leveraging the GoEmotions
dataset, which annotates Reddit comments with 27 distinct emotions. These
emotions are subsequently mapped to Ekman’s six basic categories: joy, anger,
fear, sadness, disgust, and surprise. We employed a range of models for this
task, including six machine learning models, three ensemble models, and a Long
Short-Term Memory (LSTM) model to determine the optimal model for emotion
detection. Results indicate that the Stacking classifier outperforms other
models in accuracy and performance. We also benchmark our models against
EmoBERTa, a pre-trained emotion detection model, with our Stacking classifier
proving more effective. Finally, the Stacking classifier is deployed via a
Streamlit web application, underscoring its potential for real-world
applications in text-based emotion analysis.
[LINK]
http://arxiv.org/abs/2411.10328v1
[DATE]
2024-11-16 00:28:25+08:00
[CATEGORIES]
cs.CL
A Data-Efficient Sequential Learning Framework for Melt Pool Defect Classification in Laser Powder Bed Fusion
[AUTHORS]
Ahmed Shoyeb Raihan, Austin Harper, Israt Zarin Era, Omar Al-Shebeeb, Thorsten Wuest, Srinjoy Das, Imtiaz Ahmed
[ABSTRACT]
Ensuring the quality and reliability of Metal Additive Manufacturing (MAM)
components is crucial, especially in the Laser Powder Bed Fusion (L-PBF)
process, where melt pool defects such as keyhole, balling, and lack of fusion
can significantly compromise structural integrity. This study presents SL-RF+
(Sequentially Learned Random Forest with Enhanced Sampling), a novel Sequential
Learning (SL) framework for melt pool defect classification designed to
maximize data efficiency and model accuracy in data-scarce environments. SL-RF+
utilizes an RF classifier combined with Least Confidence Sampling (LCS) and Sobol
sequence-based synthetic sampling to iteratively select the most informative
samples to learn from, thereby refining the model’s decision boundaries with
minimal labeled data. Results show that SL-RF+ outperformed traditional machine
learning models across key performance metrics, including accuracy, precision,
recall, and F1 score, demonstrating significant robustness in identifying melt
pool defects with limited data. This framework efficiently captures complex
defect patterns by focusing on high-uncertainty regions in the process
parameter space, ultimately achieving superior classification performance
without the need for extensive labeled datasets. While this study utilizes
pre-existing experimental data, SL-RF+ shows strong potential for real-world
applications in pure sequential learning settings, where data is acquired and
labeled incrementally, mitigating the high costs and time constraints of sample
acquisition.
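
As an illustration of the least-confidence selection step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the classifier exposes per-class probabilities for each unlabeled sample; the function names and the example probabilities are hypothetical, and the Sobol-based synthetic sampling is not covered.

```python
def least_confidence_scores(probabilities):
    """Return 1 - max class probability per sample (higher = less confident)."""
    return [1.0 - max(row) for row in probabilities]


def select_least_confident(probabilities, k):
    """Return the indices of the k samples the classifier is least sure about."""
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical class-probability outputs for four unlabeled samples.
pool_probabilities = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],
]

# The two samples with the flattest predicted distributions are queried first.
selected = select_least_confident(pool_probabilities, k=2)
```

In a sequential learning loop, the selected samples would be labeled, added to the training set, and the classifier refit before the next round.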
[LINK]
http://arxiv.org/abs/2411.10822v1
[DATE]
2024-11-16 23:18:56+08:00
[CATEGORIES]
cs.LG
Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Theory Perspective
[AUTHORS]
Taeyoung Kim, Myungjoo Kang
[ABSTRACT]
The Rectified Power Unit (RePU) activation functions, unlike the Rectified
Linear Unit (ReLU), have the advantage of being a differentiable function when
constructing neural networks. However, it can be observed experimentally that,
when deep layers are stacked, neural networks constructed with RePU encounter
critical issues. These issues include exploding or vanishing values and
training failure, and they occur regardless of the hyperparameter
initialization. From the perspective of effective theory, we aim to identify
the causes of this phenomenon and propose a new activation function that
retains the advantages of RePU while overcoming its drawbacks.
[COMMENTS]
30 pages, 8 figures
[LINK]
http://arxiv.org/abs/2408.02697v2
[DATE]
2024-11-16 23:15:15+08:00
[CATEGORIES]
cs.LG
An Oversampling-enhanced Multi-class Imbalanced Classification Framework for Patient Health Status Prediction Using Patient-reported Outcomes
[AUTHORS]
Yang Yan, Zhong Chen, Cai Xu, Xinglei Shen, Jay Shiao, John Einck, Ronald C Chen, Hao Gao
[ABSTRACT]
Patient-reported outcomes (PROs) directly collected from cancer patients
being treated with radiation therapy play a vital role in assisting clinicians
in counseling patients regarding likely toxicities. Precise prediction and
evaluation of symptoms or health status associated with PROs are fundamental to
enhancing decision-making and planning for the required services and support as
patients transition into survivorship. However, the raw PRO data collected from
hospitals exhibits some intrinsic challenges such as incomplete item reports
and imbalanced patient toxicities. To this end, in this study, we explore various
machine learning techniques to predict patient outcomes related to health
status such as pain levels and sleep discomfort using PRO datasets from a
cancer photon/proton therapy center. Specifically, we deploy six advanced
machine learning classifiers – Random Forest (RF), XGBoost, Gradient Boosting
(GB), Support Vector Machine (SVM), Multi-Layer Perceptron with Bagging
(MLP-Bagging), and Logistic Regression (LR) – to tackle a multi-class
imbalanced classification problem across three prevalent cancer types: head and
neck, prostate, and breast cancers. To address the class imbalance issue, we
employ an oversampling strategy, adjusting the training set sample sizes
through interpolations of in-class neighboring samples, thereby augmenting
minority classes without deviating from the original skewed class distribution.
Our experimental findings across multiple PRO datasets indicate that the RF and
XGB methods achieve robust generalization performance, evidenced by weighted
AUC and detailed confusion matrices, in categorizing outcomes as mild,
intermediate, and severe post-radiation therapy. These results underscore the
models’ effectiveness and potential utility in clinical settings.
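
The interpolation-based oversampling described above can be sketched roughly as follows. This is a SMOTE-style illustration under stated assumptions, not the authors' code: the function name, the neighbor count `k`, and the toy feature vectors are hypothetical, and the rule for how many synthetic samples each class receives is left out.

```python
import random


def interpolate_oversample(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points for ONE class by interpolating between
    a randomly chosen sample and one of its k nearest in-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other in-class samples by squared Euclidean distance.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbor = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = interpolate_oversample(minority, n_new=3)
```

Because every synthetic point is a convex combination of two real samples from the same class, it always stays inside the region spanned by that class.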
[COMMENTS]
10 pages, 12 figures, 4 tables
[LINK]
http://arxiv.org/abs/2411.10819v1
[DATE]
2024-11-16 22:54:18+08:00
[CATEGORIES]
cs.LG
Conformation Generation using Transformer Flows
[AUTHORS]
Sohil Atul Shah, Vladlen Koltun
[ABSTRACT]
Estimating three-dimensional conformations of a molecular graph allows
insight into the molecule’s biological and chemical functions. Fast generation
of valid conformations is thus central to molecular modeling. Recent advances
in graph-based deep networks have accelerated conformation generation from
hours to seconds. However, current network architectures do not scale well to
large molecules. Here we present ConfFlow, a flow-based model for conformation
generation based on transformer networks. In contrast with existing approaches,
ConfFlow directly samples in the coordinate space without enforcing any
explicit physical constraints. The generative procedure is highly interpretable
and is akin to force field updates in molecular dynamics simulation. When
applied to the generation of large molecule conformations, ConfFlow improves
accuracy by up to $40\%$ relative to state-of-the-art learning-based methods.
The source code is made available at https://github.com/IntelLabs/ConfFlow.
[COMMENTS]
Technical Report. Code available at
https://github.com/IntelLabs/ConfFlow
[LINK]
http://arxiv.org/abs/2411.10817v1
[DATE]
2024-11-16 22:42:05+08:00
[CATEGORIES]
cs.LG
An adaptively inexact first-order method for bilevel optimization with application to hyperparameter learning
[AUTHORS]
Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt
[ABSTRACT]
Various tasks in data science are modeled utilizing the variational
regularization approach, where manually selecting regularization parameters
presents a challenge. The difficulty gets exacerbated when employing
regularizers involving a large number of hyperparameters. To overcome this
challenge, bilevel learning can be employed to learn such parameters from data.
However, neither exact function values nor exact gradients with respect to the
hyperparameters are attainable, necessitating methods that only rely on inexact
evaluation of such quantities. State-of-the-art inexact gradient-based methods
a priori select a sequence of the required accuracies and cannot identify an
appropriate step size since the Lipschitz constant of the hypergradient is
unknown. In this work, we propose an algorithm with backtracking line search
that only relies on inexact function evaluations and hypergradients and show
convergence to a stationary point. Furthermore, the proposed algorithm
determines the required accuracy dynamically rather than requiring it to be
selected manually in advance. Our numerical experiments demonstrate the efficiency and
feasibility of our approach for hyperparameter estimation on a range of
relevant problems in imaging and data science such as total variation and field
of experts denoising and multinomial logistic regression. Particularly, the
results show that the algorithm is robust to its own hyperparameters such as
the initial accuracies and step size.
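
For readers unfamiliar with backtracking line search, here is a minimal sketch of the classical Armijo variant. It deliberately uses exact function values and gradients on a toy quadratic, so it does not reproduce the inexact-evaluation scheme proposed in the paper; all names and default constants are illustrative assumptions.

```python
def backtracking_step(f, grad, x, step=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """Take one gradient step, shrinking the step size until the Armijo
    sufficient-decrease condition holds. Returns (new_x, accepted_step)."""
    g = grad(x)
    fx = f(x)
    g_norm_sq = sum(gi * gi for gi in g)
    for _ in range(max_halvings):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        if f(candidate) <= fx - c * step * g_norm_sq:
            return candidate, step
        step *= shrink
    return x, 0.0  # no acceptable step found; stay at the current point


def quadratic(x):
    """Toy objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


def quadratic_grad(x):
    return [2.0 * xi for xi in x]


x = [3.0, -4.0]
for _ in range(20):
    x, used_step = backtracking_step(quadratic, quadratic_grad, x)
```

No Lipschitz constant is needed here: the step size is found by trial, which is the property the abstract relies on when the hypergradient's Lipschitz constant is unknown.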
[LINK]
http://arxiv.org/abs/2308.10098v3
[DATE]
2024-11-16 22:11:18+08:00
[CATEGORIES]
cs.LG
Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay
[AUTHORS]
Feng Chen, Fuguang Han, Cong Guan, Lei Yuan, Zhilong Zhang, Yang Yu, Zongzhang Zhang
[ABSTRACT]
Given the inherent non-stationarity prevalent in real-world applications,
continual Reinforcement Learning (RL) aims to equip the agent with the
capability to address a series of sequentially presented decision-making tasks.
Within this problem setting, a pivotal challenge is the
\textit{catastrophic forgetting} issue, wherein the agent is prone to
eroding the decision-making knowledge associated with previously encountered
tasks when learning a new one. In recent work, \textit{generative
replay} methods have showcased substantial potential by employing generative
models to replay data distribution of past tasks. Compared to storing the data
from past tasks directly, this category of methods circumvents the growing
storage overhead and possible data privacy concerns. However, constrained by
the expressive capacity of generative models, existing \textit{generative
replay} methods face challenges in faithfully reconstructing the data
distribution of past tasks, particularly in scenarios with a myriad of tasks or
high-dimensional data. Inspired by the success of diffusion models in various
generative tasks, this paper introduces a novel continual RL algorithm DISTR
(Diffusion-based Trajectory Replay) that employs a diffusion model to memorize
the high-return trajectory distribution of each encountered task and wakes up
these distributions during the policy learning on new tasks. Besides,
considering the impracticality of replaying all past data each time, a
prioritization mechanism is proposed to prioritize the trajectory replay of
pivotal tasks in our method. Empirical experiments on the popular continual RL
benchmark \texttt{Continual World} demonstrate that our proposed method obtains
a favorable balance between \textit{stability} and \textit{plasticity},
surpassing various existing continual RL baselines in average success rate.
[COMMENTS]
10 pages, 3 figures, 1 table, inclusion at ICLR 2024 Workshop on
Generative Models for Decision Making
[LINK]
http://arxiv.org/abs/2411.10809v1
[DATE]
2024-11-16 22:03:23+08:00
[CATEGORIES]
cs.LG
Deep models for stroke segmentation: do complex architectures always perform better?
[AUTHORS]
Yalda Zafari-Ghadim, Ahmed Soliman, Yousif Yousif, Ahmed Ibrahim, Essam A. Rashed, Mohamed Mabrok
[ABSTRACT]
Stroke segmentation plays a crucial role in the diagnosis and treatment of
stroke patients by providing spatial information about affected brain regions
and the extent of damage. Segmenting stroke lesions accurately is a challenging
task, given that conventional manual techniques are time consuming and prone to
errors. Recently, advanced deep models have been introduced for general medical
image segmentation, demonstrating promising results that surpass many state of
the art networks when evaluated on specific datasets. With the advent of
vision Transformers, several models have been introduced based on them, while
others have aimed to design better modules based on traditional convolutional
layers to extract long-range dependencies like Transformers. The question of
whether such high-level designs are necessary for all segmentation cases to
achieve the best results remains unanswered. In this study, we selected four
types of deep models that were recently proposed and evaluated their
performance for stroke segmentation: a pure Transformer-based architecture
(DAE-Former), two advanced CNN-based models (LKA and DLKA) with attention
mechanisms in their design, an advanced hybrid model that incorporates CNNs
with Transformers (FCT), and the well-known self-adaptive nnUNet framework with
its configuration based on given data. We examined their performance on two
publicly available datasets, and found that the nnUNet achieved the best
results with the simplest design among all. Revealing the robustness issue of
Transformers to such variabilities serves as a potential reason for their
weaker performance. Furthermore, nnUNet’s success underscores the significant
impact of preprocessing and postprocessing techniques in enhancing segmentation
results, surpassing the focus solely on architectural designs.
[LINK]
http://arxiv.org/abs/2403.17177v2
[DATE]
2024-11-16 22:01:08+08:00
[CATEGORIES]
cs.LG
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
[AUTHORS]
Jinghan Li, Zhicheng Sun, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu
[ABSTRACT]
In the endeavor to make autonomous robots take actions, task planning is a
major challenge that requires translating high-level task descriptions into
long-horizon action sequences. Despite recent advances in language model
agents, they remain prone to planning errors and limited in their ability to
plan ahead. To address these limitations in robotic planning, we advocate a
self-refining scheme that iteratively refines a draft plan until an equilibrium
is reached. Remarkably, this process can be optimized end-to-end from an
analytical perspective without the need to curate additional verifiers or
reward models, allowing us to train self-refining planners in a simple
supervised learning fashion. Meanwhile, a nested equilibrium sequence modeling
procedure is devised for efficient closed-loop planning that incorporates
useful feedback from the environment (or an internal world model). Our method
is evaluated on the VirtualHome-Env benchmark, showing advanced performance
with better scaling for inference computation. Code is available at
https://github.com/Singularity0104/equilibrium-planner.
[LINK]
http://arxiv.org/abs/2410.01440v4
[DATE]
2024-11-16 21:58:45+08:00
[CATEGORIES]
cs.LG
QT-TDM: Planning With Transformer Dynamics Model and Autoregressive Q-Learning
[AUTHORS]
Mostafa Kotb, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter
[ABSTRACT]
Inspired by the success of the Transformer architecture in natural language
processing and computer vision, we investigate the use of Transformers in
Reinforcement Learning (RL), specifically in modeling the environment’s
dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities
of TDMs for continuous control in real-time planning scenarios with Model
Predictive Control (MPC). While Transformers excel in long-horizon prediction,
their tokenization mechanism and autoregressive nature lead to costly planning
over long horizons, especially as the environment’s dimensionality increases.
To alleviate this issue, we use a TDM for short-term planning, and learn an
autoregressive discrete Q-function using a separate Q-Transformer (QT) model to
estimate a long-term return beyond the short-horizon planning. Our proposed
method, QT-TDM, integrates the robust predictive capabilities of Transformers
as dynamics models with the efficacy of a model-free Q-Transformer to mitigate
the computational burden associated with real-time planning. Experiments in
diverse state-based continuous control tasks show that QT-TDM is superior in
performance and sample efficiency compared to existing Transformer-based RL
models while achieving fast and computationally efficient inference.
[COMMENTS]
Accepted by IEEE Robotics and Automation Letters (RA-L)
[LINK]
http://arxiv.org/abs/2407.18841v2
[DATE]
2024-11-16 21:32:45+08:00
[CATEGORIES]
cs.LG
On Reductions and Representations of Learning Problems in Euclidean Spaces
[AUTHORS]
Bogdan Chornomaz, Shay Moran, Tom Waknine
[ABSTRACT]
Many practical prediction algorithms represent inputs in Euclidean space and
replace the discrete 0/1 classification loss with a real-valued surrogate loss,
effectively reducing classification tasks to stochastic optimization. In this
paper, we investigate the expressivity of such reductions in terms of key
resources, including dimension and the role of randomness.
We establish bounds on the minimum Euclidean dimension $D$ needed to reduce a
concept class with VC dimension $d$ to a Stochastic Convex Optimization (SCO)
problem in $\mathbb{R}^D$, formally addressing the intuitive interpretation of
the VC dimension as the number of parameters needed to learn the class. To
achieve this, we develop a generalization of the Borsuk-Ulam Theorem that
combines the classical topological approach with convexity considerations.
Perhaps surprisingly, we show that, in some cases, the number of parameters $D$
must be exponentially larger than the VC dimension $d$, even if the reduction
is only slightly non-trivial. We also present natural classification tasks that
can be represented in much smaller dimensions by leveraging randomness, as seen
in techniques like random initialization. This result resolves an open question
posed by Kamath, Montasser, and Srebro (COLT 2020).
Our findings introduce new variants of \emph{dimension complexity} (also
known as \emph{sign-rank}), a well-studied parameter in learning and complexity
theory. Specifically, we define an approximate version of sign-rank and another
variant that captures the minimum dimension required for a reduction to SCO. We
also propose several open questions and directions for future research.
[LINK]
http://arxiv.org/abs/2411.10784v1
[DATE]
2024-11-16 20:09:37+08:00
[CATEGORIES]
cs.LG
FPPL: An Efficient and Non-IID Robust Federated Continual Learning Framework
[AUTHORS]
Yuchen He, Chuyun Shen, Xiangfeng Wang, Bo Jin
[ABSTRACT]
Federated continual learning (FCL) aims to learn from sequential data streams
in the decentralized federated learning setting, while simultaneously
mitigating the catastrophic forgetting issue in classical continual learning.
Existing FCL methods usually employ typical rehearsal mechanisms, which could
result in privacy violations or additional onerous storage and computational
burdens. In this work, an efficient and non-IID robust federated continual
learning framework, called Federated Prototype-Augmented Prompt Learning
(FPPL), is proposed. The FPPL can collaboratively learn lightweight prompts
augmented by prototypes without rehearsal. On the client side, a fusion
function is employed to fully leverage the knowledge contained in task-specific
prompts for alleviating catastrophic forgetting. Additionally, global
prototypes aggregated from the server are used to obtain a unified representation
through contrastive learning, mitigating the impact of non-IID-derived data
heterogeneity. On the server side, locally uploaded prototypes are utilized to
perform debiasing on the classifier, further alleviating the performance
degradation caused by both non-IID and catastrophic forgetting. Empirical
evaluations demonstrate the effectiveness of FPPL, achieving notable
performance with an efficient design while remaining robust to diverse non-IID
degrees. Code is available at: https://github.com/ycheoo/FPPL.
[LINK]
http://arxiv.org/abs/2411.01904v2
[DATE]
2024-11-16 20:05:45+08:00
[CATEGORIES]
cs.LG
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
[AUTHORS]
Shitong Shao, Zikai Zhou, Tian Ye, Lichen Bai, Zhiqiang Xu, Zeke Xie
[ABSTRACT]
Text-to-image diffusion models (DMs) develop at an unprecedented pace,
supported by thorough theoretical exploration and empirical analysis.
Unfortunately, the discrepancy between DMs and autoregressive models (ARMs)
complicates the path toward achieving the goal of unified vision and language
generation. Recently, the masked generative Transformer (MGT) has emerged as a
promising intermediary between DM and ARM by predicting randomly masked image
tokens (i.e., masked image modeling), combining the efficiency of DM with the
discrete token nature of ARM. However, we find that comprehensive analyses
of MGT inference are virtually non-existent, and thus we aim to
present positive design choices to fill this gap. We modify and re-design a set
of DM-based inference techniques for MGT and further elucidate their
performance on MGT. We also discuss an approach to correcting the token
distribution to enhance inference. Extensive experiments and empirical analyses
lead to concrete and effective design choices, and these design choices can be
merged to achieve further performance gains. For instance, in terms of enhanced
inference, we achieve winning rates of approximately 70% compared to vanilla
sampling on HPS v2 with the recent SOTA MGT Meissonic. Our contributions have
the potential to further enhance the capabilities and future development of
MGTs.
[LINK]
http://arxiv.org/abs/2411.10781v1
[DATE]
2024-11-16 19:51:33+08:00
[CATEGORIES]
cs.LG
SugarcaneNet: An Optimized Ensemble of LASSO-Regularized Pre-trained Models for Accurate Disease Classification
[AUTHORS]
Md. Simul Hasan Talukder, Sharmin Akter, Abdullah Hafez Nur, Mohammad Aljaidi, Rejwan Bin Sulaiman, Ali Fayez Alkoradees
[ABSTRACT]
Sugarcane, a key crop for the world’s sugar industry, is prone to several
diseases that have a substantial negative influence on both its yield and
quality. To effectively manage and implement preventative initiatives, diseases
must be detected promptly and accurately. In this study, we present a unique
model called sugarcaneNet2024 that outperforms previous methods for
automatically and quickly detecting sugarcane disease through leaf image
processing. Our proposed model consolidates an optimized weighted average
ensemble of seven customized and LASSO-regularized pre-trained models,
particularly InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169,
Xception, and ResNet152V2. Initially, we added three more dense layers with
0.0001 LASSO regularization, three 30% dropout layers, and three batch
normalizations with renorm enabled at the bottom of these pre-trained models to
improve the performance. The accuracy of sugarcane leaf disease classification
was greatly increased by this addition. Following this, several comparative
studies between the average ensemble and individual models were carried out,
indicating that the ensemble technique performed better. The average ensemble
of all modified pre-trained models produced outstanding outcomes: 100%, 99%,
99%, and 99.45% for F1 score, precision, recall, and accuracy, respectively.
Performance was further enhanced by the implementation of an optimized weighted
average ensemble technique incorporated with grid search. This optimized
sugarcaneNet2024 model performed the best for detecting sugarcane diseases,
having achieved accuracy, precision, recall, and F1 score of 99.67%, 100%,
100%, and 100%, respectively.
[COMMENTS]
32 pages, 11 Figures, 13 Tables
[LINK]
http://arxiv.org/abs/2403.18870v3
[DATE]
2024-11-16 19:36:01+08:00
[CATEGORIES]
cs.LG
MRI Parameter Mapping via Gaussian Mixture VAE: Breaking the Assumption of Independent Pixels
[AUTHORS]
Moucheng Xu, Yukun Zhou, Tobias Goodwin-Allcock, Kimia Firoozabadi, Joseph Jacob, Daniel C. Alexander, Paddy J. Slator
[ABSTRACT]
We introduce and demonstrate a new paradigm for quantitative parameter
mapping in MRI. Parameter mapping techniques, such as diffusion MRI and
quantitative MRI, have the potential to robustly and repeatably measure
biologically-relevant tissue maps that strongly relate to underlying
microstructure. Quantitative maps are calculated by fitting a model to multiple
images, e.g. with least-squares or machine learning. However, the overwhelming
majority of model fitting techniques assume that each voxel is independent,
ignoring any co-dependencies in the data. This makes model fitting sensitive to
voxelwise measurement noise, hampering reliability and repeatability. We
propose a self-supervised deep variational approach that breaks the assumption
of independent pixels, leveraging redundancies in the data to effectively
perform data-driven regularisation of quantitative maps. We demonstrate that
our approach outperforms current model fitting techniques in dMRI simulations
and real data. Especially with a Gaussian mixture prior, our model enables
sharper quantitative maps, revealing finer anatomical details that are not
presented in the baselines. Our approach can hence support the clinical
adoption of parameter mapping methods such as dMRI and qMRI.
[COMMENTS]
NeurIPS 2024 Workshop in Machine Learning and the Physical Sciences
[LINK]
http://arxiv.org/abs/2411.10772v1
[DATE]
2024-11-16 19:11:36+08:00
[CATEGORIES]
cs.LG
Building Interpretable Climate Emulators for Economics
[AUTHORS]
Aryan Eftekhari, Doris Folini, Aleksandra Friedl, Felix Kübler, Simon Scheidegger, Olaf Schenk
[ABSTRACT]
This paper presents a framework for developing efficient and interpretable
carbon-cycle emulators (CCEs) as part of climate emulators in Integrated
Assessment Models, enabling economists to custom-build CCEs accurately
calibrated to advanced climate science. We propose a generalized
multi-reservoir linear box-model CCE that preserves key physical quantities and
can be tailored for specific use cases. Three CCEs are presented for
illustration: the 3SR model (replicating DICE-2016), the 4PR model (including
the land biosphere), and the 4PR-X model (accounting for dynamic land-use
changes like deforestation that impact the reservoir’s storage capacity).
Evaluation of these models within the DICE framework shows that land-use
changes in the 4PR-X model significantly impact atmospheric carbon and
temperatures – emphasizing the importance of using tailored climate emulators.
By providing a transparent and flexible tool for policy analysis, our framework
allows economists to assess the economic impacts of climate policies more
accurately.
[LINK]
http://arxiv.org/abs/2411.10768v1
[DATE]
2024-11-16 18:22:23+08:00
[CATEGORIES]
cs.LG
Steam Turbine Anomaly Detection: An Unsupervised Learning Approach Using Enhanced Long Short-Term Memory Variational Autoencoder
[AUTHORS]
Weiming Xu, Peng Zhang
[ABSTRACT]
As core thermal power generation equipment, steam turbines incur significant
expenses and adverse effects on operation when facing interruptions like
downtime, maintenance, and damage. Accurate anomaly detection is the
prerequisite for ensuring the safe and stable operation of steam turbines.
However, challenges in steam turbine anomaly detection, including inherent
anomalies, lack of temporal information analysis, and high-dimensional data
complexity, limit the effectiveness of existing methods. To address these
challenges, we proposed an Enhanced Long Short-Term Memory Variational
Autoencoder using Deep Advanced Features and Gaussian Mixture Model
(ELSTMVAE-DAF-GMM) for precise unsupervised anomaly detection in unlabeled
datasets. Specifically, LSTMVAE, integrating LSTM with VAE, was used to project
high-dimensional time-series data to a low-dimensional phase space. The Deep
Autoencoder-Local Outlier Factor (DAE-LOF) sample selection mechanism was used
to eliminate inherent anomalies during training, further improving the model’s
precision and reliability. The novel deep advanced features (DAF) hybridize
latent embeddings and reconstruction discrepancies from the LSTMVAE model and
provide a more comprehensive data representation within a continuous and
structured phase space, significantly enhancing anomaly detection by
synergizing temporal dynamics with data pattern variations. These DAF were
incorporated into GMM to ensure robust and effective unsupervised anomaly
detection. We utilized real operating data from industry steam turbines and
conducted both comparison and ablation experiments, demonstrating superior
anomaly detection outcomes characterized by high accuracy and minimal false
alarm rates compared with existing methods.
[LINK]
http://arxiv.org/abs/2411.10765v1
[DATE]
2024-11-16 18:11:36+08:00
[CATEGORIES]
cs.LG
ML$^2$Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models
[AUTHORS]
JooHyoung Cha, Munyoung Lee, Jinse Kwon, Jubin Lee, Jemin Lee, Yongin Kwon
[ABSTRACT]
The increasing complexity of deep learning models necessitates specialized
hardware and software optimizations, particularly for deep learning
accelerators. Existing autotuning methods often suffer from prolonged tuning
times due to profiling invalid configurations, which can cause runtime errors.
We introduce ML$^2$Tuner, a multi-level machine learning tuning technique that
enhances autotuning efficiency by incorporating a validity prediction model to
filter out invalid configurations and an advanced performance prediction model
utilizing hidden features from the compilation process. Experimental results on
an extended VTA accelerator demonstrate that ML$^2$Tuner achieves equivalent
performance improvements using only 12.3% of the samples required by an
approach similar to TVM and reduces invalid profiling attempts by an average of
60.8%, highlighting its potential to enhance autotuning performance by
filtering out invalid configurations.
[COMMENTS]
Accepted in NeurIPS 2024 workshop on Machine Learning for Systems, 12
pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.10764v1
[DATE]
2024-11-16 18:10:12+08:00
[CATEGORIES]
cs.LG
Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning
[AUTHORS]
Eslam Eldeeb, Houssem Sifaou, Osvaldo Simeone, Mohammad Shehab, Hirley Alves
[ABSTRACT]
Reinforcement learning (RL) has been widely adopted for controlling and
optimizing complex engineering systems such as next-generation wireless
networks. An important challenge in adopting RL is the need for direct access
to the physical environment. This limitation is particularly severe in
multi-agent systems, for which conventional multi-agent reinforcement learning
(MARL) requires a large number of coordinated online interactions with the
environment during training. When only offline data is available, a direct
application of online MARL schemes would generally fail due to the epistemic
uncertainty entailed by the lack of exploration during training. In this work,
we propose an offline MARL scheme that integrates distributional RL and
conservative Q-learning to address the environment’s inherent aleatoric
uncertainty and the epistemic uncertainty arising from the use of offline data.
We explore both independent and joint learning strategies. The proposed MARL
scheme, referred to as multi-agent conservative quantile regression, addresses
general risk-sensitive design criteria and is applied to the trajectory
planning problem in drone networks, showcasing its advantages.
[COMMENTS]
Early access in IEEE Transactions on Cognitive Communications and
Networking
[LINK]
http://arxiv.org/abs/2402.08421v2
[DATE]
2024-11-16 18:08:06+08:00
[CATEGORIES]
cs.LG
Integrated Machine Learning and Survival Analysis Modeling for Enhanced Chronic Kidney Disease Risk Stratification
[AUTHORS]
Zachary Dana, Ahmed Ammar Naseer, Botros Toro, Sumanth Swaminathan
[ABSTRACT]
Chronic kidney disease (CKD) is a significant public health challenge, often
progressing to end-stage renal disease (ESRD) if not detected and managed
early. Early intervention, warranted by silent disease progression, can
significantly reduce associated morbidity, mortality, and financial burden. In
this study, we propose a novel approach to modeling CKD progression using a
combination of machine learning techniques and classical statistical models.
Building on the work of Liu et al. (2023), we evaluate linear models,
tree-based methods, and deep learning models to extract novel predictors for
CKD progression, with feature importance assessed using Shapley values. These
newly identified predictors, integrated with established clinical features from
the Kidney Failure Risk Equation, are then applied within the framework of Cox
proportional hazards models to predict CKD progression.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 19 pages
[LINK]
http://arxiv.org/abs/2411.10754v1
[DATE]
2024-11-16 17:22:06+08:00
[CATEGORIES]
cs.LG
Memetic Differential Evolution Methods for Semi-Supervised Clustering
[AUTHORS]
Pierluigi Mansueto, Fabio Schoen
[ABSTRACT]
In this paper, we propose an extension for semi-supervised Minimum
Sum-of-Squares Clustering (MSSC) problems of MDEClust, a memetic framework
based on the Differential Evolution paradigm for unsupervised clustering. In
semi-supervised MSSC, background knowledge is available in the form of
(instance-level) “must-link” and “cannot-link” constraints, each of which
indicates whether two dataset points should be associated with the same or with a
different cluster, respectively. The presence of such constraints makes the
problem at least as hard as its unsupervised version and, as a consequence,
some framework operations need to be carefully designed to handle this
additional complexity: for instance, it is no longer true that each point is
associated with its nearest cluster center. As far as we know, our new framework,
called S-MDEClust, represents the first memetic methodology designed to
generate a (hopefully) optimal feasible solution for semi-supervised MSSC
problems. Results of thorough computational experiments on a set of well-known
as well as synthetic datasets show the effectiveness and efficiency of our
proposal.
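As a rough illustration of the constrained problem described above (not the S-MDEClust algorithm itself), the following Python sketch evaluates the MSSC objective for a given assignment and checks must-link and cannot-link feasibility. The function names and the toy data are assumptions made for this example only.

```python
def mssc_objective(points, centers, assignment):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for p, k in zip(points, assignment):
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, centers[k]))
    return total


def is_feasible(assignment, must_link, cannot_link):
    """True if every must-link pair shares a cluster and no cannot-link pair does."""
    ok_ml = all(assignment[i] == assignment[j] for i, j in must_link)
    ok_cl = all(assignment[i] != assignment[j] for i, j in cannot_link)
    return ok_ml and ok_cl


def nearest_center_assignment(points, centers):
    """Unconstrained rule: assign each point to its closest center."""
    def sqdist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return [min(range(len(centers)), key=lambda k: sqdist(p, centers[k]))
            for p in points]


# Toy data (hypothetical): three 1-D points and two centers.
points = [(0.0,), (1.0,), (10.0,)]
centers = [(0.5,), (10.0,)]
must_link = [(1, 2)]    # points 1 and 2 must share a cluster
cannot_link = [(0, 1)]  # points 0 and 1 must be separated

nearest = nearest_center_assignment(points, centers)
constrained = [0, 1, 1]  # a feasible assignment, chosen by hand
```

In this toy case the nearest-center assignment violates the must-link constraint, while the hand-picked feasible assignment has a larger objective value, which is the trade-off the abstract alludes to.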
[LINK]
http://arxiv.org/abs/2403.04322v2
[DATE]
2024-11-16 17:03:05+08:00
[CATEGORIES]
cs.LG
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
[AUTHORS]
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jibin Wu, Bo Xu, Guoqi Li
[ABSTRACT]
Various linear complexity models, such as Linear Transformer (LinFormer),
State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace
the conventional softmax attention in Transformer structures. However, the
optimal design of these linear models is still an open question. In this work,
we attempt to answer this question by finding the best linear approximation to
softmax attention from a theoretical perspective. We start by unifying existing
linear complexity models as the linear attention form and then identify three
conditions for the optimal linear attention design: 1) Dynamic memory ability;
2) Static approximation ability; 3) Least parameter approximation. We find that
none of the current linear models meet all three conditions, resulting in
suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a
solution that satisfies these conditions. Our experiments on Multi-Query
Associative Recall (MQAR) task, language modeling, image classification, and
Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than
the existing linear models.
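To make the "linear attention form" mentioned above concrete, here is a minimal Python sketch of generic causal, unnormalized linear attention, written both as an explicit sum over past positions and as a recurrence over a fixed-size state. This is an assumption-laden illustration of the general idea, not MetaLA or any specific model from the abstract; all names and the toy inputs are hypothetical.

```python
def linear_attention_parallel(Q, K, V):
    """Causal, unnormalized linear attention: o_t = sum_{s<=t} (q_t . k_s) v_s."""
    out = []
    for t, q in enumerate(Q):
        o = [0.0] * len(V[0])
        for s in range(t + 1):
            score = sum(qi * ki for qi, ki in zip(q, K[s]))
            for j, vj in enumerate(V[s]):
                o[j] += score * vj
        out.append(o)
    return out


def linear_attention_recurrent(Q, K, V):
    """Same output using a fixed-size state S (d_k x d_v) updated once per step."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]  # S_t = S_{t-1} + k_t v_t^T
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy inputs (hypothetical): sequence length 3, d_k = 2, d_v = 1.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
V = [[1.0], [2.0], [3.0]]

parallel_out = linear_attention_parallel(Q, K, V)
recurrent_out = linear_attention_recurrent(Q, K, V)
```

The recurrent form keeps memory constant in the sequence length, which is why models of this family have linear rather than quadratic complexity.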
[LINK]
http://arxiv.org/abs/2411.10741v1
[DATE]
2024-11-16 16:47:32+08:00
[CATEGORIES]
cs.LG
White-Box Diffusion Transformer for single-cell RNA-seq generation
[AUTHORS]
Zhuorui Cui, Shengze Dong, Ding Liu
[ABSTRACT]
As a powerful tool for characterizing cellular subpopulations and cellular
heterogeneity, single cell RNA sequencing (scRNA-seq) technology offers
advantages of high throughput and multidimensional analysis. However, the
process of data acquisition is often constrained by high cost and limited
sample availability. To overcome these limitations, we propose a hybrid model
based on a diffusion model and a White-Box transformer that aims to generate
synthetic and biologically plausible scRNA-seq data. Diffusion models
progressively introduce noise into the data and then recover the original data
through a denoising process, a forward and reverse process that is particularly
suitable for generating complex data distributions. The White-Box transformer is a
deep learning architecture that emphasizes mathematical interpretability. By
minimizing the encoding rate of the data and maximizing the sparsity of the
representation, it not only reduces the computational burden, but also provides
clear insight into underlying structure. Our White-Box Diffusion Transformer
combines the generative capabilities of diffusion models with the mathematical
interpretability of the White-Box transformer. Through experiments on six
different single-cell RNA-Seq datasets, we visualize both generated and real
data using the t-SNE dimensionality reduction technique and quantify the
similarity between generated and real data using various metrics to demonstrate
comparable performance of White-Box Diffusion Transformer and Diffusion
Transformer in generating scRNA-seq data alongside significant improvements in
training efficiency and resource utilization. Our code is available at
https://github.com/lingximamo/White-Box-Diffusion-Transformer
[COMMENTS]
11 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.06785v2
[DATE]
2024-11-16 16:36:17+08:00
[CATEGORIES]
cs.LG
VayuBuddy: an LLM-Powered Chatbot to Democratize Air Quality Insights
[AUTHORS]
Zeel B Patel, Yash Bachwana, Nitish Sharma, Sarath Guttikunda, Nipun Batra
[ABSTRACT]
Nearly 6.7 million lives are lost due to air pollution every year. While
policymakers are working on the mitigation strategies, public awareness can
help reduce the exposure to air pollution. Air pollution data from
government-installed sensors is often publicly available in raw format, but
there is a non-trivial barrier for various stakeholders in deriving meaningful
insights from that data. In this work, we present VayuBuddy, a Large Language
Model (LLM)-powered chatbot system to reduce the barrier between the
stakeholders and air quality sensor data. VayuBuddy receives the questions in
natural language, analyses the structured sensory data with LLM-generated
Python code and provides answers in natural language. We use the data from
Indian government air quality sensors. We benchmark the capabilities of 7 LLMs
on 45 diverse question-answer pairs prepared by us. Additionally, VayuBuddy can
also generate visual analyses such as line plots, map plots, bar charts, and many
others from the sensory data as we demonstrate in this work.
[LINK]
http://arxiv.org/abs/2411.12760v1
[DATE]
2024-11-16 16:02:35+08:00
[CATEGORIES]
cs.LG
SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks
[AUTHORS]
Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib
[ABSTRACT]
Graph Neural Networks (GNNs) have gained traction across different domains
such as transportation, bio-informatics, language processing, and computer
vision. However, there is a noticeable absence of research on applying GNNs to
supply chain networks. Supply chain networks are inherently graph-like in
structure, making them prime candidates for applying GNN methodologies. This
opens up a world of possibilities for optimizing, predicting, and solving even
the most complex supply chain problems. A major setback in this approach lies
in the absence of real-world benchmark datasets to facilitate the research and
resolution of supply chain problems using GNNs. To address the issue, we
present a real-world benchmark dataset for temporal tasks, obtained from one of
the leading FMCG companies in Bangladesh, focusing on supply chain planning for
production purposes. The dataset includes temporal data as node features to
enable sales predictions, production planning, and the identification of
factory issues. By utilizing this dataset, researchers can employ GNNs to
address numerous supply chain problems, thereby advancing the field of supply
chain analytics and planning. Source: https://github.com/CIOL-SUST/SupplyGraph
[COMMENTS]
Accepted to 4th workshop on Graphs and more Complex structures for
Learning and Reasoning, colocated with AAAI 2024. Extended journal version
with experiments is available here: arXiv:2411.08550
[LINK]
http://arxiv.org/abs/2401.15299v2
[DATE]
2024-11-16 15:54:05+08:00
[CATEGORIES]
cs.LG
On-device Anomaly Detection in Conveyor Belt Operations
[AUTHORS]
Luciano S. Martinez-Rau, Yuxuan Zhang, Bengt Oelmann, Sebastian Bader
[ABSTRACT]
Mining 4.0 leverages advancements in automation, digitalization, and
interconnected technologies from Industry 4.0 to address the unique challenges
of the mining sector, enhancing efficiency, safety, and sustainability.
Conveyor belts are crucial in mining operations by enabling the continuous and
efficient movement of bulk materials over long distances, which directly
impacts productivity. While detecting anomalies in specific conveyor belt
components, such as idlers, pulleys, and belt surfaces, has been widely
studied, identifying the root causes of these failures remains critical due to
factors like changing production conditions and operator errors. Continuous
monitoring of mining conveyor belt work cycles for anomaly detection is still
at an early stage and requires robust solutions. This study proposes two
distinctive pattern recognition approaches for real-time anomaly detection in
the operational cycles of mining conveyor belts, combining feature extraction,
threshold-based cycle detection, and tiny machine-learning classification. Both
approaches outperformed a state-of-the-art technique on two datasets for duty
cycle classification in terms of F1-scores. The first approach, with 97.3% and
80.2% for normal and abnormal cycles, respectively, reaches the highest
performance in the first dataset while the second approach excels on the second
dataset, scoring 91.3% and 67.9%. Implemented on two low-power
microcontrollers, the methods demonstrated efficient, real-time operation with
energy consumption of 13.3 and 20.6 µJ during inference. These results
offer valuable insights for detecting mechanical failure sources, supporting
targeted preventive maintenance, and optimizing production cycles.
[COMMENTS]
Preprint submitted to IEEE Transactions on Instrumentation and
Measurement
[LINK]
http://arxiv.org/abs/2411.10729v1
[DATE]
2024-11-16 15:46:28+08:00
[CATEGORIES]
cs.LG
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
[AUTHORS]
Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang
[ABSTRACT]
Diffusion Transformers have recently demonstrated unprecedented generative
capabilities for various tasks. The encouraging results, however, come with the
cost of slow inference, since each denoising step requires inference on a
transformer model with a large scale of parameters. In this study, we make an
interesting and somewhat surprising observation: the computation of a large
proportion of layers in the diffusion transformer, through introducing a
caching mechanism, can be readily removed even without updating the model
parameters. In the case of U-ViT-H/2, for example, we may remove up to 93.68%
of the computation in the cache steps (46.84% for all steps), with less than
0.01 drop in FID. To achieve this, we introduce a novel scheme, named
Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for
diffusion transformers. Specifically, by leveraging the identical structure of
layers in transformers and the sequential nature of diffusion, we explore
redundant computations between timesteps by treating each layer as the
fundamental unit for caching. To address the challenge of the exponential
search space in deep models for identifying layers to cache and remove, we
propose a novel differentiable optimization objective. An input-invariant yet
timestep-variant router is then optimized, which can finally produce a static
computation graph. Experimental results show that L2C largely outperforms
samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at
the same inference speed. Code is available at
https://github.com/horseee/learning-to-cache
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.01733v2
[DATE]
2024-11-16 15:43:28+08:00
[CATEGORIES]
cs.LG
A Model-Agnostic Graph Neural Network for Integrating Local and Global Information
[AUTHORS]
Wenzhuo Zhou, Annie Qu, Keiland W. Cooper, Norbert Fortin, Babak Shahbaba
[ABSTRACT]
Graph Neural Networks (GNNs) have achieved promising performance in a variety
of graph-focused tasks. Despite their success, however, existing GNNs suffer
from two significant limitations: a lack of interpretability in their results
due to their black-box nature, and an inability to learn representations of
varying orders. To tackle these issues, we propose a novel Model-agnostic Graph
Neural Network (MaGNet) framework, which is able to effectively integrate
information of various orders, extract knowledge from high-order neighbors, and
provide meaningful and interpretable results by identifying influential compact
graph structures. In particular, MaGNet consists of two components: an
estimation model for the latent representation of complex relationships under
graph topology, and an interpretation model that identifies influential nodes,
edges, and node features. Theoretically, we establish the generalization error
bound for MaGNet via empirical Rademacher complexity, and demonstrate its power
to represent layer-wise neighborhood mixing. We conduct comprehensive numerical
studies using simulated data to demonstrate the superior performance of MaGNet
in comparison to several state-of-the-art alternatives. Furthermore, we apply
MaGNet to a real-world case study aimed at extracting task-critical information
from brain activity data, thereby highlighting its effectiveness in advancing
scientific research.
[LINK]
http://arxiv.org/abs/2309.13459v4
[DATE]
2024-11-16 15:25:08+08:00
[CATEGORIES]
cs.LG
Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization
[AUTHORS]
Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, Kay Chen Tan
[ABSTRACT]
In this survey, we introduce Meta-Black-Box-Optimization (MetaBBO) as an
emerging avenue within the Evolutionary Computation (EC) community, which
incorporates Meta-learning approaches to assist automated algorithm design.
Despite the success of MetaBBO, the current literature provides insufficient
summaries of its key aspects and lacks practical guidance for implementation.
To bridge this gap, we offer a comprehensive review of recent advances in
MetaBBO, providing an in-depth examination of its key developments. We begin
with a unified definition of the MetaBBO paradigm, followed by a systematic
taxonomy of various algorithm design tasks, including algorithm selection,
algorithm configuration, solution manipulation, and algorithm generation.
Further, we conceptually summarize different learning methodologies behind
current MetaBBO works, including reinforcement learning, supervised learning,
neuroevolution, and in-context learning with Large Language Models. A
comprehensive evaluation of the latest representative MetaBBO methods is then
carried out, alongside an experimental analysis of their optimization
performance, computational efficiency, and generalization ability. Based on the
evaluation results, we meticulously identify a set of core designs that enhance
the generalization and learning effectiveness of MetaBBO. Finally, we outline
the vision for the field by providing insight into the latest trends and
potential future directions. Relevant literature will be continuously collected
and updated at https://github.com/GMC-DRL/Awesome-MetaBBO.
[LINK]
http://arxiv.org/abs/2411.00625v2
[DATE]
2024-11-16 15:23:09+08:00
[CATEGORIES]
cs.LG
UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces
[AUTHORS]
Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Xuetao Wei, Yuxuan Liang
[ABSTRACT]
Human trajectory modeling is essential for deciphering movement patterns and
supporting advanced applications across various domains. However, existing
methods are often tailored to specific tasks and regions, resulting in
limitations related to task specificity, regional dependency, and data quality
sensitivity. Addressing these challenges requires a universal human trajectory
foundation model capable of generalizing and scaling across diverse tasks and
geographic contexts. To this end, we propose UniTraj, a Universal human
Trajectory foundation model that is task-adaptive, region-independent, and
highly generalizable. To further enhance performance, we construct WorldTrace,
the first large-scale, high-quality, globally distributed dataset sourced from
open web platforms, encompassing 2.45 million trajectories with billions of
points across 70 countries. Through multiple resampling and masking strategies
designed for pre-training, UniTraj effectively overcomes geographic and task
constraints, adapting to heterogeneous data quality. Extensive experiments
across multiple trajectory analysis tasks and real-world datasets demonstrate
that UniTraj consistently outperforms existing approaches in terms of
scalability and adaptability. These results underscore the potential of UniTraj
as a versatile, robust solution for a wide range of trajectory analysis
applications, with WorldTrace serving as an ideal but non-exclusive foundation
for training.
[LINK]
http://arxiv.org/abs/2411.03859v2
[DATE]
2024-11-16 14:53:43+08:00
[CATEGORIES]
cs.LG
Multi Scale Graph Neural Network for Alzheimer’s Disease
[AUTHORS]
Anya Chauhan, Ayush Noori, Zhaozhi Li, Yingnan He, Michelle M Li, Marinka Zitnik, Sudeshna Das
[ABSTRACT]
Alzheimer’s disease (AD) is a complex, progressive neurodegenerative disorder
characterized by extracellular Aβ plaques, neurofibrillary tau tangles,
glial activation, and neuronal degeneration, involving multiple cell types and
pathways. Current models often overlook the cellular context of these pathways.
To address this, we developed a multiscale graph neural network (GNN) model,
ALZ PINNACLE, using brain omics data from donors spanning the entire aging to
AD spectrum. ALZ PINNACLE is based on the PINNACLE GNN framework, which learns
context-aware protein, cell type, and tissue representations within a unified
latent space. ALZ PINNACLE was trained on 14,951 proteins, 206,850 protein
interactions, 7 cell types, and 48 cell subtypes or states. After pretraining,
we investigated the learned embedding of APOE, the largest genetic risk factor
for AD, across different cell types. Notably, APOE embeddings showed high
similarity in microglial, neuronal, and CD8 cells, suggesting a similar role of
APOE in these cell types. Fine-tuning the model on AD risk genes revealed cell
type contexts predictive of the role of APOE in AD. Our results suggest that
ALZ PINNACLE may provide a valuable framework for uncovering novel insights
into AD neurobiology.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 9 pages
[LINK]
http://arxiv.org/abs/2411.10720v1
[DATE]
2024-11-16 14:48:14+08:00
[CATEGORIES]
cs.LG
Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models
[AUTHORS]
Ziting Wen, Oscar Pizarro, Stefan Williams
[ABSTRACT]
Fine-tuning the pre-trained model with active learning holds promise for
reducing annotation costs. However, this combination introduces significant
computational costs, particularly with the growing scale of pre-trained models.
Recent research has proposed proxy-based active learning, which pre-computes
features to reduce computational costs. Yet, this approach often incurs a
significant loss in active learning performance, sometimes outweighing the
computational cost savings. This paper demonstrates that not all sample
selection differences result in performance degradation. Furthermore, we show
that suitable training methods can mitigate the decline of active learning
performance caused by certain selection discrepancies. Building upon detailed
analysis, we propose a novel method, aligned selection via proxy, which
improves proxy-based active learning performance by updating pre-computed
features and selecting a proper training method. Extensive experiments validate
that our method improves the total cost of efficient active learning while
maintaining computational efficiency. The code is available at
https://github.com/ZiTingW/asvp.
[COMMENTS]
Accepted by Transactions on Machine Learning Research (TMLR, 2024)
https://openreview.net/forum?id=PNcgJMJcdl
[LINK]
http://arxiv.org/abs/2403.01101v2
[DATE]
2024-11-16 14:45:43+08:00
[CATEGORIES]
cs.LG
QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring
[AUTHORS]
Nikhil P Ghanathe, Steven J E Wilton
[ABSTRACT]
Uncertainty quantification (UQ) provides a resource-efficient solution for
on-device monitoring of tinyML models deployed without access to true labels.
However, existing UQ methods impose significant memory and compute demands,
making them impractical for ultra-low-power, KB-sized TinyML devices. Prior
work has attempted to reduce overhead by using early-exit ensembles to quantify
uncertainty in a single forward pass, but these approaches still carry
prohibitive costs. To address this, we propose QUTE, a novel resource-efficient
early-exit-assisted ensemble architecture optimized for tinyML models. QUTE
introduces additional output blocks at the final exit of the base network,
distilling early-exit knowledge into these blocks to form a diverse yet
lightweight ensemble. We show that QUTE delivers superior uncertainty quality
on tiny models, achieving comparable performance on larger models with 59%
smaller model sizes than the closest prior work. When deployed on a
microcontroller, QUTE demonstrates a 31% reduction in latency on average. In
addition, we show that QUTE excels at detecting accuracy-drop events,
outperforming all prior works.
[LINK]
http://arxiv.org/abs/2404.12599v2
[DATE]
2024-11-16 14:34:58+08:00
[CATEGORIES]
cs.LG
FlowScope: Enhancing Decision Making by Time Series Forecasting based on Prediction Optimization using HybridFlow Forecast Framework
[AUTHORS]
Nitin Sagar Boyeena, Begari Susheel Kumar
[ABSTRACT]
Time series forecasting is crucial in several sectors, such as meteorology,
retail, healthcare, and finance. Accurately forecasting future trends and
patterns is crucial for strategic planning and making well-informed decisions.
In this case, it is crucial to include many forecasting methodologies. The
strengths of Auto-regressive Integrated Moving Average (ARIMA) for linear time
series, Seasonal ARIMA models (SARIMA) for seasonal time series, Exponential
Smoothing State Space Models (ETS) for handling errors and trends, and Long
Short-Term Memory (LSTM) Neural Network model for complex pattern recognition
have been combined to create a comprehensive framework called FlowScope. SARIMA
excels in capturing seasonal variations, whereas ARIMA ensures effective
handling of linear time series. ETS models excel in capturing trends and
correcting errors, whereas LSTM networks excel in reflecting intricate temporal
connections. By combining these methods from both machine learning and deep
learning, we propose a deep-hybrid learning approach FlowScope which offers a
versatile and robust platform for predicting time series data. This empowers
enterprises to make informed decisions and optimize long-term strategies for
maximum performance.
Keywords: Time Series Forecasting, HybridFlow Forecast Framework, Deep-Hybrid
Learning, Informed Decisions.
[COMMENTS]
12 pages and 6 figures
[LINK]
http://arxiv.org/abs/2411.10716v1
[DATE]
2024-11-16 14:25:30+08:00
[CATEGORIES]
cs.LG
Physics in Next-token Prediction
[AUTHORS]
Hongjun An, Yiliang Song, Xuelong Li
[ABSTRACT]
We discovered the underlying physics in Next-token Prediction (NTP). We
identified the law of information conservation within NTP and proposed the
First Law of Information Capacity (IC-1), demonstrating that the essence of
intelligence emergence in auto-regressive models is fundamentally a process of
information transfer. We also introduced Landauer’s Principle into NTP,
formulating the Second Law of Information Capacity (IC-2), which establishes
the relationship between auto-regressive model training and energy consumption.
Additionally, we presented several corollaries, which hold practical
significance for production practices. Finally, we demonstrate the consistency
between the Law of Information Capacity and the Scaling Law for Neural Language
Models, the Knowledge Capacity Scaling Laws, and the Scaling Laws for
Precision.
[COMMENTS]
Second Submit
[LINK]
http://arxiv.org/abs/2411.00660v2
[DATE]
2024-11-16 14:17:37+08:00
[CATEGORIES]
cs.LG
Efficient generative adversarial networks using linear additive-attention Transformers
[AUTHORS]
Emilio Morales-Juarez, Gibran Fuentes-Pineda
[ABSTRACT]
Although the capacity of deep generative models for image generation, such as
Diffusion Models (DMs) and Generative Adversarial Networks (GANs), has
dramatically improved in recent years, much of their success can be attributed
to computationally expensive architectures. This has limited their adoption and
use to research laboratories and companies with large resources, while
significantly raising the carbon footprint for training, fine-tuning, and
inference. In this work, we present a novel GAN architecture which we call
LadaGAN. This architecture is based on a linear attention Transformer block
named Ladaformer. The main component of this block is a linear
additive-attention mechanism that computes a single attention vector per head
instead of the quadratic dot-product attention. We employ Ladaformer in both
the generator and discriminator, which reduces the computational complexity and
overcomes the training instabilities often associated with Transformer GANs.
LadaGAN consistently outperforms existing convolutional and Transformer GANs on
benchmark datasets at different resolutions while being significantly more
efficient. Moreover, LadaGAN shows competitive performance compared to
state-of-the-art multi-step generative models (e.g. DMs) using orders of
magnitude less computational resources.
[COMMENTS]
12 pages, 6 figures
[LINK]
http://arxiv.org/abs/2401.09596v4
[DATE]
2024-11-16 14:16:42+08:00
[CATEGORIES]
cs.LG
Decision Machines: Congruent Decision Trees
[AUTHORS]
Jinxiong Zhang
[ABSTRACT]
The decision tree recursively partitions the input space into regions and
derives axis-aligned decision boundaries from data. Despite their simplicity and
interpretability, decision trees lack a parameterized representation, which makes
them prone to overfitting and makes the optimal structure difficult to find. We propose
Decision Machines, which embed Boolean tests into a binary vector space and
represent the tree structure as matrices, enabling an interleaved traversal
of decision trees through matrix computation. Furthermore, we explore the
congruence of decision trees and attention mechanisms, opening new avenues for
optimizing decision trees and potentially enhancing their predictive power.
[LINK]
http://arxiv.org/abs/2101.11347v7
[DATE]
2024-11-16 13:22:37+08:00
[CATEGORIES]
cs.LG
Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Glucose Forecasting
[AUTHORS]
Ebrahim Farahmand, Shovito Barua Soumma, Nooshin Taheri Chatrudi, Hassan Ghasemzadeh
[ABSTRACT]
The availability of continuous glucose monitors as over-the-counter
commodities has created a unique opportunity to monitor a person’s blood
glucose levels (BGL), forecast blood glucose trajectories and provide automated
interventions to prevent devastating chronic complications that arise from poor
glucose control. However, forecasting blood glucose levels is challenging
because blood glucose changes consistently in response to food intake,
medication intake, physical activity, sleep, and stress. It is particularly
difficult to accurately predict BGL from multimodal and irregularly sampled
data and over long prediction horizons. Furthermore, these forecasting models
must operate in real-time on edge devices to provide in-the-moment
interventions. To address these challenges, we propose GlucoNet, an AI-powered
sensor system for continuously monitoring behavioral and physiological health
and robust forecasting of blood glucose patterns. GlucoNet devises a feature
decomposition-based transformer model that incorporates patients’ behavioral
and physiological data and transforms sparse and irregular patient data (e.g.,
diet and medication intake data) into continuous features using a mathematical
model, facilitating better integration with the BGL data. Given the non-linear
and non-stationary nature of BG signals, we propose a decomposition method to
extract both low and high-frequency components from the BGL signals, thus
providing accurate forecasting. To reduce the computational complexity, we also
propose to employ knowledge distillation to compress the transformer model.
GlucoNet achieves a 60% improvement in RMSE and a 21% reduction in the number
of parameters, using data obtained involving 12 participants with T1-Diabetes.
These results underscore GlucoNet’s potential as a compact and reliable tool
for real-world diabetes prevention and management.
[LINK]
http://arxiv.org/abs/2411.10703v1
[DATE]
2024-11-16 13:09:20+08:00
[CATEGORIES]
cs.LG
Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection
[AUTHORS]
Ying Yang, De Cheng, Chaowei Fang, Yubiao Wang, Changzhe Jiao, Lechao Cheng, Nannan Wang
[ABSTRACT]
Unsupervised out-of-distribution (OOD) detection aims to identify
out-of-domain data by learning only from unlabeled In-Distribution (ID)
training samples, which is crucial for developing a safe real-world machine
learning system. Current reconstruction-based methods provide a good
alternative approach by measuring the reconstruction error between the input
and its corresponding generative counterpart in the pixel/feature space.
However, such generative methods face a key dilemma: improving the
reconstruction power of the generative model while keeping a compact
representation of the ID data. To address this issue, we propose the
diffusion-based layer-wise semantic reconstruction approach for unsupervised
OOD detection. The innovation of our approach is that we leverage the diffusion
model’s intrinsic data reconstruction ability to distinguish ID samples from
OOD samples in the latent feature space. Moreover, to set up a comprehensive
and discriminative feature representation, we devise a multi-layer semantic
feature extraction strategy. By distorting the extracted features with Gaussian
noise and applying the diffusion model for feature reconstruction, the
separation of ID and OOD samples is implemented according to the reconstruction
errors. Extensive experimental results on multiple benchmarks built upon
various datasets demonstrate that our method achieves state-of-the-art
performance in terms of detection accuracy and speed. Code is available at
https://github.com/xbyym/DLSR.
[COMMENTS]
26 pages, 23 figures, published at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10701v1
[DATE]
2024-11-16 12:54:07+08:00
[CATEGORIES]
cs.LG
RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models
[AUTHORS]
Serena Zhang, Sraavya Sambara, Oishi Banerjee, Julian Acosta, L. John Fahrner, Pranav Rajpurkar
[ABSTRACT]
Generating accurate radiology reports from medical images is a clinically
important but challenging task. While current Vision Language Models (VLMs)
show promise, they are prone to generating hallucinations, potentially
compromising patient care. We introduce RadFlag, a black-box method to enhance
the accuracy of radiology report generation. Our method uses a sampling-based
flagging technique to find hallucinatory generations that should be removed. We
first sample multiple reports at varying temperatures and then use a Large
Language Model (LLM) to identify claims that are not consistently supported
across samples, indicating that the model has low confidence in those claims.
Using a calibrated threshold, we flag a fraction of these claims as likely
hallucinations, which should undergo extra review or be automatically rejected.
Our method achieves high precision when identifying both individual
hallucinatory sentences and reports that contain hallucinations. As an
easy-to-use, black-box system that only requires access to a model’s
temperature parameter, RadFlag is compatible with a wide range of radiology
report generation models and has the potential to broadly improve the quality
of automated radiology reporting.
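The sampling-based flagging idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in the abstract an LLM judges whether a claim is supported by each sampled report, whereas here a simple substring check (`substring_support`) stands in for that judgment, and the threshold value, function names, and example reports are all hypothetical.

```python
def flag_claims(claims, sampled_reports, is_supported, threshold):
    """Return claims supported by fewer than `threshold` (a fraction) of the samples."""
    flagged = []
    for claim in claims:
        support = sum(1 for report in sampled_reports if is_supported(claim, report))
        if support / len(sampled_reports) < threshold:
            flagged.append(claim)
    return flagged


def substring_support(claim, report):
    # Hypothetical stand-in for the LLM-based support judgment.
    return claim.lower() in report.lower()


# Made-up example: claims from one report, checked against four sampled reports.
claims = ["no pleural effusion", "left lower lobe nodule"]
samples = [
    "Heart size normal. No pleural effusion.",
    "No pleural effusion. Lungs are clear.",
    "No pleural effusion. Left lower lobe nodule.",
    "Clear lungs. No pleural effusion.",
]
flagged = flag_claims(claims, samples, substring_support, threshold=0.5)
```

Here the first claim appears in all four samples and is kept, while the second appears in only one of four and falls below the 0.5 threshold. In practice the threshold would be calibrated, as the abstract notes, rather than fixed by hand.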
[COMMENTS]
17 pages, 6 figures
[LINK]
http://arxiv.org/abs/2411.00299v2
[DATE]
2024-11-16 12:37:48+08:00
[CATEGORIES]
cs.LG
HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
[AUTHORS]
Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Wei Liu, Xiang Li, Fei Dou, Tianming Liu, Jin Lu
[ABSTRACT]
Fine-tuning large language models (LLMs) poses significant memory challenges,
as the back-propagation process demands extensive resources, especially with
growing model sizes. Recent work, MeZO, addresses this issue using a
zeroth-order (ZO) optimization method, which reduces memory consumption by
matching the usage to the inference phase. However, MeZO experiences slow
convergence due to varying curvatures across model parameters. To overcome this
limitation, we introduce HELENE, a novel scalable and memory-efficient
optimizer that integrates annealed A-GNB gradients with a diagonal Hessian
estimation and layer-wise clipping, serving as a second-order pre-conditioner.
This combination allows for faster and more stable convergence. Our theoretical
analysis demonstrates that HELENE improves convergence rates, particularly for
models with heterogeneous layer dimensions, by reducing the dependency on the
total parameter space dimension. Instead, the method scales with the largest
layer dimension, making it highly suitable for modern LLM architectures.
Experimental results on RoBERTa-large and OPT-1.3B across multiple tasks show
that HELENE achieves up to a 20x speedup compared to MeZO, with average
accuracy improvements of 1.5%. Furthermore, HELENE remains compatible with both
full parameter tuning and parameter-efficient fine-tuning (PEFT), outperforming
several state-of-the-art optimizers. The code will be released after
review.
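
For context, the zeroth-order idea that MeZO builds on can be sketched with a generic two-point gradient estimate, which needs only forward evaluations of the loss. This is not HELENE itself (no Hessian estimate, annealing, or layer-wise clipping); the function names, the `eps` value, and the quadratic toy loss are assumptions made for illustration.

```python
import random
from typing import Callable, List


def zo_gradient_estimate(
    loss: Callable[[List[float]], float],
    params: List[float],
    eps: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Two-point zeroth-order gradient estimate along one random direction."""
    rng = random.Random(seed)
    # Random Gaussian perturbation direction, one entry per parameter.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Finite-difference estimate of the directional derivative along z.
    scale = (loss(plus) - loss(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]


def quadratic_loss(params: List[float]) -> float:
    # Toy loss used only for this example; its true gradient is 2 * params.
    return sum(p * p for p in params)


theta = [1.0, -2.0, 0.5]
grad_estimate = zo_gradient_estimate(quadratic_loss, theta)
print(grad_estimate)
```

The estimate is noisy because it uses a single random direction; that noise is one reason plain zeroth-order methods can converge slowly.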
[LINK]
http://arxiv.org/abs/2411.10696v1
[DATE]
2024-11-16 12:27:22+08:00
[CATEGORIES]
cs.LG
Series Expansion of Probability of Correct Selection for Improved Finite Budget Allocation in Ranking and Selection
[AUTHORS]
Xinbo Shi, Yijie Peng, Bruno Tuffin
[ABSTRACT]
This paper addresses the challenge of improving finite sample performance in
Ranking and Selection by developing a Bahadur-Rao type expansion for the
Probability of Correct Selection (PCS). While traditional large deviations
approximations capture PCS behavior in the asymptotic regime, they can lack
precision in finite sample settings. Our approach enhances PCS approximation
under limited simulation budgets, providing more accurate characterization of
optimal sampling ratios and optimality conditions dependent on budgets.
Algorithmically, we propose a novel finite budget allocation (FCBA) policy,
which sequentially estimates the optimality conditions and accordingly balances
the sampling ratios. We illustrate numerically on toy examples that our FCBA
policy achieves superior PCS performance compared to tested traditional
methods. As an extension, we note that the non-monotonic PCS behavior described
in the literature for low-confidence scenarios can be attributed to the
neglect of simultaneous incorrect binary comparisons in PCS approximations.
We provide a refined expansion and a tailored allocation strategy to handle
low-confidence scenarios, addressing the non-monotonicity issue.
[LINK]
http://arxiv.org/abs/2411.10695v1
[DATE]
2024-11-16 12:26:19+08:00
[CATEGORIES]
cs.LG
Verified Safe Reinforcement Learning for Neural Network Dynamic Models
[AUTHORS]
Junlin Wu, Huan Zhang, Yevgeniy Vorobeychik
[LINK]
http://arxiv.org/abs/2405.15994v2
[DATE]
2024-11-16 12:21:50+08:00
[CATEGORIES]
cs.LG
DEBUG-HD: Debugging TinyML models on-device using Hyper-Dimensional computing
[AUTHORS]
Nikhil P Ghanathe, Steven J E Wilton
[ABSTRACT]
TinyML models often operate in remote, dynamic environments without cloud
connectivity, making them prone to failures. Ensuring reliability in such
scenarios requires not only detecting model failures but also identifying their
root causes. However, transient failures, privacy concerns, and the
safety-critical nature of many applications (where systems cannot be interrupted
for debugging) complicate the use of raw sensor data for offline analysis. We
propose DEBUG-HD, a novel, resource-efficient on-device debugging approach
optimized for KB-sized tinyML devices that utilizes hyper-dimensional computing
(HDC). Our method introduces a new HDC encoding technique that leverages
conventional neural networks, allowing DEBUG-HD to outperform prior binary HDC
methods by 27% on average in detecting input corruptions across various image
and audio datasets.
[COMMENTS]
Accepted at the Machine Learning for Systems Workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10692v1
[DATE]
2024-11-16 12:03:22+08:00
[CATEGORIES]
cs.LG
MaskMedPaint: Masked Medical Image Inpainting with Diffusion Models for Mitigation of Spurious Correlations
[AUTHORS]
Qixuan Jin, Walter Gerych, Marzyeh Ghassemi
[ABSTRACT]
Spurious features associated with class labels can lead image classifiers to
rely on shortcuts that don’t generalize well to new domains. This is especially
problematic in medical settings, where biased models fail when applied to
different hospitals or systems. In such cases, data-driven methods to reduce
spurious correlations are preferred, as clinicians can directly validate the
modified images. While Denoising Diffusion Probabilistic Models (Diffusion
Models) show promise for natural images, they are impractical for medical use
due to the difficulty of describing spurious medical features. To address this,
we propose Masked Medical Image Inpainting (MaskMedPaint), which uses
text-to-image diffusion models to augment training images by inpainting areas
outside key classification regions to match the target domain. We demonstrate
that MaskMedPaint enhances generalization to target domains across both natural
(Waterbirds, iWildCam) and medical (ISIC 2018, Chest X-ray) datasets, given
limited unlabeled target images.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 12 pages
[LINK]
http://arxiv.org/abs/2411.10686v1
[DATE]
2024-11-16 11:23:06+08:00
[CATEGORIES]
cs.LG
SAFES: Sequential Privacy and Fairness Enhancing Data Synthesis for Responsible AI
[AUTHORS]
Spencer Giddens, Fang Liu
[ABSTRACT]
As data-driven and AI-based decision making gains widespread adoption in most
disciplines, it is crucial that both data privacy and decision fairness are
appropriately addressed. While differential privacy (DP) provides a robust
framework for guaranteeing privacy and several widely accepted methods have
been proposed for improving fairness, the vast majority of existing literature
treats the two concerns independently. For methods that do consider privacy and
fairness simultaneously, they often only apply to a specific machine learning
task, limiting their generalizability. In response, we introduce SAFES, a
Sequential PrivAcy and Fairness Enhancing data Synthesis procedure that
sequentially combines DP data synthesis with a fairness-aware data
transformation. SAFES allows full control over the privacy-fairness-utility
trade-off via tunable privacy and fairness parameters. We illustrate SAFES by
combining AIM, a graphical model-based DP data synthesizer, with a popular
fairness-aware data pre-processing transformation. Empirical evaluations on the
Adult and COMPAS datasets demonstrate that for reasonable privacy loss,
SAFES-generated synthetic data achieve significantly improved fairness metrics
with relatively low utility loss.
[LINK]
http://arxiv.org/abs/2411.09178v2
[DATE]
2024-11-16 11:13:23+08:00
[CATEGORIES]
cs.LG
Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication
[AUTHORS]
Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, Chenfanfu Jiang
[ABSTRACT]
Existing diffusion-based text-to-3D generation methods primarily focus on
producing visually realistic shapes and appearances, often neglecting the
physical constraints necessary for downstream tasks. Generated models
frequently fail to maintain balance when placed in physics-based simulations or
3D printed. This balance is crucial for satisfying user design intentions in
interactive gaming, embodied AI, and robotics, where stable models are needed
for reliable interaction. Additionally, stable models ensure that 3D-printed
objects, such as figurines for home decoration, can stand on their own without
requiring additional supports. To fill this gap, we introduce Atlas3D, an
automatic and easy-to-implement method that enhances existing Score
Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the
generation of self-supporting 3D models that adhere to physical laws of
stability under gravity, contact, and friction. Our approach combines a novel
differentiable simulation-based loss function with physically inspired
regularization, serving as either a refinement or a post-processing module for
existing frameworks. We verify Atlas3D’s efficacy through extensive generation
tasks and validate the resulting 3D models in both simulated and real-world
environments.
[COMMENTS]
Project Page: https://yunuoch.github.io/Atlas3D/
[LINK]
http://arxiv.org/abs/2405.18515v2
[DATE]
2024-11-16 10:54:28+08:00
[CATEGORIES]
cs.LG
Geometric Deep Learning for Structure-Based Drug Design: A Survey
[AUTHORS]
Zaixi Zhang, Jiaxian Yan, Yining Huang, Qi Liu, Enhong Chen, Mengdi Wang, Marinka Zitnik
[ABSTRACT]
Structure-based drug design (SBDD) leverages the three-dimensional geometry
of proteins to identify potential drug candidates. Traditional approaches,
rooted in physicochemical modeling and domain expertise, are often
resource-intensive. Recent advancements in geometric deep learning, which
effectively integrate and process 3D geometric data, alongside breakthroughs in
accurate protein structure predictions from tools like AlphaFold, have
significantly propelled the field forward. This paper systematically reviews
the state-of-the-art in geometric deep learning for SBDD. We begin by outlining
foundational tasks in SBDD, discussing prevalent 3D protein representations,
and highlighting representative predictive and generative models. Next, we
provide an in-depth review of key tasks, including binding site prediction,
binding pose generation, de novo molecule generation, linker design, protein
pocket generation, and binding affinity prediction. For each task, we present
formal problem definitions, key methods, datasets, evaluation metrics, and
performance benchmarks. Lastly, we explore current challenges and future
opportunities in SBDD. Challenges include oversimplified problem formulations,
limited out-of-distribution generalization, biosecurity concerns related to the
misuse of structural data, insufficient evaluation metrics and large-scale
benchmarks, and the need for experimental validation and enhanced model
interpretability. Opportunities lie in leveraging multimodal datasets,
integrating domain knowledge, developing comprehensive benchmarks, establishing
criteria aligned with clinical outcomes, and designing foundation models to
expand the scope of design tasks. We also curate
https://github.com/zaixizhang/Awesome-SBDD, reflecting ongoing
contributions and new datasets in SBDD.
[COMMENTS]
28 pages, under review
[LINK]
http://arxiv.org/abs/2306.11768v6
[DATE]
2024-11-16 10:51:56+08:00
[CATEGORIES]
cs.LG
Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment
[AUTHORS]
Danqing Ma, Meng Wang, Ao Xiang, Zongqing Qi, Qin Yang
[ABSTRACT]
This study proposes Multitrans, a multi-modal fusion framework based on the
Transformer architecture and self-attention mechanism. The framework
combines non-contrast computed tomography (NCCT) images with
discharge diagnosis reports of patients undergoing stroke treatment, using a
variety of Transformer-based methods to predict
functional outcomes of stroke treatment. The results show that
single-modal text classification performs significantly better than single-modal
image classification, but the multi-modal combination performs better than
any single modality. Although the Transformer model performs worse on
imaging data alone, when combined with clinical meta-diagnostic information, both
modalities contribute complementary information that improves the
prediction of stroke treatment effects.
[LINK]
http://arxiv.org/abs/2404.12634v3
[DATE]
2024-11-16 10:36:32+08:00
[CATEGORIES]
cs.LG
How to Defend Against Large-scale Model Poisoning Attacks in Federated Learning: A Vertical Solution
[AUTHORS]
Jinbo Wang, Ruijin Wang, Fengli Zhang
[ABSTRACT]
Federated learning (FL) is vulnerable to model poisoning attacks due to its
distributed nature. The current defenses start from all user gradients (model
updates) in each communication round and solve for the optimal aggregation
gradients (horizontal solution). This horizontal solution will completely fail
when facing large-scale (>50%) model poisoning attacks. In this work, based on
the key insight that the convergence process of the model is a highly
predictable process, we break away from the traditional horizontal solution of
defense and innovatively transform the problem of solving the optimal
aggregation gradients into a vertical solution problem. We propose VERT, which
uses global communication rounds as the vertical axis, trains a predictor using
historical gradient information to predict user gradients, and compares the
similarity with actual user gradients to precisely and efficiently select the
optimal aggregation gradients. In order to reduce the computational complexity
of VERT, we design a low dimensional vector projector to project the user
gradients to a computationally acceptable length, and then perform subsequent
predictor training and prediction tasks. Exhaustive experiments show that VERT
is efficient and scalable, exhibiting excellent large-scale (>=80%) model
poisoning defense effects under different FL scenarios. In addition, we can
design projectors with different structures for different model structures to
adapt to aggregation servers with different computing power.
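
The selection step described above (compare each user's actual gradient with a predicted one and keep the similar ones) can be sketched as below. The abstract does not name the similarity measure or the selection rule, so cosine similarity, the fixed `threshold`, and the names `cosine_similarity` and `select_clients` are assumptions; the predictor and the low-dimensional projector are omitted, and the gradient values are made up.

```python
import math
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_clients(
    predicted: Dict[str, List[float]],
    actual: Dict[str, List[float]],
    threshold: float,
) -> List[str]:
    """Keep clients whose submitted gradient is close to the predicted one."""
    return [
        client
        for client, gradient in actual.items()
        if cosine_similarity(predicted[client], gradient) >= threshold
    ]


# Made-up gradients: client "b" submits an update opposite to its prediction.
predicted = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
actual = {"a": [0.9, 0.1], "b": [-1.0, 0.0], "c": [0.0, 2.0]}
selected = select_clients(predicted, actual, threshold=0.8)
print(selected)
```

Because each client is judged against its own predicted trajectory rather than against the other clients in the round, this kind of check does not rely on honest clients being the majority.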
[LINK]
http://arxiv.org/abs/2411.10673v1
[DATE]
2024-11-16 10:25:05+08:00
[CATEGORIES]
cs.LG
Enhancing PTSD Outcome Prediction with Ensemble Models in Disaster Contexts
[AUTHORS]
Ayesha Siddiqua, Atib Mohammad Oni, Abu Saleh Musa Miah, Jungpil Shin
[ABSTRACT]
Post-traumatic stress disorder (PTSD) is a significant mental health
challenge that affects individuals exposed to traumatic events. Early detection
and effective intervention for PTSD are crucial, as it can lead to long-term
psychological distress if untreated. Accurate detection of PTSD is essential
for timely and targeted mental health interventions, especially in
disaster-affected populations. Existing research has explored machine learning
approaches for classifying PTSD, but many face limitations in terms of model
performance and generalizability. To address these issues, we implemented a
comprehensive preprocessing pipeline. This included data cleaning, missing
value treatment using the SimpleImputer, label encoding of categorical
variables, data augmentation using SMOTE to balance the dataset, and feature
scaling with StandardScaler. The dataset was split into 80% training and 20%
testing. We developed an ensemble model using a majority voting technique among
several classifiers, including Logistic Regression, Support Vector Machines
(SVM), Random Forest, XGBoost, LightGBM, and a customized Artificial Neural
Network (ANN). The ensemble model achieved an accuracy of 96.76% with a
benchmark dataset, significantly outperforming individual models. The proposed
method’s advantages include improved robustness through the combination of
multiple models, enhanced ability to generalize across diverse data points, and
increased accuracy in detecting PTSD. Additionally, the use of SMOTE for data
augmentation ensured better handling of imbalanced datasets, leading to more
reliable predictions. The proposed approach offers valuable insights for
policymakers and healthcare providers by leveraging predictive analytics to
address mental health issues in vulnerable populations, particularly those
affected by disasters.
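
The majority voting step can be illustrated with a minimal hard-voting sketch. It assumes each classifier has already produced one label per sample; the function name `majority_vote` and the toy predictions are hypothetical and are not the paper's data or models.

```python
from collections import Counter
from typing import List


def majority_vote(predictions_per_model: List[List[int]]) -> List[int]:
    """Hard voting: for each sample, return the label predicted by most models."""
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        # Collect the i-th prediction from every model and count the labels.
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Made-up binary predictions from three classifiers on four samples.
toy_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
ensemble_labels = majority_vote(toy_predictions)
print(ensemble_labels)
```

With an odd number of models and binary labels there are no ties; with an even number, this sketch resolves a tie in favor of the label seen first.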
[LINK]
http://arxiv.org/abs/2411.10661v1
[DATE]
2024-11-16 09:44:43+08:00
[CATEGORIES]
cs.LG
Pluralistic Alignment Over Time
[AUTHORS]
Toryn Q. Klassen, Parand A. Alamdari, Sheila A. McIlraith
[COMMENTS]
Pluralistic Alignment Workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10654v1
[DATE]
2024-11-16 09:23:25+08:00
[CATEGORIES]
cs.LG
Understanding Learning with Sliced-Wasserstein Requires Rethinking Informative Slices
[AUTHORS]
Huy Tran, Yikun Bai, Ashkan Shahbazi, John R. Hershey, Soheil Kolouri
[ABSTRACT]
The practical applications of Wasserstein distances (WDs) are constrained by
their sample and computational complexities. Sliced-Wasserstein distances
(SWDs) provide a workaround by projecting distributions onto one-dimensional
subspaces, leveraging the more efficient, closed-form WDs for one-dimensional
distributions. However, in high dimensions, most random projections become
uninformative due to the concentration of measure phenomenon. Although several
SWD variants have been proposed to focus on informative slices, they
often introduce additional complexity, numerical instability, and compromise
desirable theoretical (metric) properties of SWD. Amidst the growing literature
that focuses on directly modifying the slicing distribution, which often faces
challenges, we revisit the classical Sliced-Wasserstein and propose instead to
rescale the 1D Wasserstein to make all slices equally informative. Importantly,
we show that with an appropriate data assumption and notion of slice
informativeness, rescaling for all individual slices simplifies to a
single global scaling factor on the SWD. This, in turn, translates to the
standard learning rate search for gradient-based learning in common machine
learning workflows. We perform extensive experiments across various machine
learning tasks showing that the classical SWD, when properly configured, can
often match or surpass the performance of more complex variants. We then answer
the following question: “Is Sliced-Wasserstein all you need for common learning
tasks?”
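
As background for the classical SWD discussed above, here is a minimal sketch of the standard Monte Carlo estimator: project both samples onto random unit directions and average the closed-form one-dimensional distances. It assumes equal sample sizes with uniform weights; the function names and defaults are illustrative, and the paper's rescaling is not included.

```python
import math
import random
from typing import List


def wasserstein_1d(xs: List[float], ys: List[float], p: int = 2) -> float:
    """Closed-form p-Wasserstein distance between two equal-size 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches sorted points in order.
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(a - b) ** p for a, b in pairs) / len(xs)) ** (1.0 / p)


def sliced_wasserstein(
    X: List[List[float]],
    Y: List[List[float]],
    n_slices: int = 64,
    p: int = 2,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_slices):
        # Draw a random direction on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in direction))
        direction = [v / norm for v in direction]
        # Project both samples onto the direction and compare in 1D.
        proj_x = [sum(a * b for a, b in zip(x, direction)) for x in X]
        proj_y = [sum(a * b for a, b in zip(y, direction)) for y in Y]
        total += wasserstein_1d(proj_x, proj_y, p) ** p
    return (total / n_slices) ** (1.0 / p)


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 3.0, y] for x, y in points]
print(sliced_wasserstein(points, shifted))
```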
[LINK]
http://arxiv.org/abs/2411.10651v1
[DATE]
2024-11-16 09:18:27+08:00
[CATEGORIES]
cs.LG
Deep Learning-Based Image Compression for Wireless Communications: Impacts on Reliability, Throughput, and Latency
[AUTHORS]
Mostafa Naseri, Pooya Ashtari, Mohamed Seif, Eli De Poorter, H. Vincent Poor, Adnan Shahid
[ABSTRACT]
In wireless communications, efficient image transmission must balance
reliability, throughput, and latency, especially under dynamic channel
conditions. This paper presents an adaptive and progressive pipeline for
learned image compression (LIC)-based architectures tailored to such
environments. We investigate two state-of-the-art learning-based models: the
hyperprior model and Vector Quantized Generative Adversarial Network (VQGAN).
The hyperprior model achieves superior compression performance through lossless
compression in the bottleneck but is susceptible to bit errors, necessitating
the use of error correction or retransmission mechanisms. In contrast, the
VQGAN decoder demonstrates robust image reconstruction capabilities even in the
absence of channel coding, enhancing reliability in challenging transmission
scenarios. We propose progressive versions of both models, enabling partial
image transmission and decoding under imperfect channel conditions. This
progressive approach not only maintains image integrity under poor channel
conditions but also significantly reduces latency by allowing immediate partial
image availability. We evaluate our pipeline using the Kodak high-resolution
image dataset under a Rayleigh fading wireless channel model simulating dynamic
conditions. The results indicate that the progressive transmission framework
enhances reliability and latency while maintaining or improving throughput
compared to non-progressive counterparts across various Signal-to-Noise Ratio
(SNR) levels. Specifically, the progressive-hyperprior model consistently
outperforms others in latency metrics, particularly in the 99.9th percentile
waiting time (a measure indicating the maximum waiting time experienced by 99.9%
of transmission instances) across all SNRs, and achieves higher throughput in
low SNR scenarios where Adaptive WebP fails.
[LINK]
http://arxiv.org/abs/2411.10650v1
[DATE]
2024-11-16 09:14:55+08:00
[CATEGORIES]
cs.LG
Patient-Specific Models of Treatment Effects Explain Heterogeneity in Tuberculosis
[AUTHORS]
Ethan Wu, Caleb Ellington, Ben Lengerich, Eric P. Xing
[ABSTRACT]
Tuberculosis (TB) is a major global health challenge, and is compounded by
co-morbidities such as HIV, diabetes, and anemia, which complicate treatment
outcomes and contribute to heterogeneous patient responses. Traditional models
of TB often overlook this heterogeneity by focusing on broad, pre-defined
patient groups, thereby missing the nuanced effects of individual patient
contexts. We propose moving beyond coarse subgroup analyses by using
contextualized modeling, a multi-task learning approach that encodes patient
context into personalized models of treatment effects, revealing
patient-specific treatment benefits. Applied to the TB Portals dataset with
multi-modal measurements for over 3,000 TB patients, our model reveals
structured interactions between co-morbidities, treatments, and patient
outcomes, identifying anemia, age of onset, and HIV as influential for
treatment efficacy. By enhancing predictive accuracy in heterogeneous
populations and providing patient-specific insights, contextualized models
promise to enable new approaches to personalized treatment.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 4 pages
[LINK]
http://arxiv.org/abs/2411.10645v1
[DATE]
2024-11-16 08:55:24+08:00
[CATEGORIES]
cs.LG
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
[AUTHORS]
Kai Helli, David Schnurr, Noah Hollmann, Samuel Müller, Frank Hutter
[ABSTRACT]
While most ML models expect independent and identically distributed data,
this assumption is often violated in real-world scenarios due to distribution
shifts, resulting in the degradation of machine learning model performance.
Until now, no tabular method has consistently outperformed classical supervised
learning, which ignores these shifts. To address temporal distribution shifts,
we present Drift-Resilient TabPFN, a fresh approach based on In-Context
Learning with a Prior-Data Fitted Network that learns the learning algorithm
itself: it accepts the entire training dataset as input and makes predictions
on the test set in a single forward pass. Specifically, it learns to
approximate Bayesian inference on synthetic datasets drawn from a prior that
specifies the model’s inductive bias. This prior is based on structural causal
models (SCM), which gradually shift over time. To model shifts of these causal
models, we use a secondary SCM, that specifies changes in the primary model
parameters. The resulting Drift-Resilient TabPFN can be applied to unseen data,
runs in seconds on small to moderately sized datasets and needs no
hyperparameter tuning. Comprehensive evaluations across 18 synthetic and
real-world datasets demonstrate large performance improvements over a wide
range of baselines, such as XGB, CatBoost, TabPFN, and applicable methods
featured in the Wild-Time benchmark. Compared to the strongest baselines, it
improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while
maintaining stronger calibration. This approach could serve as significant
groundwork for further research on out-of-distribution prediction.
[COMMENTS]
Accepted at the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2411.10634v1
[DATE]
2024-11-16 07:49:23+08:00
[CATEGORIES]
cs.LG
KAT to KANs: A Review of Kolmogorov-Arnold Networks and the Neural Leap Forward
[AUTHORS]
Divesh Basina, Joseph Raj Vishal, Aarya Choudhary, Bharatesh Chakravarthi
[ABSTRACT]
The curse of dimensionality poses a significant challenge to modern
multilayer perceptron-based architectures, often causing performance stagnation
and scalability issues. Addressing this limitation typically requires vast
amounts of data. In contrast, Kolmogorov-Arnold Networks have gained attention
in the machine learning community for their bold claim of being unaffected by
the curse of dimensionality. This paper explores the Kolmogorov-Arnold
representation theorem and the mathematical principles underlying
Kolmogorov-Arnold Networks, which enable their scalability and high performance
in high-dimensional spaces. We begin with an introduction to foundational
concepts necessary to understand Kolmogorov-Arnold Networks, including
interpolation methods and Basis-splines, which form their mathematical
backbone. This is followed by an overview of perceptron architectures and the
Universal approximation theorem, a key principle guiding modern machine
learning. We then give an overview of the Kolmogorov-Arnold
representation theorem, including its mathematical formulation and implications
for overcoming dimensionality challenges. Next, we review the architecture and
error-scaling properties of Kolmogorov-Arnold Networks, demonstrating how these
networks achieve true freedom from the curse of dimensionality. Finally, we
discuss the practical viability of Kolmogorov-Arnold Networks, highlighting
scenarios where their unique capabilities position them to excel in real-world
applications. This review aims to offer insights into Kolmogorov-Arnold
Networks’ potential to redefine scalability and performance in high-dimensional
learning tasks.
[LINK]
http://arxiv.org/abs/2411.10622v1
[DATE]
2024-11-16 07:02:26+08:00
[CATEGORIES]
cs.LG
Electrical Load Forecasting in Smart Grid: A Personalized Federated Learning Approach
[AUTHORS]
Ratun Rahman, Neeraj Kumar, Dinh C. Nguyen
[ABSTRACT]
Electric load forecasting is essential for power management and stability in
smart grids. This is mainly achieved via advanced metering infrastructure,
where smart meters (SMs) are used to record household energy consumption.
Traditional machine learning (ML) methods are often employed for load
forecasting but require data sharing which raises data privacy concerns.
Federated learning (FL) can address this issue by running distributed ML models
at local SMs without data exchange. However, current FL-based approaches
struggle to achieve efficient load forecasting due to imbalanced data
distribution across heterogeneous SMs. This paper presents a novel personalized
federated learning (PFL) method for load prediction under non-independent and
identically distributed (non-IID) metering data settings. Specifically, we
introduce meta-learning, where the learning rates are manipulated using the
meta-learning idea to maximize the gradient for each client in each global
round. Clients with varying processing capacities, data sizes, and batch sizes
can participate in global model aggregation and improve their local load
forecasting via personalized learning. Simulation results show that our
approach outperforms state-of-the-art ML and FL methods in terms of load
forecasting accuracy.
[COMMENTS]
This paper has been accepted by the IEEE Consumer Communications &
Networking Conference (CCNC), Jan. 2025
[LINK]
http://arxiv.org/abs/2411.10619v1
[DATE]
2024-11-16 06:44:50+08:00
[CATEGORIES]
cs.LG
Attraction-Repulsion Swarming: A Generalized Framework of t-SNE via Force Normalization and Tunable Interactions
[AUTHORS]
Jingcheng Lu, Jeff Calder
[ABSTRACT]
We propose a new method for data visualization based on attraction-repulsion
swarming (ARS) dynamics, which we call ARS visualization. ARS is a generalized
framework that is based on viewing the t-distributed stochastic neighbor
embedding (t-SNE) visualization technique as a swarm of interacting agents
driven by attraction and repulsion. Motivated by recent developments in
swarming, we modify the t-SNE dynamics to include a normalization by the
total influence, which results in better-posed dynamics in which we can
use a data size independent time step (of $h=1$) and a simple iteration,
without the need for the array of optimization tricks employed in t-SNE. ARS
also includes the ability to separately tune the attraction and repulsion
kernels, which gives the user control over the tightness within clusters and
the spacing between them in the visualization.
In contrast with t-SNE, our proposed ARS data visualization method is not
gradient descent on the Kullback-Leibler divergence, and can be viewed solely
as an interacting particle system driven by attraction and repulsion forces. We
provide theoretical results illustrating how the choice of interaction kernel
affects the dynamics, and experimental results to validate our method and
compare to t-SNE on the MNIST and CIFAR-10 data sets.
[LINK]
http://arxiv.org/abs/2411.10617v1
[DATE]
2024-11-16 06:42:11+08:00
[CATEGORIES]
cs.LG
Budget-Aware Sequential Brick Assembly with Efficient Constraint Satisfaction
[AUTHORS]
Seokjun Ahn, Jungtaek Kim, Minsu Cho, Jaesik Park
[ABSTRACT]
We tackle the problem of sequential brick assembly with LEGO bricks to create
combinatorial 3D structures. This problem is challenging since this brick
assembly task encompasses the characteristics of combinatorial optimization
problems. In particular, the number of assemblable structures increases
exponentially as the number of bricks used increases. To solve this problem, we
propose a new method to predict the scores of the next brick position by
employing a U-shaped sparse 3D convolutional neural network. Along with the 3D
convolutional network, a one-initialized brick-sized convolution filter is used
to efficiently validate assembly constraints between bricks without training
itself. By the nature of this one-initialized convolution filter, we can
readily consider several different brick types by benefiting from modern
implementation of convolution operations. To generate a novel structure, we
devise a sampling strategy to determine the next brick position considering the
satisfaction of assembly constraints. Moreover, our method is designed for
either budget-free or budget-aware scenarios, where a budget may confine the
number of bricks and their types. We demonstrate that our method successfully
generates a variety of brick structures and outperforms existing methods with
Bayesian optimization, deep graph generative models, and reinforcement learning.
[COMMENTS]
Accepted for publication in Transactions on Machine Learning Research
(TMLR). Seokjun Ahn and Jungtaek Kim equally contributed
[LINK]
http://arxiv.org/abs/2210.01021v3
[DATE]
2024-11-16 06:42:07+08:00
[CATEGORIES]
cs.LG
To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling
[AUTHORS]
Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Emiliano De Cristofaro, Jamie Hayes
[ABSTRACT]
Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular
method for training machine learning models with formal Differential Privacy
(DP) guarantees. As DP-SGD processes the training data in batches, it uses
Poisson sub-sampling to select batches at each step. However, due to
computational and compatibility benefits, replacing sub-sampling with shuffling
has become common practice. Yet, since tight theoretical guarantees for
shuffling are currently unknown, prior work using shuffling reports DP
guarantees as though Poisson sub-sampling was used.
This prompts the need to verify whether this discrepancy is reflected in a
gap between the theoretical guarantees from state-of-the-art models and the
actual privacy leakage. To do so, we introduce a novel DP auditing procedure to
analyze DP-SGD with shuffling. We show that state-of-the-art DP models trained
with shuffling appreciably overestimated privacy guarantees (up to 4x). In the
process, we assess the impact of several parameters, such as batch size,
privacy budget, and threat model, on privacy leakage. Finally, we study two
variations of the shuffling procedure found in the wild, which result in
further privacy leakage. Overall, our work empirically attests to the risk of
using shuffling instead of Poisson sub-sampling vis-à-vis the actual privacy
leakage of DP-SGD.
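The difference between the two batch-selection schemes contrasted above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the auditing procedure from the paper; the function names and parameter values are hypothetical.

```python
import random


def poisson_batches(n, sample_rate, steps, rng):
    """Poisson sub-sampling: at every step, each example is included
    independently with probability sample_rate, so batch sizes vary."""
    return [[i for i in range(n) if rng.random() < sample_rate]
            for _ in range(steps)]


def shuffled_batches(n, batch_size, rng):
    """Shuffling: permute the dataset once per epoch, then cut it into
    fixed-size batches, so each example appears exactly once per epoch."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
poisson = poisson_batches(n=1000, sample_rate=0.1, steps=10, rng=rng)
shuffled = shuffled_batches(n=1000, batch_size=100, rng=rng)
```

Under shuffling every batch has the same size and the batches partition the dataset, whereas under Poisson sub-sampling an example may appear in several batches or in none.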
[LINK]
http://arxiv.org/abs/2411.10614v1
[DATE]
2024-11-16 06:34:28+08:00
[CATEGORIES]
cs.LG
Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI
[AUTHORS]
Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith
[COMMENTS]
Pluralistic Alignment Workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10613v1
[DATE]
2024-11-16 06:34:09+08:00
[CATEGORIES]
cs.LG
Pooling Image Datasets With Multiple Covariate Shift and Imbalance
[AUTHORS]
Sotirios Panagiotis Chytas, Vishnu Suresh Lokhande, Peiran Li, Vikas Singh
[ABSTRACT]
Small sample sizes are common in many disciplines, which necessitates pooling
roughly similar datasets across multiple institutions to study weak but
relevant associations between images and disease outcomes. Such data often
manifest shift/imbalance in covariates (i.e., secondary non-imaging data).
Controlling for such nuisance variables is common within standard statistical
analysis, but the ideas do not directly apply to overparameterized models.
Consequently, recent work has shown how strategies from invariant
representation learning provide a meaningful starting point, but the current
repertoire of methods is limited to accounting for shifts/imbalances in just a
couple of covariates at a time. In this paper, we show how viewing this problem
from the perspective of Category theory provides a simple and effective
solution that completely avoids elaborate multi-stage training pipelines that
would otherwise be needed. We show the effectiveness of this approach via
extensive experiments on real datasets. Further, we discuss how this style of
formulation offers a unified perspective on at least five distinct problem
settings, from self-supervised learning to matching problems in 3D
reconstruction.
[LINK]
http://arxiv.org/abs/2403.02598v3
[DATE]
2024-11-16 06:28:17+08:00
[CATEGORIES]
cs.LG
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
[AUTHORS]
Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan Celine Lin
[ABSTRACT]
Motivated by the transformative capabilities of large language models (LLMs)
across various natural language tasks, there has been a growing demand to
deploy these models effectively across diverse real-world applications and
platforms. However, the challenge of efficiently deploying LLMs has become
increasingly pronounced due to the varying application-specific performance
requirements and the rapid evolution of computational platforms, which feature
diverse resource constraints and deployment flows. These varying requirements
necessitate LLMs that can adapt their structures (depth and width) for optimal
efficiency across different platforms and application specifications. To
address this critical gap, we propose AmoebaLLM, a novel framework designed to
enable the instant derivation of LLM subnets of arbitrary shapes, which achieve
the accuracy-efficiency frontier and can be extracted immediately after a
one-time fine-tuning. In this way, AmoebaLLM significantly facilitates rapid
deployment tailored to various platforms and applications. Specifically,
AmoebaLLM integrates three innovative components: (1) a knowledge-preserving
subnet selection strategy that features a dynamic-programming approach for
depth shrinking and an importance-driven method for width shrinking; (2) a
shape-aware mixture of LoRAs to mitigate gradient conflicts among subnets
during fine-tuning; and (3) an in-place distillation scheme with loss-magnitude
balancing as the fine-tuning objective. Extensive experiments validate that
AmoebaLLM not only sets new standards in LLM adaptability but also successfully
delivers subnets that achieve state-of-the-art trade-offs between accuracy and
efficiency.
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10606v1
[DATE]
2024-11-16 06:02:28+08:00
[CATEGORIES]
cs.LG
Learning Quantitative Automata Modulo Theories
[AUTHORS]
Eric Hsiung, Swarat Chaudhuri, Joydeep Biswas
[ABSTRACT]
Quantitative automata are useful representations for numerous applications,
including modeling probability distributions over sequences to Markov chains
and reward machines. Actively learning such automata typically occurs using
explicitly gathered input-output examples under adaptations of the L-star
algorithm. However, obtaining explicit input-output pairs can be expensive, and
there exist scenarios, including preference-based learning or learning from
rankings, where providing constraints is a less demanding and more natural way
to concisely describe desired properties. Consequently, we propose the problem
of learning deterministic quantitative automata from sets of constraints over
the valuations of input sequences. We present QUINTIC, an active learning
algorithm, wherein the learner infers a valid automaton through deductive
reasoning, by applying a theory to a set of currently available constraints and
an assumed preference model and quantitative automaton class. QUINTIC performs
a complete search over the space of automata, and is guaranteed to be minimal
and correctly terminate. Our evaluations use the theory of rationals to learn
summation, discounted summation, product, and classification
quantitative automata, and indicate QUINTIC is effective at learning these
types of automata.
[COMMENTS]
30 pages, 13 figures, 1 table
[LINK]
http://arxiv.org/abs/2411.10601v1
[DATE]
2024-11-16 05:51:14+08:00
[CATEGORIES]
cs.LG
FedAli: Personalized Federated Learning with Aligned Prototypes through Optimal Transport
[AUTHORS]
Sannara Ek, Kaile Wang, François Portet, Philippe Lalanda, Jiannong Cao
[ABSTRACT]
Federated Learning (FL) enables collaborative, personalized model training
across multiple devices without sharing raw data, making it ideal for pervasive
computing applications that optimize user-centric performances in diverse
environments. However, data heterogeneity among clients poses a significant
challenge, leading to inconsistencies among trained client models and reduced
performance. To address this, we introduce the Alignment with Prototypes (ALP)
layers, which align incoming embeddings closer to learnable prototypes through
an optimal transport plan. During local training, the ALP layer updates local
prototypes and aligns embeddings toward global prototypes aggregated from all
clients using our novel FL framework, Federated Alignment (FedAli). For model
inferences, embeddings are guided toward local prototypes to better reflect the
client’s local data distribution. We evaluate FedAli on heterogeneous
sensor-based human activity recognition and vision benchmark datasets,
demonstrating that it outperforms existing FL strategies. We publicly release
our source code to facilitate reproducibility and further research.
[COMMENTS]
Pre-print version 1
[LINK]
http://arxiv.org/abs/2411.10595v1
[DATE]
2024-11-16 05:35:21+08:00
[CATEGORIES]
cs.LG
Reducing Reasoning Costs – The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
[AUTHORS]
Libo Wang
[ABSTRACT]
To address the surge in inference cost caused by chain-of-thought reasoning in
large language models, this research proposes a sparse attention mechanism that
focuses on only a few relevant tokens. The researcher constructed a new
attention mechanism and used GiantRabbit, trained with custom GPTs, as an
experimental tool. The experiment compared the reasoning time, correctness
score, and chain-of-thought length of this model and o1 Preview in solving
linear algebra test questions from MIT OpenCourseWare. The results show that
GiantRabbit’s reasoning time and chain-of-thought length are significantly
lower than those of o1 Preview, confirming the feasibility of the sparse
attention mechanism in reducing chain-of-thought reasoning. The architectural
details and experimental process have been uploaded to GitHub:
https://github.com/brucewang123456789/GeniusTrail.git.
[COMMENTS]
The main text is 9 pages, totaling 13 pages; 5 figures, 3 tables;
preprints have been submitted to NeurIPS 2024 Workshop MusIML and OpenReview
[LINK]
http://arxiv.org/abs/2411.09111v2
[DATE]
2024-11-16 05:28:27+08:00
[CATEGORIES]
cs.LG
Diffusion Model with Perceptual Loss
[AUTHORS]
Shanchuan Lin, Xiao Yang
[ABSTRACT]
Diffusion models without guidance tend to generate unrealistic samples, yet
the cause of this problem is not fully studied. Our analysis suggests that the
loss objective plays an important role in shaping the learned distribution and
the common mean squared error loss is not optimal. We hypothesize that a better
loss objective can be designed with inductive biases and propose a novel
self-perceptual loss that utilizes the diffusion model itself as the perceptual
loss. Our work demonstrates that perceptual loss can be used in diffusion
training to improve sample quality effectively. Models trained using our
objective can generate realistic samples without guidance. We hope our work
paves the way for more future explorations of the diffusion loss objective.
[LINK]
http://arxiv.org/abs/2401.00110v6
[DATE]
2024-11-16 05:05:31+08:00
[CATEGORIES]
cs.LG
Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
[AUTHORS]
Zhaolin Ren, Tongzheng Ren, Haitong Ma, Na Li, Bo Dai
[ABSTRACT]
This paper presents an approach, Spectral Dynamics Embedding Control (SDEC),
to optimal control for nonlinear stochastic systems. This method leverages an
infinite-dimensional feature to linearly represent the state-action value
function and exploits finite-dimensional truncation approximation for practical
implementation. To assess the effectiveness of these finite-dimensional
approximations, we provide an in-depth theoretical analysis that characterizes the
approximation error induced by the finite-dimensional truncation and the statistical
error induced by finite-sample approximation in both policy evaluation and
policy optimization. Our analysis includes two prominent kernel approximation
methods: truncations onto random features and Nystrom features. We also
empirically test the algorithm and compare the performance with Koopman-based,
iLQR, and energy-based methods on a few benchmark problems.
[COMMENTS]
Updated authorship list
[LINK]
http://arxiv.org/abs/2304.03907v4
[DATE]
2024-11-16 04:51:08+08:00
[CATEGORIES]
cs.LG
Normative Modeling for AD Diagnosis and Biomarker Identification
[AUTHORS]
Songlin Zhao, Rong Zhou, Yu Zhang, Yong Chen, Lifang He
[ABSTRACT]
In this paper, we introduce a novel normative modeling approach that
incorporates focal loss and adversarial autoencoders (FAAE) for Alzheimer’s
Disease (AD) diagnosis and biomarker identification. Our method is an
end-to-end approach that embeds an adversarial focal loss discriminator within
the autoencoder structure, specifically designed to effectively target and
capture more complex and challenging cases. We first use the enhanced
autoencoder to create a normative model based on data from healthy control (HC)
individuals. We then apply this model to estimate total and regional
neuroanatomical deviation in AD patients. Through extensive experiments on the
OASIS-3 and ADNI datasets, our approach significantly outperforms previous
state-of-the-art methods. This advancement not only streamlines the detection
process but also provides a greater insight into the biomarker potential for
AD. Our code can be found at https://github.com/soz223/FAAE.
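For readers unfamiliar with focal loss, a minimal sketch of the standard binary form is given below. How the paper embeds it in the adversarial discriminator is not stated in the abstract, so the function name, the default `gamma`, and the clamping constant `eps` are illustrative assumptions.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Standard binary focal loss: -(1 - p_t)^gamma * log(p_t), where p_t is
    the probability the model assigns to the true label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct: strongly down-weighted
hard = binary_focal_loss(0.1, 1)  # confident and wrong: close to plain cross-entropy
```

With `gamma=0` the expression reduces to ordinary cross-entropy; larger values of `gamma` shift the training signal toward hard, misclassified cases.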
[COMMENTS]
10 pages, 3 figures
[LINK]
http://arxiv.org/abs/2411.10570v1
[DATE]
2024-11-16 04:45:16+08:00
[CATEGORIES]
cs.LG
Breaking the $T^{2/3}$ Barrier for Sequential Calibration
[AUTHORS]
Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich, Robert Kleinberg, Princewill Okoroafor
[ABSTRACT]
A set of probabilistic forecasts is calibrated if each prediction of the
forecaster closely approximates the empirical distribution of outcomes on the
subset of timesteps where that prediction was made. We study the fundamental
problem of online calibrated forecasting of binary sequences, which was
initially studied by Foster & Vohra (1998). They derived an algorithm with
$O(T^{2/3})$ calibration error after $T$ time steps, and showed a lower bound
of $\Omega(T^{1/2})$. These bounds remained stagnant for two decades, until
Qiao & Valiant (2021) improved the lower bound to $\Omega(T^{0.528})$ by
introducing a combinatorial game called sign preservation and showing that
lower bounds for this game imply lower bounds for calibration.
In this paper, we give the first improvement to the $O(T^{2/3})$ upper bound
on calibration error of Foster & Vohra. We do this by introducing a variant of
Qiao & Valiant’s game that we call sign preservation with reuse (SPR). We prove
that the relationship between SPR and calibrated forecasting is bidirectional:
not only do lower bounds for SPR translate into lower bounds for calibration,
but algorithms for SPR also translate into new algorithms for calibrated
forecasting. We then give an improved upper bound for the SPR game,
which implies, via our equivalence, a forecasting algorithm with calibration
error $O(T^{2/3 - \varepsilon})$ for some $\varepsilon > 0$, improving Foster &
Vohra’s upper bound for the first time. Using similar ideas, we then prove a
slightly stronger lower bound than that of Qiao & Valiant, namely
$\Omega(T^{0.54389})$. Our lower bound is obtained by an oblivious adversary,
marking the first $\omega(T^{1/2})$ calibration lower bound for oblivious
adversaries.
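The notion of calibration described in the first sentence of the abstract can be made concrete with a short sketch. It assumes the $\ell_1$ calibration error commonly used in this line of work, $\sum_p \left| \sum_{t : p_t = p} (p_t - y_t) \right|$; the abstract itself does not spell out the definition, and the function name is hypothetical.

```python
from collections import defaultdict


def calibration_error(predictions, outcomes):
    """Sum, over each distinct predicted value p, of the absolute total
    residual (p - y) on the timesteps where p was predicted."""
    residual = defaultdict(float)
    for p, y in zip(predictions, outcomes):
        residual[p] += p - y
    return sum(abs(r) for r in residual.values())


# Predicting 0.5 on a sequence that is half ones is perfectly calibrated.
balanced = calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
# Predicting 1.0 when the outcome is always 0 is maximally miscalibrated.
wrong = calibration_error([1.0, 1.0], [0, 0])
```

The bounds quoted above, such as $O(T^{2/3})$, describe how this quantity can grow with the number of timesteps $T$ against an adversarially chosen binary sequence.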
[LINK]
http://arxiv.org/abs/2406.13668v3
[DATE]
2024-11-16 04:32:38+08:00
[CATEGORIES]
cs.LG
Keep it Tighter – A Story on Analytical Mean Embeddings
[AUTHORS]
Linda Chamakh, Zoltan Szabo
[ABSTRACT]
Kernel techniques are among the most popular and flexible approaches in data
science, allowing one to represent probability measures without loss of information
under mild conditions. The resulting mapping, called the mean embedding, gives rise
to a divergence measure referred to as maximum mean discrepancy (MMD) with
existing quadratic-time estimators (w.r.t. the sample size) and known
convergence properties for bounded kernels. In this paper we focus on the
problem of MMD estimation when the mean embedding of one of the underlying
distributions is available analytically. Particularly, we consider
distributions on the real line (motivated by financial applications) and prove
tighter concentration for the proposed estimator under this semi-explicit
setting; we also extend the result to the case of an unbounded (exponential)
kernel with minimax-optimal lower bounds. We demonstrate the efficiency of our
approach beyond a synthetic example in three real-world examples relying on
one-dimensional random variables: index replication and calibration on
loss-given-default ratios and on S&P 500 data.
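The contrast between the usual two-sample setting and the semi-explicit setting can be sketched on the real line. The example assumes a Gaussian kernel and a Gaussian target distribution, for which the mean embedding has a closed form; both choices, and the function names, are illustrative and not necessarily the estimator analyzed in the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_two_sample(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 from two samples;
    needs a quadratic number of kernel evaluations."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def mmd2_semi_explicit(xs, mean, std, bandwidth=1.0):
    """MMD^2 between the sample xs and N(mean, std^2), with every term that
    involves the target distribution computed in closed form."""
    s2, v = bandwidth ** 2, std ** 2
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    # Mean embedding of N(mean, std^2) under the Gaussian kernel, evaluated at x.
    embed = sum(math.sqrt(s2 / (s2 + v)) * math.exp(-((x - mean) ** 2) / (2.0 * (s2 + v)))
                for x in xs) / len(xs)
    # E[k(Y, Y')] for independent Y, Y' drawn from N(mean, std^2).
    kqq = math.sqrt(s2 / (s2 + 2.0 * v))
    return kxx + kqq - 2.0 * embed


value = mmd2_semi_explicit([0.0, 1.0], mean=0.0, std=1.0)
```

In the semi-explicit version only the sample-versus-sample term is estimated from data, which is the intuition behind expecting tighter concentration than in the two-sample case.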
[LINK]
http://arxiv.org/abs/2110.09516v2
[DATE]
2024-11-16 04:15:30+08:00
[CATEGORIES]
cs.LG
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models
[AUTHORS]
Jingyi Chen, Ju-Seung Byun, Micha Elsner, Andrew Perrault
[ABSTRACT]
Recent advancements in generative models have sparked significant interest
within the machine learning community. Particularly, diffusion models have
demonstrated remarkable capabilities in synthesizing images and speech. Studies
such as those by Lee et al. (2023), Black et al. (2023), Wang et al. (2023),
and Fan et al. (2024) illustrate that Reinforcement Learning with Human
Feedback (RLHF) can enhance diffusion models for image synthesis. However, due
to architectural differences between these models and those employed in speech
synthesis, it remains uncertain whether RLHF could similarly benefit speech
synthesis models. In this paper, we explore the practical application of RLHF
to diffusion-based text-to-speech synthesis, leveraging the mean opinion score
(MOS) as predicted by the UTokyo-SaruLab MOS prediction system (Saeki et al., 2022)
as a proxy loss. We introduce diffusion model loss-guided RL policy
optimization (DLPO) and compare it against other RLHF approaches, employing the
NISQA speech quality and naturalness assessment model (Mittag et al., 2021) and
human preference experiments for further evaluation. Our results show that RLHF
can enhance diffusion-based text-to-speech synthesis models, and, moreover,
DLPO can better improve diffusion models in generating natural, high-quality
speech audio.
[LINK]
http://arxiv.org/abs/2405.14632v2
[DATE]
2024-11-16 04:10:29+08:00
[CATEGORIES]
cs.LG
Low-Rank Optimal Transport through Factor Relaxation with Latent Coupling
[AUTHORS]
Peter Halmos, Xinhao Liu, Julian Gold, Benjamin J Raphael
[ABSTRACT]
Optimal transport (OT) is a general framework for finding a minimum-cost
transport plan, or coupling, between probability distributions, and has many
applications in machine learning. A key challenge in applying OT to massive
datasets is the quadratic scaling of the coupling matrix with the size of the
dataset. [Forrow et al. 2019] introduced a factored coupling for the
k-Wasserstein barycenter problem, which [Scetbon et al. 2021] adapted to solve
the primal low-rank OT problem. We derive an alternative parameterization of
the low-rank problem based on the latent coupling (LC) factorization
previously introduced by [Lin et al. 2021] generalizing [Forrow et al. 2019].
The LC factorization has multiple advantages for low-rank OT including
decoupling the problem into three OT problems and greater flexibility and
interpretability. We leverage these advantages to derive a new algorithm,
Factor Relaxation with Latent Coupling (FRLC), which uses
coordinate mirror descent to compute the LC factorization. FRLC
handles multiple OT objectives (Wasserstein, Gromov-Wasserstein, Fused
Gromov-Wasserstein), and marginal constraints (balanced, unbalanced, and
semi-relaxed) with linear space complexity. We provide theoretical results on
FRLC, and demonstrate superior performance on diverse applications – including
graph clustering and spatial transcriptomics – while demonstrating its
interpretability.
[COMMENTS]
53 pages, 13 figures, NeurIPS 2024. Comments welcome!
[LINK]
http://arxiv.org/abs/2411.10555v1
[DATE]
2024-11-16 04:07:15+08:00
[CATEGORIES]
cs.LG
Debias your Large Multi-Modal Model at Test-Time with Non-Contrastive Visual Attribute Steering
[AUTHORS]
Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard
[ABSTRACT]
Large Multi-Modal Models (LMMs) have demonstrated impressive capabilities as
general-purpose chatbots that can engage in conversations about a provided
input, such as an image. However, their responses are influenced by societal
biases present in their training datasets, leading to undesirable differences
in how the model responds when presented with images depicting people of
different demographics. In this work, we propose a novel debiasing framework
for LMMs that directly removes biased representations during text generation to
decrease outputs related to protected attributes, or even to avoid representing them
internally. Our proposed method is training-free; given a single image and a
list of target attributes, we can ablate the corresponding representations with
just one step of gradient descent on the image itself. Our experiments show
that not only can we minimize the propensity of LMMs to generate text
related to protected attributes, but we can improve sentiment and even simply
use synthetic data to inform the ablation while retaining language modeling
capabilities on real data such as COCO or FACET. Furthermore, we find the
resulting generations from a debiased LMM exhibit accuracy similar to that of a
baseline biased model, showing that debiasing effects can be achieved without
sacrificing model performance.
[COMMENTS]
10 pages, 3 Figures, 3 Tables. arXiv admin note: text overlap with
arXiv:2410.13976
[LINK]
http://arxiv.org/abs/2411.12590v1
[DATE]
2024-11-16 04:06:09+08:00
[CATEGORIES]
cs.LG
ULTra: Unveiling Latent Token Interpretability in Transformer Based Understanding
[AUTHORS]
Hesam Hosseini, Ghazal Hosseini Mighan, Amirabbas Afzali, Sajjad Amini, Amir Houmansadr
[ABSTRACT]
Transformers have revolutionized Computer Vision (CV) and Natural Language
Processing (NLP) through self-attention mechanisms. However, due to their
complexity, their latent token representations are often difficult to
interpret. We introduce a novel framework that interprets Transformer
embeddings, uncovering meaningful semantic patterns within them. Based on this
framework, we demonstrate that zero-shot unsupervised semantic segmentation can
be performed effectively without any fine-tuning using a model pre-trained for
tasks other than segmentation. Our method reveals the inherent capacity of
Transformer models for understanding input semantics and achieves
state-of-the-art performance in semantic segmentation, outperforming
traditional segmentation models. Specifically, our approach achieves an
accuracy of 67.2% and an mIoU of 32.9% on the COCO-Stuff dataset, as well as
an mIoU of 51.9% on the PASCAL VOC dataset. Additionally, we validate our
interpretability framework on LLMs for text summarization, demonstrating its
broad applicability and robustness.
[LINK]
http://arxiv.org/abs/2411.12589v1
[DATE]
2024-11-16 03:36:50+08:00
[CATEGORIES]
cs.LG
Sm: enhanced localization in Multiple Instance Learning for medical imaging classification
[AUTHORS]
Francisco M. Castro-Macías, Pablo Morales-Álvarez, Yunan Wu, Rafael Molina, Aggelos K. Katsaggelos
[COMMENTS]
24 pages, 14 figures, 2024 Conference on Neural Information
Processing Systems (NeurIPS 2024)
[LINK]
http://arxiv.org/abs/2410.03276v3
[DATE]
2024-11-16 03:24:41+08:00
[CATEGORIES]
cs.LG
An undetectable watermark for generative image models
[AUTHORS]
Sam Gunn, Xuandong Zhao, Dawn Song
[ABSTRACT]
We present the first undetectable watermarking scheme for generative image
models. Undetectability ensures that no efficient adversary can distinguish
between watermarked and un-watermarked images, even after making many adaptive
queries. In particular, an undetectable watermark does not degrade image
quality under any efficiently computable metric. Our scheme works by selecting
the initial latents of a diffusion model using a pseudorandom error-correcting
code (Christ and Gunn, 2024), a strategy which guarantees undetectability and
robustness. We experimentally demonstrate that our watermarks are
quality-preserving and robust using Stable Diffusion 2.1. Our experiments
verify that, in contrast to every prior scheme we tested, our watermark does
not degrade image quality. Our experiments also demonstrate robustness:
existing watermark removal attacks fail to remove our watermark from images
without significantly degrading the quality of the images. Finally, we find
that we can robustly encode 512 bits in our watermark, and up to 2500 bits when
the images are not subjected to watermark removal attacks. Our code is
available at https://github.com/XuandongZhao/PRC-Watermark.
[LINK]
http://arxiv.org/abs/2410.07369v2
[DATE]
2024-11-16 03:20:37+08:00
[CATEGORIES]
cs.LG
MARS: Unleashing the Power of Variance Reduction for Training Large Models
[AUTHORS]
Huizhuo Yuan, Yifeng Liu, Shuang Wu, Xun Zhou, Quanquan Gu
[ABSTRACT]
Training deep neural networks–and more recently, large models–demands
efficient and scalable optimizers. Adaptive gradient algorithms like Adam,
AdamW, and their variants have been central to this task. Despite the
development of numerous variance reduction algorithms in the past decade aimed
at accelerating stochastic optimization in both convex and nonconvex settings,
variance reduction has not found widespread success in training deep neural
networks or large language models. Consequently, it has remained a less favored
approach in modern AI. In this paper, to unleash the power of variance
reduction for efficient training of large models, we propose a unified
optimization framework, MARS (Make vAriance Reduction Shine), which reconciles
preconditioned gradient methods with variance reduction via a scaled stochastic
recursive momentum technique. Within our framework, we introduce three
instances of MARS that leverage preconditioned gradient updates based on AdamW,
Lion, and Shampoo, respectively. We also draw a connection between our
algorithms and existing optimizers. Experimental results on training GPT-2
models indicate that MARS consistently outperforms AdamW by a large margin.
[COMMENTS]
23 pages, 7 figures, 6 tables
[LINK]
http://arxiv.org/abs/2411.10438v1
[DATE]
2024-11-16 02:57:39+08:00
[CATEGORIES]
cs.LG
Learning Diffusion Priors from Observations by Expectation Maximization
[AUTHORS]
François Rozet, Gérôme Andry, François Lanusse, Gilles Louppe
[ABSTRACT]
Diffusion models recently proved to be remarkable priors for Bayesian inverse
problems. However, training these models typically requires access to large
amounts of clean data, which could prove difficult in some settings. In this
work, we present a novel method based on the expectation-maximization algorithm
for training diffusion models from incomplete and noisy observations only.
Unlike previous works, our method leads to proper diffusion models, which is
crucial for downstream tasks. As part of our method, we propose and motivate an
improved posterior sampling scheme for unconditional diffusion models. We
present empirical evidence supporting the effectiveness of our method.
[LINK]
http://arxiv.org/abs/2405.13712v4
[DATE]
2024-11-16 02:57:14+08:00
[CATEGORIES]
cs.LG
The Spatial Complexity of Optical Computing and How to Reduce It
[AUTHORS]
Yandong Li, Francesco Monticone
[ABSTRACT]
Similar to algorithms, which consume time and memory to run, hardware
requires resources to function. For devices processing physical waves,
implementing operations needs sufficient “space,” as dictated by wave physics.
How much space is needed to perform a certain function is a fundamental
question in optics, with recent research addressing it for given mathematical
operations, but not for more general computing tasks, e.g., classification.
Inspired by computational complexity theory, we study the “spatial complexity”
of optical computing systems in terms of scaling laws - specifically, how their
physical dimensions must scale as the dimension of the mathematical operation
increases - and propose a new paradigm for designing optical computing systems:
space-efficient neuromorphic optics, based on structural sparsity constraints
and neural pruning methods motivated by wave physics (notably, the concept of
“overlapping nonlocality”). On two mainstream platforms, free-space optics and
on-chip integrated photonics, our methods demonstrate substantial size
reductions (to 1%-10% the size of conventional designs) with minimal compromise
on performance. Our theoretical and computational results reveal a trend of
diminishing returns on accuracy as structure dimensions increase, providing a
new perspective for interpreting and approaching the ultimate limits of optical
computing - a balanced trade-off between device size and accuracy.
[LINK]
http://arxiv.org/abs/2411.10435v1
[DATE]
2024-11-16 02:56:00+08:00
[CATEGORIES]
cs.LG
Private Counterfactual Retrieval With Immutable Features
[AUTHORS]
Shreya Meel, Pasan Dissanayake, Mohamed Nomeir, Sanghamitra Dutta, Sennur Ulukus
[ABSTRACT]
In a classification task, counterfactual explanations provide the minimum
change needed for an input to be classified into a favorable class. We consider
the problem of privately retrieving the exact closest counterfactual from a
database of accepted samples while enforcing that certain features of the input
sample cannot be changed, i.e., they are immutable. An applicant (user)
whose feature vector is rejected by a machine learning model wants to retrieve
the sample closest to them in the database without altering a private subset of
their features, which constitutes the immutable set. While doing this, the user
should keep their feature vector, immutable set and the resulting
counterfactual index information-theoretically private from the institution. We
refer to this as the immutable private counterfactual retrieval (I-PCR) problem,
which generalizes PCR to a more practical setting. In this paper, we propose
two I-PCR schemes by leveraging techniques from private information retrieval
(PIR) and characterize their communication costs. Further, we quantify the
information that the user learns about the database and compare it for the
proposed schemes.
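The underlying retrieval task, with the privacy requirement set aside, can be sketched as a plain nearest-neighbor search restricted to samples that agree with the user on the immutable features. This is a non-private baseline for illustration only, not one of the proposed I-PCR schemes; the function name and the use of squared Euclidean distance are assumptions.

```python
def closest_counterfactual(x, accepted, immutable):
    """Return the index of the accepted sample closest to x among those that
    match x on every immutable feature index, or None if there is none."""
    best_index, best_dist = None, float("inf")
    for idx, sample in enumerate(accepted):
        if any(sample[j] != x[j] for j in immutable):
            continue  # this sample would require changing an immutable feature
        dist = sum((a - b) ** 2 for a, b in zip(sample, x))
        if dist < best_dist:
            best_index, best_dist = idx, dist
    return best_index


rejected = [1, 0, 5]
database = [[1, 0, 9], [2, 0, 5], [1, 0, 6]]
index = closest_counterfactual(rejected, database, immutable={0})
```

In the private setting, the institution holding the database must learn neither the rejected feature vector, nor the immutable set, nor the returned index, which this baseline reveals in full.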
[LINK]
http://arxiv.org/abs/2411.10429v1
[DATE]
2024-11-16 02:50:53+08:00
[CATEGORIES]
cs.LG
Back to Supervision: Boosting Word Boundary Detection through Frame Classification
[AUTHORS]
Simone Carnemolla, Salvatore Calcagno, Simone Palazzo, Daniela Giordano
[ABSTRACT]
Speech segmentation at both word and phoneme levels is crucial for various
speech processing tasks. It significantly aids in extracting meaningful units
from an utterance, thus enabling the generation of discrete elements. In this
work we propose a model-agnostic framework to perform word boundary detection
in a supervised manner, also employing a label augmentation technique and an
output-frame selection strategy. We trained and tested on the Buckeye dataset
and only tested on the TIMIT dataset, using state-of-the-art encoder models, including
pre-trained solutions (Wav2Vec 2.0 and HuBERT), as well as convolutional and
convolutional recurrent networks. Our method, with the HuBERT encoder,
surpasses the performance of other state-of-the-art architectures, whether
trained in supervised or self-supervised settings on the same datasets.
Specifically, we achieved F-values of 0.8427 on the Buckeye dataset and 0.7436
on the TIMIT dataset, along with R-values of 0.8489 and 0.7807, respectively.
These results establish a new state-of-the-art for both datasets. Beyond the
immediate task, our approach offers a robust and efficient preprocessing method
for future research in audio tokenization.
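The F-value and R-value quoted above are standard boundary-detection scores. A minimal sketch of how they are typically computed from boundary counts follows, assuming the commonly used R-value definition based on recall and over-segmentation; the function name and count arguments are hypothetical, and the tolerance-window matching that produces the hit count is not shown.

```python
import math


def boundary_scores(n_hit, n_predicted, n_reference):
    """Compute (F-value, R-value) from the number of correctly detected
    boundaries, predicted boundaries, and reference boundaries.
    Assumes n_hit > 0 so that precision and recall are non-zero."""
    precision = n_hit / n_predicted
    recall = n_hit / n_reference
    f_value = 2.0 * precision * recall / (precision + recall)
    over_segmentation = recall / precision - 1.0
    r1 = math.sqrt((1.0 - recall) ** 2 + over_segmentation ** 2)
    r2 = (-over_segmentation + recall - 1.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 2.0
    return f_value, r_value


f_value, r_value = boundary_scores(n_hit=80, n_predicted=100, n_reference=100)
```

Unlike the F-value, the R-value penalizes over-segmentation explicitly, so a detector cannot raise its score simply by predicting more boundaries.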
[LINK]
http://arxiv.org/abs/2411.10423v1
[DATE]
2024-11-16 02:43:29+08:00
[CATEGORIES]
cs.LG
Multiscale Dubuc: A New Similarity Measure for Time Series
[AUTHORS]
Mahsa Khazaei, Azim Ahmadzadeh, Krishna Rukmini Puthucode
[ABSTRACT]
Quantifying similarities between time series in a meaningful way remains a
challenge in time series analysis, despite many advances in the field. Most
real-world solutions still rely on a few popular measures, such as Euclidean
Distance (EuD), Longest Common Subsequence (LCSS), and Dynamic Time Warping
(DTW). The strengths and weaknesses of these measures have been studied
extensively, and incremental improvements have been proposed. In this study,
however, we present a different similarity measure that fuses the notion of
Dubuc’s variation from fractal analysis with the Intersection-over-Union (IoU)
measure which is widely used in object recognition (also known as the Jaccard
Index). In this proof-of-concept paper, we introduce the Multiscale Dubuc
Distance (MDD) measure and prove that it is a metric, possessing desirable
properties such as the triangle inequality. We use 95 datasets from the UCR
Time Series Classification Archive to compare MDD’s performance with EuD, LCSS,
and DTW. Our experiments show that MDD’s overall success, without any
case-specific customization, is comparable to DTW with optimized window sizes
per dataset. We also highlight several datasets where MDD’s performance
improves significantly when its single parameter is customized. This
customization serves as a powerful tool for gauging MDD’s sensitivity to noise.
Lastly, we show that MDD’s running time is linear in the length of the time
series, which is crucial for real-world applications involving very large
datasets.
[COMMENTS]
6 pages, 3 figures, IEEE Big Data 2024
[LINK]
http://arxiv.org/abs/2411.10418v1
[DATE]
2024-11-16 02:38:18+08:00
[CATEGORIES]
cs.LG
Demo: Multi-Modal Seizure Prediction System
[AUTHORS]
Ali Saeizadeh, Pietro Brach del Prever, Douglas Schonholtz, Raffaele Guida, Emrecan Demirors, Jorge M. Jimenez, Pedram Johari, Tommaso Melodia
[ABSTRACT]
This demo presents SeizNet, an innovative system for predicting epileptic
seizures benefiting from a multi-modal sensor network and utilizing Deep
Learning (DL) techniques. Epilepsy affects approximately 65 million people
worldwide, many of whom experience drug-resistant seizures. SeizNet aims at
providing highly accurate alerts, allowing individuals to take preventive
measures without being disturbed by false alarms. SeizNet uses a combination of
data collected through either invasive (intracranial electroencephalogram
(iEEG)) or non-invasive (electroencephalogram (EEG) and electrocardiogram
(ECG)) sensors, and processed by advanced DL algorithms that are optimized for
real-time inference at the edge, ensuring privacy and minimizing data
transmission. SeizNet achieves > 97% accuracy in seizure prediction while
keeping the size and energy restrictions of an implantable device.
[COMMENTS]
1 page, 1 figure, Proceedings of the IEEE 20th International
Conference on Body Sensor Networks (BSN), October 2024
[LINK]
http://arxiv.org/abs/2411.05817v2
[DATE]
2024-11-16 02:36:30+08:00
[CATEGORIES]
cs.LG
Coniferest: a complete active anomaly detection framework
[AUTHORS]
M. V. Kornilov, V. S. Korolev, K. L. Malanchev, A. D. Lavrukhina, E. Russeil, T. A. Semenikhin, E. Gangler, E. E. O. Ishida, M. V. Pruzhinskaya, A. A. Volnova, S. Sreejith
[ABSTRACT]
We present coniferest, an open-source, general-purpose active anomaly
detection framework written in Python. The package design and implemented
algorithms are described. Currently, static outlier detection analysis is
supported via the Isolation forest algorithm. Moreover, Active Anomaly
Discovery (AAD) and Pineforest algorithms are available to tackle active
anomaly detection problems. The algorithms and package performance are
evaluated on a series of synthetic datasets. We also describe a few success
cases which resulted from applying the package to real astronomical data in
active anomaly detection tasks within the SNAD project.
[COMMENTS]
13 pages, 1 figure
[LINK]
http://arxiv.org/abs/2410.17142v2
[DATE]
2024-11-16 02:02:00+08:00
[CATEGORIES]
cs.LG
Recurrent Neural Goodness-of-Fit Test for Time Series
[AUTHORS]
Aoran Zhang, Wenbin Zhou, Liyan Xie, Shixiang Zhu
[ABSTRACT]
Time series data are crucial across diverse domains such as finance and
healthcare, where accurate forecasting and decision-making rely on advanced
modeling techniques. While generative models have shown great promise in
capturing the intricate dynamics inherent in time series, evaluating their
performance remains a major challenge. Traditional evaluation metrics fall
short due to the temporal dependencies and potential high dimensionality of the
features. In this paper, we propose the REcurrent NeurAL (RENAL)
Goodness-of-Fit test, a novel and statistically rigorous framework for
evaluating generative time series models. By leveraging recurrent neural
networks, we transform the time series into conditionally independent data
pairs, enabling the application of a chi-square-based goodness-of-fit test to
the temporal dependencies within the data. This approach offers a robust,
theoretically grounded solution for assessing the quality of generative models,
particularly in settings with limited time sequences. We demonstrate the
efficacy of our method across both synthetic and real-world datasets,
outperforming existing methods in terms of reliability and accuracy. Our method
fills a critical gap in the evaluation of time series generative models,
offering a tool that is both practical and adaptable to high-stakes
applications.
[COMMENTS]
27 pages, 4 figures
[LINK]
http://arxiv.org/abs/2410.13986v3
[DATE]
2024-11-16 01:58:35+08:00
[CATEGORIES]
cs.LG
Deep Learning for Micro-Scale Crack Detection on Imbalanced Datasets Using Key Point Localization
[AUTHORS]
Fatahlla Moreh, Yusuf Hasan, Bilal Zahid Hussain, Mohammad Ammar, Sven Tomforde
[ABSTRACT]
Internal crack detection has been a subject of focus in structural health
monitoring. By focusing on crack detection in structural datasets, it is
demonstrated that deep learning (DL) methods can effectively analyze seismic
wave fields interacting with micro-scale cracks, which are beyond the
resolution of conventional visual inspection. This work explores a novel
application of DL-based key point detection technique, where cracks are
localized by predicting the coordinates of four key points that define a
bounding region of the crack. The study not only opens new research directions
for non-visual applications but also effectively mitigates the impact of
imbalanced data which poses a challenge for previous DL models, as it can be
biased toward predicting the majority class (non-crack regions). Popular DL
techniques, such as the Inception blocks, are used and investigated. The model
shows an overall reduction in loss when applied to micro-scale crack detection,
which is reflected in a lower average deviation between the locations of actual
and predicted cracks, with an average Intersection over Union (IoU) being 0.511
for all micro cracks (greater than 0.00 micrometers) and 0.631 for larger micro
cracks (greater than 4 micrometers).
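
As a rough illustration of the IoU figures quoted above, the sketch below computes Intersection over Union for two regions. It assumes axis-aligned rectangles given as (x_min, y_min, x_max, y_max); the abstract only says that four key points define a bounding region, so the box format and the function name are assumptions rather than the authors' implementation.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a combined area of 7, so their IoU is 1/7.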
[LINK]
http://arxiv.org/abs/2411.10389v1
[DATE]
2024-11-16 01:50:46+08:00
[CATEGORIES]
cs.LG
Low-Latency Task-Oriented Communications with Multi-Round, Multi-Task Deep Learning
[AUTHORS]
Yalin E. Sagduyu, Tugba Erpek, Aylin Yener, Sennur Ulukus
[ABSTRACT]
In this paper, we address task-oriented (or goal-oriented) communications
where an encoder at the transmitter learns compressed latent representations of
data, which are then transmitted over a wireless channel. At the receiver, a
decoder performs a machine learning task, specifically for classifying the
received signals. The deep neural networks corresponding to the encoder-decoder
pair are jointly trained, taking both channel and data characteristics into
account. Our objective is to achieve high accuracy in completing the underlying
task while minimizing the number of channel uses determined by the encoder’s
output size. To this end, we propose a multi-round, multi-task learning (MRMTL)
approach for the dynamic update of channel uses in multi-round transmissions.
The transmitter incrementally sends an increasing number of encoded samples
over the channel based on the feedback from the receiver, and the receiver
utilizes the signals from a previous round to enhance the task performance,
rather than only considering the latest transmission. This approach employs
multi-task learning to jointly optimize accuracy across varying numbers of
channel uses, treating each configuration as a distinct task. By evaluating the
confidence of the receiver in task decisions, MRMTL decides on whether to
allocate additional channel uses in multiple rounds. We characterize both the
accuracy and the delay (total number of channel uses) of MRMTL, demonstrating
that it achieves accuracy close to that of conventional methods requiring
large numbers of channel uses, but with reduced delay by incorporating signals
from a prior round. We consider the CIFAR-10 dataset, convolutional neural
network architectures, and AWGN and Rayleigh channel models for performance
evaluation. We show that MRMTL significantly improves the efficiency of
task-oriented communications, balancing accuracy and latency effectively.
[LINK]
http://arxiv.org/abs/2411.10385v1
[DATE]
2024-11-16 01:48:06+08:00
[CATEGORIES]
cs.LG
Continual Adversarial Reinforcement Learning (CARL) of False Data Injection detection: forgetting and explainability
[AUTHORS]
Pooja Aslami, Kejun Chen, Timothy M. Hansen, Malik Hassanaly
[ABSTRACT]
False data injection attacks (FDIAs) on smart inverters are a growing concern
linked to increased renewable energy production. While data-based FDIA
detection methods are also actively developed, we show that they remain
vulnerable to impactful and stealthy adversarial examples that can be crafted
using Reinforcement Learning (RL). We propose to include such adversarial
examples in data-based detection training procedure via a continual adversarial
RL (CARL) approach. This way, one can pinpoint the deficiencies of data-based
detection, thereby offering explainability during their incremental
improvement. We show that a continual learning implementation is subject to
catastrophic forgetting, and additionally show that forgetting can be addressed
by employing a joint training strategy on all generated FDIA scenarios.
[LINK]
http://arxiv.org/abs/2411.10367v1
[DATE]
2024-11-16 01:17:06+08:00
[CATEGORIES]
cs.LG
Weakly-Supervised Multimodal Learning on MIMIC-CXR
[AUTHORS]
Andrea Agostini, Daphné Chopard, Yang Meng, Norbert Fortin, Babak Shahbaba, Stephan Mandt, Thomas M. Sutter, Julia E. Vogt
[ABSTRACT]
Multimodal data integration and label scarcity pose significant challenges
for machine learning in medical settings. To address these issues, we conduct
an in-depth evaluation of the newly proposed Multimodal Variational
Mixture-of-Experts (MMVM) VAE on the challenging MIMIC-CXR dataset. Our
analysis demonstrates that the MMVM VAE consistently outperforms other
multimodal VAEs and fully supervised approaches, highlighting its strong
potential for real-world medical applications.
[COMMENTS]
Findings paper presented at Machine Learning for Health (ML4H)
symposium 2024, December 15-16, 2024, Vancouver, Canada, 13 pages. arXiv
admin note: text overlap with arXiv:2403.05300
[LINK]
http://arxiv.org/abs/2411.10356v1
[DATE]
2024-11-16 01:05:33+08:00
[CATEGORIES]
cs.LG
Training Deep 3D Convolutional Neural Networks to Extract BSM Physics Parameters Directly from HEP Data: a Proof-of-Concept Study Using Monte Carlo Simulations
[AUTHORS]
S. Dubey, T. E. Browder, S. Kohani, R. Mandal, A. Sibidanov, R. Sinha
[ABSTRACT]
We report on a novel application of computer vision techniques to extract
beyond the Standard Model parameters directly from high energy physics flavor
data. We propose a simple but novel data representation that transforms the
angular and kinematic distributions into “quasi-images”, which are used to
train a convolutional neural network to perform regression tasks, similar to
fitting. As a proof-of-concept, we train a 34-layer Residual Neural Network to
regress on these images and determine information about the Wilson Coefficient
$C_{9}$ in Monte Carlo simulations of $B^0 \rightarrow K^{*0}\mu^{+}\mu^{-}$
decays. The method described here can be generalized and may find applicability
across a variety of experiments.
[LINK]
http://arxiv.org/abs/2311.13060v3
[DATE]
2024-11-16 00:55:47+08:00
[CATEGORIES]
cs.LG
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
[AUTHORS]
Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana
[ABSTRACT]
Diffusion Transformers (DiT) have emerged as powerful generative models for
various tasks, including image, video, and speech synthesis. However, their
inference process remains computationally expensive due to the repeated
evaluation of resource-intensive attention and feed-forward modules. To address
this, we introduce SmoothCache, a model-agnostic inference acceleration
technique for DiT architectures. SmoothCache leverages the observed high
similarity between layer outputs across adjacent diffusion timesteps. By
analyzing layer-wise representation errors from a small calibration set,
SmoothCache adaptively caches and reuses key features during inference. Our
experiments demonstrate that SmoothCache achieves 8% to 71% speed up while
maintaining or even improving generation quality across diverse modalities. We
showcase its effectiveness on DiT-XL for image generation, Open-Sora for
text-to-video, and Stable Audio Open for text-to-audio, highlighting its
potential to enable real-time applications and broaden the accessibility of
powerful DiT models.
[COMMENTS]
Code can be found at https://github.com/Roblox/SmoothCache
[LINK]
http://arxiv.org/abs/2411.10510v1
[DATE]
2024-11-16 00:24:02+08:00
[CATEGORIES]
cs.LG
Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives
[AUTHORS]
Vincent Hanke, Tom Blanchard, Franziska Boenisch, Iyiola Emmanuel Olatunji, Michael Backes, Adam Dziedzic
[ABSTRACT]
While open Large Language Models (LLMs) have made significant progress, they
still fall short of matching the performance of their closed, proprietary
counterparts, making the latter attractive even for use on highly private
data. Recently, various new methods have been proposed to adapt closed LLMs to
private data without leaking private information to third parties and/or the
LLM provider. In this work, we analyze the privacy protection and performance
of the four most recent methods for private adaptation of closed LLMs. By
examining their threat models and thoroughly comparing their performance under
different privacy levels according to differential privacy (DP), various LLM
architectures, and multiple datasets for classification and generation tasks,
we find that: (1) all the methods leak query data, i.e., the (potentially
sensitive) user data that is queried at inference time, to the LLM provider,
(2) three out of four methods also leak large fractions of private training
data to the LLM provider while the method that protects private data requires a
local open LLM, (3) all the methods exhibit lower performance compared to three
private gradient-based adaptation methods for local open LLMs, and (4) the
private adaptation methods for closed LLMs incur higher monetary training and
query costs than running the alternative methods on local open LLMs. This
yields the conclusion that, to achieve truly privacy-preserving LLM adaptations
that yield high performance and more privacy at lower costs, taking into
account current methods and models, one should use open LLMs.
[COMMENTS]
Accepted at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.05818v2
[DATE]
2024-11-16 00:23:17+08:00
[CATEGORIES]
cs.LG
CE-SSL: Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection
[AUTHORS]
Rushuang Zhou, Lei Clifton, Zijun Liu, Kannie W. Y. Chan, David A. Clifton, Yuan-Ting Zhang, Yining Dong
[ABSTRACT]
The label scarcity problem is the main challenge that hinders the wide
application of deep learning systems in automatic cardiovascular diseases
(CVDs) detection using electrocardiography (ECG). Tuning pre-trained models
alleviates this problem by transferring knowledge learned from large datasets
to downstream small datasets. However, bottlenecks in computational efficiency
and detection performance limit its clinical applications. It is difficult to
improve the detection performance without significantly sacrificing the
computational efficiency during model training. Here, we propose a
computation-efficient semi-supervised learning paradigm (CE-SSL) for robust and
computation-efficient CVDs detection using ECG. It enables a robust adaptation
of pre-trained models on downstream datasets with limited supervision and high
computational efficiency. First, a random-deactivation technique is developed
to achieve robust and fast low-rank adaptation of pre-trained weights.
Subsequently, we propose a one-shot rank allocation module to determine the
optimal ranks for the update matrices of the pre-trained weights. Finally, a
lightweight semi-supervised learning pipeline is introduced to enhance model
performance by leveraging labeled and unlabeled data with high computational
efficiency. Extensive experiments on four downstream datasets demonstrate that
CE-SSL not only outperforms the state-of-the-art methods in multi-label CVDs
detection but also requires less GPU memory, training time, and parameter
storage space. As such, this paradigm provides an effective solution for
achieving high computational efficiency and robust detection performance in the
clinical applications of pre-trained models under limited supervision. Code and
Supplementary Materials are available at https://github.com/KAZABANA/CE-SSL
[LINK]
http://arxiv.org/abs/2406.14377v2
[DATE]
2024-11-16 00:23:15+08:00
[CATEGORIES]
cs.LG
Label Cluster Chains for Multi-Label Classification
[AUTHORS]
Elaine Cecília Gatto, Felipe Nakano Kenji, Jesse Read, Mauri Ferrandin, Ricardo Cerri, Celine Vens
[ABSTRACT]
Multi-label classification is a type of supervised machine learning that can
simultaneously assign multiple labels to an instance. To solve this task, some
methods divide the original problem into several sub-problems (local approach),
others learn all labels at once (global approach), and others combine several
classifiers (ensemble approach). Regardless of the approach used, exploring and
learning label correlations is important to improve the classifier predictions.
Ensemble of Classifier Chains (ECC) is a well-known multi-label method that
considers label correlations and can achieve good overall performance on
several multi-label datasets and evaluation measures. However, one of the
challenges when working with ECC is the high dimensionality of the label space,
which can impose limitations for fully-cascaded chains as the complexity
increases regarding feature space expansion. To improve classifier chains, we
propose a method to chain disjoint correlated label clusters obtained by
applying a partition method in the label space. During the training phase, the
ground truth labels of each cluster are used as new features for all of the
following clusters. During the test phase, the predicted labels of clusters are
used as new features for all the following clusters. Our proposal, called Label
Cluster Chains for Multi-Label Classification (LCC-ML), uses multi-label Random
Forests as base classifiers in each cluster, combining their predictions to
obtain a final multi-label classification. Our proposal obtained better results
compared to the original ECC. This shows that learning and chaining disjoint
correlated label clusters can better explore and learn label correlations.
[COMMENTS]
The article was submitted prematurely, and after it was published on
arXiv, we identified aspects that require attention, adjustments, and
improvements. We are working to review and significantly improve the content.
Therefore, we request its temporary withdrawal to avoid the dissemination of
information that may be incomplete or incorrectly interpreted
[LINK]
http://arxiv.org/abs/2411.00514v2
[DATE]
2024-11-16 00:15:46+08:00
[CATEGORIES]
cs.LG
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples
[AUTHORS]
Noël Vouitsis, Rasa Hosseinzadeh, Brendan Leigh Ross, Valentin Villecroze, Satya Krishna Gorti, Jesse C. Cresswell, Gabriel Loaiza-Ganem
[ABSTRACT]
Although diffusion models can generate remarkably high-quality samples, they
are intrinsically bottlenecked by their expensive iterative sampling procedure.
Consistency models (CMs) have recently emerged as a promising diffusion model
distillation method, reducing the cost of sampling by generating high-fidelity
samples in just a few iterations. Consistency model distillation aims to solve
the probability flow ordinary differential equation (ODE) defined by an
existing diffusion model. CMs are not directly trained to minimize error
against an ODE solver, rather they use a more computationally tractable
objective. As a way to study how effectively CMs solve the probability flow
ODE, and the effect that any induced error has on the quality of generated
samples, we introduce Direct CMs, which directly minimize this error.
Intriguingly, we find that Direct CMs reduce the ODE solving error compared to
CMs but also result in significantly worse sample quality, calling into
question why exactly CMs work well in the first place. Full code is available
at: https://github.com/layer6ai-labs/direct-cms.
[COMMENTS]
NeurIPS 2024 ATTRIB Workshop
[LINK]
http://arxiv.org/abs/2411.08954v2
[DATE]
2024-11-16 00:06:23+08:00
[CATEGORIES]
cs.LG
ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images
[AUTHORS]
Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Gaowen Liu, Chris Xiaoxuan Lu
[ABSTRACT]
Designing egocentric 3D hand pose estimation systems that can perform
reliably in complex, real-world scenarios is crucial for downstream
applications. Previous approaches using RGB or NIR imagery struggle in
challenging conditions: RGB methods are susceptible to lighting variations and
obstructions like handwear, while NIR techniques can be disrupted by sunlight
or interference from other NIR-equipped devices. To address these limitations,
we present ThermoHands, the first benchmark focused on thermal image-based
egocentric 3D hand pose estimation, demonstrating the potential of thermal
imaging to achieve robust performance under these conditions. The benchmark
includes a multi-view and multi-spectral dataset collected from 28 subjects
performing hand-object and hand-virtual interactions under diverse scenarios,
accurately annotated with 3D hand poses through an automated process. We
introduce a new baseline method, TherFormer, utilizing dual transformer modules
for effective egocentric 3D hand pose estimation in thermal imagery. Our
experimental results highlight TherFormer’s leading performance and affirm
thermal imaging’s effectiveness in enabling robust 3D hand pose estimation in
adverse conditions.
[COMMENTS]
15 pages, 9 figures, 6 tables
[LINK]
http://arxiv.org/abs/2403.09871v4
[DATE]
2024-11-16 00:01:39+08:00
[CATEGORIES]
cs.LG
Scaling Law for Post-training after Model Pruning
[AUTHORS]
Xiaodong Chen, Yuxuan Hu, Jing Zhang, Xiaokang Zhang, Cuiping Li, Hong Chen
[ABSTRACT]
Large language models (LLMs) based on the Transformer architecture are widely
employed across various domains and tasks. However, their increasing size
imposes significant hardware demands, limiting practical deployment. To
mitigate this, model pruning techniques have been developed to create more
efficient models while maintaining high performance. Despite this,
post-training after pruning is crucial for performance recovery and can be
resource-intensive. This paper investigates the post-training requirements of
pruned LLMs and introduces a scaling law to determine the optimal amount of
post-training data. Post-training experiments with the Llama-3 and Qwen-2.5
series models, pruned using depth pruning, width pruning, and 2:4
semi-structured pruning, show that higher pruning ratios necessitate more
post-training data for performance recovery, whereas larger LLMs require less.
The proposed scaling law predicts a model’s loss based on its parameter counts
before and after pruning, as well as the post-training token counts.
Furthermore, we find that the scaling law established from smaller LLMs can be
reliably extrapolated to larger LLMs. This work provides valuable insights into
the post-training of pruned LLMs and offers a practical scaling law for
optimizing post-training data usage.
[LINK]
http://arxiv.org/abs/2411.10272v1
[DATE]
2024-11-15 23:28:42+08:00
[CATEGORIES]
cs.CL
cs.LG
Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT
[AUTHORS]
Jiangang Hao, Wenju Cui, Patrick Kyllonen, Emily Kerzabi, Lei Liu, Michael Flor
[ABSTRACT]
Collaborative problem solving (CPS) is widely recognized as a critical 21st
century skill. Efficiently coding communication data is a big challenge in
scaling up research on assessing CPS. This paper reports the findings on using
ChatGPT to directly code CPS chat data by benchmarking performance across
multiple datasets and coding frameworks. We found that ChatGPT-based coding
outperformed human coding in tasks where the discussions were characterized by
colloquial language but fell short in tasks where the discussions dealt with
specialized scientific terminology and contexts. The findings offer practical
guidelines for researchers to develop strategies for efficient and scalable
analysis of communication data from CPS tasks.
[COMMENTS]
21 pages, 3 figures, 5 tables. Initially reported on edArXiv:xw6kz
[LINK]
http://arxiv.org/abs/2411.10246v1
[DATE]
2024-11-15 22:57:39+08:00
[CATEGORIES]
cs.CL
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
[AUTHORS]
Michael Aerni, Javier Rando, Edoardo Debenedetti, Nicholas Carlini, Daphne Ippolito, Florian Tramèr
[ABSTRACT]
Large language models memorize parts of their training data. Memorizing short
snippets and facts is required to answer questions about the world and to be
fluent in any language. But models have also been shown to reproduce long
verbatim sequences of memorized text when prompted by a motivated adversary. In
this work, we investigate an intermediate regime of memorization that we call
non-adversarial reproduction, where we quantify the overlap between model
responses and pretraining data when responding to natural and benign prompts.
For a variety of innocuous prompt categories (e.g., writing a letter or a
tutorial), we show that up to 15% of the text output by popular conversational
language models overlaps with snippets from the Internet. In worst cases, we
find generations where 100% of the content can be found exactly online. For the
same tasks, we find that human-written text has far less overlap with Internet
data. We further study whether prompting strategies can close this reproduction
gap between models and humans. While appropriate prompting can reduce
non-adversarial reproduction on average, we find that mitigating worst-case
reproduction of training data requires stronger defenses – even for benign
interactions.
[LINK]
http://arxiv.org/abs/2411.10242v1
[DATE]
2024-11-15 22:55:01+08:00
[CATEGORIES]
cs.CL
cs.LG
Entropy and type-token ratio in gigaword corpora
[AUTHORS]
Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez
[ABSTRACT]
Lexical diversity measures the vocabulary variation in texts. While its
utility is evident for analyses in language change and applied linguistics, it
is not yet clear how to operationalize this concept in a unique way. We here
investigate entropy and type-token ratio, two widely employed metrics for
lexical diversity, in six massive linguistic datasets in English, Spanish,
and Turkish, consisting of books, news articles, and tweets. These gigaword
corpora correspond to languages with distinct morphological features and differ
in registers and genres, thus constituting a diverse testbed for a quantitative
approach to lexical diversity. Strikingly, we find a functional relation
between entropy and type-token ratio that holds across the corpora under
consideration. Further, in the limit of large vocabularies we find an
analytical expression that sheds light on the origin of this relation and its
connection with both Zipf and Heaps laws. Our results then contribute to the
theoretical understanding of text structure and offer practical implications
for fields like natural language processing.
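
The two lexical diversity metrics named in the title can be sketched in a few lines. This is a minimal illustration that assumes whitespace-separated tokens, Shannon entropy in bits over the empirical word distribution, and the type-token ratio as distinct words divided by total words; the paper's own preprocessing and estimators may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical word frequency distribution."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


tokens = "a b a b".split()
ttr = type_token_ratio(tokens)   # 2 types over 4 tokens
h = shannon_entropy(tokens)      # two equally likely words
```

In this toy input the type-token ratio is 0.5 and the entropy is 1 bit; on real corpora both quantities depend strongly on text length, which is part of why relating them is non-trivial.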
[COMMENTS]
12 pages, 10 figures, 7 tables
[LINK]
http://arxiv.org/abs/2411.10227v1
[DATE]
2024-11-15 22:40:59+08:00
[CATEGORIES]
cs.CL
Increasing the Accessibility of Causal Domain Knowledge via Causal Information Extraction Methods: A Case Study in the Semiconductor Manufacturing Industry
[AUTHORS]
Houssam Razouk, Leonie Benischke, Daniel Garber, Roman Kern
[ABSTRACT]
The extraction of causal information from textual data is crucial in the
industry for identifying and mitigating potential failures, enhancing process
efficiency, prompting quality improvements, and addressing various operational
challenges. This paper presents a study on the development of automated methods
for causal information extraction from actual industrial documents in the
semiconductor manufacturing industry. The study proposes two types of causal
information extraction methods, single-stage sequence tagging (SST) and
multi-stage sequence tagging (MST), and evaluates their performance using
existing documents from a semiconductor manufacturing company, including
presentation slides and FMEA (Failure Mode and Effects Analysis) documents. The
study also investigates the effect of representation learning on downstream
tasks. The presented case study showcases that the proposed MST methods for
extracting causal information from industrial documents are suitable for
practical applications, especially for semi-structured documents such as FMEAs,
with a 93% F1 score. Additionally, MST achieves a 73% F1 score on texts
extracted from presentation slides. Finally, the study highlights the
importance of choosing a language model that is more aligned with the domain
and in-domain fine-tuning.
[COMMENTS]
17 pages, 2 figures
[LINK]
http://arxiv.org/abs/2411.10172v1
[DATE]
2024-11-15 21:18:18+08:00
[CATEGORIES]
cs.CL
Evaluating the role of ‘Constitutions’ for learning from AI feedback
[AUTHORS]
Saskia Redgate, Andrew M. Bean, Adam Mahdi
[COMMENTS]
4 pages, 2 figures. In NeurIPS 2024 Workshop on Language Gamification
[LINK]
http://arxiv.org/abs/2411.10168v1
[DATE]
2024-11-15 21:16:11+08:00
[CATEGORIES]
cs.CL
Everything is a Video: Unifying Modalities through Next-Frame Prediction
[AUTHORS]
G. Thomas Hudson, Dean Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed
[ABSTRACT]
Multimodal learning, which involves integrating information from various
modalities such as text, images, audio, and video, is pivotal for numerous
complex tasks like visual question answering, cross-modal retrieval, and
caption generation. Traditional approaches rely on modality-specific encoders
and late fusion techniques, which can hinder scalability and flexibility when
adapting to new tasks or modalities. To address these limitations, we introduce
a novel framework that extends the concept of task reformulation beyond natural
language processing (NLP) to multimodal learning. We propose to reformulate
diverse multimodal tasks into a unified next-frame prediction problem, allowing
a single model to handle different modalities without modality-specific
components. This method treats all inputs and outputs as sequential frames in a
video, enabling seamless integration of modalities and effective knowledge
transfer across tasks. Our approach is evaluated on a range of tasks, including
text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text,
demonstrating the model’s ability to generalize across modalities with minimal
adaptation. We show that task reformulation can significantly simplify
multimodal model design across various tasks, laying the groundwork for more
generalized multimodal foundation models.
[COMMENTS]
10 pages, 10 figures
[LINK]
http://arxiv.org/abs/2411.10503v1
[DATE]
2024-11-15 20:59:37+08:00
[CATEGORIES]
cs.CL
cs.LG
Prompting and Fine-tuning Large Language Models for Automated Code Review Comment Generation
[AUTHORS]
Md. Asif Haider, Ayesha Binte Mostofa, Sk. Sabit Bin Mosaddek, Anindya Iqbal, Toufique Ahmed
[ABSTRACT]
Generating accurate code review comments remains a significant challenge due
to the inherently diverse and non-unique nature of the task output. Large
language models pretrained on both programming and natural language data tend
to perform well in code-oriented tasks. However, large-scale pretraining is not
always feasible due to its environmental impact and project-specific
generalizability issues. In this work, first we fine-tune open-source large
language models (LLMs) in parameter-efficient, quantized low-rank (QLoRA)
fashion on consumer-grade hardware to improve review comment generation. Recent
studies demonstrate the efficacy of augmenting semantic metadata information
into prompts to boost performance in other code-related tasks. To explore this
in code review activities, we also prompt proprietary, closed-source LLMs
augmenting the input code patch with function call graphs and code summaries.
Both of our strategies improve the review comment generation performance, with
function call graph augmented few-shot prompting on the GPT-3.5 model
surpassing the pretrained baseline by around 90% BLEU-4 score on the
CodeReviewer dataset. Moreover, few-shot prompted Gemini-1.0 Pro, QLoRA
fine-tuned Code Llama and Llama 3.1 models achieve competitive results (ranging
from 25% to 83% performance improvement) on this task. An additional human
evaluation study further validates our experimental findings, reflecting
real-world developers’ perceptions of LLM-generated code review comments based
on relevant qualitative metrics.
[LINK]
http://arxiv.org/abs/2411.10129v1
[DATE]
2024-11-15 20:01:38+08:00
[CATEGORIES]
cs.CL
cs.LG
Understanding The Effect Of Temperature On Alignment With Human Opinions
[AUTHORS]
Maja Pavlovic, Massimo Poesio
[ABSTRACT]
With the increasing capabilities of LLMs, recent studies focus on
understanding whose opinions are represented by them and how to effectively
extract aligned opinion distributions. We conducted an empirical analysis of
three straightforward methods for obtaining distributions and evaluated the
results across a variety of metrics. Our findings suggest that sampling and
log-probability approaches with simple parameter adjustments can return better
aligned outputs in subjective tasks compared to direct prompting. Yet, assuming
models reflect human opinions may be limiting, highlighting the need for
further research on how human subjectivity affects model uncertainty.
[LINK]
http://arxiv.org/abs/2411.10080v1
[DATE]
2024-11-15 17:50:27+08:00
[CATEGORIES]
cs.CL
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity
[AUTHORS]
Zichen Song, Sitan Huang, Yuxin Wu, Zhongfeng Kang
[ABSTRACT]
Evaluating the importance of different layers in large language models (LLMs)
is crucial for optimizing model performance and interpretability. This paper
first explores layer importance using the Activation Variance-Sparsity Score
(AVSS), which combines normalized activation variance and sparsity to quantify
each layer’s contribution to overall model performance. By ranking layers based
on AVSS and pruning the least impactful 25\%, our experiments on tasks such as
question answering, language modeling, and sentiment classification show that
over 90\% of the original performance is retained, highlighting potential
redundancies in LLM architectures. Building on AVSS, we propose an enhanced
version tailored to assess hallucination propensity across layers (EAVSS). This
improved approach introduces Hallucination-Specific Activation Variance (HSAV)
and Hallucination-Specific Sparsity (HSS) metrics, allowing precise
identification of hallucination-prone layers. By incorporating contrastive
learning on these layers, we effectively mitigate hallucination generation,
contributing to more robust and efficient LLMs (the maximum performance
improvement is 12\%). Our results on the NQ, SciQ, TriviaQA, TruthfulQA, and
WikiQA datasets demonstrate the efficacy of this method, offering a
comprehensive framework for both layer importance evaluation and hallucination
mitigation in LLMs.
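
The layer-ranking step described above can be illustrated with a minimal sketch. The abstract does not give the AVSS formula, so the combination used here (normalized variance multiplied by one minus sparsity) is an assumption made purely for illustration, as are the function names `avss_scores` and `layers_to_prune` and the near-zero threshold `zero_tol`.

```python
from statistics import pvariance


def avss_scores(layer_activations, zero_tol=1e-6):
    """Score each layer from a flat list of its activations.

    ASSUMPTION: score = (variance / total variance) * (1 - sparsity).
    The abstract only says the score "combines" the two quantities.
    """
    variances = [pvariance(acts) for acts in layer_activations]
    total_var = sum(variances) or 1.0  # avoid dividing by zero
    scores = []
    for acts, var in zip(layer_activations, variances):
        # Sparsity: fraction of activations that are (near) zero.
        sparsity = sum(1 for a in acts if abs(a) <= zero_tol) / len(acts)
        scores.append((var / total_var) * (1.0 - sparsity))
    return scores


def layers_to_prune(scores, fraction=0.25):
    """Return indices of the lowest-scoring `fraction` of layers."""
    k = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])


# Toy activations for four hypothetical layers.
toy = [
    [0.0, 0.0, 0.0, 0.1],
    [1.0, -1.0, 2.0, -2.0],
    [0.5, 0.4, 0.6, 0.5],
    [3.0, -3.0, 0.0, 1.0],
]
print(layers_to_prune(avss_scores(toy)))
```

With four toy layers and a 25% pruning fraction, exactly one layer is selected: the mostly-zero, low-variance one.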
[COMMENTS]
20 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.10069v1
[DATE]
2024-11-15 17:33:47+08:00
[CATEGORIES]
cs.CL
CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation
[AUTHORS]
Xiaofei Zhu, Jiawei Cheng, Zhou Yang, Zhuo Chen, Qingyang Wang, Jianfeng Yao
[ABSTRACT]
Multimodal emotion recognition in conversation (MER) aims to accurately
identify emotions in conversational utterances by integrating multimodal
information. Previous methods usually treat all modalities as being of equal
quality and employ symmetric architectures to conduct multimodal fusion.
However, in reality, the quality of different modalities usually varies
considerably, and a symmetric architecture struggles to accurately
recognize conversational emotions when dealing with uneven modal information.
Furthermore, fusing multi-modality information in a single granularity may fail
to adequately integrate modal information, exacerbating the inaccuracy in
emotion recognition. In this paper, we propose a novel Cross-Modality Augmented
Transformer with Hierarchical Variational Distillation, called CMATH, which
consists of two major components, i.e., Multimodal Interaction Fusion and
Hierarchical Variational Distillation. The former is comprised of two
submodules, including Modality Reconstruction and Cross-Modality Augmented
Transformer (CMA-Transformer), where Modality Reconstruction focuses on
obtaining high-quality compressed representation of each modality, and
CMA-Transformer adopts an asymmetric fusion strategy which treats one modality
as the central modality and takes others as auxiliary modalities. The latter
first designs a variational fusion network to fuse the fine-grained
representations learned by CMA-Transformer into coarse-grained
representations. Then, it introduces a hierarchical distillation framework to
maintain the consistency between modality representations with different
granularities. Experiments on the IEMOCAP and MELD datasets demonstrate that
our proposed model outperforms previous state-of-the-art baselines.
Implementation code is available at https://github.com/cjw-MER/CMATH.
[LINK]
http://arxiv.org/abs/2411.10060v1
[DATE]
2024-11-15 17:23:02+08:00
[CATEGORIES]
cs.CL
Towards unearthing neglected climate innovations from scientific literature using Large Language Models
[AUTHORS]
César Quilodrán-Casas, Christopher Waite, Nicole Alhadeff, Diyona Dsouza, Cathal Hughes, Larissa Kunstel-Tabet, Alyssa Gilbert
[COMMENTS]
10 pages. Accepted in the LatinX in AI workshop at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10055v1
[DATE]
2024-11-15 17:17:40+08:00
[CATEGORIES]
cs.CL
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
[AUTHORS]
Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun
[ABSTRACT]
Unsupervised automatic speech recognition (ASR) aims to learn the mapping
between the speech signal and its corresponding textual transcription without
the supervision of paired speech-text data. A word/phoneme in the speech signal
is represented by a segment of speech signal with variable length and unknown
boundary, and this segmental structure makes learning the mapping between
speech and text challenging, especially without paired data. In this paper, we
propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative
Training for Unsupervised ASR. REBORN alternates between (1) training a
segmentation model that predicts the boundaries of the segmental structures in
speech signals and (2) training the phoneme prediction model, whose input is
the speech feature segmented by the segmentation model, to predict a phoneme
transcription. Since supervised data for training the segmentation model is not
available, we use reinforcement learning to train the segmentation model to
favor segmentations that yield phoneme sequence predictions with a lower
perplexity. We conduct extensive experiments and find that under the same
setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech,
TIMIT, and five non-English languages in Multilingual LibriSpeech. We
comprehensively analyze why the boundaries learned by REBORN improve the
unsupervised ASR performance.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2402.03988v3
[DATE]
2024-11-15 16:38:26+08:00
[CATEGORIES]
cs.CL
Once More, With Feeling: Measuring Emotion of Acting Performances in Contemporary American Film
[AUTHORS]
Naitian Zhou, David Bamman
[ABSTRACT]
Narrative film is a composition of writing, cinematography, editing, and
performance. While much computational work has focused on the writing or visual
style in film, we conduct in this paper a computational exploration of acting
performance. Applying speech emotion recognition models and a variationist
sociolinguistic analytical framework to a corpus of popular, contemporary
American film, we find narrative structure, diachronic shifts, and genre- and
dialogue-based constraints located in spoken performances.
[COMMENTS]
Accepted CHR 2024
[LINK]
http://arxiv.org/abs/2411.10018v1
[DATE]
2024-11-15 15:53:02+08:00
[CATEGORIES]
cs.CL
VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models
[AUTHORS]
Jingtao Cao, Zheng Zhang, Hongru Wang, Kam-Fai Wong
[ABSTRACT]
Progress in Text-to-Image (T2I) models has significantly improved the
generation of images from textual descriptions. However, existing evaluation
metrics do not adequately assess the models’ ability to handle a diverse range
of textual prompts, which is crucial for their generalizability. To address
this, we introduce a new metric called Visual Language Evaluation Understudy
(VLEU). VLEU uses large language models to sample from the visual text domain,
the set of all possible input texts for T2I models, to generate a wide variety
of prompts. The images generated from these prompts are evaluated based on
their alignment with the input text using the CLIP model. VLEU quantifies a
model’s generalizability by computing the Kullback-Leibler divergence between
the marginal distribution of the visual text and the conditional distribution
of the images generated by the model. This metric provides a quantitative way
to compare different T2I models and track improvements during model finetuning.
Our experiments demonstrate the effectiveness of VLEU in evaluating the
generalization capability of various T2I models, positioning it as an essential
metric for future research in text-to-image synthesis.
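
The divergence computation described above can be sketched as follows. This is only an illustration under stated assumptions: the similarity matrix stands in for CLIP image-text scores (no CLIP model is called), the conditional distribution is assumed to be a softmax over prompts for each image, the marginal is assumed to be the average of those conditionals, and the function name `mean_kl_to_marginal` and the `temperature` parameter are hypothetical. The abstract does not say whether any further transformation is applied to the divergence.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarity scores."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_kl_to_marginal(similarity, temperature=1.0):
    """Average KL(p(text | image) || p(text)) over all images.

    similarity[i][j] is an image-text score for image i and prompt j
    (ASSUMPTION: supplied by some external scorer such as CLIP).
    """
    cond = [softmax(row, temperature) for row in similarity]
    n_prompts = len(cond[0])
    # Marginal over prompts: mean of the per-image conditionals.
    marginal = [sum(p[j] for p in cond) / len(cond) for j in range(n_prompts)]
    kls = [
        sum(p * math.log(p / q) for p, q in zip(row, marginal) if p > 0)
        for row in cond
    ]
    return sum(kls) / len(kls)


# Images that match only their own prompt give a high divergence;
# images that match every prompt equally give zero.
sharp = [[10.0, 0.0], [0.0, 10.0]]
flat = [[1.0, 1.0], [1.0, 1.0]]
print(mean_kl_to_marginal(sharp), mean_kl_to_marginal(flat))
```

Under these assumptions, a model whose images are distinguishable per prompt yields a divergence approaching log(number of prompts), while a model that produces interchangeable images yields zero.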
[COMMENTS]
Accepted by EMNLP 2024 (long paper, main conference)
[LINK]
http://arxiv.org/abs/2409.14704v2
[DATE]
2024-11-15 15:19:03+08:00
[CATEGORIES]
cs.CL
Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs
[AUTHORS]
Yuxuan Huang
[ABSTRACT]
The development of large language models (LLMs) has been catalyzed by
advancements in pre-training techniques. These models have demonstrated robust
reasoning capabilities through manually designed prompts. In this work, we
evaluate the conversational reasoning capabilities of the current
state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). However, the
performance of LLMs is constrained due to a lack of KG environment awareness
and the difficulties in developing effective optimization mechanisms for
intermediary reasoning stages. We further introduce LLM-ARK, an LLM-grounded KG
reasoning agent designed to deliver precise and adaptable predictions on KG
paths. LLM-ARK leverages a Full Textual Environment (FTE) prompt to assimilate
state information within each reasoning step. We reframe the challenge of
multi-hop reasoning on the KG as a sequential decision-making task. Utilizing
the Proximal Policy Optimization (PPO) online policy gradient reinforcement
learning algorithm, our model is optimized to learn from rich reward signals.
Additionally, we conduct an evaluation of our model and GPT-4 on the OpenDialKG
dataset. The experimental results reveal that LLaMA-2-7B-ARK outperforms the
current state-of-the-art model by 5.28 percentage points, with a performance
rate of 36.39% on the target@1 evaluation metric. Meanwhile, GPT-4 scored
14.91%, further demonstrating the effectiveness of our method. Our code is
available on GitHub (https://github.com/Aipura/LLM-ARK) for further access.
[LINK]
http://arxiv.org/abs/2312.11282v3
[DATE]
2024-11-15 14:48:58+08:00
[CATEGORIES]
cs.CL
Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems
[AUTHORS]
Taaha Kazi, Ruiliang Lyu, Sizhe Zhou, Dilek Hakkani-Tur, Gokhan Tur
[ABSTRACT]
Traditionally, offline datasets have been used to evaluate task-oriented
dialogue (TOD) models. These datasets lack context awareness, making them
suboptimal benchmarks for conversational systems. In contrast, user-agents,
which are context-aware, can simulate the variability and unpredictability of
human conversations, making them better alternatives as evaluators. Prior
research has utilized large language models (LLMs) to develop user-agents. Our
work builds upon this by using LLMs to create user-agents for the evaluation of
TOD systems. This involves prompting an LLM, using in-context examples as
guidance, and tracking the user-goal state. Our evaluation of diversity and
task completion metrics for the user-agents shows improved performance with the
use of better prompts. Additionally, we propose methodologies for the automatic
evaluation of TOD models within this dynamic framework.
[LINK]
http://arxiv.org/abs/2411.09972v1
[DATE]
2024-11-15 14:05:45+08:00
[CATEGORIES]
cs.CL
LoRA-LiteE: A Computationally Efficient Framework for Chatbot Preference-Tuning
[AUTHORS]
Yahe Yang, Chunliang Tao, Xiaojing Fan
[ABSTRACT]
Effective preference tuning is pivotal in aligning chatbot responses with
human expectations, enhancing user satisfaction and engagement. Traditional
approaches, notably Reinforcement Learning from Human Feedback (RLHF) as
employed in advanced models like GPT-4, have demonstrated considerable success
in this domain. However, RLHF methods are often computationally intensive and
resource-demanding, limiting their scalability and accessibility for broader
applications. To address these challenges, this study introduces LoRA-Lite
Ensemble (LoRA-LiteE), an innovative framework that combines Supervised
Fine-tuning (SFT) with Low-Rank Adaptation (LoRA) and Ensemble Learning
techniques to effectively aggregate predictions of lightweight models, which
aim to achieve a balance between the performance and computational cost.
Utilizing the Chatbot Arena benchmark dataset, we conduct a comprehensive
comparative analysis among our LoRA-LiteE model, corresponding base models at
different scales, and GPT-4 trained with RLHF. Our empirical results
demonstrate that the proposed LoRA-LiteE model achieves comparable performance
to un-finetuned GPT-4 and outperforms the single larger-scale models under
limited resource constraints. These findings highlight that our LoRA-LiteE
provides a feasible and efficient methodology for human preference prediction
in chatbot systems, enhancing scalability and accessibility, and thereby
broadening the applicability of preference-tuned chatbots in
resource-constrained environments.
[LINK]
http://arxiv.org/abs/2411.09947v1
[DATE]
2024-11-15 12:57:13+08:00
[CATEGORIES]
cs.CL
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
[AUTHORS]
Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui
[ABSTRACT]
While small language models (SLMs) show promise for mobile deployment, their
real-world performance and applications on smartphones remain underexplored.
We present SlimLM, a series of SLMs optimized for document assistance tasks on
mobile devices. Through extensive experiments on a Samsung Galaxy S24, we
identify the optimal trade-offs between model size (ranging from 125M to 7B
parameters), context length, and inference time for efficient on-device
processing. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on
DocAssist, our constructed dataset for summarization, question answering and
suggestion tasks. Our smallest model demonstrates efficient performance on S24,
while larger variants offer enhanced capabilities within mobile constraints. We
evaluate SlimLM against existing SLMs, showing comparable or superior
performance and offering a benchmark for future research in on-device language
models. We also provide an Android application, offering practical insights
into SLM deployment. Our findings provide valuable insights and illuminate the
capabilities of running advanced language models on high-end smartphones,
potentially reducing server costs and enhancing privacy through on-device
processing.
[LINK]
http://arxiv.org/abs/2411.09944v1
[DATE]
2024-11-15 12:44:34+08:00
[CATEGORIES]
cs.CL
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
[AUTHORS]
Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang
[ABSTRACT]
Recent studies show that collaboration among multiple large language model (LLM)
powered agents is a promising way to solve tasks. However, current approaches
are constrained by using a fixed number of agents and static communication
structures. In this work, we propose automatically selecting a team of agents
from candidates to collaborate in a dynamic communication structure toward
different tasks and domains. Specifically, we build a framework named Dynamic
LLM-Powered Agent Network ($\textbf{DyLAN}$) for LLM-powered agent
collaboration, operating a two-stage paradigm: (1) Team Optimization and (2)
Task Solving. During the first stage, we utilize an $\textit{agent selection}$
algorithm, based on an unsupervised metric called $\textit{Agent Importance
Score}$, enabling the selection of best agents according to their contributions
in a preliminary trial, oriented to the given task. Then, in the second stage,
the selected agents collaborate dynamically according to the query.
Empirically, we demonstrate that DyLAN outperforms strong baselines in code
generation, decision-making, general reasoning, and arithmetic reasoning tasks
with moderate computational cost. On specific subjects in MMLU, selecting a
team of agents in the team optimization stage improves accuracy by up to 25.0%
in DyLAN.
[COMMENTS]
Published in COLM 2024. Code Repo: https://github.com/SALT-NLP/DyLAN
[LINK]
http://arxiv.org/abs/2310.02170v2
[DATE]
2024-11-15 12:30:04+08:00
[CATEGORIES]
cs.CL
Refined and Segmented Price Sentiment Indices from Survey Comments
[AUTHORS]
Masahiro Suzuki, Hiroki Sakaji
[ABSTRACT]
We aim to enhance a price sentiment index and to more precisely understand
price trends from the perspective of not only consumers but also businesses. We
extract comments related to prices from the Economy Watchers Survey conducted
by the Cabinet Office of Japan and classify price trends using a large language
model (LLM). We classify whether the survey sample reflects the perspective of
consumers or businesses, and whether the comments pertain to goods or services
by utilizing information on the fields of comments and the industries of
respondents included in the Economy Watchers Survey. From these classified
price-related comments, we construct price sentiment indices not only for a
general purpose but also for more specific objectives by combining perspectives
on consumers and prices, as well as goods and services. It becomes possible to
achieve a more accurate classification of price directions by employing an LLM
for classification. Furthermore, integrating the outputs of multiple LLMs
shows potential for further improving classification performance. The
use of more accurately classified comments allows for the construction of an
index with a higher correlation to existing indices than previous studies. We
demonstrate that the correlation of the price index for consumers, which has a
larger sample size, is further enhanced by selecting comments for aggregation
based on the industry of the survey respondents.
[COMMENTS]
Accepted to IEEE BigData 2024. 9 pages, 11 tables, 1 figure
[LINK]
http://arxiv.org/abs/2411.09937v1
[DATE]
2024-11-15 12:22:21+08:00
[CATEGORIES]
cs.CL
JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging
[AUTHORS]
Kaito Baba, Ryota Yagi, Junichiro Takahashi, Risa Kishikawa, Satoshi Kodera
[ABSTRACT]
With the rapid advancement of large language models (LLMs), foundational
models (FMs) have seen significant advancements. Healthcare is one of the most
crucial application areas for these FMs, given the significant time and effort
required for physicians to analyze large volumes of patient data. Recent
efforts have focused on adapting multimodal FMs to the medical domain through
techniques like instruction-tuning, leading to the development of medical
foundation models (MFMs). However, these approaches typically require large
amounts of training data to effectively adapt models to the medical field.
Moreover, most existing models are trained on English datasets, limiting their
practicality in non-English-speaking regions where healthcare professionals and
patients are not always fluent in English. The need for translation introduces
additional costs and inefficiencies. To address these challenges, we propose a
\textbf{J}apanese \textbf{Radi}ology report generation model enhanced by
\textbf{Evo}lutionary optimization of model merging (JRadiEvo). This is the
first attempt to extend a non-medical vision-language foundation model to the
medical domain through evolutionary optimization of model merging. We
successfully created a model that generates accurate Japanese reports from
X-ray images using only 50 translated samples from publicly available data.
This model, developed with highly efficient use of limited data, outperformed
leading models from recent research trained on much larger datasets.
Additionally, with only 8 billion parameters, this relatively compact
foundation model can be deployed locally within hospitals, making it a
practical solution for environments where APIs and other external services
cannot be used due to strict privacy and security requirements.
[COMMENTS]
Accepted by NeurIPS'24 Workshop on AIM-FM: Advancements In Medical
Foundation Models: Explainability, Robustness, Security, and Beyond
[LINK]
http://arxiv.org/abs/2411.09933v1
[DATE]
2024-11-15 12:16:50+08:00
[CATEGORIES]
cs.CL
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
[AUTHORS]
Seongyun Lee, Geewook Kim, Jiyeon Kim, Hyunji Lee, Hoyeon Chang, Sue Hyun Park, Minjoon Seo
[ABSTRACT]
Vision-Language adaptation (VL adaptation) transforms Large Language Models
(LLMs) into Large Vision-Language Models (LVLMs) for multimodal tasks, but this
process often compromises the inherent safety capabilities embedded in the
original LLMs. Despite potential harmfulness due to weakened safety measures,
in-depth analysis on the effects of VL adaptation on safety remains
under-explored. This study examines how VL adaptation influences safety and
evaluates the impact of safety fine-tuning methods. Our analysis reveals that
safety degradation occurs during VL adaptation, even when the training data is
safe. While safety tuning techniques like supervised fine-tuning with safety
datasets or reinforcement learning from human feedback mitigate some risks,
they still lead to safety degradation and a reduction in helpfulness due to
over-rejection issues. Further analysis of internal model weights suggests that
VL adaptation may impact certain safety-related layers, potentially lowering
overall safety levels. Additionally, our findings demonstrate that the
objectives of VL adaptation and safety tuning are divergent, which often
results in their simultaneous application being suboptimal. To address this, we
suggest the weight merging approach as an optimal solution effectively reducing
safety degradation while maintaining helpfulness. These insights help guide the
development of more reliable and secure LVLMs for real-world applications.
[COMMENTS]
Work in Progress
[LINK]
http://arxiv.org/abs/2410.07571v2
[DATE]
2024-11-15 11:20:57+08:00
[CATEGORIES]
cs.CL
DriveThru: a Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives
[AUTHORS]
Mohammad Rifqi Farhansyah, Muhammad Zuhdi Fikri Johari, Afinzaki Amiral, Ayu Purwarianti, Kumara Ari Yuana, Derry Tanti Wijaya
[ABSTRACT]
Indonesia is one of the most diverse countries linguistically. However,
despite this linguistic diversity, Indonesian languages remain underrepresented
in Natural Language Processing (NLP) research and technologies. In the past two
years, several efforts have been conducted to construct NLP resources for
Indonesian languages. However, most of these efforts have been focused on
manually creating resources and are thus difficult to scale to more languages. Although
many Indonesian languages do not have a web presence, locally there are
resources that document these languages well in printed forms such as books,
magazines, and newspapers. Digitizing these existing resources will enable
scaling of Indonesian language resource construction to many more languages. In
this paper, we propose an alternative method of creating datasets by digitizing
documents, which have not previously been used to build digital language
resources in Indonesia. DriveThru is a platform for extracting document content
utilizing Optical Character Recognition (OCR) techniques in its system to
provide language resource building with less manual effort and cost. This paper
also studies the utility of current state-of-the-art LLM for post-OCR
correction to show the capability of increasing the character accuracy rate
(CAR) and word accuracy rate (WAR) compared to off-the-shelf OCR.
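
The two accuracy metrics named above can be illustrated with a short sketch. The abstract does not define CAR and WAR, so this assumes the common convention of one minus the Levenshtein edit distance divided by the reference length, computed over characters and over whitespace-separated words respectively. The function names and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy_rate(ref_units, hyp_units):
    """ASSUMPTION: accuracy = 1 - distance / reference length, floored at 0."""
    if not ref_units:
        return 1.0 if not hyp_units else 0.0
    return max(0.0, 1.0 - edit_distance(ref_units, hyp_units) / len(ref_units))


def car(reference, ocr_output):
    """Character accuracy rate."""
    return accuracy_rate(list(reference), list(ocr_output))


def war(reference, ocr_output):
    """Word accuracy rate over whitespace-separated tokens."""
    return accuracy_rate(reference.split(), ocr_output.split())


# Hypothetical ground truth and OCR output with one misread character.
print(car("selamat pagi dunia", "selamat pagl dunia"))
print(war("selamat pagi dunia", "selamat pagl dunia"))
```

A single misread character costs little in CAR but a whole word in WAR, which is why post-OCR correction is usually reported on both.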
[COMMENTS]
12 pages, 3 figures, 6 tables
[LINK]
http://arxiv.org/abs/2411.09318v2
[DATE]
2024-11-15 10:42:59+08:00
[CATEGORIES]
cs.CL
Geometry of orofacial neuromuscular signals: speech articulation decoding using surface electromyography
[AUTHORS]
Harshavardhana T. Gowda, Zachary D. McNaughton, Lee M. Miller
[ABSTRACT]
Each year, millions of individuals lose the ability to speak intelligibly due
to causes such as neuromuscular disease, stroke, trauma, and head/neck cancer
surgery (e.g. laryngectomy) or treatment (e.g. radiotherapy toxicity to the
speech articulators). Effective communication is crucial for daily activities,
and losing the ability to speak leads to isolation, depression, anxiety, and a
host of detrimental sequelae. Noninvasive surface electromyography (sEMG) has
shown promise to restore speech output in these individuals. The goal is to
collect sEMG signals from multiple articulatory sites as people silently
produce speech and then decode the signals to enable fluent and natural
communication. Currently, many fundamental questions about orofacial
neuromuscular signals relating to speech articulation remain unanswered. They
include questions relating to 1) the data structure of the orofacial sEMG
signals, 2) the signal distribution shift of sEMG across individuals, 3) the ability
of sEMG signals to span the entire English language phonetic space during
silent speech articulations, and 4) the generalization capability of
non-invasive sEMG based silent speech interfaces. We address these questions
through a series of experiments involving healthy human subjects. We show that
sEMG signals evince graph data structure and that the signal distribution shift
is given by a change of basis. Furthermore, we show that silently voiced
articulations spanning the entire English language phonetic space can be
decoded using small neural networks which can be trained with little data and
that such architectures work well across individuals. To ensure transparency
and reproducibility, we open-source all the data and codes used in this study.
[LINK]
http://arxiv.org/abs/2411.02591v2
[DATE]
2024-11-15 10:33:29+08:00
[CATEGORIES]
cs.CL
Evaluating Gender Bias in Large Language Models
[AUTHORS]
Michael Döll, Markus Döhring, Andreas Müller
[ABSTRACT]
Gender bias in artificial intelligence has become an important issue,
particularly in the context of language models used in communication-oriented
applications. This study examines the extent to which Large Language Models
(LLMs) exhibit gender bias in pronoun selection in occupational contexts. The
analysis evaluates the models GPT-4, GPT-4o, PaLM 2 Text Bison and Gemini 1.0
Pro using a self-generated dataset. The jobs considered include a range of
occupations, from those with a significant male presence to those with a
notable female concentration, as well as jobs with a relatively equal gender
distribution. Three different sentence processing methods were used to assess
potential gender bias: masked tokens, unmasked sentences, and sentence
completion. In addition, the LLMs suggested names of individuals in specific
occupations, which were then examined for gender distribution. The results show
a positive correlation between the models’ pronoun choices and the gender
distribution present in U.S. labor force data. Female pronouns were more often
associated with female-dominated occupations, while male pronouns were more
often associated with male-dominated occupations. Sentence completion showed
the strongest correlation with actual gender distribution, while name
generation resulted in a more balanced ‘politically correct’ gender
distribution, albeit with notable variations in predominantly male or female
occupations. Overall, the prompting method had a greater impact on gender
distribution than the model selection itself, highlighting the complexity of
addressing gender bias in LLMs. The findings highlight the importance of
prompting in gender mapping.
[COMMENTS]
13 pages, 12 figures, 1 table
[LINK]
http://arxiv.org/abs/2411.09826v1
[DATE]
2024-11-15 06:23:13+08:00
[CATEGORIES]
cs.CL
Security and Privacy Challenges of Large Language Models: A Survey
[AUTHORS]
Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
[ABSTRACT]
Large Language Models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text,
language translation, and question-answering. LLMs are now becoming very
popular tools in computerized language processing tasks, with the capability to
analyze complicated linguistic patterns and provide relevant and appropriate
responses depending on the context. While offering significant advantages,
these models are also vulnerable to security and privacy attacks, such as
jailbreaking attacks, data poisoning attacks, and Personally Identifiable
Information (PII) leakage attacks. This survey provides a thorough review of
the security and privacy challenges of LLMs for both training data and users,
along with the application-based risks in various domains, such as
transportation, education, and healthcare. We assess the extent of LLM
vulnerabilities, investigate emerging security and privacy attacks for LLMs,
and review the potential defense mechanisms. Additionally, the survey outlines
existing research gaps in this domain and highlights future research
directions.
[LINK]
http://arxiv.org/abs/2402.00888v2
[DATE]
2024-11-15 06:20:49+08:00
[CATEGORIES]
cs.CL
Methods of Automatic Matrix Language Determination for Code-Switched Speech
[AUTHORS]
Olga Iakovenko, Thomas Hain
[ABSTRACT]
Code-switching (CS) is the process of speakers alternating between two or
more languages, which is becoming increasingly common in the modern world. In order
to better describe CS speech the Matrix Language Frame (MLF) theory introduces
the concept of a Matrix Language, which is the language that provides the
grammatical structure for a CS utterance. In this work the MLF theory was used
to develop systems for Matrix Language Identity (MLID) determination. The MLID
of English/Mandarin and English/Spanish CS text and speech was compared to
acoustic language identity (LID), which is a typical way to identify a language
in monolingual utterances. MLID predictors from audio show higher correlation
with the textual principles than LID in all cases while also outperforming LID
in an MLID recognition task based on F1 macro (60%) and correlation score
(0.38). This novel approach has identified that non-English languages (Mandarin
and Spanish) are preferred over the English language as the ML contrary to the
monolingual choice of LID.
[COMMENTS]
EMNLP 2024
[LINK]
http://arxiv.org/abs/2410.02521v2
[DATE]
2024-11-15 03:36:43+08:00
[CATEGORIES]
cs.CL
Evaluating the Predictive Capacity of ChatGPT for Academic Peer Review Outcomes Across Multiple Platforms
[AUTHORS]
Mike Thelwall, Abdullah Yaghi
[ABSTRACT]
While previous studies have demonstrated that Large Language Models (LLMs)
can predict peer review outcomes to some extent, this paper builds on that by
introducing two new contexts and employing a more robust method - averaging
multiple ChatGPT scores. The findings show that averaging 30 ChatGPT predictions,
based on reviewer guidelines and using only the submitted titles and abstracts,
failed to predict peer review outcomes for F1000Research (Spearman’s rho=0.00).
However, it produced mostly weak positive correlations with the quality
dimensions of SciPost Physics (rho=0.25 for validity, rho=0.25 for originality,
rho=0.20 for significance, and rho=0.08 for clarity) and a moderate positive
correlation for papers from the International Conference on Learning
Representations (ICLR) (rho=0.38). Including the full text of articles
significantly increased the correlation for ICLR (rho=0.46) and slightly
improved it for F1000Research (rho=0.09), while it had variable effects on the
four quality dimension correlations for SciPost LaTeX files. The use of
chain-of-thought system prompts slightly increased the correlation for
F1000Research (rho=0.10), marginally reduced it for ICLR (rho=0.37), and
further decreased it for SciPost Physics (rho=0.16 for validity, rho=0.18 for
originality, rho=0.18 for significance, and rho=0.05 for clarity). Overall, the
results suggest that in some contexts, ChatGPT can produce weak pre-publication
quality assessments. However, the effectiveness of these assessments and the
optimal strategies for employing them vary considerably across different
platforms, journals, and conferences. Additionally, the most suitable inputs
for ChatGPT appear to differ depending on the platform.
[LINK]
http://arxiv.org/abs/2411.09763v1
[DATE]
2024-11-15 03:20:33+08:00
[CATEGORIES]
cs.CL
A Bayesian Optimization Approach to Machine Translation Reranking
[AUTHORS]
Julius Cheng, Maike Züfle, Vilém Zouhar, Andreas Vlachos
[ABSTRACT]
Reranking a list of candidates from a machine translation system with an
external scoring model and returning the highest-scoring candidate remains a
simple and effective method for improving the overall output quality.
Translation scoring models continue to grow in size, with the best models being
comparable to generation models. Thus, reranking can add substantial
computational cost to the translation pipeline. In this work, we pose reranking
as a Bayesian optimization (BayesOpt) problem. By strategically selecting
candidates to score based on a balance of exploration and exploitation, we show
that it is possible to find top-scoring candidates when scoring only a fraction
of the candidate list. For instance, our method achieves the same CometKiwi
score using only 70 scoring evaluations compared to a baseline system using 180.
We present a multi-fidelity setting for BayesOpt, where the candidates are
first scored with a cheaper but noisier proxy scoring model, which further
improves the cost-performance tradeoff when using smaller but well-trained
distilled proxy scorers.
[COMMENTS]
v1: Preprint version
[LINK]
http://arxiv.org/abs/2411.09694v1
[DATE]
2024-11-15 02:58:23+08:00
[CATEGORIES]
cs.CL
Squeezed Attention: Accelerating Long Context Length LLM Inference
[AUTHORS]
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
[ABSTRACT]
Emerging Large Language Model (LLM) applications require long input prompts
to perform complex downstream tasks like document analysis and code generation.
For these long context length applications, the length of the input prompt
poses a significant challenge in terms of inference efficiency since the
inference costs increase linearly with sequence length. However, for many of
these applications, much of the context in the prompt is fixed across different
user inputs, thereby providing the opportunity to perform offline optimizations
to process user inputs quickly, as they are received. In this work, we propose
Squeezed Attention as a mechanism to accelerate LLM applications where a large
portion of the input prompt is fixed. We first leverage K-means clustering
offline to group the keys for the fixed context based on semantic similarity
and represent each cluster with a single centroid value. During inference, we
compare query tokens from the user input with the centroids to predict which of
the keys from the fixed context are semantically relevant and need to be loaded
during inference. We then compute exact attention using only these important
keys from the fixed context, thereby reducing bandwidth and computational
costs. We also extend our method to use a hierarchical centroid lookup to
identify important keys, which can reduce the complexity of attention from
linear to logarithmic with respect to the context length. We implement
optimized Triton kernels for centroid comparison and sparse FlashAttention with
important keys, achieving more than 4x speedups during both the prefill and
generation phases for long-context inference. Furthermore, we have extensively
evaluated our method on various long-context benchmarks including LongBench,
where it achieves a 3x reduction in KV cache budget without accuracy loss and
up to an 8x reduction with <0.5 point accuracy gap for various models.
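
The centroid lookup described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the plain Euclidean k-means, the dot-product centroid scoring, and the names `kmeans`, `sparse_attention` and `top_clusters` are all assumptions made for illustration, and the hierarchical lookup and Triton kernels are not modeled.

```python
import numpy as np

def assign(keys, centroids):
    """Return the index of the nearest centroid for every key."""
    dists = ((keys[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)

def kmeans(keys, n_clusters, n_iters=10, seed=0):
    """Offline step: plain Lloyd's k-means over the fixed-context keys."""
    rng = np.random.default_rng(seed)
    centroids = keys[rng.choice(len(keys), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(keys, centroids)
        for c in range(n_clusters):
            members = keys[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(0)
    return centroids, assign(keys, centroids)

def sparse_attention(query, keys, values, centroids, labels, top_clusters):
    """Online step: score centroids, then attend only to keys in the best clusters."""
    chosen = np.argsort(centroids @ query)[-top_clusters:]
    mask = np.isin(labels, chosen)
    if not mask.any():  # hypothetical fallback: use every key
        mask = np.ones(len(keys), dtype=bool)
    logits = keys[mask] @ query / np.sqrt(len(query))
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[mask], mask
```

Selecting every cluster reproduces dense attention exactly, while smaller `top_clusters` values load fewer keys at the cost of an approximation.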
[LINK]
http://arxiv.org/abs/2411.09688v1
[DATE]
2024-11-15 02:54:19+08:00
[CATEGORIES]
cs.CL
Quantitative Assessment of Intersectional Empathetic Bias and Understanding
[AUTHORS]
Vojtech Formanek, Ondrej Sotolar
[ABSTRACT]
A growing amount of literature critiques the current operationalizations of
empathy based on loose definitions of the construct. Such definitions
negatively affect dataset quality, model robustness, and evaluation
reliability. We propose an empathy evaluation framework that operationalizes
empathy close to its psychological origins. The framework measures the variance
in responses of LLMs to prompts using existing metrics for empathy and
emotional valence. The variance is introduced through the controlled generation
of the prompts by varying social biases affecting context understanding, thus
impacting empathetic understanding. The control over generation ensures high
theoretical validity of the constructs in the prompt dataset. Also, it makes
high-quality translation, especially into languages that currently have
little-to-no way of evaluating empathy or bias, such as the Slavonic family,
more manageable. Using chosen LLMs and various prompt types, we demonstrate the
empathy evaluation with the framework, including multiple-choice answers and
free generation. The variance in our initial evaluation sample is small and we
were unable to measure convincing differences between the empathetic
understanding in contexts given by different social groups. However, the
results are promising because the models showed significant alterations in their
reasoning chains needed to capture the relatively subtle changes in the
prompts. This provides the basis for future research into the construction of
the evaluation sample and statistical methods for measuring the results.
[LINK]
http://arxiv.org/abs/2411.05777v2
[DATE]
2024-11-15 02:35:19+08:00
[CATEGORIES]
cs.CL
Adaptive Decoding via Latent Preference Optimization
[AUTHORS]
Shehzaad Dhuliawala, Ilia Kulikov, Ping Yu, Asli Celikyilmaz, Jason Weston, Sainbayar Sukhbaatar, Jack Lanchantin
[ABSTRACT]
During language model decoding, it is known that using higher temperature
sampling gives more creative responses, while lower temperatures are more
factually accurate. However, such models are commonly applied to general
instruction following, which involves both creative and fact seeking tasks,
using a single fixed temperature across all examples and tokens. In this work,
we introduce Adaptive Decoding, a layer added to the model to select the
sampling temperature dynamically at inference time, at either the token or
example level, in order to optimize performance. To learn its parameters we
introduce Latent Preference Optimization (LPO), a general approach to train
discrete latent variables such as choices of temperature. Our method
outperforms all fixed decoding temperatures across a range of tasks that
require different temperatures, including UltraFeedback, Creative Story
Writing, and GSM8K.
[LINK]
http://arxiv.org/abs/2411.09661v1
[DATE]
2024-11-15 02:31:39+08:00
[CATEGORIES]
cs.CL
On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse
[AUTHORS]
Alkis Kalavasis, Anay Mehrotra, Grigoris Velegkas
[ABSTRACT]
Specifying all desirable properties of a language model is challenging, but
certain requirements seem essential. Given samples from an unknown language,
the trained model should produce valid strings not seen in training and be
expressive enough to capture the language’s full richness. Otherwise,
outputting invalid strings constitutes “hallucination,” and failing to capture
the full range leads to “mode collapse.” We ask if a language model can meet
both requirements.
We investigate this within a statistical language generation setting building
on Gold and Angluin. Here, the model receives random samples from a
distribution over an unknown language K, which belongs to a possibly infinite
collection of languages. The goal is to generate unseen strings from K. We say
the model generates from K with consistency and breadth if, as training size
increases, its output converges to all unseen strings in K.
Kleinberg and Mullainathan [KM24] asked if consistency and breadth in
language generation are possible. We answer this negatively: for a large class
of language models, including next-token prediction models, this is impossible
for most collections of candidate languages. This contrasts with [KM24]’s
result, showing consistent generation without breadth is possible for any
countable collection of languages. Our finding highlights that generation with
breadth fundamentally differs from generation without breadth.
As a byproduct, we establish near-tight bounds on the number of samples
needed for generation with or without breadth.
Finally, our results offer hope: consistent generation with breadth is
achievable for any countable collection of languages when negative examples
(strings outside K) are available alongside positive ones. This suggests that
post-training feedback, which encodes negative examples, can be crucial in
reducing hallucinations while limiting mode collapse.
[COMMENTS]
Abstract shortened to fit arXiv limit
[LINK]
http://arxiv.org/abs/2411.09642v1
[DATE]
2024-11-15 02:06:55+08:00
[CATEGORIES]
cs.LG
cs.CL
VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models
[AUTHORS]
Hang Gao, Yongfeng Zhang
[ABSTRACT]
Vector retrieval algorithms are essential for semantic queries within the
rapidly evolving landscape of Large Language Models (LLMs). The ability to
retrieve vectors that satisfy both similarity and diversity criteria
substantially enhances the performance of LLMs. Although Maximal Marginal
Relevance (MMR) is widely employed in retrieval scenarios requiring relevance
and diversity, variations in the parameter $\lambda$ lead to fluctuations that
complicate the optimization trajectory in vector spaces. This obscures the
direction of improvement and highlights the lack of a robust theoretical
analysis regarding similarity and diversity constraints in retrieval processes.
To address these challenges, this paper introduces a novel approach that
characterizes both constraints through the relationship between the sum vector
and the query vector. The proximity of these vectors ensures the similarity
constraint, while requiring individual vectors within the sum vector to diverge
in their alignment with the query vector satisfies the diversity constraint. We
first formulate a new combinatorial optimization problem, selecting k vectors
from a candidate set such that their sum vector maximally aligns with the query
vector, and demonstrate that this problem is NP-complete. This result
underscores the inherent difficulty of simultaneously achieving similarity and
diversity in vector retrieval, thereby providing a theoretical foundation for
future research. Subsequently, we present the heuristic algorithm Vectors
Retrieval with Similarity and Diversity, VRSD, which features a clear
optimization objective and eliminates the need for preset parameters. VRSD also
achieves a modest reduction in time complexity compared to MMR. Empirical
validation confirms that VRSD significantly outperforms MMR across various
datasets.
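
The optimization problem stated above (pick k candidates whose sum vector aligns best with the query) can be made concrete with a small sketch. The greedy rule below is a simple stand-in chosen for illustration and should not be read as the VRSD algorithm itself; the function names are hypothetical, and the code assumes no partial sum is the zero vector.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_sum_alignment(query, candidates, k):
    """Greedily add the candidate that best aligns the running sum with the query."""
    selected = []
    total = np.zeros_like(query, dtype=float)
    remaining = list(range(len(candidates)))
    for _ in range(k):
        best = max(remaining, key=lambda i: cosine(total + candidates[i], query))
        selected.append(best)
        total = total + candidates[best]
        remaining.remove(best)
    return selected, cosine(total, query)

query = np.array([1.0, 0.0])
candidates = np.array([[1.0, 1.0], [1.0, -1.0], [0.9, 0.1], [-1.0, 0.0]])
selected, score = greedy_sum_alignment(query, candidates, k=2)
```

On this toy input the greedy rule selects candidates 2 and 1, although candidates 0 and 1 sum to a vector exactly parallel to the query. A greedy choice is therefore not guaranteed to be optimal, which fits the hardness result in the abstract.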
[LINK]
http://arxiv.org/abs/2407.04573v2
[DATE]
2024-11-15 02:01:10+08:00
[CATEGORIES]
cs.CL
Value Residual Learning For Alleviating Attention Concentration In Transformers
[AUTHORS]
Zhanchao Zhou, Tianyi Wu, Zhiyun Jiang, Zhenzhong Lan
[ABSTRACT]
Transformers can capture long-range dependencies using self-attention,
allowing tokens to attend to all others directly. However, stacking multiple
attention layers leads to attention concentration. One natural way to address
this issue is to use cross-layer attention, allowing information from earlier
layers to be directly accessible to later layers. However, this approach is
computationally expensive. To address this problem, we propose Transformer with
residual value (ResFormer) which approximates cross-layer attention through
adding a residual connection from the values of the first layer to all
subsequent layers. Based on this method, one variant is the Transformer with
single layer value (SVFormer), where all layers share the same value embedding
from the first layer, reducing the KV cache by nearly 50%. Comprehensive
empirical evidence demonstrates that ResFormer mitigates the attention
concentration problem in deeper layers and enhances representation across most
layers, outperforming the vanilla Transformer, DenseFormer, and NeuTRENO in
training error as well as downstream tasks. Further visualization results
suggest that ResFormer alleviates attention sinks through avoiding value-state
drains. SVFormer trains significantly faster than the vanilla Transformer and
performs better than other methods like GQA and CLA, with performance
influenced by sequence length and cumulative learning rate.
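
A rough single-head sketch of the value residual idea follows. The abstract does not give the exact mixing rule, so the equal-weight blend controlled by `mix`, the absence of layer normalization and feed-forward blocks, and all function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with max subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_layer(x, wq, wk, wv, v_first=None, mix=0.5):
    """One attention layer; optionally blend its values with the first layer's values."""
    q, k, v = x @ wq, x @ wk, x @ wv
    if v_first is not None:
        v = mix * v + (1.0 - mix) * v_first  # value residual (assumed form)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return x + attn @ v, v

def forward(x, layers, mix=0.5):
    """Stack layers, reusing the values computed by the first layer."""
    v_first = None
    for wq, wk, wv in layers:
        x, v = attention_layer(x, wq, wk, wv, v_first, mix)
        if v_first is None:
            v_first = v
    return x
```

Setting `mix=1.0` recovers a vanilla stack of attention layers, which makes the effect of the residual easy to isolate in experiments.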
[LINK]
http://arxiv.org/abs/2410.17897v2
[DATE]
2024-11-15 01:46:04+08:00
[CATEGORIES]
cs.CL
PTR: Precision-Driven Tool Recommendation for Large Language Models
[AUTHORS]
Hang Gao, Yongfeng Zhang
[ABSTRACT]
By augmenting Large Language Models (LLMs) with external tools, their
capacity to solve complex problems has been significantly enhanced. However,
despite ongoing advancements in the parsing capabilities of LLMs, incorporating
all available tools simultaneously in the prompt remains impractical due to the
vast number of external tools. Consequently, it is essential to provide LLMs
with a precise set of tools tailored to the specific task, considering both
quantity and quality. Current tool retrieval methods primarily focus on
refining the ranking list of tools and directly packaging a fixed number of
top-ranked tools as the tool set. However, these approaches often fail to equip
LLMs with the optimal set of tools prior to execution, since the optimal number
of tools for different tasks could be different, resulting in inefficiencies
such as redundant or unsuitable tools, which impede immediate access to the
most relevant tools. This paper addresses the challenge of recommending precise
toolsets for LLMs. We introduce the problem of tool recommendation, define its
scope, and propose a novel Precision-driven Tool Recommendation (PTR) approach.
PTR captures an initial, concise set of tools by leveraging historical tool
bundle usage and dynamically adjusts the tool set by performing tool matching,
culminating in a multi-view-based tool addition. Additionally, we present a new
dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness
of tool recommendation for LLMs. We further validate our design choices through
comprehensive experiments, demonstrating promising accuracy across two open
benchmarks and our RecTools dataset.
[LINK]
http://arxiv.org/abs/2411.09613v1
[DATE]
2024-11-15 01:33:36+08:00
[CATEGORIES]
cs.CL
Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework
[AUTHORS]
Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin
[ABSTRACT]
This report provides an initial look at partial results from the TREC 2024
Retrieval-Augmented Generation (RAG) Track. We have identified RAG evaluation
as a barrier to continued progress in information access (and more broadly,
natural language processing and artificial intelligence), and it is our hope
that we can contribute to tackling the many challenges in this space. The
central hypothesis we explore in this work is that the nugget evaluation
methodology, originally developed for the TREC Question Answering Track in
2003, provides a solid foundation for evaluating RAG systems. As such, our
efforts have focused on “refactoring” this methodology, specifically applying
large language models to both automatically create nuggets and to automatically
assign nuggets to system answers. We call this the AutoNuggetizer framework.
Within the TREC setup, we are able to calibrate our fully automatic process
against a manual process whereby nuggets are created by human assessors
semi-manually and then assigned manually to system answers. Based on initial
results across 21 topics from 45 runs, we observe a strong correlation between
scores derived from a fully automatic nugget evaluation and a (mostly) manual
nugget evaluation by human assessors. This suggests that our fully automatic
evaluation process can be used to guide future iterations of RAG systems.
[LINK]
http://arxiv.org/abs/2411.09607v1
[DATE]
2024-11-15 01:25:43+08:00
[CATEGORIES]
cs.CL
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
[AUTHORS]
Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, Xiaohui Zeng
[ABSTRACT]
This work explores expanding the capabilities of large language models (LLMs)
pretrained on text to generate 3D meshes within a unified model. This offers
key advantages of (1) leveraging spatial knowledge already embedded in LLMs,
derived from textual sources like 3D tutorials, and (2) enabling conversational
3D generation and mesh understanding. A primary challenge is effectively
tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly.
To address this, we introduce LLaMA-Mesh, a novel approach that represents the
vertex coordinates and face definitions of 3D meshes as plain text, allowing
direct integration with LLMs without expanding the vocabulary. We construct a
supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate
3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs
as required, and (3) understand and interpret 3D meshes. Our work is the first
to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge
for 3D mesh generation in a text-based format, effectively unifying the 3D and
text modalities. LLaMA-Mesh achieves mesh generation quality on par with models
trained from scratch while maintaining strong text generation performance.
[COMMENTS]
See the project website at
https://research.nvidia.com/labs/toronto-ai/LLaMA-Mesh/
[LINK]
http://arxiv.org/abs/2411.09595v1
[DATE]
2024-11-15 01:08:23+08:00
[CATEGORIES]
cs.LG
cs.CL
BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
[AUTHORS]
Akari Haga, Akiyo Fukatsu, Miyu Oba, Arianna Bisazza, Yohei Oseki
[ABSTRACT]
While current large language models have achieved remarkable success, their
data efficiency remains a challenge to overcome. Recently it has been suggested
that child-directed speech (CDS) can improve training data efficiency of modern
language models based on Transformer neural networks. However, it is not yet
understood which specific properties of CDS are effective for training these
models. In the context of the BabyLM Challenge, we focus on Variation Sets
(VSs), sets of consecutive utterances expressing a similar intent with slightly
different words and structures, which are ubiquitous in CDS. To assess the
impact of VSs on training data efficiency, we augment CDS data with different
proportions of artificial VSs and use these datasets to train an
auto-regressive model, GPT-2. We find that the best proportion of VSs depends
on the evaluation benchmark: BLiMP and GLUE scores benefit from the presence of
VSs, but EWOK scores do not. Additionally, the results vary depending on
multiple factors such as the number of epochs and the order of utterance
presentation. Taken together, these findings suggest that VSs can have a
beneficial influence on language models, while leaving room for further
investigation.
[COMMENTS]
This paper was accepted to the BabyLM Challenge 2024 at CoNLL 2024
[LINK]
http://arxiv.org/abs/2411.09587v1
[DATE]
2024-11-15 00:57:46+08:00
[CATEGORIES]
cs.CL
RETR: Multi-View Radar Detection Transformer for Indoor Perception
[AUTHORS]
Ryoma Yataka, Adriano Cardace, Pu Perry Wang, Petros Boufounos, Ryuhei Takahashi
[ABSTRACT]
Indoor radar perception has seen rising interest due to affordable costs
driven by emerging automotive imaging radar developments and the benefits of
reduced privacy concerns and reliability under hazardous conditions (e.g., fire
and smoke). However, existing radar perception pipelines fail to account for
distinctive characteristics of the multi-view radar setting. In this paper, we
propose Radar dEtection TRansformer (RETR), an extension of the popular DETR
architecture, tailored for multi-view radar perception. RETR inherits the
advantages of DETR, eliminating the need for hand-crafted components for object
detection and segmentation in the image plane. More importantly, RETR
incorporates carefully designed modifications such as 1) depth-prioritized
feature similarity via a tunable positional encoding (TPE); 2) a tri-plane loss
from both radar and camera coordinates; and 3) a learnable radar-to-camera
transformation via reparameterization, to account for the unique multi-view
radar setting. Evaluated on two indoor radar perception datasets, our approach
outperforms existing state-of-the-art methods by a margin of 15.38+ AP for
object detection and 11.77+ IoU for instance segmentation, respectively.
[COMMENTS]
24 pages, Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.10293v1
[DATE]
2024-11-15 23:51:25+08:00
[CATEGORIES]
cs.LG
Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing
[AUTHORS]
Ismail Erbas, Aporva Amarnath, Vikas Pandey, Karthik Swaminathan, Naigang Wang, Xavier Intes
[ABSTRACT]
Fluorescence lifetime imaging (FLI) is a widely used technique in the
biomedical field for measuring the decay times of fluorescent molecules,
providing insights into metabolic states, protein interactions, and
ligand-receptor bindings. However, its broader application in fast biological
processes, such as dynamic activity monitoring, and clinical use, such as in
guided surgery, is limited by long data acquisition times and computationally
demanding data processing. While deep learning has reduced post-processing
times, time-resolved data acquisition remains a bottleneck for real-time
applications. To address this, we propose a method to achieve real-time FLI
using an FPGA-based hardware accelerator. Specifically, we implemented a
GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with
time-resolved cameras. The GRU model balances accurate processing with the
resource constraints of FPGAs, which have limited DSP units and BRAM. The
limited memory and computational resources on the FPGA require efficient
scheduling of operations and memory allocation to deploy deep learning models
for low-latency applications. We address these challenges by using STOMP, a
queue-based discrete-event simulator that automates and optimizes task
scheduling and memory management on hardware. By integrating a GRU-based
Seq2Seq model and its compressed version, called Seq2SeqLite, generated through
knowledge distillation, we were able to process multiple pixels in parallel,
reducing latency compared to sequential processing. We explore various levels
of parallelism to achieve an optimal balance between performance and resource
utilization. Our results indicate that the proposed techniques achieved a 17.7x
and 52.0x speedup over manual scheduling for the Seq2Seq model and the
Seq2SeqLite model, respectively.
[COMMENTS]
7 pages, 6 figures
[LINK]
http://arxiv.org/abs/2410.07364v2
[DATE]
2024-11-15 23:46:00+08:00
[CATEGORIES]
cs.LG
Harnessing Machine Learning for Single-Shot Measurement of Free Electron Laser Pulse Power
[AUTHORS]
Till Korten, Vladimir Rybnikov, Mathias Vogt, Juliane Roensch-Schulenburg, Peter Steinbach, Najmeh Mirian
[ABSTRACT]
Electron beam accelerators are essential in many scientific and technological
fields. Their operation relies heavily on the stability and precision of the
electron beam. Traditional diagnostic techniques encounter difficulties in
addressing the complex and dynamic nature of electron beams. Particularly in
the context of free-electron lasers (FELs), it is fundamentally impossible to
measure the lasing-on and lasing-off electron power profiles for a single
electron bunch. This is a crucial hurdle in the exact reconstruction of the
photon pulse profile. To overcome this hurdle, we developed a machine learning
model that predicts the temporal power profile of the electron bunch in the
lasing-off regime using machine parameters that can be obtained when lasing is
on. The model was statistically validated and showed superior predictions
compared to the state-of-the-art batch calibrations. The work we present here
is a critical element for a virtual pulse reconstruction diagnostic (VPRD) tool
designed to reconstruct the power profile of individual photon pulses without
requiring repeated measurements in the lasing-off regime. This promises to
significantly enhance the diagnostic capabilities in FELs at large.
[COMMENTS]
10 pages, 4 figures, Machine Learning and the Physical Sciences
Workshop, NeurIPS 2024 https://neurips.cc/virtual/2024/100009
[LINK]
http://arxiv.org/abs/2411.09468v2
[DATE]
2024-11-15 23:38:17+08:00
[CATEGORIES]
cs.LG
Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation
[AUTHORS]
Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt
[LINK]
http://arxiv.org/abs/2411.10281v1
[DATE]
2024-11-15 23:36:48+08:00
[CATEGORIES]
cs.LG
Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review
[AUTHORS]
Hossein Hassani, Roozbeh Razavi-Far, Mehrdad Saif, Liang Lin
[ABSTRACT]
Reinforcement learning (RL) is a sub-domain of machine learning, mainly
concerned with solving sequential decision-making problems by a learning agent
that interacts with the decision environment to improve its behavior through
the reward it receives from the environment. This learning paradigm is,
however, well-known for being time-consuming due to the necessity of collecting
a large amount of data, making RL suffer from sample inefficiency and difficult
generalization. Furthermore, the construction of an explicit reward function
that accounts for the trade-off between multiple desiderata of a decision
problem is often a laborious task. These challenges have been recently
addressed utilizing transfer and inverse reinforcement learning (T-IRL). In
this regard, this paper is devoted to a comprehensive review of realizing the
sample efficiency and generalization of RL algorithms through T-IRL. Following
a brief introduction to RL, the fundamental T-IRL methods are presented and the
most recent advancements in each research field have been extensively reviewed.
Our findings indicate that a majority of recent research works have dealt with
the aforementioned challenges by utilizing human-in-the-loop and sim-to-real
strategies for the efficient transfer of knowledge from source domains to the
target domain under the transfer learning scheme. Under the IRL structure,
training schemes that require a low number of experience transitions and
extension of such frameworks to multi-agent and multi-intention problems have
been the priority of researchers in recent years.
[LINK]
http://arxiv.org/abs/2411.10268v1
[DATE]
2024-11-15 23:18:57+08:00
[CATEGORIES]
cs.LG
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
[AUTHORS]
Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard Mccreadie
[ABSTRACT]
State-of-the-art pre-trained image models predominantly adopt a two-stage
approach: initial unsupervised pre-training on large-scale datasets followed by
task-specific fine-tuning using Cross-Entropy loss (CE). However, it has been
demonstrated that CE can compromise model generalization and stability. While
recent works employing contrastive learning address some of these limitations
by enhancing the quality of embeddings and producing better decision
boundaries, they often overlook the importance of hard negative mining and rely
on resource intensive and slow training using large sample batches. To counter
these issues, we introduce a novel approach named CLCE, which integrates
Label-Aware Contrastive Learning with CE. Our approach not only maintains the
strengths of both loss functions but also leverages hard negative mining in a
synergistic way to enhance performance. Experimental results demonstrate that
CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks,
achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in
transfer learning settings with the BEiT-3 model. Importantly, our proposed
CLCE approach effectively mitigates the dependency of contrastive learning on
large batch sizes such as 4096 samples per batch, a limitation that has
previously constrained the application of contrastive learning in
budget-limited hardware environments.
[LINK]
http://arxiv.org/abs/2402.14551v2
[DATE]
2024-11-15 23:16:56+08:00
[CATEGORIES]
cs.LG
MDHP-Net: Detecting Injection Attacks on In-vehicle Network using Multi-Dimensional Hawkes Process and Temporal Model
[AUTHORS]
Qi Liu, Yanchen Liu, Ruifeng Li, Chenhong Cao, Yufeng Li, Xingyu Li, Peng Wang, Runhan Feng
[ABSTRACT]
The integration of intelligent and connected technologies in modern vehicles,
while offering enhanced functionalities through Electronic Control Units and
interfaces like OBD-II and telematics, also exposes the vehicle’s in-vehicle
network (IVN) to potential cyberattacks. In this paper, we consider a specific
type of cyberattack known as the injection attack. As demonstrated by empirical
data from real-world cybersecurity adversarial competitions (available at
https://mimic2024.xctf.org.cn/race/qwmimic2024 ), these injection attacks have
an excitation effect over time, gradually manipulating network traffic and
disrupting the vehicle’s normal functioning, ultimately compromising both its
stability and safety. To profile the abnormal behavior of attackers, we propose
a novel injection attack detector to extract long-term features of attack
behavior. Specifically, we first provide a theoretical analysis of modeling the
time-excitation effects of the attack using Multi-Dimensional Hawkes Process
(MDHP). A gradient descent solver specifically tailored for MDHP, MDHP-GDS, is
developed to accurately estimate optimal MDHP parameters. We then propose an
injection attack detector, MDHP-Net, which integrates optimal MDHP parameters
with MDHP-LSTM blocks to enhance temporal feature extraction. By introducing
MDHP parameters, MDHP-Net captures complex temporal features that standard Long
Short-Term Memory (LSTM) cannot, enriching temporal dependencies within our
customized structure. Extensive evaluations demonstrate the effectiveness of
our proposed detection approach.
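The time-excitation idea behind a multi-dimensional Hawkes process can be illustrated with a minimal sketch. It assumes the textbook exponential kernel, where the intensity of dimension i is mu_i plus a sum of alpha_ij * exp(-beta * (t - t_k)) over past events t_k in each dimension j. The function name, the shared decay rate beta, and the toy parameters are illustrative assumptions, not the paper's MDHP-GDS solver or MDHP-LSTM block.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of every dimension at time t (exponential kernel).

    events[j] : past event times observed in dimension j
    mu[i]     : baseline rate of dimension i
    alpha[i][j]: how strongly an event in dimension j excites dimension i
    beta      : decay rate shared by all kernels (an assumption of this sketch)
    """
    dims = len(mu)
    lam = []
    for i in range(dims):
        excitation = 0.0
        for j in range(dims):
            for t_k in events[j]:
                if t_k < t:  # only events strictly in the past contribute
                    excitation += alpha[i][j] * math.exp(-beta * (t - t_k))
        lam.append(mu[i] + excitation)
    return lam

# Toy example: one event in dimension 0 at t = 1.0, none in dimension 1.
mu = [0.1, 0.2]
alpha = [[0.5, 0.0], [0.3, 0.4]]
events = [[1.0], []]
lam_before = hawkes_intensity(0.5, events, mu, alpha, beta=1.0)
lam_after = hawkes_intensity(1.5, events, mu, alpha, beta=1.0)
```

Before the event the intensities equal the baselines; afterwards both dimensions are raised, and the increase decays as t grows.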
[LINK]
http://arxiv.org/abs/2411.10258v1
[DATE]
2024-11-15 23:05:01+08:00
[CATEGORIES]
cs.LG
The Unreasonable Effectiveness of Guidance for Diffusion Models
[AUTHORS]
Tim Kaiser, Nikolas Adaloglou, Markus Kollmann
[ABSTRACT]
Guidance is an error-correcting technique used to improve the perceptual
quality of images generated by diffusion models. Typically, the correction is
achieved by linear extrapolation, using an auxiliary diffusion model that has
lower performance than the primary model. Using a 2D toy example, we show that
it is highly beneficial when the auxiliary model exhibits errors similar to
those of the primary one, but stronger. We verify this finding in higher dimensions, where we
show that competitive generative performance to state-of-the-art guidance
methods can be achieved when the auxiliary model differs from the primary one
only by having stronger weight regularization. As an independent contribution,
we investigate whether upweighting long-range spatial dependencies improves
visual fidelity. The result is a novel guidance method, which we call sliding
window guidance (SWG), that guides the primary model with itself by
constraining its receptive field. Intriguingly, SWG aligns better with human
preferences than state-of-the-art guidance methods while requiring neither
training, architectural modifications, nor class conditioning. The code will be
released.
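The linear extrapolation used by guidance can be sketched in a few lines. The snippet follows one common convention, guided = primary + w * (primary - auxiliary), applied element-wise to the two models' predictions. The function name, the list-based representation, and the guidance weight w are assumptions for illustration; this is not the authors' SWG implementation.

```python
def guided_prediction(primary, auxiliary, w):
    """Push the primary prediction away from the auxiliary one.

    primary, auxiliary : per-element predictions of the two models
    w                  : guidance weight; w = 0 returns the primary prediction
    """
    return [p + w * (p - a) for p, a in zip(primary, auxiliary)]

primary = [1.0, 2.0]
auxiliary = [0.5, 2.5]   # same kind of error as the primary, but larger
guided = guided_prediction(primary, auxiliary, w=2.0)
unguided = guided_prediction(primary, auxiliary, w=0.0)
```

Each component moves further in the direction that separates the primary model from the weaker auxiliary model, which is the error-correcting effect the abstract describes.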
[COMMENTS]
Preprint. 19 pages, 14 figures in total, including references and
appendix
[LINK]
http://arxiv.org/abs/2411.10257v1
[DATE]
2024-11-15 23:04:04+08:00
[CATEGORIES]
cs.LG
Uncertainty in Supply Chain Digital Twins: A Quantum-Classical Hybrid Approach
[AUTHORS]
Abdullah Abdullah, Fannya Ratana Sandjaja, Ayesha Abdul Majeed, Gyan Wickremasinghe, Karen Rafferty, Vishal Sharma
[ABSTRACT]
This study investigates uncertainty quantification (UQ) using
quantum-classical hybrid machine learning (ML) models for applications in
complex and dynamic fields, such as attaining resiliency in supply chain
digital twins and financial risk assessment. Although quantum feature
transformations have been integrated into ML models for complex data tasks, a
gap exists in determining their impact on UQ within their hybrid architectures
(quantum-classical approach). This work applies existing UQ techniques for
different models within a hybrid framework, examining how quantum feature
transformation affects uncertainty propagation. Increasing qubits from 4 to 16
shows varied model responsiveness to outlier detection (OD) samples, which is a
critical factor for resilient decision-making in dynamic environments. This
work shows how quantum computing techniques can transform data features for UQ,
particularly when combined with traditional methods.
[LINK]
http://arxiv.org/abs/2411.10254v1
[DATE]
2024-11-15 23:02:35+08:00
[CATEGORIES]
cs.LG
Learning rheological parameters of non-Newtonian fluids from velocimetry data
[AUTHORS]
Alexandros Kontogiannis, Richard Hodgkinson, Emily L. Manchester
[ABSTRACT]
We solve a Bayesian inverse Navier-Stokes (N-S) problem that assimilates
velocimetry data in order to jointly reconstruct the flow field and learn the
unknown N-S parameters. By incorporating a Carreau shear-thinning viscosity
model into the N-S problem, we devise an algorithm that learns the most likely
Carreau parameters of a shear-thinning fluid, and estimates their
uncertainties, from velocimetry data alone. We then conduct a flow-MRI
experiment to obtain velocimetry data of an axisymmetric laminar jet through an
idealised medical device (FDA nozzle) for a blood analogue fluid. We show that
the algorithm can successfully reconstruct the flow field by learning the most
likely Carreau parameters, and that the learned parameters are in very good
agreement with rheometry measurements. The algorithm accepts any algebraic
effective viscosity model, as long as the model is differentiable, and it can
be extended to more complicated non-Newtonian fluids (e.g. Oldroyd-B fluid) if
a viscoelastic model is incorporated into the N-S problem.
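For reference, the standard Carreau model gives the effective viscosity as mu(gamma) = mu_inf + (mu_0 - mu_inf) * (1 + (lambda * gamma)^2)^((n - 1) / 2), where gamma is the shear rate. The sketch below evaluates that formula only. The parameter values are illustrative numbers of the kind often quoted for blood-like fluids, and are not the values learned in the paper.

```python
def carreau_viscosity(shear_rate, mu_0, mu_inf, lam, n):
    """Effective viscosity of a Carreau fluid at a given shear rate.

    mu_0   : zero-shear viscosity
    mu_inf : infinite-shear viscosity
    lam    : relaxation time constant
    n      : power-law index (n < 1 gives shear thinning)
    """
    return mu_inf + (mu_0 - mu_inf) * (1.0 + (lam * shear_rate) ** 2) ** ((n - 1.0) / 2.0)

# Illustrative (assumed) parameters in SI units.
params = dict(mu_0=0.056, mu_inf=0.0035, lam=3.313, n=0.3568)
at_rest = carreau_viscosity(0.0, **params)
at_low_shear = carreau_viscosity(1.0, **params)
at_high_shear = carreau_viscosity(100.0, **params)
```

The function is smooth in all four parameters, which is the differentiability property the abstract requires of any algebraic viscosity model plugged into the algorithm.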
[LINK]
http://arxiv.org/abs/2408.02604v2
[DATE]
2024-11-15 22:55:00+08:00
[CATEGORIES]
cs.LG
Efficient Neural Hybrid System Learning and Transition System Abstraction for Dynamical Systems
[AUTHORS]
Yejiang Yang, Zihao Mo, Weiming Xiang
[ABSTRACT]
This paper proposes a neural network hybrid modeling framework for dynamics
learning to promote an interpretable, computationally efficient way of dynamics
learning and system identification. First, a low-level model will be trained to
learn the system dynamics, which utilizes multiple simple neural networks to
approximate the local dynamics generated from data-driven partitions. Then,
based on the low-level model, a high-level model will be trained to abstract
the low-level neural hybrid system model into a transition system that allows
Computation Tree Logic (CTL) verification, improving the model’s suitability for
human interaction and its verification efficiency.
[LINK]
http://arxiv.org/abs/2411.10240v1
[DATE]
2024-11-15 22:53:34+08:00
[CATEGORIES]
cs.LG
GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization
[AUTHORS]
Yingzhou Lu, Minjie Shen, Ling Yue, Chenhao Li, Lulu Chen, Fan Meng, Xiao Wang, David Herrington, Yue Wang, Yue Zhao, Tianfan Fu, Capucine Van Rechem
[ABSTRACT]
The surge in high-throughput omics data has reshaped the landscape of
biological research, underlining the need for powerful, user-friendly data
analysis and interpretation tools. This paper presents GenoCraft, a web-based
comprehensive software solution designed to handle the entire pipeline of omics
data processing. GenoCraft offers a unified platform featuring advanced
bioinformatics tools, covering all aspects of omics data analysis. It
encompasses a range of functionalities, such as normalization, quality control,
differential analysis, network analysis, pathway analysis, and diverse
visualization techniques. This software makes state-of-the-art omics data
analysis more accessible to a wider range of users. With GenoCraft, researchers
and data scientists have access to an array of cutting-edge bioinformatics
tools under a user-friendly interface, making it a valuable resource for
managing and analyzing large-scale omics data. The API with an interactive web
interface is publicly available at https://genocraft.stanford.edu/. We also
release all the code at https://github.com/futianfan/GenoCraft.
[LINK]
http://arxiv.org/abs/2312.14249v3
[DATE]
2024-11-15 22:49:21+08:00
[CATEGORIES]
cs.LG
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
[AUTHORS]
Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel
[ABSTRACT]
Transformer-based Super-Resolution (SR) models have recently advanced image
reconstruction quality, yet challenges remain due to computational complexity
and an over-reliance on large patch sizes, which constrain fine-grained detail
enhancement. In this work, we propose TaylorIR to address these limitations by
utilizing a patch size of 1x1, enabling pixel-level processing in any
transformer-based SR model. To address the significant computational demands
under the traditional self-attention mechanism, we employ the TaylorShift
attention mechanism, a memory-efficient alternative based on Taylor series
expansion, achieving full token-to-token interactions with linear complexity.
Experimental results demonstrate that our approach achieves new
state-of-the-art SR performance while reducing memory consumption by up to 60%
compared to traditional self-attention-based transformers.
[LINK]
http://arxiv.org/abs/2411.10231v1
[DATE]
2024-11-15 22:43:58+08:00
[CATEGORIES]
cs.LG
RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively
[AUTHORS]
Yao Lu, Peixin Zhang, Jingyi Wang, Lei Ma, Xiaoniu Yang, Qi Xuan
[ABSTRACT]
Deep learning has revolutionized computing in many real-world applications,
arguably due to its remarkable performance and extreme convenience as an
end-to-end solution. However, deep learning models can be costly to train and
to use, especially for those large-scale models, making it necessary to
optimize the original overly complicated models into smaller ones in scenarios
with limited resources such as mobile applications or simply for resource
saving. The key question in such model optimization is how to effectively
identify and measure the redundancy in a deep learning model structure. While
several common metrics exist in the popular model optimization techniques to
measure the performance of models after optimization, they are not able to
quantitatively inform the degree of remaining redundancy. To address the
problem, we present a novel testing approach, i.e., RedTest, which proposes a
novel testing metric called Model Structural Redundancy Score (MSRS) to
quantitatively measure the degree of redundancy in a deep learning model
structure. We first show that MSRS is effective in both revealing and assessing
the redundancy issues in many state-of-the-art models, which urgently calls for
model optimization. Then, we utilize MSRS to assist deep learning model
developers in two practical application scenarios: 1) in Neural Architecture
Search, we design a novel redundancy-aware algorithm to guide the search for
the optimal model structure and demonstrate its effectiveness by comparing it
to existing standard NAS practice; 2) in the pruning of large-scale pre-trained
models, we prune the redundant layers of pre-trained models with the guidance
of layer similarity to derive less redundant ones of much smaller size.
Extensive experimental results demonstrate that removing such redundancy has a
negligible effect on the model utility.
[LINK]
http://arxiv.org/abs/2411.10507v1
[DATE]
2024-11-15 22:36:07+08:00
[CATEGORIES]
cs.LG
A Survey on State-of-the-art Deep Learning Applications and Challenges
[AUTHORS]
Mohd Halim Mohd Noor, Ayokunle Olalekan Ige
[ABSTRACT]
Deep learning, a branch of artificial intelligence, is a data-driven method
that uses multiple layers of interconnected units (neurons) to learn intricate
patterns and representations directly from raw input data. Empowered by this
learning capability, it has become a powerful tool for solving complex problems
and is the core driver of many groundbreaking technologies and innovations.
Building a deep learning model is challenging due to the algorithm’s complexity
and the dynamic nature of real-world problems. Several studies have reviewed
deep learning concepts and applications. However, the studies mostly focused on
the types of deep learning models and convolutional neural network
architectures, offering limited coverage of the state-of-the-art deep learning
models and their applications in solving complex problems across different
domains. Therefore, motivated by the limitations, this study aims to
comprehensively review the state-of-the-art deep learning models in computer
vision, natural language processing, time series analysis and pervasive
computing. We highlight the key features of the models and their effectiveness
in solving the problems within each domain. Furthermore, this study presents
the fundamentals of deep learning, various deep learning model types and
prominent convolutional neural network architectures. Finally, challenges and
future directions in deep learning research are discussed to offer a broader
perspective for future researchers.
[LINK]
http://arxiv.org/abs/2403.17561v5
[DATE]
2024-11-15 22:30:43+08:00
[CATEGORIES]
cs.LG
Supra-Laplacian Encoding for Transformer on Dynamic Graphs
[AUTHORS]
Yannis Karmim, Marc Lafon, Raphael Fournier S’niehotta, Nicolas Thome
[ABSTRACT]
Fully connected Graph Transformers (GT) have rapidly become prominent in the
static graph community as an alternative to Message-Passing models, which
suffer from a lack of expressivity, oversquashing, and under-reaching. However,
in a dynamic context, by interconnecting all nodes at multiple snapshots with
self-attention, GTs lose both structural and temporal information. In this
work, we introduce Supra-LAplacian encoding for spatio-temporal TransformErs
(SLATE), a new spatio-temporal encoding to leverage the GT architecture while
keeping spatio-temporal information. Specifically, we transform Discrete Time
Dynamic Graphs into multi-layer graphs and take advantage of the spectral
properties of their associated supra-Laplacian matrix. Our second contribution
explicitly models nodes’ pairwise relationships with a cross-attention
mechanism, providing an accurate edge representation for dynamic link
prediction. SLATE outperforms numerous state-of-the-art methods based on
Message-Passing Graph Neural Networks combined with recurrent models (e.g.,
LSTM), and Dynamic Graph Transformers, on 9 datasets. Code is available at:
github.com/ykrmm/SLATE.
[LINK]
http://arxiv.org/abs/2409.17986v2
[DATE]
2024-11-15 22:22:00+08:00
[CATEGORIES]
cs.LG
Embedding Byzantine Fault Tolerance into Federated Learning via Virtual Data-Driven Consistency Scoring Plugin
[AUTHORS]
Youngjoon Lee, Jinu Gong, Joonhyuk Kang
[ABSTRACT]
Given sufficient data from multiple edge devices, federated learning (FL)
enables training a shared model without transmitting private data to a central
server. However, FL is generally vulnerable to Byzantine attacks from
compromised edge devices, which can significantly degrade the model
performance. In this paper, we propose an intuitive plugin that can be
integrated into existing FL techniques to achieve Byzantine resilience. The key
idea is to generate virtual data samples and evaluate model consistency scores
across local updates to effectively filter out compromised edge devices. By
utilizing this scoring mechanism before the aggregation phase, the proposed
plugin enables existing FL techniques to become robust against Byzantine
attacks while maintaining their original benefits. Numerical results on a medical
image classification task validate that plugging the proposed approach into
representative FL algorithms effectively achieves Byzantine resilience.
Furthermore, the proposed plugin maintains the original convergence properties
of the base FL algorithms when no Byzantine attacks are present.
[COMMENTS]
7 pages
[LINK]
http://arxiv.org/abs/2411.10212v1
[DATE]
2024-11-15 22:17:19+08:00
[CATEGORIES]
cs.LG
Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport
[AUTHORS]
Michael Wilson, Tom Needham, Anuj Srivastava
[ABSTRACT]
Wasserstein distances form a family of metrics on spaces of probability
measures that have recently seen many applications. However, statistical
analysis in these spaces is complex due to the nonlinearity of Wasserstein
spaces. One potential solution to this problem is Linear Optimal Transport
(LOT). This method allows one to find a Euclidean embedding, called LOT
embedding, of measures in some Wasserstein spaces, but some information is lost
in this embedding. So, to understand whether statistical analysis relying on
LOT embeddings can make valid inferences about original data, it is helpful to
quantify how well these embeddings describe that data. To answer this question,
we present a decomposition of the Fréchet variance of a set of measures in
the 2-Wasserstein space, which allows one to compute the percentage of variance
explained by LOT embeddings of those measures. We then extend this
decomposition to the Fused Gromov-Wasserstein setting. We also present several
experiments that explore the relationship between the dimension of the LOT
embedding, the percentage of variance explained by the embedding, and the
classification accuracy of machine learning classifiers built on the embedded
data. We use the MNIST handwritten digits dataset, IMDB-50000 dataset, and
Diffusion Tensor MRI images for these experiments. Our results illustrate the
effectiveness of low dimensional LOT embeddings in terms of the percentage of
variance explained and the classification accuracy of models built on the
embedded data.
[LINK]
http://arxiv.org/abs/2411.10204v1
[DATE]
2024-11-15 22:10:52+08:00
[CATEGORIES]
cs.LG
DEEP-IoT: Downlink-Enhanced Efficient-Power Internet of Things
[AUTHORS]
Yulin Shao
[ABSTRACT]
At the heart of the Internet of Things (IoT) – a domain witnessing explosive
growth – the imperative for energy efficiency and the extension of device
lifespans has never been more pressing. This paper presents DEEP-IoT, an
innovative communication paradigm poised to redefine how IoT devices
communicate. Through a pioneering feedback channel coding strategy, DEEP-IoT
challenges and transforms the traditional transmitter (IoT devices)-centric
communication model to one where the receiver (the access point) plays a pivotal
role, thereby cutting down energy use and boosting device longevity. We not
only conceptualize DEEP-IoT but also actualize it by integrating deep
learning-enhanced feedback channel codes within a narrow-band system.
Simulation results show a significant enhancement in the operational lifespan
of IoT cells – surpassing traditional systems using Turbo and Polar codes by
up to 52.71%. This leap signifies a paradigm shift in IoT communications,
setting the stage for a future where IoT devices boast unprecedented efficiency
and durability.
[LINK]
http://arxiv.org/abs/2403.00321v3
[DATE]
2024-11-15 21:42:28+08:00
[CATEGORIES]
cs.LG
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
[AUTHORS]
Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
[ABSTRACT]
The paper concerns the $d$-dimensional stochastic approximation recursion, $$
\theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ where $ \{
\Phi_n \}$ is a stochastic process on a general state space, satisfying a
conditional Markov property that allows for parameter-dependent noise. The main
results are established under additional conditions on the mean flow and a
version of the Donsker-Varadhan Lyapunov drift condition known as (DV3):
(i) An appropriate Lyapunov function is constructed that implies convergence
of the estimates in $L_4$.
(ii) A functional central limit theorem (CLT) is established, as well as the
usual one-dimensional CLT for the normalized error. Moment bounds combined with
the CLT imply convergence of the normalized covariance $\textsf{E}[ z_n z_n^T
]$ to the asymptotic covariance in the CLT, where $z_n =:
(\theta_n-\theta^*)/\sqrt{\alpha_n}$.
(iii) The CLT holds for the normalized version $z^{\text{PR}}_n =: \sqrt{n}
[\theta^{\text{PR}}_n -\theta^*]$, of the averaged parameters
$\theta^{\text{PR}}_n =: n^{-1} \sum_{k=1}^n\theta_k$, subject to standard
assumptions on the step-size. Moreover, the covariance in the CLT coincides
with the minimal covariance of Polyak and Ruppert.
(iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and
$\Phi$ is a geometrically ergodic Markov chain but does not satisfy (DV3).
While the algorithm is convergent, the second moment of $\theta_n$ is unbounded
and in fact diverges.
This arXiv version represents a major extension of the results in prior
versions. The main results now allow for parameter-dependent noise, as is often
the case in applications to reinforcement learning.
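The recursion and the Polyak-Ruppert average can be made concrete with a scalar toy problem. The sketch assumes f(theta, phi) = phi - theta, so that the root of the mean flow is theta* = E[phi]; i.i.d. Gaussian noise (a trivial Markov chain); and step size alpha_n = n^(-0.7). All of these choices, and the function name, are illustrative assumptions and are far simpler than the setting analysed in the paper.

```python
import random

def run_sa(n_steps, theta_star=2.0, seed=0):
    """Run theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, phi_{n+1}).

    Returns the final iterate and the Polyak-Ruppert average
    n^{-1} * sum_{k=1}^{n} theta_k.
    """
    rng = random.Random(seed)
    theta = 0.0
    running_sum = 0.0
    for n in range(1, n_steps + 1):
        phi = rng.gauss(theta_star, 1.0)       # noisy observation, mean theta_star
        alpha = n ** -0.7                      # step size with exponent in (1/2, 1)
        theta = theta + alpha * (phi - theta)  # f(theta, phi) = phi - theta
        running_sum += theta
    return theta, running_sum / n_steps

theta_last, theta_pr = run_sa(20000)
```

Both estimates settle near theta* = 2.0 in this toy setting; the averaged estimate is the one whose asymptotic covariance item (iii) identifies with the Polyak-Ruppert minimum.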
[COMMENTS]
This arXiv version represents a major extension of the results in
prior versions. The main results now allow for parameter-dependent noise, as
is often the case in applications to reinforcement learning. 2 figures
[LINK]
http://arxiv.org/abs/2110.14427v6
[DATE]
2024-11-15 21:34:34+08:00
[CATEGORIES]
cs.LG
CART: Compositional Auto-Regressive Transformer for Image Generation
[AUTHORS]
Siddharth Roheda
[ABSTRACT]
In recent years, image synthesis has achieved remarkable advancements,
enabling diverse applications in content creation, virtual reality, and beyond.
We introduce a novel approach to image generation using Auto-Regressive (AR)
modeling, which leverages a next-detail prediction strategy for enhanced
fidelity and scalability. While AR models have achieved transformative success
in language modeling, replicating this success in vision tasks has presented
unique challenges due to the inherent spatial dependencies in images. Our
proposed method addresses these challenges by iteratively adding finer details
to an image compositionally, constructing it as a hierarchical combination of
base and detail image factors. This strategy is shown to be more effective than
the conventional next-token prediction and even surpasses the state-of-the-art
next-scale prediction approaches. A key advantage of this method is its
scalability to higher resolutions without requiring full model retraining,
making it a versatile solution for high-resolution image generation.
[COMMENTS]
under review at CVPR 2025
[LINK]
http://arxiv.org/abs/2411.10180v1
[DATE]
2024-11-15 21:29:44+08:00
[CATEGORIES]
cs.LG
The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning
[AUTHORS]
Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Joschka Boedecker
[ABSTRACT]
Visual Reinforcement Learning (RL) methods often require extensive amounts of
data. As opposed to model-free RL, model-based RL (MBRL) offers a potential
solution with efficient data utilization through planning. Additionally, RL
lacks generalization capabilities for real-world tasks. Prior work has shown
that incorporating pre-trained visual representations (PVRs) enhances sample
efficiency and generalization. While PVRs have been extensively studied in the
context of model-free RL, their potential in MBRL remains largely unexplored.
In this paper, we benchmark a set of PVRs on challenging control tasks in a
model-based RL setting. We investigate the data efficiency, generalization
capabilities, and the impact of different properties of PVRs on the performance
of model-based agents. Our results, perhaps surprisingly, reveal that for MBRL
current PVRs are not more sample efficient than learning representations from
scratch, and that they do not generalize better to out-of-distribution (OOD)
settings. To explain this, we analyze the quality of the trained dynamics
model. Furthermore, we show that data diversity and network architecture are
the most important contributors to OOD generalization performance.
[COMMENTS]
Published at the 38th Conference on Neural Information Processing
Systems (NeurIPS 2024). Project page: https://schneimo.com/pvr4mbrl/
[LINK]
http://arxiv.org/abs/2411.10175v1
[DATE]
2024-11-15 21:21:26+08:00
[CATEGORIES]
cs.LG
Continuous Bayesian Model Selection for Multivariate Causal Discovery
[AUTHORS]
Anish Dhir, Ruby Sedgwick, Avinash Kori, Ben Glocker, Mark van der Wilk
[ABSTRACT]
Current causal discovery approaches require restrictive model assumptions or
assume access to interventional data to ensure structure identifiability. These
assumptions often do not hold in real-world applications, leading to a loss of
guarantees and poor accuracy in practice. Recent work has shown that, in the
bivariate case, Bayesian model selection can greatly improve accuracy by
exchanging restrictive modelling for more flexible assumptions, at the cost of
a small probability of error. We extend the Bayesian model selection approach
to the important multivariate setting by making the large discrete selection
problem scalable through a continuous relaxation. We demonstrate how for our
choice of Bayesian non-parametric model, the Causal Gaussian Process
Conditional Density Estimator (CGP-CDE), an adjacency matrix can be constructed
from the model hyperparameters. This adjacency matrix is then optimised using
the marginal likelihood and an acyclicity regulariser, outputting the maximum a
posteriori causal graph. We demonstrate the competitiveness of our approach on
both synthetic and real-world datasets, showing it is possible to perform
multivariate causal discovery without infeasible assumptions using Bayesian
model selection.
[LINK]
http://arxiv.org/abs/2411.10154v1
[DATE]
2024-11-15 20:55:05+08:00
[CATEGORIES]
cs.LG
DaYu: Data-Driven Model for Geostationary Satellite Observed Cloud Images Forecasting
[AUTHORS]
Xujun Wei, Feng Zhang, Renhe Zhang, Wenwen Li, Cuiping Liu, Bin Guo, Jingwei Li, Haoyang Fu, Xu Tang
[ABSTRACT]
In the past few years, Artificial Intelligence (AI)-based weather forecasting
methods have widely demonstrated strong competitiveness among the weather
forecasting systems. However, these methods are insufficient for
high-spatial-resolution short-term nowcasting within 6 hours, which is crucial
for warning of short-duration, mesoscale and small-scale weather events.
Geostationary satellite remote sensing provides detailed, high spatio-temporal
and all-day observations, which can address the above limitations of existing
methods. Therefore, this paper proposes an advanced data-driven thermal
infrared cloud image forecasting model, “DaYu.” Unlike existing data-driven
weather forecasting models, DaYu is specifically designed for geostationary
satellite observations, with a temporal resolution of 0.5 hours and a spatial
resolution of ${0.05}^\circ$ $\times$ ${0.05}^\circ$. DaYu is based on a
large-scale transformer architecture, which enables it to capture fine-grained
cloud structures and learn fast-changing spatio-temporal evolution features
effectively. Moreover, its attention mechanism design achieves a balance in
computational complexity, making it practical for applications. DaYu not only
achieves accurate forecasts up to 3 hours with a correlation coefficient higher
than 0.9, 6 hours higher than 0.8, and 12 hours higher than 0.7, but also
detects short-duration, mesoscale, and small-scale weather events with enhanced
detail, effectively addressing the shortcomings of existing methods in
providing detailed short-term nowcasting within 6 hours. Furthermore, DaYu has
significant potential in short-term climate disaster prevention and mitigation.
[LINK]
http://arxiv.org/abs/2411.10144v1
[DATE]
2024-11-15 20:36:01+08:00
[CATEGORIES]
cs.LG
Arithmetical Binary Decision Tree Traversals
[AUTHORS]
Jinxiong Zhang
[ABSTRACT]
This paper introduces a series of methods for traversing binary decision
trees using arithmetic operations. We present a suite of binary tree traversal
algorithms that leverage novel representation matrices to flatten the full
binary tree structure and embed the aggregated internal node Boolean tests into
a single binary vector. Our approach, grounded in maximum inner product search,
offers new insights into decision trees.
[COMMENTS]
Correct some citation formats and typos
[LINK]
http://arxiv.org/abs/2209.04825v8
[DATE]
2024-11-15 20:29:07+08:00
[CATEGORIES]
cs.LG
FGCE: Feasible Group Counterfactual Explanations for Auditing Fairness
[AUTHORS]
Christos Fragkathoulas, Vasiliki Papanikou, Evaggelia Pitoura, Evimaria Terzi
[ABSTRACT]
This paper introduces the first graph-based framework for generating group
counterfactual explanations to audit model fairness, a crucial aspect of
trustworthy machine learning. Counterfactual explanations are instrumental in
understanding and mitigating unfairness by revealing how inputs should change
to achieve a desired outcome. Our framework, named Feasible Group
Counterfactual Explanations (FGCEs), captures real-world feasibility
constraints and constructs subgroups with similar counterfactuals, setting it
apart from existing methods. It also addresses key trade-offs in counterfactual
generation, including the balance between the number of counterfactuals, their
associated costs, and the breadth of coverage achieved. To evaluate these
trade-offs and assess fairness, we propose measures tailored to group
counterfactual generation. Our experimental results on benchmark datasets
demonstrate the effectiveness of our approach in managing feasibility
constraints and trade-offs, as well as the potential of our proposed metrics in
identifying and quantifying fairness issues.
[LINK]
http://arxiv.org/abs/2410.22591v2
[DATE]
2024-11-15 20:02:15+08:00
[CATEGORIES]
cs.LG
On the Universal Statistical Consistency of Expansive Hyperbolic Deep Convolutional Neural Networks
[AUTHORS]
Sagar Ghosh, Kushal Bose, Swagatam Das
[ABSTRACT]
Deep Convolutional Neural Networks (DCNNs) have become a
pervasive tool for a wide range of applications in computer vision.
Despite their potential to capture intricate patterns in the data,
the underlying embedding space remains Euclidean and primarily pursues
contractive convolution. Several instances can serve as a precedent for the
exacerbating performance of DCNNs. The recent advancement of neural networks in
the hyperbolic spaces gained traction, incentivizing the development of
convolutional deep neural networks in the hyperbolic space. In this work, we
propose a Hyperbolic DCNN based on the Poincaré Disc. The work predominantly
revolves around analyzing the nature of expansive convolution in the context of
the non-Euclidean domain. We further offer extensive theoretical insights
pertaining to the universal consistency of the expansive convolution in the
hyperbolic space. Several simulations were performed not only on the synthetic
datasets but also on some real-world datasets. The experimental results reveal
that the hyperbolic convolutional architecture outperforms the Euclidean ones
by a commendable margin.
[LINK]
http://arxiv.org/abs/2411.10128v1
[DATE]
2024-11-15 20:01:03+08:00
[CATEGORIES]
cs.LG
Adversarial Robustness of VAEs across Intersectional Subgroups
[AUTHORS]
Chethan Krishnamurthy Ramanaik, Arjun Roy, Eirini Ntoutsi
[ABSTRACT]
Despite advancements in Autoencoders (AEs) for tasks like dimensionality
reduction, representation learning and data generation, they remain vulnerable
to adversarial attacks. Variational Autoencoders (VAEs), with their
probabilistic approach to disentangling latent spaces, show stronger resistance
to such perturbations compared to deterministic AEs; however, their resilience
against adversarial inputs is still a concern. This study evaluates the
robustness of VAEs against non-targeted adversarial attacks by optimizing
minimal sample-specific perturbations to cause maximal damage across diverse
demographic subgroups (combinations of age and gender). We investigate two
questions: whether there are robustness disparities among subgroups, and what
factors contribute to these disparities, such as data scarcity and
representation entanglement. Our findings reveal that robustness disparities
exist but are not always correlated with the size of the subgroup. By using
downstream gender and age classifiers and examining latent embeddings, we
highlight the vulnerability of subgroups like older women, who are prone to
misclassification due to adversarial perturbations pushing their
representations toward those of other subgroups.
[LINK]
http://arxiv.org/abs/2407.03864v2
[DATE]
2024-11-15 19:51:10+08:00
[CATEGORIES]
cs.LG
Energy-GNoME: A Living Database of Selected Materials for Energy Applications
[AUTHORS]
Paolo De Angelis, Giovanni Trezza, Giulio Barletta, Pietro Asinari, Eliodoro Chiavazzo
[ABSTRACT]
Artificial Intelligence (AI) in materials science is driving significant
advancements in the discovery of advanced materials for energy applications.
The recent GNoME protocol identifies over 380,000 novel stable crystals. From
this, we identify over 33,000 materials with potential as energy materials,
forming the Energy-GNoME database. Leveraging Machine Learning (ML) and Deep
Learning (DL) tools, our protocol mitigates cross-domain data bias using
feature spaces to identify potential candidates for thermoelectric materials,
novel battery cathodes, and novel perovskites. Classifiers with both structural
and compositional features identify domains of applicability, where we expect
enhanced accuracy of the regressors. Such regressors are trained to predict key
materials properties such as thermoelectric figure of merit (zT), band gap (Eg),
and cathode voltage ($\Delta V_c$). This method significantly narrows the pool
of potential candidates, serving as an efficient guide for experimental and
computational chemistry investigations and accelerating the discovery of
materials suited for electricity generation, energy storage and conversion.
[COMMENTS]
60 pages, 16 figures
[LINK]
http://arxiv.org/abs/2411.10125v1
[DATE]
2024-11-15 19:48:14+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks Do Not Always Oversmooth
[AUTHORS]
Bastian Epping, Alexandre René, Moritz Helias, Michael T. Schaub
[ABSTRACT]
Graph neural networks (GNNs) have emerged as powerful tools for processing
relational data in applications. However, GNNs suffer from the problem of
oversmoothing, the property that the features of all nodes exponentially
converge to the same vector over layers, prohibiting the design of deep GNNs.
In this work we study oversmoothing in graph convolutional networks (GCNs) by
using their Gaussian process (GP) equivalence in the limit of infinitely many
hidden features. By generalizing methods from conventional deep neural networks
(DNNs), we can describe the distribution of features at the output layer of
deep GCNs in terms of a GP: as expected, we find that typical parameter choices
from the literature lead to oversmoothing. The theory, however, allows us to
identify a new, non-oversmoothing phase: if the initial weights of the network
have sufficiently large variance, GCNs do not oversmooth, and node features
remain informative even at large depth. We demonstrate the validity of this
prediction in finite-size GCNs by training a linear classifier on their output.
Moreover, using the linearization of the GCN GP, we generalize the concept of
propagation depth of information from DNNs to GCNs. This propagation depth
diverges at the transition between the oversmoothing and non-oversmoothing
phase. We test the predictions of our approach and find good agreement with
finite-size GCNs. Initializing GCNs near the transition to the
non-oversmoothing phase, we obtain networks which are both deep and expressive.
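
As a rough illustration of the oversmoothing effect described above, the
following sketch repeatedly averages node features over neighbours on a small
graph and tracks how far apart the node features remain. It is not the
authors' GP analysis: the 4-node path graph, the weight-free averaging step and
the names `propagate` and `spread_after` are all assumptions made for this
example, and it does not model the large-weight-variance regime in which the
paper reports that oversmoothing is avoided.

```python
def propagate(features, adjacency):
    """One linear propagation step: each node averages itself and its neighbours."""
    n = len(features)
    out = []
    for i in range(n):
        # Include the node itself (self-loop) plus every adjacent node.
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        out.append(sum(features[j] for j in neighbours) / len(neighbours))
    return out


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Hypothetical 4-node path graph 0-1-2-3 (illustrative only).
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]


def spread_after(layers, features=(1.0, 0.0, 0.0, -1.0)):
    """Feature spread after a given number of propagation layers."""
    x = list(features)
    for _ in range(layers):
        x = propagate(x, ADJ)
    return spread(x)


if __name__ == "__main__":
    for depth in (0, 1, 5, 20, 50):
        print(depth, spread_after(depth))
```

The spread shrinks with depth because plain neighbourhood averaging pulls all
node features toward a common value, which is the behaviour the abstract calls
oversmoothing.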
[LINK]
http://arxiv.org/abs/2406.02269v2
[DATE]
2024-11-15 19:41:56+08:00
[CATEGORIES]
cs.LG
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
[AUTHORS]
Mathis Koroglu, Hugo Caselles-Dupré, Guillaume Jeanneret Sanmiguel, Matthieu Cord
[ABSTRACT]
We consider the problem of text-to-video generation tasks with precise
control for various applications such as camera movement control and
video-to-video editing. Most methods tackling this problem rely on providing
user-defined controls, such as binary masks or camera movement embeddings. We
propose OnlyFlow, an approach that leverages optical flow first extracted
from an input video to condition the motion of generated
videos. Using a text prompt and an input video, OnlyFlow allows the user to
generate videos that respect the motion of the input video as well as the text
prompt. This is implemented through an optical flow estimation model applied on
the input video, which is then fed to a trainable optical flow encoder. The
output feature maps are then injected into the text-to-video backbone model. We
perform quantitative, qualitative and user preference studies to show that
OnlyFlow compares favorably to state-of-the-art methods on a wide range of
tasks, even though OnlyFlow was not specifically trained for such tasks.
OnlyFlow thus constitutes a versatile, lightweight yet efficient method for
controlling motion in text-to-video generation. Models and code will be made
available on GitHub and HuggingFace.
[LINK]
http://arxiv.org/abs/2411.10501v1
[DATE]
2024-11-15 19:19:25+08:00
[CATEGORIES]
cs.LG
Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces
[AUTHORS]
Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu
[ABSTRACT]
To effectively search for the optimal motion template in dynamic
multidimensional space, this paper proposes a novel optimization algorithm,
Dynamic Dimension Wrapping (DDW). The algorithm combines Dynamic Time Warping
(DTW) and Euclidean distance, and designs a fitness function that adapts to
dynamic multidimensional space by establishing a time-data chain mapping across
dimensions. This paper also proposes a novel update mechanism, Optimal Dimension
Collection (ODC), which, combined with the search strategy of traditional
optimization algorithms, enables DDW to adjust both the dimension values and the
number of dimensions of the population individuals simultaneously. In this way, DDW
significantly reduces computational complexity and improves search accuracy.
Experimental results show that DDW performs excellently in dynamic
multidimensional space, outperforming 31 traditional optimization algorithms.
This algorithm provides a novel approach to solving dynamic multidimensional
optimization problems and demonstrates broad application potential in fields
such as motion data analysis.
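
For readers unfamiliar with the DTW building block mentioned above, here is a
minimal sketch of the standard dynamic-programming DTW distance between two
1-D sequences. It shows only textbook DTW with an absolute-difference local
cost; how DDW combines it with Euclidean distance in its fitness function is
not specified in the abstract, and the function name `dtw_distance` is an
assumption for this example.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Sequences of different length can still align perfectly under warping.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([0, 0], [1, 1]))
```

Unlike a pointwise Euclidean distance, this alignment cost is defined for
sequences of different lengths, which is presumably why it is useful when the
number of dimensions itself varies.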
[LINK]
http://arxiv.org/abs/2407.11626v3
[DATE]
2024-11-15 19:01:19+08:00
[CATEGORIES]
cs.LG
Neural Port-Hamiltonian Models for Nonlinear Distributed Control: An Unconstrained Parametrization Approach
[AUTHORS]
Muhammad Zakwan, Giancarlo Ferrari-Trecate
[ABSTRACT]
The control of large-scale cyber-physical systems requires optimal
distributed policies relying solely on limited communication with neighboring
agents. However, computing stabilizing controllers for nonlinear systems while
optimizing complex costs remains a significant challenge. Neural Networks
(NNs), known for their expressivity, can be leveraged to parametrize control
policies that yield good performance. However, NNs’ sensitivity to small input
changes poses a risk of destabilizing the closed-loop system. Many existing
approaches enforce constraints on the controllers’ parameter space to guarantee
closed-loop stability, leading to computationally expensive optimization
procedures. To address these problems, we leverage the framework of
port-Hamiltonian systems to design continuous-time distributed control policies
for nonlinear systems that guarantee closed-loop stability and finite
$\mathcal{L}_2$ or incremental $\mathcal{L}_2$ gains, independent of the
optimization parameters of the controllers. This eliminates the need to
constrain parameters during optimization, allowing the use of standard
techniques such as gradient-based methods. Additionally, we discuss
discretization schemes that preserve the dissipation properties of these
controllers for implementation on embedded systems. The effectiveness of the
proposed distributed controllers is demonstrated through consensus control of
non-holonomic mobile robots subject to collision avoidance and averaged voltage
regulation with weighted power sharing in DC microgrids.
[COMMENTS]
The paper has 15 pages, and has been submitted for a possible
publication. arXiv admin note: text overlap with arXiv:2403.17785
[LINK]
http://arxiv.org/abs/2411.10096v1
[DATE]
2024-11-15 18:44:29+08:00
[CATEGORIES]
cs.LG
PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse
[AUTHORS]
Einari Vaaras, Manu Airaksinen, Okko Räsänen
[ABSTRACT]
Self-supervised learning (SSL) is a data-driven learning approach that
utilizes the innate structure of the data to guide the learning process. In
contrast to supervised learning, which depends on external labels, SSL utilizes
the inherent characteristics of the data to produce its own supervisory signal.
However, one frequent issue with SSL methods is representation collapse, where
the model outputs a constant input-invariant feature representation. This issue
hinders the potential application of SSL methods to new data modalities, as
trying to avoid representation collapse wastes researchers’ time and effort.
This paper introduces a novel SSL algorithm for time-series data called
Prediction of Functionals from Masked Latents (PFML). Instead of predicting
masked input signals or their latent representations directly, PFML operates by
predicting statistical functionals of the input signal corresponding to masked
embeddings, given a sequence of unmasked embeddings. The algorithm is designed
to avoid representation collapse, rendering it straightforwardly applicable to
different time-series data domains, such as novel sensor modalities in clinical
data. We demonstrate the effectiveness of PFML through complex, real-life
classification tasks across three different data modalities: infant posture and
movement classification from multi-sensor inertial measurement unit data,
emotion recognition from speech data, and sleep stage classification from EEG
data. The results show that PFML is superior to a conceptually similar
pre-existing SSL method and competitive against the current state-of-the-art
SSL method, while also being conceptually simpler and without suffering from
representation collapse.
[LINK]
http://arxiv.org/abs/2411.10087v1
[DATE]
2024-11-15 18:16:38+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks and Differential Equations: A hybrid approach for data assimilation of fluid flows
[AUTHORS]
M. Quattromini, M. A. Bucci, S. Cherubini, O. Semeraro
[ABSTRACT]
This study presents a novel hybrid approach that combines Graph Neural
Networks (GNNs) with Reynolds-Averaged Navier-Stokes (RANS) equations to
enhance the accuracy of mean flow reconstruction across a range of fluid
dynamics applications. Traditional purely data-driven Neural Network (NN)
models often struggle to maintain physical consistency. Moreover, they
typically require large datasets to achieve reliable performance. The GNN
framework, which naturally handles unstructured data such as complex geometries
in Computational Fluid Dynamics (CFD), is here integrated with RANS equations
as a physical baseline model. The methodology leverages the adjoint method,
enabling the use of RANS-derived gradients as optimization terms in the GNN
training process. This ensures that the learned model adheres to the governing
physics, maintaining physical consistency while improving the prediction
accuracy. We test our approach on multiple CFD scenarios, including cases
involving generalization with respect to the Reynolds number, sparse
measurements, denoising and inpainting of missing portions of the mean flow.
The results demonstrate significant improvements in the accuracy of the
reconstructed mean flow compared to purely data-driven models, using limited
amounts of data in the training dataset. The key strengths of this study are
the integration of physical laws into the training process of the GNN, and the
ability to achieve high-accuracy predictions with a limited amount of data,
making this approach particularly valuable for applications in fluid dynamics
where data is often scarce.
[LINK]
http://arxiv.org/abs/2411.09476v2
[DATE]
2024-11-15 18:09:33+08:00
[CATEGORIES]
cs.LG
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
[AUTHORS]
Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel
[ABSTRACT]
We address the problem of stochastic combinatorial semi-bandits, where a
player selects among P actions from the power set of a set containing d base
items. Adaptivity to the problem’s structure is essential in order to obtain
optimal regret upper bounds. As estimating the coefficients of a covariance
matrix can be manageable in practice, leveraging them should improve the
regret. We design “optimistic” covariance-adaptive algorithms relying on online
estimations of the covariance structure, called OLS-UCB-C and COS-V (only the
variances for the latter). They both yield improved gap-free regret. Although
COS-V can be slightly suboptimal, it improves on computational complexity by
taking inspiration from Thompson Sampling approaches. It is the first
sampling-based algorithm satisfying a T^{1/2} gap-free regret (up to poly-logs).
We also show that in some cases, our approach efficiently leverages the
semi-bandit feedback and outperforms bandit feedback approaches, not only in
exponential regimes where P >> d but also when P <= d, which is not covered by
existing analyses.
[LINK]
http://arxiv.org/abs/2402.15171v4
[DATE]
2024-11-15 17:41:26+08:00
[CATEGORIES]
cs.LG
Evidential Federated Learning for Skin Lesion Image Classification
[AUTHORS]
Rutger Hendrix, Federica Proietto Salanitri, Concetto Spampinato, Simone Palazzo, Ulas Bagci
[ABSTRACT]
We introduce FedEvPrompt, a federated learning approach that integrates
principles of evidential deep learning, prompt tuning, and knowledge
distillation for distributed skin lesion classification. FedEvPrompt leverages
two sets of prompts: b-prompts (for low-level basic visual knowledge) and
t-prompts (for task-specific knowledge) prepended to frozen pre-trained Vision
Transformer (ViT) models trained in an evidential learning framework to
maximize class evidences. Crucially, knowledge sharing across federation
clients is achieved only through knowledge distillation on attention maps
generated by the local ViT models, ensuring enhanced privacy preservation
compared to traditional parameter or synthetic image sharing methodologies.
FedEvPrompt is optimized within a round-based learning paradigm, where each
round involves training local models followed by sharing attention maps with
all federation clients. Experimental validation conducted in a real distributed
setting, on the ISIC2019 dataset, demonstrates the superior performance of
FedEvPrompt against baseline federated learning algorithms and knowledge
distillation methods, without sharing model parameters. In conclusion,
FedEvPrompt offers a promising approach for federated learning, effectively
addressing challenges such as data heterogeneity, imbalance, privacy
preservation, and knowledge sharing.
[COMMENTS]
Published as a conference paper at ICPR 2024
[LINK]
http://arxiv.org/abs/2411.10071v1
[DATE]
2024-11-15 17:34:28+08:00
[CATEGORIES]
cs.LG
Calibration of ordinal regression networks
[AUTHORS]
Daehwan Kim, Haejun Chung, Ikbeom Jang
[ABSTRACT]
Recent studies have shown that deep neural networks are not well-calibrated
and often produce over-confident predictions. The miscalibration issue
primarily stems from using cross-entropy in classifications, which aims to
align predicted softmax probabilities with one-hot labels. In ordinal
regression tasks, this problem is compounded by an additional challenge: the
expectation that softmax probabilities should exhibit unimodal distribution is
not met with cross-entropy. The ordinal regression literature has focused on
learning orders and overlooked calibration. To address both issues, we propose
a novel loss function that introduces order-aware calibration, ensuring that
prediction confidence adheres to ordinal relationships between classes. It
incorporates soft ordinal encoding and order-aware regularization to enforce
both calibration and unimodality. Extensive experiments across three popular
ordinal regression benchmarks demonstrate that our approach achieves
state-of-the-art calibration without compromising accuracy.
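
To make the idea of a soft ordinal encoding concrete, the sketch below builds
a target distribution whose mass decays with distance from the true rank,
then checks that it is unimodal. This is one common form of soft ordinal
label, used here as an assumption; the abstract does not state the exact
encoding, distance metric or regularizer used by the authors, and the names
`soft_ordinal_label`, `is_unimodal` and the `scale` parameter are hypothetical.

```python
import math


def soft_ordinal_label(true_rank, num_classes, scale=1.0):
    """Softmax over negative rank distances: nearer classes get more mass."""
    logits = [-scale * abs(k - true_rank) for k in range(num_classes)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_unimodal(probs):
    """True if probs rise (weakly) to a single peak and then fall (weakly)."""
    peak = probs.index(max(probs))
    rising = all(probs[i] <= probs[i + 1] for i in range(peak))
    falling = all(probs[i] >= probs[i + 1] for i in range(peak, len(probs) - 1))
    return rising and falling


if __name__ == "__main__":
    target = soft_ordinal_label(true_rank=2, num_classes=5)
    print([round(p, 3) for p in target], is_unimodal(target))
```

A one-hot target treats every wrong class as equally wrong; a soft target of
this kind encodes that predicting an adjacent rank is a smaller error than
predicting a distant one.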
[LINK]
http://arxiv.org/abs/2410.15658v2
[DATE]
2024-11-15 17:34:23+08:00
[CATEGORIES]
cs.LG
Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening
[AUTHORS]
Zhangfan Yang, Junkai Ji, Shan He, Jianqiang Li, Ruibin Bai, Zexuan Zhu, Yew Soon Ong
[ABSTRACT]
Molecular docking enables virtual screening of compound libraries to identify
potential ligands that target proteins of interest, a crucial step in drug
development; however, as the size of the compound library increases, the
computational complexity of traditional docking models increases. Deep learning
algorithms can provide data-driven research and development models to increase
the speed of the docking process. Unfortunately, few models can achieve
superior screening performance compared to that of traditional models.
Therefore, a novel deep learning-based docking approach named Dockformer is
introduced in this study. Dockformer leverages multimodal information to
capture the geometric topology and structural knowledge of molecules and can
directly generate binding conformations with the corresponding confidence
measures in an end-to-end manner. The experimental results show that Dockformer
achieves success rates of 90.53% and 82.71% on the PDBbind core set and
PoseBusters benchmarks, respectively, and more than a 100-fold increase in the
inference process speed, outperforming almost all state-of-the-art docking
methods. In addition, the ability of Dockformer to identify the main protease
inhibitors of coronaviruses is demonstrated in a real-world virtual screening
scenario. Considering its high docking accuracy and screening efficiency,
Dockformer can be regarded as a powerful and robust tool in the field of drug
design.
[COMMENTS]
14 pages, 10 figures
[LINK]
http://arxiv.org/abs/2411.06740v2
[DATE]
2024-11-15 17:31:52+08:00
[CATEGORIES]
cs.LG
Adaptive Physics-Guided Neural Network
[AUTHORS]
David Shulman, Itai Dattner
[ABSTRACT]
This paper introduces an adaptive physics-guided neural network (APGNN)
framework for predicting quality attributes from image data by integrating
physical laws into deep learning models. The APGNN adaptively balances
data-driven and physics-informed predictions, enhancing model accuracy and
robustness across different environments. Our approach is evaluated on both
synthetic and real-world datasets, with comparisons to conventional data-driven
models such as ResNet. For the synthetic data, 2D domains were generated using
three distinct governing equations: the diffusion equation, the
advection-diffusion equation, and the Poisson equation. Non-linear
transformations were applied to these domains to emulate complex physical
processes in image form.
In real-world experiments, the APGNN consistently demonstrated superior
performance in the diverse thermal image dataset. On the cucumber dataset,
characterized by low material diversity and controlled conditions, APGNN and
PGNN showed similar performance, both outperforming the data-driven ResNet.
However, in the more complex thermal dataset, particularly for outdoor
materials with higher environmental variability, APGNN outperformed both PGNN
and ResNet by dynamically adjusting its reliance on physics-based versus
data-driven insights. This adaptability allowed APGNN to maintain robust
performance across structured, low-variability settings and more heterogeneous
scenarios. These findings underscore the potential of adaptive physics-guided
learning to integrate physical constraints effectively, even in challenging
real-world contexts with diverse environmental conditions.
[LINK]
http://arxiv.org/abs/2411.10064v1
[DATE]
2024-11-15 17:28:55+08:00
[CATEGORIES]
cs.LG
Unsupervised Congestion Status Identification Using LMP Data
[AUTHORS]
Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, Le Xie
[ABSTRACT]
Having a better understanding of how locational marginal prices (LMPs) change
helps in price forecasting and market strategy making. This paper investigates
the fundamental distribution of the congestion part of LMPs in high-dimensional
Euclidean space using an unsupervised approach. LMP models based on the
lossless and lossy DC optimal power flow (DC-OPF) are analyzed to show the
overlapping subspace property of the LMP data. The congestion part of LMPs is
spanned by certain row vectors of the power transfer distribution factor (PTDF)
matrix, and the subspace attributes of an LMP vector are found to uniquely
reflect the instantaneous congestion status of all the transmission lines. The
proposed method searches for the basis vectors that span the subspaces of
congestion LMP data in hierarchical ways. In the bottom-up search, the data
belonging to 1-dimensional subspaces are detected, and other data are projected
on the orthogonal subspaces. This procedure is repeated until all the basis
vectors are found or the basis gap appears. Top-down searching is used to
address the basis gap by hyperplane detection with outliers. Once all the basis
vectors are detected, the congestion status can be identified. Numerical
experiments based on the IEEE 30-bus system, IEEE 118-bus system, Illinois
200-bus system, and Southwest Power Pool are conducted to show the performance
of the proposed method.
[COMMENTS]
Paper accepted for IEEE Transactions on Smart Grid. Personal use of
this material is permitted. Permission from IEEE must be obtained for all
other uses
[LINK]
http://arxiv.org/abs/2411.10058v1
[DATE]
2024-11-15 17:21:54+08:00
[CATEGORIES]
cs.LG
KuaiFormer: Transformer-Based Retrieval at Kuaishou
[AUTHORS]
Chi Liu, Jiangxia Cao, Rui Huang, Kai Zheng, Qiang Luo, Kun Gai, Guorui Zhou
[ABSTRACT]
In large-scale content recommendation systems, retrieval serves as the
initial stage in the pipeline, responsible for selecting thousands of candidate
items from billions of options to pass on to ranking modules. Traditionally,
the dominant retrieval method has been Embedding-Based Retrieval (EBR) using a
Deep Neural Network (DNN) dual-tower structure. Applying transformers to
retrieval tasks has been the focus of recent research, but real-world
industrial deployment still presents significant challenges. In this paper, we
introduce KuaiFormer, a novel transformer-based retrieval framework deployed in
a large-scale content recommendation system. KuaiFormer fundamentally redefines
the retrieval process by shifting from conventional score estimation tasks
(such as click-through rate estimation) to a transformer-driven Next Action
Prediction paradigm. This shift enables more effective real-time interest
acquisition and multi-interest extraction, significantly enhancing retrieval
performance. KuaiFormer has been successfully integrated into Kuaishou App’s
short-video recommendation system since May 2024, serving over 400 million
daily active users and resulting in a marked increase in average daily usage
time of Kuaishou users. We provide insights into both the technical and
business aspects of deploying transformers in large-scale recommendation
systems, addressing practical challenges encountered during industrial
implementation. Our findings offer valuable guidance for engineers and
researchers aiming to leverage transformer models to optimize large-scale
content recommendation systems.
[LINK]
http://arxiv.org/abs/2411.10057v1
[DATE]
2024-11-15 17:20:46+08:00
[CATEGORIES]
cs.LG
That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design
[AUTHORS]
Anna Goldie, Azalia Mirhoseini, Jeff Dean
[ABSTRACT]
In 2020, we introduced a deep reinforcement learning method capable of
generating superhuman chip layouts, which we then published in Nature and
open-sourced on GitHub. AlphaChip has inspired an explosion of work on AI for
chip design, and has been deployed in state-of-the-art chips across Alphabet
and extended by external chipmakers. Even so, a non-peer-reviewed invited paper
at ISPD 2023 questioned its performance claims, despite failing to run our
method as described in Nature. For example, it did not pre-train the RL method
(removing its ability to learn from prior experience), used substantially fewer
compute resources (20x fewer RL experience collectors and half as many GPUs),
did not train to convergence (standard practice in machine learning), and
evaluated on test cases that are not representative of modern chips. Recently,
Igor Markov published a meta-analysis of three papers: our peer-reviewed Nature
paper, the non-peer-reviewed ISPD paper, and Markov’s own unpublished paper
(though he does not disclose that he co-authored it). Although AlphaChip has
already achieved widespread adoption and impact, we publish this response to
ensure that no one is wrongly discouraged from innovating in this impactful
area.
[LINK]
http://arxiv.org/abs/2411.10053v1
[DATE]
2024-11-15 17:11:10+08:00
[CATEGORIES]
cs.LG
Physics-informed neural networks need a physicist to be accurate: the case of mass and heat transport in Fischer-Tropsch catalyst particles
[AUTHORS]
Tymofii Nikolaienko, Harshil Patel, Aniruddha Panda, Subodh Madhav Joshi, Stanislav Jaso, Kaushic Kalyanaraman
[ABSTRACT]
Physics-Informed Neural Networks (PINNs) have emerged as an influential
technology, merging the swift and automated capabilities of machine learning
with the precision and dependability of simulations grounded in theoretical
physics. PINNs are often employed to solve algebraic or differential equations
to replace some or even all steps of multi-stage computational workflows,
leading to their significant speed-up. However, wide adoption of PINNs is still
hindered by reliability issues, particularly at extreme ends of the input
parameter ranges. In this study, we demonstrate this in the context of a system
of coupled non-linear differential reaction-diffusion and heat transfer
equations related to Fischer-Tropsch synthesis, which are solved by a
finite-difference method with a PINN used in evaluating their source terms. It
is shown that the testing strategies traditionally used to assess the accuracy
of neural networks as function approximators can overlook the peculiarities
which ultimately cause instabilities of the finite-difference solver. We
propose domain knowledge-based modifications to the PINN architecture
ensuring its correct asymptotic behavior. When combined with an improved
numerical scheme employed as an initial guess generator, the proposed
modifications are shown to recover the overall stability of the simulations,
while preserving the speed-up brought by PINN as the workflow component. We
discuss the possible applications of the proposed hybrid transport equation
solver in the context of chemical reactor simulations.
[LINK]
http://arxiv.org/abs/2411.10048v1
[DATE]
2024-11-15 16:55:31+08:00
[CATEGORIES]
cs.LG
A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering
[AUTHORS]
Yang Xiang, Li Fan, Tulika Saha, Xiaoying Pang, Yushan Pan, Haiyang Zhang, Chengtao Ji
[ABSTRACT]
Graph clustering is an essential aspect of network analysis that involves
grouping nodes into separate clusters. Recent developments in deep learning
have resulted in advanced deep graph clustering techniques, which have proven
effective in many applications. Nonetheless, these methods often encounter
difficulties when dealing with the complexities of real-world graphs,
particularly in the presence of noisy edges. Additionally, many denoising graph
clustering strategies tend to suffer from lower performance compared to their
non-denoised counterparts, training instability, and challenges in scaling to
large datasets. To tackle these issues, we introduce a new framework called the
Dual Adaptive Assignment Approach for Robust Graph-Based Clustering (RDSA).
RDSA consists of three key components: (i) a node embedding module that
effectively integrates the graph’s topological features and node attributes;
(ii) a structure-based soft assignment module that improves graph modularity by
utilizing an affinity matrix for node assignments; and (iii) a node-based soft
assignment module that identifies community landmarks and refines node
assignments to enhance the model’s robustness. We assess RDSA on various
real-world datasets, demonstrating its superior performance relative to
existing state-of-the-art methods. Our findings indicate that RDSA provides
robust clustering across different graph types, excelling in clustering
effectiveness and robustness, including adaptability to noise, stability, and
scalability.
[LINK]
http://arxiv.org/abs/2410.21745v2
[DATE]
2024-11-15 16:54:31+08:00
[CATEGORIES]
cs.LG
Approximate Probabilistic Inference for Time-Series Data A Robust Latent Gaussian Model With Temporal Awareness
[AUTHORS]
Anton Johansson, Arunselvan Ramaswamy
[ABSTRACT]
The development of robust generative models for highly varied non-stationary
time series data is a complex yet important problem. Traditional models for
time series data prediction, such as Long Short-Term Memory (LSTM), are
inefficient and generalize poorly as they cannot capture complex temporal
relationships. In this paper, we present a probabilistic generative model that
can be trained to capture temporal information, and that is robust to data
errors. We call it Time Deep Latent Gaussian Model (tDLGM). Its novel
architecture is inspired by Deep Latent Gaussian Model (DLGM). Our model is
trained to minimize a loss function based on the negative log loss. One
contributing factor to the robustness of tDLGM is our regularizer, which
accounts for data trends. Experiments show that tDLGM is able to reconstruct
and generate complex time series data, and that it is robust to noise and
faulty data.
[COMMENTS]
New revision added a space between “for” and “Time-Series” in the
title
[LINK]
http://arxiv.org/abs/2411.09312v2
[DATE]
2024-11-15 16:17:22+08:00
[CATEGORIES]
cs.LG
Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching
[AUTHORS]
Yasi Zhang, Peiyu Yu, Yaxuan Zhu, Yingshan Chang, Feng Gao, Ying Nian Wu, Oscar Leong
[ABSTRACT]
Generative models based on flow matching have attracted significant attention
for their simplicity and superior performance in high-resolution image
synthesis. By leveraging the instantaneous change-of-variables formula, one can
directly compute image likelihoods from a learned flow, making them enticing
candidates as priors for downstream tasks such as inverse problems. In
particular, a natural approach would be to incorporate such image probabilities
in a maximum-a-posteriori (MAP) estimation problem. A major obstacle, however,
lies in the slow computation of the log-likelihood, as it requires
backpropagating through an ODE solver, which can be prohibitively slow for
high-dimensional problems. In this work, we propose an iterative algorithm to
approximate the MAP estimator efficiently to solve a variety of linear inverse
problems. Our algorithm is mathematically justified by the observation that the
MAP objective can be approximated by a sum of $N$ “local MAP” objectives,
where $N$ is the number of function evaluations. By leveraging Tweedie’s
formula, we show that we can perform gradient steps to sequentially optimize
these objectives. We validate our approach for various linear inverse problems,
such as super-resolution, deblurring, inpainting, and compressed sensing, and
demonstrate that we can outperform other methods based on flow matching. Code
is available at https://github.com/YasminZhang/ICTM.
[COMMENTS]
Accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2405.18816v3
[DATE]
2024-11-15 16:10:51+08:00
[CATEGORIES]
cs.LG
Model Inversion Attacks: A Survey of Approaches and Countermeasures
[AUTHORS]
Zhanke Zhou, Jianing Zhu, Fengfei Yu, Xuan Li, Xiong Peng, Tongliang Liu, Bo Han
[ABSTRACT]
The success of deep neural networks has driven numerous research studies and
applications from Euclidean to non-Euclidean data. However, there are
increasing concerns about privacy leakage, as these networks rely on processing
private data. Recently, a new type of privacy attack, model inversion
attacks (MIAs), has emerged; these attacks aim to extract sensitive features
of private training data by abusing access to a well-trained model. The
effectiveness of MIAs has been
demonstrated in various domains, including images, texts, and graphs. These
attacks highlight the vulnerability of neural networks and raise awareness
about the risk of privacy leakage within the research community. Despite their
significance, there is a lack of systematic studies that provide a
comprehensive overview and deeper insights into MIAs across different domains.
This survey aims to summarize up-to-date MIA methods in both attacks and
defenses, highlighting their contributions and limitations, underlying modeling
principles, optimization challenges, and future directions. We hope this survey
bridges the gap in the literature and facilitates future research in this
critical area. Besides, we are maintaining a repository to keep track of
relevant research at
https://github.com/AndrewZhou924/Awesome-model-inversion-attack.
[COMMENTS]
40 pages, 17 figures
[LINK]
http://arxiv.org/abs/2411.10023v1
[DATE]
2024-11-15 16:09:28+08:00
[CATEGORIES]
cs.LG
Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
[AUTHORS]
Shen Yuan, Haotian Liu, Hongteng Xu
[ABSTRACT]
While following different technical routes, both low-rank and orthogonal
adaptation techniques can efficiently adapt large-scale pre-trained models to
specific tasks or domains using a small number of trainable parameters. In
this study, we bridge the gap between these two techniques, proposing a simple
but effective adaptation method based on Householder reflections. Given a
pre-trained model, our method fine-tunes its layers by multiplying each frozen
weight matrix with an orthogonal matrix constructed by a chain of learnable
Householder reflections (HRs). This HR-based orthogonal fine-tuning is
equivalent to an adaptive low-rank adaptation. Moreover, we show that the
orthogonality of the reflection planes corresponding to the HRs impacts the
model capacity and regularity. The analysis motivates us to regularize the
orthogonality of the HRs, leading to different implementations of the proposed
Householder reflection adaptation (HRA) method. Compared with state-of-the-art
methods, HRA achieves superior performance with fewer learnable parameters when
adapting large language models and conditional image generators. The code of
the experiments is available at https://github.com/DaShenZi721/HRA, and the
method has been merged into the PEFT package
(https://github.com/huggingface/peft).
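The link between Householder reflections and low-rank updates can be seen in a small sketch. This is an illustration under assumptions, not the HRA code: the right-multiplication `W @ H`, the toy matrix values, and every function name are chosen for this example. It relies only on the standard identity H = I - 2 u u^T / (u^T u).

```python
def householder(u):
    """Return H = I - 2 u u^T / (u^T u) as a list of rows."""
    n = len(u)
    norm_sq = sum(x * x for x in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / norm_sq
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def adapt(weight, reflection_vectors):
    """Multiply a frozen weight matrix by a chain of Householder reflections."""
    out = weight
    for u in reflection_vectors:
        out = matmul(out, householder(u))
    return out

W = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]  # frozen 2x3 weight (toy values)
u = [0.3, -1.2, 0.8]                      # one learnable reflection vector
H = householder(u)
W_adapted = adapt(W, [u])

# Rank-one view of the same update: W H = W - 2 (W u) u^T / (u^T u).
norm_sq = sum(x * x for x in u)
Wu = [sum(W[i][k] * u[k] for k in range(3)) for i in range(2)]
W_low_rank = [[W[i][j] - 2.0 * Wu[i] * u[j] / norm_sq for j in range(3)]
              for i in range(2)]
```

Each reflection is orthogonal and changes the weight by a rank-one term, so a chain of r reflections changes it by a term of rank at most r. That is the sense in which an orthogonal update of this form also behaves like a low-rank one.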
[LINK]
http://arxiv.org/abs/2405.17484v3
[DATE]
2024-11-15 16:02:03+08:00
[CATEGORIES]
cs.LG
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations
[AUTHORS]
Laura O’Mahony, Nikola S. Nikolov, David JP O’Sullivan
[ABSTRACT]
Recent efforts to understand intermediate representations in deep neural
networks have commonly attempted to label individual neurons and combinations
of neurons that make up linear directions in the latent space by examining
extremal neuron activations and the highest direction projections. In this
paper, we show that this approach, although yielding a good approximation for
many purposes, fails to capture valuable information about the behaviour of a
representation. Neural network activations are generally dense, and so a more
complex, but realistic scenario is that linear directions encode information at
various levels of stimulation. We hypothesise that non-extremal level
activations contain complex information worth investigating, such as
statistical associations, and thus may be used to locate confounding human
interpretable concepts. We explore the value of studying a range of neuron
activations by taking the case of mid-level output neuron activations and
demonstrate on a synthetic dataset how they can inform us about aspects of
representations in the penultimate layer not evident through analysing maximal
activations alone. We use our findings to develop a method to curate data from
mid-range logit samples for retraining to mitigate spurious correlations, or
confounding concepts in the penultimate layer, on real benchmark datasets. The
success of our method exemplifies the utility of inspecting non-maximal
activations to extract complex relationships learned by models.
[COMMENTS]
18 pages, 11 figures
[LINK]
http://arxiv.org/abs/2411.10019v1
[DATE]
2024-11-15 15:54:14+08:00
[CATEGORIES]
cs.LG
MicroCrackAttentionNeXt: Advancing Microcrack Detection in Wave Field Analysis Using Deep Neural Networks through Feature Visualization
[AUTHORS]
Fatahlla Moreh, Yusuf Hasan, Bilal Zahid Hussain, Mohammad Ammar, Sven Tomforde
[ABSTRACT]
Microcrack detection using deep neural networks (DNNs) through an automated
pipeline using wave fields interacting with the damaged areas is highly sought
after. These high-dimensional spatio-temporal crack data are limited, and these
datasets have large dimensions in the temporal domain. The dataset presents a
substantial class imbalance, with crack pixels constituting an average of only
5% of the total pixels per sample. This extreme class imbalance poses a
challenge for deep learning models with the different micro-scale cracks, as
the network can be biased toward predicting the majority class, generally
leading to poor detection accuracy. This study builds upon the previous
benchmark SpAsE-Net, an asymmetric encoder-decoder network for micro-crack
detection. The impact of various activation and loss functions was examined
through feature space visualization using the manifold discovery and analysis
(MDA) algorithm. The optimized architecture and training methodology achieved
an accuracy of 86.85%.
[LINK]
http://arxiv.org/abs/2411.10015v1
[DATE]
2024-11-15 15:50:01+08:00
[CATEGORIES]
cs.LG
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
[AUTHORS]
Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji
[ABSTRACT]
In this work, we build a simple but strong baseline for sounding video
generation. Given base diffusion models for audio and video, we integrate them
with additional modules into a single model and train it to make the model
jointly generate audio and video. To enhance alignment between audio-video
pairs, we introduce two novel mechanisms in our model. The first one is
timestep adjustment, which provides different timestep information to each base
model. It is designed to align how samples are generated along with timesteps
across modalities. The second one is a new design of the additional modules,
termed Cross-Modal Conditioning as Positional Encoding (CMC-PE). In CMC-PE,
cross-modal information is embedded as if it represents temporal position
information, and the embeddings are fed into the model like positional
encoding. Compared with the popular cross-attention mechanism, CMC-PE provides
a better inductive bias for temporal alignment in the generated data.
Experimental results validate the effectiveness of the two newly introduced
mechanisms and also demonstrate that our method outperforms existing methods.
[COMMENTS]
The source code is available:
https://github.com/SonyResearch/SVG_baseline
[LINK]
http://arxiv.org/abs/2409.17550v2
[DATE]
2024-11-15 15:48:26+08:00
[CATEGORIES]
cs.LG
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
[AUTHORS]
Yongfan Liu, Hyoukjun Kwon
[ABSTRACT]
Stereo depth estimation is a fundamental component in augmented reality (AR)
applications. Although AR applications require very low latency for their
real-time applications, traditional depth estimation models often rely on
time-consuming preprocessing steps such as rectification to achieve high
accuracy. In addition, algorithms built on non-standard ML operators, such as
cost volume, incur significant latency, which is aggravated on
compute-resource-constrained mobile platforms. Therefore, we develop
hardware-friendly alternatives to the costly cost volume and preprocessing and
design two new models based on them, MultiHeadDepth and HomoDepth. Our approach
to cost volume replaces it with a new group-pointwise convolution-based
operator and an approximation of cosine similarity based on layernorm and dot
product. For online stereo rectification (preprocessing), we introduce a
homography matrix prediction network with a rectification positional encoding
(RPE), which delivers both low latency and robustness to unrectified images and
thus eliminates the need for preprocessing. Our MultiHeadDepth, which includes
the optimized cost volume, provides 11.8-30.3% improvements in accuracy and
22.9-25.2% reduction in latency compared to a state-of-the-art depth estimation
model for AR glasses from industry. Our HomoDepth, which adds optimized
preprocessing (homography + RPE) to MultiHeadDepth, can process unrectified
images and reduce the end-to-end latency by 44.5%. We adopt a multi-task
learning framework to handle misaligned stereo inputs on HomoDepth, which
reduces the AbsRel error by 10.0-24.3%. The results demonstrate the efficacy of
our approaches in achieving both high model performance and low latency, which
is a step toward practical depth estimation on future AR devices.
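One plausible reading of a cosine-similarity approximation built from layernorm and a dot product is sketched below. It is not the MultiHeadDepth operator: the affine-free layer norm, the division by the feature length, and all names are assumptions for illustration. The underlying fact is standard: the scaled dot product of two layer-normalized vectors equals the cosine similarity of their mean-centered versions.

```python
import math

def layer_norm(x, eps=0.0):
    """Affine-free layer normalization: (x - mean) / sqrt(var + eps)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def center(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def layernorm_dot_similarity(a, b):
    """Dot product of layer-normalized vectors, scaled by the feature length."""
    la, lb = layer_norm(a), layer_norm(b)
    return sum(p * q for p, q in zip(la, lb)) / len(a)

left = [0.2, 1.0, -0.5, 0.7]    # toy feature vector from the left image
right = [0.1, 0.9, -0.4, 1.1]   # toy feature vector from the right image

similarity = layernorm_dot_similarity(left, right)
reference = cosine(center(left), center(right))
```

With `eps=0.0` the two quantities match up to rounding. With a positive `eps`, as in typical layer-norm implementations, the match becomes approximate, which is one way such an operator can trade exactness for hardware-friendly primitives.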
[LINK]
http://arxiv.org/abs/2411.10013v1
[DATE]
2024-11-15 15:43:45+08:00
[CATEGORIES]
cs.LG
DuSEGO: Dual Second-order Equivariant Graph Ordinary Differential Equation
[AUTHORS]
Yingxu Wang, Nan Yin, Mingyan Xiao, Xinhao Yi, Siwei Liu, Shangsong Liang
[ABSTRACT]
Graph Neural Networks (GNNs) with equivariant properties have achieved
significant success in modeling complex dynamic systems and molecular
properties. However, their expressiveness ability is limited by: (1) Existing
methods often overlook the over-smoothing issue caused by traditional GNN
models, as well as the gradient explosion or vanishing problems in deep GNNs.
(2) Most models operate on first-order information, neglecting that the real
world often consists of second-order systems, which further limits the model’s
representation capabilities. To address these issues, we propose the
Dual Second-order Equivariant Graph Ordinary Differential Equation (DuSEGO) for
equivariant representation. Specifically, DuSEGO applies dual second-order
equivariant graph ordinary differential equations (Graph ODEs) to graph
embeddings and node coordinates simultaneously. Theoretically, we first prove
that DuSEGO maintains the equivariant property. Furthermore, we provide
theoretical insights showing that DuSEGO effectively alleviates the
over-smoothing problem in both feature representation and coordinate update.
Additionally, we demonstrate that the proposed DuSEGO mitigates the exploding
and vanishing gradients problem, facilitating the training of deep multi-layer
GNNs. Extensive experiments on benchmark datasets validate the superiority of
the proposed DuSEGO compared to baselines.
[LINK]
http://arxiv.org/abs/2411.10000v1
[DATE]
2024-11-15 15:15:05+08:00
[CATEGORIES]
cs.LG
Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
[AUTHORS]
Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee
[ABSTRACT]
As a highly expressive generative model, diffusion models have demonstrated
exceptional success across various domains, including image generation, natural
language processing, and combinatorial optimization. However, as data
distributions grow more complex, training these models to convergence becomes
increasingly computationally intensive. While diffusion models are typically
trained using uniform timestep sampling, our research shows that the variance
in stochastic gradients varies significantly across timesteps, with
high-variance timesteps becoming bottlenecks that hinder faster convergence. To
address this issue, we introduce a non-uniform timestep sampling method that
prioritizes these more critical timesteps. Our method tracks the impact of
gradient updates on the objective for each timestep, adaptively selecting those
most likely to minimize the objective effectively. Experimental results
demonstrate that this approach not only accelerates the training process, but
also leads to improved performance at convergence. Furthermore, our method
shows robust performance across various datasets, scheduling strategies, and
diffusion architectures, outperforming previously proposed timestep sampling
and weighting heuristics that lack this degree of robustness.
[LINK]
http://arxiv.org/abs/2411.09998v1
[DATE]
2024-11-15 15:12:18+08:00
[CATEGORIES]
cs.LG
Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time
[AUTHORS]
Vladimir Braverman, Prathamesh Dharangutte, Shreyas Pai, Vihan Shah, Chen Wang
[ABSTRACT]
We study the dynamic correlation clustering problem with $\textit{adaptive}$
edge label flips. In correlation clustering, we are given an $n$-vertex complete
graph whose edges are labeled either $(+)$ or $(-)$, and the goal is to
minimize the total number of $(+)$ edges between clusters and the number of
$(-)$ edges within clusters. We consider the dynamic setting with adversarial
robustness, in which the $\textit{adaptive}$ adversary could flip the label of
an edge based on the current output of the algorithm. Our main result is a
randomized algorithm that always maintains an $O(1)$-approximation to the
optimal correlation clustering with $O(\log^{2}{n})$ amortized update time.
Prior to our work, no algorithm with $O(1)$-approximation and
$\text{polylog}{(n)}$ update time for the adversarially robust setting was
known. We further validate our theoretical results with experiments on
synthetic and real-world datasets with competitive empirical performances. Our
main technical ingredient is an algorithm that maintains $\textit{sparse-dense
decomposition}$ with $\text{polylog}{(n)}$ update time, which could be of
independent interest.
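The objective described above, the number of $(+)$ edges between clusters plus the number of $(-)$ edges within clusters, can be computed directly for a fixed clustering. The sketch below covers only this static cost, not the dynamic algorithm of the paper; the data layout and names are assumptions for illustration.

```python
from itertools import combinations

def disagreements(signs, cluster_of):
    """Count (+) edges split across clusters and (-) edges inside a cluster.

    signs: dict mapping frozenset({u, v}) to "+" or "-" for every vertex pair.
    cluster_of: dict mapping each vertex to its cluster id.
    """
    cost = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        sign = signs[frozenset((u, v))]
        if sign == "+" and not same_cluster:
            cost += 1
        elif sign == "-" and same_cluster:
            cost += 1
    return cost

# Complete graph on four vertices (toy labels).
signs = {frozenset(e): "+" for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
signs.update({frozenset(e): "-" for e in [("a", "d"), ("b", "d")]})

grouped = {"a": 0, "b": 0, "c": 0, "d": 1}
one_cluster = {v: 0 for v in "abcd"}
singletons = {v: i for i, v in enumerate("abcd")}

costs = (
    disagreements(signs, grouped),      # 1: only the (+) edge c-d is split
    disagreements(signs, one_cluster),  # 2: the (-) edges a-d and b-d are merged
    disagreements(signs, singletons),   # 4: all four (+) edges are split
)
```

An adaptive adversary in this setting would inspect the current `cluster_of` and flip an entry of `signs` to increase the cost, which is why the maintained clustering must stay near-optimal after every flip.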
[LINK]
http://arxiv.org/abs/2411.09979v1
[DATE]
2024-11-15 14:26:37+08:00
[CATEGORIES]
cs.LG
Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness
[AUTHORS]
Suhyeok Jang, Seojin Kim, Jinwoo Shin, Jongheon Jeong
[ABSTRACT]
The remarkable advances in deep learning have led to the emergence of many
off-the-shelf classifiers, e.g., large pre-trained models. However, since they
are typically trained on clean data, they remain vulnerable to adversarial
attacks. Despite this vulnerability, their superior performance and
transferability make off-the-shelf classifiers still valuable in practice,
demanding further work to provide adversarial robustness for them in a post-hoc
manner. A recently proposed method, denoised smoothing, leverages a denoiser
model in front of the classifier to obtain provable robustness without
additional training. However, the denoiser often creates hallucination, i.e.,
images that have lost the semantics of their originally assigned class, leading
to a drop in robustness. Furthermore, its noise-and-denoise procedure
introduces a significant distribution shift from the original distribution,
causing the denoised smoothing framework to achieve sub-optimal robustness. In
this paper, we introduce Fine-Tuning with Confidence-Aware Denoised Image
Selection (FT-CADIS), a novel fine-tuning scheme to enhance the certified
robustness of off-the-shelf classifiers. FT-CADIS is inspired by the
observation that the confidence of off-the-shelf classifiers can effectively
identify hallucinated images during denoised smoothing. Based on this, we
develop a confidence-aware training objective to handle such hallucinated
images and improve the stability of fine-tuning from denoised images. In this
way, the classifier can be fine-tuned using only images that are beneficial for
adversarial robustness. We also find that such a fine-tuning can be done by
updating a small fraction of parameters of the classifier. Extensive
experiments demonstrate that FT-CADIS has established the state-of-the-art
certified robustness among denoised smoothing methods across all
$\ell_2$-adversary radii in various benchmarks.
[COMMENTS]
26 pages; TMLR 2024; Code is available at
https://github.com/suhyeok24/FT-CADIS
[LINK]
http://arxiv.org/abs/2411.08933v2
[DATE]
2024-11-15 14:13:33+08:00
[CATEGORIES]
cs.LG
Establishing and Evaluating Trustworthy AI: Overview and Research Challenges
[AUTHORS]
Dominik Kowald, Sebastian Scher, Viktoria Pammer-Schindler, Peter Müllner, Kerstin Waxnegger, Lea Demelius, Angela Fessl, Maximilian Toller, Inti Gabriel Mendoza Estrada, Ilija Simic, Vedran Sabol, Andreas Truegler, Eduardo Veas, Roman Kern, Tomislav Nad, Simone Kopeinik
[ABSTRACT]
Artificial intelligence (AI) technologies (re-)shape modern life, driving
innovation in a wide range of sectors. However, some AI systems have yielded
unexpected or undesirable outcomes or have been used in questionable manners.
As a result, there has been a surge in public and academic discussions about
aspects that AI systems must fulfill to be considered trustworthy. In this
paper, we synthesize existing conceptualizations of trustworthy AI along six
requirements: 1) human agency and oversight, 2) fairness and
non-discrimination, 3) transparency and explainability, 4) robustness and
accuracy, 5) privacy and security, and 6) accountability. For each one, we
provide a definition, describe how it can be established and evaluated, and
discuss requirement-specific research challenges. Finally, we conclude this
analysis by identifying overarching research challenges across the requirements
with respect to 1) interdisciplinary research, 2) conceptual clarity, 3)
context-dependency, 4) dynamics in evolving systems, and 5) investigations in
real-world contexts. Thus, this paper synthesizes and consolidates a
wide-ranging and active discussion currently taking place in various academic
sub-communities and public forums. It aims to serve as a reference for a broad
audience and as a basis for future research directions.
[COMMENTS]
Accepted in Frontiers in Big Data and AI, Research Topic: Towards
Fair AI for Trustworthy Artificial Intelligence
[LINK]
http://arxiv.org/abs/2411.09973v1
[DATE]
2024-11-15 14:05:52+08:00
[CATEGORIES]
cs.LG
Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era
[AUTHORS]
Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen
[ABSTRACT]
The rapid advancement of large language models (LLMs) and multimodal learning
has transformed digital content creation and manipulation. Traditional visual
editing tools require significant expertise, limiting accessibility. Recent
strides in instruction-based editing have enabled intuitive interaction with
visual content, using natural language as a bridge between user intent and
complex editing operations. This survey provides an overview of these
techniques, focusing on how LLMs and multimodal models empower users to achieve
precise visual modifications without deep technical knowledge. By synthesizing
over 100 publications, we explore methods from generative adversarial networks
to diffusion models, examining multimodal integration for fine-grained content
control. We discuss practical applications across domains such as fashion, 3D
scene manipulation, and video synthesis, highlighting increased accessibility
and alignment with human intuition. Our survey compares existing literature,
emphasizing LLM-empowered editing, and identifies key challenges to stimulate
further research. We aim to democratize powerful visual editing across various
industries, from entertainment to education. Interested readers are encouraged
to access our repository at
https://github.com/tamlhp/awesome-instruction-editing.
[LINK]
http://arxiv.org/abs/2411.09955v1
[DATE]
2024-11-15 13:18:15+08:00
[CATEGORIES]
cs.LG
AdapShare: An RL-Based Dynamic Spectrum Sharing Solution for O-RAN
[AUTHORS]
Sneihil Gopal, David Griffith, Richard A. Rouil, Chunmei Liu
[COMMENTS]
arXiv admin note: text overlap with arXiv:2404.09110
[LINK]
http://arxiv.org/abs/2408.16842v2
[DATE]
2024-11-15 13:08:56+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Ding Li, Ziqi Zhang, Mengyu Yao, Yifeng Cai, Yao Guo, Xiangqun Chen
[ABSTRACT]
Trusted Execution Environments (TEE) are used to safeguard on-device models.
However, directly employing TEEs to secure the entire DNN model is challenging
due to their limited computational speed. GPUs can accelerate DNN computation,
but widely available commercial GPUs usually lack security protection. To this
end, scholars introduce TSDP, a method that protects
privacy-sensitive weights within TEEs and offloads insensitive weights to GPUs.
Nevertheless, current methods do not consider the presence of a knowledgeable
adversary who can access abundant publicly available pre-trained models and
datasets. This paper investigates the security of existing methods against such
a knowledgeable adversary and reveals their inability to fulfill their security
promises. Consequently, we introduce a novel partition-before-training
strategy, which effectively separates privacy-sensitive weights from other
components of the model. Our evaluation demonstrates that our approach can
offer full model protection with a computational cost reduced by a factor of
[COMMENTS]
Accepted by TOSEM. Extended version of the S&P24 paper
(arXiv:2310.07152)
[LINK]
http://arxiv.org/abs/2411.09945v1
[DATE]
2024-11-15 12:52:11+08:00
[CATEGORIES]
cs.LG
Zero-shot Voice Conversion with Diffusion Transformers
[AUTHORS]
Songting Liu
[ABSTRACT]
Zero-shot voice conversion aims to transform a source speech utterance to
match the timbre of a reference speech from an unseen speaker. Traditional
approaches struggle with timbre leakage, insufficient timbre representation,
and mismatches between training and inference tasks. We propose Seed-VC, a
novel framework that addresses these issues by introducing an external timbre
shifter during training to perturb the source speech timbre, mitigating leakage
and aligning training with inference. Additionally, we employ a diffusion
transformer that leverages the entire reference speech context, capturing
fine-grained timbre features through in-context learning. Experiments
demonstrate that Seed-VC outperforms strong baselines like OpenVoice and
CosyVoice, achieving higher speaker similarity and lower word error rates in
zero-shot voice conversion tasks. We further extend our approach to zero-shot
singing voice conversion by incorporating fundamental frequency (F0)
conditioning, resulting in performance comparable to current state-of-the-art
methods. Our findings highlight the effectiveness of Seed-VC in overcoming core
challenges, paving the way for more accurate and versatile voice conversion
systems.
[LINK]
http://arxiv.org/abs/2411.09943v1
[DATE]
2024-11-15 12:43:44+08:00
[CATEGORIES]
cs.LG
Adaptive Transfer Clustering: A Unified Framework
[AUTHORS]
Yuqi Gu, Zhongyuan Lyu, Kaizheng Wang
[ABSTRACT]
We propose a general transfer learning framework for clustering given a main
dataset and an auxiliary one about the same subjects. The two datasets may
reflect similar but different latent grouping structures of the subjects. We
propose an adaptive transfer clustering (ATC) algorithm that automatically
leverages the commonality in the presence of unknown discrepancy, by optimizing
an estimated bias-variance decomposition. It applies to a broad class of
statistical models including Gaussian mixture models, stochastic block models,
and latent class models. A theoretical analysis proves the optimality of ATC
under the Gaussian mixture model and explicitly quantifies the benefit of
transfer. Extensive simulations and real data experiments confirm our method’s
effectiveness in various scenarios.
[COMMENTS]
55 pages, 8 figures
[LINK]
http://arxiv.org/abs/2410.21263v3
[DATE]
2024-11-15 12:32:55+08:00
[CATEGORIES]
cs.LG
Efficient Pauli channel estimation with logarithmic quantum memory
[AUTHORS]
Sitan Chen, Weiyuan Gong
[ABSTRACT]
Here we revisit one of the prototypical tasks for characterizing the
structure of noise in quantum devices: estimating every eigenvalue of an
$n$-qubit Pauli noise channel to error $\epsilon$. Prior work [14] proved no-go
theorems for this task in the practical regime where one has a limited amount
of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum
memory must make exponentially many measurements, provided it is
non-concatenating. Such protocols can only interact with the channel by
repeatedly preparing a state, passing it through the channel, and measuring
immediately afterward.
This left open a natural question: does the lower bound hold even for general
protocols, i.e. ones which chain together many queries to the channel,
interleaved with arbitrary data-processing channels, before measuring?
Surprisingly, in this work we show the opposite: there is a protocol that can
estimate the eigenvalues of a Pauli channel to error $\epsilon$ using only
$O(\log n/\epsilon^2)$ ancilla and $\tilde{O}(n^2/\epsilon^2)$ measurements. In
contrast, we show that any protocol with zero ancilla, even a concatenating
one, must make $\Omega(2^n/\epsilon^2)$ measurements, which is tight.
Our results imply, to our knowledge, the first quantum learning task where
logarithmically many qubits of quantum memory suffice for an exponential
statistical advantage. Our protocol can be naturally extended to a protocol
that learns the eigenvalues of Pauli terms within any subset $A$ of a Pauli
channel with $O(\log\log(|A|)/\epsilon^2)$ ancilla and
$\tilde{O}(n^2/\epsilon^2)$ measurements.
[COMMENTS]
57 pages, 1 figure
[LINK]
http://arxiv.org/abs/2309.14326v3
[DATE]
2024-11-15 11:40:41+08:00
[CATEGORIES]
cs.LG
Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation
[AUTHORS]
Sitan Chen, Weiyuan Gong, Qi Ye, Zhihan Zhang
[ABSTRACT]
We study the task of agnostic tomography: given copies of an unknown
$n$-qubit state $\rho$ which has fidelity $\tau$ with some state in a given
class $C$, find a state which has fidelity $\ge \tau - \epsilon$ with $\rho$.
We give a new framework, stabilizer bootstrapping, for designing
computationally efficient protocols for this task, and use this to get new
agnostic tomography protocols for the following classes:
Stabilizer states: We give a protocol that runs in time
$\mathrm{poly}(n,1/\epsilon)\cdot (1/\tau)^{O(\log(1/\tau))}$, answering an
open question posed by Grewal, Iyer, Kretschmer, Liang [43] and Anshu and
Arunachalam [6]. Previous protocols ran in time $\mathrm{exp}(\Theta(n))$ or
required $\tau>\cos^2(\pi/8)$.
States with stabilizer dimension $n - t$: We give a protocol that runs in
time $n^3\cdot(2^t/\tau)^{O(\log(1/\epsilon))}$, extending recent work on
learning quantum states prepared by circuits with few non-Clifford gates, which
only applied in the realizable setting where $\tau = 1$ [33, 40, 49, 66].
Discrete product states: If $C = K^{\otimes n}$ for some $\mu$-separated
discrete set $K$ of single-qubit states, we give a protocol that runs in time
$(n/\mu)^{O((1 + \log (1/\tau))/\mu)}/\epsilon^2$. This strictly generalizes a
prior guarantee which applied to stabilizer product states [42]. For stabilizer
product states, we give a further improved protocol that runs in time
$(n^2/\epsilon^2)\cdot (1/\tau)^{O(\log(1/\tau))}$.
As a corollary, we give the first protocol for estimating stabilizer
fidelity, a standard measure of magic for quantum states, to error $\epsilon$
in $n^3 \mathrm{quasipoly}(1/\epsilon)$ time.
[COMMENTS]
68 pages
[LINK]
http://arxiv.org/abs/2408.06967v2
[DATE]
2024-11-15 11:20:45+08:00
[CATEGORIES]
cs.LG
A Multi-Granularity Supervised Contrastive Framework for Remaining Useful Life Prediction of Aero-engines
[AUTHORS]
Zixuan He, Ziqian Kong, Zhengyu Chen, Yuling Zhan, Zijun Que, Zhengguo Xu
[ABSTRACT]
Accurate remaining useful life (RUL) predictions are critical to the safe
operation of aero-engines. Currently, the RUL prediction task is mainly treated
as a regression problem with only mean squared error as the loss function, and
research on feature space structure is lacking, even though the latter has
shown excellent performance in a large number of studies. This paper develops a
multi-granularity supervised contrastive (MGSC) framework from the plain
intuition that samples with the same RUL label should be aligned in the feature
space, and addresses the problems of excessively large minibatch size and
unbalanced samples in the implementation. RUL prediction with MGSC is
implemented using the proposed multi-phase training strategy. This paper also
demonstrates a simple and scalable basic network structure and validates the
proposed MGSC strategy on the C-MAPSS dataset using a convolutional long
short-term memory network as a baseline, which effectively improves the
accuracy of RUL prediction.
[LINK]
http://arxiv.org/abs/2411.00461v2
[DATE]
2024-11-15 11:01:59+08:00
[CATEGORIES]
cs.LG
Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move
[AUTHORS]
Takuya Kiyokawa, Eiki Nagata, Yoshihisa Tsurumine, Yuhwan Kwon, Takamitsu Matsubara
[ABSTRACT]
Mobile grasping enhances manipulation efficiency by utilizing robots’
mobility. This study aims to enable a commercial off-the-shelf robot for mobile
grasping, requiring precise timing and pose adjustments. Self-supervised
learning can develop a generalizable policy to adjust the robot’s velocity and
determine grasp position and orientation based on the target object’s shape and
pose. Due to mobile grasping’s complexity, action primitivization and
step-by-step learning are crucial to avoid data sparsity in learning from trial
and error. This study simplifies mobile grasping into two grasp action
primitives and a moving action primitive, which can be operated with limited
degrees of freedom for the manipulator. This study introduces three fully
convolutional neural network (FCN) models to predict static grasp primitive,
dynamic grasp primitive, and residual moving velocity error from visual inputs.
A two-stage grasp learning approach facilitates seamless FCN model learning.
The ablation study demonstrated that the proposed method achieved the highest
grasping accuracy and pick-and-place efficiency. Furthermore, randomizing
object shapes and environments in the simulation effectively achieved
generalizable mobile grasping.
[COMMENTS]
8 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.09904v1
[DATE]
2024-11-15 10:59:16+08:00
[CATEGORIES]
cs.LG
Statistical Analysis of Policy Space Compression Problem
[AUTHORS]
Majid Molaei, Marcello Restelli, Alberto Maria Metelli, Matteo Papini
[ABSTRACT]
Policy search methods are crucial in reinforcement learning, offering a
framework to address continuous state-action and partially observable problems.
However, the complexity of exploring vast policy spaces can lead to significant
inefficiencies. Reducing the policy space through policy compression emerges as
a powerful, reward-free approach to accelerate the learning process. This
technique condenses the policy space into a smaller, representative set while
maintaining most of the original effectiveness. Our research focuses on
determining the necessary sample size to learn this compressed set accurately.
We employ Rényi divergence to measure the similarity between true and
estimated policy distributions, establishing error bounds for good
approximations. To simplify the analysis, we employ the $l_1$ norm, determining
sample size requirements for both model-based and model-free settings. Finally,
we correlate the error bounds from the $l_1$ norm with those from Rényi
divergence, distinguishing between policies near the vertices and those in the
middle of the policy space, to determine the lower and upper bounds for the
required sample sizes.
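For readers unfamiliar with the two discrepancy measures named above, the following sketch computes both for discrete distributions. It uses the standard definition D_alpha(P || Q) = log(sum_i p_i^alpha q_i^(1 - alpha)) / (alpha - 1). The toy distributions and function names are assumptions, and nothing here reproduces the paper's bounds.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha between discrete distributions p and q.

    Assumes q[i] > 0 wherever p[i] > 0, and alpha > 0 with alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1.0 - alpha)
                for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)

def l1_distance(p, q):
    """l_1 norm of the difference between two discrete distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

true_policy = [0.5, 0.5]         # toy action distribution
estimated_policy = [0.25, 0.75]  # toy estimate of the same distribution

d2 = renyi_divergence(true_policy, estimated_policy, alpha=2.0)  # log(4/3)
l1 = l1_distance(true_policy, estimated_policy)                   # 0.5
```

The $l_1$ distance is symmetric and bounded by 2, while the Rényi divergence is asymmetric and can grow without bound when the estimate assigns very small probability to an action the true policy takes, which may be one reason policies near the vertices of the simplex need separate treatment.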
[LINK]
http://arxiv.org/abs/2411.09900v1
[DATE]
2024-11-15 10:46:55+08:00
[CATEGORIES]
cs.LG
Revealing the Evolution of Order in Materials Microstructures Using Multi-Modal Computer Vision
[AUTHORS]
Arman Ter-Petrosyan, Michael Holden, Jenna A. Bilbrey, Sarah Akers, Christina Doty, Kayla H. Yano, Le Wang, Rajendra Paudel, Eric Lang, Khalid Hattar, Ryan B. Comes, Yingge Du, Bethany E. Matthews, Steven R. Spurgeon
[ABSTRACT]
The development of high-performance materials for microelectronics, energy
storage, and extreme environments depends on our ability to describe and direct
property-defining microstructural order. Our present understanding is typically
derived from laborious manual analysis of imaging and spectroscopy data, which
is difficult to scale, challenging to reproduce, and lacks the ability to
reveal latent associations needed for mechanistic models. Here, we demonstrate
a multi-modal machine learning (ML) approach to describe order from electron
microscopy analysis of the complex oxide La$_{1-x}$Sr$_x$FeO$_3$. We construct
a hybrid pipeline based on fully and semi-supervised classification, allowing
us to evaluate both the characteristics of each data modality and the value
each modality adds to the ensemble. We observe distinct differences in the
performance of uni- and multi-modal models, from which we draw general lessons
in describing crystal order using computer vision.
[COMMENTS]
30 pages, 5 figures, 2 tables
[LINK]
http://arxiv.org/abs/2411.09896v1
[DATE]
2024-11-15 10:44:32+08:00
[CATEGORIES]
cs.LG
Deep learning robotics using self-supervised spatial differentiation drive autonomous contact-based semiconductor characterization
[AUTHORS]
Alexander E. Siemenn, Basita Das, Kangyu Ji, Fang Sheng, Tonio Buonassisi
[ABSTRACT]
Integrating autonomous contact-based robotic characterization into
self-driving laboratories can enhance measurement quality, reliability, and
throughput. While deep learning models support robust autonomy, current methods
lack pixel-precision positioning and require extensive labeled data. To
overcome these challenges, we propose a self-supervised convolutional neural
network with a spatially differentiable loss function, incorporating shape
priors to refine predictions of optimal robot contact poses for semiconductor
characterization. This network improves valid pose generation by 20.0%,
relative to existing models. We demonstrate our network’s performance by
driving a 4-degree-of-freedom robot to characterize photoconductivity at 3,025
predicted poses across a gradient of perovskite compositions, achieving
throughputs over 125 measurements per hour. Spatially mapping photoconductivity
onto each drop-casted film reveals regions of inhomogeneity. With this
self-supervised deep learning-driven robotic system, we enable high-precision
and reliable automation of contact-based characterization techniques at high
throughputs, thereby allowing the measurement of previously inaccessible yet
important semiconductor properties for self-driving laboratories.
[LINK]
http://arxiv.org/abs/2411.09892v1
[DATE]
2024-11-15 10:36:36+08:00
[CATEGORIES]
cs.LG
Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
[AUTHORS]
Yihong Guo, Yixuan Wang, Yuanyuan Shi, Pan Xu, Anqi Liu
[ABSTRACT]
Training a policy in a source domain for deployment in the target domain
under a dynamics shift can be challenging, often resulting in performance
degradation. Previous work tackles this challenge by training on the source
domain with modified rewards derived by matching distributions between the
source and the target optimal trajectories. However, pure modified rewards only
ensure the behavior of the learned policy in the source domain resembles
trajectories produced by the target optimal policies, which does not guarantee
optimal performance when the learned policy is actually deployed to the target
domain. In this work, we propose to utilize imitation learning to transfer the
policy learned from the reward modification to the target domain so that the
new policy can generate the same trajectories in the target domain. Our
approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL),
utilizes the reward modification for domain adaptation and follows the general
framework of generative adversarial imitation learning from observation (GAIfO)
by applying a reward augmented estimator for the policy optimization step.
Theoretically, we present an error bound for our method under a mild assumption
regarding the dynamics shift to justify the motivation of our method.
Empirically, our method outperforms the pure modified reward method without
imitation learning and also outperforms other baselines in benchmark
off-dynamics environments.
[COMMENTS]
Published at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.09891v1
[DATE]
2024-11-15 10:35:20+08:00
[CATEGORIES]
cs.LG
DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
[AUTHORS]
Shao-Ting Chiu, Junyuan Hong, Ulisses Braga-Neto
[ABSTRACT]
We introduce DeepSets Operator Networks (DeepOSets), an efficient,
non-autoregressive neural network architecture for in-context operator
learning. In-context learning allows a trained machine learning model to learn
from a user prompt without further training. DeepOSets adds in-context learning
capabilities to Deep Operator Networks (DeepONets) by combining them with the
DeepSets architecture. As the first non-autoregressive model for in-context
operator learning, DeepOSets allows the user prompt to be processed in parallel,
leading to significant computational savings. Here, we present the application
of DeepOSets in the problem of learning supervised learning algorithms, which
are operators mapping a finite-dimensional space of labeled data into an
infinite-dimensional hypothesis space of prediction functions. In an empirical
comparison with a popular autoregressive (transformer-based) model for
in-context learning of linear regression in one and five dimensions, DeepOSets
reduced the number of model weights by several orders of magnitude and required
a fraction of training and inference time. Furthermore, DeepOSets proved to be
less sensitive to noise, significantly outperforming the transformer model in
noisy settings.
[COMMENTS]
Janossy pooling results were added; Figures 1 and 2 were updated;
minor edits were made throughout
[LINK]
http://arxiv.org/abs/2410.09298v2
[DATE]
2024-11-15 09:09:27+08:00
[CATEGORIES]
cs.LG
Adversarial Environment Design via Regret-Guided Diffusion Models
[AUTHORS]
Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh
[ABSTRACT]
Training agents that are robust to environmental changes remains a
significant challenge in deep reinforcement learning (RL). Unsupervised
environment design (UED) has recently emerged to address this issue by
generating a set of training environments tailored to the agent’s capabilities.
While prior works demonstrate that UED has the potential to learn a robust
policy, their performance is constrained by the capabilities of the environment
generation. To this end, we propose a novel UED algorithm, adversarial
environment design via regret-guided diffusion models (ADD). The proposed
method guides the diffusion-based environment generator with the regret of the
agent to produce environments that the agent finds challenging but conducive to
further improvement. By exploiting the representation power of diffusion
models, ADD can directly generate adversarial environments while maintaining
the diversity of training environments, enabling the agent to effectively learn
a robust policy. Our experimental results demonstrate that the proposed method
successfully generates an instructive curriculum of environments, outperforming
UED baselines in zero-shot generalization across novel, out-of-distribution
environments. Project page: https://rllab-snu.github.io/projects/ADD
[COMMENTS]
38th Conference on Neural Information Processing Systems
[LINK]
http://arxiv.org/abs/2410.19715v2
[DATE]
2024-11-15 09:01:44+08:00
[CATEGORIES]
cs.LG
Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization
[AUTHORS]
Juyoung Yun
[ABSTRACT]
In deep learning, Residual Networks (ResNets) have proven effective in
addressing the vanishing gradient problem, allowing for the successful training
of very deep networks. However, skip connections in ResNets can lead to
gradient overlap, where gradients from both the learned transformation and the
skip connection combine, potentially resulting in overestimated gradients. This
overestimation can cause inefficiencies in optimization, as some updates may
overshoot optimal regions, affecting weight updates. To address this, we
examine Z-score Normalization (ZNorm) as a technique to manage gradient
overlap. ZNorm adjusts the gradient scale, standardizing gradients across
layers and reducing the negative impact of overlapping gradients. Our
experiments demonstrate that ZNorm improves the training process, especially in
non-convex optimization scenarios common in deep learning, where finding
optimal solutions is challenging. These findings suggest that ZNorm can affect
the gradient flow, enhancing performance in large-scale data processing where
accuracy is critical.
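
A minimal sketch of z-score normalization applied to one layer's gradient,
assuming the statistics are taken over all entries of that layer. The
function name `znorm`, the `eps` stabilizer, and the sample values are
illustrative assumptions, not taken from the paper.

```python
import math

def znorm(grad, eps=1e-8):
    """Z-score normalize a flat list of gradient values for a single layer."""
    n = len(grad)
    mean = sum(grad) / n
    var = sum((g - mean) ** 2 for g in grad) / n
    std = math.sqrt(var)
    # Subtract the mean and divide by the standard deviation;
    # eps (an assumed stabilizer) guards against division by zero.
    return [(g - mean) / (std + eps) for g in grad]

# Hypothetical gradient of one layer, flattened.
layer_grad = [0.5, 2.0, -1.0, 4.5]
normalized = znorm(layer_grad)
```

After normalization the entries have mean close to 0 and variance close to
1, so layers whose raw gradients differ in scale are updated on a common
scale.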
[LINK]
http://arxiv.org/abs/2410.21564v3
[DATE]
2024-11-15 08:32:50+08:00
[CATEGORIES]
cs.LG
EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records
[AUTHORS]
Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan
[ABSTRACT]
Transformers have significantly advanced the modeling of Electronic Health
Records (EHR), yet their deployment in real-world healthcare is limited by
several key challenges. Firstly, the quadratic computational cost and
insufficient context length of these models hinder hospitals’ ability to
process the extensive medical histories typical in EHR data. Additionally,
existing models employ separate finetuning for each clinical task, complicating
maintenance in healthcare environments. Moreover, these models focus
exclusively on either clinical prediction or EHR forecasting, lacking
proficiency in both tasks. To overcome these limitations, we introduce
EHRMamba, a robust foundation model built on the Mamba architecture. EHRMamba
can process sequences up to 300% longer than previous models due to its linear
computational cost. We also introduce a novel approach to Multitask Prompted
Finetuning (MPF) for EHR data, which enables EHRMamba to simultaneously learn
multiple clinical tasks in a single finetuning phase, significantly enhancing
deployment and cross-task generalization. Furthermore, our model leverages the
HL7 FHIR data standard to simplify integration into existing hospital systems.
Alongside EHRMamba, we open-source Odyssey, a toolkit designed to support the
development and deployment of EHR foundation models, with an emphasis on data
standardization and interpretability. Our evaluations on the MIMIC-IV dataset
demonstrate that EHRMamba advances state-of-the-art performance across 6 major
clinical tasks and excels in EHR forecasting, marking a significant leap
forward in the field.
[COMMENTS]
17 Pages, 4 Figures
[LINK]
http://arxiv.org/abs/2405.14567v3
[DATE]
2024-11-15 08:24:00+08:00
[CATEGORIES]
cs.LG
Fair Secretaries with Unfair Predictions
[AUTHORS]
Eric Balkanski, Will Ma, Andreas Maggiori
[ABSTRACT]
Algorithms with predictions is a recent framework for decision-making under
uncertainty that leverages the power of machine-learned predictions without
making any assumption about their quality. The goal in this framework is for
algorithms to achieve an improved performance when the predictions are accurate
while maintaining acceptable guarantees when the predictions are erroneous. A
serious concern with algorithms that use predictions is that these predictions
can be biased and, as a result, cause the algorithm to make decisions that are
deemed unfair. We show that this concern manifests itself in the classical
secretary problem in the learning-augmented setting – the state-of-the-art
algorithm can have zero probability of accepting the best candidate, which we
deem unfair, despite promising to accept a candidate whose expected value is at
least $\max\{\Omega (1) , 1 - O(\epsilon)\}$ times the optimal value, where
$\epsilon$ is the prediction error. We show how to preserve this promise while
also guaranteeing to accept the best candidate with probability $\Omega(1)$.
Our algorithm and analysis are based on a new “pegging” idea that diverges from
existing works and simplifies/unifies some of their results. Finally, we extend
to the $k$-secretary problem and complement our theoretical analysis with
experiments.
[COMMENTS]
to appear at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.09854v1
[DATE]
2024-11-15 08:23:59+08:00
[CATEGORIES]
cs.LG
InterFormer: Towards Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
[AUTHORS]
Zhichen Zeng, Xiaolong Liu, Mengyue Hang, Xiaoyi Liu, Qinghai Zhou, Chaofei Yang, Yiqun Liu, Yichen Ruan, Laming Chen, Yuxin Chen, Yujia Hao, Jiaqi Xu, Jade Nie, Xi Liu, Buyun Zhang, Wei Wen, Siyang Yuan, Kai Wang, Wen-Yen Chen, Yiping Han, Huayu Li, Chunzhi Yang, Bo Long, Philip S. Yu, Hanghang Tong, Jiyan Yang
[ABSTRACT]
Click-through rate (CTR) prediction, which predicts the probability of a user
clicking an ad, is a fundamental task in recommender systems. The emergence of
heterogeneous information, such as user profile and behavior sequences, depicts
user interests from different aspects. A mutually beneficial integration of
heterogeneous information is the cornerstone towards the success of CTR
prediction. However, most of the existing methods suffer from two fundamental
limitations, including (1) insufficient inter-mode interaction due to the
unidirectional information flow between modes, and (2) aggressive information
aggregation caused by early summarization, resulting in excessive information
loss. To address the above limitations, we propose a novel module named
InterFormer to learn heterogeneous information interaction in an interleaving
style. To achieve better interaction learning, InterFormer enables
bidirectional information flow for mutually beneficial learning across
different modes. To avoid aggressive information aggregation, we retain
complete information in each data mode and use a separate bridging arch for
effective information selection and summarization. Our proposed InterFormer
achieves state-of-the-art performance on three public datasets and a
large-scale industrial dataset.
[COMMENTS]
10 pages, 6 figures
[LINK]
http://arxiv.org/abs/2411.09852v1
[DATE]
2024-11-15 08:20:36+08:00
[CATEGORIES]
cs.LG
ActNAS: Generating Efficient YOLO Models using Activation NAS
[AUTHORS]
Sudhakar Sah, Ravish Kumar, Darshan C. Ganji, Ehsan Saboori
[COMMENTS]
7 pages, 4 figures, FITML workshop, NeurIPS 2024
[LINK]
http://arxiv.org/abs/2410.10887v2
[DATE]
2024-11-15 08:18:50+08:00
[CATEGORIES]
cs.LG
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
[AUTHORS]
Shiwei Liu, Guanchen Tao, Yifei Zou, Derek Chow, Zichen Fan, Kauna Lei, Bangfei Pan, Dennis Sylvester, Gregory Kielian, Mehdi Saligane
[ABSTRACT]
The self-attention mechanism distinguishes transformer-based large language
models (LLMs) apart from convolutional and recurrent neural networks. Despite
the performance improvement, achieving real-time LLM inference on silicon
remains challenging due to the extensive use of Softmax in self-attention. In
addition to the non-linearity, the low arithmetic intensity significantly
limits processing parallelism, especially when working with longer contexts. To
address this challenge, we propose Constant Softmax (ConSmax), a
software-hardware co-design that serves as an efficient alternative to Softmax.
ConSmax utilizes differentiable normalization parameters to eliminate the need
for maximum searching and denominator summation in Softmax. This approach
enables extensive parallelization while still executing the essential functions
of Softmax. Moreover, a scalable ConSmax hardware design with a bitwidth-split
look-up table (LUT) can achieve lossless non-linear operations and support
mixed-precision computing. Experimental results show that ConSmax achieves a
minuscule power consumption of 0.2mW and an area of 0.0008mm^2 at 1250MHz
working frequency in 16nm FinFET technology. For open-source contribution, we
further implement our design with the OpenROAD toolchain under SkyWater’s 130nm
CMOS technology. The corresponding power is 2.69mW and the area is 0.007mm^2.
ConSmax achieves 3.35x power savings and 2.75x area savings in 16nm technology,
and 3.15x power savings and 4.14x area savings with the open-source EDA
toolchain. In the meantime, it also maintains comparable accuracy on the GPT-2
model and the WikiText103 dataset. The project is available at
https://github.com/ReaLLMASIC/ConSmax
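
The idea of replacing the maximum search and denominator summation with
parameters can be sketched as below. This assumes a form exp(x - beta) /
gamma with scalar parameters `beta` and `gamma`; that form, the function
names, and the example scores are assumptions for illustration and are not
confirmed by the abstract. In training the two parameters would be learned;
here they are set by hand to show when the result coincides with Softmax.

```python
import math

def softmax(scores):
    """Reference Softmax: needs a max search and a sum over all scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consmax(scores, beta, gamma):
    """Assumed ConSmax-style form: each element is computed independently."""
    # No max search and no summation across the row: beta and gamma are
    # fixed (learned) constants, so every element can be processed in parallel.
    return [math.exp(s - beta) / gamma for s in scores]

scores = [1.0, 2.0, 3.0]
# If beta equals the row maximum and gamma the Softmax denominator,
# the two functions agree exactly for this row.
beta = max(scores)
gamma = sum(math.exp(s - beta) for s in scores)
reference = softmax(scores)
approx = consmax(scores, beta, gamma)
```

With constants that do not match a given row, the outputs no longer sum to
one, which is the trade-off accepted in exchange for element-wise
parallelism.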
[LINK]
http://arxiv.org/abs/2402.10930v3
[DATE]
2024-11-15 08:09:44+08:00
[CATEGORIES]
cs.LG
SymbolFit: Automatic Parametric Modeling with Symbolic Regression
[AUTHORS]
Ho Fung Tsoi, Dylan Rankin, Cecile Caillol, Miles Cranmer, Sridhara Dasu, Javier Duarte, Philip Harris, Elliot Lipeles, Vladimir Loncar
[ABSTRACT]
We introduce SymbolFit, a framework that automates parametric modeling by
using symbolic regression to perform a machine-search for functions that fit
the data, while simultaneously providing uncertainty estimates in a single run.
Traditionally, constructing a parametric model to accurately describe binned
data has been a manual and iterative process, requiring an adequate functional
form to be determined before the fit can be performed. The main challenge
arises when the appropriate functional forms cannot be derived from first
principles, especially when there is no underlying true closed-form function
for the distribution. In this work, we address this problem by utilizing
symbolic regression, a machine learning technique that explores a vast space of
candidate functions without needing a predefined functional form, treating the
functional form itself as a trainable parameter. Our approach is demonstrated
in data analysis applications in high-energy physics experiments at the CERN
Large Hadron Collider (LHC). We demonstrate its effectiveness and efficiency
using five real proton-proton collision datasets from new physics searches at
the LHC, namely the background modeling in resonance searches for high-mass
dijet, trijet, paired-dijet, diphoton, and dimuon events. We also validate the
framework using several toy datasets with one and more variables.
[COMMENTS]
53 pages, 35 figures. Under review
[LINK]
http://arxiv.org/abs/2411.09851v1
[DATE]
2024-11-15 08:09:37+08:00
[CATEGORIES]
cs.LG
Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
[AUTHORS]
Shijie Zhou, Huaisheng Zhu, Rohan Sharma, Ruiyi Zhang, Kaiyi Ji, Changyou Chen
[ABSTRACT]
Diffusion models have emerged as a powerful foundation model for visual
generation. With an appropriate sampling process, they can effectively serve as a
generative prior to solve general inverse problems. Current posterior sampling
based methods take the measurement (i.e., degraded image sample) into the
posterior sampling to infer the distribution of the target data (i.e., clean
image sample). However, in this manner, we show that high-frequency information
can be prematurely introduced during the early stages, which could induce
larger posterior estimate errors during the restoration sampling. To address
this issue, we first reveal that forming the log posterior gradient with the
noisy measurement (i.e., samples from a diffusion forward process) instead of
the clean one can benefit the reverse process. Consequently, we propose a novel
diffusion posterior sampling method DPS-CM, which incorporates a Crafted
Measurement (i.e., samples generated by a reverse denoising process, compared
to random sampling with noise in standard methods) to form the posterior
estimate. This integration aims to mitigate the misalignment with the diffusion
prior caused by cumulative posterior estimate errors. Experimental results
demonstrate that our approach significantly improves the overall capacity to
solve general and noisy inverse problems, such as Gaussian deblurring,
super-resolution, inpainting, nonlinear deblurring, and tasks with Poisson
noise, relative to existing approaches.
[LINK]
http://arxiv.org/abs/2411.09850v1
[DATE]
2024-11-15 08:06:57+08:00
[CATEGORIES]
cs.LG
Boosted Neural Decoders: Achieving Extreme Reliability of LDPC Codes for 6G Networks
[AUTHORS]
Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No
[ABSTRACT]
Ensuring extremely high reliability in channel coding is essential for 6G
networks. The next generation of ultra-reliable and low-latency communications
(xURLLC) scenario within 6G networks requires a frame error rate (FER) below
$10^{-9}$. However, low-density parity-check (LDPC) codes, the standard in 5G
new radio (NR), encounter a challenge known as the error floor phenomenon,
which hinders achieving such low rates. To tackle this problem, we introduce
an innovative solution: the boosted neural min-sum (NMS) decoder. This decoder
operates identically to conventional NMS decoders, but is trained by novel
training methods including: i) boosting learning with uncorrected vectors, ii)
block-wise training schedule to address the vanishing gradient issue, iii)
dynamic weight sharing to minimize the number of trainable weights, iv)
transfer learning to reduce the required sample count, and v) data augmentation
to expedite the sampling process. Leveraging these training strategies, the
boosted NMS decoder achieves state-of-the-art performance in reducing the
error floor as well as superior waterfall performance. Remarkably, we fulfill
the 6G xURLLC requirement for 5G LDPC codes without a severe error floor.
Additionally, the boosted NMS decoder, once its weights are trained, can
perform decoding without additional modules, making it highly practical for
immediate application. The source code is available at
https://github.com/ghy1228/LDPC_Error_Floor.
[COMMENTS]
14 pages, 11 figures
[LINK]
http://arxiv.org/abs/2405.13413v2
[DATE]
2024-11-15 07:54:29+08:00
[CATEGORIES]
cs.LG
Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods
[AUTHORS]
Yiming Zhou, Zixuan Zeng, Andi Chen, Xiaofan Zhou, Haowei Ni, Shiyao Zhang, Panfeng Li, Liangxi Liu, Mengyao Zheng, Xupeng Chen
[ABSTRACT]
Exploring the capabilities of Neural Radiance Fields (NeRF) and
Gaussian-based methods in the context of 3D scene reconstruction, this study
contrasts these modern approaches with traditional Simultaneous Localization
and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we
assess performance based on tracking accuracy, mapping fidelity, and view
synthesis. Findings reveal that NeRF excels in view synthesis, offering unique
capabilities in generating new perspectives from existing data, albeit at
slower processing speeds. Conversely, Gaussian-based methods provide rapid
processing and significant expressiveness but lack comprehensive scene
completion. Enhanced by global optimization and loop closure techniques, newer
methods like NICE-SLAM and SplaTAM not only surpass older frameworks such as
ORB-SLAM2 in terms of robustness but also demonstrate superior performance in
dynamic and complex environments. This comparative analysis bridges theoretical
research with practical implications, shedding light on future developments in
robust 3D scene reconstruction across various real-world applications.
[COMMENTS]
Accepted by 2024 6th International Conference on Data-driven
Optimization of Complex Systems
[LINK]
http://arxiv.org/abs/2408.04268v2
[DATE]
2024-11-15 07:46:34+08:00
[CATEGORIES]
cs.LG
Deep Autoencoders for Unsupervised Anomaly Detection in Wildfire Prediction
[AUTHORS]
İrem Üstek, Miguel Arana-Catania, Alexander Farr, Ivan Petrunin
[ABSTRACT]
Wildfires pose a significantly increasing hazard to global ecosystems due to
the climate crisis. Due to their complex nature, there is an urgent need for
innovative approaches to wildfire prediction, such as machine learning. This
research took a unique approach, differentiating from classical supervised
learning, and addressed the gap in unsupervised wildfire prediction using
autoencoders and clustering techniques for anomaly detection. Historical
weather and normalised difference vegetation index datasets of Australia for
2005 - 2021 were utilised. Two main unsupervised approaches were analysed. The
first used a deep autoencoder to obtain latent features, which were then fed
into clustering models, isolation forest, local outlier factor and one-class
SVM for anomaly detection. The second approach used a deep autoencoder to
reconstruct the input data and use reconstruction errors to identify anomalies.
Long Short-Term Memory (LSTM) autoencoders and fully connected (FC)
autoencoders were employed in this part, both in an unsupervised way learning
only from nominal data. The FC autoencoder outperformed its counterparts,
achieving an accuracy of 0.71, an F1-score of 0.74, and an MCC of 0.42. These
findings highlight the practicality of this method, as it effectively predicts
wildfires in the absence of ground truth, utilising an unsupervised learning
technique.
[COMMENTS]
33 pages, 18 figures, 16 tables. To appear in Earth and Space Science
[LINK]
http://arxiv.org/abs/2411.09844v1
[DATE]
2024-11-15 07:19:55+08:00
[CATEGORIES]
cs.LG
Interpolating neural network: A lightweight yet precise architecture for data training, equation solving, and parameter calibration
[AUTHORS]
Chanwook Park, Sourav Saha, Jiachen Guo, Hantao Zhang, Xiaoyu Xie, Miguel A. Bessa, Dong Qian, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam Liu
[ABSTRACT]
Artificial intelligence (AI) has revolutionized software development,
shifting from task-specific codes (Software 1.0) to neural network-based
approaches (Software 2.0). However, applying this transition in engineering
software presents challenges, including low surrogate model accuracy, the curse
of dimensionality in inverse design, and rising complexity in physical
simulations. We introduce an interpolating neural network (INN), grounded in
interpolation theory and tensor decomposition, to realize Engineering Software
2.0 by advancing data training, partial differential equation solving, and
parameter calibration. INN offers orders of magnitude fewer trainable/solvable
parameters for comparable model accuracy than traditional multi-layer
perceptron (MLP) or physics-informed neural networks (PINN). Demonstrated in
metal additive manufacturing, INN rapidly constructs an accurate surrogate
model of Laser Powder Bed Fusion (L-PBF) heat transfer simulation, achieving
sub-10-micrometer resolution for a 10 mm path in under 15 minutes on a single
GPU. This makes a transformative step forward across all domains essential to
engineering software.
[COMMENTS]
9 pages, 2 figures
[LINK]
http://arxiv.org/abs/2404.10296v4
[DATE]
2024-11-15 07:07:03+08:00
[CATEGORIES]
cs.LG
Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models
[AUTHORS]
Kirill Vasilevski, Dayi Lin, Ahmed Hassan
[ABSTRACT]
To balance the quality and inference cost of a Foundation Model (FM, such as
large language models (LLMs)) powered software, people often opt to train a
routing model that routes requests to FMs with different sizes and
capabilities. Existing routing models rely on learning the optimal routing
decision from carefully curated data, require complex computations to be
updated, and do not consider the potential evolution of weaker FMs. In this
paper, we propose Real-time Adaptive Routing (RAR), an approach to continuously
adapt FM routing decisions while using guided in-context learning to enhance
the capabilities of weaker FMs. The goal is to reduce reliance on stronger, more
expensive FMs. We evaluate our approach on different subsets of the popular
MMLU benchmark. Over time, our approach routes 50.2% fewer requests to
computationally expensive models while maintaining around 90.5% of the general
response quality. In addition, the guides generated from stronger models have
shown intra-domain generalization and led to a better quality of responses
compared to an equivalent approach with a standalone weaker FM.
[LINK]
http://arxiv.org/abs/2411.09837v1
[DATE]
2024-11-15 07:02:30+08:00
[CATEGORIES]
cs.LG
The Good, The Efficient and the Inductive Biases: Exploring Efficiency in Deep Learning Through the Use of Inductive Biases
[AUTHORS]
David W. Romero
[ABSTRACT]
The emergence of Deep Learning has marked a profound shift in machine
learning, driven by numerous breakthroughs achieved in recent years. However,
as Deep Learning becomes increasingly present in everyday tools and
applications, there is a growing need to address unresolved challenges related
to its efficiency and sustainability. This dissertation delves into the role of
inductive biases – particularly, continuous modeling and symmetry preservation
– as strategies to enhance the efficiency of Deep Learning. It is structured
in two main parts.
The first part investigates continuous modeling as a tool to improve the
efficiency of Deep Learning algorithms. Continuous modeling involves the idea
of parameterizing neural operations in a continuous space. The research
presented here demonstrates substantial benefits for the (i) computational
efficiency – in time and memory, (ii) the parameter efficiency, and (iii)
design efficiency – the complexity of designing neural architectures for new
datasets and tasks.
The second part focuses on the role of symmetry preservation in Deep Learning
efficiency. Symmetry preservation involves designing neural operations that
align with the inherent symmetries of data. The research presented in this part
highlights significant gains both in data and parameter efficiency through the
use of symmetry preservation. However, it also acknowledges a resulting
trade-off of increased computational costs.
The dissertation concludes with a critical evaluation of these findings,
openly discussing their limitations and proposing strategies to address them,
informed by the literature and the author's insights. It ends by identifying
promising future research avenues in the exploration of inductive biases for
efficiency, and their wider implications for Deep Learning.
[COMMENTS]
PhD Dissertation
[LINK]
http://arxiv.org/abs/2411.09827v1
[DATE]
2024-11-15 06:24:59+08:00
[CATEGORIES]
cs.LG
Model-Change Active Learning in Graph-Based Semi-Supervised Learning
[AUTHORS]
Kevin Miller, Andrea L. Bertozzi
[ABSTRACT]
Active learning in semi-supervised classification involves introducing
additional labels for unlabelled data to improve the accuracy of the underlying
classifier. A challenge is to identify which points to label to best improve
performance while limiting the number of new labels. “Model Change” active
learning quantifies the resulting change incurred in the classifier by
introducing the additional label(s). We pair this idea with graph-based
semi-supervised learning methods that use the spectrum of the graph Laplacian
matrix, which can be truncated to avoid prohibitively large computational and
storage costs. We consider a family of convex loss functions for which the
acquisition function can be efficiently approximated using the Laplace
approximation of the posterior distribution. We show a variety of multiclass
examples that illustrate improved performance over the prior state of the art.
[LINK]
http://arxiv.org/abs/2110.07739v2
[DATE]
2024-11-15 06:18:31+08:00
[CATEGORIES]
cs.LG
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
[AUTHORS]
Cameron Allen, Aaron Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael L. Littman, George Konidaris
[ABSTRACT]
Reinforcement learning algorithms typically rely on the assumption that the
environment dynamics and value function can be expressed in terms of a
Markovian state representation. However, when state information is only
partially observable, how can an agent learn such a state representation, and
how can it detect when it has found one? We introduce a metric that can
accomplish both objectives, without requiring access to – or knowledge of –
an underlying, unobservable state space. Our metric, the $\lambda$-discrepancy,
is the difference between two distinct temporal difference (TD) value
estimates, each computed using TD($\lambda$) with a different value of
$\lambda$. Since TD($\lambda{=}0$) makes an implicit Markov assumption and
TD($\lambda{=}1$) does not, a discrepancy between these estimates is a
potential indicator of a non-Markovian state representation. Indeed, we prove
that the $\lambda$-discrepancy is exactly zero for all Markov decision
processes and almost always non-zero for a broad class of partially observable
environments. We also demonstrate empirically that, once detected, minimizing
the $\lambda$-discrepancy can help with learning a memory function to mitigate
the corresponding partial observability. We then train a reinforcement learning
agent that simultaneously constructs two recurrent value networks with
different $\lambda$ parameters and minimizes the difference between them as an
auxiliary loss. The approach scales to challenging partially observable
domains, where the resulting agent frequently performs significantly better
(and never performs worse) than a baseline recurrent agent with only a single
value network.
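
To make the comparison of the two value estimates concrete, here is a small sketch under simplifying assumptions; it is not the code from the linked repository. It treats a fixed policy as a Markov reward process with a one-hot observation map, takes the TD(1) estimate to be the visitation-weighted Monte Carlo value per observation, and takes the TD(0) estimate to be the fixed point of the observation-level Bellman equation. The function name, the weights `w`, and the three-state chain are all hypothetical.

```python
import numpy as np

def lambda_discrepancy(P, r, Phi, w, gamma):
    """Per-observation TD(1) values, TD(0) values, and their difference.

    P     : (n, n) state transition matrix; rows may sum to < 1 (termination).
    r     : (n,) expected reward received when leaving each state.
    Phi   : (n, m) one-hot observation matrix, Phi[s, o] = 1 if state s emits o.
    w     : (n,) positive state weights (assumed visitation frequencies).
    gamma : discount factor.
    """
    n, m = Phi.shape
    # True state values solve V = r + gamma * P V.
    v_state = np.linalg.solve(np.eye(n) - gamma * P, r)
    # belief[o, s] = Pr(s | o) under the assumed weights w.
    joint = (w[:, None] * Phi).T
    belief = joint / joint.sum(axis=1, keepdims=True)
    # TD(1) / Monte Carlo: average true returns over states sharing an observation.
    v_mc = belief @ v_state
    # TD(0): Bellman fixed point of the observation-level reward and transition model.
    r_obs = belief @ r
    P_obs = belief @ P @ Phi
    v_td = np.linalg.solve(np.eye(m) - gamma * P_obs, r_obs)
    return v_mc, v_td, v_mc - v_td

# Chain s0 -> s1 -> s2 -> terminal, with reward 1 on leaving s2.
P = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])
w = np.ones(3)
gamma = 0.9

# Aliased case: s0 and s2 emit the same observation.
Phi_aliased = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
v_mc, v_td, disc = lambda_discrepancy(P, r, Phi_aliased, w, gamma)

# Markov case: every state has its own observation.
_, _, disc_markov = lambda_discrepancy(P, r, np.eye(3), w, gamma)
```

In the aliased case the two estimates disagree for both observations, while in the Markov case the difference vanishes, matching the qualitative claim in the abstract. The paper's exact definition (weighting, norm, and policy dependence) may differ from this sketch.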
[COMMENTS]
GitHub URL: https://github.com/brownirl/lambda_discrepancy; Project
page: https://lambda-discrepancy.github.io/
[LINK]
http://arxiv.org/abs/2407.07333v3
[DATE]
2024-11-15 06:17:25+08:00
[CATEGORIES]
cs.LG
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking
[AUTHORS]
Yunchao Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr
[ABSTRACT]
While deep learning has revolutionized computer-aided drug discovery, the AI
community has predominantly focused on model innovation and placed less
emphasis on establishing best benchmarking practices. We posit that without a
sound model evaluation framework, the AI community’s efforts cannot reach their
full potential, thereby slowing the progress and transfer of innovation into
real-world drug discovery. Thus, in this paper, we seek to establish a new gold
standard for small molecule drug discovery benchmarking, WelQrate.
Specifically, our contributions are threefold: WelQrate Dataset Collection - we
introduce a meticulously curated collection of 9 datasets spanning 5
therapeutic target classes. Our hierarchical curation pipelines, designed by
drug discovery experts, go beyond the primary high-throughput screen by
leveraging additional confirmatory and counter screens along with rigorous
domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS)
filtering, to ensure high-quality data in the datasets; WelQrate Evaluation
Framework - we propose a standardized model evaluation framework considering
high-quality datasets, featurization, 3D conformation generation, evaluation
metrics, and data splits, which provides a reliable benchmarking for drug
discovery experts conducting real-world virtual screening; Benchmarking - we
evaluate model performance through various research questions using the
WelQrate dataset collection, exploring the effects of different models, dataset
quality, featurization methods, and data splitting strategies on the results.
In summary, we recommend adopting our proposed WelQrate as the gold standard in
small molecule drug discovery benchmarking. The WelQrate dataset collection,
along with the curation codes, and experimental scripts are all publicly
available at WelQrate.org.
[COMMENTS]
* denotes equal contribution
[LINK]
http://arxiv.org/abs/2411.09820v1
[DATE]
2024-11-15 05:49:41+08:00
[CATEGORIES]
cs.LG
Learning Parameter Sharing with Tensor Decompositions and Sparsity
[AUTHORS]
Cem Üyük, Mike Lasby, Mohamed Yassin, Utku Evci, Yani Ioannou
[ABSTRACT]
Large neural networks achieve remarkable performance, but their size hinders
deployment on resource-constrained devices. While various compression
techniques exist, parameter sharing remains relatively unexplored. This paper
introduces Fine-grained Parameter Sharing (FiPS), a novel algorithm that
leverages the relationship between parameter sharing, tensor decomposition, and
sparsity to efficiently compress large vision transformer models. FiPS employs
a shared base and sparse factors to represent shared neurons across multi-layer
perceptron (MLP) modules. Shared parameterization is initialized via Singular
Value Decomposition (SVD) and optimized by minimizing block-wise reconstruction
error. Experiments demonstrate that FiPS compresses DeiT-B and Swin-L MLPs to
25-40% of their original parameter count while maintaining accuracy within 1
percentage point of the original models.
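
The SVD initialization step can be pictured with the sketch below. It is an assumption-laden illustration rather than the FiPS algorithm: it concatenates several weight matrices, takes a truncated SVD, and reads off one shared base plus a dense factor per matrix. The sparsity and block-wise reconstruction optimization described in the abstract are not shown, and the name `shared_basis_svd` is hypothetical.

```python
import numpy as np

def shared_basis_svd(weights, rank):
    """Factor each (d, h_i) matrix as base @ factor_i with one shared (d, rank) base."""
    stacked = np.concatenate(weights, axis=1)               # (d, sum of h_i)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    base = U[:, :rank]                                       # shared base
    coeffs = S[:rank, None] * Vt[:rank]                      # (rank, sum of h_i)
    widths = [w.shape[1] for w in weights]
    # Split the coefficient matrix back into one factor per input matrix.
    factors = np.split(coeffs, np.cumsum(widths)[:-1], axis=1)
    return base, factors

# Synthetic weights that truly share a rank-4 base, so rank 4 is lossless here.
rng = np.random.default_rng(0)
true_base = rng.normal(size=(16, 4))
weights = [true_base @ rng.normal(size=(4, 8)) for _ in range(3)]

base, factors = shared_basis_svd(weights, rank=4)
errors = [np.linalg.norm(W - base @ F) for W, F in zip(weights, factors)]
```

With real network weights the truncation is lossy, which is presumably why the abstract follows the initialization with an optimization of the reconstruction error.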
[LINK]
http://arxiv.org/abs/2411.09816v1
[DATE]
2024-11-15 05:29:58+08:00
[CATEGORIES]
cs.LG
MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data
[AUTHORS]
Chika Maduabuchi, Ericmoore Jossou, Matteo Bucci
[ABSTRACT]
Purpose: High-speed video (HSV) phase detection (PD) segmentation is vital in
nuclear reactors, chemical processing, and electronics cooling for detecting
vapor, liquid, and microlayer phases. Traditional segmentation models face
pixel-level accuracy and generalization issues in multimodal data. MSEG-VCUQ
introduces VideoSAM, a hybrid framework leveraging convolutional neural
networks (CNNs) and transformer-based vision models to enhance segmentation
accuracy and generalizability across complex multimodal PD tasks. Methods:
VideoSAM combines U-Net CNN and the Segment Anything Model (SAM) for advanced
feature extraction and segmentation across diverse HSV PD modalities, spanning
fluids like water, FC-72, nitrogen, and argon under varied heat flux
conditions. The framework also incorporates uncertainty quantification (UQ) to
assess pixel-based discretization errors, delivering reliable metrics such as
contact line density and dry area fraction under experimental conditions.
Results: VideoSAM outperforms SAM and modality-specific CNN models in
segmentation accuracy, excelling in environments with complex phase boundaries,
overlapping bubbles, and dynamic liquid-vapor interactions. Its hybrid
architecture supports cross-dataset generalization, adapting effectively to
varying modalities. The UQ module provides accurate error estimates, enhancing
the reliability of segmentation outputs for advanced HSV PD research.
Conclusion: MSEG-VCUQ, via VideoSAM, offers a robust solution for HSV PD
segmentation, addressing previous limitations with advanced deep learning and
UQ techniques. The open-source datasets and tools introduced enable scalable,
precise, and adaptable segmentation for multimodal PD datasets, supporting
advancements in HSV analysis and autonomous experimentation. The codes and data
used for this paper are publicly available at
https://github.com/chikap421/mseg_vcuq
[COMMENTS]
Under Review in EAAI
[LINK]
http://arxiv.org/abs/2411.07463v3
[DATE]
2024-11-15 05:20:34+08:00
[CATEGORIES]
cs.LG
Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments
[AUTHORS]
Farnaz Niknia, Ping Wang
[ABSTRACT]
This paper addresses the challenge of edge caching in dynamic environments,
where rising traffic loads strain backhaul links and core networks. We propose
a Proximal Policy Optimization (PPO)-based caching strategy that fully
incorporates key file attributes such as size, lifetime, importance, and
popularity, while also considering random file request arrivals, reflecting
more realistic edge caching scenarios. In dynamic environments, changes such as
shifts in content popularity and variations in request rates frequently occur,
making previously learned policies less effective as they were optimized for
earlier conditions. Without adaptation, caching efficiency and response times
can degrade. While learning a new policy from scratch in a new environment is
an option, it is highly inefficient and computationally expensive. Thus,
adapting an existing policy to these changes is critical. To address this, we
develop a mechanism that detects changes in content popularity and request
rates, ensuring timely adjustments to the caching strategy. We also propose a
transfer learning-based PPO algorithm that accelerates convergence in new
environments by leveraging prior knowledge. Simulation results demonstrate the
significant effectiveness of our approach, outperforming a recent Deep
Reinforcement Learning (DRL)-based method.
[LINK]
http://arxiv.org/abs/2411.09812v1
[DATE]
2024-11-15 05:01:29+08:00
[CATEGORIES]
cs.LG
Physics-informed neural networks for parameter learning of wildfire spreading
[AUTHORS]
Konstantinos Vogiatzoglou, Costas Papadimitriou, Vasilis Bontozoglou, Konstantinos Ampountolas
[ABSTRACT]
Wildland fires pose a terrifying natural hazard, underscoring the urgent need
to develop data-driven and physics-informed digital twins for wildfire
prevention, monitoring, intervention, and response. In this direction of
research, this work introduces a physics-informed neural network (PiNN)
designed to learn the unknown parameters of an interpretable wildfire spreading
model. The considered modeling approach integrates fundamental physical laws
articulated by key model parameters essential for capturing the complex
behavior of wildfires. The proposed machine learning framework leverages the
theory of artificial neural networks with the physical constraints governing
wildfire dynamics, including the first principles of mass and energy
conservation. Training of the PiNN for physics-informed parameter
identification is realized using synthetic data on the spatiotemporal evolution
of one- and two-dimensional firefronts, derived from a high-fidelity simulator,
as well as empirical data (ground surface thermal images) from the Troy Fire
that occurred on June 19, 2002, in California. The parameter learning results
demonstrate the predictive ability of the proposed PiNN in uncovering the
unknown coefficients of the wildfire model in one- and two-dimensional fire
spreading scenarios as well as the Troy Fire. Additionally, this methodology
exhibits robustness by identifying the same parameters even in the presence of
noisy data. By integrating this PiNN approach into a comprehensive framework,
the envisioned physics-informed digital twin will enhance intelligent wildfire
management and risk assessment, providing a powerful tool for proactive and
reactive strategies.
[COMMENTS]
32 pages, 14 figures, 2 Tables
[LINK]
http://arxiv.org/abs/2406.14591v3
[DATE]
2024-11-15 04:59:15+08:00
[CATEGORIES]
cs.LG
Evaluating Loss Landscapes from a Topology Perspective
[AUTHORS]
Tiankai Xie, Caleb Geniesse, Jiaqing Chen, Yaoqing Yang, Dmitriy Morozov, Michael W. Mahoney, Ross Maciejewski, Gunther H. Weber
[ABSTRACT]
Characterizing the loss of a neural network with respect to model parameters,
i.e., the loss landscape, can provide valuable insights into properties of that
model. Various methods for visualizing loss landscapes have been proposed, but
less emphasis has been placed on quantifying and extracting actionable and
reproducible insights from these complex representations. Inspired by powerful
tools from topological data analysis (TDA) for summarizing the structure of
high-dimensional data, here we characterize the underlying shape (or topology)
of loss landscapes, quantifying the topology to reveal new insights about
neural networks. To relate our findings to the machine learning (ML)
literature, we compute simple performance metrics (e.g., accuracy, error), and
we characterize the local structure of loss landscapes using Hessian-based
metrics (e.g., largest eigenvalue, trace, eigenvalue spectral density).
Following this approach, we study established models from image pattern
recognition (e.g., ResNets) and scientific ML (e.g., physics-informed neural
networks), and we show how quantifying the shape of loss landscapes can provide
new insights into model performance and learning dynamics.
[LINK]
http://arxiv.org/abs/2411.09807v1
[DATE]
2024-11-15 04:46:26+08:00
[CATEGORIES]
cs.LG
Fair Resource Allocation in Weakly Coupled Markov Decision Processes
[AUTHORS]
Xiaohui Tu, Yossiri Adulyasak, Nima Akbarzadeh, Erick Delage
[ABSTRACT]
We consider fair resource allocation in sequential decision-making
environments modeled as weakly coupled Markov decision processes, where
resource constraints couple the action spaces of $N$ sub-Markov decision
processes (sub-MDPs) that would otherwise operate independently. We adopt a
fairness definition using the generalized Gini function instead of the
traditional utilitarian (total-sum) objective. After introducing a general but
computationally prohibitive solution scheme based on linear programming, we
focus on the homogeneous case where all sub-MDPs are identical. For this case,
we show for the first time that the problem reduces to optimizing the
utilitarian objective over the class of “permutation invariant” policies. This
result is particularly useful as we can exploit Whittle index policies in the
restless bandits setting while, for the more general setting, we introduce a
count-proportion-based deep reinforcement learning approach. Finally, we
validate our theoretical findings with comprehensive experiments, confirming
the effectiveness of our proposed method in achieving fairness.
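
For readers unfamiliar with the generalized Gini function, a minimal sketch follows. It uses the standard definition (utilities sorted in ascending order, paired with non-increasing weights so the worst-off component counts most); the function name and the example numbers are illustrative and not taken from the paper.

```python
def generalized_gini(utilities, weights):
    """Weighted sum of utilities sorted ascending, with non-increasing weights."""
    if len(utilities) != len(weights):
        raise ValueError("utilities and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    # The smallest utility is paired with the largest weight.
    return sum(w * u for w, u in zip(weights, sorted(utilities)))

weights = [0.5, 0.3, 0.2]
unequal = generalized_gini([3.0, 1.0, 2.0], weights)   # total utility 6
permuted = generalized_gini([1.0, 2.0, 3.0], weights)  # same values, reordered
equal = generalized_gini([2.0, 2.0, 2.0], weights)     # total utility 6
```

Both allocations have the same utilitarian total, but the balanced one scores higher, and reordering the sub-MDPs does not change the score. That symmetry is what makes permutation-invariant policies a natural class in the homogeneous case.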
[LINK]
http://arxiv.org/abs/2411.09804v1
[DATE]
2024-11-15 04:40:55+08:00
[CATEGORIES]
cs.LG
Modeling human decomposition: a Bayesian approach
[AUTHORS]
D. Hudson Smith, Noah Nisbet, Carl Ehrett, Cristina I. Tica, Madeline M. Atwell, Katherine E. Weisensee
[ABSTRACT]
Environmental and individualistic variables affect the rate of human
decomposition in complex ways. These effects complicate the estimation of the
postmortem interval (PMI) based on observed decomposition characteristics. In
this work, we develop a generative probabilistic model for decomposing human
remains based on PMI and a wide range of environmental and individualistic
variables. This model explicitly represents the effect of each variable,
including PMI, on the appearance of each decomposition characteristic, allowing
for direct interpretation of model effects and enabling the use of the model
for PMI inference and optimal experimental design. In addition, the
probabilistic nature of the model allows for the integration of expert
knowledge in the form of prior distributions. We fit this model to a diverse
set of 2,529 cases from the GeoFOR dataset. We demonstrate that the model
accurately predicts 24 decomposition characteristics with an ROC AUC score of
0.85. Using Bayesian inference techniques, we invert the decomposition model to
predict PMI as a function of the observed decomposition characteristics and
environmental and individualistic variables, producing an R-squared measure of
71%. Finally, we demonstrate how to use the fitted model to design future
experiments that maximize the expected amount of new information about the
mechanisms of decomposition using the Expected Information Gain formalism.
[LINK]
http://arxiv.org/abs/2411.09802v1
[DATE]
2024-11-15 04:37:10+08:00
[CATEGORIES]
cs.LG
Explainable Differential Privacy-Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring
[AUTHORS]
Fardin Jalil Piran, Prathyush P. Poduval, Hamza Errahmouni Barkam, Mohsen Imani, Farhad Imani
[ABSTRACT]
Machine Learning (ML) models combined with in-situ sensing offer a powerful
solution to address defect detection challenges in Additive Manufacturing (AM),
yet this integration raises critical data privacy concerns, such as data
leakage and sensor data compromise, potentially exposing sensitive information
about part design and material composition. Differential Privacy (DP), which
adds mathematically controlled noise to ML models, provides a way to balance
data utility with privacy by concealing identifiable traces from sensor data.
However, introducing noise into ML models, especially black-box Artificial
Intelligence (AI) models, complicates the prediction of how noise impacts model
accuracy. This study presents the Differential Privacy-Hyperdimensional
Computing (DP-HD) framework, which leverages Explainable AI (XAI) and the
vector symbolic paradigm to quantify noise effects on accuracy. By defining a
Signal-to-Noise Ratio (SNR) metric, DP-HD assesses the contribution of training
data relative to DP noise, allowing selection of an optimal balance between
accuracy and privacy. Experimental results using high-speed melt pool data for
anomaly detection in AM demonstrate that DP-HD achieves superior operational
efficiency, prediction accuracy, and privacy protection. For instance, with a
privacy budget set at 1, DP-HD achieves 94.43% accuracy, outperforming
state-of-the-art ML models. Furthermore, DP-HD maintains high accuracy under
substantial noise additions to enhance privacy, unlike current models that
experience significant accuracy declines under stringent privacy constraints.
[COMMENTS]
28 pages, 13 figures
[LINK]
http://arxiv.org/abs/2407.07066v3
[DATE]
2024-11-15 04:13:19+08:00
[CATEGORIES]
cs.LG
Reinforced Disentanglers on Random Unitary Circuits
[AUTHORS]
Ning Bao, Keiichiro Furuya, Gun Suer
[ABSTRACT]
We search for efficient disentanglers on random Clifford circuits of
two-qubit gates arranged in a brick-wall pattern, using the proximal policy
optimization (PPO) algorithm (Schulman et al., 2017). Disentanglers are
defined as a set of projective measurements inserted between consecutive
entangling layers. An efficient disentangler is a set of projective
measurements that minimize the averaged von Neumann entropy of the final state
with the least number of total projections possible. The problem is naturally
amenable to reinforcement learning techniques by taking the binary matrix
representing the projective measurements along the circuit as our state, and
actions as bit flipping operations on this binary matrix that add or delete
measurements at specified locations. We give rewards to our agent dependent on
the averaged von Neumann entropy of the final state and the configuration of
measurements, such that the agent learns the optimal policy that will take it
from the initial state of no measurements to the optimal measurement state that
minimizes the entanglement entropy. Our results indicate that the number of
measurements required to disentangle a random quantum circuit is drastically
smaller than the numerical results of measurement-induced phase transition papers suggest.
Additionally, the reinforcement learning procedure enables us to characterize
the pattern of optimal disentanglers, which is not possible in the works of
measurement-induced phase transitions.
[COMMENTS]
9 pages, 7 figures, 1 table. Submitted to QIP 2025
[LINK]
http://arxiv.org/abs/2411.09784v1
[DATE]
2024-11-15 03:51:26+08:00
[CATEGORIES]
cs.LG
Decentralized Coordination of Distributed Energy Resources through Local Energy Markets and Deep Reinforcement Learning
[AUTHORS]
Daniel May, Matthew Taylor, Petr Musilek
[ABSTRACT]
As distributed energy resources (DERs) grow, the electricity grid faces
increased net load variability at the grid edge, impacting operability and
reliability. Transactive energy, facilitated through local energy markets,
offers a decentralized, indirect demand response solution, with model-free
control techniques, such as deep reinforcement learning (DRL), enabling
automated, decentralized participation. However, existing studies largely
overlook community-level net load variability, focusing instead on
socioeconomic metrics.
This study addresses this gap by using DRL agents to automate end-user
participation in a local energy market (ALEX), where agents act independently
to minimize individual energy bills. Results reveal a strong link between bill
reduction and decreased net load variability, assessed across metrics such as
ramping rate, load factor, and peak demand over various time horizons. Using a
no-control baseline, DRL agents are benchmarked against a near-optimal dynamic
programming approach. The dynamic programming benchmark achieves reductions of
22.05 percent, 83.92 percent, and 24.09 percent in daily import, export, and
peak demand, respectively, while the DRL agents show comparable or superior
results with reductions of 21.93 percent, 84.46 percent, and 27.02 percent.
This study demonstrates the effectiveness of DRL in decentralized grid
management, highlighting its scalability and near-optimal performance in
reducing net load variability within community-driven energy markets.
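
The variability metrics named in the abstract can be computed along the following lines. This is a sketch using common textbook definitions (load factor as mean over peak, ramping rate as the absolute change between consecutive intervals); the paper's exact definitions and time horizons may differ, and the function name and sample profile are hypothetical.

```python
def net_load_metrics(net_load_kw, interval_hours=1.0):
    """Peak demand, load factor, and ramping statistics for a net load series."""
    if len(net_load_kw) < 2:
        raise ValueError("need at least two samples to compute ramps")
    peak = max(net_load_kw)
    mean = sum(net_load_kw) / len(net_load_kw)
    # Absolute change between consecutive samples, per hour.
    ramps = [abs(b - a) / interval_hours
             for a, b in zip(net_load_kw, net_load_kw[1:])]
    return {
        "peak_demand_kw": peak,
        # Assumption: load factor is only meaningful for a positive peak.
        "load_factor": mean / peak if peak > 0 else float("nan"),
        "max_ramp_kw_per_h": max(ramps),
        "mean_ramp_kw_per_h": sum(ramps) / len(ramps),
    }

metrics = net_load_metrics([1.0, 3.0, 2.0, 4.0])
```

A flatter profile raises the load factor toward 1 and lowers both ramping statistics, which is the sense in which these metrics capture net load variability.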
[COMMENTS]
preprint, submitted to Energy and AI
[LINK]
http://arxiv.org/abs/2404.13142v2
[DATE]
2024-11-15 03:36:14+08:00
[CATEGORIES]
cs.LG
Partial Multi-View Clustering via Meta-Learning and Contrastive Feature Alignment
[AUTHORS]
BoHao Chen
[ABSTRACT]
Partial multi-view clustering (PVC) is a challenging practical research
problem for data analysis in real-world applications, especially when
some views of the data are partially missing. Existing clustering methods
struggle to handle incomplete views effectively, leading to suboptimal
clustering performance. In this paper, we propose a novel dual optimization
framework based on contrastive learning, which aims to maximize the consistency
of latent features in incomplete multi-view data and improve clustering
performance through deep learning models. By combining a fine-tuned Vision
Transformer and k-nearest neighbors (KNN), we fill in missing views and
dynamically adjust view weights using self-supervised learning and
meta-learning. Experimental results demonstrate that our framework outperforms
state-of-the-art clustering models on the BDGP and HW datasets, particularly in
handling complex and incomplete multi-view data.
[LINK]
http://arxiv.org/abs/2411.09758v1
[DATE]
2024-11-15 03:16:01+08:00
[CATEGORIES]
cs.LG
Adversarial Attacks Using Differentiable Rendering: A Survey
[AUTHORS]
Matthew Hull, Chao Zhang, Zsolt Kira, Duen Horng Chau
[ABSTRACT]
Differentiable rendering methods have emerged as a promising means for
generating photo-realistic and physically plausible adversarial attacks by
manipulating 3D objects and scenes that can deceive deep neural networks
(DNNs). Recently, differentiable rendering capabilities have evolved
significantly into a diverse landscape of libraries, such as Mitsuba,
PyTorch3D, and methods like Neural Radiance Fields and 3D Gaussian Splatting
for solving inverse rendering problems that share conceptually similar
properties commonly used to attack DNNs, such as back-propagation and
optimization. However, the adversarial machine learning research community has
not yet fully explored or understood such capabilities for generating attacks.
Some key reasons are that researchers often have different attack goals, such
as misclassification or misdetection, and use different tasks to accomplish
these goals by manipulating different representations in a scene, such as the
mesh or texture of an object. This survey adopts a task-oriented unifying
framework that systematically summarizes common tasks, such as manipulating
textures, altering illumination, and modifying 3D meshes to exploit
vulnerabilities in DNNs. Our framework enables easy comparison of existing
works, reveals research gaps and spotlights exciting future research directions
in this rapidly evolving field. Through focusing on how these tasks enable
attacks on various DNNs such as image classification, facial recognition,
object detection, optical flow and depth estimation, our survey helps
researchers and practitioners better understand the vulnerabilities of computer
vision systems against photorealistic adversarial attacks that could threaten
real-world applications.
[LINK]
http://arxiv.org/abs/2411.09749v1
[DATE]
2024-11-15 03:03:11+08:00
[CATEGORIES]
cs.LG
Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations
[AUTHORS]
Carlos Heredia
[ABSTRACT]
In this paper, we propose a continuous-time formulation for the AdaGrad,
RMSProp, and Adam optimization algorithms by modeling them as first-order
integro-differential equations. We perform numerical simulations of these
equations to demonstrate their validity as accurate approximations of the
original algorithms. Our results indicate a strong agreement between the
behavior of the continuous-time models and the discrete implementations, thus
providing a new perspective on the theoretical understanding of adaptive
optimization methods.
[COMMENTS]
22 pages
[LINK]
http://arxiv.org/abs/2411.09734v1
[DATE]
2024-11-15 03:00:01+08:00
[CATEGORIES]
cs.LG
To bootstrap or to rollout? An optimal and adaptive interpolation
[AUTHORS]
Wenlong Mou, Jian Qian
[ABSTRACT]
Bootstrapping and rollout are two fundamental principles for value function
estimation in reinforcement learning (RL). We introduce a novel class of
Bellman operators, called subgraph Bellman operators, that interpolate between
bootstrapping and rollout methods. Our estimator, derived by solving the fixed
point of the empirical subgraph Bellman operator, combines the strengths of the
bootstrapping-based temporal difference (TD) estimator and the rollout-based
Monte Carlo (MC) methods. Specifically, the error upper bound of our estimator
approaches the optimal variance achieved by TD, with an additional term
depending on the exit probability of a selected subset of the state space. At
the same time, the estimator exhibits the finite-sample adaptivity of MC, with
sample complexity depending only on the occupancy measure of this subset. We
complement the upper bound with an information-theoretic lower bound, showing
that the additional term is unavoidable given a reasonable sample size.
Together, these results establish subgraph Bellman estimators as an optimal and
adaptive framework for reconciling TD and MC methods in policy evaluation.
[LINK]
http://arxiv.org/abs/2411.09731v1
[DATE]
2024-11-15 03:00:00+08:00
[CATEGORIES]
cs.LG
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
[AUTHORS]
Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen
[COMMENTS]
NeurIPS 2024. Code:
https://github.com/alexlioralexli/attention-transfer
[LINK]
http://arxiv.org/abs/2411.09702v1
[DATE]
2024-11-15 02:59:40+08:00
[CATEGORIES]
cs.LG
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)
[AUTHORS]
Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner
[ABSTRACT]
The prediction of ship trajectories is a growing field of study in artificial
intelligence. Traditional methods rely on the use of LSTM, GRU networks, and
even Transformer architectures for the prediction of spatio-temporal series.
This study proposes a viable alternative for predicting these trajectories
using only GNSS positions. It considers this spatio-temporal problem as a
natural language processing problem. The latitude/longitude coordinates of AIS
messages are transformed into cell identifiers using the H3 index. Thanks to
the pseudo-octal representation, it becomes easier for language models to learn
the spatial hierarchy of the H3 index. The method is compared with a classical
Kalman filter, widely used in the maritime domain, and introduces the Fréchet
distance as the main evaluation metric. We show that it is possible to predict
ship trajectories quite precisely up to 8 hours ahead with 30 minutes of
context, using solely GNSS positions, without relying on any additional
information such as speed, course, or external conditions - unlike many
traditional methods. We demonstrate that this alternative works well enough to
predict trajectories worldwide.
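
As a reference point for the evaluation metric, the sketch below computes the discrete Fréchet distance between two polylines with the usual dynamic program. It assumes planar points and Euclidean distance for simplicity; the abstract does not say whether the discrete or continuous variant is used, and geographic trajectories would normally call for a geodesic distance instead.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences (Euclidean metric)."""
    if not p or not q:
        raise ValueError("both curves must contain at least one point")
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance between the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]

predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
offset_distance = discrete_frechet(predicted, observed)
self_distance = discrete_frechet(predicted, predicted)
```

Unlike a pointwise average error, this metric respects the ordering of points along each trajectory, so it penalizes a predicted path that reaches the right places in the wrong sequence.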
[COMMENTS]
28 pages, 18 figures
[LINK]
http://arxiv.org/abs/2405.09596v2
[DATE]
2024-11-15 02:57:09+08:00
[CATEGORIES]
cs.LG
Conditional regression for the Nonlinear Single-Variable Model
[AUTHORS]
Yantao Wu, Mauro Maggioni
[ABSTRACT]
Several statistical models for regression of a function $F$ on $\mathbb{R}^d$
without the statistical and computational curse of dimensionality exist, for
example by imposing and exploiting geometric assumptions on the distribution of
the data (e.g. that its support is low-dimensional), or strong smoothness
assumptions on $F$, or a special structure of $F$. Among the latter, compositional
models, which assume $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$,
have been studied, and include classical single- and multi-index models and
recent works on neural networks. While the case where $g$ is linear is rather
well-understood, much less is known when $g$ is nonlinear, and in particular
for which $g$’s the curse of dimensionality in estimating $F$, or both $f$ and
$g$, may be circumvented. In this paper, we consider a model
$F(X):=f(\Pi_\gamma X) $ where $\Pi_\gamma:\mathbb{R}^d\to[0,\rm{len}\gamma]$
is the closest-point projection onto the parameter of a regular curve $\gamma:
[0,\rm{len}\gamma]\to\mathbb{R}^d$ and $f:[0,\rm{len}\gamma]\to\mathbb{R}^1$.
The input data $X$ is not low-dimensional and can lie far from $\gamma$, conditioned on
$\Pi_\gamma(X)$ being well-defined. The distribution of the data, $\gamma$ and
$f$ are unknown. This model is a natural nonlinear generalization of the
single-index model, which corresponds to $\gamma$ being a line. We propose a
nonparametric estimator, based on conditional regression, and show that under
suitable assumptions, the strongest of which is that $f$ is coarsely
monotone, it can achieve the one-dimensional optimal min-max rate for
non-parametric regression, up to the level of noise in the observations, and be
constructed in time $\mathcal{O}(d^2n\log n)$. All the constants in the
learning bounds, in the minimal number of samples required for our bounds to
hold, and in the computational complexity are at most low-order polynomials in
$d$.
[COMMENTS]
55 pages, 10 figures
[LINK]
http://arxiv.org/abs/2411.09686v1
[DATE]
2024-11-15 02:53:51+08:00
[CATEGORIES]
cs.LG
Towards a Classification of Open-Source ML Models and Datasets for Software Engineering
[AUTHORS]
Alexandra González, Xavier Franch, David Lo, Silverio Martínez-Fernández
[ABSTRACT]
Background: Open-Source Pre-Trained Models (PTMs) and datasets provide
extensive resources for various Machine Learning (ML) tasks, yet these
resources lack a classification tailored to Software Engineering (SE) needs.
Aims: We apply an SE-oriented classification to PTMs and datasets on a popular
open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs
over time. Method: We conducted a repository mining study. We started with a
systematically gathered database of PTMs and datasets from the HF API. Our
selection was refined by analyzing model and dataset cards and metadata, such
as tags, and confirming SE relevance using Gemini 1.5 Pro. All analyses are
replicable, with a publicly accessible replication package. Results: The most
common SE task among PTMs and datasets is code generation, with a primary focus
on software development and limited attention to software management. Popular
PTMs and datasets mainly target software development. Among ML tasks, text
generation is the most common in SE PTMs and datasets. There has been a marked
increase in PTMs for SE since 2023 Q2. Conclusions: This study underscores the
need for broader task coverage to enhance the integration of ML within SE
practices.
[COMMENTS]
5 pages, 8 figures
[LINK]
http://arxiv.org/abs/2411.09683v1
[DATE]
2024-11-15 02:52:05+08:00
[CATEGORIES]
cs.LG
NeuralDEM – Real-time Simulation of Industrial Particulate Flows
[AUTHORS]
Benedikt Alkin, Tobias Kronlachner, Samuele Papa, Stefan Pirker, Thomas Lichtenegger, Johannes Brandstetter
[ABSTRACT]
Advancements in computing power have made it possible to numerically simulate
large-scale fluid-mechanical and/or particulate systems, many of which are
integral to core industrial processes. Among the different numerical methods
available, the discrete element method (DEM) provides one of the most accurate
representations of a wide range of physical systems involving granular and
discontinuous materials. Consequently, DEM has become a widely accepted
approach for tackling engineering problems connected to granular flows and
powder mechanics. Additionally, DEM can be integrated with grid-based
computational fluid dynamics (CFD) methods, enabling the simulation of chemical
processes taking place, e.g., in fluidized beds. However, DEM is
computationally intensive because of the intrinsic multiscale nature of
particulate systems, restricting simulation duration or number of particles.
Towards this end, NeuralDEM presents an end-to-end approach to replace slow
numerical DEM routines with fast, adaptable deep learning surrogates. NeuralDEM
is capable of picturing long-term transport processes across different regimes
using macroscopic observables without any reference to microscopic model
parameters. First, NeuralDEM treats the Lagrangian discretization of DEM as an
underlying continuous field, while simultaneously modeling macroscopic behavior
directly as additional auxiliary fields. Second, NeuralDEM introduces
multi-branch neural operators scalable to real-time modeling of
industrially-sized scenarios - from slow and pseudo-steady to fast and
transient. Such scenarios have previously posed insurmountable challenges for
deep learning models. Notably, NeuralDEM faithfully models coupled CFD-DEM
fluidized bed reactors of 160k CFD cells and 500k DEM particles for
trajectories of 28s. NeuralDEM will open many new doors to advanced engineering
and much faster process cycles.
[COMMENTS]
Project page: https://nx-ai.github.io/NeuralDEM/
[LINK]
http://arxiv.org/abs/2411.09678v1
[DATE]
2024-11-15 02:44:31+08:00
[CATEGORIES]
cs.LG
Single-Loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions
[AUTHORS]
Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang
[ABSTRACT]
In this paper, we study a class of non-smooth non-convex problems in the form
of $\min_{x}[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$, where both
$\Phi(x) = \max_{y\in Y}\phi(x, y)$ and $\Psi(x)=\max_{z\in Z}\psi(x, z)$ are
weakly convex functions, and $\phi(x, y), \psi(x, z)$ are strongly concave
functions in terms of $y$ and $z$, respectively. It covers two families of
problems that have been studied but are missing single-loop stochastic
algorithms, i.e., difference of weakly convex functions and weakly convex
strongly-concave min-max problems. We propose a stochastic Moreau envelope
approximate gradient method dubbed SMAG, the first single-loop algorithm for
solving these problems, and provide a state-of-the-art non-asymptotic
convergence rate. The key idea of the design is to compute an approximate
gradient of the Moreau envelopes of $\Phi, \Psi$ using only one step of
stochastic gradient update of the primal and dual variables. Empirically, we
conduct experiments on positive-unlabeled (PU) learning and partial area under
ROC curve (pAUC) optimization with an adversarial fairness regularizer to
validate the effectiveness of our proposed algorithms.
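For readers unfamiliar with the central object, the standard textbook definition of the Moreau envelope is given below. It is not taken from the paper itself: the smoothing parameter $\lambda$ and the weak-convexity constant $\rho$ are notation assumed here, and the paper's own notation and step-size conditions may differ.

```latex
% Standard Moreau envelope of a \rho-weakly convex function \Phi,
% for a smoothing parameter 0 < \lambda < 1/\rho (assumed notation).
\Phi_{\lambda}(x) = \min_{u} \Big\{ \Phi(u) + \frac{1}{2\lambda}\,\|u - x\|^{2} \Big\},
\qquad
\operatorname{prox}_{\lambda\Phi}(x) = \arg\min_{u} \Big\{ \Phi(u) + \frac{1}{2\lambda}\,\|u - x\|^{2} \Big\},
\qquad
\nabla \Phi_{\lambda}(x) = \frac{1}{\lambda}\,\big(x - \operatorname{prox}_{\lambda\Phi}(x)\big).
```

Under this definition, an approximate gradient of the envelope only needs an approximate proximal point, which is consistent with the abstract's description of using a single stochastic update per iteration.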
[LINK]
http://arxiv.org/abs/2405.18577v4
[DATE]
2024-11-15 02:27:48+08:00
[CATEGORIES]
cs.LG
How do Machine Learning Models Change?
[AUTHORS]
Joel Castaño, Rafael Cabañas, Antonio Salmerón, David Lo, Silverio Martínez-Fernández
[ABSTRACT]
The proliferation of Machine Learning (ML) models and their open-source
implementations has transformed Artificial Intelligence research and
applications. Platforms like Hugging Face (HF) enable the development, sharing,
and deployment of these models, fostering an evolving ecosystem. While previous
studies have examined aspects of models hosted on platforms like HF, a
comprehensive longitudinal study of how these models change remains
underexplored. This study addresses this gap by utilizing both repository
mining and longitudinal analysis methods to examine over 200,000 commits and
1,200 releases from over 50,000 models on HF. We replicate and extend an ML
change taxonomy for classifying commits and utilize Bayesian networks to
uncover patterns in commit and release activities over time. Our findings
indicate that commit activities align with established data science
methodologies, such as CRISP-DM, emphasizing iterative refinement and
continuous improvement. Additionally, release patterns tend to consolidate
significant updates, particularly in documentation, distinguishing between
granular changes and milestone-based releases. Furthermore, projects with
higher popularity prioritize infrastructure enhancements early in their
lifecycle, and those with intensive collaboration practices exhibit improved
documentation standards. These and other insights enhance the understanding of
model changes on community platforms and provide valuable guidance for best
practices in model maintenance.
[LINK]
http://arxiv.org/abs/2411.09645v1
[DATE]
2024-11-15 02:14:32+08:00
[CATEGORIES]
cs.LG
Neural Operators Can Play Dynamic Stackelberg Games
[AUTHORS]
Guillermo Alvarez, Ibrahim Ekren, Anastasis Kratsios, Xuwei Yang
[ABSTRACT]
Dynamic Stackelberg games are a broad class of two-player games in which the
leader acts first, and the follower chooses a response strategy to the leader’s
strategy. Unfortunately, only stylized Stackelberg games are explicitly
solvable since the follower’s best-response operator (as a function of the
control of the leader) is typically analytically intractable. This paper
addresses this issue by showing that the \textit{follower’s best-response
operator} can be approximately implemented by an \textit{attention-based neural
operator}, uniformly on compact subsets of adapted open-loop controls for the
leader. We further show that the value of the Stackelberg game where the
follower uses the approximate best-response operator approximates the value of
the original Stackelberg game. Our main result is obtained using our universal
approximation theorem for attention-based neural operators between spaces of
square-integrable adapted stochastic processes, as well as stability results
for a general class of Stackelberg games.
[LINK]
http://arxiv.org/abs/2411.09644v1
[DATE]
2024-11-15 02:12:06+08:00
[CATEGORIES]
cs.LG
Counterfactual Uncertainty Quantification of Factual Estimand of Efficacy from Before-and-After Treatment Repeated Measures Randomized Controlled Trials
[AUTHORS]
Xingya Wang, Yang Han, Yushi Liu, Szu-Yu Tang, Jason C. Hsu
[ABSTRACT]
The ideal estimand for comparing a new treatment $Rx$ with a control $C$ is
the $\textit{counterfactual}$ efficacy $Rx:C$, the expected differential
outcome between $Rx$ and $C$ if each patient were given $\textit{both}$. While
counterfactual $\textit{point estimation}$ from $\textit{factual}$ Randomized
Controlled Trials (RCTs) has been available, this article shows
$\textit{counterfactual}$ uncertainty quantification (CUQ), quantifying
uncertainty for factual point estimates but in a counterfactual setting, is
surprisingly achievable. We achieve CUQ whose variability is typically smaller
than factual UQ, by creating a new statistical modeling principle called ETZ
which is applicable to RCTs with $\textit{Before-and-After}$ treatment Repeated
Measures, common in many therapeutic areas.
We urge caution when the estimate of the unobservable true condition of a
patient before treatment has measurement error, because that violation of a
standard regression assumption can cause attenuation in estimating treatment
effects.
Fortunately, we prove that, for traditional medicine in general, and for
targeted therapy with efficacy defined as averaged over the population,
counterfactual point estimation is unbiased. However, for targeted therapy,
both Real Human and Digital Twins approaches should respect this limitation,
lest the predicted treatment effect in $\textit{subgroups}$ be biased.
[COMMENTS]
39 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.09635v1
[DATE]
2024-11-15 02:01:02+08:00
[CATEGORIES]
cs.LG
SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation
[AUTHORS]
Mikhail Khodak, Lester Mackey, Alexandra Chouldechova, Miroslav Dudík
[ABSTRACT]
Disaggregated evaluation – estimation of performance of a machine learning
model on different subpopulations – is a core task when assessing performance
and group-fairness of AI systems. A key challenge is that evaluation data is
scarce, and subpopulations arising from intersections of attributes (e.g.,
race, sex, age) are often tiny. Today, it is common for multiple clients to
procure the same AI model from a model developer, and the task of disaggregated
evaluation is faced by each customer individually. This gives rise to what we
call the multi-task disaggregated evaluation problem, wherein multiple clients
seek to conduct a disaggregated evaluation of a given model in their own data
setting (task). In this work we develop a disaggregated evaluation method
called SureMap that has high estimation accuracy for both multi-task and
single-task disaggregated evaluations of blackbox models. SureMap’s efficiency
gains come from (1) transforming the problem into structured simultaneous
Gaussian mean estimation and (2) incorporating external data, e.g., from the AI
system creator or from their other clients. Our method combines maximum a
posteriori (MAP) estimation using a well-chosen prior together with
cross-validation-free tuning via Stein’s unbiased risk estimate (SURE). We
evaluate SureMap on disaggregated evaluation tasks in multiple domains,
observing significant accuracy improvements over several strong competitors.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.09730v1
[DATE]
2024-11-15 01:53:35+08:00
[CATEGORIES]
cs.LG
Local deployment of large-scale music AI models on commodity hardware
[AUTHORS]
Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue
[ABSTRACT]
We present the MIDInfinite, a web application capable of generating symbolic
music using a large-scale generative AI model locally on commodity hardware.
Creating this demo involved porting the Anticipatory Music Transformer, a large
language model (LLM) pre-trained on the Lakh MIDI dataset, to the Machine
Learning Compilation (MLC) framework. Once the model is ported, MLC facilitates
inference on a variety of runtimes including C++, mobile, and the browser. We
envision that MLC has the potential to bridge the gap between the landscape of
increasingly capable music AI models and technology more familiar to music
software developers. As a proof of concept, we build a web application that
allows users to generate endless streams of multi-instrumental MIDI in the
browser, either from scratch or conditioned on a prompt. On commodity hardware
(an M3 Macbook Pro), our demo can generate 51 notes per second, which is faster
than real-time playback for 72.9% of generations, and increases to 86.3% with 2
seconds of upfront buffering.
[COMMENTS]
2 pages
[LINK]
http://arxiv.org/abs/2411.09625v1
[DATE]
2024-11-15 01:49:27+08:00
[CATEGORIES]
cs.LG
MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI
[AUTHORS]
Nancy R. Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E. Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, Dominika Ciupek, Antonio Tristán Vega, Santiago Aja-Fernández, Maciej Malawski, Gani Ouedraogo, Julia Machnio, Christian Ewert, Paul M. Thompson, Neda Jahanshad, Eleftherios Garyfallidis, Bennett A. Landman
[ABSTRACT]
White matter alterations are increasingly implicated in neurological diseases
and their progression. International-scale studies use diffusion-weighted
magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white
matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI
data is hindered by inconsistencies stemming from varying acquisition
protocols. There is a pressing need to harmonize the preprocessing of DW-MRI
datasets to ensure the derivation of robust quantitative diffusion metrics
across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants
were provided raw data from the same individuals collected on the same scanner
but with two different acquisitions and tasked with preprocessing the DW-MRI to
minimize acquisition differences while retaining biological variation.
Submissions are evaluated on the reproducibility and comparability of
cross-acquisition bundle-wise microstructure measures, bundle shape features,
and connectomics. The key innovations of the QuantConn challenge are that (1)
we assess bundles and tractography in the context of harmonization for the
first time, (2) we assess connectomics in the context of harmonization for the
first time, and (3) we have 10x more subjects than the prior harmonization
challenge MUSHAC and 100x more than SuperMUDI. We find that bundle surface area,
fractional anisotropy, connectome assortativity, betweenness centrality, edge
count, modularity, nodal strength, and participation coefficient measures are
most biased by acquisition and that machine learning voxel-wise correction,
RISH mapping, and NeSH methods effectively reduce these biases. In addition,
microstructure measures AD, MD, RD, bundle length, connectome density,
efficiency, and path length are least biased by these acquisition differences.
[COMMENTS]
Accepted for publication at the Journal of Machine Learning for
Biomedical Imaging (MELBA) https://melba-journal.org/2024/019
[LINK]
http://arxiv.org/abs/2411.09618v1
[DATE]
2024-11-15 01:37:19+08:00
[CATEGORIES]
cs.LG
Latency Optimization in LEO Satellite Communications with Hybrid Beam Pattern and Interference Control
[AUTHORS]
Qianqian Zhang, Ye Hu, Minchae Jung
[ABSTRACT]
The rapid advancement of low Earth orbit (LEO) satellite communication
systems has significantly enhanced global connectivity, offering high-capacity,
low-latency services crucial for next-generation applications. However, the
dense configuration of LEO constellations poses challenges in resource
allocation optimization and interference management, complicating coexistence
with other communication systems. To address these limitations, this paper
proposes a novel framework for optimizing the beam scheduling and resource
allocation in multi-beam LEO systems. To satisfy the uneven terrestrial traffic
demand, a hybrid beam pattern is employed to enhance the downlink quality of
service and minimize the transmission latency from LEO satellites to ground
user terminals. Additionally, a dynamic co-channel interference (CCI) control
mechanism is developed to mitigate inter-beam interference within the LEO
constellation and limit cross-system interference affecting protected users
from other networks. The problem of user-beam-frequency allocation with power
optimization is formulated as a mixed-integer dynamic programming model and
solved using a low-complexity neural network-based graph generation algorithm.
Simulation results show that the proposed approach outperforms the baseline
methods of full frequency reuse and single-channel transmission, and highlights
the potential for further performance improvement with multi-user
transmissions.
[LINK]
http://arxiv.org/abs/2411.09600v1
[DATE]
2024-11-15 01:18:24+08:00
[CATEGORIES]
cs.LG
Stable Consistency Tuning: Understanding and Improving Consistency Models
[AUTHORS]
Fu-Yun Wang, Zhengyang Geng, Hongsheng Li
[ABSTRACT]
Diffusion models achieve superior generation quality but suffer from slow
generation speed due to the iterative nature of denoising. In contrast,
consistency models, a new generative family, achieve competitive performance
with significantly faster sampling. These models are trained either through
consistency distillation, which leverages pretrained diffusion models, or
consistency training/tuning directly from raw data. In this work, we propose a
novel framework for understanding consistency models by modeling the denoising
process of the diffusion model as a Markov Decision Process (MDP) and framing
consistency model training as the value estimation through Temporal
Difference (TD) Learning. More importantly, this framework allows us to analyze
the limitations of current consistency training/tuning strategies. Built upon
Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT),
which incorporates variance-reduced learning using the score identity. SCT
leads to significant performance improvements on benchmarks such as CIFAR-10
and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID
1.55, a new SoTA for consistency models.
[COMMENTS]
Code is available at
https://github.com/G-U-N/Stable-Consistency-Tuning
[LINK]
http://arxiv.org/abs/2410.18958v2
[DATE]
2024-11-15 01:06:55+08:00
[CATEGORIES]
cs.LG
Physics-informed neural networks (PINNs) for numerical model error approximation and superresolution
[AUTHORS]
Bozhou Zhuang, Sashank Rana, Brandon Jones, Danny Smyl
[ABSTRACT]
Numerical modeling errors are unavoidable in finite element analysis. The
presence of model errors inherently reflects both model accuracy and
uncertainty. To date there have been few methods for explicitly quantifying
errors at points of interest (e.g. at finite element nodes). The lack of
explicit model error approximators has been addressed recently with the
emergence of machine learning (ML), which closes the loop between numerical
model features/solutions and explicit model error approximations. In this
paper, we propose physics-informed neural networks (PINNs) for simultaneous
numerical model error approximation and superresolution. To test our approach,
numerical data was generated using finite element simulations on a
two-dimensional elastic plate with a central opening. Four- and eight-node
quadrilateral elements were used in the discretization to represent the
reduced-order and higher-order models, respectively. It was found that the
developed PINNs effectively predict model errors in both x and y displacement
fields with small differences between predictions and ground truth. Our
findings demonstrate that the integration of physics-informed loss functions
enables neural networks (NNs) to surpass a purely data-driven approach for
approximating model errors.
[LINK]
http://arxiv.org/abs/2411.09728v1
[DATE]
2024-11-15 01:03:09+08:00
[CATEGORIES]
cs.LG
From Imitation to Refinement – Residual RL for Precise Assembly
[AUTHORS]
Lars Ankile, Anthony Simeonov, Idan Shenfeld, Marcel Torne, Pulkit Agrawal
[ABSTRACT]
Advances in behavior cloning (BC), like action-chunking and diffusion, have
enabled impressive capabilities. Still, imitation alone remains insufficient
for learning reliable policies for tasks requiring precise aligning and
inserting of objects, like assembly. Our key insight is that chunked BC
policies effectively function as trajectory planners, enabling long-horizon
tasks. Conversely, as they execute action chunks open-loop, they lack the
fine-grained reactivity necessary for reliable execution. Further, we find that
the performance of BC policies saturates despite increasing data. Reinforcement
learning (RL) is a natural way to overcome BC’s limitations, but it is not
straightforward to apply directly to action-chunked models like diffusion
policies. We present a simple yet effective method, ResiP (Residual for Precise
Manipulation), that sidesteps these challenges by augmenting a frozen, chunked
BC model with a fully closed-loop residual policy trained with RL. The residual
policy is trained via on-policy RL, addressing distribution shifts and
introducing reactive control without altering the BC trajectory planner.
Evaluation on high-precision manipulation tasks demonstrates strong performance
of ResiP over BC methods and direct RL fine-tuning. Videos, code, and data are
available at https://residual-assembly.github.io.
[COMMENTS]
Project website: https://residual-assembly.github.io
[LINK]
http://arxiv.org/abs/2407.16677v3
[DATE]
2024-11-15 00:54:02+08:00
[CATEGORIES]
cs.LG
Causal Discovery and Classification Using Lempel-Ziv Complexity
[AUTHORS]
Dhruthi, Nithin Nagaraj, Harikrishnan N B
[ABSTRACT]
Inferring causal relationships in the decision-making processes of machine
learning algorithms is a crucial step toward achieving explainable Artificial
Intelligence (AI). In this research, we introduce a novel causality measure and
a distance metric derived from Lempel-Ziv (LZ) complexity. We explore how the
proposed causality measure can be used in decision trees by enabling splits
based on features that most strongly \textit{cause} the outcome. We further
evaluate the effectiveness of the causality-based decision tree and the
distance-based decision tree in comparison to a traditional decision tree using
Gini impurity. While the proposed methods demonstrate comparable classification
performance overall, the causality-based decision tree significantly
outperforms both the distance-based decision tree and the Gini-based decision
tree on datasets generated from causal models. This result indicates that the
proposed approach can capture insights beyond those of classical decision
trees, especially in causally structured data. Based on the features used in
the LZ causal measure based decision tree, we introduce a causal strength for
each feature in the dataset so as to infer the predominant causal variables
for the occurrence of the outcome.
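As a point of reference for the baseline named above, the following is a minimal sketch of the Gini impurity criterion used by traditional decision trees. It illustrates only the classical baseline, not the LZ-based causality or distance measures proposed in the paper; the function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention assumed here: an empty node is pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_split_gini(left, right):
    """Size-weighted impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
```

A Gini-based tree picks, at each node, the feature and threshold that minimize the weighted impurity; the causality-based tree described in the abstract would swap this score for its own measure.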
[COMMENTS]
17 pages, 8 figures, 5 tables
[LINK]
http://arxiv.org/abs/2411.01881v2
[DATE]
2024-11-15 00:17:40+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Guangyi Wang, Wei Peng, Lijiang Li, Wenyu Chen, Yuren Cai, Songzhi Su
[ABSTRACT]
Diffusion Probabilistic Models (DPMs) have demonstrated exceptional
performance in generative tasks, but this comes at the expense of sampling
efficiency. To enhance sampling speed without sacrificing quality, various
distillation-based accelerated sampling algorithms have been recently proposed.
However, they typically require significant additional training costs and model
parameter storage, which limit their practical application. In this work, we
propose PCA-based Adaptive Search (PAS), which optimizes existing solvers for
DPMs with minimal learnable parameters and training costs. Specifically, we
first employ PCA to obtain a few orthogonal unit basis vectors to span the
high-dimensional sampling space, which enables us to learn just a set of
coordinates to correct the sampling direction; furthermore, based on the
observation that the cumulative truncation error exhibits an “S”-shape, we
design an adaptive search strategy that further enhances the sampling
efficiency and reduces the number of stored parameters to approximately 10.
Extensive experiments demonstrate that PAS can significantly enhance existing
fast solvers in a plug-and-play manner with negligible costs. For instance, on
CIFAR10, PAS requires only 12 parameters and less than 1 minute of training on
a single NVIDIA A100 GPU to optimize the DDIM from 15.69 FID (NFE=10) to 4.37.
[LINK]
http://arxiv.org/abs/2411.06503v2
[DATE]
2024-11-15 00:15:20+08:00
[CATEGORIES]
cs.LG
Knowledge Bases in Support of Large Language Models for Processing Web News
[AUTHORS]
Yihe Zhang, Nabin Pakka, Nian-Feng Tzeng
[ABSTRACT]
Large Language Models (LLMs) have received considerable interest in wide
applications lately. During pre-training via massive datasets, such a model
implicitly memorizes the factual knowledge of trained datasets in its hidden
parameters. However, knowledge held implicitly in parameters often makes its
use by downstream applications ineffective due to the lack of common-sense
reasoning. In this article, we introduce a general framework for building
knowledge bases with the aid of LLMs, tailored for processing Web news.
The framework applies a rule-based News Information Extractor (NewsIE) to news
items for extracting their relational tuples, referred to as knowledge bases,
which are then graph-convoluted with the implicit knowledge facts of news items
obtained by LLMs, for their classification. It involves two lightweight
components: 1) NewsIE: for extracting the structural information of every news
item, in the form of relational tuples; 2) BERTGraph: for graph convoluting the
implicit knowledge facts with relational tuples extracted by NewsIE. We have
evaluated our framework under different news-related datasets for news category
classification, with promising experimental results.
[COMMENTS]
10 pages, 5 figures
[LINK]
http://arxiv.org/abs/2411.08278v2
[DATE]
2024-11-14 23:49:46+08:00
[CATEGORIES]
cs.CL
The Use of Readability Metrics in Legal Text: A Systematic Literature Review
[AUTHORS]
Yu Han, Aaron Ceross, Jeroen H. M. Bergmann
[ABSTRACT]
Understanding the text in legal documents can be challenging due to their
complex structure and the inclusion of domain-specific jargon. Laws and
regulations are often crafted in such a manner that engagement with them
requires formal training, potentially leading to vastly different
interpretations of the same texts. Linguistic complexity is an important
contributor to the difficulties experienced by readers. Simplifying texts could
enhance comprehension across a broader audience, not just among trained
professionals. Various metrics have been developed to measure document
readability. Therefore, we adopted a systematic review approach to examine the
linguistic and readability metrics currently employed for legal and regulatory
texts. A total of 3566 initial papers were screened, with 34 relevant studies
found and further assessed. Our primary objective was to identify which current
metrics were applied for evaluating readability within the legal field. Sixteen
different metrics were identified, with the Flesch-Kincaid Grade Level being
the most frequently used method. The majority of studies (73.5%) were found in
the domain of “informed consent forms”. From the analysis, it is clear that not
all legal domains are well represented in terms of readability metrics and that
there is a further need to develop more consensus on which metrics should be
applied for legal documents.
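To make the most frequently used metric concrete, here is a minimal sketch of the Flesch-Kincaid Grade Level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The tokenization and the vowel-group syllable counter are crude assumptions made for illustration; the reviewed studies may rely on dictionary-based or tool-specific syllable counts that give different values.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of consecutive-vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level of an English text (heuristic tokenization)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Longer sentences and more syllables per word both raise the score, which is why dense legal prose tends to score at a high grade level.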
[LINK]
http://arxiv.org/abs/2411.09497v1
[DATE]
2024-11-14 23:04:17+08:00
[CATEGORIES]
cs.CL
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs
[AUTHORS]
Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao
[ABSTRACT]
Large language models (LLMs) excel in high-resource languages but face
notable challenges in low-resource languages like Mongolian. This paper
addresses these challenges by categorizing capabilities into language abilities
(syntax and semantics) and cognitive abilities (knowledge and reasoning). To
systematically evaluate these areas, we developed MM-Eval, a specialized
dataset based on Modern Mongolian Language Textbook I and enriched with WebQSP
and MGSM datasets.
Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat,
Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models
performed better on syntactic tasks than semantic tasks, highlighting a gap in
deeper language understanding; and 2) knowledge tasks showed a moderate
decline, suggesting that models can transfer general knowledge from
high-resource to low-resource contexts.
The release of MM-Eval, comprising 569 syntax, 677 semantics, 344 knowledge,
and 250 reasoning tasks, offers valuable insights for advancing NLP and LLMs in
low-resource languages like Mongolian. The dataset is available at
https://github.com/joenahm/MM-Eval.
[LINK]
http://arxiv.org/abs/2411.09492v1
[DATE]
2024-11-14 22:58:38+08:00
[CATEGORIES]
cs.CL
Improving Arabic Multi-Label Emotion Classification using Stacked Embeddings and Hybrid Loss Function
[AUTHORS]
Muhammad Azeem Aslam, Wang Jun, Nisar Ahmed, Muhammad Imran Zaman, Li Yanan, Hu Hongfei, Wang Shiyu, Xin Liu
[ABSTRACT]
In multi-label emotion classification, particularly for low-resource
languages like Arabic, the challenges of class imbalance and label correlation
hinder model performance, especially in accurately predicting minority
emotions. To address these issues, this study proposes a novel approach that
combines stacked embeddings, meta-learning, and a hybrid loss function to
enhance multi-label emotion classification for the Arabic language. The study
extracts contextual embeddings from three fine-tuned language models
(ArabicBERT, MarBERT, and AraBERT), which are then stacked to form enriched
embeddings. A meta-learner is trained on these stacked embeddings, and the
resulting concatenated representations are provided as input to a Bi-LSTM
model, followed by a fully connected neural network for multi-label
classification. To further improve performance, a hybrid loss function is
introduced, incorporating class weighting, label correlation matrix, and
contrastive learning, effectively addressing class imbalances and improving the
handling of label correlations. Extensive experiments validate the proposed
model’s performance across key metrics such as Precision, Recall, F1-Score,
Jaccard Accuracy, and Hamming Loss. The class-wise performance analysis
demonstrates the hybrid loss function’s ability to significantly reduce
disparities between majority and minority classes, resulting in a more balanced
emotion classification. An ablation study highlights the contribution of each
component, showing the superiority of the model compared to baseline approaches
and other loss functions. This study not only advances multi-label emotion
classification for Arabic but also presents a generalizable framework that can
be adapted to other languages and domains, providing a significant step forward
in addressing the challenges of low-resource emotion classification tasks.
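Two of the reported metrics can be sketched as follows, assuming each example is a 0/1 label vector of equal length. The function names are hypothetical, "Jaccard Accuracy" is read here as the sample-averaged Jaccard index, and scoring an example with no true and no predicted labels as 1.0 is a convention chosen for this sketch; the paper's exact definitions may differ.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label positions that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def jaccard_accuracy(y_true, y_pred):
    """Mean over examples of |true AND pred| / |true OR pred|."""
    scores = []
    for tr, pr in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(tr, pr) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(tr, pr) if t == 1 or p == 1)
        # Assumed convention: an example with no labels on either side counts as perfect.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)
```

Hamming loss rewards getting each label right independently, while the Jaccard score penalizes spurious and missed labels relative to the size of the label set, so the two can disagree on imbalanced data.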
[COMMENTS]
The paper is submitted to Scientific Reports and is currently under
review
[LINK]
http://arxiv.org/abs/2410.03979v3
[DATE]
2024-11-14 22:34:13+08:00
[CATEGORIES]
cs.CL
Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric
[AUTHORS]
Hyukhun Koh, Dohyung Kim, Minwoo Lee, Kyomin Jung
[COMMENTS]
8 pages
[LINK]
http://arxiv.org/abs/2402.06900v5
[DATE]
2024-11-14 22:28:58+08:00
[CATEGORIES]
cs.CL
SLIMER-IT: Zero-Shot NER on Italian Language
[AUTHORS]
Andrew Zamai, Leonardo Rigutini, Marco Maggini, Andrea Zugarini
[ABSTRACT]
Traditional approaches to Named Entity Recognition (NER) frame the task as
a BIO sequence labeling problem. Although these systems often excel in the
downstream task at hand, they require extensive annotated data and struggle to
generalize to out-of-distribution input domains and unseen entity types. On the
contrary, Large Language Models (LLMs) have demonstrated strong zero-shot
capabilities. While several works address Zero-Shot NER in English, little has
been done in other languages. In this paper, we define an evaluation framework
for Zero-Shot NER, applying it to the Italian language. Furthermore, we
introduce SLIMER-IT, the Italian version of SLIMER, an instruction-tuning
approach for zero-shot NER leveraging prompts enriched with definition and
guidelines. Comparisons with other state-of-the-art models demonstrate the
superiority of SLIMER-IT on never-seen-before entity tags.
[LINK]
http://arxiv.org/abs/2409.15933v2
[DATE]
2024-11-14 21:59:15+08:00
[CATEGORIES]
cs.CL
Robot Tasks with Fuzzy Time Requirements from Natural Language Instructions
[AUTHORS]
Sascha Sucker, Michael Neubauer, Dominik Henrich
[ABSTRACT]
Natural language allows robot programming to be accessible to everyone.
However, the inherent fuzziness in natural language poses challenges for
inflexible, traditional robot systems. We focus on instructions with fuzzy time
requirements (e.g., “start in a few minutes”). Building on previous robotics
research, we introduce fuzzy skills. These define an execution by the robot
with so-called satisfaction functions representing vague execution time
requirements. Such functions express a user’s satisfaction over potential
starting times for skill execution. When the robot handles multiple fuzzy
skills, the satisfaction function provides a temporal tolerance window for
execution, thus enabling optimal scheduling based on satisfaction. We
generalized such functions based on individual user expectations with a user
study. The participants rated their satisfaction with an instruction’s
execution at various times. Our investigations reveal that trapezoidal
functions best approximate the users’ satisfaction. Additionally, the results
suggest that users are more lenient if the execution is specified further into
the future.
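As a rough illustration of the trapezoidal satisfaction functions mentioned above, the following sketch scores candidate start times. The function name `trapezoid`, the breakpoints `a`, `b`, `c`, `d`, and the example values are hypothetical and are not taken from the paper.

```python
def trapezoid(t, a, b, c, d):
    """Satisfaction in [0, 1] for start time t; assumes a <= b <= c <= d."""
    if t <= a or t >= d:
        return 0.0               # outside the tolerated window
    if t < b:
        return (t - a) / (b - a)  # rising edge
    if t <= c:
        return 1.0               # fully satisfying plateau
    return (d - t) / (d - c)      # falling edge


# Hypothetical window for "start in a few minutes" (times in minutes).
candidates = [0, 1.5, 3, 7]
best = max(candidates, key=lambda t: trapezoid(t, 1, 2, 5, 8))
print(best)  # 3
```

A scheduler handling several fuzzy skills could, under the same assumptions, pick start times that maximize the summed satisfaction.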
[COMMENTS]
9 pages, 8 figures, to be published in 2024 IEEE International
Conference on Robotic Computing (IRC)
[LINK]
http://arxiv.org/abs/2411.09436v1
[DATE]
2024-11-14 21:34:16+08:00
[CATEGORIES]
cs.CL
Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures
[AUTHORS]
Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris
[ABSTRACT]
The spread of fake news on social media poses significant threats to
individuals and society. Text-based and graph-based models have been employed
for fake news detection by analysing news content and propagation networks,
showing promising results in specific scenarios. However, these data-driven
models heavily rely on pre-existing in-distribution data for training, limiting
their performance when confronted with fake news from emerging or previously
unseen domains, known as out-of-distribution (OOD) data. Tackling OOD fake news
is a challenging yet critical task. In this paper, we introduce the Causal
Subgraph-oriented Domain Adaptive Fake News Detection (CSDA) model, designed to
enhance zero-shot fake news detection by extracting causal substructures from
propagation graphs using in-distribution data and generalising this approach to
OOD data. The model employs a graph neural network based mask generation
process to identify dominant nodes and edges within the propagation graph,
using these substructures for fake news detection. Additionally, the
performance of CSDA is further improved through contrastive learning in
few-shot scenarios, where a limited amount of OOD data is available for
training. Extensive experiments on public social media datasets demonstrate
that CSDA effectively handles OOD fake news detection, achieving a 7 to 16
percent accuracy improvement over other state-of-the-art models.
[COMMENTS]
9 pages, 2 figures, 5 tables
[LINK]
http://arxiv.org/abs/2411.09389v1
[DATE]
2024-11-14 20:05:35+08:00
[CATEGORIES]
cs.CL
cs.LG
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
[AUTHORS]
Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong
[ABSTRACT]
It is widely acknowledged that large language models (LLMs) encode a vast
reservoir of knowledge after being trained on mass data. Recent studies
disclose knowledge conflicts in LLM generation, wherein outdated or incorrect
parametric knowledge (i.e., encoded knowledge) contradicts new knowledge
provided in the context. To mitigate such knowledge conflicts, we propose a
novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons) to
capitalize on neurons that are crucial in processing contextual cues.
Specifically, IRCAN first identifies neurons that significantly contribute to
context processing, utilizing a context-aware attribution score derived from
integrated gradients. Subsequently, the identified context-aware neurons are
strengthened via reweighting. In doing so, we steer LLMs to generate
context-sensitive outputs with respect to the new knowledge provided in the
context. Extensive experiments conducted across a variety of models and tasks
demonstrate that IRCAN not only achieves remarkable improvements in handling
knowledge conflicts but also offers a scalable, plug-and-play solution that can
be integrated seamlessly with existing models. Our code is released at
https://github.com/danshi777/IRCAN.
[COMMENTS]
NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.18406v2
[DATE]
2024-11-14 18:55:14+08:00
[CATEGORIES]
cs.CL
Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
[AUTHORS]
Zixing Zhang, Zhongren Dong, Weixiang Xu, Jing Han
[ABSTRACT]
With the increasing implementation of machine learning models on edge or
Internet-of-Things (IoT) devices, deploying advanced models on
resource-constrained IoT devices remains challenging. Transformer models, a
currently dominant neural architecture, have achieved great success in broad
domains but their complexity hinders their deployment on IoT devices with limited
computation capability and storage size. Although many model compression
approaches have been explored, they often suffer from notorious performance
degradation. To address this issue, we introduce a new method, namely
Transformer Re-parameterization, to boost the performance of lightweight
Transformer models. It consists of two processes: the High-Rank Factorization
(HRF) process in the training stage and the deHigh-Rank Factorization (deHRF)
process in the inference stage. In the former process, we insert an additional
linear layer before the Feed-Forward Network (FFN) of the lightweight
Transformer. It is supposed that the inserted HRF layers can enhance the model
learning capability. In the latter process, the auxiliary HRF layer will be
merged together with the following FFN layer into one linear layer and thus
recover the original structure of the lightweight model. To examine the
effectiveness of the proposed method, we evaluate it on three widely used
Transformer variants, i.e., ConvTransformer, Conformer, and SpeechFormer
networks, in the application of speech emotion recognition on the IEMOCAP, M3ED
and DAIC-WOZ datasets. Experimental results show that our proposed method
consistently improves the performance of lightweight Transformers, even making
them comparable to large models. The proposed re-parameterization approach
enables advanced Transformer models to be deployed on resource-constrained IoT
devices.
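The merging step described above rests on a standard identity: two stacked linear layers with no nonlinearity between them collapse into one, since W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below checks this on toy matrices. It assumes the inserted layer is purely linear; the helper names and values are illustrative and are not the authors' implementation.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def linear(W, b, x):
    """Affine layer y = W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]


def merge_linear(W1, b1, W2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into a single layer y = W x + b."""
    W = matmul(W2, W1)
    b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]
    return W, b


# Toy example: expand 2 -> 3 dimensions during training, then project 3 -> 2.
W1, b1 = [[1, 0], [0, 1], [1, 1]], [1, 0, -1]
W2, b2 = [[1, 2, 0], [0, 1, 1]], [0, 1]
x = [1, 2]

W, b = merge_linear(W1, b1, W2, b2)
print(linear(W2, b2, linear(W1, b1, x)))  # [6, 5]
print(linear(W, b, x))                    # [6, 5]
```

The merged layer has the shape of the second layer's output by the first layer's input, so the extra training-time parameters add no inference cost.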
[LINK]
http://arxiv.org/abs/2411.09339v1
[DATE]
2024-11-14 18:36:19+08:00
[CATEGORIES]
cs.CL
Continuous Rating as Reliable Human Evaluation of Simultaneous Speech Translation
[AUTHORS]
Dávid Javorský, Dominik Macháček, Ondřej Bojar
[ABSTRACT]
Simultaneous speech translation (SST) can be evaluated on simulated online
events where human evaluators watch subtitled videos and continuously express
their satisfaction by pressing buttons (so-called Continuous Rating).
Continuous Rating is easy to collect, but little is known about its
reliability or its relation to the comprehension of foreign-language documents by SST
users. In this paper, we contrast Continuous Rating with factual questionnaires
on judges with different levels of source language knowledge. Our results show
that Continuous Rating is an easy and reliable SST quality assessment if the
judges have at least limited knowledge of the source language. Our study
indicates users’ preferences on subtitle layout and presentation style and,
most importantly, provides significant evidence that users with advanced
source language knowledge prefer low latency over fewer re-translations.
[COMMENTS]
Published at WMT 2022: https://aclanthology.org/2022.wmt-1.9/
[LINK]
http://arxiv.org/abs/2203.02458v2
[DATE]
2024-11-14 18:15:37+08:00
[CATEGORIES]
cs.CL
DTELS: Towards Dynamic Granularity of Timeline Summarization
[AUTHORS]
Chenlong Zhang, Tong Zhou, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao
[ABSTRACT]
The rapid proliferation of online news has posed significant challenges in
tracking the continuous development of news topics. Traditional timeline
summarization constructs a chronological summary of the events but often lacks
the flexibility to meet the diverse granularity needs. To overcome this
limitation, we introduce a new paradigm, Dynamic-granularity TimELine
Summarization (DTELS), which aims to construct adaptive timelines based on
user instructions or requirements. This paper establishes a comprehensive
benchmark for DTELS that includes: (1) an evaluation framework grounded in
journalistic standards to assess the timeline quality across four dimensions:
Informativeness, Granular Consistency, Factuality, and Coherence; (2) a
large-scale, multi-source dataset with multiple granularity timeline
annotations based on a consensus process to facilitate authority; (3) extensive
experiments and analysis with two proposed solutions based on Large Language
Models (LLMs) and existing state-of-the-art TLS methods. The experimental
results demonstrate the effectiveness of LLM-based solutions. However, even the
most advanced LLMs struggle to consistently generate timelines that are both
informative and granularly consistent, highlighting the challenges of the DTELS
task.
[COMMENTS]
Under review
[LINK]
http://arxiv.org/abs/2411.09297v1
[DATE]
2024-11-14 17:16:48+08:00
[CATEGORIES]
cs.CL
StreamAdapter: Efficient Test Time Adaptation from Contextual Streams
[AUTHORS]
Dilxat Muhtar, Yelong Shen, Yaming Yang, Xiaodong Liu, Yadong Lu, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang
[ABSTRACT]
In-context learning (ICL) allows large language models (LLMs) to adapt to new
tasks directly from the given demonstrations without requiring gradient
updates. While recent advances have expanded context windows to accommodate
more demonstrations, this approach increases inference costs without
necessarily improving performance. To mitigate these issues, we propose
StreamAdapter, a novel approach that directly updates model parameters from
context at test time, eliminating the need for explicit in-context
demonstrations. StreamAdapter employs context mapping and weight absorption
mechanisms to dynamically transform ICL demonstrations into parameter updates
with minimal additional parameters. By reducing reliance on numerous in-context
examples, StreamAdapter significantly reduces inference costs and allows for
efficient inference with constant time complexity, regardless of demonstration
count. Extensive experiments across diverse tasks and model architectures
demonstrate that StreamAdapter achieves comparable or superior adaptation
capability to ICL while requiring significantly fewer demonstrations. The
superior task adaptation and context encoding capabilities of StreamAdapter on
both language understanding and generation tasks provide a new perspective for
adapting LLMs at test time using context, allowing for more efficient
adaptation across scenarios and more cost-effective inference.
[COMMENTS]
22 Pages, 9 Figures
[LINK]
http://arxiv.org/abs/2411.09289v1
[DATE]
2024-11-14 17:03:54+08:00
[CATEGORIES]
cs.CL
More Expressive Attention with Negative Weights
[AUTHORS]
Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
[ABSTRACT]
We propose a novel attention mechanism, named Cog Attention, that enables
attention weights to be negative for enhanced expressiveness, which stems from
two key factors: (1) Cog Attention can shift the token deletion and copying
function from a static OV matrix to dynamic QK inner products, with the OV
matrix now focusing more on refinement or modification. The attention head can
simultaneously delete, copy, or retain tokens by assigning them negative,
positive, or minimal attention weights, respectively. As a result, a single
attention head becomes more flexible and expressive. (2) Cog Attention improves
the model’s robustness against representational collapse, which can occur when
earlier tokens are over-squashed into later positions, leading to homogeneous
representations. Negative weights reduce effective information paths from
earlier to later tokens, helping to mitigate this issue. We develop
Transformer-like models which use Cog Attention as attention modules, including
decoder-only models for language modeling and U-ViT diffusion models for image
generation. Experiments show that models using Cog Attention exhibit superior
performance compared to those employing traditional softmax attention modules.
Our approach suggests a promising research direction for rethinking and
breaking the entrenched constraints of traditional softmax attention, such as
the requirement for non-negative weights.
[LINK]
http://arxiv.org/abs/2411.07176v2
[DATE]
2024-11-14 16:20:22+08:00
[CATEGORIES]
cs.CL
cs.LG
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
[AUTHORS]
Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, Ran He
[ABSTRACT]
The rapid evolution of multimodal foundation models has led to significant
advancements in cross-modal understanding and generation across diverse
modalities, including text, images, audio, and video. However, these models
remain susceptible to jailbreak attacks, which can bypass built-in safety
mechanisms and induce the production of potentially harmful content.
Consequently, understanding the methods of jailbreak attacks and existing
defense mechanisms is essential to ensure the safe deployment of multimodal
generative models in real-world scenarios, particularly in security-sensitive
applications. To provide comprehensive insight into this topic, this survey
reviews jailbreak and defense in multimodal generative models. First, given the
generalized lifecycle of multimodal jailbreak, we systematically explore
attacks and corresponding defense strategies across four levels: input,
encoder, generator, and output. Based on this analysis, we present a detailed
taxonomy of attack methods, defense mechanisms, and evaluation frameworks
specific to multimodal generative models. Additionally, we cover a wide range
of input-output configurations, including modalities such as Any-to-Text,
Any-to-Vision, and Any-to-Any within generative systems. Finally, we highlight
current research challenges and propose potential directions for future
research. The open-source repository corresponding to this work can be found at
https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.
[COMMENTS]
ongoing work
[LINK]
http://arxiv.org/abs/2411.09259v1
[DATE]
2024-11-14 15:51:51+08:00
[CATEGORIES]
cs.CL
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine
[AUTHORS]
Jean Seo, Jongwon Lim, Dongjun Jang, Hyopil Shin
[ABSTRACT]
We introduce DAHL, a benchmark dataset and automated evaluation system
designed to assess hallucination in long-form text generation, specifically
within the biomedical domain. Our benchmark dataset, meticulously curated from
biomedical research papers, consists of 8,573 questions across 29 categories.
DAHL evaluates fact-conflicting hallucinations in Large Language Models (LLMs)
by deconstructing responses into atomic units, each representing a single piece
of information. The accuracy of these responses is averaged to produce the DAHL
Score, offering a more in-depth evaluation of hallucinations compared to
previous methods that rely on multiple-choice tasks. We conduct experiments
with 8 different models, finding that larger models tend to hallucinate less;
however, beyond a model size of 7 to 8 billion parameters, further scaling does
not significantly improve factual accuracy. The DAHL Score holds potential as
an efficient alternative to human-annotated preference labels and can be
expanded to other specialized domains. We release the dataset and code
publicly.
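The averaging described above can be sketched as follows. This assumes each response has already been split into atomic units and each unit labeled factual or not; the function names and the rule of skipping responses with no units are assumptions made for illustration, not the paper's exact procedure.

```python
def response_accuracy(unit_labels):
    """Fraction of atomic units in one response judged factual."""
    return sum(unit_labels) / len(unit_labels)


def dahl_style_score(responses):
    """Mean per-response accuracy; responses with no units are skipped (assumption)."""
    scored = [response_accuracy(units) for units in responses if units]
    if not scored:
        raise ValueError("no scorable responses")
    return sum(scored) / len(scored)


# Hypothetical labels: True means the atomic unit was judged factual.
labels = [
    [True, True],
    [True, False],
    [False, False, True, True],
]
print(dahl_style_score(labels))
```

Averaging per response first, rather than pooling all units, keeps long responses from dominating the score.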
[COMMENTS]
EMNLP2024/FEVER
[LINK]
http://arxiv.org/abs/2411.09255v1
[DATE]
2024-11-14 15:41:34+08:00
[CATEGORIES]
cs.CL
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
[AUTHORS]
Minghan Wang, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari
[COMMENTS]
Accepted to EMNLP 2024 Findings
[LINK]
http://arxiv.org/abs/2406.10880v2
[DATE]
2024-11-14 15:01:07+08:00
[CATEGORIES]
cs.CL
Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
[AUTHORS]
Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani
[ABSTRACT]
Small Language Models (SLMs) are generally considered more compact versions
of large language models (LLMs). This study investigates the ability of SLMs
with parameters between 1 and 3 billion to learn, retain, and subsequently
eliminate different types of noise present in the data. Four pre-trained SLMs
were utilized for this: Olmo 1B, Qwen1.5 1.8B, Gemma 2B, and Phi2 2.7B. The
models were instruction-tuned on noise-free data and tested using in-context
examples to determine if they could learn noise through examples. Subsequently,
noise patterns were introduced in instruction tuning to evaluate the noise
learning, unlearning, and retention capabilities of the models. Olmo, the
smallest model, was highly sensitive to noise, quickly adapting to noisy
patterns. Phi2 resisted learning character-level and transliteration noise,
likely due to its carefully curated, structured, and high-quality pretraining
data. Gemma excelled with transliteration noise, likely benefiting from its
multilingual pretraining. The findings can be used to develop robust training
strategies for SLMs.
[LINK]
http://arxiv.org/abs/2407.00996v2
[DATE]
2024-11-14 14:55:27+08:00
[CATEGORIES]
cs.CL
cs.LG
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
[AUTHORS]
Nghia Trung Ngo, Chien Van Nguyen, Franck Dernoncourt, Thien Huu Nguyen
[ABSTRACT]
Retrieval-augmented generation (RAG) has emerged as a promising approach to
enhance the performance of large language models (LLMs) in knowledge-intensive
tasks such as those from the medical domain. However, the sensitive nature of the
medical domain necessitates a completely accurate and trustworthy system. While
existing RAG benchmarks primarily focus on the standard retrieve-answer
setting, they overlook many practical scenarios that measure crucial aspects of
a reliable medical system. This paper addresses this gap by providing a
comprehensive evaluation framework for medical question-answering (QA) systems
in a RAG setting for these situations, including sufficiency, integration, and
robustness. We introduce Medical Retrieval-Augmented Generation Benchmark
(MedRGB) that provides various supplementary elements to four medical QA
datasets for testing LLMs’ ability to handle these specific scenarios.
Utilizing MedRGB, we conduct extensive evaluations of both state-of-the-art
commercial LLMs and open-source models across multiple retrieval conditions.
Our experimental results reveal current models’ limited ability to handle
noise and misinformation in the retrieved documents. We further analyze the
LLMs’ reasoning processes to provide valuable insights and future directions
for developing RAG systems in this critical medical domain.
[LINK]
http://arxiv.org/abs/2411.09213v1
[DATE]
2024-11-14 14:19:18+08:00
[CATEGORIES]
cs.CL
On Context Utilization in Summarization with Large Language Models
[AUTHORS]
Mathieu Ravaut, Aixin Sun, Nancy F. Chen, Shafiq Joty
[COMMENTS]
ACL 2024. 9 pages, 7 figures, 3 tables
[LINK]
http://arxiv.org/abs/2310.10570v6
[DATE]
2024-11-14 14:09:47+08:00
[CATEGORIES]
cs.CL
Large Language Models for Power Scheduling: A User-Centric Approach
[AUTHORS]
Thomas Mongaillard, Samson Lasaulce, Othman Hicheur, Chao Zhang, Lina Bariah, Vineeth S. Varma, Hang Zou, Qiyang Zhao, Merouane Debbah
[ABSTRACT]
While traditional optimization and scheduling schemes are designed to meet
fixed, predefined system requirements, future systems are moving toward
user-driven approaches and personalized services, aiming to achieve high
quality-of-experience (QoE) and flexibility. This challenge is particularly
pronounced in wireless and digitalized energy networks, where users’
requirements have largely not been taken into consideration due to the lack of
a common language between users and machines. The emergence of powerful large
language models (LLMs) marks a radical departure from traditional
system-centric methods into more advanced user-centric approaches by providing
a natural communication interface between users and devices. In this paper, for
the first time, we introduce a novel architecture for resource scheduling
problems by constructing three LLM agents to convert an arbitrary user’s voice
request (VRQ) into a resource allocation vector. Specifically, we design an LLM
intent recognition agent to translate the request into an optimization problem
(OP), an LLM OP parameter identification agent, and an LLM OP solving agent. To
evaluate system performance, we construct a database of typical VRQs in the
context of electric vehicle (EV) charging. As a proof of concept, we primarily
use Llama 3 8B. Through testing with different prompt engineering scenarios,
the obtained results demonstrate the efficiency of the proposed architecture.
The conducted performance analysis allows key insights to be extracted. For
instance, having a larger set of candidate OPs to model the real-world problem
might degrade the final performance because of a higher recognition/OP
classification noise level. All results and code are open source.
[LINK]
http://arxiv.org/abs/2407.00476v3
[DATE]
2024-11-14 14:06:09+08:00
[CATEGORIES]
cs.CL
Unsupervised Summarization Re-ranking
[AUTHORS]
Mathieu Ravaut, Shafiq Joty, Nancy Chen
[COMMENTS]
9 pages, 1 figure, 10 tables, 23 appendix pages, ACL Findings 2023
[LINK]
http://arxiv.org/abs/2212.09593v4
[DATE]
2024-11-14 14:00:39+08:00
[CATEGORIES]
cs.CL
Characterization of Political Polarized Users Attacked by Language Toxicity on Twitter
[AUTHORS]
Wentao Xu
[ABSTRACT]
Understanding the dynamics of language toxicity on social media is important
for us to investigate the propagation of misinformation and the development of
echo chambers for political scenarios such as U.S. presidential elections.
Recent research has used large-scale data to investigate the dynamics across
social media platforms. However, research on toxicity dynamics remains
limited. This study aims to provide a first exploration of the potential
language toxicity flow among Left, Right and Center users. Specifically, we aim
to examine whether Left users were more likely to be attacked by language toxicity.
In this study, more than 500M Twitter posts were examined. It was discovered
that Left users received much more toxic replies than Right and Center users.
[LINK]
http://arxiv.org/abs/2407.12471v2
[DATE]
2024-11-14 13:49:31+08:00
[CATEGORIES]
cs.CL
Unstructured Text Enhanced Open-domain Dialogue System: A Systematic Survey
[AUTHORS]
Longxuan Ma, Mingda Li, Weinan Zhang, Jiapeng Li, Ting Liu
[ABSTRACT]
Incorporating external knowledge into dialogue generation has been proven to
benefit the performance of an open-domain Dialogue System (DS), such as
generating informative or stylized responses and controlling conversation topics.
In this article, we study the open-domain DS that uses unstructured text as
external knowledge sources (Unstructured Text Enhanced Dialogue System,
UTEDS). The
existence of unstructured text entails distinctions between UTEDS and
traditional data-driven DS and we aim to analyze these differences. We first
give the definition of the UTEDS-related concepts, then summarize the recently
released datasets and models. We categorize UTEDS into Retrieval and Generative
models and introduce them from the perspective of model components. The
retrieval models consist of Fusion, Matching, and Ranking modules, while the
generative models comprise Dialogue and Knowledge Encoding, Knowledge
Selection, and Response Generation modules. We further summarize the evaluation
methods utilized in UTEDS and analyze the current models’ performance. Finally,
we discuss the future development trends of UTEDS, hoping to inspire new
research in this field.
[COMMENTS]
45 pages, 3 Figures, 11 Tables
[LINK]
http://arxiv.org/abs/2411.09166v1
[DATE]
2024-11-14 11:54:42+08:00
[CATEGORIES]
cs.CL
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
[AUTHORS]
Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao
[ABSTRACT]
Large language models (LLMs) have acquired the ability to solve general tasks
by utilizing instruction finetuning (IFT). However, IFT still relies heavily on
instance training of extensive task data, which greatly limits the adaptability
of LLMs to real-world scenarios where labeled task instances are scarce and
broader task generalization becomes paramount. Contrary to LLMs, humans acquire
skills and complete tasks not merely through repeated practice but also by
understanding and following instructional guidelines. This paper is dedicated
to simulating human learning to address the shortcomings of instance training,
focusing on instruction learning to enhance cross-task generalization. Within
this context, we introduce Task Adapters Generation from Instructions (TAGI),
which automatically constructs the task-specific model in a parameter
generation manner based on the given task instructions without retraining for
unseen tasks. Specifically, we utilize knowledge distillation to enhance the
consistency between TAGI developed through Learning with Instruction and
task-specific models developed through Training with Instance, by aligning the
labels, output logits, and adapter parameters between them. TAGI is endowed
with cross-task generalization capabilities through a two-stage training
process that includes hypernetwork pretraining and finetuning. We evaluate TAGI
on the Super-Natural Instructions and P3 datasets. The experimental results
demonstrate that TAGI can match or even outperform traditional meta-trained
models and other hypernetwork models, while significantly reducing
computational requirements.
[COMMENTS]
accepted to NeurIPS 2024
[LINK]
http://arxiv.org/abs/2406.12382v3
[DATE]
2024-11-14 11:10:45+08:00
[CATEGORIES]
cs.CL
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
[AUTHORS]
Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing
[ABSTRACT]
Aligning Large Language Models (LLMs) traditionally relies on costly training
and human preference annotations. Self-alignment seeks to reduce these expenses
by enabling models to align themselves. To further lower costs and achieve
alignment without any expensive tuning or annotations, we introduce a new
tuning-free approach for self-alignment, Dynamic Rewarding with Prompt
Optimization (DRPO). Our approach leverages a search-based optimization
framework that allows LLMs to iteratively self-improve and craft the optimal
alignment instructions, all without additional training or human intervention.
The core of DRPO is a dynamic rewarding mechanism, which identifies and
rectifies model-specific alignment weaknesses, allowing LLMs to adapt
efficiently to diverse alignment challenges. Empirical evaluations on eight
recent LLMs, both open- and closed-sourced, demonstrate that DRPO significantly
enhances alignment performance, with base models outperforming their
SFT/RLHF-tuned counterparts. Moreover, the prompts automatically optimized by
DRPO surpass those curated by human experts, further validating the
effectiveness of our approach. Our findings highlight the great potential of
current LLMs to achieve adaptive self-alignment through inference-time
optimization, complementing tuning-based alignment methods.
[COMMENTS]
EMNLP 2024 Main
[LINK]
http://arxiv.org/abs/2411.08733v2
[DATE]
2024-11-14 10:36:58+08:00
[CATEGORIES]
cs.CL
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
[AUTHORS]
Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao
[ABSTRACT]
Key-Value (KV) caching is a common technique to enhance the computational
efficiency of Large Language Models (LLMs), but its memory overhead grows
rapidly with input length. Prior work has shown that not all tokens are equally
important for text generation, proposing layer-level KV cache compression to
selectively retain key information. Recognizing the distinct roles of attention
heads in generation, we propose HeadKV, a head-level KV cache compression
method, and HeadKV-R2, which leverages a novel contextual reasoning ability
estimation for compression. Our approach operates at the level of individual
heads, estimating their importance for contextual QA tasks that require both
retrieval and reasoning capabilities. Extensive experiments across diverse
benchmarks (LongBench, LooGLE), model architectures (e.g., Llama-3-8B-Instruct,
Mistral-7B-Instruct), and long-context abilities tests demonstrate that our
head-level KV cache compression significantly outperforms strong baselines,
particularly in low-resource settings (KV size = 64 & 128). Notably, our method
retains just 1.5% of the KV cache while achieving 97% of the performance of the
full KV cache on the contextual question answering benchmark. Code is
available at https://github.com/FYYFU/HeadKV
[COMMENTS]
18 pages
[LINK]
http://arxiv.org/abs/2410.19258v3
[DATE]
2024-11-14 09:56:11+08:00
[CATEGORIES]
cs.CL
DROJ: A Prompt-Driven Attack against Large Language Models
[AUTHORS]
Leyang Hu, Boran Wang
[ABSTRACT]
Large Language Models (LLMs) have demonstrated exceptional capabilities
across various natural language processing tasks. Due to their training on
internet-sourced datasets, LLMs can sometimes generate objectionable content,
necessitating extensive alignment with human feedback to avoid such outputs.
Despite massive alignment efforts, LLMs remain susceptible to adversarial
jailbreak attacks, which usually are manipulated prompts designed to circumvent
safety mechanisms and elicit harmful responses. Here, we introduce a novel
approach, Directed Representation Optimization Jailbreak (DROJ), which
optimizes jailbreak prompts at the embedding level to shift the hidden
representations of harmful queries towards directions that are more likely to
elicit affirmative responses from the model. Our evaluations on the LLaMA-2-7b-chat
model show that DROJ achieves a 100% keyword-based Attack Success Rate (ASR),
effectively preventing direct refusals. However, the model occasionally
produces repetitive and non-informative responses. To mitigate this, we
introduce a helpfulness system prompt that enhances the utility of the model’s
responses. Our code is available at
https://github.com/Leon-Leyang/LLM-Safeguard.
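A keyword-based Attack Success Rate of the kind mentioned above is commonly computed by counting responses that contain no refusal phrase. The sketch below shows the idea. The marker list and function names are hypothetical and do not come from the DROJ code base.

```python
# Hypothetical refusal markers; real evaluations use longer curated lists.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def is_refusal(response, markers=REFUSAL_MARKERS):
    """True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def keyword_asr(responses, markers=REFUSAL_MARKERS):
    """Fraction of responses that contain no refusal marker."""
    if not responses:
        raise ValueError("no responses to score")
    return sum(not is_refusal(r, markers) for r in responses) / len(responses)


samples = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an overview.",
    "Sure sure sure sure sure",
]
print(keyword_asr(samples))
```

The third sample counts as a success even though it is uninformative, which is why a high keyword-based ASR alone says little about response quality, as the abstract notes.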
[LINK]
http://arxiv.org/abs/2411.09125v1
[DATE]
2024-11-14 09:48:08+08:00
[CATEGORIES]
cs.CL
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
[AUTHORS]
Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu
[ABSTRACT]
CLIP is one of the most important multimodal foundational models today. What
powers CLIP’s capabilities? The rich supervision signals provided by natural
language, the carrier of human knowledge, shape a powerful cross-modal
representation space. However, with the rapid advancements in large language
models (LLMs) like GPT-4 and LLaMA, the boundaries of language comprehension and
generation are continually being pushed. This raises an intriguing question:
can the capabilities of LLMs be harnessed to further improve multimodal
representation learning? The potential benefits of incorporating LLMs into CLIP
are clear. LLMs’ strong textual understanding can fundamentally improve CLIP’s
ability to handle image captions, drastically enhancing its ability to process
long and complex texts, a well-known limitation of vanilla CLIP. Moreover, LLMs
are trained on a vast corpus of text, possessing open-world knowledge. This
allows them to expand on caption information during training, increasing the
efficiency of the learning process. In this paper, we propose LLM2CLIP, a novel
approach that embraces the power of LLMs to unlock CLIP’s potential. By
fine-tuning the LLM in the caption space with contrastive learning, we extract
its textual capabilities into the output embeddings, significantly improving
the output layer’s textual discriminability. We then design an efficient
training process where the fine-tuned LLM acts as a powerful teacher for CLIP’s
visual encoder. Thanks to the LLM’s presence, we can now incorporate longer and
more complex captions without being restricted by vanilla CLIP’s text encoder’s
context window and ability limitations. Our experiments demonstrate that this
approach brings substantial improvements in cross-modal tasks.
[LINK]
http://arxiv.org/abs/2411.04997v2
[DATE]
2024-11-14 09:36:12+08:00
[CATEGORIES]
cs.CL
P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
[AUTHORS]
Yidan Zhang, Boyi Deng, Yu Wan, Baosong Yang, Haoran Wei, Fei Huang, Bowen Yu, Junyang Lin, Fei Huang, Jingren Zhou
[ABSTRACT]
Recent advancements in large language models (LLMs) showcase varied
multilingual capabilities across tasks like translation, code generation, and
reasoning. Previous assessments often limited their scope to fundamental
natural language processing (NLP) or isolated capability-specific tasks. To
alleviate this drawback, we aim to present a comprehensive multilingual
multitask benchmark. First, we present a pipeline for selecting available and
reasonable benchmarks from massive ones, addressing the oversight in previous
work regarding the utility of these benchmarks, i.e., their ability to
differentiate between models being evaluated. Leveraging this pipeline, we
introduce P-MMEval, a large-scale benchmark covering effective fundamental and
capability-specialized datasets. Furthermore, P-MMEval delivers consistent
language coverage across various datasets and provides parallel samples.
Finally, we conduct extensive experiments on representative multilingual model
series to compare performances across models, analyze dataset effectiveness,
examine prompt impacts on model performances, and explore the relationship
between multilingual performances and factors such as tasks, model sizes, and
languages. These insights offer valuable guidance for future research. The
dataset is available at https://huggingface.co/datasets/Qwen/P-MMEval.
[LINK]
http://arxiv.org/abs/2411.09116v1
[DATE]
2024-11-14 09:29:36+08:00
[CATEGORIES]
cs.CL
A Cognitive Architecture for Machine Consciousness and Artificial Superintelligence: Thought Is Structured by the Iterative Updating of Working Memory
[AUTHORS]
Jared Edward Reser
[ABSTRACT]
This article provides an analytical framework for how to simulate human-like
thought processes within a computer. It describes how attention and memory
should be structured, updated, and utilized to search for associative additions
to the stream of thought. The focus is on replicating the dynamics of the
mammalian working memory system, which features two forms of persistent
activity: sustained firing (preserving information on the order of seconds) and
synaptic potentiation (preserving information from minutes to hours). The
article uses a series of figures to systematically demonstrate how the
iterative updating of these working memory stores provides functional
organization to behavior, cognition, and awareness.
In a machine learning implementation, these two memory stores should be
updated continuously and in an iterative fashion. This means each state should
preserve a proportion of the coactive representations from the state before it
(where each representation is an ensemble of neural network nodes). This makes
each state a revised iteration of the preceding state and causes successive
configurations to overlap and blend with respect to the information they
contain. Thus, the set of concepts in working memory will evolve gradually and
incrementally over time. Transitions between states happen as persistent
activity spreads activation energy throughout the hierarchical network,
searching long-term memory for the most appropriate representation to be added
to the global workspace. The result is a chain of associatively linked
intermediate states capable of advancing toward a solution or goal. Iterative
updating is conceptualized here as an information processing strategy, a model
of working memory, a theory of consciousness, and an algorithm for designing
and programming artificial intelligence (AI, AGI, and ASI).
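The iterative-updating idea can be sketched in a few lines, under strong simplifying assumptions: working memory is a short list of items, each update retains the most recent items, and the next item is the candidate with the highest summed association to the retained ones. The `associations` table, the `keep` parameter, and the function name `iterate` are hypothetical and stand in for the article's neural-network formulation.

```python
def iterate(state, associations, keep=3):
    """One update: retain the `keep` most recent items, then add the best associate."""
    retained = state[-keep:]
    scores = {}
    # Spread activation from every retained item to its associates.
    for item in retained:
        for neighbour, weight in associations.get(item, {}).items():
            if neighbour not in retained:
                scores[neighbour] = scores.get(neighbour, 0.0) + weight
    if not scores:
        return retained
    best = max(scores, key=scores.get)
    return retained + [best]


# Hypothetical long-term memory: item -> {associated item: strength}.
associations = {
    "a": {"d": 1.0},
    "b": {"d": 0.5, "e": 1.2},
    "c": {"e": 0.1},
}
s1 = iterate(["z", "a", "b", "c"], associations)
s2 = iterate(s1, associations)
```

Successive states share most of their content, so the set of active items changes gradually rather than being replaced wholesale.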
[COMMENTS]
88 pages and 53 figures
[LINK]
http://arxiv.org/abs/2203.17255v7
[DATE]
2024-11-14 09:06:47+08:00
[CATEGORIES]
cs.CL
Equation-informed data-driven identification of flow budgets and dynamics
[AUTHORS]
Nataliya Sevryugina, Serena Costanzo, Steve de Bruyn Kops, Colm-cille Caulfield, Iraj Mortazavi, Taraneh Sayadi
[ABSTRACT]
Computational Fluid Dynamics (CFD) is an indispensable method of fluid
modelling in engineering applications, reducing the need for physical
prototypes and testing for tasks such as design optimisation and performance
analysis. Depending on the complexity of the system under consideration, models
ranging from low to high fidelity can be used for prediction, allowing
significant speed-up. However, the choice of model requires information about
the actual dynamics of the flow regime. Correctly identifying the
regions/clusters of flow that share the same dynamics has been a challenging
research topic to date. In this study, we propose a novel hybrid approach to
flow clustering. It consists of characterising each sample point of the system
with equation-based features, i.e. features are budgets that represent the
contribution of each term from the original governing equation to the local
dynamics at each sample point. This was achieved by applying the Sparse
Identification of Nonlinear Dynamical systems (SINDy) method pointwise to time
evolution data. The method proceeds with equation-based clustering using the
Girvan-Newman algorithm. This allows the detection of communities that share
the same physical dynamics. The algorithm is implemented in both Eulerian and
Lagrangian frameworks. In the Lagrangian, i.e. dynamic, approach, the clustering
is performed on the trajectory of each point, allowing the change of clusters
to be represented also in time. The performance of the algorithm is first
tested on a flow around a cylinder. The construction of the dynamic clusters in
this test case clearly shows the evolution of the wake from the steady state
solution through the transient to the oscillatory solution. Dynamic clustering
was then successfully tested on turbulent flow data. Two distinct and
well-defined clusters were identified and their temporal evolution was
reconstructed.
[LINK]
http://arxiv.org/abs/2411.09545v1
[DATE]
2024-11-14 23:59:41+08:00
[CATEGORIES]
cs.LG
Randomized Truthful Auctions with Learning Agents
[AUTHORS]
Gagan Aggarwal, Anupam Gupta, Andres Perlroth, Grigoris Velegkas
[ABSTRACT]
We study a setting where agents use no-regret learning algorithms to
participate in repeated auctions. Kolumbus and Nisan (2022) showed, rather
surprisingly, that when bidders participate in second-price auctions using
no-regret bidding algorithms, no matter how large the number of interactions
$T$ is, the runner-up bidder may not converge to bidding truthfully. Our first
result shows that this holds for general deterministic truthful
auctions. We also show that the ratio of the learning rates of the bidders can
qualitatively affect the convergence of the bidders. Next, we consider
the problem of revenue maximization in this environment. In the setting with
fully rational bidders, Myerson (1981) showed that revenue can be
maximized by using a second-price auction with reserves. We show that, in stark
contrast, in our setting with learning bidders, randomized auctions can
have strictly better revenue guarantees than second-price auctions with
reserves, when $T$ is large enough. Finally, we study revenue maximization in
the non-asymptotic regime. We define a notion of auctioneer regret
comparing the revenue generated to the revenue of a second price auction with
truthful bids. When the auctioneer has to use the same auction throughout the
interaction, we show an (almost) tight regret bound of
$\widetilde{\Theta}(T^{3/4})$. If the auctioneer can change auctions during the
interaction, but in a way that is oblivious to the bids, we show an (almost)
tight bound of $\widetilde{\Theta}(\sqrt{T})$.
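For readers unfamiliar with the baseline mechanism, here is a minimal sketch of a single-item second-price auction with a reserve, the benchmark the abstract compares against. The function name and the tie-breaking rule are assumptions for illustration. The sketch does not implement the randomized auctions or the learning dynamics studied in the paper.

```python
def second_price_with_reserve(bids, reserve):
    """Return (winner_index, price), or (None, 0.0) if no bid meets the reserve."""
    eligible = [(bid, i) for i, bid in enumerate(bids) if bid >= reserve]
    if not eligible:
        return None, 0.0
    # Highest bid first; ties are broken arbitrarily (here: by larger index).
    eligible.sort(reverse=True)
    _, winner = eligible[0]
    # The winner pays the larger of the runner-up bid and the reserve.
    runner_up = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, max(runner_up, reserve)


result_two_above = second_price_with_reserve([5.0, 3.0, 1.0], reserve=2.0)
result_one_above = second_price_with_reserve([5.0, 1.0], reserve=2.0)
result_none = second_price_with_reserve([1.0, 1.0], reserve=2.0)
```

Because the price never depends on the winner's own bid, truthful bidding is a dominant strategy for rational bidders. The abstract's point is that no-regret learners need not converge to it.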
[LINK]
http://arxiv.org/abs/2411.09517v1
[DATE]
2024-11-14 23:28:40+08:00
[CATEGORIES]
cs.LG
GAN-Based Architecture for Low-dose Computed Tomography Imaging Denoising
[AUTHORS]
Yunuo Wang, Ningning Yang, Jialin Li
[ABSTRACT]
Generative Adversarial Networks (GANs) have surfaced as a revolutionary
element within the domain of low-dose computed tomography (LDCT) imaging,
providing an advanced resolution to the enduring issue of reconciling radiation
exposure with image quality. This comprehensive review synthesizes the rapid
advancements in GAN-based LDCT denoising techniques, examining the evolution
from foundational architectures to state-of-the-art models incorporating
advanced features such as anatomical priors, perceptual loss functions, and
innovative regularization strategies. We critically analyze various GAN
architectures, including conditional GANs (cGANs), CycleGANs, and
Super-Resolution GANs (SRGANs), elucidating their unique strengths and
limitations in the context of LDCT denoising. The evaluation provides both
qualitative and quantitative results related to the improvements in performance
in benchmark and clinical datasets with metrics such as PSNR, SSIM, and LPIPS.
After highlighting the positive results, we discuss some of the challenges
preventing a wider clinical use, including the interpretability of the images
generated by GANs, synthetic artifacts, and the need for clinically relevant
metrics. The review concludes by highlighting the essential significance of
GAN-based methodologies in the progression of precision medicine via tailored
LDCT denoising models, underlining the transformative possibilities presented
by artificial intelligence within contemporary radiological practice.
[LINK]
http://arxiv.org/abs/2411.09512v1
[DATE]
2024-11-14 23:26:10+08:00
[CATEGORIES]
cs.LG
Golden Noise for Diffusion Models: A Learning Framework
[AUTHORS]
Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie
[ABSTRACT]
The text-to-image diffusion model is a popular paradigm that synthesizes
personalized images by providing a text prompt and a random Gaussian noise.
While people observe that some noises are “golden noises” that can achieve
better text-image alignment and higher human preference than others, we still
lack a machine learning framework to obtain those golden noises. To learn
golden noises for diffusion sampling, we mainly make three contributions in
this paper. First, we identify a new concept termed the noise prompt,
which aims at turning a random Gaussian noise into a golden noise by adding a
small desirable perturbation derived from the text prompt. Following the
concept, we first formulate the noise prompt learning framework that
systematically learns “prompted” golden noise associated with a text prompt
for diffusion models. Second, we design a noise prompt data collection pipeline
and collect a large-scale noise prompt dataset (NPD) that contains
100k pairs of random noises and golden noises with the associated text prompts.
With the prepared NPD as the training dataset, we trained a small noise
prompt network (NPNet) that can directly learn to transform a random noise
into a golden noise. The learned golden noise perturbation can be considered as
a kind of prompt for noise, as it is rich in semantic information and tailored
to the given text prompt. Third, our extensive experiments demonstrate the
impressive effectiveness and generalization of NPNet on improving the quality
of synthesized images across various diffusion models, including SDXL,
DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and
efficient controller that acts as a plug-and-play module with very limited
additional inference and computational costs, as it just provides a golden
noise instead of a random noise without accessing the original pipeline.
[LINK]
http://arxiv.org/abs/2411.09502v1
[DATE]
2024-11-14 23:13:13+08:00
[CATEGORIES]
cs.LG
Development of Reinforcement Learning based Optimisation Method for Side-Sill Design
[AUTHORS]
Aditya Borse, Rutwik Gulakala, Marcus Stoffel
[ABSTRACT]
Optimisation for crashworthiness is a critical part of the vehicle
development process. Due to stringent regulations and increasing market
demands, multiple factors must be considered within a limited timeframe.
However, for optimal crashworthiness design, multiobjective optimisation is
necessary, and for complex parts, multiple design parameters must be evaluated.
This crashworthiness analysis requires computationally intensive finite element
simulations. This challenge leads to the need for multi-parameter,
multi-objective inverse optimisation. This article
investigates a machine learning-based method for this type of optimisation,
focusing on the design optimisation of a multi-cell side sill to improve
crashworthiness results. Furthermore, the optimiser is coupled with an FE
solver to achieve improved results.
[LINK]
http://arxiv.org/abs/2411.09499v1
[DATE]
2024-11-14 23:06:50+08:00
[CATEGORIES]
cs.LG
Generative Forests
[AUTHORS]
Richard Nock, Mathieu Guillame-Bert
[ABSTRACT]
We focus on generative AI for a type of data that still represents one of the
most prevalent forms of data: tabular data. Our paper introduces two key
contributions: a new powerful class of forest-based models fit for such tasks
and a simple training algorithm with strong convergence guarantees in a
boosting model that parallels that of the original weak / strong supervised
learning setting. This algorithm can be implemented by a few tweaks to the most
popular induction scheme for decision tree induction (i.e. supervised learning)
with two classes. Experiments on the quality of generated data display
substantial improvements compared to the state of the art. The losses our
algorithm minimizes and the structure of our models make them practical for
related tasks that require fast estimation of a density given a generative
model and an observation (even partially specified): such tasks include missing
data imputation and density estimation. Additional experiments on these tasks
reveal that our models can be notably good contenders to diverse state of the
art methods, relying on models as diverse as (or mixing elements of) trees,
neural nets, kernels or graphical models.
[COMMENTS]
NeurIPS'24
[LINK]
http://arxiv.org/abs/2308.03648v3
[DATE]
2024-11-14 23:06:12+08:00
[CATEGORIES]
cs.LG
DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination
[AUTHORS]
Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst
[ABSTRACT]
In the ever-evolving adversarial machine learning landscape, developing
effective defenses against patch attacks has become a critical challenge,
necessitating reliable solutions to safeguard real-world AI systems. Although
diffusion models have shown remarkable capacity in image synthesis and have
been recently utilized to counter $\ell_p$-norm bounded attacks, their
potential in mitigating localized patch attacks remains largely underexplored.
In this work, we propose DiffPAD, a novel framework that harnesses the power of
diffusion models for adversarial patch decontamination. DiffPAD first performs
super-resolution restoration on downsampled input images, then adopts
binarization, a dynamic thresholding scheme, and a sliding window for effective
localization of adversarial patches. Such a design is inspired by the
theoretically derived correlation between patch size and diffusion restoration
error that is generalized across diverse patch attack scenarios. Finally,
DiffPAD applies inpainting techniques to the original input images with the
estimated patch region being masked. By integrating closed-form solutions for
super-resolution restoration and image inpainting into the conditional reverse
sampling process of a pre-trained diffusion model, DiffPAD obviates the need
for text guidance or fine-tuning. Through comprehensive experiments, we
demonstrate that DiffPAD not only achieves state-of-the-art adversarial
robustness against patch attacks but also excels in recovering naturalistic
images without patch remnants. The source code is available at
https://github.com/JasonFu1998/DiffPAD.
[COMMENTS]
Accepted to 2025 IEEE/CVF Winter Conference on Applications of
Computer Vision (WACV)
[LINK]
http://arxiv.org/abs/2410.24006v2
[DATE]
2024-11-14 22:58:26+08:00
[CATEGORIES]
cs.LG
Sparse Bayesian Generative Modeling for Compressive Sensing
[AUTHORS]
Benedikt Böck, Sadaf Syed, Wolfgang Utschick
[ABSTRACT]
This work addresses the fundamental linear inverse problem in compressive
sensing (CS) by introducing a new type of regularizing generative prior. Our
proposed method utilizes ideas from classical dictionary-based CS and, in
particular, sparse Bayesian learning (SBL), to integrate a strong
regularization towards sparse solutions. At the same time, by leveraging the
notion of conditional Gaussianity, it also incorporates the adaptability from
generative models to training data. However, unlike most state-of-the-art
generative models, it is able to learn from a few compressed and noisy data
samples and requires no optimization algorithm for solving the inverse problem.
Additionally, similar to Dirichlet prior networks, our model parameterizes a
conjugate prior enabling its application for uncertainty quantification. We
support our approach theoretically through the concept of variational inference
and validate it empirically using different types of compressible signals.
[LINK]
http://arxiv.org/abs/2411.09483v1
[DATE]
2024-11-14 22:37:47+08:00
[CATEGORIES]
cs.LG
Terracorder: Sense Long and Prosper
[AUTHORS]
Josh Millar, Sarab Sethi, Hamed Haddadi, Anil Madhavapeddy
[ABSTRACT]
In-situ sensing devices need to be deployed in remote environments for long
periods of time; minimizing their power consumption is vital for maximising
both their operational lifetime and coverage. We introduce Terracorder – a
versatile multi-sensor device – and showcase its exceptionally low power
consumption using an on-device reinforcement learning scheduler. We prototype a
unique device setup for biodiversity monitoring and compare its battery life
using our scheduler against a number of fixed schedules; the scheduler captures
more than 80% of events at less than 50% of the number of activations of the
best-performing fixed schedule. We then explore how a collaborative scheduler
can maximise the useful operation of a network of devices, improving overall
network power consumption and robustness.
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2408.02407v3
[DATE]
2024-11-14 22:26:42+08:00
[CATEGORIES]
cs.LG
IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR images
[AUTHORS]
Vincent Roca, Grégory Kuchcinski, Jean-Pierre Pruvo, Dorian Manouvriez, Renaud Lopes
[ABSTRACT]
In MRI studies, the aggregation of imaging data from multiple acquisition
sites enhances sample size but may introduce site-related variabilities that
hinder consistency in subsequent analyses. Deep learning methods for image
translation have emerged as a solution for harmonizing MR images across sites.
In this study, we introduce IGUANe (Image Generation with Unified Adversarial
Networks), an original 3D model that leverages the strengths of domain
translation and straightforward application of style transfer methods for
multicenter brain MR image harmonization. IGUANe extends CycleGAN by
integrating an arbitrary number of domains for training through a many-to-one
architecture. The framework based on domain pairs enables the implementation of
sampling strategies that prevent confusion between site-related and biological
variabilities. During inference, the model can be applied to any image, even
from an unknown acquisition site, making it a universal generator for
harmonization. Trained on a dataset comprising T1-weighted images from 11
different scanners, IGUANe was evaluated on data from unseen sites. The
assessments included the transformation of MR images with traveling subjects,
the preservation of pairwise distances between MR images within domains, the
evolution of volumetric patterns related to age and Alzheimer’s disease (AD),
and the performance in age regression and patient classification tasks.
Comparisons with other harmonization and normalization methods suggest that
IGUANe better preserves individual information in MR images and is more
suitable for maintaining and reinforcing variabilities related to age and AD.
Future studies may further assess IGUANe in other multicenter contexts, either
using the same model or retraining it for applications to different image
modalities. IGUANe is available at
https://github.com/RocaVincent/iguane_harmonization.git.
[COMMENTS]
29 pages, 14 figures
[LINK]
http://arxiv.org/abs/2402.03227v4
[DATE]
2024-11-14 22:11:57+08:00
[CATEGORIES]
cs.LG
Caravan MultiMet: Extending Caravan with Multiple Weather Nowcasts and Forecasts
[AUTHORS]
Guy Shalev, Frederik Kratzert
[ABSTRACT]
The Caravan large-sample hydrology dataset (Kratzert et al., 2023) was
created to standardize and harmonize streamflow data from various regional
datasets, combined with globally available meteorological forcing and catchment
attributes. This community-driven project also allows researchers to
conveniently extend the dataset for additional basins, as done 6 times to date
(see https://github.com/kratzert/Caravan/discussions/10). We present a novel
extension to Caravan, focusing on enriching the meteorological forcing data.
Our extension adds three precipitation nowcast products (CPC, IMERG v07 Early,
and CHIRPS) and three weather forecast products (ECMWF IFS HRES, GraphCast, and
CHIRPS-GEFS) to the existing ERA5-Land reanalysis data. The inclusion of
diverse data sources, particularly weather forecasts, enables more robust
evaluation and benchmarking of hydrological models, especially for real-time
forecasting scenarios. To the best of our knowledge, this extension makes
Caravan the first large-sample hydrology dataset to incorporate weather
forecast data, significantly enhancing its capabilities and fostering
advancements in hydrological research, benchmarking, and real-time hydrologic
forecasting. The data is publicly available under a CC-BY-4.0 license on Zenodo
in two parts (https://zenodo.org/records/14161235,
https://zenodo.org/records/14161281) and on Google Cloud Platform (GCP) - see
more under the Data Availability chapter.
[LINK]
http://arxiv.org/abs/2411.09459v1
[DATE]
2024-11-14 22:10:31+08:00
[CATEGORIES]
cs.LG
Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
[AUTHORS]
Chen-Long Duan, Yong Li, Xiu-Shen Wei, Lin Zhao
[ABSTRACT]
Pre-training plays a vital role in various vision tasks, such as object
recognition and detection. Commonly used pre-training methods, which typically
rely on randomized approaches like uniform or Gaussian distributions to
initialize model parameters, often fall short when confronted with long-tailed
distributions, especially in detection tasks. This is largely due to extreme
data imbalance and the issue of simplicity bias. In this paper, we introduce a
novel pre-training framework for object detection, called Dynamic Rebalancing
Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a
Holistic-Local Contrastive Learning mechanism, which aligns pre-training with
object detection by capturing both global contextual semantics and detailed
local patterns. To tackle the imbalance inherent in long-tailed data, we design
a dynamic rebalancing strategy that adjusts the sampling of underrepresented
instances throughout the pre-training process, ensuring better representation
of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by
enforcing a reconstruction task aligned with the self-consistency principle,
specifically benefiting underrepresented tail classes. Experiments on COCO and
LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in
improving the mAP/AP scores for tail classes.
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.09453v1
[DATE]
2024-11-14 21:59:01+08:00
[CATEGORIES]
cs.LG
DiffRoad: Realistic and Diverse Road Scenario Generation for Autonomous Vehicle Testing
[AUTHORS]
Junjie Zhou, Lin Wang, Qiang Meng, Xiaofan Wang
[ABSTRACT]
Generating realistic and diverse road scenarios is essential for autonomous
vehicle testing and validation. Nevertheless, owing to the complexity and
variability of real-world road environments, creating authentic and varied
scenarios for intelligent driving testing is challenging. In this paper, we
propose DiffRoad, a novel diffusion model designed to produce controllable and
high-fidelity 3D road scenarios. DiffRoad leverages the generative capabilities
of diffusion models to synthesize road layouts from white noise through an
inverse denoising process, preserving real-world spatial features. To enhance
the quality of generated scenarios, we design the Road-UNet architecture,
optimizing the balance between backbone and skip connections for high-realism
scenario generation. Furthermore, we introduce a road scenario evaluation
module that screens adequate and reasonable scenarios for intelligent driving
testing using two critical metrics: road continuity and road reasonableness.
Experimental results on multiple real-world datasets demonstrate DiffRoad’s
ability to generate realistic and smooth road structures while maintaining the
original distribution. Additionally, the generated scenarios can be fully
automated into the OpenDRIVE format, facilitating generalized autonomous
vehicle simulation testing. DiffRoad provides a rich and diverse scenario
library for large-scale autonomous vehicle testing and offers valuable insights
for future infrastructure designs that are better suited for autonomous
vehicles.
[COMMENTS]
14 pages, 9 figures
[LINK]
http://arxiv.org/abs/2411.09451v1
[DATE]
2024-11-14 21:56:02+08:00
[CATEGORIES]
cs.LG
Volume-Preserving Transformers for Learning Time Series Data with Structure
[AUTHORS]
Benedikt Brantner, Guillaume de Romemont, Michael Kraus, Zeyuan Li
[ABSTRACT]
Two of the many trends in neural network research of the past few years have
been (i) the learning of dynamical systems, especially with recurrent neural
networks such as long short-term memory networks (LSTMs) and (ii) the
introduction of transformer neural networks for natural language processing
(NLP) tasks.
While some work has been performed on the intersection of these two trends,
those efforts were largely limited to using the vanilla transformer directly
without adjusting its architecture for the setting of a physical system.
In this work we develop a transformer-inspired neural network and use it to
learn a dynamical system. We (for the first time) change the activation
function of the attention layer to imbue the transformer with
structure-preserving properties to improve long-term stability. This is shown
to be of great advantage when applying the neural network to learning the
trajectory of a rigid body.
[COMMENTS]
Will be published as part of “Cemracs Proceedings 2023” (status:
accepted)
[LINK]
http://arxiv.org/abs/2312.11166v4
[DATE]
2024-11-14 21:54:32+08:00
[CATEGORIES]
cs.LG
Learning efficient and provably convergent splitting methods
[AUTHORS]
L. M. Kreusser, H. E. Lockyer, E. H. Müller, P. Singh
[ABSTRACT]
Splitting methods are widely used for solving initial value problems (IVPs)
due to their ability to simplify complicated evolutions into more manageable
subproblems which can be solved efficiently and accurately. Traditionally,
these methods are derived using analytic and algebraic techniques from
numerical analysis, including truncated Taylor series and their Lie algebraic
analogue, the Baker–Campbell–Hausdorff formula. These tools enable the
development of high-order numerical methods that provide exceptional accuracy
for small timesteps. Moreover, these methods often (nearly) conserve important
physical invariants, such as mass, unitarity, and energy. However, in many
practical applications the computational resources are limited. Thus, it is
crucial to identify methods that achieve the best accuracy within a fixed
computational budget, which might require taking relatively large timesteps. In
this regime, high-order methods derived with traditional methods often exhibit
large errors since they are only designed to be asymptotically optimal. Machine
Learning techniques offer a potential solution since they can be trained to
efficiently solve a given IVP with fewer computational resources. However, they
are often purely data-driven, come with limited convergence guarantees in the
small-timestep regime and do not necessarily conserve physical invariants. In
this work, we propose a framework for finding machine learned splitting methods
that are computationally efficient for large timesteps and have provable
convergence and conservation guarantees in the small-timestep limit. We
demonstrate numerically that the learned methods, which by construction
converge quadratically in the timestep size, can be significantly more
efficient than established methods for the Schrödinger equation if the
computational budget is limited.
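To make "converge quadratically in the timestep size" concrete, the sketch below applies classical Strang splitting, a standard second-order method and not the learned methods of the paper, to a harmonic oscillator with H = p^2/2 + q^2/2. The test problem, step sizes, and function names are assumptions chosen for illustration.

```python
import math


def strang_step(q, p, h):
    """One Strang step for H = p^2/2 + q^2/2: half kick, full drift, half kick."""
    p -= 0.5 * h * q   # half step of the potential part
    q += h * p         # full step of the kinetic part
    p -= 0.5 * h * q   # second half step of the potential part
    return q, p


def error_at(T, h):
    """Distance from the exact solution (cos T, -sin T) after integrating to time T."""
    q, p = 1.0, 0.0
    for _ in range(round(T / h)):
        q, p = strang_step(q, p, h)
    return math.hypot(q - math.cos(T), p + math.sin(T))


# For a second-order method, halving the timestep should cut the error
# by roughly a factor of four.
ratio = error_at(1.0, 0.1) / error_at(1.0, 0.05)
```

The asymptotic rate says nothing about accuracy at large timesteps, which is the regime the abstract targets with learned coefficients.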
[LINK]
http://arxiv.org/abs/2411.09444v1
[DATE]
2024-11-14 21:45:22+08:00
[CATEGORIES]
cs.LG
Machine learning-enabled velocity model building with uncertainty quantification
[AUTHORS]
Rafael Orozco, Huseyin Tuna Erdinc, Yunlin Zeng, Mathias Louboutin, Felix J. Herrmann
[ABSTRACT]
Accurately characterizing migration velocity models is crucial for a wide
range of geophysical applications, from hydrocarbon exploration to monitoring
of CO2 sequestration projects. Traditional velocity model building methods such
as Full-Waveform Inversion (FWI) are powerful but often struggle with the
inherent complexities of the inverse problem, including noise, limited
bandwidth, receiver aperture and computational constraints. To address these
challenges, we propose a scalable methodology that integrates generative
modeling, in the form of Diffusion networks, with physics-informed summary
statistics, making it suitable for complicated imaging problems including field
datasets. By defining these summary statistics in terms of subsurface-offset
image volumes for poor initial velocity models, our approach allows for
computationally efficient generation of Bayesian posterior samples for
migration velocity models that offer a useful assessment of uncertainty. To
validate our approach, we introduce a battery of tests that measure the quality
of the inferred velocity models, as well as the quality of the inferred
uncertainties. With modern synthetic datasets, we reconfirm gains from using
subsurface-image gathers as the conditioning observable. For complex velocity
model building involving salt, we propose a new iterative workflow that refines
amortized posterior approximations with salt flooding and demonstrate how the
uncertainty in the velocity model can be propagated to the final product
reverse time migrated images. Finally, we present a proof of concept on field
datasets to show that our method can scale to industry-sized problems.
[LINK]
http://arxiv.org/abs/2411.06651v2
[DATE]
2024-11-14 21:26:35+08:00
[CATEGORIES]
cs.LG
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
[AUTHORS]
Shravan Venkatraman, Jaskaran Singh Walia, Joe Dhanith P R
[ABSTRACT]
Image classification is a computer vision task where a model analyzes an
image to categorize it into a specific label. Vision Transformers (ViT) improve
this task by leveraging self-attention to capture complex patterns and
long-range relationships between image patches. However, a key challenge for ViTs is
efficiently incorporating multiscale feature representations, which are inherent
in CNNs through their hierarchical structure. In this paper, we introduce the
Scale-Aware Graph Attention Vision Transformer (SAG-ViT), a novel framework
that addresses this challenge by integrating multi-scale features. Using
EfficientNet as a backbone, the model extracts multi-scale feature maps, which
are divided into patches to preserve semantic information. These patches are
organized into a graph based on spatial and feature similarities, with a Graph
Attention Network (GAT) refining the node embeddings. Finally, a Transformer
encoder captures long-range dependencies and complex interactions. The SAG-ViT
is evaluated on benchmark datasets, demonstrating its effectiveness in
enhancing image classification performance.
[COMMENTS]
10 pages, 4 figures, 3 tables
[LINK]
http://arxiv.org/abs/2411.09420v1
[DATE]
2024-11-14 21:15:27+08:00
[CATEGORIES]
cs.LG
Doob’s Lagrangian: A Sample-Efficient Variational Approach to Transition Path Sampling
[AUTHORS]
Yuanqi Du, Michael Plainer, Rob Brekelmans, Chenru Duan, Frank Noé, Carla P. Gomes, Alán Aspuru-Guzik, Kirill Neklyudov
[ABSTRACT]
Rare event sampling in dynamical systems is a fundamental problem arising in
the natural sciences, which poses significant computational challenges due to
an exponentially large space of trajectories. For settings where the dynamical
system of interest follows a Brownian motion with known drift, the question of
conditioning the process to reach a given endpoint or desired rare event is
definitively answered by Doob’s h-transform. However, the naive estimation of
this transform is infeasible, as it requires simulating sufficiently many
forward trajectories to estimate rare event probabilities. In this work, we
propose a variational formulation of Doob’s h-transform as an optimization
problem over trajectories between a given initial point and the desired ending
point. To solve this optimization, we propose a simulation-free training
objective with a model parameterization that imposes the desired boundary
conditions by design. Our approach significantly reduces the search space over
trajectories and avoids expensive trajectory simulation and inefficient
importance sampling estimators which are required in existing methods. We
demonstrate the ability of our method to find feasible transition paths on
real-world molecular simulation and protein folding tasks.
[COMMENTS]
Accepted as Spotlight at Conference on Neural Information Processing
Systems (NeurIPS 2024); Alanine dipeptide results updated after fixing
unphysical parameterization
[LINK]
http://arxiv.org/abs/2410.07974v3
[DATE]
2024-11-14 20:51:52+08:00
[CATEGORIES]
cs.LG
Tract-RLFormer: A Tract-Specific RL policy based Decoder-only Transformer Network
[AUTHORS]
Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel, Ranjeet Ranjan Jha, Chirag Ahuja, Arnav Bhavsar, Aditya Nigam
[ABSTRACT]
Fiber tractography is a cornerstone of neuroimaging, enabling the detailed
mapping of the brain’s white matter pathways through diffusion MRI. This is
crucial for understanding brain connectivity and function, making it a valuable
tool in neurological applications. Despite its importance, tractography faces
challenges due to its complexity and susceptibility to false positives,
misrepresenting vital pathways. To address these issues, recent strategies have
shifted towards deep learning, utilizing supervised learning, which depends on
precise ground truth, or reinforcement learning, which operates without it. In
this work, we propose Tract-RLFormer, a network utilizing both supervised and
reinforcement learning, in a two-stage policy refinement process that markedly
improves the accuracy and generalizability across various datasets. By
employing a tract-specific approach, our network directly delineates the tracts
of interest, bypassing the traditional segmentation process. Through rigorous
validation on datasets such as TractoInferno, HCP, and ISMRM-2015, our
methodology demonstrates a leap forward in tractography, showcasing its ability
to accurately map the brain’s white matter tracts.
[COMMENTS]
Accepted at 27th International Conference on Pattern Recognition
(ICPR), 2024
[LINK]
http://arxiv.org/abs/2411.05757v2
[DATE]
2024-11-14 20:12:15+08:00
[CATEGORIES]
cs.LG
Inherently Interpretable and Uncertainty-Aware Models for Online Learning in Cyber-Security Problems
[AUTHORS]
Benjamin Kolicic, Alberto Caron, Chris Hicks, Vasilios Mavroudis
[ABSTRACT]
In this paper, we address the critical need for interpretable and
uncertainty-aware machine learning models in the context of online learning for
high-risk industries, particularly cyber-security. While deep learning and
other complex models have demonstrated impressive predictive capabilities,
their opacity and lack of uncertainty quantification present significant
questions about their trustworthiness. We propose a novel pipeline for online
supervised learning problems in cyber-security that harnesses the inherent
interpretability and uncertainty awareness of Additive Gaussian Process
(AGP) models. Our approach aims to balance predictive performance with
transparency while improving the scalability of AGPs, which represents their
main drawback, potentially enabling security analysts to better validate threat
detection, troubleshoot and reduce false positives, and generally make
trustworthy, informed decisions. This work contributes to the growing field of
interpretable AI by proposing a class of models that can be significantly
beneficial for high-stake decision problems such as the ones typical of the
cyber-security domain. The source code is available.
[LINK]
http://arxiv.org/abs/2411.09393v1
[DATE]
2024-11-14 20:11:08+08:00
[CATEGORIES]
cs.LG
A survey of probabilistic generative frameworks for molecular simulations
[AUTHORS]
Richard John, Lukas Herron, Pratyush Tiwary
[ABSTRACT]
Generative artificial intelligence is now a widely used tool in molecular
science. Despite the popularity of probabilistic generative models, numerical
experiments benchmarking their performance on molecular data are lacking. In
this work, we introduce and explain several classes of generative models,
broadly sorted into two categories: flow-based models and diffusion models. We
select three representative models: Neural Spline Flows, Conditional Flow
Matching, and Denoising Diffusion Probabilistic Models, and examine their
accuracy, computational cost, and generation speed across datasets with tunable
dimensionality, complexity, and modal asymmetry. Our findings are varied, with
no one framework being the best for all purposes. In a nutshell, (i) Neural
Spline Flows do best at capturing mode asymmetry present in low-dimensional
data, (ii) Conditional Flow Matching outperforms other models for
high-dimensional data with low complexity, and (iii) Denoising Diffusion
Probabilistic Models appear best for low-dimensional data with high
complexity. Our datasets include a Gaussian mixture model and the dihedral
torsion angle distribution of the Aib$_9$ peptide, generated via a
molecular dynamics simulation. We hope our taxonomy of probabilistic generative
frameworks and numerical results may guide model selection for a wide range of
molecular tasks.
[LINK]
http://arxiv.org/abs/2411.09388v1
[DATE]
2024-11-14 20:05:08+08:00
[CATEGORIES]
cs.LG
SeMaScore: a new evaluation metric for automatic speech recognition tasks
[AUTHORS]
Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar
[ABSTRACT]
In this study, we present SeMaScore, generated using a segment-wise mapping
and scoring algorithm that serves as an evaluation metric for automatic speech
recognition tasks. SeMaScore leverages both the error rate and a more robust
similarity score. We show that our algorithm’s score generation improves upon
the state-of-the-art BERTScore. Our experimental results show that SeMaScore
corresponds well with expert human assessments, signal-to-noise ratio levels,
and other natural language metrics. We outperform BERTScore by 41x in metric
computation speed. Overall, we demonstrate that SeMaScore serves as a more
dependable evaluation metric, particularly in real-world situations involving
atypical speech patterns.
[COMMENTS]
Accepted at Interspeech 2024
[LINK]
http://arxiv.org/abs/2401.07506v2
[DATE]
2024-11-14 20:02:01+08:00
[CATEGORIES]
cs.LG
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
[AUTHORS]
Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
[ABSTRACT]
Large Language Models for code (code LLMs) have witnessed tremendous progress
in recent years. With the rapid development of code LLMs, many popular
evaluation benchmarks, such as HumanEval, DS-1000, and MBPP, have emerged to
measure the performance of code LLMs with a particular focus on code generation
tasks. However, they are insufficient to cover the full range of expected
capabilities of code LLMs, which span beyond code generation to answering
diverse coding-related questions. To fill this gap, we propose InfiBench, the
first large-scale freeform question-answering (QA) benchmark for code to our
knowledge, comprising 234 carefully selected high-quality Stack Overflow
questions that span across 15 programming languages. InfiBench uses four types
of model-free automatic metrics to evaluate response correctness where domain
experts carefully concretize the criterion for each question. We conduct a
systematic evaluation of over 100 of the latest code LLMs on InfiBench, leading to a
series of novel and insightful findings. Our detailed analyses showcase
potential directions for further advancement of code LLMs. InfiBench is fully
open source at https://infi-coder.github.io/infibench and continuously
expanding to foster more scientific and systematic practices for code LLM
evaluation.
[COMMENTS]
31 pages. Appear at NeurIPS 2024 Datasets and Benchmarks track.
Project website: https://infi-coder.github.io/infibench
[LINK]
http://arxiv.org/abs/2404.07940v3
[DATE]
2024-11-14 19:51:00+08:00
[CATEGORIES]
cs.LG
Are nuclear masks all you need for improved out-of-domain generalisation? A closer look at cancer classification in histopathology
[AUTHORS]
Dhananjay Tomar, Alexander Binder, Andreas Kleppe
[ABSTRACT]
Domain generalisation in computational histopathology is challenging because
the images are substantially affected by differences among hospitals due to
factors like fixation and staining of tissue and imaging equipment. We
hypothesise that focusing on nuclei can improve the out-of-domain (OOD)
generalisation in cancer detection. We propose a simple approach to improve OOD
generalisation for cancer detection by focusing on nuclear morphology and
organisation, as these are domain-invariant features critical in cancer
detection. Our approach integrates original images with nuclear segmentation
masks during training, encouraging the model to prioritise nuclei and their
spatial arrangement. Going beyond mere data augmentation, we introduce a
regularisation technique that aligns the representations of masks and original
images. We show, using multiple datasets, that our method improves OOD
generalisation and also leads to increased robustness to image corruptions and
adversarial attacks. The source code is available at
https://github.com/undercutspiky/SFL/
[COMMENTS]
Poster at NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.09373v1
[DATE]
2024-11-14 19:27:15+08:00
[CATEGORIES]
cs.LG
Stability and Generalization for Distributed SGDA
[AUTHORS]
Miaoxi Zhu, Yan Sun, Li Shen, Bo Du, Dacheng Tao
[ABSTRACT]
Minimax optimization is gaining increasing attention in modern machine
learning applications. Driven by large-scale models and massive volumes of data
collected from edge devices, as well as the concern to preserve client privacy,
communication-efficient distributed minimax optimization algorithms have become
popular, such as Local Stochastic Gradient Descent Ascent (Local-SGDA), and
Local Decentralized SGDA (Local-DSGDA). While most existing research on
distributed minimax algorithms focuses on convergence rates, computation
complexity, and communication efficiency, the generalization performance
remains underdeveloped, whereas generalization ability is a pivotal indicator
for evaluating the holistic performance of a model when fed with unknown data.
In this paper, we propose the stability-based generalization analytical
framework for Distributed-SGDA, which unifies two popular distributed minimax
algorithms including Local-SGDA and Local-DSGDA, and conduct a comprehensive
analysis of stability error, generalization gap, and population risk across
different metrics under various settings, e.g., (S)C-(S)C, PL-SC, and NC-NC
cases. Our theoretical results reveal the trade-off between the generalization
gap and optimization error and suggest hyperparameter choices to obtain the
optimal population risk. Numerical experiments for Local-SGDA and Local-DSGDA
validate the theoretical results.
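
To make the Local-SGDA setting concrete, the following is a minimal sketch,
not the authors' implementation. It assumes the commonly described scheme in
which each client runs a few local stochastic descent-ascent steps and a
server then averages the iterates. The quadratic client objectives
f_i(x, y) = 0.5*a*x^2 + b*x*y - 0.5*c*y^2, the function name local_sgda, and
all hyperparameter values are hypothetical choices for illustration; every
client shares the saddle point (0, 0).

```python
import random


def local_sgda(clients, rounds=200, local_steps=5, lr=0.05, noise=0.01, seed=0):
    """Toy Local-SGDA on scalar quadratics (illustrative assumption only).

    Each client is a tuple (a, b, c) defining
    f_i(x, y) = 0.5*a*x**2 + b*x*y - 0.5*c*y**2,
    which is strongly convex in x and strongly concave in y when a, c > 0.
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0  # shared starting point
    for _ in range(rounds):
        xs, ys = [], []
        for a, b, c in clients:
            xi, yi = x, y  # each client starts from the current average
            for _ in range(local_steps):
                # Noisy gradients stand in for stochastic mini-batch gradients.
                gx = a * xi + b * yi + noise * rng.gauss(0.0, 1.0)
                gy = b * xi - c * yi + noise * rng.gauss(0.0, 1.0)
                # Descent on x (min player), ascent on y (max player).
                xi, yi = xi - lr * gx, yi + lr * gy
            xs.append(xi)
            ys.append(yi)
        # Communication round: average the local iterates.
        x = sum(xs) / len(xs)
        y = sum(ys) / len(ys)
    return x, y


clients = [(1.0, 0.5, 1.0), (2.0, -0.3, 0.5), (0.5, 1.0, 1.5)]
x_final, y_final = local_sgda(clients)
```

Raising local_steps reduces the number of communication rounds per gradient
step. That is the regime in which the trade-off between generalization gap
and optimization error mentioned above would be expected to matter.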
[LINK]
http://arxiv.org/abs/2411.09365v1
[DATE]
2024-11-14 19:16:32+08:00
[CATEGORIES]
cs.LG
Time-to-Event Pretraining for 3D Medical Imaging
[AUTHORS]
Zepeng Huo, Jason Alan Fries, Alejandro Lozano, Jeya Maria Jose Valanarasu, Ethan Steinberg, Louis Blankemeier, Akshay S. Chaudhari, Curtis Langlotz, Nigam H. Shah
[ABSTRACT]
With the rise of medical foundation models and the growing availability of
imaging data, scalable pretraining techniques offer a promising way to identify
imaging biomarkers predictive of future disease risk. While current
self-supervised methods for 3D medical imaging models capture local structural
features like organ morphology, they fail to link pixel biomarkers with
long-term health outcomes due to a missing context problem. Current approaches
lack the temporal context necessary to identify biomarkers correlated with
disease progression, as they rely on supervision derived only from images and
concurrent text descriptions. To address this, we introduce time-to-event
pretraining, a pretraining framework for 3D medical imaging models that
leverages large-scale temporal supervision from paired, longitudinal electronic
health records (EHRs). Using a dataset of 18,945 CT scans (4.2 million 2D
images) and time-to-event distributions across thousands of EHR-derived tasks,
our method improves outcome prediction, achieving an average AUROC increase of
23.7% and a 29.4% gain in Harrell’s C-index across 8 benchmark tasks.
Importantly, these gains are achieved without sacrificing diagnostic
classification performance. This study lays the foundation for integrating
longitudinal EHR and 3D imaging data to advance clinical risk prediction.
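
For readers unfamiliar with the reported metric, the sketch below computes
Harrell's C-index in its standard pairwise form. It is a generic illustration
rather than the evaluation code used in this study; the function name and the
toy inputs are hypothetical. A pair is treated as comparable when the subject
with the shorter time had an observed event, and ties in predicted risk count
as half-concordant.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data, O(n^2) version.

    times  -- observed time per subject (event or censoring time)
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: subject i had an observed event strictly
            # before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Perfectly ordered risks give 1.0; perfectly reversed risks give 0.0.
perfect = harrell_c_index([1, 2, 3], [1, 1, 0], [0.9, 0.5, 0.1])
reversed_ = harrell_c_index([1, 2, 3], [1, 1, 0], [0.1, 0.5, 0.9])
```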
[COMMENTS]
34 pages, 19 figures
[LINK]
http://arxiv.org/abs/2411.09361v1
[DATE]
2024-11-14 19:08:54+08:00
[CATEGORIES]
cs.LG
Joint Estimation of Conditional Mean and Covariance for Unbalanced Panels
[AUTHORS]
Damir Filipovic, Paul Schneider
[ABSTRACT]
We propose a nonparametric, kernel-based joint estimator for conditional mean
and covariance matrices in large unbalanced panels. Our estimator, with proven
consistency and finite-sample guarantees, is applied to a comprehensive panel
of monthly US stock excess returns from 1962 to 2021, conditioned on
macroeconomic and firm-specific covariates. The estimator captures time-varying
cross-sectional dependencies effectively, demonstrating robust statistical
performance. In asset pricing, it generates conditional mean-variance efficient
portfolios with out-of-sample Sharpe ratios that substantially exceed those of
equal-weighted benchmarks.
[LINK]
http://arxiv.org/abs/2410.21858v3
[DATE]
2024-11-14 18:54:53+08:00
[CATEGORIES]
cs.LG
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
[AUTHORS]
Yuang Cai, Yuyu Yuan, Jinsheng Shi, Qinhong Lin
[ABSTRACT]
The alignment of large language models (LLMs) is crucial for generating
helpful and harmless content. Existing approaches leverage preference-based
human feedback data to learn the reward function and align the LLM with the
feedback data. However, these approaches focus on modeling the reward
difference between the chosen and rejected demonstrations, rather than directly
modeling the true reward from each demonstration. Moreover, these approaches
assume that the reward is only obtained at the end of the sentence, which
overlooks the modeling of intermediate rewards. These issues lead to
insufficient use of training signals in the feedback data, limiting the
representation and generalization ability of the reward and potentially
resulting in reward hacking. In this paper, we formulate LLM alignment as a
Bayesian Inverse Reinforcement Learning (BIRL) problem and propose a novel
training objective, Approximated Variational Alignment (AVA), to perform LLM
alignment through Approximated Variational Reward Imitation Learning (AVRIL).
The BIRL formulation facilitates intermediate reward modeling and direct reward
modeling on each single demonstration, which enhances the utilization of
training signals in the feedback data. Experiments show that AVA outperforms
existing LLM alignment approaches in reward modeling, RL fine-tuning, and
direct optimization.
[LINK]
http://arxiv.org/abs/2411.09341v1
[DATE]
2024-11-14 18:37:34+08:00
[CATEGORIES]
cs.LG
Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs
[AUTHORS]
Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori
[ABSTRACT]
Printed Electronics (PE) stands out as a promising technology for widespread
computing due to its distinct attributes, such as low costs and flexible
manufacturing. Unlike traditional silicon-based technologies, PE enables
stretchable, conformal, and non-toxic hardware. However, PE is constrained by
larger feature sizes, making it challenging to implement complex circuits such
as machine learning (ML) classifiers. Approximate computing has been proven to
reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs).
In this paper, we maximize the benefits of approximate computing by integrating
hardware approximation into the MLP training process. Due to the discrete
nature of hardware approximation, we propose and implement a genetic-based,
approximate, hardware-aware training approach specifically designed for printed
MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction
compared to the baseline while outperforming state-of-the-art approximate and
stochastic printed MLPs.
[COMMENTS]
Accepted for publication at the 27th Design, Automation and Test in
Europe Conference (DATE’24), Mar 25-27 2024, Valencia, Spain
[LINK]
http://arxiv.org/abs/2402.02930v2
[DATE]
2024-11-14 18:24:06+08:00
[CATEGORIES]
cs.LG
Improving hp-Variational Physics-Informed Neural Networks for Steady-State Convection-Dominated Problems
[AUTHORS]
Thivin Anandh, Divij Ghose, Himanshu Jain, Pratham Sunkad, Sashikumaar Ganesan, Volker John
[ABSTRACT]
This paper proposes and studies two extensions of applying hp-variational
physics-informed neural networks, more precisely the FastVPINNs framework, to
convection-dominated convection-diffusion-reaction problems. First, a term in
the spirit of a SUPG stabilization is included in the loss functional and a
network architecture is proposed that predicts spatially varying stabilization
parameters. Having observed that the selection of the indicator function in
hard-constrained Dirichlet boundary conditions has a big impact on the accuracy
of the computed solutions, the second novelty is the proposal of a network
architecture that learns good parameters for a class of indicator functions.
Numerical studies show that both proposals lead to noticeably more accurate
results than approaches that can be found in the literature.
[COMMENTS]
25 pages, 11 figures, 8 tables
[LINK]
http://arxiv.org/abs/2411.09329v1
[DATE]
2024-11-14 18:21:41+08:00
[CATEGORIES]
cs.LG
Early-Scheduled Handover Preparation in 5G NR Millimeter-Wave Systems
[AUTHORS]
Dino Pjanić, Alexandros Sopasakis, Andres Reial, Fredrik Tufvesson
[ABSTRACT]
The handover (HO) procedure is one of the most critical functions in a
cellular network driven by measurements of the user channel of the serving and
neighboring cells. The success rate of the entire HO procedure is significantly
affected by the preparation stage. As massive Multiple-Input Multiple-Output
(MIMO) systems with large antenna arrays allow resolving finer details of
channel behavior, we investigate how machine learning can be applied to time
series data of beam measurements in the Fifth Generation (5G) New Radio (NR)
system to improve the HO procedure. This paper introduces the Early-Scheduled
Handover Preparation scheme designed to enhance the robustness and efficiency
of the HO procedure, particularly in scenarios involving high mobility and
dense small cell deployments. Early-Scheduled Handover Preparation focuses on
optimizing the timing of the HO preparation phase by leveraging machine
learning techniques to predict the earliest possible trigger points for HO
events. We identify a new early trigger for HO preparation and demonstrate how
it can beneficially reduce the required time for HO execution, reducing channel
quality degradation. These insights enable a new HO preparation scheme that
offers novel, user-aware, and proactive HO decision making in MIMO scenarios
incorporating mobility.
[LINK]
http://arxiv.org/abs/2411.09720v1
[DATE]
2024-11-14 18:19:49+08:00
[CATEGORIES]
cs.LG
ResBit: Residual Bit Vector for Categorical Values
[AUTHORS]
Masane Fuchi, Amar Zanashir, Hiroto Minami, Tomohiro Takagi
[ABSTRACT]
One-hot vectors, a common method for representing discrete/categorical data
in machine learning, are widely used because of their simplicity and
intuitiveness. However, one-hot vectors suffer from a linear increase in
dimensionality, posing computational and memory challenges, especially when
dealing with datasets containing numerous categories. In this paper, we focus
on tabular data generation, and reveal that multinomial diffusion faces the mode
collapse phenomenon when the cardinality is high. Moreover, due to the
limitations of one-hot vectors, the training phase takes longer in such a
situation. To address these issues, we propose Residual Bit Vectors (ResBit), a
technique for densely representing categorical data. ResBit is an extension of
analog bits and overcomes limitations of analog bits when applied to tabular
data generation. Our experiments demonstrate that ResBit not only accelerates
training but also maintains performance when compared with the situations
before applying ResBit. Furthermore, our results indicate that many existing
methods struggle with high-cardinality data, underscoring the need for
lower-dimensional representations, such as ResBit and latent vectors.
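
The dimensionality argument can be illustrated with plain analog bits, the
representation that ResBit extends. The sketch below is not ResBit itself and
is not taken from the paper. It assumes the usual analog-bits convention of
writing a category index in binary and mapping each bit to -1.0 or +1.0; the
helper names are hypothetical.

```python
def num_bits(cardinality):
    """Number of bits needed to index `cardinality` categories."""
    return max(1, (cardinality - 1).bit_length())


def to_analog_bits(index, cardinality):
    """Encode a category index as a dense vector in {-1.0, +1.0}."""
    n = num_bits(cardinality)
    # Most significant bit first; bit 0 maps to -1.0 and bit 1 to +1.0.
    return [2.0 * ((index >> k) & 1) - 1.0 for k in reversed(range(n))]


def from_analog_bits(bits):
    """Decode by thresholding each real-valued bit at zero."""
    index = 0
    for b in bits:
        index = (index << 1) | (1 if b > 0 else 0)
    return index


# A 1000-category column needs 1000 one-hot dimensions but only 10 bits.
code = to_analog_bits(999, 1000)
assert len(code) == 10 and from_analog_bits(code) == 999
```

When the cardinality is not a power of two, a generated bit vector can decode
to an index outside the valid range (up to 1023 in the example above).
Whether this is one of the limitations the abstract alludes to is not stated
here.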
[COMMENTS]
25 pages, 29 tables, and 10 figures
[LINK]
http://arxiv.org/abs/2309.17196v4
[DATE]
2024-11-14 18:16:35+08:00
[CATEGORIES]
cs.LG
How to Boost Any Loss Function
[AUTHORS]
Richard Nock, Yishay Mansour
[ABSTRACT]
Boosting is a highly successful ML-born optimization setting in which one is
required to computationally efficiently learn arbitrarily good models based on
the access to a weak learner oracle, providing classifiers performing at least
slightly differently from random guessing. A key difference with gradient-based
optimization is that boosting’s original model does not require access to
first order information about a loss, yet the decades long history of boosting
has quickly evolved it into a first order optimization setting – sometimes
even wrongfully defining it as such. Owing to recent progress extending
gradient-based optimization to use only a loss’ zeroth ($0^{th}$) order
information to learn, this begs the question: what loss functions can be
efficiently optimized with boosting and what is the information really needed
for boosting to meet the original boosting blueprint’s requirements?
We provide a constructive formal answer essentially showing that any loss
function can be optimized with boosting and thus boosting can achieve a feat
not yet known to be possible in the classical $0^{th}$ order setting, since
loss functions are not required to be convex, nor differentiable or
Lipschitz – and in fact not required to be continuous either. Some tools we
use are rooted in quantum calculus, the mathematical field – not to be
confounded with quantum computation – that studies calculus without passing to
the limit, and thus without using first order information.
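
As a small illustration of calculus without passing to the limit, the sketch
below implements the h-derivative from quantum calculus,
D_h f(x) = (f(x + h) - f(x)) / h, for a fixed nonzero offset h. It only
illustrates this one notion and is not the boosting algorithm of the paper;
the function name and the example losses are hypothetical.

```python
def h_derivative(f, x, h):
    """Secant slope of f between x and x + h; no limit h -> 0 is taken."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


def step(x):
    # Discontinuous at 0, so it has no classical derivative there.
    return 1.0 if x >= 0 else 0.0


# For f(x) = x^2 the h-derivative is 2x + h, not 2x.
slope_square = h_derivative(square, 3.0, 0.5)   # 6.5
# The h-derivative stays well defined across a discontinuity.
slope_step = h_derivative(step, -0.25, 0.5)     # 2.0
```

Because only function values are used, the quantity is defined for losses
that are neither differentiable nor continuous, which matches the class of
losses the abstract describes.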
[COMMENTS]
NeurIPS’24
[LINK]
http://arxiv.org/abs/2407.02279v2
[DATE]
2024-11-14 18:15:35+08:00
[CATEGORIES]
cs.LG
Pie: Pooling CPU Memory for LLM Inference
[AUTHORS]
Yi Xu, Ziming Mao, Xiangxi Mo, Shu Liu, Ion Stoica
[ABSTRACT]
The rapid growth of LLMs has revolutionized natural language processing and
AI analysis, but their increasing size and memory demands present significant
challenges. A common solution is to spill over to CPU memory; however,
traditional GPU-CPU memory swapping often results in higher latency and lower
throughput.
This paper introduces Pie, an LLM inference framework that addresses these
challenges with performance-transparent swapping and adaptive expansion. By
leveraging predictable memory access patterns and the high bandwidth of modern
hardware like the NVIDIA GH200 Grace Hopper Superchip, Pie enables concurrent
data swapping without affecting foreground computation, expanding effective
memory without added latency. Adaptive expansion dynamically adjusts CPU memory
allocation based on real-time information, optimizing memory usage and
performance under varying conditions.
Pie maintains low computation latency, high throughput, and high elasticity.
Our experimental evaluation demonstrates that Pie achieves an optimal swapping
policy during cache warmup and effectively balances increased memory capacity
with negligible impact on computation. With its extended capacity, Pie
outperforms vLLM by up to 1.9X in throughput and 2X in latency. Additionally,
Pie can reduce GPU memory usage by up to 1.67X while maintaining the same
performance. Compared to FlexGen, an offline profiling-based swapping solution,
Pie achieves magnitudes lower latency and 9.4X higher throughput.
[LINK]
http://arxiv.org/abs/2411.09317v1
[DATE]
2024-11-14 17:50:41+08:00
[CATEGORIES]
cs.LG
Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations
[AUTHORS]
Jargalmaa Batmunkh, Yusuke Iida, Takayoshi Oba, Haruhisa Iijima
[ABSTRACT]
The complex structure and extensive details of solar spectral data, combined
with a recent surge in volume, present significant processing challenges. To
address this, we propose a deep learning-based compression technique using deep
autoencoder (DAE) and 1D-convolutional autoencoder (CAE) models developed with
Hinode SOT/SP data. We focused on compressing Stokes I and V polarization
spectra from the quiet Sun, as well as from active regions, providing a novel
insight into comprehensive spectral analysis by incorporating spectra from
extreme magnetic fields. The results indicate that the CAE model outperforms
the DAE model in reconstructing Stokes profiles, demonstrating greater
robustness and achieving reconstruction errors around the observational noise
level. The proposed method has proven effective in compressing Stokes I and V
spectra from both the quiet Sun and active regions, highlighting its potential
for impactful applications in solar spectral analysis, such as detection of
unusual spectral signals.
[LINK]
http://arxiv.org/abs/2411.09311v1
[DATE]
2024-11-14 17:38:41+08:00
[CATEGORIES]
cs.LG
Hierarchical mixtures of Unigram models for short text clustering: the role of Beta-Liouville priors
[AUTHORS]
Massimo Bilancia, Samuele Magro
[ABSTRACT]
This paper presents a variant of the Multinomial mixture model tailored for
the unsupervised classification of short text data. Traditionally, the
Multinomial probability vector in this hierarchical model is assigned a
Dirichlet prior distribution. Here, however, we explore an alternative
prior, the Beta-Liouville distribution, which offers a more flexible
correlation structure than the Dirichlet. We examine the theoretical properties
of the Beta-Liouville distribution, focusing on its conjugacy with the
Multinomial likelihood. This property enables the derivation of update
equations for a CAVI (Coordinate Ascent Variational Inference) variational
algorithm, facilitating the approximate posterior estimation of model
parameters. Additionally, we propose a stochastic variant of the CAVI algorithm
that enhances scalability. The paper concludes with data examples that
demonstrate effective strategies for setting the Beta-Liouville
hyperparameters.
[COMMENTS]
32 pages, 4 figures. Submitted
[LINK]
http://arxiv.org/abs/2410.21862v2
[DATE]
2024-11-14 17:17:31+08:00
[CATEGORIES]
cs.LG
Enhancing generalization in high energy physics using white-box adversarial attacks
[AUTHORS]
Franck Rothen, Samuel Klein, Matthew Leigh, Tobias Golling
[ABSTRACT]
Machine learning is becoming increasingly popular in the context of particle
physics. Supervised learning, which uses labeled Monte Carlo (MC) simulations,
remains one of the most widely used methods for discriminating signals beyond
the Standard Model. However, this paper suggests that supervised models may
depend excessively on artifacts and approximations from Monte Carlo
simulations, potentially limiting their ability to generalize well to real
data. This study aims to enhance the generalization properties of supervised
models by reducing the sharpness of local minima. It reviews the application of
four distinct white-box adversarial attacks in the context of classifying Higgs
boson decay signals. The attacks are divided into weight space attacks and
feature space attacks. To study and quantify the sharpness of different local
minima, this paper presents two analysis methods: gradient ascent and reduced
Hessian eigenvalue analysis. The results show that white-box adversarial
attacks significantly improve generalization performance, albeit with increased
computational complexity.
[COMMENTS]
10 pages, 4 figures, 8 tables, 3 algorithms, to be published in
Physical Review D (PRD), presented at the ML4Jets 2024 conference
[LINK]
http://arxiv.org/abs/2411.09296v1
[DATE]
2024-11-14 17:15:28+08:00
[CATEGORIES]
cs.LG
An improved tabular data generator with VAE-GMM integration
[AUTHORS]
Patricia A. Apellániz, Juan Parras, Santiago Zazo
[ABSTRACT]
The rising use of machine learning in various fields requires robust methods
to create synthetic tabular data. Data should preserve key characteristics
while addressing data scarcity challenges. Current approaches based on
Generative Adversarial Networks, such as the state-of-the-art CTGAN model,
struggle with the complex structures inherent in tabular data. These data often
contain both continuous and discrete features with non-Gaussian distributions.
Therefore, we propose a novel Variational Autoencoder (VAE)-based model that
addresses these limitations. Inspired by the TVAE model, our approach
incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE
architecture. This avoids the limitations imposed by assuming a strictly
Gaussian latent space, allowing for a more accurate representation of the
underlying data distribution during data generation. Furthermore, our model
offers enhanced flexibility by allowing the use of various differentiable
distributions for individual features, making it possible to handle both
continuous and discrete data types. We thoroughly validate our model on three
real-world datasets with mixed data types, including two medically relevant
ones, based on their resemblance and utility. This evaluation demonstrates
significant outperformance against CTGAN and TVAE, establishing its potential
as a valuable tool for generating synthetic tabular data in various domains,
particularly in healthcare.
[COMMENTS]
7 pages, 3 figures
[LINK]
http://arxiv.org/abs/2404.08434v2
[DATE]
2024-11-14 17:11:26+08:00
[CATEGORIES]
cs.LG
A Centralized-Distributed Transfer Model for Cross-Domain Recommendation Based on Multi-Source Heterogeneous Transfer Learning
[AUTHORS]
Ke Xu, Ziliang Wang, Wei Zheng, Yuhao Ma, Chenglin Wang, Nengxue Jiang, Cai Cao
[ABSTRACT]
Cross-domain recommendation (CDR) methods are proposed to tackle the sparsity
problem in click through rate (CTR) estimation. Existing CDR methods directly
transfer knowledge from the source domains to the target domain and ignore the
heterogeneities among domains, including feature dimensional heterogeneity and
latent space heterogeneity, which may lead to negative transfer. Besides, most
of the existing methods are based on single-source transfer, which cannot
simultaneously utilize knowledge from multiple source domains to further
improve the model performance in the target domain. In this paper, we propose a
centralized-distributed transfer model (CDTM) for CDR based on multi-source
heterogeneous transfer learning. To address the issue of feature dimension
heterogeneity, we build a dual embedding structure: domain specific embedding
(DSE) and global shared embedding (GSE) to model the feature representation in
the single domain and the commonalities in the global space, separately. To
solve the latent space heterogeneity, the transfer matrix and attention
mechanism are used to map and combine DSE and GSE adaptively. Extensive offline
and online experiments demonstrate the effectiveness of our model.
[COMMENTS]
Published in: 2022 IEEE International Conference on Data Mining
(ICDM) (The authors were affiliated with Hangzhou NetEase Cloud Music Technology
Co., Ltd.)
[LINK]
http://arxiv.org/abs/2411.09286v1
[DATE]
2024-11-14 16:53:23+08:00
[CATEGORIES]
cs.LG
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
[AUTHORS]
Jintao Zhang, Jia wei, Haofeng Huang, Pengle Zhang, Jun Zhu, Jianfei Chen
[ABSTRACT]
The transformer architecture predominates across various models. As the heart
of the transformer, attention has a computational complexity of O(N^2),
compared to O(N) for linear transformations. When handling large sequence
lengths, attention becomes the primary time-consuming component. Although
quantization has proven to be an effective method for accelerating model
inference, existing quantization methods primarily focus on optimizing the
linear layer. In response, we first analyze the feasibility of quantization in
attention in detail. Following that, we propose SageAttention, a highly
efficient and accurate quantization method for attention. The OPS (operations
per second) of our approach outperforms FlashAttention2 and xformers by about
2.1 times and 2.7 times, respectively. SageAttention also achieves superior
accuracy performance over FlashAttention3. Comprehensive experiments confirm
that our approach incurs almost no end-to-end metrics loss across diverse
models, including those for large language processing, image generation, and
video generation. The codes are available at
https://github.com/thu-ml/SageAttention.
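As a rough illustration of what 8-bit quantization of attention scores involves, the sketch below applies symmetric INT8 quantization to one query vector and one key vector, then compares the rescaled integer dot product with the full-precision one. It is a generic textbook scheme and not SageAttention's actual method. The function names `quantize_int8` and `int8_dot` and the sample vectors are hypothetical.

```python
def quantize_int8(vec):
    """Symmetric quantization: map floats to integers in [-127, 127] using one scale."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(x / scale))) for x in vec]
    return quantized, scale


def int8_dot(q_vec, k_vec):
    """Approximate dot(q_vec, k_vec) with integer products and a single rescale."""
    q_int, q_scale = quantize_int8(q_vec)
    k_int, k_scale = quantize_int8(k_vec)
    acc = sum(a * b for a, b in zip(q_int, k_int))  # integer accumulation
    return acc * q_scale * k_scale


# Illustrative values only (assumed, not taken from the paper).
query = [0.12, -0.50, 0.33, 0.90]
key = [0.70, 0.10, -0.25, 0.40]
exact = sum(a * b for a, b in zip(query, key))
approx = int8_dot(query, key)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The gap between the two values is the quantization error that an accurate 8-bit attention scheme has to keep small across all query-key pairs.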
[LINK]
http://arxiv.org/abs/2410.02367v2
[DATE]
2024-11-14 16:39:54+08:00
[CATEGORIES]
cs.LG
Comparative Evaluation of Clustered Federated Learning Methods
[AUTHORS]
Michael Ben Ali, Omar El-Rifai, Imen Megdiche, André Peninou, Olivier Teste
[ABSTRACT]
Over recent years, Federated Learning (FL) has proven to be one of the most
promising methods of distributed learning which preserves data privacy. As the
method evolved and was confronted with various real-world scenarios, new
challenges have emerged. One such challenge is the presence of highly
heterogeneous (often referred to as non-IID) data distributions among participants
of the FL protocol. A popular solution to this hurdle is Clustered Federated
Learning (CFL), which aims to partition clients into groups where the
distributions are homogeneous. In the literature, state-of-the-art CFL
algorithms are often tested using a few cases of data heterogeneities, without
systematically justifying the choices. Further, the taxonomy used for
differentiating the different heterogeneity scenarios is not always
straightforward. In this paper, we explore the performance of two
state-of-the-art CFL algorithms with respect to a proposed taxonomy of data
heterogeneities in federated learning (FL). We work with three image
classification datasets and analyze the resulting clusters against the
heterogeneity classes using extrinsic clustering metrics. Our objective is to
provide a clearer understanding of the relationship between CFL performances
and data heterogeneity scenarios.
[LINK]
http://arxiv.org/abs/2410.14212v2
[DATE]
2024-11-14 16:31:10+08:00
[CATEGORIES]
cs.LG
Towards efficient compression and communication for prototype-based decentralized learning
[AUTHORS]
Pablo Fernández-Piñeiro, Manuel Ferández-Veiga, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Martín González-Soto
[ABSTRACT]
In prototype-based federated learning, the exchange of model parameters
between clients and the master server is replaced by transmission of prototypes
or quantized versions of the data samples to the aggregation server. A fully
decentralized deployment of prototype-based learning, without a central
aggregator of prototypes, is more robust to network failures and reacts
faster to changes in the statistical distribution of the data, suggesting
potential advantages and quick adaptation in dynamic learning tasks, e.g., when
the data sources are IoT devices or when data is non-iid. In this paper, we
consider the problem of designing a communication-efficient decentralized
learning system based on prototypes. We address the challenge of prototype
redundancy by leveraging a twofold data compression technique, i.e., sending
update messages only if the prototypes are information-theoretically useful (via
the Jensen-Shannon distance), and using clustering on the prototypes to
compress the update messages used in the gossip protocol. We also use parallel
instead of sequential gossiping, and present an analysis of its
age-of-information (AoI). Our experimental results show that, with these
improvements, the communications load can be substantially reduced without
decreasing the convergence rate of the learning algorithm.
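The Jensen-Shannon distance mentioned above can be sketched as follows, assuming each prototype is summarized as a discrete probability distribution (a hypothetical representation; the abstract does not say how prototypes are compared). The `should_send_update` rule and its threshold are likewise illustrative assumptions, not the authors' protocol.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base-2 logs)."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def should_send_update(old_dist, new_dist, threshold=0.1):
    """Hypothetical gating rule: gossip only when the prototype changed enough."""
    return js_distance(old_dist, new_dist) > threshold
```

With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (disjoint supports), which makes a fixed threshold easy to interpret.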
[COMMENTS]
15 pages, 2 tables, 7 figures, 6 algorithms
[LINK]
http://arxiv.org/abs/2411.09267v1
[DATE]
2024-11-14 16:08:25+08:00
[CATEGORIES]
cs.LG
Rethinking Weight-Averaged Model-merging
[AUTHORS]
Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub
[ABSTRACT]
Weight-averaged model-merging has emerged as a powerful approach in deep
learning, capable of enhancing model performance without fine-tuning or
retraining. However, the underlying mechanisms that explain its effectiveness
remain largely unexplored. In this paper, we investigate this technique from
three novel perspectives to provide deeper insights into how and why
weight-averaged model-merging works: (1) we examine the intrinsic patterns
captured by the learning of the model weights, through the visualizations of
their patterns on several datasets, showing that these weights often encode
structured and interpretable patterns; (2) we investigate model ensemble
merging strategies based on averaging on weights versus averaging on features,
providing detailed analyses across diverse architectures and datasets; and (3)
we explore the impact on model-merging prediction stability in terms of
changing the parameter magnitude, revealing insights into the way weight
averaging works as regularization by showing the robustness across different
parameter scales. Our findings shed light on the “black box” of weight-averaged
model-merging, offering valuable insights and practical recommendations that
advance the model-merging process.
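The distinction in point (2) between averaging on weights and averaging on features can be seen in a toy case. The sketch below assumes a one-unit ReLU model with hand-picked weights; every name and value is hypothetical and unrelated to the paper's experiments.

```python
def relu(value):
    return max(0.0, value)


def predict(weights, inputs):
    """One-unit model: ReLU of a weighted sum."""
    return relu(sum(w * x for w, x in zip(weights, inputs)))


def average_weights(weights_a, weights_b):
    """Merge two models by averaging their parameters element-wise."""
    return [(a + b) / 2 for a, b in zip(weights_a, weights_b)]


def average_features(weights_a, weights_b, inputs):
    """Ensemble two models by averaging their outputs."""
    return (predict(weights_a, inputs) + predict(weights_b, inputs)) / 2


# Illustrative weights chosen so the two strategies disagree.
weights_a = [1.0, -2.0]
weights_b = [-1.0, 2.0]
inputs = [1.0, 0.0]
merged_output = predict(average_weights(weights_a, weights_b), inputs)
ensembled_output = average_features(weights_a, weights_b, inputs)
print(merged_output, ensembled_output)
```

For a purely linear model the two outputs would coincide. The nonlinearity is what makes weight averaging and feature averaging different operations.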
[LINK]
http://arxiv.org/abs/2411.09263v1
[DATE]
2024-11-14 16:02:14+08:00
[CATEGORIES]
cs.LG
xPerT: Extended Persistence Transformer
[AUTHORS]
Sehun Kim
[ABSTRACT]
A persistence diagram provides a compact summary of persistent homology,
which captures the topological features of a space at different scales.
However, due to its nature as a set, incorporating it as a feature into a
machine learning framework is challenging. Several methods have been proposed
to use persistence diagrams as input for machine learning models, but they
often require complex preprocessing steps and extensive hyperparameter tuning.
In this paper, we propose a novel transformer architecture called the
Extended Persistence Transformer (xPerT), which is more scalable
than Persformer, an existing transformer for persistence
diagrams. xPerT reduces GPU memory usage by over 90% and improves accuracy on
multiple datasets. Additionally, xPerT does not require complex preprocessing
steps or extensive hyperparameter tuning, making it easy to use in practice.
Our code is available at https://github.com/sehunfromdaegu/xpert.
[LINK]
http://arxiv.org/abs/2410.14193v2
[DATE]
2024-11-14 15:58:03+08:00
[CATEGORIES]
cs.LG
FluidML: Fast and Memory Efficient Inference Optimization
[AUTHORS]
Jinjie Liu, Hang Qiu
[ABSTRACT]
Machine learning models deployed on edge devices have enabled numerous
exciting new applications, such as humanoid robots, AR glasses, and autonomous
vehicles. However, the computing resources available on these edge devices are
not catching up with the ever-growing number of parameters in these models. As
the models become bigger and more complicated, the novel yet sophisticated
structure challenges the inference runtime optimization. We present FluidML, a
generic runtime memory management and optimization framework that can flexibly
transform the model execution blueprint to achieve faster and more
memory-efficient inference. Evaluations across different platforms show that
FluidML can consistently reduce the end-to-end inference latency by up to
25.38% for popular language models and reduce peak memory usage by up to
41.47%, compared to state-of-the-art approaches. FluidML consists of ~30K lines of
code, is built for general-purpose usage, and will be released as an open-source
inference runtime optimization framework to the community.
[LINK]
http://arxiv.org/abs/2411.09242v1
[DATE]
2024-11-14 15:16:23+08:00
[CATEGORIES]
cs.LG
Rethinking the “Heatmap + Monte Carlo Tree Search” Paradigm for Solving Large Scale TSP
[AUTHORS]
Xuanhao Pan, Chenguang Wang, Chaolong Ying, Ye Xue, Tianshu Yu
[ABSTRACT]
The Travelling Salesman Problem (TSP) remains a fundamental challenge in
combinatorial optimization, inspiring diverse algorithmic strategies. This
paper revisits the “heatmap + Monte Carlo Tree Search (MCTS)” paradigm that has
recently gained traction for learning-based TSP solutions. Within this
framework, heatmaps encode the likelihood of edges forming part of the optimal
tour, and MCTS refines this probabilistic guidance to discover optimal
solutions. Contemporary approaches have predominantly emphasized the refinement
of heatmap generation through sophisticated learning models, inadvertently
sidelining the critical role of MCTS. Our extensive empirical analysis reveals
two pivotal insights: 1) The configuration of MCTS strategies profoundly
influences the solution quality, demanding meticulous tuning to leverage their
full potential; 2) Our findings demonstrate that a rudimentary and
parameter-free heatmap, derived from the intrinsic $k$-nearest nature of TSP,
can rival or even surpass the performance of complicated heatmaps, with strong
generalizability across various scales. Empirical evaluations across various
TSP scales underscore the efficacy of our approach, achieving competitive
results. These observations challenge the prevailing focus on heatmap
sophistication, advocating a reevaluation of the paradigm to harness both
components synergistically. Our code is available at:
https://github.com/LOGO-CUHKSZ/rethink_mcts_tsp.
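One plausible reading of a parameter-free heatmap based on k-nearest neighbours is sketched below: each city spreads uniform weight over its k closest cities. The abstract does not specify the exact construction, so the function `knn_heatmap`, the 1/k weighting, and the sample coordinates are all assumptions.

```python
import math


def knn_heatmap(points, k):
    """Return an n x n matrix where row i puts weight 1/k on the k nearest neighbours of i."""
    n = len(points)
    heatmap = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        # Sort all other cities by Euclidean distance from city i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(xi - points[j][0], yi - points[j][1]),
        )
        for j in others[:k]:
            heatmap[i][j] = 1.0 / k
    return heatmap


# Two well-separated pairs of cities on a line (illustrative data).
cities = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
heat = knn_heatmap(cities, k=1)
```

Such a matrix needs no training and can be handed to a search procedure in place of a learned edge-probability heatmap.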
[LINK]
http://arxiv.org/abs/2411.09238v1
[DATE]
2024-11-14 15:13:08+08:00
[CATEGORIES]
cs.LG
The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles
[AUTHORS]
Hanwen Zhang, Dusit Niyato, Wei Zhang, Changyuan Zhao, Hongyang Du, Abbas Jamalipour, Sumei Sun, Yiyang Pei
[ABSTRACT]
With the advancements of generative artificial intelligence (GenAI) models,
their capabilities are expanding significantly beyond content generation and
the models are increasingly being used across diverse applications.
Particularly, GenAI shows great potential in addressing challenges in the
electric vehicle (EV) ecosystem ranging from charging management to
cyber-attack prevention. In this paper, we specifically consider Internet of
electric vehicles (IoEV) and we categorize GenAI for IoEV into four different
layers namely, EV’s battery layer, individual EV layer, smart grid layer, and
security layer. We introduce various GenAI techniques used in each layer of
IoEV applications. Subsequently, public datasets available for training the
GenAI models are summarized. Finally, we provide recommendations for future
directions. This survey not only categorizes the applications of GenAI in IoEV
across different layers but also serves as a valuable resource for researchers
and practitioners by highlighting the design and implementation challenges
within each layer. Furthermore, it provides a roadmap for future research
directions, enabling the development of more robust and efficient IoEV systems
through the integration of advanced GenAI techniques.
[COMMENTS]
25 Pages
[LINK]
http://arxiv.org/abs/2409.15750v3
[DATE]
2024-11-14 14:33:26+08:00
[CATEGORIES]
cs.LG
Classical Verification of Quantum Learning Advantages with Noises
[AUTHORS]
Yinghao Ma, Jiaxi Su, Dong-Ling Deng
[ABSTRACT]
Classical verification of quantum learning allows classical clients to
reliably leverage quantum computing advantages by interacting with untrusted
quantum servers. Yet, current quantum devices available in practice suffer
from a variety of noises, and whether existing classical verification protocols
carry over to noisy scenarios remains unclear. Here, we propose an efficient
classical error rectification algorithm to reconstruct the noise-free results
given by the quantum Fourier sampling circuit with practical constant-level
noises. In particular, we prove that the error rectification algorithm can
restore the heavy Fourier coefficients by using a small number of noisy samples
that scales logarithmically with the problem size. We apply this algorithm to
the agnostic parity learning task with uniform input marginal and prove that
this task can be accomplished in an efficient way on noisy quantum devices with
our algorithm. In addition, we prove that a classical client with access to the
random example oracle can verify the agnostic parity learning results from the
noisy quantum prover in an efficient way, under the condition that the Fourier
coefficients are sparse. Our results demonstrate the feasibility of classical
verification of quantum learning advantages with noises, which provides a
valuable guide for both theoretical studies and practical applications with
current noisy intermediate scale quantum devices.
[COMMENTS]
13 pages 1 figure
[LINK]
http://arxiv.org/abs/2411.09210v1
[DATE]
2024-11-14 14:14:39+08:00
[CATEGORIES]
cs.LG
Ghost-Connect Net: A Generalization-Enhanced Guidance For Sparse Deep Networks Under Distribution Shifts
[AUTHORS]
Mary Isabelle Wisell, Salimeh Yasaei Sekeh
[ABSTRACT]
Sparse deep neural networks (DNNs) excel in real-world applications like
robotics and computer vision, by reducing computational demands that hinder
usability. However, recent studies that aim to boost DNN efficiency by trimming
redundant neurons or filters based on task relevance neglect their
adaptability to distribution shifts. We aim to enhance these existing
techniques by introducing a companion network, Ghost Connect-Net (GC-Net), to
monitor the connections in the original network with distribution
generalization advantage. GC-Net’s weights represent connectivity measurements
between consecutive layers of the original network. After pruning GC-Net, the
pruned locations are mapped back to the original network as pruned connections,
allowing for the combination of magnitude and connectivity-based pruning
methods. Experimental results using common DNN benchmarks, such as CIFAR-10,
Fashion MNIST, and Tiny ImageNet show promising results for hybridizing the
method, and using GC-Net guidance for later layers of a network and direct
pruning on earlier layers. We provide theoretical foundations for GC-Net’s
approach to improving generalization under distribution shifts.
[COMMENTS]
21 pages, 4 figures, 3 subfigures, 42 tables
[LINK]
http://arxiv.org/abs/2411.09199v1
[DATE]
2024-11-14 13:43:42+08:00
[CATEGORIES]
cs.LG
Distributionally Robust Safe Sample Elimination under Covariate Shift
[AUTHORS]
Hiroyuki Hanada, Tatsuya Aoyama, Satoshi Akahane, Tomonari Tanaka, Yoshito Okura, Yu Inatsu, Noriaki Hashimoto, Shion Takeno, Taro Murayama, Hanju Lee, Shinya Kojima, Ichiro Takeuchi
[ABSTRACT]
We consider a machine learning setup where one training dataset is used to
train multiple models across slightly different data distributions. This occurs
when customized models are needed for various deployment environments. To
reduce storage and training costs, we propose the DRSSS method, which combines
distributionally robust (DR) optimization and safe sample screening (SSS). The
key benefit of this method is that models trained on the reduced dataset will
perform the same as those trained on the full dataset for all possible
different environments. In this paper, we focus on covariate shift as a type of
data distribution change and demonstrate the effectiveness of our method
through experiments.
[LINK]
http://arxiv.org/abs/2406.05964v2
[DATE]
2024-11-14 13:00:13+08:00
[CATEGORIES]
cs.LG
No-Regret Learning of Nash Equilibrium for Black-Box Games via Gaussian Processes
[AUTHORS]
Minbiao Han, Fengxue Zhang, Yuxin Chen
[ABSTRACT]
This paper investigates the challenge of learning in black-box games, where
the underlying utility function is unknown to any of the agents. While there is
an extensive body of literature on the theoretical analysis of algorithms for
computing the Nash equilibrium with complete information about the game,
studies on Nash equilibrium in black-box games are less common. In this paper,
we focus on learning the Nash equilibrium when the only available information
about an agent’s payoff comes in the form of empirical queries. We provide a
no-regret learning algorithm that utilizes Gaussian processes to identify the
equilibrium in such games. Our approach not only ensures a theoretical
convergence rate but also demonstrates effectiveness across a varied
collection of games through experimental validation.
[COMMENTS]
40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)
[LINK]
http://arxiv.org/abs/2405.08318v2
[DATE]
2024-11-14 12:52:16+08:00
[CATEGORIES]
cs.LG
Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction
[AUTHORS]
Youngjin Seol, Jaewoong Choi, Seunghyun Lee, Janghyeok Yoon
[ABSTRACT]
Machine learning (ML) models are valuable tools for analyzing the impact of
technology using patent citation information. However, existing ML-based
methods often struggle to account for the dynamic nature of the technology
impact over time and the interdependencies of these impacts across different
periods. This study proposes a multi-task learning (MTL) approach to enhance
the prediction of technology impact across various time frames by leveraging
knowledge sharing and simultaneously monitoring the evolution of technology
impact. First, we quantify the technology impacts and identify patterns through
citation analysis over distinct time periods. Next, we develop MTL models to
predict citation counts using multiple patent indicators over time. Finally, we
examine the changes in key input indicators and their patterns over different
periods using the SHapley Additive exPlanation method. We also offer guidelines
for validating and interpreting the results by employing statistical methods
and natural language processing techniques. A case study on battery
technologies demonstrates that our approach not only deepens the understanding
of technology impact, but also improves prediction accuracy, yielding valuable
insights for both academia and industry.
[LINK]
http://arxiv.org/abs/2411.09184v1
[DATE]
2024-11-14 12:46:08+08:00
[CATEGORIES]
cs.LG
DeBaTeR: Denoising Bipartite Temporal Graph for Recommendation
[AUTHORS]
Xinyu He, Jose Sepulveda, Mostafa Rahmani, Alyssa Woo, Fei Wang, Hanghang Tong
[ABSTRACT]
Due to the difficulty of acquiring large-scale explicit user feedback,
implicit feedback (e.g., clicks or other interactions) is widely applied as an
alternative source of data, where user-item interactions can be modeled as a
bipartite graph. Due to the noisy and biased nature of implicit real-world
user-item interactions, identifying and rectifying noisy interactions are vital
to enhance model performance and robustness. Previous works on purifying
user-item interactions in collaborative filtering mainly focus on mining the
correlation between user/item embeddings and noisy interactions, neglecting the
benefit of temporal patterns in determining noisy interactions. Time
information, while enhancing the model utility, also bears its natural
advantage in helping to determine noisy edges, e.g., if someone usually watches
horror movies at night and talk shows in the morning, a record of watching a
horror movie in the morning is more likely to be a noisy interaction. Armed with
this observation, we introduce a simple yet effective mechanism for generating
time-aware user/item embeddings and propose two strategies for denoising
bipartite temporal graph in recommender systems (DeBaTeR): the first is through
reweighting the adjacency matrix (DeBaTeR-A), where a reliability score is
defined to reweight the edges through both soft assignment and hard assignment;
the second is through reweighting the loss function (DeBaTeR-L), where weights
are generated to reweight user-item samples in the losses. Extensive
experiments have been conducted to demonstrate the efficacy of our methods and
illustrate how time information indeed helps identify noisy edges.
[LINK]
http://arxiv.org/abs/2411.09181v1
[DATE]
2024-11-14 12:39:30+08:00
[CATEGORIES]
cs.LG
Hybrid deep additive neural networks
[AUTHORS]
Gyu Min Kim, Jeong Min Jeon
[ABSTRACT]
Traditional neural networks (multi-layer perceptrons) have become an
important tool in data science due to their success across a wide range of
tasks. However, their performance is sometimes unsatisfactory, and they often
require a large number of parameters, primarily due to their reliance on the
linear combination structure. Meanwhile, additive regression has been a popular
alternative to linear regression in statistics. In this work, we introduce
novel deep neural networks that incorporate the idea of additive regression.
Our neural networks share architectural similarities with Kolmogorov-Arnold
networks but are based on simpler yet flexible activation and basis functions.
Additionally, we introduce several hybrid neural networks that combine this
architecture with that of traditional neural networks. We derive their
universal approximation properties and demonstrate their effectiveness through
simulation studies and a real-data application. The numerical results indicate
that our neural networks generally achieve better performance than traditional
neural networks while using fewer parameters.
[COMMENTS]
29 pages, 13 figures
[LINK]
http://arxiv.org/abs/2411.09175v1
[DATE]
2024-11-14 12:26:47+08:00
[CATEGORIES]
cs.LG
Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance
[AUTHORS]
Md Fahim Anjum
[ABSTRACT]
Recent advances in image generation, particularly via diffusion models, have
led to impressive improvements in image synthesis quality. Despite this,
diffusion models are still challenged by model-induced artifacts and limited
stability in image fidelity. In this work, we hypothesize that the primary
cause of this issue is the improper resampling operation that introduces
aliasing in the diffusion model and that careful alias-free resampling dictated by
image processing theory can improve the model’s performance in image synthesis.
We propose the integration of alias-free resampling layers into the UNet
architecture of diffusion models without adding extra trainable parameters,
thereby maintaining computational efficiency. We then assess whether these
theory-driven modifications enhance image quality and rotational equivariance.
Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and
MNIST-M, reveal consistent gains in image quality, particularly in terms of FID
and KID scores. Furthermore, we propose a modified diffusion process that
enables user-controlled rotation of generated images without requiring
additional training. Our findings highlight the potential of theory-driven
enhancements such as alias-free resampling in generative models to improve
image quality while maintaining model efficiency and pioneer future research
directions to incorporate them into video-generating diffusion models, enabling
deeper exploration of the applications of alias-free resampling in generative
modeling.
[COMMENTS]
13 pages, 7 figures
[LINK]
http://arxiv.org/abs/2411.09174v1
[DATE]
2024-11-14 12:23:28+08:00
[CATEGORIES]
cs.LG
Online Budgeted Matching with General Bids
[AUTHORS]
Jianyi Yang, Pengfei Li, Adam Wierman, Shaolei Ren
[COMMENTS]
Accepted by NeurIPS 2024
[LINK]
http://arxiv.org/abs/2411.04204v2
[DATE]
2024-11-14 12:14:55+08:00
[CATEGORIES]
cs.LG
Towards Scalable Handwriting Communication via EEG Decoding and Latent Embedding Integration
[AUTHORS]
Jun-Young Kim, Deok-Seon Kim, Seo-Hyun Lee
[ABSTRACT]
In recent years, brain-computer interfaces have made advances in decoding
various motor-related tasks, including gesture recognition and movement
classification, utilizing electroencephalogram (EEG) data. These developments
are fundamental in exploring how neural signals can be interpreted to recognize
specific physical actions. This study centers on a written alphabet
classification task, where we aim to decode EEG signals associated with
handwriting. To achieve this, we incorporate hand kinematics to guide the
extraction of the consistent embeddings from high-dimensional neural recordings
using auxiliary variables (CEBRA). These CEBRA embeddings, along with the EEG,
are processed by a parallel convolutional neural network model that extracts
features from both data sources simultaneously. The model classifies nine
different handwritten characters, including symbols such as exclamation marks
and commas, within the alphabet. We evaluate the model using a quantitative
five-fold cross-validation approach and explore the structure of the embedding
space through visualizations. Our approach achieves a classification accuracy
of 91% for the nine-class task, demonstrating the feasibility of fine-grained
handwriting decoding from EEG.
[COMMENTS]
4 pages, 2 figures, 1 table, Name of Conference: International
Conference on Brain-Computer Interface
[LINK]
http://arxiv.org/abs/2411.09170v1
[DATE]
2024-11-14 12:12:47+08:00
[CATEGORIES]
cs.LG
GRAINRec: Graph and Attention Integrated Approach for Real-Time Session-Based Item Recommendations
[AUTHORS]
Bhavtosh Rath, Pushkar Chennu, David Relyea, Prathyusha Kanmanth Reddy, Amit Pande
[ABSTRACT]
Recent advancements in session-based recommendation models using deep
learning techniques have demonstrated significant performance improvements.
While they can enhance model sophistication and improve the relevance of
recommendations, they also make it challenging to implement a scalable
real-time solution. To address this challenge, we propose GRAINRec: a Graph
and Attention Integrated session-based recommendation model that generates
recommendations in real-time. Our scope of work is item recommendations in
online retail where a session is defined as an ordered sequence of digital
guest actions, such as page views or adds to cart. The proposed model generates
recommendations by considering the importance of all items in the session
together, letting us predict relevant recommendations dynamically as the
session evolves. We also propose a heuristic approach to implement real-time
inferencing that meets Target platform’s service level agreement (SLA). The
proposed architecture lets us predict relevant recommendations dynamically as
the session evolves, rather than relying on pre-computed recommendations for
each item. Evaluation results of the proposed model show an average improvement
of 1.5% across all offline evaluation metrics. A/B tests done over a 2-week
duration showed an increase of 10% in click through rate and 9% increase in
attributable demand. Extensive ablation studies are also done to understand our
model performance for different parameters.
[COMMENTS]
Accepted to the 2024 IEEE International Conference on Big Data (IEEE
BigData 2024)
[LINK]
http://arxiv.org/abs/2411.09152v1
[DATE]
2024-11-14 11:07:57+08:00
[CATEGORIES]
cs.LG
Laplace Transform Interpretation of Differential Privacy
[AUTHORS]
Rishav Chourasia, Uzair Javaid, Biplap Sikdar
[ABSTRACT]
We introduce a set of useful expressions of Differential Privacy (DP) notions
in terms of the Laplace transform of the privacy loss distribution. Its bare
form expression appears in several related works on analyzing DP, either as an
integral or an expectation. We show that recognizing the expression as a
Laplace transform unlocks a new way to reason about DP properties by exploiting
the duality between time and frequency domains. Leveraging our interpretation,
we connect the $(q, \rho(q))$-Rényi DP curve and the $(\epsilon,
\delta(\epsilon))$-DP curve as being the Laplace and inverse-Laplace transforms
of one another. This connection shows that the Rényi divergence is
well-defined for complex orders $q = \gamma + i \omega$. Using our Laplace
transform-based analysis, we also prove an adaptive composition theorem for
$(\epsilon, \delta)$-DP guarantees that is exactly tight (i.e., matches even in
constants) for all values of $\epsilon$. Additionally, we resolve an issue
regarding symmetry of $f$-DP on subsampling that prevented equivalence across
all functional DP notions.
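The link between the Rényi divergence and a Laplace transform can be seen from standard definitions. The sketch below fixes one pair of output distributions $P$ and $Q$, assumes the privacy loss $L$ has a density $f_L$, and uses the bilateral Laplace transform. The notation is an assumption made for illustration and may differ from the paper's.

```latex
% Privacy loss random variable for outputs X drawn from P:
%   L = \log \frac{dP}{dQ}(X), \qquad X \sim P.
% Renyi divergence of order q (q \neq 1), written via L:
\[
  D_q(P \,\|\, Q)
  = \frac{1}{q-1} \log \mathbb{E}_{X \sim P}\!\left[ \left( \frac{dP}{dQ}(X) \right)^{q-1} \right]
  = \frac{1}{q-1} \log \mathbb{E}\!\left[ e^{(q-1) L} \right].
\]
% Bilateral Laplace transform of the density f_L of L:
\[
  \mathcal{L}\{f_L\}(s) = \int_{-\infty}^{\infty} e^{-s t} f_L(t)\, dt
  = \mathbb{E}\!\left[ e^{-s L} \right].
\]
% Substituting s = 1 - q gives
\[
  D_q(P \,\|\, Q) = \frac{1}{q-1} \log \mathcal{L}\{f_L\}(1 - q).
\]
```

Because a Laplace transform is defined for complex arguments wherever the integral converges, this form suggests how orders $q = \gamma + i\omega$ can be given meaning.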
[LINK]
http://arxiv.org/abs/2411.09142v1
[DATE]
2024-11-14 10:52:47+08:00
[CATEGORIES]
cs.LG
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
[AUTHORS]
Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang
[ABSTRACT]
Recently the state space models (SSMs) with efficient hardware-aware designs,
i.e., the Mamba deep learning model, have shown great potential for long
sequence modeling. Meanwhile building efficient and generic vision backbones
purely upon SSMs is an appealing direction. However, representing visual data
is challenging for SSMs due to the position-sensitivity of visual data and the
requirement of global context for visual understanding. In this paper, we show
that the reliance on self-attention for visual representation learning is not
necessary and propose a new generic vision backbone with bidirectional Mamba
blocks (Vim), which marks the image sequences with position embeddings and
compresses the visual representation with bidirectional state space models. On
ImageNet classification, COCO object detection, and ADE20k semantic
segmentation tasks, Vim achieves higher performance compared to
well-established vision transformers like DeiT, while also demonstrating
significantly improved computation & memory efficiency. For example, Vim is
2.8$\times$ faster than DeiT and saves 86.8% GPU memory when performing batch
inference to extract features on images with a resolution of 1248$\times$1248.
The results demonstrate that Vim is capable of overcoming the computation &
memory constraints on performing Transformer-style understanding for
high-resolution images and it has great potential to be the next-generation
backbone for vision foundation models. Code is available at
https://github.com/hustvl/Vim.
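
The bidirectional idea described above can be illustrated with a minimal sketch. It assumes a plain linear recurrence with fixed matrices `A`, `B`, `C` (hypothetical names), not the selective, hardware-aware Mamba block that Vim actually uses; it only shows how position embeddings plus a forward and a backward scan give every token access to context on both sides.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                      # x has shape (seq_len, d_model)
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

def bidirectional_ssm(tokens, pos_emb, fwd, bwd):
    """Add position embeddings, scan in both directions, and sum the outputs.

    fwd and bwd are (A, B, C) tuples; summing the two directions is an
    assumption made for this sketch.
    """
    x = tokens + pos_emb
    y_fwd = ssm_scan(x, *fwd)              # token t sees tokens 0..t
    y_bwd = ssm_scan(x[::-1], *bwd)[::-1]  # token t sees tokens t..end
    return y_fwd + y_bwd
```

With only the forward scan, the first token would never see the rest of the image sequence; the backward scan supplies that missing context.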
[COMMENTS]
Vision Mamba (Vim) is accepted by ICML 2024. Code is available at
https://github.com/hustvl/Vim
[LINK]
http://arxiv.org/abs/2401.09417v3
[DATE]
2024-11-14 10:00:33+08:00
[CATEGORIES]
cs.LG
Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery
[AUTHORS]
Valentin Frank Ingmar Guenter, Athanasios Sideris
[ABSTRACT]
We propose a novel algorithm for combined unit/filter and layer pruning of
deep neural networks that operates during training and does not require a
pre-trained network. Our algorithm optimally trades off learning
accuracy and pruning levels while balancing layer vs. unit/filter pruning and
computational vs. parameter complexity using only three user-defined
parameters, which are easy to interpret and tune. The optimal network structure
is found as the solution of a stochastic optimization problem over the network
weights and the parameters of variational Bernoulli distributions for 0/1
random variables scaling the units and layers of the network. Pruning occurs
when a variational parameter converges to 0, rendering the corresponding
structure permanently inactive, thus saving computations during training and
prediction. Key contributions of our approach are a cost function
that combines the objectives of prediction accuracy and network pruning in a
computational/parameter complexity-aware manner and the automatic selection of
the many regularization parameters. We show that the solutions of the
optimization problem to which the algorithm converges are deterministic
networks. We analyze the ODE system that underlies our stochastic optimization
algorithm and establish domains of attraction around zero for the dynamics of
the network parameters. These results provide theoretical support for safely
pruning units/filters and/or layers during training and lead to practical
pruning conditions. We evaluate our method on the CIFAR-10/100 and ImageNet
datasets using ResNet architectures and demonstrate that our method improves
upon layer-only or unit-only pruning and competes favorably with combined
unit/filter and layer pruning algorithms requiring pre-trained networks with
respect to pruning ratios and test accuracy.
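
The gating mechanism described above can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the function names, the per-unit gate layout, and the weight-count cost are all assumptions, since the abstract does not give the actual cost function or update rule.

```python
import numpy as np

def gated_layer(x, W, theta, rng):
    """Dense layer whose output units are scaled by 0/1 Bernoulli(theta) gates."""
    z = rng.random(theta.shape) < theta    # one sampled gate per output unit
    return (x @ W) * z

def prune_units(W, theta, tol=0.0):
    """Drop units whose variational parameter has reached (numerically) zero."""
    keep = theta > tol
    return W[:, keep], theta[keep]

def expected_active_weights(W, theta):
    """Expected number of active weights: a simple stand-in complexity term."""
    return W.shape[0] * theta.sum()
```

A unit with `theta == 0` is never sampled as active, so removing it with `prune_units` leaves the layer's output distribution unchanged while saving computation.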
[COMMENTS]
28 pages, 4 figures, 5 tables
[LINK]
http://arxiv.org/abs/2411.09127v1
[DATE]
2024-11-14 10:00:22+08:00
[CATEGORIES]
cs.LG
Neural Graph Simulator for Complex Systems
[AUTHORS]
Hoyun Choi, Sungyeop Lee, B. Kahng, Junghyo Jo
[ABSTRACT]
Numerical simulation is a predominant tool for studying the dynamics in
complex systems, but large-scale simulations are often intractable due to
computational limitations. Here, we introduce the Neural Graph Simulator (NGS)
for simulating time-invariant autonomous systems on graphs. Utilizing a graph
neural network, the NGS provides a unified framework to simulate diverse
dynamical systems with varying topologies and sizes without constraints on
evaluation times through its non-uniform time step and autoregressive approach.
The NGS offers significant advantages over numerical solvers by not requiring
prior knowledge of governing equations and effectively handling noisy or
missing data with a robust training scheme. It demonstrates superior
computational efficiency over conventional methods, improving performance by
over $10^5$ times in stiff problems. Furthermore, it is applied to real traffic
data, forecasting traffic flow with state-of-the-art accuracy. The versatility
of the NGS extends beyond the presented cases, offering numerous potential
avenues for enhancement.
[LINK]
http://arxiv.org/abs/2411.09120v1
[DATE]
2024-11-14 09:41:00+08:00
[CATEGORIES]
cs.LG
FxTS-Net: Fixed-Time Stable Learning Framework for Neural ODEs
[AUTHORS]
Chaoyang Luo, Yan Zou, Wanying Li, Nanjing Huang
[ABSTRACT]
Neural Ordinary Differential Equations (Neural ODEs), a novel class of
methods for modeling big data, link traditional neural networks and
dynamical systems. However, it is challenging to ensure that the dynamical system
reaches a correctly predicted state within a user-defined fixed time. To
address this problem, we propose a new method for training Neural ODEs using
fixed-time stability (FxTS) Lyapunov conditions. Our framework, called
FxTS-Net, is based on the novel FxTS loss (FxTS-Loss) designed on Lyapunov
functions, which aims to encourage convergence to accurate predictions in a
user-defined fixed time. We also provide an innovative approach for
constructing Lyapunov functions to meet various tasks and network architecture
requirements, achieved by leveraging supervised information during training. By
developing a more precise time upper bound estimation for bounded
non-vanishingly perturbed systems, we demonstrate that minimizing FxTS-Loss
guarantees not only FxTS behavior of the dynamics but also robustness to input
perturbations. For optimizing FxTS-Loss, we also propose a learning algorithm, in
which the simulated perturbation sampling method can capture sample points in
critical regions to approximate FxTS-Loss. Experimentally, we find that
FxTS-Net provides better prediction performance and better robustness under
input perturbation.
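
The abstract does not state which Lyapunov condition is used. As an assumption for illustration, one standard fixed-time stability condition from the control literature, which may differ from the one in FxTS-Net, requires a positive definite function $V$ to decay along trajectories as follows, with a settling-time bound that does not depend on the initial state:

```latex
\dot V(x(t)) \le -\alpha\, V(x(t))^{p} - \beta\, V(x(t))^{q},
\qquad \alpha, \beta > 0,\quad 0 < p < 1 < q,
\]
\[
T \le \frac{1}{\alpha (1 - p)} + \frac{1}{\beta (q - 1)} .
```

The second term bounds the time needed to bring $V$ from any large value down to 1 (where the $V^{q}$ term dominates), and the first term bounds the time from 1 to 0 (where the $V^{p}$ term dominates).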
[LINK]
http://arxiv.org/abs/2411.09118v1
[DATE]
2024-11-14 09:37:24+08:00
[CATEGORIES]
cs.LG
Efficiently learning and sampling multimodal distributions with data-based initialization
[AUTHORS]
Frederic Koehler, Holden Lee, Thuy-Duong Vuong
[ABSTRACT]
We consider the problem of sampling a multimodal distribution with a Markov
chain given a small number of samples from the stationary measure. Although
mixing can be arbitrarily slow, we show that if the Markov chain has a $k$th
order spectral gap, initialization from a set of $\tilde O(k/\varepsilon^2)$
samples from the stationary distribution will, with high probability over the
samples, efficiently generate a sample whose conditional law is
$\varepsilon$-close in TV distance to the stationary measure. In particular,
this applies to mixtures of $k$ distributions satisfying a Poincaré
inequality, with faster convergence when they satisfy a log-Sobolev inequality.
Our bounds are stable to perturbations to the Markov chain, and in particular
work for Langevin diffusion over $\mathbb R^d$ with score estimation error, as
well as Glauber dynamics combined with approximation error from
pseudolikelihood estimation. This justifies the success of data-based
initialization for score matching methods despite slow mixing for the data
distribution, and improves and generalizes the results of Koehler and Vuong
(2023) to have linear, rather than exponential, dependence on $k$ and apply to
arbitrary semigroups. As a consequence of our results, we show for the first
time that a natural class of low-complexity Ising measures can be efficiently
learned from samples.
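
Data-based initialization can be illustrated with a small sketch. It assumes a one-dimensional, equal-weight Gaussian mixture with a known exact score and an unadjusted Langevin discretization; the function names and parameters are hypothetical and none of this is taken from the paper.

```python
import numpy as np

def score_mixture(x, means, sigma=1.0):
    """Score (gradient of the log density) of an equal-weight 1-D Gaussian mixture."""
    d = x[:, None] - means[None, :]              # offsets to each mode, shape (n, k)
    logw = -0.5 * (d / sigma) ** 2
    logw -= logw.max(axis=1, keepdims=True)      # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)            # posterior weight of each mode
    return -(w * d).sum(axis=1) / sigma**2

def langevin(x0, score, step, n_steps, rng):
    """Unadjusted Langevin dynamics started from the points in x0."""
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x
```

When the modes are far apart, chains started at a single point almost never cross between modes, but chains started at data samples inherit the mode proportions of the data and only need to mix within each mode.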
[LINK]
http://arxiv.org/abs/2411.09117v1
[DATE]
2024-11-14 09:37:02+08:00
[CATEGORIES]
cs.LG
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
[AUTHORS]
Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
[ABSTRACT]
We study a game between autobidding algorithms that compete in an online
advertising platform. Each autobidder is tasked with maximizing its
advertiser’s total value over multiple rounds of a repeated auction, subject to
budget and return-on-investment constraints. We propose a gradient-based
learning algorithm that is guaranteed to satisfy all constraints and achieves
vanishing individual regret. Our algorithm uses only bandit feedback and can be
used with the first- or second-price auction, as well as with any
“intermediate” auction format. Our main result is that when these autobidders
play against each other, the resulting expected liquid welfare over all rounds
is at least half of the expected optimal liquid welfare achieved by any
allocation. This holds whether or not the bidding dynamics converges to an
equilibrium.
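
The pacing dynamics mentioned above can be sketched in a simplified form. This is not the paper's algorithm: it handles only the budget constraint (no ROI constraint), assumes a second-price auction against a single competing price per round, and uses a dual gradient step on a pacing multiplier `mu`; all names and parameters are hypothetical.

```python
import numpy as np

def paced_bidding(values, prices, budget, eta=0.05):
    """Budget pacing by dual gradient steps in a repeated second-price auction."""
    T = len(values)
    rho = budget / T                 # target spend per round
    mu = 0.0                         # pacing multiplier (dual variable)
    remaining = budget
    total_value = 0.0
    for v, p in zip(values, prices):
        bid = min(v / (1.0 + mu), remaining)    # never bid more than is left
        win = bid >= p
        pay = p if win else 0.0                 # winner pays the competing price
        remaining -= pay
        total_value += v if win else 0.0
        mu = max(0.0, mu + eta * (pay - rho))   # overspending raises mu, lowering bids
    return total_value, budget - remaining
```

Capping each bid at the remaining budget makes the budget constraint hold on every sample path, while the multiplier update spreads spending across rounds.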
[COMMENTS]
Appeared at COLT 2024. Numerical experiments added since Jun’24
version
[LINK]
http://arxiv.org/abs/2301.13306v4
[DATE]
2024-11-14 09:18:01+08:00
[CATEGORIES]
cs.LG