LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
[AUTHORS]
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
[ABSTRACT]
We present LongLoRA, an efficient fine-tuning approach that extends the
context sizes of pre-trained large language models (LLMs), with limited
computation cost. Typically, training LLMs with long context sizes is
computationally expensive, requiring extensive training hours and GPU
resources. For example, training with a context length of 8192 needs 16x the
self-attention computation of training with 2048. In this paper, we
speed up the context extension of LLMs in two aspects. On the one hand,
although dense global attention is needed during inference, fine-tuning the
model can be effectively and efficiently done by sparse local attention. The
proposed shift short attention effectively enables context extension, leading
to non-trivial computation saving with similar performance to fine-tuning with
vanilla attention. Particularly, it can be implemented with only two lines of
code in training, while being optional in inference. On the other hand, we
revisit the parameter-efficient fine-tuning regime for context expansion.
Notably, we find that LoRA for context extension works well under the premise
of trainable embedding and normalization. LongLoRA demonstrates strong
empirical results on various tasks on LLaMA2 models from 7B/13B to 70B.
LongLoRA extends LLaMA2 7B from 4k context to 100k, or LLaMA2 70B to 32k on a
single 8x A100 machine. LongLoRA extends models’ context while retaining their
original architectures, and is compatible with most existing techniques, like
FlashAttention-2. In addition, to make LongLoRA practical, we collect a
dataset, LongQA, for supervised fine-tuning. It contains more than 3k long
context question-answer pairs.
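
The 16x figure quoted in this abstract follows from the quadratic growth of dense self-attention cost with sequence length. A minimal sketch of that arithmetic is given here; the function names and the group size are illustrative assumptions, not taken from the paper or its code.

```python
def dense_attention_cost_ratio(new_len: int, base_len: int) -> float:
    """Cost ratio of dense self-attention, assuming cost grows as length squared."""
    return (new_len / base_len) ** 2


def local_attention_saving(seq_len: int, group_size: int) -> float:
    """Rough saving of attention restricted to groups of `group_size` tokens.

    Assumes dense cost ~ seq_len**2 and grouped cost ~ seq_len * group_size.
    """
    return (seq_len ** 2) / (seq_len * group_size)


print(dense_attention_cost_ratio(8192, 2048))   # 16.0
print(local_attention_saving(8192, 2048))       # 4.0
```

Under these assumptions, restricting attention to local groups during fine-tuning removes most of the quadratic term, which is the kind of saving the abstract attributes to sparse local attention.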
[COMMENTS]
Code, models, dataset, and demo are available at
https://github.com/dvlab-research/LongLoRA
[LINK]
http://arxiv.org/abs/2309.12307v1
[DATE]
2023-09-22 01:59:11+08:00
[CATEGORIES]
cs.CL
cs.LG
Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models
[AUTHORS]
Levon Haroutunian, Zhuang Li, Lucian Galescu, Philip Cohen, Raj Tumuluri, Gholamreza Haffari
[ABSTRACT]
Large language models (LLMs) have demonstrated impressive capabilities in
natural language generation. However, their output quality can be inconsistent,
posing challenges for generating natural language from logical forms (LFs).
This task requires the generated outputs to embody the exact semantics of LFs,
without missing any LF semantics or creating any hallucinations. In this work,
we tackle this issue by proposing a novel generate-and-rerank approach. Our
approach involves initially generating a set of candidate outputs by prompting
an LLM and subsequently reranking them using a task-specific reranker model. In
addition, we curate a manually collected dataset to evaluate the alignment
between different ranking metrics and human judgements. The chosen ranking
metrics are utilized to enhance the training and evaluation of the reranker
model. By conducting extensive experiments on three diverse datasets, we
demonstrate that the candidates selected by our reranker outperform those
selected by baseline methods in terms of semantic consistency and fluency, as
measured by three comprehensive metrics. Our findings provide strong evidence
for the effectiveness of our approach in improving the quality of generated
outputs.
[COMMENTS]
IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.12294v1
[DATE]
2023-09-22 01:54:58+08:00
[CATEGORIES]
cs.CL
On the Relationship between Skill Neurons and Robustness in Prompt Tuning
[AUTHORS]
Leon Ackermann, Xenia Ohmer
[ABSTRACT]
Prompt Tuning is a popular parameter-efficient finetuning method for
pre-trained large language models (PLMs). Recently, based on experiments with
RoBERTa, it has been suggested that Prompt Tuning activates specific neurons in
the transformer’s feed-forward networks that are highly predictive and
selective for the given task. In this paper, we study the robustness of Prompt
Tuning in relation to these “skill neurons”, using RoBERTa and T5. We show that
prompts tuned for a specific task are transferable to tasks of the same type
but are not very robust to adversarial data, with higher robustness for T5 than
RoBERTa. At the same time, we replicate the existence of skill neurons in
RoBERTa and further show that skill neurons also seem to exist in T5.
Interestingly, the skill neurons of T5 determined on non-adversarial data are
also among the most predictive neurons on the adversarial data, which is not
the case for RoBERTa. We conclude that higher adversarial robustness may be
related to a model’s ability to activate the relevant skill neurons on
adversarial data.
[LINK]
http://arxiv.org/abs/2309.12263v1
[DATE]
2023-09-22 01:13:21+08:00
[CATEGORIES]
cs.CL
Instruction Tuning for Large Language Models: A Survey
[AUTHORS]
Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
[ABSTRACT]
This paper surveys research works in the quickly advancing field of
instruction tuning (IT), a crucial technique to enhance the capabilities and
controllability of large language models (LLMs). Instruction tuning refers to
the process of further training LLMs on a dataset consisting of
(instruction, output) pairs in a supervised fashion, which bridges the
gap between the next-word prediction objective of LLMs and the users’ objective
of having LLMs adhere to human instructions. In this work, we make a systematic
review of the literature, including the general methodology of IT, the
construction of IT datasets, the training of IT models, and applications to
different modalities, domains and applications, along with an analysis on
aspects that influence the outcome of IT (e.g., generation of instruction
outputs, size of the instruction dataset, etc.). We also review the potential
pitfalls of IT and the criticism against it, discuss efforts pointing out
current deficiencies of existing strategies, and suggest some avenues for
fruitful research.
[COMMENTS]
A Survey paper, Pre-print
[LINK]
http://arxiv.org/abs/2308.10792v2
[DATE]
2023-09-22 00:54:23+08:00
[CATEGORIES]
cs.CL
cs.LG
SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References
[AUTHORS]
Matteo Gabburo, Siddhant Garg, Rik Koncel Kedziorski, Alessandro Moschitti
[ABSTRACT]
Evaluation of QA systems is very challenging and expensive, with the most
reliable approach being human annotations of correctness of answers for
questions. Recent works (AVA, BEM) have shown that transformer LM encoder based
similarity metrics transfer well for QA evaluation, but they are limited by the
usage of a single correct reference answer. We propose a new evaluation metric:
SQuArE (Sentence-level QUestion AnsweRing Evaluation), using multiple reference
answers (combining multiple correct and incorrect references) for sentence-form
QA. We evaluate SQuArE on both sentence-level extractive (Answer Selection) and
generative (GenQA) QA systems, across multiple academic and industrial
datasets, and show that it outperforms previous baselines and obtains the
highest correlation with human annotations.
[COMMENTS]
Accepted to IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.12250v1
[DATE]
2023-09-22 00:51:30+08:00
[CATEGORIES]
cs.CL
cs.LG
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
[AUTHORS]
Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang
[ABSTRACT]
In this study, we present synchronous bilingual Connectionist Temporal
Classification (CTC), an innovative framework that leverages dual CTC to bridge
the gaps of both modality and language in the speech translation (ST) task.
Utilizing transcript and translation as concurrent objectives for CTC, our
model bridges the gap between audio and text as well as between source and
target languages. Building upon the recent advances in CTC application, we
develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art
performances on the MuST-C ST benchmarks under resource-constrained scenarios.
Intriguingly, our method also yields significant improvements in speech
recognition performance, revealing the effect of cross-lingual learning on
transcription and demonstrating its broad applicability. The source code is
available at https://github.com/xuchennlp/S2T.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.12234v1
[DATE]
2023-09-22 00:28:42+08:00
[CATEGORIES]
cs.CL
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
[AUTHORS]
Deepak Gupta, Kush Attal, Dina Demner-Fushman
[ABSTRACT]
The increase in the availability of online videos has transformed the way we
access information and knowledge. A growing number of individuals now prefer
instructional videos as they offer a series of step-by-step procedures to
accomplish particular tasks. The instructional videos from the medical domain
may provide the best possible visual answers to first aid, medical emergency,
and medical education questions. Toward this, this paper is focused on
answering health-related questions asked by the public by providing visual
answers from medical videos. The scarcity of large-scale datasets in the
medical domain is a key challenge that hinders the development of applications
that can help the public with their health-related questions. To address this
issue, we first proposed a pipelined approach to create two large-scale
datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Later, we proposed monomodal
and multimodal approaches that can effectively provide visual answers from
medical videos to natural language questions. We conducted a comprehensive
analysis of the results, focusing on the impact of the created datasets on
model training and the significance of visual features in enhancing the
performance of the monomodal and multi-modal approaches. Our findings suggest
that these datasets have the potential to enhance the performance of medical
visual answer localization tasks and provide a promising future direction to
further enhance the performance by using pre-trained language-vision models.
[COMMENTS]
Work in progress
[LINK]
http://arxiv.org/abs/2309.12224v1
[DATE]
2023-09-22 00:21:28+08:00
[CATEGORIES]
cs.CL
Environment-biased Feature Ranking for Novelty Detection Robustness
[AUTHORS]
Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu
[COMMENTS]
ICCV 2023 - Workshop on Out Of Distribution Generalization in
Computer Vision
[LINK]
http://arxiv.org/abs/2309.12301v1
[DATE]
2023-09-22 01:58:26+08:00
[CATEGORIES]
cs.LG
See to Touch: Learning Tactile Dexterity through Visual Incentives
[AUTHORS]
Irmak Guzey, Yinlong Dai, Ben Evans, Soumith Chintala, Lerrel Pinto
[ABSTRACT]
Equipping multi-fingered robots with tactile sensing is crucial for achieving
the precise, contact-rich, and dexterous manipulation that humans excel at.
However, relying solely on tactile sensing fails to provide adequate cues for
reasoning about objects’ spatial configurations, limiting the ability to
correct errors and adapt to changing situations. In this paper, we present
Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances
tactile-based dexterity by optimizing dexterous policies using vision-based
rewards. First, we use a contrastive-based objective to learn visual
representations. Next, we construct a reward function using these visual
representations through optimal-transport based matching on one human
demonstration. Finally, we use online reinforcement learning on our robot to
optimize tactile-based policies that maximize the visual reward. On six
challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping
slender objects, TAVI achieves a success rate of 73% using our four-fingered
Allegro robot hand. This success rate is 108% higher than that of policies
using tactile and vision-based rewards and 135% higher than that of policies
without tactile observational input. Robot videos are best viewed on our project
website: https://see-to-touch.github.io/.
[LINK]
http://arxiv.org/abs/2309.12300v1
[DATE]
2023-09-22 01:58:13+08:00
[CATEGORIES]
cs.LG
Learning to Drive Anywhere
[AUTHORS]
Ruizhao Zhu, Peng Huang, Eshed Ohn-Bar, Venkatesh Saligrama
[ABSTRACT]
Human drivers can seamlessly adapt their driving decisions across
geographical locations with diverse conditions and rules of the road, e.g.,
left vs. right-hand traffic. In contrast, existing models for autonomous
driving have been thus far only deployed within restricted operational domains,
i.e., without accounting for varying driving behaviors across locations or
model scalability. In this work, we propose AnyD, a single geographically-aware
conditional imitation learning (CIL) model that can efficiently learn from
heterogeneous and globally distributed data with dynamic environmental,
traffic, and social characteristics. Our key insight is to introduce a
high-capacity geo-location-based channel attention mechanism that effectively
adapts to local nuances while also flexibly modeling similarities among regions
in a data-driven manner. By optimizing a contrastive imitation objective, our
proposed approach can efficiently scale across inherently imbalanced data
distributions and location-dependent events. We demonstrate the benefits of our
AnyD agent across multiple datasets, cities, and scalable deployment paradigms,
i.e., centralized, semi-supervised, and distributed agent training.
Specifically, AnyD outperforms CIL baselines by over 14% in open-loop
evaluation and 30% in closed-loop testing on CARLA.
[COMMENTS]
Conference on Robot Learning (CoRL) 2023. https://any-d.github.io/
[LINK]
http://arxiv.org/abs/2309.12295v1
[DATE]
2023-09-22 01:55:36+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Ben Maman, Johannes Zeitler, Meinard Müller, Amit H. Bermano
[ABSTRACT]
Generating multi-instrument music from symbolic music representations is an
important task in Music Information Retrieval (MIR). A central but still
largely unsolved problem in this context is musically and acoustically informed
control in the generation process. As the main contribution of this work, we
propose enhancing control of multi-instrument synthesis by conditioning a
generative model on a specific performance and recording environment, thus
allowing for better guidance of timbre and style. Building on state-of-the-art
diffusion-based music generative models, we introduce performance conditioning,
a simple tool that instructs the generative model to synthesize music with the
style and timbre of specific instruments taken from specific performances. Our
prototype is evaluated using uncurated performances with diverse
instrumentation and achieves state-of-the-art FAD realism scores while allowing
novel timbre and style control. Our project page, including samples and
demonstrations, is available at benadar293.github.io/midipm
[COMMENTS]
5 pages, project page available at benadar293.github.io/midipm
[LINK]
http://arxiv.org/abs/2309.12283v1
[DATE]
2023-09-22 01:44:57+08:00
[CATEGORIES]
cs.LG
A Constructive Approach to Function Realization by Neural Stochastic Differential Equations
[AUTHORS]
Tanya Veeravalli, Maxim Raginsky
[ABSTRACT]
The problem of function approximation by neural dynamical systems has
typically been approached in a top-down manner: Any continuous function can be
approximated to an arbitrary accuracy by a sufficiently complex model with a
given architecture. This can lead to high-complexity controls which are
impractical in applications. In this paper, we take the opposite, constructive
approach: We impose various structural restrictions on system dynamics and
consequently characterize the class of functions that can be realized by such a
system. The systems are implemented as a cascade interconnection of a neural
stochastic differential equation (Neural SDE), a deterministic dynamical
system, and a readout map. Both probabilistic and geometric (Lie-theoretic)
methods are used to characterize the classes of functions realized by such
systems.
[COMMENTS]
6 pages, 1 pdf figure; final version accepted to IEEE Conference on
Decision and Control
[LINK]
http://arxiv.org/abs/2307.00215v2
[DATE]
2023-09-22 01:25:50+08:00
[CATEGORIES]
cs.LG
Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification
[AUTHORS]
Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun
[ABSTRACT]
The ability to classify images is dependent on having access to large labeled
datasets and testing on data from the same domain that the model can train on.
Classification becomes more challenging when dealing with new data from a
different domain, where gathering and especially labeling a larger image
dataset for retraining a classification model requires a labor-intensive human
effort. Cross-domain classification frameworks were developed to handle this
data domain shift problem by utilizing unsupervised image-to-image translation
models to translate an input image from the unlabeled domain to the labeled
domain. The problem with these unsupervised models lies in their unsupervised
nature. For lack of annotations, it is not possible to use the traditional
supervised metrics to evaluate these translation models to pick the best-saved
checkpoint model. This paper introduces a new method called Domain-knowledge
Inspired Pseudo Supervision (DIPS) which utilizes domain-informed Gaussian
Mixture Models to generate pseudo annotations to enable the use of traditional
supervised metrics. This method was designed specifically to support
cross-domain classification applications contrary to other typically used
metrics such as the FID which were designed to evaluate the model in terms of
the quality of the generated image from a human-eye perspective. DIPS proves
its effectiveness by outperforming various GAN evaluation metrics, including
FID, when selecting the optimal saved checkpoint model. It is also evaluated
against truly supervised metrics. Furthermore, DIPS showcases its robustness
and interpretability by demonstrating a strong correlation with truly
supervised metrics, highlighting its superiority over existing state-of-the-art
alternatives. The code and data to replicate the results can be found on the
official Github repository: https://github.com/Hindawi91/DIPS
[COMMENTS]
arXiv admin note: text overlap with arXiv:2212.09107
[LINK]
http://arxiv.org/abs/2303.10310v3
[DATE]
2023-09-22 01:25:08+08:00
[CATEGORIES]
cs.LG
Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline for Federated Image Classifications
[AUTHORS]
Yusen Wu, Jamie Deng, Hao Chen, Phuong Nguyen, Yelena Yesha
[ABSTRACT]
Federated Learning (FL) has revolutionized how we train deep neural networks
by enabling decentralized collaboration while safeguarding sensitive data and
improving model performance. However, FL faces two crucial challenges: the
diverse nature of data held by individual clients and the vulnerability of the
FL system to security breaches. This paper introduces an innovative solution
named Estimated Mean Aggregation (EMA) that not only addresses these challenges
but also provides a fundamental reference point as a baseline for
advanced aggregation techniques in FL systems. EMA’s significance lies in its
dual role: enhancing model security by effectively handling malicious outliers
through trimmed means and uncovering data heterogeneity to ensure that trained
models are adaptable across various client datasets. Through a wealth of
experiments, EMA consistently demonstrates high accuracy and area under the
curve (AUC) compared to alternative methods, establishing itself as a robust
baseline for evaluating the effectiveness and security of FL aggregation
methods. EMA’s contributions thus offer a crucial step forward in advancing the
efficiency, security, and versatility of decentralized deep learning in the
context of FL.
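
The abstract states that outliers are handled through trimmed means. The sketch below shows a generic coordinate-wise trimmed mean over client updates; it is not the paper's EMA procedure, and the function names, the 25% trim fraction, and the toy updates are assumptions made for illustration.

```python
def trimmed_mean(values, trim_fraction=0.25):
    """Mean after dropping the lowest and highest `trim_fraction` of values."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def aggregate(client_updates, trim_fraction=0.25):
    """Coordinate-wise trimmed mean over equally sized client update vectors."""
    return [trimmed_mean(coord, trim_fraction) for coord in zip(*client_updates)]


updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.11, -0.22],
    [9.00, 5.00],  # hypothetical outlier, e.g. a malicious client
]
print(aggregate(updates))  # roughly [0.115, -0.19]
```

With four clients and a 25% trim, the largest and smallest value in each coordinate are discarded, so the outlier client has no influence on the aggregate.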
[LINK]
http://arxiv.org/abs/2309.12267v1
[DATE]
2023-09-22 01:17:28+08:00
[CATEGORIES]
cs.LG
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
[AUTHORS]
Hao Chen, Yusen Wu, Phuong Nguyen, Chao Liu, Yelena Yesha
[ABSTRACT]
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in
deep learning, is often limited to converging to local optima due to the
non-convex nature of the problem. Leveraging these local optima to improve
model performance remains a challenging task. Given the inherent complexity of
neural networks, the simple arithmetic averaging of the obtained local optima
models leads to undesirable results. This paper proposes a soft merging method
that facilitates rapid merging of multiple models, simplifies the merging of
specific parts of neural networks, and enhances robustness against malicious
models with extreme values. This is achieved by learning gate parameters
through a surrogate of the $l_0$ norm using hard concrete distribution without
modifying the model weights of the given local optima models. This merging
process not only enhances the model performance by converging to a better local
optimum, but also minimizes computational costs, offering an efficient and
explicit learning process integrated with stochastic gradient descent. Thorough
experiments underscore the effectiveness and superior performance of the merged
neural networks.
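
The abstract mentions gate parameters learned through a hard concrete surrogate of the $l_0$ norm. The sketch below samples one hard concrete gate using the standard stretch-and-clip construction; the default constants (beta = 2/3, gamma = -0.1, zeta = 1.1) and the two-model mixing rule are assumptions for illustration and are not taken from the paper.

```python
import math


def hard_concrete_gate(log_alpha, u, beta=2 / 3, gamma=-0.1, zeta=1.1):
    """Hard concrete sample in [0, 1] from uniform noise u in (0, 1)."""
    logit = (math.log(u) - math.log(1.0 - u) + log_alpha) / beta
    s = 1.0 / (1.0 + math.exp(-logit))      # concrete (relaxed Bernoulli) sample
    stretched = s * (zeta - gamma) + gamma  # stretch to the interval (gamma, zeta)
    return min(1.0, max(0.0, stretched))    # clip, so exact 0 and 1 are reachable


def gated_merge(weights_a, weights_b, gate):
    """Hypothetical merge of two frozen weight vectors with one scalar gate."""
    return [gate * a + (1.0 - gate) * b for a, b in zip(weights_a, weights_b)]


g = hard_concrete_gate(log_alpha=10.0, u=0.5)
print(g)                                        # 1.0
print(gated_merge([1.0, 2.0], [3.0, 4.0], g))   # [1.0, 2.0]
```

Only `log_alpha` would be trained in such a setup; the model weights being merged stay untouched, which matches the abstract's description of learning gates without modifying the given models.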
[LINK]
http://arxiv.org/abs/2309.12259v1
[DATE]
2023-09-22 01:07:31+08:00
[CATEGORIES]
cs.LG
Multi-agent Deep Covering Skill Discovery
[AUTHORS]
Jiayu Chen, Marina Haliem, Tian Lan, Vaneet Aggarwal
[ABSTRACT]
The use of skills (a.k.a., options) can greatly accelerate exploration in
reinforcement learning, especially when only sparse reward signals are
available. While option discovery methods have been proposed for individual
agents, in multi-agent reinforcement learning settings, discovering
collaborative options that can coordinate the behavior of multiple agents and
encourage them to visit the under-explored regions of their joint state space
has not been considered. To address this, we propose Multi-agent Deep Covering
Option Discovery, which constructs the multi-agent options through minimizing
the expected cover time of the multiple agents’ joint state space. Also, we
propose a novel framework to adopt the multi-agent options in the MARL process.
In practice, a multi-agent task can usually be divided into some sub-tasks,
each of which can be completed by a sub-group of the agents. Therefore, our
algorithm framework first leverages an attention mechanism to find
collaborative agent sub-groups that would benefit most from coordinated
actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn
the multi-agent options for each sub-group to complete their sub-tasks first,
and then to integrate them through a high-level policy as the solution of the
whole task. This hierarchical option construction allows our framework to
strike a balance between scalability and effective collaboration among the
agents. The evaluation based on multi-agent collaborative tasks shows that the
proposed algorithm can effectively capture the agent interactions with the
attention mechanism, successfully identify multi-agent options, and
significantly outperform prior works using single-agent options or no options,
in terms of both faster exploration and higher task rewards.
[COMMENTS]
This paper was presented in part at the ICML Reinforcement Learning
for Real Life Workshop, July 2021
[LINK]
http://arxiv.org/abs/2210.03269v3
[DATE]
2023-09-22 01:01:10+08:00
[CATEGORIES]
cs.LG
SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning
[AUTHORS]
Julian Minder, Florian Grötschla, Joël Mathys, Roger Wattenhofer
[ABSTRACT]
We introduce an extension to the CLRS algorithmic learning benchmark,
prioritizing scalability and the utilization of sparse representations. Many
algorithms in CLRS require global memory or information exchange, mirrored in
its execution model, which constructs fully connected (not sparse) graphs based
on the underlying problem. Despite CLRS’s aim of assessing how effectively
learned algorithms can generalize to larger instances, the existing execution
model becomes a significant constraint due to its demanding memory requirements
and runtime (hard to scale). However, many important algorithms do not demand a
fully connected graph; these algorithms, primarily distributed in nature, align
closely with the message-passing paradigm employed by Graph Neural Networks.
Hence, we propose SALSA-CLRS, an extension of the current CLRS benchmark
specifically with scalability and sparseness in mind. Our approach includes
adapted algorithms from the original CLRS benchmark and introduces new problems
from distributed and randomized algorithms. Moreover, we perform a thorough
empirical evaluation of our benchmark. Code is publicly available at
https://github.com/jkminder/SALSA-CLRS.
[LINK]
http://arxiv.org/abs/2309.12253v1
[DATE]
2023-09-22 00:57:09+08:00
[CATEGORIES]
cs.LG
Parallelizing non-linear sequential models over the sequence length
[AUTHORS]
Yi Heng Lim, Qi Zhu, Joshua Selfridge, Muhammad Firmansyah Kasim
[ABSTRACT]
Sequential models, such as Recurrent Neural Networks and Neural Ordinary
Differential Equations, have long suffered from slow training due to their
inherent sequential nature. For many years this bottleneck has persisted, as
many thought sequential models could not be parallelized. We challenge this
long-held belief with our parallel algorithm that accelerates GPU evaluation of
sequential models by up to 3 orders of magnitude without compromising
output accuracy. The algorithm does not need any special structure in the
sequential models’ architecture, making it applicable to a wide range of
architectures. Using our method, training sequential models can be more than 10
times faster than the common sequential method without any meaningful
difference in the training results. Leveraging this accelerated training, we
discovered the efficacy of the Gated Recurrent Unit in a long time series
classification problem with 17k time samples. By overcoming the training
bottleneck, our work serves as the first step to unlock the potential of
non-linear sequential models for long sequence problems.
[LINK]
http://arxiv.org/abs/2309.12252v1
[DATE]
2023-09-22 00:52:34+08:00
[CATEGORIES]
cs.LG
Adaptive Input-image Normalization for Solving Mode Collapse Problem in GAN-based X-ray Images
[AUTHORS]
Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O’Reilly
[ABSTRACT]
Biomedical image datasets can be imbalanced due to the rarity of targeted
diseases. Generative Adversarial Networks play a key role in addressing this
imbalance by enabling the generation of synthetic images to augment datasets.
It is important to generate synthetic images that incorporate a diverse range
of features to accurately represent the distribution of features present in the
training imagery. Furthermore, the absence of diverse features in synthetic
images can degrade the performance of machine learning classifiers. The mode
collapse problem impacts Generative Adversarial Networks’ capacity to generate
diversified images. Mode collapse comes in two varieties: intra-class and
inter-class. In this paper, both varieties of the mode collapse problem are
investigated, and their subsequent impact on the diversity of synthetic X-ray
images is evaluated. This work contributes an empirical demonstration of the
benefits of integrating the adaptive input-image normalization with the Deep
Convolutional GAN and Auxiliary Classifier GAN to alleviate the mode collapse
problems. Synthetically generated images are utilized for data augmentation and
training a Vision Transformer model. The classification performance of the
model is evaluated using accuracy, recall, and precision scores. Results
demonstrate that the DCGAN and the ACGAN with adaptive input-image
normalization outperform the DCGAN and ACGAN with un-normalized X-ray images as
evidenced by the superior diversity scores and classification scores.
[COMMENTS]
Submitted to the IEEE Journal
[LINK]
http://arxiv.org/abs/2309.12245v1
[DATE]
2023-09-22 00:43:29+08:00
[CATEGORIES]
cs.LG
Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
[AUTHORS]
Jarosław Błasiok, Preetum Nakkiran
[ABSTRACT]
Calibration measures and reliability diagrams are two fundamental tools for
measuring and interpreting the calibration of probabilistic predictors.
Calibration measures quantify the degree of miscalibration, and reliability
diagrams visualize the structure of this miscalibration. However, the most
common constructions of reliability diagrams and calibration measures –
binning and ECE – both suffer from well-known flaws (e.g. discontinuity). We
show that a simple modification fixes both constructions: first smooth the
observations using an RBF kernel, then compute the Expected Calibration Error
(ECE) of this smoothed function. We prove that with a careful choice of
bandwidth, this method yields a calibration measure that is well-behaved in the
sense of (Błasiok, Gopalan, Hu, and Nakkiran 2023a) – a consistent
calibration measure. We call this measure the SmoothECE. Moreover, the
reliability diagram obtained from this smoothed function visually encodes the
SmoothECE, just as binned reliability diagrams encode the BinnedECE.
We also provide a Python package with simple, hyperparameter-free methods for
measuring and plotting calibration: pip install relplot.
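
The two-step recipe in this abstract (smooth the observations with an RBF kernel, then compute the calibration error of the smoothed function) can be sketched in plain Python. This is a simplified illustration, not the relplot API: the bandwidth is fixed by hand rather than chosen carefully as the abstract requires, boundary effects at 0 and 1 are ignored, and all names are assumptions.

```python
import math


def smooth_ece(preds, labels, sigma=0.05, grid_points=3000):
    """Kernel-smoothed calibration error with a fixed Gaussian bandwidth."""
    n = len(preds)
    lo, hi = -1.0, 2.0                   # wide range so kernel tails are covered
    step = (hi - lo) / grid_points
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for j in range(grid_points):
        t = lo + (j + 0.5) * step        # midpoint rule
        # Kernel-smoothed residual (label minus prediction), averaged over samples.
        r = sum(
            norm * math.exp(-0.5 * ((t - f) / sigma) ** 2) * (y - f)
            for f, y in zip(preds, labels)
        ) / n
        total += abs(r) * step           # integrate the absolute smoothed residual
    return total


# A predictor that always says 0.9 while every label is 0 is off by about 0.9.
print(round(smooth_ece([0.9] * 10, [0] * 10), 3))
# A predictor that says 0.5 on balanced labels is calibrated.
print(smooth_ece([0.5] * 4, [0, 1, 0, 1]))
```

Because the residuals are smoothed before the absolute value is taken, nearby over- and under-predictions cancel, which avoids the discontinuity that binning introduces.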
[COMMENTS]
Code at: https://github.com/apple/ml-calibration
[LINK]
http://arxiv.org/abs/2309.12236v1
[DATE]
2023-09-22 00:30:22+08:00
[CATEGORIES]
cs.LG
Prodigy: An Expeditiously Adaptive Parameter-Free Learner
[AUTHORS]
Konstantin Mishchenko, Aaron Defazio
[ABSTRACT]
We consider the problem of estimating the learning rate in adaptive methods,
such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to
provably estimate the distance to the solution $D$, which is needed to set the
learning rate optimally. Our techniques are modifications of the D-Adaptation
method for learning-rate-free learning. Our methods improve upon the
convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where
$d_0$ is the initial estimate of $D$. We test our methods on 12 common
logistic-regression benchmark datasets, VGG11 and ResNet-50 training on
CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on
Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT
transformer training on BookWiki. Our experimental results show that our
approaches consistently outperform D-Adaptation and reach test accuracy values
close to that of hand-tuned Adam.
[LINK]
http://arxiv.org/abs/2306.06101v2
[DATE]
2023-09-22 00:29:31+08:00
[CATEGORIES]
cs.LG
Smooth Nash Equilibria: Algorithms and Complexity
[AUTHORS]
Constantinos Daskalakis, Noah Golowich, Nika Haghtalab, Abhishek Shetty
[ABSTRACT]
A fundamental shortcoming of the concept of Nash equilibrium is its
computational intractability: approximating Nash equilibria in normal-form
games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis,
we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash
equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash
equilibrium, players only need to achieve utility at least as high as their
best deviation to a $\sigma$-smooth strategy, which is a distribution that does
not put too much mass (as parametrized by $\sigma$) on any fixed action. We
distinguish two variants of $\sigma$-smooth Nash equilibria: strong
$\sigma$-smooth Nash equilibria, in which players are required to play
$\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth
Nash equilibria, where there is no such requirement.
We show that both weak and strong $\sigma$-smooth Nash equilibria have
superior computational properties to Nash equilibria: when $\sigma$ as well as
an approximation parameter $\epsilon$ and the number of players are all
constants, there is a constant-time randomized algorithm to find a weak
$\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games.
In the same parameter regime, there is a polynomial-time deterministic
algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash
equilibrium in a normal-form game. These results stand in contrast to the
optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which
cannot run faster than quasipolynomial time. We complement our upper bounds
by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial,
finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium becomes
computationally intractable.
[LINK]
http://arxiv.org/abs/2309.12226v1
[DATE]
2023-09-22 00:22:07+08:00
[CATEGORIES]
cs.LG
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
[AUTHORS]
Zhengang Li, Geng Yuan, Tomoharu Yamauchi, Zabihi Masoud, Yanyue Xie, Peiyan Dong, Xulong Tang, Nobuyuki Yoshikawa, Devesh Tiwari, Yanzhi Wang, Olivia Chen
[ABSTRACT]
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with
extremely high energy efficiency. By employing the distinct polarity of current
to denote logic '0' and '1', AQFP devices serve as excellent carriers for
binary neural network (BNN) computations. Although recent research has made
initial strides toward developing an AQFP-based BNN accelerator, several
critical challenges remain, preventing the design from being a comprehensive
solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN
acceleration framework that leverages software-hardware co-optimization to
eventually make the AQFP devices a feasible solution for BNN acceleration.
Specifically, we investigate the randomized behavior of the AQFP devices and
analyze the impact of crossbar size on current attenuation, subsequently
formulating the current amplitude into the values suitable for use in BNN
computation. To tackle the accumulation problem and improve overall hardware
performance, we propose a stochastic computing-based accumulation module and a
clocking scheme adjustment-based circuit optimization method. We validate our
SupeRBNN framework across various datasets and network architectures, comparing
it with implementations based on different technologies, including CMOS, ReRAM,
and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our
design achieves an energy efficiency approximately $7.8\times10^4$ times higher
than that of the ReRAM-based BNN framework while maintaining a similar level of
model accuracy. Furthermore, when compared with superconductor-based
counterparts, our framework demonstrates at least two orders of magnitude
higher energy efficiency.
[COMMENTS]
Accepted by MICRO’23 (56th IEEE/ACM International Symposium on
Microarchitecture)
[LINK]
http://arxiv.org/abs/2309.12212v1
[DATE]
2023-09-22 00:14:42+08:00
[CATEGORIES]
cs.LG
Physics-informed State-space Neural Networks for Transport Phenomena
[AUTHORS]
Akshay J Dave, Richard B. Vilim
[ABSTRACT]
This work introduces Physics-informed State-space neural network Models
(PSMs), a novel solution to achieving real-time optimization, flexibility, and
fault tolerance in autonomous systems, particularly in transport-dominated
systems such as chemical, biomedical, and power plants. Traditional data-driven
methods fall short due to a lack of physical constraints like mass
conservation; PSMs address this issue by training deep neural networks with
sensor data and physics-informing using components’ Partial Differential
Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable
forward dynamics model. Through two in silico experiments - a heated channel
and a cooling system loop - we demonstrate that PSMs offer a more accurate
approach than purely data-driven models.
Beyond accuracy, there are several compelling use cases for PSMs. In this
work, we showcase two: the creation of a nonlinear supervisory controller
through a sequentially updated state-space representation and the proposal of a
diagnostic algorithm using residuals from each of the PDEs. The former
demonstrates the ability of PSMs to handle both constant and time-dependent
constraints, while the latter illustrates their value in system diagnostics and
fault detection. We further posit that PSMs could serve as a foundation for
Digital Twins, constantly updated digital representations of physical systems.
[COMMENTS]
19 pages, 13 figures
[LINK]
http://arxiv.org/abs/2309.12211v1
[DATE]
2023-09-22 00:14:36+08:00
[CATEGORIES]
cs.LG
Boolformer: Symbolic Regression of Logic Functions with Transformers
[AUTHORS]
Stéphane d’Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé
[ABSTRACT]
In this work, we introduce Boolformer, the first Transformer architecture
trained to perform end-to-end symbolic regression of Boolean functions. First,
we show that it can predict compact formulas for complex functions which were
not seen during training, when provided a clean truth table. Then, we
demonstrate its ability to find approximate expressions when provided
incomplete and noisy observations. We evaluate the Boolformer on a broad set of
real-world binary classification datasets, demonstrating its potential as an
interpretable alternative to classic machine learning methods. Finally, we
apply it to the widespread task of modelling the dynamics of gene regulatory
networks. Using a recent benchmark, we show that Boolformer is competitive with
state-of-the-art genetic algorithms, with a speedup of several orders of
magnitude. Our code and models are available publicly.
[LINK]
http://arxiv.org/abs/2309.12207v1
[DATE]
2023-09-22 00:11:38+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Anuvab Sen, Arul Rhik Mazumder, Udayon Sen
[ABSTRACT]
Accurate load forecasting plays a vital role in numerous sectors, but
accurately capturing the complex dynamics of dynamic power systems remains a
challenge for traditional statistical models. For these reasons, time-series
models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly
deployed and often experience higher success. In this paper, we analyze the
efficacy of the recently developed Transformer-based Neural Network model in
Load forecasting. Transformer models have the potential to improve Load
forecasting because of their ability to learn long-range dependencies derived
from their Attention Mechanism. We apply several metaheuristics, namely
Differential Evolution, to find the optimal hyperparameters of the
Transformer-based Neural Network to produce accurate forecasts. Differential
Evolution provides scalable, robust, global solutions to non-differentiable,
multi-objective, or constrained optimization problems. Our work compares the
proposed Transformer-based Neural Network model integrated with different
metaheuristic algorithms by their performance in Load forecasting based on
numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage
Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced
Transformer-based Neural Network models in Load forecasting accuracy and
provide optimal hyperparameters for each model.
[COMMENTS]
6 Pages, 6 Figures, 2 Tables, Accepted by the 14th IEEE International
Symposium Series on Computational Intelligence (SSCI 2023), December 5-8,
2023, Mexico City, Mexico
[LINK]
http://arxiv.org/abs/2307.15299v3
[DATE]
2023-09-22 00:04:00+08:00
[CATEGORIES]
cs.LG
L1-aware Multilingual Mispronunciation Detection Framework
[AUTHORS]
Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali
[ABSTRACT]
The phonological discrepancies between a speaker’s native (L1) and the
non-native language (L2) serve as a major factor in mispronunciation. This
paper introduces a novel multilingual MDD architecture, L1-MultiMDD, enriched
with L1-aware speech representation. An end-to-end speech encoder is trained on
the input signal and its corresponding reference phoneme sequence. First, an
attention mechanism is deployed to align the input audio with the reference
phoneme sequence. Afterwards, the L1-L2 speech embeddings are extracted from an
auxiliary model, pretrained in a multi-task setup identifying L1 and L2
language, and are infused into the primary network. Finally, L1-MultiMDD is
optimized for a unified multilingual phoneme recognition task using
connectionist temporal classification (CTC) loss for the target languages:
English, Arabic, and Mandarin. Our experiments demonstrate the effectiveness of
the proposed L1-MultiMDD framework on both seen – L2-ARTIC, LATIC, and
AraVoiceL2v2; and unseen – EpaDB and Speechocean762 datasets. The consistent
gains in PER, and false rejection rate (FRR) across all target languages
confirm our approach’s robustness, efficacy, and generalizability.
[COMMENTS]
5 pages, submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.07719v2
[DATE]
2023-09-21 23:26:33+08:00
[CATEGORIES]
cs.CL
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
[AUTHORS]
Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu
[ABSTRACT]
Designing reward functions is a longstanding challenge in reinforcement
learning (RL); it requires specialized knowledge or domain data, leading to
high costs for development. To address this, we introduce Text2Reward, a
data-free framework that automates the generation of dense reward functions
based on large language models (LLMs). Given a goal described in natural
language, Text2Reward generates dense reward functions as an executable program
grounded in a compact representation of the environment. Unlike inverse RL and
recent work that uses LLMs to write sparse reward codes, Text2Reward produces
interpretable, free-form dense reward codes that cover a wide range of tasks,
utilize existing packages, and allow iterative refinement with human feedback.
We evaluate Text2Reward on two robotic manipulation benchmarks (ManiSkill2,
MetaWorld) and two locomotion environments of MuJoCo. On 13 of the 17
manipulation tasks, policies trained with generated reward codes achieve
similar or better task success rates and convergence speed than expert-written
reward codes. For locomotion tasks, our method learns six novel locomotion
behaviors with a success rate exceeding 94%. Furthermore, we show that the
policies trained in the simulator with our method can be deployed in the real
world. Finally, Text2Reward further improves the policies by refining their
reward functions with human feedback. Video results are available at
https://text-to-reward.github.io
[COMMENTS]
23 pages, 10 figures, update
[LINK]
http://arxiv.org/abs/2309.11489v2
[DATE]
2023-09-21 23:17:09+08:00
[CATEGORIES]
cs.LG
cs.CL
OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media
[AUTHORS]
Fatimah Alzamzami, Abdulmotaleb El Saddik
[ABSTRACT]
While resources for the English language are fairly sufficient to understand
content on social media, similar resources in Arabic are still immature. The
main reason that the resources in Arabic are insufficient is that Arabic has
many dialects in addition to the standard version (MSA). Arabs do not use MSA
in their daily communications; rather, they use dialectal versions.
Unfortunately, social users transfer this phenomenon into their use of social
media platforms, which in turn has raised an urgent need for building suitable
AI models for language-dependent applications. Existing machine translation
(MT) systems designed for MSA fail to work well with Arabic dialects. In light
of this, it is necessary to adapt to the informal nature of communication on
social networks by developing MT systems that can effectively handle the
various dialects of Arabic. Unlike MSA, for which MT systems are well
advanced, little effort has been made to support Arabic dialects in MT
systems. While a few attempts have been made to build translation datasets for
dialectal Arabic, they are domain dependent and are not OSN cultural-language
friendly. In this work, we attempt to alleviate these limitations by proposing
an online social network-based multidialect Arabic dataset that is crafted by
contextually translating English tweets into four Arabic dialects: Gulf,
Yemeni, Iraqi, and Levantine. To perform the translation, we followed our
proposed guideline framework for content translation, which could be
universally applicable for translation between foreign languages and local
dialects. We validated the authenticity of our proposed dataset by developing
neural MT models for four Arabic dialects. Our results have shown a superior
performance of our NMT models trained using our dataset. We believe that our
dataset can reliably serve as an Arabic multidialectal translation dataset for
informal MT tasks.
[LINK]
http://arxiv.org/abs/2309.12137v1
[DATE]
2023-09-21 22:58:50+08:00
[CATEGORIES]
cs.CL
How-to Guides for Specific Audiences: A Corpus and Initial Findings
[AUTHORS]
Nicola Fanton, Agnieszka Falenska, Michael Roth
[COMMENTS]
ACL 2023 best student paper
[LINK]
http://arxiv.org/abs/2309.12117v1
[DATE]
2023-09-21 22:35:42+08:00
[CATEGORIES]
cs.CL
A Computational Analysis of Vagueness in Revisions of Instructional Texts
[AUTHORS]
Alok Debnath, Michael Roth
[ABSTRACT]
WikiHow is an open-domain repository of instructional articles for a variety
of tasks, which can be revised by users. In this paper, we extract pairwise
versions of an instruction before and after a revision was made. Starting from
a noisy dataset of revision histories, we specifically extract and analyze
edits that involve cases of vagueness in instructions. We further investigate
the ability of a neural model to distinguish between two versions of an
instruction in our data by adopting a pairwise ranking task from previous work
and showing improvements over existing baselines.
[COMMENTS]
EACL 2021 best student paper
[LINK]
http://arxiv.org/abs/2309.12107v1
[DATE]
2023-09-21 22:26:04+08:00
[CATEGORIES]
cs.CL
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam
[AUTHORS]
Matheus L. O. Santos, Cláudio E. C. Campelo
[ABSTRACT]
Although Large Language Models (LLMs) represent a revolution in the way we
interact with computers, allowing the construction of complex questions and the
ability to reason over a sequence of statements, their use is restricted due to
the need for dedicated hardware for execution. In this study, we evaluate the
performance of LLMs based on the 7 and 13 billion LLaMA models, subjected to a
quantization process and run on home hardware. The models considered were
Alpaca, Koala, and Vicuna. To evaluate the effectiveness of these models, we
developed a database containing 1,006 questions from the ENEM (Brazilian
National Secondary School Exam). Our analysis revealed that the best performing
models achieved an accuracy of approximately 46% for the original texts of the
Portuguese questions and 49% on their English translations. In addition, we
evaluated the computational efficiency of the models by measuring the time
required for execution. On average, the 7 and 13 billion LLMs took
approximately 20 and 50 seconds, respectively, to process the queries on a
machine equipped with an AMD Ryzen 5 3600x processor.
[COMMENTS]
8 pages, 6 figures, 4 tables
[LINK]
http://arxiv.org/abs/2309.12071v1
[DATE]
2023-09-21 21:39:54+08:00
[CATEGORIES]
cs.CL
Can large language models generate salient negative statements?
[AUTHORS]
Hiba Arnaout, Simon Razniewski
[ABSTRACT]
We examine the ability of large language models (LLMs) to generate salient
(interesting) negative statements about real-world entities; an emerging
research topic of the last few years. We probe the LLMs using zero- and k-shot
unconstrained probes, and compare with traditional methods for negation
generation, i.e., pattern-based textual extractions and knowledge-graph-based
inferences, as well as crowdsourced gold statements. We measure the correctness
and salience of the generated lists about subjects from different domains. Our
evaluation shows that guided probes do in fact improve the quality of generated
negatives, compared to the zero-shot variant. Nevertheless, using both prompts,
LLMs still struggle with the notion of factuality of negatives, frequently
generating many ambiguous statements, or statements with negative keywords but
a positive meaning.
[COMMENTS]
For data, see
https://www.mpi-inf.mpg.de/fileadmin/inf/d5/research/negation_in_KBs/data.csv
[LINK]
http://arxiv.org/abs/2305.16755v2
[DATE]
2023-09-21 21:36:03+08:00
[CATEGORIES]
cs.CL
BELT: Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision
[AUTHORS]
Jinzhao Zhou, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin
[ABSTRACT]
This paper presents BELT, a novel model and learning framework for the
pivotal topic of brain-to-language translation research. The translation from
noninvasive brain signals into readable natural language has the potential to
promote the application scenario as well as the development of brain-computer
interfaces (BCI) as a whole. The critical problem in brain signal decoding or
brain-to-language translation is the acquisition of semantically appropriate
and discriminative EEG representation from a dataset of limited scale and
quality. The proposed BELT method is a generic and efficient framework that
bootstraps EEG representation learning using off-the-shelf large-scale
pretrained language models (LMs). With a large LM’s capacity for understanding
semantic information and zero-shot generalization, BELT utilizes large LMs
trained on Internet-scale datasets to bring significant improvements to the
understanding of EEG signals.
In particular, the BELT model is composed of a deep conformer encoder and a
vector quantization encoder. Semantical EEG representation is achieved by a
contrastive learning step that provides natural language supervision. We
achieve state-of-the-art results on two brain decoding tasks:
brain-to-language translation and zero-shot sentiment
classification. Specifically, our model surpasses the baseline model on both
tasks by 5.45% and over 10%, and achieves a 42.31% BLEU-1 score and 67.32%
precision on the main evaluation metrics for translation and zero-shot
sentiment classification respectively.
[LINK]
http://arxiv.org/abs/2309.12056v1
[DATE]
2023-09-21 21:24:01+08:00
[CATEGORIES]
cs.CL
CAMERA: A Multimodal Dataset and Benchmark for Ad Text Generation
[AUTHORS]
Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang
[ABSTRACT]
In response to the limitations of manual online ad production, significant
research has been conducted in the field of automatic ad text generation (ATG).
However, comparing different methods has been challenging because of the lack
of benchmarks encompassing the entire field and the absence of well-defined
problem sets with clear model inputs and outputs. To address these challenges,
this paper aims to advance the field of ATG by introducing a redesigned task
and constructing a benchmark. Specifically, we defined ATG as a
cross-application task encompassing various aspects of Internet
advertising. As part of our contribution, we propose a first benchmark dataset,
CA Multimodal Evaluation for Ad Text GeneRAtion (CAMERA), carefully designed
for ATG to be able to leverage multi-modal information and conduct an
industry-wise evaluation. Furthermore, we demonstrate the usefulness of our
proposed benchmark through evaluation experiments using multiple baseline
models, which vary in terms of the type of pre-trained language model used and
the incorporation of multi-modal information. We also discuss the current state
of the task and the future challenges.
[COMMENTS]
13 pages
[LINK]
http://arxiv.org/abs/2309.12030v1
[DATE]
2023-09-21 20:51:24+08:00
[CATEGORIES]
cs.CL
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
[AUTHORS]
Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo
[ABSTRACT]
Denoising diffusion probabilistic models (DDPMs) have shown promising
performance for speech synthesis. However, a large number of iterative steps
are required to achieve high sample quality, which restricts the inference
speed. Maintaining sample quality while increasing sampling speed has become a
challenging task. In this paper, we propose a “Co”nsistency “Mo”del-based
“Speech” synthesis method, CoMoSpeech, which achieves speech synthesis through a
single diffusion sampling step while achieving high audio quality. The
consistency constraint is applied to distill a consistency model from a
well-designed diffusion-based teacher model, which ultimately yields superior
performances in the distilled CoMoSpeech. Our experiments show that by
generating audio recordings by a single sampling step, the CoMoSpeech achieves
an inference speed more than 150 times faster than real-time on a single NVIDIA
A100 GPU, which is comparable to FastSpeech2, making diffusion-sampling based
speech synthesis truly practical. Meanwhile, objective and subjective
evaluations on text-to-speech and singing voice synthesis show that the
proposed teacher models yield the best audio quality, and the one-step sampling
based CoMoSpeech achieves the best inference speed with better or comparable
audio quality to other conventional multi-step diffusion model baselines. Audio
samples are available at https://comospeech.github.io/.
[COMMENTS]
Accepted to ACM MM 2023
[LINK]
http://arxiv.org/abs/2305.06908v3
[DATE]
2023-09-21 20:32:22+08:00
[CATEGORIES]
cs.CL
cs.LG
Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables
[AUTHORS]
Wenting Zhao, Ye Liu, Yao Wan, Yibo Wang, Zhongfen Deng, Philip S. Yu
[ABSTRACT]
Question answering on tabular data (a.k.a. TableQA), which aims at generating
answers to questions grounded on a provided table, has gained significant
attention recently. Prior work primarily produces concise factual responses
through information extraction from individual or limited table cells, lacking
the ability to reason across diverse table cells. Yet, the realm of free-form
TableQA, which demands intricate strategies for selecting relevant table cells
and the sophisticated integration and inference of discrete data fragments,
remains mostly unexplored. To this end, this paper proposes a generalized
three-stage approach: Table-to-Graph conversion and cell localizing, external
knowledge retrieval, and the fusion of table and text (called TAG-QA), to
address the challenge of inferring long free-form answers in generative
TableQA. In particular, TAG-QA (1) locates relevant table cells using a graph
neural network to gather intersecting cells between relevant rows and columns,
(2) leverages external knowledge from Wikipedia, and (3) generates answers by
integrating both tabular data and natural linguistic information. Experiments
showcase the superior capabilities of TAG-QA in generating sentences that are
both faithful and coherent, particularly when compared to several
state-of-the-art baselines. Notably, TAG-QA surpasses the robust pipeline-based
baseline TAPAS by 17% and 14% in terms of BLEU-4 and PARENT F-score,
respectively. Furthermore, TAG-QA outperforms the end-to-end model T5 by 16%
and 12% on BLEU-4 and PARENT F-score, respectively.
[COMMENTS]
Accepted by AACL-IJCNLP 2023
[LINK]
http://arxiv.org/abs/2309.11049v2
[DATE]
2023-09-21 18:57:08+08:00
[CATEGORIES]
cs.CL
Turning Whisper into Real-Time Transcription System
[AUTHORS]
Dominik Macháček, Raj Dabre, Ondřej Bojar
[ABSTRACT]
Whisper is one of the recent state-of-the-art multilingual speech recognition
and translation models; however, it is not designed for real-time
transcription. In this paper, we build on top of Whisper and create
Whisper-Streaming, an implementation of real-time speech transcription and
translation of Whisper-like models. Whisper-Streaming uses a local agreement
policy with self-adaptive latency to enable streaming transcription. We show
that Whisper-Streaming achieves high quality and 3.3 seconds latency on an
unsegmented long-form speech transcription test set, and we demonstrate its
robustness and practical usability as a component in a live transcription service
at a multilingual conference.
[COMMENTS]
IJCNLP-AACL 2023 system demonstration
[LINK]
http://arxiv.org/abs/2307.14743v2
[DATE]
2023-09-21 17:41:17+08:00
[CATEGORIES]
cs.CL
Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task
[AUTHORS]
Ricardo Rei, Nuno M. Guerreiro, José Pombal, Daan van Stigt, Marcos Treviso, Luisa Coheur, José G. C. de Souza, André F. T. Martins
[LINK]
http://arxiv.org/abs/2309.11925v1
[DATE]
2023-09-21 17:38:56+08:00
[CATEGORIES]
cs.CL
ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
[AUTHORS]
Pengfei Zhu, Chao Pang, Yekun Chai, Lei Li, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu
[ABSTRACT]
In recent years, the burgeoning interest in diffusion models has led to
significant advances in image and speech generation. Nevertheless, the direct
synthesis of music waveforms from unrestricted textual prompts remains a
relatively underexplored domain. In response to this lacuna, this paper
introduces a pioneering contribution in the form of a text-to-waveform music
generation model, underpinned by the utilization of diffusion models. Our
methodology hinges on the innovative incorporation of free-form textual prompts
as conditional factors to guide the waveform generation process within the
diffusion model framework. Addressing the challenge of limited text-music
parallel data, we undertake the creation of a dataset by harnessing web
resources, a task facilitated by weak supervision techniques. Furthermore, a
rigorous empirical inquiry is undertaken to contrast the efficacy of two
distinct prompt formats for text conditioning, namely, music tags and
unconstrained textual descriptions. The outcomes of this comparative analysis
affirm the superior performance of our proposed model in terms of enhancing
text-music relevance. Finally, our work culminates in a demonstrative
exhibition of the excellent capabilities of our model in text-to-music
generation. We further demonstrate that our generated music in the waveform
domain outperforms previous works by a large margin in terms of diversity,
quality, and text-music relevance.
[COMMENTS]
Accepted by AACL demo 2023
[LINK]
http://arxiv.org/abs/2302.04456v2
[DATE]
2023-09-21 17:30:00+08:00
[CATEGORIES]
cs.CL
InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework
[AUTHORS]
Shanglin Lei, Guanting Dong, Xiaoping Wang, Keheng Wang, Sirui Wang
[ABSTRACT]
The development of emotion recognition in dialogue (ERC) has been
consistently hindered by the complexity of pipeline designs, leading to ERC
models that often overfit to specific datasets and dialogue patterns. In this
study, we propose a novel approach, namely InstructERC, which reformulates
the ERC task from a discriminative framework to a generative framework based
on Large Language Models (LLMs). InstructERC has
two significant contributions: Firstly, InstructERC introduces a simple yet
effective retrieval template module, which helps the model explicitly integrate
multi-granularity dialogue supervision information by concatenating the
historical dialog content, label statement, and emotional domain demonstrations
with high semantic similarity. Furthermore, we introduce two additional emotion
alignment tasks, namely speaker identification and emotion prediction tasks, to
implicitly model the dialogue role relationships and future emotional
tendencies in conversations. Our LLM-based plug-and-play plugin framework
significantly outperforms all previous models and achieves comprehensive SOTA
on three commonly used ERC datasets. Extensive analysis of parameter-efficient
and data-scaling experiments provides empirical guidance for applying
InstructERC in practical scenarios. Our code will be released after blind
review.
[LINK]
http://arxiv.org/abs/2309.11911v1
[DATE]
2023-09-21 17:22:07+08:00
[CATEGORIES]
cs.CL
Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Speech Detection
[AUTHORS]
Sarah Masud, Ashutosh Bajpai, Tanmoy Chakraborty
[ABSTRACT]
Although pre-trained large language models (PLMs) have achieved
state-of-the-art on many NLP tasks, they lack understanding of subtle
expressions of implicit hate speech. Such nuanced and implicit hate is often
misclassified as non-hate. Various attempts have been made to enhance the
detection of (implicit) hate content by augmenting external context or
enforcing label separation via distance-based metrics. We combine these two
approaches and introduce FiADD, a novel Focused Inferential Adaptive Density
Discrimination framework. FiADD enhances the PLM finetuning pipeline by
bringing the surface form of an implicit hate speech closer to its implied form
while increasing the inter-cluster distance among various class labels. We test
FiADD on three implicit hate datasets and observe significant improvement in
the two-way and three-way hate classification tasks. We further experiment on
the generalizability of FiADD on three other tasks, namely detecting sarcasm,
irony, and stance, in which surface and implied forms differ, and observe
similar performance improvement. We analyze the generated latent space to
understand its evolution under FiADD, which corroborates the advantage of
employing FiADD for implicit hate speech detection.
[COMMENTS]
21 pages, 6 Figures and 9 Tables
[LINK]
http://arxiv.org/abs/2309.11896v1
[DATE]
2023-09-21 16:59:24+08:00
[CATEGORIES]
cs.CL
Is It Really Useful to Jointly Parse Constituency and Dependency Trees? A Revisit
[AUTHORS]
Yanggang Gu, Yang Hou, Zhefeng Wang, Xinyu Duan, Zhenghua Li
[ABSTRACT]
This work visits the topic of jointly parsing constituency and dependency
trees, i.e., to produce compatible constituency and dependency trees
simultaneously for input sentences, which is attractive considering that the
two types of trees are complementary in representing syntax. Compared with
previous works, we make progress in four aspects: (1) adopting a much more
efficient decoding algorithm, (2) exploring joint modeling at the training
phase, instead of only at the inference phase, (3) proposing high-order scoring
components for constituent-dependency interaction, (4) gaining more insights
via in-depth experiments and analysis.
[LINK]
http://arxiv.org/abs/2309.11888v1
[DATE]
2023-09-21 16:45:41+08:00
[CATEGORIES]
cs.CL
Scope is all you need: Transforming LLMs for HPC Code
[AUTHORS]
Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Abdul Wasay, Nesreen Ahmed, Ted Willke, Guy Tamir, Yuval Pinter, Timothy Mattson, Gal Oren
[ABSTRACT]
With easier access to powerful compute resources, there is a growing trend in
the field of AI for software development to develop larger and larger language
models (LLMs) to address a variety of programming tasks. Even LLMs applied to
tasks from the high-performance computing (HPC) domain are huge in size (e.g.,
billions of parameters) and demand expensive compute resources for training. We
found this design choice confusing - why do we need large LLMs trained on
natural languages and programming languages unrelated to HPC for HPC-specific
tasks? In this line of work, we aim to question design choices made by existing
LLMs by developing smaller LLMs for specific domains - we call them
domain-specific LLMs. Specifically, we start off with HPC as a domain and
propose a novel tokenizer named Tokompiler, designed specifically for
preprocessing code in HPC and compilation-centric tasks. Tokompiler leverages
knowledge of language primitives to generate language-oriented tokens,
providing a context-aware understanding of code structure while avoiding human
semantics attributed to code structures completely. We applied Tokompiler to
pre-train two state-of-the-art models, SPT-Code and Polycoder, for a Fortran
code corpus mined from GitHub. We evaluate the performance of these models
against the conventional LLMs. Results demonstrate that Tokompiler
significantly enhances code completion accuracy and semantic understanding
compared to traditional tokenizers in normalized-perplexity tests, down to ~1
perplexity score. This research opens avenues for further advancements in
domain-specific LLMs, catering to the unique demands of HPC and compilation
tasks.
[LINK]
http://arxiv.org/abs/2308.09440v2
[DATE]
2023-09-21 16:17:51+08:00
[CATEGORIES]
cs.CL
Syntactic Variation Across the Grammar: Modelling a Complex Adaptive System
[AUTHORS]
Jonathan Dunn
[ABSTRACT]
While language is a complex adaptive system, most work on syntactic variation
observes a few individual constructions in isolation from the rest of the
grammar. This means that the grammar, a network which connects thousands of
structures at different levels of abstraction, is reduced to a few disconnected
variables. This paper quantifies the impact of such reductions by
systematically modelling dialectal variation across 49 local populations of
English speakers in 16 countries. We perform dialect classification with both
an entire grammar as well as with isolated nodes within the grammar in order to
characterize the syntactic differences between these dialects. The results
show, first, that many individual nodes within the grammar are subject to
variation but, in isolation, none perform as well as the grammar as a whole.
This indicates that an important part of syntactic variation consists of
interactions between different parts of the grammar. Second, the results show
that the similarity between dialects depends heavily on the sub-set of the
grammar being observed: for example, New Zealand English could be more similar
to Australian English in phrasal verbs but at the same time more similar to UK
English in dative phrases.
[LINK]
http://arxiv.org/abs/2309.11869v1
[DATE]
2023-09-21 16:14:34+08:00
[CATEGORIES]
cs.CL
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
[AUTHORS]
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang
[ABSTRACT]
This paper explores predicting suitable prosodic features for fine-grained
emotion analysis from the discourse-level text. To obtain fine-grained
emotional prosodic features as predictive values for our model, we extract a
phoneme-level Local Prosody Embedding sequence (LPEs) and a Global Style
Embedding as prosodic speech features from the speech with the help of a style
transfer model. We propose a Discourse-level Multi-scale text Prosodic Model
(D-MPM) that exploits multi-scale text to predict these two prosodic features.
The proposed model can be used to better analyze emotional prosodic features
and thus guide the speech synthesis model to synthesize more expressive speech.
To quantitatively evaluate the proposed model, we contribute a new
large-scale Discourse-level Chinese Audiobook (DCA) dataset with more than
13,000 annotated utterance sequences.
Experimental results on the DCA dataset show that the multi-scale text
information effectively helps to predict prosodic features, and the
discourse-level text improves both the overall coherence and the user
experience. More interestingly, although we aim at the synthesis effect of the
style transfer model, the synthesized speech by the proposed text prosodic
analysis model is even better than the style transfer from the original speech
in some user evaluation indicators.
[COMMENTS]
ChinaMM 2023
[LINK]
http://arxiv.org/abs/2309.11849v1
[DATE]
2023-09-21 15:45:44+08:00
[CATEGORIES]
cs.CL
Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues
[AUTHORS]
Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, Svetlana Stoyanchev
[ABSTRACT]
In this paper, we investigate the use of large language models (LLMs) like
ChatGPT for document-grounded response generation in the context of
information-seeking dialogues. For evaluation, we use the MultiDoc2Dial corpus
of task-oriented dialogues in four social service domains previously used in
the DialDoc 2022 Shared Task. Information-seeking dialogue turns are grounded
in multiple documents providing relevant information. We generate dialogue
completion responses by prompting a ChatGPT model, using two methods:
ChatCompletion and LlamaIndex. ChatCompletion uses knowledge from ChatGPT
model pretraining while LlamaIndex also extracts relevant information from
documents. Observing that document-grounded response generation via LLMs cannot
be adequately assessed by automatic evaluation metrics, as the generated
responses are significantly more verbose, we perform a human evaluation where
annotators rate the output of the shared task winning system, the outputs of
the two ChatGPT variants, and human responses. While both ChatGPT variants are
more likely to include information not present in the relevant segments,
possibly including hallucinations, they are rated higher than both the shared
task winning system and human responses.
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2309.11838v1
[DATE]
2023-09-21 15:28:03+08:00
[CATEGORIES]
cs.CL
A Chinese Prompt Attack Dataset for LLMs with Evil Content
[AUTHORS]
Chengyuan Liu, Fubang Zhao, Lizhi Qing, Yangyang Kang, Changlong Sun, Kun Kuang, Fei Wu
[ABSTRACT]
Large Language Models (LLMs) show significant advantages in text
understanding and generation. However, LLMs risk generating harmful content,
especially when deployed in applications. There are
several black-box attack methods, such as Prompt Attack, which can change the
behaviour of LLMs and induce them to generate unexpected answers with harmful
content. Researchers are interested in Prompt Attack and Defense with LLMs,
but there is no publicly available dataset for evaluating the ability to
defend against prompt attacks. In this paper, we introduce a Chinese Prompt
Attack Dataset for LLMs, called CPAD. Our prompts aim to induce LLMs to generate
unexpected outputs with several carefully designed prompt attack approaches and
attacking contents of wide concern. Unlike previous datasets involving
safety estimation, we construct the prompts considering three dimensions:
contents, attacking methods and goals, so that the responses can be easily
evaluated and analysed. We run several well-known Chinese LLMs on our dataset,
and the results show that our prompts are significantly harmful to LLMs, with
an attack success rate of around 70%. We will release CPAD to encourage further
studies on prompt attack and defense.
[LINK]
http://arxiv.org/abs/2309.11830v1
[DATE]
2023-09-21 15:07:49+08:00
[CATEGORIES]
cs.CL
Word Embedding with Neural Probabilistic Prior
[AUTHORS]
Shaogang Ren, Dingcheng Li, Ping Li
[ABSTRACT]
To improve word representation learning, we propose a probabilistic prior
which can be seamlessly integrated with word embedding models. Different from
previous methods, word embedding is taken as a probabilistic generative model,
and it enables us to impose a prior regularizing word representation learning.
The proposed prior not only enhances the representation of embedding vectors
but also improves the model’s robustness and stability. The structure of the
proposed prior is simple and effective, and it can be easily implemented and
flexibly plugged in most existing word embedding models. Extensive experiments
show the proposed method improves word representation on various tasks.
[LINK]
http://arxiv.org/abs/2309.11824v1
[DATE]
2023-09-21 14:54:32+08:00
[CATEGORIES]
cs.CL
SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features
[AUTHORS]
Zhaoyi Wang, Zhenyang Zhang, Jiaxin Qin, Mizuho Iwaihara
[ABSTRACT]
Wikipedia articles are hierarchically organized through categories and lists,
providing one of the most comprehensive and universal taxonomies, but its open
creation causes redundancies and inconsistencies. Assigning DBpedia classes
to Wikipedia categories and lists can alleviate the problem, realizing a large
knowledge graph which is essential for categorizing digital contents through
entity linking and typing. However, the existing approach of CaLiGraph
produces incomplete and non-fine-grained mappings. In this paper, we tackle
the problem as ontology alignment, where structural information of knowledge
graphs and lexical and semantic features of ontology class names are utilized
to discover confident mappings, which are in turn utilized for fine-tuning
pretrained language models in a distant supervision fashion. Our method SLHCat
consists of two main parts: 1) automatically generating training data by
leveraging knowledge graph structure, semantic similarities, and named entity
typing; 2) fine-tuning and prompt-tuning the pre-trained language model BERT
over the training data to capture semantic and syntactic
properties of class names. Our model SLHCat is evaluated over a benchmark
dataset constructed by annotating 3000 fine-grained CaLiGraph-DBpedia mapping
pairs. SLHCat outperforms the baseline model by a large margin of 25% in
accuracy, offering a practical solution for large-scale ontology mapping.
[LINK]
http://arxiv.org/abs/2309.11791v1
[DATE]
2023-09-21 13:38:14+08:00
[CATEGORIES]
cs.CL
CPPF: A contextual and post-processing-free model for automatic speech recognition
[AUTHORS]
Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan
[ABSTRACT]
ASR systems have become increasingly widespread in recent years. However,
their textual outputs often require post-processing tasks before they can be
practically utilized. To address this issue, we draw inspiration from the
multifaceted capabilities of LLMs and Whisper, and focus on integrating
multiple ASR text processing tasks related to speech recognition into the ASR
model. This integration not only shortens the multi-stage pipeline, but also
prevents the propagation of cascading errors, resulting in direct generation of
post-processed text. In this study, we focus on ASR-related processing tasks,
including Contextual ASR and multiple ASR post-processing tasks. To achieve
this objective, we introduce the CPPF model, which offers a versatile and
highly effective alternative to ASR processing. CPPF seamlessly integrates
these tasks without any significant loss in recognition performance.
[COMMENTS]
Submitted to ICASSP2024
[LINK]
http://arxiv.org/abs/2309.07413v2
[DATE]
2023-09-21 11:02:27+08:00
[CATEGORIES]
cs.CL
You Only Look at Screens: Multimodal Chain-of-Action Agents
[AUTHORS]
Zhuosheng Zhang, Aston Zhang
[ABSTRACT]
Autonomous user interface (UI) agents aim to facilitate task automation by
interacting with the user interface without manual intervention. Recent studies
have investigated eliciting the capabilities of large language models (LLMs)
for effective engagement in diverse environments. To align with the
input-output requirement of LLMs, existing approaches are developed under a
sandbox setting where they rely on external tools and application-specific APIs
to parse the environment into textual elements and interpret the predicted
actions. Consequently, those approaches often grapple with inference
inefficiency and error propagation risks. To mitigate the challenges, we
introduce Auto-UI, a multimodal solution that directly interacts with the
interface, bypassing the need for environment parsing or reliance on
application-dependent APIs. Moreover, we propose a chain-of-action technique –
leveraging a series of intermediate previous action histories and future action
plans – to help the agent decide what action to execute. We evaluate our
approach on a new device-control benchmark AITW with 30K unique instructions,
spanning multi-step tasks such as application operation, web searching, and web
shopping. Experimental results show that Auto-UI achieves state-of-the-art
performance with an action type prediction accuracy of 90% and an overall
action success rate of 74%. Code is publicly available at
https://github.com/cooelf/Auto-UI.
[COMMENTS]
21 pages, 10 figures
[LINK]
http://arxiv.org/abs/2309.11436v2
[DATE]
2023-09-21 11:00:07+08:00
[CATEGORIES]
cs.CL
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
[AUTHORS]
Shakila Mahjabin Tonni, Mark Dras
[ABSTRACT]
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how the nature of adversarial example subspaces in NLP
relates to those in image processing, and also how they differ depending on the
kind of NLP task.
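
As an illustration of the Mahalanobis-distance idea mentioned above, the
following is a minimal sketch, not the authors' detector. It assumes that
"clean" examples are summarized by the mean and covariance of some feature
vectors, and that a large distance from that distribution is treated as
suspicious. The function names and the toy 2-D data are hypothetical.

```python
from statistics import mean


def fit_gaussian(rows):
    """Return the mean vector and sample covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(d)]
    cov = [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return mu, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is non-singular (a singular covariance would divide by zero).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without an explicit inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, diff)
    return sum(d * yi for d, yi in zip(diff, y)) ** 0.5


# Toy "clean" representations (hypothetical data).
clean = [(0, 0), (2, 0), (0, 2), (2, 2)]
mu, cov = fit_gaussian(clean)
score = mahalanobis((3, 1), mu, cov)  # larger score = further from clean data
```

In practice the feature vectors would be learned representations from a model
layer, and a threshold on the score would have to be tuned on held-out data;
both choices are outside what the abstract specifies.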
[COMMENTS]
20 pages, Accepted long-paper IJCNLP_AACL 2023
[LINK]
http://arxiv.org/abs/2309.10916v2
[DATE]
2023-09-21 10:12:47+08:00
[CATEGORIES]
cs.LG
cs.CL
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
[AUTHORS]
Elisa Kreiss, Eric Zelikman, Christopher Potts, Nick Haber
[LINK]
http://arxiv.org/abs/2309.11710v1
[DATE]
2023-09-21 09:17:33+08:00
[CATEGORIES]
cs.CL
Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages
[AUTHORS]
Kyle Mahowald, Evgeniia Diachek, Edward Gibson, Evelina Fedorenko, Richard Futrell
[ABSTRACT]
Grammatical cues are sometimes redundant with word meanings in natural
language. For instance, English word order rules constrain the word order of a
sentence like “The dog chewed the bone” even though the status of “dog” as
subject and “bone” as object can be inferred from world knowledge and
plausibility. Quantifying how often this redundancy occurs, and how the level
of redundancy varies across typologically diverse languages, can shed light on
the function and evolution of grammar. To that end, we performed a behavioral
experiment in English and Russian and a cross-linguistic computational analysis
measuring the redundancy of grammatical cues in transitive clauses extracted
from corpus text. English and Russian speakers (n=484) were presented with
subjects, verbs, and objects (in random order and with morphological markings
removed) extracted from naturally occurring sentences and were asked to
identify which noun is the subject of the action. Accuracy was high in both
languages (~89% in English, ~87% in Russian). Next, we trained a neural network
machine classifier on a similar task: predicting which nominal in a
subject-verb-object triad is the subject. Across 30 languages from eight
language families, performance was consistently high: a median accuracy of 87%,
comparable to the accuracy observed in the human experiments. The conclusion is
that grammatical cues such as word order are necessary to convey subjecthood
and objecthood in a minority of naturally occurring transitive clauses;
nevertheless, they (a) can provide an important source of redundancy and (b)
are crucial for conveying intended meaning that cannot be inferred from the
words alone, including descriptions of human interactions, where roles are
often reversible (e.g., Ray helped Lu/Lu helped Ray), and expressing
non-prototypical meanings (e.g., “The bone chewed the dog.”).
[LINK]
http://arxiv.org/abs/2201.12911v3
[DATE]
2023-09-21 08:43:40+08:00
[CATEGORIES]
cs.CL
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
[AUTHORS]
Luiz Giordani, Gilsiley Darú, Rhenan Queiroz, Vitor Buzinaro, Davi Keglevich Neiva, Daniel Camilo Fuentes Guzmán, Marcos Jardel Henriques, Oilson Alberto Gonzatto Junior, Francisco Louzada
[ABSTRACT]
The proliferation of fake news has become a significant concern in recent
times due to its potential to spread misinformation and manipulate public
opinion. This paper presents a comprehensive study on detecting fake news in
Brazilian Portuguese, focusing on journalistic-type news. We propose a machine
learning-based approach that leverages natural language processing techniques,
including TF-IDF and Word2Vec, to extract features from textual data. We
evaluate the performance of various classification algorithms, such as logistic
regression, support vector machine, random forest, AdaBoost, and LightGBM, on a
dataset containing both true and fake news articles. The proposed approach
achieves high accuracy and F1-Score, demonstrating its effectiveness in
identifying fake news. Additionally, we developed a user-friendly web platform,
fakenewsbr.com, to facilitate the verification of news articles’ veracity. Our
platform provides real-time analysis, allowing users to assess the likelihood
of fake news articles. Through empirical analysis and comparative studies, we
demonstrate the potential of our approach to contribute to the fight against
the spread of fake news and promote more informed media consumption.
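
To make the TF-IDF feature extraction mentioned above concrete, here is a
minimal sketch under stated assumptions: whitespace tokenization, term
frequency normalized by document length, and an unsmoothed logarithmic inverse
document frequency. It is not the authors' pipeline, and the function name and
example headlines are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical two-document corpus.
headlines = ["governo anuncia medida", "governo nega boato"]
features = tfidf(headlines)
# Terms shared by every document (here "governo") get weight 0.
```

The resulting sparse vectors could then be fed to any of the classifiers the
abstract lists; library implementations typically add smoothing and
normalization that this sketch omits.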
[LINK]
http://arxiv.org/abs/2309.11052v2
[DATE]
2023-09-21 08:35:12+08:00
[CATEGORIES]
cs.CL
cs.LG
Memory-Augmented LLM Personalization with Short- and Long-Term Memory Coordination
[AUTHORS]
Kai Zhang, Fubang Zhao, Yangyang Kang, Xiaozhong Liu
[ABSTRACT]
Large Language Models (LLMs), such as GPT3.5, have exhibited remarkable
proficiency in comprehending and generating natural language. However, their
unpersonalized generation paradigm may result in suboptimal user-specific
outcomes. Typically, users converse differently based on their knowledge and
preferences. This necessitates the task of enhancing user-oriented LLMs, which
remains unexplored. While one can fully train an LLM for this objective, the
resource consumption is unaffordable. Prior research has explored memory-based
methods to store and retrieve knowledge to enhance generation without
retraining for new queries. However, we contend that a mere memory module is
inadequate to comprehend a user’s preference, and fully training an LLM can be
excessively costly. In this study, we propose a novel computational bionic
memory mechanism, equipped with a parameter-efficient fine-tuning schema, to
personalize LLMs. Our extensive experimental results demonstrate the
effectiveness and superiority of the proposed approach. To encourage further
research into this area, we are releasing a new conversation dataset generated
entirely by an LLM based on an open-source medical corpus, as well as our
implementation code.
[LINK]
http://arxiv.org/abs/2309.11696v1
[DATE]
2023-09-21 08:34:33+08:00
[CATEGORIES]
cs.CL
Semi-supervised News Discourse Profiling with Contrastive Learning
[AUTHORS]
Ming Li, Ruihong Huang
[ABSTRACT]
News Discourse Profiling seeks to scrutinize the event-related role of each
sentence in a news article and has been proven useful across various downstream
applications. Specifically, within the context of a given news discourse, each
sentence is assigned to a pre-defined category contingent upon its depiction of
the news event structure. However, existing approaches suffer from an
inadequacy of available human-annotated data, due to the laborious and
time-intensive nature of generating discourse-level annotations. In this paper,
we present a novel approach, denoted as Intra-document Contrastive Learning
with Distillation (ICLD), for addressing the news discourse profiling task,
capitalizing on its unique structural characteristics. Notably, we are the
first to apply a semi-supervised methodology within this task paradigm, and
evaluation demonstrates the effectiveness of the presented approach.
[COMMENTS]
IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.11692v1
[DATE]
2023-09-21 07:51:34+08:00
[CATEGORIES]
cs.CL
LLM Guided Inductive Inference for Solving Compositional Problems
[AUTHORS]
Abhigya Sodani, Lauren Moos, Matthew Mirman
[ABSTRACT]
While large language models (LLMs) have demonstrated impressive performance
in question-answering tasks, their performance is limited when the questions
require knowledge that is not included in the model’s training data and can
only be acquired through direct observation or interaction with the real world.
Existing methods decompose reasoning tasks through the use of modules invoked
sequentially, limiting their ability to answer deep reasoning tasks. We
introduce a method, Recursion based extensible LLM (REBEL), which handles
open-world, deep reasoning tasks by employing automated reasoning techniques
like dynamic planning and forward-chaining strategies. REBEL allows LLMs to
reason via recursive problem decomposition and utilization of external tools.
The tools that REBEL uses are specified only by natural language description.
We further demonstrate REBEL's capabilities on a set of problems that require a
deeply nested use of external tools in a compositional and conversational
setting.
[COMMENTS]
5 pages, ICML TEACH Workshop
[LINK]
http://arxiv.org/abs/2309.11688v1
[DATE]
2023-09-21 07:44:16+08:00
[CATEGORIES]
cs.CL
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
[AUTHORS]
Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla
[ABSTRACT]
Generative Large Language Models (LLMs) have achieved remarkable advancements
in various NLP tasks. However, these advances have not been reflected in the
translation task, especially for models of moderate size (i.e., 7B or 13B
parameters), which still lag behind conventional supervised encoder-decoder
translation models. Previous studies have attempted to improve the translation
capabilities of these moderate LLMs, but their gains have been limited. In this
study, we propose a novel fine-tuning approach for LLMs that is specifically
designed for the translation task, eliminating the need for the abundant
parallel data that traditional translation models usually depend on. Our
approach consists of two fine-tuning stages: initial fine-tuning on monolingual
data followed by subsequent fine-tuning on a small set of high-quality parallel
data. We introduce the LLM developed through this strategy as Advanced Language
Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our
results show that the model can achieve an average improvement of more than 12
BLEU and 12 COMET over its zero-shot performance across 10 translation
directions from the WMT’21 (2 directions) and WMT’22 (8 directions) test
datasets. The performance is significantly better than all prior work and even
superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or
13B parameters. This method establishes the foundation for a novel training
paradigm in machine translation.
[LINK]
http://arxiv.org/abs/2309.11674v1
[DATE]
2023-09-21 06:53:15+08:00
[CATEGORIES]
cs.CL
Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation
[AUTHORS]
Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly
[ABSTRACT]
Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used
to train forward and reverse neural models that generate text from KG and vice
versa. However, models trained on datasets where KG and text pairs are not
equivalent can suffer from more hallucination and poorer recall. In this paper,
we verify this empirically by generating datasets with different levels of
noise and find that noisier datasets do indeed lead to more hallucination. We
argue that the ability of forward and reverse models trained on a dataset to
cyclically regenerate source KG or text is a proxy for the equivalence between
the KG and the text in the dataset. Using cyclic evaluation we find that
manually created WebNLG is much better than automatically created TeKGen and
T-REx. Guided by these observations, we construct a new, improved dataset
called LAGRANGE using heuristics meant to improve equivalence between KG and
text and show the impact of each of the heuristics on cyclic evaluation. We
also construct two synthetic datasets using large language models (LLMs), and
observe that these are conducive to models that perform well on
cyclic generation of text, but less so on cyclic generation of KGs, probably
because of a lack of a consistent underlying ontology.
[COMMENTS]
16 pages
[LINK]
http://arxiv.org/abs/2309.11669v1
[DATE]
2023-09-21 06:30:20+08:00
[CATEGORIES]
cs.CL
Towards Effective Disambiguation for Machine Translation with Large Language Models
[AUTHORS]
Vivek Iyer, Pinzhen Chen, Alexandra Birch
[ABSTRACT]
Resolving semantic ambiguity has long been recognised as a central challenge
in the field of machine translation. Recent work on benchmarking translation
performance on ambiguous sentences has exposed the limitations of conventional
Neural Machine Translation (NMT) systems, which fail to capture many of these
cases. Large language models (LLMs) have emerged as a promising alternative,
demonstrating comparable performance to traditional NMT models while
introducing new paradigms for controlling the target outputs. In this paper, we
study the capabilities of LLMs to translate ambiguous sentences containing
polysemous words and rare word senses. We also propose two ways to improve the
handling of such ambiguity through in-context learning and fine-tuning on
carefully curated ambiguous datasets. Experiments show that our methods can
match or outperform state-of-the-art systems such as DeepL and NLLB in four out
of five language directions. Our research provides valuable insights into
effectively adapting LLMs for disambiguation during machine translation.
[COMMENTS]
10 pages, 3 figures
[LINK]
http://arxiv.org/abs/2309.11668v1
[DATE]
2023-09-21 06:22:52+08:00
[CATEGORIES]
cs.CL
Faithful Chain-of-Thought Reasoning
[AUTHORS]
Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch
[ABSTRACT]
While Chain-of-Thought (CoT) prompting boosts Language Models’ (LM)
performance on a gamut of complex reasoning tasks, the generated reasoning
chain does not necessarily reflect how the model arrives at the answer (a
property known as faithfulness). We propose Faithful CoT, a reasoning framework involving two
stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning
chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM
and a deterministic solver respectively. This guarantees that the reasoning
chain provides a faithful explanation of the final answer. Aside from
interpretability, Faithful CoT also improves empirical performance: it
outperforms standard CoT on 9 of 10 benchmarks from 4 diverse domains, with a
relative accuracy gain of 6.3% on Math Word Problems (MWP), 3.4% on Planning,
5.5% on Multi-hop Question Answering (QA), and 21.4% on Relational Inference.
Furthermore, with GPT-4 and Codex, it sets the new state-of-the-art few-shot
performance on 7 datasets (with 95.0+ accuracy on 6 of them), showing a strong
synergy between faithfulness and accuracy.
[COMMENTS]
IJCNLP-AACL 2023 camera-ready version
[LINK]
http://arxiv.org/abs/2301.13379v3
[DATE]
2023-09-21 06:19:30+08:00
[CATEGORIES]
cs.CL
Hate speech detection in algerian dialect using deep learning
[AUTHORS]
Dihia Lanasri, Juan Olano, Sifal Klioui, Sin Liang Lee, Lamia Sekkai
[ABSTRACT]
With the proliferation of hate speech on social networks in different
forms, such as abusive language, cyberbullying, and violence, people
have experienced a significant increase in violence, exposing them to
uncomfortable situations and threats. Considerable effort has been dedicated in
the last few years to countering this phenomenon by detecting hate speech in
different structured languages like English, French, Arabic, and others.
However, few works deal with Arabic dialects like Tunisian,
Egyptian, and Gulf, and the Algerian dialect in particular. To fill this gap,
we propose in this work a complete approach for detecting hate speech in online
Algerian messages. Many deep learning architectures have been evaluated on the
corpus we created from some Algerian social networks (Facebook, YouTube, and
Twitter). This corpus contains more than 13.5K documents in Algerian dialect
written in Arabic, labeled as hateful or non-hateful. Promising results are
obtained, which show the efficiency of our approach.
[LINK]
http://arxiv.org/abs/2309.11611v1
[DATE]
2023-09-21 03:54:48+08:00
[CATEGORIES]
cs.CL
Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
[AUTHORS]
Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song
[ABSTRACT]
Instruction-tuned Large Language Models (LLMs) have exhibited impressive
language understanding and the capacity to generate responses that follow
specific prompts. However, due to the computational demands associated with
training these models, their applications often adopt a zero-shot setting. In
this paper, we evaluate the zero-shot performance of two publicly accessible
LLMs, ChatGPT and OpenAssistant, in the context of six Computational Social
Science classification tasks, while also investigating the effects of various
prompting strategies. Our experiments investigate the impact of prompt
complexity, including the effect of incorporating label definitions into the
prompt; use of synonyms for label names; and the influence of integrating past
memories during foundation model training. The findings indicate that in a
zero-shot setting, current LLMs are unable to match the performance of smaller,
fine-tuned baseline transformer models (such as BERT-large). Additionally, we
find that different prompting strategies can significantly affect
classification accuracy, with variations in accuracy and F1 scores exceeding
10%.
[LINK]
http://arxiv.org/abs/2305.14310v2
[DATE]
2023-09-21 03:53:48+08:00
[CATEGORIES]
cs.CL
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
[AUTHORS]
Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà
[ABSTRACT]
Speech-to-Speech and Speech-to-Text translation are currently dynamic areas
of research. To contribute to these fields, we present SpeechAlign, a framework
to evaluate the underexplored field of source-target alignment in speech
models. Our framework has two core components. First, to tackle the absence of
suitable evaluation datasets, we introduce the Speech Gold Alignment dataset,
built upon an English-German text translation gold alignment dataset. Secondly,
we introduce two novel metrics, Speech Alignment Error Rate (SAER) and
Time-weighted Speech Alignment Error Rate (TW-SAER), to evaluate alignment
quality in speech models. By publishing SpeechAlign we provide an accessible
evaluation framework for model assessment, and we employ it to benchmark
open-source Speech Translation models.
[LINK]
http://arxiv.org/abs/2309.11585v1
[DATE]
2023-09-21 02:46:37+08:00
[CATEGORIES]
cs.CL
SignBank+: Multilingual Sign Language Translation Dataset
[AUTHORS]
Amit Moryossef, Zifan Jiang
[ABSTRACT]
This work advances the field of sign language machine translation by focusing
on dataset quality and simplification of the translation system. We introduce
SignBank+, a clean version of the SignBank dataset, optimized for machine
translation. Contrary to previous works that employ complex factorization
techniques for translation, we advocate for a simplified text-to-text
translation approach. Our evaluation shows that models trained on SignBank+
surpass those on the original dataset, establishing a new benchmark and
providing an open resource for future research.
[LINK]
http://arxiv.org/abs/2309.11566v1
[DATE]
2023-09-21 02:08:28+08:00
[CATEGORIES]
cs.CL
DreamLLM: Synergistic Multimodal Comprehension and Creation
[AUTHORS]
Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, Hongyu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
[ABSTRACT]
This paper presents DreamLLM, a learning framework that first achieves
versatile Multimodal Large Language Models (MLLMs) empowered with frequently
overlooked synergy between multimodal comprehension and creation. DreamLLM
operates on two fundamental principles. The first focuses on the generative
modeling of both language and image posteriors by direct sampling in the raw
multimodal space. This approach circumvents the limitations and information
loss inherent to external feature extractors like CLIP, and a more thorough
multimodal understanding is obtained. Second, DreamLLM fosters the generation
of raw, interleaved documents, modeling both text and image contents, along
with unstructured layouts. This allows DreamLLM to learn all conditional,
marginal, and joint multimodal distributions effectively. As a result, DreamLLM
is the first MLLM capable of generating free-form interleaved content.
Comprehensive experiments highlight DreamLLM’s superior performance as a
zero-shot multimodal generalist, reaping from the enhanced learning synergy.
[COMMENTS]
see project page at https://dreamllm.github.io/
[LINK]
http://arxiv.org/abs/2309.11499v1
[DATE]
2023-09-21 01:58:05+08:00
[CATEGORIES]
cs.CL
cs.LG
Chain-of-Verification Reduces Hallucination in Large Language Models
[AUTHORS]
Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
[ABSTRACT]
Generation of plausible yet incorrect factual information, termed
hallucination, is an unsolved issue in large language models. We study the
ability of language models to deliberate on the responses they give in order to
correct their mistakes. We develop the Chain-of-Verification (CoVe) method
whereby the model first (i) drafts an initial response; then (ii) plans
verification questions to fact-check its draft; (iii) answers those questions
independently so the answers are not biased by other responses; and (iv)
generates its final verified response. In experiments, we show CoVe decreases
hallucinations across a variety of tasks, including list-based questions from
Wikidata, closed-book MultiSpanQA, and long-form text generation.
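The four CoVe stages listed in the abstract can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the `generate` callable stands in for any text-generation function, and the prompt wording and the one-question-per-line format are assumptions made here for the example.

```python
from typing import Callable, List


def chain_of_verification(query: str, generate: Callable[[str], str]) -> str:
    """Run the four CoVe stages with a caller-supplied text generator.

    `generate` is a hypothetical stand-in for an LLM call; the prompts
    below are illustrative and not taken from the paper.
    """
    # (i) Draft an initial response.
    draft = generate(f"Answer the question.\nQuestion: {query}")

    # (ii) Plan verification questions that fact-check the draft.
    plan = generate(
        "List one verification question per line for this draft.\n"
        f"Question: {query}\nDraft: {draft}"
    )
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # (iii) Answer each question independently: the prompt deliberately
    # omits the draft and the other answers, so they cannot bias the result.
    answers = [generate(f"Answer concisely.\nQuestion: {q}") for q in questions]

    # (iv) Generate the final response conditioned on the verification results.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    return generate(
        "Revise the draft so it agrees with the verified facts.\n"
        f"Question: {query}\nDraft: {draft}\nVerification:\n{evidence}"
    )
```

The design point the sketch tries to capture is stage (iii): each verification prompt contains only the question itself, which is one simple way to keep the answers from being biased by the draft.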
[LINK]
http://arxiv.org/abs/2309.11495v1
[DATE]
2023-09-21 01:50:55+08:00
[CATEGORIES]
cs.CL
MasakhaNEWS: News Topic Classification for African languages
[AUTHORS]
David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, Doreen Nixdorf, Chris Chinenye Emezue, sana al-azzawi, Blessing Sibanda, Davis David, Lolwethu Ndolela, Jonathan Mukiibi, Tunde Ajayi, Tatiana Moteu, Brian Odhiambo, Abraham Owodunni, Nnaemeka Obiefuna, Muhidin Mohamed, Shamsuddeen Hassan Muhammad, Teshome Mulugeta Ababu, Saheed Abdullahi Salahudeen, Mesay Gemeda Yigezu, Tajuddeen Gwadabe, Idris Abdulmumin, Mahlet Taye, Oluwabusayo Awoyomi, Iyanuoluwa Shode, Tolulope Adelani, Habiba Abdulganiyu, Abdul-Hakeem Omotayo, Adetola Adeeko, Abeeb Afolabi, Anuoluwapo Aremu, Olanrewaju Samuel, Clemencia Siro, Wangari Kimotho, Onyekachi Ogbu, Chinedu Mbonu, Chiamaka Chukwuneke, Samuel Fanijo, Jessica Ojo, Oyinkansola Awosan, Tadesse Kebede, Toadoum Sari Sakayo, Pamela Nyatsine, Freedmore Sidume, Oreen Yousuf, Mardiyyah Oduwole, Tshinu Tshinu, Ussen Kimanuka, Thina Diko, Siyanda Nxakama, Sinodos Nigusse, Abdulmejid Johar, Shafie Mohamed, Fuad Mire Hassan, Moges Ahmed Mehamed, Evrard Ngabire, Jules Jules, Ivan Ssenkungu, Pontus Stenetorp
[ABSTRACT]
African languages are severely under-represented in NLP research due to lack
of datasets covering several NLP tasks. While there are individual language
specific datasets that are being expanded to different tasks, only a handful of
NLP tasks (e.g. named entity recognition and machine translation) have
standardized benchmark datasets covering several geographically and
typologically diverse African languages. In this paper, we develop MasakhaNEWS
– a new benchmark dataset for news topic classification covering 16 languages
widely spoken in Africa. We provide an evaluation of baseline models by
training classical machine learning models and fine-tuning several language
models. Furthermore, we explore several alternatives to full fine-tuning of
language models that are better suited for zero-shot and few-shot learning such
as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern
exploiting training (PET), prompting language models (like ChatGPT), and
prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API).
Our evaluation in zero-shot setting shows the potential of prompting ChatGPT
for news topic classification in low-resource African languages, achieving an
average performance of 70 F1 points without leveraging additional supervision
like MAD-X. In few-shot setting, we show that with as little as 10 examples per
label, we achieved more than 90\% (i.e. 86.0 F1 points) of the performance of
full supervised training (92.6 F1 points) leveraging the PET approach.
[COMMENTS]
Accepted to IJCNLP-AACL 2023 (main conference)
[LINK]
http://arxiv.org/abs/2304.09972v2
[DATE]
2023-09-21 01:14:40+08:00
[CATEGORIES]
cs.CL
Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction
[AUTHORS]
Masahiro Kaneko, Naoaki Okazaki
[ABSTRACT]
In Grammatical Error Correction (GEC), it is crucial to ensure the user’s
comprehension of a reason for correction. Existing studies present tokens,
examples, and hints as to the basis for correction but do not directly explain
the reasons for corrections. Although methods that use Large Language Models
(LLMs) to provide direct explanations in natural language have been proposed
for various tasks, no such method exists for GEC. Generating explanations for
GEC corrections involves aligning input and output tokens, identifying
correction points, and presenting corresponding explanations consistently.
However, it is not straightforward to specify a complex format to generate
explanations, because explicit control of generation is difficult with prompts.
This study introduces a method called controlled generation with Prompt
Insertion (PI) so that LLMs can explain the reasons for corrections in natural
language. In PI, LLMs first correct the input text, and then we automatically
extract the correction points based on the rules. The extracted correction
points are sequentially inserted into the LLM’s explanation output as prompts,
guiding the LLMs to generate explanations for the correction points. We also
create an Explainable GEC (XGEC) dataset of correction reasons by annotating
NUCLE, CoNLL2013, and CoNLL2014. Although generations from GPT-3 and ChatGPT
using original prompts miss some correction points, the generation control
using PI can explicitly guide to describe explanations for all correction
points, contributing to improved performance in generating correction reasons.
[COMMENTS]
Work in progress
[LINK]
http://arxiv.org/abs/2309.11439v1
[DATE]
2023-09-21 00:14:10+08:00
[CATEGORIES]
cs.CL
Optimal Conditional Inference in Adaptive Experiments
[AUTHORS]
Jiafeng Chen, Isaiah Andrews
[ABSTRACT]
We study batched bandit experiments and consider the problem of inference
conditional on the realized stopping time, assignment probabilities, and target
parameter, where all of these may be chosen adaptively using information up to
the last batch of the experiment. Absent further restrictions on the
experiment, we show that inference using only the results of the last batch is
optimal. When the adaptive aspects of the experiment are known to be
location-invariant, in the sense that they are unchanged when we shift all
batch-arm means by a constant, we show that there is additional information in
the data, captured by one additional linear function of the batch-arm means. In
the more restrictive case where the stopping time, assignment probabilities,
and target parameter are known to depend on the data only through a collection
of polyhedral events, we derive computationally tractable and optimal
conditional inference procedures.
[COMMENTS]
An extended abstract of this paper was presented at CODE@MIT 2021
[LINK]
http://arxiv.org/abs/2309.12162v1
[DATE]
2023-09-21 23:17:38+08:00
[CATEGORIES]
cs.LG
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features
[AUTHORS]
Travis Zhang, Katie Luo, Cheng Perng Phoo, Yurong You, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
[ABSTRACT]
The rapid development of 3D object detection systems for self-driving cars
has significantly improved accuracy. However, these systems struggle to
generalize across diverse driving environments, which can lead to
safety-critical failures in detecting traffic participants. To address this, we
propose a method that utilizes unlabeled repeated traversals of multiple
locations to adapt object detectors to new driving environments. By
incorporating statistics computed from repeated LiDAR scans, we guide the
adaptation process effectively. Our approach enhances LiDAR-based detection
models using spatial quantized historical features and introduces a lightweight
regression head to leverage the statistics for feature regularization.
Additionally, we leverage the statistics for a novel self-training process to
stabilize the training. The framework is detector model-agnostic and
experiments on real-world datasets demonstrate significant improvements,
achieving up to a 20-point performance gain, especially in detecting
pedestrians and distant objects. Code is available at
https://github.com/zhangtravis/Hist-DA.
[LINK]
http://arxiv.org/abs/2309.12140v1
[DATE]
2023-09-21 23:00:31+08:00
[CATEGORIES]
cs.LG
ALI-DPFL: Differentially Private Federated Learning with Adaptive Local Iterations
[AUTHORS]
Xinpeng Ling, Jie Fu, Kuncan Wang, Haitao Liu, Zhili Chen
[ABSTRACT]
Federated Learning (FL) is a distributed machine learning technique that
allows model training among multiple devices or organizations by sharing
training parameters instead of raw data. However, adversaries can still infer
individual information through inference attacks (e.g. differential attacks) on
these training parameters. As a result, Differential Privacy (DP) has been
widely used in FL to prevent such attacks. We consider differentially private
federated learning in a resource-constrained scenario, where both privacy
budget and communication rounds are constrained. By theoretically analyzing the
convergence, we can find the optimal number of differentially private local
iterations for clients between any two sequential global updates. Based on
this, we design an algorithm of differentially private federated learning with
adaptive local iterations (ALI-DPFL). We evaluate our algorithm on the
FashionMNIST and CIFAR10 datasets, and demonstrate significantly better
performance than previous work in the resource-constrained scenario.
[LINK]
http://arxiv.org/abs/2308.10457v2
[DATE]
2023-09-21 22:59:28+08:00
[CATEGORIES]
cs.LG
Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation
[AUTHORS]
Victor Gallego
[ABSTRACT]
Recently, large multimodal models, such as CLIP and Stable Diffusion, have
experienced tremendous success in both foundations and applications.
However, as these models increase in parameter size and computational
requirements, it becomes more challenging for users to personalize them for
specific tasks or preferences. In this work, we address the problem of adapting
the previous models towards sets of particular human preferences, aligning the
retrieved or generated images with the preferences of the user. We leverage the
Bradley-Terry preference model to develop a fast adaptation method that
efficiently fine-tunes the original model, with few examples and with minimal
computing resources. Extensive evidence of the capabilities of this framework
is provided through experiments in different domains related to multimodal text
and image understanding, including preference prediction as a reward model, and
generation tasks.
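For reference, the standard Bradley-Terry model that the abstract builds on can be written in a few lines. This is a minimal sketch of the textbook formulation only: how the scores are produced (for example, from CLIP similarities) is not specified in the abstract, so plain scalar scores are assumed here, and the function names are hypothetical.

```python
import math


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that item A is preferred over item B."""
    # P(A > B) = exp(s_A) / (exp(s_A) + exp(s_B)) = sigmoid(s_A - s_B)
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def bt_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of one observed preference pair."""
    # -log sigmoid(d) = log(1 + exp(-d)), with d the score margin.
    return math.log1p(math.exp(-(score_preferred - score_rejected)))
```

Equal scores give a preference probability of 0.5, and the loss shrinks as the preferred item's score rises above the rejected one, which is the signal a fine-tuning procedure of this kind would follow.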
[COMMENTS]
Accepted to Proceedings of the 23rd European Young Statisticians
Meeting (EYSM)
[LINK]
http://arxiv.org/abs/2308.07929v2
[DATE]
2023-09-21 22:53:31+08:00
[CATEGORIES]
cs.LG
Compositional Foundation Models for Hierarchical Planning
[AUTHORS]
Anurag Ajay, Seungwook Han, Yilun Du, Shuang Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal
[ABSTRACT]
To make effective decisions in novel environments with long-horizon goals, it
is crucial to engage in hierarchical reasoning across spatial and temporal
scales. This entails planning abstract subgoal sequences, visually reasoning
about the underlying plans, and executing actions in accordance with the
devised plan through visual-motor control. We propose Compositional Foundation
Models for Hierarchical Planning (HiP), a foundation model which leverages
multiple expert foundation models, trained individually on language, vision and
action data, jointly to solve long-horizon tasks. We use a large
language model to construct symbolic plans that are grounded in the environment
through a large video diffusion model. Generated video plans are then grounded
to visual-motor control, through an inverse dynamics model that infers actions
from generated videos. To enable effective reasoning within this hierarchy, we
enforce consistency between the models via iterative refinement. We illustrate
the efficacy and adaptability of our approach in three different long-horizon
table-top manipulation tasks.
[COMMENTS]
Website: https://hierarchical-planning-foundation-model.github.io/
[LINK]
http://arxiv.org/abs/2309.08587v2
[DATE]
2023-09-21 22:49:20+08:00
[CATEGORIES]
cs.LG
Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems
[AUTHORS]
Nathan Buskulic, Jalal Fadili, Yvain Quéau
[ABSTRACT]
Neural networks have become a prominent approach to solve inverse problems in
recent years. While a plethora of such methods was developed to solve inverse
problems empirically, we are still lacking clear theoretical guarantees for
these methods. On the other hand, many works proved convergence to optimal
solutions of neural networks in a more general setting using
overparametrization as a way to control the Neural Tangent Kernel. In this work
we investigate how to bridge these two worlds and we provide deterministic
convergence and recovery guarantees for the class of unsupervised feedforward
multilayer neural networks trained to solve inverse problems. We also derive
overparametrization bounds under which a two-layer Deep Inverse Prior network
with smooth activation function will benefit from our guarantees.
[LINK]
http://arxiv.org/abs/2309.12128v1
[DATE]
2023-09-21 22:48:02+08:00
[CATEGORIES]
cs.LG
Learning End-to-End Channel Coding with Diffusion Models
[AUTHORS]
Muah Kim, Rick Fritschek, Rafael F. Schaefer
[ABSTRACT]
The training of neural encoders via deep learning necessitates a
differentiable channel model due to the backpropagation algorithm. This
requirement can be sidestepped by approximating either the channel distribution
or its gradient through pilot signals in real-world scenarios. The initial
approach draws upon the latest advancements in image generation, utilizing
generative adversarial networks (GANs) or their enhanced variants to generate
channel distributions. In this paper, we address this channel approximation
challenge with diffusion models, which have demonstrated high sample quality in
image generation. We offer an end-to-end channel coding framework underpinned
by diffusion models and propose an efficient training algorithm. Our
simulations with various channel models establish that our diffusion models
learn the channel distribution accurately, thereby achieving near-optimal
end-to-end symbol error rates (SERs). We also note a significant advantage of
diffusion models: A robust generalization capability in high signal-to-noise
ratio regions, in contrast to GAN variants that suffer from error floor.
Furthermore, we examine the trade-off between sample quality and sampling
speed, when an accelerated sampling algorithm is deployed, and investigate the
effect of the noise scheduling on this trade-off. With an apt choice of noise
scheduling, sampling time can be significantly reduced with a minor increase in
SER.
[LINK]
http://arxiv.org/abs/2309.10505v2
[DATE]
2023-09-21 22:45:03+08:00
[CATEGORIES]
cs.LG
Bayesian Flow Networks
[AUTHORS]
Alex Graves, Rupesh Kumar Srivastava, Timothy Atkinson, Faustino Gomez
[ABSTRACT]
This paper introduces Bayesian Flow Networks (BFNs), a new class of
generative model in which the parameters of a set of independent distributions
are modified with Bayesian inference in the light of noisy data samples, then
passed as input to a neural network that outputs a second, interdependent
distribution. Starting from a simple prior and iteratively updating the two
distributions yields a generative procedure similar to the reverse process of
diffusion models; however it is conceptually simpler in that no forward process
is required. Discrete and continuous-time loss functions are derived for
continuous, discretised and discrete data, along with sample generation
procedures. Notably, the network inputs for discrete data lie on the
probability simplex, and are therefore natively differentiable, paving the way
for gradient-based sample guidance and few-step generation in discrete domains
such as language modelling. The loss function directly optimises data
compression and places no restrictions on the network architecture. In our
experiments BFNs achieve competitive log-likelihoods for image modelling on
dynamically binarized MNIST and CIFAR-10, and outperform all known discrete
diffusion models on the text8 character-level language modelling task.
[LINK]
http://arxiv.org/abs/2308.07037v2
[DATE]
2023-09-21 22:38:24+08:00
[CATEGORIES]
cs.LG
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards
[AUTHORS]
Miguel Calvo-Fullana, Santiago Paternain, Luiz F. O. Chamon, Alejandro Ribeiro
[ABSTRACT]
A common formulation of constrained reinforcement learning involves multiple
rewards that must individually accumulate to given thresholds. In this class of
problems, we show a simple example in which the desired optimal policy cannot
be induced by any weighted linear combination of rewards. Hence, there exist
constrained reinforcement learning problems for which neither regularized nor
classical primal-dual methods yield optimal policies. This work addresses this
shortcoming by augmenting the state with Lagrange multipliers and
reinterpreting primal-dual methods as the portion of the dynamics that drives
the multipliers' evolution. This approach provides a systematic state
augmentation procedure that is guaranteed to solve reinforcement learning
problems with constraints. Thus, as we illustrate by an example, while previous
methods can fail at finding optimal policies, running the dual dynamics while
executing the augmented policy yields an algorithm that provably samples
actions from the optimal policy.
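The "dual dynamics" mentioned above can be illustrated with a generic projected update of the Lagrange multipliers. This is a sketch of the textbook dual step for constraints of the form "accumulated reward i must reach threshold i", not necessarily the exact rule used in the paper; the function name, the use of averaged rewards, and the step size are assumptions. In the state-augmented setting, the resulting multipliers would be fed to the policy as part of its state.

```python
from typing import List, Sequence


def dual_step(
    multipliers: Sequence[float],
    avg_rewards: Sequence[float],
    thresholds: Sequence[float],
    step_size: float,
) -> List[float]:
    """One projected dual-descent step on the Lagrange multipliers."""
    updated = []
    for lam, reward, threshold in zip(multipliers, avg_rewards, thresholds):
        # Slack is negative when the constraint reward falls short of its
        # threshold, so a violated constraint raises its multiplier.
        slack = reward - threshold
        # Project onto the non-negative orthant: multipliers stay >= 0.
        updated.append(max(0.0, lam - step_size * slack))
    return updated
```

A multiplier grows while its constraint is violated and decays toward zero once the constraint is satisfied, so the multipliers act as a running record of which constraints currently need attention.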
[LINK]
http://arxiv.org/abs/2102.11941v2
[DATE]
2023-09-21 22:36:25+08:00
[CATEGORIES]
cs.LG
Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval
[AUTHORS]
Luis Carvalho, Gerhard Widmer
[ABSTRACT]
Many applications of cross-modal music retrieval are related to connecting
sheet music images to audio recordings. A typical and recent approach to this
is to learn, via deep neural networks, a joint embedding space that correlates
short fixed-size snippets of audio and sheet music by means of an appropriate
similarity structure. However, two challenges that arise out of this strategy
are the requirement of strongly aligned data to train the networks, and the
inherent discrepancies of musical content between audio and sheet music
snippets caused by local and global tempo differences. In this paper, we
address these two shortcomings by designing a cross-modal recurrent network
that learns joint embeddings that can summarize longer passages of
corresponding audio and sheet music. The benefits of our method are that it
only requires weakly aligned audio-sheet music pairs, as well as that the
recurrent network handles the non-linearities caused by tempo variations
between audio and sheet music. We conduct a number of experiments on synthetic
and real piano data and scores, showing that our proposed recurrent method
leads to more accurate retrieval in all possible configurations.
[COMMENTS]
In Proceedings of the 24th Conference of the International Society
for Music Information Retrieval (ISMIR 2023), Milan, Italy
[LINK]
http://arxiv.org/abs/2309.12111v1
[DATE]
2023-09-21 22:30:02+08:00
[CATEGORIES]
cs.LG
Traffic Forecasting on New Roads Using Spatial Contrastive Pre-Training (SCPT)
[AUTHORS]
Arian Prabowo, Hao Xue, Wei Shao, Piotr Koniusz, Flora D. Salim
[ABSTRACT]
New roads are being constructed all the time. However, the capabilities of
previous deep forecasting models to generalize to new roads not seen in the
training data (unseen roads) are rarely explored. In this paper, we introduce a
novel setup called a spatio-temporal (ST) split to evaluate the models’
capabilities to generalize to unseen roads. In this setup, the models are
trained on data from a sample of roads, but tested on roads not seen in the
training data. Moreover, we also present a novel framework called Spatial
Contrastive Pre-Training (SCPT) where we introduce a spatial encoder module to
extract latent features from unseen roads during inference time. This spatial
encoder is pre-trained using contrastive learning. During inference, the
spatial encoder only requires two days of traffic data on the new roads and
does not require any re-training. We also show that the output from the spatial
encoder can be used effectively to infer latent node embeddings on unseen roads
during inference time. The SCPT framework also incorporates a new layer, named
the spatially gated addition (SGA) layer, to effectively combine the latent
features from the output of the spatial encoder to existing backbones.
Additionally, since there is limited data on the unseen roads, we argue that it
is better to decouple traffic signals to trivial-to-capture periodic signals
and difficult-to-capture Markovian signals, and for the spatial encoder to only
learn the Markovian signals. Finally, we empirically evaluated SCPT using the
ST split setup on four real-world datasets. The results showed that adding SCPT
to a backbone consistently improves forecasting performance on unseen roads.
More importantly, the improvements are greater when forecasting further into
the future. The codes are available on GitHub:
https://github.com/cruiseresearchgroup/forecasting-on-new-roads .
[COMMENTS]
25 pages including reference, an additional 3 pages of appendix, 8
figures. ECML PKDD 2023 Journal track special issue: Data Mining and
Knowledge Discovery (DAMI)
[LINK]
http://arxiv.org/abs/2305.05237v4
[DATE]
2023-09-21 22:16:23+08:00
[CATEGORIES]
cs.LG
Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks
[AUTHORS]
Dat Phan-Trong, Hung Tran-The, Sunil Gupta
[ABSTRACT]
Bayesian Optimization (BO) is an effective approach for global optimization
of black-box functions when function evaluations are expensive. Most prior
works use Gaussian processes to model the black-box function; however, the use
of kernels in Gaussian processes leads to two problems: first, the kernel-based
methods scale poorly with the number of data points and second, kernel methods
are usually not effective on complex structured high-dimensional data due to
the curse of dimensionality. Therefore, we propose a novel black-box optimization
algorithm where the black-box function is modeled using a neural network. Our
algorithm does not need a Bayesian neural network to estimate predictive
uncertainty and is therefore computationally favorable. We analyze the
theoretical behavior of our algorithm in terms of regret bound using advances
in NTK theory showing its efficient convergence. We perform experiments with
both synthetic and real-world optimization tasks and show that our algorithm is
more sample efficient compared to existing methods.
[LINK]
http://arxiv.org/abs/2303.01682v2
[DATE]
2023-09-21 22:12:05+08:00
[CATEGORIES]
cs.LG
Bayesian sparsification for deep neural networks with Bayesian model reduction
[AUTHORS]
Dimitrije Marković, Karl J. Friston, Stefan J. Kiebel
[ABSTRACT]
Deep learning’s immense capabilities are often constrained by the complexity
of its models, leading to an increasing demand for effective sparsification
techniques. Bayesian sparsification for deep learning emerges as a crucial
approach, facilitating the design of models that are both computationally
efficient and competitive in terms of performance across various deep learning
applications. The state-of-the-art – in Bayesian sparsification of deep neural
networks – combines structural shrinkage priors on model weights with an
approximate inference scheme based on black-box stochastic variational
inference. However, model inversion of the full generative model is
exceptionally computationally demanding, especially when compared to standard
deep learning of point estimates. In this context, we advocate for the use of
Bayesian model reduction (BMR) as a more efficient alternative for pruning of
model weights. As a generalization of the Savage-Dickey ratio, BMR allows a
post-hoc elimination of redundant model weights based on the posterior
estimates under a straightforward (non-hierarchical) generative model. Our
comparative study highlights the computational efficiency and the pruning rate
of the BMR method relative to the established stochastic variational inference
(SVI) scheme, when applied to the full hierarchical generative model. We
illustrate the potential of BMR to prune model parameters across various deep
learning architectures, from classical networks like LeNet to modern frameworks
such as Vision Transformers and MLP-Mixers.
[LINK]
http://arxiv.org/abs/2309.12095v1
[DATE]
2023-09-21 22:10:47+08:00
[CATEGORIES]
cs.LG
Clustering-based Domain-Incremental Learning
[AUTHORS]
Christiaan Lamers, Rene Vidal, Nabil Belbachir, Niki van Stein, Thomas Baeck, Paris Giampouras
[ABSTRACT]
We consider the problem of learning multiple tasks in a continual learning
setting in which data from different tasks is presented to the learner in a
streaming fashion. A key challenge in this setting is the so-called
“catastrophic forgetting problem”, in which the performance of the learner in
an “old task” decreases when subsequently trained on a “new task”. Existing
continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM)
and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by
minimizing the loss for the current task without increasing the loss for
previous tasks. However, these methods assume the learner knows when the task
changes, which is unrealistic in practice. In this paper, we alleviate the need
to provide the algorithm with information about task changes by using an online
clustering-based approach on a dynamically updated finite pool of samples or
gradients. We thereby successfully counteract catastrophic forgetting in one of
the hardest settings, namely: domain-incremental learning, a setting for which
the problem was previously unsolved. We showcase the benefits of our approach
by applying these ideas to projection-based methods, such as A-GEM and OGD,
which lead to task-agnostic versions of them. Experiments on real datasets
demonstrate the effectiveness of the proposed strategy and its promising
performance compared to state-of-the-art methods.
[LINK]
http://arxiv.org/abs/2309.12078v1
[DATE]
2023-09-21 21:49:05+08:00
[CATEGORIES]
cs.LG
CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
[AUTHORS]
Chaejeong Lee, Jayoung Kim, Noseong Park
[ABSTRACT]
With growing attention to tabular data these days, the attempt to apply a
synthetic table to various tasks has been expanded toward various scenarios.
Owing to the recent advances in generative modeling, fake data generated by
tabular data synthesis models become sophisticated and realistic. However,
there still exists a difficulty in modeling discrete variables (columns) of
tabular data. In this work, we propose to process continuous and discrete
variables separately (but conditioned on each other) by two diffusion
models. The two diffusion models are co-evolved during training by reading
conditions from each other. In order to further bind the diffusion models,
moreover, we introduce a contrastive learning method with a negative sampling
method. In our experiments with 11 real-world tabular datasets and 8 baseline
methods, we prove the efficacy of the proposed method, called CoDi.
[COMMENTS]
Accepted by ICML 2023
[LINK]
http://arxiv.org/abs/2304.12654v2
[DATE]
2023-09-21 21:40:53+08:00
[CATEGORIES]
cs.LG
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer – Current Trends and Research Perspectives
[AUTHORS]
Karolina Seweryn, Anna Wróblewska, Szymon Łukasik
[ABSTRACT]
Action scene understanding in soccer is a challenging task due to the complex
and dynamic nature of the game, as well as the interactions between players.
This article provides a comprehensive overview of this task divided into action
recognition, spotting, and spatio-temporal action localization, with a
particular emphasis on the modalities used and multimodal methods. We explore
the publicly available data sources and metrics used to evaluate models’
performance. The article reviews recent state-of-the-art methods that leverage
deep learning techniques and traditional methods. We focus on multimodal
methods, which integrate information from multiple sources, such as video and
audio data, and also those that represent one source in various ways. The
advantages and limitations of methods are discussed, along with their potential
for improving the accuracy and robustness of models. Finally, the article
highlights some of the open research questions and future directions in the
field of soccer action recognition, including the potential for multimodal
methods to advance this field. Overall, this survey provides a valuable
resource for researchers interested in the field of action scene understanding
in soccer.
[LINK]
http://arxiv.org/abs/2309.12067v1
[DATE]
2023-09-21 21:36:57+08:00
[CATEGORIES]
cs.LG
On the different regimes of Stochastic Gradient Descent
[AUTHORS]
Antonio Sclocchi, Matthieu Wyart
[ABSTRACT]
Modern deep networks are trained with stochastic gradient descent (SGD) whose
key parameters are the number of data considered at each step or batch size
$B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$,
SGD corresponds to a stochastic evolution of the parameters, whose noise
amplitude is governed by the ‘temperature’ $T\equiv \eta/B$. Yet this
description is observed to break down for sufficiently large batches $B\geq
B^*$, or simplifies to gradient descent (GD) when the temperature is
sufficiently small. Understanding where these cross-overs take place remains a
central challenge. Here we resolve these questions for a teacher-student
perceptron classification model, and show empirically that our key predictions
still apply to deep networks. Specifically, we obtain a phase diagram in the
$B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a
noise-dominated SGD governed by temperature, $\textit{(ii)}$ a
large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases
also correspond to different regimes of generalization error. Remarkably, our
analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$
and $\textit{(ii)}$ scales with the size $P$ of the training set, with an
exponent that characterizes the hardness of the classification problem.
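As a small illustration of the temperature $T\equiv \eta/B$ defined in this abstract, the sketch below computes it for a few hypothetical (learning rate, batch size) pairs. The function name and the numeric values are assumptions made for illustration, not taken from the paper; the point is only that scaling $\eta$ and $B$ together leaves $T$ unchanged.

```python
def sgd_temperature(learning_rate: float, batch_size: int) -> float:
    """Return T = eta / B, the SGD noise 'temperature' from the abstract."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size


# Hypothetical (eta, B) pairs, chosen so that eta / B is identical for each.
configs = [(0.1, 32), (0.2, 64), (0.4, 128)]
temperatures = [sgd_temperature(eta, b) for eta, b in configs]

for (eta, b), t in zip(configs, temperatures):
    print(f"eta={eta}, B={b}, T={t}")
```

According to the abstract, equal temperature describes equivalent noise only in the noise-dominated regime; for sufficiently large batches the description breaks down, which this toy calculation does not capture.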
[COMMENTS]
8 pages, 4 figures; Appendix: 16 pages, 8 figures
[LINK]
http://arxiv.org/abs/2309.10688v2
[DATE]
2023-09-21 21:35:04+08:00
[CATEGORIES]
cs.LG
Decision-making and control with diffractive optical networks
[AUTHORS]
Jumin Qiu, Shuyuan Xiao, Lujun Huang, Andrey Miroshnichenko, Dejian Zhang, Tingting Liu, Tianbao Yu
[ABSTRACT]
The ultimate goal of artificial intelligence is to mimic the human brain to
perform decision-making and control directly from high-dimensional sensory
input. Diffractive optical networks provide a promising solution for
implementing artificial intelligence with high-speed and low-power consumption.
Most of the reported diffractive optical networks focus on single or multiple
tasks that do not involve environmental interaction, such as object recognition
and image classification. In contrast, the networks capable of performing
decision-making and control have not yet been developed to our knowledge. Here,
we propose using deep reinforcement learning to implement diffractive optical
networks that imitate human-level decision-making and control capability. Such
networks, taking advantage of a residual architecture, allow for finding optimal
control policies through interaction with the environment and can be readily
implemented with existing optical devices. The superior performance of these
networks is verified by engaging three types of classic games, Tic-Tac-Toe,
Super Mario Bros., and Car Racing. Finally, we present an experimental
demonstration of playing Tic-Tac-Toe by leveraging diffractive optical networks
based on a spatial light modulator. Our work represents a solid step forward in
advancing diffractive optical networks, which promises a fundamental shift from
the target-driven control of a pre-designed state for simple recognition or
classification tasks to the high-level sensory capability of artificial
intelligence. It may find exciting applications in autonomous driving,
intelligent robots, and intelligent manufacturing.
[LINK]
http://arxiv.org/abs/2212.11278v3
[DATE]
2023-09-21 21:34:54+08:00
[CATEGORIES]
cs.LG
Self-supervised learning unveils change in urban housing from street-level images
[AUTHORS]
Steven Stalder, Michele Volpi, Nicolas Büttner, Stephen Law, Kenneth Harttgen, Esra Suel
[ABSTRACT]
Cities around the world face a critical shortage of affordable and decent
housing. Despite its critical importance for policy, our ability to effectively
monitor and track progress in urban housing is limited. Deep learning-based
computer vision methods applied to street-level images have been successful in
the measurement of socioeconomic and environmental inequalities but did not
fully utilize temporal images to track urban change as time-varying labels are
often unavailable. We used self-supervised methods to measure change in London
using 15 million street images taken between 2008 and 2021. Our novel
adaptation of Barlow Twins, Street2Vec, embeds urban structure while being
invariant to seasonal and daily changes without manual annotations. It
outperformed generic embeddings, successfully identified point-level change in
London’s housing supply from street-level images, and distinguished between
major and minor change. This capability can provide timely information for
urban planning and policy decisions toward more liveable, equitable, and
sustainable cities.
[COMMENTS]
16 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.11354v2
[DATE]
2023-09-21 21:18:35+08:00
[CATEGORIES]
cs.LG
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
[AUTHORS]
Moritz Kirsche, Thorsten Peinemann, Joshua Stock, Carlos Cotrini, Esfandiar Mohammadi
[ABSTRACT]
Privacy-preserving learning of gradient boosting decision trees (GBDT) has
the potential for strong utility-privacy tradeoffs for tabular data, such as
census data or medical metadata: classical GBDT learners can extract
non-linear patterns from small sized datasets. The state-of-the-art notion for
provable privacy-properties is differential privacy, which requires that the
impact of single data points is limited and deniable. We introduce a novel
differentially private GBDT learner and utilize four main techniques to improve
the utility-privacy tradeoff. (1) We use an improved noise scaling approach
with tighter accounting of privacy leakage of a decision tree leaf compared to
prior work, resulting in noise that in expectation scales with $O(1/n)$, for
$n$ data points. (2) We integrate individual Rényi filters into our method to
learn from data points that have been underutilized during an iterative
training process, which – potentially of independent interest – results in a
natural yet effective insight to learning streams of non-i.i.d. data. (3) We
incorporate the concept of random decision tree splits to concentrate privacy
budget on learning leaves. (4) We deploy subsampling for privacy amplification.
Our evaluation shows for the Abalone dataset ($<4k$ training data points) a
$R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only
achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data
points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the
closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset
for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to
the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult
dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very
close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
[COMMENTS]
The first two authors equally contributed to this work
[LINK]
http://arxiv.org/abs/2309.12041v1
[DATE]
2023-09-21 21:09:10+08:00
[CATEGORIES]
cs.LG
Uplift vs. predictive modeling: a theoretical analysis
[AUTHORS]
Théo Verhelst, Robin Petit, Wouter Verbeke, Gianluca Bontempi
[ABSTRACT]
Despite the growing popularity of machine-learning techniques in
decision-making, the added value of causal-oriented strategies with respect to
pure machine-learning approaches has rarely been quantified in the literature.
These strategies are crucial for practitioners in various domains, such as
marketing, telecommunications, health care and finance. This paper presents a
comprehensive treatment of the subject, starting from firm theoretical
foundations and highlighting the parameters that influence the performance of
the uplift and predictive approaches. The focus of the paper is on a binary
outcome case and a binary action, and the paper presents a theoretical analysis
of uplift modeling, comparing it with the classical predictive approach. The
main research contributions of the paper include a new formulation of the
measure of profit, a formal proof of the convergence of the uplift curve to the
measure of profit, and an illustration, through simulations, of the conditions
under which predictive approaches still outperform uplift modeling. We show
that the mutual information between the features and the outcome plays a
significant role, along with the variance of the estimators, the distribution
of the potential outcomes and the underlying costs and benefits of the
treatment and the outcome.
[COMMENTS]
46 pages, 6 figures
[LINK]
http://arxiv.org/abs/2309.12036v1
[DATE]
2023-09-21 20:59:17+08:00
[CATEGORIES]
cs.LG
Face Identity-Aware Disentanglement in StyleGAN
[AUTHORS]
Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska, Jacek Tabor, Przemysław Spurek, Marek Śmieja
[ABSTRACT]
Conditional GANs are frequently used for manipulating the attributes of face
images, such as expression, hairstyle, pose, or age. Even though the
state-of-the-art models successfully modify the requested attributes, they
simultaneously modify other important characteristics of the image, such as a
person’s identity. In this paper, we focus on solving this problem by
introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles
face attributes from a person’s identity. Our key idea is to perform training
on images retrieved from movie frames, where a given person appears in various
poses and with different attributes. By applying a type of contrastive loss, we
encourage the model to group images of the same person in similar regions of
latent space. Our experiments demonstrate that the modifications of face
attributes performed by PluGeN4Faces are significantly less invasive on the
remaining characteristics of the image than in the existing state-of-the-art
models.
[LINK]
http://arxiv.org/abs/2309.12033v1
[DATE]
2023-09-21 20:54:09+08:00
[CATEGORIES]
cs.LG
Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
[AUTHORS]
Tiago da Silva, Eliezer Silva, Adèle Ribeiro, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita
[ABSTRACT]
Structure learning is the crux of causal inference. Notably, causal discovery
(CD) algorithms are brittle when data is scarce, possibly inferring imprecise
causal relations that contradict expert knowledge – especially when
considering latent confounders. To aggravate the issue, most CD methods do not
provide uncertainty estimates, making it hard for users to interpret results
and improve the inference process. Surprisingly, while CD is a human-centered
affair, no works have focused on building methods that both 1) output
uncertainty estimates that can be verified by experts and 2) interact with
those experts to iteratively refine CD. To solve these issues, we start by
proposing to sample (causal) ancestral graphs proportionally to a belief
distribution based on a score function, such as the Bayesian information
criterion (BIC), using generative flow networks. Then, we leverage the
diversity in candidate graphs and introduce an optimal experimental design to
iteratively probe the expert about the relations among variables, effectively
reducing the uncertainty of our belief over ancestral graphs. Finally, we
update our samples to incorporate human feedback via importance sampling.
Importantly, our method does not require causal sufficiency (i.e., unobserved
confounders may exist). Experiments with synthetic observational data show that
our method can accurately sample from distributions over ancestral graphs and
that we can greatly improve inference quality with human aid.
[LINK]
http://arxiv.org/abs/2309.12032v1
[DATE]
2023-09-21 20:53:45+08:00
[CATEGORIES]
cs.LG
Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting
[AUTHORS]
Yusheng Zhao, Xiao Luo, Wei Ju, Chong Chen, Xian-Sheng Hua, Ming Zhang
[ABSTRACT]
This paper studies the problem of traffic flow forecasting, which aims to
predict future traffic conditions on the basis of road networks and traffic
conditions in the past. The problem is typically solved by modeling complex
spatio-temporal correlations in traffic data using spatio-temporal graph neural
networks (GNNs). However, the performance of these methods is still far from
satisfactory since GNNs usually have limited representation capacity when it
comes to complex traffic networks. Graphs, by nature, fall short in capturing
non-pairwise relations. Even worse, existing methods follow the paradigm of
message passing that aggregates neighborhood information linearly, which fails
to capture complicated spatio-temporal high-order interactions. To tackle these
issues, in this paper, we propose a novel model named Dynamic Hypergraph
Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise
relationships, our DyHSL extracts hypergraph structural information to model
dynamics in the traffic networks, and updates each node representation by
aggregating messages from its associated hyperedges. Additionally, to capture
high-order spatio-temporal relations in the road network, we introduce an
interactive graph convolution block, which further models the neighborhood
interaction for each node. Finally, we integrate these two views into a
holistic multi-scale correlation extraction module, which conducts temporal
pooling with different scales to model different temporal patterns. Extensive
experiments on four popular traffic benchmark datasets demonstrate the
effectiveness of our proposed DyHSL compared with a broad range of competing
baselines.
[COMMENTS]
Accepted by 2023 IEEE 39th International Conference on Data
Engineering (ICDE 2023)
[LINK]
http://arxiv.org/abs/2309.12028v1
[DATE]
2023-09-21 20:44:55+08:00
[CATEGORIES]
cs.LG
Safe Hierarchical Reinforcement Learning for CubeSat Task Scheduling Based on Energy Consumption
[AUTHORS]
Mahya Ramezani, M. Amin Alandihallaj, Jose Luis Sanchez-Lopez, Andreas Hein
[ABSTRACT]
This paper presents a Hierarchical Reinforcement Learning methodology
tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO).
Incorporating a high-level policy for global task distribution and a low-level
policy for real-time adaptations as a safety mechanism, our approach integrates
the Similarity Attention-based Encoder (SABE) for task prioritization and an
MLP estimator for energy consumption forecasting. Integrating this mechanism
creates a safe and fault-tolerant system for CubeSat task scheduling.
Simulation results validate the superior convergence and task success rate of
the Hierarchical Reinforcement Learning approach, which outperforms both the
MADDPG model and traditional random scheduling across multiple CubeSat
configurations.
[LINK]
http://arxiv.org/abs/2309.12004v1
[DATE]
2023-09-21 20:22:11+08:00
[CATEGORIES]
cs.LG
Identification of pneumonia on chest x-ray images through machine learning
[AUTHORS]
Eduardo Augusto Roeder
[ABSTRACT]
Pneumonia is the leading infectious cause of infant death in the world. When
it is identified early, the patient’s prognosis can be altered, and imaging
exams can help confirm the diagnosis. Performing and
interpreting the exams as soon as possible is vital for good treatment, with
the most common exam for this pathology being the chest X-ray. The objective of
this study was to develop software that identifies the presence or absence of
pneumonia in chest radiographs. The software was developed as a computational
model based on machine learning using the transfer learning technique. For the
training process, images were collected from a database available online with
children’s chest X-ray images taken at a hospital in China. After training,
the model was then exposed to new images, achieving relevant results on
identifying such pathology, reaching 98% sensitivity and 97.3% specificity for
the sample used for testing. It can be concluded that it is possible to develop
software that identifies pneumonia in chest X-ray images.
[COMMENTS]
In Brazilian Portuguese, 30 pages, 16 figures. This thesis was
prepared under the guidance of Prof. Dr. Akihito Inca Atahualpa Urdiales
[LINK]
http://arxiv.org/abs/2309.11995v1
[DATE]
2023-09-21 20:10:22+08:00
[CATEGORIES]
cs.LG
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization
[AUTHORS]
Hao Hao, Xiaoqun Zhang, Aimin Zhou
[ABSTRACT]
Surrogate-assisted evolutionary algorithms (SAEAs) hold significant
importance in resolving expensive optimization problems (EOPs). Extensive
efforts have been devoted to improving the efficacy of SAEAs through the
development of proficient model-assisted selection methods. However, generating
high-quality solutions is a prerequisite for selection. The fundamental
paradigm of evaluating a limited number of solutions in each generation within
SAEAs reduces the variance of adjacent populations, thus impacting the quality
of offspring solutions. This is a frequently encountered issue, yet it has not
gained widespread attention. This paper presents a framework using unevaluated
solutions to enhance the efficiency of SAEAs. The surrogate model is employed
to identify high-quality solutions for direct generation of new solutions
without evaluation. To ensure dependable selection, we have introduced two
tailored relation models for the selection of the optimal solution and the
unevaluated population. A comprehensive experimental analysis is performed on
two test suites, which showcases the superiority of the relation model over
regression and classification models in the selection phase. Furthermore, the
surrogate-selected unevaluated solutions with high potential have been shown to
significantly enhance the efficiency of the algorithm.
[COMMENTS]
18 pages, 9 figures
[LINK]
http://arxiv.org/abs/2309.11994v1
[DATE]
2023-09-21 20:09:55+08:00
[CATEGORIES]
cs.LG
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
[AUTHORS]
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed
[ABSTRACT]
Connectionist temporal classification (CTC) is commonly adopted for sequence
modeling tasks like speech recognition, where it is necessary to preserve order
between the input and target sequences. However, CTC is only applied to
deterministic sequence models, where the latent space is discontinuous and
sparse, which in turn makes them less capable of handling data variability when
compared to variational models. In this paper, we integrate CTC with a
variational model and derive loss functions that can be used to train more
generalizable sequence models that preserve order. Specifically, we derive two
versions of the novel variational CTC based on two reasonable assumptions, the
first being that the variational latent variables at each time step are
conditionally independent; and the second being that these latent variables are
Markovian. We show that both loss functions allow direct optimization of the
variational lower bound for the model log-likelihood, and present
computationally tractable forms for implementing them.
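To make the order-preserving property of CTC mentioned above concrete, here is a minimal sketch of the standard CTC collapse mapping: merge consecutive repeated symbols, then drop blanks. This is the textbook mapping for ordinary CTC, not the variational losses derived in the paper; the blank symbol `"-"` and the function name are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; real systems usually use index 0


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: remove blank symbols.
    The relative order of the surviving labels is never changed.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


print("".join(ctc_collapse("hh-e-ll-lo")))  # the blank between the two l-runs keeps both
```

Because the mapping only deletes symbols and never reorders them, every frame-level path that collapses to a given target is monotonic with respect to it, which is the alignment constraint the abstract refers to.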
[COMMENTS]
5 pages, 3 figures, conference
[LINK]
http://arxiv.org/abs/2309.11983v1
[DATE]
2023-09-21 19:39:33+08:00
[CATEGORIES]
cs.LG
Semantic-aware Transmission Scheduling: a Monotonicity-driven Deep Reinforcement Learning Approach
[AUTHORS]
Jiazheng Chen, Wanchun Liu, Daniel Quevedo, Yonghui Li, Branka Vucetic
[ABSTRACT]
For cyber-physical systems in the 6G era, semantic communications connecting
distributed devices for dynamic control and remote state estimation are
required to guarantee application-level performance, not merely focus on
communication-centric performance. Semantics here is a measure of the
usefulness of information transmissions. Semantic-aware transmission scheduling
of a large system often involves a large decision-making space, and the optimal
policy cannot be obtained by existing algorithms effectively. In this paper, we
first investigate the fundamental properties of the optimal semantic-aware
scheduling policy and then develop advanced deep reinforcement learning (DRL)
algorithms by leveraging the theoretical guidelines. Our numerical results show
that the proposed algorithms can substantially reduce training time and enhance
training performance compared to benchmark algorithms.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2305.13706v2
[DATE]
2023-09-21 18:48:47+08:00
[CATEGORIES]
cs.LG
Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions
[AUTHORS]
Celal Alagoz
[ABSTRACT]
This study introduces a novel hierarchical divisive clustering approach with
stochastic splitting functions (SSFs) to enhance classification performance in
multi-class datasets through hierarchical classification (HC). The method has
the unique capability of generating hierarchy without requiring explicit
information, making it suitable for datasets lacking prior knowledge of
hierarchy. By systematically dividing classes into two subsets based on their
discriminability according to the classifier, the proposed approach constructs
a binary tree representation of hierarchical classes. The approach is evaluated
on 46 multi-class time series datasets using popular classifiers (svm and
rocket) and SSFs (potr, srtr, and lsoo). The results reveal that the approach
significantly improves classification performance in approximately half and a
third of the datasets when using rocket and svm as the classifier,
respectively. The study also explores the relationship between dataset features
and HC performance. While the number of classes and flat classification (FC)
score show consistent significance, variations are observed with different
splitting functions. Overall, the proposed approach presents a promising
strategy for enhancing classification by generating hierarchical structure in
multi-class time series datasets. Future research directions involve exploring
different splitting functions, classifiers, and hierarchy structures, as well
as applying the approach to diverse domains beyond time series data. The source
code is made openly available to facilitate reproducibility and further
exploration of the method.
[LINK]
http://arxiv.org/abs/2309.11963v1
[DATE]
2023-09-21 18:34:50+08:00
[CATEGORIES]
cs.LG
A Study of Forward-Forward Algorithm for Self-Supervised Learning
[AUTHORS]
Jonas Brenig, Radu Timofte
[ABSTRACT]
Self-supervised representation learning has seen remarkable progress in the
last few years, with some of the recent methods being able to learn useful
image representations without labels. These methods are trained using
backpropagation, the de facto standard. Recently, Geoffrey Hinton proposed the
forward-forward algorithm as an alternative training method. It utilizes two
forward passes and a separate loss function for each layer to train the network
without backpropagation.
In this study, for the first time, we study the performance of
forward-forward vs. backpropagation for self-supervised representation learning
and provide insights into the learned representation spaces. Our benchmark
employs four standard datasets, namely MNIST, F-MNIST, SVHN and CIFAR-10, and
three commonly used self-supervised representation learning techniques, namely
rotation, flip and jigsaw.
Our main finding is that while the forward-forward algorithm performs
comparably to backpropagation during (self-)supervised training, the transfer
performance is significantly lagging behind in all the studied settings. This
may be caused by a combination of factors, including having a loss function for
each layer and the way the supervised training is realized in the
forward-forward paradigm. In comparison to backpropagation, the forward-forward
algorithm focuses more on the boundaries and drops part of the information
unnecessary for making decisions which harms the representation learning goal.
Further investigation and research are necessary to stabilize the
forward-forward strategy for self-supervised learning, to work beyond the
datasets and configurations demonstrated by Geoffrey Hinton.
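The per-layer loss mentioned in the abstract can be sketched as follows. This assumes the commonly described form of the forward-forward objective, where a layer's "goodness" is the sum of its squared activations and a logistic loss pushes goodness above a threshold for positive data and below it for negative data. The threshold value, function names, and input vectors are hypothetical, and the study's exact setup may differ.

```python
import math


def goodness(activations):
    """Sum of squared activations of one layer (assumed goodness measure)."""
    return sum(a * a for a in activations)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def layer_loss(activations, is_positive, threshold=2.0):
    """Local loss for a single layer; no gradient flows to other layers.

    Positive samples are penalised when goodness is below the threshold,
    negative samples when it is above.
    """
    margin = goodness(activations) - threshold
    return softplus(-margin) if is_positive else softplus(margin)


# The same hypothetical activations scored as a positive and as a negative sample.
acts = [2.0, 1.0]  # goodness = 5.0, above the threshold of 2.0
print(layer_loss(acts, is_positive=True), layer_loss(acts, is_positive=False))
```

Each layer would minimise its own `layer_loss` over one forward pass with positive data and one with negative data, which is what replaces the backward pass in this scheme.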
[LINK]
http://arxiv.org/abs/2309.11955v1
[DATE]
2023-09-21 18:14:53+08:00
[CATEGORIES]
cs.LG
On the Probability of Immunity
[AUTHORS]
Jose M. Peña
[ABSTRACT]
This work is devoted to the study of the probability of immunity, i.e. the
effect occurs whether exposed or not. We derive necessary and sufficient
conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the
probability of immunity is zero and $\epsilon$-bounded, respectively. The
former allows us to estimate the probability of benefit (i.e., the effect
occurs if and only if exposed) from a randomized controlled trial, and the
latter allows us to produce bounds of the probability of benefit that are
tighter than the existing ones. We also introduce the concept of indirect
immunity (i.e., through a mediator) and repeat our previous analysis for it.
Finally, we propose a method for sensitivity analysis of the probability of
immunity under unmeasured confounding.
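The link between immunity and benefit can be illustrated with an elementary identity rather than the paper's own derivation. Writing $p_1$ and $p_0$ for the probability of the effect under exposure and non-exposure (both estimable from a randomized controlled trial), the event "effect occurs when exposed" splits into benefit and immunity, so $P(\text{benefit}) = p_1 - P(\text{immunity})$. The sketch below combines this with the Fréchet bounds on $P(\text{immunity})$ and an optional assumed cap $\epsilon$; the function name and numbers are hypothetical.

```python
def benefit_bounds(p_exposed, p_unexposed, eps=None):
    """Bounds on P(benefit) = p_exposed - P(immunity).

    P(immunity) is the joint probability that the effect occurs both with
    and without exposure, so it lies in the Frechet interval
    [max(0, p1 + p0 - 1), min(p1, p0)]. An optional cap eps tightens the
    upper end of that interval (eps-bounded immunity).
    """
    low_immunity = max(0.0, p_exposed + p_unexposed - 1.0)
    high_immunity = min(p_exposed, p_unexposed)
    if eps is not None:
        if eps < low_immunity:
            raise ValueError("eps is incompatible with the observed marginals")
        high_immunity = min(high_immunity, eps)
    return (p_exposed - high_immunity, p_exposed - low_immunity)


# Hypothetical trial: effect in 60% of exposed and 30% of unexposed units.
print(benefit_bounds(0.6, 0.3))           # no assumption on immunity
print(benefit_bounds(0.6, 0.3, eps=0.1))  # immunity assumed at most 0.1
print(benefit_bounds(0.6, 0.3, eps=0.0))  # non-immunity: interval collapses
```

With `eps=0.0` the interval collapses to the single value $p_1$, matching the abstract's statement that non-immunity lets the probability of benefit be estimated from a trial, and a small positive `eps` gives a tighter interval than the assumption-free bounds.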
[LINK]
http://arxiv.org/abs/2309.11942v1
[DATE]
2023-09-21 17:57:03+08:00
[CATEGORIES]
cs.LG
Global universal approximation of functional input maps on weighted spaces
[AUTHORS]
Christa Cuchiero, Philipp Schmocker, Josef Teichmann
[ABSTRACT]
We introduce so-called functional input neural networks defined on a possibly
infinite dimensional weighted space with values also in a possibly infinite
dimensional output space. To this end, we use an additive family as hidden
layer maps and a non-linear activation function applied to each hidden layer.
Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global
universal approximation result for generalizations of continuous functions
going beyond the usual approximation on compact sets. This then applies in
particular to approximation of (non-anticipative) path space functionals via
functional input neural networks. As a further application of the weighted
Stone-Weierstrass theorem we prove a global universal approximation result for
linear functions of the signature. We also introduce the viewpoint of Gaussian
process regression in this setting and show that the reproducing kernel Hilbert
space of the signature kernels are Cameron-Martin spaces of certain Gaussian
processes. This paves the way towards uncertainty quantification for signature
kernel regression.
[COMMENTS]
57 pages, 4 figures
[LINK]
http://arxiv.org/abs/2306.03303v2
[DATE]
2023-09-21 17:51:29+08:00
[CATEGORIES]
cs.LG
A Machine Learning-oriented Survey on Tiny Machine Learning
[AUTHORS]
Luigi Capogrosso, Federico Cunico, Dong Seon Cheng, Franco Fummi, Marco Cristani
[ABSTRACT]
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized
the field of Artificial Intelligence by promoting the joint design of
resource-constrained IoT hardware devices and their learning-based software
architectures. TinyML carries an essential role within the fourth and fifth
industrial revolutions in helping societies, economies, and individuals employ
effective AI-infused computing technologies (e.g., smart cities, automotive,
and medical robotics). Given its multidisciplinary nature, the field of TinyML
has been approached from many different angles: this comprehensive survey
wishes to provide an up-to-date overview focused on all the learning algorithms
within TinyML-based solutions. The survey is based on the Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow,
allowing for a systematic and complete literature survey. In particular,
firstly we will examine the three different workflows for implementing a
TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly,
we propose a taxonomy that covers the learning panorama under the TinyML lens,
examining in detail the different families of model optimization and design, as
well as the state-of-the-art learning techniques. Thirdly, this survey will
present the distinct features of hardware devices and software tools that
represent the current state-of-the-art for TinyML intelligent edge
applications. Finally, we discuss the challenges and future directions.
[COMMENTS]
Article currently under review at IEEE Access
[LINK]
http://arxiv.org/abs/2309.11932v1
[DATE]
2023-09-21 17:47:12+08:00
[CATEGORIES]
cs.LG
Persistent Homology of the Multiscale Clustering Filtration
[AUTHORS]
Dominik J. Schindler, Mauricio Barahona
[ABSTRACT]
In many applications in data clustering, it is desirable to find not just a
single partition into clusters but a sequence of partitions describing the data
at different scales, or levels of coarseness. A natural problem then is to
analyse and compare the (not necessarily hierarchical) sequences of partitions
that underpin such multiscale descriptions of data. Here, we introduce a
filtration of abstract simplicial complexes, denoted the Multiscale Clustering
Filtration (MCF), which encodes arbitrary patterns of cluster assignments
across scales, and we prove that the MCF produces stable persistence diagrams.
We then show that the zero-dimensional persistent homology of the MCF measures
the degree of hierarchy in the sequence of partitions, and that the
higher-dimensional persistent homology tracks the emergence and resolution of
conflicts between cluster assignments across the sequence of partitions. To
broaden the theoretical foundations of the MCF, we also provide an equivalent
construction via a nerve complex filtration, and we show that in the
hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an
ultrametric space. We briefly illustrate how the MCF can serve to characterise
multiscale clustering structures in numerical experiments on synthetic data.
[COMMENTS]
This work was presented at the Dagstuhl Seminar (23192) on
“Topological Data Analysis and Applications”
[LINK]
http://arxiv.org/abs/2305.04281v2
[DATE]
2023-09-21 17:39:55+08:00
[CATEGORIES]
cs.LG
Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States
[AUTHORS]
Chung-En Tsai, Hao-Chung Cheng, Yen-Huan Li
[ABSTRACT]
Consider an online convex optimization problem where the loss functions are
self-concordant barriers, smooth relative to a convex function $h$, and
possibly non-Lipschitz. We analyze the regret of online mirror descent with
$h$. Then, based on the result, we prove the following in a unified manner.
Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online
portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of
exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3}
)$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4}
d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio
selection, the regret of online mirror descent with the logarithmic barrier is
$\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due
to Orseau et al. up to logarithmic terms. 3. For online learning quantum states
with the logarithmic loss, the regret of online mirror descent with the
log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration
time is shorter than that of all existing algorithms we know.
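For readers unfamiliar with the baseline, the following is a minimal sketch of the plain exponentiated gradient update for online portfolio selection, in the form usually attributed to Helmbold et al. It is not the $\widetilde{\text{EG}}$ variant analysed in the paper, and the learning rate and price relatives are hypothetical values chosen for illustration.

```python
import math


def eg_update(weights, price_relatives, eta=0.05):
    """One exponentiated-gradient step on the probability simplex.

    The per-round loss is -log(w . x), whose gradient w.r.t. w_i is
    -x_i / (w . x). EG multiplies each weight by exp(-eta * gradient_i)
    and renormalises so the weights again sum to one.
    """
    portfolio_return = sum(w * x for w, x in zip(weights, price_relatives))
    unnormalised = [
        w * math.exp(eta * x / portfolio_return)
        for w, x in zip(weights, price_relatives)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical round: asset 0 gains 20%, asset 1 loses 10%.
weights = eg_update([0.5, 0.5], [1.2, 0.9])
print(weights)  # weight shifts toward the better-performing asset
```

The multiplicative form keeps the iterate strictly inside the simplex, which matters here because the logarithmic loss is unbounded near its boundary.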
[COMMENTS]
34th Int. Conf. Algorithmic Learning Theory (ALT 2023). A typo in the
last equation in the proof of Lemma 10 is corrected
[LINK]
http://arxiv.org/abs/2210.00997v3
[DATE]
2023-09-21 17:19:10+08:00
[CATEGORIES]
cs.LG
Ensuring Topological Data-Structure Preservation under Autoencoder Compression due to Latent Space Regularization in Gauss–Legendre nodes
[AUTHORS]
Chethan Krishnamurthy Ramanaik, Juan-Esteban Suarez Cardona, Anna Willmann, Pia Hanfeld, Nico Hoffmann, Michael Hecht
[ABSTRACT]
We formulate a data independent latent space regularisation constraint for
general unsupervised autoencoders. The regularisation rests on sampling the
autoencoder Jacobian in Legendre nodes, being the centre of the Gauss-Legendre
quadrature. Revisiting this classic enables us to prove that regularised
autoencoders ensure a one-to-one re-embedding of the initial data manifold to
its latent representation. Demonstrations show that prior proposed
regularisation strategies, such as contractive autoencoding, cause topological
defects already for simple examples, and so do convolutional based
(variational) autoencoders. In contrast, topological preservation is ensured
already by standard multilayer perceptron neural networks when being
regularised due to our contribution. This observation extends through the
classic FashionMNIST dataset up to real world encoding problems for MRI brain
scans, suggesting that, across disciplines, reliable low dimensional
representations of complex high-dimensional datasets can be delivered by
this regularisation technique.
[LINK]
http://arxiv.org/abs/2309.08228v2
[DATE]
2023-09-21 17:10:39+08:00
[CATEGORIES]
cs.LG
FedGKD: Unleashing the Power of Collaboration in Federated Graph Neural Networks
[AUTHORS]
Qiying Pan, Ruofan Wu, Tengfei Liu, Tianyi Zhang, Yifei Zhu, Weiqiang Wang
[ABSTRACT]
Federated training of Graph Neural Networks (GNN) has become popular in
recent years due to its ability to perform graph-related tasks under data
isolation scenarios while preserving data privacy. However, graph heterogeneity
issues in federated GNN systems continue to pose challenges. Existing
frameworks address the problem by representing local tasks using different
statistics and relating them through a simple aggregation mechanism. However,
these approaches suffer from limited efficiency in two respects: low quality
of task-relatedness quantification and inefficacy of exploiting the
collaboration structure. To address these issues, we propose FedGKD, a novel
federated GNN framework that utilizes a novel client-side graph dataset
distillation method to extract task features that better describe
task-relatedness, and introduces a novel server-side aggregation mechanism that
is aware of the global collaboration structure. We conduct extensive
experiments on six real-world datasets of different scales, demonstrating the
superior performance of our framework.
[LINK]
http://arxiv.org/abs/2309.09517v3
[DATE]
2023-09-21 16:37:22+08:00
[CATEGORIES]
cs.LG
Stochastic stiffness identification and response estimation of Timoshenko beams via physics-informed Gaussian processes
[AUTHORS]
Gledson Rodrigo Tondo, Sebastian Rau, Igor Kavrakov, Guido Morgenthal
[ABSTRACT]
Machine learning models trained with structural health monitoring data have
become a powerful tool for system identification. This paper presents a
physics-informed Gaussian process (GP) model for Timoshenko beam elements. The
model is constructed as a multi-output GP with covariance and cross-covariance
kernels analytically derived based on the differential equations for
deflections, rotations, strains, bending moments, shear forces and applied
loads. Stiffness identification is performed in a Bayesian format by maximising
a posterior model through a Markov chain Monte Carlo method, yielding a
stochastic model for the structural parameters. The optimised GP model is
further employed for probabilistic predictions of unobserved responses.
Additionally, an entropy-based method for physics-informed sensor placement
optimisation is presented, exploiting heterogeneous sensor position information
and structural boundary conditions built into the GP model. Results demonstrate
that the proposed approach is effective at identifying structural parameters
and is capable of fusing data from heterogeneous and multi-fidelity sensors.
Probabilistic predictions of structural responses and internal forces are in
closer agreement with measured data. We validate our model with an experimental
setup and discuss the quality and uncertainty of the obtained results. The
proposed approach has potential applications in the field of structural health
monitoring (SHM) for both mechanical and structural systems.
[LINK]
http://arxiv.org/abs/2309.11875v1
[DATE]
2023-09-21 16:22:12+08:00
[CATEGORIES]
cs.LG
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
[AUTHORS]
Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu
[ABSTRACT]
Audiovisual data is everywhere in this digital age, which raises the
requirements for deep learning models developed on it. Handling the
information in multi-modal data well is the key to a better audiovisual model.
We observe that audiovisual data naturally has temporal attributes, such
as the time information for each frame in the video. More concretely, such data
is inherently multi-modal, comprising both audio and visual cues, which
proceed in strict chronological order. This indicates that temporal information
is important in multi-modal acoustic event modeling, both within and across
modalities. However, existing methods deal with each modality's features
independently and simply fuse them together, which neglects the mining of
temporal relations and thus leads to sub-optimal performance. With this
motivation, we propose a Temporal Multi-modal graph learning method for
Acoustic event Classification, called TMac, by modeling such temporal
information via graph learning techniques. In particular, we construct a
temporal graph for each acoustic event, dividing its audio data and video data
into multiple segments. Each segment can be considered as a node, and the
temporal relationships between nodes can be considered as timestamps on their
edges. In this way, we can smoothly capture the dynamic information within and
across modalities. Several experiments demonstrate that TMac outperforms other
SOTA models. Our code is available at
https://github.com/MGitHubL/TMac.
[LINK]
http://arxiv.org/abs/2309.11845v1
[DATE]
2023-09-21 15:39:08+08:00
[CATEGORIES]
cs.LG
Weakly supervised learning for pattern classification in serial femtosecond crystallography
[AUTHORS]
Jianan Xie, Ji Liu, Chi Zhang, Xihui Chen, Ping Huai, Jie Zheng, Xiaofeng Zhang
[ABSTRACT]
Serial femtosecond crystallography at X-ray free electron laser facilities
opens a new era for the determination of crystal structure. However, the data
processing of those experiments faces unprecedented challenges, because the
total number of diffraction patterns needed to determine a high-resolution
structure is huge. Machine learning methods are very likely to play important
roles in dealing with such a large volume of data. Convolutional neural
networks have achieved great success in the field of pattern classification;
however, training the networks needs very large labeled datasets. This
heavy dependence on labeled datasets will seriously restrict the application of
networks, because it is very costly to annotate a large number of diffraction
patterns. In this article we present our work on the classification of
diffraction patterns by weakly supervised algorithms, with the aim of reducing
as much as possible the size of the labeled dataset required for training. Our
results show that weakly supervised methods can significantly reduce the need
for the number of labeled patterns while achieving comparable accuracy to fully
supervised methods.
[COMMENTS]
$\copyright$ 2023 Optica Publishing Group. Users may use, reuse, and
build upon the article, or use the article for text or data mining, so long
as such uses are for non-commercial purposes and appropriate attribution is
maintained. All other rights are reserved. Link for fulltext:
https://opg.optica.org/oe/fulltext.cfm?uri=oe-31-20-32909&id=538502
[LINK]
http://arxiv.org/abs/2309.04474v2
[DATE]
2023-09-21 14:52:38+08:00
[CATEGORIES]
cs.LG
A Comprehensive Review of Community Detection in Graphs
[AUTHORS]
Songlai Ning, Jiakang Li, Yonggang Lu
[ABSTRACT]
The study of complex networks has significantly advanced our understanding of
community structures, which serve as a crucial feature of real-world graphs.
Detecting communities in graphs is a challenging problem with applications in
sociology, biology, and computer science. Despite the efforts of an
interdisciplinary community of scientists, a satisfactory solution to this
problem has not yet been achieved. This review article delves into the topic of
community detection in graphs, which plays a crucial role in understanding
the organization and functioning of complex systems. We begin by introducing
the concept of community structure, which refers to the arrangement of vertices
into clusters, with strong internal connections and weaker connections between
clusters. Then, we provide a thorough exposition of various community detection
methods, including a new method designed by us. Additionally, we explore
real-world applications of community detection in diverse networks. In
conclusion, this comprehensive review provides a deep understanding of
community detection in graphs. It serves as a valuable resource for researchers
and practitioners in multiple disciplines, offering insights into the
challenges, methodologies, and applications of community detection in complex
networks.
[LINK]
http://arxiv.org/abs/2309.11798v1
[DATE]
2023-09-21 14:04:06+08:00
[CATEGORIES]
cs.LG
Nonparametric and Regularized Dynamical Wasserstein Barycenters for Sequential Observations
[AUTHORS]
Kevin C. Cheng, Shuchin Aeron, Michael C. Hughes, Eric L. Miller
[ABSTRACT]
We consider probabilistic models for sequential observations which exhibit
gradual transitions among a finite number of states. We are particularly
motivated by applications such as human activity analysis where observed
accelerometer time series contains segments representing distinct activities,
which we call pure states, as well as periods characterized by continuous
transition among these pure states. To capture this transitory behavior, the
dynamical Wasserstein barycenter (DWB) model of Cheng et al. in 2021 [1]
associates with each pure state a data-generating distribution and models the
continuous transitions among these states as a Wasserstein barycenter of these
distributions with dynamically evolving weights. Focusing on the univariate
case where Wasserstein distances and barycenters can be computed in closed
form, we extend [1], specifically by relaxing the parameterization of the pure
states as Gaussian distributions. We highlight issues related to the uniqueness
in identifying the model parameters as well as uncertainties induced when
estimating a dynamically evolving distribution from a limited number of
samples. To ameliorate non-uniqueness, we introduce regularization that imposes
temporal smoothness on the dynamics of the barycentric weights. A
quantile-based approximation of the pure state distributions yields a finite
dimensional estimation problem which we numerically solve using cyclic descent
alternating between updates to the pure-state quantile functions and the
barycentric weights. We demonstrate the utility of the proposed algorithm in
segmenting both simulated and real world human activity time series.
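The closed form that the univariate case relies on can be sketched briefly: in one dimension, the quantile function of a Wasserstein-2 barycenter is the weighted average of the pure-state quantile functions. The snippet below is an illustrative sketch only, not the estimation algorithm of the paper; the helper names, the step-function quantile estimator, and the toy samples are assumptions.

```python
def empirical_quantiles(samples, levels):
    """Step-function inverse CDF of the samples at levels in (0, 1)."""
    ordered = sorted(samples)
    n = len(ordered)
    return [ordered[min(n - 1, int(level * n))] for level in levels]


def barycenter_quantiles(quantile_lists, weights):
    """Quantiles of the univariate Wasserstein-2 barycenter.

    In 1D the barycenter's quantile function is the weighted average of
    the input quantile functions, evaluated level by level.
    """
    num_levels = len(quantile_lists[0])
    return [
        sum(w * q[i] for w, q in zip(weights, quantile_lists))
        for i in range(num_levels)
    ]


# Two hypothetical pure states observed through samples.
levels = [0.125, 0.375, 0.625, 0.875]
state_a = empirical_quantiles([0.0, 1.0, 2.0, 3.0], levels)
state_b = empirical_quantiles([10.0, 11.0, 12.0, 13.0], levels)

# A time step halfway through a transition from state A to state B.
midway = barycenter_quantiles([state_a, state_b], [0.5, 0.5])
```

Letting the weight vector change over time gives the gradual transition between pure states that the model describes.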
[LINK]
http://arxiv.org/abs/2210.01918v3
[DATE]
2023-09-21 12:22:17+08:00
[CATEGORIES]
cs.LG
Cross-scale Multi-instance Learning for Pathological Image Diagnosis
[AUTHORS]
Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo
[ABSTRACT]
Analyzing high resolution whole slide images (WSIs) with regard to
information across multiple scales poses a significant challenge in digital
pathology. Multi-instance learning (MIL) is a common solution for working with
high resolution images by classifying bags of objects (i.e. sets of smaller
image patches). However, such processing is typically performed at a single
scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale
information that is key to diagnoses by human pathologists. In this study, we
propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale
relationships into a single MIL network for pathological image diagnosis. The
contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL)
algorithm that integrates the multi-scale information and the inter-scale
relationships is proposed; (2) A toy dataset with scale-specific morphological
features is created and released to examine and visualize differential
cross-scale attention; (3) Superior performance on both in-house and public
datasets is demonstrated by our simple cross-scale MIL strategy. The official
implementation is publicly available at https://github.com/hrlblab/CS-MIL.
[LINK]
http://arxiv.org/abs/2304.00216v2
[DATE]
2023-09-21 12:09:39+08:00
[CATEGORIES]
cs.LG
Dictionary Attack on IMU-based Gait Authentication
[AUTHORS]
Rajesh Kumar, Can Isik, Chilukuri K. Mohan
[ABSTRACT]
We present a novel adversarial model for authentication systems that use gait
patterns recorded by the inertial measurement unit (IMU) built into
smartphones. The attack idea is inspired by and named after the concept of a
dictionary attack on knowledge (PIN or password) based authentication systems.
In particular, this work investigates whether it is possible to build a
dictionary of IMUGait patterns and use it to launch an attack or find an
imitator who can actively reproduce IMUGait patterns that match the target’s
IMUGait pattern. Nine physically and demographically diverse individuals walked
at various levels of four predefined controllable and adaptable gait factors
(speed, step length, step width, and thigh-lift), producing 178 unique IMUGait
patterns. Each pattern attacked a wide variety of user authentication models.
The deeper analysis of error rates (before and after the attack) challenges the
belief that authentication systems based on IMUGait patterns are the most
difficult to spoof; further research is needed on adversarial models and
associated countermeasures.
[COMMENTS]
12 pages, 9 figures, accepted at AISec23 colocated with ACM CCS,
November 30, 2023, Copenhagen, Denmark
[LINK]
http://arxiv.org/abs/2309.11766v1
[DATE]
2023-09-21 12:00:21+08:00
[CATEGORIES]
cs.LG
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
[AUTHORS]
Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, Fatemehsadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim
[ABSTRACT]
We study the problem of in-context learning (ICL) with large language models
(LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak
or regurgitate the private examples demonstrated in the prompt. We propose a
novel algorithm that generates synthetic few-shot demonstrations from the
private dataset with formal differential privacy (DP) guarantees, and show
empirically that it can achieve effective ICL. We conduct extensive experiments
on standard benchmarks and compare our algorithm with non-private ICL and
zero-shot solutions. Our results demonstrate that our algorithm can achieve
competitive performance with strong privacy levels. These results open up new
possibilities for ICL with privacy protection for a broad range of
applications.
[LINK]
http://arxiv.org/abs/2309.11765v1
[DATE]
2023-09-21 11:59:00+08:00
[CATEGORIES]
cs.LG
Optimal Propagation for Graph Neural Networks
[AUTHORS]
Beidi Zhao, Boxin Du, Zhe Xu, Liangyue Li, Hanghang Tong
[ABSTRACT]
Graph Neural Networks (GNNs) have achieved tremendous success in a variety of
real-world applications by relying on the fixed graph data as input. However,
the initial input graph might not be optimal in terms of specific downstream
tasks, because of information scarcity, noise, adversarial attacks, or
discrepancies between the distribution in graph topology, features, and
groundtruth labels. In this paper, we propose a bi-level optimization approach
for learning the optimal graph structure via directly learning the Personalized
PageRank propagation matrix as well as the downstream semi-supervised node
classification simultaneously. We also explore a low-rank approximation model
for further reducing the time complexity. Empirical evaluations show the
superior efficacy and robustness of the proposed model over all baseline
methods.
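As background, a fixed (non-learned) Personalized PageRank vector can be computed by power iteration, as sketched below. This only illustrates the propagation scheme that the paper proposes to learn; it is not the bi-level optimization itself, and the function name, the teleport probability `alpha`, and the toy graph are assumptions.

```python
def personalized_pagerank(adjacency, source, alpha=0.15, iterations=100):
    """Personalized PageRank scores for one source node via power iteration."""
    n = len(adjacency)
    # Row-normalize the adjacency matrix into random-walk transition probabilities.
    transition = []
    for row in adjacency:
        degree = sum(row)
        transition.append([v / degree if degree else 0.0 for v in row])
    teleport = [1.0 if i == source else 0.0 for i in range(n)]
    scores = teleport[:]
    for _ in range(iterations):
        # One random-walk step, then restart at the source with probability alpha.
        walked = [
            sum(scores[i] * transition[i][j] for i in range(n)) for j in range(n)
        ]
        scores = [alpha * teleport[j] + (1.0 - alpha) * walked[j] for j in range(n)]
    return scores


# Hypothetical path graph 0 - 1 - 2, personalized to node 0.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ppr_scores = personalized_pagerank(path_graph, source=0)
```

Stacking these vectors for every source node gives a propagation matrix of the kind used to diffuse node features before classification.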
[COMMENTS]
7 pages, 3 figures
[LINK]
http://arxiv.org/abs/2205.02998v2
[DATE]
2023-09-21 11:44:49+08:00
[CATEGORIES]
cs.LG
SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks
[AUTHORS]
Chengliang Wang, Xinrun Chen, Haojian Ning, Shiying Li
[ABSTRACT]
In the analysis of optical coherence tomography angiography (OCTA) images,
the operation of segmenting specific targets is necessary. Existing methods
typically train on supervised datasets with limited samples (approximately a
few hundred), which can lead to overfitting. To address this, we adopt the
low-rank adaptation technique for foundation model fine-tuning and propose
corresponding prompt point generation strategies to process various
segmentation tasks on OCTA datasets. This method is named SAM-OCTA and has been
evaluated on the publicly available OCTA-500 dataset. While achieving
state-of-the-art performance metrics, this method accomplishes local vessel
segmentation as well as effective artery-vein segmentation, which was not
well-solved in previous works. The code is available at:
https://github.com/ShellRedia/SAM-OCTA.
[COMMENTS]
In submission to the ICASSP conference
[LINK]
http://arxiv.org/abs/2309.11758v1
[DATE]
2023-09-21 11:41:08+08:00
[CATEGORIES]
cs.LG
How Robust is Google’s Bard to Adversarial Image Attacks?
[AUTHORS]
Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu
[ABSTRACT]
Multimodal Large Language Models (MLLMs) that integrate text and other
modalities (especially vision) have achieved unprecedented performance in
various multimodal tasks. However, due to the unsolved adversarial robustness
problem of vision models, MLLMs can have more severe safety and security risks
by introducing the vision inputs. In this work, we study the adversarial
robustness of Google’s Bard, a chatbot competing with ChatGPT that recently
released its multimodal capability, to better understand the vulnerabilities of
commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs,
the generated adversarial examples can mislead Bard to output wrong image
descriptions with a 22% success rate based solely on the transferability. We
show that the adversarial examples can also attack other MLLMs, e.g., with a 26%
attack success rate against Bing Chat and an 86% attack success rate against
ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face
detection and toxicity detection of images. We design corresponding attacks to
evade these defenses, demonstrating that the current defenses of Bard are also
vulnerable. We hope this work can deepen our understanding of the robustness of
MLLMs and facilitate future research on defenses. Our code is available at
https://github.com/thu-ml/Attack-Bard.
[COMMENTS]
Technical report
[LINK]
http://arxiv.org/abs/2309.11751v1
[DATE]
2023-09-21 11:24:30+08:00
[CATEGORIES]
cs.LG
PIE: Simulating Disease Progression via Progressive Image Editing
[AUTHORS]
Kaizhao Liang, Xu Cao, Kuei-Da Liao, Tianren Gao, Zhengyu Chen, Tejas Nama
[ABSTRACT]
Disease progression simulation is a crucial area of research that has
significant implications for clinical diagnosis, prognosis, and treatment. One
major challenge in this field is the lack of continuous medical imaging
monitoring of individual patients over time. To address this issue, we develop
a novel framework termed Progressive Image Editing (PIE) that enables
controlled manipulation of disease-related image features, facilitating precise
and realistic disease progression simulation. Specifically, we leverage recent
advancements in text-to-image generative models to simulate disease progression
accurately and personalize it for each patient. We theoretically analyze the
iterative refining process in our framework as a gradient descent with an
exponentially decayed learning rate. To validate our framework, we conduct
experiments in three medical imaging domains. Our results demonstrate the
superiority of PIE over existing methods such as Stable Diffusion Walk and
Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease
Classification Confidence (Alignment). Our user study collected feedback from
35 veteran physicians to assess the generated progressions. Remarkably, 76.2%
of the feedback agrees with the fidelity of the generated progressions. To the
best of our knowledge, PIE is the first of its kind to generate disease progression
images meeting real-world standards. It is a promising tool for medical
research and clinical practice, potentially allowing healthcare providers to
model disease trajectories over time, predict future treatment responses, and
improve patient outcomes.
[LINK]
http://arxiv.org/abs/2309.11745v1
[DATE]
2023-09-21 10:46:32+08:00
[CATEGORIES]
cs.LG
Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations
[AUTHORS]
Zhihang Yu, Shu Wang, Yunqiang Zhu, Wen Yuan, Xiaoliang Dai, Zhiqiang Zou
[ABSTRACT]
The recommendation of appropriate development pathways, also known as
ecological civilization patterns for achieving Sustainable Development Goals
(namely, sustainable development patterns), is of utmost importance for
promoting ecological, economic, social, and resource sustainability in a
specific region. To achieve this, the recommendation process must carefully
consider the region’s natural, environmental, resource, and economic
characteristics. However, current recommendation algorithms in the field of
computer science fall short in adequately addressing the spatial heterogeneity
related to environment and sparsity of regional historical interaction data,
which limits their effectiveness in recommending sustainable development
patterns. To overcome these challenges, this paper proposes a method called
User Graph after Pruning and Intent Graph (UGPIG). Firstly, we utilize the
high-density linking capability of the pruned User Graph to address the neglect
of spatial heterogeneity in recommendation algorithms. Secondly, we
construct an Intent Graph by incorporating the intent network, which captures
the preferences for attributes including environmental elements of target
regions. This approach effectively alleviates the problem of sparse historical
interaction data in the region. Through extensive experiments, we demonstrate
that UGPIG outperforms state-of-the-art recommendation algorithms like KGCN,
KGAT, and KGIN in sustainable development pattern recommendations, with a
maximum improvement of 9.61% in Top-3 recommendation performance.
[LINK]
http://arxiv.org/abs/2309.11741v1
[DATE]
2023-09-21 10:32:17+08:00
[CATEGORIES]
cs.LG
Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems
[AUTHORS]
Blaž Škrlj, Nir Ki-Tov, Lee Edelist, Natalia Silberstein, Hila Weisman-Zohar, Blaž Mramor, Davorin Kopič, Naama Ziporin
[ABSTRACT]
Real-world production systems often grapple with maintaining data quality in
large-scale, dynamic streams. We introduce Drifter, an efficient and
lightweight system for online feature monitoring and verification in
recommendation use cases. Drifter addresses limitations of existing methods by
delivering agile, responsive, and adaptable data quality monitoring, enabling
real-time root cause analysis, drift detection and insights into problematic
production events. Integrating state-of-the-art online feature ranking for
sparse data and anomaly detection ideas, Drifter is highly scalable and
resource-efficient, requiring only two threads and less than a gigabyte of RAM
per production deployment handling millions of instances per minute.
Evaluation on real-world data sets demonstrates Drifter’s effectiveness in
alerting and mitigating data quality issues, substantially improving
reliability and performance of real-time live recommender systems.
[COMMENTS]
Accepted to ORSUM RecSys workshop
[LINK]
http://arxiv.org/abs/2309.08617v2
[DATE]
2023-09-21 10:16:24+08:00
[CATEGORIES]
cs.LG
Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
[AUTHORS]
Alex Renda, Yi Ding, Michael Carbin
[ABSTRACT]
Programmers and researchers are increasingly developing surrogates of
programs, models of a subset of the observable behavior of a given program, to
solve a variety of software development challenges. Programmers train
surrogates from measurements of the behavior of a program on a dataset of input
examples. A key challenge of surrogate construction is determining what
training data to use to train a surrogate of a given program.
We present a methodology for sampling datasets to train neural-network-based
surrogates of programs. We first characterize the proportion of data to sample
from each region of a program’s input space (corresponding to different
execution paths of the program) based on the complexity of learning a surrogate
of the corresponding execution path. We next provide a program analysis to
determine the complexity of different paths in a program. We evaluate these
results on a range of real-world programs, demonstrating that complexity-guided
sampling results in empirical improvements in accuracy.
[COMMENTS]
Published in OOPSLA 2023
[LINK]
http://arxiv.org/abs/2309.11726v1
[DATE]
2023-09-21 09:59:20+08:00
[CATEGORIES]
cs.LG
Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning
[AUTHORS]
Mengda Ji, Genjiu Xu, Jianjun Ge, Mingqiang Li
[ABSTRACT]
Federated learning is a distributed machine learning system that uses
participants’ data to train an improved global model. In federated learning,
participants cooperatively train a global model, and they will receive the
global model and payments. Rational participants try to maximize their
individual utility, and they will not input their high-quality data truthfully
unless they are provided with satisfactory payments based on their data
quality. Furthermore, federated learning benefits from the cooperative
contributions of participants. Accordingly, how to establish an incentive
mechanism that both incentivizes inputting data truthfully and promotes stable
cooperation has become an important issue to consider. In this paper, we
introduce a data sharing game model for federated learning and employ
game-theoretic approaches to design a core-selecting incentive mechanism by
utilizing a popular concept in cooperative games, the core. In federated
learning, the core can be empty, resulting in the core-selecting mechanism
becoming infeasible. To address this, our core-selecting mechanism employs a
relaxation method and simultaneously minimizes the benefits of inputting false
data for all participants. However, this mechanism is computationally expensive
because it requires aggregating exponentially many models, one for each possible
coalition, which is infeasible in federated learning. To address this, we propose an
efficient core-selecting mechanism based on sampling approximation that only
aggregates models on sampled coalitions to approximate the exact result.
Extensive experiments verify that the efficient core-selecting mechanism can
incentivize inputting high-quality data and stable cooperation, while it
reduces computational overhead compared to the core-selecting mechanism.
[LINK]
http://arxiv.org/abs/2309.11722v1
[DATE]
2023-09-21 09:47:39+08:00
[CATEGORIES]
cs.LG
A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification
[AUTHORS]
Jie Jiao, Meiyan Xu, Qingqing Chen, Hefan Zhou, Wangliang Zhou
[ABSTRACT]
There is a correlation between adjacent channels of electroencephalogram
(EEG), and how to represent this correlation is an issue that is currently
being explored. In addition, due to inter-individual differences in EEG
signals, new subjects need to spend a substantial amount of
calibration time for EEG-based motor imagery brain-computer interfaces. In order
to solve the above problems, we propose a Dynamic Domain Adaptation Based Deep
Learning Network (DADL-Net). First, the EEG data is mapped to the
three-dimensional geometric space and its temporal-spatial features are learned
through the 3D convolution module, and then the spatial-channel attention
mechanism is used to strengthen the features, and the final convolution module
can further learn the spatial-temporal information of the features. Finally, to
account for inter-subject and cross-session differences, we employ a dynamic
domain-adaptive strategy: the distance between features is reduced by
introducing a Maximum Mean Discrepancy loss function, and the classification
layer is fine-tuned by using part of the target domain data. We verify the
performance of the proposed method on the BCI competition IV 2a and OpenBMI
datasets. In the intra-subject experiment, accuracies of 70.42% and
73.91% were achieved on the OpenBMI and BCIC IV 2a datasets, respectively.
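A Maximum Mean Discrepancy term of the kind mentioned above can be sketched as follows. This is a generic biased estimator of squared MMD with a Gaussian kernel, offered as an assumption about the general form of such a loss and not as the loss used in DADL-Net; the function names, the bandwidth, and the toy feature vectors are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""

    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical source-domain and target-domain features.
source_features = [[0.0, 0.0], [1.0, 0.0]]
target_features = [[5.0, 5.0], [6.0, 5.0]]
domain_gap = mmd_squared(source_features, target_features)
```

Minimizing such a term alongside the classification loss pulls source and target feature distributions together, which is the purpose of the domain-adaptive strategy.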
[COMMENTS]
10 pages, 4 figures, journal
[LINK]
http://arxiv.org/abs/2309.11714v1
[DATE]
2023-09-21 09:34:00+08:00
[CATEGORIES]
cs.LG
Quasi-Monte Carlo for 3D Sliced Wasserstein
[AUTHORS]
Khai Nguyen, Nicola Bariletto, Nhat Ho
[ABSTRACT]
Monte Carlo (MC) approximation has been used as the standard computation
approach for the Sliced Wasserstein (SW) distance, which has an intractable
expectation in its analytical form. However, the MC method is not optimal in
terms of minimizing the absolute approximation error. To provide a better class
of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that
rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of
QMC for SW, we focus on the 3D setting, specifically computing the SW between
probability measures in three dimensions. In greater detail, we empirically
verify various ways of constructing QMC point sets on the 3D unit-hypersphere,
including Gaussian-based mapping, equal area mapping, generalized spiral
points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased
estimation for stochastic optimization, we extend QSW into Randomized
Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed
low-discrepancy sequences. For theoretical properties, we prove the asymptotic
convergence of QSW and the unbiasedness of RQSW. Finally, we conduct
experiments on various 3D tasks, such as point-cloud comparison, point-cloud
interpolation, image style transfer, and training deep point-cloud
autoencoders, to demonstrate the favorable performance of the proposed QSW and
RQSW variants.
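The standard Monte Carlo baseline that QSW is meant to improve on can be sketched as follows for equal-size 3D point clouds. This is an illustrative sketch of plain MC sliced Wasserstein only, not the QSW or RQSW constructions; the function names, the number of projections, and the toy point clouds are assumptions.

```python
import math
import random


def random_direction(rng):
    """Uniform random direction on the unit sphere in R^3 (normalized Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein(points_a, points_b, num_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two equal-size 3D point clouds."""
    if len(points_a) != len(points_b):
        raise ValueError("this sketch assumes equal-size point clouds")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(rng)
        proj_a = sorted(sum(t * c for t, c in zip(theta, pt)) for pt in points_a)
        proj_b = sorted(sum(t * c for t, c in zip(theta, pt)) for pt in points_b)
        # 1D Wasserstein-p^p between equal-size samples: match sorted projections.
        total += sum(abs(a - b) ** p for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return (total / num_projections) ** (1.0 / p)


# Hypothetical clouds: the second is the first shifted by 3 along the x-axis.
cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = [[x + 3.0, y, z] for x, y, z in cloud]
estimate = sliced_wasserstein(cloud, shifted)
```

A QMC variant would replace the independent draws in `random_direction` with a low-discrepancy point set on the sphere while keeping the rest of the computation unchanged.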
[COMMENTS]
31 pages, 13 figures, 6 tables
[LINK]
http://arxiv.org/abs/2309.11713v1
[DATE]
2023-09-21 09:32:42+08:00
[CATEGORIES]
cs.LG
Meta OOD Learning for Continuously Adaptive OOD Detection
[AUTHORS]
Xinheng Wu, Jie Lu, Zhen Fang, Guangquan Zhang
[ABSTRACT]
Out-of-distribution (OOD) detection is crucial to modern deep learning
applications by identifying and alerting about the OOD samples that should not
be tested or used for making predictions. Current OOD detection methods have
made significant progress when in-distribution (ID) and OOD samples are drawn
from static distributions. However, this can be unrealistic when applied to
real-world systems which often undergo continuous variations and shifts in ID
and OOD distributions over time. Therefore, for an effective application in
real-world systems, the development of OOD detection methods that can adapt to
these dynamic and evolving distributions is essential. In this paper, we
propose a novel and more realistic setting called continuously adaptive
out-of-distribution (CAOOD) detection, which aims to develop an OOD
detection model that enables dynamic and quick adaptation to a newly arriving
distribution, with insufficient ID samples at deployment time. To address
CAOOD, we develop meta OOD learning (MOL) by designing a learning-to-adapt
diagram such that a well-initialized OOD detection model is learned during the
training process. In the testing process, MOL ensures OOD detection performance
over shifting distributions by quickly adapting to new distributions with a few
adaptations. Extensive experiments on several OOD benchmarks endorse the
effectiveness of our method in preserving both ID classification accuracy and
OOD detection performance on continuously shifting distributions.
[COMMENTS]
Accepted by ICCV 2023
[LINK]
http://arxiv.org/abs/2309.11705v1
[DATE]
2023-09-21 09:05:45+08:00
[CATEGORIES]
cs.LG
Incentivized Communication for Federated Bandits
[AUTHORS]
Zhepei Wei, Chuanhao Li, Haifeng Xu, Hongning Wang
[ABSTRACT]
Most existing works on federated bandits take it for granted that all clients
are altruistic about sharing their data with the server for the collective good
whenever needed. Despite their compelling theoretical guarantee on performance
and communication efficiency, this assumption is overly idealistic and
oftentimes violated in practice, especially when the algorithm is operated over
self-interested clients, who are reluctant to share data without explicit
benefits. Negligence of such self-interested behaviors can significantly affect
the learning efficiency and even the practical operability of federated bandit
learning. In light of this, we aim to spark new insights into this
under-explored research area by formally introducing an incentivized
communication problem for federated bandits, where the server shall motivate
clients to share data by providing incentives. Without loss of generality, we
instantiate this bandit problem with the contextual linear setting and propose
the first incentivized communication protocol, namely, Inc-FedUCB, that
achieves near-optimal regret with provable communication and incentive cost
guarantees. Extensive empirical experiments on both synthetic and real-world
datasets further validate the effectiveness of the proposed method across
various environments.
[COMMENTS]
25 pages, 4 figures
[LINK]
http://arxiv.org/abs/2309.11702v1
[DATE]
2023-09-21 08:59:20+08:00
[CATEGORIES]
cs.LG
Quantum Conformal Prediction for Reliable Uncertainty Quantification in Quantum Machine Learning
[AUTHORS]
Sangwoo Park, Osvaldo Simeone
[ABSTRACT]
Quantum machine learning is a promising programming paradigm for the
optimization of quantum algorithms in the current era of noisy intermediate
scale quantum (NISQ) computers. A fundamental challenge in quantum machine
learning is generalization, as the designer targets performance under testing
conditions, while having access only to limited training data. Existing
generalization analyses, while identifying important general trends and scaling
laws, cannot be used to assign reliable and informative “error bars” to the
decisions made by quantum models. In this article, we propose a general
methodology that can reliably quantify the uncertainty of quantum models,
irrespective of the amount of training data, of the number of shots, of the
ansatz, of the training algorithm, and of the presence of quantum hardware
noise. The approach, which builds on probabilistic conformal prediction, turns
an arbitrary, possibly small, number of shots from a pre-trained quantum model
into a set prediction, e.g., an interval, that provably contains the true
target with any desired coverage level. Experimental results confirm the
theoretical calibration guarantees of the proposed framework, referred to as
quantum conformal prediction.
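To illustrate the kind of set prediction described above, here is a minimal sketch of standard split conformal prediction for regression. It is a simpler, classical stand-in, not the probabilistic, shot-based procedure proposed in the paper. The function name `conformal_interval` and the toy calibration data are assumptions made for illustration only.

```python
import math

def conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Split conformal interval with target coverage 1 - alpha.

    cal_preds / cal_targets are predictions and true values on a held-out
    calibration set (hypothetical data, not taken from the paper).
    """
    # Nonconformity score: absolute residual on each calibration point.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Rank of the finite-sample-corrected quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)

# Toy calibration set whose residuals are 0.1, 0.2, ..., 0.9.
cal_preds = [0.0] * 9
cal_targets = [0.1 * i for i in range(1, 10)]
low, high = conformal_interval(cal_preds, cal_targets, test_pred=1.0, alpha=0.2)
```

Under the usual exchangeability assumption between calibration and test points, an interval built this way covers the true target with probability at least 1 - alpha, regardless of how good the underlying predictor is; a poor predictor simply yields wider intervals.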
[COMMENTS]
added subsection on application to quantum data
[LINK]
http://arxiv.org/abs/2304.03398v2
[DATE]
2023-09-21 08:33:44+08:00
[CATEGORIES]
cs.LG
DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads
[AUTHORS]
Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra
[ABSTRACT]
Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone
control involve dynamic behaviors at various granularities: task, model, and
layers within a model. Such dynamic behaviors introduce new challenges to the
system software in an ML system since the overall system load is not completely
predictable, unlike traditional ML workloads. In addition, RTMM workloads
require real-time processing, involve highly heterogeneous models, and target
resource-constrained devices. Under such circumstances, developing an effective
scheduler gains more importance to better utilize underlying hardware
considering the unique characteristics of RTMM workloads. Therefore, we propose
a new scheduler, DREAM, which effectively handles various dynamicity in RTMM
workloads targeting multi-accelerator systems. DREAM quantifies the unique
requirements for RTMM workloads and utilizes the quantified scores to drive
scheduling decisions, considering the current system load and other inference
jobs on different models and input frames. DREAM utilizes tunable parameters
that provide fast and effective adaptivity to dynamic workload changes. In our
evaluation of five scenarios of RTMM workload, DREAM reduces the overall
UXCost, which is an equivalent metric of the energy-delay product (EDP) for
RTMM defined in the paper, by 32.2% and 50.0% in the geometric mean (up to
80.8% and 97.6%) compared to state-of-the-art baselines, which shows the
efficacy of our scheduling methodology.
[COMMENTS]
14 pages
[LINK]
http://arxiv.org/abs/2212.03414v2
[DATE]
2023-09-21 08:24:09+08:00
[CATEGORIES]
cs.LG
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
[AUTHORS]
Forsad Al Hossain, Tanjid Hasan Tonmoy, Andrew A. Lover, George A. Corey, Mohammad Arif Ul Alam, Tauhidur Rahman
[ABSTRACT]
Privacy-preserving crowd density analysis finds application across a wide
range of scenarios, substantially enhancing smart building operation and
management while upholding privacy expectations in various spaces. We propose a
non-speech audio-based approach for crowd analytics, leveraging a
transformer-based model. Our results demonstrate that non-speech audio alone
can be used to conduct such analysis with remarkable accuracy. To the best of
our knowledge, this is the first time that non-speech audio signals have been
proposed for predicting occupancy. To accomplish this, we deployed our
sensor-based platform in the waiting room of a large hospital with IRB approval
over a period of several months to capture non-speech audio and thermal images
for the training and evaluation of our models. The proposed non-speech-based
approach outperformed the thermal camera-based model and all other baselines.
In addition to demonstrating superior performance without utilizing speech
audio, we conduct further analysis using differential privacy techniques to
provide additional privacy guarantees. Overall, our work demonstrates the
viability of employing non-speech audio data for accurate occupancy estimation,
while also ensuring the exclusion of speech-related content and providing
robust privacy protections through differential privacy guarantees.
[LINK]
http://arxiv.org/abs/2309.10280v2
[DATE]
2023-09-21 07:45:05+08:00
[CATEGORIES]
cs.LG
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
[AUTHORS]
Zhonglin Cao, Simone Sciabola, Ye Wang
[ABSTRACT]
Virtual screening of large compound libraries to identify potential hit
candidates is one of the earliest steps in drug discovery. As the size of
commercially available compound collections grows exponentially to the scale of
billions, brute-force virtual screening using traditional tools such as docking
becomes infeasible in terms of time and computational resources. Active
learning and Bayesian optimization have recently been shown to be effective
methods of narrowing down the search space. An essential component of those
methods is a surrogate machine learning model that is trained on a small
subset of the library to predict the desired properties of compounds. An accurate
model can achieve high sample efficiency by finding the most promising
compounds with only a fraction of the whole library being virtually screened.
In this study, we examined the performance of a pretrained transformer-based
language model and a graph neural network in a Bayesian optimization active
learning framework. The best pretrained model identifies 58.97% of the
top-50000 by docking score after screening only 0.6% of an ultra-large library
containing 99.5 million compounds, improving 8% over the previous state-of-the-art
baseline. Through extensive benchmarks, we show that the superior performance
of pretrained models persists in both structure-based and ligand-based drug
discovery. Such a model can serve as a boost to the accuracy and sample
efficiency of active learning based molecule virtual screening.
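As a rough sanity check on the figures quoted above, the following sketch works out the absolute numbers they imply. The comparison against uniform random selection is an assumption added here for illustration; it is not a baseline reported in the abstract.

```python
# Figures quoted in the abstract, kept as integers to avoid rounding error.
LIBRARY_SIZE = 99_500_000   # compounds in the ultra-large library
TOP_K = 50_000              # top compounds by docking score

screened = LIBRARY_SIZE * 6 // 1000     # 0.6% of the library
recovered = TOP_K * 5897 // 10000       # 58.97% of the top-50000

# Hypothetical reference point (not from the abstract): picking the same
# number of compounds uniformly at random would be expected to recover
# about 0.6% of the top-50000.
random_expected = TOP_K * 6 // 1000
enrichment = recovered / random_expected

print(screened, recovered, random_expected, round(enrichment, 1))
```

Read this way, screening roughly 597,000 compounds recovers about 29,485 of the top 50,000, which is on the order of a hundred times more than the random-selection reference point assumed here.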
[LINK]
http://arxiv.org/abs/2309.11687v1
[DATE]
2023-09-21 07:43:42+08:00
[CATEGORIES]
cs.LG
ZeroFlow: Fast, Zero Label, Scalable Scene Flow via Distillation
[AUTHORS]
Kyle Vedder, Neehar Peri, Nathaniel Chodosh, Ishan Khatri, Eric Eaton, Dinesh Jayaraman, Yang Liu, Deva Ramanan, James Hays
[ABSTRACT]
Scene flow estimation is the task of describing the 3D motion field between
temporally successive point clouds. State-of-the-art methods use strong priors
and test-time optimization techniques, but require on the order of tens of
seconds to process large-scale point clouds, making them unusable as computer
vision primitives for real-time applications such as open world object
detection. Feed forward methods are considerably faster, running on the order
of tens to hundreds of milliseconds for large-scale point clouds, but require
expensive human supervision. To address both limitations, we propose Scene Flow
via Distillation, a simple, scalable distillation framework that uses a
label-free optimization method to produce pseudo-labels to supervise a feed
forward model. Our instantiation of this framework, ZeroFlow, achieves
state-of-the-art performance on the Argoverse 2 Self-Supervised Scene Flow
Challenge while using zero human labels by simply training on large-scale,
diverse unlabeled data. At test-time, ZeroFlow is over 1000$\times$ faster than
label-free state-of-the-art optimization-based methods on large-scale point
clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the
cost of human annotation of that data. To facilitate further research, we will
release our code, trained model weights, and high quality pseudo-labels for the
Argoverse 2 and Waymo Open datasets.
[COMMENTS]
9 pages, 8 pages of Supplemental. Project page with data releases is
at http://vedder.io/zeroflow.html
[LINK]
http://arxiv.org/abs/2305.10424v5
[DATE]
2023-09-21 07:31:11+08:00
[CATEGORIES]
cs.LG
Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework
[AUTHORS]
Sina Baharlouei, Meisam Razaviyayn
[ABSTRACT]
While training fair machine learning models has been studied extensively in
recent years, most developed methods rely on the assumption that the training
and test data have similar distributions. In the presence of distribution
shifts, fair models may behave unfairly on test data. There have been some
developments for fair learning robust to distribution shifts to address this
shortcoming. However, most proposed solutions are based on the assumption of
having access to the causal graph describing the interaction of different
features. Moreover, existing algorithms require full access to data and cannot
be used when small batches are used (stochastic/batch implementation). This
paper proposes the first stochastic distributionally robust fairness framework
with convergence guarantees that do not require knowledge of the causal graph.
More specifically, we formulate the fair inference in the presence of the
distribution shift as a distributionally robust optimization problem under
$L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual
Information (ERMI) as the measure of fairness violation. We then discuss how
the proposed method can be implemented in a stochastic fashion. We have
evaluated the presented framework’s performance and efficiency through
extensive experiments on real datasets consisting of distribution shifts.
[COMMENTS]
22 pages, 3 figures
[LINK]
http://arxiv.org/abs/2309.11682v1
[DATE]
2023-09-21 07:25:28+08:00
[CATEGORIES]
cs.LG
Federated Learning with Neural Graphical Models
[AUTHORS]
Urszula Chajewska, Harsh Shrivastava
[ABSTRACT]
Federated Learning (FL) addresses the need to create models based on
proprietary data in such a way that multiple clients retain exclusive control
over their data, while all benefit from improved model accuracy due to pooled
resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic
Graphical models that utilize the expressive power of neural networks to learn
complex non-linear dependencies between the input features. They learn to
capture the underlying data distribution and have efficient algorithms for
inference and sampling. We develop a FL framework which maintains a global NGM
model that learns the averaged information from the local NGM models while
keeping the training data within the client’s environment. Our design, FedNGMs,
avoids the pitfalls and shortcomings of neuron matching frameworks like
Federated Matched Averaging, which suffers from model parameter explosion. Our
global model size remains constant throughout the process. In the cases where
clients have local variables that are not part of the combined global
distribution, we propose a ‘Stitching’ algorithm, which personalizes the global
NGM models by merging the additional variables using the client’s data. FedNGM
is robust to data heterogeneity, a large number of participants, and limited
communication bandwidth.
[LINK]
http://arxiv.org/abs/2309.11680v1
[DATE]
2023-09-21 07:24:22+08:00
[CATEGORIES]
cs.LG
Popularity Degradation Bias in Local Music Recommendation
[AUTHORS]
April Trainor, Douglas Turnbull
[ABSTRACT]
In this paper, we study the effect of popularity degradation bias in the
context of local music recommendations. Specifically, we examine how accurate
two top-performing recommendation algorithms, Weight Relevance Matrix
Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at
recommending artists as a function of artist popularity. We find that both
algorithms improve recommendation performance for more popular artists and, as
such, exhibit popularity degradation bias. While both algorithms produce a
similar level of performance for more popular artists, Mult-VAE shows better
relative performance for less popular artists. This suggests that this
algorithm should be preferred for local (long-tail) music artist
recommendation.
[COMMENTS]
Presented at MuRS Workshop, RecSys ’23
[LINK]
http://arxiv.org/abs/2309.11671v1
[DATE]
2023-09-21 06:36:33+08:00
[CATEGORIES]
cs.LG
GLM Regression with Oblivious Corruptions
[AUTHORS]
Ilias Diakonikolas, Sushrut Karmalkar, Jongho Park, Christos Tzamos
[ABSTRACT]
We demonstrate the first algorithms for the problem of regression for
generalized linear models (GLMs) in the presence of additive oblivious noise.
We assume we have sample access to examples $(x, y)$ where $y$ is a noisy
measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of
the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious
noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$,
and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover
a parameter vector $w$ such that the function $g(w \cdot x)$ has
arbitrarily small error when compared to the true values $g(w^* \cdot x)$,
rather than the noisy measurements $y$.
We present an algorithm that tackles this problem in its most general
distribution-independent setting, where the solution may not even be
identifiable. Our algorithm returns an accurate estimate of the
solution if it is identifiable, and otherwise returns a small list of
candidates, one of which is close to the true solution. Furthermore, we
provide a necessary and sufficient condition for identifiability, which
holds in broad settings. Specifically, the problem is identifiable when
the quantile at which $\xi + \epsilon = 0$ is known, or when the family of
hypotheses does not contain candidates that are nearly equal to a translated
$g(w^* \cdot x) + A$ for some real number $A$, while also having large error
when compared to $g(w^* \cdot x)$.
This is the first algorithmic result for GLM regression with
oblivious noise which can handle more than half the samples being arbitrarily
corrupted. Prior work focused largely on the setting of linear regression, and
gave algorithms under restrictive assumptions.
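The data model above can be made concrete with a small sampler. This is only a sketch of the observation model $y = g(w^* \cdot x) + \xi + \epsilon$, not of the recovery algorithm from the paper. The sigmoid link, the Gaussian covariates, the uniform distribution used for the non-zero oblivious noise, and the name `sample_glm_oblivious` are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_glm_oblivious(w_star, n, p_clean=0.3, sigma=0.1, g=sigmoid, seed=0):
    """Draw n samples (x, y, clean, xi) with y = g(w* . x) + xi + eps."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        # Oblivious noise: drawn without looking at x, and exactly zero
        # with probability p_clean. The uniform range is an assumption.
        xi = 0.0 if rng.random() < p_clean else rng.uniform(-10.0, 10.0)
        # Small Gaussian measurement noise with standard deviation sigma.
        eps = rng.gauss(0.0, sigma)
        clean = g(sum(w * v for w, v in zip(w_star, x)))
        samples.append((x, clean + xi + eps, clean, xi))
    return samples

data = sample_glm_oblivious([1.0, -2.0], n=2000, p_clean=0.3)
corrupted_fraction = sum(1 for _, _, _, xi in data if xi != 0.0) / len(data)
```

With `p_clean=0.3`, roughly 70% of the labels carry a large additive corruption, which is the regime (more than half the samples corrupted) that the abstract says the algorithm can handle.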
[COMMENTS]
Published in COLT 2023
[LINK]
http://arxiv.org/abs/2309.11657v1
[DATE]
2023-09-21 05:41:59+08:00
[CATEGORIES]
cs.LG
$\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces
[AUTHORS]
Claas A Voelcker, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-massoud Farahmand
[ABSTRACT]
The idea of decision-aware model learning, that models should be accurate
where it matters for decision-making, has gained prominence in model-based
reinforcement learning. While promising theoretical results have been
established, the empirical performance of algorithms leveraging a
decision-aware loss has been lacking, especially in continuous control
problems. In this paper, we present a study on the necessary components for
decision-aware reinforcement learning models and we showcase design choices
that enable well-performing algorithms. To this end, we provide a theoretical
and empirical investigation into prominent algorithmic ideas in the field. We
highlight that empirical design decisions established in the MuZero line of
works are vital to achieving good performance for related algorithms, and we
showcase differences in behavior between different instantiations of
value-aware algorithms in stochastic environments. Using these insights, we
propose the Latent Model-Based Decision-Aware Actor-Critic framework
($\lambda$-AC) for decision-aware model-based reinforcement learning in
continuous state-spaces and highlight important design choices in different
environments.
[LINK]
http://arxiv.org/abs/2306.17366v2
[DATE]
2023-09-21 05:34:22+08:00
[CATEGORIES]
cs.LG
Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks
[AUTHORS]
Baris Ata, J. Michael Harrison, Nian Si
[ABSTRACT]
Motivated by applications in queueing theory, we consider a stochastic
control problem whose state space is the $d$-dimensional positive orthant. The
controlled process $Z$ evolves as a reflected Brownian motion whose covariance
matrix is exogenously specified, as are its directions of reflection from the
orthant’s boundary surfaces. A system manager chooses a drift vector
$\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at
time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem
formulation, the objective is to minimize expected discounted cost over an
infinite planning horizon, after which we treat the corresponding ergodic
control problem. Extending earlier work by Han et al. (Proceedings of the
National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a
simulation-based computational method that relies heavily on deep neural
network technology. For test problems studied thus far, our method is accurate
to within a fraction of one percent, and is computationally feasible in
dimensions up to at least $d=30$.
[LINK]
http://arxiv.org/abs/2309.11651v1
[DATE]
2023-09-21 05:32:58+08:00
[CATEGORIES]
cs.LG
Orbital AI-based Autonomous Refuelling Solution
[AUTHORS]
Duarte Rondao, Lei He, Nabil Aouf
[ABSTRACT]
Cameras are rapidly becoming the choice for on-board sensors towards space
rendezvous due to their small form factor and inexpensive power, mass, and
volume costs. When it comes to docking, however, they typically serve a
secondary role, whereas the main work is done by active sensors such as lidar.
This paper documents the development of a proposed AI-based (artificial
intelligence) navigation algorithm intending to mature the use of on-board
visible wavelength cameras as a main sensor for docking and on-orbit servicing
(OOS), reducing the dependency on lidar and greatly reducing costs.
Specifically, the use of AI enables the expansion of the relative navigation
solution towards multiple classes of scenarios, e.g., in terms of targets or
illumination conditions, which would otherwise have to be crafted on a
case-by-case basis using classical image processing methods. Multiple
convolutional neural network (CNN) backbone architectures are benchmarked on
synthetically generated data of docking manoeuvres with the International Space
Station (ISS), achieving position and attitude estimates close to 1%
range-normalised and 1 deg, respectively. The integration of the solution with
a physical prototype of the refuelling mechanism is validated in a laboratory
using a robotic arm to simulate a berthing procedure.
[COMMENTS]
13 pages
[LINK]
http://arxiv.org/abs/2309.11648v1
[DATE]
2023-09-21 05:25:52+08:00
[CATEGORIES]
cs.LG
Potential and limitations of random Fourier features for dequantizing quantum machine learning
[AUTHORS]
Ryan Sweke, Erik Recio, Sofiene Jerbi, Elies Gil-Fuster, Bryce Fuller, Jens Eisert, Johannes Jakob Meyer
[ABSTRACT]
Quantum machine learning is arguably one of the most explored applications of
near-term quantum devices. Much focus has been put on notions of variational
quantum machine learning where parameterized quantum circuits (PQCs) are used
as learning models. These PQC models have a rich structure which suggests that
they might be amenable to efficient dequantization via random Fourier features
(RFF). In this work, we establish necessary and sufficient conditions under
which RFF does indeed provide an efficient dequantization of variational
quantum machine learning for regression. We build on these insights to make
concrete suggestions for PQC architecture design, and to identify structures
which are necessary for a regression problem to admit a potential quantum
advantage via PQC based optimization.
[COMMENTS]
33 pages, 2 figures. Comments and feedback welcome
[LINK]
http://arxiv.org/abs/2309.11647v1
[DATE]
2023-09-21 05:23:52+08:00
[CATEGORIES]
cs.LG
Early diagnosis of autism spectrum disorder using machine learning approaches
[AUTHORS]
Rownak Ara Rasul, Promy Saha, Diponkor Bala, S M Rakib Ul Karim, Ibrahim Abdullah, Bishwajit Saha
[ABSTRACT]
Autism Spectrum Disorder (ASD) is a neurological disease characterized by
difficulties with social interaction, communication, and repetitive activities.
The severity of these difficulties varies, and those with this diagnosis face
unique challenges. While its primary origin lies in genetics, identifying and
addressing it early can contribute to improving the condition. In
recent years, machine learning-driven intelligent diagnosis has emerged as a
supplement to conventional clinical approaches, aiming to address the potential
drawbacks of time-consuming and costly traditional methods. In this work, we
utilize different machine learning algorithms to find the most significant
traits responsible for ASD and to automate the diagnostic process. We study six
classification models to see which model works best to identify ASD and also
study five popular clustering methods to gain meaningful insight into these ASD
datasets. To find the best classifier for these binary datasets, we evaluate
the models using accuracy, precision, recall, specificity, F1-score, AUC, kappa
and log loss metrics. Our evaluation demonstrates that five out of the six
selected models perform exceptionally, achieving a 100% accuracy rate on the
ASD datasets when hyperparameters are meticulously tuned for each model. As
almost all classification models are able to reach 100% accuracy, we became
interested in examining the underlying structure of the datasets by applying
some popular clustering algorithms to these datasets. We calculate Normalized
Mutual Information (NMI), Adjusted Rand Index (ARI) & Silhouette Coefficient
(SC) metrics to select the best clustering models. Our evaluation finds that
spectral clustering outperforms all other benchmarking clustering models in
terms of NMI & ARI metrics and it also demonstrates comparability to the
optimal SC achieved by k-means.
[COMMENTS]
14 pages, 2 figures, 12 tables
[LINK]
http://arxiv.org/abs/2309.11646v1
[DATE]
2023-09-21 05:23:37+08:00
[CATEGORIES]
cs.LG
A Survey on Transformers in Reinforcement Learning
[AUTHORS]
Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, Deheng Ye
[ABSTRACT]
The Transformer has been considered the dominant neural architecture in NLP and
CV, mostly under supervised settings. Recently, a similar surge of using
Transformers has appeared in the domain of reinforcement learning (RL), but it
is faced with unique design choices and challenges brought by the nature of RL.
However, the evolution of Transformers in RL has not yet been well unraveled.
In this paper, we seek to systematically review motivations and progress on
using Transformers in RL, provide a taxonomy on existing works, discuss each
sub-field, and summarize future prospects.
[COMMENTS]
Accepted by TMLR
[LINK]
http://arxiv.org/abs/2301.03044v3
[DATE]
2023-09-21 05:12:31+08:00
[CATEGORIES]
cs.LG
A survey on the semantics of sequential patterns with negation
[AUTHORS]
Thomas Guyet
[ABSTRACT]
A sequential pattern with negation, or negative sequential pattern, takes the
form of a sequential pattern for which the negation symbol may be used in front
of some of the pattern’s itemsets. Intuitively, such a pattern occurs in a
sequence if negated itemsets are absent in the sequence. Recent work has shown
that different semantics can be attributed to these pattern forms, and that
state-of-the-art algorithms do not extract the same sets of patterns. This
raises the important question of the interpretability of sequential patterns
with negation. In this study, our focus is on exploring how potential users
perceive negation in sequential patterns. Our aim is to determine whether
specific semantics are more “intuitive” than others and whether these align
with the semantics employed by one or more state-of-the-art algorithms. To
achieve this, we designed a questionnaire to reveal the semantics that each
user finds intuitive. This article presents both the design of the questionnaire and an
in-depth analysis of the 124 responses obtained. The outcomes indicate that two
of the semantics are predominantly intuitive; however, neither of them aligns
with the semantics of the primary state-of-the-art algorithms. As a result, we
provide recommendations to account for this disparity in the conclusions drawn.
[LINK]
http://arxiv.org/abs/2309.11638v1
[DATE]
2023-09-21 05:03:18+08:00
[CATEGORIES]
cs.LG
Multiclass Learnability Does Not Imply Sample Compression
[AUTHORS]
Chirag Pabbaraju
[ABSTRACT]
A hypothesis class admits a sample compression scheme, if for every sample
labeled by a hypothesis from the class, it is possible to retain only a small
subsample, using which the labels on the entire sample can be inferred. The
size of the compression scheme is an upper bound on the size of the subsample
produced. Every learnable binary hypothesis class (which must necessarily have
finite VC dimension) admits a sample compression scheme of size only a finite
function of its VC dimension, independent of the sample size. For multiclass
hypothesis classes, the analog of VC dimension is the DS dimension. We show
that the analogous statement pertaining to sample compression is not true for
multiclass hypothesis classes: not every learnable multiclass hypothesis class,
which must necessarily have finite DS dimension, admits a sample
compression scheme of size only a finite function of its DS dimension.
[LINK]
http://arxiv.org/abs/2308.06424v2
[DATE]
2023-09-21 04:51:51+08:00
[CATEGORIES]
cs.LG
Grassmann Manifold Flows for Stable Shape Generation
[AUTHORS]
Ryoma Yataka, Masashi Shiraishi, Kazuki Hirashima
[ABSTRACT]
Recently, studies on machine learning have focused on methods that use
symmetry implicit in a specific manifold as an inductive bias. Grassmann
manifolds provide the ability to handle fundamental shapes represented as shape
spaces, enabling stable shape analysis. In this paper, we present a novel
approach in which we establish the theoretical foundations for learning
distributions on the Grassmann manifold via continuous normalizing flows,
with the explicit goal of generating stable shapes. Our approach facilitates
more robust generation by effectively eliminating the influence of extraneous
transformations, such as rotations and inversions, through learning and
generating within a Grassmann manifold designed to accommodate the essential
shape information of the object. The experimental results indicated that the
proposed method can generate high-quality samples by capturing the data
structure. Furthermore, the proposed method significantly outperformed
state-of-the-art methods in terms of the log-likelihood or evidence lower
bound. The results obtained are expected to stimulate further research in this
field, leading to advances for stable shape generation and analysis.
[COMMENTS]
35 pages
[LINK]
http://arxiv.org/abs/2211.02900v2
[DATE]
2023-09-21 04:34:29+08:00
[CATEGORIES]
cs.LG
Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation
[AUTHORS]
Pavan Seshadri, Peter Knees
[ABSTRACT]
Music streaming services heavily rely on their recommendation engines to
continuously provide content to their consumers. Sequential recommendation
consequently has seen considerable attention in current literature, where state
of the art approaches focus on self-attentive models leveraging contextual
information such as long and short-term user history and item features;
however, most of these studies focus on long-form content domains (retail,
movie, etc.) rather than short-form, such as music. Additionally, many do not
explore incorporating negative session-level feedback during training. In this
study, we investigate the use of transformer-based self-attentive architectures
to learn implicit session-level information for sequential music
recommendation. We additionally propose a contrastive learning task to
incorporate negative feedback (e.g., skipped tracks) to promote positive hits and
penalize negative hits. This task is formulated as a simple loss term that can
be incorporated into a variety of deep learning architectures for sequential
recommendation. Our experiments show that this results in consistent
performance gains over the baseline architectures ignoring negative user
feedback.
[COMMENTS]
Accepted to the 1st Workshop on Music Recommender Systems, co-located
with the 17th ACM Conference on Recommender Systems (MuRS @ RecSys 2023)
[LINK]
http://arxiv.org/abs/2309.11623v1
[DATE]
2023-09-21 04:21:13+08:00
[CATEGORIES]
cs.LG
Analysis and Comparison of Classification Metrics
[AUTHORS]
Luciana Ferrer
[ABSTRACT]
A variety of different performance metrics are commonly used in the machine
learning literature for the evaluation of classification systems. Some of the
most common ones for measuring quality of hard decisions are standard and
balanced accuracy, standard and balanced error rate, F-beta score, and Matthews
correlation coefficient (MCC). In this document, we review the definition of
these and other metrics and compare them with the expected cost (EC), a metric
introduced in every statistical learning course but rarely used in the machine
learning literature. We show that both the standard and balanced error rates
are special cases of the EC. Further, we show its relation with F-beta score
and MCC and argue that EC is superior to these traditional metrics for being
based on first principles from statistics, and for being more general,
interpretable, and adaptable to any application scenario. The metrics mentioned
above measure the quality of hard decisions. Yet, most modern classification
systems output continuous scores for the classes which we may want to evaluate
directly. Metrics for measuring the quality of system scores include the area
under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC
or Bayes risk, among others. The last three metrics are special cases of a
family of metrics given by the expected value of proper scoring rules (PSRs).
We review the theory behind these metrics, showing that they are a principled
way to measure the quality of the posterior probabilities produced by a system.
Finally, we show how to use these metrics to compute a system’s calibration
loss and compare this metric with the widely-used expected calibration error
(ECE), arguing that calibration loss based on PSRs is superior to the ECE for
being more interpretable, more general, and directly applicable to the
multi-class case, among other reasons.
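The claim that the standard and balanced error rates are special cases of the EC can be illustrated with a minimal sketch. It assumes the usual definition of EC as a prior-weighted sum of costs times per-class decision rates; the function name `expected_cost` and the toy labels are hypothetical, and the balanced case is obtained here by substituting uniform priors, which may differ in form from the paper's own derivation.

```python
def expected_cost(labels, decisions, cost, priors=None):
    """Expected cost: sum_i sum_j P_i * c_ij * R_ij (illustrative sketch).

    R_ij is the fraction of class-i samples that received decision j.
    Assumes every class appears at least once in `labels`.
    """
    classes = range(len(cost))
    n = len(labels)
    counts = [sum(1 for y in labels if y == i) for i in classes]
    if priors is None:
        # Default: empirical class priors estimated from the labels.
        priors = [c / n for c in counts]
    total = 0.0
    for i in classes:
        for j in range(len(cost[i])):
            n_ij = sum(1 for y, d in zip(labels, decisions) if y == i and d == j)
            total += priors[i] * cost[i][j] * (n_ij / counts[i])
    return total


# Toy data (hypothetical): four samples of class 0, two of class 1.
labels = [0, 0, 0, 0, 1, 1]
decisions = [0, 0, 0, 1, 0, 1]
zero_one = [[0, 1], [1, 0]]

# 0-1 costs with empirical priors give the standard error rate.
error_rate = expected_cost(labels, decisions, zero_one)
# 0-1 costs with uniform priors give the balanced error rate.
balanced_error_rate = expected_cost(labels, decisions, zero_one, priors=[0.5, 0.5])
```

Other cost matrices and priors can be passed in the same way, which is the sense in which the EC adapts to an application scenario.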
[LINK]
http://arxiv.org/abs/2209.05355v4
[DATE]
2023-09-21 04:20:45+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks for the Offline Nanosatellite Task Scheduling Problem
[AUTHORS]
Bruno Machado Pacheco, Laio Oriel Seman, Cezar Antonio Rigo, Eduardo Camponogara, Eduardo Augusto Bezerra, Leandro dos Santos Coelho
[ABSTRACT]
This study investigates how to schedule nanosatellite tasks more efficiently
using Graph Neural Networks (GNNs). In the Offline Nanosatellite Task
Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks
to be carried out in orbit while taking into account Quality-of-Service (QoS)
considerations such as priority, minimum and maximum activation events,
execution time-frames, periods, and execution windows, as well as constraints
on the satellite’s power resources and the complexity of energy harvesting and
management. The ONTS problem has been approached using conventional
mathematical formulations and exact methods, but their applicability to
challenging cases of the problem is limited. This study examines the use of
GNNs in this context, which have been effectively applied to optimization
problems such as the traveling salesman, scheduling, and facility placement
problems. More specifically, we investigate whether GNNs can learn the complex
structure of the ONTS problem with respect to feasibility and optimality of
candidate solutions. Furthermore, we evaluate using GNN-based heuristic
solutions to provide better solutions (w.r.t. the objective value) to the ONTS
problem and reduce the optimization cost. Our experiments show that GNNs are
not only able to learn feasibility and optimality for instances of the ONTS
problem, but they can generalize to harder instances than those seen during
training. Furthermore, the GNN-based heuristics improved the expected objective
value of the best solution found under the time limit by 45%, and reduced the
expected time to find a feasible solution by 35%, when compared to the SCIP
(Solving Constraint Integer Programs) solver in its off-the-shelf configuration.
[LINK]
http://arxiv.org/abs/2303.13773v2
[DATE]
2023-09-21 04:04:09+08:00
[CATEGORIES]
cs.LG
GrACE: Generation using Associated Code Edits
[AUTHORS]
Priyanshu Gupta, Avishree Khare, Yasharth Bajpai, Saikat Chakraborty, Sumit Gulwani, Aditya Kanade, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari
[ABSTRACT]
Developers expend a significant amount of time in editing code for a variety
of reasons such as bug fixing or adding new features. Designing effective
methods to predict code edits has been an active yet challenging area of
research due to the diversity of code edits and the difficulty of capturing the
developer intent. In this work, we address these challenges by endowing
pre-trained large language models (LLMs) of code with the knowledge of prior,
relevant edits. The generative capability of the LLMs helps address the
diversity in code changes and conditioning code generation on prior edits helps
capture the latent developer intent. We evaluate two well-known LLMs, Codex and
CodeT5, in zero-shot and fine-tuning settings respectively. In our experiments
with two datasets, the knowledge of prior edits boosts the performance of the
LLMs significantly and enables them to generate 29% and 54% more correctly
edited code in top-1 suggestions relative to the current state-of-the-art
symbolic and neural approaches, respectively.
[LINK]
http://arxiv.org/abs/2305.14129v3
[DATE]
2023-09-21 03:46:10+08:00
[CATEGORIES]
cs.LG
Latent Diffusion Models for Structural Component Design
[AUTHORS]
Ethan Herron, Jaydeep Rade, Anushrut Jignasu, Baskar Ganapathysubramanian, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy
[ABSTRACT]
Recent advances in diffusion models have
revolutionized generative modeling, enabling high-quality image generation
tailored to user needs. This paper proposes a framework for the generative
design of structural components. Specifically, we employ a Latent Diffusion
model to generate potential designs of a component that can satisfy a set of
problem-specific loading conditions. One of the distinct advantages our
approach offers over other generative approaches, such as generative
adversarial networks (GANs), is that it permits the editing of existing
designs. We train our model using a dataset of geometries obtained from
structural topology optimization utilizing the SIMP algorithm. Consequently,
our framework generates inherently near-optimal designs. Our work presents
quantitative results that support the structural performance of the generated
designs and the variability in potential candidate designs. Furthermore, we
provide evidence of the scalability of our framework by operating over voxel
domains with resolutions varying from $32^3$ to $128^3$. Our framework can be
used as a starting point for generating novel near-optimal designs similar to
topology-optimized designs.
[LINK]
http://arxiv.org/abs/2309.11601v1
[DATE]
2023-09-21 03:28:45+08:00
[CATEGORIES]
cs.LG
Hyena Neural Operator for Partial Differential Equations
[AUTHORS]
Saurabh Patil, Zijie Li, Amir Barati Farimani
[ABSTRACT]
Numerically solving partial differential equations typically requires fine
discretization to resolve necessary spatiotemporal scales, which can be
computationally expensive. Recent advances in deep learning have provided a new
approach to solving partial differential equations that involves the use of
neural operators. Neural operators are neural network architectures that learn
mappings between function spaces and have the capability to solve partial
differential equations based on data. This study utilizes a novel neural
operator called Hyena, which employs a long convolutional filter that is
parameterized by a multilayer perceptron. The Hyena operator enjoys
sub-quadratic complexity and employs a state space model to parameterize a long
convolution that has a global receptive field. This mechanism enhances the
model’s comprehension of the input’s context and enables data-dependent weights
for different partial differential equation instances. To measure how
effective the layers are in solving partial differential equations, we conduct
experiments on the Diffusion-Reaction equation and the Navier-Stokes equation. Our
findings indicate that the Hyena neural operator can serve as an efficient and
accurate model for learning the solution operator of partial differential equations. The data
and code used can be found at:
https://github.com/Saupatil07/Hyena-Neural-Operator
[LINK]
http://arxiv.org/abs/2306.16524v2
[DATE]
2023-09-21 03:13:20+08:00
[CATEGORIES]
cs.LG
CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches
[AUTHORS]
Jinmeng Rao, Song Gao, Sijia Zhu
[ABSTRACT]
The prevalence of ubiquitous location-aware devices and mobile Internet
enables us to collect massive individual-level trajectory datasets from users.
Such trajectory big data bring new opportunities to human mobility research but
also raise public concerns with regard to location privacy. In this work, we
present the Conditional Adversarial Trajectory Synthesis (CATS), a
deep-learning-based GeoAI methodological framework for privacy-preserving
trajectory data generation and publication. CATS applies K-anonymity to the
underlying spatiotemporal distributions of human movements, which provides a
distributional-level strong privacy guarantee. By leveraging conditional
adversarial training on K-anonymized human mobility matrices, trajectory global
context learning using the attention-based mechanism, and recurrent bipartite
graph matching of adjacent trajectory points, CATS is able to reconstruct
trajectory topology from conditionally sampled locations and generate
high-quality individual-level synthetic trajectory data, which can serve as
supplements or alternatives to raw data for privacy-preserving trajectory data
publication. The experiment results on over 90k GPS trajectories show that our
method has a better performance in privacy preservation, spatiotemporal
characteristic preservation, and downstream utility compared with baseline
methods, which brings new insights into privacy-preserving human mobility
research using generative AI techniques and explores data ethics issues in
GIScience.
[COMMENTS]
9 figures, 4 figures
[LINK]
http://arxiv.org/abs/2309.11587v1
[DATE]
2023-09-21 02:52:56+08:00
[CATEGORIES]
cs.LG
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
[AUTHORS]
Sanath Kumar Krishnamurthy, Ruohan Zhan, Susan Athey, Emma Brunskill
[ABSTRACT]
Simple regret minimization is a critical problem in learning optimal
treatment assignment policies across various domains, including healthcare and
e-commerce. However, it remains understudied in the contextual bandit setting.
We propose a new family of computationally efficient bandit algorithms for the
stochastic contextual bandit settings, with the flexibility to be adapted for
cumulative regret minimization (with near-optimal minimax guarantees) and
simple regret minimization (with SOTA guarantees). Furthermore, our algorithms
adapt to model misspecification and extend to the continuous arm settings.
These advantages come from constructing and relying on “conformal arm sets”
(CASs), which provide a set of arms at every context that encompass the
context-specific optimal arm with some probability across the context
distribution. Our positive results on simple and cumulative regret guarantees
are contrasted by a negative result, which shows that an algorithm can’t
achieve instance-dependent simple regret guarantees while simultaneously
achieving minimax optimal cumulative regret guarantees.
[LINK]
http://arxiv.org/abs/2307.02108v2
[DATE]
2023-09-21 02:48:41+08:00
[CATEGORIES]
cs.LG
Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints
[AUTHORS]
Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
[ABSTRACT]
This paper studies the problem of online performance optimization of
constrained closed-loop control systems, where both the objective and the
constraints are unknown black-box functions affected by exogenous time-varying
contextual disturbances. A primal-dual contextual Bayesian optimization
algorithm is proposed that achieves sublinear cumulative regret with respect to
the dynamic optimal solution under certain regularity conditions. Furthermore,
the algorithm achieves zero time-average constraint violation, ensuring that
the average value of the constraint function satisfies the desired constraint.
The method is applied to both sampled instances from Gaussian processes and a
continuous stirred tank reactor parameter tuning problem; simulation results
show that the method simultaneously provides close-to-optimal performance and
maintains constraint feasibility on average. This contrasts with current
state-of-the-art methods, which either suffer from large cumulative regret or
severe constraint violations for the case studies presented.
[LINK]
http://arxiv.org/abs/2304.06104v4
[DATE]
2023-09-21 02:41:52+08:00
[CATEGORIES]
cs.LG
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
[AUTHORS]
Manuel Brack, Patrick Schramowski, Kristian Kersting
[ABSTRACT]
Text-conditioned image generation models have recently achieved astonishing
image quality and alignment results. Consequently, they are employed in a
fast-growing number of applications. Since they are highly data-driven, relying
on billion-sized datasets randomly scraped from the web, they also produce
unsafe content. As a contribution to the Adversarial Nibbler challenge, we
distill a large set of over 1,000 potential adversarial inputs from existing
safety benchmarks. Our analysis of the gathered prompts and corresponding
images demonstrates the fragility of input filters and provides further
insights into systematic safety issues in current generative image models.
[LINK]
http://arxiv.org/abs/2309.11575v1
[DATE]
2023-09-21 02:25:44+08:00
[CATEGORIES]
cs.LG
Multiplying poles to avoid unwanted points in root finding and optimization
[AUTHORS]
Tuyen Trung Truong
[ABSTRACT]
In root finding and optimization, there are many cases where there is a
closed set $A$ to which one does not want the sequence constructed by one’s
favourite method to converge (here, we do not assume extra properties on $A$ such as
being convex or connected). For example, if one wants to find roots, and one
chooses initial points in the basin of attraction for one root $x^*$ (a fact
which one may not know beforehand), then one will always end up in that root.
In this case, one would like to have a mechanism to avoid this point $x^*$ in
the next runs of one’s algorithm.
In this paper, we propose a new method aiming to achieve this: we divide the
cost function by an appropriate power of the distance function to $A$. This
idea is inspired by how one would try to find all roots of a function in one
variable. We first explain the heuristic for this method in the case where the
minimum of the cost function is exactly 0, and then explain how to proceed if
the minimum is non-zero (allowing both positive and negative values). The
method is very suitable for iterative algorithms which have the descent
property. We also propose, based on this, an algorithm to escape the basin of
attraction of a component of positive dimension to reach another component.
Along the way, we compare with the main existing relevant methods in the current
literature. We provide several examples to illustrate the usefulness of the new
approach.
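The one-variable idea behind the method can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the avoided set is a single known root, the function is divided by the distance raised to the power 1, and the helper names `newton` and `deflate` are hypothetical. The polynomial $f(x) = (x-1)(x-2)$ is chosen so that a start at 0 lies in the basin of the root 1.

```python
def newton(f, x0, steps=50, h=1e-6, tol=1e-12):
    """Plain Newton iteration using a central-difference derivative."""
    x = x0
    for _ in range(steps):
        fx = f(x)
        if abs(fx) < tol:
            break
        dfx = (f(x + h) - f(x - h)) / (2.0 * h)
        if dfx == 0.0:
            break
        x -= fx / dfx
    return x


def deflate(f, avoid, power=1):
    """Divide f by a power of the distance to the point to be avoided.

    Assumption for this sketch: the avoided set A is the single point `avoid`.
    """
    return lambda x: f(x) / abs(x - avoid) ** power


def f(x):
    return (x - 1.0) * (x - 2.0)


# Starting from 0, Newton's method on f ends up at the root 1.
first_root = newton(f, 0.0)

# Same start, but on the deflated function: the iteration is pushed
# away from the known root and reaches the other root, 2.
second_root = newton(deflate(f, avoid=1.0), 0.0)
```

Dividing by the distance creates a pole where the original function had a zero, so a descent-type iteration no longer sees the avoided point as a solution.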
[COMMENTS]
19 pages
[LINK]
http://arxiv.org/abs/2309.11475v1
[DATE]
2023-09-21 01:20:41+08:00
[CATEGORIES]
cs.LG
Toward Dynamic Stability Assessment of Power Grid Topologies using Graph Neural Networks
[AUTHORS]
Christian Nauck, Michael Lindner, Konstantin Schürholt, Frank Hellmann
[ABSTRACT]
To mitigate climate change, the share of renewable energies in power
production needs to be increased. Renewables introduce new challenges to power
grids regarding the dynamic stability due to decentralization, reduced inertia,
and volatility in production. Since dynamic stability simulations are
intractable and exceedingly expensive for large grids, graph neural networks
(GNNs) are a promising method to reduce the computational effort of analyzing
the dynamic stability of power grids. As a testbed for GNN models, we generate
new, large datasets of dynamic stability of synthetic power grids, and provide
them as an open-source resource to the research community. We find that GNNs
are surprisingly effective at predicting the highly non-linear targets from
topological information only. For the first time, performance that is suitable
for practical use cases is achieved. Furthermore, we demonstrate the ability of
these models to accurately identify particularly vulnerable nodes in power grids,
so-called troublemakers. Last, we find that GNNs trained on small grids
generate accurate predictions on a large synthetic model of the Texan power
grid, which illustrates the potential for real-world applications.
[COMMENTS]
9 pages + appendix and references, 7 figures
[LINK]
http://arxiv.org/abs/2206.06369v4
[DATE]
2023-09-21 01:17:08+08:00
[CATEGORIES]
cs.LG
Model-free tracking control of complex dynamical trajectories with machine learning
[AUTHORS]
Zheng-Meng Zhai, Mohammadamin Moradi, Ling-Wei Kong, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai
[ABSTRACT]
Nonlinear tracking control enabling a dynamical system to track a desired
trajectory is fundamental to robotics, serving a wide range of civil and
defense applications. In control engineering, designing tracking control
requires complete knowledge of the system model and equations. We develop a
model-free, machine-learning framework to control a two-arm robotic manipulator
using only partially observed states, where the controller is realized by
reservoir computing. Stochastic input is exploited for training, which consists
of the observed partial state vector as the first and its immediate future as
the second component so that the neural machine regards the latter as the
future state of the former. In the testing (deployment) phase, the
immediate-future component is replaced by the desired observational vector from
the reference trajectory. We demonstrate the effectiveness of the control
framework using a variety of periodic and chaotic signals, and establish its
robustness against measurement noise, disturbances, and uncertainties.
[COMMENTS]
16 pages, 8 figures
[LINK]
http://arxiv.org/abs/2309.11470v1
[DATE]
2023-09-21 01:10:10+08:00
[CATEGORIES]
cs.LG
Graph Fuzzy System: Concepts, Models and Algorithms
[AUTHORS]
Fuping Hu, Zhaohong Deng, Zhenping Xie, Kup-Sze Choi, Shitong Wang
[ABSTRACT]
Fuzzy systems (FSs) have enjoyed wide applications in various fields,
including pattern recognition, intelligent control, data mining and
bioinformatics, owing to their strong interpretability and learning
ability. In traditional application scenarios, FSs are mainly applied to model
Euclidean space data and cannot be used to handle graph data that are
non-Euclidean in nature, such as social networks and traffic route maps. Therefore,
developing an FS modeling method that is suitable for graph data and can
retain the advantages of traditional FSs is an important research direction. To meet this
challenge, a new type of FS for graph data modeling called Graph Fuzzy System
(GFS) is proposed in this paper, where the concepts, modeling framework and
construction algorithms are systematically developed. First, GFS related
concepts, including graph fuzzy rule base, graph fuzzy sets and graph
consequent processing unit (GCPU), are defined. A GFS modeling framework is
then constructed and the antecedents and consequents of the GFS are presented
and analyzed. Finally, a learning framework of GFS is proposed, in which a
kernel K-prototype graph clustering (K2PGC) is proposed to develop the
construction algorithm for the GFS antecedent generation, and then, based on
graph neural networks (GNNs), a consequent parameter learning algorithm is
proposed for GFS. Specifically, three different versions of the GFS
implementation algorithm are developed for comprehensive evaluations with
experiments on various benchmark graph classification datasets. The results
demonstrate that the proposed GFS inherits the advantages of both existing
mainstream GNN methods and conventional FS methods while achieving better
performance than its counterparts.
[COMMENTS]
This paper has been submitted to a journal
[LINK]
http://arxiv.org/abs/2210.16730v2
[DATE]
2023-09-21 01:02:38+08:00
[CATEGORIES]
cs.LG
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
[AUTHORS]
Mohamad Fakih, Rouwaida Kanj, Fadi Kurdahi, Mohammed E. Fouda
[ABSTRACT]
Automatic Speech Recognition systems have been shown to be vulnerable to
adversarial attacks that manipulate the command executed on the device. Recent
research has focused on exploring methods to create such attacks; however, some
issues relating to Over-The-Air (OTA) attacks have not been properly addressed.
In our work, we examine the needed properties of robust attacks compatible with
the OTA model, and we design a method of generating attacks with such
desired properties, namely invariance to synchronization and
robustness to filtering; this allows a Denial-of-Service (DoS) attack against
ASR systems. We achieve these characteristics by constructing attacks in a
modified frequency domain through an inverse Fourier transform. We evaluate our
method on standard keyword classification tasks and analyze it in OTA, and we
analyze the properties of the cross-domain attacks to explain the efficiency of
the approach.
[COMMENTS]
10 pages, 11 Figures
[LINK]
http://arxiv.org/abs/2309.11462v1
[DATE]
2023-09-21 00:59:22+08:00
[CATEGORIES]
cs.LG
Digital twins of nonlinear dynamical systems: A perspective
[AUTHORS]
Ying-Cheng Lai
[ABSTRACT]
Digital twins have attracted a great deal of recent attention from a wide
range of fields. A basic requirement for digital twins of nonlinear dynamical
systems is the ability to generate the system evolution and predict potentially
catastrophic emergent behaviors so as to provide early warnings. The digital
twin can then be used for system “health” monitoring in real time and for
predictive problem solving. In particular, if the digital twin forecasts a
possible system collapse in the future due to parameter drifting as caused by
environmental changes or perturbations, an optimal control strategy can be
devised and executed as early intervention to prevent the collapse. Two
approaches exist for constructing digital twins of nonlinear dynamical systems:
sparse optimization and machine learning. The basics of these two approaches
are described and their advantages and caveats are discussed.
[COMMENTS]
12 pages, 3 figures
[LINK]
http://arxiv.org/abs/2309.11461v1
[DATE]
2023-09-21 00:57:11+08:00
[CATEGORIES]
cs.LG
Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence
[AUTHORS]
Navid Ghaffarzadegan, Aritra Majumdar, Ross Williams, Niyousha Hosseinichimeh
[ABSTRACT]
We discuss the emerging new opportunity for building feedback-rich
computational models of social systems using generative artificial
intelligence. Referred to as Generative Agent-Based Models (GABMs), such
individual-level models utilize large language models such as ChatGPT to
represent human decision-making in social settings. We provide a GABM case in
which human behavior can be incorporated in simulation models by coupling a
mechanistic model of human interactions with a pre-trained large language
model. This is achieved by introducing a simple GABM of social norm diffusion
in an organization. For educational purposes, the model is intentionally kept
simple. We examine a wide range of scenarios and the sensitivity of the results
to several changes in the prompt. We hope the article and the model serve as a
guide for building useful diffusion models that include realistic human
reasoning and decision-making.
[LINK]
http://arxiv.org/abs/2309.11456v1
[DATE]
2023-09-21 00:43:05+08:00
[CATEGORIES]
cs.LG
Multi-Step Model Predictive Safety Filters: Reducing Chattering by Increasing the Prediction Horizon
[AUTHORS]
Federico Pizarro Bejarano, Lukas Brunke, Angela P. Schoellig
[ABSTRACT]
Learning-based controllers have demonstrated superior performance compared to
classical controllers in various tasks. However, providing safety guarantees is
not trivial. Safety, the satisfaction of state and input constraints, can be
guaranteed by augmenting the learned control policy with a safety filter. Model
predictive safety filters (MPSFs) are a common safety filtering approach based
on model predictive control (MPC). MPSFs seek to guarantee safety while
minimizing the difference between the proposed and applied inputs in the
immediate next time step. This limited foresight can lead to jerky motions and
undesired oscillations close to constraint boundaries, known as chattering. In
this paper, we reduce chattering by considering input corrections over a longer
horizon. Under the assumption of bounded model uncertainties, we prove
recursive feasibility using techniques from robust MPC. We verified the
proposed approach in both extensive simulation and quadrotor experiments. In
experiments with a Crazyflie 2.0 drone, we show that, in addition to preserving
the desired safety guarantees, the proposed MPSF reduces chattering by more
than a factor of 4 compared to previous MPSF formulations.
[COMMENTS]
8 pages, 9 figures. Accepted to IEEE CDC 2023. Code is publicly
available at
https://github.com/Federico-PizarroBejarano/safe-control-gym/tree/smooth_mpsc_paper
[LINK]
http://arxiv.org/abs/2309.11453v1
[DATE]
2023-09-21 00:35:29+08:00
[CATEGORIES]
cs.LG
Distribution and volume based scoring for Isolation Forests
[AUTHORS]
Hichem Dhouib, Alissa Wilms, Paul Boes
[ABSTRACT]
We make two contributions to the Isolation Forest method for anomaly and
outlier detection. The first contribution is an information-theoretically
motivated generalisation of the score function that is used to aggregate the
scores across random tree estimators. This generalisation allows one to take
into account not just the ensemble average across trees but instead the whole
distribution. The second contribution is an alternative scoring function at the
level of the individual tree estimator, in which we replace the depth-based
scoring of the Isolation Forest with one based on hyper-volumes associated to
an isolation tree’s leaf nodes.
We motivate the use of both of these methods on generated data and also
evaluate them on 34 datasets from the recent and exhaustive “ADBench”
benchmark, finding significant improvement over the standard isolation forest
for both variants on some datasets and improvement on average across all
datasets for one of the two variants. The code to reproduce our results is made
available as part of the submission.
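For context, the depth-based baseline that both contributions modify can be sketched as below. This is the standard Isolation Forest score (mean isolation depth across trees, normalised by the average path length of an unsuccessful binary search tree lookup), not the generalised or volume-based scores proposed in the paper; the function names are hypothetical and the harmonic number is approximated with the Euler-Mascheroni constant.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """Normalising constant c(n) for a subsample of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(depths, n):
    """Standard score: 2 ** (-mean depth / c(n)); values near 1 are anomalous.

    `depths` holds one isolation depth per tree. Only their mean is used,
    which is the ensemble average that the first contribution generalises.
    """
    mean_depth = sum(depths) / len(depths)
    return 2.0 ** (-mean_depth / average_path_length(n))


# Hypothetical depths: a point isolated quickly vs. one isolated slowly.
shallow = anomaly_score([2, 3, 2], n=256)
deep = anomaly_score([10, 12, 11], n=256)
```

Because only the mean depth enters the score, two points with very different depth distributions across trees can receive the same value, which motivates scoring on the whole distribution instead.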
[COMMENTS]
7 pages
[LINK]
http://arxiv.org/abs/2309.11450v1
[DATE]
2023-09-21 00:27:10+08:00
[CATEGORIES]
cs.LG
Signature Activation: A Sparse Signal View for Holistic Saliency
[AUTHORS]
Jose Roberto Tello Ayala, Akl C. Fahed, Weiwei Pan, Eugene V. Pomerantsev, Patrick T. Ellinor, Anthony Philippakis, Finale Doshi-Velez
[LINK]
http://arxiv.org/abs/2309.11443v1
[DATE]
2023-09-21 00:17:26+08:00
[CATEGORIES]
cs.LG
Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing
[AUTHORS]
Sewoong Lee, JinKyou Choi, Min Su Kim
[ABSTRACT]
This paper introduces TRACE-GPT, which stands for Time-seRies
Anomaly-detection with Convolutional Embedding and Generative Pre-trained
Transformers. TRACE-GPT is designed to pre-train univariate time-series sensor
data and detect faults on unlabeled datasets in semiconductor manufacturing. In
the semiconductor industry, classifying abnormal time-series sensor data from
normal data is important because it is directly related to wafer defects.
However, small, unlabeled, and even mixed training data without enough
anomalies make classification tasks difficult. In this research, we capture
features of time-series data with temporal convolutional embedding and
Generative Pre-trained Transformer (GPT) to classify abnormal sequences from
normal sequences using cross entropy loss. We demonstrate that our model shows better
performance than previous unsupervised models with both an open dataset, the
University of California Riverside (UCR) time-series classification archive,
and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model
has the highest F1 score at Equal Error Rate (EER) across all datasets and is
only 0.026 below the supervised state-of-the-art baseline on the open dataset.
[LINK]
http://arxiv.org/abs/2309.11427v1
[DATE]
2023-09-21 00:01:45+08:00
[CATEGORIES]
cs.LG
Kosmos-2.5: A Multimodal Literate Model
[AUTHORS]
Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
[ABSTRACT]
We present Kosmos-2.5, a multimodal literate model for machine reading of
text-intensive images. Pre-trained on large-scale text-intensive images,
Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1)
generating spatially-aware text blocks, where each block of text is assigned
its spatial coordinates within the image, and (2) producing structured text
output that captures styles and structures into the markdown format. This
unified multimodal literate capability is achieved through a shared Transformer
architecture, task-specific prompts, and flexible text representations. We
evaluate Kosmos-2.5 on end-to-end document-level text recognition and
image-to-markdown text generation. Furthermore, the model can be readily
adapted for any text-intensive image understanding task with different prompts
through supervised fine-tuning, making it a general-purpose tool for real-world
applications involving text-rich images. This work also paves the way for the
future scaling of multimodal large language models.
[LINK]
http://arxiv.org/abs/2309.11419v1
[DATE]
2023-09-20 23:50:08+08:00
[CATEGORIES]
cs.CL
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
[AUTHORS]
Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie
[ABSTRACT]
Emotional intelligence significantly impacts our daily behaviors and
interactions. Although Large Language Models (LLMs) are increasingly viewed as
a stride toward artificial general intelligence, exhibiting impressive
performance in numerous tasks, it is still uncertain if LLMs can genuinely
grasp psychological emotional stimuli. Understanding and responding to
emotional cues gives humans a distinct advantage in problem-solving. In this
paper, we take the first step towards exploring the ability of LLMs to
understand emotional stimuli. To this end, we first conduct automatic
experiments on 45 tasks using various LLMs, including Flan-T5-Large, Vicuna,
Llama 2, BLOOM, ChatGPT, and GPT-4. Our tasks span deterministic and generative
applications that represent comprehensive evaluation scenarios. Our automatic
experiments show that LLMs have a grasp of emotional intelligence, and their
performance can be improved with emotional prompts (which we call
“EmotionPrompt” that combines the original prompt with emotional stimuli),
e.g., 8.00% relative performance improvement in Instruction Induction and 115%
in BIG-Bench. In addition to those deterministic tasks that can be
automatically evaluated using existing metrics, we conducted a human study with
106 participants to assess the quality of generative tasks using both vanilla
and emotional prompts. Our human study results demonstrate that EmotionPrompt
significantly boosts the performance of generative tasks (10.9% average
improvement in terms of performance, truthfulness, and responsibility metrics).
We provide an in-depth discussion regarding why EmotionPrompt works for LLMs
and the factors that may influence its performance. We posit that EmotionPrompt
heralds a novel avenue for exploring interdisciplinary knowledge for human-LLMs
interaction.
[COMMENTS]
Technical report; short version (v1) was accepted by LLM@IJCAI’23; 35
pages; more work: https://llm-enhance.github.io/
[LINK]
http://arxiv.org/abs/2307.11760v4
[DATE]
2023-09-20 23:46:15+08:00
[CATEGORIES]
cs.CL
Long-Form End-to-End Speech Translation via Latent Alignment Segmentation
[AUTHORS]
Peter Polák, Ondřej Bojar
[ABSTRACT]
Current simultaneous speech translation models can process audio only up to a
few seconds long. Contemporary datasets provide an oracle segmentation into
sentences based on human-annotated transcripts and translations. However, the
segmentation into sentences is not available in the real world. Current speech
segmentation approaches either offer poor segmentation quality or have to trade
latency for quality. In this paper, we propose a novel segmentation approach
for low-latency end-to-end speech translation. We leverage the existing
speech translation encoder-decoder architecture with ST CTC and show that it
can perform the segmentation task without supervision or additional parameters.
To the best of our knowledge, our method is the first that allows an actual
end-to-end simultaneous speech translation, as the same model is used for
translation and segmentation at the same time. On a diverse set of language
pairs and in- and out-of-domain data, we show that the proposed approach
achieves state-of-the-art quality at no additional computational cost.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2309.11384v1
[DATE]
2023-09-20 23:10:12+08:00
[CATEGORIES]
cs.CL
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
[AUTHORS]
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
[ABSTRACT]
Blockwise self-attentional encoder models have recently emerged as one
promising end-to-end approach to simultaneous speech translation. These models
employ a blockwise beam search with hypothesis reliability scoring to determine
when to wait for more input speech before translating further. However, this
method maintains multiple hypotheses until the entire speech input is consumed
– this scheme cannot directly show a single incremental translation
to users. Further, this method lacks mechanisms for controlling the
quality vs. latency tradeoff. We propose a modified incremental blockwise beam
search incorporating local agreement or hold-$n$ policies for quality-latency
control. We apply our framework to models trained for online or offline
translation and demonstrate that both types can be effectively used in online
mode.
Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing
latency or 0.8-1.4 s latency improvement without changing quality.
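
The two policies named above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes hypotheses are token lists, that each new hypothesis extends the already committed output, that local agreement commits the longest common prefix of two consecutive best hypotheses, and that hold-n withholds the last n tokens of the current best hypothesis. All function names are hypothetical.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest common prefix of two token sequences."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def local_agreement(prev_hyp: List[str], curr_hyp: List[str],
                    committed: List[str]) -> List[str]:
    """Tokens to newly commit: those on which two consecutive hypotheses agree."""
    agreed = common_prefix(prev_hyp, curr_hyp)
    return agreed[len(committed):]


def hold_n(curr_hyp: List[str], committed: List[str], n: int) -> List[str]:
    """Tokens to newly commit: everything except the last n tokens."""
    stable = curr_hyp[:max(len(curr_hyp) - n, 0)]
    return stable[len(committed):]


prev = ["the", "cat", "sat", "down"]
curr = ["the", "cat", "sat", "on", "the"]
new_la = local_agreement(prev, curr, committed=["the"])
new_hold = hold_n(curr, committed=["the"], n=2)
```

In this sketch, a larger n (or requiring agreement across more blocks) delays output in exchange for stability, which is the kind of quality-latency control the abstract describes.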
[COMMENTS]
Accepted at INTERSPEECH 2023
[LINK]
http://arxiv.org/abs/2309.11379v1
[DATE]
2023-09-20 22:59:06+08:00
[CATEGORIES]
cs.CL
MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction
[AUTHORS]
Tongliang Li, Zixiang Wang, Linzheng Chai, Jian Yang, Jiaqi Bai, Yuwei Yin, Jiaheng Liu, Hongcheng Guo, Liqun Yang, Hebboul Zine el-abidine, Zhoujun Li
[ABSTRACT]
Cross-lingual open information extraction aims to extract structured
information from raw text across multiple languages. Previous work uses a
shared cross-lingual pre-trained model to handle the different languages but
underuses the potential of the language-specific representation. In this paper,
we propose an effective multi-stage tuning framework called MT4CrossIE,
designed for enhancing cross-lingual open information extraction by injecting
language-specific knowledge into the shared model. Specifically, the
cross-lingual pre-trained model is first tuned in a shared semantic space
(e.g., embedding matrix) in the fixed encoder and then other components are
optimized in the second stage. After enough training, we freeze the pre-trained
model and tune the multiple extra low-rank language-specific modules using
mixture-of-LoRAs for model-based cross-lingual transfer. In addition, we
leverage two-stage prompting to encourage the large language model (LLM) to
annotate the multi-lingual raw data for data-based cross-lingual transfer. The
model is trained with multi-lingual objectives on our proposed dataset
OpenIE4++ by combining the model-based and data-based transfer techniques.
Experimental results on various benchmarks emphasize the importance of
aggregating multiple plug-in-and-play language-specific modules and demonstrate
the effectiveness of MT4CrossIE in cross-lingual
OIE (https://github.com/CSJianYang/Multilingual-Multimodal-NLP).
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2308.06552v2
[DATE]
2023-09-20 22:37:38+08:00
[CATEGORIES]
cs.CL
GECTurk: Grammatical Error Correction and Detection Dataset for Turkish
[AUTHORS]
Atakan Kara, Farrin Marouf Sofian, Andrew Bond, Gözde Gül Şahin
[ABSTRACT]
Grammatical Error Detection and Correction (GEC) tools have proven useful for
native speakers and second language learners. Developing such tools requires a
large amount of parallel, annotated data, which is unavailable for most
languages. Synthetic data generation is a common practice to overcome the
scarcity of such data. However, it is not straightforward for morphologically
rich languages like Turkish due to complex writing rules that require
phonological, morphological, and syntactic information. In this work, we
present a flexible and extensible synthetic data generation pipeline for
Turkish covering more than 20 expert-curated grammar and spelling rules
(a.k.a., writing rules) implemented through complex transformation functions.
Using this pipeline, we derive 130,000 high-quality parallel sentences from
professionally edited articles. Additionally, we create a more realistic test
set by manually annotating a set of movie reviews. We implement three baselines
formulating the task as i) neural machine translation, ii) sequence tagging,
and iii) prefix tuning with a pretrained decoder-only model, achieving strong
results. Furthermore, we perform exhaustive experiments on out-of-domain
datasets to gain insights on the transferability and robustness of the proposed
approaches. Our results suggest that our corpus, GECTurk, is high-quality and
allows knowledge transfer for the out-of-domain setting. To encourage further
research on Turkish GEC, we release our datasets, baseline models, and the
synthetic data generation pipeline at https://github.com/GGLAB-KU/gecturk.
[COMMENTS]
Accepted at Findings of IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.11346v1
[DATE]
2023-09-20 22:25:44+08:00
[CATEGORIES]
cs.CL
cs.LG
Improving Article Classification with Edge-Heterogeneous Graph Neural Networks
[AUTHORS]
Khang Ly, Yury Kashnitsky, Savvas Chamezopoulos, Valeria Krzhizhanovskaya
[ABSTRACT]
Classifying research output into context-specific label taxonomies is a
challenging and relevant downstream task, given the volume of existing and
newly published articles. We propose a method to enhance the performance of
article classification by enriching simple Graph Neural Networks (GNN)
pipelines with edge-heterogeneous graph representations. SciBERT is used for
node feature generation to capture higher-order semantics within the articles’
textual metadata. Fully supervised transductive node classification experiments
are conducted on the Open Graph Benchmark (OGB) ogbn-arxiv dataset and the
PubMed diabetes dataset, augmented with additional metadata from Microsoft
Academic Graph (MAG) and PubMed Central, respectively. The results demonstrate
that edge-heterogeneous graphs consistently improve the performance of all GNN
models compared to the edge-homogeneous graphs. The transformed data enable
simple and shallow GNN pipelines to achieve results on par with more complex
architectures. On ogbn-arxiv, we achieve a top-15 result in the OGB competition
with a 2-layer GCN (accuracy 74.61%), being the highest-scoring solution with
sub-1 million parameters. On PubMed, we closely trail SOTA GNN architectures
using a 2-layer GraphSAGE by including additional co-authorship edges in the
graph (accuracy 89.88%). The implementation is available at:
https://github.com/lyvykhang/edgehetero-nodeproppred.
[LINK]
http://arxiv.org/abs/2309.11341v1
[DATE]
2023-09-20 22:18:04+08:00
[CATEGORIES]
cs.LG
cs.CL
TRAVID: An End-to-End Video Translation Framework
[AUTHORS]
Prottay Kumar Adhikary, Bandaru Sugandhi, Subhojit Ghimire, Santanu Pal, Partha Pakray
[ABSTRACT]
In today’s globalized world, effective communication with people from diverse
linguistic backgrounds has become increasingly crucial. While traditional
methods of language translation, such as written text or voice-only
translations, can accomplish the task, they often fail to capture the complete
context and nuanced information conveyed through nonverbal cues like facial
expressions and lip movements. In this paper, we present an end-to-end video
translation system that not only translates spoken language but also
synchronizes the translated speech with the lip movements of the speaker. Our
system focuses on translating educational lectures in various Indian languages,
and it is designed to be effective even in low-resource system settings. By
incorporating lip movements that align with the target language and matching
them with the speaker’s voice using voice cloning techniques, our application
offers an enhanced experience for students and users. This additional feature
creates a more immersive and realistic learning environment, ultimately making
the learning process more effective and engaging.
[LINK]
http://arxiv.org/abs/2309.11338v1
[DATE]
2023-09-20 22:13:05+08:00
[CATEGORIES]
cs.CL
Rating Prediction in Conversational Task Assistants with Behavioral and Conversational-Flow Features
[AUTHORS]
Rafael Ferreira, David Semedo, João Magalhães
[ABSTRACT]
Predicting the success of Conversational Task Assistants (CTA) can be
critical to understand user behavior and act accordingly. In this paper, we
propose TB-Rater, a Transformer model which combines conversational-flow
features with user behavior features for predicting user ratings in a CTA
scenario. In particular, we use real human-agent conversations and ratings
collected in the Alexa TaskBot challenge, a novel multimodal and multi-turn
conversational context. Our results show the advantages of modeling both the
conversational-flow and behavioral aspects of the conversation in a single
model for offline rating prediction. Additionally, an analysis of the
CTA-specific behavioral features brings insights into this setting and can be
used to bootstrap future systems.
[LINK]
http://arxiv.org/abs/2309.11307v1
[DATE]
2023-09-20 21:34:03+08:00
[CATEGORIES]
cs.CL
CPLLM: Clinical Prediction with Large Language Models
[AUTHORS]
Ofir Ben Shoham, Nadav Rappoport
[ABSTRACT]
We present Clinical Prediction with Large Language Models (CPLLM), a method
that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical
disease prediction. We utilized quantization and fine-tuned the LLM using
prompts, with the task of predicting whether patients will be diagnosed with a
target disease during their next visit or in the subsequent diagnosis,
leveraging their historical diagnosis records. We compared our results versus
various baselines, including Logistic Regression, RETAIN, and Med-BERT, which
is the current state-of-the-art model for disease prediction using structured
EHR data. Our experiments have shown that CPLLM surpasses all the tested models
in terms of both PR-AUC and ROC-AUC metrics, displaying noteworthy enhancements
compared to the baseline models.
[LINK]
http://arxiv.org/abs/2309.11295v1
[DATE]
2023-09-20 21:24:12+08:00
[CATEGORIES]
cs.CL
cs.LG
Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains
[AUTHORS]
Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso
[ABSTRACT]
This paper presents the overview of the AuTexTification shared task as part
of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the
framework of the SEPLN 2023 conference. AuTexTification consists of two
subtasks: for Subtask 1, participants had to determine whether a text is
human-authored or has been generated by a large language model. For Subtask 2,
participants had to attribute a machine-generated text to one of six different
text generation models. Our AuTexTification 2023 dataset contains more than
160,000 texts across two languages (English and Spanish) and five domains
(tweets, reviews, news, legal, and how-to articles). A total of 114 teams
signed up to participate, of which 36 sent 175 runs, and 20 of them sent their
working notes. In this overview, we present the AuTexTification dataset and
task, the submitted participating systems, and the results.
[COMMENTS]
Accepted at SEPLN 2023
[LINK]
http://arxiv.org/abs/2309.11285v1
[DATE]
2023-09-20 21:10:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Grounded Complex Task Segmentation for Conversational Assistants
[AUTHORS]
Rafael Ferreira, David Semedo, João Magalhães
[ABSTRACT]
Following complex instructions in conversational assistants can be quite
daunting due to the shorter attention and memory spans when compared to reading
the same instructions. Hence, when conversational assistants walk users through
the steps of complex tasks, there is a need to structure the task into
manageable pieces of information of the right length and complexity. In this
paper, we tackle the recipes domain and convert reading structured instructions
into conversational structured ones. We annotated the structure of instructions
according to a conversational scenario, which provided insights into what is
expected in this setting. To computationally model the conversational step’s
characteristics, we tested various Transformer-based architectures, showing
that a token-based approach delivers the best results. A further user study
showed that users tend to favor steps of manageable complexity and length, and
that the proposed methodology can improve the original web-based instructional
text. Specifically, 86% of the evaluated tasks were improved from a
conversational suitability point of view.
[LINK]
http://arxiv.org/abs/2309.11271v1
[DATE]
2023-09-20 20:55:46+08:00
[CATEGORIES]
cs.CL
Sequence-to-Sequence Spanish Pre-trained Language Models
[AUTHORS]
Vladimir Araujo, Maria Mihaela Trusca, Rodrigo Tufiño, Marie-Francine Moens
[ABSTRACT]
In recent years, substantial advancements in pre-trained language models have
paved the way for the development of numerous non-English language versions,
with a particular focus on encoder-only and decoder-only architectures. While
Spanish language models encompassing BERT, RoBERTa, and GPT have exhibited
prowess in natural language understanding and generation, there remains a
scarcity of encoder-decoder models designed for sequence-to-sequence tasks
involving input-output pairs. This paper breaks new ground by introducing the
implementation and evaluation of renowned encoder-decoder architectures,
exclusively pre-trained on Spanish corpora. Specifically, we present Spanish
versions of BART, T5, and BERT2BERT-style models and subject them to a
comprehensive assessment across a diverse range of sequence-to-sequence tasks,
spanning summarization, rephrasing, and generative question answering. Our
findings underscore the competitive performance of all models, with BART and T5
emerging as top performers across all evaluated tasks. As an additional
contribution, we have made all models publicly available to the research
community, fostering future exploration and development in Spanish language
processing.
[LINK]
http://arxiv.org/abs/2309.11259v1
[DATE]
2023-09-20 20:35:19+08:00
[CATEGORIES]
cs.CL
cs.LG
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
[AUTHORS]
Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu
[ABSTRACT]
Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement
learning fine-tuning (RLFT) to align these models with human goals. However,
SFT methods treat all training data with mixed quality equally, while RLFT
methods require high-quality pairwise or ranking-based preference data. In this
study, we present a novel framework, named OpenChat, to advance open-source
language models with mixed-quality data. Specifically, we consider the general
SFT training data, consisting of a small amount of expert data mixed with a
large proportion of sub-optimal data, without any preference labels. We propose
the C(onditioned)-RLFT, which regards different data sources as coarse-grained
reward labels and learns a class-conditioned policy to leverage complementary
data quality information. Interestingly, the optimal policy in C-RLFT can be
easily solved through single-stage, RL-free supervised learning, which is
lightweight and avoids costly human preference labeling. Through extensive
experiments on three standard benchmarks, our openchat-13b fine-tuned with
C-RLFT achieves the highest average performance among all 13b open-source
language models. Moreover, we use AGIEval to validate the model generalization
performance, in which only openchat-13b surpasses the base model. Finally, we
conduct a series of analyses to shed light on the effectiveness and robustness
of OpenChat. Our code, data, and models are publicly available at
https://github.com/imoneoi/openchat.
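
One plausible reading of "coarse-grained reward labels" plus a "class-conditioned policy" trained by supervised learning is a reward-weighted log-likelihood over source-tagged examples. The sketch below illustrates that reading only; the tag format, the reward values, and the function names are assumptions and are not taken from the OpenChat code.

```python
import math
from typing import List, Tuple

# Hypothetical coarse-grained rewards, one per data source.
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}


def condition(prompt: str, source: str) -> str:
    """Prefix the prompt with its data-source class (assumed tag format)."""
    return f"[{source}] {prompt}"


def weighted_nll(batch: List[Tuple[str, List[float]]]) -> float:
    """Reward-weighted negative log-likelihood.

    Each batch item is (source, per-token log-probabilities of the target
    under the class-conditioned model).
    """
    total, weight_sum = 0.0, 0.0
    for source, log_probs in batch:
        weight = SOURCE_REWARD[source]
        total += weight * -sum(log_probs)
        weight_sum += weight
    return total / weight_sum


batch = [
    ("expert", [math.log(0.5)]),
    ("suboptimal", [math.log(0.5)]),
]
loss = weighted_nll(batch)
```

Because the weights come from the data source alone, no pairwise preference labels are needed, which matches the abstract's claim of avoiding costly human preference labeling.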
[LINK]
http://arxiv.org/abs/2309.11235v1
[DATE]
2023-09-20 19:54:40+08:00
[CATEGORIES]
cs.CL
Speak While You Think: Streaming Speech Synthesis During Text Generation
[AUTHORS]
Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory
[ABSTRACT]
Large Language Models (LLMs) demonstrate impressive capabilities, yet
interaction with these models is mostly facilitated through text. Using
Text-To-Speech to synthesize LLM outputs typically results in notable latency,
which is impractical for fluent voice conversations. We propose LLM2Speech, an
architecture to synthesize speech while text is being generated by an LLM which
yields significant latency reduction. LLM2Speech mimics the predictions of a
non-streaming teacher model while limiting the exposure to future context in
order to enable streaming. It exploits the hidden embeddings of the LLM, a
by-product of the text generation that contains informative semantic context.
Experimental results show that LLM2Speech maintains the teacher’s quality while
reducing the latency to enable natural conversations.
[COMMENTS]
Under review for ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.11210v1
[DATE]
2023-09-20 19:00:15+08:00
[CATEGORIES]
cs.CL
MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation
[AUTHORS]
Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, Zihao Wang, Lingyun Sun
[ABSTRACT]
Pre-trained language models have achieved impressive results in various music
understanding and generation tasks. However, existing pre-training methods for
symbolic melody generation struggle to capture multi-scale, multi-dimensional
structural information in note sequences, due to the domain knowledge
discrepancy between text and music. Moreover, the lack of available large-scale
symbolic melody datasets limits the pre-training improvement. In this paper, we
propose MelodyGLM, a multi-task pre-training framework for generating melodies
with long-term structure. We design the melodic n-gram and long span sampling
strategies to create local and global blank infilling tasks for modeling the
local and global structures in melodies. Specifically, we incorporate pitch
n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram
blank infilling tasks for modeling the multi-dimensional structures in
melodies. To this end, we have constructed a large-scale symbolic melody
dataset, MelodyNet, containing more than 0.4 million melody pieces. MelodyNet
is utilized for large-scale pre-training and domain-specific n-gram lexicon
construction. Both subjective and objective evaluations demonstrate that
MelodyGLM surpasses the standard and previous pre-training methods. In
particular, subjective evaluations show that, on the melody continuation task,
MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in
consistency, rhythmicity, structure, and overall quality, respectively.
Notably, MelodyGLM nearly matches the quality of human-composed melodies on the
melody inpainting task.
[LINK]
http://arxiv.org/abs/2309.10738v2
[DATE]
2023-09-20 18:56:07+08:00
[CATEGORIES]
cs.CL
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
[AUTHORS]
Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag
[ABSTRACT]
The Languini Kitchen serves as both a research collective and codebase
designed to empower researchers with limited computational resources to
contribute meaningfully to the field of language modelling. We introduce an
experimental protocol that enables model comparisons based on equivalent
compute, measured in accelerator hours. The number of tokens on which a model
is trained is defined by the model’s throughput and the chosen compute class.
Notably, this approach avoids constraints on critical hyperparameters which
affect total parameters or floating-point operations. For evaluation, we
pre-process an existing large, diverse, and high-quality dataset of books that
surpasses existing academic benchmarks in quality, diversity, and document
length. On it, we compare methods based on their empirical scaling trends which
are estimated through experiments at various levels of compute. This work also
provides two baseline models: a feed-forward model derived from the GPT-2
architecture and a recurrent model in the form of a novel LSTM with ten-fold
throughput. While the GPT baseline achieves better perplexity throughout all
our levels of compute, our LSTM baseline exhibits a predictable and more
favourable scaling law. This is due to the improved throughput and the need for
fewer training tokens to achieve the same decrease in test perplexity.
Extrapolating the scaling laws of both models results in an intersection
at roughly 50,000 accelerator hours. We hope this work can serve as the
foundation for meaningful and reproducible language modelling research.
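
The equal-compute protocol can be illustrated with a small calculation. It assumes the token budget is simply throughput multiplied by wall-clock time in the chosen compute class; the function name and the throughput figures are hypothetical.

```python
def token_budget(tokens_per_second: float, accelerator_hours: float) -> int:
    """Training tokens a model may consume within a compute class (assumed formula)."""
    return int(tokens_per_second * accelerator_hours * 3600)


# Hypothetical throughputs for two models in the same 6-hour compute class.
slow_model = token_budget(tokens_per_second=10_000, accelerator_hours=6)
fast_model = token_budget(tokens_per_second=100_000, accelerator_hours=6)
```

Under this reading, a model with ten-fold throughput trains on ten times as many tokens in the same compute class, which is why throughput directly affects the scaling comparison.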
[LINK]
http://arxiv.org/abs/2309.11197v1
[DATE]
2023-09-20 18:31:17+08:00
[CATEGORIES]
cs.LG
cs.CL
Are Large Language Models Really Robust to Word-Level Perturbations?
[AUTHORS]
Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, Zhiqi Huang, Suwei Ma, Yongzhe Chang, Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao
[ABSTRACT]
The swift advancement in the scale and capabilities of Large Language Models
(LLMs) positions them as promising tools for a variety of downstream tasks. In
addition to the pursuit of better performance and the avoidance of violent
feedback on a certain prompt, to ensure the responsibility of the LLM, much
attention is drawn to the robustness of LLMs. However, existing evaluation
methods mostly rely on traditional question answering datasets with predefined
supervised labels, which do not align with the superior generation capabilities
of contemporary LLMs. To address this issue, we propose a novel rational
evaluation approach that leverages pre-trained reward models as diagnostic
tools to evaluate the robustness of LLMs, which we refer to as the Reward Model
for Reasonable Robustness Evaluation (TREvaL). Our extensive empirical
experiments have demonstrated that TREvaL provides an accurate method for
evaluating the robustness of an LLM, especially when faced with more
challenging open questions. Furthermore, our results demonstrate that LLMs
frequently exhibit vulnerability to word-level perturbations, which are
commonplace in daily language usage. Notably, we were surprised to discover
that robustness tends to decrease as fine-tuning (SFT and RLHF) is conducted.
The code of TREvaL is available at https://github.com/Harry-mic/TREval.
[LINK]
http://arxiv.org/abs/2309.11166v1
[DATE]
2023-09-20 17:23:46+08:00
[CATEGORIES]
cs.CL
Assessment of Pre-Trained Models Across Languages and Grammars
[AUTHORS]
Alberto Muñoz-Ortiz, David Vilares, Carlos Gómez-Rodríguez
[ABSTRACT]
We present an approach for assessing how multilingual large language models
(LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to
recover constituent and dependency structures by casting parsing as sequence
labeling. To do so, we select a few LLMs and study them on 13 diverse UD
treebanks for dependency parsing and 10 treebanks for constituent parsing. Our
results show that: (i) the framework is consistent across encodings, (ii)
pre-trained word vectors do not favor constituency representations of syntax
over dependencies, (iii) sub-word tokenization is needed to represent syntax,
in contrast to character-based models, and (iv) occurrence of a language in the
pretraining data is more important than the amount of task data when recovering
syntax from the word vectors.
[COMMENTS]
Accepted at IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.11165v1
[DATE]
2023-09-20 17:23:36+08:00
[CATEGORIES]
cs.CL
CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
[AUTHORS]
Bowen Zhang, Kehua Chang, Chunping Li
[ABSTRACT]
Unsupervised sentence representation learning aims to transform input
sentences into fixed-length vectors enriched with intricate semantic
information while obviating the reliance on labeled data. Recent progress
within this field, propelled by contrastive learning and prompt engineering,
has significantly bridged the gap between unsupervised and supervised
strategies. Nonetheless, the potential utilization of Chain-of-Thought remains
largely untapped within this trajectory. To unlock latent capabilities within
pre-trained models, such as BERT, we propose a two-stage approach for sentence
representation: comprehension and summarization. Subsequently, the output of
the latter phase is harnessed as the vectorized representation of the input
sentence. For further performance enhancement, we meticulously refine both the
contrastive learning loss function and the template denoising technique for
prompt engineering. Rigorous experimentation substantiates our method,
CoT-BERT, transcending a suite of robust baselines without necessitating other
text representation models or external databases.
[LINK]
http://arxiv.org/abs/2309.11143v1
[DATE]
2023-09-20 16:42:06+08:00
[CATEGORIES]
cs.CL
Prototype of a robotic system to assist the learning process of English language with text-generation through DNN
[AUTHORS]
Carlos Morales-Torres, Mario Campos-Soberanis, Diego Campos-Sobrino
[ABSTRACT]
In recent years, there has been significant growth in the
field of Natural Language Processing (NLP) for performing multiple tasks
including English Language Teaching (ELT). An effective strategy to favor the
learning process uses interactive devices to engage learners in their
self-learning process. In this work, we present a working prototype of a
humanoid robotic system to assist English language self-learners through text
generation using Long Short Term Memory (LSTM) Neural Networks. The learners
interact with the system using a Graphical User Interface that generates text
according to the English level of the user. The experimentation was conducted
using English learners and the results were measured according to the
International English Language Testing System (IELTS) rubric. Preliminary
results show an increment in the Grammatical Range of learners who interacted
with the system.
[COMMENTS]
Paper presented in the Mexican International Conference on Artificial
Intelligence 2021
[LINK]
http://arxiv.org/abs/2309.11142v1
[DATE]
2023-09-20 16:39:51+08:00
[CATEGORIES]
cs.CL
Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation
[AUTHORS]
Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, Seong-Lyun Kim
[ABSTRACT]
By integrating recent advances in large language models (LLMs) and generative
models into the emerging semantic communication (SC) paradigm, in this article
we put forward a novel framework of language-oriented semantic communication
(LSC). In LSC, machines communicate using human language messages that can be
interpreted and manipulated via natural language processing (NLP) techniques
for SC efficiency. To demonstrate LSC’s potential, we introduce three
innovative algorithms: 1) semantic source coding (SSC) which compresses a text
prompt into its key head words capturing the prompt’s syntactic essence while
maintaining their appearance order to keep the prompt’s context; 2) semantic
channel coding (SCC) that improves robustness against errors by substituting
head words with their lengthier synonyms; and 3) semantic knowledge
distillation (SKD) that produces listener-customized prompts via in-context
learning the listener’s language style. In a communication task for progressive
text-to-image generation, the proposed methods achieve higher perceptual
similarities with fewer transmissions while enhancing robustness in noisy
communication channels.
[COMMENTS]
5 pages, 4 figures, submitted to 2024 IEEE International Conference
on Acoustics, Speech and Signal Processing
[LINK]
http://arxiv.org/abs/2309.11127v1
[DATE]
2023-09-20 16:19:05+08:00
[CATEGORIES]
cs.CL
Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
[AUTHORS]
Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney
[ABSTRACT]
Language barriers present a great challenge in our increasingly connected and
global world. Especially within the medical domain, e.g. hospital or emergency
room, communication difficulties and delays may lead to malpractice and
non-optimal patient care. In the HYKIST project, we consider patient-physician
communication, more specifically between a German-speaking physician and an
Arabic- or Vietnamese-speaking patient. Currently, a doctor can call the
Triaphon service to get assistance from an interpreter in order to help
facilitate communication. The HYKIST goal is to support the usually
non-professional bilingual interpreter with an automatic speech translation
system to improve patient care and help overcome language barriers. In this
work, we present our ASR system development efforts for this conversational
telephone speech translation task in the medical domain for two language
pairs, data collection, various acoustic model architectures and
dialect-induced difficulties.
[COMMENTS]
ASR System Paper for HYKIST project
[LINK]
http://arxiv.org/abs/2210.13397v3
[DATE]
2023-09-20 16:08:47+08:00
[CATEGORIES]
cs.CL
Analyzing And Improving Neural Speaker Embeddings for ASR
[AUTHORS]
Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney
[ABSTRACT]
Neural speaker embeddings encode the speaker’s speech characteristics through
a DNN model and are prevalent for speaker verification tasks. However, few
studies have investigated the usage of neural speaker embeddings for an ASR
system. In this work, we present our efforts w.r.t. integrating neural speaker
embeddings into a Conformer-based hybrid HMM ASR system. For ASR, our improved
embedding extraction pipeline in combination with the Weighted-Simple-Add
integration method results in x-vector and c-vector reaching on par performance
with i-vectors. We further compare and analyze different speaker embeddings. We
present our acoustic model improvements obtained by switching from newbob
learning rate schedule to one cycle learning schedule resulting in a ~3%
relative WER reduction on Switchboard, additionally reducing the overall
training time by 17%. By further adding neural speaker embeddings, we gain
additional ~3% relative WER improvement on Hub5’00. Our best Conformer-based
hybrid ASR system with speaker embeddings achieves 9.0% WER on Hub5’00 and
Hub5’01 with training on SWB 300h.
[COMMENTS]
Accepted at ITG Speech Communications 2023
[LINK]
http://arxiv.org/abs/2301.04571v2
[DATE]
2023-09-20 15:43:13+08:00
[CATEGORIES]
cs.CL
Cross-lingual Data Augmentation for Document-grounded Dialog Systems in Low Resource Languages
[AUTHORS]
Qi Gou, Zehua Xia, Wenzhe Du
[ABSTRACT]
This paper proposes a framework to address the issue of data scarcity in
Document-Grounded Dialogue Systems (DGDS). Our model leverages high-resource
languages to enhance the capability of dialogue generation in low-resource
languages. Specifically, we present a novel pipeline, CLEM (Cross-Lingual
Enhanced Model), including adversarial training retrieval (Retriever and
Re-ranker) and a FiD (fusion-in-decoder) generator. To further leverage
high-resource language, we also propose an innovative architecture to conduct
alignment across different languages with translated training. Extensive
experiment results demonstrate the effectiveness of our model and we achieved
4th place in the DialDoc 2023 Competition. Therefore, CLEM can serve as a
solution to resource scarcity in DGDS and provide useful guidance for
multi-lingual alignment tasks.
[COMMENTS]
The Third DialDoc Workshop at ACL 2023,
https://aclanthology.org/2023.dialdoc-1.1
[LINK]
http://arxiv.org/abs/2305.14949v2
[DATE]
2023-09-20 15:39:19+08:00
[CATEGORIES]
cs.CL
Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
[AUTHORS]
Jingzhe Ding, Yan Cen, Xinyuan Wei
[ABSTRACT]
Our work demonstrates that a large language model (LLM) pre-trained on text
can solve not only pure math word problems but also physics word problems,
whose solution requires calculation and inference based on prior physical
knowledge. We collect and annotate the first physics word problem
dataset, PhysQA, which contains over 1000 junior high school physics word
problems (covering Kinematics, Mass&Density, Mechanics, Heat, Electricity).
We then use OpenAI’s GPT3.5 to generate answers to these problems and find
that GPT3.5 can automatically solve 49.3% of the problems through zero-shot
learning and 73.2% through few-shot learning. This result demonstrates that, by
using similar problems and their answers as prompts, an LLM can solve elementary
physics word problems at a level approaching human performance. In addition to
solving problems, GPT3.5 can also summarize the knowledge or topics covered by
the problems, provide relevant explanations, and generate new physics word
problems based on the input. Our work is the first research to focus on the
automatic solving, explanation, and generation of physics word problems across
various types and scenarios, and we achieve an acceptable and state-of-the-art
accuracy. This underscores the potential of LLMs for further applications in
secondary education.
[COMMENTS]
9 pages, 6 figures
[LINK]
http://arxiv.org/abs/2309.08182v2
[DATE]
2023-09-20 15:08:53+08:00
[CATEGORIES]
cs.CL
K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling
[AUTHORS]
Haven Kim, Jongmin Jung, Dasaem Jeong, Juhan Nam
[ABSTRACT]
Lyric translation, a field studied for over a century, is now attracting
computational linguistics researchers. We identified two limitations in
previous studies. First, lyric translation studies have predominantly focused
on Western genres and languages, with no previous study centering on K-pop
despite its popularity. Second, the field of lyric translation suffers from a
lack of publicly available datasets; to the best of our knowledge, no such
dataset exists. To broaden the scope of genres and languages in lyric
translation studies, we introduce a novel singable lyric translation dataset,
approximately 89% of which consists of K-pop song lyrics. This dataset aligns
Korean and English lyrics line-by-line and section-by-section. We leveraged
this dataset to unveil unique characteristics of K-pop lyric translation,
distinguishing it from other extensively studied genres, and to construct a
neural lyric translation model, thereby underscoring the importance of a
dedicated dataset for singable lyric translations.
[LINK]
http://arxiv.org/abs/2309.11093v1
[DATE]
2023-09-20 14:54:55+08:00
[CATEGORIES]
cs.CL
cs.LG
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
[AUTHORS]
Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi
[ABSTRACT]
In recent years, the explosion of web videos has made text-video retrieval
increasingly essential and popular for video filtering, recommendation, and
search. Text-video retrieval aims to rank relevant text/video higher than
irrelevant ones. The core of this task is to precisely measure the cross-modal
similarity between texts and videos. Recently, contrastive learning methods
have shown promising results for text-video retrieval, most of which focus on
the construction of positive and negative pairs to learn text and video
representations. Nevertheless, they do not pay enough attention to hard
negative pairs and lack the ability to model different levels of semantic
similarity. To address these two issues, this paper improves contrastive
learning using two novel techniques. First, to exploit hard examples for robust
discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module
(DMAE) to mine hard negative pairs from textual and visual clues. By further
introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively
identify all these hard negatives and explicitly highlight their impacts in the
training loss. Second, our work argues that triplet samples can better model
fine-grained semantic similarity compared to pairwise samples. We thereby
present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to
construct partial order triplet samples by automatically generating
fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL
designs an adaptive token masking strategy with cross-modal interaction to
model subtle semantic differences. Extensive experiments demonstrate that the
proposed approach outperforms existing methods on four widely-used text-video
retrieval datasets, including MSR-VTT, MSVD, DiDeMo and ActivityNet.
[COMMENTS]
Accepted by ACM MM 2023
[LINK]
http://arxiv.org/abs/2309.11082v1
[DATE]
2023-09-20 14:08:11+08:00
[CATEGORIES]
cs.CL
UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt
[AUTHORS]
Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li
[ABSTRACT]
Recent research has shown that multi-task pre-training greatly improves the
model’s robustness and transfer ability, which is crucial for building a
high-quality dialog system. However, most previous works on multi-task
pre-training rely heavily on human-defined input format or prompt, which is not
optimal in quality and quantity. In this work, we propose to use Task-based
Automatic Prompt generation (TAP) to automatically generate high-quality
prompts. Using the high-quality prompts generated, we scale the corpus of the
pre-trained conversation model to 122 datasets from 15 dialog-related tasks,
resulting in Universal Pre-trained Conversation Model (UniPCM), a powerful
foundation model for various conversational tasks and different dialog systems.
Extensive experiments have shown that UniPCM is robust to input prompts and
capable of various dialog-related tasks. Moreover, UniPCM has strong transfer
ability and excels at low resource scenarios, achieving SOTA results on 9
different datasets ranging from task-oriented dialog to open-domain
conversation. Furthermore, we are amazed to find that TAP can generate prompts
on par with those collected with crowdsourcing. The code is released with the
paper.
[LINK]
http://arxiv.org/abs/2309.11065v1
[DATE]
2023-09-20 13:05:40+08:00
[CATEGORIES]
cs.CL
Design of Chain-of-Thought in Math Problem Solving
[AUTHORS]
Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li
[ABSTRACT]
Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem
solving. We conduct a comprehensive examination of methods for designing CoT,
comparing conventional natural language CoT with various program CoTs,
including the self-describing program, the comment-describing program, and the
non-describing program. Furthermore, we investigate the impact of programming
language on program CoTs, comparing Python and Wolfram Language. Through
extensive experiments on GSM8K, MATHQA, and SVAMP, we find that program CoTs
often have superior effectiveness in math problem solving. Notably, the best
performing combination with 30B parameters beats GPT-3.5-turbo by a significant
margin. The results show that self-describing program offers greater diversity
and thus can generally achieve higher performance. We also find that Python is
a better choice of language than Wolfram for program CoTs. The experimental
results provide a valuable guideline for future CoT designs that take into
account both programming language and coding style for further advancements.
Our datasets and code are publicly available.
[COMMENTS]
15 pages
[LINK]
http://arxiv.org/abs/2309.11054v1
[DATE]
2023-09-20 12:17:28+08:00
[CATEGORIES]
cs.CL
cs.LG
Heterogeneous Entity Matching with Complex Attribute Associations using BERT and Neural Networks
[AUTHORS]
Shitao Wang, Jiamin Lu
[ABSTRACT]
Across various domains, data from different sources such as Baidu Baike and
Wikipedia often manifest in distinct forms. Current entity matching
methodologies predominantly focus on homogeneous data, characterized by
attributes that share the same structure and concise attribute values. However,
this orientation poses challenges in handling data with diverse formats.
Moreover, prevailing approaches aggregate the similarity of attribute values
between corresponding attributes to ascertain entity similarity. Yet, they
often overlook the intricate interrelationships between attributes, where one
attribute may have multiple associations. The simplistic approach of pairwise
attribute comparison fails to harness the wealth of information encapsulated
within entities. To address these challenges, we introduce a novel entity
matching model, dubbed Entity Matching Model for Capturing Complex Attribute
Relationships (EMM-CCAR), built upon pre-trained models. Specifically, this model
transforms the matching task into a sequence matching problem to mitigate the
impact of varying data formats. Moreover, by introducing attention mechanisms,
it identifies complex relationships between attributes, emphasizing the degree
of matching among multiple attributes rather than one-to-one correspondences.
Through the integration of the EMM-CCAR model, we adeptly surmount the
challenges posed by data heterogeneity and intricate attribute
interdependencies. In comparison with the prevalent DER-SSM and Ditto
approaches, our model achieves improvements of approximately 4% and 1% in F1
scores, respectively. This furnishes a robust solution for addressing the
intricacies of attribute complexity in entity matching.
[LINK]
http://arxiv.org/abs/2309.11046v1
[DATE]
2023-09-20 11:49:57+08:00
[CATEGORIES]
cs.CL
Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters
[AUTHORS]
Yukang Xie, Chengyu Wang, Junbing Yan, Jiyong Zhou, Feiqi Deng, Jun Huang
[ABSTRACT]
Recently, Large Language Models (LLMs) have achieved amazing zero-shot
learning performance over a variety of Natural Language Processing (NLP) tasks,
especially for text generative tasks. Yet, the large size of LLMs often leads
to the high computational cost of model training and online deployment. In our
work, we present ALTER, a system that effectively builds the multi-tAsk
Learners with mixTure-of-task-adaptERs upon small language models (with <1B
parameters) to address multiple NLP tasks simultaneously, capturing the
commonalities and differences between tasks, in order to support
domain-specific applications. Specifically, in ALTER, we propose the
Mixture-of-Task-Adapters (MTA) module as an extension to the transformer
architecture for the underlying model to capture the intra-task and inter-task
knowledge. A two-stage training method is further proposed to optimize the
collaboration between adapters at a small computational cost. Experimental
results over a mixture of NLP tasks show that our proposed MTA architecture and
the two-stage training method achieve good performance. Based on ALTER, we have
also produced MTA-equipped language models for various domains.
[LINK]
http://arxiv.org/abs/2309.11042v1
[DATE]
2023-09-20 11:39:56+08:00
[CATEGORIES]
cs.CL
Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language
[AUTHORS]
Shuyan Zhou, Pengcheng Yin, Graham Neubig
[LINK]
http://arxiv.org/abs/2109.08214v2
[DATE]
2023-09-20 11:37:24+08:00
[CATEGORIES]
cs.CL
Toward Unified Controllable Text Generation via Regular Expression Instruction
[AUTHORS]
Xin Zheng, Hongyu Lin, Xianpei Han, Le Sun
[ABSTRACT]
Controllable text generation is a fundamental aspect of natural language
generation, with numerous methods proposed for different constraint types.
However, these approaches often require significant architectural or decoding
modifications, making them challenging to apply to additional constraints or
resolve different constraint combinations. To address this, our paper
introduces Regular Expression Instruction (REI), which utilizes an
instruction-based mechanism to fully exploit regular expressions’ advantages to
uniformly model diverse constraints. Specifically, our REI supports all popular
fine-grained controllable generation constraints, i.e., lexical, positional,
and length, as well as their complex combinations, via regular expression-style
instructions. Our method only requires fine-tuning on medium-scale language
models or few-shot, in-context learning on large language models, and requires
no further adjustment when applied to various constraint combinations.
Experiments demonstrate that our straightforward approach yields high success
rates and adaptability to various constraints while maintaining competitiveness
in automatic metrics and outperforming most previous baselines.
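
As an illustration only (not the authors' REI instruction format, which the abstract does not specify), lexical, positional, and length constraints can each be written as a regular expression and checked against a generated sentence. The patterns, the `satisfies` helper, and the example sentence below are all hypothetical.

```python
import re


def satisfies(text: str, patterns: list[str]) -> bool:
    """Return True if the whole text matches every regex in patterns."""
    return all(re.fullmatch(p, text, flags=re.DOTALL) for p in patterns)


# Hypothetical constraints, one regex per constraint type.
LEXICAL = r".*\briver\b.*\bbridge\b.*"  # contains "river", later "bridge"
POSITIONAL = r"The\b.*"                 # starts with the word "The"
LENGTH = r"\S+(?: \S+){5}"              # exactly six space-separated words

candidate = "The river runs under the bridge"
print(satisfies(candidate, [LEXICAL, POSITIONAL, LENGTH]))
```

Combining constraints is then just a matter of passing several patterns at once, which loosely mirrors the abstract's point that one uniform notation can cover constraint combinations.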
[COMMENTS]
Accepted at IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.10447v2
[DATE]
2023-09-20 10:18:06+08:00
[CATEGORIES]
cs.CL
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
[AUTHORS]
Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
[ABSTRACT]
Democratizing access to natural language processing (NLP) technology is
crucial, especially for underrepresented and extremely low-resource languages.
Previous research has focused on developing labeled and unlabeled corpora for
these languages through online scraping and document translation. While these
methods have proven effective and cost-efficient, we have identified
limitations in the resulting corpora, including a lack of lexical diversity and
cultural relevance to local communities. To address this gap, we conduct a case
study on Indonesian local languages. We compare the effectiveness of online
scraping, human translation, and paragraph writing by native speakers in
constructing datasets. Our findings demonstrate that datasets generated through
paragraph writing by native speakers exhibit superior quality in terms of
lexical diversity and cultural content. In addition, we present the
NusaWrites benchmark, encompassing 12 underrepresented and extremely
low-resource languages spoken by millions of individuals in Indonesia. Our
empirical results using existing multilingual large language models
indicate the need to extend these models to more underrepresented languages. We
release the NusaWrites dataset at https://github.com/IndoNLP/nusa-writes.
[LINK]
http://arxiv.org/abs/2309.10661v2
[DATE]
2023-09-20 10:15:50+08:00
[CATEGORIES]
cs.CL
QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation
[AUTHORS]
Kunlun Zhu, Shihao Liang, Xu Han, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
[ABSTRACT]
Recent years have witnessed the success of question answering (QA),
especially its potential to be a foundation paradigm for tackling diverse NLP
tasks. However, obtaining sufficient data to build an effective and stable QA
system still remains an open problem. For this problem, we introduce an
iterative bootstrapping framework for QA data augmentation (named QASnowball),
which can iteratively generate large-scale high-quality QA data based on a seed
set of supervised examples. Specifically, QASnowball consists of three modules:
an answer extractor to extract core phrases in unlabeled documents as candidate
answers, a question generator to generate questions based on documents and
candidate answers, and a QA data filter to select high-quality QA data.
Moreover, QASnowball can be self-enhanced by reseeding the seed set to
fine-tune itself in different iterations, leading to continual improvements in
the generation quality. We conduct experiments in the high-resource English
scenario and the medium-resource Chinese scenario, and the experimental results
show that the data generated by QASnowball can facilitate QA models: (1)
training models on the generated data achieves comparable results to using
supervised data, and (2) pre-training on the generated data and fine-tuning on
supervised data can achieve better performance. Our code and generated data
will be released to advance further work.
[LINK]
http://arxiv.org/abs/2309.10326v2
[DATE]
2023-09-20 09:57:10+08:00
[CATEGORIES]
cs.CL
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model
[AUTHORS]
Xinyu Zhou, Delong Chen, Yudong Chen
[ABSTRACT]
This paper explores the potential of constructing an AI spoken dialogue
system that “thinks how to respond” and “thinks how to speak” simultaneously,
which more closely aligns with the human speech production process compared to
the current cascade pipeline of independent chatbot and Text-to-Speech (TTS)
modules. We hypothesize that Large Language Models (LLMs) with billions of
parameters possess significant speech understanding capabilities and can
jointly model dialogue responses and linguistic features. We conduct two sets
of experiments: 1) Prosodic structure prediction, a typical front-end task in
TTS, demonstrating the speech understanding ability of LLMs, and 2) Further
integrating dialogue response and a wide array of linguistic features using a
unified encoding format. Our results indicate that the LLM-based approach is a
promising direction for building unified spoken dialogue systems.
[LINK]
http://arxiv.org/abs/2309.11000v1
[DATE]
2023-09-20 09:48:27+08:00
[CATEGORIES]
cs.CL
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
[AUTHORS]
Mara Finkelstein, Markus Freitag
[ABSTRACT]
Recent research in decoding methods for Natural Language Generation (NLG)
tasks has shown that the traditional beam search and greedy decoding algorithms
are not optimal, because model probabilities do not always align with human
preferences. Stronger decoding methods, including Quality Estimation (QE)
reranking and Minimum Bayes’ Risk (MBR) decoding, have since been proposed to
mitigate the model-perplexity-vs-quality mismatch. While these decoding methods
achieve state-of-the-art performance, they are prohibitively expensive to
compute. In this work, we propose MBR finetuning and QE finetuning which
distill the quality gains from these decoding methods at training time, while
using an efficient decoding algorithm at inference time. Using the canonical
NLG task of Neural Machine Translation (NMT), we show that even with
self-training, these finetuning methods significantly outperform the base
model. Moreover, when using an external LLM as a teacher model, these
finetuning methods outperform finetuning on human-generated references. These
findings suggest new ways to leverage monolingual data to achieve improvements
in model quality that are on par with, or even exceed, improvements from
human-curated data, while maintaining maximum efficiency during decoding.
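
For readers unfamiliar with MBR decoding, a minimal sketch of the selection step follows: among sampled candidates, pick the one with the highest average utility against the others. The token-overlap F1 utility and the function names are assumptions made for illustration; they are a toy stand-in and not the utility metric used in the paper.

```python
from collections import Counter


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 over whitespace tokens (a stand-in for a learned metric)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: list[str], utility=token_f1) -> str:
    """Return the candidate with the highest mean utility against the others."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / max(len(others), 1)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


samples = ["the cat sat on the mat", "the cat sat on a mat", "a dog ran away"]
print(mbr_select(samples))
```

The pairwise comparison costs a number of utility calls quadratic in the number of candidates, which is one reason this style of decoding is expensive at inference time.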
[LINK]
http://arxiv.org/abs/2309.10966v1
[DATE]
2023-09-20 07:39:07+08:00
[CATEGORIES]
cs.CL
LMDX: Language Model-based Document Information Extraction and Localization
[AUTHORS]
Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua
[ABSTRACT]
Large Language Models (LLMs) have revolutionized Natural Language Processing
(NLP), improving state-of-the-art on many existing tasks and exhibiting
emergent capabilities. However, LLMs have not yet been successfully applied to
semi-structured document information extraction, which is at the core of many
document processing workflows and consists of extracting key entities from a
visually rich document (VRD) given a predefined target schema. The main
obstacles to LLM adoption in that task have been the absence of layout encoding
within LLMs, critical for a high quality extraction, and the lack of a
grounding mechanism ensuring the answer is not hallucinated. In this paper, we
introduce Language Model-based Document Information Extraction and Localization
(LMDX), a methodology to adapt arbitrary LLMs for document information
extraction. LMDX can do extraction of singular, repeated, and hierarchical
entities, both with and without training data, while providing grounding
guarantees and localizing the entities within the document. In particular, we
apply LMDX to the PaLM 2-S LLM and evaluate it on VRDU and CORD benchmarks,
setting a new state-of-the-art and showing how LMDX enables the creation of
high quality, data-efficient parsers.
[LINK]
http://arxiv.org/abs/2309.10952v1
[DATE]
2023-09-20 06:32:56+08:00
[CATEGORIES]
cs.CL
cs.LG
Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change
[AUTHORS]
Paulo Pirozelli, Marcos M. José, Igor Silveira, Flávio Nakasato, Sarajane M. Peres, Anarosa A. F. Brandão, Anna H. R. Costa, Fabio G. Cozman
[ABSTRACT]
Pirá is a reading comprehension dataset focused on the ocean, the Brazilian
coast, and climate change, built from a collection of scientific abstracts and
reports on these topics. This dataset represents a versatile language resource,
particularly useful for testing the ability of current machine learning models
to acquire expert scientific knowledge. Despite its potential, a detailed set
of baselines has not yet been developed for Pirá. By creating these
baselines, researchers can more easily utilize Pirá as a resource for testing
machine learning models across a wide range of question answering tasks. In
this paper, we define six benchmarks over the Pirá dataset, covering closed
generative question answering, machine reading comprehension, information
retrieval, open question answering, answer triggering, and multiple choice
question answering. As part of this effort, we have also produced a curated
version of the original dataset, where we fixed a number of grammar issues,
repetitions, and other shortcomings. Furthermore, the dataset has been extended
in several new directions, so as to face the aforementioned benchmarks:
translation of supporting texts from English into Portuguese, classification
labels for answerability, automatic paraphrases of questions and answers, and
multiple choice candidates. The results described in this paper provide several
points of reference for researchers interested in exploring the challenges
provided by the Pirá dataset.
[COMMENTS]
Accepted at Data Intelligence. Online ISSN 2641-435X
[LINK]
http://arxiv.org/abs/2309.10945v1
[DATE]
2023-09-20 05:56:45+08:00
[CATEGORIES]
cs.CL
Concept-Oriented Deep Learning with Large Language Models
[AUTHORS]
Daniel T. Chang
[ABSTRACT]
Large Language Models (LLMs) have been successfully used in many
natural-language tasks and applications including text generation and AI
chatbots. They also are a promising new technology for concept-oriented deep
learning (CODL). However, the prerequisite is that LLMs understand concepts and
ensure conceptual consistency. We discuss these in this paper, as well as major
uses of LLMs for CODL including concept extraction from text, concept graph
extraction from text, and concept learning. Human knowledge consists of both
symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only
LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal
LLMs, on the other hand, are capable of representing the full range (conceptual
and sensory) of human knowledge. We discuss conceptual understanding in
visual-language LLMs, the most important multimodal LLMs, and major uses of
them for CODL including concept extraction from image, concept graph extraction
from image, and concept learning. While uses of LLMs for CODL are valuable
standalone, they are particularly valuable as part of LLM applications such as
AI chatbots.
[LINK]
http://arxiv.org/abs/2306.17089v2
[DATE]
2023-09-20 05:15:52+08:00
[CATEGORIES]
cs.LG
cs.CL
A Family of Pretrained Transformer Language Models for Russian
[AUTHORS]
Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Tatiana Shavrina, Sergey Markov, Vladislav Mikhailov, Alena Fenogenova
[ABSTRACT]
Nowadays, Transformer language models (LMs) represent a fundamental component
of the NLP research methodologies and applications. However, the development of
such models specifically for the Russian language has received little
attention. This paper presents a collection of 13 Russian Transformer LMs based
on the encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and
encoder-decoder (ruT5, FRED-T5) models in multiple sizes. Access to these
models is readily available via the HuggingFace platform. We provide a report
of the model architecture design and pretraining, and the results of evaluating
their generalization abilities on Russian natural language understanding and
generation datasets and benchmarks. By pretraining and releasing these
specialized Transformer LMs, we hope to broaden the scope of the NLP research
directions and enable the development of industrial solutions for the Russian
language.
[LINK]
http://arxiv.org/abs/2309.10931v1
[DATE]
2023-09-20 05:07:52+08:00
[CATEGORIES]
cs.CL
Specializing Small Language Models towards Complex Style Transfer via Latent Attribute Pre-Training
[AUTHORS]
Ruiqi Xu, Yongfeng Huang, Xin Chen, Lin Zhang
[ABSTRACT]
In this work, we introduce the concept of complex text style transfer tasks,
and construct complex text datasets based on two widely applicable scenarios.
Our dataset is the first large-scale data set of its kind, with 700 rephrased
sentences and 1,000 sentences from the game Genshin Impact. While large
language models (LLM) have shown promise in complex text style transfer, they
have drawbacks such as data privacy concerns, network instability, and high
deployment costs. To address these issues, we explore the effectiveness of
small models (less than T5-3B) with implicit style pre-training through
contrastive learning. We also propose a method for automated evaluation of text
generation quality based on alignment with human evaluations using ChatGPT.
Finally, we compare our approach with existing methods and show that our model
achieves state-of-the-art performance among few-shot text style transfer models.
[LINK]
http://arxiv.org/abs/2309.10929v1
[DATE]
2023-09-20 05:01:40+08:00
[CATEGORIES]
cs.CL
Semi-Autoregressive Streaming ASR With Label Context
[AUTHORS]
Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury
[ABSTRACT]
Non-autoregressive (NAR) modeling has gained significant interest in speech
processing since these models achieve dramatically lower inference time than
autoregressive (AR) models while also achieving good transcription accuracy.
Since NAR automatic speech recognition (ASR) models must wait for the
completion of the entire utterance before processing, some works explore
streaming NAR models based on blockwise attention for low-latency applications.
However, streaming NAR models significantly lag in accuracy compared to
streaming AR and non-streaming NAR models. To address this, we propose a
streaming “semi-autoregressive” ASR model that incorporates the labels emitted
in previous blocks as additional context using a Language Model (LM)
subnetwork. We also introduce a novel greedy decoding algorithm that addresses
insertion and deletion errors near block boundaries while not significantly
increasing the inference time. Experiments show that our method outperforms the
existing streaming NAR model by 19% relative on Tedlium2, 16%/8% on
Librispeech-100 clean/other test sets, and 19%/8% on the Switchboard (SWB) /
Callhome (CH) test sets. It also reduces the accuracy gap with streaming AR and
non-streaming NAR models while achieving 2.5x lower latency. We also
demonstrate that our approach can effectively utilize external text data to
pre-train the LM subnetwork to further improve streaming ASR accuracy.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.10926v1
[DATE]
2023-09-20 04:55:58+08:00
[CATEGORIES]
cs.CL
Semi-automatic staging area for high-quality structured data extraction from scientific literature
[AUTHORS]
Luca Foppiano, Tomoya Mato, Kensei Terashima, Pedro Ortiz Suarez, Taku Tou, Chikako Sakai, Wei-Sheng Wang, Toshiyuki Amagasa, Yoshihiko Takano, Masashi Ishii
[COMMENTS]
5 tables, 9 figures, 31 pages
[LINK]
http://arxiv.org/abs/2309.10923v1
[DATE]
2023-09-20 04:53:13+08:00
[CATEGORIES]
cs.CL
cs.LG
End-to-End Speech Recognition Contextualization with Large Language Models
[AUTHORS]
Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen
[ABSTRACT]
In recent years, Large Language Models (LLMs) have garnered significant
attention from the research community due to their exceptional performance and
generalization capabilities. In this paper, we introduce a novel method for
contextualizing speech recognition models incorporating LLMs. Our approach
casts speech recognition as a mixed-modal language modeling task based on a
pretrained LLM. We provide audio features, along with optional text tokens for
context, to train the system to complete transcriptions in a decoder-only
fashion. As a result, the system is implicitly incentivized to learn how to
leverage unstructured contextual information during training. Our empirical
results demonstrate a significant improvement in performance, with a 6% WER
reduction when additional textual context is provided. Moreover, we find that
our method performs competitively and improves by 7.5% WER overall and 17% WER
on rare words against a baseline contextualized RNN-T system that has been
trained on a speech dataset more than twenty-five times larger. Overall, we
demonstrate that by only adding a handful of trainable parameters via
adapters, we can unlock contextualized speech recognition capability for the
pretrained LLM while keeping the same text-only input functionality.
[LINK]
http://arxiv.org/abs/2309.10917v1
[DATE]
2023-09-20 04:28:57+08:00
[CATEGORIES]
cs.CL
cs.LG
RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans
[AUTHORS]
Bohdan Didenko, Andrii Sameliuk
[ABSTRACT]
The text editing tasks, including sentence fusion, sentence splitting and
rephrasing, text simplification, and Grammatical Error Correction (GEC), share
a common trait of dealing with highly similar input and output sequences. This
area of research lies at the intersection of two well-established fields: (i)
fully autoregressive sequence-to-sequence approaches commonly used in tasks
like Neural Machine Translation (NMT) and (ii) sequence tagging techniques
commonly used to address tasks such as Part-of-speech tagging, Named-entity
recognition (NER), and similar. In the pursuit of a balanced architecture,
researchers have come up with numerous imaginative and unconventional
solutions, which we discuss in the Related Works section. Our approach to
addressing text editing tasks is called RedPenNet and is aimed at reducing
architectural and parametric redundancies present in specific
Sequence-To-Edits models, preserving their semi-autoregressive advantages. Our
models achieve $F_{0.5}$ scores of 77.60 on the BEA-2019 (test) benchmark, which
can be considered state-of-the-art with the sole exception of system
combinations, and 67.71 on the UAGEC+Fluency (test) benchmark.
This research is being conducted in the context of the UNLP 2023 workshop,
where it was presented as a paper for the Shared Task in Grammatical
Error Correction (GEC) for Ukrainian. This study aims to apply the RedPenNet
approach to address the GEC problem in the Ukrainian language.
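For readers unfamiliar with the metric, $F_{0.5}$ is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall. A minimal sketch of the general formula follows; the function name `f_beta` is illustrative, and GEC scorers compute precision and recall over edits in ways not shown here.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when both are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 0.5, a system with high precision and moderate recall scores higher than one with the two values swapped.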
[LINK]
http://arxiv.org/abs/2309.10898v1
[DATE]
2023-09-20 03:48:30+08:00
[CATEGORIES]
cs.CL
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
[AUTHORS]
Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass
[ABSTRACT]
How can we perform computations over natural language representations to
solve tasks that require symbolic and numeric reasoning? We propose natural
language embedded programs (NLEP) as a unifying framework for addressing
math/symbolic reasoning, natural language understanding, and instruction
following tasks. Our approach prompts a language model to generate full Python
programs that define functions over data structures which contain natural
language representations of structured knowledge. A Python interpreter then
executes the generated code and prints the output. Despite using a task-general
prompt, we find that this approach can improve upon strong baselines across a
range of different tasks including math and symbolic reasoning, text
classification, question answering, and instruction following. We further find
the generated programs are often interpretable and enable post-hoc verification
of the intermediate reasoning steps.
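As a purely hypothetical illustration of the kind of program described above (a data structure holding natural-language knowledge, a function over it, and a printed answer), consider the sketch below. The question, the data, and the names `scientists` and `born_before` are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical question: "Which of these scientists were born before 1900?"

# Structured knowledge with natural-language fields.
scientists = {
    "Marie Curie": {"born": 1867, "field": "physics and chemistry"},
    "Alan Turing": {"born": 1912, "field": "computer science"},
    "Albert Einstein": {"born": 1879, "field": "physics"},
}


def born_before(people, year):
    """Return the names of people born strictly before `year`, sorted."""
    return sorted(name for name, facts in people.items() if facts["born"] < year)


# The interpreter runs the program and the printed line is the final answer.
answer = born_before(scientists, 1900)
print("Born before 1900: " + ", ".join(answer))
```

Because each intermediate step is ordinary code, a reader can inspect the data structure and the filter to check how the answer was reached.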
[LINK]
http://arxiv.org/abs/2309.10814v1
[DATE]
2023-09-20 01:54:21+08:00
[CATEGORIES]
cs.CL
Modeling interdisciplinary interactions among Physics, Mathematics & Computer Science
[AUTHORS]
Rima Hazra, Mayank Singh, Pawan Goyal, Bibhas Adhikari, Animesh Mukherjee
[ABSTRACT]
Interdisciplinarity has over recent years gained tremendous
importance and has become one of the key ways of doing cutting-edge research.
In this paper we attempt to model the citation flow across three different
fields – Physics (PHY), Mathematics (MA) and Computer Science (CS). For
instance, is there a specific pattern in which these fields cite one another?
We carry out experiments on a dataset comprising more than 1.2 million articles
taken from these three fields. We quantify the citation interactions among
these three fields through temporal bucket signatures. We present numerical
models based on variants of the recently proposed relay-linking framework to
explain the citation dynamics across the three disciplines. These models make a
modest attempt to unfold the underlying principles of how citation links could
have been formed across the three fields over time.
[COMMENTS]
Accepted at Journal of Physics: Complexity
[LINK]
http://arxiv.org/abs/2309.10811v1
[DATE]
2023-09-20 01:52:50+08:00
[CATEGORIES]
cs.CL
Sparse Autoencoders Find Highly Interpretable Features in Language Models
[AUTHORS]
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
[ABSTRACT]
One of the roadblocks to a better understanding of neural networks’ internals
is polysemanticity, where neurons appear to activate in multiple,
semantically distinct contexts. Polysemanticity prevents us from identifying
concise, human-understandable explanations for what neural networks are doing
internally. One hypothesised cause of polysemanticity is
superposition, where neural networks represent more features than they
have neurons by assigning features to an overcomplete set of directions in
activation space, rather than to individual neurons. Here, we attempt to
identify those directions, using sparse autoencoders to reconstruct the
internal activations of a language model. These autoencoders learn sets of
sparsely activating features that are more interpretable and monosemantic than
directions identified by alternative approaches, where interpretability is
measured by automated methods. Ablating these features enables precise model
editing, for example, by removing capabilities such as pronoun prediction,
while disrupting model behaviour less than prior techniques. This work
indicates that it is possible to resolve superposition in language models using
a scalable, unsupervised method. Our method may serve as a foundation for
future mechanistic interpretability work, which we hope will enable greater
model transparency and steerability.
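The idea of reconstructing activations through an overcomplete, sparsely activating feature layer can be sketched as follows. This is a generic sparse autoencoder objective (squared reconstruction error plus an L1 penalty on feature activations); the tied encoder/decoder weights, the ReLU nonlinearity, and the names `sae_loss` and `l1_coeff` are assumptions for illustration and may differ from the paper's setup.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def sae_loss(x, W, b, l1_coeff):
    """Loss of a tied-weight sparse autoencoder on one activation vector x.

    W is a list of n_features rows, each of length len(x); choosing
    n_features > len(x) makes the feature dictionary overcomplete.
    """
    # Encode: feature activations c = ReLU(W x + b).
    pre = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    c = relu(pre)
    # Decode with the transposed (tied) weights: x_hat = W^T c.
    x_hat = [sum(W[k][i] * c[k] for k in range(len(W))) for i in range(len(x))]
    reconstruction = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    sparsity = sum(abs(ck) for ck in c)
    return reconstruction + l1_coeff * sparsity, c, x_hat
```

Minimizing this loss over many activation vectors trades reconstruction accuracy against the number of active features, which is what encourages each learned direction to fire in few contexts.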
[COMMENTS]
18 pages, 17 figures, 2 tables
[LINK]
http://arxiv.org/abs/2309.08600v2
[DATE]
2023-09-20 01:20:52+08:00
[CATEGORIES]
cs.LG
cs.CL
FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection
[AUTHORS]
Jamil Zaghir, Mina Bjelogrlic, Jean-Philippe Goldman, Soukaïna Aananou, Christophe Gaudet-Blavignac, Christian Lovis
[ABSTRACT]
Natural language processing (NLP) applications such as named entity
recognition (NER) for low-resource corpora do not benefit from recent advances
in the development of large language models (LLMs) where there is still a need
for larger annotated datasets. This research article introduces a methodology
for generating translated versions of annotated datasets through crosslingual
annotation projection. Leveraging a language-agnostic BERT-based approach, it
is an efficient solution to enlarge low-resource corpora with little human
effort and by only using already available open data resources. Quantitative
and qualitative evaluations are often lacking when it comes to evaluating the
quality and effectiveness of semi-automatic data generation strategies. The
evaluation of our crosslingual annotation projection approach showed both
effectiveness and high accuracy in the resulting dataset. As a practical
application of this methodology, we present the creation of French Annotated
Resource with Semantic Information for Medical Entities Detection (FRASIMED),
an annotated corpus comprising 2’051 synthetic clinical cases in French. The
corpus is now available for researchers and practitioners to develop and refine
French natural language processing (NLP) applications in the clinical field
(https://zenodo.org/record/8355629), making it the largest open annotated
corpus with linked medical concepts in French.
[LINK]
http://arxiv.org/abs/2309.10770v1
[DATE]
2023-09-20 01:17:28+08:00
[CATEGORIES]
cs.CL
Controllable Speaking Styles Using a Large Language Model
[AUTHORS]
Atli Thor Sigurgeirsson, Simon King
[ABSTRACT]
Reference-based Text-to-Speech (TTS) models can generate multiple,
prosodically-different renditions of the same target text. Such models jointly
learn a latent acoustic space during training, which can be sampled from during
inference. Controlling these models during inference typically requires finding
an appropriate reference utterance, which is non-trivial.
Large generative language models (LLMs) have shown excellent performance in
various language-related tasks. Given only a natural language query text (the
prompt), such models can be used to solve specific, context-dependent tasks.
Recent work in TTS has attempted similar prompt-based control of novel speaking
style generation. Those methods do not require a reference utterance and can,
under ideal conditions, be controlled with only a prompt. But existing methods
typically require a prompt-labelled speech corpus for jointly training a
prompt-conditioned encoder.
In contrast, we instead employ an LLM to directly suggest prosodic
modifications for a controllable TTS model, using contextual information
provided in the prompt. The prompt can be designed for a multitude of tasks.
Here, we give two demonstrations: control of speaking style; prosody
appropriate for a given dialogue context. The proposed method is rated most
appropriate in 50% of cases vs. 31% for a baseline model.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2305.10321v2
[DATE]
2023-09-20 00:35:57+08:00
[CATEGORIES]
cs.CL
Context is Environment
[AUTHORS]
Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja
[ABSTRACT]
Two lines of work are taking center stage in AI research. On the one
hand, the community is making increasing efforts to build models that discard
spurious correlations and generalize better in novel test environments.
Unfortunately, the bitter lesson so far is that no proposal convincingly
outperforms a simple empirical risk minimization baseline. On the other hand,
large language models (LLMs) have erupted as algorithms able to learn
in-context, generalizing on-the-fly to eclectic contextual circumstances that
users enforce by means of prompting. In this paper, we argue that context is
environment, and posit that in-context learning holds the key to better domain
generalization. Via extensive theory and experiments, we show that paying
attention to context (unlabeled examples as they arrive) allows our proposed
In-Context Risk Minimization (ICRM) algorithm to zoom in on the test
environment risk minimizer, leading to significant out-of-distribution
performance improvements.
From all of this, two messages are worth taking home. Researchers in domain
generalization should consider environment as context, and harness the adaptive
power of in-context learning. Researchers in LLMs should consider context as
environment, to better structure data towards generalization.
[COMMENTS]
41 Pages, 4 Figures
[LINK]
http://arxiv.org/abs/2309.09888v2
[DATE]
2023-09-20 23:58:13+08:00
[CATEGORIES]
cs.LG
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
[AUTHORS]
Song Mei, Yuchen Wu
[ABSTRACT]
We investigate the approximation efficiency of score functions by deep neural
networks in diffusion-based generative modeling. While existing approximation
theories utilize the smoothness of score functions, they suffer from the curse
of dimensionality for intrinsically high-dimensional data. This limitation is
pronounced in graphical models such as Markov random fields, common for image
distributions, where the approximation efficiency of score functions remains
unestablished.
To address this, we observe score functions can often be well-approximated in
graphical models through variational inference denoising algorithms.
Furthermore, these algorithms are amenable to efficient neural network
representation. We demonstrate this in examples of graphical models, including
Ising models, conditional Ising models, restricted Boltzmann machines, and
sparse encoding models. Combined with off-the-shelf discretization error bounds
for diffusion-based sampling, we provide an efficient sample complexity bound
for diffusion-based generative modeling when the score function is learned by
deep neural networks.
[COMMENTS]
41 pages
[LINK]
http://arxiv.org/abs/2309.11420v1
[DATE]
2023-09-20 23:51:10+08:00
[CATEGORIES]
cs.LG
EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning
[AUTHORS]
Kallol Saha, Vishal Mandadi, Jayaram Reddy, Ajit Srikanth, Aditya Agarwal, Bipasha Sen, Arun Singh, Madhava Krishna
[ABSTRACT]
Classical motion planning for robotic manipulation includes a set of general
algorithms that aim to minimize a scene-specific cost of executing a given
plan. This approach offers remarkable adaptability, as these algorithms can be
directly used off-the-shelf for any new scene without needing specific training
datasets. However, without a prior understanding of what diverse valid
trajectories are and without specially designed cost functions for a given
scene, the overall solutions tend to have low success rates. While
deep-learning-based algorithms tremendously improve success rates, they are
much harder to adopt without specialized training datasets. We propose EDMP, an
Ensemble-of-costs-guided Diffusion for Motion Planning that aims to combine the
strengths of classical and deep-learning-based motion planning. Our
diffusion-based network is trained on a set of diverse kinematically valid
trajectories. Like classical planning, for any new scene at the time of
inference, we compute scene-specific costs such as “collision cost” and guide
the diffusion to generate valid trajectories that satisfy the scene-specific
constraints. Further, instead of a single cost function that may be
insufficient in capturing diversity across scenes, we use an ensemble of costs
to guide the diffusion process, significantly improving the success rate
compared to classical planners. EDMP performs comparably with SOTA
deep-learning-based methods while retaining the generalization capabilities
primarily associated with classical planners.
[COMMENTS]
8 pages, 8 figures, submitted to ICRA 2024 (International Conference
on Robotics and Automation)
[LINK]
http://arxiv.org/abs/2309.11414v1
[DATE]
2023-09-20 23:40:32+08:00
[CATEGORIES]
cs.LG
Transformers versus LSTMs for electronic trading
[AUTHORS]
Paul Bilokon, Yitao Qiu
[ABSTRACT]
With the rapid development of artificial intelligence, long short term memory
(LSTM), one kind of recurrent neural network (RNN), has been widely applied in
time series prediction.
Like RNNs, the Transformer is designed to handle sequential data. As the
Transformer has achieved great success in Natural Language Processing (NLP),
researchers have become interested in its performance on time series
prediction, and many Transformer-based solutions for long time series
forecasting have appeared recently. However, when it comes to financial time
series prediction, LSTM is still a dominant architecture. Therefore, the
question this study aims to answer is whether Transformer-based models can
be applied to financial time series prediction and beat LSTM.
To answer this question, various LSTM-based and Transformer-based models are
compared on multiple financial prediction tasks based on high-frequency limit
order book data. A new LSTM-based model called DLSTM is built, and a new
architecture for the Transformer-based model is designed to adapt it to
financial prediction. The experimental results show that the Transformer-based
model has only a limited advantage in absolute price sequence prediction. The
LSTM-based models show better and more robust performance on difference
sequence prediction, such as price difference and price movement.
[LINK]
http://arxiv.org/abs/2309.11400v1
[DATE]
2023-09-20 23:25:43+08:00
[CATEGORIES]
cs.LG
Neural Latent Geometry Search: Product Manifold Inference via Gromov-Hausdorff-Informed Bayesian Optimization
[AUTHORS]
Haitz Saez de Ocariz Borde, Alvaro Arroyo, Ismael Morales, Ingmar Posner, Xiaowen Dong
[ABSTRACT]
Recent research indicates that the performance of machine learning models can
be improved by aligning the geometry of the latent space with the underlying
data structure. Rather than relying solely on Euclidean space, researchers have
proposed using hyperbolic and spherical spaces with constant curvature, or
combinations thereof, to better model the latent space and enhance model
performance. However, little attention has been given to the problem of
automatically identifying the optimal latent geometry for the downstream task.
We mathematically define this novel formulation and call it neural latent
geometry search (NLGS). More specifically, we introduce a principled method
that searches for a latent geometry composed of a product of constant curvature
model spaces with minimal query evaluations. To accomplish this, we propose a
novel notion of distance between candidate latent geometries based on the
Gromov-Hausdorff distance from metric geometry. In order to compute the
Gromov-Hausdorff distance, we introduce a mapping function that enables the
comparison of different manifolds by embedding them in a common
high-dimensional ambient space. Finally, we design a graph search space based
on the calculated distances between candidate manifolds and use Bayesian
optimization to search for the optimal latent geometry in a query-efficient
manner. This is a general method which can be applied to search for the optimal
latent geometry for a variety of models and downstream tasks. Extensive
experiments on synthetic and real-world datasets confirm the efficacy of our
method in identifying the optimal latent geometry for multiple machine learning
problems.
[LINK]
http://arxiv.org/abs/2309.04810v2
[DATE]
2023-09-20 23:21:34+08:00
[CATEGORIES]
cs.LG
Adaptive PD Control using Deep Reinforcement Learning for Local-Remote Teleoperation with Stochastic Time Delays
[AUTHORS]
Luc McCutcheon, Saber Fallah
[ABSTRACT]
Local-remote systems allow robots to execute complex tasks in hazardous
environments such as space and nuclear power stations. However, establishing
accurate positional mapping between local and remote devices can be difficult
due to time delays that can compromise system performance and stability.
Enhancing the synchronicity and stability of local-remote systems is vital for
enabling robots to interact with environments at greater distances and under
highly challenging network conditions, including time delays. We introduce an
adaptive control method employing reinforcement learning to tackle the
time-delayed control problem. By adjusting controller parameters in real-time,
this adaptive controller compensates for stochastic delays and improves
synchronicity between local and remote robotic manipulators. To improve the
adaptive PD controller’s performance, we devise a model-based reinforcement
learning approach that effectively incorporates multi-step delays into the
learning framework. Utilizing this proposed technique, the local-remote
system’s performance is stabilized for stochastic communication time-delays of
up to 290ms. Our results demonstrate that the suggested model-based
reinforcement learning method surpasses the Soft Actor-Critic and augmented
state Soft Actor-Critic techniques. Access the code at:
https://github.com/CAV-Research-Lab/Predictive-Model-Delay-Correction
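For orientation, the basic PD control law that such an adaptive scheme would retune can be sketched as below. This is the textbook proportional-derivative update with a backward-difference derivative; the function name `pd_control` and its parameters are illustrative assumptions and are not taken from the linked repository.

```python
def pd_control(error, prev_error, dt, kp, kd):
    """Return the PD command u = kp * e + kd * de/dt.

    The derivative is approximated by a backward difference over one step.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    derivative = (error - prev_error) / dt
    return kp * error + kd * derivative
```

In an adaptive setting, `kp` and `kd` would be supplied anew at each control step (for example by a learned policy) rather than fixed in advance.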
[COMMENTS]
7 pages + 1 references, 4 figures
[LINK]
http://arxiv.org/abs/2305.16979v2
[DATE]
2023-09-20 23:09:20+08:00
[CATEGORIES]
cs.LG
Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees
[AUTHORS]
Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge
[ABSTRACT]
Gaussian processes are frequently deployed as part of larger machine learning
and decision-making systems, for instance in geospatial modeling, Bayesian
optimization, or in latent Gaussian models. Within a system, the Gaussian
process model needs to perform in a stable and reliable manner to ensure it
interacts correctly with other parts of the system. In this work, we study the
numerical stability of scalable sparse approximations based on inducing points.
To do so, we first review numerical stability, and illustrate typical
situations in which Gaussian process models can be unstable. Building on
stability theory originally developed in the interpolation literature, we
derive sufficient and in certain cases necessary conditions on the inducing
points for the computations performed to be numerically stable. For
low-dimensional tasks such as geospatial modeling, we propose an automated
method for computing inducing points satisfying these conditions. This is done
via a modification of the cover tree data structure, which is of independent
interest. We additionally propose an alternative sparse approximation for
regression with a Gaussian likelihood which trades off a small amount of
performance to further improve stability. We provide illustrative examples
showing the relationship between stability of calculations and predictive
performance of inducing point methods on spatial tasks.
[LINK]
http://arxiv.org/abs/2210.07893v2
[DATE]
2023-09-20 23:08:47+08:00
[CATEGORIES]
cs.LG
Multi-Head Graph Convolutional Network for Structural Connectome Classification
[AUTHORS]
Anees Kazi, Jocelyn Mora, Bruce Fischl, Adrian V. Dalca, Iman Aganj
[ABSTRACT]
We tackle classification based on brain connectivity derived from diffusion
magnetic resonance images. We propose a machine-learning model inspired by
graph convolutional networks (GCNs), which takes a brain connectivity input
graph and processes the data separately through a parallel GCN mechanism with
multiple heads. The proposed network is a simple design that employs different
heads involving graph convolutions focused on edges and nodes, capturing
representations from the input data thoroughly. To test the ability of our
model to extract complementary and representative features from brain
connectivity data, we chose the task of sex classification. This quantifies the
degree to which the connectome varies depending on the sex, which is important
for improving our understanding of health and disease in both sexes. We show
experiments on two publicly available datasets: PREVENT-AD (347 subjects) and
OASIS3 (771 subjects). The proposed model demonstrates the highest performance
compared to the existing machine-learning algorithms we tested, including
classical methods and (graph and non-graph) deep learning. We provide a
detailed analysis of each component of our model.
[LINK]
http://arxiv.org/abs/2305.02199v2
[DATE]
2023-09-20 23:03:08+08:00
[CATEGORIES]
cs.LG
Black-box Generalization of Machine Teaching
[AUTHORS]
Xiaofeng Cao, Yaming Guo, Ivor W. Tsang, James T. Kwok
[ABSTRACT]
Hypothesis-pruning maximizes the hypothesis updates for active learning to
find those desired unlabeled data. An inherent assumption is that this learning
manner can derive those updates into the optimal hypothesis. However, its
convergence may not be guaranteed well if those incremental updates are
negative and disordered. In this paper, we introduce a black-box teaching
hypothesis $h^\mathcal{T}$ employing a tighter slack term
$\left(1+\mathcal{F}^{\mathcal{T}}(\widehat{h}_t)\right)\Delta_t$ to replace
the typical $2\Delta_t$ for pruning. Theoretically, we prove that, under the
guidance of this teaching hypothesis, the learner can converge into a tighter
generalization error and label complexity bound than those non-educated
learners who do not receive any guidance from a teacher: 1) the generalization
error upper bound can be reduced from $R(h^*)+4\Delta_{T-1}$ to approximately
$R(h^{\mathcal{T}})+2\Delta_{T-1}$, and 2) the label complexity upper bound can
be decreased from $4 \theta\left(TR(h^{*})+2O(\sqrt{T})\right)$ to
approximately $2\theta\left(2TR(h^{\mathcal{T}})+3 O(\sqrt{T})\right)$. To be
strict with our assumption, self-improvement of teaching is firstly proposed
when $h^\mathcal{T}$ loosely approximates $h^*$. Against learning, we further
consider two teaching scenarios: teaching a white-box and black-box learner.
Experiments verify this idea and show better generalization performance than
the fundamental active learning strategies, such as IWAL, IWAL-D, etc.
[LINK]
http://arxiv.org/abs/2206.15205v2
[DATE]
2023-09-20 23:01:26+08:00
[CATEGORIES]
cs.LG
Learning Patient Static Information from Time-series EHR and an Approach for Safeguarding Privacy and Fairness
[AUTHORS]
Wei Liao, Joel Voldman
[ABSTRACT]
Recent work in machine learning for healthcare has raised concerns about
patient privacy and algorithmic fairness. For example, previous work has shown
that patient self-reported race can be predicted from medical data that does
not explicitly contain racial information. However, the extent of data
identification is unknown, and we lack ways to develop models whose outcomes
are minimally affected by such information. Here we systematically investigated
the ability of time-series electronic health record data to predict patient
static information. We found that not only the raw time-series data, but also
learned representations from machine learning models, can be trained to predict
a variety of static information with area under the receiver operating
characteristic curve as high as 0.851 for biological sex, 0.869 for binarized
age and 0.810 for self-reported race. Such high predictive performance can be
extended to a wide range of comorbidity factors and exists even when the model
was trained for different tasks, using different cohorts, using different model
architectures and databases. Given the privacy and fairness concerns these
findings pose, we develop a variational autoencoder-based approach that learns
a structured latent space to disentangle patient-sensitive attributes from
time-series data. Our work thoroughly investigates the ability of machine
learning models to encode patient static information from time-series
electronic health records and introduces a general approach to protect
patient-sensitive attribute information for downstream tasks.
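The area under the receiver operating characteristic curve quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pairwise sketch of that definition follows; the function name `auroc` is illustrative, and this quadratic-time version is meant for clarity rather than for large datasets.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.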
[LINK]
http://arxiv.org/abs/2309.11373v1
[DATE]
2023-09-20 22:54:48+08:00
[CATEGORIES]
cs.LG
Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability
[AUTHORS]
Sitan Chen, Frederic Koehler, Ankur Moitra, Morris Yau
[ABSTRACT]
In this paper we revisit some classic problems on classification under
misspecification. In particular, we study the problem of learning halfspaces
under Massart noise with rate $\eta$. In a recent work, Diakonikolas,
Gouleakis, and Tzamos resolved a long-standing problem by giving the first
efficient algorithm for learning to accuracy $\eta + \epsilon$ for any
$\epsilon > 0$. However, their algorithm outputs a complicated hypothesis,
which partitions space into $\text{poly}(d,1/\epsilon)$ regions. Here we give a
much simpler algorithm and in the process resolve a number of outstanding open
questions:
(1) We give the first proper learner for Massart halfspaces that achieves
$\eta + \epsilon$. We also give improved bounds on the sample complexity
achievable by polynomial time algorithms.
(2) Based on (1), we develop a blackbox knowledge distillation procedure to
convert an arbitrarily complex classifier to an equally good proper classifier.
(3) By leveraging a simple but overlooked connection to evolvability, we show
any SQ algorithm requires super-polynomially many queries to achieve
$\mathsf{OPT} + \epsilon$.
Moreover we study generalized linear models where $\mathbb{E}[Y|\mathbf{X}] =
\sigma(\langle \mathbf{w}^*, \mathbf{X}\rangle)$ for any odd, monotone, and
Lipschitz function $\sigma$. This family includes the previously mentioned
halfspace models as a special case, but is much richer and includes other
fundamental models like logistic regression. We introduce a challenging new
corruption model that generalizes Massart noise, and give a general algorithm
for learning in this setting. Our algorithms are based on a small set of core
recipes for learning to classify in the presence of misspecification.
Finally we study our algorithm for learning halfspaces under Massart noise
empirically and find that it exhibits some appealing fairness properties.
[COMMENTS]
52 pages, v2: updated references
[LINK]
http://arxiv.org/abs/2006.04787v2
[DATE]
2023-09-20 22:40:02+08:00
[CATEGORIES]
cs.LG
C$\cdot$ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters
[AUTHORS]
Zhiyang Dou, Xuelin Chen, Qingnan Fan, Taku Komura, Wenping Wang
[ABSTRACT]
We present C$\cdot$ASE, an efficient and effective framework that learns
conditional Adversarial Skill Embeddings for physics-based characters. Our
physically simulated character can learn a diverse repertoire of skills while
providing controllability in the form of direct manipulation of the skills to
be performed. C$\cdot$ASE divides the heterogeneous skill motions into distinct
subsets containing homogeneous samples for training a low-level conditional
model to learn conditional behavior distribution. The skill-conditioned
imitation learning naturally offers explicit control over the character’s
skills after training. The training procedure incorporates focal skill
sampling, skeletal residual forces, and element-wise feature masking to balance
diverse skills of varying complexities, mitigate dynamics mismatch to master
agile motions and capture more general behavior characteristics, respectively.
Once trained, the conditional model can produce highly diverse and realistic
skills, outperforming state-of-the-art models, and can be repurposed in various
downstream tasks. In particular, the explicit skill control handle allows a
high-level policy or user to direct the character with desired skill
specifications, which we demonstrate is advantageous for interactive character
animation.
[COMMENTS]
SIGGRAPH Asia 2023
[LINK]
http://arxiv.org/abs/2309.11351v1
[DATE]
2023-09-20 22:34:45+08:00
[CATEGORIES]
cs.LG
Using Property Elicitation to Understand the Impacts of Fairness Constraints
[AUTHORS]
Jessie Finocchiaro
[ABSTRACT]
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers satisfying this condition standard in the fair machine learning
literature. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
[COMMENTS]
Please reach out if you have comments or thoughts; this is a living
project
[LINK]
http://arxiv.org/abs/2309.11343v1
[DATE]
2023-09-20 22:20:56+08:00
[CATEGORIES]
cs.LG
PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning
[AUTHORS]
Utsav Singh, Vinay P Namboodiri
[ABSTRACT]
Hierarchical reinforcement learning (HRL) has the potential to solve complex
long horizon tasks using temporal abstraction and increased exploration.
However, hierarchical agents are difficult to train due to inherent
non-stationarity. We present primitive enabled adaptive relabeling (PEAR), a
two-phase approach where we first perform adaptive relabeling on a few expert
demonstrations to generate efficient subgoal supervision, and then jointly
optimize HRL agents by employing reinforcement learning (RL) and imitation
learning (IL). We perform theoretical analysis to $(i)$ bound the
sub-optimality of our approach, and $(ii)$ derive a generalized plug-and-play
framework for joint optimization using RL and IL. PEAR uses a handful of expert
demonstrations and makes minimal limiting assumptions on the task structure.
Additionally, it can be easily integrated with typical model free RL algorithms
to produce a practical HRL algorithm. We perform experiments on challenging
robotic environments and show that PEAR is able to solve tasks that require
long term decision making. We empirically show that PEAR exhibits improved
performance and sample efficiency over previous hierarchical and
non-hierarchical approaches. We also perform real world robotic experiments on
complex tasks and demonstrate that PEAR consistently outperforms the baselines.
[LINK]
http://arxiv.org/abs/2306.06394v2
[DATE]
2023-09-20 22:04:27+08:00
[CATEGORIES]
cs.LG
CRISP: Curriculum inducing Primitive Informed Subgoal Prediction
[AUTHORS]
Utsav Singh, Vinay P Namboodiri
[ABSTRACT]
Hierarchical reinforcement learning is a promising approach that uses
temporal abstraction to solve complex long horizon problems. However,
simultaneously learning a hierarchy of policies is unstable as it is
challenging to train the higher-level policy when the lower-level primitive is
non-stationary. In this paper, we propose a novel hierarchical algorithm CRISP
to generate a curriculum of achievable subgoals for evolving lower-level
primitives using reinforcement learning and imitation learning. The lower level
primitive periodically performs data relabeling on a handful of expert
demonstrations using our primitive informed parsing approach to handle
non-stationarity. Since our approach uses a handful of expert demonstrations,
it is suitable for most robotic control tasks. Experimental evaluations on
complex robotic maze navigation and robotic manipulation environments show that
inducing hierarchical curriculum learning significantly improves sample
efficiency, and results in efficient goal conditioned policies for solving
temporally extended tasks. We perform real-world robotic experiments on complex
manipulation tasks and demonstrate that CRISP consistently outperforms the
baselines.
[LINK]
http://arxiv.org/abs/2304.03535v2
[DATE]
2023-09-20 21:58:28+08:00
[CATEGORIES]
cs.LG
Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models
[AUTHORS]
Daniel Wolf, Tristan Payer, Catharina Silvia Lisson, Christoph Gerhard Lisson, Meinrad Beer, Michael Götz, Timo Ropinski
[ABSTRACT]
Deep learning in medical imaging has the potential to minimize the risk of
diagnostic errors, reduce radiologist workload, and accelerate diagnosis.
Training such deep learning models requires large and accurate datasets, with
annotations for all training samples. However, in the medical imaging domain,
annotated datasets for specific tasks are often small due to the high
complexity of annotations, limited access, or the rarity of diseases. To
address this challenge, deep learning models can be pre-trained on large image
datasets without annotations using methods from the field of self-supervised
learning. After pre-training, small annotated datasets are sufficient to
fine-tune the models for a specific task. The most popular self-supervised
pre-training approaches in medical imaging are based on contrastive learning.
However, recent studies in natural image processing indicate a strong potential
for masked autoencoder approaches. Our work compares state-of-the-art
contrastive learning methods with the recently introduced masked autoencoder
approach “SparK” for convolutional neural networks (CNNs) on medical images.
To this end, we pre-train on a large unannotated CT image dataset and fine-tune on
several CT classification tasks. Due to the challenge of obtaining sufficient
annotated training data in medical imaging, it is of particular interest to
evaluate how the self-supervised pre-training methods perform when fine-tuning
on small datasets. By experimenting with gradually reducing the training
dataset size for fine-tuning, we find that the reduction has different effects
depending on the type of pre-training chosen. The SparK pre-training method is
more robust to the training dataset size than the contrastive methods. Based on
our results, we propose the SparK pre-training for medical imaging tasks with
only small annotated datasets.
[COMMENTS]
This paper is under review. The code will be released if accepted
[LINK]
http://arxiv.org/abs/2308.06534v3
[DATE]
2023-09-20 21:51:06+08:00
[CATEGORIES]
cs.LG
WFTNet: Exploiting Global and Local Periodicity in Long-term Time Series Forecasting
[AUTHORS]
Peiyuan Liu, Beiliang Wu, Naiqi Li, Tao Dai, Fengmao Lei, Jigang Bao, Yong Jiang, Shu-Tao Xia
[ABSTRACT]
Recent CNN and Transformer-based models tried to utilize frequency and
periodicity information for long-term time series forecasting. However, most
existing work is based on the Fourier transform, which cannot capture fine-grained
and local frequency structure. In this paper, we propose a Wavelet-Fourier
Transform Network (WFTNet) for long-term time series forecasting. WFTNet
utilizes both Fourier and wavelet transforms to extract comprehensive
temporal-frequency information from the signal, where Fourier transform
captures the global periodic patterns and wavelet transform captures the local
ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to
adaptively balance the importance of global and local frequency patterns.
Extensive experiments on various time series datasets show that WFTNet
consistently outperforms other state-of-the-art baselines.
[LINK]
http://arxiv.org/abs/2309.11319v1
[DATE]
2023-09-20 21:44:18+08:00
[CATEGORIES]
cs.LG
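The WFTNet abstract above notes that the Fourier transform captures global periodic patterns. As a rough illustration of that one idea only (not the WFTNet architecture, its wavelet branch, or its Periodicity-Weighted Coefficient), the following minimal sketch estimates the dominant period of a series from its discrete Fourier spectrum. The function name `dominant_period` and the synthetic test signal are assumptions made for this example.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-DC Fourier component.

    Hypothetical helper for illustration; uses a plain O(n^2) DFT so that it
    needs nothing beyond the standard library.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset

    best_k, best_amp = 1, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0 (the DC term)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        amp = abs(coeff)
        if amp > best_amp:
            best_k, best_amp = k, amp
    return n / best_k


# Synthetic series: a sine wave that repeats every 25 samples.
signal = [math.sin(2 * math.pi * t / 25) for t in range(200)]
period = dominant_period(signal)
```

A single global spectrum like this cannot say where in the series a pattern occurs, which is the limitation the abstract attributes to purely Fourier-based approaches.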
Inferring effective couplings with Restricted Boltzmann Machines
[AUTHORS]
Aurélien Decelle, Cyril Furtlehner, Alfonso De Jesus Navas Gómez, Beatriz Seoane
[ABSTRACT]
Generative models offer a direct way to model complex data. Among them,
energy-based models provide us with a neural network model that aims to
accurately reproduce all statistical correlations observed in the data at the
level of the Boltzmann weight of the model. However, one challenge is to
understand the physical interpretation of such models. In this study, we
propose a simple solution by implementing a direct mapping between the energy
function of the Restricted Boltzmann Machine and an effective Ising spin
Hamiltonian that includes high-order interactions between spins. This mapping
includes interactions of all possible orders, going beyond the conventional
pairwise interactions typically considered in the inverse Ising approach, and
allowing the description of complex datasets. Earlier works attempted to
achieve this goal, but the proposed mappings did not properly treat the
complexity of the problem or did not contain direct prescriptions for practical
application. To validate our method, we performed several controlled numerical
experiments where we trained the RBMs using equilibrium samples of predefined
models containing local external fields, two-body and three-body interactions
in various low-dimensional topologies. The results demonstrate the
effectiveness of our proposed approach in learning the correct interaction
network and pave the way for its application in modeling interesting datasets.
We also evaluate the quality of the inferred model based on different training
methods.
[COMMENTS]
15 figures, 31 pages
[LINK]
http://arxiv.org/abs/2309.02292v2
[DATE]
2023-09-20 21:31:42+08:00
[CATEGORIES]
cs.LG
Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
[AUTHORS]
Sarwan Ali
[ABSTRACT]
Effective representation of data is crucial in various machine learning
tasks, as it captures the underlying structure and context of the data.
Embeddings have emerged as a powerful technique for data representation, but
evaluating their quality and capacity to preserve structural and contextual
information remains a challenge. In this paper, we address this need by
proposing a method to measure the \textit{representation capacity} of
embeddings. The motivation behind this work stems from the importance of
understanding the strengths and limitations of embeddings, enabling researchers
and practitioners to make informed decisions in selecting appropriate embedding
models for their specific applications. By combining extrinsic evaluation
methods, such as classification and clustering, with t-SNE-based neighborhood
analysis, such as neighborhood agreement and trustworthiness, we provide a
comprehensive assessment of the representation capacity. Additionally, the use
of optimization techniques (Bayesian optimization) for weight optimization (for
classification, clustering, neighborhood agreement, and trustworthiness)
ensures an objective and data-driven approach in selecting the optimal
combination of metrics. The proposed method not only contributes to advancing
the field of embedding evaluation but also empowers researchers and
practitioners with a quantitative measure to assess the effectiveness of
embeddings in capturing structural and contextual information. For the
evaluation, we use $3$ real-world biological sequence (proteins and nucleotide)
datasets and perform a representation capacity analysis of $4$ embedding
methods from the literature, namely Spike2Vec, Spaced $k$-mers, PWM2Vec, and
AutoEncoder.
[COMMENTS]
Accepted at ISBRA 2023
[LINK]
http://arxiv.org/abs/2309.11294v1
[DATE]
2023-09-20 21:21:12+08:00
[CATEGORIES]
cs.LG
From Classification to Segmentation with Explainable AI: A Study on Crack Detection and Growth Monitoring
[AUTHORS]
Florent Forest, Hugo Porta, Devis Tuia, Olga Fink
[ABSTRACT]
Monitoring surface cracks in infrastructure is crucial for structural health
monitoring. Automatic visual inspection offers an effective solution,
especially in hard-to-reach areas. Machine learning approaches have proven
their effectiveness but typically require large annotated datasets for
supervised training. Once a crack is detected, monitoring its severity often
demands precise segmentation of the damage. However, pixel-level annotation of
images for segmentation is labor-intensive. To mitigate this cost, one can
leverage explainable artificial intelligence (XAI) to derive segmentations from
the explanations of a classifier, requiring only weak image-level supervision.
This paper proposes applying this methodology to segment and monitor surface
cracks. We evaluate the performance of various XAI methods and examine how this
approach facilitates severity quantification and growth monitoring. Results
reveal that while the resulting segmentation masks may exhibit lower quality
than those produced by supervised methods, they remain meaningful and enable
severity monitoring, thus reducing substantial labeling costs.
[COMMENTS]
43 pages. Under review
[LINK]
http://arxiv.org/abs/2309.11267v1
[DATE]
2023-09-20 20:50:52+08:00
[CATEGORIES]
cs.LG
Sparse Index Tracking: Simultaneous Asset Selection and Capital Allocation via $\ell_0$-Constrained Portfolio
[AUTHORS]
Eisuke Yamagata, Shunsuke Ono
[ABSTRACT]
Sparse index tracking is one of the prominent passive portfolio management
strategies that construct a sparse portfolio to track a financial index. A
sparse portfolio is desirable over a full portfolio in terms of transaction
cost reduction and avoiding illiquid assets. To enforce the sparsity of the
portfolio, conventional studies have proposed formulations based on
$\ell_p$-norm regularizations as a continuous surrogate of the $\ell_0$-norm
regularization. Although such formulations can be used to construct sparse
portfolios, they are not easy to use in actual investments because parameter
tuning to specify the exact upper bound on the number of assets in the
portfolio is delicate and time-consuming. In this paper, we propose a new
problem formulation of sparse index tracking using an $\ell_0$-norm constraint
that enables easy control of the upper bound on the number of assets in the
portfolio. In addition, our formulation allows the choice between portfolio
sparsity and turnover sparsity constraints, which also reduces transaction
costs by limiting the number of assets that are updated at each rebalancing.
Furthermore, we develop an efficient algorithm for solving this problem based
on a primal-dual splitting method. Finally, we illustrate the effectiveness of
the proposed method through experiments on the S&P 500 and NASDAQ100 index
datasets.
[COMMENTS]
Submitted to IEEE Open Journal of Signal Processing
[LINK]
http://arxiv.org/abs/2309.10152v2
[DATE]
2023-09-20 20:16:12+08:00
[CATEGORIES]
cs.LG
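The sparse index tracking abstract above relies on an $\ell_0$-norm constraint to cap the number of assets held. A minimal sketch of what such a cap means in practice is the Euclidean projection onto the set of vectors with at most k non-zero entries, which keeps the k largest-magnitude weights and zeroes the rest. This is an illustration under stated assumptions, not the authors' primal-dual splitting algorithm; the function name `project_l0` and the example weights are hypothetical, and the sketch does not renormalise the weights afterwards.

```python
def project_l0(weights, k):
    """Keep the k largest-magnitude weights and set all others to zero.

    Hypothetical helper for illustration: this is the standard hard-thresholding
    projection onto {w : ||w||_0 <= k}. Ties are broken by original order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


# Example: cap a five-asset portfolio at two holdings.
portfolio = [0.4, -0.05, 0.3, 0.1, 0.15]
sparse_portfolio = project_l0(portfolio, 2)
```

Because k is the cap itself, the number of assets is controlled directly rather than through a regularisation weight that has to be tuned.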
Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering
[AUTHORS]
Ardian Selmonaj, Oleg Szehr, Giacomo Del Rio, Alessandro Antonucci, Adrian Schneider, Michael Rüegsegger
[ABSTRACT]
The application of artificial intelligence to simulate air-to-air combat
scenarios is attracting increasing attention. To date, the high-dimensional
state and action spaces, the high complexity of situation information (such as
imperfect and filtered information, stochasticity, incomplete knowledge about
mission targets) and the nonlinear flight dynamics pose significant challenges
for accurate air combat decision-making. These challenges are exacerbated when
multiple heterogeneous agents are involved. We propose a hierarchical
multi-agent reinforcement learning framework for air-to-air combat with
multiple heterogeneous agents. In our framework, the decision-making process is
divided into two stages of abstraction, where heterogeneous low-level policies
control the action of individual units, and a high-level commander policy
issues macro commands given the overall mission targets. Low-level policies are
trained for accurate unit combat control. Their training is organized in a
learning curriculum with increasingly complex training scenarios and
league-based self-play. The commander policy is trained on mission targets
given pre-trained low-level policies. The empirical validation supports the
advantages of our design choices.
[COMMENTS]
22nd International Conference on Machine Learning and Applications
(ICMLA 23)
[LINK]
http://arxiv.org/abs/2309.11247v1
[DATE]
2023-09-20 20:16:00+08:00
[CATEGORIES]
cs.LG
Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems Development
[AUTHORS]
Francesca Marzi, Giordano d’Aloisio, Antinisca Di Marco, Giovanni Stilo
[ABSTRACT]
The problem of predicting the training time of machine learning (ML) models
has become extremely relevant in the scientific community. Being able to
predict a priori the training time of an ML model would enable the automatic
selection of the best model both in terms of energy efficiency and in terms of
performance in the context of, for instance, MLOps architectures. In this
paper, we present the work we are conducting in this direction. In
particular, we present an extensive empirical study of the Full Parameter Time
Complexity (FPTC) approach by Zheng et al., which is, to the best of our
knowledge, the only approach formalizing the training time of ML models as a
function of both the dataset’s and the model’s parameters. We study the formulations
proposed for the Logistic Regression and Random Forest classifiers, and we
highlight the main strengths and weaknesses of the approach. Finally, we
observe how, from the conducted study, the prediction of training time is
strictly related to the context (i.e., the involved dataset) and how the FPTC
approach is not generalizable.
[LINK]
http://arxiv.org/abs/2309.11226v1
[DATE]
2023-09-20 19:35:03+08:00
[CATEGORIES]
cs.LG
Optimal subgroup selection
[AUTHORS]
Henry W. J. Reeve, Timothy I. Cannings, Richard J. Samworth
[ABSTRACT]
In clinical trials and other applications, we often see regions of the
feature space that appear to exhibit interesting behaviour, but it is unclear
whether these observed phenomena are reflected at the population level.
Focusing on a regression setting, we consider the subgroup selection challenge
of identifying a region of the feature space on which the regression function
exceeds a pre-determined threshold. We formulate the problem as one of
constrained optimisation, where we seek a low-complexity, data-dependent
selection set on which, with a guaranteed probability, the regression function
is uniformly at least as large as the threshold; subject to this constraint, we
would like the region to contain as much mass under the marginal feature
distribution as possible. This leads to a natural notion of regret, and our
main contribution is to determine the minimax optimal rate for this regret in
both the sample size and the Type I error probability. The rate involves a
delicate interplay between parameters that control the smoothness of the
regression function, as well as exponents that quantify the extent to which the
optimal selection set at the population level can be approximated by families
of well-behaved subsets. Finally, we expand the scope of our previous results
by illustrating how they may be generalised to a treatment and control setting,
where interest lies in the heterogeneous treatment effect.
[COMMENTS]
65 pages, 2 figures, to appear in the Annals of Statistics
[LINK]
http://arxiv.org/abs/2109.01077v3
[DATE]
2023-09-20 19:29:48+08:00
[CATEGORIES]
cs.LG
A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction
[AUTHORS]
Peter Samoaa, Linus Aronsson, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani
[ABSTRACT]
Most machine learning and data analytics applications, including performance
engineering in software systems, require a large number of annotations and
labelled data, which might not be available in advance. Acquiring annotations
often requires significant time, effort, and computational resources, making it
challenging. We develop a unified active learning framework specializing in
software performance prediction to address this task. We begin by parsing the
source code to an Abstract Syntax Tree (AST) and augmenting it with data and
control flow edges. Then, we convert the tree representation of the source code
to a Flow Augmented-AST graph (FA-AST) representation. Based on the graph
representation, we construct various graph embeddings (unsupervised and
supervised) into a latent space. Given such an embedding, the framework becomes
task agnostic since active learning can be performed using any regression
method and query strategy suited for regression. Within this framework, we
investigate the impact of using different levels of information for active and
passive learning, e.g., partially available labels and unlabeled test data. Our
approach aims to improve the investment in AI models for different software
performance predictions (execution time) based on the structure of the source
code. Our real-world experiments reveal that respectable performance can be
achieved by querying labels for only a small subset of all the data.
[LINK]
http://arxiv.org/abs/2304.13032v2
[DATE]
2023-09-20 19:18:08+08:00
[CATEGORIES]
cs.LG
A Model-Based Machine Learning Approach for Assessing the Performance of Blockchain Applications
[AUTHORS]
Adel Albshri, Ali Alzubaidi, Ellis Solaiman
[ABSTRACT]
The recent advancement of Blockchain technology consolidates its status as a
viable alternative for various domains. However, evaluating the performance of
blockchain applications can be challenging due to the underlying
infrastructure’s complexity and distributed nature. Therefore, a reliable
modelling approach is needed to boost Blockchain-based applications’
development and evaluation. While simulation-based solutions have been
researched, machine learning (ML) model-based techniques are rarely discussed
in conjunction with evaluating blockchain application performance. Our novel
research makes use of two ML model-based methods. Firstly, we train a $k$
nearest neighbour ($k$NN) and support vector machine (SVM) to predict
blockchain performance using predetermined configuration parameters. Secondly,
we employ the salp swarm optimization (SO) ML model, which enables the
investigation of optimal blockchain configurations for achieving the required
performance level. We use rough set theory to enhance SO, hereafter called ISO,
which we show achieves accurate recommendations of optimal parameter
configurations despite uncertainty. Finally, statistical comparisons
indicate that our models have a competitive edge. The $k$NN model outperforms
SVM by 5%, and ISO also demonstrates a 4% reduction in accuracy
deviation compared to regular SO.
[LINK]
http://arxiv.org/abs/2309.11205v1
[DATE]
2023-09-20 18:39:21+08:00
[CATEGORIES]
cs.LG
When to Trust AI: Advances and Challenges for Certification of Neural Networks
[AUTHORS]
Marta Kwiatkowska, Xiyue Zhang
[ABSTRACT]
Artificial intelligence (AI) has been advancing at a fast pace and it is now
poised for deployment in a wide range of applications, such as autonomous
systems, medical diagnosis and natural language processing. Early adoption of
AI technology for real-world applications has not been without problems,
particularly for neural networks, which may be unstable and susceptible to
adversarial examples. In the longer term, appropriate safety assurance
techniques need to be developed to reduce potential harm due to avoidable
system failures and ensure trustworthiness. Focusing on certification and
explainability, this paper provides an overview of techniques that have been
developed to ensure safety of AI decisions and discusses future challenges.
[LINK]
http://arxiv.org/abs/2309.11196v1
[DATE]
2023-09-20 18:31:09+08:00
[CATEGORIES]
cs.LG
RHALE: Robust and Heterogeneity-aware Accumulated Local Effects
[AUTHORS]
Vasilis Gkolemis, Theodore Dalamagas, Eirini Ntoutsi, Christos Diou
[ABSTRACT]
Accumulated Local Effects (ALE) is a widely used explainability method for
isolating the average effect of a feature on the output, because it handles
cases with correlated features well. However, it has two limitations. First, it
does not quantify the deviation of instance-level (local) effects from the
average (global) effect, known as heterogeneity. Second, for estimating the
average effect, it partitions the feature domain into user-defined, fixed-sized
bins, where different bin sizes may lead to inconsistent ALE estimations. To
address these limitations, we propose Robust and Heterogeneity-aware ALE
(RHALE). RHALE quantifies the heterogeneity by considering the standard
deviation of the local effects and automatically determines an optimal
variable-size bin-splitting. In this paper, we prove that to achieve an
unbiased approximation of the standard deviation of local effects within each
bin, bin splitting must follow a set of sufficient conditions. Based on these
conditions, we propose an algorithm that automatically determines the optimal
partitioning, balancing the estimation bias and variance. Through evaluations
on synthetic and real datasets, we demonstrate the superiority of RHALE
compared to other methods, including the advantages of automatic bin splitting,
especially in cases with correlated features.
[COMMENTS]
Accepted at ECAI 2023 (European Conference on Artificial
Intelligence)
[LINK]
http://arxiv.org/abs/2309.11193v1
[DATE]
2023-09-20 18:27:41+08:00
[CATEGORIES]
cs.LG
Boosting Object Representation Learning via Motion and Object Continuity
[AUTHORS]
Quentin Delfosse, Wolfgang Stammer, Thomas Rothenbacher, Dwarak Vittal, Kristian Kersting
[ABSTRACT]
Recent unsupervised multi-object detection models have shown impressive
performance improvements, largely attributed to novel architectural inductive
biases. Unfortunately, they may produce suboptimal object encodings for
downstream tasks. To overcome this, we propose to exploit object motion and
continuity, i.e., objects do not pop in and out of existence. This is
accomplished through two mechanisms: (i) providing priors on the location of
objects through integration of optical flow, and (ii) a contrastive object
continuity loss across consecutive image frames. Rather than developing an
explicit deep architecture, the resulting Motion and Object Continuity (MOC)
scheme can be instantiated using any baseline object detection model. Our
results show large improvements in the performance of a SOTA model in terms of
object discovery, convergence speed and overall latent object representations,
particularly for playing Atari games. Overall, we show clear benefits of
integrating motion and object continuity for downstream tasks, moving beyond
object representation learning based only on reconstruction.
[COMMENTS]
8 pages main text, 32 tables, 21 Figures
[LINK]
http://arxiv.org/abs/2211.09771v2
[DATE]
2023-09-20 18:02:19+08:00
[CATEGORIES]
cs.LG
A Novel Convolutional Neural Network Architecture with a Continuous Symmetry
[AUTHORS]
Yao Liu, Hang Shao, Bing Bai
[ABSTRACT]
This paper introduces a new Convolutional Neural Network (ConvNet)
architecture inspired by a class of partial differential equations (PDEs)
called quasi-linear hyperbolic systems. With comparable performance on the
image classification task, it allows for the modification of the weights via a
continuous group of symmetry. This is a significant shift from traditional
models where the architecture and weights are essentially fixed. We wish to
promote the (internal) symmetry as a new desirable property for a neural
network, and to draw attention to the PDE perspective in analyzing and
interpreting ConvNets in the broader Deep Learning community.
[COMMENTS]
Accepted by the 3rd CAAI International Conference on Artificial
Intelligence (CICAI), 2023; with Addendum + minor edits
[LINK]
http://arxiv.org/abs/2308.01621v3
[DATE]
2023-09-20 17:17:25+08:00
[CATEGORIES]
cs.LG
ACTC: Active Threshold Calibration for Cold-Start Knowledge Graph Completion
[AUTHORS]
Anastasiia Sedova, Benjamin Roth
[ABSTRACT]
Self-supervised knowledge-graph completion (KGC) relies on estimating a
scoring model over (entity, relation, entity)-tuples, for example, by embedding
an initial knowledge graph. Prediction quality can be improved by calibrating
the scoring model, typically by adjusting the prediction thresholds using
manually annotated examples. In this paper, we attempt for the first time
cold-start calibration for KGC, where no annotated examples exist initially for
calibration, and only a limited number of tuples can be selected for
annotation. Our new method ACTC finds good per-relation thresholds efficiently
based on a limited set of annotated tuples. In addition to a few annotated
tuples, ACTC also leverages unlabeled tuples by estimating their correctness
with Logistic Regression or Gaussian Process classifiers. We also experiment
with different methods for selecting candidate tuples for annotation:
density-based and random selection. Experiments with five scoring models and an
oracle annotator show an improvement of 7 percentage points when using ACTC in
the challenging setting with an annotation budget of only 10 tuples, and an
average improvement of 4 percentage points over different budgets.
[COMMENTS]
ACL'23
[LINK]
http://arxiv.org/abs/2305.06395v3
[DATE]
2023-09-20 17:03:08+08:00
[CATEGORIES]
cs.LG
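The ACTC abstract above describes calibrating a scoring model by choosing per-relation prediction thresholds from a small set of annotated tuples. The basic calibration step can be sketched as follows: for one relation, try each observed score as a threshold and keep the one that classifies the annotated tuples most accurately. This is a simplified illustration, not the ACTC method itself (it omits the Logistic Regression or Gaussian Process estimation on unlabeled tuples and the candidate selection strategies); the function name `best_threshold` and the toy scores are assumptions.

```python
def best_threshold(scores, labels):
    """Pick the score threshold that maximises accuracy on annotated tuples.

    Hypothetical helper for illustration. A tuple is predicted correct when its
    score is >= the threshold. Ties keep the lowest such threshold.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")

    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = hits / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy example: four annotated tuples for a single relation.
threshold, accuracy = best_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

With only a handful of annotations per relation, an estimate like this is noisy, which is the cold-start difficulty the abstract sets out to address.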
Investigating Personalization Methods in Text to Music Generation
[AUTHORS]
Manos Plitsis, Theodoros Kouzelis, Georgios Paraskevopoulos, Vassilis Katsouros, Yannis Panagakis
[ABSTRACT]
In this work, we investigate the personalization of text-to-music diffusion
models in a few-shot setting. Motivated by recent advances in the computer
vision domain, we are the first to explore the combination of pre-trained
text-to-audio diffusers with two established personalization methods. We
experiment with the effect of audio-specific data augmentation on the overall
system performance and assess different training strategies. For evaluation, we
construct a novel dataset with prompts and music clips. We consider both
embedding-based and music-specific metrics for quantitative evaluation, as well
as a user study for qualitative evaluation. Our analysis shows that similarity
metrics are in accordance with user preferences and that current
personalization approaches tend to learn rhythmic music constructs more easily
than melody. The code, dataset, and example material of this study are open to
the research community.
[COMMENTS]
Submitted to ICASSP 2024, Examples at https://zelaki.github.io/
[LINK]
http://arxiv.org/abs/2309.11140v1
[DATE]
2023-09-20 16:36:34+08:00
[CATEGORIES]
cs.LG
Differentiable Quantum Architecture Search for Quantum Reinforcement Learning
[AUTHORS]
Yize Sun, Yunpu Ma, Volker Tresp
[ABSTRACT]
Differentiable quantum architecture search (DQAS) is a gradient-based
framework to design quantum circuits automatically in the NISQ era. It was
motivated by issues such as the low fidelity of quantum hardware, low
flexibility of circuit architecture, high circuit design cost, the barren
plateau (BP) problem, and periodicity of weights. It has been used to address
error mitigation, unitary decomposition, and quantum approximation
optimization problems based on fixed
datasets. Quantum reinforcement learning (QRL) is a part of quantum machine
learning and often has various data. QRL usually uses a manually designed
circuit. However, the pre-defined circuit needs more flexibility for different
tasks, and the circuit design based on various datasets could become
intractable in the case of a large circuit. The problem of whether DQAS can be
applied to quantum deep Q-learning with various datasets is still open. The
main target of this work is to discover the capability of DQAS to solve quantum
deep Q-learning problems. We apply DQAS, a gradient-based framework, to
reinforcement learning tasks and evaluate it in two different environments:
cart pole and frozen lake. It contains input and output weights, progressive
search, and other new features. The experiments conclude that DQAS can design
quantum circuits automatically and efficiently. The evaluation results show
significant outperformance compared to the manually designed circuit.
Furthermore, the performance of the automatically created circuit depends on
whether the super-circuit learned well during the training process. This work
is the first to show that gradient-based quantum architecture search is
applicable to QRL tasks.
[COMMENTS]
4+1 pages, 3 figures, QCE23 workshop QML
[LINK]
http://arxiv.org/abs/2309.10392v2
[DATE]
2023-09-20 15:54:05+08:00
[CATEGORIES]
cs.LG
RouteNet-Fermi: Network Modeling with Graph Neural Networks
[AUTHORS]
Miquel Ferriol-Galmés, Jordi Paillisse, José Suárez-Varela, Krzysztof Rusek, Shihan Xiao, Xiang Shi, Xiangle Cheng, Pere Barlet-Ros, Albert Cabellos-Aparicio
[ABSTRACT]
Network models are an essential block of modern networks. For example, they
are widely used in network planning and optimization. However, as networks
increase in scale and complexity, some models present limitations, such as the
assumption of Markovian traffic in queuing theory models, or the high
computational cost of network simulators. Recent advances in machine learning,
such as Graph Neural Networks (GNN), are enabling a new generation of network
models that are data-driven and can learn complex non-linear behaviors. In this
paper, we present RouteNet-Fermi, a custom GNN model that shares the same goals
as Queuing Theory, while being considerably more accurate in the presence of
realistic traffic models. The proposed model accurately predicts the delay,
jitter, and packet loss of a network. We have tested RouteNet-Fermi in networks
of increasing size (up to 300 nodes), including samples with mixed traffic
profiles – e.g., with complex non-Markovian models – and arbitrary routing
and queue scheduling configurations. Our experimental results show that
RouteNet-Fermi achieves accuracy similar to that of computationally expensive
packet-level simulators and scales accurately to larger networks. Our model
produces delay estimates with a mean relative error of 6.24% when applied to a
test dataset of 1,000 samples, including network topologies one order of
magnitude larger than those seen during training. Finally, we have also
evaluated RouteNet-Fermi with measurements from a physical testbed and packet
traces from a real-life network.
[COMMENTS]
This paper has been accepted for publication at IEEE/ACM Transactions
on Networking 2023 (DOI: 10.1109/TNET.2023.3269983). © 2023 IEEE.
Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses
[LINK]
http://arxiv.org/abs/2212.12070v3
[DATE]
2023-09-20 15:42:10+08:00
[CATEGORIES]
cs.LG
TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback
[AUTHORS]
Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Sahan Bulathwela
[ABSTRACT]
This work describes the TrueLearn Python library, which contains a family of
online learning Bayesian models for building educational (or more generally,
informational) recommendation systems. This family of models was designed
following the “open learner” concept, using humanly-intuitive user
representations. For the sake of interpretability and putting the user in
control, the TrueLearn library also contains different representations to help
end-users visualise the learner models, which may in the future facilitate user
interaction with their own models. Together with the library, we include a
previously publicly released implicit feedback educational dataset with
evaluation metrics to measure the performance of the models. The extensive
documentation and coding examples make the library highly accessible to both
machine learning developers and educational data mining and learning analytic
practitioners. The library and the support documentation with examples are
available at https://truelearn.readthedocs.io/en/latest.
[COMMENTS]
To be presented at the ORSUM workshop at RecSys 2023
[LINK]
http://arxiv.org/abs/2309.11527v1
[DATE]
2023-09-20 15:21:50+08:00
[CATEGORIES]
cs.LG
TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions
[AUTHORS]
Taehyung Kwon, Jihoon Ko, Jinhong Jung, Kijung Shin
[ABSTRACT]
Many real-world datasets are represented as tensors, i.e., multi-dimensional
arrays of numerical values. Storing them without compression often requires
substantial space, which grows exponentially with the order. While many tensor
compression algorithms are available, many of them rely on strong data
assumptions regarding order, sparsity, rank, and smoothness. In this work,
we propose TENSORCODEC, a lossy compression algorithm for general tensors that
do not necessarily adhere to strong input data assumptions. TENSORCODEC
incorporates three key ideas. The first idea is Neural Tensor-Train
Decomposition (NTTD) where we integrate a recurrent neural network into
Tensor-Train Decomposition to enhance its expressive power and alleviate the
limitations imposed by the low-rank assumption. Another idea is to fold the
input tensor into a higher-order tensor to reduce the space required by NTTD.
Finally, the mode indices of the input tensor are reordered to reveal patterns
that can be exploited by NTTD for improved approximation. Our analysis and
experiments on 8 real-world datasets demonstrate that TENSORCODEC is (a)
Concise: it gives up to 7.38x more compact compression than the best competitor
with similar reconstruction error, (b) Accurate: given the same budget for
compressed size, it yields up to 3.33x more accurate reconstruction than the
best competitor, (c) Scalable: its empirical compression time is linear in the
number of tensor entries, and it reconstructs each entry in logarithmic time.
Our code and datasets are available at https://github.com/kbrother/TensorCodec.
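
As background for the tensor-train and folding ideas above, here is a minimal sketch of plain Tensor-Train entry evaluation and of splitting one mode index into several smaller sub-indices. It is not the authors' NTTD (no recurrent network is involved), and the names tt_entry and fold_index, the core layout, and the example values are assumptions made for illustration.

```python
def tt_entry(cores, index):
    """Evaluate one entry of a tensor stored in tensor-train (TT) format.

    cores[k][i] is the r_{k-1} x r_k matrix (a list of rows) for mode k and
    index value i. The first core has r_0 = 1 and the last core has r_d = 1.
    """
    vec = [1.0]  # running 1 x r_k row vector
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec)))
               for b in range(len(mat[0]))]
    return vec[0]


def fold_index(i, base, digits):
    """Split a mode index into `digits` base-`base` sub-indices, most significant first."""
    out = []
    for _ in range(digits):
        out.append(i % base)
        i //= base
    return tuple(reversed(out))


# Rank-2 TT of the 2 x 2 matrix T = A @ B with A = [[1, 0], [1, 1]], B = [[2, 3], [4, 5]].
cores = [
    [[[1.0, 0.0]], [[1.0, 1.0]]],      # mode 0: one 1 x 2 matrix per row index
    [[[2.0], [4.0]], [[3.0], [5.0]]],  # mode 1: one 2 x 1 matrix per column index
]
entry = tt_entry(cores, (1, 1))   # T[1][1] = 1*3 + 1*5
folded = fold_index(5, 2, 3)      # index 5 of a size-8 mode as three binary sub-indices
```

Folding a mode of size 8 into three modes of size 2, as in the last line, is the kind of reshaping the abstract refers to as folding into a higher-order tensor.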
[COMMENTS]
Accepted to ICDM 2023 - IEEE International Conference on Data Mining
2023
[LINK]
http://arxiv.org/abs/2309.10310v2
[DATE]
2023-09-20 15:18:02+08:00
[CATEGORIES]
cs.LG
Delays in Reinforcement Learning
[AUTHORS]
Pierre Liotet
[ABSTRACT]
Delays are inherent to most dynamical systems. Besides shifting the process
in time, they can significantly affect its performance. For this reason, it
is usually valuable to study the delay and account for it. Because they are
dynamical systems, it is of no surprise that sequential decision-making
problems such as Markov decision processes (MDP) can also be affected by
delays. These processes are the foundational framework of reinforcement
learning (RL), a paradigm whose goal is to create artificial agents capable of
learning to maximise their utility by interacting with their environment.
RL has achieved strong, sometimes astonishing, empirical results, but delays
are seldom explicitly accounted for. The understanding of the impact of delay
on the MDP is limited. In this dissertation, we propose to study the delay in
the agent’s observation of the state of the environment or in the execution of
the agent’s actions. We will repeatedly change our point of view on the problem
to reveal some of its structure and peculiarities. A wide spectrum of delays
will be considered, and potential solutions will be presented. This
dissertation also aims to draw links between celebrated frameworks of the RL
literature and the one of delays.
[LINK]
http://arxiv.org/abs/2309.11096v1
[DATE]
2023-09-20 15:04:46+08:00
[CATEGORIES]
cs.LG
Practical Probabilistic Model-based Deep Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling
[AUTHORS]
Wenjun Huang, Yunduan Cui, Huiyun Li, Xinyu Wu
[ABSTRACT]
This paper addresses the prediction stability, prediction accuracy and
control capability of the current probabilistic model-based reinforcement
learning (MBRL) built on neural networks. A novel approach, dropout-based
probabilistic ensembles with trajectory sampling (DPETS), is proposed, in which the
system uncertainty is stably predicted by combining the Monte-Carlo dropout and
trajectory sampling in one framework. Its loss function is designed to correct
the fitting error of neural networks for more accurate prediction of
probabilistic models. The state propagation in its policy is extended to filter
the aleatoric uncertainty for superior control capability. Evaluated by several
Mujoco benchmark control tasks under additional disturbances and one practical
robot arm manipulation task, DPETS outperforms related MBRL approaches in both
average return and convergence velocity while achieving better performance
than well-known model-free baselines with significant sample efficiency. The
open source code of DPETS is available at https://github.com/mrjun123/DPETS.
[LINK]
http://arxiv.org/abs/2309.11089v1
[DATE]
2023-09-20 14:39:19+08:00
[CATEGORIES]
cs.LG
GPSINDy: Data-Driven Discovery of Equations of Motion
[AUTHORS]
Junette Hsin, Shubhankar Agarwal, Adam Thorpe, David Fridovich-Keil
[ABSTRACT]
In this paper, we consider the problem of discovering dynamical system models
from noisy data. The presence of noise is known to be a significant problem for
symbolic regression algorithms. We combine Gaussian process regression, a
nonparametric learning method, with SINDy, a parametric learning approach, to
identify nonlinear dynamical systems from data. The key advantages of our
proposed approach are its simplicity coupled with the fact that it demonstrates
improved robustness properties with noisy data over SINDy. We demonstrate our
proposed approach on a Lotka-Volterra model and a unicycle dynamic model in
simulation and on an NVIDIA JetRacer system using hardware data. We demonstrate
improved performance over SINDy for discovering the system dynamics and
predicting future trajectories.
[COMMENTS]
Submitted to ICRA 2024
[LINK]
http://arxiv.org/abs/2309.11076v1
[DATE]
2023-09-20 13:44:49+08:00
[CATEGORIES]
cs.LG
InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update
[AUTHORS]
Dan Wu, Zhaoying Li, Tulika Mitra
[ABSTRACT]
Classic Graph Neural Network (GNN) inference approaches, designed for static
graphs, are ill-suited for streaming graphs that evolve with time. The dynamism
intrinsic to streaming graphs necessitates constant updates, posing unique
challenges to acceleration on GPU. We address these challenges based on two key
insights: (1) Inside the $k$-hop neighborhood, a significant fraction of the
nodes is not impacted by the modified edges when the model uses min or max as
aggregation function; (2) When the model weights remain static while the graph
structure changes, node embeddings can incrementally evolve over time by
computing only the impacted part of the neighborhood. With these insights, we
propose a novel method, InkStream, designed for real-time inference with
minimal memory access and computation, while ensuring an identical output to
conventional methods. InkStream operates on the principle of propagating and
fetching data only when necessary. It uses an event-based system to control
inter-layer effect propagation and intra-layer incremental updates of node
embedding. InkStream is highly extensible and easily configurable by allowing
users to create and process customized events. We showcase that less than 10
lines of additional user code are needed to support popular GNN models such as
GCN, GraphSAGE, and GIN. Our experiments with three GNN models on four large
graphs demonstrate that InkStream accelerates by 2.5-427$\times$ on a CPU
cluster and 2.4-343$\times$ on two different GPU clusters while producing
identical outputs as GNN model inference on the latest graph snapshot.
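
The first insight above (with max aggregation, most nodes are unaffected by an edge change) can be sketched as follows. This is a simplified single-layer illustration with scalar features, not InkStream's event-based system; the function names are hypothetical, and edge deletion, which may require recomputation when the removed neighbor held the maximum, is not handled.

```python
def full_max_aggregate(features, in_neighbors):
    """Recompute the max aggregation over all in-neighbors of every node."""
    return {v: max((features[u] for u in nbrs), default=float("-inf"))
            for v, nbrs in in_neighbors.items()}


def add_edge_incremental(agg, features, in_neighbors, u, v):
    """Insert edge u -> v and touch agg[v] only if u can raise the maximum.

    Returns True when v's aggregate changed, i.e. when the change would have
    to be propagated to the next layer.
    """
    in_neighbors[v].add(u)
    if features[u] > agg[v]:
        agg[v] = features[u]
        return True
    return False


features = {0: 1.0, 1: 5.0, 2: 3.0}
in_neighbors = {0: {1}, 1: {2}, 2: {0}}
agg = full_max_aggregate(features, in_neighbors)

changed_a = add_edge_incremental(agg, features, in_neighbors, 2, 0)  # 3.0 does not beat 5.0
changed_b = add_edge_incremental(agg, features, in_neighbors, 1, 2)  # 5.0 beats 1.0
```

After both insertions the incrementally maintained values match a full recomputation, which mirrors the abstract's claim of identical output.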
[LINK]
http://arxiv.org/abs/2309.11071v1
[DATE]
2023-09-20 13:34:52+08:00
[CATEGORIES]
cs.LG
Extreme Scenario Selection in Day-Ahead Power Grid Operational Planning
[AUTHORS]
Guillermo Terrén-Serrano, Michael Ludkovski
[ABSTRACT]
We propose and analyze the application of statistical functional depth
metrics for the selection of extreme scenarios in day-ahead grid planning. Our
primary motivation is screening of probabilistic scenarios for realized load
and renewable generation, in order to identify scenarios most relevant for
operational risk mitigation. To handle the high-dimensionality of the scenarios
across asset classes and intra-day periods, we employ functional measures of
depth to sub-select outlying scenarios that are most likely to be the riskiest
for the grid operation. We investigate a range of functional depth measures, as
well as a range of operational risks, including load shedding, operational
costs, reserves shortfall and variable renewable energy curtailment. The
effectiveness of the proposed screening approach is demonstrated through a case
study on the realistic Texas-7k grid.
[LINK]
http://arxiv.org/abs/2309.11067v1
[DATE]
2023-09-20 13:09:09+08:00
[CATEGORIES]
cs.LG
Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks
[AUTHORS]
Nastaran Darabi, Amit R. Trivedi
[ABSTRACT]
Edge computing is a promising solution for handling high-dimensional,
multispectral analog data from sensors and IoT devices for applications such as
autonomous drones. However, edge devices’ limited storage and computing
resources make it challenging to perform complex predictive modeling at the
edge. Compute-in-memory (CiM) has emerged as a principal paradigm to minimize
energy for deep learning-based inference at the edge. Nevertheless, integrating
storage and processing complicates memory cells and/or memory peripherals,
essentially trading off area efficiency for energy efficiency. This paper
proposes a novel solution to improve area efficiency in deep learning inference
tasks. The proposed method employs two key strategies. Firstly, a Frequency
domain learning approach uses binarized Walsh-Hadamard Transforms, reducing the
necessary parameters for DNN (by 87% in MobileNetV2) and enabling
compute-in-SRAM, which better utilizes parallelism during inference. Secondly,
a memory-immersed collaborative digitization method is described among CiM
arrays to reduce the area overheads of conventional ADCs. This facilitates more
CiM arrays in limited footprint designs, leading to better parallelism and
reduced external memory accesses. Different networking configurations are
explored, where Flash, SA, and their hybrid digitization steps can be
implemented using the memory-immersed scheme. The results are demonstrated
using a 65 nm CMOS test chip, exhibiting significant area and energy savings
compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing
analog data more efficiently, it is possible to selectively retain valuable
data from sensors and alleviate the challenges posed by the analog data deluge.
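
For readers unfamiliar with the Walsh-Hadamard transform mentioned above, the following is a minimal sketch of the standard unnormalized fast transform. It is not the paper's binarized frequency-domain layer or its hardware mapping; the function name fwht is an assumption. The transform uses only additions and subtractions, with no multiplications.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


signal = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fwht(signal)
# Applying the transform twice returns the input scaled by its length.
roundtrip = [v // len(signal) for v in fwht(spectrum)]
```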
[COMMENTS]
arXiv admin note: text overlap with arXiv:2307.03863,
arXiv:2309.01771
[LINK]
http://arxiv.org/abs/2309.11048v1
[DATE]
2023-09-20 11:52:04+08:00
[CATEGORIES]
cs.LG
Clustered FedStack: Intermediate Global Models with Bayesian Information Criterion
[AUTHORS]
Thanveer Shaik, Xiaohui Tao, Lin Li, Niall Higgins, Raj Gururajan, Xujuan Zhou, Jianming Yong
[ABSTRACT]
Federated Learning (FL) is currently one of the most popular technologies in
the field of Artificial Intelligence (AI) due to its collaborative learning and
ability to preserve client privacy. However, it faces challenges such as
non-identically and non-independently distributed (non-IID) data and
imbalanced labels among local clients. To address these limitations, the
research community has explored various approaches such as using local model
parameters, federated generative adversarial learning, and federated
representation learning. In our study, we propose a novel Clustered FedStack
framework based on the previously published Stacked Federated Learning
(FedStack) framework. The local clients send their model predictions and output
layer weights to a server, which then builds a robust global model. This global
model clusters the local clients based on their output layer weights using a
clustering mechanism. We adopt three clustering mechanisms, namely K-Means,
Agglomerative, and Gaussian Mixture Models, into the framework and evaluate
their performance. We use Bayesian Information Criterion (BIC) with the maximum
likelihood function to determine the number of clusters. The Clustered FedStack
models outperform baseline models with clustering mechanisms. To estimate the
convergence of our proposed framework, we use Cyclical learning rates.
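
Selecting the number of clusters with BIC can be sketched as below, using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of samples, and L the maximized likelihood. The candidate log-likelihoods and parameter counts are made-up illustrative numbers, not results from the paper, and the helper names are hypothetical.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def select_num_clusters(candidates, num_samples):
    """candidates maps k -> (maximized log-likelihood, number of free parameters)."""
    return min(candidates, key=lambda k: bic(*candidates[k], num_samples))


# Hypothetical fits for k = 1, 2, 3 clusters on 100 samples.
candidates = {1: (-250.0, 2), 2: (-200.0, 5), 3: (-198.0, 8)}
best_k = select_num_clusters(candidates, num_samples=100)
```

Here the third cluster improves the likelihood only slightly, so the parameter penalty makes k = 2 the preferred choice.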
[COMMENTS]
This work has been submitted to Elsevier for possible
publication. Copyright may be transferred without notice, after which this
version may no longer be accessible
[LINK]
http://arxiv.org/abs/2309.11044v1
[DATE]
2023-09-20 11:47:53+08:00
[CATEGORIES]
cs.LG
Federated Learning in Intelligent Transportation Systems: Recent Applications and Open Problems
[AUTHORS]
Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wuzheng Tan, Jian Weng, Zhu Han
[ABSTRACT]
Intelligent transportation systems (ITSs) have been fueled by the rapid
development of communication technologies, sensor technologies, and the
Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of
the vehicle networks, it is rather challenging to make timely and accurate
decisions of vehicle behaviors. Moreover, in the presence of mobile wireless
communications, the privacy and security of vehicle information are at constant
risk. In this context, a new paradigm is urgently needed for various
applications in dynamic vehicle environments. As a distributed machine learning
technology, federated learning (FL) has received extensive attention due to its
outstanding privacy protection properties and easy scalability. We conduct a
comprehensive survey of the latest developments in FL for ITS. Specifically, we
initially research the prevalent challenges in ITS and elucidate the
motivations for applying FL from various perspectives. Subsequently, we review
existing deployments of FL in ITS across various scenarios, and discuss
specific potential issues in object recognition, traffic management, and
service providing scenarios. Furthermore, we conduct a further analysis of the
new challenges introduced by FL deployment and the inherent limitations that FL
alone cannot fully address, including uneven data distribution, limited storage
and computing power, and potential privacy and security concerns. We then
examine the existing collaborative technologies that can help mitigate these
challenges. Lastly, we discuss the open challenges that remain to be addressed
in applying FL in ITS and propose several future research directions.
[LINK]
http://arxiv.org/abs/2309.11039v1
[DATE]
2023-09-20 11:39:30+08:00
[CATEGORIES]
cs.LG
A Region-Shrinking-Based Acceleration for Classification-Based Derivative-Free Optimization
[AUTHORS]
Tianyi Han, Jingya Li, Zhipeng Guo, Yuan Jin
[ABSTRACT]
Derivative-free optimization algorithms play an important role in scientific
and engineering design optimization problems, especially when derivative
information is not accessible. In this paper, we study the framework of
classification-based derivative-free optimization algorithms. By introducing a
concept called hypothesis-target shattering rate, we revisit the computational
complexity upper bound of this type of algorithms. Inspired by the revisited
upper bound, we propose an algorithm named “RACE-CARS”, which adds a random
region-shrinking step compared with “SRACOS” (Hu et al., 2017). We further
establish a theorem showing the acceleration of region-shrinking. Experiments
on the synthetic functions as well as black-box tuning for
language-model-as-a-service demonstrate empirically the efficiency of
“RACE-CARS”. An ablation experiment on the introduced hyperparameters is also
conducted, revealing the mechanism of “RACE-CARS” and putting forward an
empirical hyperparameter-tuning guidance.
[LINK]
http://arxiv.org/abs/2309.11036v1
[DATE]
2023-09-20 11:31:11+08:00
[CATEGORIES]
cs.LG
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
[AUTHORS]
S. Karthik Mukkavilli, Daniel Salles Civitarese, Johannes Schmude, Johannes Jakubik, Anne Jones, Nam Nguyen, Christopher Phillips, Sujit Roy, Shraddha Singh, Campbell Watson, Raghu Ganti, Hendrik Hamann, Udaysankar Nair, Rahul Ramachandran, Kommy Weldemariam
[ABSTRACT]
Machine learning and deep learning methods have been widely explored in
understanding the chaotic behavior of the atmosphere and furthering weather
forecasting. There has been increasing interest from technology companies,
government institutions, and meteorological agencies in building digital twins
of the Earth. Recent approaches using transformers, physics-informed machine
learning, and graph neural networks have demonstrated state-of-the-art
performance on relatively narrow spatiotemporal scales and specific tasks. With
the recent success of generative artificial intelligence (AI) using pre-trained
transformers for language modeling and vision with prompt engineering and
fine-tuning, we are now moving towards generalizable AI. In particular, we are
witnessing the rise of AI foundation models that can perform competitively on
multiple domain-specific downstream tasks. Despite this progress, we are still
in the nascent stages of a generalizable AI model for global Earth system
models, regional climate models, and mesoscale weather models. Here, we review
current state-of-the-art AI approaches, primarily from transformer and operator
learning literature in the context of meteorology. We provide our perspective
on criteria for success towards a family of foundation models for nowcasting
and forecasting weather and climate predictions. We also discuss how such
models can perform competitively on downstream tasks such as downscaling
(super-resolution), identifying conditions conducive to the occurrence of
wildfires, and predicting consequential meteorological phenomena across various
spatiotemporal scales such as hurricanes and atmospheric rivers. In particular,
we examine current AI methodologies and contend they have matured enough to
design and implement a weather foundation model.
[COMMENTS]
44 pages, 1 figure, updated Fig. 1
[LINK]
http://arxiv.org/abs/2309.10808v2
[DATE]
2023-09-20 11:03:16+08:00
[CATEGORIES]
cs.LG
Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial
[AUTHORS]
Venkat Nemani, Luca Biggio, Xun Huan, Zhen Hu, Olga Fink, Anh Tran, Yan Wang, Xiaoge Zhang, Chao Hu
[ABSTRACT]
On top of machine learning models, uncertainty quantification (UQ) functions
as an essential layer of safety assurance that could lead to more principled
decision making by enabling sound risk assessment and management. The safety
and reliability improvement of ML models empowered by UQ has the potential to
significantly facilitate the broad adoption of ML solutions in high-stakes
decision settings, such as healthcare, manufacturing, and aviation, to name a
few. In this tutorial, we aim to provide a holistic lens on emerging UQ methods
for ML models with a particular focus on neural networks and the applications
of these UQ methods in tackling engineering design as well as prognostics and
health management problems. Toward this goal, we start with a comprehensive
classification of uncertainty types, sources, and causes pertaining to UQ of ML
models. Next, we provide a tutorial-style description of several
state-of-the-art UQ methods: Gaussian process regression, Bayesian neural
network, neural network ensemble, and deterministic UQ methods focusing on
spectral-normalized neural Gaussian process. Established upon the mathematical
formulations, we subsequently examine the soundness of these UQ methods
quantitatively and qualitatively (by a toy regression example) to examine their
strengths and shortcomings from different dimensions. Then, we review
quantitative metrics commonly used to assess the quality of predictive
uncertainty in classification and regression problems. Afterward, we discuss
the increasingly important role of UQ of ML models in solving challenging
problems in engineering design and health prognostics. Two case studies with
source codes available on GitHub are used to demonstrate these UQ methods and
compare their performance in the life prediction of lithium-ion batteries at
the early stage and the remaining useful life prediction of turbofan engines.
[LINK]
http://arxiv.org/abs/2305.04933v2
[DATE]
2023-09-20 11:00:14+08:00
[CATEGORIES]
cs.LG
Dyadic Reinforcement Learning
[AUTHORS]
Shuangning Li, Lluis Salvat Niell, Sung Won Choi, Inbal Nahum-Shani, Guy Shani, Susan Murphy
[ABSTRACT]
Mobile health aims to enhance health outcomes by delivering interventions to
individuals as they go about their daily life. The involvement of care partners
and social support networks often proves crucial in helping individuals
manage burdensome medical conditions. This presents opportunities in mobile
health to design interventions that target the dyadic relationship – the
relationship between a target person and their care partner – with the aim of
enhancing social support. In this paper, we develop dyadic RL, an online
reinforcement learning algorithm designed to personalize intervention delivery
based on contextual factors and past responses of a target person and their
care partner. Here, multiple sets of interventions impact the dyad across
multiple time intervals. The developed dyadic RL is Bayesian and hierarchical.
We formally introduce the problem setup, develop dyadic RL and establish a
regret bound. We demonstrate dyadic RL’s empirical performance through
simulation studies on both toy scenarios and on a realistic test bed
constructed from data collected in a mobile health study.
[LINK]
http://arxiv.org/abs/2308.07843v4
[DATE]
2023-09-20 10:45:36+08:00
[CATEGORIES]
cs.LG
Conformalized Multimodal Uncertainty Regression and Reasoning
[AUTHORS]
Domenico Parente, Nastaran Darabi, Alex C. Stutts, Theja Tulabandhula, Amit Ranjan Trivedi
[ABSTRACT]
This paper introduces a lightweight uncertainty estimator capable of
predicting multimodal (disjoint) uncertainty bounds by integrating conformal
prediction with a deep-learning regressor. We specifically discuss its
application for visual odometry (VO), where environmental features such as
flying domain symmetries and sensor measurements under ambiguities and
occlusion can result in multimodal uncertainties. Our simulation results show
that uncertainty estimates in our framework adapt sample-wise against
challenging operating conditions such as pronounced noise, limited training
data, and limited parametric size of the prediction model. We also develop a
reasoning framework that leverages these robust uncertainty estimates and
incorporates optical flow-based reasoning to improve prediction
accuracy. Thus, by appropriately accounting for predictive uncertainties of
data-driven learning and closing their estimation loop via rule-based
reasoning, our methodology consistently surpasses conventional deep learning
approaches on all these challenging scenarios (pronounced noise, limited
training data, and limited model size), reducing the prediction error by 2-3x.
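
As background for the conformal prediction component, here is a minimal sketch of standard split conformal regression, which produces a single symmetric interval around a point prediction. It does not reproduce the paper's multimodal (disjoint) bounds; the function name and the calibration residuals are hypothetical.

```python
import math


def conformal_halfwidth(calibration_residuals, alpha):
    """Half-width q so that [pred - q, pred + q] has at least 1 - alpha marginal
    coverage, assuming calibration and test points are exchangeable."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


# Hypothetical residuals (true value minus prediction) on a held-out calibration set.
residuals = [0.1, -0.4, 0.2, -0.3, 0.5, 0.6, -0.7, 0.8, 0.9]
q = conformal_halfwidth(residuals, alpha=0.25)
prediction = 2.0
interval = (prediction - q, prediction + q)
```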
[LINK]
http://arxiv.org/abs/2309.11018v1
[DATE]
2023-09-20 10:40:59+08:00
[CATEGORIES]
cs.LG
3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images
[AUTHORS]
Yifu Zhang, Zuozhu Liu, Yang Feng, Renjing Xu
[ABSTRACT]
Accurate representation of tooth position is extremely important in
treatment. 3D dental image segmentation is a widely used method; however,
labelled 3D dental datasets are a scarce resource, leading to the problem of
small samples that this task faces in many cases. To this end, we address this
problem with a pretrained SAM and propose a novel 3D-U-SAM network for 3D
dental image segmentation. Specifically, in order to solve the problem of using
2D pre-trained weights on 3D datasets, we adopted a convolution approximation
method; in order to retain more details, we designed skip connections to fuse
features at all levels with reference to U-Net. The effectiveness of the
proposed method is demonstrated in ablation experiments, comparison
experiments, and sample size experiments.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2309.11015v1
[DATE]
2023-09-20 10:32:09+08:00
[CATEGORIES]
cs.LG
It’s Simplex! Disaggregating Measures to Improve Certified Robustness
[AUTHORS]
Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah M. Erfani, Benjamin I. P. Rubinstein
[ABSTRACT]
Certified robustness circumvents the fragility of defences against
adversarial attacks, by endowing model predictions with guarantees of class
invariance for attacks up to a calculated size. While there is value in these
certifications, the techniques through which we assess their performance do not
present a proper accounting of their strengths and weaknesses, as their
analysis has eschewed consideration of performance over individual samples in
favour of aggregated measures. By considering the potential output space of
certified models, this work presents two distinct approaches to improve the
analysis of certification mechanisms, that allow for both dataset-independent
and dataset-dependent measures of certification performance. Embracing such a
perspective uncovers new certification approaches, which have the potential to
more than double the achievable radius of certification, relative to current
state-of-the-art. Empirical evaluation verifies that our new approach can
certify $9\%$ more samples at noise scale $\sigma = 1$, with greater relative
improvements observed as the difficulty of the predictive task increases.
[COMMENTS]
IEEE S&P 2024, IEEE Security & Privacy 2024, 14 pages
[LINK]
http://arxiv.org/abs/2309.11005v1
[DATE]
2023-09-20 10:16:19+08:00
[CATEGORIES]
cs.LG
Tackling the dimensions in imaging genetics with CLUB-PLS
[AUTHORS]
Andre Altmann, Ana C Lawry Aguila, Neda Jahanshad, Paul M Thompson, Marco Lorenzi
[ABSTRACT]
A major challenge in imaging genetics and similar fields is to link
high-dimensional data in one domain, e.g., genetic data, to high dimensional
data in a second domain, e.g., brain imaging data. The standard approach in the
area is mass univariate analysis across genetic factors and imaging
phenotypes. That entails executing one genome-wide association study (GWAS) for
each pre-defined imaging measure. Although this approach has been tremendously
successful, one shortcoming is that phenotypes must be pre-defined.
Consequently, effects that are not confined to pre-selected regions of interest
or that reflect larger brain-wide patterns can easily be missed. In this work
we introduce a Partial Least Squares (PLS)-based framework, which we term
Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in
both domains as well as with large sample sizes. One key factor of the
framework is to use cluster bootstrap to provide robust statistics for single
input features in both domains. We applied CLUB-PLS to investigate the
genetic basis of surface area and cortical thickness in a sample of 33,000
subjects from the UK Biobank. We found 107 genome-wide significant
locus-phenotype pairs that are linked to 386 different genes. We found that a
vast majority of these loci could be technically validated at a high rate:
using classic GWAS or Genome-Wide Inferred Statistics (GWIS) we found that 85
locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
[COMMENTS]
12 pages, 4 Figures, 2 Tables
[LINK]
http://arxiv.org/abs/2309.07352v2
[DATE]
2023-09-20 09:45:56+08:00
[CATEGORIES]
cs.LG
A Pragmatic Look at Deep Imitation Learning
[AUTHORS]
Kai Arulkumaran, Dan Ogawa Lillrank
[ABSTRACT]
The introduction of the generative adversarial imitation learning (GAIL)
algorithm has spurred the development of scalable imitation learning approaches
using deep neural networks. Many of the algorithms that followed used a similar
procedure, combining on-policy actor-critic algorithms with inverse
reinforcement learning. More recently there has been an even larger breadth of
approaches, most of which use off-policy algorithms. However, with the breadth
of algorithms, everything from datasets to base reinforcement learning
algorithms to evaluation settings can vary, making it difficult to fairly
compare them. In this work we re-implement 6 different IL algorithms, updating
3 of them to be off-policy, base them on a common off-policy algorithm (SAC),
and evaluate them on a widely-used expert trajectory dataset (D4RL) for the
most common benchmark (MuJoCo). After giving all algorithms the same
hyperparameter optimisation budget, we compare their results for a range of
expert trajectories. In summary, GAIL, with all of its improvements,
consistently performs well across a range of sample sizes, AdRIL is a simple
contender that performs well with one important hyperparameter to tune, and
behavioural cloning remains a strong baseline when data is more plentiful.
[COMMENTS]
Asian Conference on Machine Learning, 2023
[LINK]
http://arxiv.org/abs/2108.01867v2
[DATE]
2023-09-20 09:44:06+08:00
[CATEGORIES]
cs.LG
Toward Unlimited Self-Learning MCMC with Parallel Adaptive Annealing
[AUTHORS]
Yuma Ichikawa, Akira Nakagawa, Hiromoto Masayuki, Yuhei Umeda
[ABSTRACT]
Self-learning Monte Carlo (SLMC) methods were recently proposed to accelerate
Markov chain Monte Carlo (MCMC) methods using a machine learning model. With
latent generative models, SLMC methods realize efficient Monte Carlo updates
with less autocorrelation. However, SLMC methods are difficult to directly
apply to multimodal distributions for which training data are difficult to
obtain. To address this limitation, we propose parallel adaptive annealing, which
makes SLMC methods directly applicable to multimodal distributions with a gradually
trained proposal while annealing the target distribution. Parallel adaptive
annealing is based on (i) sequential learning with annealing to inherit and
update the model parameters, (ii) adaptive annealing to automatically detect
under-learning, and (iii) parallel annealing to mitigate mode collapse of
proposal models. We also propose the VAE-SLMC method, which utilizes a variational
autoencoder (VAE) as a proposal of SLMC to make efficient parallel proposals
independent of any previous state using recently clarified quantitative
properties of VAE. Experiments validate that our method can proficiently obtain
accurate samples from multiple multimodal toy distributions and practical
multimodal posterior distributions, which is difficult to achieve with the
existing SLMC methods.
[COMMENTS]
24 pages, 11 figures
[LINK]
http://arxiv.org/abs/2211.14024v2
[DATE]
2023-09-20 09:17:05+08:00
[CATEGORIES]
cs.LG
On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver
[AUTHORS]
Zhongzhan Huang, Senwei Liang, Hong Zhang, Haizhao Yang, Liang Lin
[ABSTRACT]
The large-scale simulation of dynamical systems is critical in numerous
scientific and engineering disciplines. However, traditional numerical solvers
are limited by the choice of step sizes when estimating integration, resulting
in a trade-off between accuracy and computational efficiency. To address this
challenge, we introduce a deep learning-based corrector called Neural Vector
(NeurVec), which can compensate for integration errors and enable larger time
step sizes in simulations. Our extensive experiments on a variety of complex
dynamical system benchmarks demonstrate that NeurVec exhibits remarkable
generalization capability on a continuous phase space, even when trained using
limited and discrete data. NeurVec significantly accelerates traditional
solvers, achieving speeds tens to hundreds of times faster while maintaining
high levels of accuracy and stability. Moreover, NeurVec’s simple-yet-effective
design, combined with its ease of implementation, has the potential to
establish a new paradigm for fast-solving differential equations based on deep
learning.
[COMMENTS]
Accepted by Scientific Reports
[LINK]
http://arxiv.org/abs/2208.03680v3
[DATE]
2023-09-20 09:16:31+08:00
[CATEGORIES]
cs.LG
XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks
[AUTHORS]
Jian Sun, Ali Pourramezan Fard, Mohammad H. Mahoor
[ABSTRACT]
Capsule Network is powerful at defining the positional relationship between
features in deep neural networks for visual recognition tasks, but it is
computationally expensive and not suitable for running on mobile devices. The
bottleneck is in the computational complexity of the Dynamic Routing mechanism
used between the capsules. On the other hand, XNOR-Net is fast and
computationally efficient, though it suffers from low accuracy due to
information loss in the binarization process. To address the computational
burdens of the Dynamic Routing mechanism, this paper proposes new Fully
Connected (FC) layers by xnorizing the linear projection outside or inside the
Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers
have two versions, XnODR (Xnorize the Linear Projection Outside Dynamic
Routing) and XnIDR (Xnorize the Linear Projection Inside Dynamic Routing). To
test the generalization of both XnODR and XnIDR, we insert them into two
different networks, MobileNetV2 and ResNet-50. Our experiments on three
datasets, MNIST, CIFAR-10, and MultiMNIST validate their effectiveness. The
results demonstrate that both XnODR and XnIDR help networks to have high
accuracy with lower FLOPs and fewer parameters (e.g., 96.14% correctness with
2.99M parameters and 311.74M FLOPs on CIFAR-10).
[COMMENTS]
19 pages, 5 figures, 9 tables, 2 algorithms
[LINK]
http://arxiv.org/abs/2111.10854v3
[DATE]
2023-09-20 09:12:51+08:00
[CATEGORIES]
cs.LG
AI-Driven Patient Monitoring with Multi-Agent Deep Reinforcement Learning
[AUTHORS]
Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Jianming Yong, Hong-Ning Dai
[ABSTRACT]
Effective patient monitoring is vital for timely interventions and improved
healthcare outcomes. Traditional monitoring systems often struggle to handle
complex, dynamic environments with fluctuating vital signs, leading to delays
in identifying critical conditions. To address this challenge, we propose a
novel AI-driven patient monitoring framework using multi-agent deep
reinforcement learning (DRL). Our approach deploys multiple learning agents,
each dedicated to monitoring a specific physiological feature, such as heart
rate, respiration, and temperature. These agents interact with a generic
healthcare monitoring environment, learn the patients’ behavior patterns, and
make informed decisions to alert the corresponding Medical Emergency Teams
(METs) based on the level of emergency estimated. In this study, we evaluate
the performance of the proposed multi-agent DRL framework using real-world
physiological and motion data from two datasets: PPG-DaLiA and WESAD. We
compare the results with several baseline models, including Q-Learning, PPO,
Actor-Critic, Double DQN, and DDPG, as well as monitoring frameworks like
WISEML and CA-MAQL. Our experiments demonstrate that the proposed DRL approach
outperforms all other baseline models, achieving more accurate monitoring of
patients’ vital signs. Furthermore, we conduct hyperparameter optimization to
fine-tune the learning process of each agent. By optimizing hyperparameters, we
enhance the learning rate and discount factor, thereby improving the agents’
overall performance in monitoring patient health status. Our AI-driven patient
monitoring system offers several advantages over traditional methods, including
the ability to handle complex and uncertain environments, adapt to varying
patient conditions, and make real-time decisions without external supervision.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2309.10576
[LINK]
http://arxiv.org/abs/2309.10980v1
[DATE]
2023-09-20 08:42:08+08:00
[CATEGORIES]
cs.LG
Towards Data-centric Graph Machine Learning: Review and Outlook
[AUTHORS]
Xin Zheng, Yixin Liu, Zhifeng Bao, Meng Fang, Xia Hu, Alan Wee-Chung Liew, Shirui Pan
[ABSTRACT]
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data, the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
[COMMENTS]
42 pages, 9 figures
[LINK]
http://arxiv.org/abs/2309.10979v1
[DATE]
2023-09-20 08:40:13+08:00
[CATEGORIES]
cs.LG
Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks
[AUTHORS]
Puja Trivedi, Mark Heimann, Rushil Anirudh, Danai Koutra, Jayaraman J. Thiagarajan
[ABSTRACT]
Safe deployment of graph neural networks (GNNs) under distribution shift
requires models to provide accurate confidence indicators (CI). However, while
it is well-known in computer vision that CI quality diminishes under
distribution shift, this behavior remains understudied for GNNs. Hence, we
begin with a case study on CI calibration under controlled structural and
feature distribution shifts and demonstrate that increased expressivity or
model size does not always lead to improved CI performance. Consequently, we
instead advocate for the use of epistemic uncertainty quantification (UQ)
methods to modulate CIs. To this end, we propose G-$\Delta$UQ, a new single
model UQ method that extends the recently proposed stochastic centering
framework to support structured data and partial stochasticity. Evaluated
across covariate, concept, and graph size shifts, G-$\Delta$UQ not only
outperforms several popular UQ methods in obtaining calibrated CIs, but also
outperforms alternatives when CIs are used for generalization gap prediction or
OOD detection. Overall, our work not only introduces a new, flexible GNN UQ
method, but also provides novel insights into GNN CIs on safety-critical tasks.
[COMMENTS]
22 pages, 11 figures
[LINK]
http://arxiv.org/abs/2309.10976v1
[DATE]
2023-09-20 08:35:27+08:00
[CATEGORIES]
cs.LG
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
[AUTHORS]
Jinjie Zhang, Rayan Saab
[ABSTRACT]
Quantization is a widely used compression method that effectively reduces
redundancies in over-parameterized neural networks. However, existing
quantization techniques for deep neural networks often lack a comprehensive
error analysis due to the presence of non-convex loss functions and nonlinear
activations. In this paper, we propose a fast stochastic algorithm for
quantizing the weights of fully trained neural networks. Our approach leverages
a greedy path-following mechanism in combination with a stochastic quantizer.
Its computational complexity scales only linearly with the number of weights in
the network, thereby enabling the efficient quantization of large networks.
Importantly, we establish, for the first time, full-network error bounds, under
an infinite alphabet condition and minimal assumptions on the weights and input
data. As an application of this result, we prove that when quantizing a
multi-layer network having Gaussian weights, the relative square quantization
error exhibits a linear decay as the degree of over-parametrization increases.
Furthermore, we demonstrate that it is possible to achieve error bounds
equivalent to those obtained in the infinite alphabet case, using on the order
of a mere $\log\log N$ bits per weight, where $N$ represents the largest number
of neurons in a layer.
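
The stochastic quantizer mentioned above can be illustrated with unbiased stochastic rounding onto a uniform grid. This is only a minimal sketch under stated assumptions: the helper name `stochastic_round` and the grid step `delta` are hypothetical, and the greedy path-following part of the method is not shown.

```python
import math
import random


def stochastic_round(w, delta, rng=random):
    """Round w to a neighbouring multiple of delta, unbiased in expectation.

    Hypothetical helper for illustration; not taken from the SPFQ paper.
    """
    lower = math.floor(w / delta) * delta   # grid point at or just below w
    p_up = (w - lower) / delta              # probability of rounding up
    # Rounding up with probability p_up gives E[result] = w.
    return lower + delta if rng.random() < p_up else lower
```

Averaging many calls such as `stochastic_round(0.3, 1.0)` should approach 0.3, which is the unbiasedness property that makes this kind of quantizer convenient to analyse.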
[LINK]
http://arxiv.org/abs/2309.10975v1
[DATE]
2023-09-20 08:35:16+08:00
[CATEGORIES]
cs.LG
SEMPART: Self-supervised Multi-resolution Partitioning of Image Semantics
[AUTHORS]
Sriram Ravindran, Debraj Basu
[ABSTRACT]
Accurately determining salient regions of an image is challenging when
labeled data is scarce. DINO-based self-supervised approaches have recently
leveraged meaningful image semantics captured by patch-wise features for
locating foreground objects. Recent methods have also incorporated intuitive
priors and demonstrated value in unsupervised methods for object partitioning.
In this paper, we propose SEMPART, which jointly infers coarse and fine
bi-partitions over an image’s DINO-based semantic graph. Furthermore, SEMPART
preserves fine boundary details using graph-driven regularization and
successfully distills the coarse mask semantics into the fine mask. Our salient
object detection and single object localization findings suggest that SEMPART
produces high-quality masks rapidly without additional post-processing and
benefits from co-optimizing the coarse and fine branches.
[LINK]
http://arxiv.org/abs/2309.10972v1
[DATE]
2023-09-20 08:07:30+08:00
[CATEGORIES]
cs.LG
Statistical and Computational Guarantees for Influence Diagnostics
[AUTHORS]
Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui
[ABSTRACT]
Influence diagnostics such as influence functions and approximate maximum
influence perturbations are popular in machine learning and in AI domain
applications. Influence diagnostics are powerful statistical tools to identify
influential datapoints or subsets of datapoints. We establish finite-sample
statistical bounds, as well as computational complexity bounds, for influence
functions and approximate maximum influence perturbations using efficient
inverse-Hessian-vector product implementations. We illustrate our results with
generalized linear models and large attention based models on synthetic and
real data.
[COMMENTS]
For AISTATS 2023. Software see
https://github.com/jfisher52/influence_theory
[LINK]
http://arxiv.org/abs/2212.04014v2
[DATE]
2023-09-20 07:55:46+08:00
[CATEGORIES]
cs.LG
DPpack: An R Package for Differentially Private Statistical Analysis and Machine Learning
[AUTHORS]
Spencer Giddens, Fang Liu
[ABSTRACT]
Differential privacy (DP) is the state-of-the-art framework for guaranteeing
privacy for individuals when releasing aggregated statistics or building
statistical/machine learning models from data. We develop the open-source R
package DPpack that provides a large toolkit of differentially private
analysis. The current version of DPpack implements three popular mechanisms for
ensuring DP: Laplace, Gaussian, and exponential. Beyond that, DPpack provides a
large toolkit of easily accessible privacy-preserving descriptive statistics
functions. These include mean, variance, covariance, and quantiles, as well as
histograms and contingency tables. Finally, DPpack provides user-friendly
implementation of privacy-preserving versions of logistic regression, SVM, and
linear regression, as well as differentially private hyperparameter tuning for
each of these models. This extensive collection of implemented differentially
private statistics and models permits hassle-free utilization of differential
privacy principles in commonly performed statistical analysis. We plan to
continue developing DPpack and make it more comprehensive by including more
differentially private machine learning techniques, statistical modeling and
inference in the future.
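
As a rough illustration of the Laplace mechanism applied to a descriptive statistic, the sketch below computes a differentially private mean. It is written in Python and is not DPpack's R interface; the names `laplace_noise` and `dp_mean` are hypothetical, and it assumes the number of records is public and that neighbouring datasets differ by replacing one record.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_mean(values, lower, upper, epsilon, rng=random):
    """Epsilon-DP mean of values clipped to [lower, upper].

    Hypothetical helper; assumes len(values) is public knowledge.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n   # max change from replacing one record
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)
```

A call such as `dp_mean(data, 0.0, 1.0, epsilon=1.0)` returns the clipped mean plus noise whose scale shrinks as the dataset grows or as `epsilon` increases.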
[LINK]
http://arxiv.org/abs/2309.10965v1
[DATE]
2023-09-20 07:36:11+08:00
[CATEGORIES]
cs.LG
A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning
[AUTHORS]
Nika Haghtalab, Michael I. Jordan, Eric Zhao
[ABSTRACT]
We provide a unifying framework for the design and analysis of
multicalibrated predictors. By placing the multicalibration problem in the
general setting of multi-objective learning – where learning guarantees must
hold simultaneously over a set of distributions and loss functions – we
exploit connections to game dynamics to achieve state-of-the-art guarantees for
a diverse set of multicalibration learning problems. In addition to shedding
light on existing multicalibration guarantees and greatly simplifying their
analysis, our approach also yields improved guarantees, such as obtaining
stronger multicalibration conditions that scale with the square-root of group
size and improving the complexity of $k$-class multicalibration by an
exponential factor of $k$. Beyond multicalibration, we use these game dynamics
to address emerging considerations in the study of group fairness and
multi-distribution learning.
[COMMENTS]
45 pages. Authors are ordered alphabetically
[LINK]
http://arxiv.org/abs/2302.10863v2
[DATE]
2023-09-20 07:25:36+08:00
[CATEGORIES]
cs.LG
A Novel Deep Neural Network for Trajectory Prediction in Automated Vehicles Using Velocity Vector Field
[AUTHORS]
MReza Alipour Sormoli, Amir Samadi, Sajjad Mozaffari, Konstantinos Koufos, Mehrdad Dianati, Roger Woodman
[ABSTRACT]
Anticipating the motion of other road users is crucial for automated driving
systems (ADS), as it enables safe and informed downstream decision-making and
motion planning. Unfortunately, contemporary learning-based approaches for
motion prediction exhibit significant performance degradation as the prediction
horizon increases or the observation window decreases. This paper proposes a
novel technique for trajectory prediction that combines a data-driven
learning-based method with a velocity vector field (VVF) generated from a
nature-inspired concept, i.e., fluid flow dynamics. In this work, the vector
field is incorporated as an additional input to a convolutional-recurrent deep
neural network to help predict the most likely future trajectories given a
sequence of bird’s eye view scene representations. The performance of the
proposed model is compared with state-of-the-art methods on the HighD dataset
demonstrating that the VVF inclusion improves the prediction accuracy for both
short and long-term (5 sec) time horizons. It is also shown that the accuracy
remains consistent with decreasing observation windows which alleviates the
requirement of a long history of past observations for accurate trajectory
prediction. Source codes are available at:
https://github.com/Amir-Samadi/VVF-TP.
[COMMENTS]
This paper has been accepted and nominated as the best student paper
at the 26th IEEE International Conference on Intelligent Transportation
Systems (ITSC 2023)
[LINK]
http://arxiv.org/abs/2309.10948v1
[DATE]
2023-09-20 06:14:52+08:00
[CATEGORIES]
cs.LG
DyG2Vec: Representation Learning for Dynamic Graphs with Self-Supervision
[AUTHORS]
Mohammad Ali Alomrani, Mahdi Biparva, Yingxue Zhang, Mark Coates
[ABSTRACT]
Temporal graph neural networks have shown promising results in learning
inductive representations by automatically extracting temporal patterns.
However, previous works often rely on complex memory modules or inefficient
random walk methods to construct temporal representations. In addition, the
existing dynamic graph encoders are non-trivial to adapt to self-supervised
paradigms, which prevents them from utilizing unlabeled data. To address these
limitations, we present an efficient yet effective attention-based encoder that
leverages temporal edge encodings and window-based subgraph sampling to
generate task-agnostic embeddings. Moreover, we propose a joint-embedding
architecture using non-contrastive SSL to learn rich temporal embeddings
without labels. Experimental results on 7 benchmark datasets indicate that on
average, our model outperforms SoTA baselines on the future link prediction
task by 4.23% for the transductive setting and 3.30% for the inductive setting
while only requiring 5-10x less training/inference time. Additionally, we
empirically validate the SSL pre-training significance under two probings
commonly used in language and vision modalities. Lastly, different aspects of
the proposed framework are investigated through experimental analysis and
ablation studies.
[COMMENTS]
Proceedings of the 19th International Workshop on Mining and Learning
with Graphs (MLG)
[LINK]
http://arxiv.org/abs/2210.16906v2
[DATE]
2023-09-20 06:01:21+08:00
[CATEGORIES]
cs.LG
Stochastic Batch Acquisition: A Simple Baseline for Deep Active Learning
[AUTHORS]
Andreas Kirsch, Sebastian Farquhar, Parmida Atighehchian, Andrew Jesson, Frederic Branchaud-Charron, Yarin Gal
[ABSTRACT]
We examine a simple stochastic strategy for adapting well-known single-point
acquisition functions to allow batch active learning. Unlike acquiring the
top-K points from the pool set, score- or rank-based sampling takes into
account that acquisition scores change as new data are acquired. This simple
strategy for adapting standard single-sample acquisition strategies can even
perform just as well as compute-intensive state-of-the-art batch acquisition
functions, like BatchBALD or BADGE, while using orders of magnitude less
compute. In addition to providing a practical option for machine learning
practitioners, the surprising success of the proposed method in a wide range of
experimental settings raises a difficult question for the field: when are these
expensive batch acquisition methods pulling their weight?
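
One way to realise score-based sampling instead of top-K selection is the Gumbel-top-k trick, sketched below. This is an assumption for illustration, not a statement of the paper's exact procedure: the function name `stochastic_batch` and the `temperature` parameter are hypothetical.

```python
import math
import random


def stochastic_batch(scores, k, temperature=1.0, rng=random):
    """Sample k distinct pool indices, favouring high acquisition scores.

    Adds Gumbel noise to each scaled score and keeps the k largest, which
    samples without replacement from softmax(scores / temperature).
    """
    def gumbel():
        u = rng.random()
        while u == 0.0:      # avoid log(0); essentially never triggers
            u = rng.random()
        return -math.log(-math.log(u))

    noisy = [(s / temperature + gumbel(), i) for i, s in enumerate(scores)]
    noisy.sort(reverse=True)
    return [i for _, i in noisy[:k]]
```

As `temperature` approaches zero the noise becomes negligible and the selection reduces to ordinary top-K; larger values spread the batch over lower-scoring points.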
[COMMENTS]
TMLR Paper: https://openreview.net/forum?id=vcHwQyNBjW
[LINK]
http://arxiv.org/abs/2106.12059v3
[DATE]
2023-09-20 05:20:38+08:00
[CATEGORIES]
cs.LG
Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds
[AUTHORS]
Paul Rosa, Viacheslav Borovitskiy, Alexander Terenin, Judith Rousseau
[ABSTRACT]
Gaussian processes are used in many machine learning applications that rely
on uncertainty quantification. Recently, computational tools for working with
these models in geometric settings, such as when inputs lie on a Riemannian
manifold, have been developed. This raises the question: can these intrinsic
models be shown theoretically to lead to better performance, compared to simply
embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction
of an ordinary Euclidean Gaussian process? To study this, we prove optimal
contraction rates for intrinsic Matérn Gaussian processes defined on compact
Riemannian manifolds. We also prove analogous rates for extrinsic processes
using trace and extension theorems between manifold and ambient Sobolev spaces:
somewhat surprisingly, the rates obtained turn out to coincide with those of
the intrinsic processes, provided that their smoothness parameters are matched
appropriately. We illustrate these rates empirically on a number of examples,
which, mirroring prior work, show that intrinsic processes can achieve better
performance in practice. Therefore, our work shows that finer-grained analyses
are needed to distinguish between different levels of data-efficiency of
geometric Gaussian processes, particularly in settings which involve small data
set sizes and non-asymptotic behavior.
[LINK]
http://arxiv.org/abs/2309.10918v1
[DATE]
2023-09-20 04:30:58+08:00
[CATEGORIES]
cs.LG
Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning
[AUTHORS]
Mohammad-Javad Darvishi-Bayazi, Mohammad Sajjad Ghaemi, Timothee Lesort, Md Rifat Arefin, Jocelyn Faubert, Irina Rish
[ABSTRACT]
Pathology diagnosis based on EEG signals and decoding brain activity holds
immense importance in understanding neurological disorders. With the
advancement of artificial intelligence methods and machine learning techniques,
the potential for accurate data-driven diagnoses and effective treatments has
grown significantly. However, applying machine learning algorithms to
real-world datasets presents diverse challenges at multiple levels. The
scarcity of labelled data, especially in low regime scenarios with limited
availability of real patient cohorts due to high costs of recruitment,
underscores the vital deployment of scaling and transfer learning techniques.
In this study, we explore a real-world pathology classification task to
highlight the effectiveness of data and model scaling and cross-dataset
knowledge transfer. As such, we observe varying performance improvements
through data scaling, indicating the need for careful evaluation and labelling.
Additionally, we identify the challenges of possible negative transfer and
emphasize the significance of some key components to overcome distribution
shifts and potential spurious correlations and achieve positive transfer. We
see improvement in the performance of the target model on the target (NMT)
datasets by using the knowledge from the source dataset (TUAB) when a low
amount of labelled data was available. Our findings indicate that a small and
generic model (e.g. ShallowNet) performs well on a single dataset; however, a
larger model (e.g. TCN) performs better on transfer and when learning from a
larger and more diverse dataset.
[LINK]
http://arxiv.org/abs/2309.10910v1
[DATE]
2023-09-20 04:09:15+08:00
[CATEGORIES]
cs.LG
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms
[AUTHORS]
Anthony Francis, Claudia Pérez-D’Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Rachid Alami, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Rohan Chandra, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Jonathan P. How, Haresh Karnan, Tsang-Wei Edward Lee, Luis J. Manso, Reuth Mirksy, Sören Pirk, Phani Teja Singamaneni, Peter Stone, Ada V. Taylor, Peter Trautman, Nathan Tsoi, Marynel Vázquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Alexander Toshev, Roberto Martín-Martín
[ABSTRACT]
A major challenge to deploying robots widely is navigation in human-populated
environments, commonly referred to as social robot navigation. While the field
of social navigation has advanced tremendously in recent years, the fair
evaluation of algorithms that tackle social navigation remains hard because it
involves not just robotic agents moving in static environments but also dynamic
human agents and their perceptions of the appropriateness of robot behavior. In
contrast, clear, repeatable, and accessible benchmarks have accelerated
progress in fields like computer vision, natural language processing and
traditional robot navigation by enabling researchers to fairly compare
algorithms, revealing limitations of existing solutions and illuminating
promising new directions. We believe the same approach can benefit social
navigation. In this paper, we pave the road towards common, widely accessible,
and repeatable benchmarking criteria to evaluate social robot navigation. Our
contributions include (a) a definition of a socially navigating robot as one
that respects the principles of safety, comfort, legibility, politeness, social
competency, agent understanding, proactivity, and responsiveness to context,
(b) guidelines for the use of metrics, development of scenarios, benchmarks,
datasets, and simulators to evaluate social navigation, and (c) a design of a
social navigation metrics framework to make it easier to compare results from
different simulators, robots and datasets.
[COMMENTS]
42 pages, 11 figures, 6 tables
[LINK]
http://arxiv.org/abs/2306.16740v4
[DATE]
2023-09-20 04:02:06+08:00
[CATEGORIES]
cs.LG
In-Context Operator Learning with Data Prompts for Differential Equation Problems
[AUTHORS]
Liu Yang, Siting Liu, Tingwei Meng, Stanley J. Osher
[ABSTRACT]
This paper introduces a new neural-network-based approach, namely In-Context
Operator Networks (ICON), to simultaneously learn operators from the prompted
data and apply them to new questions during the inference stage, without any
weight update. Existing methods are limited to using a neural network to
approximate a specific equation solution or a specific operator, requiring
retraining when switching to a new problem with different equations. By
training a single neural network as an operator learner, we can not only get
rid of retraining (even fine-tuning) the neural network for new problems, but
also leverage the commonalities shared across operators so that only a few
demos in the prompt are needed when learning a new operator. Our numerical
results show the neural network’s capability as a few-shot operator learner for
a diverse range of differential equation problems, including forward and
inverse problems of ordinary differential equations (ODEs), partial
differential equations (PDEs), and mean-field control (MFC) problems, and also
show that it can generalize its learning capability to operators beyond the
training distribution.
[COMMENTS]
The second and third authors contributed equally. This is an outdated
preprint. Please refer to the updated version published in PNAS:
www.pnas.org/doi/10.1073/pnas.2310142120
See code at https://github.com/LiuYangMage/in-context-operator-networks
[LINK]
http://arxiv.org/abs/2304.07993v3
[DATE]
2023-09-20 04:00:39+08:00
[CATEGORIES]
cs.LG
Crypto’Graph: Leveraging Privacy-Preserving Distributed Link Prediction for Robust Graph Learning
[AUTHORS]
Sofiane Azogagh, Zelma Aubin Birba, Sébastien Gambs, Marc-Olivier Killijian
[ABSTRACT]
Graphs are a widely used data structure for collecting and analyzing
relational data. However, when the graph structure is distributed across
several parties, its analysis is particularly challenging. In particular, due
to the sensitivity of the data each party might want to keep their partial
knowledge of the graph private, while still being willing to collaborate with the
other parties for tasks of mutual benefit, such as data curation or the removal
of poisoned data. To address this challenge, we propose Crypto’Graph, an
efficient protocol for privacy-preserving link prediction on distributed
graphs. More precisely, it allows parties partially sharing a graph with
distributed links to infer the likelihood of formation of new links in the
future. Through the use of cryptographic primitives, Crypto’Graph is able to
compute the likelihood of these new links on the joint network without
revealing the structure of the private individual graph of each party, even
though they know the number of nodes they have, since they share the same graph
but not the same links. Crypto’Graph improves on previous works by enabling the
computation of a certain number of similarity metrics without any additional
cost. The use of Crypto’Graph is illustrated for defense against graph
poisoning attacks, in which it is possible to identify potential adversarial
links without compromising the privacy of the graphs of individual parties. The
effectiveness of Crypto’Graph in mitigating graph poisoning attacks and
achieving high prediction accuracy on a graph neural network node
classification task is demonstrated through extensive experimentation on a
real-world dataset.
[LINK]
http://arxiv.org/abs/2309.10890v1
[DATE]
2023-09-20 03:30:28+08:00
[CATEGORIES]
cs.LG
Deep Active Learning in the Presence of Label Noise: A Survey
[AUTHORS]
Moseli Mots’oehli, Kyungim Baek
[ABSTRACT]
Deep active learning has emerged as a powerful tool for training deep
learning models within a predefined labeling budget. These models have achieved
performances comparable to those trained in an offline setting. However, deep
active learning faces substantial issues when dealing with classification
datasets containing noisy labels. In this literature review, we discuss the
current state of deep active learning in the presence of label noise,
highlighting unique approaches, their strengths, and weaknesses. With the
recent success of vision transformers in image classification tasks, we provide
a brief overview and consider how the transformer layers and attention
mechanisms can be used to enhance diversity, importance, and uncertainty-based
selection in queries sent to an oracle for labeling. We further propose
exploring contrastive learning methods to derive good image representations
that can aid in selecting high-value samples for labeling in an active learning
setting. We also highlight the need for creating unified benchmarks and
standardized datasets for deep active learning in the presence of label noise
for image classification to promote the reproducibility of research. The review
concludes by suggesting avenues for future research in this area.
[COMMENTS]
20 pages, PhD literature review
[LINK]
http://arxiv.org/abs/2302.11075v2
[DATE]
2023-09-20 03:13:07+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Naty Peter, Eliad Tsfadia, Jonathan Ullman
[ABSTRACT]
Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC
2014), are the most widely used method for establishing lower bounds on the
sample complexity or error of approximately differentially private (DP)
algorithms. Still, there are many problems in differential privacy for which we
don’t know suitable lower bounds, and even for problems that we do, the lower
bounds are not smooth, and usually become vacuous when the error is larger than
some threshold.
In this work, we present a simple method to generate hard instances by
applying a padding-and-permuting transformation to a fingerprinting code. We
illustrate the applicability of this method by providing new lower bounds in
various settings:
[LINK]
http://arxiv.org/abs/2307.07604v2
[DATE]
2023-09-20 03:06:02+08:00
[CATEGORIES]
cs.LG
DeepliteRT: Computer Vision at the Edge
[AUTHORS]
Saad Ashfaq, Alexander Hoffman, Saptarshi Mitra, Sudhakar Sah, MohammadHossein AskariHemmat, Ehsan Saboori
[ABSTRACT]
The proliferation of edge devices has unlocked unprecedented opportunities
for deep learning model deployment in computer vision applications. However,
these complex models require considerable power, memory and compute resources
that are typically not available on edge platforms. Ultra low-bit quantization
presents an attractive solution to this problem by scaling down the model
weights and activations from 32-bit to less than 8-bit. We implement highly
optimized ultra low-bit convolution operators for ARM-based targets that
outperform existing methods by up to 4.34x. Our operator is implemented within
Deeplite Runtime (DeepliteRT), an end-to-end solution for the compilation,
tuning, and inference of ultra low-bit models on ARM devices. Compiler passes
in DeepliteRT automatically convert a fake-quantized model in full precision to
a compact ultra low-bit representation, easing the process of quantized model
deployment on commodity hardware. We analyze the performance of DeepliteRT on
classification and detection models against optimized 32-bit floating-point,
8-bit integer, and 2-bit baselines, achieving significant speedups of up to
2.20x, 2.33x and 2.17x, respectively.
[COMMENTS]
Accepted at British Machine Vision Conference (BMVC) 2023
[LINK]
http://arxiv.org/abs/2309.10878v1
[DATE]
2023-09-20 02:58:38+08:00
[CATEGORIES]
cs.LG
Conformal Prediction is Robust to Dispersive Label Noise
[AUTHORS]
Shai Feldman, Bat-Sheva Einbinder, Stephen Bates, Anastasios N. Angelopoulos, Asaf Gendler, Yaniv Romano
[ABSTRACT]
We study the robustness of conformal prediction, a powerful tool for
uncertainty quantification, to label noise. Our analysis tackles both
regression and classification problems, characterizing when and how it is
possible to construct uncertainty sets that correctly cover the unobserved
noiseless ground truth labels. We further extend our theory and formulate the
requirements for correctly controlling a general loss function, such as the
false negative proportion, with noisy labels. Our theory and experiments
suggest that conformal prediction and risk-controlling techniques with noisy
labels attain conservative risk over the clean ground truth labels except in
adversarial cases. In such cases, we can also correct for noise of bounded size
in the conformal prediction algorithm in order to ensure achieving the correct
risk of the ground truth labels without score or data regularity.
[LINK]
http://arxiv.org/abs/2209.14295v2
[DATE]
2023-09-20 02:50:28+08:00
[CATEGORIES]
cs.LG
Bayesian Exploration Networks
[AUTHORS]
Mattie Fellows, Brandon Kaplowitz, Christian Schroeder de Witt, Shimon Whiteson
[ABSTRACT]
Bayesian reinforcement learning (RL) offers a principled and elegant approach
for sequential decision making under uncertainty. Most notably, Bayesian agents
do not face an exploration/exploitation dilemma, a major pathology of
frequentist methods. A key challenge for Bayesian RL is the computational
complexity of learning Bayes-optimal policies, which is only tractable in toy
domains. In this paper we propose a novel model-free approach to address this
challenge. Rather than modelling uncertainty in high-dimensional state
transition distributions as model-based approaches do, we model uncertainty in
a one-dimensional Bellman operator. Our theoretical analysis reveals that
existing model-free approaches either do not propagate epistemic uncertainty
through the MDP or optimise over a set of contextual policies instead of all
history-conditioned policies. Both approximations yield policies that can be
arbitrarily Bayes-suboptimal. To overcome these issues, we introduce the
Bayesian exploration network (BEN) which uses normalising flows to model both
the aleatoric uncertainty (via density estimation) and epistemic uncertainty
(via variational inference) in the Bellman operator. In the limit of complete
optimisation, BEN learns true Bayes-optimal policies, but like in variational
expectation-maximisation, partial optimisation renders our approach tractable.
Empirical results demonstrate that BEN can learn true Bayes-optimal policies in
tasks where existing model-free approaches fail.
[COMMENTS]
Changed email contact for joint first author 2. Fixed minor typos
[LINK]
http://arxiv.org/abs/2308.13049v2
[DATE]
2023-09-20 02:36:08+08:00
[CATEGORIES]
cs.LG
Dynamical Tests of a Deep-Learning Weather Prediction Model
[AUTHORS]
Gregory J. Hakim, Sanjit Masanam
[ABSTRACT]
Global deep-learning weather prediction models have recently been shown to
produce forecasts that rival those from physics-based models run at operational
centers. It is unclear whether these models have encoded atmospheric dynamics,
or simply pattern matching that produces the smallest forecast error. Answering
this question is crucial to establishing the utility of these models as tools
for basic science. Here we subject one such model, Pangu-weather, to a set of
four classical dynamical experiments that do not resemble the model training
data. Localized perturbations to the model output and the initial conditions
are added to steady time-averaged conditions, to assess the propagation speed
and structural evolution of signals away from the local source. Perturbing the
model physics by adding a steady tropical heat source results in a classical
Matsuno–Gill response near the heating, and planetary waves that radiate into
the extratropics. A localized disturbance on the winter-averaged North Pacific
jet stream produces realistic extratropical cyclones and fronts, including the
spontaneous emergence of polar lows. Perturbing the 500hPa height field alone
yields adjustment from a state of rest to one of wind–pressure balance over ~6
hours. Localized subtropical low pressure systems produce Atlantic hurricanes,
provided the initial amplitude exceeds about 5 hPa, and setting the initial
humidity to zero eliminates hurricane development. We conclude that the model
encodes realistic physics in all experiments, and suggest it can be used as a
tool for rapidly testing ideas before using expensive physics-based models.
[LINK]
http://arxiv.org/abs/2309.10867v1
[DATE]
2023-09-20 02:26:41+08:00
[CATEGORIES]
cs.LG
Assessing the capacity of a denoising diffusion probabilistic model to reproduce spatial context
[AUTHORS]
Rucha Deshpande, Muzaffer Özbey, Hua Li, Mark A. Anastasio, Frank J. Brooks
[ABSTRACT]
Diffusion models have emerged as a popular family of deep generative models
(DGMs). In the literature, it has been claimed that one class of diffusion
models – denoising diffusion probabilistic models (DDPMs) – demonstrates
superior image synthesis performance as compared to generative adversarial
networks (GANs). To date, these claims have been evaluated using either
ensemble-based methods designed for natural images, or conventional measures of
image quality such as structural similarity. However, there remains an
important need to understand the extent to which DDPMs can reliably learn
medical imaging domain-relevant information, which is referred to as ‘spatial
context’ in this work. To address this, a systematic assessment of the ability
of DDPMs to learn spatial context relevant to medical imaging applications is
reported for the first time. A key aspect of the studies is the use of
stochastic context models (SCMs) to produce training data. In this way, the
ability of the DDPMs to reliably reproduce spatial context can be
quantitatively assessed by use of post-hoc image analyses. Error-rates in
DDPM-generated ensembles are reported, and compared to those corresponding to a
modern GAN. The studies reveal new and important insights regarding the
capacity of DDPMs to learn spatial context. Notably, the results demonstrate
that DDPMs hold significant capacity for generating contextually correct images
that are ‘interpolated’ between training samples, which may benefit
data-augmentation tasks in ways that GANs cannot.
[COMMENTS]
This paper is under consideration at IEEE TMI
[LINK]
http://arxiv.org/abs/2309.10817v1
[DATE]
2023-09-20 01:58:35+08:00
[CATEGORIES]
cs.LG
Multi-Context Dual Hyper-Prior Neural Image Compression
[AUTHORS]
Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Mohammad Akyash, Hossein Kashiani, Nasser M. Nasrabadi
[ABSTRACT]
Transform and entropy models are the two core components in deep image
compression neural networks. Most existing learning-based image compression
methods utilize convolutional-based transform, which lacks the ability to model
long-range dependencies, primarily due to the limited receptive field of the
convolution operation. To address this limitation, we propose a
Transformer-based nonlinear transform. This transform has the remarkable
ability to efficiently capture both local and global information from the input
image, leading to a more decorrelated latent representation. In addition, we
introduce a novel entropy model that incorporates two different hyperpriors to
model cross-channel and spatial dependencies of the latent representation. To
further improve the entropy model, we add a global context that leverages
distant relationships to predict the current latent more accurately. This
global context employs a causal attention mechanism to extract long-range
information in a content-dependent manner. Our experiments show that our
proposed framework performs better than the state-of-the-art methods in terms
of rate-distortion performance.
[COMMENTS]
Accepted to IEEE 22$^{nd}$ International Conference on Machine Learning
and Applications 2023 (ICMLA) - Selected for Oral Presentation
[LINK]
http://arxiv.org/abs/2309.10799v1
[DATE]
2023-09-20 01:44:44+08:00
[CATEGORIES]
cs.LG
Context-Aware Neural Video Compression on Solar Dynamics Observatory
[AUTHORS]
Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva
[ABSTRACT]
NASA’s Solar Dynamics Observatory (SDO) mission collects large data volumes
of the Sun’s daily activity. Data compression is crucial for space missions to
reduce data storage and video bandwidth requirements by eliminating
redundancies in the data. In this paper, we present a novel neural
Transformer-based video compression approach specifically designed for the SDO
images. Our primary objective is to efficiently exploit the temporal and
spatial redundancies inherent in solar images to obtain a high compression
ratio. Our proposed architecture benefits from a novel Transformer block called
Fused Local-aware Window (FLaWin), which incorporates window-based
self-attention modules and an efficient fused local-aware feed-forward (FLaFF)
network. This architectural design allows us to simultaneously capture
short-range and long-range information while facilitating the extraction of
rich and diverse contextual representations. Moreover, this design choice
results in reduced computational complexity. Experimental results demonstrate
the significant contribution of the FLaWin Transformer block to the compression
performance, outperforming conventional hand-engineered video codecs such as
H.264 and H.265 in terms of rate-distortion trade-off.
[COMMENTS]
Accepted to IEEE 22$^{nd}$ International Conference on Machine
Learning and Applications 2023 (ICMLA) - Selected for Oral Presentation
[LINK]
http://arxiv.org/abs/2309.10784v1
[DATE]
2023-09-20 01:33:12+08:00
[CATEGORIES]
cs.LG
$O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds
[AUTHORS]
Andrew Lee, Harlin Lee, Jose A. Perea, Nikolas Schonsheck, Madeleine Weinstein
[ABSTRACT]
Many real-world datasets live on high-dimensional Stiefel and Grassmannian
manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and
benefit from projection onto lower-dimensional Stiefel (respectively,
Grassmannian) manifolds. In this work, we propose an algorithm called Principal
Stiefel Coordinates (PSC) to reduce data dimensionality from $
V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an $O(k)$-equivariant manner ($k
\leq n \ll N$). We begin by observing that each element $\alpha \in
V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into
$V_k(\mathbb{R}^N)$. Next, we optimize for such an embedding map that minimizes
data fit error by warm-starting with the output of principal component analysis
(PCA) and applying gradient descent. Then, we define a continuous and
$O(k)$-equivariant map $\pi_\alpha$ that acts as a “closest point operator”
to project the data onto the image of $V_k(\mathbb{R}^n)$ in
$V_k(\mathbb{R}^N)$ under the embedding determined by $\alpha$, while
minimizing distortion. Because this dimensionality reduction is
$O(k)$-equivariant, these results extend to Grassmannian manifolds as well.
Lastly, we show that the PCA output globally minimizes projection error in a
noiseless setting, but that our algorithm achieves a meaningfully different and
improved outcome when the data does not lie exactly on the image of a linearly
embedded lower-dimensional Stiefel manifold as above. Multiple numerical
experiments using synthetic and real-world data are performed.
[COMMENTS]
26 pages, 8 figures, comments welcome!
[LINK]
http://arxiv.org/abs/2309.10775v1
[DATE]
2023-09-20 01:21:12+08:00
[CATEGORIES]
cs.LG
Semi-supervised Domain Adaptation in Graph Transfer Learning
[AUTHORS]
Ziyue Qiao, Xiao Luo, Meng Xiao, Hao Dong, Yuanchun Zhou, Hui Xiong
[ABSTRACT]
As a specific case of graph transfer learning, unsupervised domain adaptation
on graphs aims for knowledge transfer from label-rich source graphs to
unlabeled target graphs. However, graphs with topology and attributes usually
have considerable cross-domain disparity and there are numerous real-world
scenarios where merely a subset of nodes are labeled in the source graph. This
imposes critical challenges on graph transfer learning due to serious domain
shifts and label scarcity. To address these challenges, we propose a method
named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain
shift, we add adaptive shift parameters to each of the source nodes, which are
trained in an adversarial manner to align the cross-domain distributions of
node embedding, thus the node classifier trained on labeled source nodes can be
transferred to the target nodes. Moreover, to address the label scarcity, we
propose pseudo-labeling on unlabeled nodes, which improves classification on
the target graph via measuring the posterior influence of nodes based on their
relative position to the class centroids. Finally, extensive experiments on a
range of publicly accessible datasets validate the effectiveness of our
proposed SGDA in different experimental settings.
[LINK]
http://arxiv.org/abs/2309.10773v1
[DATE]
2023-09-20 01:20:58+08:00
[CATEGORIES]
cs.LG
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
[AUTHORS]
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Bregier, Matthieu Armando, Jean-Sebastien Franco, Gregory Rogez
[ABSTRACT]
Recent hand-object interaction datasets show limited real object variability
and rely on fitting the MANO parametric model to obtain groundtruth hand
shapes. To go beyond these limitations and spur further research, we introduce
the SHOWMe dataset which consists of 96 videos, annotated with real and
detailed hand-object 3D textured meshes. Following recent work, we consider a
rigid hand-object scenario, in which the pose of the hand with respect to the
object remains constant during the whole video sequence. This assumption allows
us to register sub-millimetre-precise groundtruth 3D scans to the image
sequences in SHOWMe. Although simpler, this hypothesis makes sense in terms of
applications where the required accuracy and level of detail are important, e.g.,
object hand-over in human-robot collaboration, object scanning, or manipulation
and contact point analysis. Importantly, the rigidity of the hand-object
systems allows us to tackle video-based 3D reconstruction of unknown hand-held
objects using a 2-stage pipeline consisting of a rigid registration step
followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set
of non-trivial baselines for these two stages and show that it is possible to
achieve promising object-agnostic 3D hand-object reconstructions employing an
SfM toolbox or a hand pose estimator to recover the rigid transforms and
off-the-shelf MVR algorithms. However, these methods remain sensitive to the
initial camera pose estimates which might be imprecise due to lack of textures
on the objects or heavy occlusions of the hands, leaving room for improvements
in the reconstruction. Code and dataset are available at
https://europe.naverlabs.com/research/showme
[COMMENTS]
Paper and Appendix, Accepted in ACVR workshop at ICCV conference
[LINK]
http://arxiv.org/abs/2309.10748v1
[DATE]
2023-09-20 00:48:29+08:00
[CATEGORIES]
cs.LG
Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
[AUTHORS]
Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi
[ABSTRACT]
Diffusion models power a vast majority of text-to-audio (TTA) generation
methods. Unfortunately, these models suffer from slow inference speed due to
iterative queries to the underlying denoising network, making them unsuitable for
scenarios with inference time or computational constraints. This work modifies
the recently proposed consistency distillation framework to train TTA models
that require only a single neural network query. In addition to incorporating
classifier-free guidance into the distillation process, we leverage the
availability of generated audio during distillation training to fine-tune the
consistency TTA model with novel loss functions in the audio space, such as the
CLAP score. Our objective and subjective evaluation results on the AudioCaps
dataset show that consistency models retain diffusion models’ high generation
quality and diversity while reducing the number of queries by a factor of 400.
[LINK]
http://arxiv.org/abs/2309.10740v1
[DATE]
2023-09-20 00:36:33+08:00
[CATEGORIES]
cs.LG
Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation
[AUTHORS]
Yuyang Deng, Ilja Kuzborskij, Mehrdad Mahdavi
[ABSTRACT]
We consider the problem of learning a model from multiple heterogeneous
sources with the goal of performing well on a new target distribution. The goal
of the learner is to mix these data sources in a target-distribution-aware way and
simultaneously minimize the empirical risk on the mixed source. The literature
has made some tangible advancements in establishing the theory of learning on
mixture domains. However, two problems remain unsolved: first, how to
estimate the optimal mixture of sources given a target domain; second, when
there are numerous target domains, how to solve empirical risk minimization
(ERM) for each target, using a possibly unique mixture of data sources, in a
computationally efficient manner. In this paper we address both problems
efficiently and with guarantees. We cast the first problem, mixture weight
estimation, as a convex-nonconcave compositional minimax problem, and propose
an efficient stochastic algorithm with provable stationarity guarantees. Next,
for the second problem, we identify that for certain regimes, solving ERM for
each target domain individually can be avoided, and instead parameters for a
target optimal model can be viewed as a non-linear function on a space of the
mixture coefficients. Building upon this, we show that in the offline setting,
a GD-trained overparameterized neural network can provably learn such a function
to predict the model of the target domain instead of solving a designated ERM
problem. Finally, we also consider an online setting and propose a label
efficient online algorithm, which predicts parameters for new targets given an
arbitrary sequence of mixing coefficients, while enjoying regret guarantees.
[LINK]
http://arxiv.org/abs/2309.10736v1
[DATE]
2023-09-20 00:29:34+08:00
[CATEGORIES]
cs.LG
Promoting Fairness in GNNs: A Characterization of Stability
[AUTHORS]
Yaning Jia, Chunhui Zhang
[ABSTRACT]
The Lipschitz bound, a technique from robust statistics, can limit the
maximum changes in the output concerning the input, taking into account
associated irrelevant biased factors. It is an efficient and provable method
for examining the output stability of machine learning models without incurring
additional computation costs. Recently, Graph Neural Networks (GNNs), which
operate on non-Euclidean data, have gained significant attention. However, no
previous research has investigated the GNN Lipschitz bounds to shed light on
stabilizing model outputs, especially when working on non-Euclidean data with
inherent biases. Given the inherent biases in common graph data used for GNN
training, it poses a serious challenge to constraining the GNN output
perturbations induced by input biases, thereby safeguarding fairness during
training. Recently, despite the Lipschitz constant’s use in controlling the
stability of Euclidean neural networks, the calculation of the precise Lipschitz
constant remains elusive for non-Euclidean neural networks like GNNs,
especially within fairness contexts. To narrow this gap, we begin with the
general GNNs operating on an attributed graph, and formulate a Lipschitz bound
to limit the changes in the output regarding biases associated with the input.
Additionally, we theoretically analyze how the Lipschitz constant of a GNN
model could constrain the output perturbations induced by biases learned from
data for fairness training. We experimentally validate the Lipschitz bound’s
effectiveness in limiting biases of the model output. Finally, from a training
dynamics perspective, we demonstrate why the theoretical Lipschitz bound can
effectively guide the GNN training to better trade-off between accuracy and
fairness.
[LINK]
http://arxiv.org/abs/2309.03648v2
[DATE]
2023-09-20 00:20:42+08:00
[CATEGORIES]
cs.LG
[AUTHORS]
Yonggan Fu, Yongan Zhang, Zhongzhi Yu, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, Yingyan Lin
[ABSTRACT]
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by leveraging human natural
languages instead of domain-specific languages. Specifically, we first perform
an in-depth investigation into LLMs’ limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Furthermore, drawing inspiration from the above insights, we develop a
framework called GPT4AIGChip, which features an automated demo-augmented
prompt-generation pipeline utilizing in-context learning to guide LLMs towards
creating high-quality AI accelerator design. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.
[COMMENTS]
Accepted by ICCAD 2023
[LINK]
http://arxiv.org/abs/2309.10730v1
[DATE]
2023-09-20 00:14:57+08:00
[CATEGORIES]
cs.LG
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
[AUTHORS]
Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang
[ABSTRACT]
Large language models (LLMs) with billions of parameters have demonstrated
outstanding performance on various natural language processing tasks. This
report presents OpenBA, an open-sourced 15B bilingual asymmetric seq2seq model,
to contribute an LLM variant to the Chinese-oriented open-source model
community. We enhance OpenBA with effective and efficient techniques as well as
adopt a three-stage training strategy to train the model from scratch. Our
solution achieves very competitive performance with only 380B tokens,
outperforming LLaMA-70B on the BELEBELE benchmark, BLOOM-176B on the
MMLU benchmark, and GLM-130B on the C-Eval (hard) benchmark. This report provides
the main details to pre-train an analogous model, including pre-training data
processing, Bilingual Flan data collection, the empirical observations that
inspire our model architecture design, training objectives of different stages,
and other enhancement techniques. We have refactored our code to follow the
design principles of the Huggingface Transformers Library, making it more
convenient for developers to use, and released checkpoints of different
training stages at https://huggingface.co/openBA. More details of our project
are available at https://github.com/OpenNLG/openBA.git.
[LINK]
http://arxiv.org/abs/2309.10706v1
[DATE]
2023-09-19 23:46:40+08:00
[CATEGORIES]
cs.CL
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
[AUTHORS]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
[LINK]
http://arxiv.org/abs/1910.10683v4
[DATE]
2023-09-19 23:14:48+08:00
[CATEGORIES]
cs.LG
cs.CL
Language Modeling Is Compression
[AUTHORS]
Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
[ABSTRACT]
It has long been established that predictive models can be transformed into
lossless compressors and vice versa. Incidentally, in recent years, the machine
learning community has focused on training increasingly large and powerful
self-supervised (language) models. Since these large language models exhibit
impressive predictive capabilities, they are well-positioned to be strong
compressors. In this work, we advocate for viewing the prediction problem
through the lens of compression and evaluate the compression capabilities of
large (foundation) models. We show that large language models are powerful
general-purpose predictors and that the compression viewpoint provides novel
insights into scaling laws, tokenization, and in-context learning. For example,
Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to
43.4% and LibriSpeech samples to 16.4% of their raw size, beating
domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively.
Finally, we show that the prediction-compression equivalence allows us to use
any compressor (like gzip) to build a conditional generative model.
[LINK]
http://arxiv.org/abs/2309.10668v1
[DATE]
2023-09-19 22:50:38+08:00
[CATEGORIES]
cs.LG
cs.CL
CFGPT: Chinese Financial Assistant with Large Language Model
[AUTHORS]
Jiangtong Li, Yuxuan Bian, Guoxuan Wang, Yang Lei, Dawei Cheng, Zhijun Ding, Changjun Jiang
[ABSTRACT]
Large language models (LLMs) have demonstrated great potential in natural
language processing tasks within the financial domain. In this work, we present
a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT,
which includes a dataset (CFData) for pre-training and supervised fine-tuning,
a financial LLM (CFLLM) to adeptly manage financial texts, and a deployment
framework (CFAPP) designed to navigate real-world financial applications.
CFData comprises both a pre-training dataset and a supervised fine-tuning
dataset, where the pre-training dataset collates Chinese financial data and
analytics, alongside a smaller subset of general-purpose text with 584M
documents and 141B tokens in total, and the supervised fine-tuning dataset is
tailored for six distinct financial tasks, embodying various facets of
financial analysis and decision-making with 1.5M instruction pairs and 1.5B
tokens in total. The CFLLM, which is based on InternLM-7B to balance the model
capability and size, is trained on CFData in two stages: continued pre-training
and supervised fine-tuning. The CFAPP is centered on large language models
(LLMs) and augmented with additional modules to ensure multifaceted
functionality in real-world applications. Our code is released at
https://github.com/TongjiFinLab/CFGPT.
[COMMENTS]
12 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.10654v1
[DATE]
2023-09-19 22:34:01+08:00
[CATEGORIES]
cs.CL
ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs
[AUTHORS]
Yucheng Shi, Hehuan Ma, Wenliang Zhong, Qiaoyu Tan, Gengchen Mai, Xiang Li, Tianming Liu, Junzhou Huang
[ABSTRACT]
ChatGPT, as a recently launched large language model (LLM), has shown
superior performance in various natural language processing (NLP) tasks.
However, two major limitations hinder its potential applications: (1) the
inflexibility of finetuning on downstream tasks and (2) the lack of
interpretability in the decision-making process. To tackle these limitations,
we propose a novel framework that leverages the power of ChatGPT for specific
tasks, such as text classification, while improving its interpretability. The
proposed framework conducts a knowledge graph extraction task to extract
refined and structural knowledge from the raw data using ChatGPT. The rich
knowledge is then converted into a graph, which is further used to train an
interpretable linear classifier to make predictions. To evaluate the
effectiveness of our proposed method, we conduct experiments on four datasets.
The result shows that our method can significantly improve the performance
compared to directly utilizing ChatGPT for text classification tasks. And our
method provides a more transparent decision-making process compared with
previous text classification methods.
[COMMENTS]
6 pages, 2 figures
[LINK]
http://arxiv.org/abs/2305.03513v2
[DATE]
2023-09-19 22:26:17+08:00
[CATEGORIES]
cs.CL
cs.LG
Attention Is Not All You Need Anymore
[AUTHORS]
Zhe Chen
[ABSTRACT]
In recent years, the popular Transformer architecture has achieved great
success in many application areas, including natural language processing and
computer vision. Many existing works aim to reduce the computational and memory
complexity of the self-attention mechanism in the Transformer by trading off
performance. However, performance is key for the continuing success of the
Transformer. In this paper, a family of drop-in replacements for the
self-attention mechanism in the Transformer, called the Extractors, is
proposed. Four types of the Extractors, namely the super high-performance
Extractor (SHE), the higher-performance Extractor (HE), the worthwhile
Extractor (WE), and the minimalist Extractor (ME), are proposed as examples.
Experimental results show that replacing the self-attention mechanism with the
SHE evidently improves the performance of the Transformer, whereas the
simplified versions of the SHE, i.e., the HE, the WE, and the ME, perform close
to or better than the self-attention mechanism with less computational and
memory complexity. Furthermore, the proposed Extractors have the potential or
are able to run faster than the self-attention mechanism since their critical
paths of computation are much shorter. Additionally, the sequence prediction
problem in the context of text generation is formulated using variable-length
discrete-time Markov chains, and the Transformer is reviewed based on our
understanding.
[LINK]
http://arxiv.org/abs/2308.07661v2
[DATE]
2023-09-19 21:32:07+08:00
[CATEGORIES]
cs.LG
cs.CL
Improving Medical Dialogue Generation with Abstract Meaning Representations
[AUTHORS]
Bohao Yang, Chen Tang, Chenghua Lin
[ABSTRACT]
Medical Dialogue Generation serves a critical role in telemedicine by
facilitating the dissemination of medical expertise to patients. Existing
studies focus on incorporating textual representations, which have limited
their ability to represent the semantics of text, such as ignoring important
medical entities. To enhance the model’s understanding of the textual semantics
and the medical knowledge including entities and relations, we introduce the
use of Abstract Meaning Representations (AMR) to construct graphical
representations that delineate the roles of language constituents and medical
entities within the dialogues. In this paper, we propose a novel framework that
models dialogues between patients and healthcare professionals using AMR
graphs, where the neural networks incorporate textual and graphical knowledge
with a dual attention mechanism. Experimental results show that our framework
outperforms strong baseline models in medical dialogue generation,
demonstrating the effectiveness of AMR graphs in enhancing the representations
of medical knowledge and logical relationships. Furthermore, to support future
research in this domain, we provide the corresponding source code at
https://github.com/Bernard-Yang/MedDiaAMR.
[COMMENTS]
Submitted to ICASSP 2023
[LINK]
http://arxiv.org/abs/2309.10608v1
[DATE]
2023-09-19 21:31:49+08:00
[CATEGORIES]
cs.CL
Unsupervised Deep Cross-Language Entity Alignment
[AUTHORS]
Chuanyu Jiang, Yiming Qian, Lijun Chen, Yang Gu, Xia Xie
[ABSTRACT]
Cross-lingual entity alignment is the task of finding the same semantic
entities from different language knowledge graphs. In this paper, we propose a
simple and novel unsupervised method for cross-language entity alignment. We
utilize the deep learning multi-language encoder combined with a machine
translator to encode knowledge graph text, which reduces the reliance on label
data. Unlike traditional methods that only emphasize global or local alignment,
our method simultaneously considers both alignment strategies. We first view
the alignment task as a bipartite matching problem and then adopt the
re-exchanging idea to accomplish alignment. Compared with the traditional
bipartite matching algorithm that only gives one optimal solution, our
algorithm generates ranked matching results, which enable many potential
downstream tasks. Additionally, our method can adapt to two different types of
optimization (minimal and maximal) in the bipartite matching process, which
provides more flexibility. Our evaluation shows that our method scores 0.966,
0.990, and 0.996 Hits@1 on the DBP15K dataset in the Chinese, Japanese, and
French to English alignment tasks, respectively. We outperform the
state-of-the-art methods in the unsupervised and semi-supervised categories.
Compared with the state-of-the-art supervised method, our method is 2.6% and
0.4% higher in the Ja-En and Fr-En alignment tasks, while being marginally
lower by 0.2% in the Zh-En alignment task.
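The Hits@1 figures above are a ranking metric. As a minimal sketch of how such a
score could be computed from ranked matching results (the function name
hits_at_k, the toy similarity matrix, and the gold indices are illustrative
assumptions, not taken from the paper's code):

```python
from typing import Sequence


def hits_at_k(similarity: Sequence[Sequence[float]],
              gold: Sequence[int],
              k: int = 1) -> float:
    """Fraction of source entities whose gold target is among the
    k highest-scoring candidate targets."""
    hits = 0
    for row, gold_idx in zip(similarity, gold):
        # Rank candidate target indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold_idx in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy data (made up): 3 source entities scored against 3 target entities.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
gold = [0, 1, 1]  # hypothetical index of the correct target for each source

score_at_1 = hits_at_k(sim, gold, k=1)
score_at_2 = hits_at_k(sim, gold, k=2)
```

A ranked output of this kind is what allows Hits@k to be reported for k > 1; a
single optimal assignment would only support k = 1.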
[COMMENTS]
17 pages, 5 figures, accepted by ECML PKDD 2023 (Research Track)
[LINK]
http://arxiv.org/abs/2309.10598v1
[DATE]
2023-09-19 21:12:48+08:00
[CATEGORIES]
cs.CL
cs.LG
Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning
[AUTHORS]
Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen
[ABSTRACT]
Prompt learning approaches have made waves in natural language processing by
inducing better few-shot performance, yet they still follow a parametric-based
learning paradigm, in which forgetting and rote memorization during learning may
lead to unstable generalization. Specifically, vanilla prompt learning
may struggle to utilize atypical instances by rote during fully-supervised
training or overfit shallow patterns with low-shot data. To alleviate such
limitations, we develop RetroPrompt with the motivation of decoupling knowledge
from memorization to help the model strike a balance between generalization and
memorization. In contrast with vanilla prompt learning, RetroPrompt constructs
an open-book knowledge-store from training instances and implements a retrieval
mechanism during the process of input, training and inference, thus equipping
the model with the ability to retrieve related contexts from the training
corpus as cues for enhancement. Extensive experiments demonstrate that
RetroPrompt can obtain better performance in both few-shot and zero-shot
settings. Besides, we further illustrate that our proposed RetroPrompt can
yield better generalization abilities with new datasets. Detailed analysis of
memorization indeed reveals that RetroPrompt can reduce the reliance of language
models on memorization, thus improving generalization for downstream tasks.
Code is available at
https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.
[COMMENTS]
NeurIPS 2022 (Spotlight)
[LINK]
http://arxiv.org/abs/2205.14704v5
[DATE]
2023-09-19 20:33:09+08:00
[CATEGORIES]
cs.CL
cs.LG
Contrastive Demonstration Tuning for Pre-trained Language Models
[AUTHORS]
Xiaozhuan Liang, Ningyu Zhang, Siyuan Cheng, Zhenru Zhang, Chuanqi Tan, Huajun Chen
[ABSTRACT]
Pretrained language models can be effectively stimulated by textual prompts
or demonstrations, especially in low-data scenarios. Recent works have focused
on automatically searching discrete or continuous prompts or optimized
verbalizers, yet studies for the demonstration are still limited. Concretely,
the demonstration examples are crucial for an excellent final performance of
prompt-tuning. In this paper, we propose a novel pluggable, extensible, and
efficient approach named contrastive demonstration tuning, which is free of
demonstration sampling. Furthermore, the proposed approach can be: (i) Plugged
into any previous prompt-tuning approaches; (ii) Extended to widespread
classification tasks with a large number of categories. Experimental results on
16 datasets illustrate that our method integrated with previous approaches
LM-BFF and P-tuning can yield better performance. Code is available at
https://github.com/zjunlp/PromptKG/tree/main/research/Demo-Tuning.
[COMMENTS]
Accepted to EMNLP 2022 (Findings)
[LINK]
http://arxiv.org/abs/2204.04392v4
[DATE]
2023-09-19 20:27:36+08:00
[CATEGORIES]
cs.CL
cs.LG
Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt Tuning
[AUTHORS]
Xiang Chen, Lei Li, Ningyu Zhang, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen
[ABSTRACT]
Pre-trained language models have contributed significantly to relation
extraction by demonstrating remarkable few-shot learning abilities. However,
prompt tuning methods for relation extraction may still fail to generalize to
those rare or hard patterns. Note that the previous parametric learning
paradigm can be viewed as memorization, regarding training data as a book and
inference as a closed-book test. Those long-tailed or hard patterns can hardly
be memorized in parameters given few-shot instances. To this end, we regard RE
as an open-book examination and propose a new semiparametric paradigm of
retrieval-enhanced prompt tuning for relation extraction. We construct an
open-book datastore for retrieval regarding prompt-based instance
representations and corresponding relation labels as memorized key-value pairs.
During inference, the model can infer relations by linearly interpolating the
base output of PLM with the non-parametric nearest neighbor distribution over
the datastore. In this way, our model not only infers relation through
knowledge stored in the weights during training but also assists
decision-making by unwinding and querying examples in the open-book datastore.
Extensive experiments on benchmark datasets show that our method can achieve
state-of-the-art results in both standard supervised and few-shot settings.
Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetrievalRE.
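The inference step described above (linearly interpolating the PLM output with a
nearest-neighbour distribution over the datastore) can be sketched as follows.
This is an illustration under stated assumptions, not the authors'
implementation: the squared-Euclidean distance, the softmax over negative
distances, and the names knn_distribution, interpolate, lam, and temperature
are all hypothetical choices.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k, temperature=1.0):
    """Build a label distribution from the k nearest datastore entries.

    datastore is a list of (key_vector, relation_label) pairs.
    Weighting neighbours by exp(-distance / temperature) is an assumption.
    """
    scored = []
    for key, label in datastore:
        dist = sum((q - x) ** 2 for q, x in zip(query, key))
        scored.append((dist, label))
    nearest = sorted(scored, key=lambda item: item[0])[:k]

    weights = defaultdict(float)
    for dist, label in nearest:
        weights[label] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def interpolate(p_model, p_knn, lam):
    """Linear interpolation: lam * p_knn + (1 - lam) * p_model."""
    labels = set(p_model) | set(p_knn)
    return {
        y: lam * p_knn.get(y, 0.0) + (1.0 - lam) * p_model.get(y, 0.0)
        for y in labels
    }


# Toy datastore of (instance representation, relation label) pairs.
store = [([0.0, 0.0], "born_in"), ([0.0, 1.0], "born_in"), ([5.0, 5.0], "works_for")]
p_model = {"born_in": 0.4, "works_for": 0.6}   # hypothetical PLM output
p_knn = knn_distribution([0.0, 0.0], store, k=2)
p_final = interpolate(p_model, p_knn, lam=0.5)
```

In this toy case the two nearest neighbours both carry the label "born_in", so
the retrieved evidence shifts the final prediction away from the base model's
preferred label.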
[COMMENTS]
Accepted by SIGIR 2022, short paper
[LINK]
http://arxiv.org/abs/2205.02355v2
[DATE]
2023-09-19 20:21:53+08:00
[CATEGORIES]
cs.CL
cs.LG
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings
[AUTHORS]
Danushka Bollegala, Shuichi Otake, Tomoya Machide, Ken-ichi Kawarabayashi
[ABSTRACT]
We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism
considering the neighbourhood of a word in a pretrained static word embedding
space to determine the minimal amount of noise required to guarantee a
specified privacy level. We first construct a nearest neighbour graph over the
words using their embeddings, and factorise it into a set of connected
components (i.e. neighbourhoods). We then separately apply different levels of
Gaussian noise to the words in each neighbourhood, determined by the set of
words in that neighbourhood. Experiments show that our proposed NADP mechanism
consistently outperforms multiple previously proposed DP mechanisms such as
Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while
guaranteeing higher levels of privacy.
[COMMENTS]
Accepted to IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.10551v1
[DATE]
2023-09-19 19:58:08+08:00
[CATEGORIES]
cs.LG
cs.CL
Model Leeching: An Extraction Attack Targeting LLMs
[AUTHORS]
Lewis Birch, William Hackett, Stefan Trawicki, Neeraj Suri, Peter Garraghan
[ABSTRACT]
Model Leeching is a novel extraction attack targeting Large Language Models
(LLMs), capable of distilling task-specific knowledge from a target LLM into a
reduced parameter model. We demonstrate the effectiveness of our attack by
extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match
(EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%,
respectively, for only $50 in API cost. We further demonstrate the feasibility
of adversarial attack transferability from a model extracted via
Model Leeching to perform ML attack staging against a target LLM, resulting in
an 11% increase in attack success rate when applied to ChatGPT-3.5-Turbo.
[LINK]
http://arxiv.org/abs/2309.10544v1
[DATE]
2023-09-19 19:45:29+08:00
[CATEGORIES]
cs.LG
cs.CL
NSOAMT – New Search Only Approach to Machine Translation
[AUTHORS]
João Luís, Diogo Cardoso, José Marques, Luís Campos
[ABSTRACT]
Translation automation mechanisms and tools have been developed for several
years to bring people who speak different languages together. A “new search
only approach to machine translation” was adopted to tackle some of the
slowness and inaccuracy of the other technologies. The idea is to develop a
solution that, by indexing an incremental set of words that combine a certain
semantic meaning, makes it possible to create a process of correspondence
between their native language record and the language of translation. This
research principle assumes that the vocabulary used in a given type of
publication/document is relatively limited in terms of language style and word
diversity, which enhances the immediacy and rigor of the translation process
through indexing. A volume of electronic text documents was processed and
loaded into a database, then analyzed and measured in order to confirm this
premise. Although the observed and
projected metric values did not give encouraging results, it was possible to
develop and make available a translation tool using this approach.
[COMMENTS]
17 pages, 13 figures, 12 tables
[LINK]
http://arxiv.org/abs/2309.10526v1
[DATE]
2023-09-19 19:12:21+08:00
[CATEGORIES]
cs.CL
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
[AUTHORS]
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi
[ABSTRACT]
We present a novel integration of an instruction-tuned large language model
(LLM) and end-to-end automatic speech recognition (ASR). Modern LLMs can
perform a wide range of linguistic tasks within zero-shot learning when
provided with a precise instruction or a prompt to guide the text generation
process towards the desired task. We explore using this zero-shot capability of
LLMs to extract linguistic information that can contribute to improving ASR
performance. Specifically, we direct an LLM to correct grammatical errors in an
ASR hypothesis and harness the embedded linguistic knowledge to conduct
end-to-end ASR. The proposed model is built on the hybrid connectionist
temporal classification (CTC) and attention architecture, where an
instruction-tuned LLM (i.e., Llama2) is employed as a front-end of the decoder.
An ASR hypothesis, subject to correction, is obtained from the encoder via CTC
decoding, which is then fed into the LLM along with an instruction. The decoder
subsequently takes as input the LLM embeddings to perform sequence generation,
incorporating acoustic information from the encoder output. Experimental
results and analyses demonstrate that the proposed integration yields promising
performance improvements, and our approach largely benefits from LLM-based
rescoring.
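The abstract obtains the ASR hypothesis from the encoder via CTC decoding. A
minimal sketch of one common variant, greedy (best-path) CTC decoding, is given
below; whether the paper uses greedy or beam-search CTC decoding is not stated
here, and the function name and blank index are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output sequence:
    merge consecutive repeats, then drop blank symbols."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Per-frame argmax token ids (made-up values); 0 is the blank symbol.
frames = [0, 1, 1, 0, 1, 2, 2, 0]
hypothesis = ctc_greedy_decode(frames)
```

The blank between the two runs of label 1 is what lets the decoder emit the
same token twice in a row.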
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.10524v1
[DATE]
2023-09-19 19:10:50+08:00
[CATEGORIES]
cs.CL
Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval
[AUTHORS]
Nengzheng Jin, Dongfang Li, Junying Chen, Joanna Siebert, Qingcai Chen
[ABSTRACT]
Open-domain table question answering aims to provide answers to a question by
retrieving and extracting information from a large collection of tables.
Existing studies of open-domain table QA either directly adopt text retrieval
methods or consider the table structure only in the encoding layer for table
retrieval, which may cause syntactical and structural information loss during
table scoring. To address this issue, we propose a syntax- and structure-aware
retrieval method for the open-domain table QA task. It provides syntactical
representations for the question and uses the structural header and value
representations for the tables to avoid the loss of fine-grained syntactical
and structural information. Then, a syntactical-to-structural aggregator is
used to obtain the matching score between the question and a candidate table by
mimicking the human retrieval process. Experimental results show that our
method achieves the state-of-the-art on the NQ-tables dataset and overwhelms
strong baselines on a newly curated open-domain Text-to-SQL dataset.
[COMMENTS]
IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.10506v1
[DATE]
2023-09-19 18:40:09+08:00
[CATEGORIES]
cs.CL
Large Language Models are Diverse Role-Players for Summarization Evaluation
[AUTHORS]
Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang
[ABSTRACT]
Text summarization has a wide range of applications in many scenarios. The
evaluation of the quality of the generated text is a complex problem. A big
challenge to language evaluation is that there is a clear divergence between
existing metrics and human evaluation. A document summary’s quality can be
assessed by human annotators on various criteria, both objective ones like
grammar and correctness, and subjective ones like informativeness,
succinctness, and appeal. Most automatic evaluation methods, such as
BLEU/ROUGE, may not be able to adequately capture the above dimensions. In this
paper, we propose a new LLM-based evaluation framework that provides a
comprehensive evaluation by comparing generated text and reference
text from both objective and subjective aspects. First, we propose to model
objective and subjective dimensions of generated text based on a roleplayer
prompting mechanism. Furthermore, we introduce a context-based prompting
mechanism that is able to generate dynamic roleplayer profiles based on input
context. Finally, we design a multi-roleplayer prompting technology based on
batch prompting and integrate multiple outputs into the final evaluation
results. Experimental results on three real datasets for summarization show
that our model is highly competitive and has a very high consistency with human
annotators.
[COMMENTS]
NLPCC 2023
[LINK]
http://arxiv.org/abs/2303.15078v3
[DATE]
2023-09-19 18:07:55+08:00
[CATEGORIES]
cs.CL
LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models
[AUTHORS]
Nan Li, Bo Kang, Tijl De Bie
[ABSTRACT]
Automated occupation extraction and standardization from free-text job
postings and resumes are crucial for applications like job recommendation and
labor market policy formation. This paper introduces LLM4Jobs, a novel
unsupervised methodology that taps into the capabilities of large language
models (LLMs) for occupation coding. LLM4Jobs uniquely harnesses both the
natural language understanding and generation capacities of LLMs. Through
rigorous experimentation on synthetic and real-world datasets, we demonstrate
that LLM4Jobs consistently surpasses unsupervised state-of-the-art benchmarks,
demonstrating its versatility across diverse datasets and granularities. As a
side result of our work, we present both synthetic and real-world datasets,
which may be instrumental for subsequent research in this domain. Overall, this
investigation highlights the promise of contemporary LLMs for the intricate
task of occupation extraction and standardization, laying the foundation for a
robust and adaptable framework relevant to both research and industrial
contexts.
[LINK]
http://arxiv.org/abs/2309.09708v2
[DATE]
2023-09-19 17:28:18+08:00
[CATEGORIES]
cs.CL
Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling
[AUTHORS]
Junzhe Jiang, Shang Qu, Mingyue Cheng, Qi Liu
[ABSTRACT]
Recommender systems are essential for online applications, and sequential
recommendation has enjoyed significant prevalence due to its expressive ability
to capture dynamic user interests. However, previous sequential modeling
methods still have limitations in capturing contextual information. The primary
reason for this issue is that language models often lack an understanding of
domain-specific knowledge and item-related textual content. To address this
issue, we adopt a new sequential recommendation paradigm and propose LANCER,
which leverages the semantic understanding capabilities of pre-trained language
models to generate personalized recommendations. Our approach bridges the gap
between language models and recommender systems, resulting in more human-like
recommendations. We demonstrate the effectiveness of our approach through
experiments on several benchmark datasets, showing promising results and
providing valuable insights into the influence of our model on sequential
recommendation tasks. Furthermore, our experimental codes are publicly
available.
[LINK]
http://arxiv.org/abs/2309.10435v1
[DATE]
2023-09-19 16:54:47+08:00
[CATEGORIES]
cs.CL
Writer-Defined AI Personas for On-Demand Feedback Generation
[AUTHORS]
Karim Benharrak, Tim Zindulka, Florian Lehmann, Hendrik Heuer, Daniel Buschek
[ABSTRACT]
Compelling writing is tailored to its audience. This is challenging, as
writers may struggle to empathize with readers, get feedback in time, or gain
access to the target group. We propose a concept that generates on-demand
feedback, based on writer-defined AI personas of any target audience. We
explore this concept with a prototype (using GPT-3.5) in two user studies (N=5
and N=11): Writers appreciated the concept and strategically used personas for
getting different perspectives. The feedback was seen as helpful and inspired
revisions of text and personas, although it was often verbose and unspecific.
We discuss the impact of on-demand feedback, the limited representativity of
contemporary AI systems, and further ideas for defining AI personas. This work
contributes to the vision of supporting writers with AI by expanding the
socio-technical perspective in AI tool design: To empower creators, we also
need to keep in mind their relationship to an audience.
[COMMENTS]
25 pages, 7 figures, 2 tables
[LINK]
http://arxiv.org/abs/2309.10433v1
[DATE]
2023-09-19 16:49:35+08:00
[CATEGORIES]
cs.CL
PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems
[AUTHORS]
Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung
[ABSTRACT]
Grounding dialogue response generation on external knowledge is proposed to
produce informative and engaging responses. However, current knowledge-grounded
dialogue (KGD) systems often fail to align the generated responses with
human-preferred qualities due to several issues like hallucination and the lack
of coherence. Upon analyzing multiple language model generations, we observe
the presence of alternative generated responses within a single decoding
process. These alternative responses are more faithful and exhibit a comparable
or higher level of relevance to prior conversational turns compared to the
optimal responses prioritized by the decoding processes. To address these
challenges and driven by these observations, we propose Polished & Informed
Candidate Scoring (PICK), a generation re-scoring framework that empowers
models to generate faithful and relevant responses without requiring additional
labeled data or model tuning. Through comprehensive automatic and human
evaluations, we demonstrate the effectiveness of PICK in generating responses
that are more faithful while keeping them relevant to the dialogue history.
Furthermore, PICK consistently improves the system’s performance with both
oracle and retrieved knowledge in all decoding strategies. We provide the
detailed implementation at https://github.com/bryanwilie/pick.
[LINK]
http://arxiv.org/abs/2309.10413v1
[DATE]
2023-09-19 16:27:09+08:00
[CATEGORIES]
cs.CL
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
[AUTHORS]
Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
[ABSTRACT]
In this paper, we introduce Positional Skip-wisE (PoSE) training for
efficient adaptation of large language models (LLMs) to extremely long context
windows. PoSE decouples train length from target context window size by
simulating long inputs using a fixed context window with manipulated position
indices during training. Concretely, we select several short chunks from a long
input sequence, and introduce distinct skipping bias terms to modify the
position indices of each chunk. These bias terms, along with the length of each
chunk, are altered for each training example, allowing the model to adapt to
all positions within the target context window without training on full length
inputs. Experiments show that, compared with fine-tuning on the full length,
PoSE greatly reduces memory and time overhead with minimal impact on
performance. Leveraging this advantage, we have successfully extended the LLaMA
model to 128k tokens. Furthermore, we empirically confirm that PoSE is
compatible with all RoPE-based LLMs and various position interpolation
strategies. Notably, by decoupling fine-tuning length from target context
window, PoSE can theoretically extend the context window infinitely,
constrained only by memory usage for inference. With ongoing advancements for
efficient inference, we believe PoSE holds great promise for scaling the
context window even further.
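The position-index manipulation described above can be sketched as follows. This
is a simplified illustration, not the authors' code: the function name
pose_position_ids, the uniform sampling of chunk lengths and skip offsets, and
the default of two chunks are assumptions.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return train_len position ids that span a window of size target_len.

    The training window is split into chunks, and each chunk's positions are
    shifted by a skip offset so that the ids cover positions beyond train_len.
    """
    if rng is None:
        rng = random.Random()

    # Random chunk lengths that sum to train_len (each chunk >= 1 token).
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]
    lengths = [end - start for start, end in zip(bounds, bounds[1:])]

    # Non-decreasing skip offsets keep the ids strictly increasing, and the
    # cap keeps the last id below target_len.
    max_skip = target_len - train_len
    skips = sorted(rng.randint(0, max_skip) for _ in range(num_chunks))

    position_ids = []
    start = 0
    for length, skip in zip(lengths, skips):
        position_ids.extend(range(start + skip, start + skip + length))
        start += length
    return position_ids


# Simulate a 64-token target window using only 8 training tokens.
ids = pose_position_ids(train_len=8, target_len=64, num_chunks=3,
                        rng=random.Random(0))
```

Resampling the chunk lengths and offsets for every training example is what
exposes the model to the full range of target positions while the number of
tokens per example stays fixed at train_len.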
[LINK]
http://arxiv.org/abs/2309.10400v1
[DATE]
2023-09-19 16:03:38+08:00
[CATEGORIES]
cs.CL
cs.LG
FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation
[AUTHORS]
Yan Meng, Liangming Pan, Yixin Cao, Min-Yen Kan
[ABSTRACT]
Humans ask follow-up questions driven by curiosity, which reflects a creative
human cognitive process. We introduce the task of real-world
information-seeking follow-up question generation (FQG), which aims to generate
follow-up questions seeking a more in-depth understanding of an initial
question and answer. We construct FOLLOWUPQG, a dataset of over 3K real-world
(initial question, answer, follow-up question) tuples collected from a Reddit
forum providing layman-friendly explanations for open-ended questions. In
contrast to existing datasets, questions in FOLLOWUPQG use more diverse
pragmatic strategies to seek information, and they also show higher-order
cognitive skills (such as applying and relating). We evaluate current question
generation models on their efficacy for generating follow-up questions,
exploring how to generate specific types of follow-up questions based on
step-by-step demonstrations. Our results validate FOLLOWUPQG as a challenging
benchmark, as model-generated questions are adequate but far from human-raised
questions in terms of informativeness and complexity.
[LINK]
http://arxiv.org/abs/2309.05007v2
[DATE]
2023-09-19 15:51:03+08:00
[CATEGORIES]
cs.CL
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
[AUTHORS]
Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan
[ABSTRACT]
Graphic layout generation, a growing research field, plays a significant role
in user engagement and information perception. Existing methods primarily treat
layout generation as a numerical optimization task, focusing on quantitative
aspects while overlooking the semantic information of layout, such as the
relationship between each layout element. In this paper, we propose LayoutNUWA,
the first model that treats layout generation as a code generation task to
enhance semantic information and harness the hidden layout expertise of large
language models (LLMs). More concretely, we develop a Code Instruct Tuning
(CIT) approach comprising three interconnected modules: 1) the Code
Initialization (CI) module quantifies the numerical conditions and initializes
them as HTML code with strategically placed masks; 2) the Code Completion (CC)
module employs the formatting knowledge of LLMs to fill in the masked portions
within the HTML code; 3) the Code Rendering (CR) module transforms the
completed code into the final layout output, ensuring a highly interpretable
and transparent layout generation procedure that directly maps code to a
visualized layout. We attain significant state-of-the-art performance (even
over 50% improvements) on multiple datasets, showcasing the strong
capabilities of LayoutNUWA. Our code is available at
https://github.com/ProjectNUWA/LayoutNUWA.
[LINK]
http://arxiv.org/abs/2309.09506v2
[DATE]
2023-09-19 15:47:52+08:00
[CATEGORIES]
cs.CL
Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning
[AUTHORS]
Rongxiang Weng, Qiang Wang, Wensen Cheng, Changfeng Zhu, Min Zhang
[ABSTRACT]
Neural machine translation (NMT) has achieved remarkable success in producing
high-quality translations. However, current NMT systems suffer from a lack of
reliability, as their outputs are often affected by lexical or syntactic
changes in inputs, resulting in large variations in quality. This limitation
hinders the practicality and trustworthiness of NMT. A contributing factor to
this problem is that NMT models trained with the one-to-one paradigm struggle
to handle the source diversity phenomenon, where inputs with the same meaning
can be expressed differently. In this work, we treat this problem as a bilevel
optimization problem and present a consistency-aware meta-learning (CAML)
framework derived from the model-agnostic meta-learning (MAML) algorithm to
address it. Specifically, the NMT model with CAML (named CoNMT) first learns a
consistent meta representation of semantically equivalent sentences in the
outer loop. Subsequently, a mapping from the meta representation to the output
sentence is learned in the inner loop, allowing the NMT model to translate
semantically equivalent sentences to the same target sentence. We conduct
experiments on the NIST Chinese to English task, three WMT translation tasks,
and the TED M2O task. The results demonstrate that CoNMT effectively improves
overall translation quality and reliably handles diverse inputs.
[LINK]
http://arxiv.org/abs/2303.10966v2
[DATE]
2023-09-19 15:37:31+08:00
[CATEGORIES]
cs.CL
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
[AUTHORS]
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
[ABSTRACT]
We present a comprehensive empirical study for personalized spontaneous
speech synthesis on the basis of linguistic knowledge. With the advent of voice
cloning for reading-style speech synthesis, a new voice cloning paradigm for
human-like and spontaneous speech synthesis is required. We, therefore, focus
on personalized spontaneous speech synthesis that can clone both the
individual’s voice timbre and speech disfluency. Specifically, we deal with
filled pauses, a major source of speech disfluency, which is known to play an
important role in speech generation and communication in psychology and
linguistics. To comparatively evaluate personalized filled pause insertion and
non-personalized filled pause prediction methods, we developed a speech
synthesis method with a non-personalized external filled pause predictor
trained with a multi-speaker corpus. The results clarify the position-word
entanglement of filled pauses, i.e., the necessity of precisely predicting
positions for naturalness and the necessity of precisely predicting words for
individuality on the evaluation of synthesized speech.
[COMMENTS]
Accepted to APSIPA ASC 2022
[LINK]
http://arxiv.org/abs/2210.07559v2
[DATE]
2023-09-19 14:51:38+08:00
[CATEGORIES]
cs.CL
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
[AUTHORS]
Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein
[ABSTRACT]
Despite the power of Large Language Models (LLMs) like GPT-4, they still
struggle with tasks that require generating complex, structured outputs. In
this study, we assess the capability of current LLMs in generating complex
structured data and propose a structure-aware fine-tuning approach as a
solution to improve this ability. To perform a comprehensive evaluation, we
propose Struc-Bench, which includes five representative LLMs (i.e., GPT-NeoX 20B,
GPT-3.5, GPT-4, and Vicuna) and evaluate them on our carefully constructed
datasets spanning raw text, HTML, and LaTeX tables. Based on our analysis of
current model performance, we identify specific common formatting errors and
areas of potential improvement. To address complex formatting requirements, we
utilize FormatCoT (Chain-of-Thought) to generate format instructions from
target outputs. Our experiments show that our structure-aware fine-tuning
method, when applied to LLaMA-7B, significantly improves adherence to natural
language constraints, outperforming other evaluated LLMs. Based on these
results, we present an ability map of model capabilities from six dimensions
(i.e., coverage, formatting, reasoning, comprehension, pragmatics, and
hallucination). This map highlights the weaknesses of LLMs in handling complex
structured outputs and suggests promising directions for future work. Our code
and models can be found at https://github.com/gersteinlab/Struc-Bench.
[LINK]
http://arxiv.org/abs/2309.08963v2
[DATE]
2023-09-19 13:58:47+08:00
[CATEGORIES]
cs.CL
KoBigBird-large: Transformation of Transformer for Korean Language Understanding
[AUTHORS]
Kisu Yang, Yoonna Jang, Taewoo Lee, Jinwoo Seong, Hyungjin Lee, Hwanseok Jang, Heuiseok Lim
[ABSTRACT]
This work presents KoBigBird-large, a large-sized Korean BigBird that
achieves state-of-the-art performance and allows long sequence processing for
Korean language understanding. Without further pretraining, we only transform
the architecture and extend the positional encoding with our proposed Tapered
Absolute Positional Encoding Representations (TAPER). In experiments,
KoBigBird-large shows state-of-the-art overall performance on Korean language
understanding benchmarks and the best performance on document classification
and question answering tasks for longer sequences against the competitive
baseline models. We publicly release our model here.
[COMMENTS]
Accepted at IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.10339v1
[DATE]
2023-09-19 13:48:57+08:00
[CATEGORIES]
cs.CL
Learning Decoupled Retrieval Representation for Nearest Neighbour Neural Machine Translation
[AUTHORS]
Qiang Wang, Rongxiang Weng, Ming Chen
[ABSTRACT]
K-Nearest Neighbor Neural Machine Translation (kNN-MT) successfully
incorporates external corpus by retrieving word-level representations at test
time. Generally, kNN-MT borrows the off-the-shelf context representation in the
translation task, e.g., the output of the last decoder layer, as the query
vector of the retrieval task. In this work, we highlight that coupling the
representations of these two tasks is sub-optimal for fine-grained retrieval.
To alleviate it, we leverage supervised contrastive learning to learn the
distinctive retrieval representation derived from the original context
representation. We also propose a fast and effective approach to constructing
hard negative samples. Experimental results on five domains show that our
approach improves the retrieval accuracy and BLEU score compared to vanilla
kNN-MT.
[COMMENTS]
Accepted by COLING 2022
[LINK]
http://arxiv.org/abs/2209.08738v3
[DATE]
2023-09-19 12:25:32+08:00
[CATEGORIES]
cs.CL
Using fine-tuning and min lookahead beam search to improve Whisper
[AUTHORS]
Andrea Do, Oscar Brown, Zhengjie Wang, Nikhil Mathew, Zixin Liu, Jawwad Ahmed, Cheng Yu
[ABSTRACT]
The performance of Whisper in low-resource languages is still far from
perfect. In addition to a lack of training data on low-resource languages, we
identify some limitations in the beam search algorithm used in Whisper. To
address these issues, we fine-tune Whisper on additional data and propose an
improved decoding algorithm. On the Vietnamese language, fine-tuning
Whisper-Tiny with LoRA leads to an improvement of 38.49 in WER over the
zero-shot Whisper-Tiny setting, which is a further reduction of 1.45 compared to
full-parameter fine-tuning. Additionally, by using Filter-Ends and Min
Lookahead decoding algorithms, the WER reduces by 2.26 on average over a range
of languages compared to standard beam search. These results generalise to
larger Whisper model sizes. We also prove a theorem that Min Lookahead
outperforms the standard beam search algorithm used in Whisper.
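The WER figures in this abstract are word error rates. As a minimal sketch (not the authors' evaluation code), WER is commonly computed as the word-level edit distance between reference and hypothesis divided by the number of reference words; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

The result is a fraction; multiplying by 100 gives the percentage form in which WER is usually reported.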
[COMMENTS]
8 pages, submitted to IEEE ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.10299v1
[DATE]
2023-09-19 12:04:14+08:00
[CATEGORIES]
cs.CL
cs.LG
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
[AUTHORS]
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
[ABSTRACT]
In this paper, we explored how to boost speech emotion recognition (SER) with
the state-of-the-art speech pre-trained model (PTM), data2vec, text generation
technique, GPT-4, and speech synthesis technique, Azure TTS. First, we
investigated the representation ability of different speech self-supervised
pre-trained models, and we found that data2vec has a good representation
ability on the SER task. Second, we employed a powerful large language model
(LLM), GPT-4, and emotional text-to-speech (TTS) model, Azure TTS, to generate
emotionally congruent text and speech. We carefully designed the text prompt
and dataset construction to obtain high-quality synthetic emotional speech
data. Third, we studied different ways of data augmentation to promote
the SER task with synthetic speech, including random mixing, adversarial
training, transfer learning, and curriculum learning. Experiments and ablation
studies on the IEMOCAP dataset demonstrate the effectiveness of our method,
compared with other data augmentation methods, and data augmentation with other
synthetic data.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2309.10294v1
[DATE]
2023-09-19 11:52:01+08:00
[CATEGORIES]
cs.CL
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
[AUTHORS]
Wei Qi Leong, Jian Gang Ngui, Yosephine Susanto, Hamsawardhini Rengarajan, Kengatharaiyer Sarveswaran, William Chandra Tjhi
[ABSTRACT]
The rapid development of Large Language Models (LLMs) and the emergence of
novel abilities with scale have necessitated the construction of holistic,
diverse and challenging benchmarks such as HELM and BIG-bench. However, at the
moment, most of these benchmarks focus only on performance in English and
evaluations that include Southeast Asian (SEA) languages are few in number. We
therefore propose BHASA, a holistic linguistic and cultural evaluation suite
for LLMs in SEA languages. It comprises three components: (1) an NLP benchmark
covering eight tasks across Natural Language Understanding (NLU), Generation
(NLG) and Reasoning (NLR) tasks, (2) LINDSEA, a linguistic diagnostic toolkit
that spans the gamut of linguistic phenomena including syntax, semantics and
pragmatics, and (3) a cultural diagnostics dataset that probes for both
cultural representation and sensitivity. For this preliminary effort, we
implement the NLP benchmark only for Indonesian, Vietnamese, Thai and Tamil,
and we only include Indonesian and Tamil for LINDSEA and the cultural
diagnostics dataset. As GPT-4 is purportedly one of the best-performing
multilingual LLMs at the moment, we use it as a yardstick to gauge the
capabilities of LLMs in the context of SEA languages. Our initial experiments
on GPT-4 with BHASA find it lacking in various aspects of linguistic
capabilities, cultural representation and sensitivity in the targeted SEA
languages. BHASA is a work in progress and will continue to be improved and
expanded in the future. The repository for this paper can be found at:
https://github.com/aisingapore/BHASA
[COMMENTS]
86 pages, 7 figures, added link to repository in abstract, minor
formatting changes and typo corrections
[LINK]
http://arxiv.org/abs/2309.06085v2
[DATE]
2023-09-19 11:44:17+08:00
[CATEGORIES]
cs.CL
Differentially Private Optimization on Large Model at Small Cost
[AUTHORS]
Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
[ABSTRACT]
Differentially private (DP) optimization is the standard paradigm to learn
large neural networks that are accurate and privacy-preserving. The
computational cost for DP deep learning, however, is notoriously heavy due to
the per-sample gradient clipping. Existing DP implementations are 2-1000X more
costly in time and space complexity than the standard (non-private) training.
In this work, we develop a novel Book-Keeping (BK) technique that implements
existing DP optimizers (thus achieving the same accuracy), with a substantial
improvement on the computational cost. Specifically, BK enables DP training on
large models and high dimensional data to be roughly as fast and memory-saving
as the standard training, whereas previous DP algorithms can be inefficient or
incapable of training due to memory error. The computational advantage of BK is
supported by the complexity analysis as well as extensive experiments on vision
and language tasks. Our implementation achieves state-of-the-art (SOTA)
accuracy with very small extra cost: on GPT2 and at almost the same memory cost
(<1% overhead), BK has 1.03X the time complexity of the standard training
(0.83X training speed in practice), and 0.61X the time complexity of the most
efficient DP implementation (1.36X training speed in practice). We open-source
the codebase for the BK algorithm at the FastDP library
(https://github.com/awslabs/fast-differential-privacy).
[LINK]
http://arxiv.org/abs/2210.00038v2
[DATE]
2023-09-19 10:14:06+08:00
[CATEGORIES]
cs.LG
cs.CL
What is the Best Automated Metric for Text to Motion Generation?
[AUTHORS]
Jordan Voas, Yili Wang, Qixing Huang, Raymond Mooney
[ABSTRACT]
There is growing interest in generating skeleton-based human motions from
natural language descriptions. While most efforts have focused on developing
better neural architectures for this task, there has been no significant work
on determining the proper evaluation metric. Human evaluation is the ultimate
accuracy measure for this task, and automated metrics should correlate well
with human quality judgments. Since descriptions are compatible with many
motions, determining the right metric is critical for evaluating and designing
effective generative models. This paper systematically studies which metrics
best align with human evaluations and proposes new metrics that align even
better. Our findings indicate that none of the metrics currently used for this
task show even a moderate correlation with human judgments on a sample level.
However, for assessing average model performance, commonly used metrics such as
R-Precision and less-used coordinate errors show strong correlations.
Additionally, several recently developed metrics are not recommended due to
their low correlation compared to alternatives. We also introduce a novel
metric based on a multimodal BERT-like model, MoBERT, which offers strongly
human-correlated sample-level evaluations while maintaining near-perfect
model-level correlation. Our results demonstrate that this new metric exhibits
extensive benefits over all current alternatives.
[COMMENTS]
8 pages, SIGGRAPH Asia 2023 Conference
[LINK]
http://arxiv.org/abs/2309.10248v1
[DATE]
2023-09-19 09:59:54+08:00
[CATEGORIES]
cs.CL
cs.LG
Applying Automated Machine Translation to Educational Video Courses
[AUTHORS]
Linden Wang
[ABSTRACT]
We studied the capability of automated machine translation in the online
video education space by automatically translating Khan Academy videos with
state-of-the-art translation models and applying text-to-speech synthesis and
audio/video synchronization to build engaging videos in target languages. We
also analyzed and established two reliable translation confidence estimators
based on round-trip translations in order to efficiently manage translation
quality and reduce human translation effort. Finally, we developed a deployable
system to deliver translated videos to end users and collect user corrections
for iterative improvement.
[LINK]
http://arxiv.org/abs/2301.03141v2
[DATE]
2023-09-19 09:16:00+08:00
[CATEGORIES]
cs.CL
Traveling Words: A Geometric Interpretation of Transformers
[AUTHORS]
Raul Molina
[ABSTRACT]
Transformers have significantly advanced the field of natural language
processing, but comprehending their internal mechanisms remains a challenge. In
this paper, we introduce a novel geometric perspective that elucidates the
inner mechanisms of transformer operations. Our primary contribution is
illustrating how layer normalization confines the latent features to a
hyper-sphere, subsequently enabling attention to mold the semantic
representation of words on this surface. This geometric viewpoint seamlessly
connects established properties such as iterative refinement and contextual
embeddings. We validate our insights by probing a pre-trained 124M parameter
GPT-2 model. Our findings reveal clear query-key attention patterns in early
layers and build upon prior observations regarding the subject-specific nature
of attention heads at deeper layers. Harnessing these geometric insights, we
present an intuitive understanding of transformers, depicting them as processes
that model the trajectory of word particles along the hyper-sphere.
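The hyper-sphere claim can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it applies layer normalization without the learnable gain and bias and with `eps = 0`, in which case every output vector has zero mean and L2 norm equal to the square root of its dimension. The helper names `layer_norm` and `l2_norm` are hypothetical.

```python
import math

def layer_norm(x, eps=0.0):
    """Layer normalization without learnable gain/bias (an assumption here)."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d  # population variance
    # With eps=0 a constant vector (zero variance) would divide by zero.
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

# Vectors of very different scale land on the same sphere of radius sqrt(d).
for vec in ([1.0, 2.0, 4.0, 8.0], [100.0, -3.0, 0.5, 7.0]):
    print(l2_norm(layer_norm(vec)), math.sqrt(len(vec)))
```

With a nonzero `eps`, as used in practical implementations, the norm is slightly below the square root of the dimension, and a learnable gain and bias would further rescale and shift the result.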
[LINK]
http://arxiv.org/abs/2309.07315v2
[DATE]
2023-09-19 08:34:56+08:00
[CATEGORIES]
cs.CL
cs.LG
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs
[AUTHORS]
Kevin Lin, Patrick Xia, Hao Fang
[ABSTRACT]
We evaluate the ability of semantic parsers based on large language models
(LLMs) to handle contextual utterances. In real-world settings, there typically
exists only a limited number of annotated contextual utterances due to
annotation cost, resulting in an imbalance compared to non-contextual
utterances. Therefore, parsers must adapt to contextual utterances with a few
training examples. We examine four major paradigms for doing so in
conversational semantic parsing, i.e., Parse-with-Utterance-History,
Parse-with-Reference-Program, Parse-then-Resolve, and Rewrite-then-Parse. To
facilitate such cross-paradigm comparisons, we construct
SMCalFlow-EventQueries, a subset of contextual examples from SMCalFlow with
additional annotations. Experiments with in-context learning and fine-tuning
suggest that Rewrite-then-Parse is the most promising paradigm when
holistically considering parsing accuracy, annotation cost, and error types.
[COMMENTS]
Findings of IJCNLP-AACL 2023
[LINK]
http://arxiv.org/abs/2309.10168v1
[DATE]
2023-09-19 05:35:19+08:00
[CATEGORIES]
cs.CL
ChatGPT Informed Graph Neural Network for Stock Movement Prediction
[AUTHORS]
Zihan Chen, Lei Nico Zheng, Cheng Lu, Jialu Yuan, Di Zhu
[ABSTRACT]
ChatGPT has demonstrated remarkable capabilities across various natural
language processing (NLP) tasks. However, its potential for inferring dynamic
network structures from temporal textual data, specifically financial news,
remains an unexplored frontier. In this research, we introduce a novel
framework that leverages ChatGPT’s graph inference capabilities to enhance
Graph Neural Networks (GNN). Our framework adeptly extracts evolving network
structures from textual data, and incorporates these networks into graph neural
networks for subsequent predictive tasks. The experimental results from stock
movement forecasting indicate our model has consistently outperformed the
state-of-the-art Deep Learning-based benchmarks. Furthermore, the portfolios
constructed based on our model’s outputs demonstrate higher annualized
cumulative returns, alongside reduced volatility and maximum drawdown. This
superior performance highlights the potential of ChatGPT for text-based network
inferences and underscores its promising implications for the financial sector.
[COMMENTS]
Dataset is available at
[https://github.com/ZihanChen1995/ChatGPT-GNN-StockPredict]. Accepted for the
oral presentation at SIGKDD 2023 Workshop on Robust NLP for Finance
[LINK]
http://arxiv.org/abs/2306.03763v4
[DATE]
2023-09-19 04:26:04+08:00
[CATEGORIES]
cs.CL
cs.LG
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
[AUTHORS]
Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
[ABSTRACT]
Fine-tuning (via methods such as instruction-tuning or reinforcement learning
from human feedback) is a crucial step in training language models to robustly
carry out tasks of interest. However, we lack a systematic understanding of the
effects of fine-tuning, particularly on tasks outside the narrow fine-tuning
distribution. In a simplified scenario, we demonstrate that improving
performance on tasks within the fine-tuning data distribution comes at the
expense of suppressing model capabilities on other tasks. This degradation is
especially pronounced for tasks “closest” to the fine-tuning distribution. We
hypothesize that language models implicitly infer the task to which the prompt
corresponds, and that the fine-tuning process predominantly skews this task
inference towards tasks in the fine-tuning distribution. To test this
hypothesis, we propose Conjugate Prompting to see if we can recover pretrained
capabilities. Conjugate prompting artificially makes the task look farther from
the fine-tuning distribution while requiring the same capability. We find that
conjugate prompting systematically recovers some of the pretraining
capabilities on our synthetic setup. We then apply conjugate prompting to
real-world LLMs using the observation that fine-tuning distributions are
typically heavily skewed towards English. We find that simply translating the
prompts to different languages can cause the fine-tuned models to respond like
their pretrained counterparts instead. This allows us to recover the in-context
learning abilities lost via instruction tuning, and more concerningly, to
recover harmful content generation suppressed by safety fine-tuning in chatbots
like ChatGPT.
[LINK]
http://arxiv.org/abs/2309.10105v1
[DATE]
2023-09-19 03:28:48+08:00
[CATEGORIES]
cs.CL
cs.LG
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
[AUTHORS]
Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
[ABSTRACT]
The canonical approach to video-text retrieval leverages a coarse-grained or
fine-grained alignment between visual and textual information. However,
retrieving the correct video according to the text query is often challenging
as it requires the ability to reason about both high-level (scene) and
low-level (object) visual clues and how they relate to the text query. To this
end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA.
Specifically, our model captures the cross-modal similarity information at
different granularity levels. To alleviate the effect of irrelevant visual
clues, we also apply an Interactive Similarity Aggregation module (ISA) to
consider the importance of different visual features while aggregating the
cross-modal similarity to obtain a similarity score for each granularity.
Finally, we apply the Sinkhorn-Knopp algorithm to normalize the similarities of
each level before summing them, alleviating over- and under-representation
issues at different levels. By jointly considering the cross-modal similarity
at different granularities, UCoFiA allows the effective unification of multi-grained
alignments. Empirically, UCoFiA outperforms previous state-of-the-art
CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%,
1.4% and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT,
Activity-Net, and DiDeMo, respectively. Our code is publicly available at
https://github.com/Ziyang412/UCoFiA.
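As a rough illustration of the Sinkhorn-Knopp normalization mentioned in this abstract, the sketch below alternately rescales the rows and columns of a strictly positive square matrix until both sum to one. It is a generic version of the algorithm, not UCoFiA's implementation; the function name, the fixed iteration count, and the square-matrix assumption are all choices made for this example.

```python
def sinkhorn_knopp(matrix, n_iters=50):
    """Alternate row and column normalization of a positive square matrix."""
    m = [list(row) for row in matrix]  # copy so the input is left untouched
    for _ in range(n_iters):
        # Scale each row so that it sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Scale each column so that it sums to 1.
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]
    return m

similarities = [[1.0, 2.0], [3.0, 4.0]]  # toy similarity scores
balanced = sinkhorn_knopp(similarities)
print([sum(row) for row in balanced])
print([sum(col) for col in zip(*balanced)])
```

After normalization no single row or column dominates the total mass, which is the over- and under-representation effect the abstract refers to.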
[COMMENTS]
ICCV 2023
[LINK]
http://arxiv.org/abs/2309.10091v1
[DATE]
2023-09-19 03:04:37+08:00
[CATEGORIES]
cs.CL
cs.LG
HTEC: Human Transcription Error Correction
[AUTHORS]
Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du
[ABSTRACT]
High-quality human transcription is essential for training and improving
Automatic Speech Recognition (ASR) models. A recent study \cite{libricrowd} has
found that every 1% increase in transcription Word Error Rate (WER) raises ASR
WER by approximately 2% when the transcriptions are used to train ASR models.
Transcription errors are inevitable even for highly trained annotators.
However, few studies have explored human transcription correction. Error
correction methods for other problems, such as ASR error correction and
grammatical error correction, do not perform sufficiently well for this problem.
Therefore, we propose HTEC for Human Transcription Error Correction. HTEC
consists of two stages: Trans-Checker, an error detection model that predicts
and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative
model that fills masked positions. We propose a holistic list of correction
operations, including four novel operations handling deletion errors. We
further propose a variant of embeddings that incorporates phoneme information
into the input of the transformer. HTEC outperforms other methods by a large
margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we
deployed HTEC to assist human annotators and showed HTEC is particularly
effective as a co-pilot, which improves transcription quality by 15.1% without
sacrificing transcription velocity.
[COMMENTS]
13 pages, 4 figures, 11 tables, AMLC 2023
[LINK]
http://arxiv.org/abs/2309.10089v1
[DATE]
2023-09-19 03:03:21+08:00
[CATEGORIES]
cs.CL
cs.LG
Automatic Personalized Impression Generation for PET Reports Using Large Language Models
[AUTHORS]
Xin Tie, Muheon Shin, Ali Pirasteh, Nevein Ibrahim, Zachary Huemann, Sharon M. Castellino, Kara M. Kelly, John Garrett, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw
[COMMENTS]
18 pages for the main body, 13 pages for the appendix. 6 figures and
3 tables in the main body. This manuscript is submitted to Radiology:
Artificial Intelligence
[LINK]
http://arxiv.org/abs/2309.10066v1
[DATE]
2023-09-19 02:33:40+08:00
[CATEGORIES]
cs.CL
Hierarchy Builder: Organizing Textual Spans into a Hierarchy to Facilitate Navigation
[AUTHORS]
Itay Yair, Hillel Taub-Tabib, Yoav Goldberg
[ABSTRACT]
Information extraction systems often produce hundreds to thousands of strings
on a specific topic. We present a method that facilitates better consumption of
these strings, in an exploratory setting in which a user wants to both get a
broad overview of what’s available, and a chance to dive deeper into some
aspects. The system works by grouping similar items together and arranging the
remaining items into a hierarchical navigable DAG structure. We apply the
method to medical information extraction.
[COMMENTS]
9 pages including citations; Presented at the ACL 2023 DEMO track,
pages 282-290
[LINK]
http://arxiv.org/abs/2309.10057v1
[DATE]
2023-09-19 02:11:24+08:00
[CATEGORIES]
cs.CL
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
[AUTHORS]
Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
[ABSTRACT]
Are foundation models secure from malicious actors? In this work, we focus on
the image input to a vision-language model (VLM). We discover image hijacks,
adversarial images that control generative models at runtime. We introduce
Behaviour Matching, a general method for creating image hijacks, and we use it
to explore three types of attacks. Specific string attacks generate arbitrary
output of the adversary’s choice. Leak context attacks leak information from
the context window into the output. Jailbreak attacks circumvent a model’s
safety training. We study these attacks against LLaVA, a state-of-the-art VLM
based on CLIP and LLaMA-2, and find that all our attack types have above a 90%
success rate. Moreover, our attacks are automated and require only small image
perturbations. These findings raise serious concerns about the security of
foundation models. If image hijacks are as difficult to defend against as
adversarial examples in CIFAR-10, then it might be many years before a solution
is found – if it even exists.
[COMMENTS]
Project page at https://image-hijacks.github.io
[LINK]
http://arxiv.org/abs/2309.00236v2
[DATE]
2023-09-19 01:59:23+08:00
[CATEGORIES]
cs.LG
cs.CL
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
[AUTHORS]
Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao
[ABSTRACT]
This paper presents a comprehensive survey of the taxonomy and evolution of
multimodal foundation models that demonstrate vision and vision-language
capabilities, focusing on the transition from specialist models to
general-purpose assistants. The research landscape encompasses five core
topics, categorized into two classes. (i) We start with a survey of
well-established research areas: multimodal foundation models pre-trained for
specific purposes, including two topics – methods of learning vision backbones
for visual understanding and text-to-image generation. (ii) Then, we present
recent advances in exploratory, open research areas: multimodal foundation
models that aim to play the role of general-purpose assistants, including three
topics – unified vision models inspired by large language models (LLMs),
end-to-end training of multimodal LLMs, and chaining multimodal tools with
LLMs. The target audiences of the paper are researchers, graduate students, and
professionals in computer vision and vision-language multimodal communities who
are eager to learn the basics and recent advances in multimodal foundation
models.
[COMMENTS]
119 pages, PDF file size 58MB; Tutorial website:
https://vlp-tutorial.github.io/2023/
[LINK]
http://arxiv.org/abs/2309.10020v1
[DATE]
2023-09-19 01:56:28+08:00
[CATEGORIES]
cs.CL
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
[AUTHORS]
Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen
[ABSTRACT]
Visual instruction tuning has recently shown encouraging progress with
open-source large multimodal models (LMM) such as LLaVA and MiniGPT-4. However,
most existing studies of open-source LMM are performed using models with 13B
parameters or smaller. In this paper we present an empirical study of scaling
LLaVA up to 33B and 65B/70B, and share our findings from our explorations in
image resolution, data mixing and parameter-efficient training methods such as
LoRA/QLoRA. These are evaluated by their impact on the multi-modal and language
capabilities when completing real-world tasks in the wild.
We find that scaling LMM consistently enhances model performance and improves
language capabilities, and that the performance of LoRA/QLoRA tuning of LMM is
comparable to that of full-model fine-tuning. Additionally, the study
highlights the importance of higher image resolutions and mixing
multimodal-language data to improve LMM performance, and shows that visual
instruction tuning can sometimes improve LMM’s pure language capability. We
hope that this
study makes state-of-the-art LMM research at a larger scale more accessible,
thus helping establish stronger baselines for future research. Code and
checkpoints will be made public.
[COMMENTS]
Released at LLaVA Model Zoo:
https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
[LINK]
http://arxiv.org/abs/2309.09958v1
[DATE]
2023-09-19 01:30:46+08:00
[CATEGORIES]
cs.CL
How to Generate Popular Post Headlines on Social Media?
[AUTHORS]
Zhouxiang Fang, Min Yu, Zhendong Fu, Boning Zhang, Xuanwen Huang, Xiaoqi Tang, Yang Yang
[ABSTRACT]
Posts, as important containers of user-generated-content pieces on social
media, are of tremendous social influence and commercial value. As an integral
component of a post, the headline has a decisive contribution to the post’s
popularity. However, the current mainstream method for headline generation is
still manual writing, which is unstable and requires extensive human effort. This
drives us to explore a novel research question: Can we automate the generation
of popular headlines on social media? We collect more than 1 million posts of
42,447 celebrities from public data of Xiaohongshu, which is a well-known
social media platform in China. We then conduct careful observations on the
headlines of these posts. Observation results demonstrate that trends and
personal styles are widespread in headlines on social media and contribute
significantly to posts’ popularity. Motivated by these insights, we
present MEBART, which combines Multiple preference-Extractors with
Bidirectional and Auto-Regressive Transformers (BART), capturing trends and
personal styles to generate popular headlines on social media. We perform
extensive experiments on real-world datasets and achieve state-of-the-art
performance compared with several advanced baselines. In addition, ablation and
case studies demonstrate that MEBART is effective at capturing trends and personal
styles.
[LINK]
http://arxiv.org/abs/2309.09949v1
[DATE]
2023-09-19 01:12:58+08:00
[CATEGORIES]
cs.CL
Generative Knowledge Graph Construction: A Review
[AUTHORS]
Hongbin Ye, Ningyu Zhang, Hui Chen, Huajun Chen
[ABSTRACT]
Generative Knowledge Graph Construction (KGC) refers to those methods that
leverage the sequence-to-sequence framework for building knowledge graphs,
which is flexible and can be adapted to widespread tasks. In this study, we
summarize the recent compelling progress in generative knowledge graph
construction. We present the advantages and weaknesses of each paradigm in
terms of different generation targets and provide theoretical insight and
empirical analysis. Based on the review, we suggest promising research
directions for the future. Our contributions are threefold: (1) We present a
detailed, complete taxonomy for the generative KGC methods; (2) We provide a
theoretical and empirical analysis of the generative KGC methods; (3) We
propose several research directions that can be developed in the future.
[COMMENTS]
Accepted to EMNLP 2022 (oral) and a public repository is available at
https://github.com/zjunlp/Generative_KG_Construction_Papers
[LINK]
http://arxiv.org/abs/2210.12714v3
[DATE]
2023-09-19 00:56:10+08:00
[CATEGORIES]
cs.CL
cs.LG
Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction
[AUTHORS]
Yunzhi Yao, Shengyu Mao, Ningyu Zhang, Xiang Chen, Shumin Deng, Xi Chen, Huajun Chen
[ABSTRACT]
With the development of pre-trained language models, many prompt-based
approaches to data-efficient knowledge graph construction have been proposed
and achieved impressive performance. However, existing prompt-based learning
methods for knowledge graph construction are still susceptible to several
potential limitations: (i) a semantic gap between natural language and output
structured knowledge with pre-defined schema, which means the model cannot fully
exploit semantic knowledge with the constrained templates; (ii) representation
learning with locally individual instances limits the performance given the
insufficient features, which are unable to unleash the potential analogical
capability of pre-trained language models. Motivated by these observations, we
propose a retrieval-augmented approach, which retrieves schema-aware Reference
As Prompt (RAP), for data-efficient knowledge graph construction. It can
dynamically leverage schema and knowledge inherited from human-annotated and
weak-supervised data as a prompt for each sample, which is model-agnostic and
can be plugged into widespread existing approaches. Experimental results
demonstrate that previous methods integrated with RAP can achieve impressive
performance gains in low-resource settings on five datasets of relational
triple extraction and event extraction for knowledge graph construction. Code
is available at https://github.com/zjunlp/RAP.
[COMMENTS]
Accepted by SIGIR 2023
[LINK]
http://arxiv.org/abs/2210.10709v5
[DATE]
2023-09-19 00:53:26+08:00
[CATEGORIES]
cs.CL
cs.LG
One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
[AUTHORS]
Xiang Chen, Lei Li, Shuofei Qiao, Ningyu Zhang, Chuanqi Tan, Yong Jiang, Fei Huang, Huajun Chen
[ABSTRACT]
Cross-domain NER is a challenging task to address the low-resource problem in
practical scenarios. Previous typical solutions mainly obtain an NER model by
pre-trained language models (PLMs) with data from a rich-resource domain and
adapt it to the target domain. Owing to the mismatch issue among entity types
in different domains, previous approaches normally tune all parameters of PLMs,
ending up with an entirely new NER model for each domain. Moreover, current
models only focus on leveraging knowledge in one general source domain while
failing to successfully transfer knowledge from multiple sources to the target.
To address these issues, we introduce Collaborative Domain-Prefix Tuning for
cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically,
we present text-to-text generation grounding domain-related instructors to
transfer knowledge to new domain NER tasks without structural modifications. We
utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate
the potential of PLMs to handle NER tasks across various domains. Experimental
results on the Cross-NER benchmark show that the proposed approach has flexible
transfer ability and performs better on both one-source and multiple-source
cross-domain NER tasks. Code is available at
https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross.
[COMMENTS]
IJCAI 2023
[LINK]
http://arxiv.org/abs/2301.10410v5
[DATE]
2023-09-19 00:51:00+08:00
[CATEGORIES]
cs.CL
cs.LG
KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction
[AUTHORS]
Xiang Chen, Ningyu Zhang, Xin Xie, Shumin Deng, Yunzhi Yao, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen
[ABSTRACT]
Recently, prompt-tuning has achieved promising results for specific few-shot
classification tasks. The core idea of prompt-tuning is to insert text pieces
(i.e., templates) into the input and transform a classification task into a
masked language modeling problem. However, for relation extraction, determining
an appropriate prompt template requires domain expertise, and it is cumbersome
and time-consuming to obtain a suitable label word. Furthermore, there exists
abundant semantic and prior knowledge among the relation labels that cannot be
ignored. To this end, we focus on incorporating knowledge among relation labels
into prompt-tuning for relation extraction and propose a Knowledge-aware
Prompt-tuning approach with synergistic optimization (KnowPrompt).
Specifically, we inject latent knowledge contained in relation labels into
prompt construction with learnable virtual type words and answer words. Then,
we synergistically optimize their representation with structured constraints.
Extensive experimental results on five datasets with standard and low-resource
settings demonstrate the effectiveness of our approach. Our code and datasets
are available at https://github.com/zjunlp/KnowPrompt for reproducibility.
[COMMENTS]
Accepted by WWW2022
[LINK]
http://arxiv.org/abs/2104.07650v7
[DATE]
2023-09-19 00:46:56+08:00
[CATEGORIES]
cs.CL
cs.LG
DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population
[AUTHORS]
Ningyu Zhang, Xin Xu, Liankuan Tao, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Peng Wang, Wen Zhang, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Feiyu Xiong, Fei Huang, Guozhou Zheng, Huajun Chen
[COMMENTS]
Accepted by EMNLP 2022 System Demonstrations and the project website
is http://deepke.zjukg.cn/
[LINK]
http://arxiv.org/abs/2201.03335v6
[DATE]
2023-09-19 00:42:06+08:00
[CATEGORIES]
cs.CL
cs.LG
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
[AUTHORS]
Xiang Chen, Ningyu Zhang, Lei Li, Shumin Deng, Chuanqi Tan, Changliang Xu, Fei Huang, Luo Si, Huajun Chen
[ABSTRACT]
Multimodal Knowledge Graphs (MKGs), which organize visual-text factual
knowledge, have recently been successfully applied to tasks such as information
retrieval, question answering, and recommendation systems. Since most MKGs are
far from complete, extensive knowledge graph completion studies have been
proposed focusing on the multimodal entity, relation extraction and link
prediction. However, different tasks and modalities require changes to the
model architecture, and not all images/objects are relevant to text input,
which hinders the applicability to diverse real-world scenarios. In this paper,
we propose a hybrid transformer with multi-level fusion to address those
issues. Specifically, we leverage a hybrid transformer architecture with
unified input-output for diverse multimodal knowledge graph completion tasks.
Moreover, we propose multi-level fusion, which integrates visual and text
representation via coarse-grained prefix-guided interaction and fine-grained
correlation-aware fusion modules. We conduct extensive experiments to validate
that our MKGformer can obtain SOTA performance on four datasets of multimodal
link prediction, multimodal RE, and multimodal NER. Code is available at
https://github.com/zjunlp/MKGformer.
[COMMENTS]
Accepted by SIGIR 2022. Fix a severe bug
[LINK]
http://arxiv.org/abs/2205.02357v5
[DATE]
2023-09-19 00:37:55+08:00
[CATEGORIES]
cs.CL
cs.LG
Efficient Benchmarking (of Language Models)
[AUTHORS]
Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen
[ABSTRACT]
The increasing versatility of language models (LMs) has given rise to a new
class of benchmarks that comprehensively assess a broad range of capabilities.
Such benchmarks are associated with massive computational costs, reaching
thousands of GPU hours per model. However, the efficiency aspect of these
evaluation efforts has raised little discussion in the literature. In this
work, we present the problem of Efficient Benchmarking, namely, intelligently
reducing the computation costs of LM evaluation without compromising
reliability. Using the HELM benchmark as a test case, we investigate how
different benchmark design choices affect the computation-reliability tradeoff.
We propose to evaluate the reliability of such decisions by using a new
measure, Decision Impact on Reliability (DIoR for short). We find, for example,
that the current leader on HELM may change by merely removing a low-ranked
model from the benchmark, and observe that a handful of examples suffice to
obtain the correct benchmark ranking. Conversely, a slightly different choice
of HELM scenarios varies the ranking widely. Based on our findings, we outline
a set of concrete recommendations for more efficient benchmark design and
utilization practices, leading to dramatic cost savings with minimal loss of
benchmark reliability, often reducing computation by 100x or more.
[LINK]
http://arxiv.org/abs/2308.11696v3
[DATE]
2023-09-19 00:25:23+08:00
[CATEGORIES]
cs.CL
cs.LG
Harnessing Collective Intelligence Under a Lack of Cultural Consensus
[AUTHORS]
Necdet Gürkan, Jordan W. Suchow
[ABSTRACT]
Harnessing collective intelligence to drive effective decision-making and
collaboration benefits from the ability to detect and characterize
heterogeneity in consensus beliefs. This is particularly true in domains such
as technology acceptance or leadership perception, where a consensus defines an
intersubjective truth, leading to the possibility of multiple “ground truths”
when subsets of respondents sustain mutually incompatible consensuses. Cultural
Consensus Theory (CCT) provides a statistical framework for detecting and
characterizing these divergent consensus beliefs. However, it is unworkable in
modern applications because it lacks the ability to generalize across even
highly similar beliefs, is ineffective with sparse data, and can leverage
neither external knowledge bases nor learned machine representations. Here, we
overcome these limitations through Infinite Deep Latent Construct Cultural
Consensus Theory (iDLC-CCT), a nonparametric Bayesian model that extends CCT
with a latent construct that maps between pretrained deep neural network
embeddings of entities and the consensus beliefs regarding those entities among
one or more subsets of respondents. We validate the method across domains
including perceptions of risk sources, food healthiness, leadership, first
impressions, and humor. We find that iDLC-CCT better predicts the degree of
consensus, generalizes well to out-of-sample entities, and is effective even
with sparse data. To improve scalability, we introduce an efficient
hard-clustering variant of the iDLC-CCT using an algorithm derived from a
small-variance asymptotic analysis of the model. The iDLC-CCT, therefore,
provides a workable computational foundation for harnessing collective
intelligence under a lack of cultural consensus and may potentially form the
basis of consensus-aware information technologies.
[LINK]
http://arxiv.org/abs/2309.09787v2
[DATE]
2023-09-19 23:57:29+08:00
[CATEGORIES]
cs.LG
Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey
[AUTHORS]
Pranav Singh Chib, Pravendra Singh
[ABSTRACT]
End-to-End driving is a promising paradigm as it circumvents the drawbacks
associated with modular systems, such as their overwhelming complexity and
propensity for error propagation. Autonomous driving transcends conventional
traffic patterns by proactively recognizing critical events in advance,
ensuring passengers’ safety and providing them with comfortable transportation,
particularly in highly stochastic and variable traffic settings. This paper
presents a comprehensive review of the End-to-End autonomous driving stack. It
provides a taxonomy of automated driving tasks wherein neural networks have
been employed in an End-to-End manner, encompassing the entire driving process
from perception to control, while addressing key challenges encountered in
real-world applications. Recent developments in End-to-End autonomous driving
are analyzed, and research is categorized based on underlying principles,
methodologies, and core functionality. These categories encompass sensorial
input, main and auxiliary output, learning approaches ranging from imitation to
reinforcement learning, and model evaluation techniques. The survey
incorporates a detailed discussion of the explainability and safety aspects.
Furthermore, it assesses the state-of-the-art, identifies challenges, and
explores future possibilities. We maintain the latest advancements and their
corresponding open-source implementations at
https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning.
[LINK]
http://arxiv.org/abs/2307.04370v2
[DATE]
2023-09-19 23:33:47+08:00
[CATEGORIES]
cs.LG
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach
[AUTHORS]
Leonardo F. Toso, Han Wang, James Anderson
[ABSTRACT]
We investigate the problem of learning an $\epsilon$-approximate solution for
the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic
Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient
methods have proven to converge linearly to the optimal solution of the
model-free LQR problem, the substantial requirement for two-point cost queries
in gradient estimations may be intractable, particularly in applications where
obtaining cost function evaluations at two distinct control input
configurations is exceptionally costly. To this end, we propose an
oracle-efficient approach. Our method combines both one-point and two-point
estimations in a dual-loop variance-reduced algorithm. It achieves an
approximate optimal solution with only
$O\left(\log\left(1/\epsilon\right)^{\beta}\right)$ two-point cost information
for $\beta \in (0,1)$.
[LINK]
http://arxiv.org/abs/2309.10679v1
[DATE]
2023-09-19 23:03:18+08:00
[CATEGORIES]
cs.LG
Des-q: a quantum algorithm to construct and efficiently retrain decision trees for regression and binary classification
[AUTHORS]
Niraj Kumar, Romina Yalovetzky, Changhao Li, Pierre Minssen, Marco Pistoia
[ABSTRACT]
Decision trees are widely used in machine learning due to their simplicity in
construction and interpretability. However, as data sizes grow, traditional
methods for constructing and retraining decision trees become increasingly
slow, scaling polynomially with the number of training examples. In this work,
we introduce a novel quantum algorithm, named Des-q, for constructing and
retraining decision trees in regression and binary classification tasks.
Assuming the data stream produces small increments of new training examples, we
demonstrate that our Des-q algorithm significantly reduces the time required
for tree retraining, achieving a poly-logarithmic time complexity in the number
of training examples, even accounting for the time needed to load the new
examples into quantum-accessible memory. Our approach involves building a
decision tree algorithm to perform k-piecewise linear tree splits at each
internal node. These splits simultaneously generate multiple hyperplanes,
dividing the feature space into k distinct regions. To determine the k suitable
anchor points for these splits, we develop an efficient quantum-supervised
clustering method, building upon the q-means algorithm of Kerenidis et al.
Des-q first efficiently estimates each feature weight using a novel quantum
technique to estimate the Pearson correlation. Subsequently, we employ weighted
distance estimation to cluster the training examples in k disjoint regions and
then proceed to expand the tree using the same procedure. We benchmark the
performance of the simulated version of our algorithm against the
state-of-the-art classical decision tree for regression and binary
classification on multiple data sets with numerical features. Further, we
showcase that the proposed algorithm exhibits similar performance to the
state-of-the-art decision tree while significantly speeding up the periodic
tree retraining.
[COMMENTS]
48 pages, 4 figures, 4 tables
[LINK]
http://arxiv.org/abs/2309.09976v2
[DATE]
2023-09-19 23:02:33+08:00
[CATEGORIES]
cs.LG
A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications
[AUTHORS]
Yi Zhang, Yuying Zhao, Zhaoqing Li, Xueqi Cheng, Yu Wang, Olivera Kotevska, Philip S. Yu, Tyler Derr
[ABSTRACT]
Graph Neural Networks (GNNs) have gained significant attention owing to their
ability to handle graph-structured data and the improvement in practical
applications. However, many of these models prioritize high utility
performance, such as accuracy, with a lack of privacy consideration, which is a
major concern in modern society where privacy attacks are rampant. To address
this issue, researchers have started to develop privacy-preserving GNNs.
Despite this progress, there is a lack of a comprehensive overview of the
attacks and the techniques for preserving privacy in the graph domain. In this
survey, we aim to address this gap by summarizing the attacks on graph data
according to the targeted information, categorizing the privacy preservation
techniques in GNNs, and reviewing the datasets and applications that could be
used for analyzing/solving privacy issues in GNNs. We also outline potential
directions for future research in order to build better privacy-preserving
GNNs.
[LINK]
http://arxiv.org/abs/2308.16375v3
[DATE]
2023-09-19 23:00:52+08:00
[CATEGORIES]
cs.LG
Conformal Off-Policy Evaluation in Markov Decision Processes
[AUTHORS]
Daniele Foffano, Alessio Russo, Alexandre Proutiere
[LINK]
http://arxiv.org/abs/2304.02574v2
[DATE]
2023-09-19 22:40:48+08:00
[CATEGORIES]
cs.LG
Analysing race and sex bias in brain age prediction
[AUTHORS]
Carolina Piçarra, Ben Glocker
[ABSTRACT]
Brain age prediction from MRI has become a popular imaging biomarker
associated with a wide range of neuropathologies. The datasets used for
training, however, are often skewed and imbalanced regarding demographics,
potentially making brain age prediction models susceptible to bias. We analyse
the commonly used ResNet-34 model by conducting a comprehensive subgroup
performance analysis and feature inspection. The model is trained on 1,215
T1-weighted MRI scans from Cam-CAN and IXI, and tested on UK Biobank
(n=42,786), split into six racial and biological sex subgroups. With the
objective of comparing the performance between subgroups, measured by the
absolute prediction error, we use a Kruskal-Wallis test followed by two
post-hoc Conover-Iman tests to inspect bias across race and biological sex. To
examine biases in the generated features, we use PCA for dimensionality
reduction and employ two-sample Kolmogorov-Smirnov tests to identify
distribution shifts among subgroups. Our results reveal statistically
significant differences in predictive performance between Black and White,
Black and Asian, and male and female subjects. Seven out of twelve pairwise
comparisons show statistically significant differences in the feature
distributions. Our findings call for further analysis of brain age prediction
models.
[COMMENTS]
MICCAI Workshop on Fairness of AI in Medical Imaging (FAIMI 2023)
[LINK]
http://arxiv.org/abs/2309.10835v1
[DATE]
2023-09-19 22:40:19+08:00
[CATEGORIES]
cs.LG
Implementing a new fully stepwise decomposition-based sampling technique for the hybrid water level forecasting model in real-world application
[AUTHORS]
Ziqian Zhang, Nana Bao, Xingting Yan, Aokai Zhu, Chenyang Li, Mingyu Liu
[ABSTRACT]
Time-variant, non-stationary signals must be pre-processed properly in
real-world hydrological time series forecasting, for example, in water level
prediction. Decomposition methods are good candidates and are widely used for
such pre-processing. However, decomposition methods with an inappropriate
sampling technique may introduce future data that is not available in practical
applications, resulting in incorrect decomposition-based forecasting models. In
this work, a novel Fully Stepwise Decomposition-Based (FSDB) sampling technique
is designed for decomposition-based forecasting models, strictly avoiding the
introduction of future information. This sampling technique, combined with
decomposition methods such as Variational Mode Decomposition (VMD) and Singular
Spectrum Analysis (SSA), is applied to predict water level time series at three
different stations in the Guoyang and Chaohu basins in China. Results of the
VMD-based hybrid model using the FSDB sampling technique show that the
Nash-Sutcliffe Efficiency (NSE) coefficient is increased by 6.4%, 28.8% and
7.0% at the three stations, respectively, compared with those obtained from the
currently most advanced sampling technique. Meanwhile, for the series of
SSA-based experiments, NSE is increased by 3.2%, 3.1% and 1.1%, respectively.
We conclude that the newly developed FSDB sampling technique can be used to
enhance the performance of decomposition-based hybrid models in real-world
water level time series forecasting.
[LINK]
http://arxiv.org/abs/2309.10658v1
[DATE]
2023-09-19 22:40:13+08:00
[CATEGORIES]
cs.LG
Learning Adaptive Safety for Multi-Agent Systems
[AUTHORS]
Luigi Berducci, Shuo Yang, Rahul Mangharam, Radu Grosu
[ABSTRACT]
Ensuring safety in dynamic multi-agent systems is challenging due to limited
information about the other agents. Control Barrier Functions (CBFs) are
showing promise for safety assurance but current methods make strong
assumptions about other agents and often rely on manual tuning to balance
safety, feasibility, and performance. In this work, we delve into the problem
of adaptive safe learning for multi-agent systems with CBF. We show how
emergent behavior can be profoundly influenced by the CBF configuration,
highlighting the necessity for a responsive and dynamic approach to CBF design.
We present ASRL, a novel adaptive safe RL framework, to fully automate the
optimization of policy and CBF coefficients, to enhance safety and long-term
performance through reinforcement learning. By directly interacting with the
other agents, ASRL learns to cope with diverse agent behaviours and maintains
the cost violations below a desired limit. We evaluate ASRL in a multi-robot
system and a competitive multi-agent racing scenario, against learning-based
and control-theoretic approaches. We empirically demonstrate the efficacy and
flexibility of ASRL, and assess generalization and scalability to
out-of-distribution scenarios. Code and supplementary material are public
online.
[LINK]
http://arxiv.org/abs/2309.10657v1
[DATE]
2023-09-19 22:39:39+08:00
[CATEGORIES]
cs.LG
A spectrum of physics-informed Gaussian processes for regression in engineering
[AUTHORS]
Elizabeth J Cross, Timothy J Rogers, Daniel J Pitchforth, Samuel J Gibson, Matthew R Jones
[ABSTRACT]
Despite the growing availability of sensing and data in general, we remain
unable to fully characterise many in-service engineering systems and structures
from a purely data-driven approach. The vast data and resources available to
capture human activity are unmatched in our engineered world, and, even in
cases where data could be referred to as “big,” they will rarely hold
information across operational windows or life spans. This paper pursues the
combination of machine learning technology and physics-based reasoning to
enhance our ability to make predictive models with limited data. By explicitly
linking the physics-based view of stochastic processes with a data-based
regression approach, a spectrum of possible Gaussian process models is
introduced that enables the incorporation of different levels of expert
knowledge of a system. Examples illustrate how these approaches can
significantly reduce reliance on data collection whilst also increasing the
interpretability of the model, another important consideration in this context.
[LINK]
http://arxiv.org/abs/2309.10656v1
[DATE]
2023-09-19 22:39:03+08:00
[CATEGORIES]
cs.LG
Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics
[AUTHORS]
Marloes Arts, Victor Garcia Satorras, Chin-Wei Huang, Daniel Zuegner, Marco Federici, Cecilia Clementi, Frank Noé, Robert Pinsler, Rianne van den Berg
[ABSTRACT]
Coarse-grained (CG) molecular dynamics enables the study of biological
processes at temporal and spatial scales that would be intractable at an
atomistic resolution. However, accurately learning a CG force field remains a
challenge. In this work, we leverage connections between score-based generative
models, force fields and molecular dynamics to learn a CG force field without
requiring any force inputs during training. Specifically, we train a diffusion
generative model on protein structures from molecular dynamics simulations, and
we show that its score function approximates a force field that can directly be
used to simulate CG molecular dynamics. While having a vastly simplified
training setup compared to previous work, we demonstrate that our approach
leads to improved performance across several small- to medium-sized protein
simulations, reproducing the CG equilibrium distribution, and preserving
dynamics of all-atom simulations such as protein folding events.
[LINK]
http://arxiv.org/abs/2302.00600v2
[DATE]
2023-09-19 22:37:11+08:00
[CATEGORIES]
cs.LG
Towards Energy-Aware Federated Traffic Prediction for Cellular Networks
[AUTHORS]
Vasileios Perifanis, Nikolaos Pavlidis, Selim F. Yilmaz, Francesc Wilhelmi, Elia Guerra, Marco Miozzo, Pavlos S. Efraimidis, Paolo Dini, Remous-Aris Koutsiamanis
[ABSTRACT]
Cellular traffic prediction is a crucial activity for optimizing networks in
fifth-generation (5G) networks and beyond, as accurate forecasting is essential
for intelligent network design, resource allocation and anomaly mitigation.
Although machine learning (ML) is a promising approach to effectively predict
network traffic, the centralization of massive data in a single data center
raises issues regarding confidentiality, privacy and data transfer demands. To
address these challenges, federated learning (FL) emerges as an appealing ML
training framework which offers highly accurate predictions through parallel
distributed computations. However, the environmental impact of these methods is
often overlooked, which calls into question their sustainability. In this
paper, we address the trade-off between accuracy and energy consumption in FL
by proposing a novel sustainability indicator that allows assessing the
feasibility of ML models. Then, we comprehensively evaluate state-of-the-art
deep learning (DL) architectures in a federated scenario using real-world
measurements from base station (BS) sites in the area of Barcelona, Spain. Our
findings indicate that larger ML models achieve marginally improved performance
but have a significant environmental impact in terms of carbon footprint, which
makes them impractical for real-world applications.
[COMMENTS]
International Symposium on Federated Learning Technologies and
Applications (FLTA), 2023
[LINK]
http://arxiv.org/abs/2309.10645v1
[DATE]
2023-09-19 22:28:09+08:00
[CATEGORIES]
cs.LG
Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers
[AUTHORS]
Thomas Chen, Patricia Muñoz Ewald
[ABSTRACT]
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ramp activation
function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost
function, and input and output spaces ${\mathbb R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are defined on spaces ${\mathbb R}^{Q}$, as well.
We apply our recent results on shallow neural networks to construct an explicit
family of minimizers for the global minimum of the cost function in the case
$L\geq Q$, which we show to be degenerate. In the context presented here, the
hidden layers of the DL network “curate” the training inputs by recursive
application of a truncation map that minimizes the noise-to-signal ratio of the
training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate
local minima of the cost function.
[COMMENTS]
AMS Latex, 20 pages
[LINK]
http://arxiv.org/abs/2309.10639v1
[DATE]
2023-09-19 22:20:55+08:00
[CATEGORIES]
cs.LG
Sparser Random Networks Exist: Enforcing Communication-Efficient Federated Learning via Regularization
[AUTHORS]
Mohamad Mestoukirdi, Omid Esrafilian, David Gesbert, Qianrui Li, Nicolas Gresset
[ABSTRACT]
This work presents a new method for enhancing communication efficiency in
stochastic Federated Learning that trains over-parameterized random networks.
In this setting, a binary mask is optimized instead of the model weights, which
are kept fixed. The mask characterizes a sparse sub-network that is able to
generalize as well as a smaller target network. Importantly, sparse binary
masks are exchanged rather than the floating point weights in traditional
federated learning, reducing communication cost to at most 1 bit per parameter.
We show that previous state-of-the-art stochastic methods fail to find the
sparse networks that can reduce the communication and storage overhead using
consistent loss objectives. To address this, we propose adding a regularization
term to local objectives that encourages sparser solutions by eliminating
redundant features across sub-networks. Extensive experiments demonstrate
significant improvements in communication and memory efficiency of up to five
orders of magnitude compared to the literature, with minimal performance degradation in
validation accuracy in some instances.
[COMMENTS]
Draft to be submitted
[LINK]
http://arxiv.org/abs/2309.10834v1
[DATE]
2023-09-19 22:05:12+08:00
[CATEGORIES]
cs.LG
A multiobjective continuation method to compute the regularization path of deep neural networks
[AUTHORS]
Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz
[ABSTRACT]
Sparsity is a highly desired feature in deep neural networks (DNNs) since it
ensures numerical efficiency and improves both the interpretability of models
(due to the smaller number of relevant features) and their robustness. In machine learning
approaches based on linear models, it is well known that there exists a
connecting path between the sparsest solution in terms of the $\ell^1$ norm
(i.e., zero weights) and the non-regularized solution, which is called the
regularization path. Very recently, there was a first attempt to extend the
concept of regularization paths to DNNs by means of treating the empirical loss
and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the
resulting multiobjective optimization problem. However, due to the
non-smoothness of the $\ell^1$ norm and the high number of parameters, this
approach is not very efficient from a computational perspective. To overcome
this limitation, we present an algorithm that allows for the approximation of
the entire Pareto front for the above-mentioned objectives in a very efficient
manner. We present numerical examples using both deterministic and stochastic
gradients. We furthermore demonstrate that knowledge of the regularization path
allows for a well-generalizing network parametrization.
[COMMENTS]
7 pages, 6 figures
[LINK]
http://arxiv.org/abs/2308.12044v3
[DATE]
2023-09-19 21:57:06+08:00
[CATEGORIES]
cs.LG
Deep Kernel Methods Learn Better: From Cards to Process Optimization
[AUTHORS]
Mani Valleti, Rama K. Vasudevan, Maxim A. Ziatdinov, Sergei V. Kalinin
[ABSTRACT]
The ability of deep learning methods to perform classification and regression
tasks relies heavily on their capacity to uncover manifolds in high-dimensional
data spaces and project them into low-dimensional representation spaces. In
this study, we investigate the structure and character of the manifolds
generated by classical variational autoencoder (VAE) approaches and deep kernel
learning (DKL). In the former case, the structure of the latent space is
determined by the properties of the input data alone, while in the latter, the
latent manifold forms as a result of an active learning process that balances
the data distribution and target functionalities. We show that DKL with active
learning can produce a more compact and smooth latent space which is more
conducive to optimization compared to previously reported methods, such as the
VAE. We demonstrate this behavior using a simple cards data set and extend it
to the optimization of domain-generated trajectories in physical systems. Our
findings suggest that latent manifolds constructed through active learning have
a more beneficial structure for optimization problems, especially in
feature-rich target-poor scenarios that are common in domain sciences, such as
materials synthesis, energy storage, and molecular discovery. The jupyter
notebooks that encapsulate the complete analysis accompany the article.
[COMMENTS]
8 Figures, 26 pages
[LINK]
http://arxiv.org/abs/2303.14554v2
[DATE]
2023-09-19 21:53:34+08:00
[CATEGORIES]
cs.LG
A Dynamic Linear Bias Incorporation Scheme for Nonnegative Latent Factor Analysis
[AUTHORS]
Yurong Zhong, Zhe Xie, Weiling Li, Xin Luo
[ABSTRACT]
High-Dimensional and Incomplete (HDI) data is commonly encountered in big
data-related applications such as social network service systems, which involve
limited interactions among numerous nodes. Knowledge acquisition from HDI data
is a vital issue in data science because of the rich patterns embedded in such
data, such as node behaviors; the fundamental task is HDI data representation
learning. Nonnegative Latent Factor Analysis (NLFA) models have proven superior
at addressing this issue, where a linear bias incorporation (LBI) scheme is
important for preventing training overshooting and fluctuation, as well as for
preventing the model from premature convergence. However, existing LBI schemes
are all static ones in which the linear biases are fixed, which significantly
restricts the scalability of the resultant NLFA model and results in a loss of
representation learning ability on HDI data. Motivated by the above
discoveries, this paper presents the dynamic linear bias incorporation (DLBI)
scheme. It first extends the linear bias vectors into matrices, and then builds
a binary weight matrix to switch the active/inactive states of the linear
biases. Each entry of the weight matrix switches between the binary states
dynamically in response to the variation of the linear bias value, thereby
establishing the dynamic linear biases for an NLFA model. Empirical studies on
three HDI datasets from real applications demonstrate that the proposed
DLBI-based NLFA model obtains higher representation accuracy than several
state-of-the-art models, as well as highly competitive computational
efficiency.
[COMMENTS]
arXiv admin note: substantial text overlap with arXiv:2306.03911,
arXiv:2302.12122, arXiv:2306.03647
[LINK]
http://arxiv.org/abs/2309.10618v1
[DATE]
2023-09-19 21:48:26+08:00
[CATEGORIES]
cs.LG
Near-Optimal $Φ$-Regret Learning in Extensive-Form Games
[AUTHORS]
Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
[ABSTRACT]
In this paper, we establish efficient and uncoupled learning dynamics so
that, when employed by all players in multiplayer perfect-recall
imperfect-information extensive-form games, the trigger regret of each player
grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially
over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a
recent open question by Bai et al. (2022). As an immediate consequence, we
guarantee convergence to the set of extensive-form correlated equilibria and
coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$.
Building on prior work, at the heart of our construction lies a more general
result regarding fixed points deriving from rational functions with polynomial
degree, a property that we establish for the fixed points of (coarse) trigger
deviation functions. Moreover, our construction leverages a refined regret
circuit for the convex hull, which – unlike prior guarantees – preserves the
RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has
an independent interest in establishing near-optimal regret under learning
dynamics based on a CFR-type decomposition of the regret.
[COMMENTS]
Appearing at ICML 2023. V3 corrects a statement
[LINK]
http://arxiv.org/abs/2208.09747v3
[DATE]
2023-09-19 21:42:26+08:00
[CATEGORIES]
cs.LG
Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network
[AUTHORS]
Paapa Kwesi Quansah, Edwin Kwesi Ansah Tenkorang
[ABSTRACT]
Short-term load forecasting is of paramount importance in the efficient
operation and planning of power systems, given its inherent non-linear and
dynamic nature. Recent strides in deep learning have shown promise in
addressing this challenge. However, these methods often grapple with
hyperparameter sensitivity, opaqueness in interpretability, and high
computational overhead for real-time deployment. In this paper, we propose a
novel solution that surmounts these obstacles. Our approach harnesses the power
of the Particle-Swarm Optimization algorithm to autonomously explore and
optimize hyperparameters, a Multi-Head Attention mechanism to discern the
salient features crucial for accurate forecasting, and a streamlined framework
for computational efficiency. Our method undergoes rigorous evaluation using a
genuine electricity demand dataset. The results underscore its superiority in
terms of accuracy, robustness, and computational efficiency. Notably, our Mean
Absolute Percentage Error of 1.9376 marks a significant advancement over
existing state-of-the-art approaches, heralding a new era in short-term load
forecasting.
[LINK]
http://arxiv.org/abs/2309.03694v2
[DATE]
2023-09-19 21:41:07+08:00
[CATEGORIES]
cs.LG
An Extendable Python Implementation of Robust Optimisation Monte Carlo
[AUTHORS]
Vasilis Gkolemis, Michael Gutmann, Henri Pesonen
[ABSTRACT]
Performing inference in statistical models with an intractable likelihood is
challenging; therefore, most likelihood-free inference (LFI) methods encounter
accuracy and efficiency limitations. In this paper, we present the
implementation of the LFI method Robust Optimisation Monte Carlo (ROMC) in the
Python package ELFI. ROMC is a novel and efficient (highly-parallelizable) LFI
framework that provides accurate weighted samples from the posterior. Our
implementation can be used in two ways. First, a scientist may use it as an
out-of-the-box LFI algorithm; we provide an easy-to-use API harmonized with the
principles of ELFI, enabling effortless comparisons with the rest of the
methods included in the package. Additionally, we have carefully split ROMC
into isolated components for supporting extensibility. A researcher may
experiment with novel method(s) for solving part(s) of ROMC without
reimplementing everything from scratch. In both scenarios, the ROMC parts can
run in a fully-parallelized manner, exploiting all CPU cores. We also provide
helpful functionalities for (i) inspecting the inference process and (ii)
evaluating the obtained samples. Finally, we test the robustness of our
implementation on some typical LFI examples.
[COMMENTS]
The publication is based on the manuscript of the MSc thesis
arXiv:2011.03977
[LINK]
http://arxiv.org/abs/2309.10612v1
[DATE]
2023-09-19 21:37:47+08:00
[CATEGORIES]
cs.LG
ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference
[AUTHORS]
Nikita Durasov, Nik Dorndorf, Hieu Le, Pascal Fua
[ABSTRACT]
Whereas the ability of deep networks to produce useful predictions has been
amply demonstrated, estimating the reliability of these predictions remains
challenging. Sampling approaches such as MC-Dropout and Deep Ensembles have
emerged as the most popular ones for this purpose. Unfortunately, they require
many forward passes at inference time, which slows them down. Sampling-free
approaches can be faster but suffer from other drawbacks, such as lower
reliability of uncertainty estimates, difficulty of use, and limited
applicability to different types of tasks and data.
In this work, we introduce a sampling-free approach that is generic and easy
to deploy, while producing reliable uncertainty estimates on par with
state-of-the-art methods at a significantly lower computational cost. It is
predicated on training the network to produce the same output with and without
additional information about it. At inference time, when no prior information
is given, we use the network’s own prediction as the additional information. We
then take the distance between the predictions with and without prior
information as our uncertainty measure.
We demonstrate our approach on several classification and regression tasks.
We show that it delivers results on par with those of Ensembles but at a much
lower computational cost.
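The inference step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `two_step_uncertainty` and `toy_model` are hypothetical names, the model is assumed to accept a second "prior" argument shaped like its own output, and the training procedure that makes the two passes agree on familiar inputs is not shown.

```python
import numpy as np

def two_step_uncertainty(model, x, blank_prior):
    """Run two forward passes and return (prediction, uncertainty).

    `model(x, prior)` is an assumed interface: a callable that takes an
    input and extra information shaped like its own output.
    """
    first = model(x, blank_prior)   # pass 1: no prior information
    second = model(x, first)        # pass 2: own prediction fed back in
    # The distance between the two predictions is the uncertainty score.
    return first, float(np.linalg.norm(second - first))

def toy_model(x, prior):
    # Hypothetical stand-in for a trained network: its two passes agree
    # at x = 0 and disagree more and more as |x| grows.
    return np.tanh(x) + 0.1 * np.abs(x) * prior

_, u_near = two_step_uncertainty(toy_model, np.array([0.0]), np.zeros(1))
_, u_far = two_step_uncertainty(toy_model, np.array([3.0]), np.zeros(1))
```

Only two forward passes are needed per input, which is where the cost advantage over sampling-based approaches would come from.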
[LINK]
http://arxiv.org/abs/2211.11435v2
[DATE]
2023-09-19 21:29:25+08:00
[CATEGORIES]
cs.LG
SoftCTC – Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels
[AUTHORS]
Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula
[ABSTRACT]
This paper explores semi-supervised training for sequence tasks, such as
Optical Character Recognition or Automatic Speech Recognition. We propose a
novel loss function – SoftCTC – which is an
extension of CTC that allows multiple transcription variants to be considered
at the same time. This makes it possible to omit the confidence-based filtering step, which is
otherwise a crucial component of pseudo-labeling approaches to semi-supervised
learning. We demonstrate the effectiveness of our method on a challenging
handwriting recognition task and conclude that SoftCTC matches the performance
of a finely-tuned filtering-based pipeline. We also evaluated SoftCTC in terms
of computational efficiency, concluding that it is significantly more efficient
than a naïve CTC-based approach for training on multiple transcription
variants, and we make our GPU implementation public.
[COMMENTS]
21 pages, 8 figures, 6 tables, accepted to International Journal on
Document Analysis and Recognition (IJDAR)
[LINK]
http://arxiv.org/abs/2212.02135v3
[DATE]
2023-09-19 21:21:49+08:00
[CATEGORIES]
cs.LG
Asteroids co-orbital motion classification based on Machine Learning
[AUTHORS]
Giulia Ciacci, Andrea Barucci, Sara Di Ruzza, Elisa Maria Alessi
[ABSTRACT]
In this work, we explore how to classify asteroids in co-orbital motion with
a given planet using Machine Learning. We consider four different kinds of
motion in mean motion resonance with the planet, nominally Tadpole, Horseshoe
and Quasi-satellite, building 3 datasets defined as Real (taking the
ephemerides of real asteroids from the JPL Horizons system), Ideal and
Perturbed (both simulated, obtained by propagating initial conditions
considering two different dynamical systems) for training and testing the
Machine Learning algorithms in different conditions.
The time series of the variable theta (angle related to the resonance) are
studied with a data analysis pipeline defined ad hoc for the problem and
composed of: data creation and annotation, time series feature extraction
with the tsfresh package (potentially followed by selection and
standardization), and the application of Machine Learning algorithms for
Dimensionality Reduction and Classification. This approach, based on features
extracted from the time series, makes it possible to work with less data
than Deep Learning algorithms require, and also to rank the features by
importance. Physical interpretability of the features is
another key point of this approach. In addition, we introduce the SHapley
Additive exPlanations technique for explainability.
Different training and test sets are used, in order to understand the power
and the limits of our approach. The results show that the algorithms are able to
correctly identify and classify the time series, with a high degree of
performance.
[LINK]
http://arxiv.org/abs/2309.10603v1
[DATE]
2023-09-19 21:19:31+08:00
[CATEGORIES]
cs.LG
Motif-Centric Representation Learning for Symbolic Music
[AUTHORS]
Yuxuan Wu, Roger B. Dannenberg, Gus Xia
[ABSTRACT]
Music motif, as a conceptual building block of composition, is crucial for
music structure analysis and automatic composition. While human listeners can
identify motifs easily, existing computational models fall short in
representing motifs and their developments. The reason is that the nature of
motifs is implicit, and the diversity of motif variations extends beyond simple
repetitions and modulations. In this study, we aim to learn the implicit
relationship between motifs and their variations via representation learning,
using the Siamese network architecture and a pretraining and fine-tuning
pipeline. A regularization-based method, VICReg, is adopted for pretraining,
while contrastive learning is used for fine-tuning. Experimental results on a
retrieval-based task show that these two methods complement each other,
yielding an improvement of 12.6% in the area under the precision-recall curve.
Lastly, we visualize the acquired motif representations, offering an intuitive
comprehension of the overall structure of a music piece. As far as we know,
this work marks a noteworthy step forward in computational modeling of music
motifs. We believe that this work lays the foundations for future applications
of motifs in automatic music composition and music information retrieval.
[LINK]
http://arxiv.org/abs/2309.10597v1
[DATE]
2023-09-19 21:09:03+08:00
[CATEGORIES]
cs.LG
The Lasso with general Gaussian designs with applications to hypothesis testing
[AUTHORS]
Michael Celentano, Andrea Montanari, Yuting Wei
[ABSTRACT]
The Lasso is a method for high-dimensional regression, which is now commonly
used when the number of covariates $p$ is of the same order as or larger than the
number of observations $n$. Classical asymptotic normality theory does not
apply to this model due to two fundamental reasons: $(1)$ The regularized risk
is non-smooth; $(2)$ The distance between the estimator
$\widehat{\boldsymbol{\theta}}$ and the true parameters vector
$\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard
perturbative arguments that are the traditional basis for asymptotic normality
fail.
On the other hand, the Lasso estimator can be precisely characterized in the
regime in which both $n$ and $p$ are large and $n/p$ is of order one. This
characterization was first obtained in the case of Gaussian designs with i.i.d.
covariates: here we generalize it to Gaussian correlated designs with
non-singular covariance structure. This is expressed in terms of a simpler
“fixed-design” model. We establish non-asymptotic bounds on the distance
between the distribution of various quantities in the two models, which hold
uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and
over values of the regularization parameter.
As an application, we study the distribution of the debiased Lasso and show
that a degrees-of-freedom correction is necessary for computing valid
confidence intervals.
[COMMENTS]
final version accepted to Annals of Statistics
[LINK]
http://arxiv.org/abs/2007.13716v3
[DATE]
2023-09-19 21:07:32+08:00
[CATEGORIES]
cs.LG
Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing
[AUTHORS]
Bernd Simon, Andrea Ortiz, Walid Saad, Anja Klein
[ABSTRACT]
The problem of coordinated data collection is studied for a mobile
crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially
publishes sensing tasks to the available mobile units (MUs) that signal their
willingness to participate in a task by sending sensing offers back to the
MCSP. From the received offers, the MCSP decides the task assignment. A stable
task assignment must address two challenges: the MCSP’s and MUs’ conflicting
goals, and the uncertainty about the MUs’ required efforts and preferences. To
overcome these challenges, a novel decentralized approach combining matching
theory and online learning, called collision-avoidance multi-armed bandit with
strategic free sensing (CA-MAB-SFS), is proposed. The task assignment problem
is modeled as a matching game considering the MCSP’s and MUs’ individual goals
while the MUs learn their efforts online. Our innovative “free-sensing”
mechanism significantly improves the MU’s learning process while reducing
collisions during task allocation. The stable regret of CA-MAB-SFS, i.e., the
loss of learning, is analytically shown to be bounded by a sublinear function,
ensuring the convergence to a stable optimal solution. Simulation results show
that CA-MAB-SFS increases the MUs’ and the MCSP’s satisfaction compared to
state-of-the-art methods while reducing the average task completion time by at
least 16%.
[LINK]
http://arxiv.org/abs/2309.10594v1
[DATE]
2023-09-19 21:07:15+08:00
[CATEGORIES]
cs.LG
Explainable Deep Learning Methods in Medical Image Classification: A Survey
[AUTHORS]
Cristiano Patrício, João C. Neves, Luís F. Teixeira
[ABSTRACT]
The remarkable success of deep learning has prompted interest in its
application to medical imaging diagnosis. Even though state-of-the-art deep
learning models have achieved human-level accuracy on the classification of
different types of medical data, these models are rarely adopted in clinical
workflows, mainly due to their lack of interpretability. The black-box-ness of
deep learning models has raised the need for devising strategies to explain the
decision process of these models, leading to the creation of the topic of
eXplainable Artificial Intelligence (XAI). In this context, we provide a
thorough survey of XAI applied to medical imaging diagnosis, including visual,
textual, example-based and concept-based explanation methods. Moreover, this
work reviews the existing medical imaging datasets and the existing metrics for
evaluating the quality of the explanations. In addition, we include a
performance comparison among a set of report generation-based methods. Finally,
the major challenges in applying XAI to medical imaging and the future research
directions on the topic are also discussed.
[COMMENTS]
Accepted for publication in ACM Computing Surveys
[LINK]
http://arxiv.org/abs/2205.04766v3
[DATE]
2023-09-19 21:03:24+08:00
[CATEGORIES]
cs.LG
Adversarial Attacks Against Uncertainty Quantification
[AUTHORS]
Emanuele Ledda, Daniele Angioni, Giorgio Piras, Giorgio Fumera, Battista Biggio, Fabio Roli
[ABSTRACT]
Machine-learning models can be fooled by adversarial examples, i.e.,
carefully-crafted input perturbations that force models to output wrong
predictions. While uncertainty quantification has been recently proposed to
detect adversarial inputs, under the assumption that such attacks exhibit a
higher prediction uncertainty than pristine data, it has been shown that
adaptive attacks specifically aimed at also reducing the uncertainty estimate
can easily bypass this defense mechanism. In this work, we focus on a different
adversarial scenario in which the attacker is still interested in manipulating
the uncertainty estimate, but regardless of the correctness of the prediction;
in particular, the goal is to undermine the use of machine-learning models when
their outputs are consumed by a downstream module or by a human operator.
Following this direction, we: (i) design a threat model for attacks
targeting uncertainty quantification; (ii) devise different attack
strategies on conceptually different UQ techniques spanning both
classification and semantic segmentation problems; (iii) conduct a
first complete and extensive analysis to compare the differences between some
of the most employed UQ approaches under attack. Our extensive experimental
analysis shows that our attacks are more effective in manipulating uncertainty
quantification measures than attacks aimed to also induce misclassifications.
[LINK]
http://arxiv.org/abs/2309.10586v1
[DATE]
2023-09-19 20:54:09+08:00
[CATEGORIES]
cs.LG
Task Graph offloading via Deep Reinforcement Learning in Mobile Edge Computing
[AUTHORS]
Jiagang Liu, Yun Mi, Xinyu Zhang
[ABSTRACT]
Various mobile applications that comprise dependent tasks are gaining
widespread popularity and are increasingly complex. These applications often
have low-latency requirements, resulting in a significant surge in demand for
computing resources. With the emergence of mobile edge computing (MEC), it
becomes the most significant issue to offload the application tasks onto
small-scale devices deployed at the edge of the mobile network for obtaining a
high-quality user experience. However, since the environment of MEC is dynamic,
most existing works focusing on task graph offloading, which rely heavily on
expert knowledge or accurate analytical models, fail to fully adapt to such
environmental changes, resulting in the reduction of user experience. This
paper investigates the task graph offloading in MEC, considering the
time-varying computation capabilities of edge computing devices. To adapt to
environmental changes, we model the task graph scheduling for computation
offloading as a Markov Decision Process (MDP). Then, we design a deep
reinforcement learning algorithm (SATA-DRL) to learn the task scheduling
strategy from the interaction with the environment, to improve user experience.
Extensive simulations validate that SATA-DRL is superior to existing strategies
in terms of reducing average makespan and deadline violation.
[COMMENTS]
12 pages, 13 figures
[LINK]
http://arxiv.org/abs/2309.10569v1
[DATE]
2023-09-19 20:26:56+08:00
[CATEGORIES]
cs.LG
A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents
[AUTHORS]
Nishchal Prasad, Mohand Boughanem, Taoufik Dkaki
[ABSTRACT]
Automatic legal judgment prediction and its explanation suffer from the
problem of long case documents exceeding tens of thousands of words, in
general, and having a non-uniform structure. Predicting judgments from such
documents and extracting their explanation becomes a challenging task, more so
on documents with no structural annotation. We define this problem as “scarce
annotated legal documents” and explore their lack of structural information and
their long lengths with a deep learning-based classification framework which we
call MESc (“Multi-stage Encoder-based Supervised with-clustering”) for judgment
prediction. Specifically, we divide a document into parts to extract their
embeddings from the last four layers of a custom fine-tuned Large Language
Model, and try to approximate their structure through unsupervised clustering,
which we then use in another set of transformer encoder layers to learn the
inter-chunk representations. We explore the adaptability of LLMs with
multi-billion parameters (GPT-Neo and GPT-J) to legal texts and their
intra-domain (legal) transfer learning capacity. Alongside this, we compare
their performance with MESc and the impact of combining embeddings from their
last layers. For such hierarchical models, we also propose an explanation
extraction algorithm named ORSE; Occlusion sensitivity-based Relevant Sentence
Extractor;
[LINK]
http://arxiv.org/abs/2309.10563v1
[DATE]
2023-09-19 20:18:28+08:00
[CATEGORIES]
cs.LG
Hybrid State Space-based Learning for Sequential Data Prediction with Joint Optimization
[AUTHORS]
Mustafa E. Aydın, Arda Fazla, Suleyman S. Kozat
[ABSTRACT]
We investigate nonlinear prediction/regression in an online setting and
introduce a hybrid model that effectively mitigates, via a joint mechanism
through a state space formulation, the need for domain-specific feature
engineering issues of conventional nonlinear prediction models and achieves an
efficient mix of nonlinear and linear components. In particular, we use
recursive structures to extract features from raw sequential sequences and a
traditional linear time series model to deal with the intricacies of the
sequential data, e.g., seasonality, trends. The state-of-the-art ensemble or
hybrid models typically train the base models in a disjoint manner, which is
not only time consuming but also sub-optimal due to the separation of modeling
or independent training. In contrast, for the first time in the literature, we
jointly optimize an enhanced recurrent neural network (LSTM) for automatic
feature extraction from raw data and an ARMA-family time series model (SARIMAX)
for effectively addressing peculiarities associated with time series data. We
achieve this by introducing novel state space representations for the base
models, which are then combined to provide a full state space representation of
the hybrid or the ensemble. Hence, we are able to jointly optimize both models
in a single pass via particle filtering, for which we also provide the update
equations. The introduced architecture is generic so that one can use other
recurrent architectures, e.g., GRUs, traditional time series-specific models,
e.g., ETS or other optimization methods, e.g., EKF, UKF. Due to such novel
combination and joint optimization, we demonstrate significant improvements in
widely publicized real life competition datasets. We also openly share our code
for further research and replicability of our results.
[COMMENTS]
Submitted to the IEEE TNNLS journal
[LINK]
http://arxiv.org/abs/2309.10553v1
[DATE]
2023-09-19 20:00:28+08:00
[CATEGORIES]
cs.LG
Mean Absolute Directional Loss as a New Loss Function for Machine Learning Problems in Algorithmic Investment Strategies
[AUTHORS]
Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk
[ABSTRACT]
This paper investigates the issue of an adequate loss function in the
optimization of machine learning models used in the forecasting of financial
time series for the purpose of algorithmic investment strategies (AIS)
construction. We propose the Mean Absolute Directional Loss (MADL) function,
solving important problems of classical forecast error functions in extracting
information from forecasts to create efficient buy/sell signals in algorithmic
investment strategies. Finally, based on the data from two different asset
classes (cryptocurrencies: Bitcoin and commodities: Crude Oil), we show that
the new loss function enables us to select better hyperparameters for the LSTM
model and obtain more efficient investment strategies, with regard to
risk-adjusted return metrics on the out-of-sample data.
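A directional loss of this kind can be sketched as below. The abstract does not state the formula, so the form used here is an assumption: each observation contributes minus the absolute realized return when the forecast has the right sign, and plus the absolute realized return when it has the wrong sign. The function name is hypothetical.

```python
import numpy as np

def mean_absolute_directional_loss(actual_returns, predicted_returns):
    """Assumed form: mean of -sign(R_i * R_hat_i) * |R_i|.

    Lower is better: correct directional calls on large moves are
    rewarded, wrong calls on large moves are penalized.
    """
    actual = np.asarray(actual_returns, dtype=float)
    predicted = np.asarray(predicted_returns, dtype=float)
    return float(np.mean(-np.sign(actual * predicted) * np.abs(actual)))

actual = [0.02, -0.01]
loss_right = mean_absolute_directional_loss(actual, [0.5, -0.3])   # signs match
loss_wrong = mean_absolute_directional_loss(actual, [-0.5, 0.3])   # signs flipped
```

Unlike a squared-error loss, this quantity ignores the magnitude of the forecast and depends only on its sign, which is what a buy/sell signal consumes.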
[COMMENTS]
12 pages, 6 figures
[LINK]
http://arxiv.org/abs/2309.10546v1
[DATE]
2023-09-19 19:52:13+08:00
[CATEGORIES]
cs.LG
Single-Image based unsupervised joint segmentation and denoising
[AUTHORS]
Nadja Gruber, Johannes Schwab, Noémie Debroux, Nicolas Papadakis, Markus Haltmeier
[ABSTRACT]
In this work, we develop an unsupervised method for the joint segmentation
and denoising of a single image. To this end, we combine the advantages of a
variational segmentation method with the power of a self-supervised,
single-image based deep learning approach. One major strength of our method
lies in the fact that, in contrast to data-driven methods, where huge amounts
of labeled samples are necessary, our model can segment an image into multiple
meaningful regions without any training database. Further, we introduce a novel
energy functional in which denoising and segmentation are coupled in a way that
both tasks benefit from each other. The limitations of existing single-image
based variational segmentation methods, which are not capable of dealing with
high noise or generic texture, are tackled by this specific combination with
self-supervised image denoising. We propose a unified optimisation strategy and
show that, especially for very noisy images available in microscopy, our
proposed joint approach outperforms its sequential counterpart as well as
alternative methods focused purely on denoising or segmentation. Another
comparison is conducted with a supervised deep learning approach designed for
the same application, highlighting the good performance of our approach.
[LINK]
http://arxiv.org/abs/2309.10511v1
[DATE]
2023-09-19 18:47:32+08:00
[CATEGORIES]
cs.LG
A Configurable Library for Generating and Manipulating Maze Datasets
[AUTHORS]
Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn, Samy Wu Fung
[ABSTRACT]
Understanding how machine learning models respond to distributional shifts is
a key research challenge. Mazes serve as an excellent testbed due to varied
generation algorithms offering a nuanced platform to simulate both subtle and
pronounced distributional shifts. To enable systematic investigations of model
behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a
comprehensive library for generating, processing, and visualizing datasets
consisting of maze-solving tasks. With this library, researchers can easily
create datasets, having extensive control over the generation algorithm used,
the parameters fed to the algorithm of choice, and the filters that generated
mazes must satisfy. Furthermore, it supports multiple output formats, including
rasterized and text-based, catering to convolutional neural networks and
autoregressive transformer models. These formats, along with tools for
visualizing and converting between them, ensure versatility and adaptability in
research applications.
[COMMENTS]
9 pages, 5 figures, 1 table. Corresponding author: Michael Ivanitskiy.
Code available at
https://github.com/understanding-search/maze-dataset
[LINK]
http://arxiv.org/abs/2309.10498v1
[DATE]
2023-09-19 18:20:11+08:00
[CATEGORIES]
cs.LG
Elliptic PDE learning is provably data-efficient
[AUTHORS]
Nicolas Boullé, Diana Halikias, Alex Townsend
[ABSTRACT]
PDE learning is an emerging field that combines physics and machine learning
to recover unknown physical systems from experimental data. While deep learning
models traditionally require copious amounts of training data, recent PDE
learning techniques achieve spectacular results with limited data availability.
Still, these results are empirical. Our work provides theoretical guarantees on
the number of input-output training pairs required in PDE learning.
Specifically, we exploit randomized numerical linear algebra and PDE theory to
derive a provably data-efficient algorithm that recovers solution operators of
3D uniformly elliptic PDEs from input-output data and achieves an exponential
convergence rate of the error with respect to the size of the training dataset
with an exceptionally high probability of success.
[COMMENTS]
25 pages, 2 figures
[LINK]
http://arxiv.org/abs/2302.12888v2
[DATE]
2023-09-19 17:35:41+08:00
[CATEGORIES]
cs.LG
Ad-load Balancing via Off-policy Learning in a Content Marketplace
[AUTHORS]
Hitesh Sagtani, Madan Jhawar, Rishabh Mehrotra, Olivier Jeunen
[ABSTRACT]
Ad-load balancing is a critical challenge in online advertising systems,
particularly in the context of social media platforms, where the goal is to
maximize user engagement and revenue while maintaining a satisfactory user
experience. This requires the optimization of conflicting objectives, such as
user satisfaction and ads revenue. Traditional approaches to ad-load balancing
rely on static allocation policies, which fail to adapt to changing user
preferences and contextual factors. In this paper, we present an approach that
leverages off-policy learning and evaluation from logged bandit feedback. We
start by presenting a motivating analysis of the ad-load balancing problem,
highlighting the conflicting objectives between user satisfaction and ads
revenue. We emphasize the nuances that arise due to user heterogeneity and the
dependence on the user’s position within a session. Based on this analysis, we
define the problem as determining the optimal ad-load for a particular feed
fetch. To tackle this problem, we propose an off-policy learning framework that
leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and
Doubly Robust (DR) to learn and estimate the policy values using offline
collected stochastic data. We present insights from online A/B experiments
deployed at scale across over 80 million users generating over 200 million
sessions, where we find statistically significant improvements in both user
satisfaction metrics and ads revenue for the platform.
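The two estimators named above have standard textbook forms, sketched here for logged bandit data. This is a generic illustration under stated assumptions, not the paper's pipeline: the function names, argument names, and the toy numbers are all hypothetical, and the logging propensities are assumed to be known and non-zero.

```python
import numpy as np

def ips_value(rewards, target_probs, logging_probs):
    """Inverse Propensity Scoring estimate of the target policy's value."""
    r = np.asarray(rewards, dtype=float)
    # Importance weight: probability of the logged action under the
    # target policy divided by its probability under the logging policy.
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(w * r))

def dr_value(rewards, target_probs, logging_probs, q_logged, q_target):
    """Doubly Robust estimate.

    q_logged: reward-model prediction for the logged action.
    q_target: reward-model prediction averaged over the target policy.
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    q_log = np.asarray(q_logged, dtype=float)
    q_tgt = np.asarray(q_target, dtype=float)
    # Model-based baseline plus an importance-weighted residual correction.
    return float(np.mean(q_tgt + w * (r - q_log)))

rewards = [1.0, 0.0, 1.0, 1.0]
target = [0.5, 0.5, 0.25, 0.5]
logging = [0.25, 0.5, 0.5, 0.5]
v_ips = ips_value(rewards, target, logging)
v_dr_no_model = dr_value(rewards, target, logging, np.zeros(4), np.zeros(4))
```

With an all-zero reward model the Doubly Robust estimate reduces to IPS. A good reward model shrinks the residuals and therefore the variance.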
[COMMENTS]
Presented at the CONSEQUENCES’23 workshop at RecSys ’23
[LINK]
http://arxiv.org/abs/2309.11518v1
[DATE]
2023-09-19 17:17:07+08:00
[CATEGORIES]
cs.LG
Long-term drought prediction using deep neural networks based on geospatial weather data
[AUTHORS]
Vsevolod Grabar, Alexander Marusov, Yury Maximov, Nazar Sotiriadi, Alexander Bulkin, Alexey Zaytsev
[ABSTRACT]
The accurate prediction of drought probability in specific regions is crucial
for informed decision-making in agricultural practices. It is important to make
predictions one year in advance, particularly for long-term decisions. However,
forecasting this probability presents challenges due to the complex interplay
of various factors within the region of interest and neighboring areas. In this
study, we propose an end-to-end solution to address this issue based on various
spatiotemporal neural networks. The models considered focus on predicting the
drought intensity based on the Palmer Drought Severity Index (PDSI) for
subregions of interest, leveraging intrinsic factors and insights from climate
models to enhance drought predictions.
Comparative evaluations demonstrate the superior accuracy of Convolutional
LSTM (ConvLSTM) and transformer models compared to baseline gradient boosting
and logistic regression solutions. The two former models achieved impressive
ROC AUC scores from 0.90 to 0.70 for forecast horizons from one to six months,
outperforming baseline models. The transformer showed superiority for shorter
horizons, while ConvLSTM did so for longer horizons. Thus, we recommend
selecting the models accordingly for long-term drought forecasting.
To ensure the broad applicability of the considered models, we conduct
extensive validation across regions worldwide, considering different
environmental conditions. We also run several ablation and sensitivity studies
to challenge our findings and provide additional information on how to solve
the problem.
[LINK]
http://arxiv.org/abs/2309.06212v2
[DATE]
2023-09-19 16:54:23+08:00
[CATEGORIES]
cs.LG
Graph Neural Networks for Dynamic Modeling of Roller Bearing
[AUTHORS]
Vinay Sharma, Jens Ravesloot, Cees Taal, Olga Fink
[ABSTRACT]
In the presented work, we propose to apply the framework of graph neural
networks (GNNs) to predict the dynamics of a rolling element bearing. This
approach offers generalizability and interpretability, having the potential for
scalable use in real-time operational digital twin systems for monitoring the
health state of rotating machines. By representing the bearing’s components as
nodes in a graph, the GNN can effectively model the complex relationships and
interactions among them. We utilize a dynamic spring-mass-damper model of a
bearing to generate the training data for the GNN. In this model, discrete
masses represent bearing components such as rolling elements, inner raceways,
and outer raceways, while a Hertzian contact model is employed to calculate the
forces between these components.
We evaluate the learning and generalization capabilities of the proposed GNN
framework by testing different bearing configurations that deviate from the
training configurations. Through this approach, we demonstrate the
effectiveness of the GNN-based method in accurately predicting the dynamics of
rolling element bearings, highlighting its potential for real-time health
monitoring of rotating machinery.
[LINK]
http://arxiv.org/abs/2309.10418v1
[DATE]
2023-09-19 16:30:10+08:00
[CATEGORIES]
cs.LG
Extended Graph Assessment Metrics for Graph Neural Networks
[AUTHORS]
Tamara T. Mueller, Sophie Starck, Leonhard F. Feiner, Kyriaki-Margarita Bintsi, Daniel Rueckert, Georgios Kaissis
[ABSTRACT]
When re-structuring patient cohorts into so-called population graphs,
initially independent data points can be incorporated into one interconnected
graph structure. This population graph can then be used for medical downstream
tasks using graph neural networks (GNNs). The construction of a suitable graph
structure is a challenging step in the learning pipeline that can have severe
impact on model performance. To this end, different graph assessment metrics
have been introduced to evaluate graph structures. However, these metrics are
limited to classification tasks and discrete adjacency matrices, only covering
a small subset of real-world applications. In this work, we introduce extended
graph assessment metrics (GAMs) for regression tasks and continuous adjacency
matrices. We focus on two GAMs in particular: \textit{homophily} and
\textit{cross-class neighbourhood similarity} (CCNS). We extend the notion of
GAMs to more than one hop, define homophily for regression tasks, as well as
continuous adjacency matrices, and propose a light-weight CCNS distance for
discrete and continuous adjacency matrices. We show the correlation of these
metrics with model performance on different medical population graphs and under
different learning settings.
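For orientation, the classic edge homophily ratio that such metrics start from can be sketched as follows. This is the standard one-hop form for classification with a discrete adjacency, i.e. the fraction of edges joining same-label nodes; it is not the extended regression or continuous-adjacency version the abstract proposes. The function name `edge_homophily` and the toy graph are illustrative assumptions.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy population graph: 4 nodes, 2 classes (hypothetical data).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratio = edge_homophily(edges, labels)
```

A ratio near 1 means neighbours mostly share a label, while a ratio near 0 indicates a heterophilic graph.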
[LINK]
http://arxiv.org/abs/2307.10112v2
[DATE]
2023-09-19 16:29:02+08:00
[CATEGORIES]
cs.LG
A Variational Auto-Encoder Enabled Multi-Band Channel Prediction Scheme for Indoor Localization
[AUTHORS]
Ruihao Yuan, Kaixuan Huang, Pan Yang, Shunqing Zhang
[LINK]
http://arxiv.org/abs/2309.12200v1
[DATE]
2023-09-19 16:19:34+08:00
[CATEGORIES]
cs.LG
Unsupervised Learning via Network-Aware Embeddings
[AUTHORS]
Anne Sophie Riis Damstrup, Sofie Tosti Madsen, Michele Coscia
[ABSTRACT]
Data clustering, the task of grouping observations according to their
similarity, is a key component of unsupervised learning – with real world
applications in diverse fields such as biology, medicine, and social science.
Often in these fields the data comes with complex interdependencies between the
dimensions of analysis, for instance the various characteristics and opinions
people can have live on a complex social network. Current clustering methods
are ill-suited to tackle this complexity: deep learning can approximate these
dependencies, but not take their explicit map as the input of the analysis. In
this paper, we aim at fixing this blind spot in the unsupervised learning
literature. We can create network-aware embeddings by estimating the network
distance between numeric node attributes via the generalized Euclidean
distance. Differently from all methods in the literature that we know of, we do
not cluster the nodes of the network, but rather its node attributes. In our
experiments we show that having these network embeddings is always beneficial
for the learning task; that our method scales to large networks; and that we
can actually provide actionable insights in applications in a variety of fields
such as marketing, economics, and political science. Our method is fully open
source and data and code are available to reproduce all results in the paper.
[LINK]
http://arxiv.org/abs/2309.10408v1
[DATE]
2023-09-19 16:17:48+08:00
[CATEGORIES]
cs.LG
Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis
[AUTHORS]
Wenxuan Ye, Chendi Qian, Xueli An, Xueqiang Yan, Georg Carle
[ABSTRACT]
Integrating native AI support into the network architecture is an essential
objective of 6G. Federated Learning (FL) emerges as a potential paradigm,
facilitating decentralized AI model training across a diverse range of devices
under the coordination of a central server. However, several challenges hinder
its wide application in the 6G context, such as malicious attacks and privacy
snooping on local model updates, and centralization pitfalls. This work
proposes a trusted architecture for supporting FL, which utilizes Distributed
Ledger Technology (DLT) and Graph Neural Network (GNN), including three key
features. First, a pre-processing layer employing homomorphic encryption is
incorporated to securely aggregate local models, preserving the privacy of
individual models. Second, given the distributed nature and graph structure
between clients and nodes in the pre-processing layer, GNN is leveraged to
identify abnormal local models, enhancing system security. Third, DLT is
utilized to decentralize the system by selecting one of the candidates to
perform the central server’s functions. Additionally, DLT ensures reliable data
management by recording data exchanges in an immutable and transparent ledger.
The feasibility of the novel architecture is validated through simulations,
demonstrating improved performance in anomalous model detection and global
model accuracy compared to relevant baselines.
[LINK]
http://arxiv.org/abs/2309.05525v2
[DATE]
2023-09-19 15:52:52+08:00
[CATEGORIES]
cs.LG
Neural networks trained on synthetically generated crystals can extract structural information from ICSD powder X-ray diffractograms
[AUTHORS]
Henrik Schopmans, Patrick Reiser, Pascal Friederich
[ABSTRACT]
Machine learning techniques have successfully been used to extract structural
information such as the crystal space group from powder X-ray diffractograms.
However, training directly on simulated diffractograms from databases such as
the ICSD is challenging due to its limited size, class-inhomogeneity, and bias
toward certain structure types. We propose an alternative approach of
generating synthetic crystals with random coordinates by using the symmetry
operations of each space group. Based on this approach, we demonstrate online
training of deep ResNet-like models on up to a few million unique on-the-fly
generated synthetic diffractograms per hour. For our chosen task of space group
classification, we achieved a test accuracy of 79.9% on unseen ICSD structure
types from most space groups. This surpasses the 56.1% accuracy of the current
state-of-the-art approach of training on ICSD crystals directly. Our results
demonstrate that synthetically generated crystals can be used to extract
structural information from ICSD powder diffractograms, which makes it possible
to apply very large state-of-the-art machine learning models in the area of
powder X-ray diffraction. We further show first steps toward applying our
methodology to experimental data, where automated XRD data analysis is crucial,
especially in high-throughput settings. While we focused on the prediction of
the space group, our approach has the potential to be extended to related tasks
in the future.
[LINK]
http://arxiv.org/abs/2303.11699v3
[DATE]
2023-09-19 15:50:08+08:00
[CATEGORIES]
cs.LG
Community Detection Using Revised Medoid-Shift Based on KNN
[AUTHORS]
Jie Hou, Jiakang Li, Xiaokang Peng, Wei Ke, Yonggang Lu
[ABSTRACT]
Community detection has become an important problem with the boom of social
networks. The Medoid-Shift algorithm preserves the benefits of Mean-Shift and
can be applied to problems based on distance matrix, such as community
detection. One drawback of the Medoid-Shift algorithm is that there may be no
data points within the neighborhood region defined by a distance parameter. To
deal with the community detection problem better, this work proposes a new
algorithm called Revised Medoid-Shift (RMS). During the process of
finding the next medoid, the RMS algorithm is based on a neighborhood defined
by KNN, while the original Medoid-Shift is based on a neighborhood defined by a
distance parameter. Since the neighborhood defined by KNN is more stable than
the one defined by the distance parameter in terms of the number of data points
within the neighborhood, the RMS algorithm may converge more smoothly. In the
RMS method, each of the data points is shifted towards a medoid within the
neighborhood defined by KNN. After the iterative process of shifting, each of
the data points converges to a cluster center, and the data points converging
to the same center are grouped into the same cluster. The RMS algorithm is
tested on two kinds of datasets: community datasets with known ground
truth partition and community datasets without ground truth partition.
The experimental results show that the proposed RMS algorithm
generally produces better results than Medoid-Shift, some state-of-the-art
algorithms, and most classic community detection algorithms on different kinds of
community detection datasets.
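The shifting procedure described above can be illustrated with a simplified sketch. It assumes an unweighted medoid and a precomputed distance matrix, and it is not the authors' implementation: each point moves to the medoid of its k nearest neighbours (itself included), and points whose shift chains end at the same place share a cluster. The name `knn_medoid_shift` and the toy data are hypothetical.

```python
def knn_medoid_shift(dist, k):
    """Cluster points from a distance matrix by repeated KNN medoid shifts."""
    n = len(dist)

    def shift(i):
        # k nearest neighbours of i (including i), ties broken by index.
        nbrs = sorted(range(n), key=lambda j: (dist[i][j], j))[:k]
        # Medoid: the neighbour with the smallest total distance to the others.
        return min(nbrs, key=lambda c: (sum(dist[c][j] for j in nbrs), c))

    target = [shift(i) for i in range(n)]

    labels = []
    for i in range(n):
        path = []
        cur = i
        while cur not in path:
            path.append(cur)
            cur = target[cur]
        # cur starts the terminal cycle (length 1 for a fixed point);
        # the smallest index in that cycle serves as the cluster label.
        cycle = path[path.index(cur):]
        labels.append(min(cycle))
    return labels


# Two well-separated groups on a line (hypothetical data).
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
labels = knn_medoid_shift(dist, k=3)
```

Because the neighbourhood always holds exactly k points, it is never empty, which is the stability property the abstract attributes to the KNN-based neighbourhood.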
[LINK]
http://arxiv.org/abs/2304.09512v2
[DATE]
2023-09-19 15:37:49+08:00
[CATEGORIES]
cs.LG
Generalized Inversion of Nonlinear Operators
[AUTHORS]
Eyal Gofer, Guy Gilboa
[ABSTRACT]
Inversion of operators is a fundamental concept in data processing. Inversion
of linear operators is well studied, supported by established theory. When an
inverse either does not exist or is not unique, generalized inverses are used.
Most notable is the Moore-Penrose inverse, widely used in physics, statistics,
and various fields of engineering. This work investigates generalized inversion
of nonlinear operators.
We first address broadly the desired properties of generalized inverses,
guided by the Moore-Penrose axioms. We define the notion for general sets, and
then a refinement, termed pseudo-inverse, for normed spaces. We present
conditions for existence and uniqueness of a pseudo-inverse and establish
theoretical results investigating its properties, such as continuity, its value
for operator compositions and projection operators, and others. Analytic
expressions are given for the pseudo-inverse of some well-known,
non-invertible, nonlinear operators, such as hard- or soft-thresholding and
ReLU. We analyze a neural layer and discuss relations to wavelet thresholding.
Next, the Drazin inverse, and a relaxation, are investigated for operators
with equal domain and range. We present scenarios where inversion is
expressible as a linear combination of forward applications of the operator.
Such scenarios arise for classes of nonlinear operators with vanishing
polynomials, similar to the minimal or characteristic polynomials for matrices.
Inversion using forward applications may facilitate the development of new
efficient algorithms for approximating generalized inversion of complex
nonlinear operators.
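For reference, the four Moore-Penrose conditions mentioned above are given here in their standard linear-algebra form, for a matrix $A$ and a candidate inverse $X = A^{\dagger}$. The nonlinear generalization studied in the paper is not reproduced here.

```latex
\begin{aligned}
A X A &= A, \\
X A X &= X, \\
(A X)^{*} &= A X, \\
(X A)^{*} &= X A.
\end{aligned}
```

For linear operators these four conditions determine $X$ uniquely, which is what makes them a natural starting point for a nonlinear analogue.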
[COMMENTS]
A significant extension of the SSVM 2023 conference paper (see also
v2 here), in particular, new sections 7–9
[LINK]
http://arxiv.org/abs/2111.10755v3
[DATE]
2023-09-19 15:25:51+08:00
[CATEGORIES]
cs.LG
Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization
[AUTHORS]
Thomas Chen, Patricia Muñoz Ewald
[ABSTRACT]
In this paper, we provide a geometric interpretation of the structure of
shallow neural networks characterized by one hidden layer, a ramp activation
function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost
function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq
M$, and training input sample size $N>QM$. We prove an upper bound on the
minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures
the signal to noise ratio of training inputs. We obtain an approximate
optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of
training input vectors belonging to the same output vector $y_j$,
$j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact
degenerate local minimum of the cost function; the sharp value differs from the
upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The
proof of the upper bound yields a constructively trained network; we show that
it metrizes the $Q$-dimensional subspace in the input space ${\mathbb R}^M$
spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the
characterization of the global minimum of the cost function in the given
context.
[COMMENTS]
AMS Latex, 29 pages
[LINK]
http://arxiv.org/abs/2309.10370v1
[DATE]
2023-09-19 15:12:41+08:00
[CATEGORIES]
cs.LG
Toward efficient resource utilization at edge nodes in federated learning
[AUTHORS]
Sadi Alawadi, Addi Ait-Mlouk, Salman Toor, Andreas Hellander
[ABSTRACT]
Federated learning (FL) enables edge nodes to collaboratively contribute to
constructing a global model without sharing their data. This is accomplished by
devices computing local, private model updates that are then aggregated by a
server. However, computational resource constraints and network communication
can become a severe bottleneck for larger model sizes typical for deep learning
applications. Edge nodes tend to have limited hardware resources (RAM, CPU),
and the network bandwidth and reliability at the edge is a concern for scaling
federated fleet applications. In this paper, we propose and evaluate a FL
strategy inspired by transfer learning in order to reduce resource utilization
on devices, as well as the load on the server and network in each global
training round. For each local model update, we randomly select layers to
train, freezing the remaining part of the model. In doing so, we can reduce
both server load and communication costs per round by excluding all untrained
layer weights from being transferred to the server. The goal of this study is
to empirically explore the potential trade-off between resource utilization on
devices and global model convergence under the proposed strategy. We implement
the approach using the federated learning framework FEDn. A number of
experiments were carried out over different datasets (CIFAR-10, CASA, and
IMDB), performing different tasks using different deep-learning model
architectures. Our results show that training the model partially can
accelerate the training process, efficiently utilize resources on-device, and
reduce the data transmission by around 75% and 53% when we train 25% and 50%
of the model layers, respectively, without harming the resulting global model
accuracy.
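The per-round strategy described above (randomly pick layers to train, freeze the rest, and send only the trained layers) can be sketched in plain Python. This is an illustration under stated assumptions and does not use the FEDn API: the helper names `select_trainable_layers` and `build_update`, the layer names, and the uniform layer sizes are all hypothetical.

```python
import random


def select_trainable_layers(layer_names, fraction, rng):
    """Pick a random subset of layers to train; the rest stay frozen."""
    count = max(1, round(fraction * len(layer_names)))
    return set(rng.sample(layer_names, count))


def build_update(weights, trainable):
    """Keep only the trained layers in the payload sent to the server."""
    return {name: w for name, w in weights.items() if name in trainable}


rng = random.Random(0)
# Hypothetical model: 8 layers with 10 parameters each.
weights = {f"layer{i}": [0.0] * 10 for i in range(8)}
trainable = select_trainable_layers(list(weights), 0.25, rng)
update = build_update(weights, trainable)
```

In a real model the layers differ in size, so the saving in transmitted data depends on which layers are drawn in a given round.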
[COMMENTS]
16 pages, 5 tables, 8 figures
[LINK]
http://arxiv.org/abs/2309.10367v1
[DATE]
2023-09-19 15:04:50+08:00
[CATEGORIES]
cs.LG
Testable Likelihoods for Beyond-the-Standard Model Fits
[AUTHORS]
Anja Beck, Méril Reboud, Danny van Dyk
[ABSTRACT]
Studying potential BSM effects at the precision frontier requires accurate
transfer of information from low-energy measurements to high-energy BSM models.
We propose to use normalising flows to construct likelihood functions that
achieve this transfer. Likelihood functions constructed in this way provide the
means to generate additional samples and admit a “trivial” goodness-of-fit
test in the form of a $\chi^2$ test statistic. Here, we study a particular form of
normalising flow, apply it to a multi-modal and non-Gaussian example, and
quantify the accuracy of the likelihood function and its test statistic.
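One way such a statistic can arise is sketched below, stated as an assumption rather than as the paper's construction. If the flow $f$ maps a $d$-dimensional observation $x$ to a latent $z = f(x)$ whose base distribution is a standard normal, then for data drawn from the learned density the squared latent norm follows a $\chi^2$ distribution with $d$ degrees of freedom.

```latex
z = f(x), \qquad z \sim \mathcal{N}(0, I_d)
\quad\Longrightarrow\quad
\chi^2 \equiv \lVert z \rVert^2 = \sum_{i=1}^{d} z_i^2 \;\sim\; \chi^2_d .
```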
[COMMENTS]
11 pages, 7 figures
[LINK]
http://arxiv.org/abs/2309.10365v1
[DATE]
2023-09-19 15:03:41+08:00
[CATEGORIES]
cs.LG
Improving CLIP Robustness with Knowledge Distillation and Self-Training
[AUTHORS]
Clement Laroudie, Andrei Bursuc, Mai Lan Ha, Gianni Franchi
[ABSTRACT]
This paper examines the robustness of a multi-modal computer vision model,
CLIP (Contrastive Language-Image Pretraining), in the context of unsupervised
learning. The main objective is twofold: first, to evaluate the robustness of
CLIP, and second, to explore strategies for augmenting its robustness. To
achieve this, we introduce a novel approach named LP-CLIP. This technique
involves the distillation of CLIP features through the incorporation of a
linear probing layer positioned atop its encoding structure. This newly added
layer is trained utilizing pseudo-labels produced by CLIP, coupled with a
self-training strategy. The LP-CLIP technique offers a promising approach to
enhance the robustness of CLIP without the need for annotations. By leveraging
a simple linear probing layer, we aim to improve the model’s ability to
withstand various uncertainties and challenges commonly encountered in
real-world scenarios. Importantly, our approach does not rely on annotated
data, which makes it particularly valuable in situations where labeled data
might be scarce or costly to obtain. Our proposed approach increases the
robustness of CLIP with SOTA results compared to supervised techniques on
various datasets.
[LINK]
http://arxiv.org/abs/2309.10361v1
[DATE]
2023-09-19 14:43:31+08:00
[CATEGORIES]
cs.LG
Language Guided Adversarial Purification
[AUTHORS]
Himanshu Singh, A V Subramanyam
[ABSTRACT]
Adversarial purification using generative models demonstrates strong
adversarial defense performance. These methods are classifier and
attack-agnostic, making them versatile but often computationally intensive.
Recent strides in diffusion and score networks have improved image generation
and, by extension, adversarial purification. Another highly efficient class of
adversarial defense methods known as adversarial training requires specific
knowledge of attack vectors, forcing them to be trained extensively on
adversarial examples. To overcome these limitations, we introduce a new
framework, namely Language Guided Adversarial Purification (LGAP), utilizing
pre-trained diffusion models and caption generators to defend against
adversarial attacks. Given an input image, our method first generates a
caption, which is then used to guide the adversarial purification process
through a diffusion network. Our approach has been evaluated against strong
adversarial attacks, proving its effectiveness in enhancing adversarial
robustness. Our results indicate that LGAP outperforms most existing
adversarial defense techniques without requiring specialized network training.
This underscores the generalizability of models trained on large datasets,
highlighting a promising direction for further research.
[LINK]
http://arxiv.org/abs/2309.10348v1
[DATE]
2023-09-19 14:17:18+08:00
[CATEGORIES]
cs.LG
Striking a Balance: An Optimal Mechanism Design for Heterogenous Differentially Private Data Acquisition for Logistic Regression
[AUTHORS]
Ameya Anjarlekar, Rasoul Etesami, R. Srikant
[ABSTRACT]
We investigate the problem of performing logistic regression on data
collected from privacy-sensitive sellers. Since the data is private, sellers
must be incentivized through payments to provide their data. Thus, the goal is
to design a mechanism that optimizes a weighted combination of test loss,
seller privacy, and payment, i.e., strikes a balance between multiple
objectives of interest. We solve the problem by combining ideas from game
theory, statistical learning theory, and differential privacy. The buyer’s
objective function can be highly non-convex. However, we show that, under
certain conditions on the problem parameters, the problem can be convexified by
using a change of variables. We also provide asymptotic results characterizing
the buyer’s test error and payments when the number of sellers becomes large.
Finally, we demonstrate our ideas by applying them to a real healthcare data
set.
[LINK]
http://arxiv.org/abs/2309.10340v1
[DATE]
2023-09-19 13:51:13+08:00
[CATEGORIES]
cs.LG
FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction
[AUTHORS]
Viorica Chifu, Tudor Cioara, Cristian Anitiei, Cristina Pop, Ionut Anghel
[ABSTRACT]
Privacy is important when dealing with sensitive personal information in
machine learning models, which require large data sets for training. In the
energy field, access to household prosumer energy data is crucial for energy
predictions to support energy grid management and large-scale adoption of
renewables; however, citizens are often hesitant to grant access to cloud-based
machine learning models. Federated learning has been proposed as a solution to
privacy challenges; however, issues are reported in generating the global prediction
model due to data heterogeneity, variations in generation patterns, and the
high number of parameters, leading to even lower prediction accuracy. This paper
addresses these challenges by introducing FedWOA, a novel federated learning
model that employs the Whale Optimization Algorithm to aggregate global
prediction models from the weights of local LSTM neural network models trained
on prosumer energy data. The proposed solution identifies the optimal vector of
weights in the search spaces of the local models to construct the global shared
model, which is subsequently transmitted to the local nodes to improve the
prediction quality at the prosumer site. To handle non-IID data, K-Means
was used to cluster prosumers with a similar scale of energy data. The
evaluation results on prosumer energy data show that FedWOA can
effectively enhance the accuracy of energy prediction models by 25%
for MSE and 16% for MAE compared to FedAVG while demonstrating good convergence
and reduced loss.
[LINK]
http://arxiv.org/abs/2309.10337v1
[DATE]
2023-09-19 13:44:18+08:00
[CATEGORIES]
cs.LG
Computational Approaches for App-to-App Retrieval and Design Consistency Check
[AUTHORS]
Seokhyeon Park, Wonjae Kim, Young-Ho Kim, Jinwook Seo
[COMMENTS]
AI & HCI Workshop at the ICML 2023
[LINK]
http://arxiv.org/abs/2309.10328v1
[DATE]
2023-09-19 13:21:22+08:00
[CATEGORIES]
cs.LG
Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms
[AUTHORS]
Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu
[ABSTRACT]
Domain adaptation (DA) is a statistical learning problem that arises when the
distribution of the source data used to train a model differs from that of the
target data used to evaluate the model. While many DA algorithms have
demonstrated considerable empirical success, blindly applying these algorithms
can often lead to worse performance on new datasets. To address this, it is
crucial to clarify the assumptions under which a DA algorithm has good target
performance. In this work, we focus on the assumption of the presence of
conditionally invariant components (CICs), which are relevant for prediction
and remain conditionally invariant across the source and target data. We
demonstrate that CICs, which can be estimated through conditional invariant
penalty (CIP), play three prominent roles in providing target risk guarantees
in DA. First, we propose a new algorithm based on CICs, importance-weighted
conditional invariant penalty (IW-CIP), which has target risk guarantees beyond
simple settings such as covariate shift and label shift. Second, we show that
CICs help identify large discrepancies between source and target risks of other
DA algorithms. Finally, we demonstrate that incorporating CICs into the domain
invariant projection (DIP) algorithm can address its failure scenario caused by
label-flipping features. We support our new algorithms and theoretical findings
via numerical experiments on synthetic data, MNIST, CelebA, and Camelyon17
datasets.
[LINK]
http://arxiv.org/abs/2309.10301v1
[DATE]
2023-09-19 12:04:59+08:00
[CATEGORIES]
cs.LG
Learning from Demonstration via Probabilistic Diagrammatic Teaching
[AUTHORS]
Weiming Zhi, Tianyi Zhang, Matthew Johnson-Roberson
[ABSTRACT]
Learning from Demonstration (LfD) enables robots to acquire new skills by
imitating expert demonstrations, allowing users to communicate their
instructions in an intuitive manner. Recent progress in LfD often relies on
kinesthetic teaching or teleoperation as the medium for users to specify the
demonstrations. Kinesthetic teaching requires physical handling of the robot,
while teleoperation demands proficiency with additional hardware. This paper
introduces an alternative paradigm for LfD called Diagrammatic Teaching.
Diagrammatic Teaching aims to teach robots novel skills by prompting the user
to sketch out demonstration trajectories on 2D images of the scene; these are
then synthesised as a generative model of motion trajectories in 3D task space.
Additionally, we present the Ray-tracing Probabilistic Trajectory Learning
(RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying
probability densities from the 2D sketches, applies ray-tracing to find
corresponding regions in 3D Cartesian space, and fits a probabilistic model of
motion trajectories to these regions. New motion trajectories, which mimic
those sketched by the user, can then be generated from the probabilistic model.
We empirically validate our framework both in simulation and on real robots,
which include a fixed-base manipulator and a quadruped-mounted manipulator.
[LINK]
http://arxiv.org/abs/2309.03835v2
[DATE]
2023-09-19 12:04:17+08:00
[CATEGORIES]
cs.LG
Learning Orbitally Stable Systems for Diagrammatically Teaching
[AUTHORS]
Weiming Zhi, Kangni Liu, Tianyi Zhang, Matthew Johnson-Roberson
[ABSTRACT]
Diagrammatic Teaching is a paradigm for robots to acquire novel skills,
whereby the user provides 2D sketches over images of the scene to shape the
robot’s motion. In this work, we tackle the problem of teaching a robot to
approach a surface and then follow cyclic motion on it, where the cycle of the
motion can be arbitrarily specified by a single user-provided sketch over an
image from the robot’s camera. Accordingly, we introduce the \emph{Stable
Diffeomorphic Diagrammatic Teaching} (SDDT) framework. SDDT models the robot’s
motion as an \emph{Orbitally Asymptotically Stable} (O.A.S.) dynamical system
that learns to follow the user-specified sketch. This is achieved by applying a
\emph{diffeomorphism}, i.e. a differentiable and invertible function, to morph
a known O.A.S. system. The parameterised diffeomorphism is then optimised with
respect to the Hausdorff distance between the limit cycle of our modelled
system and the sketch, to produce the desired robot motion. We provide
theoretical insight into the behaviour of the optimised system and also
empirically evaluate SDDT, both in simulation and on a quadruped with a mounted
6-DOF manipulator. Results show that we can diagrammatically teach complex
cyclic motion patterns with a high degree of accuracy.
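The Hausdorff distance used as the fitting objective above has a standard definition for finite point sets, sketched here. The sampled points are hypothetical, and the sketch covers only the distance itself, not the diffeomorphism or its optimisation.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Points sampled from a user sketch and from a model limit cycle (hypothetical).
sketch = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
cycle = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
d = hausdorff(sketch, cycle)
```

The distance is driven by the single worst-matched point, so it penalises any part of the limit cycle that strays far from the sketch.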
[LINK]
http://arxiv.org/abs/2309.10298v1
[DATE]
2023-09-19 12:03:42+08:00
[CATEGORIES]
cs.LG
Revisiting Generalized p-Laplacian Regularized Framelet GCNs: Convergence, Energy Dynamic and Training with Non-Linear Diffusion
[AUTHORS]
Dai Shi, Zhiqi Shao, Yi Guo, Qibin Zhao, Junbin Gao
[ABSTRACT]
This paper presents a comprehensive theoretical analysis of the graph
p-Laplacian regularized framelet network (pL-UFG) to establish a solid
understanding of its properties. We conduct a convergence analysis on pL-UFG,
addressing the gap in the understanding of its asymptotic behaviors. Further, by
investigating the generalized Dirichlet energy of pL-UFG, we demonstrate that
the Dirichlet energy remains non-zero throughout convergence, ensuring the
avoidance of over-smoothing issues. Additionally, we elucidate the energy
dynamic perspective, highlighting the synergistic relationship between the
implicit layer in pL-UFG and graph framelets. This synergy enhances the model’s
adaptability to both homophilic and heterophilic data. Notably, we reveal that
pL-UFG can be interpreted as a generalized non-linear diffusion process,
thereby bridging the gap between pL-UFG and differential equations on the
graph. Importantly, these multifaceted analyses lead to unified conclusions
that offer novel insights for understanding and implementing pL-UFG, as well as
other graph neural network (GNN) models. Finally, based on our dynamic
analysis, we propose two novel pL-UFG models with manually controlled energy
dynamics. We demonstrate empirically and theoretically that our proposed models
not only inherit the advantages of pL-UFG but also significantly reduce
computational costs for training on large-scale graph datasets.
[LINK]
http://arxiv.org/abs/2305.15639v4
[DATE]
2023-09-19 11:57:06+08:00
[CATEGORIES]
cs.LG
Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression Values
[AUTHORS]
Anchen Sun, Zhibin Chen, Xiaodong Cai
[ABSTRACT]
Several artificial neural networks (ANNs) have been developed recently to
predict the prognosis of different types of cancer based on the tumor
transcriptome. However, they have not demonstrated significantly better
performance than the regularized Cox proportional hazards regression model.
Training an ANN is challenging with a limited number of data samples and a
high-dimensional feature space. Recent advancements in image classification
have shown that contrastive learning (CL) can facilitate further learning tasks
by learning good feature representation from a limited number of data samples.
In this paper, we applied supervised CL to tumor gene expression and clinical
data to learn feature representations in a low-dimensional space. We then used
these learned features to train a Cox model for predicting cancer prognosis.
Using data from The Cancer Genome Atlas (TCGA), we demonstrated that our
CL-based Cox model (CLCox) significantly outperformed existing methods in
predicting the prognosis of 19 types of cancer considered. We also developed
CL-based classifiers to classify tumors into different risk groups and showed
that CL can significantly improve classification accuracy. Specifically, our
CL-based classifiers achieved an area under the receiver operating
characteristic curve (AUC) of greater than 0.8 for 14 types of cancer and
an AUC of greater than 0.9 for 2 types of cancer. CLCox models and CL-based
classifiers trained with TCGA lung cancer and prostate cancer data were
validated with the data of two independent cohorts.
[LINK]
http://arxiv.org/abs/2306.06276v2
[DATE]
2023-09-19 11:52:55+08:00
[CATEGORIES]
cs.LG
Koopman Invertible Autoencoder: Leveraging Forward and Backward Dynamics for Temporal Modeling
[AUTHORS]
Kshitij Tayal, Arvind Renganathan, Rahul Ghosh, Xiaowei Jia, Vipin Kumar
[ABSTRACT]
Accurate long-term predictions are the foundations for many machine learning
applications and decision-making processes. However, building accurate
long-term prediction models remains challenging due to the limitations of
existing temporal models like recurrent neural networks (RNNs), as they capture
only the statistical connections in the training data and may fail to learn the
underlying dynamics of the target system. To tackle this challenge, we propose
a novel machine learning model based on Koopman operator theory, which we call
Koopman Invertible Autoencoders (KIA), that captures the inherent
characteristic of the system by modeling both forward and backward dynamics in
the infinite-dimensional Hilbert space. This enables us to efficiently learn
low-dimensional representations, resulting in more accurate predictions of
long-term system behavior. Moreover, our method’s invertibility design
guarantees reversibility and consistency in both forward and inverse
operations. We illustrate the utility of KIA on pendulum and climate datasets,
demonstrating 300% improvements in long-term prediction capability for pendulum
while maintaining robustness against noise. Additionally, our method excels in
long-term climate prediction, further validating our method’s effectiveness.
[COMMENTS]
Accepted at IEEE International Conference on Data Mining (ICDM) 2023
[LINK]
http://arxiv.org/abs/2309.10291v1
[DATE]
2023-09-19 11:42:55+08:00
[CATEGORIES]
cs.LG
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
[AUTHORS]
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
[ABSTRACT]
Directed Exploration is a crucial challenge in reinforcement learning (RL),
especially when rewards are sparse. Information-directed sampling (IDS), which
optimizes the information ratio, seeks to do so by augmenting regret with
information gain. However, estimating information gain is computationally
intractable or relies on restrictive assumptions which prohibit its use in many
practical instances. In this work, we posit an alternative exploration
incentive in terms of the integral probability metric (IPM) between a current
estimate of the transition model and the unknown optimal, which under suitable
conditions, can be computed in closed form with the kernelized Stein
discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING:
\textbf{STE}in information dir\textbf{E}cted exploration for model-based
\textbf{R}einforcement Learn\textbf{ING}. To enable its derivation, we develop
fundamentally new variants of KSD for discrete conditional distributions. We
further establish that STEERING achieves sublinear Bayesian regret, improving
upon prior learning rates of information-augmented MBRL. Experimentally, we
show that the proposed algorithm is computationally affordable and outperforms
several prior approaches.
[LINK]
http://arxiv.org/abs/2301.12038v2
[DATE]
2023-09-19 11:21:17+08:00
[CATEGORIES]
cs.LG
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
[AUTHORS]
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song
[ABSTRACT]
With the fast growth of parameter size, it becomes increasingly challenging
to deploy large generative models as they typically require large GPU memory
consumption and massive computation. Unstructured model pruning has been a
common approach to reduce both GPU memory footprint and the overall computation
while retaining good model accuracy. However, the existing solutions do not
provide a highly-efficient support for handling unstructured sparsity on modern
GPUs, especially on the highly-structured Tensor Core hardware. Therefore, we
propose Flash-LLM for enabling low-cost and highly-efficient large generative
model inference with the sophisticated support of unstructured sparsity on
high-performance but highly restrictive Tensor Cores. Based on our key
observation that the main bottleneck of generative model inference is the
several skinny matrix multiplications for which Tensor Cores would be
significantly under-utilized due to low computational intensity, we propose a
general Load-as-Sparse and Compute-as-Dense methodology for unstructured sparse
matrix multiplication. The basic insight is to address the significant memory
bandwidth bottleneck while tolerating redundant computations that are not
critical for end-to-end performance on Tensor Cores. Based on this, we design
an effective software framework for Tensor Core based unstructured SpMM,
leveraging on-chip resources for efficient sparse data extraction and
computation/memory-access overlapping. At SpMM kernel level, Flash-LLM
significantly outperforms the state-of-the-art library, i.e., Sputnik and
SparTA by an average of 2.9x and 1.5x, respectively. At end-to-end framework
level on OPT-30B/66B/175B models, for tokens per GPU-second, Flash-LLM achieves
up to 3.8x and 3.6x improvement over DeepSpeed and FasterTransformer,
respectively, with significantly lower inference cost.
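The Load-as-Sparse and Compute-as-Dense idea can be illustrated with a small conceptual sketch. This is not the Flash-LLM CUDA kernel: the `(row, col, value)` storage format, the function names, and the single-tile layout are assumptions made for illustration only. The point it shows is that only the non-zeros are moved from memory, while the multiplication itself runs over a dense tile that includes the zeros.

```python
def compress(dense):
    """Keep only the non-zeros as (row, col, value) triples.

    In this sketch the triples stand in for what is read from slow memory.
    """
    return [(r, c, v)
            for r, row in enumerate(dense)
            for c, v in enumerate(row)
            if v != 0.0]


def load_as_sparse_compute_as_dense(nonzeros, shape, b):
    """Multiply a compressed sparse matrix (of the given shape) by dense matrix b."""
    rows, cols = shape
    # "Load as sparse": rebuild a dense tile from the compressed triples.
    tile = [[0.0] * cols for _ in range(rows)]
    for r, c, v in nonzeros:
        tile[r][c] = v
    # "Compute as dense": an ordinary dense product that also multiplies zeros.
    n = len(b[0])
    return [[sum(tile[i][k] * b[k][j] for k in range(cols)) for j in range(n)]
            for i in range(rows)]


if __name__ == "__main__":
    weights = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]
    activations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    packed = compress(weights)
    print(load_as_sparse_compute_as_dense(packed, (2, 3), activations))
```

The redundant multiplications by zero are the price paid for reading fewer bytes, which is the trade the abstract describes for memory-bound skinny matrix products.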
[COMMENTS]
VLDB 2024
[LINK]
http://arxiv.org/abs/2309.10285v1
[DATE]
2023-09-19 11:20:02+08:00
[CATEGORIES]
cs.LG
FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning
[AUTHORS]
Thanveer Shaik, Xiaohui Tao, Lin Li, Haoran Xie, Taotao Cai, Xiaofeng Zhu, Qing Li
[ABSTRACT]
Machine Unlearning is an emerging field that addresses data privacy issues by
enabling the removal of private or irrelevant data from the Machine Learning
process. Challenges related to privacy and model efficiency arise from the use
of outdated, private, and irrelevant data. These issues compromise both the
accuracy and the computational efficiency of models in both Machine Learning
and Unlearning. To mitigate these challenges, we introduce a novel framework,
Attention-based Machine Unlearning using Federated Reinforcement Learning
(FRAMU). This framework incorporates adaptive learning mechanisms, privacy
preservation techniques, and optimization strategies, making it a well-rounded
solution for handling various data sources, either single-modality or
multi-modality, while maintaining accuracy and privacy. FRAMU’s strength lies
in its adaptability to fluctuating data landscapes, its ability to unlearn
outdated, private, or irrelevant data, and its support for continual model
evolution without compromising privacy. Our experiments, conducted on both
single-modality and multi-modality datasets, revealed that FRAMU significantly
outperformed baseline models. Additional assessments of convergence behavior
and optimization strategies further validate the framework’s utility in
federated learning applications. Overall, FRAMU advances Machine Unlearning by
offering a robust, privacy-preserving solution that optimizes model performance
while also addressing key challenges in dynamic data environments.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2309.10283v1
[DATE]
2023-09-19 11:13:17+08:00
[CATEGORIES]
cs.LG
Diffusion Methods for Generating Transition Paths
[AUTHORS]
Luke Triplett, Jianfeng Lu
[ABSTRACT]
In this work, we seek to simulate rare transitions between metastable states
using score-based generative models. An efficient method for generating
high-quality transition paths is valuable for the study of molecular systems
since data is often difficult to obtain. We develop two novel methods for path
generation in this paper: a chain-based approach and a midpoint-based approach.
The first biases the original dynamics to facilitate transitions, while the
second mirrors splitting techniques and breaks down the original transition
into smaller transitions. Numerical results of generated transition paths for
the Müller potential and for Alanine dipeptide demonstrate the effectiveness
of these approaches in both the data-rich and data-scarce regimes.
[COMMENTS]
14 pages, 8 figures
[LINK]
http://arxiv.org/abs/2309.10276v1
[DATE]
2023-09-19 11:03:03+08:00
[CATEGORIES]
cs.LG
BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition
[AUTHORS]
Shikai Fang, Qingsong Wen, Yingtao Luo, Shandian Zhe, Liang Sun
[ABSTRACT]
In real-world scenarios like traffic and energy, massive time-series data
with missing values and noises are widely observed, even sampled irregularly.
While many imputation methods have been proposed, most of them work with a
local horizon, which means models are trained by splitting the long sequence
into batches of fit-sized patches. This local horizon can make models ignore
global trends or periodic patterns. More importantly, almost all methods assume
the observations are sampled at regular time stamps, and fail to handle complex
irregular sampled time series arising from different applications. Thirdly,
most existing methods are learned in an offline manner. Thus, they are not
suitable for many applications with fast-arriving streaming data. To overcome
these limitations, we propose BayOTIDE: Bayesian Online Multivariate Time
series Imputation with functional decomposition. We treat the multivariate time
series as the weighted combination of groups of low-rank temporal factors with
different patterns. We apply a group of Gaussian Processes (GPs) with different
kernels as functional priors to fit the factors. For computational efficiency,
we further convert the GPs into a state-space prior by constructing an
equivalent stochastic differential equation (SDE), and developing a scalable
algorithm for online inference. The proposed method can not only handle
imputation over arbitrary time stamps, but also offer uncertainty
quantification and interpretability for the downstream application. We evaluate
our method on both synthetic and real-world datasets.
[LINK]
http://arxiv.org/abs/2308.14906v2
[DATE]
2023-09-19 10:59:44+08:00
[CATEGORIES]
cs.LG
VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs
[AUTHORS]
Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec
[ABSTRACT]
GNN-to-MLP distillation aims to utilize knowledge distillation (KD) to learn
computationally-efficient multi-layer perceptron (student MLP) on graph data by
mimicking the output representations of teacher GNN. Existing methods mainly
make the MLP mimic the GNN predictions over a few class labels. However, the
class space may not be expressive enough for covering numerous diverse local
graph structures, thus limiting the performance of knowledge transfer from GNN
to MLP. To address this issue, we propose to learn a new powerful graph
representation space by directly labeling nodes’ diverse local structures for
GNN-to-MLP distillation. Specifically, we propose a variant of VQ-VAE to learn
a structure-aware tokenizer on graph data that can encode each node’s local
substructure as a discrete code. The discrete codes constitute a codebook as a
new graph representation space that is able to identify different local graph
structures of nodes with the corresponding code indices. Then, based on the
learned codebook, we propose a new distillation target, namely soft code
assignments, to directly transfer the structural knowledge of each node from
GNN to MLP. The resulting framework VQGraph achieves new state-of-the-art
performance on GNN-to-MLP distillation in both transductive and inductive
settings across seven graph datasets. We show that VQGraph with better
performance infers faster than GNNs by 828x, and also achieves accuracy
improvement over GNNs and stand-alone MLPs by 3.90% and 28.05% on average,
respectively. Code: https://github.com/YangLing0818/VQGraph.
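A soft code assignment of the kind described above can be sketched as a softmax over a node embedding's similarity to each codebook entry, with the teacher and student assignments compared by a divergence. The similarity measure (negative squared Euclidean distance), the temperature parameter, and the KL objective are assumptions for illustration; the authors' repository is the reference for the actual method.

```python
import math


def soft_code_assignment(embedding, codebook, temperature=1.0):
    """Softmax over negative squared distances from one embedding to every code."""
    logits = [-sum((e - c) ** 2 for e, c in zip(embedding, code)) / temperature
              for code in codebook]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [x / total for x in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # hypothetical 3-code codebook
    teacher = soft_code_assignment([0.9, 1.1], codebook)  # stand-in GNN embedding
    student = soft_code_assignment([0.2, 0.3], codebook)  # stand-in MLP embedding
    print(teacher, student, kl_divergence(teacher, student))
```

Minimising such a divergence pushes the student's distribution over codes toward the teacher's, which carries more structural detail than a distribution over a handful of class labels.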
[COMMENTS]
Code: https://github.com/YangLing0818/VQGraph
[LINK]
http://arxiv.org/abs/2308.02117v2
[DATE]
2023-09-19 10:57:16+08:00
[CATEGORIES]
cs.LG
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits
[AUTHORS]
Yi Shen, Pan Xu, Michael M. Zavlanos
[ABSTRACT]
Off-policy evaluation and learning are concerned with assessing a given
policy and learning an optimal policy from offline data without direct
interaction with the environment. Often, the environment in which the data are
collected differs from the environment in which the learned policy is applied.
To account for the effect of different environments during learning and
execution, distributionally robust optimization (DRO) methods have been
developed that compute worst-case bounds on the policy values assuming that the
distribution of the new environment lies within an uncertainty set. Typically,
this uncertainty set is defined based on the KL divergence around the empirical
distribution computed from the logging dataset. However, the KL uncertainty set
fails to encompass distributions with varying support and lacks awareness of
the geometry of the distribution support. As a result, KL approaches fall short
in addressing practical environment mismatches and lead to over-fitting to
worst-case scenarios. To overcome these limitations, we propose a novel DRO
approach that employs the Wasserstein distance instead. While Wasserstein DRO
is generally computationally more expensive compared to KL DRO, we present a
regularized method and a practical (biased) stochastic gradient descent method
to optimize the policy efficiently. We also provide a theoretical analysis of
the finite sample complexity and iteration complexity for our proposed method.
We further validate our approach using a public dataset that was recorded in a
randomized stroke trial.
[LINK]
http://arxiv.org/abs/2309.08748v2
[DATE]
2023-09-19 10:01:21+08:00
[CATEGORIES]
cs.LG
Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
[AUTHORS]
Noah Golowich, Ankur Moitra, Dhruv Rohatgi
[ABSTRACT]
The key assumption underlying linear Markov Decision Processes (MDPs) is that
the learner has access to a known feature map $\phi(x, a)$ that maps
state-action pairs to $d$-dimensional vectors, and that the rewards and
transitions are linear functions in this representation. But where do these
features come from? In the absence of expert domain knowledge, a tempting
strategy is to use the “kitchen sink” approach and hope that the true features
are included in a much larger set of potential features. In this paper we
revisit linear MDPs from the perspective of feature selection. In a $k$-sparse
linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing
all the relevant features, and the goal is to learn a near-optimal policy in
only poly$(k,\log d)$ interactions with the environment. Our main result is the
first polynomial-time algorithm for this problem. In contrast, earlier works
either made prohibitively strong assumptions that obviated the need for
exploration, or required solving computationally intractable optimization
problems.
Along the way we introduce the notion of an emulator: a succinct approximate
representation of the transitions that suffices for computing certain Bellman
backups. Since linear MDPs are a non-parametric model, it is not even obvious
whether polynomial-sized emulators exist. We show that they do exist and can be
computed efficiently via convex programming.
As a corollary of our main result, we give an algorithm for learning a
near-optimal policy in block MDPs whose decoding function is a low-depth
decision tree; the algorithm runs in quasi-polynomial time and takes a
polynomial number of samples. This can be seen as a reinforcement learning
analogue of classic results in computational learning theory. Furthermore, it
gives a natural model where improving the sample complexity via representation
learning is computationally feasible.
[LINK]
http://arxiv.org/abs/2309.09457v2
[DATE]
2023-09-19 09:56:24+08:00
[CATEGORIES]
cs.LG
Calibrating multi-dimensional complex ODE from noisy data via deep neural networks
[AUTHORS]
Kexuan Li, Fangfang Wang, Ruiqi Liu, Fan Yang, Zuofeng Shang
[ABSTRACT]
Ordinary differential equations (ODEs) are widely used to model complex
dynamics that arise in biology, chemistry, engineering, finance, physics, etc.
Calibration of a complicated ODE system using noisy data is generally very
difficult. In this work, we propose a two-stage nonparametric approach to
address this problem. We first extract the de-noised data and their higher
order derivatives using boundary kernel method, and then feed them into a
sparsely connected deep neural network with ReLU activation function. Our
method is able to recover the ODE system without being subject to the curse of
dimensionality and complicated ODE structure. When the ODE possesses a general
modular structure, with each modular component involving only a few input
variables, and the network architecture is properly chosen, our method is
proven to be consistent. Theoretical properties are corroborated by an
extensive simulation study that demonstrates the validity and effectiveness of
the proposed method. Finally, we use our method to simultaneously characterize
the growth rate of Covid-19 infection cases from 50 states of the USA.
[LINK]
http://arxiv.org/abs/2106.03591v2
[DATE]
2023-09-19 09:32:14+08:00
[CATEGORIES]
cs.LG
Graph topological property recovery with heat and wave dynamics-based features on graphs
[AUTHORS]
Dhananjay Bhaskar, Yanlei Zhang, Charles Xu, Xingzhi Sun, Oluwadamilola Fasina, Guy Wolf, Maximilian Nickel, Michael Perlmutter, Smita Krishnaswamy
[ABSTRACT]
In this paper, we propose Graph Differential Equation Network (GDeNet), an
approach that harnesses the expressive power of solutions to PDEs on a graph to
obtain continuous node- and graph-level representations for various downstream
tasks. We derive theoretical results connecting the dynamics of heat and wave
equations to the spectral properties of the graph and to the behavior of
continuous-time random walks on graphs. We demonstrate experimentally that
these dynamics are able to capture salient aspects of graph geometry and
topology by recovering generating parameters of random graphs, Ricci curvature,
and persistent homology. Furthermore, we demonstrate the superior performance
of GDeNet on real-world datasets including citation graphs, drug-like
molecules, and proteins.
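To make the heat dynamics concrete, the sketch below integrates the graph heat equation du/dt = -Lu with explicit Euler steps. It assumes the unnormalised combinatorial Laplacian L = D - A and a fixed step count; the abstract does not say which Laplacian or solver GDeNet uses, so both are illustrative choices, and the function names are hypothetical.

```python
def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph on n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def heat_diffuse(lap, u0, t, steps=2000):
    """Approximate u(t) for du/dt = -L u using explicit Euler integration."""
    dt = t / steps
    n = len(u0)
    u = list(u0)
    for _ in range(steps):
        lu = [sum(lap[i][k] * u[k] for k in range(n)) for i in range(n)]
        u = [u[i] - dt * lu[i] for i in range(n)]
    return u


if __name__ == "__main__":
    path = graph_laplacian(3, [(0, 1), (1, 2)])  # path graph 0 - 1 - 2
    for t in (0.1, 1.0, 20.0):
        print(t, heat_diffuse(path, [1.0, 0.0, 0.0], t))
```

On a connected graph the total heat is conserved and the signal relaxes to the uniform average; the rate at which it does so is governed by the Laplacian spectrum, which is the link to spectral properties that the abstract mentions.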
[LINK]
http://arxiv.org/abs/2309.09924v2
[DATE]
2023-09-19 09:24:06+08:00
[CATEGORIES]
cs.LG
On Explicit Curvature Regularization in Deep Generative Models
[AUTHORS]
Yonghyeon Lee, Frank Chongwoo Park
[ABSTRACT]
We propose a family of curvature-based regularization terms for deep
generative model learning. Explicit coordinate-invariant formulas for both
intrinsic and extrinsic curvature measures are derived for the case of
arbitrary data manifolds embedded in higher-dimensional Euclidean space.
Because computing the curvature is a highly computation-intensive process
involving the evaluation of second-order derivatives, efficient formulas are
derived for approximately evaluating intrinsic and extrinsic curvatures.
Comparative studies are conducted that compare the relative efficacy of
intrinsic versus extrinsic curvature-based regularization measures, as well as
performance comparisons against existing autoencoder training methods.
Experiments involving noisy motion capture data confirm that curvature-based
methods outperform existing autoencoder regularization methods, with intrinsic
curvature measures slightly more effective than extrinsic curvature measures.
[COMMENTS]
2nd Annual Workshop on Topology, Algebra, and Geometry in Machine
Learning (TAG-ML) at the ICML 2023
[LINK]
http://arxiv.org/abs/2309.10237v1
[DATE]
2023-09-19 09:21:36+08:00
[CATEGORIES]
cs.LG
A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs
[AUTHORS]
Zhaocheng Zhu, Xinyu Yuan, Mikhail Galkin, Sophie Xhonneux, Ming Zhang, Maxime Gazeau, Jian Tang
[ABSTRACT]
Reasoning on large-scale knowledge graphs has been long dominated by
embedding methods. While path-based methods possess the inductive capacity that
embeddings lack, their scalability is limited by the exponential number of
paths. Here we present A*Net, a scalable path-based method for knowledge graph
reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net
learns a priority function to select important nodes and edges at each
iteration, to reduce time and memory footprint for both training and inference.
The ratio of selected nodes and edges can be specified to trade off between
performance and efficiency. Experiments on both transductive and inductive
knowledge graph reasoning benchmarks show that A*Net achieves competitive
performance with existing state-of-the-art path-based methods, while merely
visiting 10% of nodes and 10% of edges at each iteration. On a million-scale dataset
ogbl-wikikg2, A*Net not only achieves a new state-of-the-art result, but also
converges faster than embedding methods. A*Net is the first path-based method
for knowledge graph reasoning at such scale.
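The selection step described above, keeping a fixed ratio of nodes by priority and expanding only from those, might look roughly like the following. The priority scores here are hand-written stand-ins for the learned priority function, and `select_top_nodes` and `frontier_edges` are hypothetical names, not the paper's API.

```python
import heapq


def select_top_nodes(priority, ratio):
    """Return the highest-priority nodes, keeping roughly `ratio` of them (at least one)."""
    k = max(1, round(ratio * len(priority)))
    return set(heapq.nlargest(k, priority, key=priority.get))


def frontier_edges(selected, edges):
    """Keep only the edges that leave a selected node."""
    return [(src, dst) for src, dst in edges if src in selected]


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}  # stand-in priority values
    graph = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "a")]
    kept = select_top_nodes(scores, ratio=0.5)
    print(sorted(kept), frontier_edges(kept, graph))
```

Lowering the ratio shrinks the set of nodes and edges touched per iteration, which is the performance versus efficiency trade-off the abstract refers to.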
[LINK]
http://arxiv.org/abs/2206.04798v4
[DATE]
2023-09-19 09:03:42+08:00
[CATEGORIES]
cs.LG
Multi-fidelity climate model parameterization for better generalization and extrapolation
[AUTHORS]
Mohamed Aziz Bhouri, Liran Peng, Michael S. Pritchard, Pierre Gentine
[ABSTRACT]
Machine-learning-based parameterizations (i.e. representation of sub-grid
processes) of global climate models or turbulent simulations have recently been
proposed as a powerful alternative to physical, but empirical, representations,
offering a lower computational cost and higher accuracy. Yet, those approaches
still suffer from a lack of generalization and extrapolation beyond the
training data, which is however critical to projecting climate change or
unobserved regimes of turbulence. Here we show that a multi-fidelity approach,
which integrates datasets of different accuracy and abundance, can provide the
best of both worlds: the capacity to extrapolate leveraging the
physically-based parameterization and a higher accuracy using the
machine-learning-based parameterizations. In an application to climate
modeling, the multi-fidelity framework yields more accurate climate projections
without requiring major increase in computational resources. Our multi-fidelity
randomized prior networks (MF-RPNs) combine physical parameterization data as
low-fidelity and storm-resolving historical run’s data as high-fidelity. To
extrapolate beyond the training data, the MF-RPNs are tested on high-fidelity
warming scenarios, $+4K$, data. We show the MF-RPN’s capacity to return much
more skillful predictions compared to either low- or high-fidelity (historical
data) simulations trained only on one regime while providing trustworthy
uncertainty quantification across a wide range of scenarios. Our approach paves
the way for the use of machine-learning based methods that can optimally
leverage historical observations or high-fidelity simulations and extrapolate
to unseen regimes such as climate change.
[COMMENTS]
27 pages, 16 figures
[LINK]
http://arxiv.org/abs/2309.10231v1
[DATE]
2023-09-19 09:03:39+08:00
[CATEGORIES]
cs.LG
CaT: Balanced Continual Graph Learning with Graph Condensation
[AUTHORS]
Yilun Liu, Ruihong Qiu, Zi Huang
[ABSTRACT]
Continual graph learning (CGL) aims to continuously update a graph
model with graph data being fed in a streaming manner. Since the model easily
forgets previously learned knowledge when training with new-coming data, the
catastrophic forgetting problem has been the major focus in CGL. Recent
replay-based methods intend to solve this problem by updating the model using
both (1) the entire new-coming data and (2) a sampling-based memory bank that
stores replayed graphs to approximate the distribution of historical data.
After updating the model, a new replayed graph sampled from the incoming graph
will be added to the existing memory bank. Although these methods are intuitive
and effective for CGL, two issues are identified in this paper. Firstly,
most sampling-based methods struggle to fully capture the historical
distribution when the storage budget is tight. Secondly, a significant data
imbalance exists in terms of the scales of the complex new-coming graph data
and the lightweight memory bank, resulting in unbalanced training. To solve
these issues, a Condense and Train (CaT) framework is proposed in this paper.
Prior to each model update, the new-coming graph is condensed to a small yet
informative synthesised replayed graph, which is then stored in a Condensed
Graph Memory with historical replay graphs. In the continual learning phase, a
Training in Memory scheme is used to update the model directly with the
Condensed Graph Memory rather than the whole new-coming graph, which alleviates
the data imbalance problem. Extensive experiments conducted on four benchmark
datasets successfully demonstrate superior performances of the proposed CaT
framework in terms of effectiveness and efficiency. The code has been released
on https://github.com/superallen13/CaT-CGL.
[COMMENTS]
The code has been released https://github.com/superallen13/CaT-CGL
[LINK]
http://arxiv.org/abs/2309.09455v2
[DATE]
2023-09-19 09:00:15+08:00
[CATEGORIES]
cs.LG
Method for Generating Synthetic Data Combining Chest Radiography Images with Tabular Clinical Information Using Dual Generative Models
[AUTHORS]
Tomohiro Kikuchi, Shouhei Hanaoka, Takahiro Nakao, Tomomi Takenaga, Yukihiro Nomura, Harushi Mori, Takeharu Yoshikawa
[ABSTRACT]
The generation of synthetic medical records using Generative Adversarial
Networks (GANs) is becoming crucial for addressing privacy concerns and
facilitating data sharing in the medical domain. In this paper, we introduce a
novel method to create synthetic hybrid medical records that combine both image
and non-image data, utilizing an auto-encoding GAN (αGAN) and a conditional
tabular GAN (CTGAN). Our methodology encompasses three primary steps: I)
Dimensional reduction of images in a private dataset (pDS) using the pretrained
encoder of the αGAN, followed by integration with the remaining
non-image clinical data to form tabular representations; II) Training the CTGAN
on the encoded pDS to produce a synthetic dataset (sDS) which amalgamates
encoded image features with non-image clinical data; and III) Reconstructing
synthetic images from the image features using the αGAN’s pretrained
decoder. We successfully generated synthetic records incorporating both Chest
X-Rays (CXRs) and thirteen non-image clinical variables (comprising seven
categorical and six numeric variables). To evaluate the efficacy of the sDS, we
designed classification and regression tasks and compared the performance of
models trained on pDS and sDS against the pDS test set. Remarkably, by
leveraging five times the volume of sDS for training, we achieved
classification and regression results that were comparable, if slightly
inferior, to those obtained using the native pDS. Our method holds promise for
publicly releasing synthetic datasets without undermining the potential for
secondary data usage.
[LINK]
http://arxiv.org/abs/2308.07573v2
[DATE]
2023-09-19 08:42:29+08:00
[CATEGORIES]
cs.LG
Causal Theories and Structural Data Representations for Improving Out-of-Distribution Classification
[AUTHORS]
Donald Martin, Jr., David Kinney
[ABSTRACT]
We consider how human-centered causal theories and tools from the dynamical
systems literature can be deployed to guide the representation of data when
training neural networks for complex classification tasks. Specifically, we use
simulated data to show that training a neural network with a data
representation that makes explicit the invariant structural causal features of
the data generating process of an epidemic system improves out-of-distribution
(OOD) generalization performance on a classification task as compared to a more
naive approach to data representation. We take these results to demonstrate
that using human-generated causal knowledge to reduce the epistemic uncertainty
of ML developers can lead to more well-specified ML pipelines. This, in turn,
points to the utility of a dynamical systems approach to the broader effort
aimed at improving the robustness and safety of machine learning systems via
improved ML system development practices.
[COMMENTS]
22 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.10211v1
[DATE]
2023-09-19 07:49:42+08:00
[CATEGORIES]
cs.LG
The Kernel Density Integral Transformation
[AUTHORS]
Calvin McCarter
[ABSTRACT]
Feature preprocessing continues to play a critical role when applying machine
learning and statistical methods to tabular data. In this paper, we propose the
use of the kernel density integral transformation as a feature preprocessing
step. Our approach subsumes the two leading feature preprocessing methods as
limiting cases: linear min-max scaling and quantile transformation. We
demonstrate that, without hyperparameter tuning, the kernel density integral
transformation can be used as a simple drop-in replacement for either method,
offering robustness to the weaknesses of each. Alternatively, with tuning of a
single continuous hyperparameter, we frequently outperform both of these
methods. Finally, we show that the kernel density transformation can be
profitably applied to statistical data analysis, particularly in correlation
analysis and univariate clustering.
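The two limiting cases can be checked numerically with a small sketch. It assumes a Gaussian kernel, an absolute bandwidth, and a rescaling that maps the training minimum and maximum to 0 and 1; these are illustrative choices, and `kd_integral_transform` is a hypothetical name rather than the author's implementation.

```python
import math


def kd_integral_transform(train, x, bandwidth):
    """Integral of a Gaussian KDE of `train` up to x, rescaled to [0, 1] on the training range."""
    def kde_cdf(v):
        # Average of Gaussian CDFs centred on each training point.
        scale = bandwidth * math.sqrt(2.0)
        return sum(0.5 * (1.0 + math.erf((v - t) / scale)) for t in train) / len(train)

    lo = kde_cdf(min(train))
    hi = kde_cdf(max(train))
    return (kde_cdf(x) - lo) / (hi - lo)


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0, 10.0]  # one outlier at 10
    for h in (0.01, 1.0, 1e4):
        print(h, [round(kd_integral_transform(data, v, h), 3) for v in data])
```

With a very large bandwidth the output for the value 3 approaches 0.3, the min-max scaled value, and with a very small bandwidth it approaches 0.75, its quantile rank among the five points. Intermediate bandwidths interpolate between the two, which is the single continuous hyperparameter the abstract mentions.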
[LINK]
http://arxiv.org/abs/2309.10194v1
[DATE]
2023-09-19 06:54:05+08:00
[CATEGORIES]
cs.LG
Stochastic Deep Koopman Model for Quality Propagation Analysis in Multistage Manufacturing Systems
[AUTHORS]
Zhiyi Chen, Harshal Maske, Huanyi Shui, Devesh Upadhyay, Michael Hopka, Joseph Cohen, Xingjian Lai, Xun Huan, Jun Ni
[ABSTRACT]
The modeling of multistage manufacturing systems (MMSs) has attracted
increased attention from both academia and industry. Recent advancements in
deep learning methods provide an opportunity to accomplish this task with
reduced cost and expertise. This study introduces a stochastic deep Koopman
(SDK) framework to model the complex behavior of MMSs. Specifically, we present
a novel application of Koopman operators to propagate critical quality
information extracted by variational autoencoders. Through this framework, we
can effectively capture the general nonlinear evolution of product quality
using a transferred linear representation, thus enhancing the interpretability
of the data-driven model. To evaluate the performance of the SDK framework, we
carried out a comparative study on an open-source dataset. The main findings of
this paper are as follows. Our results indicate that SDK surpasses other
popular data-driven models in accuracy when predicting stagewise product
quality within the MMS. Furthermore, the unique linear propagation property in
the stochastic latent space of SDK enables traceability for quality evolution
throughout the process, thereby facilitating the design of root cause analysis
schemes. Notably, the proposed framework requires minimal knowledge of the
underlying physics of production lines. It serves as a virtual metrology tool
that can be applied to various MMSs, contributing to the ultimate goal of Zero
Defect Manufacturing.
[LINK]
http://arxiv.org/abs/2309.10193v1
[DATE]
2023-09-19 06:53:17+08:00
[CATEGORIES]
cs.LG
Graph-enabled Reinforcement Learning for Time Series Forecasting with Adaptive Intelligence
[AUTHORS]
Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Jianming Yong, Yuefeng Li
[ABSTRACT]
Reinforcement learning is well known for its ability to model sequential
tasks and learn latent data patterns adaptively. Deep learning models have been
widely explored and adopted in regression and classification tasks. However,
deep learning has its limitations such as the assumption of equally spaced and
ordered data, and the lack of ability to incorporate graph structure in terms
of time-series prediction. Graph neural networks (GNNs) have the ability to
overcome these challenges and capture the temporal dependencies in time-series
data. In this study, we propose a novel approach for predicting time-series
data using GNN and monitoring with Reinforcement Learning (RL). GNNs are able
to explicitly incorporate the graph structure of the data into the model,
allowing them to capture temporal dependencies in a more natural way. This
approach allows for more accurate predictions in complex temporal structures,
such as those found in healthcare, traffic and weather forecasting. We also
fine-tune our GraphRL model using a Bayesian optimisation technique to further
improve performance. The proposed framework outperforms the baseline models in
time-series forecasting and monitoring. The contributions of this study include
the introduction of a novel GraphRL framework for time-series prediction and
the demonstration of the effectiveness of GNNs in comparison to traditional
deep learning models such as RNNs and LSTMs. Overall, this study demonstrates
the potential of GraphRL in providing accurate and efficient predictions in
dynamic RL environments.
[COMMENTS]
This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
[LINK]
http://arxiv.org/abs/2309.10186v1
[DATE]
2023-09-19 06:25:12+08:00
[CATEGORIES]
cs.LG
QoS-Aware Service Prediction and Orchestration in Cloud-Network Integrated Beyond 5G
[AUTHORS]
Mohammad Farhoudi, Masoud Shokrnezhad, Tarik Taleb
[ABSTRACT]
Novel applications such as the Metaverse have highlighted the potential of
beyond 5G networks, which necessitate ultra-low latency communications and
massive broadband connections. Moreover, the burgeoning demand for such
services with ever-fluctuating users has engendered a need for heightened
service continuity consideration in B5G. To enable these services, the
edge-cloud paradigm is a potential solution to harness cloud capacity and
effectively manage users in real time as they move across the network. However,
edge-cloud networks confront a multitude of limitations, including networking
and computing resources that must be collectively managed to unlock their full
potential. This paper addresses the joint problem of service placement and
resource allocation in a network-cloud integrated environment while considering
capacity constraints, dynamic users, and end-to-end delays. We present a
non-linear programming model that formulates the optimization problem with the
objective of minimizing overall cost while enhancing latency. Next, to
address the problem, we introduce a DDQL-based technique using RNNs to predict
user behavior, empowered by a water-filling-based algorithm for service
placement. The proposed framework adeptly accommodates the dynamic nature of
users, the placement of services that mandate ultra-low latency in B5G, and
service continuity when users migrate from one location to another. Simulation
results show that our solution provides timely responses that optimize the
network’s potential, offering a scalable and efficient placement.
[LINK]
http://arxiv.org/abs/2309.10185v1
[DATE]
2023-09-19 06:24:42+08:00
[CATEGORIES]
cs.LG
Online Reinforcement Learning in Markov Decision Process Using Linear Programming
[AUTHORS]
Vincent Leon, S. Rasoul Etesami
[ABSTRACT]
We consider online reinforcement learning in an episodic Markov decision process
(MDP) with an unknown transition function and stochastic rewards drawn from some
fixed but unknown distribution. The learner aims to learn the optimal policy
and minimize their regret over a finite time horizon through interacting with
the environment. We devise a simple and efficient model-based algorithm that
achieves $\widetilde{O}(LX\sqrt{TA})$ regret with high probability, where $L$
is the episode length, $T$ is the number of episodes, and $X$ and $A$ are the
cardinalities of the state space and the action space, respectively. The
proposed algorithm, which is based on the concept of “optimism in the face of
uncertainty”, maintains confidence sets of transition and reward functions and
uses occupancy measures to connect the online MDP with linear programming. It
achieves a tighter regret bound compared to the existing works that use a
similar confidence set framework and improves computational effort compared to
those that use a different framework but with a slightly tighter regret bound.
[LINK]
http://arxiv.org/abs/2304.00155v2
[DATE]
2023-09-19 06:21:41+08:00
[CATEGORIES]
cs.LG
Double Deep Q-Learning-based Path Selection and Service Placement for Latency-Sensitive Beyond 5G Applications
[AUTHORS]
Masoud Shokrnezhad, Tarik Taleb, Patrizio Dazzi
[ABSTRACT]
Nowadays, as the need for capacity continues to grow, entirely novel services
are emerging. A solid cloud-network integrated infrastructure is necessary to
supply these services in a real-time, responsive, and scalable way. Due to their
diverse characteristics and limited capacity, communication and computing
resources must be collaboratively managed to unleash their full potential.
Although several innovative methods have been proposed to orchestrate the
resources, most ignored network resources or relaxed the network as a simple
graph, focusing only on cloud resources. This paper fills the gap by studying
the joint problem of communication and computing resource allocation, dubbed
CCRA, including function placement and assignment, traffic prioritization, and
path selection considering capacity constraints and quality requirements, to
minimize total cost. We formulate the problem as a non-linear programming model
and propose two approaches, dubbed B&B-CCRA and WF-CCRA, based on the Branch
& Bound and Water-Filling algorithms to solve it when the system is fully
known. Then, for partially known systems, a Double Deep Q-Learning (DDQL)
architecture is designed. Numerical simulations show that B&B-CCRA optimally
solves the problem, whereas WF-CCRA delivers near-optimal solutions in a
substantially shorter time. Furthermore, it is demonstrated that DDQL-CCRA
obtains near-optimal solutions in the absence of request-specific information.
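As a rough illustration of the double Q-learning idea that DDQL builds on, the sketch below computes a bootstrapped target in which one value estimate selects the next action and a second estimate evaluates it. This is the generic textbook form, not the authors' DDQL-CCRA architecture; the function name, argument names, and the plain-list representation of Q-values are assumptions made for illustration.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma, done):
    """Generic double Q-learning target (illustrative, not the paper's code).

    next_q_online: Q-values of the next state from the online estimator,
                   used only to SELECT the greedy action.
    next_q_target: Q-values of the next state from the target estimator,
                   used only to EVALUATE the selected action.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection with the online estimate.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation with the target estimate.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    print(double_q_target(1.0, [0.2, 0.9, 0.5], [1.0, 2.0, 3.0], 0.5, False))
```

Decoupling selection from evaluation is the usual motivation for the double estimator: it reduces the overestimation that arises when a single estimate both picks and scores the maximizing action.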
[COMMENTS]
in IEEE Transactions on Mobile Computing, 2023. arXiv admin note:
text overlap with arXiv:2309.09763
[LINK]
http://arxiv.org/abs/2309.10180v1
[DATE]
2023-09-19 06:17:23+08:00
[CATEGORIES]
cs.LG
Self-Sustaining Multiple Access with Continual Deep Reinforcement Learning for Dynamic Metaverse Applications
[AUTHORS]
Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb, Richard Li
[ABSTRACT]
The Metaverse is a new paradigm that aims to create a virtual environment
consisting of numerous worlds, each of which will offer a different set of
services. To deal with such a dynamic and complex scenario, considering the
stringent quality of service requirements aimed at the 6th generation of
communication systems (6G), one potential approach is to adopt self-sustaining
strategies, which can be realized by employing Adaptive Artificial Intelligence
(Adaptive AI) where models are continually re-trained with new data and
conditions. One aspect of self-sustainability is the management of multiple
access to the frequency spectrum. Although several innovative methods have been
proposed to address this challenge, mostly using Deep Reinforcement Learning
(DRL), the problem of adapting agents to a non-stationary environment has not
yet been precisely addressed. This paper fills in the gap in the current
literature by investigating the problem of multiple access in multi-channel
environments to maximize the throughput of the intelligent agent when the
number of active User Equipments (UEs) may fluctuate over time. To solve the
problem, a Double Deep Q-Learning (DDQL) technique empowered by Continual
Learning (CL) is proposed to overcome the non-stationary situation, while the
environment is unknown. Numerical simulations demonstrate that, compared to
other well-known methods, the CL-DDQL algorithm achieves significantly higher
throughputs with a considerably shorter convergence time in highly dynamic
scenarios.
[LINK]
http://arxiv.org/abs/2309.10177v1
[DATE]
2023-09-19 06:02:47+08:00
[CATEGORIES]
cs.LG
Bridging the Gap between Spatial and Spectral Domains: A Unified Framework for Graph Neural Networks
[AUTHORS]
Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Lingfei Wu, Charu Aggarwal, Chang-Tien Lu
[ABSTRACT]
Deep learning’s performance has been extensively recognized recently. Graph
neural networks (GNNs) are designed to deal with graph-structural data that
classical deep learning does not easily manage. Since most GNNs were created
using distinct theories, direct comparisons are impossible. Prior research has
primarily concentrated on categorizing existing models, with little attention
paid to their intrinsic connections. The purpose of this study is to establish
a unified framework that integrates GNNs based on spectral graph and
approximation theory. The framework incorporates a strong integration between
spatial- and spectral-based GNNs while tightly associating approaches that
exist within each respective domain.
[COMMENTS]
ACM Computing Survey, to appear
[LINK]
http://arxiv.org/abs/2107.10234v5
[DATE]
2023-09-19 05:40:20+08:00
[CATEGORIES]
cs.LG
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
[AUTHORS]
Ziheng Wang, Justin Sirignano
[ABSTRACT]
Actor-critic algorithms are widely used in reinforcement learning, but are
challenging to mathematically analyse due to the online arrival of non-i.i.d.
data samples. The distribution of the data samples dynamically changes as the
model is updated, introducing a complex feedback loop between the data
distribution and the reinforcement learning algorithm. We prove that, under a
time rescaling, the online actor-critic algorithm with tabular parametrization
converges to an ordinary differential equation (ODE) as the number of updates
becomes large. The proof first establishes the geometric ergodicity of the data
samples under a fixed actor policy. Then, using a Poisson equation, we prove
that the fluctuations of the data samples around a dynamic probability measure,
which is a function of the evolving actor model, vanish as the number of
updates becomes large. Once the ODE limit has been derived, we study its
convergence properties using a two time-scale analysis which asymptotically
de-couples the critic ODE from the actor ODE. The convergence of the critic to
the solution of the Bellman equation and the actor to the optimal policy are
proven. In addition, a convergence rate to this global minimum is also
established. Our convergence analysis holds under specific choices for the
learning rates and exploration rates in the actor-critic algorithm, which could
provide guidance for the implementation of actor-critic algorithms in practice.
[LINK]
http://arxiv.org/abs/2108.08655v2
[DATE]
2023-09-19 05:30:12+08:00
[CATEGORIES]
cs.LG
Autoencoder-based Anomaly Detection System for Online Data Quality Monitoring of the CMS Electromagnetic Calorimeter
[AUTHORS]
The CMS ECAL Collaboration
[ABSTRACT]
The CMS detector is a general-purpose apparatus that detects high-energy
collisions produced at the LHC. Online Data Quality Monitoring of the CMS
electromagnetic calorimeter is a vital operational tool that allows detector
experts to quickly identify, localize, and diagnose a broad range of detector
issues that could affect the quality of physics data. A real-time
autoencoder-based anomaly detection system using semi-supervised machine
learning is presented, enabling the detection of anomalies in the CMS
electromagnetic calorimeter data. A novel method is introduced which maximizes
the anomaly detection performance by exploiting the time-dependent evolution of
anomalies as well as spatial variations in the detector response. The
autoencoder-based system is able to efficiently detect anomalies, while
maintaining a very low false discovery rate. The performance of the system is
validated with anomalies found in 2018 and 2022 LHC collision data.
Additionally, the first results from deploying the autoencoder-based system in
the CMS online Data Quality Monitoring workflow during the beginning of Run 3
of the LHC are presented, showing its ability to detect issues missed by the
existing system.
[LINK]
http://arxiv.org/abs/2309.10157v1
[DATE]
2023-09-19 05:11:25+08:00
[CATEGORIES]
cs.LG
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
[AUTHORS]
Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
[ABSTRACT]
In this work, we present a scalable reinforcement learning method for
training multi-task policies from large offline datasets that can leverage both
human demonstrations and autonomously collected data. Our method uses a
Transformer to provide a scalable representation for Q-functions trained via
offline temporal difference backups. We therefore refer to the method as
Q-Transformer. By discretizing each action dimension and representing the
Q-value of each action dimension as separate tokens, we can apply effective
high-capacity sequence modeling techniques for Q-learning. We present several
design decisions that enable good performance with offline RL training, and
show that Q-Transformer outperforms prior offline RL algorithms and imitation
learning techniques on a large diverse real-world robotic manipulation task
suite. The project’s website and videos can be found at
https://q-transformer.github.io
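The per-dimension discretization mentioned above can be sketched as follows. This is a minimal illustration assuming uniform bins over known per-dimension bounds; the bin count, the bounds, and the function names are hypothetical and are not taken from the paper.

```python
def discretize_action(action, low, high, num_bins):
    """Map each continuous action dimension to one integer bin index.

    Each returned index can be treated as a separate token, one per
    action dimension. Uniform binning is an assumption of this sketch.
    """
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)           # keep the value in range
        fraction = (clipped - lo) / (hi - lo)       # position in [0, 1]
        index = min(int(fraction * num_bins), num_bins - 1)
        tokens.append(index)
    return tokens


def bin_center(index, lo, hi, num_bins):
    """Recover a representative continuous value for a bin index."""
    width = (hi - lo) / num_bins
    return lo + (index + 0.5) * width


if __name__ == "__main__":
    # A hypothetical 3-dimensional action in [-1, 1]^3 with 4 bins per dimension.
    print(discretize_action([-1.0, 0.0, 1.0], [-1.0] * 3, [1.0] * 3, 4))
```

With one token per dimension, the number of discrete choices grows linearly with the number of action dimensions rather than exponentially, which is what makes a sequence model a natural fit.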
[COMMENTS]
See website at https://q-transformer.github.io
[LINK]
http://arxiv.org/abs/2309.10150v1
[DATE]
2023-09-19 05:00:38+08:00
[CATEGORIES]
cs.LG
Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust?
[AUTHORS]
Minsu Kim, Walid Saad
[ABSTRACT]
In continual learning (CL), an AI agent (e.g., autonomous vehicles or
robotics) learns from non-stationary data streams under dynamic environments.
For the practical deployment of such applications, it is important to guarantee
robustness to unseen environments while maintaining past experiences. In this
paper, a novel CL framework is proposed to achieve robust generalization to
dynamic environments while retaining past knowledge. The considered CL agent
uses a capacity-limited memory to save previously observed environmental
information to mitigate forgetting issues. Then, data points are sampled from
the memory to estimate the distribution of risks over environmental change so
as to obtain predictors that are robust to unseen changes. The generalization
and memorization performance of the proposed framework are theoretically
analyzed. This analysis showcases the tradeoff between memorization and
generalization with the memory size. Experiments show that the proposed
algorithm outperforms memory-based CL baselines across all environments while
significantly improving the generalization performance on unseen target
environments.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.10149v1
[DATE]
2023-09-19 05:00:01+08:00
[CATEGORIES]
cs.LG
A Geometric Framework for Neural Feature Learning
[AUTHORS]
Xiangxiang Xu, Lizhong Zheng
[ABSTRACT]
We present a novel framework for learning system design based on neural
feature extractors by exploiting geometric structures in feature spaces. First,
we introduce the feature geometry, which unifies statistical dependence and
features in the same functional space with geometric structures. By applying
the feature geometry, we formulate each learning problem as solving the optimal
feature approximation of the dependence component specified by the learning
setting. We propose a nesting technique for designing learning algorithms to
learn the optimal features from data samples, which can be applied to
off-the-shelf network architectures and optimizers. To demonstrate the
application of the nesting technique, we further discuss multivariate learning
problems, including conditioned inference and multimodal learning, where we
present the optimal features and reveal their connections to classical
approaches.
[COMMENTS]
70 pages, 23 figures
[LINK]
http://arxiv.org/abs/2309.10140v1
[DATE]
2023-09-19 04:39:12+08:00
[CATEGORIES]
cs.LG
Efficient Low-Rank GNN Defense Against Structural Attacks
[AUTHORS]
Abdullah Alchihabi, Qing En, Yuhong Guo
[ABSTRACT]
Graph Neural Networks (GNNs) have been shown to possess strong representation
abilities over graph data. However, GNNs are vulnerable to adversarial attacks,
and even minor perturbations to the graph structure can significantly degrade
their performance. Existing methods either are ineffective against
sophisticated attacks or require the optimization of dense adjacency matrices,
which is time-consuming and prone to local minima. To remedy this problem, we
propose an Efficient Low-Rank Graph Neural Network (ELR-GNN) defense method,
which aims to learn low-rank and sparse graph structures for defending against
adversarial attacks, ensuring effective defense with greater efficiency.
Specifically, ELR-GNN consists of two modules: a Coarse Low-Rank Estimation
Module and a Fine-Grained Estimation Module. The first module adopts the
truncated Singular Value Decomposition (SVD) to initialize the low-rank
adjacency matrix estimation, which serves as a starting point for optimizing
the low-rank matrix. In the second module, the initial estimate is refined by
jointly learning a low-rank sparse graph structure with the GNN model. Sparsity
is incorporated into the learned low-rank adjacency matrix by pruning weak
connections, which can reduce redundant data while maintaining valuable
information. As a result, instead of using the dense adjacency matrix directly,
ELR-GNN can learn a low-rank and sparse estimate of it in a simple, efficient
and easy to optimize manner. The experimental results demonstrate that ELR-GNN
outperforms the state-of-the-art GNN defense methods in the literature, in
addition to being very efficient and easy to train.
[COMMENTS]
ICKG 2023
[LINK]
http://arxiv.org/abs/2309.10136v1
[DATE]
2023-09-19 04:22:27+08:00
[CATEGORIES]
cs.LG
GDM: Dual Mixup for Graph Classification with Limited Supervision
[AUTHORS]
Abdullah Alchihabi, Yuhong Guo
[ABSTRACT]
Graph Neural Networks (GNNs) require a large number of labeled graph samples
to obtain good performance on the graph classification task. The performance of
GNNs degrades significantly as the number of labeled graph samples decreases.
To reduce the annotation cost, it is therefore important to develop graph
augmentation methods that can generate new graph instances to increase the size
and diversity of the limited set of available labeled graph samples. In this
work, we propose a novel mixup-based graph augmentation method, Graph Dual
Mixup (GDM), that leverages both functional and structural information of the
graph instances to generate new labeled graph samples. GDM employs a graph
structural auto-encoder to learn structural embeddings of the graph samples,
and then applies mixup to the structural information of the graphs in the
learned structural embedding space and generates new graph structures from the
mixup structural embeddings. As for the functional information, GDM applies
mixup directly to the input node features of the graph samples to generate
functional node feature information for new mixup graph instances. Jointly, the
generated input node features and graph structures yield new graph samples
which can supplement the set of original labeled graphs. Furthermore, we
propose two novel Balanced Graph Sampling methods to enhance the balanced
difficulty and diversity for the generated graph samples. Experimental results
on the benchmark datasets demonstrate that our proposed method substantially
outperforms the state-of-the-art graph augmentation methods when the labeled
graphs are scarce.
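The functional half of the method, mixup applied directly to input node features, can be illustrated with the standard mixup convex combination. The sketch below assumes two graphs whose node-feature matrices have the same shape and whose nodes are already aligned; how GDM aligns nodes and samples the mixing coefficient is not stated in the abstract, so `lam`, the function names, and the label mixing are assumptions.

```python
def mixup_features(x_a, x_b, lam):
    """Convex combination of two equally shaped node-feature matrices.

    x_a, x_b: lists of rows (one row per node); lam: mixing weight in [0, 1].
    """
    if len(x_a) != len(x_b):
        raise ValueError("this sketch assumes the same number of nodes")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]


def mixup_labels(y_a, y_b, lam):
    """Mix one-hot graph labels with the same weight (standard mixup)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]


if __name__ == "__main__":
    print(mixup_features([[1.0, 0.0]], [[0.0, 1.0]], 0.25))
    print(mixup_labels([1.0, 0.0], [0.0, 1.0], 0.25))
```

In standard mixup the same weight is applied to the inputs and to the labels, so the generated sample carries a soft label that reflects its composition.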
[COMMENTS]
ECML 2023
[LINK]
http://arxiv.org/abs/2309.10134v1
[DATE]
2023-09-19 04:17:10+08:00
[CATEGORIES]
cs.LG
Deep Prompt Tuning for Graph Transformers
[AUTHORS]
Reza Shirkavand, Heng Huang
[ABSTRACT]
Graph transformers have gained popularity in various graph-based tasks by
addressing challenges faced by traditional Graph Neural Networks. However, the
quadratic complexity of self-attention operations and the extensive layering in
graph transformer architectures present challenges when applying them to
graph-based prediction tasks. Fine-tuning, a common approach, is resource-intensive
and requires storing multiple copies of large models. We propose a novel
approach called deep graph prompt tuning as an alternative to fine-tuning for
leveraging large graph transformer models in downstream graph-based prediction
tasks. Our method introduces trainable feature nodes to the graph and prepends
task-specific tokens to the graph transformer, enhancing the model’s expressive
power. By freezing the pre-trained parameters and only updating the added
tokens, our approach reduces the number of free parameters and eliminates the
need for multiple model copies, making it suitable for small datasets and
scalable to large graphs. Through extensive experiments on various-sized
datasets, we demonstrate that deep graph prompt tuning achieves comparable or
even superior performance to fine-tuning, despite utilizing significantly fewer
task-specific parameters. Our contributions include the introduction of prompt
tuning for graph transformers, its application to both graph transformers and
message passing graph neural networks, improved efficiency and resource
utilization, and compelling experimental results. This work brings attention to
a promising approach to leverage pre-trained models in graph-based prediction
tasks and offers new opportunities for exploring and advancing graph
representation learning.
[LINK]
http://arxiv.org/abs/2309.10131v1
[DATE]
2023-09-19 04:12:17+08:00
[CATEGORIES]
cs.LG
Deep smoothness WENO scheme for two-dimensional hyperbolic conservation laws: A deep learning approach for learning smoothness indicators
[AUTHORS]
Tatiana Kossaczká, Ameya D. Jagtap, Matthias Ehrhardt
[ABSTRACT]
In this paper, we introduce an improved version of the fifth-order weighted
essentially non-oscillatory (WENO) shock-capturing scheme by incorporating deep
learning techniques. The established WENO algorithm is improved by training a
compact neural network to adjust the smoothness indicators within the WENO
scheme. This modification enhances the accuracy of the numerical results,
particularly near abrupt shocks. Unlike previous deep learning-based methods,
no additional post-processing steps are necessary for maintaining consistency.
We demonstrate the superiority of our new approach using several examples from
the literature for the two-dimensional Euler equations of gas dynamics. Through
intensive study of these test problems, which involve various shocks and
rarefaction waves, the new technique is shown to outperform traditional
fifth-order WENO schemes, especially in cases where the numerical solutions
exhibit excessive diffusion or overshoot around shocks.
[COMMENTS]
33 pages, 18 figures
[LINK]
http://arxiv.org/abs/2309.10117v1
[DATE]
2023-09-19 03:42:35+08:00
[CATEGORIES]
cs.LG
AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation
[AUTHORS]
Damian Sójka, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński
[ABSTRACT]
Test-time adaptation is a promising research direction that allows the source
model to adapt itself to changes in data distribution without any supervision.
Yet, current methods are usually evaluated on benchmarks that are only a
simplification of real-world scenarios. Hence, we propose to validate test-time
adaptation methods using the recently introduced datasets for autonomous
driving, namely CLAD-C and SHIFT. We observe that current test-time adaptation
methods struggle to effectively handle varying degrees of domain shift, often
resulting in degraded performance that falls below that of the source model. We
noticed that the root of the problem lies in the inability to preserve the
knowledge of the source model and adapt to dynamically changing, temporally
correlated data streams. Therefore, we enhance the well-established self-training
framework by incorporating a small memory buffer to increase model stability
and at the same time perform dynamic adaptation based on the intensity of
domain shift. The proposed method, named AR-TTA, outperforms existing
approaches on both synthetic and more real-world benchmarks and shows
robustness across a variety of TTA scenarios.
[LINK]
http://arxiv.org/abs/2309.10109v1
[DATE]
2023-09-19 03:34:23+08:00
[CATEGORIES]
cs.LG
A Semi-Supervised Approach for Power System Event Identification
[AUTHORS]
Nima Taghipourbazargani, Lalitha Sankar, Oliver Kosut
[ABSTRACT]
Event identification is increasingly recognized as crucial for enhancing the
reliability, security, and stability of the electric power system. With the
growing deployment of Phasor Measurement Units (PMUs) and advancements in data
science, there are promising opportunities to explore data-driven event
identification via machine learning classification techniques. However,
obtaining accurately-labeled eventful PMU data samples remains challenging due
to its labor-intensive nature and uncertainty about the event type (class) in
real-time. Thus, it is natural to use semi-supervised learning techniques,
which make use of both labeled and unlabeled samples. We propose a novel
semi-supervised framework to assess the effectiveness of incorporating
unlabeled eventful samples to enhance existing event identification
methodologies. We evaluate three categories of classical semi-supervised
approaches: (i) self-training, (ii) transductive support vector machines
(TSVM), and (iii) graph-based label spreading (LS) method. Our approach
characterizes events using physically interpretable features extracted from
modal analysis of synthetic eventful PMU data. In particular, we focus on the
identification of four event classes whose identification is crucial for grid
operations. We have developed and publicly shared a comprehensive Event
Identification package which consists of three aspects: data generation,
feature extraction, and event identification with limited labels using
semi-supervised methodologies. Using this package, we generate and evaluate
eventful PMU data for the South Carolina synthetic network. Our evaluation
consistently demonstrates that graph-based LS outperforms the other two
semi-supervised methods that we consider, and can noticeably improve event
identification performance relative to the setting with only a small number of
labeled samples.
[LINK]
http://arxiv.org/abs/2309.10095v1
[DATE]
2023-09-19 03:07:41+08:00
[CATEGORIES]
cs.LG
Invariant Probabilistic Prediction
[AUTHORS]
Alexander Henzi, Xinwei Shen, Michael Law, Peter Bühlmann
[ABSTRACT]
In recent years, there has been a growing interest in statistical methods
that exhibit robust performance under distribution changes between training and
test data. While most of the related research focuses on point predictions with
the squared error loss, this article turns the focus towards probabilistic
predictions, which aim to comprehensively quantify the uncertainty of an
outcome variable given covariates. Within a causality-inspired framework, we
investigate the invariance and robustness of probabilistic predictions with
respect to proper scoring rules. We show that arbitrary distribution shifts do
not, in general, admit invariant and robust probabilistic predictions, in
contrast to the setting of point prediction. We illustrate how to choose
evaluation metrics and restrict the class of distribution shifts to allow for
identifiability and invariance in the prototypical Gaussian heteroscedastic
linear model. Motivated by these findings, we propose a method to yield
invariant probabilistic predictions, called IPP, and study the consistency of
the underlying parameters. Finally, we demonstrate the empirical performance of
our proposed procedure on simulated as well as on single-cell data.
[LINK]
http://arxiv.org/abs/2309.10083v1
[DATE]
2023-09-19 02:50:24+08:00
[CATEGORIES]
cs.LG
A Unifying Perspective on Non-Stationary Kernels for Deeper Gaussian Processes
[AUTHORS]
Marcus M. Noack, Hengrui Luo, Mark D. Risser
[ABSTRACT]
The Gaussian process (GP) is a popular statistical technique for stochastic
function approximation and uncertainty quantification from data. GPs have been
adopted into the realm of machine learning in the last two decades because of
their superior prediction abilities, especially in data-sparse scenarios, and
their inherent ability to provide robust uncertainty estimates. Even so, their
performance highly depends on intricate customizations of the core methodology,
which often leads to dissatisfaction among practitioners when standard setups
and off-the-shelf software tools are being deployed. Arguably the most
important building block of a GP is the kernel function which assumes the role
of a covariance operator. Stationary kernels of the Matérn class are used in
the vast majority of applied studies; poor prediction performance and
unrealistic uncertainty quantification are often the consequences.
Non-stationary kernels show improved performance but are rarely used due to
their more complicated functional form and the associated effort and expertise
needed to define and tune them optimally. In this perspective, we want to help
ML practitioners make sense of some of the most common forms of
non-stationarity for Gaussian processes. We show a variety of kernels in action
using representative datasets, carefully study their properties, and compare
their performances. Based on our findings, we propose a new kernel that
combines some of the identified advantages of existing kernels.
[LINK]
http://arxiv.org/abs/2309.10068v1
[DATE]
2023-09-19 02:34:51+08:00
[CATEGORIES]
cs.LG
Algorithmic Hallucinations of Near-Surface Winds: Statistical Downscaling with Generative Adversarial Networks to Convection-Permitting Scales
[AUTHORS]
Nicolaas J. Annau, Alex J. Cannon, Adam H. Monahan
[ABSTRACT]
This paper explores the application of emerging machine learning methods from
image super-resolution (SR) to the task of statistical downscaling. We
specifically focus on convolutional neural network-based Generative Adversarial
Networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to
generate high-resolution (HR) surface winds emulating Weather Research and
Forecasting (WRF) model simulations over North America. Unlike traditional SR
models, where LR inputs are idealized coarsened versions of the HR images, WRF
emulation involves using non-idealized LR and HR pairs resulting in
shared-scale mismatches due to internal variability. Our study builds upon
current SR-based statistical downscaling by experimenting with a novel
frequency-separation (FS) approach from the computer vision field. To assess
the skill of SR models, we carefully select evaluation metrics, and focus on
performance measures based on spatial power spectra. Our analyses reveal how
GAN configurations influence spatial structures in the generated fields,
particularly biases in spatial variability spectra. Using power spectra to
evaluate the FS experiments reveals that successful applications of FS in
computer vision do not translate to climate fields. However, the FS experiments
demonstrate the sensitivity of power spectra to a commonly used GAN-based SR
objective function, which helps interpret and understand its role in
determining spatial structures. This result motivates the development of a
novel partial frequency-separation scheme as a promising configuration option.
We also quantify the influence on GAN performance of non-idealized LR fields
resulting from internal variability. Furthermore, we conduct a spectra-based
feature-importance experiment allowing us to explore the dependence of the
spatial structure of generated fields on different physically relevant LR
covariates.
[COMMENTS]
43 pages, including 11 main figures, and 16 supplemental figures
[LINK]
http://arxiv.org/abs/2302.08720v3
[DATE]
2023-09-19 02:32:22+08:00
[CATEGORIES]
cs.LG
Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach
[AUTHORS]
Ahmad M. Nagib, Hatem Abou-Zeid, Hossam S. Hassanein
[ABSTRACT]
The open radio access network (O-RAN) architecture supports intelligent
network control algorithms as one of its core capabilities. Data-driven
applications incorporate such algorithms to optimize radio access network (RAN)
functions via RAN intelligent controllers (RICs). Deep reinforcement learning
(DRL) algorithms are among the main approaches adopted in the O-RAN literature
to solve dynamic radio resource management problems. However, despite the
benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms
in real network deployments falls behind. This is primarily due to the slow
convergence and unstable performance exhibited by DRL agents upon deployment
and when encountering previously unseen network conditions. In this paper, we
address these challenges by proposing transfer learning (TL) as a core
component of the training and deployment workflows for the DRL-based
closed-loop control of O-RAN functionalities. To this end, we propose and
design a hybrid TL-aided approach that leverages the advantages of both policy
reuse and distillation TL methods to provide safe and accelerated convergence
in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates
multiple services, including real VR gaming traffic to reflect practical
scenarios of O-RAN slicing. We also propose and implement policy reuse and
distillation-aided DRL and non-TL-aided DRL as three separate baselines. The
proposed hybrid approach shows at least 7.7% and 20.7% improvements in the
average initial reward value and the percentage of converged scenarios, and a
64.6% decrease in reward variance while maintaining fast convergence and
enhancing the generalizability compared with the baselines.
[COMMENTS]
This paper has been accepted for publication in a future issue of
IEEE Journal on Selected Areas in Communications (JSAC)
[LINK]
http://arxiv.org/abs/2309.07265v2
[DATE]
2023-09-19 02:28:29+08:00
[CATEGORIES]
cs.LG
Learning High-Dimensional McKean-Vlasov Forward-Backward Stochastic Differential Equations with General Distribution Dependence
[AUTHORS]
Jiequn Han, Ruimeng Hu, Jihao Long
[ABSTRACT]
One of the core problems in mean-field control and mean-field games is to
solve the corresponding McKean-Vlasov forward-backward stochastic differential
equations (MV-FBSDEs). Most existing methods are tailored to special cases in
which the mean-field interaction only depends on expectation or other moments
and are thus inadequate to solve problems when the mean-field interaction has full
distribution dependence.
In this paper, we propose a novel deep learning method for computing
MV-FBSDEs with a general form of mean-field interactions. Specifically, built
on fictitious play, we recast the problem into repeatedly solving standard
FBSDEs with explicit coefficient functions. These coefficient functions are
used to approximate the MV-FBSDEs’ model coefficients with full distribution
dependence, and are updated by solving another supervised learning problem
using training data simulated from the last iteration’s FBSDE solutions. We use
deep neural networks to solve standard BSDEs and approximate coefficient
functions in order to solve high-dimensional MV-FBSDEs. Under proper
assumptions on the learned functions, we prove that the convergence of the
proposed method is free of the curse of dimensionality (CoD) by using a class
of integral probability metrics previously developed in [Han, Hu and Long,
arXiv:2104.12036]. The proved theorem shows the advantage of the method in high
dimensions. We present the numerical performance in high-dimensional MV-FBSDE
problems, including a mean-field game example of the well-known Cucker-Smale
model whose cost depends on the full distribution of the forward process.
[LINK]
http://arxiv.org/abs/2204.11924v3
[DATE]
2023-09-19 02:25:23+08:00
[CATEGORIES]
cs.LG
A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions
[AUTHORS]
Michael Penwarden, Ameya D. Jagtap, Shandian Zhe, George Em Karniadakis, Robert M. Kirby
[ABSTRACT]
Physics-informed neural networks (PINNs) as a means of solving partial
differential equations (PDEs) have garnered much attention in the Computational
Science and Engineering (CS&E) world. However, a recent topic of interest is
exploring various training (i.e., optimization) challenges - in particular,
arriving at poor local minima in the optimization landscape results in a PINN
approximation giving an inferior, and sometimes trivial, solution when solving
forward time-dependent PDEs with no data. This problem also arises, and is in
some sense more difficult, with domain decomposition strategies such as
temporal decomposition using XPINNs. We furnish examples and explanations for
different training challenges, their cause, and how they relate to information
propagation and temporal decomposition. We then propose a new
stacked-decomposition method that bridges the gap between time-marching PINNs
and XPINNs. We also introduce significant computational speed-ups by using
transfer learning concepts to initialize subnetworks in the domain and loss
tolerance-based propagation for the subdomains. Finally, we formulate a new
time-sweeping collocation point algorithm inspired by the previous PINNs
causality literature, which our framework can still describe, and provides a
significant computational speed-up via reduced-cost collocation point
segmentation. The proposed methods form our unified framework, which overcomes
training challenges in PINNs and XPINNs for time-dependent PDEs by respecting
the causality in multiple forms and improving scalability by limiting the
computation required per optimization iteration. Finally, we provide numerical
results for these methods on baseline PDE problems for which unmodified PINNs
and XPINNs struggle to train.
[LINK]
http://arxiv.org/abs/2302.14227v2
[DATE]
2023-09-19 02:19:26+08:00
[CATEGORIES]
cs.LG
Dual Student Networks for Data-Free Model Stealing
[AUTHORS]
James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah
[ABSTRACT]
Existing data-free model stealing methods use a generator to produce samples
in order to train a student model to match the target model outputs. To this
end, the two main challenges are estimating gradients of the target model
without access to its parameters, and generating a diverse set of training
samples that thoroughly explores the input space. We propose a Dual Student
method where two students are symmetrically trained in order to provide the
generator a criterion to generate samples that the two students disagree on. On
one hand, disagreement on a sample implies at least one student has classified
the sample incorrectly when compared to the target model. This incentive
towards disagreement implicitly encourages the generator to explore more
diverse regions of the input space. On the other hand, our method utilizes
gradients of student models to indirectly estimate gradients of the target
model. We show that this novel training objective for the generator network is
equivalent to optimizing a lower bound on the generator’s loss if we had access
to the target model gradients. We show that our new optimization framework
provides more accurate gradient estimation of the target model and better
accuracies on benchmark classification datasets. Additionally, our approach
balances improved query efficiency with training computation cost. Finally, we
demonstrate that our method serves as a better proxy model for transfer-based
adversarial attacks than existing data-free model stealing methods.
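
The disagreement criterion described above can be illustrated with a minimal
sketch. The abstract does not state which distance is used between the two
students' outputs, so the L1 distance between softmax outputs below is an
assumption, and the function names are hypothetical rather than taken from the
authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(logits_a, logits_b):
    """L1 distance between the two students' class distributions (assumed metric)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(pa, pb))


def generator_loss(batch_a, batch_b):
    """Negative mean disagreement over a batch.

    Minimizing this value pushes a generator toward samples on which
    the two students disagree.
    """
    scores = [disagreement(a, b) for a, b in zip(batch_a, batch_b)]
    return -sum(scores) / len(scores)
```

In this sketch the loss is zero when both students agree on every sample and
approaches -2 when they assign all probability mass to different classes.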
[COMMENTS]
Published in the ICLR 2023 - The Eleventh International Conference on
Learning Representations
[LINK]
http://arxiv.org/abs/2309.10058v1
[DATE]
2023-09-19 02:11:31+08:00
[CATEGORIES]
cs.LG
Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++
[AUTHORS]
Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao
[ABSTRACT]
In this study, we present a novel dataset for training machine learning
models translating between OpenMP Fortran and C++ code. To ensure reliability
and applicability, the dataset is created from a range of representative
open-source OpenMP benchmarks. It is also refined using a meticulous code
similarity test. The effectiveness of our dataset is assessed using both
quantitative (CodeBLEU) and qualitative (human evaluation) methods. We showcase
how this dataset significantly elevates the translation competencies of large
language models (LLMs). Specifically, models without prior coding knowledge
experienced a $5.1\times$ boost in their CodeBLEU scores, while
models with some coding familiarity saw an impressive
$9.9\times$ increase. The best fine-tuned model using our
dataset outperforms GPT-4. It also reaches human-level accuracy. This work
underscores the immense potential of our dataset in propelling advancements in
the domain of code translation for high-performance computing. The dataset is
accessible at
https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset (OpenMP-Fortran-CPP-Translation).
[COMMENTS]
This paper was accepted by the HPEC 2023 conference and received the
Outstanding Student Paper Award
[LINK]
http://arxiv.org/abs/2307.07686v4
[DATE]
2023-09-19 02:10:37+08:00
[CATEGORIES]
cs.LG
Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach
[AUTHORS]
Mohammad S. Ramadan, Mahmoud A. Hayajnh, Michael T. Tolley, Kyriakos G. Vamvoudakis
[ABSTRACT]
In this paper, we provide a framework to cope with two problems: (i) the
fragility of reinforcement learning due to modeling uncertainties because of
the mismatch between controlled laboratory/simulation and real-world conditions
and (ii) the prohibitive computational cost of stochastic optimal control. We
approach both problems by using reinforcement learning to solve the stochastic
dynamic programming equation. The resulting reinforcement learning controller
is safe with respect to several types of constraints, and it can
actively learn about the modeling uncertainties. Unlike exploration and
exploitation, probing and safety are employed automatically by the controller
itself, resulting in real-time learning. A simulation example demonstrates the
efficacy of the proposed approach.
[LINK]
http://arxiv.org/abs/2309.10831v1
[DATE]
2023-09-19 02:05:35+08:00
[CATEGORIES]
cs.LG
A Modular Spatial Clustering Algorithm with Noise Specification
[AUTHORS]
Akhil K, Srikanth H R
[ABSTRACT]
Clustering techniques have been the key drivers of data mining, machine
learning and pattern recognition for decades. One of the most popular
clustering algorithms is DBSCAN due to its high accuracy and noise tolerance.
Many superior algorithms such as DBSCAN have input parameters that are hard to
estimate. Therefore, finding those parameters is a time-consuming process. In
this paper, we propose a novel clustering algorithm, Bacteria-Farm, which
balances the performance and ease of finding the optimal parameters for
clustering. The Bacteria-Farm algorithm is inspired by the growth of bacteria in
closed experimental farms - their ability to consume food and grow - which
closely represents the ideal cluster growth desired in clustering algorithms.
In addition, the algorithm features a modular design to allow the creation of
versions of the algorithm for specific tasks / distributions of data. In
contrast with other clustering algorithms, our algorithm also has a provision
to specify the amount of noise to be excluded during clustering.
[COMMENTS]
Presented at International Conference for Machine Learning and Data
Science 2018
[LINK]
http://arxiv.org/abs/2309.10047v1
[DATE]
2023-09-19 02:05:06+08:00
[CATEGORIES]
cs.LG
General In-Hand Object Rotation with Vision and Touch
[AUTHORS]
Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik
[ABSTRACT]
We introduce RotateIt, a system that enables fingertip-based object rotation
along multiple axes by leveraging multimodal sensory inputs. Our system is
trained in simulation, where it has access to ground-truth object shapes and
physical properties. Then we distill it to operate on realistic yet noisy
simulated visuotactile and proprioceptive sensory inputs. These multimodal
inputs are fused via a visuotactile transformer, enabling online inference of
object shapes and physical properties during deployment. We show significant
performance improvements over prior methods and the importance of visual and
tactile sensing.
[COMMENTS]
CoRL 2023; Website: https://haozhi.io/rotateit/
[LINK]
http://arxiv.org/abs/2309.09979v1
[DATE]
2023-09-19 01:59:25+08:00
[CATEGORIES]
cs.LG
A Multi-Token Coordinate Descent Method for Semi-Decentralized Vertical Federated Learning
[AUTHORS]
Pedro Valdeira, Yuejie Chi, Cláudia Soares, João Xavier
[ABSTRACT]
Communication efficiency is a major challenge in federated learning (FL). In
client-server schemes, the server constitutes a bottleneck, and while
decentralized setups spread communications, they do not necessarily reduce them
due to slower convergence. We propose Multi-Token Coordinate Descent (MTCD), a
communication-efficient algorithm for semi-decentralized vertical federated
learning, exploiting both client-server and client-client communications when
each client holds a small subset of features. Our multi-token method can be
seen as a parallel Markov chain (block) coordinate descent algorithm and it
subsumes the client-server and decentralized setups as special cases. We obtain
a convergence rate of $\mathcal{O}(1/T)$ for nonconvex objectives when tokens
roam over disjoint subsets of clients and for convex objectives when they roam
over possibly overlapping subsets. Numerical results show that MTCD improves
the state-of-the-art communication efficiency and allows for a tunable amount
of parallel communications.
[LINK]
http://arxiv.org/abs/2309.09977v1
[DATE]
2023-09-19 01:59:01+08:00
[CATEGORIES]
cs.LG
Prompt a Robot to Walk with Large Language Models
[AUTHORS]
Yen-Jen Wang, Bike Zhang, Jianyu Chen, Koushil Sreenath
[ABSTRACT]
Large language models (LLMs) pre-trained on vast internet-scale data have
showcased remarkable capabilities across diverse domains. Recently, there has
been escalating interest in deploying LLMs for robotics, aiming to harness the
power of foundation models in real-world settings. However, this approach faces
significant challenges, particularly in grounding these models in the physical
world and in generating dynamic robot motions. To address these issues, we
introduce a novel paradigm in which we use few-shot prompts collected from the
physical environment, enabling the LLM to autoregressively generate low-level
control commands for robots without task-specific fine-tuning. Experiments
across various robots and environments validate that our method can effectively
prompt a robot to walk. We thus illustrate how LLMs can proficiently function
as low-level feedback controllers for dynamic motion control even in
high-dimensional robotic systems. The project website and source code can be
found at: https://prompt2walk.github.io/ .
[LINK]
http://arxiv.org/abs/2309.09969v1
[DATE]
2023-09-19 01:50:17+08:00
[CATEGORIES]
cs.LG
Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning
[AUTHORS]
Ali Bouland, Shengyuan Niu, Sai Tej Paruchuri, Andrew Kurdila, John Burns, Eugenio Schuster
[ABSTRACT]
This paper studies convergence rates for some value function approximations
that arise in a collection of reproducing kernel Hilbert spaces (RKHS)
$H(\Omega)$. By casting an optimal control problem in a specific class of
native spaces, strong rates of convergence are derived for the operator
equation that enables offline approximations that appear in policy iteration.
Explicit upper bounds on error in value function approximations are derived in
terms of the power function $\Pwr_{H,N}$ for the space of finite dimensional
approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric
in nature and refine some well-known, now classical results concerning
convergence of approximations of value functions.
[COMMENTS]
7 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.07383v2
[DATE]
2023-09-19 01:49:50+08:00
[CATEGORIES]
cs.LG
Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees
[AUTHORS]
Alexia Jolicoeur-Martineau, Kilian Fatras, Tal Kachman
[ABSTRACT]
Tabular data is hard to acquire and is subject to missing values. This paper
proposes a novel approach to generate and impute mixed-type (continuous and
categorical) tabular data using score-based diffusion and conditional flow
matching. Contrary to previous work that relies on neural networks as function
approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree
(GBT) method. In addition to being elegant, we empirically show on various
datasets that our method i) generates highly realistic synthetic data when the
training dataset is either clean or tainted by missing data and ii) generates
diverse plausible data imputations. Our method often outperforms deep-learning
generation methods and can be trained in parallel using CPUs without the need for
a GPU. To make it easily accessible, we release our code through a Python
library on PyPI and an R package on CRAN.
[LINK]
http://arxiv.org/abs/2309.09968v1
[DATE]
2023-09-19 01:49:09+08:00
[CATEGORIES]
cs.LG
What is a Fair Diffusion Model? Designing Generative Text-To-Image Models to Incorporate Various Worldviews
[AUTHORS]
Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson
[COMMENTS]
20 pages, 5 figures
[LINK]
http://arxiv.org/abs/2309.09944v1
[DATE]
2023-09-19 01:04:04+08:00
[CATEGORIES]
cs.LG
Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation
[AUTHORS]
Kathia Melbouci, Fawzi Nashashibi
[ABSTRACT]
The most commonly used method for addressing 3D geometric registration is the
iterative closest point algorithm; this approach is incremental and prone to
drift over multiple consecutive frames. The common strategy to address the
drift is pose graph optimization subsequent to frame-to-frame registration,
incorporating a loop closure process that identifies previously visited places.
In this paper, we explore a framework that replaces traditional geometric
registration and pose graph optimization with a learned model utilizing
hierarchical attention mechanisms and graph neural networks. We propose a
strategy to condense the data flow, preserving essential information required
for the precise estimation of rigid poses. Our results, derived from tests on
the KITTI Odometry dataset, demonstrate a significant improvement in pose
estimation accuracy. This improvement is especially notable in determining
rotational components when compared with results obtained through conventional
multi-way registration via pose graph optimization. The code will be made
available upon completion of the review process.
[LINK]
http://arxiv.org/abs/2309.09934v1
[DATE]
2023-09-19 00:51:56+08:00
[CATEGORIES]
cs.LG
Evaluating Adversarial Robustness with Expected Viable Performance
[AUTHORS]
Ryan McCoppin, Colin Dawson, Sean M. Kennedy, Leslie M. Blaha
[ABSTRACT]
We introduce a metric for evaluating the robustness of a classifier, with
particular attention to adversarial perturbations, in terms of expected
functionality with respect to possible adversarial perturbations. A classifier
is assumed to be non-functional (that is, has a functionality of zero) with
respect to a perturbation bound if a conventional measure of performance, such
as classification accuracy, is less than a minimally viable threshold when the
classifier is tested on examples from that perturbation bound. Defining
robustness in terms of an expected value is motivated by a domain-general
approach to robustness quantification.
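
A minimal sketch of the thresholded expectation described above follows. The
abstract only states that functionality is zero below the minimally viable
threshold; using the raw accuracy as the functionality above the threshold, and
representing the distribution over perturbation bounds as a list of weights,
are assumptions made for illustration. The function names are hypothetical.

```python
def functionality(accuracy, threshold):
    """Zero below the minimally viable threshold; the accuracy itself otherwise (assumption)."""
    return accuracy if accuracy >= threshold else 0.0


def expected_viable_performance(accuracies, weights, threshold):
    """Weighted average of functionality over a set of perturbation bounds.

    accuracies[i] is the accuracy measured at the i-th perturbation bound;
    weights[i] is the (unnormalized) probability assigned to that bound.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * functionality(a, threshold) for a, w in zip(accuracies, weights))
    return weighted / total
```

For example, accuracies of 0.9, 0.7 and 0.4 at three equally weighted bounds
with a threshold of 0.6 contribute 0.9, 0.7 and 0 respectively, so the third
bound counts as non-functional rather than as partial credit.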
[COMMENTS]
Accepted at the 22nd International Conference on Machine Learning and
Applications (IEEE 2023)
[LINK]
http://arxiv.org/abs/2309.09928v1
[DATE]
2023-09-19 00:47:24+08:00
[CATEGORIES]
cs.LG
Interpretable and Fair Boolean Rule Sets via Column Generation
[AUTHORS]
Connor Lawless, Sanjeeb Dash, Oktay Gunluk, Dennis Wei
[ABSTRACT]
This paper considers the learning of Boolean rules in disjunctive normal form
(DNF, OR-of-ANDs, equivalent to decision rule sets) as an interpretable model
for classification. An integer program is formulated to optimally trade
classification accuracy for rule simplicity. We also consider the fairness
setting and extend the formulation to include explicit constraints on two
different measures of classification parity: equality of opportunity and
equalized odds. Column generation (CG) is used to efficiently search over an
exponential number of candidate rules without the need for heuristic rule
mining. To handle large data sets, we propose an approximate CG algorithm using
randomization. Compared to three recently proposed alternatives, the CG
algorithm dominates the accuracy-simplicity trade-off in 8 out of 16 data sets.
When maximized for accuracy, CG is competitive with rule learners designed for
this purpose, sometimes finding significantly simpler solutions that are no
less accurate. Compared to other fair and interpretable classifiers, our method
is able to find rule sets that meet stricter notions of fairness with a modest
trade-off in accuracy.
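
The two classification parity measures named above can be made concrete with a
short sketch. It uses the standard definitions (equality of opportunity
compares true positive rates across groups; equalized odds compares both true
and false positive rates) and is not taken from the paper's integer program.
Reporting the largest gap across groups is an illustrative choice, and the
function names are hypothetical.

```python
def positive_rate(preds, labels, target_label):
    """Fraction of examples with the given true label that are predicted positive."""
    selected = [p for p, y in zip(preds, labels) if y == target_label]
    return sum(selected) / len(selected) if selected else 0.0


def group_rates(preds, labels, groups, g):
    """Return (TPR, FPR) for the examples belonging to group g."""
    p = [pi for pi, gi in zip(preds, groups) if gi == g]
    y = [yi for yi, gi in zip(labels, groups) if gi == g]
    return positive_rate(p, y, 1), positive_rate(p, y, 0)


def equality_of_opportunity_gap(preds, labels, groups):
    """Largest difference in true positive rate between any two groups."""
    tprs = [group_rates(preds, labels, groups, g)[0] for g in sorted(set(groups))]
    return max(tprs) - min(tprs)


def equalized_odds_gap(preds, labels, groups):
    """Largest difference in either TPR or FPR between any two groups."""
    rates = [group_rates(preds, labels, groups, g) for g in sorted(set(groups))]
    tpr_gap = max(r[0] for r in rates) - min(r[0] for r in rates)
    fpr_gap = max(r[1] for r in rates) - min(r[1] for r in rates)
    return max(tpr_gap, fpr_gap)
```

A fairness constraint of the kind described in the abstract would bound such a
gap by a tolerance; equalized odds is the stricter of the two because it also
constrains false positives.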
[COMMENTS]
arXiv admin note: substantial text overlap with arXiv:2107.01325,
arXiv:1805.09901
[LINK]
http://arxiv.org/abs/2111.08466v2
[DATE]
2023-09-19 00:36:31+08:00
[CATEGORIES]
cs.LG
A Heterogeneous Graph-Based Multi-Task Learning for Fault Event Diagnosis in Smart Grid
[AUTHORS]
Dibaloke Chanda, Nasim Yahya Soltani
[ABSTRACT]
Precise and timely fault diagnosis is a prerequisite for a distribution
system to ensure minimum downtime and maintain reliable operation. This
necessitates access to a comprehensive procedure that can provide the grid
operators with insightful information in the case of a fault event. In this
paper, we propose a heterogeneous multi-task learning graph neural network
(MTL-GNN) capable of detecting, locating and classifying faults in addition to
providing an estimate of the fault resistance and current. Using a graph neural
network (GNN) allows for learning the topological representation of the
distribution system as well as feature learning through a message-passing
scheme. We investigate the robustness of our proposed model using the IEEE-123
test feeder system. This work also proposes a novel GNN-based explainability
method to identify key nodes in the distribution system which then facilitates
informed sparse measurements. Numerical tests validate the performance of the
model across all tasks.
[LINK]
http://arxiv.org/abs/2309.09921v1
[DATE]
2023-09-19 00:35:30+08:00
[CATEGORIES]
cs.LG
Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation
[AUTHORS]
Danilo de Oliveira, Timo Gerkmann
[ABSTRACT]
Much research effort is being applied to the task of compressing the
knowledge of self-supervised models, which are powerful yet large and
memory-consuming. In this work, we show that the original method of knowledge
distillation (and its more recently proposed extension, decoupled knowledge
distillation) can be applied to the task of distilling HuBERT. In contrast to
methods that focus on distilling internal features, this allows for more
freedom in the network architecture of the compressed model. We thus propose to
distill HuBERT’s Transformer layers into an LSTM-based distilled model that
reduces the number of parameters even below DistilHuBERT and at the same time
shows improved performance in automatic speech recognition.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.09920v1
[DATE]
2023-09-19 00:34:40+08:00
[CATEGORIES]
cs.LG
Evaluation of Human-Understandability of Global Model Explanations using Decision Tree
[AUTHORS]
Adarsa Sivaprasad, Ehud Reiter, Nava Tintarev, Nir Oren
[ABSTRACT]
In explainable artificial intelligence (XAI) research, the predominant focus
has been on interpreting models for experts and practitioners. Model-agnostic
and local explanation approaches are deemed interpretable and sufficient in
many applications. However, in domains like healthcare, where end users are
patients without AI or domain expertise, there is an urgent need for model
explanations that are more comprehensible and instil trust in the model’s
operations. We hypothesise that generating model explanations that are
narrative, patient-specific and global (holistic of the model) would enable
better understandability and enable decision-making. We test this using a
decision tree model to generate both local and global explanations for patients
identified as having a high risk of coronary heart disease. These explanations
are presented to non-expert users. We find a strong individual preference for a
specific type of explanation. The majority of participants prefer global
explanations, while a smaller group prefers local explanations. A task-based
evaluation of the mental models of these participants provides valuable feedback to
enhance narrative global explanations. This, in turn, guides the design of
health informatics systems that are both trustworthy and actionable.
[LINK]
http://arxiv.org/abs/2309.09917v1
[DATE]
2023-09-19 00:30:14+08:00
[CATEGORIES]
cs.LG
Learning Nonparametric High-Dimensional Generative Models: The Empirical-Beta-Copula Autoencoder
[AUTHORS]
Maximilian Coblenz, Oliver Grothe, Fabian Kächele
[ABSTRACT]
By sampling from the latent space of an autoencoder and decoding the latent
space samples to the original data space, any autoencoder can simply be turned
into a generative model. For this to work, it is necessary to model the
autoencoder’s latent space with a distribution from which samples can be
obtained. Several simple possibilities (kernel density estimates, Gaussian
distribution) and more sophisticated ones (Gaussian mixture models, copula
models, normalizing flows) can be thought of and have been tried recently.
This study aims to discuss, assess, and compare various techniques that can be
used to capture the latent space so that an autoencoder can become a generative
model while striving for simplicity. Among them, a new copula-based method, the
Empirical Beta Copula Autoencoder, is considered. Furthermore, we provide
insights into further aspects of these methods, such as targeted sampling or
synthesizing new data with specific features.
[LINK]
http://arxiv.org/abs/2309.09916v1
[DATE]
2023-09-19 00:29:36+08:00
[CATEGORIES]
cs.LG
Physics-informed PointNet: On how many irregular geometries can it solve an inverse problem simultaneously? Application to linear elasticity
[AUTHORS]
Ali Kashefi, Leonidas J. Guibas, Tapan Mukerji
[ABSTRACT]
Regular physics-informed neural networks (PINNs) predict the solution of
partial differential equations using sparse labeled data but only over a single
domain. On the other hand, fully supervised learning models are first trained
usually over a few thousand domains with known solutions (i.e., labeled data)
and then predict the solution over a few hundred unseen domains.
Physics-informed PointNet (PIPN) is primarily designed to fill this gap between
PINNs (as weakly supervised learning models) and fully supervised learning
models. In this article, we demonstrate that PIPN predicts the solution of
desired partial differential equations over a few hundred domains
simultaneously, while it only uses sparse labeled data. This framework benefits
fast geometric designs in the industry when only sparse labeled data are
available. Particularly, we show that PIPN predicts the solution of a plane
stress problem over more than 500 domains with different geometries,
simultaneously. Moreover, we pioneer implementing the concept of remarkable
batch size (i.e., the number of geometries fed into PIPN at each sub-epoch)
into PIPN. Specifically, we try batch sizes of 7, 14, 19, 38, 76, and 133.
Additionally, the effect of the PIPN size, symmetric function in the PIPN
architecture, and static and dynamic weights for the component of the sparse
labeled data in the loss function are investigated.
[LINK]
http://arxiv.org/abs/2303.13634v3
[DATE]
2023-09-19 00:22:03+08:00
[CATEGORIES]
cs.LG
Finite Expression Methods for Discovering Physical Laws from Data
[AUTHORS]
Zhongyi Jiang, Chunmei Wang, Haizhao Yang
[ABSTRACT]
Nonlinear dynamics is a pervasive phenomenon observed in scientific and
engineering disciplines. However, the task of deriving analytical expressions
to describe nonlinear dynamics from limited data remains challenging. In this
paper, we shall present a novel deep symbolic learning method called the
“finite expression method” (FEX) to discover governing equations within a
function space containing a finite set of analytic expressions, based on
observed dynamic data. The key concept is to employ FEX to generate analytical
expressions of the governing equations by learning the derivatives of partial
differential equation (PDE) solutions through convolutions. Our numerical
results demonstrate that our FEX surpasses other existing methods (such as
PDE-Net, SINDy, GP, and SPL) in terms of numerical performance across a range
of problems, including time-dependent PDE problems and nonlinear dynamical
systems with time-varying coefficients. Moreover, the results highlight FEX’s
flexibility and expressive power in accurately approximating symbolic governing
equations.
[LINK]
http://arxiv.org/abs/2305.08342v2
[DATE]
2023-09-19 00:18:32+08:00
[CATEGORIES]
cs.LG
Evaluation of GPT-3 for Anti-Cancer Drug Sensitivity Prediction
[AUTHORS]
Shaika Chowdhury, Sivaraman Rajaganapathy, Lichao Sun, James Cerhan, Nansu Zong
[ABSTRACT]
In this study, we investigated the potential of GPT-3 for the anti-cancer
drug sensitivity prediction task using structured pharmacogenomics data across
five tissue types and evaluated its performance with zero-shot prompting and
fine-tuning paradigms. The drug’s SMILES representation and the cell line’s genomic
mutation features were predictive of the drug response. The results from this
study have the potential to pave the way for designing more efficient treatment
protocols in precision oncology.
[LINK]
http://arxiv.org/abs/2309.10016v1
[DATE]
2023-09-19 00:17:44+08:00
[CATEGORIES]
cs.LG
Learning to Generate Lumped Hydrological Models
[AUTHORS]
Yang Yang, Ting Fong May Chui
[ABSTRACT]
In a lumped hydrological model structure, the hydrological function of a
catchment is characterized by only a few parameters. Given a set of parameter
values, a numerical function useful for hydrological prediction is generated.
Thus, this study assumes that the hydrological function of a catchment can be
sufficiently well characterized by a small number of latent variables. By
specifying the variable values, a numerical function resembling the
hydrological function of a real-world catchment can be generated using a
generative model. In this study, a deep learning method is used to learn both
the generative model and the latent variable values of different catchments
directly from their climate forcing and runoff data, without using catchment
attributes. The generative models can be used similarly to a lumped model
structure, i.e., by estimating the optimal parameter or latent variable values
using a generic model calibration algorithm, an optimal numerical model can be
derived. In this study, generative models using eight latent variables were
learned from data from over 3,000 catchments worldwide, and the learned
generative models were applied to model over 700 different catchments using a
generic calibration algorithm. The quality of the resulting optimal models was
generally comparable to or better than that obtained using 36 different types
of lumped model structures or using non-generative deep learning methods. In
summary, this study presents a data-driven approach for representing the
hydrological function of a catchment in low-dimensional space and a method for
reconstructing specific hydrological functions from the representations.
[LINK]
http://arxiv.org/abs/2309.09904v1
[DATE]
2023-09-19 00:07:41+08:00
[CATEGORIES]
cs.LG
Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts
[AUTHORS]
Joseph Gatto, Sarah M. Preum
[ABSTRACT]
User-generated texts available on the web and social platforms are often long
and semantically challenging, making them difficult to annotate. Obtaining
human annotation becomes increasingly difficult as problem domains become more
specialized. For example, many health NLP problems require domain experts to be
a part of the annotation pipeline. Thus, it is crucial that we develop
low-resource NLP solutions able to work with this set of limited-data problems.
In this study, we employ Abstract Meaning Representation (AMR) graphs as a
means to model low-resource Health NLP tasks sourced from various online health
resources and communities. AMRs are well suited to model online health texts as
they can represent multi-sentence inputs, abstract away from complex
terminology, and model long-distance relationships between co-referring tokens.
AMRs thus improve the ability of pre-trained language models to reason about
high-complexity texts. Our experiments show that we can improve performance on
6 low-resource health NLP tasks by augmenting text embeddings with semantic
graph embeddings. Our approach is task agnostic and easy to merge into any
standard text classification pipeline. We experimentally validate that AMRs are
useful in the modeling of complex texts by analyzing performance through the
lens of two textual complexity measures: the Flesch-Kincaid Reading Level and
Syntactic Complexity. Our error analysis shows that AMR-infused language models
perform better on complex texts and generally show less predictive variance in
the presence of changing complexity.
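
For readers unfamiliar with the first complexity measure, the sketch below
computes the standard Flesch-Kincaid grade level,
0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The text
helper uses a crude vowel-group syllable estimate, which is an assumption made
for illustration and not the tooling used in the paper.

```python
import re


def flesch_kincaid_grade(num_words, num_sentences, num_syllables):
    """Standard Flesch-Kincaid grade level from raw counts."""
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (num_words / num_sentences) + 11.8 * (num_syllables / num_words) - 15.59


def estimate_syllables(word):
    """Crude estimate: count runs of vowels, with a minimum of one (assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def grade_from_text(text):
    """Apply the formula to raw text using simple regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(estimate_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)
```

Longer sentences and more syllables per word both raise the score, which is
why the measure serves as a proxy for how demanding a text is to read.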
[LINK]
http://arxiv.org/abs/2309.09877v1
[DATE]
2023-09-18 23:37:30+08:00
[CATEGORIES]
cs.CL
SYNDICOM: Improving Conversational Commonsense with Error-Injection and Natural Language Feedback
[AUTHORS]
Christopher Richardson, Anirudh Sundar, Larry Heck
[ABSTRACT]
Commonsense reasoning is a critical aspect of human communication. Despite
recent advances in conversational AI driven by large language models,
commonsense reasoning remains a challenging task. In this work, we introduce
SYNDICOM - a method for improving commonsense in dialogue response generation.
SYNDICOM consists of two components. The first component is a dataset composed
of commonsense dialogues created from a knowledge graph and synthesized into
natural language. This dataset includes both valid and invalid responses to
dialogue contexts, along with natural language feedback (NLF) for the invalid
responses. The second component is a two-step procedure: training a model to
predict natural language feedback (NLF) for invalid responses, and then
training a response generation model conditioned on the predicted NLF, the
invalid response, and the dialogue. SYNDICOM is scalable and does not require
reinforcement learning. Empirical results on three tasks are evaluated using a
broad range of metrics. SYNDICOM achieves a relative improvement of 53% over
ChatGPT on ROUGE1, and human evaluators prefer SYNDICOM over ChatGPT 57% of the
time. We will publicly release the code and the full dataset.
[COMMENTS]
Published at SigDial 2023, Number 129
[LINK]
http://arxiv.org/abs/2309.10015v1
[DATE]
2023-09-18 23:08:48+08:00
[CATEGORIES]
cs.CL
cs.LG
RECAP: Retrieval-Augmented Audio Captioning
[AUTHORS]
Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha
[ABSTRACT]
We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and
effective audio captioning system that generates captions conditioned on an
input audio and other captions similar to the audio retrieved from a datastore.
Additionally, our proposed method can transfer to any domain without the need
for any additional fine-tuning. To generate a caption for an audio sample, we
leverage an audio-text model CLAP to retrieve captions similar to it from a
replaceable datastore, which are then used to construct a prompt. Next, we feed
this prompt to a GPT-2 decoder and introduce cross-attention layers between the
CLAP encoder and GPT-2 to condition the audio for caption generation.
Experiments on two benchmark datasets, Clotho and AudioCaps, show that RECAP
achieves competitive performance in in-domain settings and significant
improvements in out-of-domain settings. Additionally, due to its capability to
exploit a large text-captions-only datastore in a training-free
fashion, RECAP shows unique capabilities of captioning novel audio events never
seen during training and compositional audios with multiple events. To promote
research in this space, we also release 150,000+ new weakly labeled captions
for AudioSet, AudioCaps, and Clotho.
[COMMENTS]
Code and data soon here: https://github.com/Sreyan88/RECAP
[LINK]
http://arxiv.org/abs/2309.09836v1
[DATE]
2023-09-18 22:53:08+08:00
[CATEGORIES]
cs.CL
Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits
[AUTHORS]
Xiangheng He, Junjie Chen, Björn W. Schuller
[ABSTRACT]
Multi-task learning (MTL) aims to improve the performance of a primary task
by jointly learning with related auxiliary tasks. Traditional MTL methods
select tasks randomly during training. However, both previous studies and our
results suggest that such random selection of tasks may not be helpful, and
can even be harmful to performance. Therefore, new strategies for task
selection and assignment in MTL need to be explored. This paper studies the
multi-modal, multi-task dialogue act classification task, and proposes a method
for selecting and assigning tasks based on non-stationary multi-armed bandits
(MAB) with discounted Thompson Sampling (TS) using Gaussian priors. Our
experimental results show that in different training stages, different tasks
have different utility. Our proposed method can effectively identify the task
utility, actively avoid useless or harmful tasks, and realise the task
assignment during training. Our proposed method is significantly superior in
terms of UAR and F1 to the single-task and multi-task baselines with p-values <
0.05. Further analysis of experiments indicates that for the dataset with the
data imbalance problem, our proposed method has significantly higher stability
and can obtain consistent and decent performance for minority classes. Our
proposed method is superior to the current state-of-the-art model.
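
As a rough illustration of the bandit idea in the abstract above, the sketch below implements one common form of discounted Thompson Sampling with Gaussian posteriors. It is not the authors' implementation: the class name, the N(0, sigma^2) prior, the discount factor `gamma`, and the reward definition are all assumptions made for this example.

```python
import random


class DiscountedGaussianTS:
    """Discounted Thompson Sampling over arms (here: candidate auxiliary tasks)."""

    def __init__(self, n_arms: int, gamma: float = 0.9, sigma: float = 1.0, seed=None):
        self.gamma = gamma              # discount applied to old evidence
        self.sigma = sigma              # assumed reward noise / prior scale
        self.counts = [0.0] * n_arms    # discounted pull counts
        self.sums = [0.0] * n_arms      # discounted reward sums
        self.rng = random.Random(seed)

    def select(self) -> int:
        """Sample a mean for each arm from its posterior and pick the largest."""
        samples = []
        for n, s in zip(self.counts, self.sums):
            mean = s / (n + 1.0)                    # posterior mean under N(0, sigma^2) prior
            std = self.sigma / (n + 1.0) ** 0.5     # posterior standard deviation
            samples.append(self.rng.gauss(mean, std))
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        """Decay all statistics, then add the new observation for the pulled arm."""
        self.counts = [self.gamma * n for n in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```

Because every update shrinks the old counts, an arm whose utility changed earlier in training regains posterior variance and is explored again, which is the property that makes the scheme suitable for non-stationary rewards.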
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.09832v1
[DATE]
2023-09-18 22:51:51+08:00
[CATEGORIES]
cs.CL
Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding
[AUTHORS]
André Storhaug, Jingyue Li, Tianyuan Hu
[ABSTRACT]
Auto-completing code enables developers to speed up coding significantly.
Recent advances in transformer-based large language model (LLM) technologies
have been applied to code synthesis. However, studies show that much of this
synthesized code contains vulnerabilities. We propose a novel
vulnerability-constrained decoding approach to reduce the amount of vulnerable
code generated by such models. Using a small dataset of labeled vulnerable
lines of code, we fine-tune an LLM to include vulnerability labels when
generating code, acting as an embedded classifier. Then, during decoding, we
prevent the model from generating these labels to avoid producing vulnerable code. To
evaluate the method, we chose to automatically complete Ethereum Blockchain
smart contracts (SCs) as the case study due to the strict requirements of SC
security. We first fine-tuned the 6-billion-parameter GPT-J model using 186,397
Ethereum SCs after removing the duplication from 2,217,692 SCs. The fine-tuning
took more than one week using ten GPUs. The results showed that our fine-tuned
model could synthesize SCs with an average BLEU (BiLingual Evaluation
Understudy) score of 0.557. However, many codes in the auto-completed SCs were
vulnerable. Using the code before the vulnerable line of 176 SCs containing
different types of vulnerabilities to auto-complete the code, we found that
more than 70% of the auto-completed codes were insecure. Thus, we further
fine-tuned the model on another 941 vulnerable SCs containing the same types of
vulnerabilities and applied vulnerability-constrained decoding. The fine-tuning
took only one hour with four GPUs. We then auto-completed the 176 SCs again and
found that our approach could identify 62% of the code to be generated as
vulnerable and avoid generating 67% of them, indicating the approach could
efficiently and effectively avoid vulnerabilities in the auto-completed code.
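
The general idea of forbidding specific tokens during decoding can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the vulnerability labels map to known token ids (the ids below are hypothetical) and uses a single greedy step over a plain list of logits.

```python
import math
from typing import List, Sequence, Set


def mask_banned(logits: Sequence[float], banned_ids: Set[int]) -> List[float]:
    """Set the logit of every banned token id to -inf so it can never be chosen."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


def greedy_step(logits: Sequence[float], banned_ids: Set[int]) -> int:
    """Pick the highest-scoring token id that is not banned."""
    if len(banned_ids) >= len(logits) and all(i in banned_ids for i in range(len(logits))):
        raise ValueError("every token is banned")
    masked = mask_banned(logits, banned_ids)
    return max(range(len(masked)), key=masked.__getitem__)


# Hypothetical vocabulary of three tokens where id 1 is a vulnerability label.
next_token = greedy_step([0.1, 2.5, 1.7], banned_ids={1})
```

Without the mask the label token (id 1) would win; with it, decoding falls back to the best remaining token.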
[COMMENTS]
12 pages, 8 figures, 2 tables, 5 listings, accepted to the 34th IEEE
International Symposium on Software Reliability Engineering (ISSRE 2023)
[LINK]
http://arxiv.org/abs/2309.09826v1
[DATE]
2023-09-18 22:47:34+08:00
[CATEGORIES]
cs.CL
AMuRD: Annotated Multilingual Receipts Dataset for Cross-lingual Key Information Extraction and Classification
[AUTHORS]
Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt
[ABSTRACT]
Key information extraction involves recognizing and extracting text from
scanned receipts, enabling retrieval of essential content, and organizing it
into structured documents. This paper presents a novel multilingual dataset for
receipt extraction, addressing key challenges in information extraction and
item classification. The dataset comprises 47,720 samples, including
annotations for item names, attributes (price, brand, etc.), and
classification into 44 product categories. We introduce the InstructLLaMA
approach, achieving an F1 score of 0.76 and an accuracy of 0.68 for key
information extraction and item classification. We provide code, datasets, and
checkpoints at https://github.com/Update-For-Integrated-Business-AI/AMuRD.
[LINK]
http://arxiv.org/abs/2309.09800v1
[DATE]
2023-09-18 22:18:19+08:00
[CATEGORIES]
cs.CL
When Large Language Models Meet Citation: A Survey
[AUTHORS]
Yang Zhang, Yufei Wang, Kai Wang, Quan Z. Sheng, Lina Yao, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao
[ABSTRACT]
Citations in scholarly work serve the essential purpose of acknowledging and
crediting the original sources of knowledge that have been incorporated or
referenced. Depending on their surrounding textual context, these citations are
used for different motivations and purposes. Large Language Models (LLMs) could
be helpful in capturing this fine-grained citation information via the
corresponding textual context, thereby enabling a better understanding of
the literature. Furthermore, these citations also establish connections among
scientific papers, providing high-quality inter-document relationships and
human-constructed knowledge. Such information could be incorporated into LLM
pre-training and improve the text representation in LLMs. Therefore, in this
paper, we offer a preliminary review of the mutually beneficial relationship
between LLMs and citation analysis. Specifically, we review the application of
LLMs for in-text citation analysis tasks, including citation classification,
citation-based summarization, and citation recommendation. We then summarize
the research pertinent to leveraging citation linkage knowledge to improve text
representations of LLMs via citation prediction, network structure information,
and inter-document relationships. We finally provide an overview of these
contemporary methods and put forth potential promising avenues in combining
LLMs and citation analysis for further investigation.
[LINK]
http://arxiv.org/abs/2309.09727v1
[DATE]
2023-09-18 20:48:48+08:00
[CATEGORIES]
cs.CL
A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture
[AUTHORS]
Cheonsu Jeong
[ABSTRACT]
This study presents a method for implementing generative AI services by
utilizing a Large Language Model (LLM) application architecture. With recent
advancements in generative AI technology, LLMs have gained prominence across
various domains. In this context, the research addresses the challenge of
information scarcity and proposes specific remedies by harnessing LLM
capabilities. The investigation delves into strategies for mitigating the issue
of inadequate data, offering tailored solutions. The study delves into the
efficacy of employing fine-tuning techniques and direct document integration to
alleviate data insufficiency. A significant contribution of this work is the
development of a Retrieval-Augmented Generation (RAG) model, which tackles the
aforementioned challenges. The RAG model is carefully designed to enhance
information storage and retrieval processes, ensuring improved content
generation. The research elucidates the key phases of the information storage
and retrieval methodology underpinned by the RAG model. A comprehensive
analysis of these steps is undertaken, emphasizing their significance in
addressing the scarcity of data. The study highlights the efficacy of the
proposed method, showcasing its applicability through illustrative instances.
By implementing the RAG model for information storage and retrieval, the
research not only contributes to a deeper comprehension of generative AI
technology but also facilitates its practical usability within enterprises
utilizing LLMs. This work holds substantial value in advancing the field of
generative AI, offering insights into enhancing data-driven content generation
and fostering active utilization of LLM-based services within corporate
settings.
[LINK]
http://arxiv.org/abs/2309.01105v2
[DATE]
2023-09-18 19:36:50+08:00
[CATEGORIES]
cs.CL
Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
[AUTHORS]
Xin Xu, Xiang Chen, Ningyu Zhang, Xin Xie, Xi Chen, Huajun Chen
[COMMENTS]
Accepted to EMNLP 2022 (Findings) and the project website is
https://zjunlp.github.io/project/LREBench/
[LINK]
http://arxiv.org/abs/2210.10678v3
[DATE]
2023-09-18 19:16:48+08:00
[CATEGORIES]
cs.CL
cs.LG
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
[AUTHORS]
Ching-Hsun Tseng, Shin-Jye Lee, Po-Wei Cheng, Chien Lee, Chih-Chieh Hung
[ABSTRACT]
Topic modeling is admittedly a convenient way to monitor market trends.
Conventionally, Latent Dirichlet Allocation (LDA) is considered a must-do model
to gain this type of information. Given LDA's merit of deducing keywords from
token conditional probabilities, we can identify the most probable or
essential topic. However, the results are not intuitive because the given
topics cannot wholly fit human knowledge. LDA offers the first possible
relevant keywords, which also raises the question of whether the
connection is reliable based on statistical probability. It is also hard to
decide the number of topics manually in advance. Following the booming trend of
using fuzzy membership to cluster and using transformers to embed words, this work
presents fuzzy topic modeling based on soft clustering and document
embeddings from a state-of-the-art transformer-based model. In our practical
application to press release monitoring, fuzzy topic modeling gives a
more natural result than the traditional output from LDA.
[COMMENTS]
Asian Journal of Information and Communications, Vol.12, No. 1,
125-140
[LINK]
http://arxiv.org/abs/2309.09658v1
[DATE]
2023-09-18 18:52:54+08:00
[CATEGORIES]
cs.CL
Reasoning with Language Model Prompting: A Survey
[AUTHORS]
Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen
[COMMENTS]
ACL 2023, 24 pages, add references of theoretical analysis
[LINK]
http://arxiv.org/abs/2212.09597v8
[DATE]
2023-09-18 18:47:13+08:00
[CATEGORIES]
cs.CL
cs.LG
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer
[AUTHORS]
Peter Ochieng
[ABSTRACT]
Diffusion-based vocoders have been criticised for being slow due to the many
steps required during sampling. Moreover, the model’s loss function that is
popularly implemented is designed such that the target is the original input
$x_0$ or error $\epsilon_0$. For early time steps of the reverse process, this
results in large prediction errors, which can lead to speech distortions and
increase the learning time. We propose a setup where the targets are the
different outputs of forward process time steps with a goal to reduce the
magnitude of prediction errors and reduce the training time. We use the
different layers of a neural network (NN) to perform denoising by training them
to learn to generate representations similar to the noised outputs in the
forward process of the diffusion. The NN layers learn to progressively denoise
the input in the reverse process until the final layer estimates the
clean speech. To avoid 1:1 mapping between layers of the neural network and the
forward process steps, we define a skip parameter $\tau>1$ such that an NN
layer is trained to cumulatively remove the noise injected in the $\tau$ steps
in the forward process. This significantly reduces the number of data
distribution recovery steps and, consequently, the time to generate speech. We
show through extensive evaluation that the proposed technique generates
high-fidelity speech in competitive time that outperforms current
state-of-the-art tools. The proposed technique is also able to generalize well
to unseen speech.
[COMMENTS]
10 pages
[LINK]
http://arxiv.org/abs/2309.09652v1
[DATE]
2023-09-18 18:35:27+08:00
[CATEGORIES]
cs.CL
SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres
[AUTHORS]
Shumin Deng, Shengyu Mao, Ningyu Zhang, Bryan Hooi
[ABSTRACT]
Event-centric structured prediction involves predicting structured outputs of
events. In most NLP cases, event structures are complex with manifold
dependency, and it is challenging to effectively represent these complicated
structured events. To address these issues, we propose Structured Prediction
with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models complex
dependency among event structured components with energy-based modeling, and
represents event classes with simple but effective hyperspheres. Experiments on
two unified-annotated event datasets indicate that SPEECH is predominant in
event detection and event-relation extraction tasks.
[COMMENTS]
Accepted by ACL 2023 Main Conference. Code is released at
https://github.com/zjunlp/SPEECH
[LINK]
http://arxiv.org/abs/2305.13617v3
[DATE]
2023-09-18 18:19:10+08:00
[CATEGORIES]
cs.CL
cs.LG
Decouple knowledge from parameters for plug-and-play language modeling
[AUTHORS]
Xin Cheng, Yankai Lin, Xiuying Chen, Dongyan Zhao, Rui Yan
[ABSTRACT]
Pre-trained language models (PLMs) have achieved impressive results in various NLP
tasks. It has been revealed that one of the key factors in their success is that the
parameters of these models implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic in that
knowledge is consistently evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with
differentiable plug-in memory (DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings including: (1) domain adaptation. PlugLM obtains 3.95 F1 improvements
across four domains on average without any in-domain pre-training. (2)
knowledge update. PlugLM could absorb new knowledge in a training-free way
after pre-training is done. (3) in-task knowledge learning. PlugLM could be
further improved by incorporating training samples into DPM with knowledge
prompting.
[COMMENTS]
ACL2023 Findings
[LINK]
http://arxiv.org/abs/2305.11564v2
[DATE]
2023-09-18 17:42:26+08:00
[CATEGORIES]
cs.CL
Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark
[AUTHORS]
Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu
[ABSTRACT]
Existing propositions often rely on logical constants for classification.
Compared with Western languages that lean towards hypotaxis such as English,
Chinese often relies on semantic or logical understanding rather than logical
connectives in daily expressions, exhibiting the characteristics of parataxis.
However, existing research has rarely paid attention to this issue, even though
accurately classifying these propositions is crucial for natural language
understanding and reasoning. In this paper, we put forward the concepts of
explicit and implicit propositions and propose a comprehensive multi-level
proposition classification system based on linguistics and logic.
Correspondingly, we create a large-scale Chinese proposition dataset PEACE from
multiple domains, covering all categories related to propositions. To evaluate
the Chinese proposition classification ability of existing models and explore
their limitations, we conduct evaluations on PEACE using several different
methods including the Rule-based method, SVM, BERT, RoBERTa, and ChatGPT.
Results show the importance of properly modeling the semantic features of
propositions. BERT has relatively good proposition classification capability,
but lacks cross-domain transferability. ChatGPT performs poorly, but its
classification ability can be improved by providing more proposition
information. Many issues are still far from being resolved and require further
study.
[LINK]
http://arxiv.org/abs/2309.09602v1
[DATE]
2023-09-18 17:18:39+08:00
[CATEGORIES]
cs.CL
cs.LG
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
[AUTHORS]
Jonas Golde, Patrick Haller, Felix Hamborg, Julian Risch, Alan Akbik
[ABSTRACT]
Most NLP tasks are modeled as supervised learning and thus require labeled
training data to train effective models. However, manually producing such data
at sufficient quality and quantity is known to be costly and time-intensive.
Current research addresses this bottleneck by exploring a novel paradigm called
zero-shot learning via dataset generation. Here, a powerful LLM is prompted
with a task description to generate labeled data that can be used to train a
downstream NLP model. For instance, an LLM might be prompted to “generate 500
movie reviews with positive overall sentiment, and another 500 with negative
sentiment.” The generated data could then be used to train a binary sentiment
classifier, effectively leveraging an LLM as a teacher to a smaller student
model. With this demo, we introduce Fabricator, an open-source Python toolkit
for dataset generation. Fabricator implements common dataset generation
workflows, supports a wide range of downstream NLP tasks (such as text
classification, question answering, and entity recognition), and is integrated
with well-known libraries to facilitate quick experimentation. With Fabricator,
we aim to support researchers in conducting reproducible dataset generation
experiments using LLMs and help practitioners apply this approach to train
models for downstream tasks.
[COMMENTS]
3 Figures and 2 Tables
[LINK]
http://arxiv.org/abs/2309.09582v1
[DATE]
2023-09-18 16:45:47+08:00
[CATEGORIES]
cs.CL
Summarization is (Almost) Dead
[AUTHORS]
Xiao Pu, Mingqi Gao, Xiaojun Wan
[ABSTRACT]
How well can large language models (LLMs) generate summaries? We develop new
datasets and conduct human evaluation experiments to evaluate the zero-shot
generation capability of LLMs across five distinct summarization tasks. Our
findings indicate a clear preference among human evaluators for LLM-generated
summaries over human-written summaries and summaries generated by fine-tuned
models. Specifically, LLM-generated summaries exhibit better factual
consistency and fewer instances of extrinsic hallucinations. Due to the
satisfactory performance of LLMs in summarization tasks (even surpassing the
benchmark of reference summaries), we believe that most conventional works in
the field of text summarization are no longer necessary in the era of LLMs.
However, we recognize that there are still some directions worth exploring,
such as the creation of novel datasets with higher quality and more reliable
evaluation methods.
[LINK]
http://arxiv.org/abs/2309.09558v1
[DATE]
2023-09-18 16:13:01+08:00
[CATEGORIES]
cs.CL
The expected sum of edge lengths in planar linearizations of trees. Theory and applications
[AUTHORS]
Lluís Alemany-Puig, Ramon Ferrer-i-Cancho
[ABSTRACT]
Dependency trees have proven to be a very successful model to represent the
syntactic structure of sentences of human languages. In these structures,
vertices are words and edges connect syntactically-dependent words. The
tendency of these dependencies to be short has been demonstrated using random
baselines for the sum of the lengths of the edges or its variants. A ubiquitous
baseline is the expected sum in projective orderings (wherein edges do not
cross and the root word of the sentence is not covered by any edge), that can
be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely
planarity. In the theoretical domain, we present a characterization of
planarity that, given a sentence, yields either the number of planar
permutations or an efficient algorithm to generate uniformly random planar
permutations of the words. We also show the relationship between the expected
sum in planar arrangements and the expected sum in projective arrangements. In
the domain of applications, we derive an $O(n)$-time algorithm to calculate the
expected value of the sum of edge lengths. We also apply this research to a
parallel corpus and find that the gap between actual dependency distance and
the random baseline reduces as the strength of the formal constraint on
dependency structures increases, suggesting that formal constraints absorb part
of the dependency distance minimization effect. Our research paves the way for
replicating past research on dependency distance minimization using random
planar linearizations as random baseline.
[COMMENTS]
New version updated
[LINK]
http://arxiv.org/abs/2207.05564v4
[DATE]
2023-09-18 15:50:43+08:00
[CATEGORIES]
cs.CL
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
[AUTHORS]
George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Alessio Brutti
[ABSTRACT]
The possibility of dynamically modifying the computational load of neural
models at inference time is crucial for on-device processing, where
computational power is limited and time-varying. Established approaches for
neural model compression exist, but they provide architecturally static models.
In this paper, we investigate the use of early-exit architectures, that rely on
intermediate exit branches, applied to large-vocabulary speech recognition.
This allows for the development of dynamic models that adjust their
computational cost to the available resources and recognition performance.
Unlike previous works, besides using pre-trained backbones we also train the
model from scratch with an early-exit architecture. Experiments on public
datasets show that early-exit architectures from scratch not only preserve
performance levels when using fewer encoder layers, but also improve task
accuracy as compared to using single-exit models or using pre-trained models.
Additionally, we investigate an exit selection strategy based on posterior
probabilities as an alternative to frame-based entropy.
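
To make the exit-selection idea concrete, the sketch below shows an entropy-based rule of the kind the abstract above names as the baseline: take the earliest exit whose average frame-level entropy falls below a threshold. The function names, the threshold, and the toy posteriors are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_exit(per_exit_frames: List[List[Sequence[float]]], threshold: float) -> int:
    """Return the index of the first exit whose mean frame entropy is below threshold.

    per_exit_frames[k] holds the per-frame posteriors produced by exit k.
    Falls back to the last exit when no earlier exit is confident enough.
    """
    for idx, frames in enumerate(per_exit_frames):
        mean_entropy = sum(entropy(f) for f in frames) / len(frames)
        if mean_entropy < threshold:
            return idx
    return len(per_exit_frames) - 1


# Toy example: exit 0 is uncertain, exit 1 is confident.
toy = [[[0.5, 0.5]], [[0.99, 0.01]]]
chosen = choose_exit(toy, threshold=0.1)
```

A lower threshold pushes more utterances to deeper exits, trading computation for accuracy.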
[LINK]
http://arxiv.org/abs/2309.09546v1
[DATE]
2023-09-18 15:45:16+08:00
[CATEGORIES]
cs.CL
Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model
[AUTHORS]
Stefano De Paoli
[ABSTRACT]
Large Language Models (LLMs) have emerged as powerful generative Artificial
Intelligence solutions which can be applied to several fields and areas of
work. This paper presents results of, and reflections on, an experiment that uses
the model GPT 3.5-Turbo to emulate some aspects of an inductive Thematic
Analysis. Previous research on this subject has largely worked on conducting
deductive analysis. Thematic Analysis is a qualitative method for analysis
commonly used in social sciences and it is based on interpretations made by the
human analyst(s) and the identification of explicit and latent meanings in
qualitative data. Attempting an analysis based on human interpretation with an
LLM clearly is a provocation but also a way to learn something about how these
systems can or cannot be used in qualitative research. The paper presents the
motivations for attempting this emulation, it reflects on how the six steps to
a Thematic Analysis proposed by Braun and Clarke can at least partially be
reproduced with the LLM and it also reflects on what are the outputs produced
by the model. The paper used two existing datasets of open access
semi-structured interviews, previously analysed with Thematic Analysis by other
researchers. It used the previously produced analysis (and the related themes)
to compare with the results produced by the LLM. The results show that the
model can infer at least partially some of the main Themes. The objective of
the paper is not to replace human analysts in qualitative analysis but to learn
if some elements of LLM data manipulation can to an extent be of support for
qualitative research.
[LINK]
http://arxiv.org/abs/2305.13014v3
[DATE]
2023-09-18 15:36:55+08:00
[CATEGORIES]
cs.CL
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
[AUTHORS]
Junzhe Liu, Jianwei Yu, Xie Chen
[ABSTRACT]
End-to-end models, such as the neural Transducer, have been successful in
integrating acoustic and linguistic information jointly to achieve excellent
recognition performance. However, adapting these models with text-only data is
challenging. Factorized neural Transducer (FNT) aims to address this issue by
introducing a separate vocabulary decoder to predict the vocabulary, which can
effectively perform traditional text data adaptation. Nonetheless, this
approach has limitations in fusing acoustic and language information
seamlessly. Moreover, a degradation in word error rate (WER) on the general
test sets was also observed, leading to doubts about its overall performance.
In response to this challenge, we present an improved factorized neural
Transducer (IFNT) model structure designed to comprehensively integrate
acoustic and language information while enabling effective text adaptation. We
evaluate the performance of our proposed methods through in-domain experiments
on GigaSpeech and out-of-domain experiments adapting to EuroParl, TED-LIUM, and
Medical datasets. After text-only adaptation, IFNT yields 7.9% to 28.5%
relative WER improvements over the standard neural Transducer with shallow
fusion, and relative WER reductions ranging from 1.6% to 8.2% on the three test
sets compared to the FNT model.
[LINK]
http://arxiv.org/abs/2309.09524v1
[DATE]
2023-09-18 15:02:04+08:00
[CATEGORIES]
cs.CL
Pruning Large Language Models via Accuracy Predictor
[AUTHORS]
Yupeng Ji, Yibo Cao, Jiucai Liu
[ABSTRACT]
Large language models (LLMs) containing tens of billions of parameters (or
even more) have demonstrated impressive capabilities in various NLP tasks.
However, substantial model size poses challenges to training, inference, and
deployment so that it is necessary to compress the model. At present, most
model compression for LLMs requires manual design of pruning features, which
has problems such as complex optimization pipeline and difficulty in retaining
the capabilities of certain parts of the model. Therefore, we propose a novel
pruning approach: firstly, a training set of a certain number of
architecture-accuracy pairs is established, and then a non-neural model is
trained as an accuracy predictor. Using the accuracy predictor to further
optimize the search space and search, the optimal model can be automatically
selected. Experiments show that our proposed approach is effective and
efficient. Compared with the baseline, the perplexity (PPL) on Wikitext2 and PTB
dropped by 9.48% and 5.76% respectively, and the average accuracy of MMLU
increased by 6.28%.
[COMMENTS]
6 pages, 4 figs
[LINK]
http://arxiv.org/abs/2309.09507v1
[DATE]
2023-09-18 14:38:24+08:00
[CATEGORIES]
cs.CL
Search and Learning for Unsupervised Text Generation
[AUTHORS]
Lili Mou
[ABSTRACT]
With the advances of deep learning techniques, text generation is attracting
increasing interest in the artificial intelligence (AI) community, because of
its wide applications and because it is an essential component of AI.
Traditional text generation systems are trained in a supervised way, requiring
massive labeled parallel corpora. In this paper, I will introduce our recent
work on search and learning approaches to unsupervised text generation, where a
heuristic objective function estimates the quality of a candidate sentence, and
discrete search algorithms generate a sentence by maximizing the search
objective. A machine learning model further learns from the search results to
smooth out noise and improve efficiency. Our approach is important to the
industry for building minimal viable products for a new task; it also has high
social impacts for saving human annotation labor and for processing
low-resource languages.
[COMMENTS]
AI Magazine, 43(4), 344–352, 2022
[LINK]
http://arxiv.org/abs/2309.09497v1
[DATE]
2023-09-18 13:44:11+08:00
[CATEGORIES]
cs.CL
cs.LG
Improved NL2SQL based on Multi-layer Expert Network
[AUTHORS]
Chenduo Hao, Xu Zhang
[ABSTRACT]
The Natural Language to SQL (NL2SQL) technique is used to convert natural
language queries into executable SQL statements. Typically, slot-filling is
employed as a classification method for multi-task cases to achieve this goal.
However, slot-filling can result in inaccurate SQL statement generation due to
negative migration issues arising from different classification tasks. To
overcome this limitation, this study introduces a new approach called
Multi-Layer Expert Generate SQL (MLEG-SQL), which utilizes a dedicated
multi-task hierarchical network. The lower layer of the network extracts
semantic features of natural language statements, while the upper layer builds
a specialized expert system for handling specific classification tasks. This
hierarchical approach mitigates performance degradation resulting from
different task conflicts. The proposed method was evaluated on the WikiSQL
dataset and was found to be effective in generating accurate SQL statements.
[COMMENTS]
Our paper needs to be repaired
[LINK]
http://arxiv.org/abs/2306.17727v3
[DATE]
2023-09-18 11:39:23+08:00
[CATEGORIES]
cs.CL
Investigating Zero- and Few-shot Generalization in Fact Verification
[AUTHORS]
Liangming Pan, Yunxiang Zhang, Min-Yen Kan
[ABSTRACT]
In this paper, we explore zero- and few-shot generalization for fact
verification (FV), which aims to generalize the FV model trained on
well-resourced domains (e.g., Wikipedia) to low-resourced domains that lack
human annotations. To this end, we first construct a benchmark dataset
collection which contains 11 FV datasets representing 6 domains. We conduct an
empirical analysis of generalization across these FV datasets, finding that
current models generalize poorly. Our analysis reveals that several factors
affect generalization, including dataset size, length of evidence, and the type
of claims. Finally, we show that two directions of work improve generalization:
1) incorporating domain knowledge via pretraining on specialized domains, and
2) automatically generating training data via claim generation.
[COMMENTS]
AACL-IJCNLP 2023 (main conference, long paper)
[LINK]
http://arxiv.org/abs/2309.09444v1
[DATE]
2023-09-18 10:53:12+08:00
[CATEGORIES]
cs.CL
RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
[AUTHORS]
Zhengliang Shi, Weiwei Sun, Shuo Zhang, Zhen Zhang, Pengjie Ren, Zhaochun Ren
[ABSTRACT]
Evaluating open-domain dialogue systems is challenging for reasons such as
the one-to-many problem, i.e., many appropriate responses other than just the
golden response. As of now, automatic evaluation methods need better
consistency with humans, while reliable human evaluation can be time- and
cost-intensive. To this end, we propose the Reference-Assisted Dialogue
Evaluation (RADE) approach under the multi-task learning framework, which
leverages the pre-created utterance as reference other than the gold response
to relieve the one-to-many problem. Specifically, RADE explicitly compares
reference and the candidate response to predict their overall scores. Moreover,
an auxiliary response generation task enhances prediction via a shared encoder.
To support RADE, we extend three datasets with additional rated responses other
than just a golden response by human annotation. Experiments on our three
datasets and two existing benchmarks demonstrate the effectiveness of our
method, where Pearson, Spearman, and Kendall correlations with human evaluation
outperform state-of-the-art baselines.
[COMMENTS]
19 pages, Accepted by ACL2023 main conference
[LINK]
http://arxiv.org/abs/2309.08156v2
[DATE]
2023-09-18 08:43:47+08:00
[CATEGORIES]
cs.CL
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
[AUTHORS]
Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan
[ABSTRACT]
Video summarization remains a huge challenge in computer vision due to the
size of the input videos to be summarized. We propose an efficient,
language-only video summarizer that achieves competitive accuracy with high
data efficiency. Using only textual captions obtained via a zero-shot approach,
we train a language transformer model and forego image representations. This
method allows us to perform filtration amongst the representative text vectors
and condense the sequence. With our approach, we gain explainability with
natural language that comes easily for human interpretation and textual
summaries of the videos. An ablation study that focuses on modality and data
compression shows that leveraging text modality only effectively reduces input
data processing while retaining comparable results.
[COMMENTS]
© 2024 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works
[LINK]
http://arxiv.org/abs/2309.09405v1
[DATE]
2023-09-18 08:08:49+08:00
[CATEGORIES]
cs.CL
Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings
[AUTHORS]
Stephen Fitz
[ABSTRACT]
As Large Language Models are deployed within Artificial Intelligence systems
that are increasingly integrated with human society, it becomes more important
than ever to study their internal structures. Higher level abilities of LLMs
such as GPT-3.5 emerge in large part due to informative language
representations they induce from raw text data during pre-training on trillions
of words. These embeddings exist in vector spaces of several thousand
dimensions, and their processing involves mapping between multiple vector
spaces, with total number of parameters on the order of trillions. Furthermore,
these language representations are induced by gradient optimization, resulting
in a black box system that is hard to interpret. In this paper, we take a look
at the topological structure of neuronal activity in the “brain” of Chat-GPT’s
foundation language model, and analyze it with respect to a metric representing
the notion of fairness. We develop a novel approach to visualize GPT’s moral
dimensions. We first compute a fairness metric, inspired by social psychology
literature, to identify factors that typically influence fairness assessments
in humans, such as legitimacy, need, and responsibility. Subsequently, we
summarize the manifold’s shape using a lower-dimensional simplicial complex,
whose topology is derived from this metric. We color it with a heat map
associated with this fairness metric, producing human-readable visualizations
of the high-dimensional sentence manifold. Our results show that sentence
embeddings based on GPT-3.5 can be decomposed into two submanifolds
corresponding to fair and unfair moral judgments. This indicates that GPT-based
language models develop a moral dimension within their representation spaces
and induce an understanding of fairness during their training process.
[LINK]
http://arxiv.org/abs/2309.09397v1
[DATE]
2023-09-18 07:38:39+08:00
[CATEGORIES]
cs.CL
cs.LG
Blockwise Compression of Transformer-based Models without Retraining
[AUTHORS]
Gaochen Dong, Wei Chen
[ABSTRACT]
Transformer-based models, exemplified by GPT-3, ChatGPT, and GPT-4, have
recently garnered considerable attention in both academia and industry due to
their promising performance in general language tasks. Nevertheless, these
models typically involve computationally intensive encoding processes, and in some cases,
decoding processes as well, both of which are fundamentally large-scale matrix
multiplication. These operations bring the inevitable challenges of massive
computation resources and huge memory footprint, usually requiring at least
10^23 FLOPs and hundreds of gigabytes, respectively. A common method to address
this issue is to reduce the computational and memory requirements by applying
layerwise quantization to the transformer, replacing the usual fp32 data type
with a low-bit equivalent. Unfortunately, this method often leads to decreased
model accuracy and necessitates time-consuming retraining. Such retraining not
only requires fine-tuning skills but also substantial computational resources,
posing challenges for users. To specifically tackle these issues, we propose
BCT, a framework of blockwise compression for transformers without retraining,
aiming to facilitate model deployment. Unlike layerwise compression methods,
BCT achieves finer compression of the entire transformer by operating
blockwise. This method mitigates data distribution deviation caused by
quantization, eliminating the requirement for retraining. BCT effectively
compresses all components of the model, including but not limited to the
embedding, matrix multiplication, GELU, Softmax, layer normalization, and
intermediate results. In a case study, an efficient model is compressed by BCT
achieving up to 7.988x compression. Subsequently, we also evaluate it on
several General Language Understanding Evaluation (GLUE) datasets.
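The abstract does not spell out how BCT quantizes a block, so the following is only a generic sketch of why a per-block scale can track the data distribution more closely than a single scale. The symmetric 8-bit scheme, the function names, and the toy weights are all assumptions made for illustration, not the authors' method.

```python
def quantize_dequantize(values, bits=8):
    """Symmetric uniform quantization using one scale for all of `values`."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)
    scale = peak / qmax
    # Round to the nearest integer level, then map back to floats.
    return [round(v / scale) * scale for v in values]


def blockwise_quantize_dequantize(values, block_size, bits=8):
    """Same scheme, but each block of `block_size` values gets its own scale."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(quantize_dequantize(values[start:start + block_size], bits))
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights: a small-magnitude block followed by a large-magnitude one.
weights = [0.01, -0.02, 0.015, 0.005, 8.0, -6.5, 7.25, 3.0]
whole_error = mean_abs_error(weights, quantize_dequantize(weights))
block_error = mean_abs_error(weights, blockwise_quantize_dequantize(weights, block_size=4))
```

With a single scale the small weights all round to zero, whereas the per-block scale keeps them, so `block_error` comes out below `whole_error` on this toy input.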
[COMMENTS]
6 pages, 4 figures. arXiv admin note: text overlap with
arXiv:2303.09184
[LINK]
http://arxiv.org/abs/2304.01483v2
[DATE]
2023-09-18 06:47:50+08:00
[CATEGORIES]
cs.CL
cs.LG
Augmenting text for spoken language understanding with Large Language Models
[AUTHORS]
Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
[ABSTRACT]
Spoken semantic parsing (SSP) involves generating machine-comprehensible
parses from input speech. Training robust models for existing application
domains represented in training data or extending to new domains requires
corresponding triplets of speech-transcript-semantic parse data, which is
expensive to obtain. In this paper, we address this challenge by examining
methods that can use transcript-semantic parse data (unpaired text) without
corresponding speech. First, when unpaired text is drawn from existing textual
corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways
to generate speech representations for unpaired text. Experiments on the STOP
dataset show that unpaired text from existing and new domains improves
performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we
consider the setting when unpaired text is not available in existing textual
corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired
text for existing and new domains. Experiments show that examples and words
that co-occur with intents can be used to generate unpaired text with Llama
2.0. Using the generated text with JAT and TTS for spoken semantic parsing
improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains
respectively.
[COMMENTS]
Submitted to ICASSP 2024
[LINK]
http://arxiv.org/abs/2309.09390v1
[DATE]
2023-09-18 06:25:34+08:00
[CATEGORIES]
cs.CL
Mitigating Shortcuts in Language Models with Soft Label Encoding
[AUTHORS]
Zirui He, Huiqi Deng, Haiyan Zhao, Ninghao Liu, Mengnan Du
[ABSTRACT]
Recent research has shown that large language models rely on spurious
correlations in the data for natural language understanding (NLU) tasks. In
this work, we aim to answer the following research question: Can we reduce
spurious correlations by modifying the ground truth labels of the training
data? Specifically, we propose a simple yet effective debiasing framework,
named Soft Label Encoding (SoftLE). We first train a teacher model with hard
labels to determine each sample’s degree of relying on shortcuts. We then add
one dummy class to encode the shortcut degree, which is used to smooth other
dimensions in the ground truth label to generate soft labels. This new ground
truth label is used to train a more robust student model. Extensive experiments
on two NLU benchmark tasks demonstrate that SoftLE significantly improves
out-of-distribution generalization while maintaining satisfactory
in-distribution accuracy.
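The soft-label construction described above can be sketched as follows. The abstract gives no formula, so the rule used here (moving a fraction of the probability mass from the true class to the dummy class in proportion to the shortcut degree) is an assumption, as are the function and parameter names.

```python
def soft_label(true_class, num_classes, shortcut_degree):
    """Return a (num_classes + 1)-dimensional soft label.

    The last slot is the dummy class. `shortcut_degree` in [0, 1] is assumed
    to come from a teacher model; here it is simply passed in.
    """
    if not 0.0 <= shortcut_degree <= 1.0:
        raise ValueError("shortcut_degree must lie in [0, 1]")
    label = [0.0] * (num_classes + 1)
    label[true_class] = 1.0 - shortcut_degree   # smoothed ground-truth slot
    label[num_classes] = shortcut_degree        # dummy class holds the rest
    return label


# Hypothetical sample: 3 real classes, true class 1, shortcut degree 0.25.
example = soft_label(true_class=1, num_classes=3, shortcut_degree=0.25)
```

A sample judged to rely heavily on shortcuts therefore contributes a weaker signal for its ground-truth class when the student model is trained.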
[LINK]
http://arxiv.org/abs/2309.09380v1
[DATE]
2023-09-18 05:18:02+08:00
[CATEGORIES]
cs.CL
cs.LG
Language models are susceptible to incorrect patient self-diagnosis in medical applications
[AUTHORS]
Rojin Ziaei, Samuel Schmidgall
[COMMENTS]
4 pages, Deep Generative Models for Health NeurIPS 2023
[LINK]
http://arxiv.org/abs/2309.09362v1
[DATE]
2023-09-18 03:56:39+08:00
[CATEGORIES]
cs.CL
Weaker Than You Think: A Critical Look at Weakly Supervised Learning
[AUTHORS]
Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow
[COMMENTS]
ACL 2023, oral presentation
[LINK]
http://arxiv.org/abs/2305.17442v3
[DATE]
2023-09-18 03:04:44+08:00
[CATEGORIES]
cs.CL
Temporal Analysis on Topics Using Word2Vec
[AUTHORS]
Angad Sandhu, Aneesh Edara, Vishesh Narayan, Faizan Wajid, Ashok Agrawala
[ABSTRACT]
The present study proposes a novel method of trend detection and
visualization - more specifically, modeling the change in a topic over time.
Where current models used for the identification and visualization of trends
only convey the popularity of a singular word based on stochastic counting of
usage, the approach in the present study illustrates the popularity and
direction that a topic is moving in. The direction in this case is a distinct
subtopic within the selected corpus. Such trends are generated by modeling the
movement of a topic by using k-means clustering and cosine similarity to group
the distances between clusters over time. In a convergent scenario, it can be
inferred that the topics as a whole are meshing (tokens between topics
becoming interchangeable). On the contrary, a divergent scenario would imply
that each topic’s respective tokens would not be found in the same context (the
words are increasingly different to each other). The methodology was tested on
a group of articles from various media houses present in the 20 Newsgroups
dataset.
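The convergent and divergent scenarios can be illustrated with a small sketch that compares the cosine similarity of two topic centroids at the first and last time step. The centroid vectors, the function names, and the first-versus-last comparison rule are assumptions for illustration; the abstract does not specify how the authors group the distances.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def trend(centroids_a, centroids_b):
    """Compare two topics' centroid trajectories (one vector per time step)."""
    first = cosine_similarity(centroids_a[0], centroids_b[0])
    last = cosine_similarity(centroids_a[-1], centroids_b[-1])
    if last > first:
        return "convergent"
    if last < first:
        return "divergent"
    return "stable"


# Hypothetical 2-D centroids (e.g. from k-means over word vectors) at two time steps.
topic_a = [[1.0, 0.0], [1.0, 1.0]]
topic_b = [[0.0, 1.0], [1.0, 1.0]]
result = trend(topic_a, topic_b)
```

Here the two topics start orthogonal and end aligned, so `result` is `"convergent"`; swapping the time steps yields `"divergent"`.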
[LINK]
http://arxiv.org/abs/2209.11717v2
[DATE]
2023-09-18 02:27:13+08:00
[CATEGORIES]
cs.CL
A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers
[AUTHORS]
Paleti Nikhil Chowdary, Vadlapudi Sai Aravind, Gorantla V N S L Vishnu Vardhan, Menta Sai Akshay, Menta Sai Aashish, Jyothish Lal. G
[ABSTRACT]
Dysarthria is a speech disorder that hinders communication due to
difficulties in articulating words. Detection of dysarthria is important for
several reasons as it can be used to develop a treatment plan and help improve
a person’s quality of life and ability to communicate effectively. Much of the
literature has focused on improving ASR systems for dysarthric speech. The
objective of the current work is to develop models that can accurately classify
the presence of dysarthria and also give information about the intelligibility
level using limited data by employing a few-shot approach using a transformer
model. This work also aims to tackle the data leakage that is present in
previous studies. Our whisper-large-v2 transformer model trained on a subset of
the UASpeech dataset containing medium intelligibility level patients achieved
an accuracy of 85%, precision of 0.92, recall of 0.8, F1-score of 0.85, and
specificity of 0.91. Experimental results also demonstrate that the model
trained using the ‘words’ dataset performed better compared to the model
trained on the ‘letters’ and ‘digits’ dataset. Moreover, the multiclass model
achieved an accuracy of 67%.
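For reference, the binary metrics quoted above follow from confusion-matrix counts as in the sketch below. The counts are hypothetical and are not taken from the paper; the function name is likewise an assumption, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,                      # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```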
[COMMENTS]
Paper has been presented at ICCCNT 2023 and the final version will be
published in IEEE Digital Library Xplore
[LINK]
http://arxiv.org/abs/2309.09329v1
[DATE]
2023-09-18 01:23:41+08:00
[CATEGORIES]
cs.CL
How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory
[AUTHORS]
Na Yang, Kyrie Zhixuan Zhou, Yunzhe Li
[ABSTRACT]
The Dynamic Zero-COVID Policy in China spanned three years and diverse
emotional responses have been observed at different times. In this paper, we
retrospectively analyzed public sentiments and perceptions of the policy,
especially regarding how they evolved over time, and how they related to
people’s lived experiences. Through sentiment analysis of 2,358 collected Weibo
posts, we identified four representative points, i.e., policy initialization,
sharp sentiment change, lowest sentiment score, and policy termination, for an
in-depth discourse analysis through the lens of appraisal theory. In the end,
we reflected on the evolving public sentiments toward the Dynamic Zero-COVID
Policy and proposed implications for effective epidemic prevention and control
measures for future crises.
[LINK]
http://arxiv.org/abs/2309.09324v1
[DATE]
2023-09-18 01:05:18+08:00
[CATEGORIES]
cs.CL
Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings
[AUTHORS]
Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Partha Pratim Das
[ABSTRACT]
Nowadays many research articles are prefaced with research highlights to
summarize the main findings of the paper. Highlights not only help researchers
precisely and quickly identify the contributions of a paper, they also enhance
the discoverability of the article via search engines. We aim to automatically
construct research highlights given certain segments of a research paper. We
use a pointer-generator network with coverage mechanism and a contextual
embedding layer at the input that encodes the input tokens into SciBERT
embeddings. We test our model on a benchmark dataset, CSPubSum, and also
present MixSub, a new multi-disciplinary corpus of papers for automatic
research highlight generation. For both CSPubSum and MixSub, we have observed
that the proposed model achieves the best performance compared to related
variants and other models proposed in the literature. On the CSPubSum dataset,
our model achieves the best performance when the input is only the abstract of
a paper as opposed to other segments of the paper. It produces ROUGE-1, ROUGE-2
and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, METEOR score of
32.62, and BERTScore F1 of 86.65 which outperform all other baselines. On the
new MixSub dataset, where only the abstract is the input, our proposed model
(when trained on the whole training corpus without distinguishing between the
subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78,
9.76 and 29.3, respectively, METEOR score of 24.00, and BERTScore F1 of 85.25.
[COMMENTS]
19 Pages, 7 Figures, 8 Tables
[LINK]
http://arxiv.org/abs/2302.07729v3
[DATE]
2023-09-18 00:45:44+08:00
[CATEGORIES]
cs.CL
cs.LG
On Model Explanations with Transferable Neural Pathways
[AUTHORS]
Xinmiao Lin, Wentao Bao, Qi Yu, Yu Kong
[ABSTRACT]
Neural pathways as model explanations consist of a sparse set of neurons that
provide the same level of prediction performance as the whole model. Existing
methods primarily focus on accuracy and sparsity but the generated pathways may
offer limited interpretability and thus fall short in explaining the model
behavior. In this paper, we suggest two interpretability criteria of neural
pathways: (i) same-class neural pathways should primarily consist of
class-relevant neurons; (ii) each instance’s neural pathway sparsity should be
optimally determined. To this end, we propose a Generative Class-relevant
Neural Pathway (GEN-CNP) model that learns to predict the neural pathways from
the target model’s feature maps. We propose to learn class-relevant information
from features of deep and shallow layers such that same-class neural pathways
exhibit high similarity. We further impose a faithfulness criterion for GEN-CNP
to generate pathways with instance-specific sparsity. We propose to transfer
the class-relevant neural pathways to explain samples of the same class and
show experimentally and qualitatively their faithfulness and interpretability.
[COMMENTS]
Arxiv preprint
[LINK]
http://arxiv.org/abs/2309.09887v1
[DATE]
2023-09-18 23:50:38+08:00
[CATEGORIES]
cs.LG
Error Reduction from Stacked Regressions
[AUTHORS]
Xin Chen, Jason M. Klusowski, Yan Shuo Tan
[ABSTRACT]
Stacking regressions is an ensemble technique that forms linear combinations
of different regression estimators to enhance predictive accuracy. The
conventional approach uses cross-validation data to generate predictions from
the constituent estimators, and least-squares with nonnegativity constraints to
learn the combination weights. In this paper, we learn these weights
analogously by minimizing an estimate of the population risk subject to a
nonnegativity constraint. When the constituent estimators are linear
least-squares projections onto nested subspaces separated by at least three
dimensions, we show that thanks to a shrinkage effect, the resulting stacked
estimator has strictly smaller population risk than the best single estimator among
them. Here “best” refers to a model that minimizes a selection criterion such
as AIC or BIC. In other words, in this setting, the best single estimator is
inadmissible. Because the optimization problem can be reformulated as isotonic
regression, the stacked estimator requires the same order of computation as the
best single estimator, making it an attractive alternative in terms of both
performance and implementation.
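The conventional stacking step mentioned above (least squares with nonnegativity constraints) can be sketched with projected gradient descent. This illustrates only that baseline, not the paper's risk-estimate formulation or its isotonic-regression reformulation. The toy predictions are hypothetical and are assumed to be held-out; the function names and iteration count are also assumptions.

```python
def stack_weights(preds, y, iters=5000):
    """Nonnegative least-squares weights for combining estimators.

    preds: one list of predictions per estimator; y: the targets.
    Minimizes the mean squared error of the weighted sum subject to w >= 0.
    """
    m, n = len(preds), len(y)
    gram = [[sum(a * b for a, b in zip(preds[i], preds[j])) / n
             for j in range(m)] for i in range(m)]
    target = [sum(a * b for a, b in zip(preds[i], y)) / n for i in range(m)]
    # The trace bounds the largest eigenvalue of the Gram matrix, so this
    # step size is safe (assumes at least one non-zero prediction).
    step = 1.0 / sum(gram[i][i] for i in range(m))
    w = [0.0] * m
    for _ in range(iters):
        grad = [sum(gram[i][j] * w[j] for j in range(m)) - target[i]
                for i in range(m)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[i] - step * grad[i]) for i in range(m)]
    return w


def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


# Hypothetical targets and predictions from two estimators.
y = [1.0, 2.0, 3.0, 4.0]
preds = [[0.8, 2.1, 2.9, 4.3], [1.3, 1.7, 3.2, 3.6]]
w = stack_weights(preds, y)
stacked = [sum(w[k] * preds[k][i] for k in range(len(preds))) for i in range(len(y))]
stacked_mse = mse(stacked, y)
```

Because the two estimators' errors point in opposite directions on this toy data, the stacked combination reaches a lower squared error than either estimator alone.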
[LINK]
http://arxiv.org/abs/2309.09880v1
[DATE]
2023-09-18 23:42:12+08:00
[CATEGORIES]
cs.LG
Neural Operator: Is data all you need to model the world? An insight into the impact of Physics Informed Machine Learning
[AUTHORS]
Hrishikesh Viswanath, Md Ashiqur Rahman, Abhijeet Vyas, Andrey Shor, Beatriz Medeiros, Stephanie Hernandez, Suhas Eswarappa Prameela, Aniket Bera
[ABSTRACT]
Numerical approximations of partial differential equations (PDEs) are
routinely employed to formulate the solution of physics, engineering and
mathematical problems involving functions of several variables, such as the
propagation of heat or sound, fluid flow, elasticity, electrostatics,
electrodynamics, and more. While this has led to solving many complex
phenomena, there are some limitations. Conventional approaches such as Finite
Element Methods (FEMs) and Finite Difference Methods (FDMs) require
considerable time and are computationally expensive. In contrast, data driven
machine learning-based methods such as neural networks provide a faster, fairly
accurate alternative, and have certain advantages such as discretization
invariance and resolution invariance. This article aims to provide a
comprehensive insight into how data-driven approaches can complement
conventional techniques to solve engineering and physics problems, while also
noting some of the major pitfalls of machine learning-based approaches.
Furthermore, we highlight a novel and fast (~1000x) machine learning-based
approach to learning the solution operator of a PDE: operator learning. We will
note how these new computational approaches can bring immense advantages in
tackling many problems in fundamental and applied physics.
[LINK]
http://arxiv.org/abs/2301.13331v2
[DATE]
2023-09-18 23:26:18+08:00
[CATEGORIES]
cs.LG
Investigation of Compressor Cascade Flow Using Physics-Informed Neural Networks with Adaptive Learning Strategy
[AUTHORS]
Zhihui Li, Francesco Montomoli, Sanjiv Sharma
[ABSTRACT]
In this study, we utilize the emerging Physics Informed Neural Networks
(PINNs) approach for the first time to predict the flow field of a compressor
cascade. Different from conventional training methods, a new adaptive learning
strategy that mitigates gradient imbalance through incorporating adaptive
weights in conjunction with dynamically adjusting learning rate is used during
the training process to improve the convergence of PINNs. The performance of
PINNs is assessed here by solving both the forward and inverse problems. In the
forward problem, by encapsulating the physical relations among relevant
variables, PINNs demonstrate their effectiveness in accurately forecasting the
compressor’s flow field. PINNs also show obvious advantages over the
traditional CFD approaches, particularly in scenarios lacking complete boundary
conditions, as is often the case in inverse engineering problems. PINNs
successfully reconstruct the flow field of the compressor cascade solely based
on partial velocity vectors and near-wall pressure information. Furthermore,
PINNs show robust performance in the environment of various levels of aleatory
uncertainties stemming from labeled data. This research provides evidence that
PINNs can offer turbomachinery des