ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
[AUTHORS]
Ziyu Guo, Yiwen Tang, Renrui Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
[ABSTRACT]
Understanding 3D scenes from multi-view inputs has been proven to alleviate
the view discrepancy issue in 3D visual grounding. However, existing methods
normally neglect the view cues embedded in the text modality and fail to weigh
the relative importance of different views. In this paper, we propose
ViewRefer, a multi-view framework for 3D visual grounding that explores how to
grasp view knowledge from both the text and 3D modalities. For the text branch,
ViewRefer leverages the diverse linguistic knowledge of large-scale language
models, e.g., GPT, to expand a single grounding text to multiple
geometry-consistent descriptions. Meanwhile, in the 3D modality, a transformer
fusion module with inter-view attention is introduced to boost the interaction
of objects across views. On top of that, we further present a set of learnable
multi-view prototypes, which memorize scene-agnostic knowledge for different
views, and enhance the framework from two perspectives: a view-guided attention
module for more robust text features, and a view-guided scoring strategy during
the final prediction. With our designed paradigm, ViewRefer achieves superior
performance on three benchmarks and surpasses the second-best by +2.8%, +1.2%,
and +0.73% on Sr3D, Nr3D, and ScanRefer. Code will be released at
https://github.com/ZiyuGuo99/ViewRefer3D.
[COMMENTS]
Code will be released at https://github.com/ZiyuGuo99/ViewRefer3D
[LINK]
http://arxiv.org/abs/2303.16894v1
[DATE]
2023-03-29 17:59:10+00:00
[CATEGORIES]
cs.CL
End-to-End $n$-ary Relation Extraction for Combination Drug Therapies
[AUTHORS]
Yuhang Jiang, Ramakanth Kavuluru
[ABSTRACT]
Combination drug therapies are treatment regimens that involve two or more
drugs, administered more commonly for patients with cancer, HIV, malaria, or
tuberculosis. Currently there are over 350K articles in PubMed that use the
“combination drug therapy” MeSH heading with at least 10K articles published
per year over the past two decades. Extracting combination therapies from
scientific literature inherently constitutes an $n$-ary relation extraction
problem. Unlike in the general $n$-ary setting where $n$ is fixed (e.g.,
drug-gene-mutation relations where $n=3$), extracting combination therapies is
a special setting where $n \geq 2$ is dynamic, depending on each instance.
Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset,
CombDrugExt, for extracting such therapies from literature. Here, we use a
sequence-to-sequence style end-to-end extraction method to achieve an F1-Score
of $66.7\%$ on the CombDrugExt test set for positive (or effective)
combinations. This is an absolute $\approx 5\%$ F1-score improvement even over
the prior best relation classification score with spotted drug entities (hence,
not end-to-end). Thus, our effort introduces the first state-of-the-art model for
end-to-end extraction, which already outperforms the best prior non-end-to-end
model for this task. Our model seamlessly extracts all drug entities and
relations in a single pass and is highly suitable for dynamic $n$-ary
extraction scenarios.
[COMMENTS]
Accepted to appear in IEEE ICHI 2023. Code:
https://github.com/bionlproc/end-to-end-CombDrugExt
[LINK]
http://arxiv.org/abs/2303.16886v1
[DATE]
2023-03-29 17:55:50+00:00
[CATEGORIES]
cs.CL
Did You Mean…? Confidence-based Trade-offs in Semantic Parsing
[AUTHORS]
Elias Stengel-Eskin, Benjamin Van Durme
[ABSTRACT]
We illustrate how a calibrated model can help balance common trade-offs in
task-oriented parsing. In a simulated annotator-in-the-loop experiment, we show
that well-calibrated confidence scores allow us to balance cost with annotator
load, improving accuracy with a small number of interactions. We then examine
how confidence scores can help optimize the trade-off between usability and
safety. We show that confidence-based thresholding can substantially reduce the
number of incorrect low-confidence programs executed; however, this comes at a
cost to usability. We propose the DidYouMean system, which better balances
usability and safety.
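The confidence-based thresholding trade-off described above can be illustrated with a minimal sketch. It assumes a parser that returns candidate programs with calibrated confidence scores; the `Candidate` type, the `route_prediction` function, the 0.8 threshold, and the "confirm" fallback are all hypothetical and are not taken from the DidYouMean system itself.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Candidate:
    program: str       # hypothetical string form of a parsed program
    confidence: float  # parser probability, assumed to be calibrated


def route_prediction(
    candidates: List[Candidate], threshold: float = 0.8, k: int = 3
) -> Tuple[str, Union[str, List[str]]]:
    """Execute the top-ranked program only if its confidence clears `threshold`.

    Returns ("execute", program) when confident; otherwise ("confirm", top_k),
    where top_k stands in for the options shown to a user in a
    "Did you mean...?" style prompt (a hypothetical fallback).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0]
    if best.confidence >= threshold:
        return "execute", best.program
    return "confirm", [c.program for c in ranked[:k]]


# Raising the threshold executes fewer incorrect low-confidence programs
# (safety) at the cost of more confirmation prompts (usability).
print(route_prediction([Candidate("a", 0.9), Candidate("b", 0.05)]))  # ('execute', 'a')
print(route_prediction([Candidate("a", 0.5), Candidate("b", 0.4)]))   # ('confirm', ['a', 'b'])
```

The single `threshold` parameter is the knob in this sketch: moving it up or down shifts the balance between executing programs directly and asking the user.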
[COMMENTS]
9 pages. arXiv admin note: substantial text overlap with
arXiv:2211.07443
[LINK]
http://arxiv.org/abs/2303.16857v1
[DATE]
2023-03-29 17:07:26+00:00
[CATEGORIES]
cs.CL
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
[AUTHORS]
Xingwei He, Zhenghao Lin, Yeyun Gong, A-Long Jin, Hang Zhang, Chen Lin, Jian Jiao, Siu Ming Yiu, Nan Duan, Weizhu Chen
[LINK]
http://arxiv.org/abs/2303.16854v1
[DATE]
2023-03-29 17:03:21+00:00
[CATEGORIES]
cs.CL
Editing Models with Task Arithmetic
[AUTHORS]
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
[COMMENTS]
In Proceedings of the 11th International Conference on Learning
Representations (ICLR 2023)
[LINK]
http://arxiv.org/abs/2212.04089v2
[DATE]
2023-03-29 16:52:08+00:00
[CATEGORIES]
cs.LG
cs.CL
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
[AUTHORS]
Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova
[LINK]
http://arxiv.org/abs/2303.16839v1
[DATE]
2023-03-29 16:42:30+00:00
[CATEGORIES]
cs.CL
cs.LG
Zero-shot Entailment of Leaderboards for Empirical AI Research
[AUTHORS]
Salomon Kabongo, Jennifer D’Souza, Sören Auer
[COMMENTS]
5 pages, 1 figure. Accepted for publication at JCDL 2023 - Late
Breaking Results and Datasets track
(https://2023.jcdl.org/calls/papers/#paper_types), official citation
forthcoming
[LINK]
http://arxiv.org/abs/2303.16835v1
[DATE]
2023-03-29 16:28:43+00:00
[CATEGORIES]
cs.CL
cs.LG
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
[AUTHORS]
Elias Stengel-Eskin, Benjamin Van Durme
[ABSTRACT]
Sequence generation models are increasingly being used to translate language
into executable programs, i.e., to perform executable semantic parsing. The fact
that semantic parsing aims to execute actions in the real world motivates
developing safe systems, which in turn makes measuring calibration – a central
component of safety – particularly important. We investigate the calibration
of common generation models across four popular semantic parsing datasets,
finding that it varies across models and datasets. We then analyze factors
associated with calibration error and release new confidence-based challenge
splits of two parsing datasets. To facilitate the inclusion of calibration in
semantic parsing evaluations, we release a library for computing calibration
metrics.
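For reference on what a calibration metric computes, here is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins. It is written from the standard definition, not from the authors' released library; the function name and the default of 10 bins are assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE = sum_b (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to an equal-width bin; 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Overconfident predictions: 0.95 confidence but correct only half the time.
print(expected_calibration_error([0.95, 0.95], [True, False]))
```

A perfectly calibrated model has an ECE of 0; the equal-width binning is one common choice, and other binning schemes give somewhat different values.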
[COMMENTS]
17 pages
[LINK]
http://arxiv.org/abs/2211.07443v4
[DATE]
2023-03-29 15:59:32+00:00
[CATEGORIES]
cs.CL
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
[AUTHORS]
Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick
[ABSTRACT]
Large-scale neural network models combining text and images have made
incredible progress in recent years. However, it remains an open question to
what extent such models encode compositional representations of the concepts
over which they operate, such as correctly identifying “red cube” by
reasoning over the constituents “red” and “cube”. In this work, we focus on
the ability of a large pretrained vision and language model (CLIP) to encode
compositional concepts and to bind variables in a structure-sensitive way
(e.g., differentiating “cube behind sphere” from “sphere behind cube”). In
order to inspect the performance of CLIP, we compare several architectures from
research on compositional distributional semantics models (CDSMs), a line of
research that attempts to implement traditional compositional linguistic
structures within embedding spaces. We find that CLIP can compose concepts in a
single-object setting, but in situations where concept binding is needed,
performance drops dramatically. At the same time, CDSMs also perform poorly,
with best performance at chance level.
[LINK]
http://arxiv.org/abs/2212.10537v2
[DATE]
2023-03-29 15:34:23+00:00
[CATEGORIES]
cs.CL
Evaluating NLG systems: A brief introduction
[AUTHORS]
Emiel van Miltenburg
[ABSTRACT]
This year the International Conference on Natural Language Generation (INLG)
will feature an award for the paper with the best evaluation. The purpose of
this award is to provide an incentive for NLG researchers to pay more attention
to the way they assess the output of their systems. This essay provides a short
introduction to evaluation in NLG, explaining key terms and distinctions.
[COMMENTS]
To be published on the INLG2023 conference website
[LINK]
http://arxiv.org/abs/2303.16742v1
[DATE]
2023-03-29 14:49:29+00:00
[CATEGORIES]
cs.CL
Text revision in Scientific Writing Assistance: An Overview
[AUTHORS]
Léane Jourdan, Florian Boudin, Richard Dufour, Nicolas Hernandez
[COMMENTS]
Published at the 13th International Workshop on Bibliometric-enhanced
Information Retrieval. 12 pages
[LINK]
http://arxiv.org/abs/2303.16726v1
[DATE]
2023-03-29 14:25:30+00:00
[CATEGORIES]
cs.CL
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
[AUTHORS]
Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie
[ABSTRACT]
ChatGPT is a recent chatbot service released by OpenAI and has been receiving
increasing attention over the past few months. While evaluations of various
aspects of ChatGPT have been done, its robustness, i.e., its performance on
unexpected inputs, is still unclear to the public. Robustness is of particular
concern in responsible AI, especially for safety-critical applications. In this
paper, we conduct a thorough evaluation of the robustness of ChatGPT from the
adversarial and out-of-distribution (OOD) perspective. To do so, we employ the
AdvGLUE and ANLI benchmarks to assess adversarial robustness and the Flipkart
review and DDXPlus medical diagnosis datasets for OOD evaluation. We select
several popular foundation models as baselines. Results show that ChatGPT has
consistent advantages on most adversarial and OOD classification and
translation tasks. However, the absolute performance is far from perfect,
which suggests that adversarial and OOD robustness remains a significant threat
to foundation models. Moreover, ChatGPT shows astounding performance in
understanding dialogue-related texts and we find that it tends to provide
informal suggestions for medical tasks instead of definitive answers. Finally,
we present in-depth discussions of possible research directions.
[COMMENTS]
Technical report; code is at:
https://github.com/microsoft/robustlearn
[LINK]
http://arxiv.org/abs/2302.12095v4
[DATE]
2023-03-29 14:21:51+00:00
[CATEGORIES]
cs.CL
cs.LG
Using Semantic Similarity and Text Embedding to Measure the Social Media Echo of Strategic Communications
[AUTHORS]
Tristan J. B. Cann, Ben Dennes, Travis Coan, Saffron O’Neill, Hywel T. P. Williams
[COMMENTS]
12 pages, 5 figures
[LINK]
http://arxiv.org/abs/2303.16694v1
[DATE]
2023-03-29 13:46:07+00:00
[CATEGORIES]
cs.CL
DBLP-QuAD: A Question Answering Dataset over the DBLP Scholarly Knowledge Graph
[AUTHORS]
Debayan Banerjee, Sushil Awale, Ricardo Usbeck, Chris Biemann
[COMMENTS]
12 pages, CEUR-WS, 1 column; accepted at the International Bibliometric
Information Retrieval Workshop @ ECIR 2023
[LINK]
http://arxiv.org/abs/2303.13351v3
[DATE]
2023-03-29 13:37:52+00:00
[CATEGORIES]
cs.CL
Emergent Linguistic Structures in Neural Networks are Fragile
[AUTHORS]
Emanuele La Malfa, Matthew Wicker, Marta Kwiatkowska
[ABSTRACT]
Large Language Models (LLMs) have been reported to have strong performance on
natural language processing tasks. However, performance metrics such as
accuracy do not measure the quality of the model in terms of its ability to
robustly represent complex linguistic structure. In this paper, focusing on the
ability of language models to represent syntax, we propose a framework to
assess the consistency and robustness of linguistic representations. To this
end, we introduce measures of robustness of neural network models that leverage
recent advances in extracting linguistic constructs from LLMs via probing
tasks, i.e., simple tasks used to extract meaningful information about a single
facet of a language model, such as syntax reconstruction and root
identification. Empirically, we study the performance of four LLMs across six
different corpora on the proposed robustness measures by analysing their
performance and robustness with respect to syntax-preserving perturbations. We
provide evidence that context-free representations (e.g., GloVe) are in some
cases competitive with context-dependent representations from modern LLMs
(e.g., BERT), yet equally brittle to syntax-preserving perturbations. Our key
observation is that emergent syntactic representations in neural networks are
brittle. We make the code, trained models and logs available to the community
as a contribution to the debate about the capabilities of LLMs.
[LINK]
http://arxiv.org/abs/2210.17406v7
[DATE]
2023-03-29 13:29:51+00:00
[CATEGORIES]
cs.LG
cs.CL
Zero-Shot Retrieval with Search Agents and Hybrid Environments
[AUTHORS]
Michelle Chen Huebscher, Christian Buck, Massimiliano Ciaramita, Sascha Rothe
[ABSTRACT]
Learning to search is the task of building artificial agents that learn to
autonomously use a search box to find information. So far, it has been shown
that current language models can learn symbolic query reformulation policies,
in combination with traditional term-based retrieval, but fall short of
outperforming neural retrievers. We extend the previous learning to search
setup to a hybrid environment, which accepts discrete query refinement
operations, after a first-pass retrieval step via a dual encoder. Experiments
on the BEIR task show that search agents, trained via behavioral cloning,
outperform the underlying search system based on a combined dual encoder
retriever and cross encoder reranker. Furthermore, we find that simple
heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance
by several nDCG points. The search agent based on HRE (HARE) matches
state-of-the-art performance, balanced in both zero-shot and in-domain
evaluations, via interpretable actions, and at twice the speed.
[LINK]
http://arxiv.org/abs/2209.15469v2
[DATE]
2023-03-29 13:29:35+00:00
[CATEGORIES]
cs.CL
Summarizing Indian Languages using Multilingual Transformers based Models
[AUTHORS]
Dhaval Taunk, Vasudeva Varma
[LINK]
http://arxiv.org/abs/2303.16657v1
[DATE]
2023-03-29 13:05:17+00:00
[CATEGORIES]
cs.CL
GPTEval: NLG Evaluation using GPT-4 with Better Human Alignment
[AUTHORS]
Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu
[ABSTRACT]
The quality of texts generated by natural language generation (NLG) systems
is hard to measure automatically. Conventional reference-based metrics, such as
BLEU and ROUGE, have been shown to have relatively low correlation with human
judgments, especially for tasks that require creativity and diversity. Recent
studies suggest using large language models (LLMs) as reference-free metrics
for NLG evaluation, which have the benefit of being applicable to new tasks
that lack human references. However, these LLM-based evaluators still have
lower human correspondence than medium-size neural evaluators. In this work, we
present GPTEval, a framework that uses large language models with
chain-of-thoughts (CoT) and a form-filling paradigm to assess the quality of
NLG outputs. We experiment with two generation tasks, text summarization and
dialogue generation. We show that GPTEval with GPT-4 as the backbone model
achieves a Spearman correlation of 0.514 with human judgments on the
summarization task, outperforming all previous methods by a large margin. We
also present a preliminary analysis of the behavior of LLM-based evaluators,
and highlight the potential issue of LLM-based evaluators having a bias towards
LLM-generated texts.
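The 0.514 figure above is a Spearman rank correlation between evaluator scores and human judgments. A minimal, dependency-free sketch of how that statistic is computed (Pearson correlation on tie-averaged ranks) follows; the function names and toy scores are illustrative assumptions, and in practice `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy example: evaluator scores vs. human ratings for four outputs.
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

Because only ranks matter, the evaluator's scores need not be on the same scale as the human ratings.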
[LINK]
http://arxiv.org/abs/2303.16634v1
[DATE]
2023-03-29 12:46:54+00:00
[CATEGORIES]
cs.CL
AraSpot: Arabic Spoken Command Spotting
[AUTHORS]
Mahmoud Salhab, Haidar Harmanani
[ABSTRACT]
Spoken keyword spotting (KWS) is the task of identifying a keyword in an
audio stream and is widely used in smart devices at the edge to activate voice
assistants and perform hands-free tasks. The task is daunting, as there is a
need to achieve high accuracy while at the same time ensuring that such
systems continue to run efficiently on low-power devices with possibly limited
computational capabilities. This work presents AraSpot for Arabic keyword
spotting, trained on 40 Arabic keywords using various online data
augmentations, and introduces the ConformerGRU model architecture.
Finally, we further improve the performance of the model by training a
text-to-speech model for synthetic data generation. AraSpot achieved a
state-of-the-art (SOTA) result of 99.59%, outperforming previous approaches.
[COMMENTS]
A preprint
[LINK]
http://arxiv.org/abs/2303.16621v1
[DATE]
2023-03-29 12:22:17+00:00
[CATEGORIES]
cs.CL
Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations
[AUTHORS]
Sebastian Vincent, Rowanne Sumner, Alice Dowek, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton
[COMMENTS]
9 pages; 4 figures; 6 tables. Preprint
[LINK]
http://arxiv.org/abs/2303.16618v1
[DATE]
2023-03-29 12:19:23+00:00
[CATEGORIES]
cs.CL
cs.LG
Large Language Models are reasoners with Self-Verification
[AUTHORS]
Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao
[ABSTRACT]
When a large language model (LLM) performs complex reasoning by chain of
thought (CoT), it can be highly sensitive to individual mistakes. Previously,
verifiers have had to be trained to address this issue. After inferring a
conclusion, humans often check it by re-verifying it, which can avoid some
mistakes. We propose a new method called self-verification that uses the
conclusion of the CoT as a condition to build a new sample and asks the LLM to
re-predict the original conditions that have been masked. We calculate an
explainable verification score based on the accuracy of these re-predictions.
This method can improve the accuracy on multiple arithmetic and logical
reasoning datasets when using few-shot learning. We have demonstrated that
LLMs can conduct explainable self-verification of their own conclusions and
achieve competitive reasoning performance. Extensive experiments have
demonstrated that self-verification can help multiple large language models
avoid interference from incorrect CoT. Code is available at
https://github.com/WENGSYX/Self-Verification
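One way to make the verification score concrete is sketched below under stated assumptions: the LLM is abstracted as a hypothetical `repredict` callable, a candidate's score is taken to be the fraction of masked conditions it lets the model recover exactly, and the highest-scoring candidate is kept. This is an illustrative reading of the abstract, not the authors' released code.

```python
from typing import Callable, List, Sequence

# Hypothetical interface standing in for an LLM call: given a candidate
# conclusion and the index of one masked condition, return the value the
# model re-predicts for that condition.
RePredictor = Callable[[str, int], str]


def verification_score(
    conclusion: str, conditions: Sequence[str], repredict: RePredictor
) -> float:
    """Fraction of masked conditions recovered exactly when `conclusion` is assumed."""
    if not conditions:
        raise ValueError("need at least one condition to mask")
    hits = sum(
        1 for i, original in enumerate(conditions)
        if repredict(conclusion, i) == original
    )
    return hits / len(conditions)


def select_conclusion(
    candidates: List[str], conditions: Sequence[str], repredict: RePredictor
) -> str:
    """Keep the candidate conclusion with the highest verification score."""
    return max(candidates, key=lambda c: verification_score(c, conditions, repredict))


# Toy stand-in for the model: only the conclusion "7" lets it recover both
# masked conditions of "3 + 4 = ?"; any other conclusion recovers nothing.
conditions = ["3", "4"]


def toy_repredict(conclusion: str, i: int) -> str:
    return conditions[i] if conclusion == "7" else "?"


print(select_conclusion(["8", "7"], conditions, toy_repredict))  # 7
```

Exact string matching is the simplest possible comparison; a real system would likely need answer normalization before comparing re-predicted and original conditions.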
[LINK]
http://arxiv.org/abs/2212.09561v2
[DATE]
2023-03-29 11:52:10+00:00
[CATEGORIES]
cs.CL
Ousiometrics and Telegnomics: The essence of meaning conforms to a two-dimensional powerful-weak and dangerous-safe framework with diverse corpora presenting a safety bias
[AUTHORS]
P. S. Dodds, T. Alshaabi, M. I. Fudolig, J. W. Zimmerman, J. Lovato, S. Beaulieu, J. R. Minot, M. V. Arnold, A. J. Reagan, C. M. Danforth
[ABSTRACT]
We define ‘ousiometrics’ to be the study of essential meaning in whatever
context that meaningful signals are communicated, and ‘telegnomics’ as the
study of remotely sensed knowledge. From work emerging through the middle of
the 20th century, the essence of meaning has become generally accepted as being
well captured by the three orthogonal dimensions of evaluation, potency, and
activation (EPA). By re-examining first types and then tokens for the English
language, and through the use of automatically annotated histograms
(‘ousiograms’), we find here that: 1. The essence of meaning conveyed by words
is instead best described by a compass-like power-danger (PD) framework, and 2.
Analysis of a disparate collection of large-scale English language corpora
(literature, news, Wikipedia, talk radio, and social media) shows that natural
language exhibits a systematic bias toward safe, low danger words, a
reinterpretation of the Pollyanna principle's positivity bias for written
expression. To help justify our choice of dimension names and to help address
the problems with representing observed ousiometric dimensions by bipolar
adjective pairs, we introduce and explore ‘synousionyms’ and ‘antousionyms’,
ousiometric counterparts of synonyms and antonyms. We further show that the PD
framework revises the circumplex model of affect as a more general model of
state of mind. Finally, we use our findings to construct and test a prototype
‘ousiometer’, a telegnomic instrument that measures ousiometric time series for
temporal corpora. We contend that our power-danger ousiometric framework
provides a complement for entropy-based measurements, and may be of value for
the study of a wide variety of communication across biological and artificial
life.
[COMMENTS]
40 pages (34 page main manuscript, 6 page appendix), 15 figures (9
main, 6 appendix), 4 tables
[LINK]
http://arxiv.org/abs/2110.06847v2
[DATE]
2023-03-29 11:35:52+00:00
[CATEGORIES]
cs.CL
Trained on 100 million words and still in shape: BERT meets British National Corpus
[AUTHORS]
David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal
[COMMENTS]
Accepted to EACL 2023
[LINK]
http://arxiv.org/abs/2303.09859v2
[DATE]
2023-03-29 09:00:21+00:00
[CATEGORIES]
cs.CL
LMExplainer: a Knowledge-Enhanced Explainer for Language Models
[AUTHORS]
Zichen Chen, Ambuj K Singh, Misha Sra
[ABSTRACT]
Large language models (LMs) such as GPT-4 are very powerful and can process
different kinds of natural language processing (NLP) tasks. However, it can be
difficult to interpret the results due to the multi-layer nonlinear model
structure and millions of parameters. Lack of understanding of how the model
works can make the model unreliable and dangerous for everyday users in
real-world scenarios. Most recent works exploit the weights of attention to
provide explanations for model predictions. However, pure attention-based
explanation is unable to support the growing complexity of the models, and
cannot reason about their decision-making processes. Thus, we propose
LMExplainer, a knowledge-enhanced interpretation module for language models
that can provide human-understandable explanations. We use a knowledge graph
(KG) and a graph attention neural network to extract the key decision signals
of the LM. We further explore whether interpretation can also help AI
understand the task better. Our experimental results show that LMExplainer
outperforms existing LM+KG methods on CommonsenseQA and OpenBookQA. We also
compare the explanation results with generated explanation methods and
human-annotated results. The comparison shows our method can provide more
comprehensive and clearer explanations. LMExplainer demonstrates the potential
to enhance model performance and furnish explanations for the reasoning
processes of models in natural language.
[LINK]
http://arxiv.org/abs/2303.16537v1
[DATE]
2023-03-29 08:59:44+00:00
[CATEGORIES]
cs.CL
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
[AUTHORS]
Björn Plüster, Jakob Ambsdorf, Lukas Braach, Jae Hee Lee, Stefan Wermter
[ABSTRACT]
Natural language explanations promise to offer intuitively understandable
explanations of a neural network’s decision process in complex vision-language
tasks, as pursued in recent VL-NLE models. While current models offer
impressive performance on task accuracy and explanation plausibility, they
suffer from a range of issues: Some models feature a modular design where the
explanation generation module is poorly integrated with a separate module for
task-answer prediction, employ backbone models trained on limited sets of
tasks, or incorporate ad hoc solutions to increase performance on single
datasets. We propose to evade these limitations by applying recent advances in
large-scale multi-task pretraining of generative Transformer models to the
problem of VL-NLE tasks. Our approach outperforms recent models by a large
margin, with human annotators preferring the generated explanations over the
ground truth in two out of three evaluated datasets. As a novel challenge in
VL-NLE research, we propose the problem of multi-task VL-NLE and show that
jointly training on multiple tasks can increase the explanation quality. We
discuss the ethical implications of high-quality NLE generation and other
issues in recent VL-NLE research.
[COMMENTS]
Minor changes
[LINK]
http://arxiv.org/abs/2212.04231v2
[DATE]
2023-03-29 08:48:35+00:00
[CATEGORIES]
cs.CL
Building a Knowledge Graph of Distributed Ledger Technologies
[AUTHORS]
Lukas König, Sebastian Neumaier
[COMMENTS]
URI: https://w3id.org/DLTOntology
[LINK]
http://arxiv.org/abs/2303.16528v1
[DATE]
2023-03-29 08:34:01+00:00
[CATEGORIES]
cs.CL
Reproducibility is Nothing without Correctness: The Importance of Testing Code in NLP
[AUTHORS]
Sara Papi, Marco Gaido, Andrea Pilzer, Matteo Negri
[ABSTRACT]
Despite its pivotal role in research experiments, code correctness is often
presumed only on the basis of the perceived quality of the results. This comes
with the risk of erroneous outcomes and potentially misleading findings. To
address this issue, we posit that the current focus on result reproducibility
should go hand in hand with the emphasis on coding best practices. We bolster
our call to the NLP community by presenting a case study, in which we identify
(and correct) three bugs in widely used open-source implementations of the
state-of-the-art Conformer architecture. Through comparative experiments on
automatic speech recognition and translation in various language settings, we
demonstrate that the existence of bugs does not prevent the achievement of good
and reproducible results and can lead to incorrect conclusions that potentially
misguide future research. In response to this, this study is a call to action
toward the adoption of coding best practices aimed at fostering correctness and
improving the quality of the developed software.
[LINK]
http://arxiv.org/abs/2303.16166v2
[DATE]
2023-03-29 07:49:54+00:00
[CATEGORIES]
cs.CL
Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning
[AUTHORS]
Hongzhan Lin, Pengyao Yi, Jing Ma, Haiyun Jiang, Ziyang Luo, Shuming Shi, Ruifang Liu
[ABSTRACT]
The spread of rumors along with breaking events seriously hinders the truth
in the era of social media. Previous studies reveal that due to the lack of
annotated resources, rumors presented in minority languages are hard to
detect. Furthermore, the unforeseen breaking events not involved in
yesterday’s news exacerbate the scarcity of data resources. In this work, we
propose a novel zero-shot framework based on prompt learning to detect rumors
falling in different domains or presented in different languages. More
specifically, we first represent rumors circulated on social media as diverse
propagation threads, then design a hierarchical prompt encoding mechanism to
learn language-agnostic contextual representations for both prompts and rumor
data. To further enhance domain adaptation, we model the domain-invariant
structural features from the propagation threads, to incorporate structural
position representations of influential community response. In addition, a new
virtual response augmentation method is used to improve model training.
Extensive experiments conducted on three real-world datasets demonstrate that
our proposed model achieves much better performance than state-of-the-art
methods and exhibits a superior capacity for detecting rumors at early stages.
[COMMENTS]
AAAI 2023
[LINK]
http://arxiv.org/abs/2212.01117v4
[DATE]
2023-03-29 06:50:57+00:00
[CATEGORIES]
cs.CL
TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models
[AUTHORS]
Md Kamrul Hasan, Md Saiful Islam, Sangwu Lee, Wasifur Rahman, Iftekhar Naim, Mohammed Ibrahim Khan, Ehsan Hoque
[LINK]
http://arxiv.org/abs/2303.15430v2
[DATE]
2023-03-29 04:49:46+00:00
[CATEGORIES]
cs.CL
cs.LG
Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning
[AUTHORS]
Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky
[ABSTRACT]
Language model probing is often used to test specific capabilities of these
models. However, conclusions from such studies may be limited when the probing
benchmarks are small and lack statistical power. In this work, we introduce
new, larger datasets for negation (NEG-1500-SIMP) and role reversal (ROLE-1500)
inspired by psycholinguistic studies. We dramatically extend existing NEG-136
and ROLE-88 benchmarks using GPT3, increasing their size from 18 and 44
sentence pairs to 750 each. We also create another version of extended negation
dataset (NEG-1500-SIMP-TEMP), created using template-based generation. It
consists of 770 sentence pairs. We evaluate 22 models on the extended datasets,
seeing model performance dip 20-57% compared to the original smaller
benchmarks. We observe high levels of negation sensitivity in models like BERT
and ALBERT, demonstrating that previous findings might have been skewed due to
smaller test sets. Finally, we observe that although GPT3 generated all the
examples in ROLE-1500, it is only able to solve 24.6% of them during probing.
[LINK]
http://arxiv.org/abs/2303.16445v1
[DATE]
2023-03-29 04:00:53+00:00
[CATEGORIES]
cs.CL
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
[AUTHORS]
Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan
[ABSTRACT]
Artificial Intelligence (AI) has made incredible progress recently. On the
one hand, advanced foundation models like ChatGPT can offer powerful
conversation, in-context learning and code generation abilities on a broad
range of open-domain tasks. They can also generate high-level solution outlines
for domain-specific tasks based on the common sense knowledge they have
acquired. However, they still face difficulties with some specialized tasks
because they lack enough domain-specific data during pre-training or they often
have errors in their neural network computations on those tasks that need
accurate executions. On the other hand, there are also many existing models and
systems (symbolic-based or neural-based) that can do some domain-specific tasks
very well. However, due to the different implementation or working mechanisms,
they are not easily accessible or compatible with foundation models. Therefore,
there is a clear and pressing need for a mechanism that can leverage foundation
models to propose task solution outlines and then automatically match some of
the sub-tasks in the outlines to the off-the-shelf models and systems with
special functionalities to complete them. Inspired by this, we introduce
TaskMatrix.AI as a new AI ecosystem that connects foundation models with
millions of APIs for task completion. Unlike most previous work that aimed to
improve a single AI model, TaskMatrix.AI focuses more on using existing
foundation models (as a brain-like central system) and APIs of other AI models
and systems (as sub-task solvers) to achieve diversified tasks in both digital
and physical domains. As a position paper, we present our vision of how to
build such an ecosystem, explain each key component, and use case studies to
illustrate both the feasibility of this vision and the main challenges we need
to address next.
[LINK]
http://arxiv.org/abs/2303.16434v1
[DATE]
2023-03-29 03:30:38+00:00
[CATEGORIES]
cs.CL
Translating Radiology Reports into Plain Language using ChatGPT and GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential
[AUTHORS]
Qing Lyu, Josh Tan, Michael E. Zapadka, Janardhana Ponnatapura, Chuang Niu, Kyle J. Myers, Ge Wang, Christopher T. Whitlow
[LINK]
http://arxiv.org/abs/2303.09038v3
[DATE]
2023-03-29 03:22:52+00:00
[CATEGORIES]
cs.CL
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
[AUTHORS]
Ning Bian, Xianpei Han, Le Sun, Hongyu Lin, Yaojie Lu, Ben He
[LINK]
http://arxiv.org/abs/2303.16421v1
[DATE]
2023-03-29 03:05:43+00:00
[CATEGORIES]
cs.CL
Zero-shot Clinical Entity Recognition using ChatGPT
[AUTHORS]
Yan Hu, Iqra Ameer, Xu Zuo, Xueqing Peng, Yujia Zhou, Zehan Li, Yiming Li, Jianfu Li, Xiaoqian Jiang, Hua Xu
[COMMENTS]
7 pages, 5 tables, 1 figure
[LINK]
http://arxiv.org/abs/2303.16416v1
[DATE]
2023-03-29 02:46:18+00:00
[CATEGORIES]
cs.CL
Hierarchical Video-Moment Retrieval and Step-Captioning
[AUTHORS]
Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yasher Mehdad, Mohit Bansal
[COMMENTS]
CVPR 2023 (15 pages; the first two authors contributed equally;
Project website: https://hirest-cvpr2023.github.io)
[LINK]
http://arxiv.org/abs/2303.16406v1
[DATE]
2023-03-29 02:33:54+00:00
[CATEGORIES]
cs.CL
cs.LG
ChatGPT or academic scientist? Distinguishing authorship with over 99% accuracy using off-the-shelf machine learning tools
[AUTHORS]
Heather Desaire, Aleesa E. Chua, Madeline Isom, Romana Jarosova, David Hua
[LINK]
http://arxiv.org/abs/2303.16352v1
[DATE]
2023-03-28 23:16:00+00:00
[CATEGORIES]
cs.LG
cs.CL
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
[AUTHORS]
Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
[COMMENTS]
Accepted at CVPR 2023
[LINK]
http://arxiv.org/abs/2303.16342v1
[DATE]
2023-03-28 22:45:40+00:00
[CATEGORIES]
cs.CL
A methodology to characterize bias and harmful stereotypes in natural language processing in Latin America
[AUTHORS]
Laura Alonso Alemany, Luciana Benotti, Hernán Maina, Lucía González, Mariela Rajngewerc, Lautaro Martínez, Jorge Sánchez, Mauro Schilman, Guido Ivetta, Alexia Halvorsen, Amanda Mata Rojo, Matías Bordone, Beatriz Busaniche
[LINK]
http://arxiv.org/abs/2207.06591v3
[DATE]
2023-03-28 21:22:17+00:00
[CATEGORIES]
cs.CL
Context-aware Fine-tuning of Self-supervised Speech Models
[AUTHORS]
Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe
[LINK]
http://arxiv.org/abs/2212.08542v2
[DATE]
2023-03-28 21:20:11+00:00
[CATEGORIES]
cs.CL
InceptionNeXt: When Inception Meets ConvNeXt
[AUTHORS]
Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
[COMMENTS]
Code: https://github.com/sail-sg/inceptionnext
[LINK]
http://arxiv.org/abs/2303.16900v1
[DATE]
2023-03-29 17:59:58+00:00
[CATEGORIES]
cs.LG
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
[AUTHORS]
Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
[ABSTRACT]
Modeling sounds emitted from physical object interactions is critical for
immersive perceptual experiences in real and virtual worlds. Traditional
methods of impact sound synthesis use physics simulation to obtain a set of
physics parameters that could represent and synthesize the sound. However, they
require fine details of both the object geometries and impact locations, which
are rarely available in the real world and cannot be applied to synthesize
impact sounds from common videos. On the other hand, existing video-driven deep
learning-based approaches could only capture the weak correspondence between
visual content and impact sounds since they lack physics knowledge. In this
work, we propose a physics-driven diffusion model that can synthesize
high-fidelity impact sound for a silent video clip. In addition to the video
content, we propose to use additional physics priors to guide the impact sound
synthesis procedure. The physics priors include both physics parameters that
are directly estimated from noisy real-world impact sound examples without
sophisticated setup and learned residual parameters that interpret the sound
environment via neural networks. We further implement a novel diffusion model
with specific training and inference strategies to combine physics priors and
visual information for impact sound synthesis. Experimental results show that
our model outperforms several existing systems in generating realistic impact
sounds. More importantly, the physics-based representations are fully
interpretable and transparent, thus enabling us to perform sound editing
flexibly.
[COMMENTS]
CVPR 2023. Project page:
https://sukun1045.github.io/video-physics-sound-diffusion/
[LINK]
http://arxiv.org/abs/2303.16897v1
[DATE]
2023-03-29 17:59:53+00:00
[CATEGORIES]
cs.LG
Your Diffusion Model is Secretly a Zero-Shot Classifier
[AUTHORS]
Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
[ABSTRACT]
The recent wave of large-scale text-to-image diffusion models has
dramatically increased our text-based image generation abilities. These models
can generate realistic images for a staggering variety of prompts and exhibit
impressive compositional generalization abilities. Almost all use cases thus
far have solely focused on sampling; however, diffusion models can also provide
conditional density estimates, which are useful for tasks beyond image
generation. In this paper, we show that the density estimates from large-scale
text-to-image diffusion models like Stable Diffusion can be leveraged to
perform zero-shot classification without any additional training. Our
generative approach to classification, which we call Diffusion Classifier,
attains strong results on a variety of benchmarks and outperforms alternative
methods of extracting knowledge from diffusion models. Although a gap remains
between generative and discriminative approaches on zero-shot recognition
tasks, we find that our diffusion-based approach has stronger multimodal
relational reasoning abilities than competing discriminative approaches.
Finally, we use Diffusion Classifier to extract standard classifiers from
class-conditional diffusion models trained on ImageNet. Even though these
models are trained with weak augmentations and no regularization, they approach
the performance of SOTA discriminative classifiers. Overall, our results are a
step toward using generative over discriminative models for downstream tasks.
Results and visualizations at https://diffusion-classifier.github.io/
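The abstract does not spell out how the density estimates become a classifier. One common rule, assumed here, is to compare across class labels c the expected noise-prediction error E‖ε - ε_θ(x_t, t, c)‖² (a standard proxy for the conditional likelihood) and predict the class with the smallest error. The toy sketch below replaces Stable Diffusion with a hypothetical closed-form noise predictor (`toy_eps_pred`) for classes that each consist of a single point; all names are illustrative, not the authors' code.

```python
import numpy as np

def toy_eps_pred(x_t, alpha_bar, class_mean):
    # Hypothetical "ideal" noise predictor for a class whose data is a single point.
    # Since x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, solve for eps assuming x0 = class_mean.
    return (x_t - np.sqrt(alpha_bar) * class_mean) / np.sqrt(1.0 - alpha_bar)

def diffusion_classify(x0, class_means, n_samples=64, seed=0):
    """Return (index of class with lowest mean noise-prediction error, per-class errors)."""
    rng = np.random.default_rng(seed)
    # Share the same (noise level, noise) draws across classes so the comparison is paired.
    alpha_bars = rng.uniform(0.05, 0.95, size=n_samples)
    eps = rng.standard_normal((n_samples, x0.shape[0]))
    errors = []
    for mean in class_means:
        total = 0.0
        for a, e in zip(alpha_bars, eps):
            x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * e   # forward-noise the input
            total += np.sum((e - toy_eps_pred(x_t, a, mean)) ** 2)
        errors.append(total / n_samples)
    return int(np.argmin(errors)), errors

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
pred, errs = diffusion_classify(np.array([2.5, 2.9]), means)
```

With this toy predictor the error for class c works out to a/(1 - a) · ‖x0 - μ_c‖², so the rule reduces to nearest-mean classification; a real text-conditioned model would instead supply ε_θ from the network.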
[COMMENTS]
Website at https://diffusion-classifier.github.io/
[LINK]
http://arxiv.org/abs/2303.16203v2
[DATE]
2023-03-29 17:58:24+00:00
[CATEGORIES]
cs.LG
Towards Understanding the Effect of Pretraining Label Granularity
[AUTHORS]
Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo
[LINK]
http://arxiv.org/abs/2303.16887v1
[DATE]
2023-03-29 17:56:36+00:00
[CATEGORIES]
cs.LG
The Hidden-Manifold Hopfield Model and a learning phase transition
[AUTHORS]
Matteo Negri, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico Malatesta
[LINK]
http://arxiv.org/abs/2303.16880v1
[DATE]
2023-03-29 17:39:21+00:00
[CATEGORIES]
cs.LG
ALUM: Adversarial Data Uncertainty Modeling from Latent Model Uncertainty Compensation
[AUTHORS]
Wei Wei, Jiahuan Zhou, Hongze Li, Ying Wu
[ABSTRACT]
It is critical for models to attend not only to accuracy but also to
the certainty of their predictions. Uncertain predictions of deep models caused by
noisy data raise significant concerns in trustworthy AI areas. To explore and
handle uncertainty due to intrinsic data noise, we propose a novel method
called ALUM to simultaneously handle the model uncertainty and data uncertainty
in a unified scheme. Rather than solely modeling data uncertainty in the
ultimate layer of a deep model based on randomly selected training data, we
propose to explore mined adversarial triplets to facilitate data uncertainty
modeling and non-parametric uncertainty estimations to compensate for the
insufficiently trained latent model layers. Thus, the critical data uncertainty
and model uncertainty caused by noisy data can be readily quantified for
improving model robustness. ALUM is model-agnostic and can be easily
integrated into any existing deep model with little extra computational
overhead. Extensive experiments on various noisy learning tasks validate the
superior robustness and generalization ability of our method. The code is
released at https://github.com/wwzjer/ALUM.
[COMMENTS]
10 pages, 5 figures
[LINK]
http://arxiv.org/abs/2303.16866v1
[DATE]
2023-03-29 17:24:12+00:00
[CATEGORIES]
cs.LG
Beyond Empirical Risk Minimization: Local Structure Preserving Regularization for Improving Adversarial Robustness
[AUTHORS]
Wei Wei, Jiahuan Zhou, Ying Wu
[ABSTRACT]
It is well known that deep neural networks are susceptible to being fooled
by adversarial examples with perturbations imperceptible to humans. Various
defenses have been proposed to improve adversarial robustness, among which
adversarial training methods are most effective. However, most of these methods
treat the training samples independently and demand a tremendous amount of
samples to train a robust network, while ignoring the latent structural
information among these samples. In this work, we propose a novel Local
Structure Preserving (LSP) regularization, which aims to preserve the local
structure of the input space in the learned embedding space. In this manner,
the attacking effect of adversarial samples lying in the vicinity of clean
samples can be alleviated. We show strong empirical evidence that with or
without adversarial training, our method consistently improves the performance
of adversarial robustness on several image classification datasets compared to
the baselines and some state-of-the-art approaches, thus providing a promising
direction for future research.
[COMMENTS]
13 pages, 4 figures
[LINK]
http://arxiv.org/abs/2303.16861v1
[DATE]
2023-03-29 17:18:58+00:00
[CATEGORIES]
cs.LG
Physical Deep Reinforcement Learning Towards Safety Guarantee
[AUTHORS]
Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo
[COMMENTS]
Working Paper
[LINK]
http://arxiv.org/abs/2303.16860v1
[DATE]
2023-03-29 17:17:59+00:00
[CATEGORIES]
cs.LG
Diffusion Schrödinger Bridge Matching
[AUTHORS]
Yuyang Shi, Valentin De Bortoli, Andrew Campbell, Arnaud Doucet
[ABSTRACT]
Solving transport problems, i.e. finding a map transporting one given
distribution to another, has numerous applications in machine learning. Novel
mass transport methods motivated by generative modeling have recently been
proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models
(FMMs) implement such a transport through a Stochastic Differential Equation
(SDE) or an Ordinary Differential Equation (ODE). However, while it is
desirable in many applications to approximate the deterministic dynamic Optimal
Transport (OT) map which admits attractive properties, DDMs and FMMs are not
guaranteed to provide transports close to the OT map. In contrast,
Schrödinger bridges (SBs) compute stochastic dynamic mappings that recover
entropy-regularized versions of OT. Unfortunately, existing numerical methods
approximating SBs either scale poorly with dimension or accumulate errors
across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a
new methodology for solving SB problems, and Diffusion Schrödinger Bridge
Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM
significantly improves over previous SB numerics and recovers as
special/limiting cases various recent transport methods. We demonstrate the
performance of DSBM on a variety of problems.
[LINK]
http://arxiv.org/abs/2303.16852v1
[DATE]
2023-03-29 16:59:22+00:00
[CATEGORIES]
cs.LG
Generalizable Denoising of Microscopy Images using Generative Adversarial Networks and Contrastive Learning
[AUTHORS]
Felix Fuentes-Hurtado, Jean-Baptiste Sibarita, Virgile Viasnoff
[ABSTRACT]
Microscopy images often suffer from high levels of noise, which can hinder
further analysis and interpretation. Content-aware image restoration (CARE)
methods have been proposed to address this issue, but they often require large
amounts of training data and suffer from over-fitting. To overcome these
challenges, we propose a novel framework for few-shot microscopy image
denoising. Our approach combines a generative adversarial network (GAN) trained
via contrastive learning (CL) with two structure preserving loss terms
(Structural Similarity Index and Total Variation loss) to further improve the
quality of the denoised images using little data. We demonstrate the
effectiveness of our method on three well-known microscopy imaging datasets,
and show that we can drastically reduce the amount of training data while
retaining the quality of the denoising, thus alleviating the burden of
acquiring paired data and enabling few-shot learning. The proposed framework
can be easily extended to other image restoration tasks and has the potential
to significantly advance the field of microscopy image analysis.
[LINK]
http://arxiv.org/abs/2303.15214v2
[DATE]
2023-03-29 16:51:15+00:00
[CATEGORIES]
cs.LG
A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests
[AUTHORS]
Bohang Zhang, Guhao Feng, Yiheng Du, Di He, Liwei Wang
[COMMENTS]
76 pages, 13 figures
[LINK]
http://arxiv.org/abs/2302.07090v2
[DATE]
2023-03-29 16:48:49+00:00
[CATEGORIES]
cs.LG
Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
[AUTHORS]
Ziwen Wang, Yancheng Yuan, Jiaming Ma, Tieyong Zeng, Defeng Sun
[LINK]
http://arxiv.org/abs/2303.16841v1
[DATE]
2023-03-29 16:47:25+00:00
[CATEGORIES]
cs.LG
Selective experience replay compression using coresets for lifelong deep reinforcement learning in medical imaging
[AUTHORS]
Guangyao Zheng, Samson Zhou, Vishwa S. Parekh, Michael A. Jacobs, Vladimir Braverman
[ABSTRACT]
Selective experience replay is a popular strategy for integrating lifelong
learning with deep reinforcement learning. Selective experience replay aims to
recount selected experiences from previous tasks to avoid catastrophic
forgetting. Furthermore, selective experience replay based techniques are
model-agnostic and allow experiences to be shared across different models. However,
storing experiences from all previous tasks makes lifelong learning using
selective experience replay computationally very expensive and impractical as
the number of tasks increases. To that end, we propose a reward
distribution-preserving coreset compression technique for compressing
experience replay buffers stored for selective experience replay.
We evaluated the coreset compression technique on the brain tumor
segmentation (BRATS) dataset for the task of ventricle localization and on the
whole-body MRI for localization of left knee cap, left kidney, right
trochanter, left lung, and spleen. The coreset lifelong learning models trained
on a sequence of 10 different brain MR imaging environments demonstrated
excellent performance localizing the ventricle with a mean pixel error distance
of 12.93 for the compression ratio of 10x. In comparison, the conventional
lifelong learning model localized the ventricle with a mean pixel distance of
10.87. Similarly, the coreset lifelong learning models trained on whole-body
MRI demonstrated no significant difference (p=0.28) between the 10x compressed
coreset lifelong learning models and conventional lifelong learning models for
all the landmarks. The mean pixel distance for the 10x compressed models across
all the landmarks was 25.30, compared to 19.24 for the conventional lifelong
learning models. Our results demonstrate the potential of the coreset-based
experience replay buffer (ERB) compression method for compressing experiences
without a significant drop in performance.
[LINK]
http://arxiv.org/abs/2302.11510v3
[DATE]
2023-03-29 16:25:43+00:00
[CATEGORIES]
cs.LG
Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction
[AUTHORS]
Salar Arbabi, Davide Tavernini, Saber Fallah, Richard Bowden
[ABSTRACT]
Autonomous agents that drive on roads shared with human drivers must reason
about the nuanced interactions among traffic participants. This poses a highly
challenging decision making problem since human behavior is influenced by a
multitude of factors (e.g., human intentions and emotions) that are hard to
model. This paper presents a decision making approach for autonomous driving,
focusing on the complex task of merging into moving traffic where uncertainty
emanates from the behavior of other drivers and imperfect sensor measurements.
We frame the problem as a partially observable Markov decision process (POMDP)
and solve it online with Monte Carlo tree search. The solution to the POMDP is
a policy that performs high-level driving maneuvers, such as giving way to an
approaching car, keeping a safe distance from the vehicle in front or merging
into traffic. Our method leverages a model learned from data to predict the
future states of traffic while explicitly accounting for interactions among the
surrounding agents. From these predictions, the autonomous vehicle can
anticipate the future consequences of its actions on the environment and
optimize its trajectory accordingly. We thoroughly test our approach in
simulation, showing that the autonomous vehicle can adapt its behavior to
different situations. We also compare against other methods, demonstrating an
improvement with respect to the considered performance metrics.
[COMMENTS]
12 pages, 12 figures
[LINK]
http://arxiv.org/abs/2303.16821v1
[DATE]
2023-03-29 16:12:45+00:00
[CATEGORIES]
cs.LG
PAC-Bayesian bounds for learning LTI-ss systems with input from empirical loss
[AUTHORS]
Deividas Eringis, John Leth, Zheng-Hua Tan, Rafael Wisniewski, Mihaly Petreczky
[ABSTRACT]
In this paper, we derive a Probably Approximately Correct (PAC)-Bayesian error
bound for linear time-invariant (LTI) stochastic dynamical systems with inputs.
Such bounds are widespread in machine learning, and they are useful for
characterizing the predictive power of models learned from finitely many data
points. In particular, the bound derived in this paper relates future
average prediction errors with the prediction error generated by the model on
the data used for learning. In turn, this allows us to provide finite-sample
error bounds for a wide class of learning/system identification algorithms.
Furthermore, as LTI systems are a sub-class of recurrent neural networks
(RNNs), these error bounds could be a first step towards PAC-Bayesian bounds
for RNNs.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2212.14838
[LINK]
http://arxiv.org/abs/2303.16816v1
[DATE]
2023-03-29 16:06:07+00:00
[CATEGORIES]
cs.LG
Optimal approximation of $C^k$-functions using shallow complex-valued neural networks
[AUTHORS]
Paul Geuchen, Felix Voigtlaender
[ABSTRACT]
We prove a quantitative result for the approximation of functions of
regularity $C^k$ (in the sense of real variables) defined on the complex cube
$\Omega_n := [-1,1]^n +i[-1,1]^n\subseteq \mathbb{C}^n$ using shallow
complex-valued neural networks. Precisely, we consider neural networks with a
single hidden layer and $m$ neurons, i.e., networks of the form $z \mapsto
\sum_{j=1}^m \sigma_j \cdot \phi\big(\rho_j^T z + b_j\big)$ and show that one
can approximate every function in $C^k \left( \Omega_n; \mathbb{C}\right)$
using a function of that form with error of the order $m^{-k/(2n)}$ as $m \to
\infty$, provided that the activation function $\phi: \mathbb{C} \to
\mathbb{C}$ is smooth but not polyharmonic on some non-empty open set.
Furthermore, we show that the selection of the weights $\sigma_j, b_j \in
\mathbb{C}$ and $\rho_j \in \mathbb{C}^n$ is continuous with respect to $f$ and
prove that the derived rate of approximation is optimal under this continuity
assumption. We also discuss the optimality of the result for a possibly
discontinuous choice of the weights.
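For concreteness, the network class in this abstract, z ↦ Σ_{j=1}^m σ_j · φ(ρ_j^T z + b_j), can be evaluated with a few lines of NumPy. This is only a sketch of the forward map with randomly drawn weights; it does not construct the approximating weights from the paper. The activation `split_sigmoid` (logistic sigmoid applied separately to real and imaginary parts) is an illustrative choice. Note that holomorphic activations such as exp are harmonic, hence polyharmonic, and so would not meet the stated condition; the split sigmoid is smooth and its iterated Laplacians are non-vanishing derivatives of the sigmoid.

```python
import numpy as np

def split_sigmoid(w):
    # Illustrative C -> C activation: logistic sigmoid on real and imaginary parts separately.
    s = lambda x: 1.0 / (1.0 + np.exp(-x))
    return s(np.real(w)) + 1j * s(np.imag(w))

def shallow_complex_net(z, sigma, rho, b, phi):
    """Evaluate z -> sum_j sigma_j * phi(rho_j^T z + b_j) for a batch of inputs.

    z:     (batch, n) complex inputs
    sigma: (m,) complex output weights
    rho:   (m, n) complex input weights
    b:     (m,) complex biases
    phi:   elementwise activation C -> C
    """
    pre = z @ rho.T + b       # (batch, m); plain transpose, no complex conjugation
    return phi(pre) @ sigma   # (batch,)
```

Usage: with `z` of shape `(batch, n)` and `m` hidden neurons, `shallow_complex_net(z, sigma, rho, b, split_sigmoid)` returns one complex value per input row.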
[LINK]
http://arxiv.org/abs/2303.16813v1
[DATE]
2023-03-29 15:56:43+00:00
[CATEGORIES]
cs.LG
Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification
[AUTHORS]
Ricardo Montalvo-Lezama, Berenice Montalvo-Lezama, Gibran Fuentes-Pineda
[ABSTRACT]
In this paper, we study the transferability of ImageNet spatial and Kinetics
spatio-temporal representations to multi-label Movie Trailer Genre
Classification (MTGC). In particular, we present an extensive evaluation of the
transferability of ConvNet and Transformer models pretrained on ImageNet and
Kinetics to Trailers12k, a new manually-curated movie trailer dataset composed
of 12,000 videos labeled with 10 different genres and associated metadata. We
analyze different aspects that can influence transferability, such as frame
rate, input video extension, and spatio-temporal modeling. In order to reduce
the spatio-temporal structure gap between ImageNet/Kinetics and Trailers12k, we
propose Dual Image and Video Transformer Architecture (DIViTA), which performs
shot detection so as to segment the trailer into highly correlated clips,
providing a more cohesive input for pretrained backbones and improving
transferability (a 1.83% increase for ImageNet and 3.75% for Kinetics). Our
results demonstrate that representations learned on either ImageNet or Kinetics
are comparatively transferable to Trailers12k. Moreover, both datasets provide
complementary information that can be combined to improve classification
performance (a 2.91% gain compared to the top single pretraining).
Interestingly, using lightweight ConvNets as pretrained backbones resulted in
only a 3.46% drop in classification performance compared with the top
Transformer while requiring only 11.82% of its parameters and 0.81% of its
FLOPS.
[LINK]
http://arxiv.org/abs/2210.07983v4
[DATE]
2023-03-29 15:55:03+00:00
[CATEGORIES]
cs.LG
Context-aware Bayesian Mixed Multinomial Logit Model
[AUTHORS]
Mirosława Łukawska, Anders Fjendbo Jensen, Filipe Rodrigues
[ABSTRACT]
The mixed multinomial logit model assumes constant preference parameters of a
decision-maker throughout different choice situations, which may be considered
too strong for certain choice modelling applications. This paper proposes an
effective approach to model context-dependent intra-respondent heterogeneity,
thereby introducing the concept of the Context-aware Bayesian mixed multinomial
logit model, where a neural network maps contextual information to
interpretable shifts in the preference parameters of each individual in each
choice occasion. The proposed model offers several key advantages. First, it
supports both continuous and discrete variables, as well as complex non-linear
interactions between both types of variables. Second, each context
specification is considered jointly as a whole by the neural network rather
than each variable being considered independently. Finally, since the neural
network parameters are shared across all decision-makers, it can leverage
information from other decision-makers to infer the effect of a particular
context on a particular decision-maker. Even though the context-aware Bayesian
mixed multinomial logit model allows for flexible interactions between
attributes, its computational cost is only marginally higher than that of the
mixed multinomial logit model. We illustrate the concept and interpretation of
the proposed model in a simulation study. We furthermore present a real-world
case study from the travel behaviour domain - a bicycle route choice model,
based on a large-scale, crowdsourced dataset of GPS trajectories including
119,448 trips made by 8,555 cyclists.
[LINK]
http://arxiv.org/abs/2210.05737v2
[DATE]
2023-03-29 15:42:30+00:00
[CATEGORIES]
cs.LG
Module-based regularization improves Gaussian graphical models when observing noisy data
[AUTHORS]
Magnus Neuman, Joaquín Calatayud, Viktor Tasselius, Martin Rosvall
[ABSTRACT]
Researchers often represent relations in multivariate correlational data
using Gaussian graphical models, which require regularization to sparsify the
models. Because researchers often study the modular structure of the
inferred network, we suggest integrating that structure into the cross-validation of the
regularization strength to balance under- and overfitting. Using synthetic and
real data, we show that this approach allows us to better recover and infer
modular structure in noisy data compared with the graphical lasso, a standard
approach using the Gaussian log-likelihood when cross-validating the
regularization strength.
[LINK]
http://arxiv.org/abs/2303.16796v1
[DATE]
2023-03-29 15:38:25+00:00
[CATEGORIES]
cs.LG
The Prominence of Artificial Intelligence in COVID-19
[AUTHORS]
MD Abdullah Al Nasim, Aditi Dhali, Faria Afrin, Noshin Tasnim Zaman, Nazmul Karimm, Md Mahim Anjum Haque
[COMMENTS]
63 pages, 3 tables, 17 figures
[LINK]
http://arxiv.org/abs/2111.09537v3
[DATE]
2023-03-29 15:33:16+00:00
[CATEGORIES]
cs.LG
GRAF: Graph Attention-aware Fusion Networks
[AUTHORS]
Ziynet Nesibe Kesimoglu, Serdar Bozdag
[ABSTRACT]
A large number of real-world networks include multiple types of nodes and
edges. Graph Neural Networks (GNNs) have emerged as a deep learning framework to
utilize node features on graph-structured data showing superior performance.
However, popular GNN-based architectures operate on one homogeneous network.
Enabling them to work on multiple networks brings additional challenges due to
the heterogeneity of the networks and the multiplicity of the existing
associations. In this study, we present a computational approach named GRAF
utilizing GNN-based approaches on multiple networks with the help of attention
mechanisms and network fusion. Using attention-based neighborhood aggregation,
GRAF learns the importance of each neighbor per node (called node-level
attention) followed by the importance of association (called association-level
attention) in a hierarchical way. Then, GRAF performs a network fusion step that
weighs each edge according to the learned node- and association-level attention,
which results in a fused enriched network. Considering that the fused network
could be a highly dense network with many weak edges depending on the given
input networks, we included an edge elimination step with respect to edges’
weights. Finally, GRAF utilizes Graph Convolutional Network (GCN) on the fused
network and incorporates the node features on the graph-structured data for the
prediction task or any other downstream analysis. Our extensive evaluations of
prediction tasks from different domains showed that GRAF outperformed the
state-of-the-art methods. Utilization of learned node-level and
association-level attention allowed us to prioritize the edges properly. The
source code for our tool is publicly available at
https://github.com/bozdaglab/GRAF.
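The abstract does not give the exact fusion formula. A minimal sketch, assuming that each existing edge of association type r is weighted by the product of its association-level weight β_r and its node-level attention α^r_ij, that the weighted networks are summed, and that edge elimination keeps only edges at or above a weight quantile, could look like the following. The function name and `keep_fraction` parameter are hypothetical, not GRAF's API.

```python
import numpy as np

def fuse_networks(adjs, node_att, assoc_att, keep_fraction=0.5):
    """Hypothetical attention-weighted network fusion with edge elimination.

    adjs:      list of R (n, n) 0/1 adjacency matrices, one per association type
    node_att:  list of R (n, n) node-level attention weights
    assoc_att: (R,) association-level attention weights
    """
    fused = np.zeros_like(adjs[0], dtype=float)
    for A, alpha, beta in zip(adjs, node_att, assoc_att):
        fused += beta * alpha * A   # weight each existing edge by both attention levels
    weights = fused[fused > 0]
    if weights.size == 0:
        return fused
    # Edge elimination: drop weak edges below the (1 - keep_fraction) weight quantile.
    threshold = np.quantile(weights, 1.0 - keep_fraction)
    return np.where(fused >= threshold, fused, 0.0)
```

A quantile threshold keeps the density of the fused graph roughly fixed regardless of how many input networks are combined; an absolute weight cutoff would be an equally plausible alternative.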
[COMMENTS]
11 pages, 1 figure
[LINK]
http://arxiv.org/abs/2303.16781v1
[DATE]
2023-03-29 15:17:05+00:00
[CATEGORIES]
cs.LG
Multi-View Clustering via Semi-non-negative Tensor Factorization
[AUTHORS]
Jing Li, Quanxue Gao, Qianqian Wang, Wei Xia, Xinbo Gao
[ABSTRACT]
Multi-view clustering (MVC) based on non-negative matrix factorization (NMF)
and its variants have received a huge amount of attention in recent years due
to their advantages in clustering interpretability. However, existing NMF-based
multi-view clustering methods perform NMF on each view separately and
ignore the interactions between views. Thus, they cannot fully exploit the
within-view spatial structure and between-view complementary information. To
resolve this issue, we present semi-non-negative tensor factorization
(Semi-NTF) and develop a novel multi-view clustering method based on Semi-NTF with
one-side orthogonal constraint. Our model directly performs Semi-NTF on the
3rd-order tensor which is composed of anchor graphs of views. Thus, our model
directly considers the between-view relationship. Moreover, we use the tensor
Schatten p-norm regularization as a rank approximation of the 3rd-order tensor
which characterizes the cluster structure of multi-view data and exploits the
between-view complementary information. In addition, we provide an optimization
algorithm for the proposed method and prove mathematically that the algorithm
always converges to a stationary KKT point. Extensive experiments on various
benchmark datasets indicate that our proposed method is able to achieve
satisfactory clustering performance.
[LINK]
http://arxiv.org/abs/2303.16748v1
[DATE]
2023-03-29 14:54:19+00:00
[CATEGORIES]
cs.LG
Environmental Sensor Placement with Convolutional Gaussian Neural Processes
[AUTHORS]
Tom R. Andersson, Wessel P. Bruinsma, Stratis Markou, James Requeima, Alejandro Coca-Castro, Anna Vaughan, Anna-Louise Ellis, Matthew A. Lazzara, Daniel C. Jones, J. Scott Hosking, Richard E. Turner
[ABSTRACT]
Environmental sensors are crucial for monitoring weather conditions and the
impacts of climate change. However, it is challenging to maximise measurement
informativeness and place sensors efficiently, particularly in remote regions
like Antarctica. Probabilistic machine learning models can evaluate placement
informativeness by predicting the uncertainty reduction provided by a new
sensor. Gaussian process (GP) models are widely used for this purpose, but they
struggle with capturing complex non-stationary behaviour and scaling to large
datasets. This paper proposes using a convolutional Gaussian neural process
(ConvGNP) to address these issues. A ConvGNP uses neural networks to
parameterise a joint Gaussian distribution at arbitrary target locations,
enabling flexibility and scalability. Using simulated surface air temperature
anomaly over Antarctica as ground truth, the ConvGNP learns spatial and
seasonal non-stationarities, outperforming a non-stationary GP baseline. In a
simulated sensor placement experiment, the ConvGNP better predicts the
performance boost obtained from new observations than GP baselines, leading to
more informative sensor placements. We contrast our approach with physics-based
sensor placement methods and propose future work towards an operational sensor
placement recommendation system. This system could help to realise
environmental digital twins that actively direct measurement sampling to
improve the digital representation of reality.
[COMMENTS]
In review for the Climate Informatics 2023 special issue of
Environmental Data Science
[LINK]
http://arxiv.org/abs/2211.10381v4
[DATE]
2023-03-29 14:50:13+00:00
[CATEGORIES]
cs.LG
Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution
[AUTHORS]
Rui Luo, Vikram Krishnamurthy
[ABSTRACT]
This study presents a novel deep learning method, called GATv2-GCN, for
predicting player performance in sports. To construct a dynamic player
interaction graph, we leverage player statistics and their interactions during
gameplay. We use a graph attention network to capture the attention that players
pay to one another, allowing for more accurate modeling of the dynamic
player interactions. To handle the multivariate player statistics time series,
we incorporate a temporal convolution layer, which provides the model with
temporal predictive power. We evaluate the performance of our model using
real-world sports data, demonstrating its effectiveness in predicting player
performance. Furthermore, we explore the potential use of our model in a sports
betting context, providing insights into profitable strategies that leverage
our predictive power. The proposed method has the potential to advance the
state-of-the-art in player performance prediction and to provide valuable
insights for sports analytics and betting industries.
[LINK]
http://arxiv.org/abs/2303.16741v1
[DATE]
2023-03-29 14:48:51+00:00
[CATEGORIES]
cs.LG
Multi-Agent Reinforcement Learning with Action Masking for UAV-enabled Mobile Communications
[AUTHORS]
Danish Rizvi, David Boyle
[ABSTRACT]
Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations
to provide ad hoc communications infrastructure. Building upon prior research
efforts which consider either static nodes, 2D trajectories or single UAV
systems, this paper focuses on the use of multiple UAVs for providing wireless
communication to mobile users in the absence of terrestrial communications
infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA
power allocation to maximize system throughput. Firstly, a weighted
K-means-based clustering algorithm establishes UAV-user associations at regular
intervals. The efficacy of training a novel Shared Deep Q-Network (SDQN) with
action masking is then explored. Unlike training each UAV separately using DQN,
the SDQN reduces training time by using the experiences of multiple UAVs
instead of a single agent. We also show that SDQN can be used to train a
multi-agent system with differing action spaces. Simulation results confirm
that: 1) training a shared DQN outperforms a conventional DQN in terms of
maximum system throughput (+20%) and training time (-10%); 2) it can converge
for agents with different action spaces, yielding a 9% increase in throughput
compared to mutual learning algorithms; and 3) combining NOMA with an SDQN
architecture enables the network to achieve a better sum rate compared with
existing baseline schemes.
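Action masking is commonly implemented by giving invalid actions a Q-value of negative infinity before both the greedy choice and the bootstrap target. The sketch below assumes one shared Q-head whose output covers the union of all agents' actions, with a per-agent boolean mask marking the valid outputs. This encoding of "differing action spaces", and all names and values, are our assumptions rather than the paper's code.

```python
import math


def masked_greedy_action(q_values, valid_mask):
    """Greedy action over a shared Q-vector, ignoring actions this agent cannot take."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


def masked_td_target(reward, next_q, next_mask, gamma=0.99, done=False):
    """One-step TD target that only bootstraps from actions valid in the next state."""
    if done:
        return reward
    return reward + gamma * max(q for q, ok in zip(next_q, next_mask) if ok)


shared_q = [1.0, 5.0, 3.0, 4.0, 2.0]        # output of one shared network
mask_a = [True, True, True, True, True]     # hypothetical UAV with 5 moves
mask_b = [True, False, True, True, False]   # hypothetical UAV with 3 moves
action_a = masked_greedy_action(shared_q, mask_a)  # 1
action_b = masked_greedy_action(shared_q, mask_b)  # 3
target_b = masked_td_target(1.0, shared_q, mask_b, gamma=0.9)  # 1.0 + 0.9 * 4.0
```

Because the mask also enters the TD target, an agent never bootstraps from the value of a move it could not actually take.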
[LINK]
http://arxiv.org/abs/2303.16737v1
[DATE]
2023-03-29 14:41:03+00:00
[CATEGORIES]
cs.LG
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
[AUTHORS]
Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao
[ABSTRACT]
Scale is the primary factor for building a powerful foundation model that
generalizes well to a variety of downstream tasks. However, it is still
challenging to train video foundation models with billions of parameters. This
paper shows that video masked autoencoder (VideoMAE) is a scalable and general
self-supervised pre-trainer for building video foundation models. We scale the
VideoMAE in both model and data with a core design. Specifically, we present a
dual masking strategy for efficient pre-training, with an encoder operating on
a subset of video tokens and a decoder processing another subset of video
tokens. Although VideoMAE is very efficient due to high masking ratio in
encoder, masking decoder can still further reduce the overall computational
cost. This enables the efficient pre-training of billion-level models in video.
We also use a progressive training paradigm that involves an initial
pre-training on a diverse multi-sourced unlabeled dataset, followed by a
post-pre-training on a mixed labeled dataset. Finally, we successfully train a
video ViT model with a billion parameters, which achieves a new
state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and
89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2). In
addition, we extensively verify the pre-trained video ViT models on a variety
of downstream tasks, demonstrating their effectiveness as general video
representation learners.
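Dual masking can be illustrated at the level of token indices. This sketch makes two assumptions: the encoder uses a tube mask (the same spatial patches are kept in every frame), and, as a simplification, the decoder targets are a uniform random subset of the masked tokens. The function returns the tokens the encoder sees and the tokens the decoder reconstructs. The keep ratios (10% for the encoder, 50% of the masked tokens for the decoder) and the grid size are illustrative assumptions. The paper's actual decoder mask may be structured differently.

```python
import random


def tube_mask(num_frames, num_patches, keep_ratio, rng):
    """Encoder mask: keep the same spatial patches in every frame (tube masking)."""
    n_keep = max(1, round(num_patches * keep_ratio))
    kept = sorted(rng.sample(range(num_patches), n_keep))
    # Token index = frame * num_patches + patch.
    return [t * num_patches + p for t in range(num_frames) for p in kept]


def dual_masking(num_frames, num_patches, enc_keep=0.1, dec_keep=0.5, seed=0):
    """Return (encoder-visible tokens, decoder reconstruction targets)."""
    rng = random.Random(seed)
    visible = tube_mask(num_frames, num_patches, enc_keep, rng)
    visible_set = set(visible)
    masked = [i for i in range(num_frames * num_patches) if i not in visible_set]
    # Simplification: the decoder reconstructs a uniform random subset of masked tokens.
    n_dec = max(1, round(len(masked) * dec_keep))
    dec_targets = sorted(rng.sample(masked, n_dec))
    return visible, dec_targets


visible, dec_targets = dual_masking(num_frames=8, num_patches=196)
```

With 8 x 196 = 1,568 tokens, the encoder sees 160 tokens, and the decoder reconstructs 704 of the remaining 1,408. Neither stage has to process the full token set.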
[COMMENTS]
CVPR 2023 camera-ready version
[LINK]
http://arxiv.org/abs/2303.16727v1
[DATE]
2023-03-29 14:28:41+00:00
[CATEGORIES]
cs.LG
Machine Learning for Uncovering Biological Insights in Spatial Transcriptomics Data
[AUTHORS]
Alex J. Lee, Robert Cahill, Reza Abbasi-Asl
[LINK]
http://arxiv.org/abs/2303.16725v1
[DATE]
2023-03-29 14:22:08+00:00
[CATEGORIES]
cs.LG
Maximum likelihood method revisited: Gauge symmetry in Kullback–Leibler divergence and performance-guaranteed regularization
[AUTHORS]
Akihisa Ichiki
[ABSTRACT]
The maximum likelihood method is the best-known method for estimating the
probabilities behind the data. However, the conventional method obtains the
probability model closest to the empirical distribution, resulting in
overfitting. Regularization methods are then used to keep the model from
getting excessively close to this wrong probability, but little is known
systematically about their performance. The idea of regularization is similar to
error-correcting codes, which obtain optimal decoding by mixing suboptimal
solutions with an incorrectly received code. The optimal decoding in
error-correcting codes is achieved based on gauge symmetry. We propose a
theoretically guaranteed regularization in the maximum likelihood method by
focusing on a gauge symmetry in Kullback–Leibler divergence. In our
approach, we obtain the optimal model without the need to search for
hyperparameters frequently appearing in regularization.
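The overfitting claim can be made concrete for a categorical model. Maximizing the likelihood is equivalent to minimizing KL(empirical || model), so the maximum-likelihood estimate is just the vector of empirical frequencies, and it gives zero probability to unseen outcomes. The toy example below uses hypothetical data and a hypothetical true distribution. It shows that the resulting KL divergence from the true distribution is infinite, and that a generic mixture with the uniform distribution keeps it finite. That mixture is a stand-in regularizer with a hand-picked weight. It is not the gauge-symmetry method proposed in the paper, which is designed to avoid such hyperparameter searches.

```python
import math
from collections import Counter


def mle_categorical(samples, outcomes):
    """Maximum-likelihood estimate of a categorical model = empirical frequencies."""
    counts = Counter(samples)
    return [counts[o] / len(samples) for o in outcomes]


def kl(p, q):
    """KL(p || q) in nats; infinite if q puts zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def mix_with_uniform(q, weight):
    """Generic smoothing regularizer (not the paper's method): (1 - w) q + w / K."""
    k = len(q)
    return [(1.0 - weight) * qi + weight / k for qi in q]


outcomes = ["a", "b", "c"]
samples = ["a", "a", "a", "b"]   # hypothetical data; "c" is never observed
p_true = [0.5, 0.3, 0.2]         # hypothetical true distribution
mle = mle_categorical(samples, outcomes)      # [0.75, 0.25, 0.0]
smoothed = mix_with_uniform(mle, weight=0.3)  # weight chosen by hand
```

The need to pick `weight` by hand is exactly the kind of hyperparameter search the abstract says its approach avoids.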
[COMMENTS]
8 pages, 2 figures
[LINK]
http://arxiv.org/abs/2303.16721v1
[DATE]
2023-03-29 14:17:21+00:00
[CATEGORIES]
cs.LG
Topological Point Cloud Clustering
[AUTHORS]
Vincent P. Grande, Michael T. Schaub
[ABSTRACT]
We present Topological Point Cloud Clustering (TPCC), a new method to cluster
points in an arbitrary point cloud based on their contribution to global
topological features. TPCC synthesizes desirable features from spectral
clustering and topological data analysis and is based on the spectral
properties of a simplicial complex associated with the point cloud. As it
relies on sparse eigenvector computations, TPCC is as easy to interpret and
implement as spectral clustering. However, by focusing not just on a single
matrix associated with a graph created from the point cloud data, but on a
whole set of Hodge-Laplacians associated with an
appropriately constructed simplicial complex, we can leverage a far richer set
of topological features to characterize the data points within the point cloud
and benefit from the relative robustness of topological techniques against
noise. We test the performance of TPCC on both synthetic and real-world data
and compare it with classical spectral clustering.
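TPCC works with Hodge Laplacians of a simplicial complex built on the point cloud. This is a minimal sketch that assumes the usual oriented boundary matrices B1 (vertices x edges) and B2 (edges x triangles). With these, the edge Laplacian is L1 = B1^T B1 + B2 B2^T. On a toy complex with three vertices, the cycle around a hollow triangle lies in the kernel of L1, which signals a one-dimensional hole. Filling the triangle removes it. The construction of the complex from real point clouds and the eigenvector-based clustering step are not shown.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    cols = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def boundary_1(num_vertices, edges):
    """Vertices x edges incidence matrix; edge (u, v) with u < v is oriented u -> v."""
    B = [[0] * len(edges) for _ in range(num_vertices)]
    for k, (u, v) in enumerate(edges):
        B[u][k] = -1
        B[v][k] = 1
    return B


def boundary_2(edges, triangles):
    """Edges x triangles matrix using d[a,b,c] = [b,c] - [a,c] + [a,b]."""
    index = {e: k for k, e in enumerate(edges)}
    B = [[0] * len(triangles) for _ in edges]
    for t, (a, b, c) in enumerate(triangles):
        B[index[(b, c)]][t] = 1
        B[index[(a, c)]][t] = -1
        B[index[(a, b)]][t] = 1
    return B


def hodge_laplacian_1(num_vertices, edges, triangles):
    """L1 = B1^T B1 + B2 B2^T (the second term is zero when there are no triangles)."""
    B1 = boundary_1(num_vertices, edges)
    L = matmul(transpose(B1), B1)
    if triangles:
        B2 = boundary_2(edges, triangles)
        up = matmul(B2, transpose(B2))
        L = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(L, up)]
    return L


edges = [(0, 1), (1, 2), (0, 2)]
cycle = [1, 1, -1]  # walk 0 -> 1 -> 2 -> 0 around the triangle
hollow = hodge_laplacian_1(3, edges, [])
filled = hodge_laplacian_1(3, edges, [(0, 1, 2)])
print(matvec(hollow, cycle))  # [0, 0, 0]: the cycle is harmonic, i.e. a hole
print(matvec(filled, cycle))  # [3, 3, -3]: filling the triangle removes the hole
```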
[LINK]
http://arxiv.org/abs/2303.16716v1
[DATE]
2023-03-29 14:15:38+00:00
[CATEGORIES]
cs.LG
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness
[AUTHORS]
Xingjun Ma, Linxi Jiang, Hanxun Huang, Zejia Weng, James Bailey, Yu-Gang Jiang
[ABSTRACT]
Evaluating the robustness of a defense model is a challenging task in
adversarial robustness research. Obfuscated gradients have previously been
found to exist in many defense methods and cause a false signal of robustness.
In this paper, we identify a more subtle situation called Imbalanced Gradients
that can also cause overestimated adversarial robustness. The phenomenon of
imbalanced gradients occurs when the gradient of one term of the margin loss
dominates and pushes the attack towards a suboptimal direction. To exploit
imbalanced gradients, we formulate a Margin Decomposition (MD) attack that
decomposes a margin loss into individual terms and then explores the
attackability of these terms separately via a two-stage process. We also
propose a multi-targeted and ensemble version of our MD attack. By
investigating 24 defense models proposed since 2018, we find that 11 models are
susceptible to a certain degree of imbalanced gradients and our MD attack can
decrease their robustness evaluated by the best standalone baseline attack by
more than 1%. We also provide an in-depth investigation on the likely causes of
imbalanced gradients and effective countermeasures. Our code is available at
https://github.com/HanxunH/MDAttack.
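The following sketch illustrates what it means for one term to dominate. It takes the margin loss to be z_j - z_y, with j the highest-scoring wrong class, splits it into its two terms, and compares their input gradients. The sketch assumes a linear toy model z = Wx + b, for which the gradient of logit z_i is simply row W[i]. The weights are hypothetical, and this is not the MD attack itself.

```python
import math


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def l2_norm(g):
    return math.sqrt(sum(v * v for v in g))


def margin_terms(W, b, x, y):
    """Split the margin loss z_j - z_y (j = best wrong class) into its two terms.

    For a linear model z = W x + b the input gradient of z_i is just W[i], so the
    gradient of the z_j term is W[j] and that of the -z_y term is -W[y].
    """
    z = logits(W, b, x)
    j = max((i for i in range(len(z)) if i != y), key=lambda i: z[i])
    return {
        "margin": z[j] - z[y],
        "runner_up": j,
        "grad_norm_other": l2_norm(W[j]),
        "grad_norm_true": l2_norm([-w for w in W[y]]),
    }


# Hypothetical weights where the true-class logit is far more input-sensitive.
W = [[10.0, 0.0], [0.0, 0.1], [0.1, 0.1]]
b = [0.0, 0.0, 0.0]
terms = margin_terms(W, b, x=[1.0, 1.0], y=0)
imbalance = terms["grad_norm_true"] / terms["grad_norm_other"]  # roughly 70
```

Here, a gradient step on the full margin is driven almost entirely by the -z_y term. This is the situation the abstract's two-stage decomposition is meant to exploit.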
[COMMENTS]
To appear in Machine Learning
[LINK]
http://arxiv.org/abs/2006.13726v4
[DATE]
2023-03-29 13:57:28+00:00
[CATEGORIES]
cs.LG
FuNVol: A Multi-Asset Implied Volatility Market Simulator using Functional Principal Components and Neural SDEs
[AUTHORS]
Vedant Choudhary, Sebastian Jaimungal, Maxime Bergeron
[ABSTRACT]
Here, we introduce a new approach for generating sequences of implied
volatility (IV) surfaces across multiple assets that is faithful to historical
prices. We do so by combining functional data analysis, neural stochastic
differential equations (SDEs), and a probability integral transform penalty
that reduces model misspecification. We demonstrate that
learning the joint dynamics of IV surfaces and prices produces market scenarios
that are consistent with historical features and lie within the sub-manifold of
surfaces that are essentially free of static arbitrage. Finally, we demonstrate
that delta hedging using the simulated surfaces generates profit and loss (P&L)
distributions that are consistent with realised P&Ls.
[COMMENTS]
30 pages, 12 figures, 5 tables
[LINK]
http://arxiv.org/abs/2303.00859v2
[DATE]
2023-03-29 13:54:36+00:00
[CATEGORIES]
cs.LG
TraVaG: Differentially Private Trace Variant Generation Using GANs
[AUTHORS]
Majid Rafiei, Frederik Wangelik, Mahsa Pourbafrani, Wil M. P. van der Aalst
[ABSTRACT]
Process mining is rapidly growing in the industry. Consequently, privacy
concerns regarding sensitive and private information included in event data,
used by process mining algorithms, are becoming increasingly relevant.
State-of-the-art research mainly focuses on providing privacy guarantees, e.g.,
differential privacy, for trace variants that are used by the main process
mining techniques, e.g., process discovery. However, privacy preservation
techniques for releasing trace variants still do not fulfill all the
requirements of industry-scale usage. Moreover, providing privacy guarantees
when there exists a high rate of infrequent trace variants is still a
challenge. In this paper, we introduce TraVaG as a new approach for releasing
differentially private trace variants based on Generative Adversarial
Networks (GANs) that provides industry-scale benefits and enhances the level
of privacy guarantees when there exists a high ratio of infrequent variants.
Moreover, TraVaG overcomes shortcomings of conventional privacy preservation
techniques such as bounding the length of variants and introducing fake
variants. Experimental results on real-life event data show that our approach
outperforms state-of-the-art techniques in terms of privacy guarantees, plain
data utility preservation, and result utility preservation.
[LINK]
http://arxiv.org/abs/2303.16704v1
[DATE]
2023-03-29 13:54:32+00:00
[CATEGORIES]
cs.LG
Probabilistic inverse optimal control with local linearization for non-linear partially observable systems
[AUTHORS]
Dominik Straub, Matthias Schultheis, Heinz Koeppl, Constantin A. Rothkopf
[ABSTRACT]
Inverse optimal control methods can be used to characterize behavior in
sequential decision-making tasks. Most existing work, however, requires the
control signals to be known, or is limited to fully-observable or linear
systems. This paper introduces a probabilistic approach to inverse optimal
control for stochastic non-linear systems with missing control signals and
partial observability that unifies existing approaches. By using an explicit
model of the noise characteristics of the sensory and control systems of the
agent in conjunction with local linearization techniques, we derive an
approximate likelihood for the model parameters, which can be computed within a
single forward pass. We evaluate our proposed method on stochastic and
partially observable versions of classic control tasks, a navigation task, and a
manual reaching task. The proposed method has broad applicability, ranging from
imitation learning to sensorimotor neuroscience.
[LINK]
http://arxiv.org/abs/2303.16698v1
[DATE]
2023-03-29 13:51:06+00:00
[CATEGORIES]
cs.LG
Neuro-symbolic Rule Learning in Real-world Classification Tasks
[AUTHORS]
Kexin Gu Baugh, Nuri Cingillioglu, Alessandra Russo
[ABSTRACT]
Neuro-symbolic rule learning has attracted lots of attention as it offers
better interpretability than pure neural models and scales better than symbolic
rule learning. A recent approach named pix2rule proposes a neural Disjunctive
Normal Form (neural DNF) module to learn symbolic rules with feed-forward
layers. Although proved to be effective in synthetic binary classification,
pix2rule has not been applied to more challenging tasks such as multi-label and
multi-class classifications over real-world data. In this paper, we address
this limitation by extending the neural DNF module to (i) support rule learning
in real-world multi-class and multi-label classification tasks, (ii) enforce
the symbolic property of mutual exclusivity (i.e. predicting exactly one class)
in multi-class classification, and (iii) explore its scalability over large
inputs and outputs. We train a vanilla neural DNF model similar to pix2rule’s
neural DNF module for multi-label classification, and we propose a novel
extended model called neural DNF-EO (Exactly One) which enforces mutual
exclusivity in multi-class classification. We evaluate the classification
performance, scalability and interpretability of our neural DNF-based models,
and compare them against pure neural models and a state-of-the-art symbolic
rule learner named FastLAS. We demonstrate that our neural DNF-based models
perform similarly to neural networks, but provide better interpretability by
enabling the extraction of logical rules. Our models also scale well when the
rule search space grows in size, in contrast to FastLAS, which fails to learn
in multi-class classification tasks with 200 classes and in all multi-label
settings.
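A neural DNF is built from semi-symbolic nodes that act like AND or OR over inputs encoded as +1 (true) and -1 (false). In this sketch, the bias is set to delta * (max|w| - sum|w|), with delta = 1 for a conjunction and delta = -1 for a disjunction. That bias rule is our reading of the pix2rule design and should be treated as an assumption. Hand-set weights encode the toy rule (x1 AND NOT x2) OR (x2 AND x3). A trained model would learn these weights. The DNF-EO exactly-one constraint is not shown.

```python
import itertools
import math


def semi_symbolic(x, w, delta):
    """tanh(w . x + beta) with beta = delta * (max|w| - sum|w|) (assumed bias rule).

    delta = 1 gives conjunction-like behaviour and delta = -1 disjunction-like
    behaviour on inputs in {-1, +1}; a zero weight means the input is ignored.
    """
    abs_w = [abs(wi) for wi in w]
    beta = delta * (max(abs_w) - sum(abs_w))
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + beta)


def toy_dnf(x):
    """(x1 AND NOT x2) OR (x2 AND x3) with hand-set weights."""
    c1 = semi_symbolic(x, [6.0, -6.0, 0.0], delta=1.0)  # x1 AND NOT x2
    c2 = semi_symbolic(x, [0.0, 6.0, 6.0], delta=1.0)   # x2 AND x3
    return semi_symbolic([c1, c2], [6.0, 6.0], delta=-1.0)


truth_table = {}
for bits in itertools.product([False, True], repeat=3):
    x = [1.0 if bit else -1.0 for bit in bits]
    truth_table[bits] = toy_dnf(x) > 0.0
```

Reading off which weights are non-zero, and their signs, recovers the symbolic rule. This is the interpretability property the abstract refers to.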
[COMMENTS]
Accepted at AAAI-MAKE 2023
[LINK]
http://arxiv.org/abs/2303.16674v1
[DATE]
2023-03-29 13:27:14+00:00
[CATEGORIES]
cs.LG
A Byzantine-Resilient Aggregation Scheme for Federated Learning via Matrix Autoregression on Client Updates
[AUTHORS]
Gabriele Tolomei, Edoardo Gabrielli, Dimitri Belli, Vittorio Miori
[ABSTRACT]
In this work, we propose FLANDERS, a novel federated learning (FL)
aggregation scheme robust to Byzantine attacks. FLANDERS considers the local
model updates sent by clients at each FL round as a matrix-valued time series.
Then, it identifies malicious clients as outliers of this time series by
comparing actual observations with those estimated by a matrix autoregressive
forecasting model. Experiments conducted on several datasets under different FL
settings demonstrate that FLANDERS matches the robustness of the most powerful
baselines against Byzantine clients. Furthermore, FLANDERS remains highly
effective even under extremely severe attack scenarios, as opposed to existing
defense strategies.
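FLANDERS flags clients whose observed updates deviate from a forecast. The sketch below keeps that structure. It scores each client by the Euclidean distance between its forecast and its actual update, then averages only the lowest-scoring clients. However, it substitutes a naive last-value forecaster for the matrix autoregressive model. The forecaster, the number of dropped clients, and the toy updates are all assumptions.

```python
import math


def forecast_next(history):
    """Placeholder forecaster: each client's next update equals its previous one.

    FLANDERS uses a matrix autoregressive model here; this naive stand-in only
    shows where the forecast enters the outlier test.
    """
    return history[-1]


def anomaly_scores(history, current):
    predicted = forecast_next(history)
    return [math.dist(p, c) for p, c in zip(predicted, current)]


def robust_aggregate(history, current, num_to_drop):
    """Average the updates of the clients whose updates best match the forecast."""
    scores = anomaly_scores(history, current)
    ranked = sorted(range(len(current)), key=lambda i: scores[i])
    kept = sorted(ranked[: len(current) - num_to_drop])
    dim = len(current[0])
    mean = [sum(current[i][k] for i in kept) / len(kept) for k in range(dim)]
    return mean, kept


# history[r][i] is client i's (flattened) update in past round r; toy 2-D values.
history = [[[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]]
current = [[1.0, 1.0], [1.1, 0.9], [10.0, -10.0]]  # client 2 behaves maliciously
mean, kept = robust_aggregate(history, current, num_to_drop=1)  # kept == [0, 1]
```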
[LINK]
http://arxiv.org/abs/2303.16668v1
[DATE]
2023-03-29 13:22:20+00:00
[CATEGORIES]
cs.LG
Learning Flow Functions from Data with Applications to Nonlinear Oscillators
[AUTHORS]
Miguel Aguiar, Amritam Das, Karl H. Johansson
[ABSTRACT]
We describe a recurrent neural network (RNN) based architecture to learn the
flow function of a causal, time-invariant and continuous-time control system
from trajectory data. By restricting the class of control inputs to piecewise
constant functions, we show that learning the flow function is equivalent to
learning the input-to-state map of a discrete-time dynamical system. This
motivates the use of an RNN together with encoder and decoder networks which
map the state of the system to the hidden state of the RNN and back. We show
that the proposed architecture is able to approximate the flow function by
exploiting the system’s causality and time-invariance. The output of the
learned flow function model can be queried at any time instant. We
experimentally validate the proposed method using models of the Van der Pol and
FitzHugh-Nagumo oscillators. In both cases, the results demonstrate that the
architecture is able to closely reproduce the trajectories of these two
systems. For the Van der Pol oscillator, we further show that the trained model
generalises to the system’s response with a prolonged prediction time horizon
as well as control inputs outside the training distribution. For the
FitzHugh-Nagumo oscillator, we show that the model accurately captures the
input-dependent phenomena of excitability.
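The key reduction in the abstract is that, for piecewise-constant inputs, the continuous-time flow can be learned as a discrete-time input-to-state map x_{k+1} = phi(delta, x_k, u_k). The sketch below shows that reduction numerically on a controlled Van der Pol system. We assume mu = 1 and an additive input on the second state. Fixed-step RK4 stands in for the true flow. The sketch does not implement the RNN with encoder and decoder networks.

```python
def van_der_pol(x, u, mu=1.0):
    """Controlled Van der Pol field (assumed form): x1' = x2, x2' = mu(1 - x1^2)x2 - x1 + u."""
    x1, x2 = x
    return (x2, mu * (1.0 - x1 * x1) * x2 - x1 + u)


def rk4_step(f, x, u, h):
    """One classical Runge-Kutta step with the input u held constant."""
    k1 = f(x, u)
    k2 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)), u)
    k3 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)), u)
    k4 = f(tuple(xi + h * ki for xi, ki in zip(x, k3)), u)
    return tuple(
        xi + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
        for xi, p, q, r, s in zip(x, k1, k2, k3, k4)
    )


def flow(x0, u, duration, h=0.01):
    """Numerical stand-in for the flow phi(duration, x0, u) under a constant input u."""
    x = tuple(x0)
    for _ in range(round(duration / h)):
        x = rk4_step(van_der_pol, x, u, h)
    return x


def discrete_time_rollout(x0, inputs, delta):
    """Piecewise-constant inputs turn the flow into x_{k+1} = phi(delta, x_k, u_k)."""
    states = [tuple(x0)]
    for u in inputs:
        states.append(flow(states[-1], u, delta))
    return states


# Holding u fixed, two half-length flows compose into one full-length flow.
direct = flow((1.0, 0.0), 0.5, 1.0)
composed = flow(flow((1.0, 0.0), 0.5, 0.5), 0.5, 0.5)
states = discrete_time_rollout((1.0, 0.0), inputs=[0.0, 1.0, -1.0], delta=0.5)
```

The composition check reflects time-invariance. The rollout is the discrete-time map that a recurrent model would be trained to imitate.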
[LINK]
http://arxiv.org/abs/2303.16656v1
[DATE]
2023-03-29 13:04:04+00:00
[CATEGORIES]
cs.LG
Targeted Adversarial Attacks on Wind Power Forecasts
[AUTHORS]
René Heinrich, Christoph Scholz, Stephan Vogt, Malte Lehna
[ABSTRACT]
In recent years, researchers proposed a variety of deep learning models for
wind power forecasting. These models predict the wind power generation of wind
farms or entire regions more accurately than traditional machine learning
algorithms or physical models. However, recent research has shown that deep
learning models can often be manipulated by adversarial attacks. Since wind
power forecasts are essential for the stability of modern power systems, it is
important to protect them from this threat. In this work, we investigate the
vulnerability of two different forecasting models to targeted, semi-targeted,
and untargeted adversarial attacks. We consider a Long Short-Term Memory (LSTM)
network for predicting the power generation of a wind farm and a Convolutional
Neural Network (CNN) for forecasting the wind power generation throughout
Germany. Moreover, we propose the Total Adversarial Robustness Score (TARS), an
evaluation metric for quantifying the robustness of regression models to
targeted and semi-targeted adversarial attacks. It assesses the impact of
attacks on the model’s performance, as well as the extent to which the
attacker’s goal was achieved, by assigning a score between 0 (very vulnerable)
and 1 (very robust). In our experiments, the LSTM forecasting model was fairly
robust and achieved a TARS value of over 0.81 for all adversarial attacks
investigated. The CNN forecasting model only achieved TARS values below 0.06
when trained ordinarily, and was thus very vulnerable. Yet, its robustness
could be significantly improved by adversarial training, which always resulted
in a TARS above 0.46.
[COMMENTS]
20 pages, including appendix, 12 figures
[LINK]
http://arxiv.org/abs/2303.16633v1
[DATE]
2023-03-29 12:43:36+00:00
[CATEGORIES]
cs.LG
Class-Guided Image-to-Image Diffusion: Cell Painting from Brightfield Images with Class Labels
[AUTHORS]
Jan Oscar Cross-Zamirski, Praveen Anand, Guy Williams, Elizabeth Mouchet, Yinhai Wang, Carola-Bibiane Schönlieb
[ABSTRACT]
Image-to-image reconstruction problems with free or inexpensive metadata in
the form of class labels appear often in biological and medical image domains.
Existing text-guided or style-transfer image-to-image approaches do not
translate to datasets where additional information is provided as discrete
classes. We introduce and implement a model which combines image-to-image and
class-guided denoising diffusion probabilistic models. We train our model on a
real-world dataset of microscopy images used for drug discovery, with and
without incorporating metadata labels. By exploring the properties of
image-to-image diffusion with relevant labels, we show that class-guided
image-to-image diffusion can improve the meaningful content of the
reconstructed images and outperform the unguided model in useful downstream
tasks.
[LINK]
http://arxiv.org/abs/2303.08863v2
[DATE]
2023-03-29 12:42:48+00:00
[CATEGORIES]
cs.LG
Diffusion Denoised Smoothing for Certified and Adversarial Robust Out-Of-Distribution Detection
[AUTHORS]
Nicola Franco, Daniel Korth, Jeanette Miriam Lorenz, Karsten Roscher, Stephan Guennemann
[ABSTRACT]
As the use of machine learning continues to expand, the importance of
ensuring its safety cannot be overstated. A key concern in this regard is the
ability to identify whether a given sample is from the training distribution,
or is an “Out-Of-Distribution” (OOD) sample. In addition, adversaries can
manipulate OOD samples in ways that lead a classifier to make a confident
prediction. In this study, we present a novel approach for certifying the
robustness of OOD detection within a $\ell_2$-norm around the input, regardless
of network architecture and without the need for specific components or
additional training. Further, we improve current techniques for detecting
adversarial attacks on OOD samples, while providing high levels of certified
and adversarial robustness on in-distribution samples. The average of all OOD
detection metrics on CIFAR10/100 shows an increase of $\sim 13 \% / 5\%$
relative to previous approaches.
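As background on l2 certification by noise smoothing, the standard Gaussian randomized-smoothing bound (Cohen et al., 2019) certifies a radius R = sigma * Phi^{-1}(p_A). It applies when the top class has probability at least p_A > 1/2 under N(0, sigma^2 I) noise. The helper below computes that radius. The paper builds its certificate for OOD detection from a diffusion-denoised smoothed model, and that construction may differ, so treat this only as an illustration of the general mechanism.

```python
from statistics import NormalDist


def certified_l2_radius(p_top, sigma):
    """Gaussian randomized-smoothing radius sigma * Phi^{-1}(p_top).

    p_top is a lower bound on the probability that the smoothed model returns its
    top class under N(0, sigma^2 I) noise; nothing is certified if p_top <= 1/2.
    """
    if not 0.0 <= p_top < 1.0:
        raise ValueError("p_top must lie in [0, 1)")
    if p_top <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_top)


radius = certified_l2_radius(NormalDist().cdf(2.0), sigma=0.25)  # about 0.5
```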
[LINK]
http://arxiv.org/abs/2303.14961v2
[DATE]
2023-03-29 12:31:06+00:00
[CATEGORIES]
cs.LG
Fairlearn: Assessing and Improving Fairness of AI Systems
[AUTHORS]
Hilde Weerts, Miroslav Dudík, Richard Edgar, Adrin Jalali, Roman Lutz, Michael Madaio
[LINK]
http://arxiv.org/abs/2303.16626v1
[DATE]
2023-03-29 12:28:49+00:00
[CATEGORIES]
cs.LG
Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant
[AUTHORS]
John Chiang
[COMMENTS]
The basic work of this paper, $\texttt{quadratic gradient}$ and the
enhanced full batch NAG, was nearly finished in September 2019. The initial
version of this paper was written in April 2020, rejected by ICANN 2020. The
enhanced mini-batch NAG was introduced into this paper in September 2020 and
later rejected by a special issue of the journal FGCS 2020
[LINK]
http://arxiv.org/abs/2201.10838v3
[DATE]
2023-03-29 12:26:32+00:00
[CATEGORIES]
cs.LG
What Does the Gradient Tell When Attacking the Graph Structure
[AUTHORS]
Zihan Liu, Ge Wang, Yun Luo, Stan Z. Li
[ABSTRACT]
Recent research has revealed that Graph Neural Networks (GNNs) are
susceptible to adversarial attacks targeting the graph structure. A malicious
attacker can manipulate a limited number of edges, given the training labels,
to impair the victim model’s performance. Previous empirical studies indicate
that gradient-based attackers tend to add edges rather than remove them. In
this paper, we present a theoretical demonstration revealing that attackers
tend to increase inter-class edges due to the message passing mechanism of
GNNs, which explains some previous empirical observations. By connecting
dissimilar nodes, attackers can more effectively corrupt node features, making
such attacks more advantageous. However, we demonstrate that the inherent
smoothness of GNN’s message passing tends to blur node dissimilarity in the
feature space, leading to the loss of crucial information during the forward
process. To address this issue, we propose a novel surrogate model with
multi-level propagation that preserves the node dissimilarity information. This
model parallelizes the propagation of unaggregated raw features and multi-hop
aggregated features, while introducing batch normalization to enhance the
dissimilarity in node representations and counteract the smoothness resulting
from topological aggregation. Our experiments show significant improvement with
our approach. Furthermore, both theoretical and experimental evidence suggests
that adding inter-class edges constitutes an easily observable attack pattern.
We propose an innovative attack loss that balances attack effectiveness and
imperceptibility, sacrificing some attack effectiveness to attain greater
imperceptibility. We also provide experiments to validate the compromise
performance achieved through this attack loss.
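The surrogate's multi-level propagation can be sketched as follows. Raw features X are concatenated with k-hop aggregated features A_hat X and A_hat^2 X, and each feature column is then normalized. This keeps the dissimilarity information that pure k-hop smoothing would blur. The symmetric normalization A_hat = D^{-1/2}(A + I)D^{-1/2}, the number of hops, and the parameter-free batch normalization are assumptions made for illustration. Learnable weights are omitted.

```python
import math


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def normalized_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, a common GCN-style normalization (assumed)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def batch_norm_columns(H, eps=1e-5):
    """Standardize each feature column (batch norm without learnable scale/shift)."""
    n = len(H)
    normed = []
    for col in zip(*H):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        normed.append([(v - mean) / math.sqrt(var + eps) for v in col])
    return [list(row) for row in zip(*normed)]


def multi_level_features(adj, X, hops=2):
    """Concatenate [X, A_hat X, ..., A_hat^hops X] per node, then normalize columns."""
    P = normalized_adjacency(adj)
    levels = [X]
    for _ in range(hops):
        levels.append(matmul(P, levels[-1]))
    stacked = [sum((list(level[i]) for level in levels), []) for i in range(len(X))]
    return batch_norm_columns(stacked)


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy path graph 0 - 1 - 2
X = [[1.0], [0.0], [0.0]]                # hypothetical one-dimensional node features
features = multi_level_features(adj, X)  # 3 nodes x 3 columns (raw, 1-hop, 2-hop)
```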
[LINK]
http://arxiv.org/abs/2208.12815v2
[DATE]
2023-03-29 12:19:49+00:00
[CATEGORIES]
cs.LG
Multinomial Logistic Regression Algorithms via Quadratic Gradient
[AUTHORS]
John Chiang
[COMMENTS]
There is a good chance that the enhanced gradient methods for
multiclass LR could be used to train classification neural networks via
the softmax activation and the cross-entropy loss
[LINK]
http://arxiv.org/abs/2208.06828v2
[DATE]
2023-03-29 12:10:09+00:00
[CATEGORIES]
cs.LG
Quadratic Gradient: Combining Gradient Algorithms and Newton’s Method as One
[AUTHORS]
John Chiang
[COMMENTS]
In this work, we proposed an enhanced Adam method via quadratic
gradient and applied the quadratic gradient to the general numerical
optimization problems. The quadratic gradient can indeed be used to build
enhanced gradient methods for general optimization problems. There is a good
chance that quadratic gradient can also be applied to quasi-Newton methods,
such as the famous BFGS method
[LINK]
http://arxiv.org/abs/2209.03282v2
[DATE]
2023-03-29 12:05:23+00:00
[CATEGORIES]
cs.LG
Multi-Viewpoint and Multi-Evaluation with Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability
[AUTHORS]
Qinglai Wei, Diancheng Chen, Beiming Yuan
[LINK]
http://arxiv.org/abs/2210.14914v2
[DATE]
2023-03-29 11:46:27+00:00
[CATEGORIES]
cs.LG
Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery
[AUTHORS]
Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci
[LINK]
http://arxiv.org/abs/2303.15975v2
[DATE]
2023-03-29 11:46:22+00:00
[CATEGORIES]
cs.LG
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
[AUTHORS]
Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould
[COMMENTS]
12 pages, 5 figures
[LINK]
http://arxiv.org/abs/2303.16604v1
[DATE]
2023-03-29 11:37:41+00:00
[CATEGORIES]
cs.LG
Compute and Energy Consumption Trends in Deep Learning Inference
[AUTHORS]
Radosvet Desislavov, Fernando Martínez-Plumed, José Hernández-Orallo
[ABSTRACT]
The progress of some AI paradigms such as deep learning is said to be linked
to an exponential growth in the number of parameters. There are many studies
corroborating these trends, but does this translate into an exponential
increase in energy consumption? In order to answer this question we focus on
inference costs rather than training costs, as the former account for most of
the computing effort, solely because of the multiplicative factors. Also, apart
from algorithmic innovations, we account for more specific and powerful
hardware (leading to higher FLOPS) that is usually accompanied by important
energy efficiency optimisations. We also move the focus from the first
implementation of a breakthrough paper towards the consolidated version of the
techniques one or two years later. Under this distinctive and comprehensive
perspective, we study relevant models in the areas of computer vision and
natural language processing: for a sustained increase in performance we see a
much softer growth in energy consumption than previously anticipated. The only
caveat is, yet again, the multiplicative factor, as future AI increases
penetration and becomes more pervasive.
[COMMENTS]
For a revised version and its published version refer to: Desislavov,
Radosvet, Fernando Martínez-Plumed, and José Hernández-Orallo. Trends
in AI inference energy consumption: Beyond the performance-vs-parameter laws
of deep learning. Sustainable Computing: Informatics and Systems, Volume 38,
April 2023. (https://doi.org/10.1016/j.suscom.2023.100857)
[LINK]
http://arxiv.org/abs/2109.05472v2
[DATE]
2023-03-29 11:34:30+00:00
[CATEGORIES]
cs.LG
Federated Learning in MIMO Satellite Broadcast System
[AUTHORS]
Raphael Pinard, Mitra Hassani, Wayne Lemieux
[ABSTRACT]
Federated learning (FL) is a type of distributed machine learning at the
wireless edge that preserves the privacy of clients’ data from adversaries and
even the central server. Existing federated learning approaches either use (i)
secure multiparty computation (SMC) which is vulnerable to inference or (ii)
differential privacy which may decrease the test accuracy given a large number
of parties with relatively small amounts of data each. To tackle the problems
of these existing methods, in this paper we incorporate federated learning
into the inner workings of MIMO systems.
[LINK]
http://arxiv.org/abs/2303.16603v1
[DATE]
2023-03-29 11:33:51+00:00
[CATEGORIES]
cs.LG
Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation
[AUTHORS]
Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov
[ABSTRACT]
This paper provides a finite-time analysis of linear stochastic approximation
(LSA) algorithms with fixed step size, a core method in statistics and machine
learning. LSA is used to compute approximate solutions of a $d$-dimensional
linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$ for which
$(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by
(asymptotically) unbiased observations
$\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. We consider here
the case where $\{Z_n\}_{n \in \mathbb{N}}$ is an i.i.d. sequence or a
uniformly geometrically ergodic Markov chain. We derive $p$-th moment and
high-probability deviation bounds for the iterates defined by LSA and its
Polyak-Ruppert-averaged version. Our finite-time instance-dependent bounds for
the averaged LSA iterates are sharp in the sense that the leading term we
obtain coincides with the local asymptotic minimax limit. Moreover, the
remainder terms of our bounds admit a tight dependence on the mixing time
$t_{\operatorname{mix}}$ of the underlying chain and the norm of the noise
variables. We emphasize that our result requires the SA step size to scale only
with the logarithm of the problem dimension $d$.
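A one-dimensional instance makes the LSA recursion and its Polyak-Ruppert average concrete. The sketch assumes i.i.d. noisy observations A_n = 2 + noise and b_n = 1 + noise, so that theta* = 0.5. It also assumes a fixed step size of 0.05 and uses a scalar theta for brevity. The paper itself treats the d-dimensional case and Markovian noise.

```python
import random


def lsa_polyak_ruppert(sample, theta0, step, n_iter, seed=0):
    """Scalar LSA with a fixed step size; returns (last iterate, averaged iterate)."""
    rng = random.Random(seed)
    theta = theta0
    running_sum = 0.0
    for _ in range(n_iter):
        a_n, b_n = sample(rng)
        # LSA update: theta <- theta - step * (A_n * theta - b_n)
        theta = theta - step * (a_n * theta - b_n)
        running_sum += theta
    return theta, running_sum / n_iter


def noisy_pair(rng):
    """Hypothetical unbiased observations of (A, b) = (2, 1), so theta* = b / A = 0.5."""
    return 2.0 + rng.gauss(0.0, 0.1), 1.0 + rng.gauss(0.0, 0.1)


last, averaged = lsa_polyak_ruppert(noisy_pair, theta0=0.0, step=0.05, n_iter=20000)
```

The last iterate keeps fluctuating at a level set by the fixed step size. The running average concentrates much more tightly around theta*, and this averaged estimator is the one whose deviations the abstract bounds.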
[LINK]
http://arxiv.org/abs/2207.04475v2
[DATE]
2023-03-29 11:11:31+00:00
[CATEGORIES]
cs.LG
Poster: Link between Bias, Node Sensitivity and Long-Tail Distribution in trained DNNs
[AUTHORS]
Mahum Naseer, Muhammad Shafique
[COMMENTS]
To appear at the 16th IEEE International Conference on Software
Testing, Verification and Validation (ICST 2023), Dublin, Ireland
[LINK]
http://arxiv.org/abs/2303.16589v1
[DATE]
2023-03-29 10:49:31+00:00
[CATEGORIES]
cs.LG
Quantum Deep Hedging
[AUTHORS]
El Amine Cherrat, Snehal Raj, Iordanis Kerenidis, Abhishek Shekhar, Ben Wood, Jon Dee, Shouvanik Chakrabarti, Richard Chen, Dylan Herman, Shaohan Hu, Pierre Minssen, Ruslan Shaydulin, Yue Sun, Romina Yalovetzky, Marco Pistoia
[LINK]
http://arxiv.org/abs/2303.16585v1
[DATE]
2023-03-29 10:42:50+00:00
[CATEGORIES]
cs.LG
Improved Kernel Alignment Regret Bound for Online Kernel Learning
[AUTHORS]
Junfan Li, Shizhong Liao
[LINK]
http://arxiv.org/abs/2212.12989v2
[DATE]
2023-03-29 10:37:40+00:00
[CATEGORIES]
cs.LG
Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
[AUTHORS]
Mohammed Alloulah, Maximilian Arnold
[ABSTRACT]
Next generation cellular networks will implement radio sensing functions
alongside customary communications, thereby enabling unprecedented worldwide
sensing coverage outdoors. Deep learning has revolutionised computer vision but
has had limited application to radio perception tasks, in part due to lack of
systematic datasets and benchmarks dedicated to the study of the performance
and promise of radio sensing. To address this gap, we present MaxRay: a
synthetic radio-visual dataset and benchmark that facilitate precise target
localisation in radio. We further propose to learn to localise targets in radio
without supervision by extracting self-coordinates from radio-visual
correspondence. We use such self-supervised coordinates to train a radio
localiser network. We characterise our performance against a number of
state-of-the-art baselines. Our results indicate that accurate radio target
localisation can be automatically learned from paired radio-visual data without
labels, which is important for empirical data. This opens the door for vast
data scalability and may prove key to realising the promise of robust radio
sensing atop a unified communication-perception cellular infrastructure.
Dataset will be hosted on IEEE DataPort.
[COMMENTS]
To appear in IEEE/CVF CVPR ‘23
[LINK]
http://arxiv.org/abs/2206.06424v4
[DATE]
2023-03-29 10:11:26+00:00
[CATEGORIES]
cs.LG
A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning
[AUTHORS]
Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohamed Elhoseiny
[ABSTRACT]
With the success of pretraining techniques in representation learning, a
number of continual learning methods based on pretrained models have been
proposed. Some of these methods design continual learning mechanisms on the
pre-trained representations and only allow minimum updates or even no updates
of the backbone models during the training of continual learning. In this
paper, we question whether the complexity of these models is needed to achieve
good performance by comparing them to a simple baseline that we designed. We
argue that the pretrained feature extractor itself can be strong enough to
achieve a competitive or even better continual learning performance on
Split-CIFAR100 and CoRe 50 benchmarks. To validate this, we design a very
simple baseline that 1) uses the frozen pretrained model to extract image
features for every class encountered during the continual learning stage and
computes their corresponding mean features on the training data, and 2) predicts
the class of an input from the nearest-neighbor distance between the test sample
and the mean features of the classes, i.e., a Nearest Mean Classifier (NMC). This
baseline is single-headed, exemplar-free, and can be task-free (by updating the
means continually). This baseline achieved 88.53% on 10-Split-CIFAR-100,
surpassing most state-of-the-art continual learning methods that are all
initialized using the same pretrained transformer model. We hope our baseline
may encourage future progress in designing learning systems that can
continually add quality to the learning representations even if they started
from some pretrained weights.
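The nearest-mean procedure described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: features are plain lists of floats standing in for outputs of a frozen pretrained backbone, and the class name `NearestMeanClassifier` and its methods are hypothetical. Means are kept as running sums and counts, which is what makes the classifier exemplar-free and lets it keep updating without task boundaries.

```python
import math


class NearestMeanClassifier:
    """Hypothetical sketch of an exemplar-free nearest-mean classifier.

    Features are assumed to come from a frozen pretrained extractor; here they
    are plain lists of floats. Only a running sum and a count are stored per
    class, so no past samples are kept.
    """

    def __init__(self):
        self.sums = {}    # class label -> running sum of feature vectors
        self.counts = {}  # class label -> number of samples seen

    def update(self, feature, label):
        # Accumulate a running sum; the class mean is sum / count.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(feature)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], feature)]
        self.counts[label] += 1

    def mean(self, label):
        return [s / self.counts[label] for s in self.sums[label]]

    def predict(self, feature):
        # Return the class whose mean feature is closest in Euclidean distance.
        return min(self.sums, key=lambda c: math.dist(feature, self.mean(c)))
```

Because `update` can be called at any time for any class, the same object serves both the task-incremental and the task-free settings.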
[COMMENTS]
6 pages, Workshop on Distribution Shifts 2022, Code available at
https://github.com/Pauljanson002/pretrained-cl.git
[LINK]
http://arxiv.org/abs/2210.04428v2
[DATE]
2023-03-29 10:05:04+00:00
[CATEGORIES]
cs.LG
Cooperative Retriever and Ranker in Deep Recommenders
[AUTHORS]
Xu Huang, Defu Lian, Jin Chen, Zheng Liu, Xing Xie, Enhong Chen
[ABSTRACT]
Deep recommender systems (DRS) are intensively applied in modern web
services. To deal with the massive web contents, DRS employs a two-stage
workflow: retrieval and ranking, to generate its recommendation results. The
retriever aims to efficiently select a small set of relevant candidates from
the entire item corpus, while the ranker, usually more precise but more
time-consuming, further refines the best items from the retrieved
candidates. Traditionally, the two components are trained either independently
or within a simple cascading pipeline, which tends to result in poor
collaboration between them. Although some recent works have proposed training
the retriever and ranker jointly, severe limitations remain: item distribution
shift between training and inference, false negatives, and misalignment of
ranking order. Effective collaboration between the retriever and ranker
therefore remains to be explored.
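For readers unfamiliar with the two-stage workflow, here is a minimal sketch of a retrieve-then-rank cascade, assuming dot-product retrieval over item vectors and an arbitrary ranker score supplied as a function. All names are hypothetical, and this is the simple cascading pipeline the abstract criticizes, not the cooperative training the paper proposes.

```python
import heapq


def retrieve(query_vec, item_vecs, k):
    """Stage 1 (retriever): cheap dot-product scoring over all items; keep the top-k ids."""
    scores = {item: sum(q * v for q, v in zip(query_vec, vec))
              for item, vec in item_vecs.items()}
    return heapq.nlargest(k, scores, key=scores.get)


def rank(candidates, ranker_score, n):
    """Stage 2 (ranker): apply a more expensive scorer only to the retrieved candidates."""
    return sorted(candidates, key=ranker_score, reverse=True)[:n]


def recommend(query_vec, item_vecs, ranker_score, k=3, n=2):
    # The ranker never sees items the retriever dropped.
    return rank(retrieve(query_vec, item_vecs, k), ranker_score, n)
```

An item the ranker would score highly can never be recommended if the retriever discards it, which is one concrete way the two stages can collaborate poorly when trained in isolation.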
[COMMENTS]
12 pages, 4 figures, WWW’23
[LINK]
http://arxiv.org/abs/2206.14649v2
[DATE]
2023-03-29 10:01:09+00:00
[CATEGORIES]
cs.LG
PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-Performance Cloud Removal from Multi-temporal Satellite Imagery
[AUTHORS]
Xuechao Zou, Kai Li, Junliang Xing, Pin Tao, Yachao Cui
[ABSTRACT]
Satellite imagery analysis plays a vital role in remote sensing, but the
information loss caused by cloud cover seriously hinders its application. This
study presents a high-performance cloud removal architecture called Progressive
Multi-scale Attention Autoencoder (PMAA), which simultaneously leverages global
and local information. It mainly consists of a cloud detection backbone and a
cloud removal module. The cloud detection backbone uses cloud masks to
reinforce cloudy areas to prompt the cloud removal module. The cloud removal
module mainly comprises a novel Multi-scale Attention Module (MAM) and a Local
Interaction Module (LIM). PMAA establishes the long-range dependency of
multi-scale features using MAM and modulates the reconstruction of the
fine-grained details using LIM, allowing for the simultaneous representation of
fine- and coarse-grained features at the same level. With the help of diverse
and multi-scale feature representation, PMAA outperforms the previous
state-of-the-art model CTGAN consistently on the Sen2_MTC_Old and Sen2_MTC_New
datasets. Furthermore, PMAA has a considerable efficiency advantage, with only
0.5% and 14.6% of the parameters and computational complexity of CTGAN,
respectively. These extensive results highlight the potential of PMAA as a
lightweight cloud removal network suitable for deployment on edge devices. We
will release the code and trained models to facilitate the study in this
direction.
[COMMENTS]
8 pages, 5 figures
[LINK]
http://arxiv.org/abs/2303.16565v1
[DATE]
2023-03-29 09:47:48+00:00
[CATEGORIES]
cs.LG
Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
[AUTHORS]
Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
[ABSTRACT]
We study building a multi-task agent in Minecraft. Without human
demonstrations, solving long-horizon tasks in this open-ended environment with
reinforcement learning (RL) is extremely sample inefficient. To tackle the
challenge, we decompose solving Minecraft tasks into learning basic skills and
planning over the skills. We propose three types of fine-grained basic skills
in Minecraft, and use RL with intrinsic rewards to accomplish basic skills with
high success rates. For skill planning, we use Large Language Models to find
the relationships between skills and build a skill graph in advance. When the
agent is solving a task, our skill search algorithm walks on the skill graph
and generates the proper skill plans for the agent. In experiments, our method
accomplishes 24 diverse Minecraft tasks, many of which require executing more
than 10 skills in sequence. Our method outperforms baselines in most
tasks by a large margin. The project’s website and code can be found at
https://sites.google.com/view/plan4mc.
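The skill search can be pictured as a depth-first walk over a prerequisite graph. The sketch below is a simplified stand-in, assuming an acyclic graph that maps each skill to the skills it depends on; the skill names, the function `plan_skills`, and the omission of item quantities are our own simplifications, not Plan4MC's algorithm.

```python
def plan_skills(target, prerequisites, done=None):
    """Return an execution order for `target` by depth-first search on a skill graph.

    `prerequisites` maps each skill to the skills that must be completed first
    (a hypothetical, acyclic graph). Each skill appears once, after all of its
    prerequisites; skills listed in `done` are treated as already completed.
    """
    order = []
    visited = set() if done is None else set(done)

    def visit(skill):
        if skill in visited:
            return
        visited.add(skill)
        for pre in prerequisites.get(skill, []):
            visit(pre)
        order.append(skill)  # post-order: prerequisites are appended first

    visit(target)
    return order
```

With a hypothetical graph in which a wooden pickaxe needs planks, sticks, and a crafting table, and planks need logs, the returned plan starts with harvesting logs and ends with crafting the pickaxe; skills passed in `done` are skipped.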
[COMMENTS]
19 pages
[LINK]
http://arxiv.org/abs/2303.16563v1
[DATE]
2023-03-29 09:45:50+00:00
[CATEGORIES]
cs.LG
Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
[AUTHORS]
Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay
[ABSTRACT]
Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors, democratizing
access to robotics. However, current LfD frameworks are capable of neither fast
adaptation to heterogeneous human demonstrations nor large-scale deployment
in ubiquitous robotics applications. In this paper, we propose a novel LfD
framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our
approach (1) leverages learned strategies to construct policy mixtures for fast
adaptation to new demonstrations, allowing for quick end-user personalization;
(2) distills common knowledge across demonstrations, achieving accurate task
inference; and (3) expands its model only when needed in lifelong deployments,
maintaining a concise set of prototypical strategies that can approximate all
behaviors via policy mixtures. We empirically validate that FLAIR achieves
adaptability (i.e., the robot adapts to heterogeneous, user-specific task
preferences), efficiency (i.e., the robot achieves sample-efficient
adaptation), and scalability (i.e., the model grows sublinearly with the number
of demonstrations while maintaining high performance). FLAIR surpasses
benchmarks across three control tasks with an average 57% improvement in policy
returns and an average 78% fewer episodes required for demonstration modeling
using policy mixtures. Finally, we demonstrate the success of FLAIR in a table
tennis task and find users rate FLAIR as having higher task (p<.05) and
personalization (p<.05) performance.
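To make the idea of a policy mixture concrete, the sketch below fits mixture weights over fixed prototype policies to a new demonstration. It assumes scalar states and actions and a squared action-matching loss, both chosen for illustration; FLAIR's actual objective and its inverse-RL machinery are not reproduced. Exponentiated-gradient updates keep the weights non-negative and summing to one.

```python
import math


def fit_mixture_weights(prototypes, demos, lr=0.1, steps=500):
    """Fit simplex weights so a weighted mix of prototype policies matches demos.

    prototypes: list of callables, state -> action (scalar, for simplicity)
    demos: list of (state, action) pairs from a new demonstrator
    Assumed loss: mean squared error between mixture action and demo action.
    """
    k = len(prototypes)
    w = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(steps):
        grads = [0.0] * k
        for s, a in demos:
            preds = [p(s) for p in prototypes]
            err = sum(wi * pi for wi, pi in zip(w, preds)) - a
            for i in range(k):
                grads[i] += 2.0 * err * preds[i] / len(demos)
        # Exponentiated-gradient step, then renormalize onto the simplex.
        w = [wi * math.exp(-lr * g) for wi, g in zip(w, grads)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w
```

With prototypes `s -> s` and `s -> -s` and a demonstrator that acts as `0.5 * s`, the weights converge to about 0.75 and 0.25, i.e. the new behavior is explained as a mixture without training a new policy.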
[LINK]
http://arxiv.org/abs/2209.11908v5
[DATE]
2023-03-29 09:22:21+00:00
[CATEGORIES]
cs.LG
COVID-19 Detection Using Segmentation, Region Extraction and Classification Pipeline
[AUTHORS]
Kenan Morani
[LINK]
http://arxiv.org/abs/2210.02992v4
[DATE]
2023-03-29 09:18:05+00:00
[CATEGORIES]
cs.LG
Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters
[AUTHORS]
Deyue Li
[ABSTRACT]
This paper studies an infinite-horizon optimal control problem for a
discrete-time linear system and quadratic criteria, both with random parameters
that are independent and identically distributed over time. In this
general setting, we apply the policy gradient method, a reinforcement learning
technique, to search for the optimal control without requiring knowledge of
the statistics of the parameters. We investigate the sub-Gaussianity
of the state process and establish a global linear convergence guarantee for
this approach based on assumptions that are weaker and easier to verify than
those of existing results. Numerical experiments are presented to illustrate
our result.
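A scalar toy instance helps show what model-free policy gradient means here. The sketch assumes the dynamics x_{t+1} = a_t x_t + b_t u_t with i.i.d. a_t ~ U(0.9, 1.3) and b_t ~ U(0.8, 1.2), a linear feedback u_t = -k x_t, and a central finite-difference gradient of a sampled cost; the distributions, horizon, and step size are illustrative, and the paper's gradient estimator and matrix-valued setting are not reproduced. The optimizer only ever sees sampled parameters, never their distribution.

```python
import random


def make_samples(n_rollouts=200, horizon=25, seed=0):
    """Pre-draw i.i.d. (a_t, b_t) pairs, reused for every gain (common random numbers)."""
    rng = random.Random(seed)
    return [[(rng.uniform(0.9, 1.3), rng.uniform(0.8, 1.2)) for _ in range(horizon)]
            for _ in range(n_rollouts)]


def estimated_cost(k, samples, q=1.0, r=1.0, x0=1.0):
    """Monte Carlo estimate of sum_t (q x_t^2 + r u_t^2) under u_t = -k x_t."""
    total = 0.0
    for rollout in samples:
        x = x0
        for a, b in rollout:
            u = -k * x
            total += q * x * x + r * u * u
            x = a * x + b * u  # random linear dynamics
    return total / len(samples)


def policy_gradient(k0, samples, lr=0.05, steps=100, h=1e-4):
    """Gradient descent on the feedback gain using a finite-difference gradient."""
    k = k0
    for _ in range(steps):
        grad = (estimated_cost(k + h, samples) - estimated_cost(k - h, samples)) / (2 * h)
        k -= lr * grad
    return k
```

Starting from the stabilizing gain k = 1.2, the estimate should settle near k ≈ 0.70, where the expected cost (q + r k^2) / (1 - E[(a - b k)^2]) is smallest for these distributions. Reusing the same samples for every gain keeps the finite-difference estimate from being swamped by sampling noise.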
[COMMENTS]
55 pages, 3 figures
[LINK]
http://arxiv.org/abs/2303.16548v1
[DATE]
2023-03-29 09:14:38+00:00
[CATEGORIES]
cs.LG
Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI
[AUTHORS]
Baturay Saglam, Doga Gurgunoglu, Suleyman S. Kozat
[COMMENTS]
2023 IEEE International Conference on Communications Workshops (ICC
Workshops)
[LINK]
http://arxiv.org/abs/2211.09702v2
[DATE]
2023-03-29 09:10:16+00:00
[CATEGORIES]
cs.LG
A Primal-dual Approach for Solving Variational Inequalities with General-form Constraints
[AUTHORS]
Tatjana Chavdarova, Matteo Pagliardini, Tong Yang, Michael I. Jordan
[ABSTRACT]
Yang et al. (2023) recently addressed the open problem of solving Variational
Inequalities (VIs) with equality and inequality constraints through a
first-order gradient method. However, the proposed primal-dual method, called
ACVI, is applicable only when analytic solutions of its subproblems can be
computed; thus, the general case remains an open problem. In this paper, we adopt a
warm-starting technique where we solve the subproblems approximately at each
iteration and initialize the variables with the approximate solution found at
the previous iteration. We prove its convergence and show that the gap function
of the last iterate of this inexact-ACVI method decreases at a rate of
$\mathcal{O}(\frac{1}{\sqrt{K}})$ when the operator is $L$-Lipschitz and
monotone, provided that the errors decrease at appropriate rates.
Interestingly, numerical experiments show that this technique often
converges faster than its exact counterpart. Furthermore, for the cases when
the inequality constraints are simple, we propose a variant of ACVI named
P-ACVI and prove its convergence for the same setting. We further demonstrate
the efficacy of the proposed methods through numerous experiments. We also
relax the assumptions in Yang et al., yielding, to our knowledge, the first
convergence result that does not rely on the assumption that the operator is
$L$-Lipschitz. Our source code is provided at
$\texttt{https://github.com/mpagli/Revisiting-ACVI}$.
[COMMENTS]
arXiv admin note: text overlap with arXiv:2206.10575
[LINK]
http://arxiv.org/abs/2210.15659v3
[DATE]
2023-03-29 08:54:29+00:00
[CATEGORIES]
cs.LG
Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning
[AUTHORS]
Aapo Hyvarinen, Ilyes Khemakhem, Hiroshi Morioka
[ABSTRACT]
A central problem in unsupervised deep learning is how to find useful
representations of high-dimensional data, sometimes called “disentanglement”.
Most approaches are heuristic and lack a proper theoretical foundation. In
linear representation learning, independent component analysis (ICA) has been
successful in many application areas, and it is principled, i.e., based on a
well-defined probabilistic model. However, extension of ICA to the nonlinear
case has been problematic due to the lack of identifiability, i.e. uniqueness
of the representation. Recently, nonlinear extensions that utilize temporal
structure or some auxiliary information have been proposed. Such models are in
fact identifiable, and consequently, an increasing number of algorithms have
been developed. In particular, some self-supervised algorithms can be shown to
estimate nonlinear ICA, even though they have initially been proposed from
heuristic perspectives. This paper reviews the state-of-the-art of nonlinear
ICA theory and algorithms.
[LINK]
http://arxiv.org/abs/2303.16535v1
[DATE]
2023-03-29 08:51:28+00:00
[CATEGORIES]
cs.LG
Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network
[AUTHORS]
Zhizhong Tan, Min Hu, Yixuan Wang, Lu Wei, Bin Liu
[ABSTRACT]
Predicting futures price trends with traditional econometric models is
challenging, because one needs to consider not only futures’
historical data but also correlations among different futures. Spatial-temporal
graph neural networks (STGNNs) have great advantages in dealing with this kind
of spatial-temporal data. However, we cannot directly apply STGNNs to
high-frequency futures data, because futures investors have to consider both
long-term and short-term characteristics when making decisions. To capture
both long-term and short-term features, we exploit more label information
by designing four heterogeneous tasks: price regression, price moving average
regression, price gap regression (within a short interval), and change-point
detection, which cover both long-term and short-term scenarios. To make full use
of these labels, we train our model in a continual manner. Traditional
continual GNNs compute parameter importance from gradients of the price loss to
overcome catastrophic forgetting (CF). Unfortunately, the losses of the four
heterogeneous tasks lie in different spaces, so it is improper to calculate
the parameter importance from their losses. We propose to calculate parameter
importance with mutual information between original observations and the
extracted features. The empirical results based on 49 commodity futures
demonstrate that our model has higher prediction performance on capturing
long-term or short-term dynamic change.
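Parameter importance typically enters continual learning through a quadratic penalty of the kind used by elastic weight consolidation (EWC). The sketch below assumes that form; the importance values are taken as given, since how they are computed (loss gradients in traditional methods, mutual information in this paper) is precisely what the paper changes and is not reproduced here.

```python
def consolidation_penalty(params, old_params, importance, strength=1.0):
    """EWC-style penalty: strength * sum_i Omega_i * (theta_i - theta_old_i)^2.

    `importance` (Omega) is supplied by the caller; estimating it is outside
    the scope of this sketch.
    """
    return strength * sum(
        omega * (p - p_old) ** 2
        for p, p_old, omega in zip(params, old_params, importance)
    )


def total_loss(task_loss, params, old_params, importance, strength=1.0):
    # New-task loss plus a pull towards parameter values that mattered for earlier tasks.
    return task_loss + consolidation_penalty(params, old_params, importance, strength)
```

Parameters with large importance are expensive to move, so training on a new task mostly adjusts the parameters that earlier tasks did not rely on.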
[LINK]
http://arxiv.org/abs/2303.16532v1
[DATE]
2023-03-29 08:39:36+00:00
[CATEGORIES]
cs.LG
Importance Sampling for Stochastic Gradient Descent in Deep Neural Networks
[AUTHORS]
Thibault Lahire
[ABSTRACT]
Stochastic gradient descent samples the training set uniformly to build an
unbiased gradient estimate with a limited number of samples. However, at a
given step of the training process, some data are more helpful than others to
continue learning. Importance sampling for training deep neural networks has
been widely studied to propose sampling schemes yielding better performance
than the uniform sampling scheme. After recalling the theory of importance
sampling for deep learning, this paper reviews the challenges inherent to this
research area. In particular, we propose a metric allowing the assessment of
the quality of a given sampling scheme; and we study the interplay between the
sampling scheme and the optimizer used.
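The core of importance sampling for SGD is the reweighting that keeps the gradient estimate unbiased. A minimal sketch, assuming scalar per-sample gradients and probabilities proportional to a per-sample score such as the gradient magnitude (one common choice, not necessarily the scheme the paper recommends):

```python
import random


def sampling_probs(scores):
    """Probabilities proportional to non-negative per-sample scores (e.g. gradient norms)."""
    total = sum(scores)
    return [s / total for s in scores]


def importance_sampled_gradient(grads, probs, batch_size, rng):
    """Unbiased estimate of the mean gradient (1/N) sum_i g_i from a non-uniform sample.

    Each drawn sample i is reweighted by 1 / (N * p_i), which cancels the
    sampling bias in expectation.
    """
    n = len(grads)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    return sum(grads[i] / (n * probs[i]) for i in idx) / batch_size
```

The expectation of one draw is sum_i p_i g_i / (N p_i) = (1/N) sum_i g_i, the full-batch mean, for any strictly positive probabilities; the sampling scheme only changes the variance of the estimate.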
[COMMENTS]
17 pages, 3 figures
[LINK]
http://arxiv.org/abs/2303.16529v1
[DATE]
2023-03-29 08:35:11+00:00
[CATEGORIES]
cs.LG
Ensemble Learning Model on Artificial Neural Network-Backpropagation (ANN-BP) Architecture for Coal Pillar Stability Classification
[AUTHORS]
G. Aileen Mendrofa, Gatot Fatwanto Hertono, Bevina Desjwiandara Handari
[ABSTRACT]
Pillars are important structural units used to ensure mining safety in
underground hard rock mines. Therefore, precise predictions regarding the
stability of underground pillars are required. One index commonly
used to assess pillar stability is the Safety Factor (SF). Unfortunately, the
crisp boundaries that SF imposes on pillar stability assessment are unreliable. This
paper presents a novel application of Artificial Neural Network-Backpropagation
(ANN-BP) and Deep Ensemble Learning for pillar stability classification. There
are three types of ANN-BP used for the classification of pillar stability
distinguished by their activation functions: ANN-BP ReLU, ANN-BP ELU, and
ANN-BP GELU. This research also presents a new labeling alternative for pillar
stability by considering its suitability with the SF. Thus, pillar stability is
expanded into four categories: failed with a suitable safety factor, intact
with a suitable safety factor, failed without a suitable safety factor, and
intact without a suitable safety factor. There are five inputs used for each
model: pillar width, mining height, bord width, depth to floor, and ratio. The
results showed that the ANN-BP model with Ensemble Learning could improve
ANN-BP performance with an average accuracy of 86.48% and an F_2-score of
96.35% for the category of failed with a suitable safety factor.
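Two quantities in this abstract are easy to pin down in code: combining several classifiers into an ensemble, and the F_2-score. The sketch assumes soft voting (averaging the class probabilities produced by, say, the ReLU, ELU, and GELU networks), which the abstract does not specify; the F-beta formula is the standard one, with beta = 2 weighting recall more heavily than precision.

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several models; return (average, argmax)."""
    n_models = len(prob_lists)
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(len(prob_lists[0]))]
    return avg, max(range(len(avg)), key=avg.__getitem__)


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts; beta = 2 favors recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Weighting recall more heavily suits a safety setting, where missing a pillar that will fail is costlier than a false alarm.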
[LINK]
http://arxiv.org/abs/2303.16524v1
[DATE]
2023-03-29 08:26:26+00:00
[CATEGORIES]
cs.LG
Hard Regularization to Prevent Collapse in Online Deep Clustering without Data Augmentation
[AUTHORS]
Louis Mahon, Thomas Lukasiewicz
[ABSTRACT]
Online deep clustering refers to the joint use of a feature extraction
network and a clustering model to assign cluster labels to each new data point
or batch as it is processed. While faster and more versatile than offline
methods, online clustering can easily reach the collapsed solution where the
encoder maps all inputs to the same point and all are put into a single
cluster. Successful existing models have employed various techniques to avoid
this problem, most of which require data augmentation or which aim to make the
average soft assignment across the dataset the same for each cluster. We
propose a method that does not require data augmentation, and that, differently
from existing methods, regularizes the hard assignments. Using a Bayesian
framework, we derive an intuitive optimization objective that can be
straightforwardly included in the training of the encoder network. Tested on
four image datasets, we show that it consistently avoids collapse more robustly
than other methods and that it leads to more accurate clustering. We also
conduct further experiments and analyses justifying our choice to regularize
the hard cluster assignments.
[LINK]
http://arxiv.org/abs/2303.16521v1
[DATE]
2023-03-29 08:23:26+00:00
[CATEGORIES]
cs.LG
Fair Federated Medical Image Segmentation via Client Contribution Estimation
[AUTHORS]
Meirui Jiang, Holger R Roth, Wenqi Li, Dong Yang, Can Zhao, Vishwesh Nath, Daguang Xu, Qi Dou, Ziyue Xu
[COMMENTS]
Accepted at CVPR 2023
[LINK]
http://arxiv.org/abs/2303.16520v1
[DATE]
2023-03-29 08:21:54+00:00
[CATEGORIES]
cs.LG
[AUTHORS]
Maximilian Tschuchnig, Petra Tschuchnig, Cornelia Ferner, Michael Gadermayr
[ABSTRACT]
Inflation is a major determinant for allocation decisions and its forecast is
a fundamental aim of governments and central banks. However, forecasting
inflation is not a trivial task, as its prediction relies on low frequency,
highly fluctuating data with unclear explanatory variables. While classical
models show some possibility of predicting inflation, reliably beating the
random walk benchmark remains difficult. Recently, (deep) neural networks have
shown impressive results in a multitude of applications, increasingly setting
the new state-of-the-art. This paper investigates the potential of the
transformer deep neural network architecture to forecast different inflation
rates. The results are compared to a study on classical time series and machine
learning models. We show that our adapted transformer, on average, outperforms
the baseline in 6 out of 16 experiments, showing the best scores for two of the
four investigated inflation rates. Our results demonstrate that a
transformer-based neural network can outperform classical regression and machine
learning models for certain inflation rates and forecasting horizons.
[COMMENTS]
Paper was rejected and we want to switch to a new dataset. So there
will not be a simple resubmit with minor changes but some bigger changes in
[LINK]
http://arxiv.org/abs/2303.15364v2
[DATE]
2023-03-29 08:08:52+00:00
[CATEGORIES]
cs.LG
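The random walk benchmark mentioned in the inflation-forecasting abstract above is the no-change forecast, and forecasting results are often reported as a model's error relative to it. A minimal sketch, assuming RMSE as the error measure (the abstract does not state the paper's exact metric):

```python
import math


def random_walk_forecast(series, horizon):
    """No-change forecast: the prediction for time t + horizon is the value observed at t."""
    return series[:-horizon]


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def relative_rmse(model_pred, series, horizon):
    """Model RMSE divided by random-walk RMSE; values below 1 beat the benchmark.

    `model_pred[j]` is assumed to be the model's forecast for series[j + horizon].
    """
    actual = series[horizon:]
    return rmse(model_pred, actual) / rmse(random_walk_forecast(series, horizon), actual)
```

Because inflation is persistent, the no-change forecast is already hard to improve on, which is why beating it reliably is a meaningful bar.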
Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints
[AUTHORS]
Pierre Ablin, Simon Vary, Bin Gao, P. -A. Absil
[ABSTRACT]
Orthogonality constraints naturally appear in many machine learning problems,
from Principal Components Analysis to robust neural network training. They are
usually solved using Riemannian optimization algorithms, which minimize the
objective function while enforcing the constraint. However, enforcing the
orthogonality constraint can be the most time-consuming operation in such
algorithms. Recently, Ablin & Peyré (2022) proposed the Landing algorithm, a
method with cheap iterations that does not enforce the orthogonality constraint
but is attracted towards the manifold in a smooth manner. In this article, we
provide new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold, the set of rectangular
orthogonal matrices. We also consider stochastic and variance reduction
algorithms when the cost function is an average of many functions. We
demonstrate that all these methods have the same rate of convergence as their
Riemannian counterparts that exactly enforce the constraint. Finally, our
experiments demonstrate the promise of our approach to an array of
machine-learning problems that involve orthogonality constraints.
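A compact NumPy sketch of a landing iteration on the Stiefel manifold, applied to PCA. It assumes the landing field Lambda(X) = skew(grad f(X) X^T) X + lam * X (X^T X - I), which is our reading of the Ablin & Peyré construction; the step size, lam, and iteration count are illustrative and are not the tuned values or safeguards from the paper.

```python
import numpy as np


def skew(m):
    return 0.5 * (m - m.T)


def landing_step(x, grad, step, lam):
    """One landing update towards the Stiefel manifold {X : X^T X = I_p}.

    The relative-gradient term skew(G X^T) X moves along the manifold; the term
    lam * X (X^T X - I) pulls X back towards it. No retraction or QR is used.
    """
    p = x.shape[1]
    psi = skew(grad @ x.T)
    field = psi @ x + lam * x @ (x.T @ x - np.eye(p))
    return x - step * field


def landing_pca(a, p, step=0.05, lam=1.0, iters=3000, seed=0):
    """Minimize f(X) = -0.5 * tr(X^T A X), i.e. find a top-p principal subspace of A."""
    rng = np.random.default_rng(seed)
    x, _ = np.linalg.qr(rng.standard_normal((a.shape[0], p)))  # orthonormal start
    for _ in range(iters):
        x = landing_step(x, -a @ x, step, lam)  # grad f(X) = -A X
    return x
```

Each iteration costs only matrix products, with no QR decomposition or matrix exponential, which is the point of the method; the iterate need not be exactly orthonormal along the way but is pulled back to the manifold as it converges. On A = diag(5, 4, 3, 2, 1) with p = 2, the iterate should approach the span of the first two coordinate axes with X^T X close to the identity.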
[LINK]
http://arxiv.org/abs/2303.16510v1
[DATE]
2023-03-29 07:36:54+00:00
[CATEGORIES]
cs.LG
Local Interpretability of Random Forests for Multi-Target Regression
[AUTHORS]
Avraam Bardos, Nikolaos Mylonas, Ioannis Mollas, Grigorios Tsoumakas
[ABSTRACT]
Multi-target regression is useful in a plethora of applications. Although
random forest models perform well in these tasks, they are often difficult to
interpret. Interpretability is crucial in machine learning, especially when it
can directly impact human well-being. Although model-agnostic techniques exist
for multi-target regression, specific techniques tailored to random forest
models are not available. To address this issue, we propose a technique that
provides rule-based interpretations of the predictions a random forest model
makes for individual instances in multi-target regression, influenced by a recent model-specific technique
for random forest interpretability. The proposed technique was evaluated
through extensive experiments and shown to offer competitive interpretations
compared to state-of-the-art techniques.
[COMMENTS]
8 pages, 1 figure, 2 tables, to be submitted to XAI conference 2023
as an extended abstract
[LINK]
http://arxiv.org/abs/2303.16506v1
[DATE]
2023-03-29 07:32:01+00:00
[CATEGORIES]
cs.LG
An Over-parameterized Exponential Regression
[AUTHORS]
Yeqi Gao, Sridhar Mahadevan, Zhao Song
[ABSTRACT]
Over the past few years, there has been a significant amount of research
focused on studying the ReLU activation function, with the aim of achieving
neural network convergence through over-parametrization. However, recent
developments in the field of Large Language Models (LLMs) have sparked interest
in the use of exponential activation functions, specifically in the attention
mechanism.
Mathematically, we define the neural function $F: \mathbb{R}^{d \times m}
\times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation
function. We are given $n$ labeled data points $\{(x_1, y_1), (x_2, y_2),
\dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$. Here $F(W(t),x)$
can be expressed as $F(W(t),x) := \sum_{r=1}^m a_r \exp(\langle w_r(t), x \rangle)$,
where $m$ represents the number of neurons and $w_r(t)$ are the weights at time
$t$. As is standard in the literature, the weights $a_r$ are fixed and never
change during training. We initialize the weights $W(0) \in \mathbb{R}^{d \times m}$
with random Gaussian distributions, such that $w_r(0) \sim \mathcal{N}(0, I_d)$,
and initialize $a_r$ from a random sign distribution for each $r \in [m]$.
Using the gradient descent algorithm, we can find a weight $W(T)$ such that
$\| F(W(T), X) - y \|_2 \leq \epsilon$ holds with probability $1-\delta$, where
$\epsilon \in (0,0.1)$ and $m = \Omega(n^{2+o(1)}\log(n/\delta))$. To optimize
the over-parameterization bound $m$, we employ several tight analysis
techniques from previous studies [Song and Yang arXiv 2019, Munteanu, Omlor,
Song and Woodruff ICML 2022].
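The model F(W, x) and the gradient descent used to train it can be written out directly. The sketch below assumes the squared loss (1/2) sum_i (F(W, x_i) - y_i)^2, inputs rescaled to norm 0.5, and a small fixed step size; these are illustrative choices, and the toy sizes say nothing about the over-parameterization bound on m.

```python
import math
import random


def forward(w, a, x):
    """F(W, x) = sum_r a_r * exp(<w_r, x>)."""
    return sum(a_r * math.exp(sum(wi * xi for wi, xi in zip(w_r, x)))
               for w_r, a_r in zip(w, a))


def loss(w, a, xs, ys):
    return 0.5 * sum((forward(w, a, x) - y) ** 2 for x, y in zip(xs, ys))


def gradient_step(w, a, xs, ys, lr):
    """One full-batch gradient step on the w_r only; the signs a_r stay fixed."""
    residuals = [forward(w, a, x) - y for x, y in zip(xs, ys)]
    new_w = []
    for w_r, a_r in zip(w, a):
        grad = [0.0] * len(w_r)
        for res, x in zip(residuals, xs):
            scale = res * a_r * math.exp(sum(wi * xi for wi, xi in zip(w_r, x)))
            grad = [g + scale * xi for g, xi in zip(grad, x)]
        new_w.append([wi - lr * g for wi, g in zip(w_r, grad)])
    return new_w


def init(m, d, rng):
    # w_r(0) ~ N(0, I_d); a_r drawn uniformly from {-1, +1}.
    w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return w, a


def demo(seed=0, m=50, d=3, n=5, steps=300, lr=1e-3):
    """Toy run on random data; returns (initial loss, final loss)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        xs.append([0.5 * t / norm for t in v])  # inputs rescaled to norm 0.5
    ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w, a = init(m, d, rng)
    start = loss(w, a, xs, ys)
    for _ in range(steps):
        w = gradient_step(w, a, xs, ys, lr)
    return start, loss(w, a, xs, ys)
```

Keeping a_r fixed and training only the w_r mirrors the setup described in the abstract; the exponential makes the gradient scale with exp(<w_r, x>), which is why a small step size is used here.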
[LINK]
http://arxiv.org/abs/2303.16504v1
[DATE]
2023-03-29 07:29:07+00:00
[CATEGORIES]
cs.LG
Mixtures of All Trees
[AUTHORS]
Nikil Roashan Selvam, Honghua Zhang, Guy Van den Broeck
[ABSTRACT]
Tree-shaped graphical models are widely used for their tractability. However,
they unfortunately lack expressive power as they require committing to a
particular sparse dependency structure. We propose a novel class of generative
models called mixtures of all trees: that is, a mixture over all possible
($n^{n-2}$) tree-shaped graphical models over $n$ variables. We show that it is
possible to parameterize this Mixture of All Trees (MoAT) model compactly
(using a polynomial-size representation) in a way that allows for tractable
likelihood computation and optimization via stochastic gradient descent.
Furthermore, by leveraging the tractability of tree-shaped models, we devise
fast-converging conditional sampling algorithms for approximate inference, even
though our theoretical analysis suggests that exact computation of marginals in
the MoAT model is NP-hard. Empirically, MoAT achieves state-of-the-art
performance on density estimation benchmarks when compared against powerful
probabilistic models including hidden Chow-Liu Trees.
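The count n^{n-2} is Cayley's formula, and summing a product of edge weights over all spanning trees is exactly what the weighted matrix-tree theorem computes as a single determinant. Assuming that this kind of identity is what makes a mixture over all trees tractable (the abstract does not say how MoAT achieves it), a small exact sketch:

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def spanning_tree_sum(weights):
    """Sum over all spanning trees of the product of their edge weights.

    `weights` is a symmetric n x n matrix of non-negative edge weights. By the
    weighted matrix-tree theorem, the result is the determinant of the graph
    Laplacian with one row and column removed.
    """
    n = len(weights)
    laplacian = [[(sum(weights[i]) - weights[i][i]) if i == j else -weights[i][j]
                  for j in range(n)] for i in range(n)]
    minor = [row[1:] for row in laplacian[1:]]
    return determinant(minor)
```

With all weights equal to 1 on the complete graph, the result is n^{n-2}, for instance 16 for n = 4 and 125 for n = 5, so an exponential sum over trees costs one polynomial-time determinant.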
[COMMENTS]
Accepted to AISTATS 2023
[LINK]
http://arxiv.org/abs/2302.14202v2
[DATE]
2023-03-29 07:27:28+00:00
[CATEGORIES]
cs.LG
IOP-FL: Inside-Outside Personalization for Federated Medical Image Segmentation
[AUTHORS]
Meirui Jiang, Hongzheng Yang, Chen Cheng, Qi Dou
[ABSTRACT]
Federated learning (FL) allows multiple medical institutions to
collaboratively learn a global model without centralizing client data. It is
difficult, if at all possible, for a single global model to achieve
optimal performance on every individual client, due to the heterogeneity of
medical images from various scanners and patient demographics. This problem
becomes even more significant when deploying the global model to unseen clients
outside FL, whose data distributions were not present during federated
training. To optimize the prediction accuracy of each individual client for
medical imaging tasks, we propose a novel unified framework for both
\textit{Inside and Outside model Personalization in FL} (IOP-FL). Our inside
personalization uses a lightweight gradient-based approach that exploits the
local adapted model for each client, by accumulating both the global gradients
for common knowledge and the local gradients for client-specific optimization.
Moreover, and importantly, the obtained local personalized models and the
global model can form a diverse and informative routing space to personalize an
adapted model for outside FL clients. Hence, we design a new test-time routing
scheme using the consistency loss with a shape constraint to dynamically
incorporate the models, given the distribution information conveyed by the test
data. Our extensive experimental results on two medical image segmentation
tasks present significant improvements over SOTA methods on both inside and
outside personalization, demonstrating the potential of our IOP-FL scheme for
clinical practice.
[COMMENTS]
Accepted by IEEE TMI special issue on federated learning for medical
imaging
[LINK]
http://arxiv.org/abs/2204.08467v2
[DATE]
2023-03-29 07:25:54+00:00
[CATEGORIES]
cs.LG
Unified analysis of SGD-type methods
[AUTHORS]
Eduard Gorbunov
[ABSTRACT]
This note focuses on a simple approach to the unified analysis of SGD-type
methods from (Gorbunov et al., 2020) for strongly convex smooth optimization
problems. The similarities in the analyses of different stochastic first-order
methods are discussed along with the existing extensions of the framework. The
limitations of the analysis and several alternative approaches are mentioned as
well.
[COMMENTS]
Part of the Encyclopedia of Optimization. 8 pages
[LINK]
http://arxiv.org/abs/2303.16502v1
[DATE]
2023-03-29 07:25:03+00:00
[CATEGORIES]
cs.LG
Coupled Multiwavelet Neural Operator Learning for Coupled Partial Differential Equations
[AUTHORS]
Xiongye Xiao, Defu Cao, Ruochen Yang, Gaurav Gupta, Gengshuo Liu, Chenzhong Yin, Radu Balan, Paul Bogdan
[ABSTRACT]
Coupled partial differential equations (PDEs) are key to modeling the
complex dynamics of many physical processes. Recently, neural operators have
shown the ability to solve PDEs by learning the integral kernel directly in
Fourier/Wavelet space, so the difficulty of solving coupled PDEs lies in
handling the coupled mappings between the functions. Towards this end,
we propose a \textit{coupled multiwavelets neural operator} (CMWNO) learning
scheme by decoupling the coupled integral kernels during the multiwavelet
decomposition and reconstruction procedures in the Wavelet space. The proposed
model achieves significantly higher accuracy compared to previous
learning-based solvers in solving the coupled PDEs including Gray-Scott (GS)
equations and the non-local mean field game (MFG) problem. According to our
experimental results, the proposed model achieves a $2\times$ to $4\times$
improvement in relative $L^2$ error compared to the best results of
state-of-the-art models.
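For reference, the relative L^2 error used to compare solvers is ||u_pred - u||_2 / ||u||_2. A minimal sketch on flattened grid values, assuming a uniform grid (so the quadrature weights cancel in the ratio):

```python
import math


def relative_l2_error(pred, target):
    """||pred - target||_2 / ||target||_2 over flattened solution values."""
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    den = math.sqrt(sum(t * t for t in target))
    return num / den
```

A 2x improvement means this ratio is halved relative to the baseline solver's.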
[COMMENTS]
Accepted to ICLR 2023
[LINK]
http://arxiv.org/abs/2303.02304v2
[DATE]
2023-03-29 06:50:55+00:00
[CATEGORIES]
cs.LG
Instance-Aware Image Completion
[AUTHORS]
Jinoh Cho, Minguk Kang, Vibhav Vineet, Jaesik Park
[ABSTRACT]
Image completion is a task that aims to fill in the missing region of a
masked image with plausible contents. However, existing image completion
methods tend to fill in the missing region with the surrounding texture instead
of hallucinating a visual instance that suits the
context of the scene. In this work, we propose a novel image completion model,
dubbed ImComplete, which hallucinates a missing instance that harmonizes well
with, and thus preserves, the original context. ImComplete first adopts a
transformer architecture that considers the visible instances and the location
of the missing region. Then, ImComplete completes the semantic segmentation
masks within the missing region, providing pixel-level semantic and structural
guidance. Finally, the image synthesis blocks generate photo-realistic content.
We perform a comprehensive evaluation of the results in terms of visual quality
(LPIPS and FID) and contextual preservation scores (CLIPscore and object
detection accuracy) with COCO-panoptic and Visual Genome datasets. Experimental
results show the superiority of ImComplete on various natural images.
[LINK]
http://arxiv.org/abs/2210.12350v2
[DATE]
2023-03-29 06:35:48+00:00
[CATEGORIES]
cs.LG
Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers
[AUTHORS]
Mohammad Lashkari, Amin Gheibi
[COMMENTS]
13 pages, 6 figures, 3 tables
[LINK]
http://arxiv.org/abs/2303.16464v1
[DATE]
2023-03-29 05:33:53+00:00
[CATEGORIES]
cs.LG
GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing
[AUTHORS]
Jiang Xie, Shuyin Xia, Guoyin Wang, Xinbo Gao
[ABSTRACT]
Most existing clustering methods are based on a single granularity of
information, such as the distance and density of each data point. This
finest-grained approach is usually inefficient and susceptible to noise.
Therefore, we propose a clustering algorithm that combines multi-granularity
Granular-Ball computing with a minimum spanning tree (MST). We construct coarse-grained
granular-balls, and then use the granular-balls and the MST to implement a clustering
method based on “large-scale priority”, which can greatly reduce the influence
of outliers and accelerate the construction of the MST. Experimental
results on several data sets demonstrate the power of the algorithm. All codes
have been released at https://github.com/xjnine/GBMST.
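The MST part of the method can be illustrated with classic MST clustering: build a minimum spanning tree, then cut its longest edges. In this sketch, plain points stand in for granular-ball centers and the number of clusters is given; the granular-ball construction and the paper's “large-scale priority” rule are not reproduced.

```python
import math


def mst_clusters(points, n_clusters):
    """Cluster points by building an MST (Kruskal) and dropping its n_clusters - 1
    longest edges. Returns one cluster id per point."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Kruskal: scan all pairwise edges in increasing length, keep those joining components.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Keep all but the longest n_clusters - 1 MST edges, then label the components.
    kept = sorted(mst)[: len(mst) - (n_clusters - 1)]
    parent = list(range(n))  # `find` sees this rebinding through its closure
    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Building the tree over a few ball centers instead of every raw point is what makes a granular-ball approach cheaper; this sketch builds the full pairwise edge list, which is only reasonable for small inputs.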
[LINK]
http://arxiv.org/abs/2303.01082v2
[DATE]
2023-03-29 05:25:48+00:00
[CATEGORIES]
cs.LG
GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization
[AUTHORS]
Stefan Abi-Karam, Cong Hao
[ABSTRACT]
Many graph neural network (GNN) accelerators have been proposed.
However, they rely heavily on users’ hardware expertise and are usually
optimized for one specific GNN model, making them challenging for practical use.
Therefore, in this work, we propose GNNBuilder, the first automated, generic,
end-to-end GNN accelerator generation framework. It features four advantages:
(1) GNNBuilder can automatically generate GNN accelerators for a wide range of
GNN models arbitrarily defined by users; (2) GNNBuilder takes standard PyTorch
programming interface, introducing zero overhead for algorithm developers; (3)
GNNBuilder supports end-to-end code generation, simulation, accelerator
optimization, and hardware deployment, realizing a push-button fashion for GNN
accelerator design; (4) GNNBuilder is equipped with accurate performance models
of its generated accelerator, enabling fast and flexible design space
exploration (DSE). In the experiments, first, we show that our accelerator
performance model has errors within $36\%$ for latency prediction and $18\%$
for BRAM count prediction. Second, we show that our generated accelerators can
outperform CPU by $6.33\times$ and GPU by $6.87\times$. This framework is
open-source, and the code is available at
https://anonymous.4open.science/r/gnn-builder-83B4/.
[COMMENTS]
10 pages, 7 figures, 4 tables, 3 listings
[LINK]
http://arxiv.org/abs/2303.16459v1
[DATE]
2023-03-29 05:08:21+00:00
[CATEGORIES]
cs.LG
When to Pre-Train Graph Neural Networks? An Answer from Data Generation Perspective!
[AUTHORS]
Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang
[ABSTRACT]
Recently, graph pre-training has attracted wide research attention, which
aims to learn transferable knowledge from unlabeled graph data so as to improve
downstream performance. Despite these recent attempts, negative transfer is
a major issue when applying graph pre-trained models to downstream tasks.
Existing works have made great efforts on what to pre-train and how to
pre-train by designing a number of graph pre-training and fine-tuning
strategies. However, there are indeed cases where no matter how advanced the
strategy is, the “pre-train and fine-tune” paradigm still cannot achieve clear
benefits. This paper introduces a generic framework W2PGNN to answer the
crucial question of when to pre-train (i.e., in what situations could we take
advantage of graph pre-training) before performing effortful pre-training or
fine-tuning. We start from a new perspective to explore the complex generative
mechanisms from the pre-training data to downstream data. In particular, W2PGNN
first fits the pre-training data into graphon bases; each element of a graphon
basis (i.e., a graphon) identifies a fundamental transferable pattern shared by
a collection of pre-training graphs. All convex combinations of graphon bases
give rise to a generator space, and the graphs generated from this space form the
solution space for downstream data that can benefit from pre-training. In this
manner, the feasibility of pre-training can be quantified as the generation
probability of the downstream data from any generator in the generator space.
W2PGNN provides three broad applications, including providing the application
scope of graph pre-trained models, quantifying the feasibility of performing
pre-training, and helping select pre-training data to enhance downstream
performance. We give a theoretically sound solution for the first application
and extensive empirical justifications for the latter two applications.
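A generator from such a space can be sketched as follows, assuming step-function graphons (a K x K block matrix over equal-width intervals of [0, 1]) and the standard W-random graph sampling scheme: draw a latent u_i ~ U[0, 1] per node and add edge (i, j) with probability W(u_i, u_j). The block matrices and function names are illustrative assumptions; this is not W2PGNN's graphon-fitting procedure.

```python
import random

def step_graphon(blocks):
    """Turn a symmetric K x K block matrix into a step-function graphon W(x, y) on [0, 1]^2."""
    k = len(blocks)

    def w(x, y):
        i = min(int(x * k), k - 1)
        j = min(int(y * k), k - 1)
        return blocks[i][j]

    return w

def mixture(graphons, alphas):
    """Convex combination sum_i alpha_i * W_i; alphas must be nonnegative and sum to 1."""
    assert all(a >= 0 for a in alphas) and abs(sum(alphas) - 1.0) < 1e-9
    return lambda x, y: sum(a * w(x, y) for a, w in zip(alphas, graphons))

def sample_graph(w, n, rng):
    """Sample an n-node simple undirected graph from graphon w; returns a set of (i, j) edges."""
    u = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w(u[i], u[j]):
                edges.add((i, j))
    return edges
```

Under this reading, asking whether downstream graphs "can benefit from pre-training" amounts to asking how likely they are under some mixture weights `alphas`.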
[LINK]
http://arxiv.org/abs/2303.16458v1
[DATE]
2023-03-29 05:05:02+00:00
[CATEGORIES]
cs.LG
Conductivity Imaging from Internal Measurements with Mixed Least-Squares Deep Neural Networks
[AUTHORS]
Bangti Jin, Xiyao Li, Qimeng Quan, Zhi Zhou
[COMMENTS]
28 pages. 12 figures
[LINK]
http://arxiv.org/abs/2303.16454v1
[DATE]
2023-03-29 04:43:03+00:00
[CATEGORIES]
cs.LG
ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
[AUTHORS]
Youhan Lee, Hasun Yu
[ABSTRACT]
Protein language models (pLMs), pre-trained via causal language modeling on
protein sequences, have been a promising tool for protein sequence design. In
real-world protein engineering, there are many cases where the amino acids in
the middle of a protein sequence are optimized while maintaining other
residues. Unfortunately, because of the left-to-right nature of pLMs, existing
pLMs modify suffix residues by prompting with prefix residues, which is
insufficient for the infilling task that considers the whole surrounding
context. To find more effective pLMs for protein engineering, we design a
new benchmark, Secondary structureE InFilling rEcoveRy (SEIFER), which
approximates infilling sequence design scenarios. With the evaluation of
existing models on the benchmark, we reveal the weakness of existing language
models and show that language models trained via fill-in-middle transformation,
called ProtFIM, are more appropriate for protein engineering. Also, we prove
that ProtFIM generates protein sequences with decent protein representations
through exhaustive experiments and visualizations.
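The fill-in-middle transformation can be sketched as a string rewrite. This sketch assumes the prefix-suffix-middle ordering commonly used for fill-in-the-middle training of left-to-right language models; the sentinel tokens and function names are hypothetical, and ProtFIM's exact tokenization may differ.

```python
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"  # hypothetical sentinel tokens

def fim_transform(seq, start, end):
    """Split seq into prefix/middle/suffix at [start, end) and reorder as prefix-suffix-middle.

    A left-to-right model trained on this string sees both flanks before the middle,
    so it can later generate the middle conditioned on the whole surrounding context.
    """
    if not 0 <= start <= end <= len(seq):
        raise ValueError("need 0 <= start <= end <= len(seq)")
    prefix, middle, suffix = seq[:start], seq[start:end], seq[end:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def infill_prompt(prefix, suffix):
    """Inference prompt: the model is expected to continue with the middle residues."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"
```

For example, `fim_transform("MKTAYIAK", 3, 5)` yields `"<PRE>MKT<SUF>IAK<MID>AY"`, and the training string begins with the same text as the inference prompt for that prefix and suffix.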
[COMMENTS]
Preprint
[LINK]
http://arxiv.org/abs/2303.16452v1
[DATE]
2023-03-29 04:35:50+00:00
[CATEGORIES]
cs.LG
Single-Pass Contrastive Learning Can Work for Both Homophilic and Heterophilic Graph
[AUTHORS]
Haonan Wang, Jieyu Zhang, Qi Zhu, Wei Huang, Kenji Kawaguchi, Xiaokui Xiao
[ABSTRACT]
Existing graph contrastive learning (GCL) techniques typically require two
forward passes for a single instance to construct the contrastive loss, which
is effective for capturing the low-frequency signals of node features. Such a
dual-pass design has shown empirical success on homophilic graphs, but its
effectiveness on heterophilic graphs, where directly connected nodes typically
have different labels, is unknown. In addition, existing GCL approaches fail to
provide strong performance guarantees. Coupled with the unpredictability of GCL
approaches on heterophilic graphs, their applicability in real-world contexts
is limited. Then, a natural question arises: Can we design a GCL method that
works for both homophilic and heterophilic graphs with a performance guarantee?
To answer this question, we theoretically study the concentration property of
features obtained by neighborhood aggregation on homophilic and heterophilic
graphs, introduce the single-pass graph contrastive learning loss based on the
property, and provide performance guarantees for the minimizer of the loss on
downstream tasks. As a direct consequence of our analysis, we implement the
Single-Pass Graph Contrastive Learning method (SP-GCL). Empirically, on 14
benchmark datasets with varying degrees of homophily, the features learned by
the SP-GCL can match or outperform existing strong baselines with significantly
less computational overhead, which demonstrates the usefulness of our findings
in real-world cases.
[COMMENTS]
21 pages, 5 figures, 8 tables. arXiv admin note: substantial text
overlap with arXiv:2204.04874. The code is available at
https://github.com/haonan3/SPGCL
[LINK]
http://arxiv.org/abs/2211.10890v2
[DATE]
2023-03-29 04:04:55+00:00
[CATEGORIES]
cs.LG
Motif-aware temporal GCN for fraud detection in signed cryptocurrency trust networks
[AUTHORS]
Song Li, Jiandong Zhou, Chong MO, Jin LI, Geoffrey K. F. Tso, Yuxing Tian
[ABSTRACT]
Graph convolutional networks (GCNs) are a class of artificial neural networks
for processing data that can be represented as graphs. Since financial
transactions can naturally be constructed as graphs, GCNs are widely applied in
the financial industry, especially for financial fraud detection. In this
paper, we focus on fraud detection in cryptocurrency trust networks. In the
literature, most works focus on static networks, whereas in this study we
consider the evolving nature of cryptocurrency networks and use local
structural information as well as balance theory to guide the training process.
More specifically, we compute motif matrices to capture the local topological
information, then use them in the GCN aggregation process. The generated
embedding at each snapshot is a weighted average of embeddings within a time
window, where the weights are learnable parameters. Since every edge of the
trust networks is signed, balance theory is used to guide the training process.
Experimental results on bitcoin-alpha and bitcoin-otc datasets show that the
proposed model outperforms those in the literature.
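Two ingredients described above can be sketched concretely, assuming an unsigned, undirected snapshot and the triangle motif (the paper's motifs may be defined on signed or directed edges): a motif matrix that counts the triangles through each edge, and a time-window embedding formed as a softmax-weighted average whose logits stand in for the learnable weights. All names are hypothetical.

```python
import math

def triangle_motif_matrix(adj):
    """M[i][j] = number of triangles containing edge (i, j), i.e. (A @ A) * A elementwise.

    Assumes a symmetric 0/1 adjacency matrix without self-loops.
    """
    n = len(adj)
    return [[adj[i][j] * sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def temporal_average(embeddings, logits):
    """Weighted average of per-snapshot embeddings; softmax keeps weights positive and summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]
```

In a motif-aware GCN, a matrix like `M` would reweight or replace the plain adjacency during aggregation, so edges embedded in many triangles contribute more.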
[LINK]
http://arxiv.org/abs/2211.13123v2
[DATE]
2023-03-29 04:01:17+00:00
[CATEGORIES]
cs.LG
Global Convergence of Over-parameterized Deep Equilibrium Models
[AUTHORS]
Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin
[ABSTRACT]
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium
point of an infinite-depth weight-tied model with an input injection. Instead
of performing infinite computations, it solves for the equilibrium point directly
with root-finding and computes gradients with implicit differentiation. The
training dynamics of over-parameterized DEQs are investigated in this study. By
assuming a condition on the initial equilibrium point, we show that the unique
equilibrium point always exists during the training process, and the gradient
descent is proved to converge to a globally optimal solution at a linear
convergence rate for the quadratic loss function. In order to show that the
required initial condition is satisfied via mild over-parameterization, we
perform a fine-grained analysis on random DEQs. We propose a novel
probabilistic framework to overcome the technical difficulty in the
non-asymptotic analysis of infinite-depth weight-tied models.
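The equilibrium point can be illustrated with a toy weight-tied layer z = tanh(W z + U x), where U x is the input injection. This sketch assumes that layer form and substitutes plain fixed-point iteration for the root-finding solvers mentioned above; it omits implicit differentiation entirely, and the names are hypothetical.

```python
import math

def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def deq_forward(W, U, x, tol=1e-10, max_iter=1000):
    """Find z* = tanh(W z* + U x) by fixed-point iteration.

    Converges when the map is a contraction, e.g. when the max absolute row sum of W
    is below 1 (tanh is 1-Lipschitz). Practical DEQs use faster root-finders instead.
    """
    injection = matvec(U, x)  # input injection, computed once and reused every iteration
    z = [0.0] * len(W)
    for _ in range(max_iter):
        z_next = [math.tanh(a + b) for a, b in zip(matvec(W, z), injection)]
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")
```

Iterating the same layer until it stops changing is what "infinite depth with tied weights" means operationally; the returned `z` approximately satisfies the equilibrium equation.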
[COMMENTS]
Accepted by AISTATS 2023
[LINK]
http://arxiv.org/abs/2205.13814v2
[DATE]
2023-03-29 03:56:52+00:00
[CATEGORIES]
cs.LG
Parameterizing the cost function of Dynamic Time Warping with application to time series classification
[AUTHORS]
Matthieu Herrmann, Chang Wei Tan, Geoffrey I. Webb
[ABSTRACT]
Dynamic Time Warping (DTW) is a popular time series distance measure that
aligns the points in two series with one another. These alignments support
warping of the time dimension to allow for processes that unfold at differing
rates. The distance is the minimum sum of costs of the resulting alignments
over any allowable warping of the time dimension. The cost of an alignment of
two points is a function of the difference in the values of those points. The
original cost function was the absolute value of this difference. Other cost
functions have been proposed. A popular alternative is the square of the
difference. However, to our knowledge, this is the first investigation of both
the relative impacts of using different cost functions and the potential to
tune cost functions to different tasks. We do so in this paper by using a
tunable cost function $\lambda_{\gamma}$ with parameter $\gamma$. We show that
higher values of $\gamma$ place greater weight on larger pairwise differences,
while lower values place greater weight on smaller pairwise differences. We
demonstrate that training $\gamma$ significantly improves the accuracy of both
the DTW nearest neighbor and Proximity Forest classifiers.
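A DTW distance with a tunable cost can be sketched as follows, assuming the cost takes the form |x - y|^gamma, which matches the description above (gamma = 1 recovers the original absolute-difference cost and gamma = 2 the squared cost). The sketch uses the unconstrained warping path with no window, and the function name is hypothetical.

```python
import math

def dtw(a, b, gamma=1.0):
    """DTW distance with pairwise cost |x - y| ** gamma (assumed form of the tunable cost)."""
    n, m = len(a), len(b)
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1]) ** gamma
            # extend the cheapest of: repeat b[j-1], repeat a[i-1], or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

With gamma = 2, a single gap of 3 costs 9 while three gaps of 1 cost 3, so larger gamma penalizes large differences disproportionately; gamma could then be tuned per dataset, for example by cross-validating a nearest-neighbor classifier.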
[LINK]
http://arxiv.org/abs/2301.10350v2
[DATE]
2023-03-29 03:16:44+00:00
[CATEGORIES]
cs.LG
Learning Identity-Preserving Transformations on Data Manifolds
[AUTHORS]
Marissa Connor, Kion Fallah, Christopher Rozell
[ABSTRACT]
Many machine learning techniques incorporate identity-preserving
transformations into their models to generalize their performance to previously
unseen data. These transformations are typically selected from a set of
functions that are known to maintain the identity of an input when applied
(e.g., rotation, translation, flipping, and scaling). However, there are many
natural variations that cannot be labeled for supervision or defined through
examination of the data. As suggested by the manifold hypothesis, many of these
natural variations live on or near a low-dimensional, nonlinear manifold.
Several techniques represent manifold variations through a set of learned Lie
group operators that define directions of motion on the manifold. However,
these approaches are limited because they require transformation labels when
training their models and they lack a method for determining which regions of
the manifold are appropriate for applying each specific operator. We address
these limitations by introducing a learning strategy that does not require
transformation labels and developing a method that learns the local regions
where each operator is likely to be used while preserving the identity of
inputs. Experiments on MNIST and Fashion MNIST highlight our model’s ability to
learn identity-preserving transformations on multi-class datasets.
Additionally, we train on CelebA to showcase our model’s ability to learn
semantically meaningful transformations on complex datasets in an unsupervised
manner.
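In Lie-group-operator models, a transformation is typically applied as x' = expm(c * Psi) x, where Psi is an operator and c a scalar coefficient. The sketch below assumes that form and uses a fixed 2-D rotation generator in place of a learned operator. It shows only how an operator moves a point along a manifold, not the paper's label-free learning strategy, and the names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Dense matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def expm(a, terms=30):
    """Matrix exponential via the truncated Taylor series sum_k a^k / k! (fine for small norms)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term a^k / k!, starting at the identity
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def transform(x, operator, c):
    """Move point x along the direction defined by `operator` by coefficient c."""
    t = expm([[c * v for v in row] for row in operator])
    return [sum(t[i][j] * x[j] for j in range(len(x))) for i in range(len(t))]

ROTATION = [[0.0, -1.0], [1.0, 0.0]]  # generator of 2-D rotations (illustrative, not learned)
```

Because `ROTATION` generates rotations, any coefficient preserves the vector's norm, which is the kind of identity-preserving motion the learned operators are meant to capture on real data.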
[LINK]
http://arxiv.org/abs/2106.12096v2
[DATE]
2023-03-29 03:12:54+00:00
[CATEGORIES]
cs.LG
ProductAE: Toward Deep Learning Driven Error-Correction Codes of Large Dimensions
[AUTHORS]
Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae
[ABSTRACT]
While decades of theoretical research have led to the invention of several
classes of error-correction codes, the design of such codes is an extremely
challenging task, mostly driven by human ingenuity. Recent studies demonstrate
that such designs can be effectively automated and accelerated via tools from
machine learning (ML), thus enabling ML-driven classes of error-correction
codes with promising performance gains compared to classical designs. A
fundamental challenge, however, is that it is prohibitively complex, if not
impossible, to design and train fully ML-driven encoder and decoder pairs for
large code dimensions. In this paper, we propose Product Autoencoder
(ProductAE) – a computationally-efficient family of deep learning driven
(encoder, decoder) pairs – aimed at enabling the training of relatively large
codes (both encoder and decoder) with a manageable training complexity. We
build upon ideas from classical product codes and propose constructing large
neural codes using smaller code components. ProductAE reduces the complex
problem of training the encoder and decoder for a large code dimension $k$ and
blocklength $n$ to less-complex sub-problems of training encoders and decoders
for smaller dimensions and blocklengths. Our training results show successful
training of ProductAEs of dimensions as large as $k = 300$ bits with meaningful
performance gains compared to state-of-the-art classical and neural designs.
Moreover, we demonstrate excellent robustness and adaptivity of ProductAEs to
channel models different from the ones used for training.
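The product construction can be illustrated with a classical product code whose components are single-parity-check codes: a k1 x k2 message is encoded row by row and then column by column, so a code of length (k1 + 1)(k2 + 1) is built from two small component codes. This is a classical analog only; in ProductAE the component encoders and decoders are neural networks, and the names here are hypothetical.

```python
def spc_encode(bits):
    """Single-parity-check component code: append the XOR of all bits (k -> k + 1)."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def product_encode(msg):
    """Encode a k1 x k2 bit matrix: rows first, then columns, giving a (k1+1) x (k2+1) codeword."""
    rows = [spc_encode(list(r)) for r in msg]
    cols = [spc_encode([r[j] for r in rows]) for j in range(len(rows[0]))]
    # transpose the encoded columns back into row-major order
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]

def correct_single_error(code):
    """Flip the bit where the single failing row check meets the single failing column check."""
    bad_rows = [i for i, r in enumerate(code) if sum(r) % 2]
    bad_cols = [j for j in range(len(code[0])) if sum(r[j] for r in code) % 2]
    fixed = [r[:] for r in code]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed
```

Each component only ever handles k1 or k2 bits, which is the source of the complexity reduction: the large code's k = k1 * k2 is never encoded or decoded in one piece.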
[COMMENTS]
arXiv admin note: text overlap with arXiv:2110.04466
[LINK]
http://arxiv.org/abs/2303.16424v1
[DATE]
2023-03-29 03:10:09+00:00
[CATEGORIES]
cs.LG
Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures
[AUTHORS]
Xin Bing, Marten Wegkamp
[ABSTRACT]
This paper considers binary classification of high-dimensional features under
a postulated model with a low-dimensional latent Gaussian mixture structure and
non-vanishing noise. A generalized least squares estimator is used to estimate
the direction of the optimal separating hyperplane. The estimated hyperplane is
shown to interpolate on the training data. While the direction vector can be
consistently estimated as could be expected from recent results in linear
regression, a naive plug-in estimate fails to consistently estimate the
intercept. A simple correction, that requires an independent hold-out sample,
renders the procedure minimax optimal in many scenarios. The interpolation
property of the latter procedure can be retained, but surprisingly depends on
the way the labels are encoded.
[LINK]
http://arxiv.org/abs/2210.14347v2
[DATE]
2023-03-29 03:04:44+00:00
[CATEGORIES]
cs.LG
Problems and shortcuts in deep learning for screening mammography
[AUTHORS]
Trevor Tsue, Brent Mombourquette, Ahmed Taha, Thomas Paul Matthews, Yen Nhi Truong Vu, Jason Su
[ABSTRACT]
This work reveals undiscovered challenges in the performance and
generalizability of deep learning models. We (1) identify spurious shortcuts
and evaluation issues that can inflate performance and (2) propose training and
analysis methods to address them.
We trained an AI model to classify cancer on a retrospective dataset of
120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693 UK exams
(5,655 cancers) acquired from 2011 to 2015.
We evaluated on a screening mammography test set of 11,593 US exams (102
cancers; 7,594 women; age $57.1 \pm 11.0$) and 1,880 UK exams (590 cancers; 1,745
women; age $63.3 \pm 7.2$). A model trained on images of only view markers (no
breast) achieved a 0.691 AUC. The original model trained on both datasets
achieved a 0.945 AUC on the combined US+UK dataset but paradoxically only 0.838
and 0.892 on the US and UK datasets, respectively. Sampling cancers equally
from both datasets during training mitigated this shortcut. A similar AUC
paradox (0.903) occurred when evaluating diagnostic exams vs screening exams
(0.862 vs 0.861, respectively). Removing diagnostic exams during training
alleviated this bias. Finally, the model did not exhibit the AUC paradox over
scanner models but still exhibited a bias toward Selenia Dimension (SD) over
Hologic Selenia (HS) exams. Analysis showed that this AUC paradox occurred when
a dataset attribute had values with a higher cancer prevalence (dataset bias)
and the model consequently assigned a higher probability to these attribute
values (model bias). Stratification and balancing cancer prevalence can
mitigate shortcuts during evaluation.
Dataset and model bias can introduce shortcuts and the AUC paradox,
potentially pervasive issues within the healthcare AI space. Our methods can
verify and mitigate shortcuts while providing a clear understanding of
performance.
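The AUC paradox can be reproduced on toy data. In the sketch below the numbers are invented for illustration and are not taken from the study: the model's scores carry no information within either subset, yet because subset A has a higher cancer prevalence and receives uniformly higher scores, the pooled AUC rises above both per-subset AUCs.

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data: subset A has 3/4 positives and all scores 0.9; subset B has 1/4 positives and all scores 0.1.
a_scores, a_labels = [0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]
b_scores, b_labels = [0.1, 0.1, 0.1, 0.1], [1, 0, 0, 0]

per_subset = (auc(a_scores, a_labels), auc(b_scores, b_labels))  # (0.5, 0.5): no discrimination
pooled = auc(a_scores + b_scores, a_labels + b_labels)           # 0.75: inflated by prevalence shift
```

Reporting per-subset AUCs alongside the pooled value, in the spirit of the stratified evaluation described above, exposes this gap.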
[LINK]
http://arxiv.org/abs/2303.16417v1
[DATE]
2023-03-29 02:50:59+00:00
[CATEGORIES]
cs.LG
[AUTHORS]
Shun Muroga, Yasuaki Miki, Kenji Hata
[ABSTRACT]
We present a multimodal deep learning (MDL) framework for predicting physical
properties of a 10-dimensional acrylic polymer composite material by merging
physical attributes and chemical data. Our MDL model comprises four modules,
including three generative deep learning models for material structure
characterization and a fourth model for property prediction. Our approach
handles an 18-dimensional complexity, with 10 compositional inputs and 8
property outputs, successfully predicting 913,680 property data points across
114,210 composition conditions. This level of complexity is unprecedented in
computational materials science, particularly for materials with undefined
structures. We propose a framework to analyze the high-dimensional information
space for inverse material design, demonstrating flexibility and adaptability
to various materials and scales, provided sufficient data is available. This
study advances future research on different materials and the development of
more sophisticated models, drawing us closer to the ultimate goal of predicting
all properties of all materials.
[COMMENTS]
38 pages, 17 figures, 1 table
[LINK]
http://arxiv.org/abs/2303.16412v1
[DATE]
2023-03-29 02:42:17+00:00
[CATEGORIES]
cs.LG