CFP

PaDAWan 2024: Portuguese Data Augmentation Workshop

STIL

Belém, Brazil, November 17-21, 2024

Submission link	https://easychair.org/conferences/?conf=padawan2024
Abstract registration deadline	September 11, 2024
Submission deadline	September 11, 2024

Topics: data augmentation, Portuguese, large language models

********************************************************

PaDAWan 2024: 1st Portuguese Data Augmentation Workshop (PaDAWan)

Belém, Pará, Brazil

co-located with STIL 2024

November 17th to 21st, 2024

1st Call for Papers

https://sites.google.com/view/padawan-2024/

********************************************************

The Portuguese Data Augmentation Workshop (PaDAWan) aims to bring together the community working on data augmentation for Portuguese, particularly with Large Language Models (LLMs).

With the advancement of LLMs, many traditional Natural Language Processing (NLP) tasks are being revisited. A long-standing challenge is gathering high-quality data for training and evaluating models on specific tasks, which has often been the main bottleneck in developing machine learning models. Data augmentation has become a crucial technique for enhancing the performance of these models across various tasks, especially when reliable data are limited. With LLMs, it has become feasible to apply sophisticated text data augmentation techniques effectively.

The direct use of LLMs is still restricted by several factors, including cost, privacy concerns, and latency. In this scenario, using LLMs to generate synthetic data for training classical models on specific tasks is a viable approach. Moreover, while much work in industry relies on synthetic data, scientific discussions of methods and evaluation are not always aligned with market needs.

This workshop aims to explore the use of LLMs for data augmentation, including possible methods, evaluation techniques, and associated ethical considerations. The goal is to bring together industry professionals and academics for an in-depth discussion of the topic.

We invite researchers to submit papers that discuss challenges and advances in Portuguese data generation, including but not limited to the following topics:

Data creation and data labeling
Data reformation and anonymization
Data contamination and noise
Co-annotation
Augmented data evaluation and controlled data augmentation
Ethics in generated data and unbiased data generation
Practical applications or case studies of data augmentation techniques
Challenges in Portuguese Synthetic/Augmented Data

*Submissions*

We invite both unpublished work, to be published in a special section of the STIL Proceedings, and lightning talk proposals highlighting already published work.

Submission deadline: September 10th.

For more information, please access: https://sites.google.com/view/padawan-2024/

For any questions, please write to padawan.workshop@gmail.com

Livy Real, Evandro Fonseca, Paula Cardoso