PaDAWan 2024: Portuguese Data Augmentation Workshop STIL Belém, Brazil, November 17-21, 2024 |
Submission link | https://easychair.org/conferences/?conf=padawan2024 |
Abstract registration deadline | September 11, 2024 |
Submission deadline | September 11, 2024 |
********************************************************
PaDAWan 2024: 1st Portuguese Data Augmentation Workshop (PaDAWan)
Belém, Pará, Brazil
collocated with STIL 2024
November 17th to 21st, 2024
1st Call for Papers
https://sites.google.com/view/padawan-2024/
********************************************************
The Portuguese Data Augmentation Workshop (PaDAWan) aims to gather the community working on Data Augmentation, particularly employing Large Language Models (LLMs), in Portuguese.
With the advancement of LLMs, many traditional Natural Language Processing (NLP) tasks are being revisited. One traditional key challenge is gathering high-quality data for training and evaluating specific tasks. This has often been the main bottleneck in developing machine learning models. Data augmentation has become a crucial technique for enhancing the performance of these models across various tasks, especially when reliable data are limited. Nowadays, particularly with the use of LLMs, it has become feasible to apply sophisticated text data augmentation techniques effectively.
The use of LLMs is still heavily restricted by several factors, such as cost, privacy concerns, and latency, among other challenges. Given this scenario, using LLMs to generate synthetic data to train classical models for specific tasks is a viable approach. Moreover, while many industry efforts rely on synthetic data, scientific discussions on methods and evaluation are not always aligned with market needs.
This workshop aims to examine the use of LLMs for data augmentation, exploring possible methods, evaluation techniques, and associated ethical considerations. The goal is to bring together industry professionals and academics for an in-depth discussion of the topic.
We invite researchers to submit papers that discuss challenges and advances in Portuguese data generation, including but not limited to the following topics:
- Data creation and data labeling
- Data reformation and anonymization
- Data contamination and noise
- Co-annotation
- Augmented data evaluation and controlled data augmentation
- Ethics in generated data and unbiased data generation
- Practical applications or case studies of data augmentation techniques
- Challenges in Portuguese Synthetic/Augmented Data
*Submissions*
We invite both unpublished work, to be published in a special section of the STIL Proceedings, and lightning talk proposals highlighting already published work.
Submission deadline: September 10th.
For more information, please access: https://sites.google.com/view/padawan-2024/
For any questions, please write to padawan.workshop@gmail.com
Livy Real, Evandro Fonseca, Paula Cardoso