
Towards Sentiment Analysis for Romanian Twitter Content

Publication: Article, Other literature type
Date: 28 Sep 2022
Publisher: MDPI AG
Journal: Algorithms, volume 15, page 357, eissn: 1999-4893

Authors: Dan Claudiu Neagu; Andrei Bogdan Rus; Mihai Grec; Mihai Augustin Boroianu; Nicolae Bogdan; Attila Gal;

doi: 10.3390/a15100357



Abstract

With the increased popularity of social media platforms such as Twitter or Facebook, sentiment analysis (SA) over the microblogging content becomes of crucial importance. The literature reports good results for well-resourced languages such as English, Spanish or German, but open research space still exists for underrepresented languages such as Romanian, where there is a lack of public training datasets or pretrained word embeddings. The majority of research on Romanian SA tackles the issue in a binary classification manner (positive vs. negative), using a single public dataset which consists of product reviews. In this paper, we respond to the need for a media surveillance project to possess a custom multinomial SA classifier for usage in a restrictive and specific production setup. We describe in detail how such a classifier was built, with the help of an English dataset (containing around 15,000 tweets) translated to Romanian with a public translation service. We test the most popular classification methods that could be applied to SA, including standard machine learning, deep learning and BERT. As we could not find any results for multinomial sentiment classification (positive, negative and neutral) in Romanian, we set two benchmark accuracies of ≈78% using standard machine learning and ≈81% using BERT. Furthermore, we demonstrate that the automatic translation service does not downgrade the learning performance by comparing the accuracies achieved by the models trained on the original dataset with the models trained on the translated data.
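As an illustration of what a three-class (positive, negative, neutral) classifier of the "standard machine learning" kind can look like, here is a minimal sketch. It is not the authors' pipeline: the abstract does not name the specific algorithms, so multinomial Naive Bayes is assumed here purely as a familiar baseline, and the class name `NaiveBayesSentiment`, the whitespace tokenizer and the six English toy sentences are all hypothetical, invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tweet preprocessing."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed baseline)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / self.total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the reference label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy data, invented for illustration only (not from the paper)
train_texts = [
    "great service love it",
    "love this great phone",
    "terrible delay hate it",
    "hate this awful service",
    "flight departs at noon",
    "meeting scheduled at noon today",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
```

Calling `model.predict(...)` on a new sentence returns one of the three labels, and `accuracy(...)` can be computed separately for a model trained on original text and one trained on translated text, which is the kind of comparison the abstract describes.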

Related Organizations

Babeș-Bolyai University, Romania
Technical University of Cluj-Napoca, Romania

Keywords

Twitter; deep learning; machine learning; sentiment analysis; underrepresented language; natural language processing
Subject classes: T55.4-60.8 (Industrial engineering. Management engineering); QA75.5-76.95 (Electronic computers. Computer science)

Related research products (3)

- deap, software on GitHub (IsRelatedTo)
- RoWordNet, software on GitHub (IsRelatedTo)
- lime, software on GitHub (IsRelatedTo)

Impact by BIP!

- citations: 6. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.