Challenge 2018

Background and Relevance for the Semantic Web community

The development of Web 2.0 has given users important tools and opportunities to create, participate in and populate blogs, review sites, web forums, social networks and online discussions. Tracking emotions and opinions on certain subjects makes it possible to identify users’ expectations, feelings, needs, reactions to particular events, political views on certain ideas, etc. Therefore, mining, extracting and understanding opinion data from the text that resides in online discussions is currently a hot topic for the research community and a key asset for industry.

The resulting discussions span a wide range of domains such as commerce, tourism, education and health. This content in turn feeds back into Web 2.0 itself, leading to exponential growth.

This explosion of activity and data has created several opportunities that can be exploited in both research and industry. One of them concerns the mining and detection of users’ opinions, which started back in 2003 (with the classical problem of polarity detection) and for which several variations have since been proposed. Open challenges remain that have raised interest within the scientific community, where new hybrid approaches are being proposed that bring substantial benefits by making use of new lexical resources, natural language processing techniques and Semantic Web best practices.

Computer World [1] estimates that 70%-80% of all digital data consists of unstructured content, much of which is locked away across a variety of different data stores, locations and formats. Moreover, accurately analyzing text is extremely difficult and still far from being solved. In fact, mining, detecting and assessing opinions and sentiments from natural language involves a deep (lexical, syntactic, semantic) understanding of most of the explicit and implicit, regular and irregular rules of a language.

Existing approaches mainly focus on identifying the parts of a text where opinions and sentiments are explicitly expressed, such as polarity terms, expressions and statements that express emotions. They usually adopt purely syntactical approaches and are heavily dependent on the source language and the domain of the input text. As a result, they miss many language patterns in which opinions can be expressed, because detecting these would require a deep analysis of the semantics of a sentence. Today, several tools exist that can help in understanding the semantics of a sentence. This offers an exciting research opportunity and challenge to the Semantic Web community as well. For example, sentic computing is a multi-disciplinary approach to natural language processing and understanding at the crossroads between affective computing, information extraction, and common-sense reasoning, which exploits both computer and human sciences to better interpret and process social information on the Web.

Therefore, the Semantic Sentiment Analysis Challenge looks for systems that can transform unstructured textual information into structured, machine-processable data in any domain by using recent advances in natural language processing, sentiment analysis and the Semantic Web.

By relying on large semantic knowledge bases, Semantic Web best practices and techniques, and new lexical resources, semantic sentiment analysis steps away from the blind use of keywords and from simple statistical analysis based on syntactical rules, and relies instead on the implicit semantic features associated with natural language concepts. Unlike purely syntactical techniques, semantic sentiment analysis approaches are able to detect sentiments that are implicitly expressed within the text and the topics those sentiments refer to, and are able to obtain higher performance than purely statistical methods.

[1] Computer World, 25 October 2004, Vol. 38, No. 43.

Submissions

Two-step submission

First step:

  • Abstract: no more than 200 words.
  • Paper (max 4 pages): containing the details of the system, including why the system is innovative, which features or functions the system provides, what design choices were made and what lessons were learned, how semantics has been employed and which tasks the system addresses. Industrial tools with non-disclosure restrictions are also allowed to participate, and in this case they are asked to:
    • explain, even at a high level, their approach and the macro-components of their engine, why it is innovative, and how semantics is involved;
    • provide free (even if limited) access to their engine for research purposes, especially to make the challenge results, or other experiments possibly included in their paper, repeatable.

Second step (for accepted systems only):

  • Paper (max 15 pages): full description of the submitted system.
  • Web Access: applications should be either accessible via the web or downloadable; otherwise, a RESTful API must be provided to run the challenge test set. If an application is not publicly accessible, a password must be provided for reviewers. A short set of instructions on how to use the application or the RESTful API must be provided as well.
  • The authors will have the possibility to present a poster and a demo advertising their work or networking during a dedicated session.

Please note that:

  • Papers must comply with the LNCS style.
  • Papers are submitted in PDF format via the EasyChair submission pages (remember to check the topic Challenge).
  • Accepted papers will be published by Springer.
  • Extended versions of the best systems will be invited to a journal special issue (yet to be determined).
  • All participants are invited to submit a paper covering the research aspects of their systems to the ESWC 2018 Workshop on Emotions, Modality, Sentiment Analysis and the Semantic Web (http://www.maurodragoni.com/research/opinionmining/events/).

Important Dates

  • Friday March 9th, 2018, 23:59 (CET): First step submission
  • Monday April 9th, 2018, 23:59 (CET): Notification of acceptance
  • Monday April 23rd, 2018, 23:59 (CET): Camera ready papers for the conference (5 pages)
  • Monday May 21st, 2018, 23:59 (CET): Test data published
  • Thursday May 24th, 2018, 23:59 (CET): Submission of test results
  • June 3rd – 7th, 2018: The Challenge takes place at ESWC-18
  • Friday July 6th, 2018: Camera ready paper for the challenge post proceedings (15 pages document, tentative deadline)

Challenge Criteria

This challenge focuses on the introduction, presentation, development and discussion of novel approaches to semantic sentiment analysis. Participants will have to design a semantic opinion-mining engine that exploits Semantic Web knowledge bases, e.g., ontologies, DBpedia, etc., to perform multi-domain sentiment analysis. The main motivation for this challenge is to go beyond a mere word-level analysis of natural language text and provide novel semantic tools and techniques that allow a more efficient passage from (unstructured) natural language to (structured) machine-processable data in potentially any domain.

The submitted systems must provide an output according to Semantic Web standards (RDF, OWL, etc.). Systems must have a semantic flavour (e.g., by making use of Linked Data or known semantic networks within their core functionalities) and authors need to show how the introduction of semantics improves the performance of their methods. Existing natural language processing methods or statistical approaches can be used too as long as the semantics plays a role within the core approach and improves the precision (engines based merely on syntax/word-count will be excluded from the competition). The target language is English and multi-language capability is a plus.

 

Tasks

The Semantic Sentiment Analysis Challenge is defined in terms of different tasks. The first task is elementary whereas the others are more advanced.

Task #1: Polarity Detection

The basic idea of this task is binary polarity detection, i.e., for each review of the evaluation dataset (test set), the goal is to detect its polarity value (positive OR negative). The participating semantic opinion-mining engines will be assessed according to precision, recall and F-measure computed on the confusion matrix of detected polarity values. In the final ranking to determine the winner, tools will be ordered according to the average of the F-measures obtained on each class. Participants can assume that there will be no neutral reviews. The output format for this task is the following:

Task #1

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Sentences>
    <sentence id="apparel_0">
        <text>
        GOOD LOOKING KICKS IF YOUR KICKIN IT OLD SCHOOL LIKE ME. AND COMFORTABLE. AND 
        RELATIVELY CHEAP. I'LL ALWAYS KEEP A PAIR OF STAN SMITH'S AROUND FOR WEEKENDS
        </text>
        <polarity>
        positive
        </polarity>
    </sentence>
    <sentence id="apparel_1">
        <text>
        These sunglasses are all right. They were a little crooked, but still cool..
        </text>
        <polarity>
        positive
        </polarity>
    </sentence>
</Sentences>

Input is the same without the polarity tag. The dataset will be composed of one million reviews collected from the Amazon web site and split into 20 different categories: Amazon Instant Video, Automotive, Baby, Beauty, Books, Clothing Accessories, Electronics, Health, Home Kitchen, Movies TV, Music, Office Products, Patio, Pet Supplies, Shoes, Software, Sports Outdoors, Tools Home Improvement, Toys Games, and Video Games. Each review has been classified (positive or negative) according to the guidelines used for the construction of the Blitzer dataset [2]. Participants will evaluate their systems by applying cross-fold validation over the dataset, where each fold is clearly delimited. The script to compute Precision, Recall, and F-Measure and the confusion matrix will be provided to participants through the website of the challenge.
Participants are asked to train their systems by using a combination of the word embeddings that have already been generated and made available by the organizers. The aim is not only to validate the quality of the systems (precision/recall analysis) but also to determine which combination of embeddings works best.
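
The ranking measure for Task #1, the average of the F-measures obtained on each class, can be sketched as follows. This is only an illustration under stated assumptions and not the official evaluation script distributed by the organizers: the function names and the label strings "positive" and "negative" are chosen for this example, and a class with no predictions is given a score of zero.

```python
from collections import Counter

def per_class_scores(gold, predicted, labels=("positive", "negative")):
    """Return {label: (precision, recall, f_measure)} for paired label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Confusion matrix stored as counts of (gold label, predicted label) pairs.
    confusion = Counter(zip(gold, predicted))
    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores

def average_f_measure(gold, predicted, labels=("positive", "negative")):
    """Average the per-class F-measures (assumed ranking measure)."""
    scores = per_class_scores(gold, predicted, labels)
    return sum(f for _, _, f in scores.values()) / len(labels)

if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "negative"]
    predicted = ["positive", "negative", "negative", "negative"]
    print(round(average_f_measure(gold, predicted), 4))  # 0.7333
```

In this toy run the positive class reaches an F-measure of about 0.667 and the negative class 0.8, so the average is about 0.733.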

[2] Blitzer J., Dredze M., Pereira F. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Association for Computational Linguistics (ACL), 2007.

Task #2: Polarity Detection in presence of metaphorical language

The basic idea of this task is polarity detection (positive, negative or neutral) for tweets containing figurative expressions such as irony, metaphor and sarcasm. The proposed semantic opinion-mining engines will be assessed according to precision, recall and F-measure computed on the confusion matrix of detected polarity values (positive OR negative) for each tweet of the evaluation dataset. The output format for this task is the following:

Task #2

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Sentences>
    <sentence id="apparel_0">
        <text>
        I just love working for 6.5 hours without a break or anything. 
        Especially when I'm on my period and have awful cramps.
        </text>
        <polarity>
        negative
        </polarity>
    </sentence>
    <sentence id="apparel_1">
        <text>
        I literally love Stephen A smith haha he's hilarious
        </text>
        <polarity>
        positive
        </polarity>
    </sentence>
</Sentences>

Input is the same without the polarity tag. The dataset will be composed of three thousand tweets collected from Twitter and already classified with [positive, negative, neutral] polarity values. The manual annotation of each tweet will be performed using Crowdflower [3]. The script to compute Precision, Recall, and F-Measure will be provided to participants through the website of the challenge.
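
As an illustration of how a file in the output format shown above might be read, the following minimal sketch uses Python's standard xml.etree.ElementTree module to map each sentence id to its polarity value. The function name read_polarities and the sample document are assumptions made for this example; whitespace around the polarity text is stripped because the format places the value on its own line.

```python
import xml.etree.ElementTree as ET

def read_polarities(xml_text):
    """Map each sentence id to its stripped polarity label (None if absent)."""
    root = ET.fromstring(xml_text)
    polarities = {}
    for sentence in root.iter("sentence"):
        value = sentence.findtext("polarity")
        polarities[sentence.get("id")] = value.strip() if value is not None else None
    return polarities

# Hypothetical sample following the structure of the challenge output format.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Sentences>
    <sentence id="s_0">
        <text>
        First example text.
        </text>
        <polarity>
        negative
        </polarity>
    </sentence>
    <sentence id="s_1">
        <text>
        Second example text.
        </text>
    </sentence>
</Sentences>
"""

if __name__ == "__main__":
    print(read_polarities(SAMPLE))
```

A sentence without a polarity tag, as in the input files, is mapped to None, so the same reader can be applied to both input and output documents.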

[3] https://www.crowdflower.com/

Task #3: Aspect-Based Sentiment Analysis

The output of this task will be a set of aspects of the reviewed product and a binary polarity value associated with each of these aspects. So, for example, while for Task #1 an overall polarity (positive or negative) is expected for a review about a mobile phone, this task requires a set of aspects (such as 'speaker', 'touchscreen', 'camera', etc.) and a polarity value (positive OR negative) associated with each of these aspects. Engines will be assessed on both aspect extraction and aspect polarity detection using precision, recall and F-measure, as was done in the first Concept-Level Sentiment Analysis Challenge held during ESWC2014 and re-proposed at SemEval 2015 Task 12 [4].

[4] http://alt.qcri.org/semeval2015/task12/

The output format for such a task is the following:

Task #3

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Review rid="1">
        <sentences>
            <sentence id="348:0">
                <text>Most everything is fine with this machine: speed, capacity, build.</text>
                <Opinions>
                    <Opinion aspect="MACHINE" polarity="positive"/>
                </Opinions>
            </sentence>
            <sentence id="348:1">
                <text>The only thing I don't understand is that the resolution of the
                 screen isn't high enough for some pages, such as Yahoo!Mail.
                </text>
                <Opinions>
                    <Opinion aspect="SCREEN" polarity="negative"/>
                </Opinions>
            </sentence>
            <sentence id="277:2">
                <text>The screen takes some getting use to, because it is smaller
                 than the laptop.</text>
                <Opinions>
                    <Opinion aspect="SCREEN" polarity="negative"/>
                </Opinions>
            </sentence>
        </sentences>
    </Review>

Input is the same without the Opinions tag and its descendant nodes. As the training set, we will use the dataset provided by the last two editions of SemEval; as the test set, we will extract around 100 sentences from the web and annotate their aspects and the related polarity. Two experts will annotate the sentences and disagreements will be analyzed. Precision, Recall and F-Measure will be computed with respect to the extraction of concepts and the computation of their polarity. The script to compute Precision, Recall, and F-Measure will be provided to participants through the website of the challenge.
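
One possible way to score aspect extraction and aspect polarity detection is sketched below. It is an assumption-laden illustration rather than the official script: opinions are represented as (sentence id, aspect, polarity) triples, aspect extraction is scored on exact matches of (sentence id, aspect) pairs, and polarity is scored on exact matches of the full triples. The official evaluation may use different matching rules.

```python
def precision_recall_f(gold, predicted):
    """Precision, recall and F-measure of a predicted set against a gold set."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def evaluate_aspects(gold_opinions, predicted_opinions):
    """Score aspect extraction and aspect polarity separately.

    Each opinion is a (sentence_id, aspect, polarity) triple.
    """
    gold_aspects = {(sid, aspect) for sid, aspect, _ in gold_opinions}
    predicted_aspects = {(sid, aspect) for sid, aspect, _ in predicted_opinions}
    return {
        "aspect": precision_recall_f(gold_aspects, predicted_aspects),
        "polarity": precision_recall_f(gold_opinions, predicted_opinions),
    }

if __name__ == "__main__":
    gold = [
        ("348:0", "MACHINE", "positive"),
        ("348:1", "SCREEN", "negative"),
        ("277:2", "SCREEN", "negative"),
    ]
    predicted = [
        ("348:0", "MACHINE", "positive"),
        ("348:1", "SCREEN", "positive"),  # aspect found, polarity wrong
    ]
    for name, scores in evaluate_aspects(gold, predicted).items():
        print(name, [round(s, 3) for s in scores])
```

Here both predicted aspects are correct but one of three gold aspects is missed, and only one predicted triple carries the right polarity, so the polarity scores are lower than the aspect scores.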

 

Judging and Prizes

During the competition, the new test sets will be released and participants will have to send their output within the next 4 days. The chairs will run the evaluation scripts on those results and the related annotated test sets to compute precision, recall and F-measure, and to produce a scoreboard of the systems for each task.

One award will be given for each task (the winner of each task will be the one with the highest score in precision-recall analysis) and one more award will be given for the most innovative approach (the system with the best use of common-sense knowledge and semantics and innovative nature of approach).

The awards will consist of Springer vouchers and a cash prize (depending on sponsor availability).

In addition, each challenge paper will be included in a Springer book, as was done for the 2014, 2015, 2016, and 2017 editions of the challenge.

Organizers

  • Mauro Dragoni, Fondazione Bruno Kessler, Trento, Italy. dragoni@fbk.eu
    Mauro Dragoni has been a Post Doctoral Researcher at Fondazione Bruno Kessler in Trento since 2011. He received his Ph.D. degree in Computer Science from the Università degli Studi di Milano in 2010, and his major research interests concern the Computational Intelligence and Knowledge Management fields applied to Information Retrieval, Ontology Matching, and Sentiment Analysis. In particular, he focuses on applying state-of-the-art research paradigms to the implementation of real-world knowledge management systems. He co-organized the ESWC 2015 edition of the Challenge on Concept-Based Sentiment Analysis and participated in the 2014 edition, winning the awards for Most Innovative System and Best Performer on the Aspect-Based Sentiment Analysis task.
  • Erik Cambria, Nanyang Technological University, Singapore. cambria@ntu.edu.sg
    Erik Cambria received his BEng and MEng with honors in Electronic Engineering from the University of Genoa in 2005 and 2008, respectively. In 2012, he was awarded his PhD in Computing Science and Mathematics following the completion of an EPSRC project in collaboration with MIT Media Lab, which was selected as impact case study by the University of Stirling for the UK Research Excellence Framework (REF2014). After working at HP Labs India, Microsoft Research Asia, and NUS Temasek Labs, in 2014 Dr Cambria joined the School of Computer Science and Engineering at NTU as Assistant Professor. His current affiliations also include Rolls-Royce@NTU, A*STAR IHPC, MIT Synthetic Intelligence Lab, and the Brain Sciences Foundation. He is Associate Editor of Elsevier KBS and IPM, Springer AIRE and Cognitive Computation, IEEE CIM, and Editor of the IEEE IS Department on Affective Computing and Sentiment Analysis. Dr Cambria is also recipient of several awards, e.g., the Temasek Research Fellowship, and is involved in many international conferences as Workshop Organizer, e.g., ICDM and KDD, PC Member, e.g., AAAI and ACL, Program and Track Chair, e.g., ELM and FLAIRS, and Keynote Speaker, e.g., CICLing.

Program Committee

  • Aldo Gangemi, University of Paris13 and CNR (France and Italy)
  • Valentina Presutti, CNR (Italy)
  • Malvina Nissim, University of Bologna (Italy)
  • Hassan Saif, Open University (UK)
  • Rada Mihalcea, University of North Texas (USA)
  • Ping Chen, University of Houston-Downtown (USA)
  • Yongzheng Zhang, LinkedIn Inc. (USA)
  • Giuseppe Di Fabbrizio, Amazon Inc. (USA)
  • Soujanya Poria, Nanyang Technological University (Singapore)
  • Yunqing Xia, Tsinghua University (China)
  • Rui Xia, Nanjing University of Science and Technology (China)
  • Jane Hsu, National Taiwan University (Taiwan)
  • Rafal Rzepka, Hokkaido University (Japan)
  • Amir Hussain, University of Stirling (UK)
  • Alexander Gelbukh, National Polytechnic Institute (Mexico)
  • Bjoern Schuller, Technical University of Munich (Germany)
  • Amitava Das, Samsung Research India (India)
  • Dipankar Das, National Institute of Technology (India)
  • Stefano Squartini, Marche Polytechnic University (Italy)
  • Cristina Bosco, University of Torino (Italy)
  • Paolo Rosso, Technical University of Valencia (Spain)
  • Sergio Consoli, Philips Research (Netherlands)

Mailing List

To ask questions or request information, please join our Google Group (https://groups.google.com/forum/#!forum/semantic-sentiment-analysis). After you join the group, you can post messages to the topic “ESWC2018 Semantic Sentiment Analysis Challenge”.