Discourse relations are hard to define, since they lie on the border between semantics and pragmatics. As machine learning and data mining flourish across a number of disciplines, and corpus processing and analysis are gradually accepted as a standard part of linguistic research, this paper aims to point out the importance of integrating techniques from both fields in order to enhance our understanding of what discourse relations are.
First, I will present Segmented Discourse Representation Theory (SDRT) [Asher 1993, Asher & Lascarides 2003], a well-founded theory of discourse semantics on which C58, the first corpus annotated for discourse relations in Greek, is based. SDRT is one of the most well-articulated discourse semantic theories and defines clear formal criteria for the identification of discourse relations. The following section describes the main characteristics of the corpus C58. After a short ‘excursion’ into the annotation principles followed for the two relations, Commentary and Elaboration, the next two sections are devoted to two classification models applied to a subset of C58, namely the case of two related utterances whose verbs are both transitive, which will be called here the transitive-transitive case. Apart from being the most frequent case in C58, the transitive-transitive case has been chosen mainly because it offers the largest number of possible predictors, owing to the verbal valency in both utterances: the thematic roles of the subject and the object in the first and in the second utterance. Other independent variables that enter the two models are the grammatical aspect of the utterances’ verbs, the distance between the related utterances, and the subject type (null vs. overt) of the verbs.