Open Research Data in the Field of Phraseology


The aim of this paper is to investigate the concept of open research data in phraseology. The key factors of open science within the European Union are digital technology, a belief in the free circulation and criticism of ideas, and researchers' consideration of the role of data. Digital technology now enables fast exchange and new ways of sharing and accessing data. At present, research data in the field of phraseology are usually exchanged through publications (e.g. research articles, dictionaries). The complexity of the form, meaning, and usage of phrasemes makes them a challenge to describe in lexicographic sources. So far, open research data exist in the field of linguistics in general, but not for phraseology in particular.

The main research questions are: What are open research data in the field of phraseology? Which metadata elements are important for phraseology in the context of openness?


The analysis was based on approaches in the field of phraseology (1), on the requirements defined by the FAIR principles, and on the generic tasks performed by the user: finding, identifying, selecting, and reusing data. The generic tasks derive from the FRBR (Functional Requirements for Bibliographic Records) concept.

In the digital environment, a phraseme is represented by a digital object, which is described by metadata elements. These elements are identified and analyzed on two levels: the first refers to the scientific content, and the second to its digital representation.

The corpus for this research consists of German and Croatian phrasemes from the language of fashion and football.

Results and Discussion

Phrasemes are multiword combinations of various forms whose constituents create a new meaning (e.g. as cool as a cucumber, someone’s right hand, under one roof, to be under the weather). Their main features, besides polylexicality, are stability and idiomaticity, i.e. their form is fixed and their meaning is figurative. Moreover, they can be used in stylistically varied contexts and can create or contribute to the expressivity of texts (2). Phrasemes occur in various text types, e.g. journalistic texts, literature, and slogans, and are also frequent in spoken language. They are considered a challenge in foreign language teaching, and especially in translation and transcultural studies, because they are usually described as culturally specific. Some phrasemes in different languages share an origin (e.g. the Bible, folk tales, fables), but most of them are language specific.

Considering phrasemes as data means that they are, just like other lexemes of a language, the result of writing down what has been heard, read, or written. They can be recorded in monolingual, bilingual, or multilingual dictionaries as well as in corpora, where they are described according to their meaning and stylistic markedness; in some cases the context of their use and its source are also given. Besides dictionaries, the existence of phrasemes as data can be confirmed in literary works, magazines, newspapers, other types of texts, and everyday speech. With regard to their figurative meaning, some phrasemes can be linked to concepts such as space or time. They can also serve to entertain, to soften or intensify a negative meaning, or to present something vividly. Phrasemes derive from the way of life in a certain period of human history, and from cultural specificity, beliefs, or customs.

Open research data are the results of scientific research that can be freely accessed in digital form, are published in a machine-readable form, and can be reused. According to Pampel and Dallmeier-Tiessen (3), open research data are available on the Internet, and users can access, copy, analyze, re-process, and use them for any purpose. An important aspect of open research data is compliance with the FAIR principles: the data should be findable, accessible, interoperable, and reusable (4). Sharing research data involves various users and requirements such as searchability, availability, and usage (5). The importance of metadata for open research data is emphasized in the guidelines of various countries and research groups, as well as in scientific research. Metadata are used to present all data related to the content (e.g. what the object includes), the context (e.g. who made the object), and the structure (e.g. information about the object) (6). In order to access phraseological data or a group of data in the digital environment, the data need to be described with the appropriate metadata.
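As a minimal illustration of what "machine-readable" metadata grouped by content, context, and structure might look like, the following Python sketch serializes one description to JSON and reads it back. The key names, the helper functions, and all values are assumptions made for this example and are not taken from the research itself.

```python
import json

# Illustrative record only: every key and value below is an assumption.
record = {
    # content: what the object includes
    "content": {
        "phraseme": "to be under the weather",
        "meaning": "to feel slightly ill",
    },
    # context: who made the object
    "context": {
        "creator": "Example Research Group",
        "source": "example corpus",
    },
    # structure: information about the object itself
    "structure": {
        "format": "application/json",
        "version": "1.0",
    },
}


def to_machine_readable(rec: dict) -> str:
    """Serialize a metadata record to a JSON string."""
    return json.dumps(rec, ensure_ascii=False, indent=2, sort_keys=True)


def from_machine_readable(text: str) -> dict:
    """Parse a JSON string back into a metadata record."""
    return json.loads(text)


serialized = to_machine_readable(record)
restored = from_machine_readable(serialized)
```

A plain, widely supported format such as JSON is one possible choice here; it lets other researchers copy, analyze, and re-process the description without special software.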

The research identified and described the initial metadata elements that can support the exchange and search of phraseological data in digital environments. The elements can be divided into two categories: research and digital representation. The research category consists of four groups of elements: basic, contextual, methodological, and specific. The basic elements comprise the persistent identifier, the author/organization, the source, the phraseme, its meaning, its structure, the phraseological class, the degree of idiomaticity, the degree of motivation, modification, semantic fields, stylistic markedness, and equivalents in different languages. Contextual elements refer to the type of text and the topic. Methodological elements refer to the descriptive and contrastive methods, as well as to the Appraisal Theory approach of Systemic Functional Linguistics, all of which were used in the research on the phrasemes of fashion and football language. Specific elements, which depend on the investigated corpus, comprise the position in the text, the producer of the phraseme, the object described with the phraseme, the behaviour described with the phraseme, loanwords as components, and the emotions expressed with the phraseme. The second category, the digital representation, refers to the datastream and to elements related to the version, the organization, legal information, and access rights.
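The two categories described above could be sketched as a simple data structure. The following Python example is only an illustration under stated assumptions: the class names, field names, the selection of fields, the `find_by_semantic_field` helper, and every value in the sample record are hypothetical and do not reproduce the authors' actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ResearchElements:
    """Research category: a subset of the basic and contextual elements."""
    persistent_identifier: str
    author: str
    source: str
    phraseme: str
    meaning: str
    phraseological_class: Optional[str] = None
    semantic_fields: list[str] = field(default_factory=list)
    stylistic_markedness: Optional[str] = None
    # Assumed shape: language code mapped to the equivalent phraseme.
    equivalents: dict[str, str] = field(default_factory=dict)
    text_type: Optional[str] = None  # contextual element
    topic: Optional[str] = None      # contextual element


@dataclass
class DigitalRepresentation:
    """Digital representation category."""
    datastream: str
    version: str
    organization: str
    legal_information: str
    access_rights: str


@dataclass
class PhrasemeRecord:
    research: ResearchElements
    digital: DigitalRepresentation


def find_by_semantic_field(records, semantic_field):
    """Return the records tagged with the given semantic field."""
    return [r for r in records if semantic_field in r.research.semantic_fields]


# Sample record; all values are placeholders invented for illustration.
record = PhrasemeRecord(
    research=ResearchElements(
        persistent_identifier="example:phraseme-0001",
        author="Example Author",
        source="example corpus",
        phraseme="under one roof",
        meaning="in the same building or home",
        semantic_fields=["home"],
        equivalents={"de": "unter einem Dach"},
        text_type="journalistic text",
    ),
    digital=DigitalRepresentation(
        datastream="phraseme-0001.json",
        version="1.0",
        organization="Example Organization",
        legal_information="example licence",
        access_rights="open",
    ),
)

matches = find_by_semantic_field([record], "home")
```

Keeping the research elements separate from the digital representation mirrors the two categories in the text, and `asdict(record)` yields a nested dictionary that can be exported in a machine-readable form.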


This investigation shows that phrasemes can be analyzed as open research data. They have characteristics and properties that make them suitable for exchange among researchers in the field of phraseology. Basic categories and groups of metadata elements were identified. Further investigation will include the evaluation of the results by other researchers and users.

  • Anita Pavić Pintarić
  • Neven Pintarić