Varieties in Contact/Varietà in Contatto/Varietäten im Kontakt

The acronym VinKo stands for Varieties in Contact, a research project of the Universities of Verona, Trento and Bolzano-Bozen (2018-2022) that laid the foundation for the AlpiLinK project. VinKo provided a research infrastructure for documenting and analyzing dialects and minority languages in northeastern Italy (regions of Trentino-South Tyrol and Veneto). The varieties in contact were the Tyrolean dialects and the German minority languages Mòcheno, Cimbrian, Saurano and Sappadino on the one hand and the Venetan or Veneto-Lombard and Ladin varieties on the other. The infrastructure included an Internet platform for the collection and representation of linguistic data, a repository for the permanent preservation of data, and various public outreach initiatives, e.g. a project with Venetan secondary schools called VinKiamo.

The goal of VinKo was to enable comparisons of German and Italian language structures. So far, systematic geolinguistic analyses of variation are available on the morphology of articles and pronouns (Kruijt 2022), on subject expletives with weather verbs (Tomaselli & Bidese 2023), and on the expletive article with personal names (Rabanus 2023). For example, to investigate the use of a definite article with a personal name (e.g. the John, the Mary, instead of John and Mary), the following sentence may be used: In diesem Saal ist Maria die schönste/In questa stanza Maria è la più bella (S0015).

The Tyrolean dialect of Völs am Schlern translates this sentence as In dem Saal isch die Maria die schianschte.


In the Venetan dialect of Grezzana in the province of Verona this is translated as: In questo locale la Maria l’è la più bela.


Using the VinKo data, Rabanus (2023) shows that personal names take an expletive article in about 74% of cases in the province of Trento, but in only about 22% of cases in the region of Veneto. Interestingly, the border between the provinces of Trento and Verona thus matters more for the use of the expletive article than the border between the provinces of Trento and Bolzano (where expletive articles occur in about 90% of cases), which corresponds to the German-Italian language border.

Data collection via crowdsourcing

VinKo’s data collection was carried out by crowdsourcing via a multilingual (German-Italian-English) Internet platform, where participants could provide linguistic data by recording audio responses to an online linguistic questionnaire. The questionnaire comprised three tasks: the pronunciation of dialect words; the translation of sentences from the standard language (German or Italian) into dialect or minority language, some presented in isolation and some embedded in picture stories; and free speech production by completing stimulus sentences related to the content of the picture stories (for details of data collection, see Kruijt, Cordin & Rabanus 2023). By January 2023, 186,135 audio recordings of sentences or single words produced by 1,392 informants from 377 different locations had been collected in this way. Collaboration between the local linguistic communities, the general public and the research team is a vital part of the crowdsourcing methodology. For the language communities, the project was intended to help increase their linguistic and cultural self-awareness by documenting dialects and minority languages as intangible cultural heritage in the digital space. For the general public, an open access map provided a way to discover local linguistic diversity and to gain an awareness and appreciation of the intangible cultural heritage of the Triveneto area. The map is still available via the AlpiLinK website, in the “Listen and Explore” section.

FAIR research data: repository

For the research community, the data has been archived in the VinKo Corpus (Rabanus et al. 2023), an open access repository (license: CC BY-NC-SA 4.0 International) that aims to adhere as closely as possible to the FAIR data principles. The data has been reorganized and archived in this external database, independently of the university, in order to ensure long-term preservation and accessibility of the material even after the end of the project. The repository is hosted at the Eurac Research Clarin Centre (ERCC), based in Bolzano-Bozen (South Tyrol). As part of the European CLARIN infrastructure, the centre adheres to well-defined international standards for data curation.

This means that the examples cited above can be tied directly to the audio data using their respective IDs, S0015_tir_U0372 (Völs am Schlern) and S0015_vec_U0556 (Grezzana). For the logic of the designations and for finding the sound recordings in the repository, see Kruijt, Rabanus & Tagliani (2023, pp. 212-214). At the time of writing, the VinKo Corpus contains 13,617 audio files for Trentino dialects, 518 for Mòcheno, 743 for Cimbrian, 194 for Gardenese Ladin, 1,242 for Badiot Ladin, 499 for Fassan Ladin, 2,455 for Fodom Ladin, 1,680 for Anpezan Ladin, 305 for Saurano, 90,918 for Venetan dialects, and 13,294 for Tyrolean dialects.
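As a rough illustration of how such recording IDs could be handled programmatically, the following Python sketch splits an ID into three underscore-separated parts. The pattern is inferred only from the two IDs quoted above and is not the corpus's documented scheme: the first part matches the sentence number given earlier (S0015), while reading the second part as a variety code and the third as a speaker code is an assumption. The names RecordingId and parse_recording_id are hypothetical. For the actual logic of the designations, Kruijt, Rabanus & Tagliani (2023) remains the reference.

```python
import re
from typing import NamedTuple


class RecordingId(NamedTuple):
    """Hypothetical container for the three parts of a recording ID."""
    stimulus: str  # sentence number, e.g. "S0015"
    variety: str   # assumed variety code, e.g. "tir" or "vec"
    speaker: str   # assumed speaker code, e.g. "U0372"


# Assumed pattern, inferred from the two example IDs only:
# "S" + 4 digits, 3 lowercase letters, "U" + 4 digits.
_ID_PATTERN = re.compile(r"^(S\d{4})_([a-z]{3})_(U\d{4})$")


def parse_recording_id(raw: str) -> RecordingId:
    """Split an ID such as 'S0015_tir_U0372' into its three parts."""
    match = _ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unexpected recording ID format: {raw!r}")
    return RecordingId(*match.groups())


if __name__ == "__main__":
    for raw in ("S0015_tir_U0372", "S0015_vec_U0556"):
        print(parse_recording_id(raw))
```

Grouping parsed IDs by their first part would then collect all recorded translations of one stimulus sentence, provided the real IDs follow the assumed pattern.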

The data collected through traditional fieldwork in the AThEME project, in which the first prototype of VinKo was developed (cf. Cordin et al. 2019), have also been made accessible in a repository in the context of VinKo. The AThEME Verona-Trento Corpus (Tomaselli et al. 2022) has exactly the same structure as the VinKo Corpus and therefore allows the data to be retrieved, analyzed and cited in exactly the way described for the VinKo Corpus. For an illustration of its usage and a comparison of the validity of data collected in traditional fieldwork and via crowdsourcing, see Kruijt, Cordin & Rabanus (2023).


  • Cordin, Patrizia, Stefan Rabanus, Birgit Alber, Antonio Mattei, Jan Casalicchio, Alessandra Tomaselli, Ermenegildo Bidese & Andrea Padovan (2019): VinKo. 2. In Thomas Krefeld & Roland Bauer (eds.): Lo spazio comunicativo dell’Italia e delle varietà italiane, Version 67. Korpus im Text.
  • Kruijt, Anne (2022): Crowdsourcing language contact: pronoun and article morphology in Trentino-South Tyrol and Veneto. PhD Dissertation, University of Verona.
  • Kruijt, Anne, Patrizia Cordin & Stefan Rabanus (2023): On the validity of crowdsourced data. In Elissa Pustka, Carmen Quijada Van den Berghe & Verena Weiland (eds.): Corpus Dialectology. Amsterdam/Philadelphia: Benjamins, 10-33.
  • Kruijt, Anne, Stefan Rabanus & Marta Tagliani (2023): The VinKo-Corpus: Oral data from Romance and Germanic local varieties of Northern Italy. In Marc Kupietz & Thomas Schmidt (eds.): Neue Entwicklungen in der Korpuslandschaft der Germanistik: Beiträge zur IDS-Methodenmesse 2022. (= Korpuslinguistik und interdisziplinäre Perspektiven auf Sprache (CLIP) 11). Tübingen: Narr, 203-212.
  • Rabanus, Stefan (2023): Nome di battesimo e articolo espletivo – crowdsourcing e cartografia linguistica nello studio della variazione linguistica in Trentino-Alto Adige e Veneto. In Robert Schöntag & Laura Linzmeier (eds.): Neue Ansätze und Perspektiven zur sprachlichen Raumkonzeption und Geolinguistik. Frankfurt: Lang.
  • Rabanus, Stefan, Anne Kruijt, Marta Tagliani, Alessandra Tomaselli, Andrea Padovan, Birgit Alber, Patrizia Cordin, Roberto Zamparelli & Barbara Maria Vogt (2023): VinKo (Varieties in Contact) Corpus v1.2. Bolzano-Bozen: ERCC.
  • Tomaselli, Alessandra & Ermenegildo Bidese (2023): Fortune and decay of lexical expletives in Germanic and Romance along the Adige River. Languages 8(1), 44.
  • Tomaselli, Alessandra, Anne Kruijt, Birgit Alber, Ermenegildo Bidese, Jan Casalicchio, Patrizia Cordin, Joachim Kokkelmans, Andrea Padovan, Stefan Rabanus & Francesco Zuin (2022): AThEME Verona-Trento Corpus. Bolzano-Bozen: ERCC.