What is a Corpus?
Essentially, a corpus is a database of words. A corpus is not the same as a dictionary. When a user searches for a word in a corpus, they can see many examples of that word in its original context, as it was used by the original author or speaker. As a corpus is an electronic resource, users can also find out, for instance, how frequently a specific word is used, or create tailored quizzes to help with language learning. In a nutshell, a corpus is a valuable electronic tool which allows us to better understand our language.
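For readers who like to see the idea in practice, the two searches described above (a word shown in its original context, and how often a word is used) can be sketched in a few lines of Python. This is only an illustration and not the CorCenCC software: the three sample sentences and the function names `tokenize`, `word_frequencies` and `concordance` are made up for the example, and a real corpus holds millions of words.

```python
import re
from collections import Counter

# Hypothetical miniature "corpus"; a real one holds millions of words.
SAMPLE_TEXTS = [
    "Bore da, sut wyt ti?",
    "Diolch yn fawr iawn.",
    "Bore da i bawb.",
]


def tokenize(text):
    """Split a text into lower-case words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(texts):
    """Count how many times each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, word, width=2):
    """Return (left context, word, right context) for every occurrence."""
    target = word.lower()
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    print(word_frequencies(SAMPLE_TEXTS).most_common(3))
    for left, keyword, right in concordance(SAMPLE_TEXTS, "da"):
        print(f"{left:>10} [{keyword}] {right}")
```

A dictionary would give one definition of "da"; the concordance instead lists each place the word actually occurs, together with the words around it.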
What is CorCenCC?
CorCenCC is an interdisciplinary and multi-institutional project that will create a large-scale, open-source corpus of the contemporary Welsh language. CorCenCC will break new ground both as a language resource and as a model of corpus construction. It will be the first large-scale corpus of Welsh representative of language use across communication types (circa 4m spoken words, 4m written, 2m e-language), genres, language varieties (regional and social) and contexts, with contributors representative of over half a million Welsh speakers in the UK. It will also forge transformative methods for corpus creation, impact and sustainability.
The creation of CorCenCC will be community-driven, harnessing opportunities afforded by mobile technologies, specifically crowdsourcing and community collaboration. Impact will be generated through a user-informed design, so that basic corpus functionalities for the querying of language use can be integrated into a bespoke toolkit for teachers and learners (within this project) and interface specifications for other user groups (e.g. translators, publishers, policy-makers, language technology developers, academics and others) beyond the project.
CorCenCC is funded by the ESRC and AHRC (Grant Ref ES/M011348/1). The project will be led by Dawn Knight, at the Centre for Language and Communication Research, Cardiff University. The academic project team comprises:
- Dawn Knight, Cardiff University (School of English, Communication and Philosophy)
- Tess Fitzpatrick, Swansea University (Department of English Language and Applied Linguistics)
- Irena Spasic, Cardiff University (School of Computer Science and Informatics)
- Jeremy Evas, Cardiff University (School of Welsh)
- Steve Morris, Swansea University (Department of Welsh)
- Mark Stonelake, Swansea University (Academi Hywel Teifi)
- Paul Rayson, Lancaster University (School of Computing and Communications)
- Enlli Thomas, Bangor University (School of Education)
Other contributors and collaborators include computer programmers, Welsh language experts and a range of external stakeholders, including the Welsh Government, the National Assembly for Wales, the Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa and the University of Wales Dictionary of the Welsh Language.
See also the About section of this website.
How do I use the corpus?
Once the corpus has been published, we will provide detailed instructions on how to use it and its associated tools.
I would like to use the mobile phone app but don’t own an iPhone or iPad – what can I do?
We are in the process of developing a version of the app for Android phones. When this becomes available, details will be advertised on Twitter, on Facebook, in our project newsletter and on our home page.
I have a problem with the mobile phone app – who do I contact?
If you have a problem with the mobile phone app, please e-mail email@example.com.
I have a problem with the website – who do I contact?
If you have a problem with the website, please e-mail firstname.lastname@example.org.