Please click on one of the questions below

General

What is a corpus?
What is CorCenCC?
How do I use the corpus?

Technical

I would like to use the mobile phone app but don’t own an iPhone or iPad – what can I do?
I have a problem with the mobile phone app – who do I contact?
I have a problem with the website – who do I contact?

What is a corpus?

Essentially, a corpus is a database of words, but it is not the same as a dictionary. When a user searches for a word in a corpus, they can see many examples of that word in its original context, as it was used by the original author or speaker. Because a corpus is an electronic resource, users can also find out, for instance, how frequently a specific word is used, or create tailored quizzes to help with language learning. In a nutshell, a corpus is a valuable electronic tool that allows us to better understand our language.
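To make the two ideas above concrete (counting how often a word is used, and seeing it in context), here is a minimal sketch in Python. It is only an illustration and is not how the CorCenCC tools work: the function names `frequency` and `concordance` are hypothetical, the three sample sentences are made up for this example, and the simple word-splitting rule is an assumption that a real corpus would replace with proper language-aware processing.

```python
import re
from collections import Counter

# Made-up sample sentences standing in for a (tiny) corpus.
sample_texts = [
    "Bore da, sut wyt ti heddiw?",
    "Dw i'n hoffi coffi yn y bore.",
    "Bore da i bawb.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"\w+", text.lower())

def frequency(texts, word):
    """Count how many times `word` occurs across all texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return counts[word.lower()]

def concordance(texts, word, width=2):
    """List each occurrence of `word` with up to `width` tokens either side."""
    target = word.lower()
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits

if __name__ == "__main__":
    print(frequency(sample_texts, "bore"))
    for left, keyword, right in concordance(sample_texts, "bore"):
        print(f"{left:>10} [{keyword}] {right}")
```

A dictionary would give a definition of "bore" (morning); the sketch instead reports how often the word appears in the sample and shows the words that surround it each time, which is the kind of evidence a corpus provides.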

What is CorCenCC?

CorCenCC is an interdisciplinary, multi-institutional project that will create a large-scale, open-source corpus of contemporary Welsh. CorCenCC will break new ground both as a language resource and as a model of corpus construction. It will be the first large-scale corpus of Welsh that is representative of language use across communication types (approximately 4 million spoken words, 4 million written and 2 million e-language), genres, language varieties (regional and social) and contexts, with contributors representative of over half a million Welsh speakers in the UK. It will also forge transformative methods for corpus creation, impact and sustainability.

The creation of CorCenCC will be community-driven, harnessing opportunities afforded by mobile technologies, specifically crowdsourcing and community collaboration. Impact will be generated through a user-informed design, so that basic corpus functionalities for the querying of language use can be integrated into a bespoke toolkit for teachers and learners (within this project) and interface specifications for other user groups (e.g. translators, publishers, policy-makers, language technology developers, academics and others) beyond the project.

CorCenCC is funded by the ESRC and AHRC (Grant Ref ES/M011348/1). The project will be led by Dawn Knight, at the Centre for Language and Communication Research, Cardiff University. The academic project team comprises:

  • Dawn Knight, Cardiff University (School of English, Communication and Philosophy)
  • Irena Spasic, Cardiff University (School of Computer Science and Informatics)
  • Jonathan Morris, Cardiff University (School of Welsh)
  • Tess Fitzpatrick, Swansea University (Department of Applied Linguistics)
  • Steve Morris, Swansea University (Department of Welsh)
  • Alex Lovell, Swansea University (Department of Welsh)
  • Paul Rayson, Lancaster University (School of Computing and Communications)
  • Enlli Thomas, Bangor University (School of Education)

Other contributors and collaborators include computer programmers, Welsh language experts and a range of external stakeholders including the Welsh Government, National Assembly for Wales, Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa, and University of Wales Dictionary of the Welsh Language.

See also the About section of this website.

How do I use the corpus?

Once the corpus has been published, we will provide detailed instructions on how to use it and its associated tools. See also the Explore the Corpus section of this website.

I would like to use the mobile phone app but don’t own an iPhone or iPad – what can I do?

Good news! We have now developed an Android version of the app – see our App page for more information. It is also possible to contribute data via our web app: app.corcencc.org.

I have a problem with the mobile phone app – who do I contact?

If you have a problem with the mobile phone app, please e-mail tech@corcencc.org.

I have a problem with the website – who do I contact?

If you have a problem with the website, please e-mail tech@corcencc.org.
