News (2017)


The CorCenCC team are pleased to announce that we have been awarded funding from the Welsh Government’s Grant Cymraeg 2050 scheme for work on a project entitled WordNet Cymraeg.

The aim of the project is to automatically construct a WordNet for Welsh, a lexical database in which words are grouped into sets of synonyms (synsets), which are then organised into a network of lexico-semantic relationships. WordNets are widely used in natural language processing (NLP) to support understanding of meaning expressed in written and spoken language. As such, WordNet is vital for language technology applications such as question answering, information retrieval and machine translation.
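As a rough illustration of the structure described above, the following minimal sketch groups words into synsets and links synsets through named relations. It is not the project's implementation: the class names, the identifier scheme (such as "ci.n.01") and the small Welsh examples are all assumptions chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words sharing one meaning (hypothetical structure)."""
    synset_id: str
    lemmas: set[str]
    gloss: str = ""
    # Relation name -> ids of related synsets, e.g. "hypernym" -> {"anifail.n.01"}
    relations: dict[str, set[str]] = field(default_factory=dict)


class MiniWordNet:
    """A toy lexical database: synsets plus a word-to-synset index."""

    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        self.index: dict[str, set[str]] = {}  # word -> ids of synsets containing it

    def add_synset(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        for lemma in synset.lemmas:
            self.index.setdefault(lemma, set()).add(synset.synset_id)

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        """Add a directed lexico-semantic link between two existing synsets."""
        if source_id not in self.synsets or target_id not in self.synsets:
            raise KeyError("both synsets must be added before relating them")
        self.synsets[source_id].relations.setdefault(relation, set()).add(target_id)

    def synonyms(self, word: str) -> set[str]:
        """All words that share at least one synset with `word`."""
        found: set[str] = set()
        for synset_id in self.index.get(word, set()):
            found |= self.synsets[synset_id].lemmas
        found.discard(word)
        return found

    def hypernym_chain(self, synset_id: str) -> list[str]:
        """Follow 'hypernym' links upwards, stopping at the top or on a cycle."""
        chain: list[str] = []
        seen = {synset_id}
        current = synset_id
        while True:
            parents = sorted(self.synsets[current].relations.get("hypernym", ()))
            if not parents or parents[0] in seen:
                break
            current = parents[0]
            seen.add(current)
            chain.append(current)
        return chain


# Illustrative data only (assumed examples, not taken from the project).
wn = MiniWordNet()
wn.add_synset(Synset("hapus.a.01", {"hapus", "llawen"}, "feeling joy"))
wn.add_synset(Synset("ci.n.01", {"ci"}, "dog"))
wn.add_synset(Synset("anifail.n.01", {"anifail"}, "animal"))
wn.relate("ci.n.01", "hypernym", "anifail.n.01")

print(wn.synonyms("hapus"))
print(wn.hypernym_chain("ci.n.01"))
```

An application such as information retrieval could use a lookup like `synonyms` to expand a search query, and a traversal like `hypernym_chain` to match a specific word against a more general one.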

These technologies are vital for the development of user-friendly interfaces for smartphone and smart home apps, which will drive the use of Welsh-medium digital technology for Cymraeg 2050. By linking the WordNet Cymraeg project to the CorCenCC project, we will re-use its sustainability and engagement plans to increase the visibility, and ensure the long-term future, of WordNet Cymraeg. Public engagement activities include:

  • A social media campaign will be carried out to advertise the project and encourage users to test functionalities.
  • Regional road shows/workshops will be held at schools, libraries and community centres/Mentrau Iaith to raise potential users’ awareness of the WordNet and to provide basic training in its use.

WordNet Cymraeg will be led by Professor Irena Spasic, working with Dr Dawn Knight and Dr Steven Neale.


17/11/2017 – CorCenCC runs the only Welsh medium event in the 2017 ‘Being Human’ Festival

On Friday 17 November, Jenny Needs and Steve Morris went to the Ty’r Gwrhyd ‘Canolfan Gymraeg’ in Pontardawe to hold the only Welsh medium event at this year’s ‘Being Human’ Festival. ‘Being Human’ is the UK’s only national festival of the humanities, and its only hub in Wales is in Swansea. The festival is led by the School of Advanced Study, University of London in partnership with the British Academy and the Arts and Humanities Research Council.

The name of the CorCenCC Welsh medium session was “Rho dy Gymraeg i ni / We want your Welsh!” and it was a fantastic opportunity to collect hours of spoken data through experimenting with the ‘Gogglebox’ television programme model and asking participants to give a live reaction to short films.

It was also an opportunity to engage with the public in the Swansea Valley and show them the app (as Jenny is doing in the picture). There was a good – and lively – response to the films from the ‘sofa critics’ and this is definitely a way of collecting data which we will look to use again in the future.


06/11/2017 – CorCenCC Away Day

On the 13th of November, the WP1 team (the PI, Co-Investigators and RAs) met up in Swansea for an away day. The meeting served several purposes: it gave the team some much-needed time to catch up face-to-face, to take stock and reflect positively on the progress we have made so far, and to prioritise and plan the remaining months of the project. The meeting was productive and a good way to introduce new members of the team. To break the ice, our communication skills were tested with interactive communication tasks using images, which we completed in record time. The main focus, however, was on how to streamline the handling of Big Data, and on identifying the challenges of data collection and possible ways of limiting them. Overall, the outcomes were positive, and we have already started implementing changes to data collection methods, such as the use of a web scraper to automate the extraction of e-language texts.



01/08/2017 – Would you be interested in working as a CorCenCC transcriber?

As you know, we have been busy recording Welsh being spoken up and down the country. Work has begun on transcribing the recordings, but we are now looking for more transcribers – would you be interested? The work is flexible (you can work whenever suits you, and do as many or as few hours as you wish) so it is easy to fit in around other activities, and the recordings are interesting and varied – one day you might be transcribing a lecture or sermon, and the next day a lively conversation down the pub! If you’d be interested in joining our team of transcribers, please email for more information.


17/02/2017 – CorCenCC Crowdsourcing App launch

To coincide with the launch of the website, February also saw the first release of the CorCenCC crowdsourcing app. The app is currently available on iOS and an Android version will be released within the next two to four months (keep an eye out for that!).

News of the app release was featured on the websites of all partner institutions, on tech websites and in Y Cymro and the Denbighshire Free Press (amongst others). We are hoping that by spreading the word about the app and project, we can raise people’s awareness of the importance and value of the work, and get as many people as possible involved in contributing data and/or using the corpus when it is finally constructed.


28/02/2017 – Project launch

To celebrate a successful first 12 months of the project, the CorCenCC team hosted a launch event at the Pierhead Building in Cardiff Bay. The event was supported by a substantial media campaign, which included radio interviews on the BBC’s Good Morning Wales programme (PI Dawn Knight) and BBC Radio Cymru’s Post Cyntaf (Ambassador Nia Parry), and print and online press coverage in various outlets (including the BBC and Mail Online, institutional websites and tech blogs, amongst others). It aimed to act as a springboard for engaging with the public, policy makers, educators, publishers and the media, raising awareness of the project and encouraging individuals to support the work.


The launch, attended by Alun Davies AM, Minister for Lifelong Learning and Welsh Language, gave guests the chance to find out more about the project, which is a collaboration between Cardiff, Swansea, Lancaster and Bangor universities, and is breaking new ground in creating a large-scale, open access corpus of contemporary Welsh language. Backed by high-profile ambassadors (poet Damian Walford Davies, musician and presenter Cerys Matthews, broadcaster Nia Parry and international rugby referee Nigel Owens), CorCenCC is community-driven and uses mobile and digital technologies to enable public collaboration. A demonstration of our new data collection app, which enables Welsh speakers from all walks of life to contribute to the project, was on show at the event. CorCenCC partners and ambassadors also shared their impressions of how the resource will impact on their research, and on the Welsh language community more widely.


Alun Davies, Steve Morris, Dawn Knight, Bethan Jenkins and Tess Fitzpatrick

Minister for Lifelong Learning and the Welsh Language, Alun Davies, said: “I am very pleased to attend the launch of this exciting project today. Not only will this work give us a real record of how Welsh is actually being used, but it will also feed into our aim of developing the role of the Welsh language in technology which will be key if we are to meet our target of a million Welsh speakers by 2050.”


The CorCenCC team

Around 85 people attended the launch and the evening also marked the first time that the majority of the extended CorCenCC team were assembled in the same place together! The launch was sponsored by funds from the British Council, the School of English, Communication and Philosophy at Cardiff University, and the Research Institute for Arts and Humanities at Swansea University – many thanks for your support!

01/03/2017 – Whole Project Team meeting

Hot on the heels of the launch event, we held the first Whole Project Team meeting at Cardiff University on St David’s Day. The meeting, which will take place annually, brings together the CorCenCC Project Team (CPT – which comprises the PI, all CIs, RAs and PhD students), Consultants and all members of the Project Advisory Group, and is a great opportunity for the team to get to know each other a little better (face-to-face) and to discuss ideas and future plans. The aim of the meeting was to provide specific work package (WP) updates, to consider and discuss potential routes to engagement for the project as a whole (concentrating on input mainly from the Project Advisory Group) and to think about how we can best push the boundaries in current corpus research with future developments on CorCenCC.


We would like to say a big thank you to all of you who travelled far and wide to attend this meeting. We all thought it was a very successful and engaging meeting, and it is likely to give us added strength in ideas and motivation to fuel the next steps of development on the project. We are looking forward to having you all back in Cardiff for the meeting in 2018!


