Automatically translated from Basque; the translation may contain errors.

Latxa: Hitz creates the largest free language model in Basque

  • The large, free Catalan language model Aina Flor was presented recently, and in last week's news we reported that Eneko Agirre, director of the Hitz Basque centre, had announced that a Basque one was coming shortly too. And just yesterday the Hitz centre made it public: Latxa. An LLM is a large language model, a kind of super-database on which artificial intelligence initiatives are built. LLMs are the basis for the versions of OpenAI's ChatGPT, for example. Now we have one of these in Basque (well, in fact a set of models, three in all).
This article is reproduced under the CC BY-SA 3.0 license.

30 January 2024 - 07:30

According to Hitz Zentroa, Latxa is "a family of open models" that includes the "largest language model in Basque". It is built on Llama 2, the language model from Meta (Facebook), and follows its license. Meta has already shown excellent results in Basque: its Seamless M4T product can produce correct spoken machine translation in Basque. Fittingly, Latxa's logo combines the llama with the Basque latxa sheep, and the connection is there in the name too (as we suspected).

Latxa comprises models of between 7 and 70 billion parameters. As for the texts used to build the models, the Basque researchers used EusCrawl, a collection of Basque texts containing 1.72 million documents and 288 million words. EusCrawl was extracted from 33 quality websites, which gives it higher quality than other training corpora gathered from the Internet.
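To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the weights are stored at 16-bit precision (2 bytes per parameter), which the article does not state, and it ignores any extra memory needed to actually run a model. The function names are purely illustrative.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 is an assumption (16-bit precision), not a figure from the article.
    """
    return num_params * bytes_per_param / 1e9


def mean_words_per_document(total_words: float, total_documents: float) -> float:
    """Average document length in words."""
    return total_words / total_documents


if __name__ == "__main__":
    # Smallest and largest model sizes mentioned in the article.
    for n in (7e9, 70e9):
        print(f"{n / 1e9:.0f} billion parameters: about {weights_size_gb(n):.0f} GB of weights")

    # EusCrawl figures from the article: 288 million words, 1.72 million documents.
    avg = mean_words_per_document(288e6, 1.72e6)
    print(f"EusCrawl: about {avg:.0f} words per document")
```

Under that assumption the smallest model needs on the order of 14 GB just for its weights and the largest about ten times as much, which helps explain why the models are aimed at engineers for now rather than at the general public.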

Latxa has not in fact been built for the general public; that will come later. However, the three models are available on the Hugging Face platform, and an experienced engineer can use them by consulting the "model card", which holds the technical information and the instructions for getting started with the models.

Latxa was developed through a research, development and innovation initiative that is part of the IKER-GAITIK project, supported by the Basque Government, in cooperation with the European EuroHPC programme.

Today's language models, such as ChatGPT or Bard, perform impressively in English. That is not the case, however, for minority languages, Basque among them. With these models Hitz Zentroa has taken a step towards turning the situation around and, according to its data, Latxa responds better than other systems to prompts in Basque.

More information, here.

On Hugging Face: Latxa.

