Juan Manuel Dato

Acquirability of a language - To determine how easy a text is to acquire for non-speakers

This JavaScript code computes a referential cohesion metric. It evaluates how self-referential the words in a text are, in order to learn which categories they belong to without using a corpus or prior knowledge. The language may be completely unknown.
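
To make the idea concrete, here is a minimal sketch of what a corpus-free cohesion score could look like. It is not the project's actual code: the function names `tokenize` and `referentialCohesion` are hypothetical, and the definition of cohesion used here (the share of word occurrences whose form appears more than once in the same text) is only an assumption chosen for illustration.

```javascript
// Hypothetical sketch, NOT the project's code.
// Assumption: "cohesion" = share of token occurrences whose word form
// recurs somewhere else in the same text. No corpus or dictionary is used.

function tokenize(text) {
  // Split on anything that is not a Unicode letter or digit,
  // so no language-specific rules are needed.
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

function referentialCohesion(text) {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;

  // Count how often each word form occurs in the text.
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);

  // A token counts as self-referenced when its form appears at least twice.
  const repeated = tokens.filter((t) => counts.get(t) > 1).length;
  return repeated / tokens.length;
}
```

Under this assumed definition, a text that keeps reusing its own words scores close to 1, and a text in which every word appears only once scores 0.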


Replies

Juan Manuel Dato
The code is written in Spanish, but that is not a real problem since it is JavaScript. It can be thought of as a kind of tokenization, or as a technique for guessing the entities in a completely unknown language. This first version cannot handle every language, but it is a first approximation of how the approach works: the idea of a register, categories, and accent of a language.