NIM is a search engine for finding psycholinguistic research materials. Its purpose is to simplify researchers' work when preparing and designing experiments.

The engine is available in English, Spanish, and Catalan.


Related links:

Grupo de Investigación en Psicolingüística
Departament de Psicologia
Universitat Rovira i Virgili

Credits:
Marc Guasch
Antonio Masip
Enric Sunyer
Roger Boada

About NIM

NIM is a web application designed to facilitate the task of collecting materials for experiments involving lexical stimuli. It is composed of three lexical corpora:

 

Spanish:

The LEXESP database (Sebastián-Gallés, Martí, Carreiras, & Cuetos, 2000), a Spanish corpus of 5,629,279 words. The version used here contains 135,725 words with a total length between 1 and 26 letters and a relative frequency between 0.18 and 47,025 per million.
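As a rough illustration of how a per-million relative frequency relates to corpus size, the sketch below uses the LEXESP corpus size given above. The function name per_million is hypothetical, and the raw count of 1 is an assumption chosen for illustration; it is not a description of how NIM or LEXESP actually computes its values.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Occurrences per million corpus words (illustrative helper)."""
    return raw_count / corpus_size * 1_000_000


LEXESP_SIZE = 5_629_279  # corpus size stated in the text

# Assumption for illustration: a word attested exactly once in the corpus.
lowest = per_million(1, LEXESP_SIZE)
print(round(lowest, 2))  # 0.18
```

Under that assumption, a single occurrence corresponds to about 0.18 per million, which matches the lower bound reported for the Spanish corpus.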

The source materials sampled to build the corpus were composed of:

• 40% of narrative texts (i.e., novels).
• 40% of press (including newspapers, sport magazines, and news magazines).
• 10% of popular science magazines.
• 10% of essays.

The time window of the source material ranged from 1978 to 1995.

 

Catalan:

The Corpus Textual Informatitzat de la Llengua Catalana (CTILC; Rafel, 1998), a Catalan corpus composed of 51,253,669 words. The version employed here contains 408,815 words with a length between 1 and 25 letters and a relative frequency between 0.02 and 48,581 per million.

The source materials sampled to build the corpus were composed of:

• 49% of informative writings (e.g., sciences, arts, religion, philosophy...).
• 44% of literary writings (i.e., mainly narrative texts, but also drama, essay and poetry).
• 7% of other non-literary writings (e.g., press or personal letters).

The time window of the source material ranged from 1833 to 1988, although the vast majority of the sampled texts date from after 1914.

 

English:

The British National Corpus (BNC; The British National Corpus, 2007), an English corpus of 98,119,624 words. Our version contains 257,504 words with a length between 1 and 26 letters and a relative frequency between 0.02 and 61,702 per million.

The source materials sampled to build the corpus were composed of:

• 60% of books.
• 25% of periodicals.
• between 5 and 10% of miscellaneous published materials (e.g., brochures, advertising texts).
• between 5 and 10% of unpublished written materials (e.g., personal letters, essays).
• less than 5% of written speeches, scripts...

Overall, 75% of the materials were obtained from informative writings (sciences, arts, world affairs…) and the remaining 25% were literary and creative works.

The time window of the source material ranged from 1964 to 1993.

 

These corpora use different codes and frequency values, but the versions employed here have been standardised into a single format so as to facilitate inter-language comparison.
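One reason a common per-million scale helps when comparing languages is that the three corpora differ greatly in size, so the same relative frequency corresponds to very different raw counts. The sketch below illustrates this using only the corpus sizes stated above; the names CORPUS_SIZES and raw_count_for are hypothetical, and the snippet is not a description of NIM's internal format.

```python
# Corpus sizes in words, as stated in the text.
CORPUS_SIZES = {
    "LEXESP (Spanish)": 5_629_279,
    "CTILC (Catalan)": 51_253_669,
    "BNC (English)": 98_119_624,
}


def raw_count_for(freq_per_million: float, corpus_size: int) -> float:
    """Raw occurrences implied by a per-million frequency (illustrative helper)."""
    return freq_per_million * corpus_size / 1_000_000


# How many occurrences does 10 per million imply in each corpus?
for name, size in CORPUS_SIZES.items():
    print(f"{name}: about {raw_count_for(10, size):.0f} occurrences")
```

A word at 10 per million would appear roughly 56 times in LEXESP but roughly 981 times in the BNC, which is why raw counts alone are not comparable across the corpora.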

 

We would like to thank the following people and institutions for the help and assistance they provided during the preparation of the NIM search engine:

• Núria Sebastián-Gallés generously made the LEXESP database accessible to us.

• The Institut d'Estudis Catalans kindly granted permission for the use of their frequency dictionary.

• Adam Kilgarriff provided invaluable assistance with the BNC data.

 

For any questions regarding NIM, send an e-mail to marc.guasch@urv.cat.