Aim: The aim of this poster is to show the benefits of the open-source project NoSketch Engine, which is widely used for creating language corpora.
Methods: NoSketch Engine combines the Manatee and Bonito software into a free corpus management system. Manatee is a corpus management tool that covers corpus building and indexing, fast querying, and basic statistical measures; Bonito is a graphical user interface to corpora maintained by Manatee (1). Manatee and Bonito are released under the GNU General Public License version 2, which guarantees that the software is free and that users have the freedom to share, change, and distribute copies of it. Users can also receive the source code if they want it, and they can change the software or use pieces of it in new free programs (2).
Results and Discussion: All corpora created using NoSketch Engine provide free access to their language databases for collecting, searching, and using the data. The software is free to download, install, host on a server, and administer. It is worth noting that a new server hosting corpora through NoSketch Engine is CLARIN.SI (Common Language Resource and Technology Infrastructure, Slovenia), available at www.clarin.si/noske. Users can download data and tools under licences that allow free sharing. This applies to all data under Creative Commons licences and all tools under open-source licences (3). The graphical content of the poster shows the available language corpora (databases) created with NoSketch Engine. The left figure shows the languages with the largest number of corpora (Slovenian has the most corpora available to users), and the second figure shows the languages with a single available corpus.
Conclusion: The main benefit of this open-source software is that it supports the creation of large language databases that can be used for language learning, for linguistic, lexical, and scientific research, and for language analysis.