Kakao open-sources a morphological analyzer based on deep learning technology




Kakao has released an open source morphological analyzer built with artificial intelligence technology.

"Khaiii" (Kakao Hangul Analyzer III, pronounced "Kai") is a technology that analyzes morphemes using a model trained on data through deep learning. This is the first time in Korea that a morphological analyzer based on deep learning technology has been released as open source.

Morphological analysis is a technology that, when a word or sentence of two or more characters is entered, automatically splits it into morphemes, the smallest meaningful units of language. If you enter the Korean sentence for "go to school," it identifies the morpheme units and classifies their parts of speech: a noun ("school"), a particle, a verb and an ending. It is mainly used as a core technology for natural language processing applications and for services such as information retrieval, machine translation, smart speakers and chatbots.
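To make the input and output of such an analysis concrete, here is a minimal sketch in Python. It is not Khaiii's method: it is a hand-written lookup table covering a single sentence, and the Korean text "학교에 간다" is assumed to be the sentence behind the "go to school" example. The names `Morpheme`, `TOY_LEXICON`, `toy_analyze` and `format_analysis` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morpheme:
    lex: str  # surface form of the morpheme
    tag: str  # part-of-speech label


# Hypothetical hand-written lexicon for one example sentence.
# A real analyzer learns these splits from a tagged corpus.
TOY_LEXICON = {
    "학교에": [Morpheme("학교", "noun"), Morpheme("에", "particle")],
    "간다": [Morpheme("가", "verb"), Morpheme("ㄴ다", "ending")],
}


def toy_analyze(sentence: str) -> List[Morpheme]:
    """Split on whitespace and look each word up in the toy lexicon."""
    result: List[Morpheme] = []
    for word in sentence.split():
        # Words missing from the lexicon are returned whole, tagged 'unknown'.
        result.extend(TOY_LEXICON.get(word, [Morpheme(word, "unknown")]))
    return result


def format_analysis(morphemes: List[Morpheme]) -> str:
    """Render morphemes in the 'form (tag) + form (tag)' style used in the article."""
    return " + ".join(f"{m.lex} ({m.tag})" for m in morphemes)


if __name__ == "__main__":
    print(format_analysis(toy_analyze("학교에 간다")))
```

A learned analyzer replaces the lookup table with a model, but the shape of the result, a sequence of morphemes each paired with a part-of-speech tag, stays the same.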

Khaiii analyzes morphemes on a syllable basis using a Convolutional Neural Network (CNN), one of the deep learning techniques. Starting from the Sejong Corpus distributed by the National Institute of Korean Language, Kakao corrected errors in the data and added data it had built itself, then trained the model on 850,000 sentences and 10 million words to improve accuracy. In addition, the deep learning computation was implemented in C++, so analysis is fast even without a graphics processor (GPU), which is usually used to run deep learning models.
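The idea of syllable-based tagging with a convolution can be sketched roughly as follows. This is an illustration under stated assumptions, not Khaiii's actual architecture: the weights are untrained random numbers, and the tag set, embedding size, window width and all names (`embed`, `FILTERS`, `tag_syllables`) are hypothetical. It only shows how each syllable is scored together with its neighbours.

```python
import random
from typing import List

TAGS = ["noun", "particle", "verb", "ending"]  # illustrative tag set
EMBED_DIM = 4  # size of each syllable vector (assumption)
WINDOW = 3     # each syllable is scored with one neighbour on each side

_rng = random.Random(0)  # fixed seed so the untrained weights are repeatable

# One convolution filter per tag, each holding WINDOW x EMBED_DIM weights.
FILTERS = [
    [[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(WINDOW)]
    for _ in TAGS
]


def embed(syllable: str) -> List[float]:
    """Stand-in for a learned embedding: a deterministic pseudo-random vector."""
    r = random.Random(ord(syllable))
    return [r.uniform(-1, 1) for _ in range(EMBED_DIM)]


def tag_syllables(text: str) -> List[str]:
    """Assign one tag per syllable by sliding the filters over the sequence."""
    # Spaces are simply dropped here; a real model would handle them explicitly.
    syllables = [ch for ch in text if not ch.isspace()]
    pad = [0.0] * EMBED_DIM
    half = WINDOW // 2
    vectors = [pad] * half + [embed(s) for s in syllables] + [pad] * half
    tags = []
    for i in range(len(syllables)):
        window = vectors[i:i + WINDOW]
        # Dot product of each filter with the current window of vectors.
        scores = [
            sum(w * x for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row))
            for filt in FILTERS
        ]
        tags.append(TAGS[scores.index(max(scores))])
    return tags


if __name__ == "__main__":
    print(tag_syllables("학교에 간다"))
```

Because the weights are random, the predicted tags carry no meaning. In a trained model the filters and embeddings would be learned from a tagged corpus, and the per-syllable tags would then be merged back into morphemes.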

Kakao hopes Khaiii will be used in various Korean-language fields, such as Korean language research, chatbots and natural language processing, and that it will reduce the cost and time required for morphological analysis.

Khaiii is available free of charge to anyone on GitHub.

"We are opening up our morphological analyzer to help people who study Korean," said Kim Eun-gyun, head of natural language processing at Kakao. "We will continue to research deep learning technology in various fields and share the results."
