This paper examines the formalization of extracting multicomponent terms from Russian-language scientific and technical texts. We use a predicative approach to term extraction, although the grammatical approach is currently better developed. We propose a predicate algebra and a system of axioms that make it possible to determine, from the axiom statements, whether a sequence of words in a sentence constitutes a multicomponent term. Based on these, we have developed an algorithm for automatic term extraction. The algorithm searches the text for repeated sequences of nouns, adjectives, and participles, then uses the endings of the identified words and any accompanying prepositions to decide whether each such sequence constitutes a term. We analyze examples of the algorithm’s operation, including cases where terms are extracted even though homonymy leaves the grammatical properties of individual words ambiguous. Importantly, the algorithm does not rely on preliminary syntactic parsing of the sentence, and it is specific to the Russian language and to the subject area. We believe that the approach can be adapted to other contexts, such as a different subject area or a different language, by varying the axiom system. It can also be combined with syntactic analysis.
Mastikhina et al. (Wed,) studied this question.
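The first stage described in the abstract, searching for repeated sequences of nouns, adjectives, and participles, can be illustrated with a minimal Python sketch. It is not the predicate algebra or the axiom system of the paper: the lexicon `TOY_LEXICON`, the function `repeated_candidates`, the `min_count` threshold, and the rule that a candidate must end in a noun are all hypothetical simplifications introduced here for illustration. A hand-written lemma table stands in for the analysis of word endings, and prepositions are not handled.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not from the paper):
# word form -> (lemma, part of speech). A real system would derive
# this from word endings or a morphological dictionary.
TOY_LEXICON = {
    "нейронная": ("нейронный", "ADJ"),
    "нейронной": ("нейронный", "ADJ"),
    "сеть": ("сеть", "NOUN"),
    "сети": ("сеть", "NOUN"),
    "обучение": ("обучение", "NOUN"),
    "данных": ("данные", "NOUN"),
    "обучается": ("обучаться", "VERB"),
    "требует": ("требовать", "VERB"),
    "быстро": ("быстро", "ADV"),
}

# Parts of speech allowed inside a candidate sequence.
NOMINAL = {"ADJ", "PART", "NOUN"}


def repeated_candidates(text, min_count=2):
    """Return lemma sequences of length >= 2 that end in a noun and
    occur at least min_count times inside runs of nominal words."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        run = []
        # The trailing None is a sentinel that flushes the last run.
        for word in re.findall(r"\w+", sentence.lower()) + [None]:
            entry = TOY_LEXICON.get(word)
            if entry and entry[1] in NOMINAL:
                run.append(entry)
                continue
            # The run has ended: count every contiguous sub-sequence
            # of length >= 2 whose last word is a noun.
            for i in range(len(run)):
                for j in range(i + 2, len(run) + 1):
                    if run[j - 1][1] == "NOUN":
                        counts[tuple(lemma for lemma, _ in run[i:j])] += 1
            run = []
    return sorted(seq for seq, n in counts.items() if n >= min_count)


TEXT = "Нейронная сеть обучается быстро. Обучение нейронной сети требует данных."
print(repeated_candidates(TEXT))
```

On this two-sentence sample the only sequence that repeats is the lemma pair for "нейронная сеть", which appears once in the nominative and once in the genitive. Counting lemmas rather than surface forms is what lets the two inflected occurrences match; deciding whether such a repeated sequence is actually a term is the job the abstract assigns to the axiom system, which this sketch does not attempt.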