Lead Data Scientist, Integrated Systems
Applying Topic Segmentation to Document-Level Information Retrieval
October 13, 12:30
In the present paper we discuss how text segmentation could be applied in the information retrieval domain. We assume that topic text segmentation allows one to better model text structure and therefore language itself, which influences the quality of text representation. We test the initial hypothesis by conducting experiments with several baseline models on the arXiv dataset comparing their quality on whole texts and on segmented texts. The experiments demonstrated that, indeed, the quality of retrieval is generally slightly improved.