[Paper Review] ColBERT

Updated: 2024.11.08

Categories: NR

Background

Inverted Index Search Engine

Retrieving candidate documents is essential, but ranking the documents that are ultimately shown to the user is just as important. Datasets like HotpotQA, MuSiQue, and 2WikiMultiHopQA already come with candidate documents assigned, so for them ranking is the essential step. A commonly used traditional ranking method is BM25. BM25 is similar to TF-IDF but incorporates document length into its ranking function.

BM25 considers both how frequently a term appears in a given document (a higher frequency increases the score) and how many documents contain that term (a higher count reduces the score). The statistics BM25 relies on (term frequencies, document frequencies, and document lengths) can be precomputed and stored in an inverted index, which makes BM25 well suited to search engines that manage large volumes of documents and prioritize performance.
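To make the scoring concrete, here is a minimal Python sketch of one common BM25 variant, using a smoothed IDF that stays non-negative. The function name `bm25_scores`, the simple regex tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration; they are not taken from the ColBERT paper, and production engines compute these statistics from an index rather than rescanning the documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rarer terms (low df) receive a higher IDF weight.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In this sketch, `b` controls how strongly document length affects the score (`b=0` disables length normalization), and `k1` controls how quickly repeated occurrences of a term stop adding to the score.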

An inverted index is a data structure used in document search systems to retrieve data efficiently. Unlike a traditional (forward) index, which records the words contained in each document, an inverted index records each word together with the list of documents where it appears. This enables quick retrieval of the documents that contain the specific terms a user searches for.

In an inverted index, the key is a word, and the value is the list of documents where the word appears or information on its location within the documents. For example, assume we have the following three documents:

Document 1: “The cat is cute.”
Document 2: “The dog is loyal.”
Document 3: “The cat and dog are friends.”
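As a rough illustration, the three documents above could be indexed as follows. This is a minimal sketch, not a production design: the names `tokenize`, `build_inverted_index`, and `search` are hypothetical, the tokenizer simply lowercases and strips punctuation, and no stop-word removal or stemming is applied.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]} for the given documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    # Convert to plain dicts so later lookups never create empty entries.
    return {term: dict(postings) for term, postings in index.items()}


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


docs = {
    1: "The cat is cute.",
    2: "The dog is loyal.",
    3: "The cat and dog are friends.",
}
index = build_inverted_index(docs)
print(index["cat"])               # posting list for "cat"
print(search(index, "cat dog"))   # documents containing both terms
```

Here the key is the word and the value holds both the documents where it appears and its positions within them, so a lookup for "cat" touches only the posting list for that word instead of scanning every document.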

Meaningful

