COMP4650/6490
Document Analysis
Introduction to Information Retrieval
IR Module Overview
The information retrieval (IR) part consists of four lectures:
1. Introduction to IR + Boolean model
2. Ranked retrieval model
3. Evaluation of IR systems
4. Web search basics
Textbook
• Introduction to Information Retrieval
• Chapters: 1, 2, 4, 6, 8, 19
Table of Contents
• Lecture Overview
• Introduction to Boolean Retrieval
– Information Retrieval
– Term-Document Matrix
– Inverted Index
– Boolean Retrieval with Inverted Index
– Document Tokenization
What is Information Retrieval
What is information retrieval?
Information Retrieval
• “Information Retrieval (IR) is finding material (usually
documents) of an unstructured nature (usually text)
that satisfies an information need from within large
collections (usually stored on computers).”
Manning et al.
• You may think of web search first, but there are many other cases:
– E-mail search
– Searching your laptop
– Corporate knowledge bases
– Image search, video search
Why information retrieval?
• Information overload
– “It refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information.” (Wikipedia)
Why information retrieval?
• An essential tool for dealing with information overload
How to perform information retrieval
• Information retrieval before we had computers
Starting Point
• Collection: A set of documents
– For the moment, assume the collection is static
• Goal: Retrieve documents with information that is relevant to the user’s information need and helps the user complete a task
– The user’s information need is often underspecified
Classic Search Model
1. User task
2. Info need
3. Query
4. Search engine (searches the Collection)
5. Result
6. Query refinement (back to step 3)
Classic Search Model: example
1. User task: Listen to music using a Bluetooth headset
2. Info need: Info about connecting a Bluetooth headset
3. Query: CONNECT BLUETOOTH HEADSET
4. Search engine (searches the Collection)
5. Result
6. Query refinement: PAIRING BLUETOOTH HEADPHONES, then repeat the search
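The refinement loop in this example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the three-document `COLLECTION` is hypothetical, `search` is a naive linear scan that returns documents containing every query term (not how a production search engine works), and a non-empty result list stands in for the user judging that the results meet the information need. The names `tokenize`, `search` and `search_with_refinement` are illustrative.

```python
# Hypothetical toy collection (doc ID -> text); not taken from the lecture.
COLLECTION = {
    1: "How to connect a Bluetooth speaker to your TV",
    2: "Pairing Bluetooth headphones with a laptop: step by step",
    3: "Best wired headset for music",
}


def tokenize(text):
    """Lowercase, split on whitespace and strip simple punctuation; returns a set of terms."""
    return {word.strip(".,:;!?").lower() for word in text.split()}


def search(query, collection):
    """Naive linear scan: IDs of documents that contain every query term."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in collection.items() if terms <= tokenize(text)]


def search_with_refinement(queries, collection):
    """Try each (user-supplied) query in turn until one returns results.

    Assumption: a non-empty result list plays the role of a satisfied user.
    """
    for query in queries:
        results = search(query, collection)
        print(f"{query!r} -> {results}")
        if results:
            return query, results
    return None, []


if __name__ == "__main__":
    search_with_refinement(
        ["CONNECT BLUETOOTH HEADSET", "PAIRING BLUETOOTH HEADPHONES"], COLLECTION
    )
```

With this toy collection the first query matches nothing, because no document contains all of connect, bluetooth and headset, while the refined query matches document 2. In a real system the user, not the program, reads the results and writes the refined query.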
Key Objectives
• Every good IR system needs to achieve:
– Scalability
• More than 40 billion pages are indexed by Google
– Accuracy
• Can it find the top 10 pages among 40 billion?
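Both objectives can be made concrete with a back-of-envelope calculation and a top-k selection. A minimal sketch, assuming a hypothetical throughput of one million pages per second for a naive scan and made-up relevance scores; `linear_scan_days` and `top_k` are illustrative names, not part of any real system's API.

```python
import heapq

SECONDS_PER_DAY = 86_400


def linear_scan_days(num_pages, pages_per_second):
    """Days needed to read every page once at a given (hypothetical) throughput."""
    return num_pages / pages_per_second / SECONDS_PER_DAY


def top_k(scores, k=10):
    """Return the k (doc_id, score) pairs with the highest scores, best first.

    heapq.nlargest keeps at most k candidates while scanning, so it costs
    O(n log k) instead of the O(n log n) of a full sort.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Assumed throughput of 1,000,000 pages per second (illustrative only).
    print(linear_scan_days(40e9, 1_000_000))
    # Hypothetical relevance scores for a handful of pages.
    print(top_k({"a": 0.2, "b": 0.9, "c": 0.5}, k=2))
```

At that assumed rate, scanning 40 billion pages takes 40,000 seconds, about 0.46 days (roughly 11 hours), for a single query, which is why large-scale systems cannot afford to read every page at query time.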
IR vs. NLP
• Information retrieval
– Computational approaches
– Statistical (shallow) understanding of language
– Handles large-scale problems
• Natural language processing
– Cognitive, symbolic and computational approaches
– Semantic (deep) understanding of language
– (Often) smaller-scale problems
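One way to see what a statistical (shallow) understanding of language means is a bag-of-words representation, which keeps word counts and discards word order. A minimal sketch, assuming lowercase whitespace tokenization; `bag_of_words` is an illustrative name.

```python
from collections import Counter


def bag_of_words(text):
    """Shallow statistical view of text: lowercase tokens and their counts.

    Word order is discarded, so any two texts with the same words compare equal.
    """
    return Counter(text.lower().split())


if __name__ == "__main__":
    print(bag_of_words("Dog bites man") == bag_of_words("man bites dog"))
    print(bag_of_words("to be or not to be"))
```

Because order is discarded, "dog bites man" and "man bites dog" get identical representations even though they mean different things; telling them apart needs the deeper, semantic analysis associated with NLP.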
IR and NLP are getting closer