dm.cs.tu-dortmund.de/mlbits/text-mining-vector-space-model/
Vector Space Model – Lecture Notes
results, normalize words: birds → bird, gets → get. Discard "." and ",".
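The normalization step above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the lecture: the function name `normalize`, the lowercasing, and the crude "strip a trailing s" rule are all assumptions standing in for a proper stemmer or lemmatizer, and the example sentence is chosen for illustration.

```python
import re

def normalize(text):
    """Lowercase, discard '.' and ',', and crudely reduce plural/verb forms."""
    # Replace the two punctuation marks by spaces, then split on whitespace.
    tokens = re.sub(r"[.,]", " ", text.lower()).split()
    result = []
    for token in tokens:
        # Assumed toy rule: drop a trailing 's' from tokens longer than 3 characters
        # (birds -> bird, gets -> get). It also mangles words such as 'this',
        # which is why real systems use a stemmer or lemmatizer instead.
        if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        result.append(token)
    return result

tokens = normalize("The early birds, gets the worm.")
print(tokens)
```

Each document then becomes a list of normalized tokens, which is the input needed to count terms per document.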
Document-Term Matrix
Dim: 1 to 26
Terms (in dimension order): a, and, bed, bird, but, cheese, early, feather [...] that, the, to, together, wealthy, wise, worm
Nonzero counts per document, in dimension order (empty cells omitted):
Doc 1: 1, 1, 1, 1, 1, 1
Doc 2: 1, 1, 1, 1, 1, 1, 2, 1
Doc 3: 1, 1, 1, 1, 1, 2
Doc 4: 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1
Note: for a large document collection, we will have thousands of dimensions!
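As a rough sketch of how such a matrix can be built, the following Python snippet counts terms per document over a sorted vocabulary. The two token lists are hypothetical toy documents (already normalized), not the lecture's data, and the names `docs`, `vocab`, `to_vector`, and `sparse_rows` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical, already-normalized toy documents (not the lecture's data).
docs = [
    ["the", "early", "bird", "get", "the", "worm"],
    ["bird", "of", "a", "feather", "flock", "together"],
]

# One dimension per distinct term, in alphabetical order.
vocab = sorted({term for doc in docs for term in doc})

def to_vector(tokens):
    """Return the dense term-count vector of one document over vocab."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocab]

# Dense document-term matrix: one row per document, one column per term.
matrix = [to_vector(doc) for doc in docs]

# Sparse alternative: store only the nonzero counts of each document.
sparse_rows = [dict(Counter(doc)) for doc in docs]

print(vocab)
for row in matrix:
    print(row)
```

Because most entries of a row are zero once the vocabulary grows to thousands of terms, the sparse form (only nonzero counts per document) is usually the more practical representation.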
Similar [...] probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval. 3, 4 (2009), 333–389. DOI: 10.1561/1500000019
[SaBu88]
Salton, G. and Buckley, C. 1988. Term-weighting approaches …