Embedding means converting data into a feature representation in which certain properties are captured by a notion of distance.
This notion of distance comes up often in metric learning.
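To make "distance" concrete, the sketch below computes two measures commonly applied to embedding vectors: Euclidean (L2) distance and cosine similarity. The three vectors are hypothetical, invented for illustration rather than produced by any real model. The point is only that related items should end up close together.

```python
import math

def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(round(euclidean_distance(cat, kitten), 3))  # 0.141
print(round(euclidean_distance(cat, car), 3))     # 1.273
```

Euclidean distance depends on vector length, but cosine similarity only looks at direction. Which one is appropriate depends on how the embedding was trained.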
For example, a model trained on speech signals for speaker identification may let you convert a speech snippet into a vector of numbers such that another snippet from the same speaker lies at a small distance (e.g. Euclidean distance) from the original vector. Alternatively, a different embedding function might convert the speech signal on the basis of the word spoken in it, so that the encoded representations of two snippets are a small Euclidean distance apart if the same word is spoken in both. Or you might want to learn an embedding that represents the "mood" of the speech signal, e.g. "happy" vs. "sad" vs. "angry". A small distance between the encoded representations of two speech signals then implies a similar mood, and vice versa.
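Once such an embedding exists, deciding whether two snippets come from the same speaker can reduce to thresholding their distance. This is a minimal sketch under stated assumptions: the embeddings stand in for what a trained speaker model might output and are made up, and the helper `same_speaker` and its `threshold=0.5` are hypothetical placeholders, not values from any real system.

```python
import math

def euclidean_distance(u, v):
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def same_speaker(emb_a, emb_b, threshold=0.5):
    """Hypothetical helper: call two snippets 'same speaker' when their
    embeddings are closer than `threshold` (a placeholder value; in
    practice it would be tuned on held-out data)."""
    return euclidean_distance(emb_a, emb_b) < threshold

# Made-up embeddings standing in for a trained speaker model's output.
alice_clip1 = [0.20, 0.71, 0.05, 0.40]
alice_clip2 = [0.25, 0.68, 0.07, 0.38]
bob_clip = [0.90, 0.10, 0.60, 0.02]

print(same_speaker(alice_clip1, alice_clip2))  # True
print(same_speaker(alice_clip1, bob_clip))     # False
```

The same comparison works for the word-based or mood-based embeddings. Only the model producing the vectors changes, not the distance logic.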
Or, for instance, word2vec embeddings "project" each word into a space in which distances between words (commonly measured with cosine similarity) reflect semantic similarity (again, the embedding gives you a vector of numbers for a given word). So if you take the word2vec representation of "Berlin", subtract "Germany" from it, and add "France", you get a vector that is very close to the embedding for "Paris"!
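The usual form of this analogy is vec("Berlin") - vec("Germany") + vec("France") ≈ vec("Paris"). The sketch below reproduces that arithmetic on a tiny hand-built vocabulary. The 4-dimensional vectors are hypothetical and were constructed so that one axis loosely means "is a capital" and the others mean "German", "French" and "Spanish". Real word2vec vectors are learned and have hundreds of dimensions. Following common practice, the query words are excluded from the candidate answers.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical toy vectors: [capital, German, French, Spanish].
vocab = {
    "Germany": [0.0, 1.0, 0.0, 0.0],
    "Berlin":  [1.0, 0.9, 0.0, 0.0],
    "France":  [0.0, 0.0, 1.0, 0.0],
    "Paris":   [1.0, 0.0, 0.9, 0.0],
    "Spain":   [0.0, 0.0, 0.0, 1.0],
    "Madrid":  [1.0, 0.0, 0.0, 0.9],
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vocab[w]))

print(analogy("Berlin", "Germany", "France"))  # Paris
```

Subtracting "Germany" from "Berlin" leaves roughly a "capital of" direction. Adding "France" to that direction lands near "Paris".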
Similarly, in applications where you need to classify into hundreds of thousands or millions of classes, such as face recognition, a common approach is to use "metric learning" techniques (often Siamese CNNs trained with a contrastive or triplet loss). At test time, these let you identify a face with nearest-neighbor search over the vector representations of faces!
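The two pieces above can be sketched in a few lines: a triplet loss for training, and a nearest-neighbor lookup at test time. This is a simplified illustration, not any particular library's implementation. It uses plain (not squared) Euclidean distance and a hypothetical `margin=0.2`. The gallery labels and 2-dimensional vectors are made up. A real system would obtain embeddings from a trained network (e.g. a Siamese CNN), and at millions of identities would typically use an approximate nearest-neighbor index rather than this linear scan.

```python
import math

def euclidean_distance(u, v):
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive (same identity) is."""
    return max(0.0, euclidean_distance(anchor, positive)
               - euclidean_distance(anchor, negative) + margin)

def nearest_neighbor(query, gallery):
    """Return the label whose stored embedding is closest to `query`."""
    return min(gallery, key=lambda label: euclidean_distance(query, gallery[label]))

# Hypothetical enrolled face embeddings, one per identity.
gallery = {"person_a": [0.1, 0.9], "person_b": [0.8, 0.2], "person_c": [0.5, 0.5]}

print(nearest_neighbor([0.15, 0.85], gallery))          # person_a
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))  # 0.0
```

Because identification is a lookup over stored vectors, adding a new person only means storing one more embedding. No classifier output layer has to be retrained.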