Commit d50f862c authored by Ben,S.W. van der (Sinie)

Update COMPARISON.md. Still needs an update when the tool is finished; we can say more about the differences then.
parent dd97d291
1 merge request: !12 docs: adding open source documentation and README file for better outside communication
The most important libraries/tools we have looked at to create PROVEE:
...
- Whatlies
- Parallax
The libraries and tools were examined for their scalability, user-friendliness, responsiveness, and advantages and disadvantages. We were also interested in their back end, data storage and data transfer.
##### Vec2graph
Vec2graph is a library by [Katricheva et al. (2020)][vec2graph] for visualizing word embeddings as graphs. The 2D graph is built from cosine similarities between a query word and its nearest neighbours, and between those neighbours themselves. Nodes can link to other graphs, and the graphs are displayed as HTML files. Vec2graph shines in its ease of use, but its projections are not suited to displaying many data points at once.
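To make the graph-building idea concrete, here is a minimal pure-Python sketch of a cosine-similarity neighbour graph. It is not Vec2graph's API: the function `neighbour_graph`, its parameters `k` and `threshold`, and the toy vectors are hypothetical, chosen only to illustrate linking a query word to its nearest neighbours and then linking sufficiently similar neighbours to each other.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def neighbour_graph(embeddings, query, k=3, threshold=0.5):
    """Hypothetical helper: build a weighted edge list around `query`.

    1. Link `query` to its k nearest neighbours by cosine similarity.
    2. Link each pair of those neighbours whose similarity exceeds `threshold`.
    """
    others = [w for w in embeddings if w != query]
    ranked = sorted(
        others,
        key=lambda w: cosine(embeddings[query], embeddings[w]),
        reverse=True,
    )
    neighbours = ranked[:k]
    edges = {(query, w): cosine(embeddings[query], embeddings[w]) for w in neighbours}
    for i, a in enumerate(neighbours):
        for b in neighbours[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim > threshold:
                edges[(a, b)] = sim
    return edges


# Toy 3-dimensional "embeddings" (made up for illustration only).
toy_embeddings = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "kitten": [1.0, 0.8, 0.1],
    "car": [0.0, 0.1, 1.0],
    "truck": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    for (a, b), sim in neighbour_graph(toy_embeddings, "cat", k=2).items():
        print(f"{a} -- {b}: {sim:.3f}")
```

The resulting edge list could then be handed to any graph-drawing front end; Vec2graph itself renders its graphs as HTML pages.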
##### Whatlies
[Whatlies][df1] is a project developed by Rasa. Its goal is to provide an API that supports many embedding backends, such as spaCy, Gensim and fastText, for displaying word or sentence embeddings. Whatlies enables users to easily display data in interactive 2D graphs. The axes can be defined by dimensionality reduction methods, such as PCA or UMAP, but also by special queries: for instance, 'man' can be the y-axis and 'woman' the x-axis. What makes Whatlies special is its support for vector arithmetic on embeddings, which can be visualized directly in the interactive plots. It scales well to large inputs, but its overall goal is to visualize smaller groups of words. The drawback is that the library depends on many backends, which can cause problems when packages are updated. Furthermore, the tool requires little programming knowledge and has clear documentation on its github.io page.
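The two ideas above, vector arithmetic and axes defined by words, can be sketched in plain Python. This is not the Whatlies API; the helpers (`add`, `sub`, `project`, `nearest`) and the four toy vectors are hypothetical. A point's coordinate on a word-defined axis is taken here to be its scalar projection onto that word's vector, which is one reasonable choice among several.

```python
import math


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def project(v, axis):
    """Scalar projection of v onto the direction of `axis`."""
    return dot(v, axis) / math.sqrt(dot(axis, axis))


def nearest(v, embeddings, exclude=()):
    """Word whose embedding has the highest cosine similarity to v."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine(v, embeddings[w]))


# Toy vectors, made up for illustration only.
emb = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.1, 0.9],
    "queen": [0.1, 1.0, 0.9],
}

# Vector arithmetic: king - man + woman.
derived = add(sub(emb["king"], emb["man"]), emb["woman"])

# 2D coordinates: x-axis defined by "woman", y-axis defined by "man".
points = {w: (project(v, emb["woman"]), project(v, emb["man"])) for w, v in emb.items()}
points["king - man + woman"] = (project(derived, emb["woman"]), project(derived, emb["man"]))

if __name__ == "__main__":
    print("nearest to king - man + woman:", nearest(derived, emb))
    for label, (x, y) in points.items():
        print(f"{label:>20}: x={x:.2f}, y={y:.2f}")
```

Plotting `points` in a scatter plot gives the kind of "woman vs. man" view described above, with the derived vector shown alongside the original words.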
##### TensorFlow Embedding Projector
The [Embedding Projector][tens] is part of TensorBoard. It can represent embeddings graphically in 2D or 3D space. These embeddings can be anything, as long as they can be converted to a tab-separated file. The tool is highly scalable and suited to displaying many data points at once. The user can explore the embedding space interactively, varying many parameters and easily switching from PCA to UMAP or from 2D to 3D. A disadvantage of the tool is that it is limited to dimensionality reduction methods for preprocessing and positioning the data points.
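Because the Projector works from tab-separated files, getting arbitrary embeddings into it mostly comes down to writing one vectors file and one labels file. The sketch below assumes the common layout of one vector per line (components separated by tabs) plus a single-column metadata file with one label per line in the same order, with no header row; `to_projector_tsv` is a hypothetical helper, not part of TensorBoard.

```python
def to_projector_tsv(labels, vectors):
    """Hypothetical helper: return (vectors_tsv, metadata_tsv) as strings.

    vectors_tsv:  one embedding per line, components separated by tabs.
    metadata_tsv: one label per line, in the same order as the vectors
                  (single column, so no header row is written).
    """
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all vectors must have the same dimensionality")

    vector_lines = ["\t".join(str(float(x)) for x in v) for v in vectors]
    # Tabs or newlines inside a label would break the TSV layout.
    label_lines = [str(label).replace("\t", " ").replace("\n", " ") for label in labels]
    return "\n".join(vector_lines) + "\n", "\n".join(label_lines) + "\n"


if __name__ == "__main__":
    vectors_tsv, metadata_tsv = to_projector_tsv(
        ["cat", "dog"], [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]]
    )
    print(vectors_tsv)
    print(metadata_tsv)
```

The two strings can be saved as, for example, `vectors.tsv` and `metadata.tsv` and loaded into the Projector; any embedding type works as long as each item reduces to a fixed-length list of numbers.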
##### Parallax
Last but not least, there is [Parallax][par], a tool for displaying word embedding spaces that is suited to many high-dimensional embeddings and can display many data points at once. Most interesting is the accompanying article by [Molino et al. (2019)][parallax]. The axes can be obtained using PCA or t-SNE, but the authors propose a Cartesian approach to specifying the axes: axes that are the result of algebraic formulas on the embedding vectors. For example, an axis can be the average of two words or the most frequently occurring word in the data set. The major drawback of the tool is its slow loading and response time with more than 10,000 points. Another difficulty is that some programming experience is required to use the tool.
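The formula-defined axes described above can be sketched as follows: each axis is a vector computed from words (here, the average of two word vectors), and each item's coordinate is its similarity to that axis vector. This is a simplified illustration, not Parallax's code. The choice of cosine similarity as the coordinate, the helper names `average` and `explicit_axes_plot`, and the toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norms


def average(*vectors):
    """Component-wise mean of one or more equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def explicit_axes_plot(embeddings, x_axis, y_axis):
    """Map every item to (cos(item, x_axis), cos(item, y_axis))."""
    return {w: (cosine(v, x_axis), cosine(v, y_axis)) for w, v in embeddings.items()}


# Toy vectors, made up for illustration only.
emb = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [1.0, 0.0, 0.1],
    "car": [0.0, 1.0, 0.1],
    "bus": [0.1, 0.9, 0.0],
    "horse": [0.8, 0.3, 0.1],
}

# Axes as algebraic formulas: avg(dog, cat) and avg(car, bus).
x_axis = average(emb["dog"], emb["cat"])
y_axis = average(emb["car"], emb["bus"])
coords = explicit_axes_plot(emb, x_axis, y_axis)

if __name__ == "__main__":
    for word, (x, y) in coords.items():
        print(f"{word:>6}: x={x:.2f}, y={y:.2f}")
```

Unlike PCA or t-SNE, both axes here have a direct reading ("animal-like" and "vehicle-like" in this toy example), which is the interpretability argument behind explicit axes.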
## PROVEE
Our current project, PROVEE, tries to overcome the major drawbacks of the studied tools by designing a user-friendly, responsive tool suited to many high-dimensional embeddings. The embeddings can be anything: not only word embeddings, but also image embeddings, DNA embeddings, sentence embeddings, you name it. No programming experience is required, which makes the tool easy to use.