# PSEUDo: Pattern Search, Exploration and Understanding in multivariate time series Data

[![](http://img.youtube.com/vi/oJfXoDyZRPY/0.jpg)](http://www.youtube.com/watch?v=oJfXoDyZRPY "")

## Introduction
**PSEUDo** is an application to explore large multivariate time series and query interesting patterns
while enabling a clear understanding of what's going on behind the scenes on the machine learning side. 
This is achieved by combining the vast knowledge of domain experts with the immense processing power of computers, 
creating the interactive machine learning tool called **PSEUDo**.

The application consists of three parts:
1. An Angular web application on the client side 
2. A Python Flask backend with REST API on the server side
3. A query-aware locality-sensitive hashing algorithm library in C++

## Setup
To run the application, make sure you've installed the following:
1. Python >=3.5 e.g. from https://www.python.org/downloads/
2. Conda e.g. from https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
3. Angular: https://angular.io/guide/setup-local
4. Java: https://www.java.com/en/download/help/download_options.html
5. (Only for Windows) MSVC build tool: https://visualstudio.microsoft.com/
6. (Only for Windows) `make` in git Bash: https://gist.github.com/evanwill/0207876c3243bbb6863e65ec5dc3f058 

### Step 1: Create an environment
All dependencies are listed in the *environment.yml* file. To create an environment, run the following command:
`conda env create -f environment.yml`
This will create a conda environment named *pseudo*. Now activate the environment as follows:
`conda activate pseudo`

### Step 2: Set up backend - creating the LSH package
The LSH algorithm is maintained locally for now, so you have to build the package manually. The setup file for this package is 
located in the backend folder (keeping it local makes debugging easier, because the package must be rebuilt after every change). 
Build and install the package by running the following commands:
`cd backend/libs`
`python setup.py build_ext --inplace && python setup.py install`
`cd ..`
**NOTE 1**: Remember to rerun the second command every time you change something in the C++ code.

### Step 3: Set up frontend - install Node packages
The cloned Angular application cannot be launched directly. You have to install the node packages via
`cd frontend`
`npm install`

### Step 4: Launch PSEUDo
As mentioned before, PSEUDo consists of a user interface and a server. 
A Makefile is provided to set up both easily. 
Run the following commands from PSEUDo's root directory for the server and the UI respectively:
`make server`
`make ui`
A browser window should automatically be opened when running the UI. If not, visit http://localhost:4200/
**NOTE 1**: The UI and the server each need their own terminal window.
**NOTE 2**: Make sure the pseudo environment is activated (as described in step 1) when running the server, 
otherwise you'll get ModuleNotFound errors.

### Step 5: Load data
config.yml configures the global settings in PSEUDo, including the path of the config file of the used dataset.
Some examples of dataset config files can be found under experiments/configs/.

Currently, PSEUDo supports HDF files containing the data plus a JSON file describing the meta-information of the data (especially the track names).
An example is the combination of data/eeg_eye_state/eeg_eye_state.hdf and data/eeg_eye_state/metadata.json. 
You may want to use [HDFView](https://www.hdfgroup.org/downloads/hdfview/) to explore HDF files and
use the Python libraries [PyTables](https://www.pytables.org/usersguide/tutorials.html) or 
[h5py](https://docs.h5py.org/en/stable/index.html) to convert your data into HDF format.

### Tips and comments
This PSEUDo version is a prototype, so it may contain bugs and lack some functionality. 
It is released for evaluation and knowledge exchange. 
We are working on an upgraded version with extended functionality and thorough tests. 
However, the question of open-sourcing is still under discussion.

You may find the following tips or comments useful: 

- PSEUDo is designed mainly for high-dimensional time series. Its sublinear scalability refers to the asymptotic complexity with respect to the number of tracks.
- PSEUDo's active learning mainly works for high-dimensional time series, because the mechanism works by attaching larger weights to important tracks. Updating the query with DBA, however, also works in the univariate case.
- The track weights and the query are updated in each feedback round; the hash functions, however, are initialized rather than incrementally learned.
- Only positive labels take effect; negative labels are ignored in this version.
- This version does not support variable query lengths: the patterns searched for in the time series should have a length similar to that of the query.
- You may want to reuse the same query length, because the preprocessed data and the estimated LSH parameters are cached. Estimating the LSH parameters in particular takes some time.
- The loaded tracks may not scale well vertically. Adjusting the range slider rescales them properly.

# Documentation
## Frontend Views
The UI is built on the Angular framework. 
It consists of views (components), a state service and an API service. 
Every time an API call finishes, the state changes. 
Using hooks (EventEmitters), views can subscribe to state variable changes and refresh themselves with the new values. 
The views are listed below.
### Overview
shows the entire dataset. 
Upon receiving labels and predictions, it shows the locations of labels and predictions in this overview.
### Query
shows the current query.
### Training
shows the sampled predictions. 
In this view the user can label the samples as correct or incorrect. 
When the user is satisfied and clicks the "Train" button, the labels are updated and new samples are generated.
### Progress
shows the progress of the learned classifier. 
It shows whether the classifier is getting better at understanding the desired pattern.
### Labels
shows the currently labeled windows. The user should be able to change and delete labels in this view.

## Backend API
### /load-data
reads time series data from a file and returns the values and indices.
### /create-windows
reads time series data and chunks it into windows. For now the windows are saved to a file locally.
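
The chunking step can be pictured as a sliding window over the series. The sketch below is a minimal illustration and not PSEUDo's actual implementation: the function name `create_windows`, the `step` parameter and the list-of-rows layout (time steps x tracks) are all assumptions made for the example.

```python
from typing import List, Sequence


def create_windows(
    series: Sequence[Sequence[float]], window_size: int, step: int = 1
) -> List[List[List[float]]]:
    """Chunk a multivariate series (time steps x tracks) into windows.

    Hypothetical helper: consecutive windows start `step` time steps apart,
    so they overlap whenever step < window_size.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    # The last valid start index is len(series) - window_size.
    for start in range(0, len(series) - window_size + 1, step):
        windows.append([list(row) for row in series[start:start + window_size]])
    return windows


# Toy series with 5 time steps and 2 tracks.
series = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
windows = create_windows(series, window_size=3, step=1)
```

With `window_size=3` and `step=1`, the five time steps above yield three windows, each starting one time step after the previous one.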
### /get-lsh-parameters
calculates the necessary LSH parameters 
- envelope `r`, 
- mean distance of all sample pairs `a` and 
- standard deviation of the distances of all sample pairs `sd` 

based on the dataset.
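
To make `a` and `sd` concrete, the sketch below computes the mean and standard deviation of the distances over all pairs of sampled windows. It is an illustration under assumptions, not the backend's code: the Euclidean distance, the use of the population standard deviation and the names `euclidean` and `pairwise_distance_stats` are chosen for the example, and the envelope `r` is left out because the README does not describe how it is derived.

```python
import itertools
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equally shaped windows (time steps x tracks)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def pairwise_distance_stats(samples):
    """Return (mean, standard deviation) of the distances over all sample pairs.

    Assumption: Euclidean distance and population standard deviation;
    the real backend may use a different distance or estimator.
    """
    distances = [euclidean(a, b) for a, b in itertools.combinations(samples, 2)]
    return statistics.mean(distances), statistics.pstdev(distances)


# Three toy windows with one time step and one track each.
samples = [[[0.0]], [[3.0]], [[6.0]]]
a, sd = pairwise_distance_stats(samples)
```

For these three samples the pairwise distances are 3, 6 and 3, so `a` is 4 and `sd` is the square root of 2.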
### /initialize
starts the initial iteration of the LSH algorithm. 
First some essential parameters are calculated. 
Then the LSH algorithm is called. 
The API returns the candidates, distances and hash functions.
### /weights
calculates the new weights for the hash function distributions, based on the relevance feedback given by the user.
### /update
runs the LSH algorithm with weights that will manipulate the hashing functions. 
The API returns the candidates, distances and hash functions.
### /query
returns the query data based on the provided window indices. 
If only one index is given, the API call will return the window values according to the index. 
If multiple indices are given, the DBA-based average of the indexed windows is returned.
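
The two cases can be sketched as follows. This is a simplified, textbook-style version of DTW barycenter averaging (DBA) and not the code used by the backend: the function names, the fixed number of iterations and the choice to initialize the average with the first selected window are assumptions.

```python
def dtw_path(a, b):
    """Return the DTW warping path between a and b as (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between two time steps (all tracks).
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first one.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def dba_average(windows, iterations=5):
    """Average windows (time steps x tracks) with a basic DBA loop."""
    # Assumption: start from a copy of the first window.
    average = [list(row) for row in windows[0]]
    for _ in range(iterations):
        # Collect, per time step of the average, all time steps aligned to it.
        assigned = [[] for _ in average]
        for window in windows:
            for i, j in dtw_path(average, window):
                assigned[i].append(window[j])
        # Each time step becomes the per-track mean of its aligned points.
        average = [
            [sum(track) / len(points) for track in zip(*points)]
            for points in assigned
        ]
    return average


def query(windows, indices):
    """Single index: return that window. Several indices: return their DBA average."""
    selected = [windows[i] for i in indices]
    if len(selected) == 1:
        return selected[0]
    return dba_average(selected)


# Two toy windows with two time steps and one track each.
toy_windows = [[[0.0], [0.0]], [[2.0], [2.0]]]
single = query(toy_windows, [1])
averaged = query(toy_windows, [0, 1])
```

For the two constant toy windows, the average settles at a constant window halfway between them; for windows with shifted patterns, the DTW alignment is what distinguishes DBA from a plain element-wise mean.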
### /window
simply returns the window values according to the index.
### /table-info
returns extra info needed for the Progress view. 
The input is a subdivision of windows into buckets. 
For each bucket, the prototype (average) is calculated, along with the distances between the prototypes.
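
A minimal sketch of this computation is given below. It rests on several assumptions that the README does not confirm: buckets are represented as lists of window indices, the prototype is a plain element-wise mean, and the distance between prototypes is Euclidean. The names `prototype` and `table_info` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally shaped windows (time steps x tracks)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def prototype(bucket_windows):
    """Element-wise mean of the windows in one bucket."""
    count = len(bucket_windows)
    # zip(*bucket_windows) groups the rows of one time step across all windows;
    # zip(*rows) then groups the values of one track across those rows.
    return [
        [sum(values) / count for values in zip(*rows)]
        for rows in zip(*bucket_windows)
    ]


def table_info(windows, buckets):
    """Return one prototype per bucket and the matrix of prototype distances."""
    prototypes = [prototype([windows[i] for i in bucket]) for bucket in buckets]
    distances = [[euclidean(p, q) for q in prototypes] for p in prototypes]
    return prototypes, distances


# Three toy windows (one time step, one track) split into two buckets.
toy_windows = [[[0.0]], [[2.0]], [[10.0]]]
prototypes, distances = table_info(toy_windows, buckets=[[0, 1], [2]])
```

Here the first bucket averages to 1 and the second to 10, so the distance matrix has zeros on the diagonal and 9 elsewhere.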