146 spaCy preprocessing

Jasper Hofman requested to merge feature/146-spaCy-preprocessing into main

Sets up the (basic) preprocessing pipeline. Known inconsistencies in the resulting tokens:

  • Words joined by a dash are not split into separate tokens
  • Tokens may contain diacritical marks that should not be there
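
One possible way to handle both inconsistencies is a small post-processing step on the token strings. The sketch below is not part of this merge request. It assumes the tokens are available as plain strings (for example the text of each spaCy token), and the helper names `strip_diacritics`, `split_on_dashes` and `normalize_tokens` are hypothetical. Note that it removes all combining marks, so it would also drop diacritics that are legitimate in the source language; whether that is acceptable depends on the corpus.

```python
import re
import unicodedata
from typing import Iterable, List

# Hyphen-minus plus the common Unicode hyphen/dash characters.
_DASHES = re.compile(r"[-\u2010\u2011\u2012\u2013\u2014]")


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents, diaereses, ...) from text."""
    # NFD splits each accented character into base character + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def split_on_dashes(token: str) -> List[str]:
    """Split a token on dashes, dropping empty parts."""
    return [part for part in _DASHES.split(token) if part]


def normalize_tokens(tokens: Iterable[str]) -> List[str]:
    """Split dashed tokens apart, then strip diacritics from each part."""
    result = []
    for token in tokens:
        for part in split_on_dashes(token):
            result.append(strip_diacritics(part))
    return result
```

For example, `normalize_tokens(["state-of-the-art", "café"])` would yield the four parts of the dashed word followed by `cafe`. Doing this on strings after tokenization keeps the sketch independent of the pipeline configuration; the same effect could likely be reached inside the tokenizer itself, which this sketch does not attempt.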
