A memory-efficient TF-IDF project in Python to vectorize datasets larger than RAM











Redesigned at the C++ level, this library can process datasets of around 100 GB and beyond on a machine with as little as 4 GB of memory.

It does have its constraints, but its outputs are comparable to sklearn's output.
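
For anyone wondering how TF-IDF can work at all when the data doesn't fit in RAM, here is a minimal pure-Python sketch of a two-pass streaming approach. This is not fasttfidf's actual implementation or API (the post doesn't describe either); the names `tokenize`, `document_frequencies`, and `tfidf_stream` are hypothetical. It assumes whitespace tokenization, and it uses the smoothed IDF formula and L2 normalization that sklearn applies by default, so the weights should line up with sklearn on the same tokens. Only the document-frequency table is held in memory; documents are read one at a time.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def document_frequencies(docs):
    """Pass 1: stream the documents and count how many contain each term."""
    df = Counter()
    n_docs = 0
    for doc in docs:
        df.update(set(tokenize(doc)))  # set() so each doc counts a term once
        n_docs += 1
    return df, n_docs


def tfidf_stream(docs, df, n_docs):
    """Pass 2: yield one L2-normalized sparse vector (a dict) per document."""
    for doc in docs:
        tf = Counter(tokenize(doc))
        # Smoothed IDF, as in sklearn's default: ln((1 + n) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        yield {term: w / norm for term, w in weights.items()}


if __name__ == "__main__":
    corpus = ["a b", "a c"]  # stand-in for a file read line by line
    df, n = document_frequencies(iter(corpus))
    for vec in tfidf_stream(iter(corpus), df, n):
        print(vec)
```

With a real dataset you would pass a fresh iterator for each pass, for example by reopening the file and iterating over its lines. The document-frequency table still has to fit in memory, which is one of the constraints any approach like this runs into.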

fasttfidf

submitted by /u/mrnerdy59















