Thank you, but after looking at Weka's API I still have no idea how to do
this with Weka. Let me reformulate my problem:
I have a collection of term vectors (each vector is the list of tokens
extracted from one file), and I do not have the original files. I would
like to calculate the TF and the TF-IDF of each term, and then sort the
terms by each of these values. As suggested by Grant Ingersoll, I could
index those term vectors again using Lucene and then use its API to
measure TF and TF-IDF. However, I suspect there is a simpler way, or an
API that fits this case directly.
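For reference, the computation itself is small enough to write without
any library. Below is a minimal sketch, assuming each document is already
a List<String> of tokens. The class and method names (TfIdfSketch,
termFrequencies, documentFrequencies, tfIdf, sortedByValueDesc) are made
up for illustration. It uses raw counts for TF and log(N / df) for IDF,
which is one common weighting and not necessarily the one Lucene or Weka
applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: not part of Lucene or Weka.
final class TfIdfSketch {

    // Raw term frequency: number of occurrences of each term in one token list.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : tokens) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    // Document frequency: number of token lists that contain each term at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // TF-IDF with the assumed weighting tf * ln(numDocs / df).
    // Assumes df was built from a collection that includes this token list,
    // so every term of the list has a document frequency of at least 1.
    static Map<String, Double> tfIdf(List<String> tokens,
                                     Map<String, Integer> df,
                                     int numDocs) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(tokens).entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }

    // Entries of the map ordered from highest to lowest value.
    static <V extends Comparable<V>> List<Map.Entry<String, V>> sortedByValueDesc(
            Map<String, V> map) {
        List<Map.Entry<String, V>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.<String, V>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Two toy token lists standing in for the real term vectors.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("index", "weka"));
        Map<String, Integer> df = documentFrequencies(docs);
        for (List<String> doc : docs) {
            System.out.println("TF:     " + sortedByValueDesc(termFrequencies(doc)));
            System.out.println("TF-IDF: " + sortedByValueDesc(tfIdf(doc, df, docs.size())));
        }
    }
}
```

With this weighting, a term that occurs in every document gets an IDF of
zero and therefore sorts last by TF-IDF, however often it occurs.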
Thanks once again everyone.
Best regards,
Sengly
On 3/28/07, karl wettin <karl.wettin@xxxxxxxxx> wrote:
On 28 Mar 2007, at 10:36, Sengly Heng wrote:
> Does any of you know of a Java API that handles this problem
> directly, or do I have to implement it from scratch?
You can also try
weka.filters.unsupervised.attribute.StringToWordVector, it has many
neat features you might be interested in. And if it is applicable to
what you are attempting to do, the feature selection algorithms of the
same project (Weka) do a great job of reducing the data set.
http://www.cs.waikato.ac.nz/ml/weka/
It is GPL.
--
karl