Trouble to load Big data set

Moderator: NorbertKrupa

Nsingh45
Newbie
Posts: 3
Joined: Wed May 11, 2016 12:35 pm

Trouble to load Big data set

Post by Nsingh45 » Thu May 12, 2016 3:26 pm

Hi all, I have a dataset with 10000 features (terms) with tf-idf values.
When I try to load the data (which is in C4.5 format), I get the following error:
data = orange.ExampleTable(file)
SystemError: C45ExampleGenerator: line 1 of file '../features/pubmed.data' too long
So I reduced the feature space to 5000 by applying a term frequency threshold (>2),
but that doesn't work either.
Does Orange have a problem with 5000 features, and if so, where is its limit?
Best regards,

sonamjain4715
Newbie
Posts: 1
Joined: Fri May 13, 2016 6:09 am

Re: Trouble to load Big data set

Post by sonamjain4715 » Sat May 14, 2016 4:42 am

You can have many more than 10000 features. The problem is that the code for parsing C4.5 files is old and ugly, and it limits the length of each line to 10000 characters, so the limit is on characters per line rather than on the number of features. This shouldn't be difficult to fix, so we'll do it some day soon. Until then, you can just convert your data to tab-delimited format.
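In case it helps, here is a rough sketch of such a conversion in plain Python. It is not part of Orange: the function names parse_c45_names and c45_to_tab are made up for this post. It assumes the .names file has one declaration per line with the class values first, that the class is the last column of the .data file, and that the tab-delimited file uses the three-line header (names, types, flags) of Orange 2.x, so please check the header against the documentation for your version.

```python
def parse_c45_names(names_text):
    """Return a list of (attribute_name, type_code) from C4.5 .names text.

    type_code is "c" for continuous attributes and "d" for discrete ones.
    Assumes one declaration per line; the first declaration lists the classes.
    """
    entries = []
    for raw in names_text.splitlines():
        line = raw.split("|", 1)[0].strip()  # '|' starts a comment in C4.5
        if line:
            entries.append(line.rstrip("."))  # declarations end with '.'
    attributes = []
    for entry in entries[1:]:  # entries[0] holds the class values
        name, spec = entry.split(":", 1)
        kind = "c" if spec.strip() == "continuous" else "d"
        attributes.append((name.strip(), kind))
    return attributes


def c45_to_tab(names_text, data_text):
    """Convert C4.5 .names/.data text to tab-delimited text.

    The output starts with three header rows (names, types, flags),
    which is an assumption about the Orange 2.x tab format.
    """
    attributes = parse_c45_names(names_text)
    n_columns = len(attributes) + 1  # attributes plus the class column
    rows = [
        [name for name, _ in attributes] + ["class"],
        [kind for _, kind in attributes] + ["d"],
        [""] * len(attributes) + ["class"],  # flag the last column as class
    ]
    for number, raw in enumerate(data_text.splitlines(), 1):
        line = raw.split("|", 1)[0].strip()
        if not line:
            continue
        if line.endswith("."):  # some .data files end each row with '.'
            line = line[:-1]
        values = [v.strip() for v in line.split(",")]
        if len(values) != n_columns:
            raise ValueError(
                "line %d: expected %d values, got %d"
                % (number, n_columns, len(values))
            )
        rows.append(values)
    return "\n".join("\t".join(row) for row in rows) + "\n"
```

Read the .names and .data files into strings, pass them to c45_to_tab, and write the result to a file with a .tab extension. That file should then load with orange.ExampleTable the same way as in the original post, since the tab-delimited reader is not the C4.5 parser that has the line-length limit.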
