RapidMiner dummy coding mismatch

Dimebag

I'm trying to train a neural network on trainData and then test it on testData. However, some nominal features first have to be dummy coded into numerical ones. Training works, but applying the model to the test data (to which I apply exactly the same transformations/blocks) fails because of a mismatch in the dummy coding*.

*The error message is along the lines of: v47=H does not exist in testData

I checked and it is true that testData does not have the value 'H' at all in v47, while trainData has it. Therefore, I'd like to ignore this 'H' in v47, or replace it.

Is there an easy way to do this? Keep in mind that this might happen with other features as well, and going through all the features one by one to fix this kind of issue would be very time consuming.

Perhaps there's another way to tackle this?

Thanks!

Andrew Chisholm

This is similar to a previous post

The answer there suggests combining the test and training data so that all possible values of each nominal attribute are present, and then splitting the result to recover the test and training sets. Both splits then retain the full set of nominal values, including any that occur in only one of them.
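Outside RapidMiner, the combine-then-split idea can be sketched in plain Python. This is only an illustration of the principle, not the RapidMiner process itself: the helper dummy_code, the column x, and the sample rows are all made up for the example.

```python
def dummy_code(rows, nominal_columns):
    """Dummy code the nominal columns using every value that occurs in rows."""
    # Collect the values of each nominal column across ALL rows passed in.
    levels = {col: sorted({row[col] for row in rows}) for col in nominal_columns}
    encoded = []
    for row in rows:
        # Copy the non-nominal columns unchanged.
        out = {k: v for k, v in row.items() if k not in nominal_columns}
        # Add one 0/1 column per (column, value) pair.
        for col in nominal_columns:
            for level in levels[col]:
                out[f"{col}={level}"] = 1 if row[col] == level else 0
        encoded.append(out)
    return encoded


# Hypothetical data: "H" occurs in the training rows only.
train = [{"v47": "H", "x": 1.0}, {"v47": "A", "x": 2.0}]
test = [{"v47": "A", "x": 3.0}]

# Combine, encode once, then split back at the original boundary.
combined = dummy_code(train + test, ["v47"])
train_enc = combined[:len(train)]
test_enc = combined[len(train):]
```

Because the encoding is derived from the combined rows, test_enc gets a v47=H column (all zeros here) even though 'H' never occurs in the test rows, so both sets end up with identical columns.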

If that does not suit, another possibility is to use the Data to Weights operator on the training example set. The resulting weights can then be used with the Select by Weights operator to keep only the attributes of interest in the test example set.
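A loose Python analogue of this second idea is to learn the set of dummy columns from the training data only and then force the test data onto exactly those columns. This is a sketch under assumptions, not what the two RapidMiner operators do internally: fit_levels and apply_levels are hypothetical helpers, and encoding a value never seen in training as all zeros is a choice made for the example.

```python
def fit_levels(rows, nominal_columns):
    """Record the values of each nominal column seen in the training rows."""
    return {col: sorted({row[col] for row in rows}) for col in nominal_columns}


def apply_levels(rows, levels):
    """Dummy code rows using only the values recorded by fit_levels."""
    encoded = []
    for row in rows:
        # Copy the non-nominal columns unchanged.
        out = {k: v for k, v in row.items() if k not in levels}
        for col, values in levels.items():
            for value in values:
                # Assumption: a value unseen in training gets 0 everywhere.
                out[f"{col}={value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


# Hypothetical data: "H" is missing from the test rows, "Z" from training.
train_rows = [{"v47": "H", "x": 1.0}, {"v47": "A", "x": 2.0}]
test_rows = [{"v47": "A", "x": 3.0}, {"v47": "Z", "x": 4.0}]

levels = fit_levels(train_rows, ["v47"])
train_encoded = apply_levels(train_rows, levels)
test_encoded = apply_levels(test_rows, levels)
```

Here the training data alone defines the columns, so the test set never needs to be merged with it, and values that appear only in the test rows do not create columns the trained model has never seen.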
