Fitting a Support Vector Classifier in scikit-learn with image data produces error

Jon

I'm trying to train an SVC classifier on image data. However, when I run this code:

classifier = svm.SVC(gamma=0.001)
classifier.fit(train_set, train_set_labels)

I get this error:

ValueError: setting an array element with a sequence.

I loaded each image into an array with Matplotlib: plt.imread(image).

The error suggests the data is not in an array, yet when I check the types of the data and the labels, both are lists (I build the labels list manually):

print(type(train_set))
print(type(train_set_labels))

<class 'list'>
<class 'list'>

If I do a plt.imshow(items[0]) then the image shows correctly in the output.

I also called train_test_split from scikit-learn:

train_set, test_set = train_test_split(items, test_size=0.2, random_state=42)

Example input:

train_set[0]

array([[[212, 134,  34],
    [221, 140,  48],
    [240, 154,  71],
    ..., 
    [245, 182,  51],
    [235, 175,  43],
    [242, 182,  50]],

   [[230, 152,  51],
    [222, 139,  47],
    [236, 147,  65],
    ..., 
    [246, 184,  49],
    [238, 179,  43],
    [245, 186,  50]],

   [[229, 150,  47],
    [205, 122,  28],
    [220, 129,  46],
    ..., 
    [232, 171,  28],
    [237, 179,  35],
    [244, 188,  43]],

   ..., 
   [[115, 112, 103],
    [112, 109, 102],
    [ 80,  77,  72],
    ..., 
    [ 34,  25,  28],
    [ 55,  46,  49],
    [ 80,  71,  74]],

   [[ 59,  56,  47],
    [ 66,  63,  56],
    [ 48,  45,  40],
    ..., 
    [ 32,  23,  26],
    [ 56,  47,  50],
    [ 82,  73,  76]],

   [[ 29,  26,  17],
    [ 41,  38,  31],
    [ 32,  29,  24],
    ..., 
    [ 56,  47,  50],
    [ 59,  50,  53],
    [ 84,  75,  78]]], dtype=uint8)

Example label:

 train_set_labels[0]

 'Picasso'

I'm not sure what step I'm missing to get the data in the form that the classifier needs in order to train it. Can anyone see what may be needed?

Joshua Simon Tarcisio Fenech

The error message you are receiving:

 ValueError: setting an array element with a sequence,

normally occurs when you try to put a sequence (such as a list) somewhere that a single value is required. This suggests to me that your train_set is a list of multidimensional elements, even though the outer containers are lists, as you state. Would you be able to post an example of your inputs and labels?
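To illustrate the kind of assignment that triggers this message, here is a minimal sketch using NumPy directly. The name row is made up for the example and does not come from the question. scikit-learn converts its input to a NumPy array internally, so a similar failure can surface when nested data cannot be packed into a regular numeric array, for instance if the images differ in size; that cause is an assumption here, not something the question confirms.

```python
import numpy as np

row = np.zeros(3)      # three slots, each expecting a single number
message = None
try:
    row[0] = [1, 2]    # a two-element sequence where one scalar is required
except ValueError as exc:
    message = str(exc)

print(message)
```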

UPDATE Yes, it's as I thought. The first element of your training data, train_set[0], is a long list of rows (I can't tell how long), each row is itself a list of pixels, and each pixel is a list of 3 values. You are therefore calling the classifier on a list of deeply nested lists, when the classifier requires a list of lists (m rows, one per training example, with each row made up of a flat list of n features). What else is in your train_set array? Is the full data set in train_set[0]? If so, you would need to create a new array with each element corresponding to one of the subelements of train_set[0], and then I believe your code should run, although I am not too familiar with that classifier. Alternatively you could try running the classifier with train_set[0].
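The shape requirement described above can be checked with a short sketch. It uses a hypothetical stand-in for train_set (four blank 8x8 RGB images) and assumes every image has the same dimensions; images of different sizes would have to be resized or cropped first.

```python
import numpy as np

# Hypothetical stand-in for train_set: four RGB images of identical size
images = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(4)]

stacked = np.stack(images)            # 4-D: (examples, height, width, channels)
X = stacked.reshape(len(images), -1)  # 2-D: (m examples, n features)

print(stacked.shape)  # (4, 8, 8, 3)
print(X.shape)        # (4, 192)
```

Each row of X is one image unrolled into 8 * 8 * 3 = 192 features, which is the m-by-n layout the classifier expects.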

UPDATE 2

I don't have experience with scikit-learn's SVC, so I can't tell you the best way to preprocess the data for it. One method, as I said previously, is to go through each element of train_set, which is composed of lists of lists, and place all the elements of its sublists into a single flat list. For example:

import numpy as np

new_train_set = []
for i in range(len(train_set)):
    # Flatten image i (rows x columns x 3) into one flat list of numbers
    new_train_set.append(np.asarray(train_set[i]).ravel().tolist())

I would then train with new_train_set and the training labels.
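As a rough end-to-end sketch of that idea, the snippet below flattens a list of images and fits the same svm.SVC(gamma=0.001) used in the question. The data is synthetic (random 8x8 RGB images), the second label 'Monet' is invented for illustration, and all images are assumed to share one size. It only shows that the flattened input is accepted by fit; it says nothing about how well such a model would classify real paintings.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real images: 10 darker and 10 brighter pictures
dark = [rng.integers(0, 100, size=(8, 8, 3), dtype=np.uint8) for _ in range(10)]
bright = [rng.integers(150, 250, size=(8, 8, 3), dtype=np.uint8) for _ in range(10)]
train_set = dark + bright
train_set_labels = ['Picasso'] * 10 + ['Monet'] * 10  # 'Monet' is hypothetical

# One flat feature vector per image: shape (20, 192)
X = np.array([image.ravel() for image in train_set])

classifier = svm.SVC(gamma=0.001)
classifier.fit(X, train_set_labels)

predictions = classifier.predict(X[:1])
print(X.shape, predictions)
```

If the labels are split with train_test_split as well, passing the images and labels in the same call keeps them aligned.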
