python - How to retain column headers of data frame after Pre-processing in scikit-learn

- January 15, 2014

I have a pandas data frame that has rows and columns. Each column has a header. As long as I keep doing data manipulation operations in pandas, my variable headers are retained. But if I try a data pre-processing feature of the scikit-learn library, I end up losing all my headers and the frame gets converted to a plain matrix of numbers.

I understand why this happens: scikit-learn gives a numpy ndarray as output, and a numpy ndarray, being just a matrix, does not have column names.

But here is the thing. If I am building a model on my dataset, then after the initial data pre-processing and trying some model, I might have more data manipulation tasks to do before running another model for a better fit. Not being able to access the column headers makes that data manipulation hard: I might not know the index of a particular variable, whereas it is easier to remember the variable name or look it up with df.columns.

How do I overcome that?

EDIT1: Editing with a sample data snapshot.

        pclass  sex  age  sibsp  parch     fare  embarked
    0        3    0   22      1      0   7.2500         1
    1        1    1   38      1      0  71.2833         2
    2        3    1   26      0      0   7.9250         1
    3        1    1   35      1      0  53.1000         1
    4        3    0   35      0      0   8.0500         1
    5        3    0  NaN      0      0   8.4583         3
    6        1    0   54      0      0  51.8625         1
    7        3    0    2      3      1  21.0750         1
    8        3    1   27      0      2  11.1333         1
    9        2    1   14      1      0  30.0708         2
    10       3    1    4      1      1  16.7000         1
    11       1    1   58      0      0  26.5500         1
    12       3    0   20      0      0   8.0500         1
    13       3    0   39      1      5  31.2750         1
    14       3    1   14      0      0   7.8542         1
    15       2    1   55      0      0  16.0000         1

The above is my pandas data frame. When I do this on the data frame, it strips my column headers.

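    from sklearn import preprocessing

    # Imputer fills missing values (NaN) with the column mean by default.
    # In scikit-learn 0.22 and later, use sklearn.impute.SimpleImputer instead.
    x_imputed = preprocessing.Imputer().fit_transform(x_train)
    x_imputed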

The new data output is a numpy array, and hence the column names are stripped.

    array([[  3.        ,   0.        ,  22.        , ...,   0.        ,
              7.25      ,   1.        ],
           [  1.        ,   1.        ,  38.        , ...,   0.        ,
             71.2833    ,   2.        ],
           [  3.        ,   1.        ,  26.        , ...,   0.        ,
              7.925     ,   1.        ],
           ...,
           [  3.        ,   1.        ,  29.69911765, ...,   2.        ,
             23.45      ,   1.        ],
           [  1.        ,   0.        ,  26.        , ...,   0.        ,
             30.        ,   2.        ],
           [  3.        ,   0.        ,  32.        , ...,   0.        ,
              7.75      ,   3.        ]])

So I want to retain the column names when I do any data manipulation on my pandas data frame.

scikit-learn indeed strips the column headers in most cases, so just add them back on afterward. In your example, with x_imputed as the sklearn.preprocessing output and x_train as the original dataframe, you can put the column headers back on with:

    import pandas as pd

    x_imputed_df = pd.DataFrame(x_imputed, columns=x_train.columns)
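
For a fuller picture of the same idea, here is a minimal sketch of the whole round trip. It rests on a few assumptions that are not in the original post: a recent scikit-learn, where SimpleImputer from sklearn.impute takes the place of preprocessing.Imputer, and a small hypothetical x_train built from a few of the sample rows. It also passes index=x_train.index, an addition to the one-liner, so that the row labels survive along with the column names.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed stand-in for the older preprocessing.Imputer

# Hypothetical stand-in for x_train, built from a few of the sample rows.
x_train = pd.DataFrame(
    {
        "pclass": [3, 1, 3, 3],
        "sex": [0, 1, 1, 0],
        "age": [22.0, 38.0, 26.0, np.nan],
        "fare": [7.25, 71.2833, 7.925, 8.4583],
    },
    index=[0, 1, 2, 5],
)

# With default settings, fit_transform returns a plain ndarray:
# the column names and the row index are both gone.
x_imputed = SimpleImputer(strategy="mean").fit_transform(x_train)

# Rebuild the frame, reusing the column labels and the row index.
x_imputed_df = pd.DataFrame(
    x_imputed, columns=x_train.columns, index=x_train.index
)

# Columns can be addressed by name again.
print(x_imputed_df[["age", "fare"]])
```

One caveat: this only works when the transformer returns the same number of columns, in the same order, as it was given. SimpleImputer, for example, drops a column that contains nothing but missing values by default, in which case x_train.columns would no longer line up with the output.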

python numpy pandas scikit-learn
