python - How to retain column headers of data frame after Pre-processing in scikit-learn -
I have a pandas data frame that has rows and columns. Each column has a header. As long as I keep doing data manipulation operations in pandas, my variable headers are retained. But if I try a data pre-processing feature of the scikit-learn library, I end up losing all my headers and the frame gets converted to a plain matrix of numbers.
I understand why this happens: scikit-learn returns a numpy ndarray as output, and a numpy ndarray, being just a matrix, does not have column names.
But here is the thing. If I am building a model on my dataset, then even after the initial data pre-processing and trying one model, I might have to do more data manipulation to run another model for a better fit. Not being able to access the column headers makes that manipulation difficult, because I might not know the index of a particular variable, whereas it is easier to remember the variable name or look it up with df.columns.
How do I overcome that?
Edit 1: adding a sample data snapshot.
    pclass  sex  age  sibsp  parch     fare  embarked
0        3    0   22      1      0   7.2500         1
1        1    1   38      1      0  71.2833         2
2        3    1   26      0      0   7.9250         1
3        1    1   35      1      0  53.1000         1
4        3    0   35      0      0   8.0500         1
5        3    0  NaN      0      0   8.4583         3
6        1    0   54      0      0  51.8625         1
7        3    0    2      3      1  21.0750         1
8        3    1   27      0      2  11.1333         1
9        2    1   14      1      0  30.0708         2
10       3    1    4      1      1  16.7000         1
11       1    1   58      0      0  26.5500         1
12       3    0   20      0      0   8.0500         1
13       3    0   39      1      5  31.2750         1
14       3    1   14      0      0   7.8542         1
15       2    1   55      0      0  16.0000         1
The above is my pandas data frame. When I run the following on this data frame, it strips the column headers.
from sklearn import preprocessing
x_imputed = preprocessing.Imputer().fit_transform(x_train)
x_imputed
The new data is a numpy array, and hence the column names are stripped.
array([[  3.        ,   0.        ,  22.        , ...,   0.        ,   7.25      ,   1.        ],
       [  1.        ,   1.        ,  38.        , ...,   0.        ,  71.2833    ,   2.        ],
       [  3.        ,   1.        ,  26.        , ...,   0.        ,   7.925     ,   1.        ],
       ...,
       [  3.        ,   1.        ,  29.69911765, ...,   2.        ,  23.45      ,   1.        ],
       [  1.        ,   0.        ,  26.        , ...,   0.        ,  30.        ,   2.        ],
       [  3.        ,   0.        ,  32.        , ...,   0.        ,   7.75      ,   3.        ]])
So I want to retain the column names when doing data manipulation on my pandas data frame.
scikit-learn does indeed strip the column headers in most cases, so just add them back on afterward. In your example, with x_imputed as the sklearn.preprocessing output and x_train as the original dataframe, you can put the column headers back on with:
x_imputed_df = pd.DataFrame(x_imputed, columns = x_train.columns)
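For a runnable illustration of the same idea, here is a minimal sketch. It assumes a recent scikit-learn, where the old preprocessing.Imputer has been replaced by sklearn.impute.SimpleImputer, and it uses a small toy frame built from three columns and a few rows of the snapshot in the question. The index labels are made up for the demonstration. Passing index=x_train.index as well keeps the row labels aligned with the original frame.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for x_train: a few rows/columns from the question's snapshot.
# The index labels (10..13) are hypothetical, chosen to show they survive too.
x_train = pd.DataFrame(
    {
        "pclass": [3, 1, 3, 1],
        "age": [22.0, 38.0, np.nan, 35.0],
        "fare": [7.25, 71.2833, 8.4583, 53.1],
    },
    index=[10, 11, 12, 13],
)

# fit_transform returns a plain numpy.ndarray, so the headers are gone here.
imputer = SimpleImputer(strategy="mean")
x_imputed = imputer.fit_transform(x_train)

# Re-attach the original column headers (and row index) afterward.
x_imputed_df = pd.DataFrame(
    x_imputed,
    columns=x_train.columns,
    index=x_train.index,
)

print(x_imputed_df)
```

This relies on the transformer returning one output column per input column, in the same order. Transformers that add or drop columns (for example one-hot encoding, or an imputer facing a column that is entirely NaN) would need their output names worked out separately. scikit-learn 1.2 and later also provide set_output(transform="pandas") on transformers, which returns a data frame directly.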
python numpy pandas scikit-learn