python - Pandas Difference in DataFrames -
I'm trying to figure out how to display the differences between two pandas DataFrames. I'm almost there, but I can't figure out how to display the additional information (the other columns) for the rows that contain a difference.
Here is what I have so far.
Comparing DataFrame A to DataFrame B:
DataFrame A:

    date   id_1  id_2  value
    1-jan  1     1     5
    2-jan  1     2     6
    3-jan  1     3     4
    4-jan  1     4     2
    5-jan  1     5     8

DataFrame B:

    date   id_1  id_2  value
    1-jan  1     1     5
    2-jan  1     2     6
    3-jan  1     3     4
    4-jan  1     4     2
    5-jan  1     5     55
Current output:

                  from  to
    date   column
    5-jan  value  8     55

Desired output:

    date   id_1  id_2  from  to
    5-jan  1     5     8     55
Current code:

    import numpy as np
    import pandas as pd

    # df1 and df2 are assumed to share the same index ('date') and columns
    # stack the column(s) where the DataFrames are not equal
    ne_stacked = (df1 != df2).stack()
    # keep only the True entries, i.e. the changed cells
    changed = ne_stacked[ne_stacked]
    # name the index levels
    changed.index.names = ['date', 'column']
    # (row, column) positions where the DataFrames are not equal
    diff_loc = np.where(df1 != df2)
    # create the 'from' column
    changed_from = df1.values[diff_loc]
    # create the 'to' column
    changed_to = df2.values[diff_loc]
    # create the summary DataFrame
    final = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
    print(final)
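The stacked comparison above loses id_1 and id_2 because only the index and the changed column survive. One way to keep them, sketched here under the assumption that (date, id_1, id_2) uniquely identifies a row in both frames and that only `value` is compared, is to move those key columns into the index before comparing and restore them with `reset_index()` afterwards. The names `keys` and `frame` are illustrative, not from the question.

```python
import pandas as pd

# Sample data from the question (dates kept as plain strings).
df_a = pd.DataFrame({
    'date': ['1-jan', '2-jan', '3-jan', '4-jan', '5-jan'],
    'id_1': [1, 1, 1, 1, 1],
    'id_2': [1, 2, 3, 4, 5],
    'value': [5, 6, 4, 2, 8],
})
df_b = df_a.copy()
df_b.loc[4, 'value'] = 55  # the single changed cell

# Assumption: these columns uniquely identify a row in both frames.
keys = ['date', 'id_1', 'id_2']
a = df_a.set_index(keys)
b = df_b.set_index(keys)

# Old values become 'from'; new values are aligned on the shared key index as 'to'.
frame = a[['value']].rename(columns={'value': 'from'})
frame['to'] = b['value']

# Keep only differing rows, then turn the key levels back into columns.
final = frame[frame['from'] != frame['to']].reset_index()
print(final)
```

On pandas 1.1 or later, calling `DataFrame.compare` on the two key-indexed frames is another option, although its column layout differs from the desired output.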
Use merge:
    In [29]: print(df_a)
        date  id_1  id_2  value
    0  1-jan     1     1      5
    1  2-jan     1     2      6
    2  3-jan     1     3      4
    3  4-jan     1     4      2
    4  5-jan     1     5      8

    In [30]: print(df_b)
        date  id_1  id_2  value
    0  1-jan     1     1      5
    1  2-jan     1     2      6
    2  3-jan     1     3      4
    3  4-jan     1     4      2
    4  5-jan     1     5     55

    In [31]: df_c = pd.merge(df_a, df_b, how='outer', on=['date', 'id_1', 'id_2'])
        ...: df_c.columns = ['date', 'id_1', 'id_2', 'from', 'to']
        ...: # bracket access: df_c.from is a SyntaxError because 'from' is a Python keyword
        ...: df_c = df_c[df_c['from'] != df_c['to']]
        ...: print(df_c)
        date  id_1  id_2  from  to
    4  5-jan     1     5     8  55
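The same merge can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the helper name `changed_rows` is hypothetical, and it uses merge's `suffixes` parameter instead of overwriting all column names, so it does not depend on the exact column order of the merged result.

```python
import pandas as pd


def changed_rows(left, right, keys, col):
    """Hypothetical helper: key columns plus old/new values of `col` for rows that differ."""
    # Outer merge on the key columns; the compared column gets '_from'/'_to' suffixes.
    merged = pd.merge(left, right, how='outer', on=keys, suffixes=('_from', '_to'))
    merged = merged.rename(columns={col + '_from': 'from', col + '_to': 'to'})
    # Rows present in only one frame get NaN on the other side and are reported too.
    # Caveat: NaN != NaN is True, so rows missing `col` in both frames also appear.
    return merged[merged['from'] != merged['to']]


df_a = pd.DataFrame({
    'date': ['1-jan', '2-jan', '3-jan', '4-jan', '5-jan'],
    'id_1': [1, 1, 1, 1, 1],
    'id_2': [1, 2, 3, 4, 5],
    'value': [5, 6, 4, 2, 8],
})
df_b = df_a.copy()
df_b.loc[4, 'value'] = 55

result = changed_rows(df_a, df_b, ['date', 'id_1', 'id_2'], 'value')
print(result)
```

Because the outer merge keeps unmatched rows, this variant also surfaces rows that were added or removed between the two frames, not just changed values.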