python - Pandas Difference in DataFrames -


I am trying to figure out how to display the differences between two pandas DataFrames. So far I can show which values changed, but I can't figure out how to display the additional identifying columns for the rows that contain a difference.

Here is what I have so far:

Compare DataFrame A to DataFrame B:

DataFrame A:

    date   id_1  id_2  value
    1-jan  1     1     5
    2-jan  1     2     6
    3-jan  1     3     4
    4-jan  1     4     2
    5-jan  1     5     8

DataFrame B:

    date   id_1  id_2  value
    1-jan  1     1     5
    2-jan  1     2     6
    3-jan  1     3     4
    4-jan  1     4     2
    5-jan  1     5     55

Current output:

    date   column  from  to
    5-jan  value   8     55

Desired output:

    date   id_1  id_2  from  to
    5-jan  1     5     8     55

Current code:

    import numpy as np
    import pandas as pd

    # Stack the boolean mask of cells where the DataFrames are not equal
    ne_stacked = (df1 != df2).stack()

    # Keep only the True entries, i.e. the cells that changed
    changed = ne_stacked[ne_stacked]

    # Name the index levels
    changed.index.names = ['date', 'column']

    # Find the positions where the DataFrames are not equal
    diff_loc = np.where(df1 != df2)

    # Create the 'from' column
    changed_from = df1.values[diff_loc]

    # Create the 'to' column
    changed_to = df2.values[diff_loc]

    # Create the summary DataFrame
    final = pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
    print(final)

Use merge:

    In [29]: print(df_a)
        date  id_1  id_2  value
    0  1-jan     1     1      5
    1  2-jan     1     2      6
    2  3-jan     1     3      4
    3  4-jan     1     4      2
    4  5-jan     1     5      8

    In [30]: print(df_b)
        date  id_1  id_2  value
    0  1-jan     1     1      5
    1  2-jan     1     2      6
    2  3-jan     1     3      4
    3  4-jan     1     4      2
    4  5-jan     1     5     55

    In [31]: df_c = pd.merge(df_a, df_b, how='outer', on=['date', 'id_1', 'id_2'])
        ...: df_c.columns = ['date', 'id_1', 'id_2', 'from', 'to']
        ...: # 'from' is a Python keyword, so use bracket access rather than df_c.from
        ...: df_c = df_c[df_c['from'] != df_c['to']]
        ...: print(df_c)
        date  id_1  id_2  from  to
    4  5-jan     1     5     8  55
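If you would rather stay close to the stacking approach from the question, one possible variant is to move the identifying columns into the index before comparing, so that they are carried along with each changed cell. This is only a sketch under some assumptions: the sample frames are rebuilt by hand from the tables above, both frames are assumed to contain exactly the same keys, and the names `keys`, `a`, `b`, `mask` and `diff` are made up for illustration.

```python
import pandas as pd

# Assumed key columns that identify a row in both frames
keys = ["date", "id_1", "id_2"]

# Sample data rebuilt from the tables in the question
df_a = pd.DataFrame({
    "date": ["1-jan", "2-jan", "3-jan", "4-jan", "5-jan"],
    "id_1": [1, 1, 1, 1, 1],
    "id_2": [1, 2, 3, 4, 5],
    "value": [5, 6, 4, 2, 8],
})
df_b = df_a.copy()
df_b.loc[4, "value"] = 55

# Move the identifying columns into the index so they survive the comparison
a = df_a.set_index(keys)
b = df_b.set_index(keys)

# Boolean mask of differing cells; this needs identically labelled frames
mask = (a != b).stack()

# One row per changed cell, labelled by the keys plus the column name
diff = pd.DataFrame({"from": a.stack()[mask], "to": b.stack()[mask]})
diff.index.names = keys + ["column"]

print(diff)
```

Calling `diff.reset_index()` afterwards turns the index levels back into ordinary columns. Unlike the merge, this comparison raises an error if the two frames do not have the same row and column labels, so it only fits the case where both frames describe the same set of keys.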

python pandas
