sql - Row Aggregation after Cross Join in BigQuery -




Say I have the following table in BigQuery:

a =
user1 | 0 0 |
user2 | 0 3 |
user3 | 4 0 |

After a cross join, I have:

dist =
| user1 user2 0 0 , 0 3 |   # the comma shows where the two users' values are separated
| user1 user3 0 0 , 4 0 |
| user2 user3 0 3 , 4 0 |

How can I perform row aggregation in BigQuery to compute a pairwise aggregation across rows? A typical use case would be computing the Euclidean distance between two users. I want to compute the following metric between two users:

sum(min(user1_row[i], user2_row[i]) / abs(user1_row[i] - user2_row[i]))

summed over the fields i, for each pair of users.

For illustration, in Python this would simply be:

# r1 and r2 hold the two users' values taken from one cross-joined row
dist.append([user1, user2, np.sum([min(r1[i], r2[i]) / abs(r1[i] - r2[i]) for i in np.arange(row_length // 2)])])
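A self-contained version of that loop may make the target metric easier to check by hand. This is only a sketch in plain Python using the sample table from above; the names `table` and `pair_metric` are made up for illustration, and skipping fields where both users have the same value (where the ratio is undefined) is an assumption of this sketch, not something the question specifies.

```python
from itertools import combinations

# Sample table from the question: user -> field values.
table = {
    "user1": [0, 0],
    "user2": [0, 3],
    "user3": [4, 0],
}


def pair_metric(r1, r2):
    """Sum of min(a, b) / abs(a - b) over matching fields.

    Assumption: fields where a == b are skipped, because the
    denominator would be zero there.
    """
    total = 0.0
    for a, b in zip(r1, r2):
        if a != b:
            total += min(a, b) / abs(a - b)
    return total


# One [user1, user2, metric] entry per unordered pair of users.
dist = [
    [u1, u2, pair_metric(table[u1], table[u2])]
    for u1, u2 in combinations(sorted(table), 2)
]
```

Note that with the sample data several fields are equal between users, so the zero-denominator case has to be handled one way or another in any implementation, SQL included.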

To start with the ugly way: flatten out the math in your query. That is, turn the for i in ... sum(min(...)/abs(...)) into SQL operating on each of the fields. Note that MIN and SUM are aggregate functions, which you won't want to use here. Instead, use + for the sum and IF(a < b, a, b) for the min. abs(a - b) looks like IF(a < b, b-a, a-b). If you were computing Euclidean distance, you could do:

SELECT left.user, right.user,
  SQRT((left.x-right.x)*(left.x-right.x)
    + (left.y-right.y)*(left.y-right.y)
    + (left.z-right.z)*(left.z-right.z)) AS dist
FROM (
  SELECT *
  FROM dataset.table1 AS left
  CROSS JOIN dataset.table1 AS right)

The nicer way is to use user-defined functions and store the vectors as repeated values. You can then write a distance() function that performs the computation on the two arrays from the left and right sides of the cross join. If you're not in the UDF beta program and would like to join, please contact Google Cloud support.
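To make the idea concrete, here is roughly what such a distance() function would compute, sketched in Python rather than as an actual BigQuery UDF. The signature is hypothetical; it just takes the two users' vectors as lists and returns the Euclidean distance.

```python
import math


def distance(left, right):
    """Euclidean distance between two equal-length vectors of field values."""
    if len(left) != len(right):
        raise ValueError("vectors must have the same length")
    # Sum the squared per-field differences, then take the square root.
    return math.sqrt(sum((l - r) ** 2 for l, r in zip(left, right)))
```

The same shape works for other pairwise metrics: only the per-field expression inside the sum changes.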

Finally, if you change your schema from {user:string, field1:float, field2:float, field3:float,...} to {user:string, fields:[field:float]}

you could then flatten the fields by position and do the cross join on that. As in:

SELECT user, field, index,
FROM (FLATTEN((
  SELECT user, fields.field AS field, POSITION(fields.field) AS index,
  FROM [dataset1.table1]
), fields))

If you save this as a view, call it "dataset1.flat_view".

Then you can do your join:

SELECT left.user AS user1, right.user AS user2, left.field AS l, right.field AS r,
FROM dataset1.flat_view left
JOIN dataset1.flat_view right
ON left.index = right.index
WHERE left.user != right.user

This would give you one row for each pair of users and each matching field. You can save this as a view, "dataset1.joined_view".

Finally, you can do your aggregations:

Since you want this:

sum(min(user1_row[i], user2_row[i]) / abs(user1_row[i] - user2_row[i]))

it would look like:

SELECT user1, user2,
  SUM((IF (l < r, l, r)) / (IF (l > r, l-r, r-l)))
FROM [dataset1.joined_view]
GROUP EACH BY user1, user2
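If you want to try the flatten, join, and aggregate steps locally before running them in BigQuery, they can be emulated with Python's built-in sqlite3 module. This is only a sketch under several assumptions: it uses SQLite's SQL dialect rather than BigQuery's (CASE instead of IF, plain GROUP BY), the sample values are made up, the index column is named idx because INDEX is a reserved word in SQLite, and it filters with < instead of != so that each pair of users appears once rather than in both orders.

```python
import sqlite3

# Hypothetical data already in the flattened shape: one row per (user, idx, field).
rows = [
    ("user1", 0, 1.0), ("user1", 1, 2.0),
    ("user2", 0, 3.0), ("user2", 1, 6.0),
    ("user3", 0, 4.0), ("user3", 1, 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flat_view (user TEXT, idx INTEGER, field REAL)")
conn.executemany("INSERT INTO flat_view VALUES (?, ?, ?)", rows)

# Self-join on the field position, then aggregate per pair of users.
query = """
SELECT l.user AS user1, r.user AS user2,
       SUM(CASE WHEN l.field < r.field THEN l.field ELSE r.field END
           / ABS(l.field - r.field)) AS metric
FROM flat_view AS l
JOIN flat_view AS r ON l.idx = r.idx
WHERE l.user < r.user
GROUP BY l.user, r.user
ORDER BY user1, user2
"""
result = conn.execute(query).fetchall()

for user1, user2, metric in result:
    print(user1, user2, metric)
```

Here SUM is used as a real aggregate, which is fine because the join has already produced one row per pair of users and field; the min and abs are still written as per-row expressions.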

sql google-bigquery aggregation data-analysis cross-join
