sql - Row Aggregation after Cross Join in BigQuery -
Say I have the following table in BigQuery:
a =
user1 | 0 0 |
user2 | 0 3 |
user3 | 4 0 |
After a cross join, I have:
dist =
| user1 user2 | 0 0 , 0 3 |  # the comma separates the two users' values
| user1 user3 | 0 0 , 4 0 |
| user2 user3 | 0 3 , 4 0 |
How can I perform row aggregation in BigQuery to compute a pairwise aggregation across rows? A typical use case would be computing the Euclidean distance between two users. I want to compute the following metric between two users:
sum(min(user1_row[i], user2_row[i]) / abs(user1_row[i] - user2_row[i]))
summed over the fields of each pair of users.
For illustration, in Python it would simply be:
dist.append([user1, user2, sum(min(r1[i], r2[i]) / abs(r1[i] - r2[i]) for i in range(row_length // 2))])
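A runnable version of that illustration, as a minimal sketch. The table holds the three example rows from above, and the name pairwise_metric is hypothetical. The formula divides by zero wherever two users have the same value in a field (as user1 and user2 do in the first field); skipping those fields is an assumption made here, not something the metric itself specifies.

```python
from itertools import combinations

# Example table from the question: user -> row of values.
table = {
    "user1": [0, 0],
    "user2": [0, 3],
    "user3": [4, 0],
}

def pairwise_metric(r1, r2):
    """Sum of min(a, b) / abs(a - b) over matching fields.

    Assumption: fields where a == b are skipped to avoid dividing by zero.
    """
    return sum(min(a, b) / abs(a - b) for a, b in zip(r1, r2) if a != b)

# One entry per unordered pair of users, like the cross-joined rows above.
dist = [
    [u1, u2, pairwise_metric(table[u1], table[u2])]
    for u1, u2 in combinations(table, 2)
]
```

With this sample data every pair scores zero, because the smaller value in each differing field is always 0; rows with non-zero overlaps give non-trivial results.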
To start with the ugly way: you can flatten the math out into the query. That is, turn the for i in ... sum(min(...)/abs(...)) loop into SQL operating on each of the fields. Note that MIN and SUM are aggregate functions, which you won't want to use here. Instead, use + for SUM and IF(a < b, a, b) for MIN. ABS(a - b) looks like IF(a < b, b-a, a-b). If you were computing Euclidean distance, you could do:
SELECT left.user, right.user,
  SQRT((left.x-right.x)*(left.x-right.x)
     + (left.y-right.y)*(left.y-right.y)
     + (left.z-right.z)*(left.z-right.z)) AS dist
FROM (
  SELECT * FROM dataset.table1 AS left
  CROSS JOIN dataset.table1 AS right)
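To make the result of that query concrete, here is a small Python sketch of the same computation: every row is paired with every row, and the distance is written out field by field. The x, y, z values are made up for illustration.

```python
import math
from itertools import product

# Hypothetical rows with the x, y, z columns used in the query.
rows = [
    {"user": "user1", "x": 0.0, "y": 0.0, "z": 0.0},
    {"user": "user2", "x": 0.0, "y": 3.0, "z": 4.0},
]

# Cross join: each row is paired with every row, including itself.
result = [
    (
        left["user"],
        right["user"],
        math.sqrt(
            (left["x"] - right["x"]) * (left["x"] - right["x"])
            + (left["y"] - right["y"]) * (left["y"] - right["y"])
            + (left["z"] - right["z"]) * (left["z"] - right["z"])
        ),
    )
    for left, right in product(rows, repeat=2)
]
```

Note that a plain cross join also yields each user paired with itself (distance 0) and both orderings of every pair.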
The nicer way is user-defined functions: create the vectors as repeated values, then write a DISTANCE() function that performs the computation on the two arrays from the left and right sides of the cross join. If you're not in the UDF beta program and would like to join, please contact Google Cloud Support.
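The shape of such a function can be sketched in Python. This is only an illustration of the logic, not BigQuery UDF syntax; the name distance and the sample vectors are assumptions.

```python
import math

def distance(left, right):
    """Euclidean distance between two equal-length vectors."""
    if len(left) != len(right):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((l - r) ** 2 for l, r in zip(left, right)))

# Hypothetical users whose values are stored as one vector each.
users = {"user1": [0, 0], "user2": [0, 3], "user3": [4, 0]}

# Apply the function to the left and right side of the cross join.
pairs = {
    (u1, u2): distance(v1, v2)
    for u1, v1 in users.items()
    for u2, v2 in users.items()
}
```

Because the function takes whole vectors, the query no longer needs one expression per field, so adding a field does not change the SQL.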
Finally, if you change your schema from
{user:string, field1:float, field2:float, field3:float,...}
to
{user:string, fields:[field:float]}
you could flatten the field together with its position and do the cross join on that. As in:
SELECT user, field, index
FROM (FLATTEN((
  SELECT user, fields.field AS field, POSITION(fields.field) AS index
  FROM [dataset1.table1]
), fields))
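What the flattening step produces can be sketched in Python: one output row per repeated value, tagged with its position. The sample rows and the name flatten_with_position are hypothetical; the index starts at 1 on the assumption that POSITION is one-based.

```python
# Hypothetical rows in the repeated-field schema {user, fields: [field]}.
table1 = [
    {"user": "user1", "fields": [0.0, 0.0]},
    {"user": "user2", "fields": [0.0, 3.0]},
    {"user": "user3", "fields": [4.0, 0.0]},
]

def flatten_with_position(rows):
    """Emit one {user, field, index} row per repeated value."""
    return [
        {"user": row["user"], "field": value, "index": index}
        for row in rows
        # Assumption: positions are one-based.
        for index, value in enumerate(row["fields"], start=1)
    ]

flat_view = flatten_with_position(table1)
```

Three users with two values each become six rows, and the index column is what later lets matching fields be paired up.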
If you save this as a view, call it "dataset1.flat_view", then you can join:
SELECT left.user AS user1, right.user AS user2, left.field AS l, right.field AS r
FROM dataset1.flat_view left
JOIN dataset1.flat_view right
ON left.index = right.index
WHERE left.user != right.user
This gives you one row for each pair of users and each matching field. You can save this as a view, "dataset1.joined_view".
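The effect of joining the flattened rows to themselves on the index can be sketched in Python. The tuples below are hypothetical flattened rows of the form (user, field, index).

```python
# Hypothetical flattened rows: (user, field, index).
flat_view = [
    ("user1", 0.0, 1), ("user1", 0.0, 2),
    ("user2", 0.0, 1), ("user2", 3.0, 2),
    ("user3", 4.0, 1), ("user3", 0.0, 2),
]

# Self-join on index, keeping only rows where the two users differ.
joined_view = [
    {"user1": left_user, "user2": right_user, "l": left_field, "r": right_field}
    for left_user, left_field, left_index in flat_view
    for right_user, right_field, right_index in flat_view
    if left_index == right_index and left_user != right_user
]
```

The != filter removes self-pairs but keeps both orderings of each pair, so (user1, user2) and (user2, user1) both appear; comparing with < instead would keep each pair once.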
Finally, you can do your aggregations. Since you want this:
sum(min(user1_row[i], user2_row[i]) / abs(user1_row[i] - user2_row[i]))
it would look like:
SELECT user1, user2,
  SUM((IF(l < r, l, r)) / (IF(l > r, l-r, r-l)))
FROM [dataset1.joined_view]
GROUP EACH BY user1, user2
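The grouping step can be checked against a small Python sketch. The joined rows below are made-up values of the form (user1, user2, l, r). Skipping rows where l equals r is an assumption added here to avoid dividing by zero; the query above does not guard against that case.

```python
from collections import defaultdict

# Hypothetical rows of the joined view: (user1, user2, l, r).
joined_view = [
    ("user1", "user2", 1.0, 3.0),
    ("user1", "user2", 5.0, 2.0),
    ("user2", "user1", 3.0, 1.0),
    ("user2", "user1", 2.0, 5.0),
]

totals = defaultdict(float)
for user1, user2, l, r in joined_view:
    if l == r:
        continue  # Assumption: skip equal values rather than divide by zero.
    smaller = l if l < r else r      # IF(l < r, l, r)
    gap = l - r if l > r else r - l  # IF(l > r, l-r, r-l)
    totals[(user1, user2)] += smaller / gap  # SUM(...) per (user1, user2)
```

The metric is symmetric, so both orderings of a pair end up with the same total.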
sql google-bigquery aggregation data-analysis cross-join