Day 27
PySpark vs Spark SQL
Ganesh. R
# Problem Statement
Basic product recommendation ("customers who bought this also bought…") is, in its simplest form, an outcome of market-basket analysis. In this solution we find the product pairs that are most frequently bought together using simple SQL. Based on this purchase history, an e-commerce website can recommend products to new users.
### PySpark
### Spark SQL
%sql
-- Basket analysis: count how often each product pair appears in the same order.
-- Joining on a.product_id < b.product_id keeps each pair exactly once, so the
-- original pair_sum / row_number de-duplication is no longer needed. (pair_sum
-- was also unsafe: different pairs such as ids (1,4) and (2,3) share a sum.)
WITH t1 AS (
  SELECT a.order_id,
         a.customer_id,
         p1.name AS name1,
         p2.name AS name2
  FROM orders a
  INNER JOIN orders b
    ON  a.order_id   = b.order_id
    AND a.product_id < b.product_id
  LEFT JOIN products p1 ON a.product_id = p1.id
  LEFT JOIN products p2 ON b.product_id = p2.id
)
SELECT concat(name1, ' ', name2) AS pair,
       count(DISTINCT order_id)  AS frequency
FROM t1
GROUP BY pair
ORDER BY frequency DESC
IF YOU FOUND THIS POST USEFUL, PLEASE SAVE IT.
+91-9030485102 | Hyderabad, Telangana | rganesh0203@gmail.com
https://medium.com/@rganesh0203
https://rganesh203.github.io/Portfolio/
https://github.com/rganesh203
https://www.linkedin.com/in/r-ganesh-a86418155/
https://www.instagram.com/rg_data_talks/
https://topmate.io/ganesh_r0203