Why is WHERE being used to JOIN the same table

79 views Asked by At

I have been taking a class with Coursera. There is a query that is incorporating a subquery in the SELECT clause. I understand about 90% of the query; however, I cannot figure out why we need to use WHERE in the subquery. The WHERE is acting like the JOIN ON. I don't know why a JOIN ON would be needed.

SELECT
    starttime,
    stoptime,
    start_station_id,
    tripduration,
    (SELECT ROUND(AVG(tripduration), 2) 
     FROM `bigquery-public-data.new_york_citibike.citibike_trips`
     WHERE start_station_id = outer_trips.start_station_id) AS avg_duration_for_stations,
     ROUND (tripduration - (SELECT AVG(tripduration) 
                            FROM `bigquery-public-data.new_york_citibike.citibike_trips`
                            WHERE start_station_id = outer_trips.start_station_id), 2)
FROM
    `bigquery-public-data.new_york_citibike.citibike_trips` AS outer_trips

I tried to eliminate the WHERE because it is used as a filter. I could not understand why it was functioning as a JOIN ON. I was expecting a subquery that would create two additional columns without the lengthy steps of adding WHERE.

Why is WHERE being used to JOIN the same table?

1

There are 1 answers

1
elemes On

the WHERE clause in the subquery actually filters the rows from the

bigquery-public-data.new_york_citibike.citibike_trips

where the start_station_id matches the start_station_id of the current row being processed.

In the outer query to ensure that the average trip duration calculated in the subquery is specific to the start or each individual trip in the outer query.

the reason for the need to use WHERE in the subquery is because it filters the rows from the citibike_trips table so as to calculate the average trip duration , i think its used to target trips starting at the same stations the current row in the outer query. if you remove the where filtering it would return the average trip duration for all trips not just theones starting at the same station , which changes the intended behaviour of the query.