Calculations in Malloy are fields which operate on the results of a grouping, aggregating, or projecting query, while operating within the same query stage. Logically these calculation operations occur "after" the other operations, so their exact semantics can be challenging to understand.
Window Functions
A calculation is a kind of field which appears in a calculate:
operation in a query. These calculation fields are defined in terms of analytic functions, which tend to be translated into SQL window functions. So when you see the terms "calculation" or "analytic function" in Malloy documentation, it's safe to think "window function."
For more details, see a full list of available analytic functions.
run: flights -> { where: carrier = 'WN' group_by: dep_year is dep_time.year aggregate: flight_count order_by: dep_year asc calculate: year_change is flight_count - lag(flight_count) }
dep_year | flight_count | year_change |
---|---|---|
2000 | 13,133 | ∅ |
2001 | 14,421 | 1,288 |
2002 | 14,708 | 287 |
2003 | 14,300 | -408 |
2004 | 14,640 | 340 |
[ { "dep_year": "2000-01-01T00:00:00.000Z", "flight_count": 13133, "year_change": null }, { "dep_year": "2001-01-01T00:00:00.000Z", "flight_count": 14421, "year_change": 1288 }, { "dep_year": "2002-01-01T00:00:00.000Z", "flight_count": 14708, "year_change": 287 }, { "dep_year": "2003-01-01T00:00:00.000Z", "flight_count": 14300, "year_change": -408 }, { "dep_year": "2004-01-01T00:00:00.000Z", "flight_count": 14640, "year_change": 340 } ]
SELECT DATE_TRUNC('year', flights."dep_time") as "dep_year", COUNT( 1) as "flight_count", (COUNT( 1))-LAG((COUNT( 1))) OVER( ORDER BY DATE_TRUNC('year', flights."dep_time") asc NULLS LAST ) as "year_change" FROM '../data/flights.parquet' as flights WHERE flights."carrier"='WN' GROUP BY 1 ORDER BY 1 asc NULLS LAST
Field References in Calculations
Because calculations operate logically on the output of the grouping/projecting/aggregating operations, field references behave somewhat differently inside a calculate:
block. In particular, field references refer by default to output names from those operations.
run: flights -> { group_by: lower_carrier is lower(carrier) calculate: prev_carrier is lag(lower_carrier) }
lower_carrier | prev_carrier |
---|---|
aa | ∅ |
as | aa |
b6 | as |
co | b6 |
dl | co |
[ { "lower_carrier": "aa", "prev_carrier": null }, { "lower_carrier": "as", "prev_carrier": "aa" }, { "lower_carrier": "b6", "prev_carrier": "as" }, { "lower_carrier": "co", "prev_carrier": "b6" }, { "lower_carrier": "dl", "prev_carrier": "co" } ]
SELECT LOWER(flights."carrier") as "lower_carrier", LAG((LOWER(flights."carrier"))) OVER( ORDER BY LOWER(flights."carrier") asc NULLS LAST ) as "prev_carrier" FROM '../data/flights.parquet' as flights GROUP BY 1 ORDER BY 1 asc NULLS LAST
In a group_by
or aggregate
, you cannot reference lower_carrier
because it is not a field defined inside of flights
, but in calculate
, you can.
For a detailed explanation of the exact semantics, see the evaluation space documentation.
Ordering
Some window functions in SQL, such as lag, require an ORDER BY
clause to determine behavior. In Malloy, such analytic functions use the ordering of the query itself. For example:
run: flights -> { where: carrier = 'WN' group_by: dep_year is dep_time.year aggregate: flight_count order_by: dep_year asc calculate: year_change is flight_count - lag(flight_count) }
dep_year | flight_count | year_change |
---|---|---|
2000 | 13,133 | ∅ |
2001 | 14,421 | 1,288 |
2002 | 14,708 | 287 |
2003 | 14,300 | -408 |
2004 | 14,640 | 340 |
[ { "dep_year": "2000-01-01T00:00:00.000Z", "flight_count": 13133, "year_change": null }, { "dep_year": "2001-01-01T00:00:00.000Z", "flight_count": 14421, "year_change": 1288 }, { "dep_year": "2002-01-01T00:00:00.000Z", "flight_count": 14708, "year_change": 287 }, { "dep_year": "2003-01-01T00:00:00.000Z", "flight_count": 14300, "year_change": -408 }, { "dep_year": "2004-01-01T00:00:00.000Z", "flight_count": 14640, "year_change": 340 } ]
SELECT DATE_TRUNC('year', flights."dep_time") as "dep_year", COUNT( 1) as "flight_count", (COUNT( 1))-LAG((COUNT( 1))) OVER( ORDER BY DATE_TRUNC('year', flights."dep_time") asc NULLS LAST ) as "year_change" FROM '../data/flights.parquet' as flights WHERE flights."carrier"='WN' GROUP BY 1 ORDER BY 1 asc NULLS LAST
lag(flight_count)
for each row is calculated with respect to dep_year asc
. The value of lag(flight_count)
for the year 2001 is the value of flight_count
for the year 2000.
Note that Malloy queries use a default ordering if none is explicitly specified:
If a time dimension is specified in the query, default ordering is by "descending time", showing newest rows first
If no time is specified, but there is a numeric measure, default ordering is by "measure values descending", showing largest values first