Malloy Documentation
search

Calculations in Malloy are fields which operate on the results of a grouping, aggregating, or projecting query, while operating within the same query stage. Logically these calculation operations occur "after" the other operations, so their exact semantics can be challenging to understand.

Window Functions

A calculation is a kind of field which appears in a calculate: operation in a query. These calculation fields are defined in terms of analytic functions, which tend to be translated into SQL window functions. So when you see the terms "calculation" or "analytic function" in Malloy documentation, it's safe to think "window function."

For more details, see a full list of available analytic functions.

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
dep_​yearflight_​countyear_​change
200013,133
200114,4211,288
200214,708287
200314,300-408
200414,640340
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', flights."dep_time") as "dep_year",
   COUNT( 1) as "flight_count",
   (COUNT( 1))-LAG((COUNT( 1))) OVER(  ORDER BY  DATE_TRUNC('year', flights."dep_time") asc NULLS LAST ) as "year_change"
FROM '../data/flights.parquet' as flights
WHERE flights."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

Field References in Calculations

Because calculations operate logically on the output of the grouping/projecting/aggregating operations, field references behave somewhat differently inside a calculate: block. In particular, field references refer by default to output names from those operations.

document
run: flights -> {
  group_by: lower_carrier is lower(carrier)
  calculate: prev_carrier is lag(lower_carrier)
}
QUERY RESULTS
lower_​carrierprev_​carrier
aa
asaa
b6as
cob6
dlco
[
  {
    "lower_carrier": "aa",
    "prev_carrier": null
  },
  {
    "lower_carrier": "as",
    "prev_carrier": "aa"
  },
  {
    "lower_carrier": "b6",
    "prev_carrier": "as"
  },
  {
    "lower_carrier": "co",
    "prev_carrier": "b6"
  },
  {
    "lower_carrier": "dl",
    "prev_carrier": "co"
  }
]
SELECT 
   LOWER(flights."carrier") as "lower_carrier",
   LAG((LOWER(flights."carrier"))) OVER(  ORDER BY  LOWER(flights."carrier") asc NULLS LAST ) as "prev_carrier"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 1 asc NULLS LAST

In a group_by or aggregate, you cannot reference lower_carrier because it is not a field defined inside of flights, but in calculate, you can.

For a detailed explanation of the exact semantics, see the evaluation space documentation.

Ordering

Some window functions in SQL, such as lag, require an ORDER BY clause to determine behavior. In Malloy, such analytic functions use the ordering of the query itself. For example:

document
run: flights -> {
  where: carrier = 'WN'
  group_by: dep_year is dep_time.year
  aggregate: flight_count
  order_by: dep_year asc
  calculate: year_change is flight_count - lag(flight_count)
}
QUERY RESULTS
dep_​yearflight_​countyear_​change
200013,133
200114,4211,288
200214,708287
200314,300-408
200414,640340
[
  {
    "dep_year": "2000-01-01T00:00:00.000Z",
    "flight_count": 13133,
    "year_change": null
  },
  {
    "dep_year": "2001-01-01T00:00:00.000Z",
    "flight_count": 14421,
    "year_change": 1288
  },
  {
    "dep_year": "2002-01-01T00:00:00.000Z",
    "flight_count": 14708,
    "year_change": 287
  },
  {
    "dep_year": "2003-01-01T00:00:00.000Z",
    "flight_count": 14300,
    "year_change": -408
  },
  {
    "dep_year": "2004-01-01T00:00:00.000Z",
    "flight_count": 14640,
    "year_change": 340
  }
]
SELECT 
   DATE_TRUNC('year', flights."dep_time") as "dep_year",
   COUNT( 1) as "flight_count",
   (COUNT( 1))-LAG((COUNT( 1))) OVER(  ORDER BY  DATE_TRUNC('year', flights."dep_time") asc NULLS LAST ) as "year_change"
FROM '../data/flights.parquet' as flights
WHERE flights."carrier"='WN'
GROUP BY 1
ORDER BY 1 asc NULLS LAST

lag(flight_count) for each row is calculated with respect to dep_year asc. The value of lag(flight_count) for the year 2001 is the value of flight_count for the year 2000.

Note that Malloy queries use a default ordering if none is explicitly specified:

  1. If a time dimension is specified in the query, default ordering is by "descending time", showing newest rows first

  2. If no time is specified, but there is a numeric measure, default ordering is by "measure values descending", showing largest values first