The basic syntax for a query in Malloy consists of a source and a view consisting of one or more stages separated by ->
. The shape of the data defined in the original source is transformed by each stage.
run: flights -> { group_by: carrier aggregate: flight_count }
carrier | flight_count |
---|---|
WN | 88,751 |
US | 37,683 |
AA | 34,577 |
NW | 33,580 |
UA | 32,757 |
[ { "carrier": "WN", "flight_count": 88751 }, { "carrier": "US", "flight_count": 37683 }, { "carrier": "AA", "flight_count": 34577 }, { "carrier": "NW", "flight_count": 33580 }, { "carrier": "UA", "flight_count": 32757 } ]
SELECT flights."carrier" as "carrier", COUNT( 1) as "flight_count" FROM '../data/flights.parquet' as flights GROUP BY 1 ORDER BY 2 desc NULLS LAST
Sources
Every query begins with a source, which can be thought of as a table with an associated collection of calculations and join relationships. The most common kind of source is a table source, e.g. duckdb.table('flights.csv')
; see the Sources section for more information on creating and extending sources.
Generally, ad-hoc queries will use one of these kinds of sources directly:
run: duckdb.table('../data/flights.parquet') -> { aggregate: total_flight_count is count() }
total_flight_count |
---|
344,827 |
[ { "total_flight_count": 344827 } ]
SELECT COUNT( 1) as "total_flight_count" FROM '../data/flights.parquet' as base ORDER BY 1 desc NULLS LAST
However, as analysis becomes more involved, it is often useful to define reusable computations in a named source, then use that named source as the basis for queries.
source: flights2 is duckdb.table('../data/flights.parquet') extend { measure: flight_count is count() total_distance is sum(distance) } run: flights2 -> { group_by: carrier aggregate: flight_count, total_distance }
carrier | flight_count | total_distance |
---|---|---|
WN | 88,751 | 54,619,152 |
US | 37,683 | 23,721,642 |
AA | 34,577 | 37,684,885 |
NW | 33,580 | 33,376,503 |
UA | 32,757 | 38,882,934 |
[ { "carrier": "WN", "flight_count": 88751, "total_distance": 54619152 }, { "carrier": "US", "flight_count": 37683, "total_distance": 23721642 }, { "carrier": "AA", "flight_count": 34577, "total_distance": 37684885 }, { "carrier": "NW", "flight_count": 33580, "total_distance": 33376503 }, { "carrier": "UA", "flight_count": 32757, "total_distance": 38882934 } ]
SELECT flights2."carrier" as "carrier", COUNT( 1) as "flight_count", COALESCE(SUM(flights2."distance"),0) as "total_distance" FROM '../data/flights.parquet' as flights2 GROUP BY 1 ORDER BY 2 desc NULLS LAST
It is also sometimes useful to define a query and then later use that query as the source for another query:
query: flights_by_carrier is duckdb.table('../data/flights.parquet') -> { group_by: carrier aggregate: flight_count is count() } run: flights_by_carrier -> { select: carrier limit: 2 }
carrier |
---|
WN |
US |
[ { "carrier": "WN" }, { "carrier": "US" } ]
WITH __stage0 AS ( SELECT base."carrier" as "carrier", COUNT( 1) as "flight_count" FROM '../data/flights.parquet' as base GROUP BY 1 ORDER BY 2 desc NULLS LAST ) SELECT base."carrier" as "carrier" FROM __stage0 as base LIMIT 2
Views
A view defines a transformation (or series of transformations) to perform on a source. In a query, everything after the first ->
is a view.
// source -> view run: flights -> { select: * }
Views can be defined in a source extension to enable reusability and nesting. You can always copy and paste the view of a query into the definition of the source of that query (or an extension of that source).
source: flights3 is flights extend { view: by_carrier2 is { group_by: carrier aggregate: flight_count } } run: flights3 -> by_carrier2
carrier | flight_count |
---|---|
WN | 88,751 |
US | 37,683 |
AA | 34,577 |
NW | 33,580 |
UA | 32,757 |
[ { "carrier": "WN", "flight_count": 88751 }, { "carrier": "US", "flight_count": 37683 }, { "carrier": "AA", "flight_count": 34577 }, { "carrier": "NW", "flight_count": 33580 }, { "carrier": "UA", "flight_count": 32757 } ]
SELECT flights3."carrier" as "carrier", COUNT( 1) as "flight_count" FROM '../data/flights.parquet' as flights3 GROUP BY 1 ORDER BY 2 desc NULLS LAST
Queries that reference a view by name can add additional stages after those defined in the view.
run: flights3 -> by_carrier2 -> { select: carrier limit: 1 }
carrier |
---|
WN |
[ { "carrier": "WN" } ]
WITH __stage0 AS ( SELECT flights3."carrier" as "carrier", COUNT( 1) as "flight_count" FROM '../data/flights.parquet' as flights3 GROUP BY 1 ORDER BY 2 desc NULLS LAST ) SELECT base."carrier" as "carrier" FROM __stage0 as base LIMIT 1
For more information about defining, reusing, nesting, and refining views, see the Views section.