Malloy Quickstart
This guide introduces the basics of querying data and building a semantic model with the Malloy language. By the end of this tutorial, you will understand how to use Malloy to run queries, build re-usable data models, and do analysis on your data that is nearly impossible in SQL.
The easiest way to follow along is by going to the interactive notebook version of this tutorial. The link will launch a browser-based VSCode environment and ask you to install the Malloy extension. Once installed, navigate back to the quickstart notebook file, and dive in.
If you'd like to run Malloy locally on your laptop instead, follow the setup instructions to install the VSCode extension and connect to a database.
A simple SELECT
The following query is equivalent to SELECT id, code, city FROM airports LIMIT 10
in SQL:
query: table('malloy-data.faa.airports') -> { project: id code city limit: 10 }
id | code | city |
---|---|---|
19,783 | 1Q9 | MILI ISLAND |
19,777 | Q51 | KILI ISLAND |
19,787 | 3N1 | TAORA IS MALOELAP ATOLL |
19,789 | 03N | UTIRIK ISLAND |
19,774 | ANG | ANGAUR ISLAND |
[ { "id": 19783, "code": "1Q9", "city": "MILI ISLAND" }, { "id": 19777, "code": "Q51", "city": "KILI ISLAND" }, { "id": 19787, "code": "3N1", "city": "TAORA IS MALOELAP ATOLL" }, { "id": 19789, "code": "03N", "city": "UTIRIK ISLAND" }, { "id": 19774, "code": "ANG", "city": "ANGAUR ISLAND" } ]
SELECT base.id as id, base.code as code, base.city as city FROM `malloy-data.faa.airports` as base LIMIT 10
Let's break down each part of this query.
query:
is the opening statement that indicates we're starting to write a querytable('malloy-data.faa.airports')
defines the source for the query. Thetable()
function creates a source from a table or view in the database.A source is similar to a table or view in SQL, but Malloy sources can include additional information like joins and measures. We'll cover this in depth later on.
The
->
operator begins the query. All queries take the formsource -> { ... }
, with the query logic specified inside of the curly braces.project:
is equivalent toSELECT
in SQL. In this clause, we select theid
,code
, andcity
columns from the table. Theproject
operator takes its name from the projection operation in Relational Algebra.limit: 10
limits the resultset of the query to the first 10 items
Query Operators
In SQL, the SELECT
command does two very different things. A SELECT
with a GROUP BY
aggregates data according to the GROUP BY
clause and produces aggregate calculation against every calculation not in the GROUP BY
. In Malloy, the query operator for this is group_by
. Calculation about data in the group are made using aggregate
.
The second type of SELECT
in SQL does not perform any aggregation; All rows in the input table, unless filtered in some way, show up in the output table. In Malloy, this command is called project
.
Aggregate
In the query below, the data will be grouped by state
and county
, and will produce an aggregate calculation for airport_count
and average_elevation
.
query: table('malloy-data.faa.airports') -> { group_by: state county aggregate: airport_count is count() average_elevation is avg(elevation) }
state | county | airport_count | average_elevation |
---|---|---|---|
CA | LOS ANGELES | 176 | 689.165 |
TX | HARRIS | 135 | 106.467 |
AZ | MARICOPA | 117 | 1,395.667 |
CA | SAN BERNARDINO | 71 | 2,376.056 |
TX | TARRANT | 63 | 646.794 |
[ { "state": "CA", "county": "LOS ANGELES", "airport_count": 176, "average_elevation": 689.164772727273 }, { "state": "TX", "county": "HARRIS", "airport_count": 135, "average_elevation": 106.4666666666667 }, { "state": "AZ", "county": "MARICOPA", "airport_count": 117, "average_elevation": 1395.6666666666663 }, { "state": "CA", "county": "SAN BERNARDINO", "airport_count": 71, "average_elevation": 2376.056338028169 }, { "state": "TX", "county": "TARRANT", "airport_count": 63, "average_elevation": 646.7936507936511 } ]
SELECT base.state as state, base.county as county, COUNT( 1) as airport_count, AVG(base.elevation) as average_elevation FROM `malloy-data.faa.airports` as base GROUP BY 1,2 ORDER BY 3 desc
Project
In Malloy, "project" is a verb, not a noun. As in "to project something", rather than "this is a project". project
produces a list of fields. For every row in the input table, there is a row in the output table. This is similar to a simple SELECT
statement in SQL with no aggregations.
query: table('malloy-data.faa.airports') -> { project: code, full_name, city, county where: county = 'SANTA CRUZ' limit: 10 }
code | full_name | city | county |
---|---|---|---|
2AZ8 | TUBAC ULTRALIGHT FLIGHTPARK | TUBAC | SANTA CRUZ |
OLS | NOGALES INTL | NOGALES | SANTA CRUZ |
NSI | SAN NICOLAS ISLAND NOLF | SAN NICOLAS ISLAND | SANTA CRUZ |
CL77 | BONNY DOON VILLAGE | SANTA CRUZ | SANTA CRUZ |
CA37 | DOMINICAN SANTA CRUZ HOSPITAL | SANTA CRUZ | SANTA CRUZ |
[ { "code": "2AZ8", "full_name": "TUBAC ULTRALIGHT FLIGHTPARK", "city": "TUBAC", "county": "SANTA CRUZ" }, { "code": "OLS", "full_name": "NOGALES INTL", "city": "NOGALES", "county": "SANTA CRUZ" }, { "code": "NSI", "full_name": "SAN NICOLAS ISLAND NOLF", "city": "SAN NICOLAS ISLAND", "county": "SANTA CRUZ" }, { "code": "CL77", "full_name": "BONNY DOON VILLAGE", "city": "SANTA CRUZ", "county": "SANTA CRUZ" }, { "code": "CA37", "full_name": "DOMINICAN SANTA CRUZ HOSPITAL", "city": "SANTA CRUZ", "county": "SANTA CRUZ" } ]
SELECT base.code as code, base.full_name as full_name, base.city as city, base.county as county FROM `malloy-data.faa.airports` as base WHERE base.county='SANTA CRUZ' LIMIT 10
Operator statements can be placed in any order within a query. where
can come before or after project
, and limit
can be placed anywhere as well. The above query could also be written:
query: table('malloy-data.faa.airports') -> { limit: 10 where: county = 'SANTA CRUZ' project: code, full_name, city, county }
code | full_name | city | county |
---|---|---|---|
2AZ8 | TUBAC ULTRALIGHT FLIGHTPARK | TUBAC | SANTA CRUZ |
OLS | NOGALES INTL | NOGALES | SANTA CRUZ |
NSI | SAN NICOLAS ISLAND NOLF | SAN NICOLAS ISLAND | SANTA CRUZ |
CL77 | BONNY DOON VILLAGE | SANTA CRUZ | SANTA CRUZ |
CA37 | DOMINICAN SANTA CRUZ HOSPITAL | SANTA CRUZ | SANTA CRUZ |
[ { "code": "2AZ8", "full_name": "TUBAC ULTRALIGHT FLIGHTPARK", "city": "TUBAC", "county": "SANTA CRUZ" }, { "code": "OLS", "full_name": "NOGALES INTL", "city": "NOGALES", "county": "SANTA CRUZ" }, { "code": "NSI", "full_name": "SAN NICOLAS ISLAND NOLF", "city": "SAN NICOLAS ISLAND", "county": "SANTA CRUZ" }, { "code": "CL77", "full_name": "BONNY DOON VILLAGE", "city": "SANTA CRUZ", "county": "SANTA CRUZ" }, { "code": "CA37", "full_name": "DOMINICAN SANTA CRUZ HOSPITAL", "city": "SANTA CRUZ", "county": "SANTA CRUZ" } ]
SELECT base.code as code, base.full_name as full_name, base.city as city, base.county as county FROM `malloy-data.faa.airports` as base WHERE base.county='SANTA CRUZ' LIMIT 10
Everything has a Name
In Malloy, all output fields have names. This means that any time a query includes a field with a calculated value, like a scalar or aggregate function, it must be named. (unlike SQL, which allows un-named expressions)
query: table('malloy-data.faa.airports') -> { aggregate: max_elevation is max(elevation) }
max_elevation |
---|
12,442 |
[ { "max_elevation": 12442 } ]
SELECT max(base.elevation) as max_elevation FROM `malloy-data.faa.airports` as base ORDER BY 1 desc
Notice that Malloy uses the form "name is
value" instead of SQL's "value as
name".
Having the output column name written first makes it easier for someone reading
the code to visualize the resulting query structure.
Named objects, like columns from a table, and fields defined in a source, can be included in field lists without an is
query: table('malloy-data.faa.airports') -> { project: full_name elevation }
full_name | elevation |
---|---|
MILI | 4 |
KILI | 5 |
MALOELAP | 4 |
UTIRIK | 4 |
NAMORIK | 15 |
[ { "full_name": "MILI", "elevation": 4 }, { "full_name": "KILI", "elevation": 5 }, { "full_name": "MALOELAP", "elevation": 4 }, { "full_name": "UTIRIK", "elevation": 4 }, { "full_name": "NAMORIK", "elevation": 15 } ]
SELECT base.full_name as full_name, base.elevation as elevation FROM `malloy-data.faa.airports` as base
Expressions
Many SQL expressions will work unchanged in Malloy, and many functions available in Standard SQL are usable in Malloy as well. This makes expressions fairly straightforward to understand, given a knowledge of SQL.
query: table('malloy-data.faa.airports') -> { group_by: county_and_state is concat(county, ', ', state) aggregate: airport_count is count() max_elevation is max(elevation) min_elevation is min(elevation) avg_elevation is avg(elevation) }
county_and_state | airport_count | max_elevation | min_elevation | avg_elevation |
---|---|---|---|---|
LOS ANGELES, CA | 176 | 3,420 | 0 | 689.165 |
HARRIS, TX | 135 | 774 | 9 | 106.467 |
MARICOPA, AZ | 117 | 3,995 | 737 | 1,395.667 |
SAN BERNARDINO, CA | 71 | 6,748 | 631 | 2,376.056 |
TARRANT, TX | 63 | 895 | 472 | 646.794 |
[ { "county_and_state": "LOS ANGELES, CA", "airport_count": 176, "max_elevation": 3420, "min_elevation": 0, "avg_elevation": 689.164772727273 }, { "county_and_state": "HARRIS, TX", "airport_count": 135, "max_elevation": 774, "min_elevation": 9, "avg_elevation": 106.4666666666667 }, { "county_and_state": "MARICOPA, AZ", "airport_count": 117, "max_elevation": 3995, "min_elevation": 737, "avg_elevation": 1395.6666666666663 }, { "county_and_state": "SAN BERNARDINO, CA", "airport_count": 71, "max_elevation": 6748, "min_elevation": 631, "avg_elevation": 2376.056338028169 }, { "county_and_state": "TARRANT, TX", "airport_count": 63, "max_elevation": 895, "min_elevation": 472, "avg_elevation": 646.7936507936511 } ]
SELECT concat(base.county,', ',base.state) as county_and_state, COUNT( 1) as airport_count, max(base.elevation) as max_elevation, min(base.elevation) as min_elevation, AVG(base.elevation) as avg_elevation FROM `malloy-data.faa.airports` as base GROUP BY 1 ORDER BY 2 desc
The basic types of Malloy expressions are string
, number
, boolean
, date
, and timestamp
.
Sources: the Basic Structure for Modeling and Reuse
One of the main benefits of Malloy is the ability to save common calculations into a data model. The data model is made of sources, which can be thought of as tables or views, but with additional information, such as joins, dimensions and measures.
In the example below, we create a source object named airports
and add a dimension
calculation for county_and_state
and measure
calculation for airport_count
. Dimensions can be used in group_by
, project
and where
. Measures can be used in aggregate
and having
.
source: airports is table('malloy-data.faa.airports') { dimension: county_and_state is concat(county, ', ', state) measure: airport_count is count() measure: average_elevation is avg(elevation) }
query: airports -> { group_by: county_and_state aggregate: airport_count }
county_and_state | airport_count |
---|---|
LOS ANGELES, CA | 176 |
HARRIS, TX | 135 |
MARICOPA, AZ | 117 |
SAN BERNARDINO, CA | 71 |
TARRANT, TX | 63 |
[ { "county_and_state": "LOS ANGELES, CA", "airport_count": 176 }, { "county_and_state": "HARRIS, TX", "airport_count": 135 }, { "county_and_state": "MARICOPA, AZ", "airport_count": 117 }, { "county_and_state": "SAN BERNARDINO, CA", "airport_count": 71 }, { "county_and_state": "TARRANT, TX", "airport_count": 63 } ]
SELECT concat(airports.county,', ',airports.state) as county_and_state, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as airports GROUP BY 1 ORDER BY 2 desc
Sources that are defined in one file can be imported into another using import "path/to/some/file.malloy"
. For example, if the airports
source above were defined in a file called flights.malloy
, you could create a new file that imports it and immediately start using the airports
source:
import "flights.malloy" query: airports -> { group_by: county_and_state aggregate: average_elevation }
Sources can also contain named queries. These named queries are useful for building nested queries (covered later) or for saving a query so it can re-used again and again without having to rewrite it.
source: airports_with_named_query is table('malloy-data.faa.airports') { dimension: county_and_state is concat(county, ', ', state) measure: airport_count is count() measure: average_elevation is avg(elevation) // This is a "named query": query: top_county_and_state is { group_by: county_and_state aggregate: airport_count limit:10 } } // The named query can now be referenced by name, and run without having to rewrite the logic: query: airports_with_named_query -> top_county_and_state
county_and_state | airport_count |
---|---|
LOS ANGELES, CA | 176 |
HARRIS, TX | 135 |
MARICOPA, AZ | 117 |
SAN BERNARDINO, CA | 71 |
TARRANT, TX | 63 |
[ { "county_and_state": "LOS ANGELES, CA", "airport_count": 176 }, { "county_and_state": "HARRIS, TX", "airport_count": 135 }, { "county_and_state": "MARICOPA, AZ", "airport_count": 117 }, { "county_and_state": "SAN BERNARDINO, CA", "airport_count": 71 }, { "county_and_state": "TARRANT, TX", "airport_count": 63 } ]
SELECT concat(airports_with_named_query.county,', ',airports_with_named_query.state) as county_and_state, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as airports_with_named_query GROUP BY 1 ORDER BY 2 desc LIMIT 10
Joins
Joins are declared as part of a source. When joining a source to another, it brings with it all child joins.
source: aircraft_models is table('malloy-data.faa.aircraft_models') { primary_key: aircraft_model_code } source: aircraft is table('malloy-data.faa.aircraft') { primary_key: tail_num join_one: aircraft_models on aircraft_model_code = aircraft_models.aircraft_model_code } source: flights is table('malloy-data.faa.flights') { join_one: aircraft on tail_num = aircraft.tail_num } query: flights -> { where: dep_time ? @2003-01 group_by: aircraft.aircraft_models.manufacturer aggregate: flight_count is count() aircraft_count is aircraft.count() average_seats_per_model is aircraft.aircraft_models.seats.avg() }
manufacturer | flight_count | aircraft_count | average_seats_per_model |
---|---|---|---|
BOEING | 216,221 | 1,899 | 206.103 |
null | 82,357 | 0 | ∅ |
EMBRAER | 67,698 | 412 | 42.7 |
AIRBUS INDUSTRIE | 43,235 | 397 | 198.909 |
MCDONNELL DOUGLAS | 36,279 | 342 | 151.625 |
[ { "manufacturer": "BOEING", "flight_count": 216221, "aircraft_count": 1899, "average_seats_per_model": 206.10344827586206 }, { "manufacturer": null, "flight_count": 82357, "aircraft_count": 0, "average_seats_per_model": null }, { "manufacturer": "EMBRAER", "flight_count": 67698, "aircraft_count": 412, "average_seats_per_model": 42.7 }, { "manufacturer": "AIRBUS INDUSTRIE", "flight_count": 43235, "aircraft_count": 397, "average_seats_per_model": 198.9090909090909 }, { "manufacturer": "MCDONNELL DOUGLAS", "flight_count": 36279, "aircraft_count": 342, "average_seats_per_model": 151.625 } ]
SELECT aircraft_models_0.manufacturer as manufacturer, COUNT( 1) as flight_count, COUNT(DISTINCT aircraft_0.tail_num) as aircraft_count, (CAST(( ( SUM(DISTINCT (CAST(ROUND(COALESCE(aircraft_models_0.seats,0)*(1*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )/(1*1.0)) as FLOAT64))/NULLIF(COUNT(DISTINCT aircraft_models_0.aircraft_model_code),0) as average_seats_per_model FROM `malloy-data.faa.flights` as flights LEFT JOIN `malloy-data.faa.aircraft` AS aircraft_0 ON flights.tail_num=aircraft_0.tail_num LEFT JOIN `malloy-data.faa.aircraft_models` AS aircraft_models_0 ON aircraft_0.aircraft_model_code=aircraft_models_0.aircraft_model_code WHERE (flights.dep_time>=TIMESTAMP('2003-01-01', 'UTC'))and(flights.dep_time<TIMESTAMP('2003-02-01', 'UTC')) GROUP BY 1 ORDER BY 2 desc
In this example, the aircraft
source is joined to flights
, and aircraft_models
is joined via aircraft
. These examples explicitly name both keys -- this same syntax can be used to write more complex joins.
Now, any query that uses the flights
source has access to fields in both aircraft
and aircraft_models
without having to explicitly specify the join condition. The joins are specified once in the source, and usable by any query on flights
.
An ad hoc join can also be specified in a query block. In the query below, we join in the airports
table using the destination
column as a join key, then compute the top 5 destination airports by flight count.
source: airports is table('malloy-data.faa.airports') source: flights is table('malloy-data.faa.flights') query: flights -> { join_one: airports on destination = airports.code group_by: airports.full_name aggregate: flight_count is count() top: 5 }
full_name | flight_count |
---|---|
CHICAGO O'HARE INTL | 2,036,343 |
THE WILLIAM B HARTSFIELD ATLANTA INTL | 1,963,535 |
DALLAS/FORT WORTH INTERNATIONAL | 1,812,062 |
LOS ANGELES INTL | 1,300,241 |
PHOENIX SKY HARBOR INTL | 1,059,054 |
[ { "full_name": "CHICAGO O'HARE INTL", "flight_count": 2036343 }, { "full_name": "THE WILLIAM B HARTSFIELD ATLANTA INTL", "flight_count": 1963535 }, { "full_name": "DALLAS/FORT WORTH INTERNATIONAL", "flight_count": 1812062 }, { "full_name": "LOS ANGELES INTL", "flight_count": 1300241 }, { "full_name": "PHOENIX SKY HARBOR INTL", "flight_count": 1059054 } ]
SELECT airports_0.full_name as full_name, COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as flights LEFT JOIN `malloy-data.faa.airports` AS airports_0 ON flights.destination=airports_0.code GROUP BY 1 ORDER BY 2 desc LIMIT 5
Filtering
When working with data, filtering is something you do in almost every query. Malloy provides consistent syntax for filtering everywhere within a query. The most basic type of filter is applied using a where:
clause, very similar to a WHERE
clause in SQL.
The following query grabs the top 5 counties in California with the highest airport count:
query: table('malloy-data.faa.airports') -> { where: state = 'CA' top: 5 group_by: county aggregate: airport_count is count() }
county | airport_count |
---|---|
LOS ANGELES | 176 |
SAN BERNARDINO | 71 |
ORANGE | 53 |
KERN | 49 |
SAN DIEGO | 49 |
[ { "county": "LOS ANGELES", "airport_count": 176 }, { "county": "SAN BERNARDINO", "airport_count": 71 }, { "county": "ORANGE", "airport_count": 53 }, { "county": "KERN", "airport_count": 49 }, { "county": "SAN DIEGO", "airport_count": 49 } ]
SELECT base.county as county, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as base WHERE base.state='CA' GROUP BY 1 ORDER BY 2 desc LIMIT 5
Filters can also be applied to sources:
source: airports_in_california is table('malloy-data.faa.airports') { where: state = 'CA' } query: airports_in_california -> { top: 5 group_by: county aggregate: airport_count is count() }
county | airport_count |
---|---|
LOS ANGELES | 176 |
SAN BERNARDINO | 71 |
ORANGE | 53 |
KERN | 49 |
SAN DIEGO | 49 |
[ { "county": "LOS ANGELES", "airport_count": 176 }, { "county": "SAN BERNARDINO", "airport_count": 71 }, { "county": "ORANGE", "airport_count": 53 }, { "county": "KERN", "airport_count": 49 }, { "county": "SAN DIEGO", "airport_count": 49 } ]
SELECT airports_in_california.county as county, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as airports_in_california WHERE airports_in_california.state='CA' GROUP BY 1 ORDER BY 2 desc LIMIT 5
Any query run on the airports_in_california
source will run against the airports
table, and always include the filter in state = 'CA'
.
Filtering Measures
A filter on an aggregate calculation (a measure) narrows down the data used in that specific calculation. In the example below, the calculations for airports
and heliports
are filtered separately.
query: table('malloy-data.faa.airports') -> { group_by: state aggregate: airports is count() { where: fac_type = 'AIRPORT' } heliports is count() { where: fac_type = 'HELIPORT' } total is count() }
state | airports | heliports | total |
---|---|---|---|
TX | 1,389 | 435 | 1,845 |
IL | 625 | 245 | 890 |
CA | 569 | 396 | 984 |
OH | 537 | 201 | 749 |
FL | 511 | 280 | 856 |
[ { "state": "TX", "airports": 1389, "heliports": 435, "total": 1845 }, { "state": "IL", "airports": 625, "heliports": 245, "total": 890 }, { "state": "CA", "airports": 569, "heliports": 396, "total": 984 }, { "state": "OH", "airports": 537, "heliports": 201, "total": 749 }, { "state": "FL", "airports": 511, "heliports": 280, "total": 856 } ]
SELECT base.state as state, COUNT( CASE WHEN base.fac_type='AIRPORT' THEN 1 END) as airports, COUNT( CASE WHEN base.fac_type='HELIPORT' THEN 1 END) as heliports, COUNT( 1) as total FROM `malloy-data.faa.airports` as base GROUP BY 1 ORDER BY 2 desc
In SQL, this same calculation is often done using CASE
statements inside of the aggregates, which is verbose and difficult to read. A query like the above would look like:
SELECT state , SUM(CASE WHEN fac_type = 'AIRPORT' THEN 1 ELSE 0 END) AS airports , SUM(CASE WHEN fac_type = 'HELIPORT' THEN 1 ELSE 0 END) AS heliports , COUNT(*) AS total FROM `malloy-data.faa.airports` GROUP BY state
Nested Queries
The next several examples will use this simple source definition:
source: airports is table('malloy-data.faa.airports') { measure: airport_count is count() };
Nested Queries
In Malloy, queries can be nested to produce subtables on each output row.
query: airports -> { group_by: state aggregate: airport_count nest: by_facility is { group_by: fac_type aggregate: airport_count top: 3 } }
state | airport_count | by_facility | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
TX | 1,845 |
| ||||||||
CA | 984 |
| ||||||||
IL | 890 |
| ||||||||
FL | 856 |
| ||||||||
PA | 804 |
|
[ { "state": "TX", "airport_count": 1845, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 1389 }, { "fac_type": "HELIPORT", "airport_count": 435 }, { "fac_type": "ULTRALIGHT", "airport_count": 8 } ] }, { "state": "CA", "airport_count": 984, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 569 }, { "fac_type": "HELIPORT", "airport_count": 396 }, { "fac_type": "SEAPLANE BASE", "airport_count": 12 } ] }, { "state": "IL", "airport_count": 890, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 625 }, { "fac_type": "HELIPORT", "airport_count": 245 }, { "fac_type": "SEAPLANE BASE", "airport_count": 8 } ] }, { "state": "FL", "airport_count": 856, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 511 }, { "fac_type": "HELIPORT", "airport_count": 280 }, { "fac_type": "SEAPLANE BASE", "airport_count": 43 } ] }, { "state": "PA", "airport_count": 804, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 468 }, { "fac_type": "HELIPORT", "airport_count": 307 }, { "fac_type": "ULTRALIGHT", "airport_count": 13 } ] } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (0,1) THEN airports.state END as state__0, CASE WHEN group_set=0 THEN COUNT( 1) END as airport_count__0, CASE WHEN group_set=1 THEN airports.fac_type END as fac_type__1, CASE WHEN group_set=1 THEN COUNT( 1) END as airport_count__1 FROM `malloy-data.faa.airports` as airports CROSS JOIN (SELECT row_number() OVER() -1 group_set FROM UNNEST(GENERATE_ARRAY(0,1,1))) GROUP BY 1,2,4 ) SELECT state__0 as state, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count, ARRAY_AGG(CASE WHEN group_set=1 THEN STRUCT( fac_type__1 as fac_type, airport_count__1 as airport_count ) END IGNORE NULLS ORDER BY airport_count__1 desc LIMIT 3) as by_facility FROM __stage0 GROUP BY 1 ORDER BY 2 desc
Here we can see that the by_facility
column of the output table contains a nested subtable on each row. by_facility
contains the counts for the top 3 facility types for each state, i.e., the number of airports, heliports, and stolports in Texas, the number of airports, heliports, and seaplane bases in California, etc.
When a query is nested inside another query, each output row of the outer query will have a nested table for the inner query which only includes data limited to that row.
Queries can be nested infinitely, allowing for rich, complex output structures. A query may always include another nested query, regardless of depth.
query: airports -> { group_by: state aggregate: airport_count nest: top_5_counties is { top: 5 group_by: county aggregate: airport_count nest: by_facility is { group_by: fac_type aggregate: airport_count } } }
state | airport_count | top_5_counties | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TX | 1,845 |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CA | 984 |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
IL | 890 |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
FL | 856 |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PA | 804 |
|
[ { "state": "TX", "airport_count": 1845, "top_5_counties": [ { "county": "HARRIS", "airport_count": 135, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 110 }, { "fac_type": "AIRPORT", "airport_count": 25 } ] }, { "county": "TARRANT", "airport_count": 63, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 35 }, { "fac_type": "AIRPORT", "airport_count": 26 }, { "fac_type": "ULTRALIGHT", "airport_count": 1 }, { "fac_type": "STOLPORT", "airport_count": 1 } ] }, { "county": "DENTON", "airport_count": 53, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 47 }, { "fac_type": "HELIPORT", "airport_count": 6 } ] }, { "county": "DALLAS", "airport_count": 42, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 32 }, { "fac_type": "AIRPORT", "airport_count": 9 }, { "fac_type": "STOLPORT", "airport_count": 1 } ] }, { "county": "BEXAR", "airport_count": 40, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 24 }, { "fac_type": "HELIPORT", "airport_count": 16 } ] } ] }, { "state": "CA", "airport_count": 984, "top_5_counties": [ { "county": "LOS ANGELES", "airport_count": 176, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 151 }, { "fac_type": "AIRPORT", "airport_count": 23 }, { "fac_type": "SEAPLANE BASE", "airport_count": 2 } ] }, { "county": "SAN BERNARDINO", "airport_count": 71, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 47 }, { "fac_type": "HELIPORT", "airport_count": 24 } ] }, { "county": "ORANGE", "airport_count": 53, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 47 }, { "fac_type": "AIRPORT", "airport_count": 6 } ] }, { "county": "KERN", "airport_count": 49, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 41 }, { "fac_type": "HELIPORT", "airport_count": 7 }, { "fac_type": "ULTRALIGHT", "airport_count": 1 } ] }, { "county": "SAN DIEGO", "airport_count": 49, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 30 }, { "fac_type": "HELIPORT", "airport_count": 17 }, { "fac_type": "GLIDERPORT", "airport_count": 2 } ] } ] }, { "state": "IL", "airport_count": 890, "top_5_counties": [ { "county": "COOK", "airport_count": 51, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 44 }, { "fac_type": "AIRPORT", "airport_count": 7 } ] }, { "county": "LA SALLE", "airport_count": 39, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 35 }, { "fac_type": "HELIPORT", "airport_count": 4 } ] }, { "county": "MC HENRY", "airport_count": 29, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 20 }, { "fac_type": "HELIPORT", "airport_count": 7 }, { "fac_type": "SEAPLANE BASE", "airport_count": 2 } ] }, { "county": "DE KALB", "airport_count": 27, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 24 }, { "fac_type": "HELIPORT", "airport_count": 3 } ] }, { "county": "WINNEBAGO", "airport_count": 24, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 15 }, { "fac_type": "HELIPORT", "airport_count": 9 } ] } ] }, { "state": "FL", "airport_count": 856, "top_5_counties": [ { "county": "PALM BEACH", "airport_count": 45, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 30 }, { "fac_type": "AIRPORT", "airport_count": 14 }, { "fac_type": "GLIDERPORT", "airport_count": 1 } ] }, { "county": "DADE", "airport_count": 44, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 27 }, { "fac_type": "AIRPORT", "airport_count": 12 }, { "fac_type": "GLIDERPORT", "airport_count": 2 }, { "fac_type": "SEAPLANE BASE", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 1 } ] }, { "county": "POLK", "airport_count": 43, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 18 }, { "fac_type": "HELIPORT", "airport_count": 16 }, { "fac_type": "SEAPLANE BASE", "airport_count": 9 } ] }, { "county": "MARION", "airport_count": 37, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 27 }, { "fac_type": "HELIPORT", "airport_count": 7 }, { "fac_type": "SEAPLANE BASE", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 1 } ] }, { "county": "ORANGE", "airport_count": 36, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 24 }, { "fac_type": "AIRPORT", "airport_count": 8 }, { "fac_type": "SEAPLANE BASE", "airport_count": 4 } ] } ] }, { "state": "PA", "airport_count": 804, "top_5_counties": [ { "county": "BUCKS", "airport_count": 55, "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 32 }, { "fac_type": "HELIPORT", "airport_count": 19 }, { "fac_type": "ULTRALIGHT", "airport_count": 2 }, { "fac_type": "STOLPORT", "airport_count": 1 }, { "fac_type": "GLIDERPORT", "airport_count": 1 } ] }, { "county": "MONTGOMERY", "airport_count": 44, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 29 }, { "fac_type": "AIRPORT", "airport_count": 14 }, { "fac_type": "SEAPLANE BASE", "airport_count": 1 } ] }, { "county": "ALLEGHENY", "airport_count": 31, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 22 }, { "fac_type": "AIRPORT", "airport_count": 8 }, { "fac_type": "SEAPLANE BASE", "airport_count": 1 } ] }, { "county": "CHESTER", "airport_count": 27, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 16 }, { "fac_type": "AIRPORT", "airport_count": 11 } ] }, { "county": "PHILADELPHIA", "airport_count": 26, "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 22 }, { "fac_type": "AIRPORT", "airport_count": 4 } ] } ] } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (0,1,2) THEN airports.state END as state__0, CASE WHEN group_set=0 THEN COUNT( 1) END as airport_count__0, CASE WHEN group_set IN (1,2) THEN airports.county END as county__1, CASE WHEN group_set=1 THEN COUNT( 1) END as airport_count__1, CASE WHEN group_set=2 THEN airports.fac_type END as fac_type__2, CASE WHEN group_set=2 THEN COUNT( 1) END as airport_count__2 FROM `malloy-data.faa.airports` as airports CROSS JOIN (SELECT row_number() OVER() -1 group_set FROM UNNEST(GENERATE_ARRAY(0,2,1))) GROUP BY 1,2,4,6 ) , __stage1 AS ( SELECT CASE WHEN group_set=2 THEN 1 ELSE group_set END as group_set, state__0 as state__0, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count__0, CASE WHEN group_set IN (1,2) THEN county__1 END as county__1, ANY_VALUE(CASE WHEN group_set=1 THEN airport_count__1 END) as airport_count__1, ARRAY_AGG(CASE WHEN group_set=2 THEN STRUCT( fac_type__2 as fac_type, airport_count__2 as airport_count ) END IGNORE NULLS ORDER BY airport_count__2 desc ) as by_facility__1 FROM __stage0 GROUP BY 1,2,4 ) SELECT state__0 as state, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count, ARRAY_AGG(CASE WHEN group_set=1 THEN STRUCT( county__1 as county, airport_count__1 as airport_count, by_facility__1 as by_facility ) END IGNORE NULLS ORDER BY airport_count__1 desc LIMIT 5) as top_5_counties FROM __stage1 GROUP BY 1 ORDER BY 2 desc
Filtering Nested Queries
Filters can be isolated to any level of nesting. In the following example, we limit the major_facilities
query to only airports where major
is 'Y'
. This particular filter applies only to major_facilities
, and not to other parts of the outer query.
query: airports -> { where: state = 'CA' group_by: county aggregate: airport_count nest: major_facilities is { where: major = 'Y' group_by: name is concat(code, ' (', full_name, ')') } nest: by_facility is { group_by: fac_type aggregate: airport_count } }
county | airport_count | major_facilities | by_facility | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LOS ANGELES | 176 |
|
| ||||||||||||
SAN BERNARDINO | 71 |
|
| ||||||||||||
ORANGE | 53 |
|
| ||||||||||||
SAN DIEGO | 49 |
|
| ||||||||||||
KERN | 49 |
|
|
[ { "county": "LOS ANGELES", "airport_count": 176, "major_facilities": [ { "name": "BUR (BURBANK-GLENDALE-PASADENA)" }, { "name": "LAX (LOS ANGELES INTL)" }, { "name": "LGB (LONG BEACH /DAUGHERTY FIELD/)" } ], "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 151 }, { "fac_type": "AIRPORT", "airport_count": 23 }, { "fac_type": "SEAPLANE BASE", "airport_count": 2 } ] }, { "county": "SAN BERNARDINO", "airport_count": 71, "major_facilities": [ { "name": "ONT (ONTARIO INTL)" } ], "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 47 }, { "fac_type": "HELIPORT", "airport_count": 24 } ] }, { "county": "ORANGE", "airport_count": 53, "major_facilities": [ { "name": "SNA (JOHN WAYNE AIRPORT-ORANGE COUNTY)" } ], "by_facility": [ { "fac_type": "HELIPORT", "airport_count": 47 }, { "fac_type": "AIRPORT", "airport_count": 6 } ] }, { "county": "SAN DIEGO", "airport_count": 49, "major_facilities": [ { "name": "SAN (SAN DIEGO INTL-LINDBERGH FLD)" } ], "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 30 }, { "fac_type": "HELIPORT", "airport_count": 17 }, { "fac_type": "GLIDERPORT", "airport_count": 2 } ] }, { "county": "KERN", "airport_count": 49, "major_facilities": [ { "name": "BFL (MEADOWS FIELD)" }, { "name": "IYK (INYOKERN)" } ], "by_facility": [ { "fac_type": "AIRPORT", "airport_count": 41 }, { "fac_type": "HELIPORT", "airport_count": 7 }, { "fac_type": "ULTRALIGHT", "airport_count": 1 } ] } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (0,1,2) THEN airports.county END as county__0, CASE WHEN group_set=0 THEN COUNT( 1) END as airport_count__0, CASE WHEN group_set=1 THEN concat(airports.code,' (',airports.full_name,')') END as name__1, CASE WHEN group_set=2 THEN airports.fac_type END as fac_type__2, CASE WHEN group_set=2 THEN COUNT( 1) END as airport_count__2 FROM `malloy-data.faa.airports` as airports CROSS JOIN (SELECT row_number() OVER() -1 group_set FROM UNNEST(GENERATE_ARRAY(0,2,1))) WHERE (airports.state='CA') AND ((group_set NOT IN (1) OR (group_set IN (1) AND airports.major='Y'))) GROUP BY 1,2,4,5 ) SELECT county__0 as county, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count, ARRAY_AGG(CASE WHEN group_set=1 THEN STRUCT( name__1 as name ) END IGNORE NULLS ORDER BY name__1 asc ) as major_facilities, ARRAY_AGG(CASE WHEN group_set=2 THEN STRUCT( fac_type__2 as fac_type, airport_count__2 as airport_count ) END IGNORE NULLS ORDER BY airport_count__2 desc ) as by_facility FROM __stage0 GROUP BY 1 ORDER BY 2 desc
Dates and Timestamps
Working with time in data is often needlessly complex; Malloy has built in constructs to simplify many time-related operations. This section gives a brief introduction to some of these tools, but for more details see the Time Ranges section.
Time Literals
Literals of type date
and timestamp
are notated with an @
, e.g. @2003-03-29
or @1994-07-14 10:23:59
. Similarly, years (@2021
), quarters (@2020-Q1
), months (@2019-03
), weeks (@WK2021-08-01
), and minutes (@2017-01-01 10:53
) can be expressed.
Time literals can be used as values, but are more often useful in filters. For example, the following query shows the number of flights in 2003.
query: table('malloy-data.faa.flights') { where: dep_time ? @2003 } -> { aggregate: flight_count is count() }
flight_count |
---|
6,493,690 |
[ { "flight_count": 6493690 } ]
SELECT COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as base WHERE (base.dep_time>=TIMESTAMP('2003-01-01', 'UTC'))and(base.dep_time<TIMESTAMP('2004-01-01', 'UTC')) ORDER BY 1 desc
There is a special time literal now
, referring to the current timestamp, which allows for relative time filters.
query: table('malloy-data.faa.flights') { where: dep_time > now - 6 hours } -> { aggregate: flights_last_6_hours is count() }
Truncation
Time values can be truncated to a given timeframe, which can be second
, minute
, hour
, day
, week
, month
, quarter
, or year
.
query: table('malloy-data.faa.flights') -> { group_by: flight_year is dep_time.year flight_month is dep_time.month aggregate: flight_count is count() }
flight_year | flight_month | flight_count |
---|---|---|
2005 | 2005-03 | 617,328 |
2005 | 2005-05 | 614,678 |
2005 | 2005-08 | 630,566 |
2005 | 2005-07 | 627,574 |
2005 | 2005-02 | 545,159 |
[ { "flight_year": { "value": "2005-01-01T00:00:00.000Z" }, "flight_month": { "value": "2005-03-01T00:00:00.000Z" }, "flight_count": 617328 }, { "flight_year": { "value": "2005-01-01T00:00:00.000Z" }, "flight_month": { "value": "2005-05-01T00:00:00.000Z" }, "flight_count": 614678 }, { "flight_year": { "value": "2005-01-01T00:00:00.000Z" }, "flight_month": { "value": "2005-08-01T00:00:00.000Z" }, "flight_count": 630566 }, { "flight_year": { "value": "2005-01-01T00:00:00.000Z" }, "flight_month": { "value": "2005-07-01T00:00:00.000Z" }, "flight_count": 627574 }, { "flight_year": { "value": "2005-01-01T00:00:00.000Z" }, "flight_month": { "value": "2005-02-01T00:00:00.000Z" }, "flight_count": 545159 } ]
SELECT TIMESTAMP_TRUNC(base.dep_time,year) as flight_year, TIMESTAMP_TRUNC(base.dep_time,month) as flight_month, COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as base GROUP BY 1,2 ORDER BY 1 desc
Extraction
Numeric values can be extracted from time values, e.g. day_of_year(some_date)
or minute(some_time)
. See the full list of extraction functions here.
query: table('malloy-data.faa.flights') -> { order_by: 1 group_by: day_of_week is day(dep_time) aggregate: flight_count is count() }
day_of_week | flight_count |
---|---|
1 | 1,220,298 |
2 | 1,232,955 |
3 | 1,229,187 |
4 | 1,221,663 |
5 | 1,229,545 |
6 | 1,238,003 |
7 | 1,242,262 |
[ { "day_of_week": 1, "flight_count": 1220298 }, { "day_of_week": 2, "flight_count": 1232955 }, { "day_of_week": 3, "flight_count": 1229187 }, { "day_of_week": 4, "flight_count": 1221663 }, { "day_of_week": 5, "flight_count": 1229545 }, { "day_of_week": 6, "flight_count": 1238003 }, { "day_of_week": 7, "flight_count": 1242262 } ]
SELECT EXTRACT(day FROM base.dep_time) as day_of_week, COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as base GROUP BY 1 ORDER BY 1 ASC
Time Ranges
Two kinds of time ranges are given special syntax: the range between two times and the range starting at some time for some duration. These are represented like @2003 to @2005
and @2004-Q1 for 6 quarters
respectively. These ranges can be used in filters just like time literals.
query: table('malloy-data.faa.flights') { where: dep_time ? @2003 to @2005 } -> { aggregate: flight_count is count() }
flight_count |
---|
13,621,381 |
[ { "flight_count": 13621381 } ]
SELECT COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as base WHERE (base.dep_time>=TIMESTAMP('2003-01-01', 'UTC'))and(base.dep_time<TIMESTAMP('2005-01-01', 'UTC')) ORDER BY 1 desc
Time literals and truncations can also behave like time ranges. Each kind of time literal has an implied duration that takes effect when it is used in a comparison, e.g. @2003
represents the whole of the year 2003, and @2004-Q1
lasts the whole 3 months of the quarter. Similarly, when a time value is truncated, it takes on the
timeframe from the truncation, e.g. now.month
means the entirety of the current month.
When a time range is used in a comparison, =
checks for "is in the range", >
"is after", and <
"is before." So some_time > @2003
filters dates starting on January 1, 2004, while some_time = @2003
filters to dates in the year 2003.
query: table('malloy-data.faa.flights') { where: dep_time > @2003 } -> { top: 3; order_by: departure_date asc group_by: departure_date is dep_time.day aggregate: flight_count is count() }
departure_date | flight_count |
---|---|
2004-01-01 | 17,206 |
2004-01-02 | 19,053 |
2004-01-03 | 17,252 |
[ { "departure_date": { "value": "2004-01-01T00:00:00.000Z" }, "flight_count": 17206 }, { "departure_date": { "value": "2004-01-02T00:00:00.000Z" }, "flight_count": 19053 }, { "departure_date": { "value": "2004-01-03T00:00:00.000Z" }, "flight_count": 17252 } ]
SELECT TIMESTAMP_TRUNC(base.dep_time,day) as departure_date, COUNT( 1) as flight_count FROM `malloy-data.faa.flights` as base WHERE base.dep_time>=TIMESTAMP('2004-01-01', 'UTC') GROUP BY 1 ORDER BY 1 asc LIMIT 3
Pipelines and Multi-stage Queries
The output from one stage of a query can be passed into another stage using ->
. For example, we'll start with this query which outputs, for California and New York, the total number of airports, as well as the number of airports in each county.
query: airports -> { where: state = 'CA' | 'NY' group_by: state aggregate: airport_count nest: by_county is { group_by: county aggregate: airport_count } }
state | airport_count | by_county | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CA | 984 |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
NY | 576 |
|
[ { "state": "CA", "airport_count": 984, "by_county": [ { "county": "LOS ANGELES", "airport_count": 176 }, { "county": "SAN BERNARDINO", "airport_count": 71 }, { "county": "ORANGE", "airport_count": 53 }, { "county": "KERN", "airport_count": 49 }, { "county": "SAN DIEGO", "airport_count": 49 }, { "county": "RIVERSIDE", "airport_count": 46 }, { "county": "FRESNO", "airport_count": 40 }, { "county": "SACRAMENTO", "airport_count": 24 }, { "county": "SANTA BARBARA", "airport_count": 22 }, { "county": "SAN JOAQUIN", "airport_count": 20 }, { "county": "VENTURA", "airport_count": 19 }, { "county": "SUTTER", "airport_count": 16 }, { "county": "TULARE", "airport_count": 16 }, { "county": "EL DORADO", "airport_count": 15 }, { "county": "KINGS", "airport_count": 15 }, { "county": "MERCED", "airport_count": 15 }, { "county": "ALAMEDA", "airport_count": 14 }, { "county": "SAN LUIS OBISPO", "airport_count": 14 }, { "county": "MONTEREY", "airport_count": 13 }, { "county": "SISKIYOU", "airport_count": 13 }, { "county": "MADERA", "airport_count": 12 }, { "county": "IMPERIAL", "airport_count": 12 }, { "county": "SONOMA", "airport_count": 12 }, { "county": "SHASTA", "airport_count": 11 }, { "county": "MODOC", "airport_count": 11 }, { "county": "BUTTE", "airport_count": 11 }, { "county": "STANISLAUS", "airport_count": 10 }, { "county": "NEVADA", "airport_count": 10 }, { "county": "SANTA CLARA", "airport_count": 10 }, { "county": "INYO", "airport_count": 9 }, { "county": "HUMBOLDT", "airport_count": 9 }, { "county": "TEHAMA", "airport_count": 9 }, { "county": "COLUSA", "airport_count": 9 }, { "county": "YOLO", "airport_count": 9 }, { "county": "CONTRA COSTA", "airport_count": 9 }, { "county": "SANTA CRUZ", "airport_count": 8 }, { "county": "PLUMAS", "airport_count": 8 }, { "county": "NAPA", "airport_count": 8 }, { "county": "SOLANO", "airport_count": 8 }, { "county": "PLACER", "airport_count": 8 }, { "county": "LAKE", "airport_count": 8 }, { "county": "MENDOCINO", "airport_count": 8 }, { "county": "TUOLUMNE", "airport_count": 7 }, { "county": "TRINITY", "airport_count": 7 }, { "county": "AMADOR", "airport_count": 7 }, { "county": "LASSEN", "airport_count": 6 }, { "county": "MARIN", "airport_count": 6 }, { "county": "MONO", "airport_count": 6 }, { "county": "CALAVERAS", "airport_count": 5 }, { "county": "YUBA", "airport_count": 5 }, { "county": "SAN MATEO", "airport_count": 4 }, { "county": "GLENN", "airport_count": 4 }, { "county": "SAN BENITO", "airport_count": 4 }, { "county": "DEL NORTE", "airport_count": 3 }, { "county": "MARIPOSA", "airport_count": 3 }, { "county": "SAN FRANCISCO", "airport_count": 3 }, { "county": "SIERRA", "airport_count": 2 }, { "county": "ALPINE", "airport_count": 2 }, { "county": "ESMERALDA", "airport_count": 1 } ] }, { "state": "NY", "airport_count": 576, "by_county": [ { "county": "SUFFOLK", "airport_count": 34 }, { "county": "ERIE", "airport_count": 26 }, { "county": "DUTCHESS", "airport_count": 20 }, { "county": "NIAGARA", "airport_count": 20 }, { "county": "ONEIDA", "airport_count": 18 }, { "county": "ONONDAGA", "airport_count": 18 }, { "county": "ULSTER", "airport_count": 17 }, { "county": "ORANGE", "airport_count": 16 }, { "county": "ONTARIO", "airport_count": 15 }, { "county": "NASSAU", "airport_count": 15 }, { "county": "CHAUTAUQUA", "airport_count": 14 }, { "county": "MONROE", "airport_count": 13 }, { "county": "DELAWARE", "airport_count": 13 }, { "county": "ALBANY", "airport_count": 13 }, { "county": "SARATOGA", "airport_count": 12 }, { "county": "MONTGOMERY", "airport_count": 12 }, { "county": "OSWEGO", "airport_count": 12 }, { "county": "CATTARAUGUS", "airport_count": 11 }, { "county": "WESTCHESTER", "airport_count": 11 }, { "county": "QUEENS", "airport_count": 11 }, { "county": "SULLIVAN", "airport_count": 10 }, { "county": "WAYNE", "airport_count": 10 }, { "county": "ESSEX", "airport_count": 10 }, { "county": "STEUBEN", "airport_count": 9 }, { "county": "SCHOHARIE", "airport_count": 9 }, { "county": "ORLEANS", "airport_count": 9 }, { "county": "WYOMING", "airport_count": 9 }, { "county": "JEFFERSON", "airport_count": 9 }, { "county": "COLUMBIA", "airport_count": 8 }, { "county": "LIVINGSTON", "airport_count": 8 }, { "county": "CAYUGA", "airport_count": 8 }, { "county": "FULTON", "airport_count": 8 }, { "county": "CHEMUNG", "airport_count": 8 }, { "county": "ST LAWRENCE", "airport_count": 8 }, { "county": "CLINTON", "airport_count": 8 }, { "county": "GREENE", "airport_count": 8 }, { "county": "TOMPKINS", "airport_count": 7 }, { "county": "SCHENECTADY", "airport_count": 7 }, { "county": "NEW YORK", "airport_count": 7 }, { "county": "SCHUYLER", "airport_count": 7 }, { "county": "RENSSELAER", "airport_count": 7 }, { "county": "HERKIMER", "airport_count": 7 }, { "county": "HAMILTON", "airport_count": 7 }, { "county": "MADISON", "airport_count": 7 }, { "county": "BROOME", "airport_count": 7 }, { "county": "GENESEE", "airport_count": 6 }, { "county": "OTSEGO", "airport_count": 6 }, { "county": "CHENANGO", "airport_count": 5 }, { "county": "ALLEGANY", "airport_count": 4 }, { "county": "YATES", "airport_count": 4 }, { "county": "WARREN", "airport_count": 4 }, { "county": "TIOGA", "airport_count": 4 }, { "county": "SENECA", "airport_count": 4 }, { "county": "WASHINGTON", "airport_count": 4 }, { "county": "ROCKLAND", "airport_count": 4 }, { "county": "CORTLAND", "airport_count": 2 }, { "county": "FRANKLIN", "airport_count": 2 }, { "county": "KINGS", "airport_count": 1 }, { "county": "BRONX", "airport_count": 1 }, { "county": "LEWIS", "airport_count": 1 }, { "county": "PUTNAM", "airport_count": 1 } ] } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (0,1) THEN airports.state END as state__0, CASE WHEN group_set=0 THEN COUNT( 1) END as airport_count__0, CASE WHEN group_set=1 THEN airports.county END as county__1, CASE WHEN group_set=1 THEN COUNT( 1) END as airport_count__1 FROM `malloy-data.faa.airports` as airports CROSS JOIN (SELECT row_number() OVER() -1 group_set FROM UNNEST(GENERATE_ARRAY(0,1,1))) WHERE (airports.state='CA')or(airports.state='NY') GROUP BY 1,2,4 ) SELECT state__0 as state, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count, ARRAY_AGG(CASE WHEN group_set=1 THEN STRUCT( county__1 as county, airport_count__1 as airport_count ) END IGNORE NULLS ORDER BY airport_count__1 desc ) as by_county FROM __stage0 GROUP BY 1 ORDER BY 2 desc
Next, we'll use the output of that query as the input to another, where we determine which counties have the highest percentage of airports compared to the whole state, taking advantage of the nested structure of the data to to so.
query: airports -> { where: state = 'CA' | 'NY' group_by: state aggregate: airport_count nest: by_county is { group_by: county aggregate: airport_count } } -> { top: 10; order_by: 4 desc project: by_county.county airports_in_county is by_county.airport_count airports_in_state is airport_count percent_in_county is by_county.airport_count / airport_count }
county | airports_in_county | airports_in_state | percent_in_county |
---|---|---|---|
LOS ANGELES | 176 | 984 | 17.89% |
SAN BERNARDINO | 71 | 984 | 7.22% |
SUFFOLK | 34 | 576 | 5.9% |
ORANGE | 53 | 984 | 5.39% |
KERN | 49 | 984 | 4.98% |
[ { "county": "LOS ANGELES", "airports_in_county": 176, "airports_in_state": 984, "percent_in_county": 0.17886178861788618 }, { "county": "SAN BERNARDINO", "airports_in_county": 71, "airports_in_state": 984, "percent_in_county": 0.07215447154471545 }, { "county": "SUFFOLK", "airports_in_county": 34, "airports_in_state": 576, "percent_in_county": 0.059027777777777776 }, { "county": "ORANGE", "airports_in_county": 53, "airports_in_state": 984, "percent_in_county": 0.05386178861788618 }, { "county": "KERN", "airports_in_county": 49, "airports_in_state": 984, "percent_in_county": 0.049796747967479675 } ]
WITH __stage0 AS ( SELECT group_set, CASE WHEN group_set IN (0,1) THEN airports.state END as state__0, CASE WHEN group_set=0 THEN COUNT( 1) END as airport_count__0, CASE WHEN group_set=1 THEN airports.county END as county__1, CASE WHEN group_set=1 THEN COUNT( 1) END as airport_count__1 FROM `malloy-data.faa.airports` as airports CROSS JOIN (SELECT row_number() OVER() -1 group_set FROM UNNEST(GENERATE_ARRAY(0,1,1))) WHERE (airports.state='CA')or(airports.state='NY') GROUP BY 1,2,4 ) , __stage1 AS ( SELECT state__0 as state, ANY_VALUE(CASE WHEN group_set=0 THEN airport_count__0 END) as airport_count, ARRAY_AGG(CASE WHEN group_set=1 THEN STRUCT( county__1 as county, airport_count__1 as airport_count ) END IGNORE NULLS ORDER BY airport_count__1 desc ) as by_county FROM __stage0 GROUP BY 1 ORDER BY 2 desc ) SELECT by_county_0.county as county, by_county_0.airport_count as airports_in_county, base.airport_count as airports_in_state, by_county_0.airport_count/base.airport_count as percent_in_county FROM __stage1 as base LEFT JOIN UNNEST(base.by_county) as by_county_0 ORDER BY 4 desc LIMIT 10
NOTE:: to pipeline a named query, the syntax to reference that named query is -> query_name
. An example of this can be found in the Query Doc.
Aggregate Locality
When computing sum
, avg
, and count
on fields in joined sources with one-to-many relationships, Malloy will automatically handle the duplication of rows that occurs in the join, and compute accurate aggregations on the fanned-out table. See the Aggregate Locality section for more information.
query: aircraft -> { aggregate: // The average number of seats on models of registered aircraft models_avg_seats is aircraft_models.seats.avg() // The average number of seats on registered aircraft aircraft_avg_seats is avg(aircraft_models.seats) }
models_avg_seats | aircraft_avg_seats |
---|---|
5.145 | 8.373 |
[ { "models_avg_seats": 5.145145145145145, "aircraft_avg_seats": 8.372722397134547 } ]
SELECT (CAST(( ( SUM(DISTINCT (CAST(ROUND(COALESCE(aircraft_models_0.seats,0)*(1*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(aircraft_models_0.aircraft_model_code AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )/(1*1.0)) as FLOAT64))/NULLIF(COUNT(DISTINCT aircraft_models_0.aircraft_model_code),0) as models_avg_seats, AVG(aircraft_models_0.seats) as aircraft_avg_seats FROM `malloy-data.faa.aircraft` as aircraft LEFT JOIN `malloy-data.faa.aircraft_models` AS aircraft_models_0 ON aircraft_models_0.aircraft_model_code=aircraft.aircraft_model_code ORDER BY 1 desc
Comments
Malloy code can include both line and block comments. Line comments, which begin with --
or //
,
may appear anywhere within a line, and cause all subsequent characters on that line to be ignored.
Block comments, which are enclosed between /*
and */
, cause all enclosed characters to be ignored
and may span multiple lines.
-- The total number of flight entries query: flights -> { aggregate: flight_count // Defined simply as `count()` } /* * A comparison of the total number of flights * for each of the tracked carriers. */ query: flights -> { group_by: carrier aggregate: flight_count /* , total_distance */ }
Ordering and Limiting
In Malloy, ordering and limiting work pretty much the same way they do in SQL, though Malloy introduces some reasonable defaults.
The top:
and limit:
statements are synonyms and limits the number of rows returned. Results below are sorted by the first measure descending--in this case, airport_count
.
query: table('malloy-data.faa.airports') -> { top: 2 group_by: state aggregate: airport_count is count() }
state | airport_count |
---|---|
TX | 1,845 |
CA | 984 |
[ { "state": "TX", "airport_count": 1845 }, { "state": "CA", "airport_count": 984 } ]
SELECT base.state as state, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as base GROUP BY 1 ORDER BY 2 desc LIMIT 2
Default ordering can be overridden with order_by:
, as in the following query, which shows the states in alphabetical order. order_by:
can take a field index number or the name of a field.
query: table('malloy-data.faa.airports') -> { order_by: state group_by: state aggregate: airport_count is count() }
state | airport_count |
---|---|
null | 23 |
AK | 608 |
AL | 260 |
AR | 299 |
AS | 4 |
[ { "state": null, "airport_count": 23 }, { "state": "AK", "airport_count": 608 }, { "state": "AL", "airport_count": 260 }, { "state": "AR", "airport_count": 299 }, { "state": "AS", "airport_count": 4 } ]
SELECT base.state as state, COUNT( 1) as airport_count FROM `malloy-data.faa.airports` as base GROUP BY 1 ORDER BY 1 ASC
Next Steps
This was a whirlwind tour of the syntax and features of Malloy. To continue on your Malloy journey:
Explore sample analyses and data models built in Malloy in our Patterns Github repo.
Learn how to connect Malloy to your own database.
Take a look at our guide for translating SQL to Malloy.
Join the Malloy community Slack channel!