Malloy Documentation
search

This document will assumes a working knowledge of SQL and will rapidly take you through some of Malloy's key language features.

Using this Guide

For every Malloy Query you can see the formatted result, or raw result as JSON, or the SQL used to produce the result.

Click tab to to see the HTML, JSON or SQL result: 👈👈

SQL SELECT vs Malloy's run:

The statement to run a query in Malloy is run:. There are two types of queries in Malloy, reductions which have group_by: or aggregate: statements, and projections which have select: statements and do not group or aggregate results.

Projection: SELECT with no GROUP BY

In SQL

SELECT code, full_name, state, faa_region, fac_type, elevation
FROM `malloy-data.faa.airports`
ORDER BY code

Equivalent in Malloy

document
run: duckdb.table('../data/airports.parquet') -> {
  select: code, full_name, state, faa_region, fac_type, elevation
  order_by: code
}
QUERY RESULTS
codefull_​namestatefaa_​regionfac_​typeelevation
00ATOTAL RFPAAEAHELIPORT11
00CANIMAS AIR PARKCOANMAIRPORT6,684
00CAGOLDSTONE /GTS/CAAWPAIRPORT3,038
00EAT&T - APACHE JUNCTIONAZAWPHELIPORT2,527
00FTCJC-NORTHEAST CAMPUSTXASWHELIPORT600
[
  {
    "code": "00A",
    "full_name": "TOTAL RF",
    "state": "PA",
    "faa_region": "AEA",
    "fac_type": "HELIPORT",
    "elevation": 11
  },
  {
    "code": "00C",
    "full_name": "ANIMAS AIR PARK",
    "state": "CO",
    "faa_region": "ANM",
    "fac_type": "AIRPORT",
    "elevation": 6684
  },
  {
    "code": "00CA",
    "full_name": "GOLDSTONE /GTS/",
    "state": "CA",
    "faa_region": "AWP",
    "fac_type": "AIRPORT",
    "elevation": 3038
  },
  {
    "code": "00E",
    "full_name": "AT&T - APACHE JUNCTION",
    "state": "AZ",
    "faa_region": "AWP",
    "fac_type": "HELIPORT",
    "elevation": 2527
  },
  {
    "code": "00F",
    "full_name": "TCJC-NORTHEAST CAMPUS",
    "state": "TX",
    "faa_region": "ASW",
    "fac_type": "HELIPORT",
    "elevation": 600
  }
]
SELECT 
   base."code" as "code",
   base."full_name" as "full_name",
   base."state" as "state",
   base."faa_region" as "faa_region",
   base."fac_type" as "fac_type",
   base."elevation" as "elevation"
FROM '../data/airports.parquet' as base
ORDER BY 1 ASC NULLS LAST

Reduction: SELECT with GROUP BY and/or aggregation

In SQL

SELECT
  base.fac_type as fac_type,
  COUNT( 1) as airport_count
FROM `malloy-data.faa.airports` as base
WHERE base.state='CA'
GROUP BY 1
ORDER BY 2 desc

Equivalent in Malloy

document
run: duckdb.table('../data/airports.parquet') -> {
  group_by: fac_type
  aggregate: airport_count is count()
  where: state = 'CA'
  order_by: airport_count desc
}
QUERY RESULTS
fac_​typeairport_​count
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
STOLPORT2
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 569
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 396
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 12
  },
  {
    "fac_type": "GLIDERPORT",
    "airport_count": 3
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 2
  }
]
SELECT 
   base."fac_type" as "fac_type",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as base
WHERE base."state"='CA'
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Source: A data source for queries

Malloy separates a query's view from the source of the data. A source can be thought of as a table and a collection of computations and relationships which are relevant to that table. (Source Documentation).

Fields can be defined as part of a source.

  • A measure: is a declared aggregate calculation (think function that operates across the table) which can be used in aggregate: elements in a query stage

  • A dimension: is a declared scalar calculation which that can be used in group_by: or select: elements of a query stage

document
source: airports is duckdb.table('../data/airports.parquet') extend {
  dimension: elevation_in_meters is elevation * 0.3048
  dimension: state_and_county is concat(state,' - ', county)
  measure: airport_count is count()
  measure: avg_elevation_in_meters is elevation_in_meters.avg()
}

Querying Against a Source

Queries can be run against source: objects and can utilize the modeled fields from that source, as well as introduce new ones. (Query Documentation)

using the above declared airports source

document
run: airports -> {
  limit: 10
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate:
    airport_count           // <-- declared in source
    avg_elevation_in_meters // <-- declared in source
}
QUERY RESULTS
stateairport_​countavg_​elevation_​in_​meters
TX435138.687
CA396276.402
PA307216.255
FL28018.101
NJ24748.15
[
  {
    "state": "TX",
    "airport_count": 435,
    "avg_elevation_in_meters": 138.6868027586207
  },
  {
    "state": "CA",
    "airport_count": 396,
    "avg_elevation_in_meters": 276.4020303030303
  },
  {
    "state": "PA",
    "airport_count": 307,
    "avg_elevation_in_meters": 216.2551035830619
  },
  {
    "state": "FL",
    "airport_count": 280,
    "avg_elevation_in_meters": 18.100765714285714
  },
  {
    "state": "NJ",
    "airport_count": 247,
    "avg_elevation_in_meters": 48.14976194331984
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count",
   AVG((airports."elevation"*0.3048)) as "avg_elevation_in_meters"
FROM '../data/airports.parquet' as airports
WHERE airports."fac_type"='HELIPORT'
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 10

Dimensional calculations are no different from columns

using the above declared airports source

document
run: airports -> {
  group_by: state_and_county // <-- declared in source
  aggregate: airport_count
  order_by: 1 desc
}
QUERY RESULTS
state_​and_​countyairport_​count
WY - WESTON3
WY - WASHAKIE4
WY - UINTA3
WY - TETON7
WY - SWEETWATER4
[
  {
    "state_and_county": "WY - WESTON",
    "airport_count": 3
  },
  {
    "state_and_county": "WY - WASHAKIE",
    "airport_count": 4
  },
  {
    "state_and_county": "WY - UINTA",
    "airport_count": 3
  },
  {
    "state_and_county": "WY - TETON",
    "airport_count": 7
  },
  {
    "state_and_county": "WY - SWEETWATER",
    "airport_count": 4
  }
]
SELECT 
   CONCAT(airports."state",' - ',airports."county") as "state_and_county",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 1 desc NULLS LAST

Defining Views in a Source

A source can also contain a set of useful views relating to that source, which can be run in queries.

using the above declared airports source

document
source: airports2 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()

  view: by_state is {        // <-- can be called by name
    group_by: state
    aggregate: airport_count
  }
}

Executing Views

The simplest form of a query in Malloy is the name of a source, the query operator ->, and the name of one of its contained views.

using the above declared airports source

document
run: airports2 -> by_state
QUERY RESULTS
stateairport_​count
TX1,845
CA984
IL890
FL856
PA804
[
  {
    "state": "TX",
    "airport_count": 1845
  },
  {
    "state": "CA",
    "airport_count": 984
  },
  {
    "state": "IL",
    "airport_count": 890
  },
  {
    "state": "FL",
    "airport_count": 856
  },
  {
    "state": "PA",
    "airport_count": 804
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Filtering a Source

You can filter a source by adding a filter expression using the where: keyword in an extension. In this example, we filter an existing airports source and then use this extended version to execute the by_state view in a query. For more information on filtering, see the Filters section.

document
run: airports2 extend {
  where: fac_type = 'SEAPLANE BASE'   // <- run the query with an added filter
}
-> by_state
QUERY RESULTS
stateairport_​count
AK104
MN72
FL43
ME38
NY23
[
  {
    "state": "AK",
    "airport_count": 104
  },
  {
    "state": "MN",
    "airport_count": 72
  },
  {
    "state": "FL",
    "airport_count": 43
  },
  {
    "state": "ME",
    "airport_count": 38
  },
  {
    "state": "NY",
    "airport_count": 23
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
WHERE airports2."fac_type"='SEAPLANE BASE'
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Filtering Measures

The input to an aggregate computation can be filtered.

using the above declared airports source

document
run: airports -> {
  group_by: state
  aggregate: airport_count
  aggregate: heliport_count is airport_count { where: fac_type = 'HELIPORT' } // <-- add a filter
}
QUERY RESULTS
stateairport_​countheliport_​count
TX1,845435
CA984396
IL890245
FL856280
PA804307
[
  {
    "state": "TX",
    "airport_count": 1845,
    "heliport_count": 435
  },
  {
    "state": "CA",
    "airport_count": 984,
    "heliport_count": 396
  },
  {
    "state": "IL",
    "airport_count": 890,
    "heliport_count": 245
  },
  {
    "state": "FL",
    "airport_count": 856,
    "heliport_count": 280
  },
  {
    "state": "PA",
    "airport_count": 804,
    "heliport_count": 307
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count",
   (COUNT( CASE WHEN airports."fac_type"='HELIPORT' THEN 1 END)) as "heliport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Composing with Views

For the next section assume the following source declaration.

document
source: airports3 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
  measure: avg_elevation is elevation.avg()

  view: top_5_states is {
    group_by: state
    aggregate: airport_count
    limit: 5
  }

  view: by_facility_type is {
    group_by: fac_type
    aggregate: airport_count
  }
}

The nest: property embeds one view in another

Malloy allows you to create nested subtables easily in a query. In the case below, the top level view groups by state and the nested view groups by facility type. This mechanism is really useful for understanding data and creating complex data structures. (Nesting Documentation)

using the above declared airports source

document
run: airports3 -> {
  group_by: state
  aggregate: airport_count
  limit: 5
  nest: by_facility_type is  {
    group_by: fac_type
    aggregate: airport_count
  }
}
QUERY RESULTS
stateairport_​countby_​facility_​type
TX1,845
fac_​typeairport_​count
AIRPORT1,389
HELIPORT435
ULTRALIGHT8
STOLPORT8
GLIDERPORT5
CA984
fac_​typeairport_​count
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
STOLPORT2
ULTRALIGHT2
IL890
fac_​typeairport_​count
AIRPORT625
HELIPORT245
SEAPLANE BASE8
ULTRALIGHT6
BALLOONPORT2
STOLPORT2
GLIDERPORT2
FL856
fac_​typeairport_​count
AIRPORT511
HELIPORT280
SEAPLANE BASE43
STOLPORT13
ULTRALIGHT5
GLIDERPORT4
PA804
fac_​typeairport_​count
AIRPORT468
HELIPORT307
ULTRALIGHT13
SEAPLANE BASE10
GLIDERPORT3
STOLPORT3
[
  {
    "state": "TX",
    "airport_count": 1845,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1389
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 435
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 5
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 984,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 569
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 396
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 12
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "IL",
    "airport_count": 890,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 625
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 245
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 6
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 2
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 856,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 511
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 280
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 43
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 5
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 804,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 468
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 307
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 13
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 10
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 3
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."fac_type"
      END as "fac_type__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "fac_type": "fac_type__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

Queries can contain multiple nested views.

using the above declared airports source

document
run: airports3 -> {
  group_by: faa_region
  aggregate: airport_count
  nest: top_5_states
  nest: by_facility_type
}
QUERY RESULTS
faa_​regionairport_​counttop_​5_​statesby_​facility_​type
AGL4,437
stateairport_​count
IL890
OH749
IN643
WI543
MN507
fac_​typeairport_​count
AIRPORT3,443
HELIPORT826
SEAPLANE BASE119
ULTRALIGHT30
STOLPORT11
GLIDERPORT4
BALLOONPORT4
ASW3,268
stateairport_​count
TX1,845
LA500
OK443
AR299
NM181
fac_​typeairport_​count
AIRPORT2,341
HELIPORT861
ULTRALIGHT32
SEAPLANE BASE19
STOLPORT9
GLIDERPORT6
ASO2,924
stateairport_​count
FL856
GA440
NC400
TN285
AL260
fac_​typeairport_​count
AIRPORT2,038
HELIPORT770
SEAPLANE BASE57
STOLPORT33
ULTRALIGHT17
GLIDERPORT8
BALLOONPORT1
AEA2,586
stateairport_​count
PA804
NY576
VA421
NJ378
MD229
fac_​typeairport_​count
AIRPORT1,525
HELIPORT964
SEAPLANE BASE61
ULTRALIGHT18
STOLPORT8
GLIDERPORT7
BALLOONPORT3
ANM2,102
stateairport_​count
WA484
OR441
CO425
MT259
ID238
fac_​typeairport_​count
AIRPORT1,524
HELIPORT527
SEAPLANE BASE25
STOLPORT13
ULTRALIGHT8
GLIDERPORT4
BALLOONPORT1
[
  {
    "faa_region": "AGL",
    "airport_count": 4437,
    "top_5_states": [
      {
        "state": "IL",
        "airport_count": 890
      },
      {
        "state": "OH",
        "airport_count": 749
      },
      {
        "state": "IN",
        "airport_count": 643
      },
      {
        "state": "WI",
        "airport_count": 543
      },
      {
        "state": "MN",
        "airport_count": 507
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 3443
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 826
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 119
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 30
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 11
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "faa_region": "ASW",
    "airport_count": 3268,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 1845
      },
      {
        "state": "LA",
        "airport_count": 500
      },
      {
        "state": "OK",
        "airport_count": 443
      },
      {
        "state": "AR",
        "airport_count": 299
      },
      {
        "state": "NM",
        "airport_count": 181
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2341
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 861
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 32
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 19
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 9
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 6
      }
    ]
  },
  {
    "faa_region": "ASO",
    "airport_count": 2924,
    "top_5_states": [
      {
        "state": "FL",
        "airport_count": 856
      },
      {
        "state": "GA",
        "airport_count": 440
      },
      {
        "state": "NC",
        "airport_count": 400
      },
      {
        "state": "TN",
        "airport_count": 285
      },
      {
        "state": "AL",
        "airport_count": 260
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2038
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 770
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 57
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 33
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 17
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 8
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  },
  {
    "faa_region": "AEA",
    "airport_count": 2586,
    "top_5_states": [
      {
        "state": "PA",
        "airport_count": 804
      },
      {
        "state": "NY",
        "airport_count": 576
      },
      {
        "state": "VA",
        "airport_count": 421
      },
      {
        "state": "NJ",
        "airport_count": 378
      },
      {
        "state": "MD",
        "airport_count": 229
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1525
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 964
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 61
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 18
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 7
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 3
      }
    ]
  },
  {
    "faa_region": "ANM",
    "airport_count": 2102,
    "top_5_states": [
      {
        "state": "WA",
        "airport_count": 484
      },
      {
        "state": "OR",
        "airport_count": 441
      },
      {
        "state": "CO",
        "airport_count": 425
      },
      {
        "state": "MT",
        "airport_count": 259
      },
      {
        "state": "ID",
        "airport_count": 238
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1524
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 527
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 25
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."faa_region" as "faa_region__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1",
    CASE WHEN group_set=2 THEN
      airports3."fac_type"
      END as "fac_type__2",
    CASE WHEN group_set=2 THEN
      COUNT( 1)
      END as "airport_count__2"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,2,1)) as group_set  ) as group_set
  GROUP BY 1,2,4,6
)
SELECT
  "faa_region__0" as "faa_region",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states",
  COALESCE(LIST({
    "fac_type": "fac_type__2", 
    "airport_count": "airport_count__2"}  ORDER BY  "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Views can be nested to any level of depth.

using the above declared airports source

document
run: airports3 -> {
  group_by: faa_region
  aggregate: airport_count
  nest: by_state_and_county is  {
    group_by: state
    aggregate: airport_count
    nest: by_county is  {
      group_by: county
      aggregate: airport_count
      limit: 4
    }
  }
  nest: by_facility_type
}
QUERY RESULTS
faa_​regionairport_​countby_​state_​and_​countyby_​facility_​type
AGL4,437
stateairport_​countby_​county
IL890
countyairport_​count
COOK51
LA SALLE39
MC HENRY29
DE KALB27
OH749
countyairport_​count
CUYAHOGA27
FRANKLIN27
STARK23
MONTGOMERY22
IN643
countyairport_​count
MARION27
ALLEN24
HAMILTON20
MARSHALL18
WI543
countyairport_​count
DANE30
WALWORTH22
WINNEBAGO17
KENOSHA16
MN507
countyairport_​count
ST LOUIS28
HENNEPIN23
DAKOTA17
CROW WING17
MI489
countyairport_​count
OAKLAND25
KENT24
WAYNE17
KALAMAZOO15
ND436
countyairport_​count
CASS39
RICHLAND20
MC LEAN20
WARD19
SD180
countyairport_​count
MEADE7
PENNINGTON7
MINNEHAHA7
BRULE6
fac_​typeairport_​count
AIRPORT3,443
HELIPORT826
SEAPLANE BASE119
ULTRALIGHT30
STOLPORT11
GLIDERPORT4
BALLOONPORT4
ASW3,268
stateairport_​countby_​county
TX1,845
countyairport_​count
HARRIS135
TARRANT63
DENTON53
DALLAS42
LA500
countyairport_​count
PLAQUEMINES31
VERMILION29
CALCASIEU23
LAFOURCHE21
OK443
countyairport_​count
OKLAHOMA31
TULSA25
ROGERS16
DELAWARE13
AR299
countyairport_​count
PULASKI20
BENTON19
LONOKE13
MISSISSIPPI10
NM181
countyairport_​count
CATRON13
LINCOLN10
DONA ANA9
SANTA FE9
fac_​typeairport_​count
AIRPORT2,341
HELIPORT861
ULTRALIGHT32
SEAPLANE BASE19
STOLPORT9
GLIDERPORT6
ASO2,924
stateairport_​countby_​county
FL856
countyairport_​count
PALM BEACH45
DADE44
POLK43
MARION37
GA440
countyairport_​count
FULTON22
PIKE17
CARROLL14
FAYETTE12
NC400
countyairport_​count
WAKE15
MECKLENBURG14
ROWAN14
UNION14
TN285
countyairport_​count
SHELBY24
DAVIDSON17
KNOX14
RUTHERFORD10
AL260
countyairport_​count
BALDWIN25
JEFFERSON19
MOBILE17
MADISON14
MS243
countyairport_​count
WASHINGTON15
HARRISON12
HINDS11
BOLIVAR9
KY202
countyairport_​count
JEFFERSON13
FAYETTE7
BOONE6
MC LEAN6
SC189
countyairport_​count
GREENVILLE13
BEAUFORT9
LEXINGTON9
CHARLESTON9
PR40
countyairport_​count
--PUERTO RICO40
VI9
countyairport_​count
-VIRGIN ISLANDS-9
fac_​typeairport_​count
AIRPORT2,038
HELIPORT770
SEAPLANE BASE57
STOLPORT33
ULTRALIGHT17
GLIDERPORT8
BALLOONPORT1
AEA2,586
stateairport_​countby_​county
PA804
countyairport_​count
BUCKS55
MONTGOMERY44
ALLEGHENY31
CHESTER27
NY576
countyairport_​count
SUFFOLK34
ERIE26
DUTCHESS20
NIAGARA20
VA421
countyairport_​count
FAUQUIER23
SHENANDOAH13
ACCOMACK12
FAIRFAX12
NJ378
countyairport_​count
MONMOUTH31
MIDDLESEX29
BURLINGTON29
HUNTERDON27
MD229
countyairport_​count
BALTIMORE24
PRINCE GEORGES14
ANNE ARUNDEL14
CARROLL14
WV116
countyairport_​count
KANAWHA8
MASON7
PRESTON6
RALEIGH6
DE42
countyairport_​count
KENT17
SUSSEX14
NEW CASTLE11
DC20
countyairport_​count
WASHINGTON18
ARLINGTON1
LOUDOUN1
fac_​typeairport_​count
AIRPORT1,525
HELIPORT964
SEAPLANE BASE61
ULTRALIGHT18
STOLPORT8
GLIDERPORT7
BALLOONPORT3
ANM2,102
stateairport_​countby_​county
WA484
countyairport_​count
KING61
PIERCE27
SPOKANE26
SNOHOMISH25
OR441
countyairport_​count
CLACKAMAS34
LINN26
WASHINGTON24
LANE24
CO425
countyairport_​count
WELD40
EL PASO26
ADAMS23
JEFFERSON21
MT259
countyairport_​count
FLATHEAD20
LEWIS AND CLARK16
MISSOULA14
GALLATIN11
ID238
countyairport_​count
VALLEY27
KOOTENAI21
IDAHO18
BONNER18
UT140
countyairport_​count
SALT LAKE21
SAN JUAN13
UTAH10
GRAND9
WY115
countyairport_​count
LARAMIE12
PARK9
CAMPBELL9
CARBON9
fac_​typeairport_​count
AIRPORT1,524
HELIPORT527
SEAPLANE BASE25
STOLPORT13
ULTRALIGHT8
GLIDERPORT4
BALLOONPORT1
[
  {
    "faa_region": "AGL",
    "airport_count": 4437,
    "by_state_and_county": [
      {
        "state": "IL",
        "airport_count": 890,
        "by_county": [
          {
            "county": "COOK",
            "airport_count": 51
          },
          {
            "county": "LA SALLE",
            "airport_count": 39
          },
          {
            "county": "MC HENRY",
            "airport_count": 29
          },
          {
            "county": "DE KALB",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "OH",
        "airport_count": 749,
        "by_county": [
          {
            "county": "CUYAHOGA",
            "airport_count": 27
          },
          {
            "county": "FRANKLIN",
            "airport_count": 27
          },
          {
            "county": "STARK",
            "airport_count": 23
          },
          {
            "county": "MONTGOMERY",
            "airport_count": 22
          }
        ]
      },
      {
        "state": "IN",
        "airport_count": 643,
        "by_county": [
          {
            "county": "MARION",
            "airport_count": 27
          },
          {
            "county": "ALLEN",
            "airport_count": 24
          },
          {
            "county": "HAMILTON",
            "airport_count": 20
          },
          {
            "county": "MARSHALL",
            "airport_count": 18
          }
        ]
      },
      {
        "state": "WI",
        "airport_count": 543,
        "by_county": [
          {
            "county": "DANE",
            "airport_count": 30
          },
          {
            "county": "WALWORTH",
            "airport_count": 22
          },
          {
            "county": "WINNEBAGO",
            "airport_count": 17
          },
          {
            "county": "KENOSHA",
            "airport_count": 16
          }
        ]
      },
      {
        "state": "MN",
        "airport_count": 507,
        "by_county": [
          {
            "county": "ST LOUIS",
            "airport_count": 28
          },
          {
            "county": "HENNEPIN",
            "airport_count": 23
          },
          {
            "county": "DAKOTA",
            "airport_count": 17
          },
          {
            "county": "CROW WING",
            "airport_count": 17
          }
        ]
      },
      {
        "state": "MI",
        "airport_count": 489,
        "by_county": [
          {
            "county": "OAKLAND",
            "airport_count": 25
          },
          {
            "county": "KENT",
            "airport_count": 24
          },
          {
            "county": "WAYNE",
            "airport_count": 17
          },
          {
            "county": "KALAMAZOO",
            "airport_count": 15
          }
        ]
      },
      {
        "state": "ND",
        "airport_count": 436,
        "by_county": [
          {
            "county": "CASS",
            "airport_count": 39
          },
          {
            "county": "RICHLAND",
            "airport_count": 20
          },
          {
            "county": "MC LEAN",
            "airport_count": 20
          },
          {
            "county": "WARD",
            "airport_count": 19
          }
        ]
      },
      {
        "state": "SD",
        "airport_count": 180,
        "by_county": [
          {
            "county": "MEADE",
            "airport_count": 7
          },
          {
            "county": "PENNINGTON",
            "airport_count": 7
          },
          {
            "county": "MINNEHAHA",
            "airport_count": 7
          },
          {
            "county": "BRULE",
            "airport_count": 6
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 3443
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 826
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 119
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 30
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 11
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "faa_region": "ASW",
    "airport_count": 3268,
    "by_state_and_county": [
      {
        "state": "TX",
        "airport_count": 1845,
        "by_county": [
          {
            "county": "HARRIS",
            "airport_count": 135
          },
          {
            "county": "TARRANT",
            "airport_count": 63
          },
          {
            "county": "DENTON",
            "airport_count": 53
          },
          {
            "county": "DALLAS",
            "airport_count": 42
          }
        ]
      },
      {
        "state": "LA",
        "airport_count": 500,
        "by_county": [
          {
            "county": "PLAQUEMINES",
            "airport_count": 31
          },
          {
            "county": "VERMILION",
            "airport_count": 29
          },
          {
            "county": "CALCASIEU",
            "airport_count": 23
          },
          {
            "county": "LAFOURCHE",
            "airport_count": 21
          }
        ]
      },
      {
        "state": "OK",
        "airport_count": 443,
        "by_county": [
          {
            "county": "OKLAHOMA",
            "airport_count": 31
          },
          {
            "county": "TULSA",
            "airport_count": 25
          },
          {
            "county": "ROGERS",
            "airport_count": 16
          },
          {
            "county": "DELAWARE",
            "airport_count": 13
          }
        ]
      },
      {
        "state": "AR",
        "airport_count": 299,
        "by_county": [
          {
            "county": "PULASKI",
            "airport_count": 20
          },
          {
            "county": "BENTON",
            "airport_count": 19
          },
          {
            "county": "LONOKE",
            "airport_count": 13
          },
          {
            "county": "MISSISSIPPI",
            "airport_count": 10
          }
        ]
      },
      {
        "state": "NM",
        "airport_count": 181,
        "by_county": [
          {
            "county": "CATRON",
            "airport_count": 13
          },
          {
            "county": "LINCOLN",
            "airport_count": 10
          },
          {
            "county": "DONA ANA",
            "airport_count": 9
          },
          {
            "county": "SANTA FE",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2341
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 861
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 32
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 19
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 9
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 6
      }
    ]
  },
  {
    "faa_region": "ASO",
    "airport_count": 2924,
    "by_state_and_county": [
      {
        "state": "FL",
        "airport_count": 856,
        "by_county": [
          {
            "county": "PALM BEACH",
            "airport_count": 45
          },
          {
            "county": "DADE",
            "airport_count": 44
          },
          {
            "county": "POLK",
            "airport_count": 43
          },
          {
            "county": "MARION",
            "airport_count": 37
          }
        ]
      },
      {
        "state": "GA",
        "airport_count": 440,
        "by_county": [
          {
            "county": "FULTON",
            "airport_count": 22
          },
          {
            "county": "PIKE",
            "airport_count": 17
          },
          {
            "county": "CARROLL",
            "airport_count": 14
          },
          {
            "county": "FAYETTE",
            "airport_count": 12
          }
        ]
      },
      {
        "state": "NC",
        "airport_count": 400,
        "by_county": [
          {
            "county": "WAKE",
            "airport_count": 15
          },
          {
            "county": "MECKLENBURG",
            "airport_count": 14
          },
          {
            "county": "ROWAN",
            "airport_count": 14
          },
          {
            "county": "UNION",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "TN",
        "airport_count": 285,
        "by_county": [
          {
            "county": "SHELBY",
            "airport_count": 24
          },
          {
            "county": "DAVIDSON",
            "airport_count": 17
          },
          {
            "county": "KNOX",
            "airport_count": 14
          },
          {
            "county": "RUTHERFORD",
            "airport_count": 10
          }
        ]
      },
      {
        "state": "AL",
        "airport_count": 260,
        "by_county": [
          {
            "county": "BALDWIN",
            "airport_count": 25
          },
          {
            "county": "JEFFERSON",
            "airport_count": 19
          },
          {
            "county": "MOBILE",
            "airport_count": 17
          },
          {
            "county": "MADISON",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "MS",
        "airport_count": 243,
        "by_county": [
          {
            "county": "WASHINGTON",
            "airport_count": 15
          },
          {
            "county": "HARRISON",
            "airport_count": 12
          },
          {
            "county": "HINDS",
            "airport_count": 11
          },
          {
            "county": "BOLIVAR",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "KY",
        "airport_count": 202,
        "by_county": [
          {
            "county": "JEFFERSON",
            "airport_count": 13
          },
          {
            "county": "FAYETTE",
            "airport_count": 7
          },
          {
            "county": "BOONE",
            "airport_count": 6
          },
          {
            "county": "MC LEAN",
            "airport_count": 6
          }
        ]
      },
      {
        "state": "SC",
        "airport_count": 189,
        "by_county": [
          {
            "county": "GREENVILLE",
            "airport_count": 13
          },
          {
            "county": "BEAUFORT",
            "airport_count": 9
          },
          {
            "county": "LEXINGTON",
            "airport_count": 9
          },
          {
            "county": "CHARLESTON",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "PR",
        "airport_count": 40,
        "by_county": [
          {
            "county": "--PUERTO RICO",
            "airport_count": 40
          }
        ]
      },
      {
        "state": "VI",
        "airport_count": 9,
        "by_county": [
          {
            "county": "-VIRGIN ISLANDS-",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2038
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 770
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 57
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 33
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 17
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 8
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  },
  {
    "faa_region": "AEA",
    "airport_count": 2586,
    "by_state_and_county": [
      {
        "state": "PA",
        "airport_count": 804,
        "by_county": [
          {
            "county": "BUCKS",
            "airport_count": 55
          },
          {
            "county": "MONTGOMERY",
            "airport_count": 44
          },
          {
            "county": "ALLEGHENY",
            "airport_count": 31
          },
          {
            "county": "CHESTER",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "NY",
        "airport_count": 576,
        "by_county": [
          {
            "county": "SUFFOLK",
            "airport_count": 34
          },
          {
            "county": "ERIE",
            "airport_count": 26
          },
          {
            "county": "DUTCHESS",
            "airport_count": 20
          },
          {
            "county": "NIAGARA",
            "airport_count": 20
          }
        ]
      },
      {
        "state": "VA",
        "airport_count": 421,
        "by_county": [
          {
            "county": "FAUQUIER",
            "airport_count": 23
          },
          {
            "county": "SHENANDOAH",
            "airport_count": 13
          },
          {
            "county": "ACCOMACK",
            "airport_count": 12
          },
          {
            "county": "FAIRFAX",
            "airport_count": 12
          }
        ]
      },
      {
        "state": "NJ",
        "airport_count": 378,
        "by_county": [
          {
            "county": "MONMOUTH",
            "airport_count": 31
          },
          {
            "county": "MIDDLESEX",
            "airport_count": 29
          },
          {
            "county": "BURLINGTON",
            "airport_count": 29
          },
          {
            "county": "HUNTERDON",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "MD",
        "airport_count": 229,
        "by_county": [
          {
            "county": "BALTIMORE",
            "airport_count": 24
          },
          {
            "county": "PRINCE GEORGES",
            "airport_count": 14
          },
          {
            "county": "ANNE ARUNDEL",
            "airport_count": 14
          },
          {
            "county": "CARROLL",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "WV",
        "airport_count": 116,
        "by_county": [
          {
            "county": "KANAWHA",
            "airport_count": 8
          },
          {
            "county": "MASON",
            "airport_count": 7
          },
          {
            "county": "PRESTON",
            "airport_count": 6
          },
          {
            "county": "RALEIGH",
            "airport_count": 6
          }
        ]
      },
      {
        "state": "DE",
        "airport_count": 42,
        "by_county": [
          {
            "county": "KENT",
            "airport_count": 17
          },
          {
            "county": "SUSSEX",
            "airport_count": 14
          },
          {
            "county": "NEW CASTLE",
            "airport_count": 11
          }
        ]
      },
      {
        "state": "DC",
        "airport_count": 20,
        "by_county": [
          {
            "county": "WASHINGTON",
            "airport_count": 18
          },
          {
            "county": "ARLINGTON",
            "airport_count": 1
          },
          {
            "county": "LOUDOUN",
            "airport_count": 1
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1525
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 964
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 61
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 18
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 7
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 3
      }
    ]
  },
  {
    "faa_region": "ANM",
    "airport_count": 2102,
    "by_state_and_county": [
      {
        "state": "WA",
        "airport_count": 484,
        "by_county": [
          {
            "county": "KING",
            "airport_count": 61
          },
          {
            "county": "PIERCE",
            "airport_count": 27
          },
          {
            "county": "SPOKANE",
            "airport_count": 26
          },
          {
            "county": "SNOHOMISH",
            "airport_count": 25
          }
        ]
      },
      {
        "state": "OR",
        "airport_count": 441,
        "by_county": [
          {
            "county": "CLACKAMAS",
            "airport_count": 34
          },
          {
            "county": "LINN",
            "airport_count": 26
          },
          {
            "county": "WASHINGTON",
            "airport_count": 24
          },
          {
            "county": "LANE",
            "airport_count": 24
          }
        ]
      },
      {
        "state": "CO",
        "airport_count": 425,
        "by_county": [
          {
            "county": "WELD",
            "airport_count": 40
          },
          {
            "county": "EL PASO",
            "airport_count": 26
          },
          {
            "county": "ADAMS",
            "airport_count": 23
          },
          {
            "county": "JEFFERSON",
            "airport_count": 21
          }
        ]
      },
      {
        "state": "MT",
        "airport_count": 259,
        "by_county": [
          {
            "county": "FLATHEAD",
            "airport_count": 20
          },
          {
            "county": "LEWIS AND CLARK",
            "airport_count": 16
          },
          {
            "county": "MISSOULA",
            "airport_count": 14
          },
          {
            "county": "GALLATIN",
            "airport_count": 11
          }
        ]
      },
      {
        "state": "ID",
        "airport_count": 238,
        "by_county": [
          {
            "county": "VALLEY",
            "airport_count": 27
          },
          {
            "county": "KOOTENAI",
            "airport_count": 21
          },
          {
            "county": "IDAHO",
            "airport_count": 18
          },
          {
            "county": "BONNER",
            "airport_count": 18
          }
        ]
      },
      {
        "state": "UT",
        "airport_count": 140,
        "by_county": [
          {
            "county": "SALT LAKE",
            "airport_count": 21
          },
          {
            "county": "SAN JUAN",
            "airport_count": 13
          },
          {
            "county": "UTAH",
            "airport_count": 10
          },
          {
            "county": "GRAND",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "WY",
        "airport_count": 115,
        "by_county": [
          {
            "county": "LARAMIE",
            "airport_count": 12
          },
          {
            "county": "PARK",
            "airport_count": 9
          },
          {
            "county": "CAMPBELL",
            "airport_count": 9
          },
          {
            "county": "CARBON",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1524
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 527
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 25
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."faa_region" as "faa_region__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set IN (1,2) THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1",
    CASE WHEN group_set=2 THEN
      airports3."county"
      END as "county__2",
    CASE WHEN group_set=2 THEN
      COUNT( 1)
      END as "airport_count__2",
    CASE WHEN group_set=3 THEN
      airports3."fac_type"
      END as "fac_type__3",
    CASE WHEN group_set=3 THEN
      COUNT( 1)
      END as "airport_count__3"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,3,1)) as group_set  ) as group_set
  GROUP BY 1,2,4,6,8
)
, __stage1 AS (
  SELECT 
    CASE WHEN group_set=2 THEN 1  ELSE group_set END as group_set,
    "faa_region__0" as "faa_region__0",
    FIRST("airport_count__0") FILTER (WHERE "airport_count__0" IS NOT NULL) as "airport_count__0",
    CASE WHEN group_set IN (1,2) THEN
      "state__1"
      END as "state__1",
    FIRST("airport_count__1") FILTER (WHERE "airport_count__1" IS NOT NULL) as "airport_count__1",
    COALESCE(LIST({
      "county": "county__2", 
      "airport_count": "airport_count__2"}  ORDER BY  "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2)[1:4],[]) as "by_county__1",
    CASE WHEN group_set=3 THEN
      "fac_type__3"
      END as "fac_type__3",
    FIRST("airport_count__3") FILTER (WHERE "airport_count__3" IS NOT NULL) as "airport_count__3"
  FROM __stage0
  GROUP BY 1,2,4,7
)
SELECT
  "faa_region__0" as "faa_region",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1", 
    "by_county": "by_county__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_state_and_county",
  COALESCE(LIST({
    "fac_type": "fac_type__3", 
    "airport_count": "airport_count__3"}  ORDER BY  "airport_count__3" desc NULLS LAST) FILTER (WHERE group_set=3),[]) as "by_facility_type"
FROM __stage1
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Refining a View

The gesture + { refinements } allows you to base a new view on an existing view while adding new refinements to the query terms.

For example we can add a limit and an order by to by_state

document
run: airports2 -> by_state + {
  order_by: state desc    // <-- add order by to query
  limit: 2
}
QUERY RESULTS
stateairport_​count
WY115
WV116
[
  {
    "state": "WY",
    "airport_count": 115
  },
  {
    "state": "WV",
    "airport_count": 116
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2

is the same as

document
run: airports -> {
  group_by: state
  aggregate: airport_count
  order_by: state desc
  limit: 2
}
QUERY RESULTS
stateairport_​count
WY115
WV116
[
  {
    "state": "WY",
    "airport_count": 115
  },
  {
    "state": "WV",
    "airport_count": 116
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2

You can add a measure or dimension

document
run: airports3 -> by_facility_type + {
  aggregate: avg_elevation
}
QUERY RESULTS
fac_​typeairport_​countavg_​elevation
AIRPORT13,9251,237.044
HELIPORT5,135950.513
SEAPLANE BASE473488.822
ULTRALIGHT125806.144
STOLPORT861,375.047
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 13925,
    "avg_elevation": 1237.0441651705567
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 5135,
    "avg_elevation": 950.5125608568646
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 473,
    "avg_elevation": 488.82241014799155
  },
  {
    "fac_type": "ULTRALIGHT",
    "airport_count": 125,
    "avg_elevation": 806.144
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 86,
    "avg_elevation": 1375.046511627907
  }
]
SELECT 
   airports3."fac_type" as "fac_type",
   COUNT( 1) as "airport_count",
   AVG(airports3."elevation") as "avg_elevation"
FROM '../data/airports.parquet' as airports3
GROUP BY 1
ORDER BY 2 desc NULLS LAST

You can nest another view

document
run: airports3 -> top_5_states + {
  nest: by_facility_type
}
QUERY RESULTS
stateairport_​countby_​facility_​type
TX1,845
fac_​typeairport_​count
AIRPORT1,389
HELIPORT435
ULTRALIGHT8
STOLPORT8
GLIDERPORT5
CA984
fac_​typeairport_​count
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
ULTRALIGHT2
STOLPORT2
IL890
fac_​typeairport_​count
AIRPORT625
HELIPORT245
SEAPLANE BASE8
ULTRALIGHT6
BALLOONPORT2
GLIDERPORT2
STOLPORT2
FL856
fac_​typeairport_​count
AIRPORT511
HELIPORT280
SEAPLANE BASE43
STOLPORT13
ULTRALIGHT5
GLIDERPORT4
PA804
fac_​typeairport_​count
AIRPORT468
HELIPORT307
ULTRALIGHT13
SEAPLANE BASE10
STOLPORT3
GLIDERPORT3
[
  {
    "state": "TX",
    "airport_count": 1845,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1389
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 435
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 5
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 984,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 569
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 396
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 12
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 2
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "IL",
    "airport_count": 890,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 625
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 245
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 6
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 2
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 2
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 856,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 511
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 280
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 43
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 5
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 804,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 468
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 307
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 13
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 10
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 3
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."fac_type"
      END as "fac_type__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "fac_type": "fac_type__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

Changing the inner and outer query in the example above reveals very different information.

document
run: airports3 -> by_facility_type + {
  nest: top_5_states
}
QUERY RESULTS
fac_​typeairport_​counttop_​5_​states
AIRPORT13,925
stateairport_​count
TX1,389
IL625
CA569
OH537
FL511
HELIPORT5,135
stateairport_​count
TX435
CA396
PA307
FL280
NJ247
SEAPLANE BASE473
stateairport_​count
AK104
MN72
FL43
ME38
NY23
ULTRALIGHT125
stateairport_​count
LA18
IN17
PA13
TX8
AZ7
STOLPORT86
stateairport_​count
FL13
TN9
TX8
CO6
GA4
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 13925,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 1389
      },
      {
        "state": "IL",
        "airport_count": 625
      },
      {
        "state": "CA",
        "airport_count": 569
      },
      {
        "state": "OH",
        "airport_count": 537
      },
      {
        "state": "FL",
        "airport_count": 511
      }
    ]
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 5135,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 435
      },
      {
        "state": "CA",
        "airport_count": 396
      },
      {
        "state": "PA",
        "airport_count": 307
      },
      {
        "state": "FL",
        "airport_count": 280
      },
      {
        "state": "NJ",
        "airport_count": 247
      }
    ]
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 473,
    "top_5_states": [
      {
        "state": "AK",
        "airport_count": 104
      },
      {
        "state": "MN",
        "airport_count": 72
      },
      {
        "state": "FL",
        "airport_count": 43
      },
      {
        "state": "ME",
        "airport_count": 38
      },
      {
        "state": "NY",
        "airport_count": 23
      }
    ]
  },
  {
    "fac_type": "ULTRALIGHT",
    "airport_count": 125,
    "top_5_states": [
      {
        "state": "LA",
        "airport_count": 18
      },
      {
        "state": "IN",
        "airport_count": 17
      },
      {
        "state": "PA",
        "airport_count": 13
      },
      {
        "state": "TX",
        "airport_count": 8
      },
      {
        "state": "AZ",
        "airport_count": 7
      }
    ]
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 86,
    "top_5_states": [
      {
        "state": "FL",
        "airport_count": 13
      },
      {
        "state": "TN",
        "airport_count": 9
      },
      {
        "state": "TX",
        "airport_count": 8
      },
      {
        "state": "CO",
        "airport_count": 6
      },
      {
        "state": "GA",
        "airport_count": 4
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."fac_type" as "fac_type__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "fac_type__0" as "fac_type",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Joining

First let's model some simple tables... (Join Documentation)

Carrier table

simple source declaration used in example below

document
source: carriers is duckdb.table('../data/carriers.parquet') extend {
  measure: carrier_count is count()
}

run: carriers -> {
  select: *
}
QUERY RESULTS
codenamenickname
EVAtlantic Southeast AirlinesAtlantic Southeast
NWNorthwest AirlinesNorthwest
AAAmerican AirlinesAmerican
FLAirtran Airways CorporationAirtran
B6Jetblue AirwaysJetblue
[
  {
    "code": "EV",
    "name": "Atlantic Southeast Airlines",
    "nickname": "Atlantic Southeast"
  },
  {
    "code": "NW",
    "name": "Northwest Airlines",
    "nickname": "Northwest"
  },
  {
    "code": "AA",
    "name": "American Airlines",
    "nickname": "American"
  },
  {
    "code": "FL",
    "name": "Airtran Airways Corporation",
    "nickname": "Airtran"
  },
  {
    "code": "B6",
    "name": "Jetblue Airways",
    "nickname": "Jetblue"
  }
]
SELECT 
   carriers."code" as "code",
   carriers."name" as "name",
   carriers."nickname" as "nickname"
FROM '../data/carriers.parquet' as carriers

Flights table

simple source declaration used in example below

document
source: flights is duckdb.table('../data/flights.parquet') extend {
  measure: flight_count is count()
}

run: flights -> {
  select: id2, tail_num, dep_time, carrier, origin, destination, distance, dep_delay
  limit: 10
}
QUERY RESULTS
id2tail_​numdep_​timecarrierorigindestinationdistancedep_​delay
30,272,525N806MD2004-11-18 22:32:00USPHLABE55-3
29,742,442N806MD2004-10-12 20:46:00USPHLABE556
30,270,885N816MA2004-11-24 10:20:00USPHLABE550
28,344,746N806MD2004-08-31 20:30:00USPHLABE550
27,898,410N806MD2004-07-27 10:21:00USPHLABE55-4
[
  {
    "id2": 30272525,
    "tail_num": "N806MD",
    "dep_time": "2004-11-18T22:32:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": -3
  },
  {
    "id2": 29742442,
    "tail_num": "N806MD",
    "dep_time": "2004-10-12T20:46:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 6
  },
  {
    "id2": 30270885,
    "tail_num": "N816MA",
    "dep_time": "2004-11-24T10:20:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 0
  },
  {
    "id2": 28344746,
    "tail_num": "N806MD",
    "dep_time": "2004-08-31T20:30:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 0
  },
  {
    "id2": 27898410,
    "tail_num": "N806MD",
    "dep_time": "2004-07-27T10:21:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": -4
  }
]
SELECT 
   flights."id2" as "id2",
   flights."tail_num" as "tail_num",
   flights."dep_time" as "dep_time",
   flights."carrier" as "carrier",
   flights."origin" as "origin",
   flights."destination" as "destination",
   flights."distance" as "distance",
   flights."dep_delay" as "dep_delay"
FROM '../data/flights.parquet' as flights
LIMIT 10

Declare a Join

Join carriers to flights. Each flight has one carrier so we use join_one:. (Join Documentation)

document
source: carriers2 is duckdb.table('../data/carriers.parquet') extend {
  measure: carrier_count is count()
}

source: flights2 is duckdb.table('../data/flights.parquet') extend {
  join_one: carriers2 on carrier = carriers2.code

  measure:
    flight_count is count()
    total_distance is distance.sum()
    avg_distance is distance.avg()
}

Query the joined tables

using the above declared flights source

document
run: flights2 -> {
  group_by: carriers2.nickname
  aggregate:
    flight_count
    total_distance
    avg_distance
}
QUERY RESULTS
nicknameflight_​counttotal_​distanceavg_​distance
Southwest88,75154,619,152615.42
USAir37,68323,721,642629.505
American34,57737,684,8851,089.883
Northwest33,58033,376,503993.94
United32,75738,882,9341,187.011
[
  {
    "nickname": "Southwest",
    "flight_count": 88751,
    "total_distance": 54619152,
    "avg_distance": 615.4201304774031
  },
  {
    "nickname": "USAir",
    "flight_count": 37683,
    "total_distance": 23721642,
    "avg_distance": 629.5051349414855
  },
  {
    "nickname": "American",
    "flight_count": 34577,
    "total_distance": 37684885,
    "avg_distance": 1089.8830147207682
  },
  {
    "nickname": "Northwest",
    "flight_count": 33580,
    "total_distance": 33376503,
    "avg_distance": 993.9399344848124
  },
  {
    "nickname": "United",
    "flight_count": 32757,
    "total_distance": 38882934,
    "avg_distance": 1187.0114479347926
  }
]
SELECT 
   carriers2_0."nickname" as "nickname",
   COUNT( 1) as "flight_count",
   COALESCE(SUM(flights2."distance"),0) as "total_distance",
   AVG(flights2."distance") as "avg_distance"
FROM '../data/flights.parquet' as flights2
 LEFT JOIN '../data/carriers.parquet' AS carriers2_0
  ON flights2."carrier"=carriers2_0."code"
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Aggregates can be computed from anywhere in the Join Tree

(Aggregate Documentation)

using the above declared flights source

document
run: flights2 -> {
  limit: 10
  group_by: origin
  aggregate: carriers2.carrier_count   // <-- calculation in joined table
  nest: top_3_carriers is  {
    limit: 3
    group_by: carriers2.nickname
    aggregate:
      flight_count
      total_distance
      avg_distance
  }
}
QUERY RESULTS
origincarrier_​counttop_​3_​carriers
DFW13
nicknameflight_​counttotal_​distanceavg_​distance
American8,7428,419,987963.165
American Eagle6,1461,464,011238.205
Delta668563,628843.754
IAH12
nicknameflight_​counttotal_​distanceavg_​distance
Continental Express2,8611,634,764571.396
Continental1,9701,699,834862.86
USAir600695,1411,158.568
SEA12
nicknameflight_​counttotal_​distanceavg_​distance
Alaska3,0302,633,111869.014
Southwest1,4091,064,397755.427
Northwest7511,228,9181,636.375
ATL12
nicknameflight_​counttotal_​distanceavg_​distance
Delta8,4195,793,004688.087
Atlantic Southeast7,3921,473,933199.396
USAir521257,058493.393
LAX12
nicknameflight_​counttotal_​distanceavg_​distance
Southwest4,2822,637,054615.846
United2,3193,441,4491,484.023
American1,9513,451,1941,768.936
[
  {
    "origin": "DFW",
    "carrier_count": 13,
    "top_3_carriers": [
      {
        "nickname": "American",
        "flight_count": 8742,
        "total_distance": 8419987,
        "avg_distance": 963.1648364218714
      },
      {
        "nickname": "American Eagle",
        "flight_count": 6146,
        "total_distance": 1464011,
        "avg_distance": 238.20549951187763
      },
      {
        "nickname": "Delta",
        "flight_count": 668,
        "total_distance": 563628,
        "avg_distance": 843.754491017964
      }
    ]
  },
  {
    "origin": "IAH",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Continental Express",
        "flight_count": 2861,
        "total_distance": 1634764,
        "avg_distance": 571.3960153792381
      },
      {
        "nickname": "Continental",
        "flight_count": 1970,
        "total_distance": 1699834,
        "avg_distance": 862.8598984771573
      },
      {
        "nickname": "USAir",
        "flight_count": 600,
        "total_distance": 695141,
        "avg_distance": 1158.5683333333334
      }
    ]
  },
  {
    "origin": "SEA",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Alaska",
        "flight_count": 3030,
        "total_distance": 2633111,
        "avg_distance": 869.0135313531354
      },
      {
        "nickname": "Southwest",
        "flight_count": 1409,
        "total_distance": 1064397,
        "avg_distance": 755.4272533711852
      },
      {
        "nickname": "Northwest",
        "flight_count": 751,
        "total_distance": 1228918,
        "avg_distance": 1636.375499334221
      }
    ]
  },
  {
    "origin": "ATL",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Delta",
        "flight_count": 8419,
        "total_distance": 5793004,
        "avg_distance": 688.0869461931346
      },
      {
        "nickname": "Atlantic Southeast",
        "flight_count": 7392,
        "total_distance": 1473933,
        "avg_distance": 199.39569805194805
      },
      {
        "nickname": "USAir",
        "flight_count": 521,
        "total_distance": 257058,
        "avg_distance": 493.3934740882917
      }
    ]
  },
  {
    "origin": "LAX",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Southwest",
        "flight_count": 4282,
        "total_distance": 2637054,
        "avg_distance": 615.8463334890238
      },
      {
        "nickname": "United",
        "flight_count": 2319,
        "total_distance": 3441449,
        "avg_distance": 1484.022854678741
      },
      {
        "nickname": "American",
        "flight_count": 1951,
        "total_distance": 3451194,
        "avg_distance": 1768.9359302921578
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    flights2."origin" as "origin__0",
    CASE WHEN group_set=0 THEN
      COUNT(DISTINCT carriers2_0."__distinct_key")
      END as "carrier_count__0",
    CASE WHEN group_set=1 THEN
      carriers2_0."nickname"
      END as "nickname__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "flight_count__1",
    CASE WHEN group_set=1 THEN
      COALESCE(SUM(flights2."distance"),0)
      END as "total_distance__1",
    CASE WHEN group_set=1 THEN
      AVG(flights2."distance")
      END as "avg_distance__1"
  FROM '../data/flights.parquet' as flights2
   LEFT JOIN (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM '../data/carriers.parquet' as x) AS carriers2_0
    ON flights2."carrier"=carriers2_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "origin__0" as "origin",
  MAX(CASE WHEN group_set=0 THEN "carrier_count__0" END) as "carrier_count",
  COALESCE(LIST({
    "nickname": "nickname__1", 
    "flight_count": "flight_count__1", 
    "total_distance": "total_distance__1", 
    "avg_distance": "avg_distance__1"}  ORDER BY  "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_carriers"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 10

More Complex Joins

The most common join pattern is a foreign key join. Malloy uses the with: to declare these and generates more efficient SQL when these joins are used.

In the example below, we use a with: join for carriers and then model the more complex relationship with the flights originating from each airport using on:.

Many flights have the same airport as their origin so we use join_many:.

document
source: carriers4 is duckdb.table('../data/carriers.parquet') extend {
  primary_key: code
  measure: carrier_count is count()
}

source: flights4 is duckdb.table('../data/flights.parquet') extend {
  join_one: carriers4 with carrier  // <-- each flight has 1 carrier

  measure:
    flight_count is count()
    total_distance is distance.sum()
    avg_distance is distance.avg()
}

source: airports4 is duckdb.table('../data/airports.parquet') extend {
  join_many: flights4 on code = flights4.origin  // <-- each airport has many flights

  measure: airport_count is count()
  dimension: elevation_in_meters is elevation * 0.3048
  measure: avg_elevation_in_meters is elevation_in_meters.avg()

  view: by_state is {
    group_by: state
    aggregate: airport_count
  }
}

Calculations work properly regardless of where you are in the graph

This query is very difficult to express in SQL. Malloy's understanding of source relationships allows it to compute aggregate computations at any node of the join path, unlike SQL which can only do aggregate computation at the. outermost level. (Aggregate Documentation)

using the above declared airports source

document
run: airports4 ->  {
  group_by: state
  aggregate:
    flights4.carriers4.carrier_count  // <-- 3 levels
    flights4.flight_count
    flights4.total_distance
    airport_count
    avg_elevation_in_meters         // <-- symmetric calculation
}
QUERY RESULTS
statecarrier_​countflight_​counttotal_​distanceairport_​countavg_​elevation_​in_​meters
TX1440,08524,171,1821,845273.799
CA1440,67037,690,414984331.533
IN132,3241,495,747643231.508
LA134,2462,644,46050019.65
OH127,3273,562,086749279.498
[
  {
    "state": "TX",
    "carrier_count": 14,
    "flight_count": 40085,
    "total_distance": 24171182,
    "airport_count": 1845,
    "avg_elevation_in_meters": 273.7994445528455
  },
  {
    "state": "CA",
    "carrier_count": 14,
    "flight_count": 40670,
    "total_distance": 37690414,
    "airport_count": 984,
    "avg_elevation_in_meters": 331.5331902439024
  },
  {
    "state": "IN",
    "carrier_count": 13,
    "flight_count": 2324,
    "total_distance": 1495747,
    "airport_count": 643,
    "avg_elevation_in_meters": 231.50816174183515
  },
  {
    "state": "LA",
    "carrier_count": 13,
    "flight_count": 4246,
    "total_distance": 2644460,
    "airport_count": 500,
    "avg_elevation_in_meters": 19.650456
  },
  {
    "state": "OH",
    "carrier_count": 12,
    "flight_count": 7327,
    "total_distance": 3562086,
    "airport_count": 749,
    "avg_elevation_in_meters": 279.4975305740988
  }
]
SELECT 
   airports4."state" as "state",
   COUNT(DISTINCT carriers4_0."code") as "carrier_count",
   COUNT(DISTINCT flights4_0."__distinct_key") as "flight_count",
   COALESCE(SUM(flights4_0."distance"),0) as "total_distance",
   COUNT(DISTINCT airports4."__distinct_key") as "airport_count",
   (
        SELECT AVG(a.val) as value
        FROM (
          SELECT UNNEST(list(distinct {key:airports4."__distinct_key", val: (airports4."elevation"*0.3048)})) a
        )
      ) as "avg_elevation_in_meters"
FROM (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM '../data/airports.parquet' as x) as airports4
 LEFT JOIN (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM '../data/flights.parquet' as x) AS flights4_0
  ON airports4."code"=flights4_0."origin"
 LEFT JOIN '../data/carriers.parquet' AS carriers4_0
  ON carriers4_0."code"=flights4_0."carrier"
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Pipelines

The output of a query can be used as the source for the next query.

Assume the following query as a starting point.

document
source: airports5 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
}

run: airports5 -> {
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate: airport_count
  nest: top_3_county is {
    limit: 3
    group_by: county
    aggregate: airport_count
  }
}
QUERY RESULTS
stateairport_​counttop_​3_​county
TX435
countyairport_​count
HARRIS110
TARRANT35
DALLAS32
CA396
countyairport_​count
LOS ANGELES151
ORANGE47
SAN BERNARDINO24
PA307
countyairport_​count
MONTGOMERY29
PHILADELPHIA22
ALLEGHENY22
FL280
countyairport_​count
PALM BEACH30
DADE27
ORANGE24
NJ247
countyairport_​count
MIDDLESEX26
SOMERSET23
MONMOUTH23
[
  {
    "state": "TX",
    "airport_count": 435,
    "top_3_county": [
      {
        "county": "HARRIS",
        "airport_count": 110
      },
      {
        "county": "TARRANT",
        "airport_count": 35
      },
      {
        "county": "DALLAS",
        "airport_count": 32
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 396,
    "top_3_county": [
      {
        "county": "LOS ANGELES",
        "airport_count": 151
      },
      {
        "county": "ORANGE",
        "airport_count": 47
      },
      {
        "county": "SAN BERNARDINO",
        "airport_count": 24
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 307,
    "top_3_county": [
      {
        "county": "MONTGOMERY",
        "airport_count": 29
      },
      {
        "county": "PHILADELPHIA",
        "airport_count": 22
      },
      {
        "county": "ALLEGHENY",
        "airport_count": 22
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 280,
    "top_3_county": [
      {
        "county": "PALM BEACH",
        "airport_count": 30
      },
      {
        "county": "DADE",
        "airport_count": 27
      },
      {
        "county": "ORANGE",
        "airport_count": 24
      }
    ]
  },
  {
    "state": "NJ",
    "airport_count": 247,
    "top_3_county": [
      {
        "county": "MIDDLESEX",
        "airport_count": 26
      },
      {
        "county": "SOMERSET",
        "airport_count": 23
      },
      {
        "county": "MONMOUTH",
        "airport_count": 23
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports5."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports5."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports5
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports5."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "county": "county__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Un-nesting in a pipeline flattens the table

Queries can be chained together (pipelined), the output of one becoming the input of the next one, by simply adding another -> operator and a new query definition.

document
source: airports6 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
}

run: airports6 -> {
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate: airport_count
  nest: top_3_county is {
    limit: 3
    group_by: county
    aggregate: airport_count
  }
} -> {
  select:
    state
    top_3_county.county
    airports_in_state is airport_count
    airports_in_county is top_3_county.airport_count
    percent_of_state is top_3_county.airport_count / airport_count
}
QUERY RESULTS
statecountyairports_​in_​stateairports_​in_​countypercent_​of_​state
AKJUNEAU3030.1
COLARIMER16590.055
CTFAIRFIELD89180.202
INALLEN11550.043
ILLAKE245120.049
[
  {
    "state": "AK",
    "county": "JUNEAU",
    "airports_in_state": 30,
    "airports_in_county": 3,
    "percent_of_state": 0.1
  },
  {
    "state": "CO",
    "county": "LARIMER",
    "airports_in_state": 165,
    "airports_in_county": 9,
    "percent_of_state": 0.05454545454545454
  },
  {
    "state": "CT",
    "county": "FAIRFIELD",
    "airports_in_state": 89,
    "airports_in_county": 18,
    "percent_of_state": 0.20224719101123595
  },
  {
    "state": "IN",
    "county": "ALLEN",
    "airports_in_state": 115,
    "airports_in_county": 5,
    "percent_of_state": 0.043478260869565216
  },
  {
    "state": "IL",
    "county": "LAKE",
    "airports_in_state": 245,
    "airports_in_county": 12,
    "percent_of_state": 0.04897959183673469
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports6."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports6."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports6
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports6."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
, __stage1 AS (
  SELECT
    "state__0" as "state",
    MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
    COALESCE(LIST({
      "county": "county__1", 
      "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
  FROM __stage0
  GROUP BY 1
)
SELECT 
   base."state" as "state",
   top_3_county_0."county" as "county",
   base."airport_count" as "airports_in_state",
   top_3_county_0."airport_count" as "airports_in_county",
   top_3_county_0."airport_count"*1.0/base."airport_count" as "percent_of_state"
FROM __stage1 as base
LEFT JOIN LATERAL (SELECT UNNEST(base."top_3_county"), 1 as ignoreme) as top_3_county_0_outer(top_3_county_0,ignoreme) ON top_3_county_0_outer.ignoreme=1

Pipelines can be named as views in sources

Pipelines can do pretty complex things. They can be built into source objects.

document
source: airports7 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
  view: county_rollup is {
    where: fac_type = 'HELIPORT'
    group_by: state
    aggregate: airport_count
    nest: top_3_county is {
      limit: 3
      group_by: county
      aggregate: airport_count
    }
  } -> {
    select:
      state
      top_3_county.county
      airports_in_state is airport_count
      airports_in_county is top_3_county.airport_count
      percent_of_state is top_3_county.airport_count/airport_count
  }
}

run: airports7 -> county_rollup
QUERY RESULTS
statecountyairports_​in_​stateairports_​in_​countypercent_​of_​state
AKJUNEAU3030.1
CTFAIRFIELD89180.202
COLARIMER16590.055
ILLAKE245120.049
INALLEN11550.043
[
  {
    "state": "AK",
    "county": "JUNEAU",
    "airports_in_state": 30,
    "airports_in_county": 3,
    "percent_of_state": 0.1
  },
  {
    "state": "CT",
    "county": "FAIRFIELD",
    "airports_in_state": 89,
    "airports_in_county": 18,
    "percent_of_state": 0.20224719101123595
  },
  {
    "state": "CO",
    "county": "LARIMER",
    "airports_in_state": 165,
    "airports_in_county": 9,
    "percent_of_state": 0.05454545454545454
  },
  {
    "state": "IL",
    "county": "LAKE",
    "airports_in_state": 245,
    "airports_in_county": 12,
    "percent_of_state": 0.04897959183673469
  },
  {
    "state": "IN",
    "county": "ALLEN",
    "airports_in_state": 115,
    "airports_in_county": 5,
    "percent_of_state": 0.043478260869565216
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports7."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports7."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports7
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports7."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
, __stage1 AS (
  SELECT
    "state__0" as "state",
    MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
    COALESCE(LIST({
      "county": "county__1", 
      "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
  FROM __stage0
  GROUP BY 1
)
SELECT 
   base."state" as "state",
   top_3_county_0."county" as "county",
   base."airport_count" as "airports_in_state",
   top_3_county_0."airport_count" as "airports_in_county",
   top_3_county_0."airport_count"*1.0/base."airport_count" as "percent_of_state"
FROM __stage1 as base
LEFT JOIN LATERAL (SELECT UNNEST(base."top_3_county"), 1 as ignoreme) as top_3_county_0_outer(top_3_county_0,ignoreme) ON top_3_county_0_outer.ignoreme=1

Extending Sources

Similar to query refinement, a source can be extended with the extend { ... } gesture to create a new version of the source with additional properties.

source: new name is old name extend {
  where: some data limit
  measure: new field declarations
}

Sources based on Queries

Sometimes it is helpful to pre-process data before modeling it out with a source. To do this, you can first define a model-level query using a query statement, then extend the query to add reusable definitions.

Model-Level Queries

Here, we define a model-level query q_airport_facts based on the flights.parquet table.

document
query: q_airport_facts is duckdb.table('../data/flights.parquet') -> {
  group_by:
    flight_year is dep_time.year
    origin
    carrier
  aggregate:
    num_flights is count()
    distance is distance.sum()
}

run: q_airport_facts
query: q_airport_facts2 is duckdb.table('../data/flights.parquet') -> {
  group_by:
    flight_year is dep_time.year
    origin
    carrier
  aggregate:
    num_flights is count()
    distance is distance.sum()
}
QUERY RESULTS
flight_​yearorigincarriernum_​flightsdistance
2005TUSWN10041,253
2005DENUA1,1201,138,828
2005STLWN356223,181
2005CVGDL148155,699
2005DTWNW1,080978,006
[
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "TUS",
    "carrier": "WN",
    "num_flights": 100,
    "distance": 41253
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "DEN",
    "carrier": "UA",
    "num_flights": 1120,
    "distance": 1138828
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "STL",
    "carrier": "WN",
    "num_flights": 356,
    "distance": 223181
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "CVG",
    "carrier": "DL",
    "num_flights": 148,
    "distance": 155699
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "DTW",
    "carrier": "NW",
    "num_flights": 1080,
    "distance": 978006
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "flight_year",
   base."origin" as "origin",
   base."carrier" as "carrier",
   COUNT( 1) as "num_flights",
   COALESCE(SUM(base."distance"),0) as "distance"
FROM '../data/flights.parquet' as base
GROUP BY 1,2,3
ORDER BY 1 desc NULLS LAST

Source based on a query

Next, we can define a source based on q_airport_facts to add reusable fields and source-level queries.

document
source: airport_facts is q_airport_facts extend {
  measure: flight_count is num_flights.sum()
  measure: total_distance is distance.sum()

  view: flights_by_year is {
    group_by: flight_year
    aggregate:
      flight_count
      carrier_count is count(carrier)
      origin_count is count(origin)
  }

  view: flights_by_origin is {
    group_by: origin
    aggregate:
      flight_count
      carrier_count is count(carrier)
  }
}

Querying the Summary source

document
run: airport_facts -> flights_by_origin
QUERY RESULTS
originflight_​countcarrier_​count
ATL17,87512
DFW17,78213
ORD14,21410
PHX12,47612
LAS11,09610
[
  {
    "origin": "ATL",
    "flight_count": 17875,
    "carrier_count": 12
  },
  {
    "origin": "DFW",
    "flight_count": 17782,
    "carrier_count": 13
  },
  {
    "origin": "ORD",
    "flight_count": 14214,
    "carrier_count": 10
  },
  {
    "origin": "PHX",
    "flight_count": 12476,
    "carrier_count": 12
  },
  {
    "origin": "LAS",
    "flight_count": 11096,
    "carrier_count": 10
  }
]
WITH __stage0 AS (
  SELECT 
     DATE_TRUNC('year', base."dep_time") as "flight_year",
     base."origin" as "origin",
     base."carrier" as "carrier",
     COUNT( 1) as "num_flights",
     COALESCE(SUM(base."distance"),0) as "distance"
  FROM '../data/flights.parquet' as base
  GROUP BY 1,2,3
)
SELECT 
   airport_facts."origin" as "origin",
   COALESCE(SUM(airport_facts."num_flights"),0) as "flight_count",
   count(distinct airport_facts."carrier") as "carrier_count"
FROM __stage0 as airport_facts
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Other Interesting Language Features:

SQL Sources

See the SQL Sources section.

Embedding Malloy queries in SQL (SQL Block Documentation)

Case statement improved with pick (Expression Documentation)

Group by on Joined Subtrees

Date/Timestamp filters and Timezones (Time Documentation)

Nested data and Symmetric aggregates (Aggregates Documentation)

Import (Import Documentation)

Data styles and rendering (Rendering Documentation)