Malloy Documentation
search

This document will assumes a working knowledge of SQL and will rapidly take you through some of Malloy's key language features.

Using this Guide

For every Malloy Query you can see the formatted result, or raw result as JSON, or the SQL used to produce the result.

Click tab to to see the HTML, JSON or SQL result: ūüĎąūüĎą

SQL SELECT vs Malloy's run:

The statement to run a query in Malloy is run:. There are two types of queries in Malloy, reductions which have group_by: or aggregate: statements, and projections which have select: statements and do not group or aggregate results.

Projection: SELECT with no GROUP BY

In SQL

SELECT code, full_name, state, faa_region, fac_type, elevation
FROM `malloy-data.faa.airports`
ORDER BY code

Equivalent in Malloy

document
run: duckdb.table('../data/airports.parquet') -> {
  select: code, full_name, state, faa_region, fac_type, elevation
  order_by: code
}
QUERY RESULTS
codefull_‚Äčnamestatefaa_‚Äčregionfac_‚Äčtypeelevation
00ATOTAL RFPAAEAHELIPORT11
00CANIMAS AIR PARKCOANMAIRPORT6,684
00CAGOLDSTONE /GTS/CAAWPAIRPORT3,038
00EAT&T - APACHE JUNCTIONAZAWPHELIPORT2,527
00FTCJC-NORTHEAST CAMPUSTXASWHELIPORT600
[
  {
    "code": "00A",
    "full_name": "TOTAL RF",
    "state": "PA",
    "faa_region": "AEA",
    "fac_type": "HELIPORT",
    "elevation": 11
  },
  {
    "code": "00C",
    "full_name": "ANIMAS AIR PARK",
    "state": "CO",
    "faa_region": "ANM",
    "fac_type": "AIRPORT",
    "elevation": 6684
  },
  {
    "code": "00CA",
    "full_name": "GOLDSTONE /GTS/",
    "state": "CA",
    "faa_region": "AWP",
    "fac_type": "AIRPORT",
    "elevation": 3038
  },
  {
    "code": "00E",
    "full_name": "AT&T - APACHE JUNCTION",
    "state": "AZ",
    "faa_region": "AWP",
    "fac_type": "HELIPORT",
    "elevation": 2527
  },
  {
    "code": "00F",
    "full_name": "TCJC-NORTHEAST CAMPUS",
    "state": "TX",
    "faa_region": "ASW",
    "fac_type": "HELIPORT",
    "elevation": 600
  }
]
SELECT 
   base."code" as "code",
   base."full_name" as "full_name",
   base."state" as "state",
   base."faa_region" as "faa_region",
   base."fac_type" as "fac_type",
   base."elevation" as "elevation"
FROM '../data/airports.parquet' as base
ORDER BY 1 ASC NULLS LAST

Reduction: SELECT with GROUP BY and/or aggregation

In SQL

SELECT
  base.fac_type as fac_type,
  COUNT( 1) as airport_count
FROM `malloy-data.faa.airports` as base
WHERE base.state='CA'
GROUP BY 1
ORDER BY 2 desc

Equivalent in Malloy

document
run: duckdb.table('../data/airports.parquet') -> {
  group_by: fac_type
  aggregate: airport_count is count()
  where: state = 'CA'
  order_by: airport_count desc
}
QUERY RESULTS
fac_‚Äčtypeairport_‚Äčcount
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
STOLPORT2
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 569
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 396
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 12
  },
  {
    "fac_type": "GLIDERPORT",
    "airport_count": 3
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 2
  }
]
SELECT 
   base."fac_type" as "fac_type",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as base
WHERE base."state"='CA'
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Source: A data source for queries

Malloy separates a query's view from the source of the data. A source can be thought of as a table and a collection of computations and relationships which are relevant to that table. (Source Documentation).

Fields can be defined as part of a source.

  • A measure: is a declared aggregate calculation (think function that operates across the table) which can be used in aggregate: elements in a query stage

  • A dimension: is a declared scalar calculation which that can be used in group_by: or select: elements of a query stage

document
source: airports is duckdb.table('../data/airports.parquet') extend {
  dimension: elevation_in_meters is elevation * 0.3048
  dimension: state_and_county is concat(state,' - ', county)
  measure: airport_count is count()
  measure: avg_elevation_in_meters is elevation_in_meters.avg()
}

Querying Against a Source

Queries can be run against source: objects and can utilize the modeled fields from that source, as well as introduce new ones. (Query Documentation)

using the above declared airports source

document
run: airports -> {
  limit: 10
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate:
    airport_count           // <-- declared in source
    avg_elevation_in_meters // <-- declared in source
}
QUERY RESULTS
stateairport_‚Äčcountavg_‚Äčelevation_‚Äčin_‚Äčmeters
TX435138.687
CA396276.402
PA307216.255
FL28018.101
NJ24748.15
[
  {
    "state": "TX",
    "airport_count": 435,
    "avg_elevation_in_meters": 138.6868027586207
  },
  {
    "state": "CA",
    "airport_count": 396,
    "avg_elevation_in_meters": 276.4020303030303
  },
  {
    "state": "PA",
    "airport_count": 307,
    "avg_elevation_in_meters": 216.2551035830619
  },
  {
    "state": "FL",
    "airport_count": 280,
    "avg_elevation_in_meters": 18.100765714285714
  },
  {
    "state": "NJ",
    "airport_count": 247,
    "avg_elevation_in_meters": 48.14976194331984
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count",
   AVG((airports."elevation"*0.3048)) as "avg_elevation_in_meters"
FROM '../data/airports.parquet' as airports
WHERE airports."fac_type"='HELIPORT'
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 10

Dimensional calculations are no different from columns

using the above declared airports source

document
run: airports -> {
  group_by: state_and_county // <-- declared in source
  aggregate: airport_count
  order_by: 1 desc
}
QUERY RESULTS
state_‚Äčand_‚Äčcountyairport_‚Äčcount
WY - WESTON3
WY - WASHAKIE4
WY - UINTA3
WY - TETON7
WY - SWEETWATER4
[
  {
    "state_and_county": "WY - WESTON",
    "airport_count": 3
  },
  {
    "state_and_county": "WY - WASHAKIE",
    "airport_count": 4
  },
  {
    "state_and_county": "WY - UINTA",
    "airport_count": 3
  },
  {
    "state_and_county": "WY - TETON",
    "airport_count": 7
  },
  {
    "state_and_county": "WY - SWEETWATER",
    "airport_count": 4
  }
]
SELECT 
   CONCAT(airports."state",' - ',airports."county") as "state_and_county",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 1 desc NULLS LAST

Defining Views in a Source

A source can also contain a set of useful views relating to that source, which can be run in queries.

using the above declared airports source

document
source: airports2 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()

  view: by_state is {        // <-- can be called by name
    group_by: state
    aggregate: airport_count
  }
}

Executing Views

The simplest form of a query in Malloy is the name of a source, the query operator ->, and the name of one of its contained views.

using the above declared airports source

document
run: airports2 -> by_state
QUERY RESULTS
stateairport_‚Äčcount
TX1,845
CA984
IL890
FL856
PA804
[
  {
    "state": "TX",
    "airport_count": 1845
  },
  {
    "state": "CA",
    "airport_count": 984
  },
  {
    "state": "IL",
    "airport_count": 890
  },
  {
    "state": "FL",
    "airport_count": 856
  },
  {
    "state": "PA",
    "airport_count": 804
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Filtering a Source

You can filter a source by adding a filter expression using the where: keyword in an extension. In this example, we filter an existing airports source and then use this extended version to execute the by_state view in a query. For more information on filtering, see the Filters section.

document
run: airports2 extend {
  where: fac_type = 'SEAPLANE BASE'   // <- run the query with an added filter
}
-> by_state
QUERY RESULTS
stateairport_‚Äčcount
AK104
MN72
FL43
ME38
NY23
[
  {
    "state": "AK",
    "airport_count": 104
  },
  {
    "state": "MN",
    "airport_count": 72
  },
  {
    "state": "FL",
    "airport_count": 43
  },
  {
    "state": "ME",
    "airport_count": 38
  },
  {
    "state": "NY",
    "airport_count": 23
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
WHERE (airports2."fac_type"='SEAPLANE BASE')
AND (airports2."fac_type"='SEAPLANE BASE')
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Filtering Measures

The input to an aggregate computation can be filtered.

using the above declared airports source

document
run: airports -> {
  group_by: state
  aggregate: airport_count
  aggregate: heliport_count is airport_count { where: fac_type = 'HELIPORT' } // <-- add a filter
}
QUERY RESULTS
stateairport_‚Äčcountheliport_‚Äčcount
TX1,845435
CA984396
IL890245
FL856280
PA804307
[
  {
    "state": "TX",
    "airport_count": 1845,
    "heliport_count": 435
  },
  {
    "state": "CA",
    "airport_count": 984,
    "heliport_count": 396
  },
  {
    "state": "IL",
    "airport_count": 890,
    "heliport_count": 245
  },
  {
    "state": "FL",
    "airport_count": 856,
    "heliport_count": 280
  },
  {
    "state": "PA",
    "airport_count": 804,
    "heliport_count": 307
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count",
   (COUNT( CASE WHEN airports."fac_type"='HELIPORT' THEN 1 END)) as "heliport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Composing with Views

For the next section assume the following source declaration.

document
source: airports3 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
  measure: avg_elevation is elevation.avg()

  view: top_5_states is {
    group_by: state
    aggregate: airport_count
    limit: 5
  }

  view: by_facility_type is {
    group_by: fac_type
    aggregate: airport_count
  }
}

The nest: property embeds one view in another

Malloy allows you to create nested subtables easily in a query. In the case below, the top level view groups by state and the nested view groups by facility type. This mechanism is really useful for understanding data and creating complex data structures. (Nesting Documentation)

using the above declared airports source

document
run: airports3 -> {
  group_by: state
  aggregate: airport_count
  limit: 5
  nest: by_facility_type is  {
    group_by: fac_type
    aggregate: airport_count
  }
}
QUERY RESULTS
stateairport_‚Äčcountby_‚Äčfacility_‚Äčtype
TX1,845
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,389
HELIPORT435
STOLPORT8
ULTRALIGHT8
GLIDERPORT5
CA984
fac_‚Äčtypeairport_‚Äčcount
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
STOLPORT2
ULTRALIGHT2
IL890
fac_‚Äčtypeairport_‚Äčcount
AIRPORT625
HELIPORT245
SEAPLANE BASE8
ULTRALIGHT6
BALLOONPORT2
GLIDERPORT2
STOLPORT2
FL856
fac_‚Äčtypeairport_‚Äčcount
AIRPORT511
HELIPORT280
SEAPLANE BASE43
STOLPORT13
ULTRALIGHT5
GLIDERPORT4
PA804
fac_‚Äčtypeairport_‚Äčcount
AIRPORT468
HELIPORT307
ULTRALIGHT13
SEAPLANE BASE10
STOLPORT3
GLIDERPORT3
[
  {
    "state": "TX",
    "airport_count": 1845,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1389
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 435
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 5
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 984,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 569
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 396
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 12
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "IL",
    "airport_count": 890,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 625
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 245
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 6
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 2
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 2
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 856,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 511
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 280
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 43
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 5
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 804,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 468
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 307
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 13
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 10
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 3
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."fac_type"
      END as "fac_type__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "fac_type": "fac_type__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

Queries can contain multiple nested views.

using the above declared airports source

document
run: airports3 -> {
  group_by: faa_region
  aggregate: airport_count
  nest: top_5_states
  nest: by_facility_type
}
QUERY RESULTS
faa_‚Äčregionairport_‚Äčcounttop_‚Äč5_‚Äčstatesby_‚Äčfacility_‚Äčtype
AGL4,437
stateairport_‚Äčcount
IL890
OH749
IN643
WI543
MN507
fac_‚Äčtypeairport_‚Äčcount
AIRPORT3,443
HELIPORT826
SEAPLANE BASE119
ULTRALIGHT30
STOLPORT11
BALLOONPORT4
GLIDERPORT4
ASW3,268
stateairport_‚Äčcount
TX1,845
LA500
OK443
AR299
NM181
fac_‚Äčtypeairport_‚Äčcount
AIRPORT2,341
HELIPORT861
ULTRALIGHT32
SEAPLANE BASE19
STOLPORT9
GLIDERPORT6
ASO2,924
stateairport_‚Äčcount
FL856
GA440
NC400
TN285
AL260
fac_‚Äčtypeairport_‚Äčcount
AIRPORT2,038
HELIPORT770
SEAPLANE BASE57
STOLPORT33
ULTRALIGHT17
GLIDERPORT8
BALLOONPORT1
AEA2,586
stateairport_‚Äčcount
PA804
NY576
VA421
NJ378
MD229
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,525
HELIPORT964
SEAPLANE BASE61
ULTRALIGHT18
STOLPORT8
GLIDERPORT7
BALLOONPORT3
ANM2,102
stateairport_‚Äčcount
WA484
OR441
CO425
MT259
ID238
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,524
HELIPORT527
SEAPLANE BASE25
STOLPORT13
ULTRALIGHT8
GLIDERPORT4
BALLOONPORT1
[
  {
    "faa_region": "AGL",
    "airport_count": 4437,
    "top_5_states": [
      {
        "state": "IL",
        "airport_count": 890
      },
      {
        "state": "OH",
        "airport_count": 749
      },
      {
        "state": "IN",
        "airport_count": 643
      },
      {
        "state": "WI",
        "airport_count": 543
      },
      {
        "state": "MN",
        "airport_count": 507
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 3443
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 826
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 119
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 30
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 11
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 4
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "faa_region": "ASW",
    "airport_count": 3268,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 1845
      },
      {
        "state": "LA",
        "airport_count": 500
      },
      {
        "state": "OK",
        "airport_count": 443
      },
      {
        "state": "AR",
        "airport_count": 299
      },
      {
        "state": "NM",
        "airport_count": 181
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2341
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 861
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 32
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 19
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 9
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 6
      }
    ]
  },
  {
    "faa_region": "ASO",
    "airport_count": 2924,
    "top_5_states": [
      {
        "state": "FL",
        "airport_count": 856
      },
      {
        "state": "GA",
        "airport_count": 440
      },
      {
        "state": "NC",
        "airport_count": 400
      },
      {
        "state": "TN",
        "airport_count": 285
      },
      {
        "state": "AL",
        "airport_count": 260
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2038
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 770
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 57
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 33
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 17
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 8
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  },
  {
    "faa_region": "AEA",
    "airport_count": 2586,
    "top_5_states": [
      {
        "state": "PA",
        "airport_count": 804
      },
      {
        "state": "NY",
        "airport_count": 576
      },
      {
        "state": "VA",
        "airport_count": 421
      },
      {
        "state": "NJ",
        "airport_count": 378
      },
      {
        "state": "MD",
        "airport_count": 229
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1525
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 964
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 61
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 18
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 7
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 3
      }
    ]
  },
  {
    "faa_region": "ANM",
    "airport_count": 2102,
    "top_5_states": [
      {
        "state": "WA",
        "airport_count": 484
      },
      {
        "state": "OR",
        "airport_count": 441
      },
      {
        "state": "CO",
        "airport_count": 425
      },
      {
        "state": "MT",
        "airport_count": 259
      },
      {
        "state": "ID",
        "airport_count": 238
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1524
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 527
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 25
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."faa_region" as "faa_region__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1",
    CASE WHEN group_set=2 THEN
      airports3."fac_type"
      END as "fac_type__2",
    CASE WHEN group_set=2 THEN
      COUNT( 1)
      END as "airport_count__2"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,2,1)) as group_set  ) as group_set
  GROUP BY 1,2,4,6
)
SELECT
  "faa_region__0" as "faa_region",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states",
  COALESCE(LIST({
    "fac_type": "fac_type__2", 
    "airport_count": "airport_count__2"}  ORDER BY  "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Views can be nested to any level of depth.

using the above declared airports source

document
run: airports3 -> {
  group_by: faa_region
  aggregate: airport_count
  nest: by_state_and_county is  {
    group_by: state
    aggregate: airport_count
    nest: by_county is  {
      group_by: county
      aggregate: airport_count
      limit: 4
    }
  }
  nest: by_facility_type
}
QUERY RESULTS
faa_‚Äčregionairport_‚Äčcountby_‚Äčstate_‚Äčand_‚Äčcountyby_‚Äčfacility_‚Äčtype
AGL4,437
stateairport_‚Äčcountby_‚Äčcounty
IL890
countyairport_‚Äčcount
COOK51
LA SALLE39
MC HENRY29
DE KALB27
OH749
countyairport_‚Äčcount
CUYAHOGA27
FRANKLIN27
STARK23
MONTGOMERY22
IN643
countyairport_‚Äčcount
MARION27
ALLEN24
HAMILTON20
MARSHALL18
WI543
countyairport_‚Äčcount
DANE30
WALWORTH22
WINNEBAGO17
ST CROIX16
MN507
countyairport_‚Äčcount
ST LOUIS28
HENNEPIN23
DAKOTA17
CROW WING17
MI489
countyairport_‚Äčcount
OAKLAND25
KENT24
WAYNE17
ST CLAIR15
ND436
countyairport_‚Äčcount
CASS39
RICHLAND20
MC LEAN20
WARD19
SD180
countyairport_‚Äčcount
PENNINGTON7
MINNEHAHA7
MEADE7
BRULE6
fac_‚Äčtypeairport_‚Äčcount
AIRPORT3,443
HELIPORT826
SEAPLANE BASE119
ULTRALIGHT30
STOLPORT11
BALLOONPORT4
GLIDERPORT4
ASW3,268
stateairport_‚Äčcountby_‚Äčcounty
TX1,845
countyairport_‚Äčcount
HARRIS135
TARRANT63
DENTON53
DALLAS42
LA500
countyairport_‚Äčcount
PLAQUEMINES31
VERMILION29
CALCASIEU23
LAFOURCHE21
OK443
countyairport_‚Äčcount
OKLAHOMA31
TULSA25
ROGERS16
DELAWARE13
AR299
countyairport_‚Äčcount
PULASKI20
BENTON19
LONOKE13
MISSISSIPPI10
NM181
countyairport_‚Äčcount
CATRON13
LINCOLN10
OTERO9
SANTA FE9
fac_‚Äčtypeairport_‚Äčcount
AIRPORT2,341
HELIPORT861
ULTRALIGHT32
SEAPLANE BASE19
STOLPORT9
GLIDERPORT6
ASO2,924
stateairport_‚Äčcountby_‚Äčcounty
FL856
countyairport_‚Äčcount
PALM BEACH45
DADE44
POLK43
MARION37
GA440
countyairport_‚Äčcount
FULTON22
PIKE17
CARROLL14
FAYETTE12
NC400
countyairport_‚Äčcount
WAKE15
ROWAN14
UNION14
MECKLENBURG14
TN285
countyairport_‚Äčcount
SHELBY24
DAVIDSON17
KNOX14
RUTHERFORD10
AL260
countyairport_‚Äčcount
BALDWIN25
JEFFERSON19
MOBILE17
MADISON14
MS243
countyairport_‚Äčcount
WASHINGTON15
HARRISON12
HINDS11
BOLIVAR9
KY202
countyairport_‚Äčcount
JEFFERSON13
FAYETTE7
BOONE6
MC LEAN6
SC189
countyairport_‚Äčcount
GREENVILLE13
LEXINGTON9
BEAUFORT9
CHARLESTON9
PR40
countyairport_‚Äčcount
--PUERTO RICO40
VI9
countyairport_‚Äčcount
-VIRGIN ISLANDS-9
fac_‚Äčtypeairport_‚Äčcount
AIRPORT2,038
HELIPORT770
SEAPLANE BASE57
STOLPORT33
ULTRALIGHT17
GLIDERPORT8
BALLOONPORT1
AEA2,586
stateairport_‚Äčcountby_‚Äčcounty
PA804
countyairport_‚Äčcount
BUCKS55
MONTGOMERY44
ALLEGHENY31
CHESTER27
NY576
countyairport_‚Äčcount
SUFFOLK34
ERIE26
DUTCHESS20
NIAGARA20
VA421
countyairport_‚Äčcount
FAUQUIER23
SHENANDOAH13
FAIRFAX12
ACCOMACK12
NJ378
countyairport_‚Äčcount
MONMOUTH31
MIDDLESEX29
BURLINGTON29
HUNTERDON27
MD229
countyairport_‚Äčcount
BALTIMORE24
CARROLL14
ANNE ARUNDEL14
PRINCE GEORGES14
WV116
countyairport_‚Äčcount
KANAWHA8
MASON7
RALEIGH6
PRESTON6
DE42
countyairport_‚Äčcount
KENT17
SUSSEX14
NEW CASTLE11
DC20
countyairport_‚Äčcount
WASHINGTON18
ARLINGTON1
LOUDOUN1
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,525
HELIPORT964
SEAPLANE BASE61
ULTRALIGHT18
STOLPORT8
GLIDERPORT7
BALLOONPORT3
ANM2,102
stateairport_‚Äčcountby_‚Äčcounty
WA484
countyairport_‚Äčcount
KING61
PIERCE27
SPOKANE26
SNOHOMISH25
OR441
countyairport_‚Äčcount
CLACKAMAS34
LINN26
WASHINGTON24
LANE24
CO425
countyairport_‚Äčcount
WELD40
EL PASO26
ADAMS23
JEFFERSON21
MT259
countyairport_‚Äčcount
FLATHEAD20
LEWIS AND CLARK16
MISSOULA14
GALLATIN11
ID238
countyairport_‚Äčcount
VALLEY27
KOOTENAI21
IDAHO18
BONNER18
UT140
countyairport_‚Äčcount
SALT LAKE21
SAN JUAN13
UTAH10
IRON9
WY115
countyairport_‚Äčcount
LARAMIE12
CAMPBELL9
PARK9
CARBON9
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,524
HELIPORT527
SEAPLANE BASE25
STOLPORT13
ULTRALIGHT8
GLIDERPORT4
BALLOONPORT1
[
  {
    "faa_region": "AGL",
    "airport_count": 4437,
    "by_state_and_county": [
      {
        "state": "IL",
        "airport_count": 890,
        "by_county": [
          {
            "county": "COOK",
            "airport_count": 51
          },
          {
            "county": "LA SALLE",
            "airport_count": 39
          },
          {
            "county": "MC HENRY",
            "airport_count": 29
          },
          {
            "county": "DE KALB",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "OH",
        "airport_count": 749,
        "by_county": [
          {
            "county": "CUYAHOGA",
            "airport_count": 27
          },
          {
            "county": "FRANKLIN",
            "airport_count": 27
          },
          {
            "county": "STARK",
            "airport_count": 23
          },
          {
            "county": "MONTGOMERY",
            "airport_count": 22
          }
        ]
      },
      {
        "state": "IN",
        "airport_count": 643,
        "by_county": [
          {
            "county": "MARION",
            "airport_count": 27
          },
          {
            "county": "ALLEN",
            "airport_count": 24
          },
          {
            "county": "HAMILTON",
            "airport_count": 20
          },
          {
            "county": "MARSHALL",
            "airport_count": 18
          }
        ]
      },
      {
        "state": "WI",
        "airport_count": 543,
        "by_county": [
          {
            "county": "DANE",
            "airport_count": 30
          },
          {
            "county": "WALWORTH",
            "airport_count": 22
          },
          {
            "county": "WINNEBAGO",
            "airport_count": 17
          },
          {
            "county": "ST CROIX",
            "airport_count": 16
          }
        ]
      },
      {
        "state": "MN",
        "airport_count": 507,
        "by_county": [
          {
            "county": "ST LOUIS",
            "airport_count": 28
          },
          {
            "county": "HENNEPIN",
            "airport_count": 23
          },
          {
            "county": "DAKOTA",
            "airport_count": 17
          },
          {
            "county": "CROW WING",
            "airport_count": 17
          }
        ]
      },
      {
        "state": "MI",
        "airport_count": 489,
        "by_county": [
          {
            "county": "OAKLAND",
            "airport_count": 25
          },
          {
            "county": "KENT",
            "airport_count": 24
          },
          {
            "county": "WAYNE",
            "airport_count": 17
          },
          {
            "county": "ST CLAIR",
            "airport_count": 15
          }
        ]
      },
      {
        "state": "ND",
        "airport_count": 436,
        "by_county": [
          {
            "county": "CASS",
            "airport_count": 39
          },
          {
            "county": "RICHLAND",
            "airport_count": 20
          },
          {
            "county": "MC LEAN",
            "airport_count": 20
          },
          {
            "county": "WARD",
            "airport_count": 19
          }
        ]
      },
      {
        "state": "SD",
        "airport_count": 180,
        "by_county": [
          {
            "county": "PENNINGTON",
            "airport_count": 7
          },
          {
            "county": "MINNEHAHA",
            "airport_count": 7
          },
          {
            "county": "MEADE",
            "airport_count": 7
          },
          {
            "county": "BRULE",
            "airport_count": 6
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 3443
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 826
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 119
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 30
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 11
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 4
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "faa_region": "ASW",
    "airport_count": 3268,
    "by_state_and_county": [
      {
        "state": "TX",
        "airport_count": 1845,
        "by_county": [
          {
            "county": "HARRIS",
            "airport_count": 135
          },
          {
            "county": "TARRANT",
            "airport_count": 63
          },
          {
            "county": "DENTON",
            "airport_count": 53
          },
          {
            "county": "DALLAS",
            "airport_count": 42
          }
        ]
      },
      {
        "state": "LA",
        "airport_count": 500,
        "by_county": [
          {
            "county": "PLAQUEMINES",
            "airport_count": 31
          },
          {
            "county": "VERMILION",
            "airport_count": 29
          },
          {
            "county": "CALCASIEU",
            "airport_count": 23
          },
          {
            "county": "LAFOURCHE",
            "airport_count": 21
          }
        ]
      },
      {
        "state": "OK",
        "airport_count": 443,
        "by_county": [
          {
            "county": "OKLAHOMA",
            "airport_count": 31
          },
          {
            "county": "TULSA",
            "airport_count": 25
          },
          {
            "county": "ROGERS",
            "airport_count": 16
          },
          {
            "county": "DELAWARE",
            "airport_count": 13
          }
        ]
      },
      {
        "state": "AR",
        "airport_count": 299,
        "by_county": [
          {
            "county": "PULASKI",
            "airport_count": 20
          },
          {
            "county": "BENTON",
            "airport_count": 19
          },
          {
            "county": "LONOKE",
            "airport_count": 13
          },
          {
            "county": "MISSISSIPPI",
            "airport_count": 10
          }
        ]
      },
      {
        "state": "NM",
        "airport_count": 181,
        "by_county": [
          {
            "county": "CATRON",
            "airport_count": 13
          },
          {
            "county": "LINCOLN",
            "airport_count": 10
          },
          {
            "county": "OTERO",
            "airport_count": 9
          },
          {
            "county": "SANTA FE",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2341
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 861
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 32
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 19
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 9
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 6
      }
    ]
  },
  {
    "faa_region": "ASO",
    "airport_count": 2924,
    "by_state_and_county": [
      {
        "state": "FL",
        "airport_count": 856,
        "by_county": [
          {
            "county": "PALM BEACH",
            "airport_count": 45
          },
          {
            "county": "DADE",
            "airport_count": 44
          },
          {
            "county": "POLK",
            "airport_count": 43
          },
          {
            "county": "MARION",
            "airport_count": 37
          }
        ]
      },
      {
        "state": "GA",
        "airport_count": 440,
        "by_county": [
          {
            "county": "FULTON",
            "airport_count": 22
          },
          {
            "county": "PIKE",
            "airport_count": 17
          },
          {
            "county": "CARROLL",
            "airport_count": 14
          },
          {
            "county": "FAYETTE",
            "airport_count": 12
          }
        ]
      },
      {
        "state": "NC",
        "airport_count": 400,
        "by_county": [
          {
            "county": "WAKE",
            "airport_count": 15
          },
          {
            "county": "ROWAN",
            "airport_count": 14
          },
          {
            "county": "UNION",
            "airport_count": 14
          },
          {
            "county": "MECKLENBURG",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "TN",
        "airport_count": 285,
        "by_county": [
          {
            "county": "SHELBY",
            "airport_count": 24
          },
          {
            "county": "DAVIDSON",
            "airport_count": 17
          },
          {
            "county": "KNOX",
            "airport_count": 14
          },
          {
            "county": "RUTHERFORD",
            "airport_count": 10
          }
        ]
      },
      {
        "state": "AL",
        "airport_count": 260,
        "by_county": [
          {
            "county": "BALDWIN",
            "airport_count": 25
          },
          {
            "county": "JEFFERSON",
            "airport_count": 19
          },
          {
            "county": "MOBILE",
            "airport_count": 17
          },
          {
            "county": "MADISON",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "MS",
        "airport_count": 243,
        "by_county": [
          {
            "county": "WASHINGTON",
            "airport_count": 15
          },
          {
            "county": "HARRISON",
            "airport_count": 12
          },
          {
            "county": "HINDS",
            "airport_count": 11
          },
          {
            "county": "BOLIVAR",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "KY",
        "airport_count": 202,
        "by_county": [
          {
            "county": "JEFFERSON",
            "airport_count": 13
          },
          {
            "county": "FAYETTE",
            "airport_count": 7
          },
          {
            "county": "BOONE",
            "airport_count": 6
          },
          {
            "county": "MC LEAN",
            "airport_count": 6
          }
        ]
      },
      {
        "state": "SC",
        "airport_count": 189,
        "by_county": [
          {
            "county": "GREENVILLE",
            "airport_count": 13
          },
          {
            "county": "LEXINGTON",
            "airport_count": 9
          },
          {
            "county": "BEAUFORT",
            "airport_count": 9
          },
          {
            "county": "CHARLESTON",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "PR",
        "airport_count": 40,
        "by_county": [
          {
            "county": "--PUERTO RICO",
            "airport_count": 40
          }
        ]
      },
      {
        "state": "VI",
        "airport_count": 9,
        "by_county": [
          {
            "county": "-VIRGIN ISLANDS-",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 2038
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 770
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 57
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 33
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 17
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 8
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  },
  {
    "faa_region": "AEA",
    "airport_count": 2586,
    "by_state_and_county": [
      {
        "state": "PA",
        "airport_count": 804,
        "by_county": [
          {
            "county": "BUCKS",
            "airport_count": 55
          },
          {
            "county": "MONTGOMERY",
            "airport_count": 44
          },
          {
            "county": "ALLEGHENY",
            "airport_count": 31
          },
          {
            "county": "CHESTER",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "NY",
        "airport_count": 576,
        "by_county": [
          {
            "county": "SUFFOLK",
            "airport_count": 34
          },
          {
            "county": "ERIE",
            "airport_count": 26
          },
          {
            "county": "DUTCHESS",
            "airport_count": 20
          },
          {
            "county": "NIAGARA",
            "airport_count": 20
          }
        ]
      },
      {
        "state": "VA",
        "airport_count": 421,
        "by_county": [
          {
            "county": "FAUQUIER",
            "airport_count": 23
          },
          {
            "county": "SHENANDOAH",
            "airport_count": 13
          },
          {
            "county": "FAIRFAX",
            "airport_count": 12
          },
          {
            "county": "ACCOMACK",
            "airport_count": 12
          }
        ]
      },
      {
        "state": "NJ",
        "airport_count": 378,
        "by_county": [
          {
            "county": "MONMOUTH",
            "airport_count": 31
          },
          {
            "county": "MIDDLESEX",
            "airport_count": 29
          },
          {
            "county": "BURLINGTON",
            "airport_count": 29
          },
          {
            "county": "HUNTERDON",
            "airport_count": 27
          }
        ]
      },
      {
        "state": "MD",
        "airport_count": 229,
        "by_county": [
          {
            "county": "BALTIMORE",
            "airport_count": 24
          },
          {
            "county": "CARROLL",
            "airport_count": 14
          },
          {
            "county": "ANNE ARUNDEL",
            "airport_count": 14
          },
          {
            "county": "PRINCE GEORGES",
            "airport_count": 14
          }
        ]
      },
      {
        "state": "WV",
        "airport_count": 116,
        "by_county": [
          {
            "county": "KANAWHA",
            "airport_count": 8
          },
          {
            "county": "MASON",
            "airport_count": 7
          },
          {
            "county": "RALEIGH",
            "airport_count": 6
          },
          {
            "county": "PRESTON",
            "airport_count": 6
          }
        ]
      },
      {
        "state": "DE",
        "airport_count": 42,
        "by_county": [
          {
            "county": "KENT",
            "airport_count": 17
          },
          {
            "county": "SUSSEX",
            "airport_count": 14
          },
          {
            "county": "NEW CASTLE",
            "airport_count": 11
          }
        ]
      },
      {
        "state": "DC",
        "airport_count": 20,
        "by_county": [
          {
            "county": "WASHINGTON",
            "airport_count": 18
          },
          {
            "county": "ARLINGTON",
            "airport_count": 1
          },
          {
            "county": "LOUDOUN",
            "airport_count": 1
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1525
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 964
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 61
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 18
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 7
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 3
      }
    ]
  },
  {
    "faa_region": "ANM",
    "airport_count": 2102,
    "by_state_and_county": [
      {
        "state": "WA",
        "airport_count": 484,
        "by_county": [
          {
            "county": "KING",
            "airport_count": 61
          },
          {
            "county": "PIERCE",
            "airport_count": 27
          },
          {
            "county": "SPOKANE",
            "airport_count": 26
          },
          {
            "county": "SNOHOMISH",
            "airport_count": 25
          }
        ]
      },
      {
        "state": "OR",
        "airport_count": 441,
        "by_county": [
          {
            "county": "CLACKAMAS",
            "airport_count": 34
          },
          {
            "county": "LINN",
            "airport_count": 26
          },
          {
            "county": "WASHINGTON",
            "airport_count": 24
          },
          {
            "county": "LANE",
            "airport_count": 24
          }
        ]
      },
      {
        "state": "CO",
        "airport_count": 425,
        "by_county": [
          {
            "county": "WELD",
            "airport_count": 40
          },
          {
            "county": "EL PASO",
            "airport_count": 26
          },
          {
            "county": "ADAMS",
            "airport_count": 23
          },
          {
            "county": "JEFFERSON",
            "airport_count": 21
          }
        ]
      },
      {
        "state": "MT",
        "airport_count": 259,
        "by_county": [
          {
            "county": "FLATHEAD",
            "airport_count": 20
          },
          {
            "county": "LEWIS AND CLARK",
            "airport_count": 16
          },
          {
            "county": "MISSOULA",
            "airport_count": 14
          },
          {
            "county": "GALLATIN",
            "airport_count": 11
          }
        ]
      },
      {
        "state": "ID",
        "airport_count": 238,
        "by_county": [
          {
            "county": "VALLEY",
            "airport_count": 27
          },
          {
            "county": "KOOTENAI",
            "airport_count": 21
          },
          {
            "county": "IDAHO",
            "airport_count": 18
          },
          {
            "county": "BONNER",
            "airport_count": 18
          }
        ]
      },
      {
        "state": "UT",
        "airport_count": 140,
        "by_county": [
          {
            "county": "SALT LAKE",
            "airport_count": 21
          },
          {
            "county": "SAN JUAN",
            "airport_count": 13
          },
          {
            "county": "UTAH",
            "airport_count": 10
          },
          {
            "county": "IRON",
            "airport_count": 9
          }
        ]
      },
      {
        "state": "WY",
        "airport_count": 115,
        "by_county": [
          {
            "county": "LARAMIE",
            "airport_count": 12
          },
          {
            "county": "CAMPBELL",
            "airport_count": 9
          },
          {
            "county": "PARK",
            "airport_count": 9
          },
          {
            "county": "CARBON",
            "airport_count": 9
          }
        ]
      }
    ],
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1524
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 527
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 25
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 1
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."faa_region" as "faa_region__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set IN (1,2) THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1",
    CASE WHEN group_set=2 THEN
      airports3."county"
      END as "county__2",
    CASE WHEN group_set=2 THEN
      COUNT( 1)
      END as "airport_count__2",
    CASE WHEN group_set=3 THEN
      airports3."fac_type"
      END as "fac_type__3",
    CASE WHEN group_set=3 THEN
      COUNT( 1)
      END as "airport_count__3"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,3,1)) as group_set  ) as group_set
  GROUP BY 1,2,4,6,8
)
, __stage1 AS (
  SELECT 
    CASE WHEN group_set=2 THEN 1  ELSE group_set END as group_set,
    "faa_region__0" as "faa_region__0",
    FIRST("airport_count__0") FILTER (WHERE "airport_count__0" IS NOT NULL) as "airport_count__0",
    CASE WHEN group_set IN (1,2) THEN
      "state__1"
      END as "state__1",
    FIRST("airport_count__1") FILTER (WHERE "airport_count__1" IS NOT NULL) as "airport_count__1",
    COALESCE(LIST({
      "county": "county__2", 
      "airport_count": "airport_count__2"}  ORDER BY  "airport_count__2" desc NULLS LAST) FILTER (WHERE group_set=2)[1:4],[]) as "by_county__1",
    CASE WHEN group_set=3 THEN
      "fac_type__3"
      END as "fac_type__3",
    FIRST("airport_count__3") FILTER (WHERE "airport_count__3" IS NOT NULL) as "airport_count__3"
  FROM __stage0
  GROUP BY 1,2,4,7
)
SELECT
  "faa_region__0" as "faa_region",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1", 
    "by_county": "by_county__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_state_and_county",
  COALESCE(LIST({
    "fac_type": "fac_type__3", 
    "airport_count": "airport_count__3"}  ORDER BY  "airport_count__3" desc NULLS LAST) FILTER (WHERE group_set=3),[]) as "by_facility_type"
FROM __stage1
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Refining a View

The gesture + { refinements } allows you to base a new view on an existing view while adding new refinements to the query terms.

For example we can add a limit and an order by to by_state

document
run: airports2 -> by_state + {
  order_by: state desc    // <-- add order by to query
  limit: 2
}
QUERY RESULTS
stateairport_‚Äčcount
WY115
WV116
[
  {
    "state": "WY",
    "airport_count": 115
  },
  {
    "state": "WV",
    "airport_count": 116
  }
]
SELECT 
   airports2."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports2
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2

is the same as

document
run: airports -> {
  group_by: state
  aggregate: airport_count
  order_by: state desc
  limit: 2
}
QUERY RESULTS
stateairport_‚Äčcount
WY115
WV116
[
  {
    "state": "WY",
    "airport_count": 115
  },
  {
    "state": "WV",
    "airport_count": 116
  }
]
SELECT 
   airports."state" as "state",
   COUNT( 1) as "airport_count"
FROM '../data/airports.parquet' as airports
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2

You can add a measure or dimension

document
run: airports3 -> by_facility_type + {
  aggregate: avg_elevation
}
QUERY RESULTS
fac_‚Äčtypeairport_‚Äčcountavg_‚Äčelevation
AIRPORT13,9251,237.044
HELIPORT5,135950.513
SEAPLANE BASE473488.822
ULTRALIGHT125806.144
STOLPORT861,375.047
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 13925,
    "avg_elevation": 1237.0441651705567
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 5135,
    "avg_elevation": 950.5125608568646
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 473,
    "avg_elevation": 488.82241014799155
  },
  {
    "fac_type": "ULTRALIGHT",
    "airport_count": 125,
    "avg_elevation": 806.144
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 86,
    "avg_elevation": 1375.046511627907
  }
]
SELECT 
   airports3."fac_type" as "fac_type",
   COUNT( 1) as "airport_count",
   AVG(airports3."elevation") as "avg_elevation"
FROM '../data/airports.parquet' as airports3
GROUP BY 1
ORDER BY 2 desc NULLS LAST

You can nest another view

document
run: airports3 -> top_5_states + {
  nest: by_facility_type
}
QUERY RESULTS
stateairport_‚Äčcountby_‚Äčfacility_‚Äčtype
TX1,845
fac_‚Äčtypeairport_‚Äčcount
AIRPORT1,389
HELIPORT435
STOLPORT8
ULTRALIGHT8
GLIDERPORT5
CA984
fac_‚Äčtypeairport_‚Äčcount
AIRPORT569
HELIPORT396
SEAPLANE BASE12
GLIDERPORT3
STOLPORT2
ULTRALIGHT2
IL890
fac_‚Äčtypeairport_‚Äčcount
AIRPORT625
HELIPORT245
SEAPLANE BASE8
ULTRALIGHT6
BALLOONPORT2
GLIDERPORT2
STOLPORT2
FL856
fac_‚Äčtypeairport_‚Äčcount
AIRPORT511
HELIPORT280
SEAPLANE BASE43
STOLPORT13
ULTRALIGHT5
GLIDERPORT4
PA804
fac_‚Äčtypeairport_‚Äčcount
AIRPORT468
HELIPORT307
ULTRALIGHT13
SEAPLANE BASE10
STOLPORT3
GLIDERPORT3
[
  {
    "state": "TX",
    "airport_count": 1845,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 1389
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 435
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 8
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 5
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 984,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 569
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 396
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 12
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "IL",
    "airport_count": 890,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 625
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 245
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 8
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 6
      },
      {
        "fac_type": "BALLOONPORT",
        "airport_count": 2
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 2
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 2
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 856,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 511
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 280
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 43
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 13
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 5
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 4
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 804,
    "by_facility_type": [
      {
        "fac_type": "AIRPORT",
        "airport_count": 468
      },
      {
        "fac_type": "HELIPORT",
        "airport_count": 307
      },
      {
        "fac_type": "ULTRALIGHT",
        "airport_count": 13
      },
      {
        "fac_type": "SEAPLANE BASE",
        "airport_count": 10
      },
      {
        "fac_type": "STOLPORT",
        "airport_count": 3
      },
      {
        "fac_type": "GLIDERPORT",
        "airport_count": 3
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."fac_type"
      END as "fac_type__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "fac_type": "fac_type__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1),[]) as "by_facility_type"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 5

Changing the inner and outer query in the example above reveals very different information.

document
run: airports3 -> by_facility_type + {
  nest: top_5_states
}
QUERY RESULTS
fac_‚Äčtypeairport_‚Äčcounttop_‚Äč5_‚Äčstates
AIRPORT13,925
stateairport_‚Äčcount
TX1,389
IL625
CA569
OH537
FL511
HELIPORT5,135
stateairport_‚Äčcount
TX435
CA396
PA307
FL280
NJ247
SEAPLANE BASE473
stateairport_‚Äčcount
AK104
MN72
FL43
ME38
NY23
ULTRALIGHT125
stateairport_‚Äčcount
LA18
IN17
PA13
TX8
AZ7
STOLPORT86
stateairport_‚Äčcount
FL13
TN9
TX8
CO6
GA4
[
  {
    "fac_type": "AIRPORT",
    "airport_count": 13925,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 1389
      },
      {
        "state": "IL",
        "airport_count": 625
      },
      {
        "state": "CA",
        "airport_count": 569
      },
      {
        "state": "OH",
        "airport_count": 537
      },
      {
        "state": "FL",
        "airport_count": 511
      }
    ]
  },
  {
    "fac_type": "HELIPORT",
    "airport_count": 5135,
    "top_5_states": [
      {
        "state": "TX",
        "airport_count": 435
      },
      {
        "state": "CA",
        "airport_count": 396
      },
      {
        "state": "PA",
        "airport_count": 307
      },
      {
        "state": "FL",
        "airport_count": 280
      },
      {
        "state": "NJ",
        "airport_count": 247
      }
    ]
  },
  {
    "fac_type": "SEAPLANE BASE",
    "airport_count": 473,
    "top_5_states": [
      {
        "state": "AK",
        "airport_count": 104
      },
      {
        "state": "MN",
        "airport_count": 72
      },
      {
        "state": "FL",
        "airport_count": 43
      },
      {
        "state": "ME",
        "airport_count": 38
      },
      {
        "state": "NY",
        "airport_count": 23
      }
    ]
  },
  {
    "fac_type": "ULTRALIGHT",
    "airport_count": 125,
    "top_5_states": [
      {
        "state": "LA",
        "airport_count": 18
      },
      {
        "state": "IN",
        "airport_count": 17
      },
      {
        "state": "PA",
        "airport_count": 13
      },
      {
        "state": "TX",
        "airport_count": 8
      },
      {
        "state": "AZ",
        "airport_count": 7
      }
    ]
  },
  {
    "fac_type": "STOLPORT",
    "airport_count": 86,
    "top_5_states": [
      {
        "state": "FL",
        "airport_count": 13
      },
      {
        "state": "TN",
        "airport_count": 9
      },
      {
        "state": "TX",
        "airport_count": 8
      },
      {
        "state": "CO",
        "airport_count": 6
      },
      {
        "state": "GA",
        "airport_count": 4
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports3."fac_type" as "fac_type__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports3."state"
      END as "state__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports3
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "fac_type__0" as "fac_type",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "state": "state__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:5],[]) as "top_5_states"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Joining

First let's model some simple tables... (Join Documentation)

Carrier table

simple source declaration used in example below

document
source: carriers is duckdb.table('../data/carriers.parquet') extend {
  measure: carrier_count is count()
}

run: carriers -> {
  select: *
}
QUERY RESULTS
codenamenickname
EVAtlantic Southeast AirlinesAtlantic Southeast
NWNorthwest AirlinesNorthwest
AAAmerican AirlinesAmerican
FLAirtran Airways CorporationAirtran
B6Jetblue AirwaysJetblue
[
  {
    "code": "EV",
    "name": "Atlantic Southeast Airlines",
    "nickname": "Atlantic Southeast"
  },
  {
    "code": "NW",
    "name": "Northwest Airlines",
    "nickname": "Northwest"
  },
  {
    "code": "AA",
    "name": "American Airlines",
    "nickname": "American"
  },
  {
    "code": "FL",
    "name": "Airtran Airways Corporation",
    "nickname": "Airtran"
  },
  {
    "code": "B6",
    "name": "Jetblue Airways",
    "nickname": "Jetblue"
  }
]
SELECT 
   carriers."code" as "code",
   carriers."name" as "name",
   carriers."nickname" as "nickname"
FROM '../data/carriers.parquet' as carriers

Flights table

simple source declaration used in example below

document
source: flights is duckdb.table('../data/flights.parquet') extend {
  measure: flight_count is count()
}

run: flights -> {
  select: id2, tail_num, dep_time, carrier, origin, destination, distance, dep_delay
  limit: 10
}
QUERY RESULTS
id2tail_‚Äčnumdep_‚Äčtimecarrierorigindestinationdistancedep_‚Äčdelay
30,272,525N806MD2004-11-18 22:32:00USPHLABE55-3
29,742,442N806MD2004-10-12 20:46:00USPHLABE556
30,270,885N816MA2004-11-24 10:20:00USPHLABE550
28,344,746N806MD2004-08-31 20:30:00USPHLABE550
27,898,410N806MD2004-07-27 10:21:00USPHLABE55-4
[
  {
    "id2": 30272525,
    "tail_num": "N806MD",
    "dep_time": "2004-11-18T22:32:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": -3
  },
  {
    "id2": 29742442,
    "tail_num": "N806MD",
    "dep_time": "2004-10-12T20:46:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 6
  },
  {
    "id2": 30270885,
    "tail_num": "N816MA",
    "dep_time": "2004-11-24T10:20:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 0
  },
  {
    "id2": 28344746,
    "tail_num": "N806MD",
    "dep_time": "2004-08-31T20:30:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": 0
  },
  {
    "id2": 27898410,
    "tail_num": "N806MD",
    "dep_time": "2004-07-27T10:21:00.000Z",
    "carrier": "US",
    "origin": "PHL",
    "destination": "ABE",
    "distance": 55,
    "dep_delay": -4
  }
]
SELECT 
   flights."id2" as "id2",
   flights."tail_num" as "tail_num",
   flights."dep_time" as "dep_time",
   flights."carrier" as "carrier",
   flights."origin" as "origin",
   flights."destination" as "destination",
   flights."distance" as "distance",
   flights."dep_delay" as "dep_delay"
FROM '../data/flights.parquet' as flights
LIMIT 10

Declare a Join

Join carriers to flights. Each flight has one carrier so we use join_one:. (Join Documentation)

document
source: carriers2 is duckdb.table('../data/carriers.parquet') extend {
  measure: carrier_count is count()
}

source: flights2 is duckdb.table('../data/flights.parquet') extend {
  join_one: carriers2 on carrier = carriers2.code

  measure:
    flight_count is count()
    total_distance is distance.sum()
    avg_distance is distance.avg()
}

Query the joined tables

using the above declared flights source

document
run: flights2 -> {
  group_by: carriers2.nickname
  aggregate:
    flight_count
    total_distance
    avg_distance
}
QUERY RESULTS
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
Southwest88,75154,619,152615.42
USAir37,68323,721,642629.505
American34,57737,684,8851,089.883
Northwest33,58033,376,503993.94
United32,75738,882,9341,187.011
[
  {
    "nickname": "Southwest",
    "flight_count": 88751,
    "total_distance": 54619152,
    "avg_distance": 615.4201304774031
  },
  {
    "nickname": "USAir",
    "flight_count": 37683,
    "total_distance": 23721642,
    "avg_distance": 629.5051349414855
  },
  {
    "nickname": "American",
    "flight_count": 34577,
    "total_distance": 37684885,
    "avg_distance": 1089.8830147207682
  },
  {
    "nickname": "Northwest",
    "flight_count": 33580,
    "total_distance": 33376503,
    "avg_distance": 993.9399344848124
  },
  {
    "nickname": "United",
    "flight_count": 32757,
    "total_distance": 38882934,
    "avg_distance": 1187.0114479347926
  }
]
SELECT 
   carriers2_0."nickname" as "nickname",
   COUNT( 1) as "flight_count",
   COALESCE(SUM(flights2."distance"),0) as "total_distance",
   AVG(flights2."distance") as "avg_distance"
FROM '../data/flights.parquet' as flights2
LEFT JOIN '../data/carriers.parquet' AS carriers2_0
  ON flights2."carrier"=carriers2_0."code"
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Aggregates can be computed from anywhere in the Join Tree

(Aggregate Documentation)

using the above declared flights source

document
run: flights2 -> {
  limit: 10
  group_by: origin
  aggregate: carriers2.carrier_count   // <-- calculation in joined table
  nest: top_3_carriers is  {
    limit: 3
    group_by: carriers2.nickname
    aggregate:
      flight_count
      total_distance
      avg_distance
  }
}
QUERY RESULTS
origincarrier_‚Äčcounttop_‚Äč3_‚Äčcarriers
DFW13
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
American8,7428,419,987963.165
American Eagle6,1461,464,011238.205
Delta668563,628843.754
ATL12
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
Delta8,4195,793,004688.087
Atlantic Southeast7,3921,473,933199.396
USAir521257,058493.393
IAH12
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
Continental Express2,8611,634,764571.396
Continental1,9701,699,834862.86
USAir600695,1411,158.568
PHX12
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
Southwest6,4564,842,222750.034
America West3,6962,620,479709.004
United662614,817928.727
SEA12
nicknameflight_‚Äčcounttotal_‚Äčdistanceavg_‚Äčdistance
Alaska3,0302,633,111869.014
Southwest1,4091,064,397755.427
Northwest7511,228,9181,636.375
[
  {
    "origin": "DFW",
    "carrier_count": 13,
    "top_3_carriers": [
      {
        "nickname": "American",
        "flight_count": 8742,
        "total_distance": 8419987,
        "avg_distance": 963.1648364218714
      },
      {
        "nickname": "American Eagle",
        "flight_count": 6146,
        "total_distance": 1464011,
        "avg_distance": 238.20549951187763
      },
      {
        "nickname": "Delta",
        "flight_count": 668,
        "total_distance": 563628,
        "avg_distance": 843.754491017964
      }
    ]
  },
  {
    "origin": "ATL",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Delta",
        "flight_count": 8419,
        "total_distance": 5793004,
        "avg_distance": 688.0869461931346
      },
      {
        "nickname": "Atlantic Southeast",
        "flight_count": 7392,
        "total_distance": 1473933,
        "avg_distance": 199.39569805194805
      },
      {
        "nickname": "USAir",
        "flight_count": 521,
        "total_distance": 257058,
        "avg_distance": 493.3934740882917
      }
    ]
  },
  {
    "origin": "IAH",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Continental Express",
        "flight_count": 2861,
        "total_distance": 1634764,
        "avg_distance": 571.3960153792381
      },
      {
        "nickname": "Continental",
        "flight_count": 1970,
        "total_distance": 1699834,
        "avg_distance": 862.8598984771573
      },
      {
        "nickname": "USAir",
        "flight_count": 600,
        "total_distance": 695141,
        "avg_distance": 1158.5683333333334
      }
    ]
  },
  {
    "origin": "PHX",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Southwest",
        "flight_count": 6456,
        "total_distance": 4842222,
        "avg_distance": 750.0343866171004
      },
      {
        "nickname": "America West",
        "flight_count": 3696,
        "total_distance": 2620479,
        "avg_distance": 709.0040584415584
      },
      {
        "nickname": "United",
        "flight_count": 662,
        "total_distance": 614817,
        "avg_distance": 928.726586102719
      }
    ]
  },
  {
    "origin": "SEA",
    "carrier_count": 12,
    "top_3_carriers": [
      {
        "nickname": "Alaska",
        "flight_count": 3030,
        "total_distance": 2633111,
        "avg_distance": 869.0135313531354
      },
      {
        "nickname": "Southwest",
        "flight_count": 1409,
        "total_distance": 1064397,
        "avg_distance": 755.4272533711852
      },
      {
        "nickname": "Northwest",
        "flight_count": 751,
        "total_distance": 1228918,
        "avg_distance": 1636.375499334221
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    flights2."origin" as "origin__0",
    CASE WHEN group_set=0 THEN
      COUNT(DISTINCT carriers2_0."__distinct_key")
      END as "carrier_count__0",
    CASE WHEN group_set=1 THEN
      carriers2_0."nickname"
      END as "nickname__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "flight_count__1",
    CASE WHEN group_set=1 THEN
      COALESCE(SUM(flights2."distance"),0)
      END as "total_distance__1",
    CASE WHEN group_set=1 THEN
      AVG(flights2."distance")
      END as "avg_distance__1"
  FROM '../data/flights.parquet' as flights2
  LEFT JOIN (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM '../data/carriers.parquet' as x) AS carriers2_0
    ON flights2."carrier"=carriers2_0."code"
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,4
)
SELECT
  "origin__0" as "origin",
  MAX(CASE WHEN group_set=0 THEN "carrier_count__0" END) as "carrier_count",
  COALESCE(LIST({
    "nickname": "nickname__1", 
    "flight_count": "flight_count__1", 
    "total_distance": "total_distance__1", 
    "avg_distance": "avg_distance__1"}  ORDER BY  "flight_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_carriers"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST
LIMIT 10

More Complex Joins

The most common join pattern is a foreign key join. Malloy uses the with: to declare these and generates more efficient SQL when these joins are used.

In the example below, we use a with: join for carriers and then model the more complex relationship with the flights originating from each airport using on:.

Many flights have the same airport as their origin so we use join_many:.

document
source: carriers4 is duckdb.table('../data/carriers.parquet') extend {
  primary_key: code
  measure: carrier_count is count()
}

source: flights4 is duckdb.table('../data/flights.parquet') extend {
  join_one: carriers4 with carrier  // <-- each flight has 1 carrier

  measure:
    flight_count is count()
    total_distance is distance.sum()
    avg_distance is distance.avg()
}

source: airports4 is duckdb.table('../data/airports.parquet') extend {
  join_many: flights4 on code = flights4.origin  // <-- each airport has many flights

  measure: airport_count is count()
  dimension: elevation_in_meters is elevation * 0.3048
  measure: avg_elevation_in_meters is elevation_in_meters.avg()

  view: by_state is {
    group_by: state
    aggregate: airport_count
  }
}

Calculations work properly regardless of where you are in the graph

This query is very difficult to express in SQL. Malloy's understanding of source relationships allows it to compute aggregate computations at any node of the join path, unlike SQL which can only do aggregate computation at the. outermost level. (Aggregate Documentation)

using the above declared airports source

document
run: airports4 ->  {
  group_by: state
  aggregate:
    flights4.carriers4.carrier_count  // <-- 3 levels
    flights4.flight_count
    flights4.total_distance
    airport_count
    avg_elevation_in_meters         // <-- symmetric calculation
}
QUERY RESULTS
statecarrier_‚Äčcountflight_‚Äčcounttotal_‚Äčdistanceairport_‚Äčcountavg_‚Äčelevation_‚Äčin_‚Äčmeters
TX1441,90324,171,1821,845273.799
CA1441,63937,690,414984331.533
LA134,7392,644,46050019.65
IN132,9631,495,747643231.508
OH128,0713,562,086749279.498
[
  {
    "state": "TX",
    "carrier_count": 14,
    "flight_count": 41903,
    "total_distance": 24171182,
    "airport_count": 1845,
    "avg_elevation_in_meters": 273.7994445528455
  },
  {
    "state": "CA",
    "carrier_count": 14,
    "flight_count": 41639,
    "total_distance": 37690414,
    "airport_count": 984,
    "avg_elevation_in_meters": 331.5331902439024
  },
  {
    "state": "LA",
    "carrier_count": 13,
    "flight_count": 4739,
    "total_distance": 2644460,
    "airport_count": 500,
    "avg_elevation_in_meters": 19.650456
  },
  {
    "state": "IN",
    "carrier_count": 13,
    "flight_count": 2963,
    "total_distance": 1495747,
    "airport_count": 643,
    "avg_elevation_in_meters": 231.50816174183515
  },
  {
    "state": "OH",
    "carrier_count": 12,
    "flight_count": 8071,
    "total_distance": 3562086,
    "airport_count": 749,
    "avg_elevation_in_meters": 279.4975305740988
  }
]
SELECT 
   airports4."state" as "state",
   COUNT(DISTINCT carriers4_0."code") as "carrier_count",
   COUNT( 1) as "flight_count",
   COALESCE(SUM(flights4_0."distance"),0) as "total_distance",
   COUNT(DISTINCT airports4."__distinct_key") as "airport_count",
   (
        SELECT AVG(a.val) as value
        FROM (
          SELECT UNNEST(list(distinct {key:airports4."__distinct_key", val: (airports4."elevation"*0.3048)})) a
        )
      ) as "avg_elevation_in_meters"
FROM (SELECT GEN_RANDOM_UUID() as __distinct_key, x.*  FROM '../data/airports.parquet' as x) as airports4
LEFT JOIN '../data/flights.parquet' AS flights4_0
  ON airports4."code"=flights4_0."origin"
LEFT JOIN '../data/carriers.parquet' AS carriers4_0
  ON carriers4_0."code"=flights4_0."carrier"
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Pipelines

The output of a query can be used as the source for the next query.

Assume the following query as a starting point.

document
source: airports5 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
}

run: airports5 -> {
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate: airport_count
  nest: top_3_county is {
    limit: 3
    group_by: county
    aggregate: airport_count
  }
}
QUERY RESULTS
stateairport_‚Äčcounttop_‚Äč3_‚Äčcounty
TX435
countyairport_‚Äčcount
HARRIS110
TARRANT35
DALLAS32
CA396
countyairport_‚Äčcount
LOS ANGELES151
ORANGE47
SAN BERNARDINO24
PA307
countyairport_‚Äčcount
MONTGOMERY29
PHILADELPHIA22
ALLEGHENY22
FL280
countyairport_‚Äčcount
PALM BEACH30
DADE27
ORANGE24
NJ247
countyairport_‚Äčcount
MIDDLESEX26
MONMOUTH23
SOMERSET23
[
  {
    "state": "TX",
    "airport_count": 435,
    "top_3_county": [
      {
        "county": "HARRIS",
        "airport_count": 110
      },
      {
        "county": "TARRANT",
        "airport_count": 35
      },
      {
        "county": "DALLAS",
        "airport_count": 32
      }
    ]
  },
  {
    "state": "CA",
    "airport_count": 396,
    "top_3_county": [
      {
        "county": "LOS ANGELES",
        "airport_count": 151
      },
      {
        "county": "ORANGE",
        "airport_count": 47
      },
      {
        "county": "SAN BERNARDINO",
        "airport_count": 24
      }
    ]
  },
  {
    "state": "PA",
    "airport_count": 307,
    "top_3_county": [
      {
        "county": "MONTGOMERY",
        "airport_count": 29
      },
      {
        "county": "PHILADELPHIA",
        "airport_count": 22
      },
      {
        "county": "ALLEGHENY",
        "airport_count": 22
      }
    ]
  },
  {
    "state": "FL",
    "airport_count": 280,
    "top_3_county": [
      {
        "county": "PALM BEACH",
        "airport_count": 30
      },
      {
        "county": "DADE",
        "airport_count": 27
      },
      {
        "county": "ORANGE",
        "airport_count": 24
      }
    ]
  },
  {
    "state": "NJ",
    "airport_count": 247,
    "top_3_county": [
      {
        "county": "MIDDLESEX",
        "airport_count": 26
      },
      {
        "county": "MONMOUTH",
        "airport_count": 23
      },
      {
        "county": "SOMERSET",
        "airport_count": 23
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports5."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports5."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports5
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports5."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
SELECT
  "state__0" as "state",
  MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
  COALESCE(LIST({
    "county": "county__1", 
    "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
FROM __stage0
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Un-nesting in a pipeline flattens the table

Queries can be chained together (pipelined), the output of one becoming the input of the next one, by simply adding another -> operator and a new query definition.

document
source: airports6 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
}

run: airports6 -> {
  where: fac_type = 'HELIPORT'
  group_by: state
  aggregate: airport_count
  nest: top_3_county is {
    limit: 3
    group_by: county
    aggregate: airport_count
  }
} -> {
  select:
    state
    top_3_county.county
    airports_in_state is airport_count
    airports_in_county is top_3_county.airport_count
    percent_of_state is top_3_county.airport_count / airport_count
}
QUERY RESULTS
statecountyairports_‚Äčin_‚Äčstateairports_‚Äčin_‚Äčcountypercent_‚Äčof_‚Äčstate
TXDALLAS435320.074
CASAN BERNARDINO396240.061
PAALLEGHENY307220.072
FLORANGE280240.086
NJSOMERSET247230.093
[
  {
    "state": "TX",
    "county": "DALLAS",
    "airports_in_state": 435,
    "airports_in_county": 32,
    "percent_of_state": 0.0735632183908046
  },
  {
    "state": "CA",
    "county": "SAN BERNARDINO",
    "airports_in_state": 396,
    "airports_in_county": 24,
    "percent_of_state": 0.06060606060606061
  },
  {
    "state": "PA",
    "county": "ALLEGHENY",
    "airports_in_state": 307,
    "airports_in_county": 22,
    "percent_of_state": 0.07166123778501629
  },
  {
    "state": "FL",
    "county": "ORANGE",
    "airports_in_state": 280,
    "airports_in_county": 24,
    "percent_of_state": 0.08571428571428572
  },
  {
    "state": "NJ",
    "county": "SOMERSET",
    "airports_in_state": 247,
    "airports_in_county": 23,
    "percent_of_state": 0.0931174089068826
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports6."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports6."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports6
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports6."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
, __stage1 AS (
  SELECT
    "state__0" as "state",
    MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
    COALESCE(LIST({
      "county": "county__1", 
      "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
  FROM __stage0
  GROUP BY 1
  ORDER BY 2 desc NULLS LAST
)
SELECT 
   base."state" as "state",
   base.top_3_county[top_3_county_0.__row_id]."county" as "county",
   base."airport_count" as "airports_in_state",
   base.top_3_county[top_3_county_0.__row_id]."airport_count" as "airports_in_county",
   base.top_3_county[top_3_county_0.__row_id]."airport_count"*1.0/base."airport_count" as "percent_of_state"
FROM __stage1 as base
LEFT JOIN (select UNNEST(generate_series(1,
        array_length(base."top_3_county"),
        1)) as __row_id) as top_3_county_0 ON  top_3_county_0.__row_id <= array_length(base."top_3_county")

Pipelines can be named as views in sources

Pipelines can do pretty complex things. They can be built into source objects.

document
source: airports7 is duckdb.table('../data/airports.parquet') extend {
  measure: airport_count is count()
  view: county_rollup is {
    where: fac_type = 'HELIPORT'
    group_by: state
    aggregate: airport_count
    nest: top_3_county is {
      limit: 3
      group_by: county
      aggregate: airport_count
    }
  } -> {
    select:
      state
      top_3_county.county
      airports_in_state is airport_count
      airports_in_county is top_3_county.airport_count
      percent_of_state is top_3_county.airport_count/airport_count
  }
}

run: airports7 -> county_rollup
QUERY RESULTS
statecountyairports_‚Äčin_‚Äčstateairports_‚Äčin_‚Äčcountypercent_‚Äčof_‚Äčstate
TXDALLAS435320.074
CASAN BERNARDINO396240.061
PAALLEGHENY307220.072
FLORANGE280240.086
NJSOMERSET247230.093
[
  {
    "state": "TX",
    "county": "DALLAS",
    "airports_in_state": 435,
    "airports_in_county": 32,
    "percent_of_state": 0.0735632183908046
  },
  {
    "state": "CA",
    "county": "SAN BERNARDINO",
    "airports_in_state": 396,
    "airports_in_county": 24,
    "percent_of_state": 0.06060606060606061
  },
  {
    "state": "PA",
    "county": "ALLEGHENY",
    "airports_in_state": 307,
    "airports_in_county": 22,
    "percent_of_state": 0.07166123778501629
  },
  {
    "state": "FL",
    "county": "ORANGE",
    "airports_in_state": 280,
    "airports_in_county": 24,
    "percent_of_state": 0.08571428571428572
  },
  {
    "state": "NJ",
    "county": "SOMERSET",
    "airports_in_state": 247,
    "airports_in_county": 23,
    "percent_of_state": 0.0931174089068826
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    airports7."state" as "state__0",
    CASE WHEN group_set=0 THEN
      COUNT( 1)
      END as "airport_count__0",
    CASE WHEN group_set=1 THEN
      airports7."county"
      END as "county__1",
    CASE WHEN group_set=1 THEN
      COUNT( 1)
      END as "airport_count__1"
  FROM '../data/airports.parquet' as airports7
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  WHERE airports7."fac_type"='HELIPORT'
  GROUP BY 1,2,4
)
, __stage1 AS (
  SELECT
    "state__0" as "state",
    MAX(CASE WHEN group_set=0 THEN "airport_count__0" END) as "airport_count",
    COALESCE(LIST({
      "county": "county__1", 
      "airport_count": "airport_count__1"}  ORDER BY  "airport_count__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:3],[]) as "top_3_county"
  FROM __stage0
  GROUP BY 1
  ORDER BY 2 desc NULLS LAST
)
SELECT 
   base."state" as "state",
   base.top_3_county[top_3_county_0.__row_id]."county" as "county",
   base."airport_count" as "airports_in_state",
   base.top_3_county[top_3_county_0.__row_id]."airport_count" as "airports_in_county",
   base.top_3_county[top_3_county_0.__row_id]."airport_count"*1.0/base."airport_count" as "percent_of_state"
FROM __stage1 as base
LEFT JOIN (select UNNEST(generate_series(1,
        array_length(base."top_3_county"),
        1)) as __row_id) as top_3_county_0 ON  top_3_county_0.__row_id <= array_length(base."top_3_county")

Extending Sources

Similar to query refinement, a source can be extended with the extend { ... } gesture to create a new version of the source with additional properties.

source: new name is old name extend {
  where: some data limit
  measure: new field declarations
}

Sources based on Queries

Sometimes it is helpful to pre-process data before modeling it out with a source. To do this, you can first define a model-level query using a query statement, then extend the query to add reusable definitions.

Model-Level Queries

Here, we define a model-level query q_airport_facts based on the flights.parquet table.

document
query: q_airport_facts is duckdb.table('../data/flights.parquet') -> {
  group_by:
    flight_year is dep_time.year
    origin
    carrier
  aggregate:
    num_flights is count()
    distance is distance.sum()
}

run: q_airport_facts
query: q_airport_facts2 is duckdb.table('../data/flights.parquet') -> {
  group_by:
    flight_year is dep_time.year
    origin
    carrier
  aggregate:
    num_flights is count()
    distance is distance.sum()
}
QUERY RESULTS
flight_‚Äčyearorigincarriernum_‚Äčflightsdistance
2005MIAUA4054,336
2005ATLUA3847,209
2005CLEUA64,551
2005SJCUA3353,521
2005DCAUA4433,840
[
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "MIA",
    "carrier": "UA",
    "num_flights": 40,
    "distance": 54336
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "ATL",
    "carrier": "UA",
    "num_flights": 38,
    "distance": 47209
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "CLE",
    "carrier": "UA",
    "num_flights": 6,
    "distance": 4551
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "SJC",
    "carrier": "UA",
    "num_flights": 33,
    "distance": 53521
  },
  {
    "flight_year": "2005-01-01T00:00:00.000Z",
    "origin": "DCA",
    "carrier": "UA",
    "num_flights": 44,
    "distance": 33840
  }
]
SELECT 
   DATE_TRUNC('year', base."dep_time") as "flight_year",
   base."origin" as "origin",
   base."carrier" as "carrier",
   COUNT( 1) as "num_flights",
   COALESCE(SUM(base."distance"),0) as "distance"
FROM '../data/flights.parquet' as base
GROUP BY 1,2,3
ORDER BY 1 desc NULLS LAST

Source based on a query

Next, we can define a source based on q_airport_facts to add reusable fields and source-level queries.

document
source: airport_facts is q_airport_facts extend {
  measure: flight_count is num_flights.sum()
  measure: total_distance is distance.sum()

  view: flights_by_year is {
    group_by: flight_year
    aggregate:
      flight_count
      carrier_count is count(carrier)
      origin_count is count(origin)
  }

  view: flights_by_origin is {
    group_by: origin
    aggregate:
      flight_count
      carrier_count is count(carrier)
  }
}

Querying the Summary source

document
run: airport_facts -> flights_by_origin
QUERY RESULTS
originflight_‚Äčcountcarrier_‚Äčcount
ATL17,87512
DFW17,78213
ORD14,21410
PHX12,47612
LAS11,09610
[
  {
    "origin": "ATL",
    "flight_count": 17875,
    "carrier_count": 12
  },
  {
    "origin": "DFW",
    "flight_count": 17782,
    "carrier_count": 13
  },
  {
    "origin": "ORD",
    "flight_count": 14214,
    "carrier_count": 10
  },
  {
    "origin": "PHX",
    "flight_count": 12476,
    "carrier_count": 12
  },
  {
    "origin": "LAS",
    "flight_count": 11096,
    "carrier_count": 10
  }
]
WITH __stage0 AS (
  SELECT 
     DATE_TRUNC('year', base."dep_time") as "flight_year",
     base."origin" as "origin",
     base."carrier" as "carrier",
     COUNT( 1) as "num_flights",
     COALESCE(SUM(base."distance"),0) as "distance"
  FROM '../data/flights.parquet' as base
  GROUP BY 1,2,3
  ORDER BY 1 desc NULLS LAST
)
SELECT 
   airport_facts."origin" as "origin",
   COALESCE(SUM(airport_facts."num_flights"),0) as "flight_count",
   count(distinct airport_facts."carrier") as "carrier_count"
FROM __stage0 as airport_facts
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Other Interesting Language Features:

SQL Sources

See the SQL Sources section.

Embedding Malloy queries in SQL (SQL Block Documentation)

Case statement improved with pick (Expression Documentation)

Group by on Joined Subtrees

Date/Timestamp filters and Timezones (Time Documentation)

Nested data and Symmetric aggregates (Aggregates Documentation)

Import (Import Documentation)

Data styles and rendering (Rendering Documentation)