Malloy Documentation
search

Malloy has a special query operator that builds _Dimensional Search Indexes for sources. A Dimensional Search Index is a table with 4 columns. Dimensional indexes are useful for a variety of things including filtering suggestions and LLMs.

When filtering data, you might know a term, but not necessarily which column in the one of the join data contains it. Indexing the data on field names and high cardinality fields let's you qucikly find the term and the associated value.

Indexing could be used by LLMs to find the interesting column/term mapping in the data set.

  • fieldName - The path to the column in the source

  • fieldValue - The dimensional value for the field (or range if fieldType is not a string)

  • fieldType - Type type of the column.

  • weight - a weighting to use as to the importance the distinct dimensional value. Defaults to cardinality of the field.

Simple Example

We're going to take the airports table and index it. The results are an un ordered list of distinct fieldName/fieldValue pairs appear in the table. The weight, in this case is the number of rows that partciular occurs on.

document
run: duckdb.table('../data/airports.parquet') -> {
  index: *
}
QUERY RESULTS
fieldNamefieldValuefieldTypeweight
codeZ53string1
code0AK8string1
codeIGTstring1
code71AKstring1
codeKGXstring1
code0AKstring1
code1AK2string1
codeZ09string1
codeILIstring1
code3Z5string1
code5KOstring1
codeMYUstring1
codeWRGstring1
codeAL05string1
codeAL33string1
code0AL2string1
code3AL2string1
codeAL52string1
code8A1string1
code60Astring1
code2AR9string1
codeAR51string1
code3AR7string1
code49Estring1
code33Estring1
codeAZ90string1
codeAZCstring1
code0AZ6string1
codeCFTstring1
codeSDLstring1
codeO62string1
code6L2string1
codeCL52string1
code0CA7string1
codeCL26string1
codeCA58string1
code3L7string1
codeCL61string1
codeL03string1
codeGNUstring1
codeCLPstring1
code5AR7string1
code0Y3string1
code2AR3string1
codeAR01string1
code7E1string1
code95CAstring1
codeCL80string1
code3CL4string1
codeCA94string1
codeQ69string1
codeAK06string1
site_number51516.21*Astring1
site_number50714.44*Astring1
site_number50711.*Astring1
site_number50199.6*Cstring1
site_number50181.7*Astring1
site_number50036.03*Astring1
site_number50692.*Astring1
site_number50340.8*Astring1
site_number50920.*Astring1
site_number50738.*Astring1
site_number00302.01*Hstring1
site_number00310.1*Astring1
site_number00536.*Cstring1
site_number00374.1*Astring1
site_number00267.*Astring1
site_number00329.*Astring1
site_number01101.1*Astring1
site_number01156.02*Astring1
site_number01016.2*Astring1
site_number00937.01*Hstring1
site_number01060.01*Astring1
site_number01156.01*Hstring1
site_number01104.4*Astring1
site_number00742.12*Hstring1
site_number00812.06*Hstring1
site_number00802.4*Ustring1
site_number01796.*Astring1
site_number01696.01*Astring1
site_number01280.8*Hstring1
site_number01818.4*Hstring1
site_number01767.12*Astring1
site_number01340.281*Astring1
site_number01625.1*Hstring1
site_number50672.01*Astring1
site_number50033.5*Astring1
site_number50221.3*Cstring1
site_number50216.31*Astring1
site_number50130.1*Hstring1
site_number00812.08*Hstring1
site_number00751.18*Hstring1
site_number00825.21*Astring1
site_number02422.01*Astring1
site_number02352.34*Hstring1
site_number50725.*Astring1
site_number00642.9*Astring1
site_number50215.1*Astring1
site_number01975.11*Hstring1
site_number01126.12*Astring1
[
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "Z53",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0AK8",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "IGT",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "71AK",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "KGX",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0AK",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "1AK2",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "Z09",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "ILI",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "3Z5",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "5KO",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "MYU",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "WRG",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AL05",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AL33",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0AL2",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "3AL2",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AL52",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "8A1",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "60A",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "2AR9",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AR51",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "3AR7",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "49E",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "33E",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AZ90",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AZC",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0AZ6",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CFT",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "SDL",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "O62",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "6L2",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CL52",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0CA7",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CL26",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CA58",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "3L7",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CL61",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "L03",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "GNU",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CLP",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "5AR7",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "0Y3",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "2AR3",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AR01",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "7E1",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "95CA",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CL80",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "3CL4",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "CA94",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "Q69",
    "weight": 1
  },
  {
    "fieldName": "code",
    "fieldType": "string",
    "fieldValue": "AK06",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "51516.21*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50714.44*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50711.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50199.6*C",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50181.7*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50036.03*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50692.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50340.8*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50920.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50738.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00302.01*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00310.1*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00536.*C",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00374.1*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00267.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00329.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01101.1*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01156.02*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01016.2*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00937.01*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01060.01*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01156.01*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01104.4*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00742.12*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00812.06*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00802.4*U",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01796.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01696.01*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01280.8*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01818.4*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01767.12*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01340.281*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01625.1*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50672.01*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50033.5*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50221.3*C",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50216.31*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50130.1*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00812.08*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00751.18*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00825.21*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "02422.01*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "02352.34*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50725.*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "00642.9*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "50215.1*A",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01975.11*H",
    "weight": 1
  },
  {
    "fieldName": "site_number",
    "fieldType": "string",
    "fieldValue": "01126.12*A",
    "weight": 1
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE group_set
      WHEN 0 THEN 'id'
      WHEN 1 THEN 'code'
      WHEN 2 THEN 'site_number'
      WHEN 3 THEN 'fac_type'
      WHEN 4 THEN 'fac_use'
      WHEN 5 THEN 'faa_region'
      WHEN 6 THEN 'faa_dist'
      WHEN 7 THEN 'city'
      WHEN 8 THEN 'county'
      WHEN 9 THEN 'state'
      WHEN 10 THEN 'full_name'
      WHEN 11 THEN 'own_type'
      WHEN 12 THEN 'longitude'
      WHEN 13 THEN 'latitude'
      WHEN 14 THEN 'elevation'
      WHEN 15 THEN 'aero_cht'
      WHEN 16 THEN 'cbd_dist'
      WHEN 17 THEN 'cbd_dir'
      WHEN 18 THEN 'act_date'
      WHEN 19 THEN 'cert'
      WHEN 20 THEN 'fed_agree'
      WHEN 21 THEN 'cust_intl'
      WHEN 22 THEN 'c_ldg_rts'
      WHEN 23 THEN 'joint_use'
      WHEN 24 THEN 'mil_rts'
      WHEN 25 THEN 'cntl_twr'
      WHEN 26 THEN 'major'
    END as "fieldName",  CASE group_set
      WHEN 0 THEN 'number'
      WHEN 1 THEN 'string'
      WHEN 2 THEN 'string'
      WHEN 3 THEN 'string'
      WHEN 4 THEN 'string'
      WHEN 5 THEN 'string'
      WHEN 6 THEN 'string'
      WHEN 7 THEN 'string'
      WHEN 8 THEN 'string'
      WHEN 9 THEN 'string'
      WHEN 10 THEN 'string'
      WHEN 11 THEN 'string'
      WHEN 12 THEN 'number'
      WHEN 13 THEN 'number'
      WHEN 14 THEN 'number'
      WHEN 15 THEN 'string'
      WHEN 16 THEN 'number'
      WHEN 17 THEN 'string'
      WHEN 18 THEN 'string'
      WHEN 19 THEN 'string'
      WHEN 20 THEN 'string'
      WHEN 21 THEN 'string'
      WHEN 22 THEN 'string'
      WHEN 23 THEN 'string'
      WHEN 24 THEN 'string'
      WHEN 25 THEN 'string'
      WHEN 26 THEN 'string'
    END as "fieldType",  CASE group_set WHEN 99999 THEN NULL
      WHEN 1 THEN base."code"
      WHEN 2 THEN base."site_number"
      WHEN 3 THEN base."fac_type"
      WHEN 4 THEN base."fac_use"
      WHEN 5 THEN base."faa_region"
      WHEN 6 THEN base."faa_dist"
      WHEN 7 THEN base."city"
      WHEN 8 THEN base."county"
      WHEN 9 THEN base."state"
      WHEN 10 THEN base."full_name"
      WHEN 11 THEN base."own_type"
      WHEN 15 THEN base."aero_cht"
      WHEN 17 THEN base."cbd_dir"
      WHEN 18 THEN base."act_date"
      WHEN 19 THEN base."cert"
      WHEN 20 THEN base."fed_agree"
      WHEN 21 THEN base."cust_intl"
      WHEN 22 THEN base."c_ldg_rts"
      WHEN 23 THEN base."joint_use"
      WHEN 24 THEN base."mil_rts"
      WHEN 25 THEN base."cntl_twr"
      WHEN 26 THEN base."major"
    END as "fieldValue",
   COUNT(*) as weight,
    CASE group_set
      WHEN 99999 THEN ''    WHEN 0 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR)
      WHEN 12 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR)
      WHEN 13 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR)
      WHEN 14 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR)
      WHEN 16 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR)
    END as "fieldRange"
  FROM '../data/airports.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
SELECT
  "fieldName",
  "fieldType",
  COALESCE("fieldValue", "fieldRange") as "fieldValue",
  weight
FROM __stage0

Add Ordering

Adding a second query stage to filter on string columns and ordering by weight descending shows us the most common fieldName/fieldValue pairs in the dataset.

All Malloy queries run as a single SQL query. The index: operator is no different. Click the SQL tab to see how this works.

document
run: duckdb.table('../data/airports.parquet') -> {
  index: *
} -> {
  where: fieldType = 'string'
  select: *
  order_by: weight desc
}
QUERY RESULTS
fieldNamefieldValuefieldTypeweight
majorNstring19,523
certstring19,142
cntl_twrNstring19,124
fed_agreestring16,252
cust_intlstring15,145
c_ldg_rtsstring15,145
joint_usestring14,804
mil_rtsstring14,716
fac_usePRstring14,428
own_typePRstring14,306
fac_typeAIRPORTstring13,925
act_datestring12,040
faa_distNONEstring7,085
fac_usePUstring5,365
own_typePUstring5,174
fac_typeHELIPORTstring5,135
joint_useNstring4,779
cust_intlNstring4,575
faa_regionAGLstring4,437
c_ldg_rtsNstring4,365
cbd_dirNstring3,694
faa_regionASWstring3,268
mil_rtsYstring2,958
faa_regionASOstring2,924
faa_regionAEAstring2,586
cbd_dirNWstring2,438
cbd_dirSWstring2,427
cbd_dirNEstring2,393
cbd_dirSEstring2,281
cbd_dirSstring2,155
cbd_dirEstring2,127
mil_rtsNstring2,119
faa_regionANMstring2,102
cbd_dirWstring2,065
stateTXstring1,845
fed_agreeNGYstring1,682
aero_chtNEW YORKstring1,581
faa_regionACEstring1,579
faa_distCHIstring1,528
faa_regionAWPstring1,503
aero_chtCHICAGOstring1,480
aero_chtDETROITstring1,262
faa_distDETstring1,240
faa_distSEAstring1,162
faa_distMSPstring1,046
faa_distATLstring1,029
stateCAstring984
faa_distNYCstring957
faa_distORLstring905
stateILstring890
aero_chtDALLAS-FT WORTHstring888
aero_chtWASHINGTONstring887
stateFLstring856
aero_chtST LOUISstring847
faa_distHARstring846
statePAstring804
aero_chtATLANTAstring780
aero_chtHOUSTONstring765
faa_regionANEstring763
stateOHstring749
aero_chtSEATTLEstring749
aero_chtKANSAS CITYstring713
aero_chtTWIN CITIESstring699
faa_distDENstring679
cntl_twrYstring669
faa_distDCAstring667
stateINstring643
aero_chtCINCINNATIstring624
faa_distBISstring616
faa_regionAALstring608
stateAKstring608
aero_chtMEMPHISstring605
faa_distSFOstring585
stateNYstring576
aero_chtOMAHAstring551
stateWIstring543
stateMOstring537
fed_agreeNstring515
stateMNstring507
aero_chtSAN ANTONIOstring503
faa_distJANstring502
stateLAstring500
aero_chtLOS ANGELESstring498
stateMIstring489
faa_distMEMstring487
stateWAstring484
fac_typeSEAPLANE BASEstring473
aero_chtJACKSONVILLEstring467
aero_chtNEW ORLEANSstring444
stateOKstring443
stateORstring441
stateGAstring440
stateNDstring436
aero_chtSAN FRANCISCOstring427
stateCOstring425
stateVAstring421
aero_chtGREEN BAYstring418
stateKSstring415
stateNCstring400
aero_chtCHARLOTTEstring380
[
  {
    "fieldName": "major",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 19523
  },
  {
    "fieldName": "cert",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 19142
  },
  {
    "fieldName": "cntl_twr",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 19124
  },
  {
    "fieldName": "fed_agree",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 16252
  },
  {
    "fieldName": "cust_intl",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 15145
  },
  {
    "fieldName": "c_ldg_rts",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 15145
  },
  {
    "fieldName": "joint_use",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 14804
  },
  {
    "fieldName": "mil_rts",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 14716
  },
  {
    "fieldName": "fac_use",
    "fieldValue": "PR",
    "fieldType": "string",
    "weight": 14428
  },
  {
    "fieldName": "own_type",
    "fieldValue": "PR",
    "fieldType": "string",
    "weight": 14306
  },
  {
    "fieldName": "fac_type",
    "fieldValue": "AIRPORT",
    "fieldType": "string",
    "weight": 13925
  },
  {
    "fieldName": "act_date",
    "fieldValue": null,
    "fieldType": "string",
    "weight": 12040
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "NONE",
    "fieldType": "string",
    "weight": 7085
  },
  {
    "fieldName": "fac_use",
    "fieldValue": "PU",
    "fieldType": "string",
    "weight": 5365
  },
  {
    "fieldName": "own_type",
    "fieldValue": "PU",
    "fieldType": "string",
    "weight": 5174
  },
  {
    "fieldName": "fac_type",
    "fieldValue": "HELIPORT",
    "fieldType": "string",
    "weight": 5135
  },
  {
    "fieldName": "joint_use",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 4779
  },
  {
    "fieldName": "cust_intl",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 4575
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "AGL",
    "fieldType": "string",
    "weight": 4437
  },
  {
    "fieldName": "c_ldg_rts",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 4365
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 3694
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "ASW",
    "fieldType": "string",
    "weight": 3268
  },
  {
    "fieldName": "mil_rts",
    "fieldValue": "Y",
    "fieldType": "string",
    "weight": 2958
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "ASO",
    "fieldType": "string",
    "weight": 2924
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "AEA",
    "fieldType": "string",
    "weight": 2586
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "NW",
    "fieldType": "string",
    "weight": 2438
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "SW",
    "fieldType": "string",
    "weight": 2427
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "NE",
    "fieldType": "string",
    "weight": 2393
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "SE",
    "fieldType": "string",
    "weight": 2281
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "S",
    "fieldType": "string",
    "weight": 2155
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "E",
    "fieldType": "string",
    "weight": 2127
  },
  {
    "fieldName": "mil_rts",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 2119
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "ANM",
    "fieldType": "string",
    "weight": 2102
  },
  {
    "fieldName": "cbd_dir",
    "fieldValue": "W",
    "fieldType": "string",
    "weight": 2065
  },
  {
    "fieldName": "state",
    "fieldValue": "TX",
    "fieldType": "string",
    "weight": 1845
  },
  {
    "fieldName": "fed_agree",
    "fieldValue": "NGY",
    "fieldType": "string",
    "weight": 1682
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "NEW YORK",
    "fieldType": "string",
    "weight": 1581
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "ACE",
    "fieldType": "string",
    "weight": 1579
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "CHI",
    "fieldType": "string",
    "weight": 1528
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "AWP",
    "fieldType": "string",
    "weight": 1503
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "CHICAGO",
    "fieldType": "string",
    "weight": 1480
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "DETROIT",
    "fieldType": "string",
    "weight": 1262
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "DET",
    "fieldType": "string",
    "weight": 1240
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "SEA",
    "fieldType": "string",
    "weight": 1162
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "MSP",
    "fieldType": "string",
    "weight": 1046
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "ATL",
    "fieldType": "string",
    "weight": 1029
  },
  {
    "fieldName": "state",
    "fieldValue": "CA",
    "fieldType": "string",
    "weight": 984
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "NYC",
    "fieldType": "string",
    "weight": 957
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "ORL",
    "fieldType": "string",
    "weight": 905
  },
  {
    "fieldName": "state",
    "fieldValue": "IL",
    "fieldType": "string",
    "weight": 890
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "DALLAS-FT WORTH",
    "fieldType": "string",
    "weight": 888
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "WASHINGTON",
    "fieldType": "string",
    "weight": 887
  },
  {
    "fieldName": "state",
    "fieldValue": "FL",
    "fieldType": "string",
    "weight": 856
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "ST LOUIS",
    "fieldType": "string",
    "weight": 847
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "HAR",
    "fieldType": "string",
    "weight": 846
  },
  {
    "fieldName": "state",
    "fieldValue": "PA",
    "fieldType": "string",
    "weight": 804
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "ATLANTA",
    "fieldType": "string",
    "weight": 780
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "HOUSTON",
    "fieldType": "string",
    "weight": 765
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "ANE",
    "fieldType": "string",
    "weight": 763
  },
  {
    "fieldName": "state",
    "fieldValue": "OH",
    "fieldType": "string",
    "weight": 749
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "SEATTLE",
    "fieldType": "string",
    "weight": 749
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "KANSAS CITY",
    "fieldType": "string",
    "weight": 713
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "TWIN CITIES",
    "fieldType": "string",
    "weight": 699
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "DEN",
    "fieldType": "string",
    "weight": 679
  },
  {
    "fieldName": "cntl_twr",
    "fieldValue": "Y",
    "fieldType": "string",
    "weight": 669
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "DCA",
    "fieldType": "string",
    "weight": 667
  },
  {
    "fieldName": "state",
    "fieldValue": "IN",
    "fieldType": "string",
    "weight": 643
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "CINCINNATI",
    "fieldType": "string",
    "weight": 624
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "BIS",
    "fieldType": "string",
    "weight": 616
  },
  {
    "fieldName": "faa_region",
    "fieldValue": "AAL",
    "fieldType": "string",
    "weight": 608
  },
  {
    "fieldName": "state",
    "fieldValue": "AK",
    "fieldType": "string",
    "weight": 608
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "MEMPHIS",
    "fieldType": "string",
    "weight": 605
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "SFO",
    "fieldType": "string",
    "weight": 585
  },
  {
    "fieldName": "state",
    "fieldValue": "NY",
    "fieldType": "string",
    "weight": 576
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "OMAHA",
    "fieldType": "string",
    "weight": 551
  },
  {
    "fieldName": "state",
    "fieldValue": "WI",
    "fieldType": "string",
    "weight": 543
  },
  {
    "fieldName": "state",
    "fieldValue": "MO",
    "fieldType": "string",
    "weight": 537
  },
  {
    "fieldName": "fed_agree",
    "fieldValue": "N",
    "fieldType": "string",
    "weight": 515
  },
  {
    "fieldName": "state",
    "fieldValue": "MN",
    "fieldType": "string",
    "weight": 507
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "SAN ANTONIO",
    "fieldType": "string",
    "weight": 503
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "JAN",
    "fieldType": "string",
    "weight": 502
  },
  {
    "fieldName": "state",
    "fieldValue": "LA",
    "fieldType": "string",
    "weight": 500
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "LOS ANGELES",
    "fieldType": "string",
    "weight": 498
  },
  {
    "fieldName": "state",
    "fieldValue": "MI",
    "fieldType": "string",
    "weight": 489
  },
  {
    "fieldName": "faa_dist",
    "fieldValue": "MEM",
    "fieldType": "string",
    "weight": 487
  },
  {
    "fieldName": "state",
    "fieldValue": "WA",
    "fieldType": "string",
    "weight": 484
  },
  {
    "fieldName": "fac_type",
    "fieldValue": "SEAPLANE BASE",
    "fieldType": "string",
    "weight": 473
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "JACKSONVILLE",
    "fieldType": "string",
    "weight": 467
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "NEW ORLEANS",
    "fieldType": "string",
    "weight": 444
  },
  {
    "fieldName": "state",
    "fieldValue": "OK",
    "fieldType": "string",
    "weight": 443
  },
  {
    "fieldName": "state",
    "fieldValue": "OR",
    "fieldType": "string",
    "weight": 441
  },
  {
    "fieldName": "state",
    "fieldValue": "GA",
    "fieldType": "string",
    "weight": 440
  },
  {
    "fieldName": "state",
    "fieldValue": "ND",
    "fieldType": "string",
    "weight": 436
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "SAN FRANCISCO",
    "fieldType": "string",
    "weight": 427
  },
  {
    "fieldName": "state",
    "fieldValue": "CO",
    "fieldType": "string",
    "weight": 425
  },
  {
    "fieldName": "state",
    "fieldValue": "VA",
    "fieldType": "string",
    "weight": 421
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "GREEN BAY",
    "fieldType": "string",
    "weight": 418
  },
  {
    "fieldName": "state",
    "fieldValue": "KS",
    "fieldType": "string",
    "weight": 415
  },
  {
    "fieldName": "state",
    "fieldValue": "NC",
    "fieldType": "string",
    "weight": 400
  },
  {
    "fieldName": "aero_cht",
    "fieldValue": "CHARLOTTE",
    "fieldType": "string",
    "weight": 380
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE group_set
      WHEN 0 THEN 'id'
      WHEN 1 THEN 'code'
      WHEN 2 THEN 'site_number'
      WHEN 3 THEN 'fac_type'
      WHEN 4 THEN 'fac_use'
      WHEN 5 THEN 'faa_region'
      WHEN 6 THEN 'faa_dist'
      WHEN 7 THEN 'city'
      WHEN 8 THEN 'county'
      WHEN 9 THEN 'state'
      WHEN 10 THEN 'full_name'
      WHEN 11 THEN 'own_type'
      WHEN 12 THEN 'longitude'
      WHEN 13 THEN 'latitude'
      WHEN 14 THEN 'elevation'
      WHEN 15 THEN 'aero_cht'
      WHEN 16 THEN 'cbd_dist'
      WHEN 17 THEN 'cbd_dir'
      WHEN 18 THEN 'act_date'
      WHEN 19 THEN 'cert'
      WHEN 20 THEN 'fed_agree'
      WHEN 21 THEN 'cust_intl'
      WHEN 22 THEN 'c_ldg_rts'
      WHEN 23 THEN 'joint_use'
      WHEN 24 THEN 'mil_rts'
      WHEN 25 THEN 'cntl_twr'
      WHEN 26 THEN 'major'
    END as "fieldName",  CASE group_set
      WHEN 0 THEN 'number'
      WHEN 1 THEN 'string'
      WHEN 2 THEN 'string'
      WHEN 3 THEN 'string'
      WHEN 4 THEN 'string'
      WHEN 5 THEN 'string'
      WHEN 6 THEN 'string'
      WHEN 7 THEN 'string'
      WHEN 8 THEN 'string'
      WHEN 9 THEN 'string'
      WHEN 10 THEN 'string'
      WHEN 11 THEN 'string'
      WHEN 12 THEN 'number'
      WHEN 13 THEN 'number'
      WHEN 14 THEN 'number'
      WHEN 15 THEN 'string'
      WHEN 16 THEN 'number'
      WHEN 17 THEN 'string'
      WHEN 18 THEN 'string'
      WHEN 19 THEN 'string'
      WHEN 20 THEN 'string'
      WHEN 21 THEN 'string'
      WHEN 22 THEN 'string'
      WHEN 23 THEN 'string'
      WHEN 24 THEN 'string'
      WHEN 25 THEN 'string'
      WHEN 26 THEN 'string'
    END as "fieldType",  CASE group_set WHEN 99999 THEN NULL
      WHEN 1 THEN base."code"
      WHEN 2 THEN base."site_number"
      WHEN 3 THEN base."fac_type"
      WHEN 4 THEN base."fac_use"
      WHEN 5 THEN base."faa_region"
      WHEN 6 THEN base."faa_dist"
      WHEN 7 THEN base."city"
      WHEN 8 THEN base."county"
      WHEN 9 THEN base."state"
      WHEN 10 THEN base."full_name"
      WHEN 11 THEN base."own_type"
      WHEN 15 THEN base."aero_cht"
      WHEN 17 THEN base."cbd_dir"
      WHEN 18 THEN base."act_date"
      WHEN 19 THEN base."cert"
      WHEN 20 THEN base."fed_agree"
      WHEN 21 THEN base."cust_intl"
      WHEN 22 THEN base."c_ldg_rts"
      WHEN 23 THEN base."joint_use"
      WHEN 24 THEN base."mil_rts"
      WHEN 25 THEN base."cntl_twr"
      WHEN 26 THEN base."major"
    END as "fieldValue",
   COUNT(*) as weight,
    CASE group_set
      WHEN 99999 THEN ''    WHEN 0 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR)
      WHEN 12 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR)
      WHEN 13 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR)
      WHEN 14 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR)
      WHEN 16 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR)
    END as "fieldRange"
  FROM '../data/airports.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
, __stage1 AS (
  SELECT
    "fieldName",
    "fieldType",
    COALESCE("fieldValue", "fieldRange") as "fieldValue",
    weight
  FROM __stage0
)
SELECT 
   base."fieldName" as "fieldName",
   base."fieldValue" as "fieldValue",
   base."fieldType" as "fieldType",
   base."weight" as "weight"
FROM __stage1 as base
WHERE base."fieldType"='string'
ORDER BY 4 desc NULLS LAST

Index For Filtering User Interfaces

Indexes can be used find the best way to filter a dataset. For example supposed we'd like to find 'SANTA CRUZ' in the dataset. Upon approaching the dataset, but we don't which column might contain it. In a UI you might imagine that you type 'SANTA' and let have suggestons for values that might be appropriate. In the results we can see that top value, 'SANTA ROSA', appears as county on 26 rows in the table. We can also see that 'SANTA CRUZ' is both a city and a county..

document
run: duckdb.table('../data/airports.parquet') -> {
  index: *
} -> {
  where: fieldValue ~ r'SANTA'
  select: *
  order_by: weight desc
  limit: 15
}
QUERY RESULTS
fieldNamefieldValuefieldTypeweight
countySANTA ROSAstring26
countySANTA BARBARAstring22
countySANTA CRUZstring10
countySANTA CLARAstring10
countySANTA FEstring9
citySANTA FEstring6
citySANTA ANAstring6
citySANTA BARBARAstring5
citySANTA ROSAstring4
citySANTA MARIAstring3
citySANTA YNEZstring3
cityRANCHO SANTA MARGARITAstring2
citySANTA YSABELstring2
citySANTA CRUZstring2
citySANTA ELENAstring2
[
  {
    "fieldName": "county",
    "fieldValue": "SANTA ROSA",
    "fieldType": "string",
    "weight": 26
  },
  {
    "fieldName": "county",
    "fieldValue": "SANTA BARBARA",
    "fieldType": "string",
    "weight": 22
  },
  {
    "fieldName": "county",
    "fieldValue": "SANTA CRUZ",
    "fieldType": "string",
    "weight": 10
  },
  {
    "fieldName": "county",
    "fieldValue": "SANTA CLARA",
    "fieldType": "string",
    "weight": 10
  },
  {
    "fieldName": "county",
    "fieldValue": "SANTA FE",
    "fieldType": "string",
    "weight": 9
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA FE",
    "fieldType": "string",
    "weight": 6
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA ANA",
    "fieldType": "string",
    "weight": 6
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA BARBARA",
    "fieldType": "string",
    "weight": 5
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA ROSA",
    "fieldType": "string",
    "weight": 4
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA MARIA",
    "fieldType": "string",
    "weight": 3
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA YNEZ",
    "fieldType": "string",
    "weight": 3
  },
  {
    "fieldName": "city",
    "fieldValue": "RANCHO SANTA MARGARITA",
    "fieldType": "string",
    "weight": 2
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA YSABEL",
    "fieldType": "string",
    "weight": 2
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA CRUZ",
    "fieldType": "string",
    "weight": 2
  },
  {
    "fieldName": "city",
    "fieldValue": "SANTA ELENA",
    "fieldType": "string",
    "weight": 2
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE group_set
      WHEN 0 THEN 'id'
      WHEN 1 THEN 'code'
      WHEN 2 THEN 'site_number'
      WHEN 3 THEN 'fac_type'
      WHEN 4 THEN 'fac_use'
      WHEN 5 THEN 'faa_region'
      WHEN 6 THEN 'faa_dist'
      WHEN 7 THEN 'city'
      WHEN 8 THEN 'county'
      WHEN 9 THEN 'state'
      WHEN 10 THEN 'full_name'
      WHEN 11 THEN 'own_type'
      WHEN 12 THEN 'longitude'
      WHEN 13 THEN 'latitude'
      WHEN 14 THEN 'elevation'
      WHEN 15 THEN 'aero_cht'
      WHEN 16 THEN 'cbd_dist'
      WHEN 17 THEN 'cbd_dir'
      WHEN 18 THEN 'act_date'
      WHEN 19 THEN 'cert'
      WHEN 20 THEN 'fed_agree'
      WHEN 21 THEN 'cust_intl'
      WHEN 22 THEN 'c_ldg_rts'
      WHEN 23 THEN 'joint_use'
      WHEN 24 THEN 'mil_rts'
      WHEN 25 THEN 'cntl_twr'
      WHEN 26 THEN 'major'
    END as "fieldName",  CASE group_set
      WHEN 0 THEN 'number'
      WHEN 1 THEN 'string'
      WHEN 2 THEN 'string'
      WHEN 3 THEN 'string'
      WHEN 4 THEN 'string'
      WHEN 5 THEN 'string'
      WHEN 6 THEN 'string'
      WHEN 7 THEN 'string'
      WHEN 8 THEN 'string'
      WHEN 9 THEN 'string'
      WHEN 10 THEN 'string'
      WHEN 11 THEN 'string'
      WHEN 12 THEN 'number'
      WHEN 13 THEN 'number'
      WHEN 14 THEN 'number'
      WHEN 15 THEN 'string'
      WHEN 16 THEN 'number'
      WHEN 17 THEN 'string'
      WHEN 18 THEN 'string'
      WHEN 19 THEN 'string'
      WHEN 20 THEN 'string'
      WHEN 21 THEN 'string'
      WHEN 22 THEN 'string'
      WHEN 23 THEN 'string'
      WHEN 24 THEN 'string'
      WHEN 25 THEN 'string'
      WHEN 26 THEN 'string'
    END as "fieldType",  CASE group_set WHEN 99999 THEN NULL
      WHEN 1 THEN base."code"
      WHEN 2 THEN base."site_number"
      WHEN 3 THEN base."fac_type"
      WHEN 4 THEN base."fac_use"
      WHEN 5 THEN base."faa_region"
      WHEN 6 THEN base."faa_dist"
      WHEN 7 THEN base."city"
      WHEN 8 THEN base."county"
      WHEN 9 THEN base."state"
      WHEN 10 THEN base."full_name"
      WHEN 11 THEN base."own_type"
      WHEN 15 THEN base."aero_cht"
      WHEN 17 THEN base."cbd_dir"
      WHEN 18 THEN base."act_date"
      WHEN 19 THEN base."cert"
      WHEN 20 THEN base."fed_agree"
      WHEN 21 THEN base."cust_intl"
      WHEN 22 THEN base."c_ldg_rts"
      WHEN 23 THEN base."joint_use"
      WHEN 24 THEN base."mil_rts"
      WHEN 25 THEN base."cntl_twr"
      WHEN 26 THEN base."major"
    END as "fieldValue",
   COUNT(*) as weight,
    CASE group_set
      WHEN 99999 THEN ''    WHEN 0 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR)
      WHEN 12 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR)
      WHEN 13 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR)
      WHEN 14 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR)
      WHEN 16 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR)
    END as "fieldRange"
  FROM '../data/airports.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
, __stage1 AS (
  SELECT
    "fieldName",
    "fieldType",
    COALESCE("fieldValue", "fieldRange") as "fieldValue",
    weight
  FROM __stage0
)
SELECT 
   base."fieldName" as "fieldName",
   base."fieldValue" as "fieldValue",
   base."fieldType" as "fieldType",
   base."weight" as "weight"
FROM __stage1 as base
WHERE REGEXP_MATCHES(base."fieldValue", 'SANTA')
ORDER BY 4 desc NULLS LAST
LIMIT 15

We can then write a simple query to show the rows. It turns out that 'SANTA CRUZ' is a county in both California and Arizona.

document
run: duckdb.table('../data/airports.parquet') -> {
  where: county ~ 'SANTA CRUZ'
  select: *
}
QUERY RESULTS
idcodesite_​numberfac_​typefac_​usefaa_​regionfaa_​distcitycountystatefull_​nameown_​typelongitudelatitudeelevationaero_​chtcbd_​distcbd_​diract_​datecertfed_​agreecust_​intlc_​ldg_​rtsjoint_​usemil_​rtscntl_​twrmajor
1,4442AZ800811.*UULTRALIGHTPRAWPNONETUBACSANTA CRUZAZTUBAC ULTRALIGHT FLIGHTPARKPR-111.0331.613,200PHOENIX1E10/1987NN
1,312OLS00739.*AAIRPORTPUAWPNONENOGALESSANTA CRUZAZNOGALES INTLPU-110.8431.413,955PHOENIX7NENGYYNNYNN
2,287NSI02220.*AAIRPORTPRAWPNONESAN NICOLAS ISLANDSANTA CRUZCASAN NICOLAS ISLAND NOLFMN-119.4533.23504LOS ANGELES0YYN
2,307CL7702241.1*AAIRPORTPRAWPSFOSANTA CRUZSANTA CRUZCABONNY DOON VILLAGEPR-122.1237.072,020SAN FRANCISCO8NW07/1986NN
2,308CA3702241.12*HHELIPORTPRAWPSFOSANTA CRUZSANTA CRUZCADOMINICAN SANTA CRUZ HOSPITALPR-121.9836.99115SAN FRANCISCO3NENN
1,6936Q601486.9*AAIRPORTPRAWPSFODAVENPORTSANTA CRUZCALAS TRANCASPR-122.2737.08125SAN FRANCISCO6NWNNNN
2,441CA6502429.1*HHELIPORTPRAWPSFOWATSONVILLESANTA CRUZCAALTA VISTAPR-121.8636.97480SAN FRANCISCO5NWNNNN
2,442CL9902429.11*HHELIPORTPRAWPSFOWATSONVILLESANTA CRUZCAWATSONVILLE COMMUNITY HOSPITALPU-121.7736.93111SAN FRANCISCO0E09/1987NN
2,443CA6602429.2*AAIRPORTPRAWPSFOWATSONVILLESANTA CRUZCAMONTEREY BAY ACADEMYPR-121.8436.970SAN FRANCISCO4WNNNN
2,440WVI02429.*AAIRPORTPUAWPSFOWATSONVILLESANTA CRUZCAWATSONVILLE MUNIPU-121.7836.93160SAN FRANCISCO3NWNGPRYNNNYNN
[
  {
    "id": 1444,
    "code": "2AZ8",
    "site_number": "00811.*U",
    "fac_type": "ULTRALIGHT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "NONE",
    "city": "TUBAC",
    "county": "SANTA CRUZ",
    "state": "AZ",
    "full_name": "TUBAC ULTRALIGHT FLIGHTPARK",
    "own_type": "PR",
    "longitude": -111.03,
    "latitude": 31.61,
    "elevation": 3200,
    "aero_cht": "PHOENIX",
    "cbd_dist": 1,
    "cbd_dir": "E",
    "act_date": "10/1987",
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": null,
    "mil_rts": null,
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 1312,
    "code": "OLS",
    "site_number": "00739.*A",
    "fac_type": "AIRPORT",
    "fac_use": "PU",
    "faa_region": "AWP",
    "faa_dist": "NONE",
    "city": "NOGALES",
    "county": "SANTA CRUZ",
    "state": "AZ",
    "full_name": "NOGALES INTL",
    "own_type": "PU",
    "longitude": -110.84,
    "latitude": 31.41,
    "elevation": 3955,
    "aero_cht": "PHOENIX",
    "cbd_dist": 7,
    "cbd_dir": "NE",
    "act_date": null,
    "cert": null,
    "fed_agree": "NGY",
    "cust_intl": "Y",
    "c_ldg_rts": "N",
    "joint_use": "N",
    "mil_rts": "Y",
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2287,
    "code": "NSI",
    "site_number": "02220.*A",
    "fac_type": "AIRPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "NONE",
    "city": "SAN NICOLAS ISLAND",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "SAN NICOLAS ISLAND NOLF",
    "own_type": "MN",
    "longitude": -119.45,
    "latitude": 33.23,
    "elevation": 504,
    "aero_cht": "LOS ANGELES",
    "cbd_dist": 0,
    "cbd_dir": null,
    "act_date": null,
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": null,
    "mil_rts": "Y",
    "cntl_twr": "Y",
    "major": "N"
  },
  {
    "id": 2307,
    "code": "CL77",
    "site_number": "02241.1*A",
    "fac_type": "AIRPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "SANTA CRUZ",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "BONNY DOON VILLAGE",
    "own_type": "PR",
    "longitude": -122.12,
    "latitude": 37.07,
    "elevation": 2020,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 8,
    "cbd_dir": "NW",
    "act_date": "07/1986",
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": null,
    "mil_rts": null,
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2308,
    "code": "CA37",
    "site_number": "02241.12*H",
    "fac_type": "HELIPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "SANTA CRUZ",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "DOMINICAN SANTA CRUZ HOSPITAL",
    "own_type": "PR",
    "longitude": -121.98,
    "latitude": 36.99,
    "elevation": 115,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 3,
    "cbd_dir": "NE",
    "act_date": null,
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": null,
    "mil_rts": null,
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 1693,
    "code": "6Q6",
    "site_number": "01486.9*A",
    "fac_type": "AIRPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "DAVENPORT",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "LAS TRANCAS",
    "own_type": "PR",
    "longitude": -122.27,
    "latitude": 37.08,
    "elevation": 125,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 6,
    "cbd_dir": "NW",
    "act_date": null,
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": "N",
    "mil_rts": "N",
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2441,
    "code": "CA65",
    "site_number": "02429.1*H",
    "fac_type": "HELIPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "WATSONVILLE",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "ALTA VISTA",
    "own_type": "PR",
    "longitude": -121.86,
    "latitude": 36.97,
    "elevation": 480,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 5,
    "cbd_dir": "NW",
    "act_date": null,
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": "N",
    "mil_rts": "N",
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2442,
    "code": "CL99",
    "site_number": "02429.11*H",
    "fac_type": "HELIPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "WATSONVILLE",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "WATSONVILLE COMMUNITY HOSPITAL",
    "own_type": "PU",
    "longitude": -121.77,
    "latitude": 36.93,
    "elevation": 111,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 0,
    "cbd_dir": "E",
    "act_date": "09/1987",
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": null,
    "mil_rts": null,
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2443,
    "code": "CA66",
    "site_number": "02429.2*A",
    "fac_type": "AIRPORT",
    "fac_use": "PR",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "WATSONVILLE",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "MONTEREY BAY ACADEMY",
    "own_type": "PR",
    "longitude": -121.84,
    "latitude": 36.9,
    "elevation": 70,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 4,
    "cbd_dir": "W",
    "act_date": null,
    "cert": null,
    "fed_agree": null,
    "cust_intl": null,
    "c_ldg_rts": null,
    "joint_use": "N",
    "mil_rts": "N",
    "cntl_twr": "N",
    "major": "N"
  },
  {
    "id": 2440,
    "code": "WVI",
    "site_number": "02429.*A",
    "fac_type": "AIRPORT",
    "fac_use": "PU",
    "faa_region": "AWP",
    "faa_dist": "SFO",
    "city": "WATSONVILLE",
    "county": "SANTA CRUZ",
    "state": "CA",
    "full_name": "WATSONVILLE MUNI",
    "own_type": "PU",
    "longitude": -121.78,
    "latitude": 36.93,
    "elevation": 160,
    "aero_cht": "SAN FRANCISCO",
    "cbd_dist": 3,
    "cbd_dir": "NW",
    "act_date": null,
    "cert": null,
    "fed_agree": "NGPRY",
    "cust_intl": "N",
    "c_ldg_rts": "N",
    "joint_use": "N",
    "mil_rts": "Y",
    "cntl_twr": "N",
    "major": "N"
  }
]
SELECT 
   base."id" as "id",
   base."code" as "code",
   base."site_number" as "site_number",
   base."fac_type" as "fac_type",
   base."fac_use" as "fac_use",
   base."faa_region" as "faa_region",
   base."faa_dist" as "faa_dist",
   base."city" as "city",
   base."county" as "county",
   base."state" as "state",
   base."full_name" as "full_name",
   base."own_type" as "own_type",
   base."longitude" as "longitude",
   base."latitude" as "latitude",
   base."elevation" as "elevation",
   base."aero_cht" as "aero_cht",
   base."cbd_dist" as "cbd_dist",
   base."cbd_dir" as "cbd_dir",
   base."act_date" as "act_date",
   base."cert" as "cert",
   base."fed_agree" as "fed_agree",
   base."cust_intl" as "cust_intl",
   base."c_ldg_rts" as "c_ldg_rts",
   base."joint_use" as "joint_use",
   base."mil_rts" as "mil_rts",
   base."cntl_twr" as "cntl_twr",
   base."major" as "major"
FROM '../data/airports.parquet' as base
WHERE base."county" LIKE 'SANTA CRUZ'

Indexing to show top values for each dimension

It is often difficult to approach a new dataset. The index operator provides an intersting way to quickly gain an understanding of the dataset. By piping the results of an index another stage, we can quickly see all the interesting values for each of the interesting dimesions. Again, the weight shows the number of rows for that particular dimension/value.

document
run: duckdb.table('../data/airports.parquet') -> {
  index: *
} -> {
  group_by: fieldName
  nest: values is {
    group_by: fieldValue, weight
    order_by: weight desc
    limit: 10
  }
  order_by: fieldName
}
QUERY RESULTS
fieldNamevalues
act_date
fieldValueweight
12,040
04/1993145
02/1987129
03/1993127
11/1990122
11/1986119
02/1982118
08/1988115
03/1988113
11/1988113
aero_cht
fieldValueweight
NEW YORK1,581
CHICAGO1,480
DETROIT1,262
DALLAS-FT WORTH888
WASHINGTON887
ST LOUIS847
ATLANTA780
HOUSTON765
SEATTLE749
KANSAS CITY713
c_ldg_rts
fieldValueweight
15,145
N4,365
Y283
cbd_dir
fieldValueweight
N3,694
NW2,438
SW2,427
NE2,393
SE2,281
S2,155
E2,127
W2,065
119
SSW20
cbd_dist
fieldValueweight
0 to 7319,793
cert
fieldValueweight
19,142
AS 05/197394
CS 05/197384
BS 05/197381
LU 05/197337
LU 08/199033
DS 05/197326
AU 05/197325
ES 05/197315
LU 11/19749
city
fieldValueweight
HOUSTON108
LOS ANGELES60
COLUMBUS47
SPRINGFIELD45
JACKSON42
WASHINGTON38
GREENVILLE38
PHOENIX37
CLINTON37
PHILADELPHIA35
cntl_twr
fieldValueweight
N19,124
Y669
code
fieldValueweight
45AK1
GA521
CT031
5IS61
1IS71
27K1
08LA1
9LA41
IER1
53U1
county
fieldValueweight
WASHINGTON214
JEFFERSON199
LOS ANGELES176
MONTGOMERY154
JACKSON140
FRANKLIN140
HARRIS137
MARION127
ORANGE125
MARICOPA117
cust_intl
fieldValueweight
15,145
N4,575
Y73
elevation
fieldValueweight
-1 to 1244219,793
faa_dist
fieldValueweight
NONE7,085
CHI1,528
DET1,240
SEA1,162
MSP1,046
ATL1,029
NYC957
ORL905
HAR846
DEN679
faa_region
fieldValueweight
AGL4,437
ASW3,268
ASO2,924
AEA2,586
ANM2,102
ACE1,579
AWP1,503
ANE763
AAL608
23
fac_type
fieldValueweight
AIRPORT13,925
HELIPORT5,135
SEAPLANE BASE473
ULTRALIGHT125
STOLPORT86
GLIDERPORT37
BALLOONPORT12
fac_use
fieldValueweight
PR14,428
PU5,365
fed_agree
fieldValueweight
16,252
NGY1,682
N515
NGY3219
NGPY171
NY1123
N197
196
NGSY87
NGPY384
full_name
fieldValueweight
MEMORIAL HOSPITAL21
SMITH14
JOHNSON13
ST MARY'S HOSPITAL11
MILLER10
DAVIS FIELD10
HILLTOP10
FLYING W9
WILSON9
ST FRANCIS HOSPITAL9
id
fieldValueweight
1 to 1979319,793
joint_use
fieldValueweight
14,804
N4,779
Y210
latitude
fieldValueweight
-14.18 to 71.2819,793
longitude
fieldValueweight
-100.0 to 174.1119,793
major
fieldValueweight
N19,523
Y270
mil_rts
fieldValueweight
14,716
Y2,958
N2,119
own_type
fieldValueweight
PR14,306
PU5,174
MR128
MA107
MN78
site_number
fieldValueweight
50717.1*C1
00810.*A1
03111.4*A1
03570.11*H1
06260.*A1
04865.3*A1
04861.03*A1
04999.76*A1
05247.15*H1
11765.1*H1
state
fieldValueweight
TX1,845
CA984
IL890
FL856
PA804
OH749
IN643
AK608
NY576
WI543
fieldValueweight
19,793
[
  {
    "fieldName": "act_date",
    "values": [
      {
        "fieldValue": null,
        "weight": 12040
      },
      {
        "fieldValue": "04/1993",
        "weight": 145
      },
      {
        "fieldValue": "02/1987",
        "weight": 129
      },
      {
        "fieldValue": "03/1993",
        "weight": 127
      },
      {
        "fieldValue": "11/1990",
        "weight": 122
      },
      {
        "fieldValue": "11/1986",
        "weight": 119
      },
      {
        "fieldValue": "02/1982",
        "weight": 118
      },
      {
        "fieldValue": "08/1988",
        "weight": 115
      },
      {
        "fieldValue": "03/1988",
        "weight": 113
      },
      {
        "fieldValue": "11/1988",
        "weight": 113
      }
    ]
  },
  {
    "fieldName": "aero_cht",
    "values": [
      {
        "fieldValue": "NEW YORK",
        "weight": 1581
      },
      {
        "fieldValue": "CHICAGO",
        "weight": 1480
      },
      {
        "fieldValue": "DETROIT",
        "weight": 1262
      },
      {
        "fieldValue": "DALLAS-FT WORTH",
        "weight": 888
      },
      {
        "fieldValue": "WASHINGTON",
        "weight": 887
      },
      {
        "fieldValue": "ST LOUIS",
        "weight": 847
      },
      {
        "fieldValue": "ATLANTA",
        "weight": 780
      },
      {
        "fieldValue": "HOUSTON",
        "weight": 765
      },
      {
        "fieldValue": "SEATTLE",
        "weight": 749
      },
      {
        "fieldValue": "KANSAS CITY",
        "weight": 713
      }
    ]
  },
  {
    "fieldName": "c_ldg_rts",
    "values": [
      {
        "fieldValue": null,
        "weight": 15145
      },
      {
        "fieldValue": "N",
        "weight": 4365
      },
      {
        "fieldValue": "Y",
        "weight": 283
      }
    ]
  },
  {
    "fieldName": "cbd_dir",
    "values": [
      {
        "fieldValue": "N",
        "weight": 3694
      },
      {
        "fieldValue": "NW",
        "weight": 2438
      },
      {
        "fieldValue": "SW",
        "weight": 2427
      },
      {
        "fieldValue": "NE",
        "weight": 2393
      },
      {
        "fieldValue": "SE",
        "weight": 2281
      },
      {
        "fieldValue": "S",
        "weight": 2155
      },
      {
        "fieldValue": "E",
        "weight": 2127
      },
      {
        "fieldValue": "W",
        "weight": 2065
      },
      {
        "fieldValue": null,
        "weight": 119
      },
      {
        "fieldValue": "SSW",
        "weight": 20
      }
    ]
  },
  {
    "fieldName": "cbd_dist",
    "values": [
      {
        "fieldValue": "0 to 73",
        "weight": 19793
      }
    ]
  },
  {
    "fieldName": "cert",
    "values": [
      {
        "fieldValue": null,
        "weight": 19142
      },
      {
        "fieldValue": "AS 05/1973",
        "weight": 94
      },
      {
        "fieldValue": "CS 05/1973",
        "weight": 84
      },
      {
        "fieldValue": "BS 05/1973",
        "weight": 81
      },
      {
        "fieldValue": "LU 05/1973",
        "weight": 37
      },
      {
        "fieldValue": "LU 08/1990",
        "weight": 33
      },
      {
        "fieldValue": "DS 05/1973",
        "weight": 26
      },
      {
        "fieldValue": "AU 05/1973",
        "weight": 25
      },
      {
        "fieldValue": "ES 05/1973",
        "weight": 15
      },
      {
        "fieldValue": "LU 11/1974",
        "weight": 9
      }
    ]
  },
  {
    "fieldName": "city",
    "values": [
      {
        "fieldValue": "HOUSTON",
        "weight": 108
      },
      {
        "fieldValue": "LOS ANGELES",
        "weight": 60
      },
      {
        "fieldValue": "COLUMBUS",
        "weight": 47
      },
      {
        "fieldValue": "SPRINGFIELD",
        "weight": 45
      },
      {
        "fieldValue": "JACKSON",
        "weight": 42
      },
      {
        "fieldValue": "WASHINGTON",
        "weight": 38
      },
      {
        "fieldValue": "GREENVILLE",
        "weight": 38
      },
      {
        "fieldValue": "PHOENIX",
        "weight": 37
      },
      {
        "fieldValue": "CLINTON",
        "weight": 37
      },
      {
        "fieldValue": "PHILADELPHIA",
        "weight": 35
      }
    ]
  },
  {
    "fieldName": "cntl_twr",
    "values": [
      {
        "fieldValue": "N",
        "weight": 19124
      },
      {
        "fieldValue": "Y",
        "weight": 669
      }
    ]
  },
  {
    "fieldName": "code",
    "values": [
      {
        "fieldValue": "45AK",
        "weight": 1
      },
      {
        "fieldValue": "GA52",
        "weight": 1
      },
      {
        "fieldValue": "CT03",
        "weight": 1
      },
      {
        "fieldValue": "5IS6",
        "weight": 1
      },
      {
        "fieldValue": "1IS7",
        "weight": 1
      },
      {
        "fieldValue": "27K",
        "weight": 1
      },
      {
        "fieldValue": "08LA",
        "weight": 1
      },
      {
        "fieldValue": "9LA4",
        "weight": 1
      },
      {
        "fieldValue": "IER",
        "weight": 1
      },
      {
        "fieldValue": "53U",
        "weight": 1
      }
    ]
  },
  {
    "fieldName": "county",
    "values": [
      {
        "fieldValue": "WASHINGTON",
        "weight": 214
      },
      {
        "fieldValue": "JEFFERSON",
        "weight": 199
      },
      {
        "fieldValue": "LOS ANGELES",
        "weight": 176
      },
      {
        "fieldValue": "MONTGOMERY",
        "weight": 154
      },
      {
        "fieldValue": "JACKSON",
        "weight": 140
      },
      {
        "fieldValue": "FRANKLIN",
        "weight": 140
      },
      {
        "fieldValue": "HARRIS",
        "weight": 137
      },
      {
        "fieldValue": "MARION",
        "weight": 127
      },
      {
        "fieldValue": "ORANGE",
        "weight": 125
      },
      {
        "fieldValue": "MARICOPA",
        "weight": 117
      }
    ]
  },
  {
    "fieldName": "cust_intl",
    "values": [
      {
        "fieldValue": null,
        "weight": 15145
      },
      {
        "fieldValue": "N",
        "weight": 4575
      },
      {
        "fieldValue": "Y",
        "weight": 73
      }
    ]
  },
  {
    "fieldName": "elevation",
    "values": [
      {
        "fieldValue": "-1 to 12442",
        "weight": 19793
      }
    ]
  },
  {
    "fieldName": "faa_dist",
    "values": [
      {
        "fieldValue": "NONE",
        "weight": 7085
      },
      {
        "fieldValue": "CHI",
        "weight": 1528
      },
      {
        "fieldValue": "DET",
        "weight": 1240
      },
      {
        "fieldValue": "SEA",
        "weight": 1162
      },
      {
        "fieldValue": "MSP",
        "weight": 1046
      },
      {
        "fieldValue": "ATL",
        "weight": 1029
      },
      {
        "fieldValue": "NYC",
        "weight": 957
      },
      {
        "fieldValue": "ORL",
        "weight": 905
      },
      {
        "fieldValue": "HAR",
        "weight": 846
      },
      {
        "fieldValue": "DEN",
        "weight": 679
      }
    ]
  },
  {
    "fieldName": "faa_region",
    "values": [
      {
        "fieldValue": "AGL",
        "weight": 4437
      },
      {
        "fieldValue": "ASW",
        "weight": 3268
      },
      {
        "fieldValue": "ASO",
        "weight": 2924
      },
      {
        "fieldValue": "AEA",
        "weight": 2586
      },
      {
        "fieldValue": "ANM",
        "weight": 2102
      },
      {
        "fieldValue": "ACE",
        "weight": 1579
      },
      {
        "fieldValue": "AWP",
        "weight": 1503
      },
      {
        "fieldValue": "ANE",
        "weight": 763
      },
      {
        "fieldValue": "AAL",
        "weight": 608
      },
      {
        "fieldValue": null,
        "weight": 23
      }
    ]
  },
  {
    "fieldName": "fac_type",
    "values": [
      {
        "fieldValue": "AIRPORT",
        "weight": 13925
      },
      {
        "fieldValue": "HELIPORT",
        "weight": 5135
      },
      {
        "fieldValue": "SEAPLANE BASE",
        "weight": 473
      },
      {
        "fieldValue": "ULTRALIGHT",
        "weight": 125
      },
      {
        "fieldValue": "STOLPORT",
        "weight": 86
      },
      {
        "fieldValue": "GLIDERPORT",
        "weight": 37
      },
      {
        "fieldValue": "BALLOONPORT",
        "weight": 12
      }
    ]
  },
  {
    "fieldName": "fac_use",
    "values": [
      {
        "fieldValue": "PR",
        "weight": 14428
      },
      {
        "fieldValue": "PU",
        "weight": 5365
      }
    ]
  },
  {
    "fieldName": "fed_agree",
    "values": [
      {
        "fieldValue": null,
        "weight": 16252
      },
      {
        "fieldValue": "NGY",
        "weight": 1682
      },
      {
        "fieldValue": "N",
        "weight": 515
      },
      {
        "fieldValue": "NGY3",
        "weight": 219
      },
      {
        "fieldValue": "NGPY",
        "weight": 171
      },
      {
        "fieldValue": "NY1",
        "weight": 123
      },
      {
        "fieldValue": "N1",
        "weight": 97
      },
      {
        "fieldValue": "1",
        "weight": 96
      },
      {
        "fieldValue": "NGSY",
        "weight": 87
      },
      {
        "fieldValue": "NGPY3",
        "weight": 84
      }
    ]
  },
  {
    "fieldName": "full_name",
    "values": [
      {
        "fieldValue": "MEMORIAL HOSPITAL",
        "weight": 21
      },
      {
        "fieldValue": "SMITH",
        "weight": 14
      },
      {
        "fieldValue": "JOHNSON",
        "weight": 13
      },
      {
        "fieldValue": "ST MARY'S HOSPITAL",
        "weight": 11
      },
      {
        "fieldValue": "MILLER",
        "weight": 10
      },
      {
        "fieldValue": "DAVIS FIELD",
        "weight": 10
      },
      {
        "fieldValue": "HILLTOP",
        "weight": 10
      },
      {
        "fieldValue": "FLYING W",
        "weight": 9
      },
      {
        "fieldValue": "WILSON",
        "weight": 9
      },
      {
        "fieldValue": "ST FRANCIS HOSPITAL",
        "weight": 9
      }
    ]
  },
  {
    "fieldName": "id",
    "values": [
      {
        "fieldValue": "1 to 19793",
        "weight": 19793
      }
    ]
  },
  {
    "fieldName": "joint_use",
    "values": [
      {
        "fieldValue": null,
        "weight": 14804
      },
      {
        "fieldValue": "N",
        "weight": 4779
      },
      {
        "fieldValue": "Y",
        "weight": 210
      }
    ]
  },
  {
    "fieldName": "latitude",
    "values": [
      {
        "fieldValue": "-14.18 to 71.28",
        "weight": 19793
      }
    ]
  },
  {
    "fieldName": "longitude",
    "values": [
      {
        "fieldValue": "-100.0 to 174.11",
        "weight": 19793
      }
    ]
  },
  {
    "fieldName": "major",
    "values": [
      {
        "fieldValue": "N",
        "weight": 19523
      },
      {
        "fieldValue": "Y",
        "weight": 270
      }
    ]
  },
  {
    "fieldName": "mil_rts",
    "values": [
      {
        "fieldValue": null,
        "weight": 14716
      },
      {
        "fieldValue": "Y",
        "weight": 2958
      },
      {
        "fieldValue": "N",
        "weight": 2119
      }
    ]
  },
  {
    "fieldName": "own_type",
    "values": [
      {
        "fieldValue": "PR",
        "weight": 14306
      },
      {
        "fieldValue": "PU",
        "weight": 5174
      },
      {
        "fieldValue": "MR",
        "weight": 128
      },
      {
        "fieldValue": "MA",
        "weight": 107
      },
      {
        "fieldValue": "MN",
        "weight": 78
      }
    ]
  },
  {
    "fieldName": "site_number",
    "values": [
      {
        "fieldValue": "50717.1*C",
        "weight": 1
      },
      {
        "fieldValue": "00810.*A",
        "weight": 1
      },
      {
        "fieldValue": "03111.4*A",
        "weight": 1
      },
      {
        "fieldValue": "03570.11*H",
        "weight": 1
      },
      {
        "fieldValue": "06260.*A",
        "weight": 1
      },
      {
        "fieldValue": "04865.3*A",
        "weight": 1
      },
      {
        "fieldValue": "04861.03*A",
        "weight": 1
      },
      {
        "fieldValue": "04999.76*A",
        "weight": 1
      },
      {
        "fieldValue": "05247.15*H",
        "weight": 1
      },
      {
        "fieldValue": "11765.1*H",
        "weight": 1
      }
    ]
  },
  {
    "fieldName": "state",
    "values": [
      {
        "fieldValue": "TX",
        "weight": 1845
      },
      {
        "fieldValue": "CA",
        "weight": 984
      },
      {
        "fieldValue": "IL",
        "weight": 890
      },
      {
        "fieldValue": "FL",
        "weight": 856
      },
      {
        "fieldValue": "PA",
        "weight": 804
      },
      {
        "fieldValue": "OH",
        "weight": 749
      },
      {
        "fieldValue": "IN",
        "weight": 643
      },
      {
        "fieldValue": "AK",
        "weight": 608
      },
      {
        "fieldValue": "NY",
        "weight": 576
      },
      {
        "fieldValue": "WI",
        "weight": 543
      }
    ]
  },
  {
    "fieldName": null,
    "values": [
      {
        "fieldValue": null,
        "weight": 19793
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT
    group_set,
    CASE group_set
      WHEN 0 THEN 'id'
      WHEN 1 THEN 'code'
      WHEN 2 THEN 'site_number'
      WHEN 3 THEN 'fac_type'
      WHEN 4 THEN 'fac_use'
      WHEN 5 THEN 'faa_region'
      WHEN 6 THEN 'faa_dist'
      WHEN 7 THEN 'city'
      WHEN 8 THEN 'county'
      WHEN 9 THEN 'state'
      WHEN 10 THEN 'full_name'
      WHEN 11 THEN 'own_type'
      WHEN 12 THEN 'longitude'
      WHEN 13 THEN 'latitude'
      WHEN 14 THEN 'elevation'
      WHEN 15 THEN 'aero_cht'
      WHEN 16 THEN 'cbd_dist'
      WHEN 17 THEN 'cbd_dir'
      WHEN 18 THEN 'act_date'
      WHEN 19 THEN 'cert'
      WHEN 20 THEN 'fed_agree'
      WHEN 21 THEN 'cust_intl'
      WHEN 22 THEN 'c_ldg_rts'
      WHEN 23 THEN 'joint_use'
      WHEN 24 THEN 'mil_rts'
      WHEN 25 THEN 'cntl_twr'
      WHEN 26 THEN 'major'
    END as "fieldName",  CASE group_set
      WHEN 0 THEN 'number'
      WHEN 1 THEN 'string'
      WHEN 2 THEN 'string'
      WHEN 3 THEN 'string'
      WHEN 4 THEN 'string'
      WHEN 5 THEN 'string'
      WHEN 6 THEN 'string'
      WHEN 7 THEN 'string'
      WHEN 8 THEN 'string'
      WHEN 9 THEN 'string'
      WHEN 10 THEN 'string'
      WHEN 11 THEN 'string'
      WHEN 12 THEN 'number'
      WHEN 13 THEN 'number'
      WHEN 14 THEN 'number'
      WHEN 15 THEN 'string'
      WHEN 16 THEN 'number'
      WHEN 17 THEN 'string'
      WHEN 18 THEN 'string'
      WHEN 19 THEN 'string'
      WHEN 20 THEN 'string'
      WHEN 21 THEN 'string'
      WHEN 22 THEN 'string'
      WHEN 23 THEN 'string'
      WHEN 24 THEN 'string'
      WHEN 25 THEN 'string'
      WHEN 26 THEN 'string'
    END as "fieldType",  CASE group_set WHEN 99999 THEN NULL
      WHEN 1 THEN base."code"
      WHEN 2 THEN base."site_number"
      WHEN 3 THEN base."fac_type"
      WHEN 4 THEN base."fac_use"
      WHEN 5 THEN base."faa_region"
      WHEN 6 THEN base."faa_dist"
      WHEN 7 THEN base."city"
      WHEN 8 THEN base."county"
      WHEN 9 THEN base."state"
      WHEN 10 THEN base."full_name"
      WHEN 11 THEN base."own_type"
      WHEN 15 THEN base."aero_cht"
      WHEN 17 THEN base."cbd_dir"
      WHEN 18 THEN base."act_date"
      WHEN 19 THEN base."cert"
      WHEN 20 THEN base."fed_agree"
      WHEN 21 THEN base."cust_intl"
      WHEN 22 THEN base."c_ldg_rts"
      WHEN 23 THEN base."joint_use"
      WHEN 24 THEN base."mil_rts"
      WHEN 25 THEN base."cntl_twr"
      WHEN 26 THEN base."major"
    END as "fieldValue",
   COUNT(*) as weight,
    CASE group_set
      WHEN 99999 THEN ''    WHEN 0 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR)
      WHEN 12 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR)
      WHEN 13 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR)
      WHEN 14 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR)
      WHEN 16 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR)
    END as "fieldRange"
  FROM '../data/airports.parquet' as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
, __stage1 AS (
  SELECT
    "fieldName",
    "fieldType",
    COALESCE("fieldValue", "fieldRange") as "fieldValue",
    weight
  FROM __stage0
)
, __stage2 AS (
  SELECT
    group_set,
    base."fieldName" as "fieldName__0",
    CASE WHEN group_set=1 THEN
      base."fieldValue"
      END as "fieldValue__1",
    CASE WHEN group_set=1 THEN
      base."weight"
      END as "weight__1"
  FROM __stage1 as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
SELECT
  "fieldName__0" as "fieldName",
  COALESCE(LIST({
    "fieldValue": "fieldValue__1", 
    "weight": "weight__1"}  ORDER BY  "weight__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:10],[]) as "values"
FROM __stage2
GROUP BY 1
ORDER BY 1 ASC NULLS LAST

Sampling

With large datasets, you can also sample a small subsection using the sample: parameter. Sampled indexes are great at identifing the important low cardinality fields.

document
run: duckdb.table('../data/airports.parquet') -> {
  index: *
  sample: 5000  // sample only 5000 rows
} -> {
  group_by: fieldName
  nest: values is  {
    group_by: fieldValue, weight
    order_by: weight desc
    limit: 10
  }
  order_by: fieldName
}
QUERY RESULTS
fieldNamevalues
act_date
fieldValueweight
3,049
02/198740
08/198838
04/199335
02/198235
06/199134
11/198834
11/199033
11/198633
03/199331
aero_cht
fieldValueweight
NEW YORK422
CHICAGO390
DETROIT300
WASHINGTON248
DALLAS-FT WORTH228
ST LOUIS219
TWIN CITIES201
SEATTLE192
ATLANTA183
HOUSTON183
c_ldg_rts
fieldValueweight
3,818
N1,117
Y65
cbd_dir
fieldValueweight
N895
NW618
SE608
SW599
W562
S561
NE557
E554
29
ESE4
cbd_dist
fieldValueweight
0 to 635,000
cert
fieldValueweight
4,845
AS 05/197324
CS 05/197317
BS 05/197314
LU 05/197312
ES 05/19737
DS 05/19737
LU 08/19905
LU 11/19744
AU 05/19734
city
fieldValueweight
HOUSTON20
JACKSON14
LOS ANGELES13
SAN ANTONIO12
DALLAS12
PHOENIX11
OKLAHOMA CITY11
SPRINGFIELD11
ALBANY10
MARION10
cntl_twr
fieldValueweight
N4,839
Y161
code
fieldValueweight
31IN1
8M81
6K51
OL121
27OH1
1MO81
TN031
PN161
DTW1
ND691
county
fieldValueweight
WASHINGTON55
JEFFERSON49
LOS ANGELES45
MONTGOMERY45
FRANKLIN40
JACKSON36
MARION31
ADAMS30
ORANGE30
WAYNE29
cust_intl
fieldValueweight
3,818
N1,161
Y21
elevation
fieldValueweight
-128 to 124425,000
faa_dist
fieldValueweight
NONE1,767
CHI412
SEA290
DET288
MSP283
ATL278
NYC250
HAR229
ORL202
DCA181
faa_region
fieldValueweight
AGL1,144
ASW816
ASO710
AEA692
ANM522
ACE403
AWP364
ANE202
AAL142
5
fac_type
fieldValueweight
AIRPORT3,527
HELIPORT1,285
SEAPLANE BASE124
ULTRALIGHT35
STOLPORT17
BALLOONPORT6
GLIDERPORT6
fac_use
fieldValueweight
PR3,638
PU1,362
fed_agree
fieldValueweight
4,111
NGY416
N132
NGY352
NGPY44
NY131
131
NGPY326
N124
NGSY20
full_name
fieldValueweight
SMITH5
JOHNSON5
MEMORIAL HOSPITAL4
WILLIAMS4
RUSSELL4
ST JOSEPH'S HOSPITAL4
HILLTOP4
ST FRANCIS HOSPITAL3
COTTONWOOD3
EAGLES NEST3
id
fieldValueweight
10002 to 197875,000
joint_use
fieldValueweight
3,737
N1,207
Y56
latitude
fieldValueweight
10.28 to 70.915,000
longitude
fieldValueweight
-100.01 to 171.235,000
major
fieldValueweight
N4,941
Y59
mil_rts
fieldValueweight
3,716
Y727
N557
own_type
fieldValueweight
PR3,634
PU1,292
MR36
MN20
MA18
site_number
fieldValueweight
05412.01*A1
01975.1*A1
25398.1*C1
23760.81*A1
18722.23*A1
15695.*H1
22888.3*A1
04982.12*A1
25893.*A1
06728.5*A1
state
fieldValueweight
TX439
CA242
IL229
PA218
FL191
IN184
OH164
NY145
MN144
AK142
fieldValueweight
5,000
[
  {
    "fieldName": "act_date",
    "values": [
      {
        "fieldValue": null,
        "weight": 3049
      },
      {
        "fieldValue": "02/1987",
        "weight": 40
      },
      {
        "fieldValue": "08/1988",
        "weight": 38
      },
      {
        "fieldValue": "04/1993",
        "weight": 35
      },
      {
        "fieldValue": "02/1982",
        "weight": 35
      },
      {
        "fieldValue": "06/1991",
        "weight": 34
      },
      {
        "fieldValue": "11/1988",
        "weight": 34
      },
      {
        "fieldValue": "11/1990",
        "weight": 33
      },
      {
        "fieldValue": "11/1986",
        "weight": 33
      },
      {
        "fieldValue": "03/1993",
        "weight": 31
      }
    ]
  },
  {
    "fieldName": "aero_cht",
    "values": [
      {
        "fieldValue": "NEW YORK",
        "weight": 422
      },
      {
        "fieldValue": "CHICAGO",
        "weight": 390
      },
      {
        "fieldValue": "DETROIT",
        "weight": 300
      },
      {
        "fieldValue": "WASHINGTON",
        "weight": 248
      },
      {
        "fieldValue": "DALLAS-FT WORTH",
        "weight": 228
      },
      {
        "fieldValue": "ST LOUIS",
        "weight": 219
      },
      {
        "fieldValue": "TWIN CITIES",
        "weight": 201
      },
      {
        "fieldValue": "SEATTLE",
        "weight": 192
      },
      {
        "fieldValue": "ATLANTA",
        "weight": 183
      },
      {
        "fieldValue": "HOUSTON",
        "weight": 183
      }
    ]
  },
  {
    "fieldName": "c_ldg_rts",
    "values": [
      {
        "fieldValue": null,
        "weight": 3818
      },
      {
        "fieldValue": "N",
        "weight": 1117
      },
      {
        "fieldValue": "Y",
        "weight": 65
      }
    ]
  },
  {
    "fieldName": "cbd_dir",
    "values": [
      {
        "fieldValue": "N",
        "weight": 895
      },
      {
        "fieldValue": "NW",
        "weight": 618
      },
      {
        "fieldValue": "SE",
        "weight": 608
      },
      {
        "fieldValue": "SW",
        "weight": 599
      },
      {
        "fieldValue": "W",
        "weight": 562
      },
      {
        "fieldValue": "S",
        "weight": 561
      },
      {
        "fieldValue": "NE",
        "weight": 557
      },
      {
        "fieldValue": "E",
        "weight": 554
      },
      {
        "fieldValue": null,
        "weight": 29
      },
      {
        "fieldValue": "ESE",
        "weight": 4
      }
    ]
  },
  {
    "fieldName": "cbd_dist",
    "values": [
      {
        "fieldValue": "0 to 63",
        "weight": 5000
      }
    ]
  },
  {
    "fieldName": "cert",
    "values": [
      {
        "fieldValue": null,
        "weight": 4845
      },
      {
        "fieldValue": "AS 05/1973",
        "weight": 24
      },
      {
        "fieldValue": "CS 05/1973",
        "weight": 17
      },
      {
        "fieldValue": "BS 05/1973",
        "weight": 14
      },
      {
        "fieldValue": "LU 05/1973",
        "weight": 12
      },
      {
        "fieldValue": "ES 05/1973",
        "weight": 7
      },
      {
        "fieldValue": "DS 05/1973",
        "weight": 7
      },
      {
        "fieldValue": "LU 08/1990",
        "weight": 5
      },
      {
        "fieldValue": "LU 11/1974",
        "weight": 4
      },
      {
        "fieldValue": "AU 05/1973",
        "weight": 4
      }
    ]
  },
  {
    "fieldName": "city",
    "values": [
      {
        "fieldValue": "HOUSTON",
        "weight": 20
      },
      {
        "fieldValue": "JACKSON",
        "weight": 14
      },
      {
        "fieldValue": "LOS ANGELES",
        "weight": 13
      },
      {
        "fieldValue": "SAN ANTONIO",
        "weight": 12
      },
      {
        "fieldValue": "DALLAS",
        "weight": 12
      },
      {
        "fieldValue": "PHOENIX",
        "weight": 11
      },
      {
        "fieldValue": "OKLAHOMA CITY",
        "weight": 11
      },
      {
        "fieldValue": "SPRINGFIELD",
        "weight": 11
      },
      {
        "fieldValue": "ALBANY",
        "weight": 10
      },
      {
        "fieldValue": "MARION",
        "weight": 10
      }
    ]
  },
  {
    "fieldName": "cntl_twr",
    "values": [
      {
        "fieldValue": "N",
        "weight": 4839
      },
      {
        "fieldValue": "Y",
        "weight": 161
      }
    ]
  },
  {
    "fieldName": "code",
    "values": [
      {
        "fieldValue": "31IN",
        "weight": 1
      },
      {
        "fieldValue": "8M8",
        "weight": 1
      },
      {
        "fieldValue": "6K5",
        "weight": 1
      },
      {
        "fieldValue": "OL12",
        "weight": 1
      },
      {
        "fieldValue": "27OH",
        "weight": 1
      },
      {
        "fieldValue": "1MO8",
        "weight": 1
      },
      {
        "fieldValue": "TN03",
        "weight": 1
      },
      {
        "fieldValue": "PN16",
        "weight": 1
      },
      {
        "fieldValue": "DTW",
        "weight": 1
      },
      {
        "fieldValue": "ND69",
        "weight": 1
      }
    ]
  },
  {
    "fieldName": "county",
    "values": [
      {
        "fieldValue": "WASHINGTON",
        "weight": 55
      },
      {
        "fieldValue": "JEFFERSON",
        "weight": 49
      },
      {
        "fieldValue": "LOS ANGELES",
        "weight": 45
      },
      {
        "fieldValue": "MONTGOMERY",
        "weight": 45
      },
      {
        "fieldValue": "FRANKLIN",
        "weight": 40
      },
      {
        "fieldValue": "JACKSON",
        "weight": 36
      },
      {
        "fieldValue": "MARION",
        "weight": 31
      },
      {
        "fieldValue": "ADAMS",
        "weight": 30
      },
      {
        "fieldValue": "ORANGE",
        "weight": 30
      },
      {
        "fieldValue": "WAYNE",
        "weight": 29
      }
    ]
  },
  {
    "fieldName": "cust_intl",
    "values": [
      {
        "fieldValue": null,
        "weight": 3818
      },
      {
        "fieldValue": "N",
        "weight": 1161
      },
      {
        "fieldValue": "Y",
        "weight": 21
      }
    ]
  },
  {
    "fieldName": "elevation",
    "values": [
      {
        "fieldValue": "-128 to 12442",
        "weight": 5000
      }
    ]
  },
  {
    "fieldName": "faa_dist",
    "values": [
      {
        "fieldValue": "NONE",
        "weight": 1767
      },
      {
        "fieldValue": "CHI",
        "weight": 412
      },
      {
        "fieldValue": "SEA",
        "weight": 290
      },
      {
        "fieldValue": "DET",
        "weight": 288
      },
      {
        "fieldValue": "MSP",
        "weight": 283
      },
      {
        "fieldValue": "ATL",
        "weight": 278
      },
      {
        "fieldValue": "NYC",
        "weight": 250
      },
      {
        "fieldValue": "HAR",
        "weight": 229
      },
      {
        "fieldValue": "ORL",
        "weight": 202
      },
      {
        "fieldValue": "DCA",
        "weight": 181
      }
    ]
  },
  {
    "fieldName": "faa_region",
    "values": [
      {
        "fieldValue": "AGL",
        "weight": 1144
      },
      {
        "fieldValue": "ASW",
        "weight": 816
      },
      {
        "fieldValue": "ASO",
        "weight": 710
      },
      {
        "fieldValue": "AEA",
        "weight": 692
      },
      {
        "fieldValue": "ANM",
        "weight": 522
      },
      {
        "fieldValue": "ACE",
        "weight": 403
      },
      {
        "fieldValue": "AWP",
        "weight": 364
      },
      {
        "fieldValue": "ANE",
        "weight": 202
      },
      {
        "fieldValue": "AAL",
        "weight": 142
      },
      {
        "fieldValue": null,
        "weight": 5
      }
    ]
  },
  {
    "fieldName": "fac_type",
    "values": [
      {
        "fieldValue": "AIRPORT",
        "weight": 3527
      },
      {
        "fieldValue": "HELIPORT",
        "weight": 1285
      },
      {
        "fieldValue": "SEAPLANE BASE",
        "weight": 124
      },
      {
        "fieldValue": "ULTRALIGHT",
        "weight": 35
      },
      {
        "fieldValue": "STOLPORT",
        "weight": 17
      },
      {
        "fieldValue": "BALLOONPORT",
        "weight": 6
      },
      {
        "fieldValue": "GLIDERPORT",
        "weight": 6
      }
    ]
  },
  {
    "fieldName": "fac_use",
    "values": [
      {
        "fieldValue": "PR",
        "weight": 3638
      },
      {
        "fieldValue": "PU",
        "weight": 1362
      }
    ]
  },
  {
    "fieldName": "fed_agree",
    "values": [
      {
        "fieldValue": null,
        "weight": 4111
      },
      {
        "fieldValue": "NGY",
        "weight": 416
      },
      {
        "fieldValue": "N",
        "weight": 132
      },
      {
        "fieldValue": "NGY3",
        "weight": 52
      },
      {
        "fieldValue": "NGPY",
        "weight": 44
      },
      {
        "fieldValue": "NY1",
        "weight": 31
      },
      {
        "fieldValue": "1",
        "weight": 31
      },
      {
        "fieldValue": "NGPY3",
        "weight": 26
      },
      {
        "fieldValue": "N1",
        "weight": 24
      },
      {
        "fieldValue": "NGSY",
        "weight": 20
      }
    ]
  },
  {
    "fieldName": "full_name",
    "values": [
      {
        "fieldValue": "SMITH",
        "weight": 5
      },
      {
        "fieldValue": "JOHNSON",
        "weight": 5
      },
      {
        "fieldValue": "MEMORIAL HOSPITAL",
        "weight": 4
      },
      {
        "fieldValue": "WILLIAMS",
        "weight": 4
      },
      {
        "fieldValue": "RUSSELL",
        "weight": 4
      },
      {
        "fieldValue": "ST JOSEPH'S HOSPITAL",
        "weight": 4
      },
      {
        "fieldValue": "HILLTOP",
        "weight": 4
      },
      {
        "fieldValue": "ST FRANCIS HOSPITAL",
        "weight": 3
      },
      {
        "fieldValue": "COTTONWOOD",
        "weight": 3
      },
      {
        "fieldValue": "EAGLES NEST",
        "weight": 3
      }
    ]
  },
  {
    "fieldName": "id",
    "values": [
      {
        "fieldValue": "10002 to 19787",
        "weight": 5000
      }
    ]
  },
  {
    "fieldName": "joint_use",
    "values": [
      {
        "fieldValue": null,
        "weight": 3737
      },
      {
        "fieldValue": "N",
        "weight": 1207
      },
      {
        "fieldValue": "Y",
        "weight": 56
      }
    ]
  },
  {
    "fieldName": "latitude",
    "values": [
      {
        "fieldValue": "10.28 to 70.91",
        "weight": 5000
      }
    ]
  },
  {
    "fieldName": "longitude",
    "values": [
      {
        "fieldValue": "-100.01 to 171.23",
        "weight": 5000
      }
    ]
  },
  {
    "fieldName": "major",
    "values": [
      {
        "fieldValue": "N",
        "weight": 4941
      },
      {
        "fieldValue": "Y",
        "weight": 59
      }
    ]
  },
  {
    "fieldName": "mil_rts",
    "values": [
      {
        "fieldValue": null,
        "weight": 3716
      },
      {
        "fieldValue": "Y",
        "weight": 727
      },
      {
        "fieldValue": "N",
        "weight": 557
      }
    ]
  },
  {
    "fieldName": "own_type",
    "values": [
      {
        "fieldValue": "PR",
        "weight": 3634
      },
      {
        "fieldValue": "PU",
        "weight": 1292
      },
      {
        "fieldValue": "MR",
        "weight": 36
      },
      {
        "fieldValue": "MN",
        "weight": 20
      },
      {
        "fieldValue": "MA",
        "weight": 18
      }
    ]
  },
  {
    "fieldName": "site_number",
    "values": [
      {
        "fieldValue": "05412.01*A",
        "weight": 1
      },
      {
        "fieldValue": "01975.1*A",
        "weight": 1
      },
      {
        "fieldValue": "25398.1*C",
        "weight": 1
      },
      {
        "fieldValue": "23760.81*A",
        "weight": 1
      },
      {
        "fieldValue": "18722.23*A",
        "weight": 1
      },
      {
        "fieldValue": "15695.*H",
        "weight": 1
      },
      {
        "fieldValue": "22888.3*A",
        "weight": 1
      },
      {
        "fieldValue": "04982.12*A",
        "weight": 1
      },
      {
        "fieldValue": "25893.*A",
        "weight": 1
      },
      {
        "fieldValue": "06728.5*A",
        "weight": 1
      }
    ]
  },
  {
    "fieldName": "state",
    "values": [
      {
        "fieldValue": "TX",
        "weight": 439
      },
      {
        "fieldValue": "CA",
        "weight": 242
      },
      {
        "fieldValue": "IL",
        "weight": 229
      },
      {
        "fieldValue": "PA",
        "weight": 218
      },
      {
        "fieldValue": "FL",
        "weight": 191
      },
      {
        "fieldValue": "IN",
        "weight": 184
      },
      {
        "fieldValue": "OH",
        "weight": 164
      },
      {
        "fieldValue": "NY",
        "weight": 145
      },
      {
        "fieldValue": "MN",
        "weight": 144
      },
      {
        "fieldValue": "AK",
        "weight": 142
      }
    ]
  },
  {
    "fieldName": null,
    "values": [
      {
        "fieldValue": null,
        "weight": 5000
      }
    ]
  }
]
WITH __stage0 AS (
  SELECT * from (SELECT * FROM '../data/airports.parquet' USING SAMPLE 5000) as x limit 100000 )
, __stage1 AS (
  SELECT
    group_set,
    CASE group_set
      WHEN 0 THEN 'id'
      WHEN 1 THEN 'code'
      WHEN 2 THEN 'site_number'
      WHEN 3 THEN 'fac_type'
      WHEN 4 THEN 'fac_use'
      WHEN 5 THEN 'faa_region'
      WHEN 6 THEN 'faa_dist'
      WHEN 7 THEN 'city'
      WHEN 8 THEN 'county'
      WHEN 9 THEN 'state'
      WHEN 10 THEN 'full_name'
      WHEN 11 THEN 'own_type'
      WHEN 12 THEN 'longitude'
      WHEN 13 THEN 'latitude'
      WHEN 14 THEN 'elevation'
      WHEN 15 THEN 'aero_cht'
      WHEN 16 THEN 'cbd_dist'
      WHEN 17 THEN 'cbd_dir'
      WHEN 18 THEN 'act_date'
      WHEN 19 THEN 'cert'
      WHEN 20 THEN 'fed_agree'
      WHEN 21 THEN 'cust_intl'
      WHEN 22 THEN 'c_ldg_rts'
      WHEN 23 THEN 'joint_use'
      WHEN 24 THEN 'mil_rts'
      WHEN 25 THEN 'cntl_twr'
      WHEN 26 THEN 'major'
    END as "fieldName",  CASE group_set
      WHEN 0 THEN 'number'
      WHEN 1 THEN 'string'
      WHEN 2 THEN 'string'
      WHEN 3 THEN 'string'
      WHEN 4 THEN 'string'
      WHEN 5 THEN 'string'
      WHEN 6 THEN 'string'
      WHEN 7 THEN 'string'
      WHEN 8 THEN 'string'
      WHEN 9 THEN 'string'
      WHEN 10 THEN 'string'
      WHEN 11 THEN 'string'
      WHEN 12 THEN 'number'
      WHEN 13 THEN 'number'
      WHEN 14 THEN 'number'
      WHEN 15 THEN 'string'
      WHEN 16 THEN 'number'
      WHEN 17 THEN 'string'
      WHEN 18 THEN 'string'
      WHEN 19 THEN 'string'
      WHEN 20 THEN 'string'
      WHEN 21 THEN 'string'
      WHEN 22 THEN 'string'
      WHEN 23 THEN 'string'
      WHEN 24 THEN 'string'
      WHEN 25 THEN 'string'
      WHEN 26 THEN 'string'
    END as "fieldType",  CASE group_set WHEN 99999 THEN NULL
      WHEN 1 THEN base."code"
      WHEN 2 THEN base."site_number"
      WHEN 3 THEN base."fac_type"
      WHEN 4 THEN base."fac_use"
      WHEN 5 THEN base."faa_region"
      WHEN 6 THEN base."faa_dist"
      WHEN 7 THEN base."city"
      WHEN 8 THEN base."county"
      WHEN 9 THEN base."state"
      WHEN 10 THEN base."full_name"
      WHEN 11 THEN base."own_type"
      WHEN 15 THEN base."aero_cht"
      WHEN 17 THEN base."cbd_dir"
      WHEN 18 THEN base."act_date"
      WHEN 19 THEN base."cert"
      WHEN 20 THEN base."fed_agree"
      WHEN 21 THEN base."cust_intl"
      WHEN 22 THEN base."c_ldg_rts"
      WHEN 23 THEN base."joint_use"
      WHEN 24 THEN base."mil_rts"
      WHEN 25 THEN base."cntl_twr"
      WHEN 26 THEN base."major"
    END as "fieldValue",
   COUNT(*) as weight,
    CASE group_set
      WHEN 99999 THEN ''    WHEN 0 THEN MIN(CAST(base."id" as VARCHAR)) || ' to ' || CAST(MAX(base."id") as VARCHAR)
      WHEN 12 THEN MIN(CAST(base."longitude" as VARCHAR)) || ' to ' || CAST(MAX(base."longitude") as VARCHAR)
      WHEN 13 THEN MIN(CAST(base."latitude" as VARCHAR)) || ' to ' || CAST(MAX(base."latitude") as VARCHAR)
      WHEN 14 THEN MIN(CAST(base."elevation" as VARCHAR)) || ' to ' || CAST(MAX(base."elevation") as VARCHAR)
      WHEN 16 THEN MIN(CAST(base."cbd_dist" as VARCHAR)) || ' to ' || CAST(MAX(base."cbd_dist") as VARCHAR)
    END as "fieldRange"
  FROM __stage0 as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,27,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
, __stage2 AS (
  SELECT
    "fieldName",
    "fieldType",
    COALESCE("fieldValue", "fieldRange") as "fieldValue",
    weight
  FROM __stage1
)
, __stage3 AS (
  SELECT
    group_set,
    base."fieldName" as "fieldName__0",
    CASE WHEN group_set=1 THEN
      base."fieldValue"
      END as "fieldValue__1",
    CASE WHEN group_set=1 THEN
      base."weight"
      END as "weight__1"
  FROM __stage2 as base
  CROSS JOIN (SELECT UNNEST(GENERATE_SERIES(0,1,1)) as group_set  ) as group_set
  GROUP BY 1,2,3,4
)
SELECT
  "fieldName__0" as "fieldName",
  COALESCE(LIST({
    "fieldValue": "fieldValue__1", 
    "weight": "weight__1"}  ORDER BY  "weight__1" desc NULLS LAST) FILTER (WHERE group_set=1)[1:10],[]) as "values"
FROM __stage3
GROUP BY 1
ORDER BY 1 ASC NULLS LAST

A More Complex Example

The rest of this pages uses the model below. The data is an excerpt from the IMDB. The Malloy schema for this model is shown on the right. The core value is movies, but joined at the principals (the people that worked on the movie) and the people (the actual data about the individuals).

We use the measure total_ratings to determin a movie's popularity. An individual's popularity is determined by the some of all the ratings of the movies a person has worked on.

document
source: movies is duckdb.table('../data/titles.parquet') extend {
  join_many: principals is duckdb.table('../data/principals.parquet') {
    join_one: people is duckdb.table('../data/names.parquet') 
      on nconst = people.nconst
  } on tconst = principals.tconst
  measure: total_ratings is numVotes.sum()
}

Weights can be any measure

Often a row count will work nicely as a weight, but sometimes there is something better. In movies, for example the sum of the number of votes will not only find the interesting most interesting movies but will also find the most interesting people. For example the most interesting people in the dataset.

document
run: movies -> {
  group_by: principals.people.primaryName
  aggregate: total_ratings
} 
QUERY RESULTS
primaryNametotal_​ratings
Brad Pitt18,269,386
Stan Lee18,229,893
John Williams17,651,379
Leonardo DiCaprio17,032,662
Tom Hanks16,650,835
Christopher Nolan15,368,415
Steven Spielberg15,069,082
Robert De Niro14,629,861
Christian Bale13,730,681
Samuel L. Jackson13,423,134
Robert Downey Jr.12,685,331
George Lucas12,584,331
Matt Damon12,509,393
Quentin Tarantino12,348,394
Bruce Willis12,045,753
Jack Kirby11,901,244
Johnny Depp11,755,228
Morgan Freeman11,435,905
Tom Cruise11,290,573
Thomas Newman11,056,371
Hans Zimmer10,957,053
Scott Rudin10,793,692
Charles Roven10,778,794
Bob Kane10,685,405
Ian McKellen10,552,024
Chris Evans10,497,785
Harrison Ford10,449,365
Scarlett Johansson10,128,373
David S. Goyer10,022,517
Jonathan Nolan9,823,211
Martin Scorsese9,692,050
Peter Jackson9,612,306
Orlando Bloom9,482,690
James Newton Howard9,473,293
Stephen King9,454,020
Hugh Jackman9,409,702
Kevin Feige9,408,976
Ridley Scott9,334,292
Natalie Portman9,206,903
Liam Neeson9,185,450
David Heyman9,174,472
Mark Ruffalo9,171,115
Bradley Cooper8,980,668
Will Smith8,960,562
Al Pacino8,933,776
Emma Thomas8,773,124
Tim Bevan8,756,327
Roger Deakins8,737,852
Fran Walsh8,594,296
Michael Caine8,533,028
Lorne Orleans8,518,672
Lawrence Bender8,503,500
Keanu Reeves8,480,149
David Fincher8,427,910
Mark Wahlberg8,393,193
Joel Silver8,360,819
Robert Richardson8,351,268
Philippa Boyens8,336,233
Kathleen Kennedy8,278,785
Matthew McConaughey8,271,368
Ben Affleck8,268,036
James Horner8,253,828
Alan Silvestri8,072,953
Chris Hemsworth8,058,188
Carter Burwell7,997,536
Simon Kinberg7,846,138
Eric Fellner7,845,245
J.R.R. Tolkien7,798,667
Jake Gyllenhaal7,762,048
Jerry Bruckheimer7,721,877
Edward Norton7,713,816
Woody Harrelson7,711,063
Ewan McGregor7,655,458
George Clooney7,653,368
David Benioff7,637,309
Kevin Spacey7,603,094
Brian Grazer7,570,625
Anne Hathaway7,509,220
Arnon Milchan7,494,607
Jim Carrey7,421,247
Robert Zemeckis7,371,523
Danny Elfman7,340,555
Ryan Reynolds7,319,537
Jason Blum7,308,653
James Cameron7,306,015
Sally Menke7,239,769
Tom Hardy7,213,428
Clint Eastwood7,209,434
Russell Crowe7,166,573
Akiva Goldsman7,094,797
Nicolas Cage7,053,952
Michael Giacchino7,047,025
Gary Oldman7,036,991
Cate Blanchett7,017,425
Michael Bay7,016,200
J.J. Abrams6,991,513
Willem Dafoe6,924,815
Christopher Markus6,886,318
Stephen McFeely6,886,318
Neal H. Moritz6,870,329
Alexandre Desplat6,867,309
Jennifer Lawrence6,855,625
Rachel McAdams6,777,091
Lena Headey6,742,472
Elijah Wood6,734,358
Dwayne Johnson6,584,416
Jason Statham6,582,293
Steve Kloves6,559,963
Denzel Washington6,556,554
Sigourney Weaver6,515,096
Arnold Schwarzenegger6,482,925
Tim Burton6,467,281
Steve Carell6,436,581
Terry Rossio6,423,369
Ceán Chaffin6,415,865
Amy Adams6,401,556
Daniel Radcliffe6,393,663
J.K. Rowling6,392,020
Mel Gibson6,322,650
Anthony Hopkins6,315,913
Emma Watson6,304,351
6,300,267
Rick McCallum6,211,607
Jack Nicholson6,180,622
Ted Elliott6,136,354
Adam Sandler6,094,935
Ed Harris6,077,615
Chris Columbus6,076,896
Cameron Diaz6,042,333
Joseph Gordon-Levitt6,034,397
Emilia Clarke5,995,245
D.B. Weiss5,972,128
Joaquin Phoenix5,967,875
Daniel Craig5,967,704
Luc Besson5,961,182
Ryan Gosling5,956,376
Vin Diesel5,944,056
Andrew Stanton5,942,167
John Goodman5,940,986
Ralph Fiennes5,906,715
Peter Dinklage5,906,118
David Koepp5,894,926
Pete Docter5,862,141
Sylvester Stallone5,861,981
Angelina Jolie5,857,481
Ian Bryce5,848,379
Hugo Weaving5,841,890
Frank Darabont5,826,908
Ben Kingsley5,825,798
Viggo Mortensen5,823,719
Richard Francis-Bruce5,798,818
Colin Farrell5,788,835
Charlize Theron5,765,794
Joss Whedon5,757,097
Nikolaj Coster-Waldau5,749,486
Julianne Moore5,744,632
Jonah Hill5,718,674
Joel Coen5,700,458
Zack Snyder5,695,090
Ben Stiller5,685,709
Bryan Cranston5,685,238
Kate Winslet5,682,918
Ethan Coen5,663,547
Walter F. Parkes5,657,162
Howard Shore5,638,214
Jeremy Renner5,623,966
Marco Beltrami5,590,317
Bryan Singer5,590,237
Emma Stone5,583,422
Guillermo del Toro5,570,240
Owen Wilson5,556,539
Rupert Grint5,552,590
Jude Law5,538,884
Matthew Vaughn5,524,297
Alex Kurtzman5,522,344
Lana Wachowski5,518,169
Keira Knightley5,512,152
Lilly Wachowski5,444,217
Judd Apatow5,443,590
Chris Pratt5,439,043
Francis Ford Coppola5,436,313
Will Ferrell5,420,626
Seth Rogen5,419,570
Sandra Bullock5,405,221
Steve Ditko5,371,488
Frank Marshall5,363,210
John Logan5,353,165
Ken Watanabe5,352,344
Sam Raimi5,334,765
Carrie-Anne Moss5,305,827
Benedict Cumberbatch5,273,615
Robin Williams5,224,218
Jennifer Connelly5,222,568
Ron Howard5,215,903
John Travolta5,206,564
Laurence Fishburne5,187,514
Ethan Hawke5,186,955
Jamie Foxx5,180,837
Lawrence Kasdan5,173,484
Gerard Butler5,156,049
[
  {
    "primaryName": "Brad Pitt",
    "total_ratings": 18269386
  },
  {
    "primaryName": "Stan Lee",
    "total_ratings": 18229893
  },
  {
    "primaryName": "John Williams",
    "total_ratings": 17651379
  },
  {
    "primaryName": "Leonardo DiCaprio",
    "total_ratings": 17032662
  },
  {
    "primaryName": "Tom Hanks",
    "total_ratings": 16650835
  },
  {
    "primaryName": "Christopher Nolan",
    "total_ratings": 15368415
  },
  {
    "primaryName": "Steven Spielberg",
    "total_ratings": 15069082
  },
  {
    "primaryName": "Robert De Niro",
    "total_ratings": 14629861
  },
  {
    "primaryName": "Christian Bale",
    "total_ratings": 13730681
  },
  {
    "primaryName": "Samuel L. Jackson",
    "total_ratings": 13423134
  },
  {
    "primaryName": "Robert Downey Jr.",
    "total_ratings": 12685331
  },
  {
    "primaryName": "George Lucas",
    "total_ratings": 12584331
  },
  {
    "primaryName": "Matt Damon",
    "total_ratings": 12509393
  },
  {
    "primaryName": "Quentin Tarantino",
    "total_ratings": 12348394
  },
  {
    "primaryName": "Bruce Willis",
    "total_ratings": 12045753
  },
  {
    "primaryName": "Jack Kirby",
    "total_ratings": 11901244
  },
  {
    "primaryName": "Johnny Depp",
    "total_ratings": 11755228
  },
  {
    "primaryName": "Morgan Freeman",
    "total_ratings": 11435905
  },
  {
    "primaryName": "Tom Cruise",
    "total_ratings": 11290573
  },
  {
    "primaryName": "Thomas Newman",
    "total_ratings": 11056371
  },
  {
    "primaryName": "Hans Zimmer",
    "total_ratings": 10957053
  },
  {
    "primaryName": "Scott Rudin",
    "total_ratings": 10793692
  },
  {
    "primaryName": "Charles Roven",
    "total_ratings": 10778794
  },
  {
    "primaryName": "Bob Kane",
    "total_ratings": 10685405
  },
  {
    "primaryName": "Ian McKellen",
    "total_ratings": 10552024
  },
  {
    "primaryName": "Chris Evans",
    "total_ratings": 10497785
  },
  {
    "primaryName": "Harrison Ford",
    "total_ratings": 10449365
  },
  {
    "primaryName": "Scarlett Johansson",
    "total_ratings": 10128373
  },
  {
    "primaryName": "David S. Goyer",
    "total_ratings": 10022517
  },
  {
    "primaryName": "Jonathan Nolan",
    "total_ratings": 9823211
  },
  {
    "primaryName": "Martin Scorsese",
    "total_ratings": 9692050
  },
  {
    "primaryName": "Peter Jackson",
    "total_ratings": 9612306
  },
  {
    "primaryName": "Orlando Bloom",
    "total_ratings": 9482690
  },
  {
    "primaryName": "James Newton Howard",
    "total_ratings": 9473293
  },
  {
    "primaryName": "Stephen King",
    "total_ratings": 9454020
  },
  {
    "primaryName": "Hugh Jackman",
    "total_ratings": 9409702
  },
  {
    "primaryName": "Kevin Feige",
    "total_ratings": 9408976
  },
  {
    "primaryName": "Ridley Scott",
    "total_ratings": 9334292
  },
  {
    "primaryName": "Natalie Portman",
    "total_ratings": 9206903
  },
  {
    "primaryName": "Liam Neeson",
    "total_ratings": 9185450
  },
  {
    "primaryName": "David Heyman",
    "total_ratings": 9174472
  },
  {
    "primaryName": "Mark Ruffalo",
    "total_ratings": 9171115
  },
  {
    "primaryName": "Bradley Cooper",
    "total_ratings": 8980668
  },
  {
    "primaryName": "Will Smith",
    "total_ratings": 8960562
  },
  {
    "primaryName": "Al Pacino",
    "total_ratings": 8933776
  },
  {
    "primaryName": "Emma Thomas",
    "total_ratings": 8773124
  },
  {
    "primaryName": "Tim Bevan",
    "total_ratings": 8756327
  },
  {
    "primaryName": "Roger Deakins",
    "total_ratings": 8737852
  },
  {
    "primaryName": "Fran Walsh",
    "total_ratings": 8594296
  },
  {
    "primaryName": "Michael Caine",
    "total_ratings": 8533028
  },
  {
    "primaryName": "Lorne Orleans",
    "total_ratings": 8518672
  },
  {
    "primaryName": "Lawrence Bender",
    "total_ratings": 8503500
  },
  {
    "primaryName": "Keanu Reeves",
    "total_ratings": 8480149
  },
  {
    "primaryName": "David Fincher",
    "total_ratings": 8427910
  },
  {
    "primaryName": "Mark Wahlberg",
    "total_ratings": 8393193
  },
  {
    "primaryName": "Joel Silver",
    "total_ratings": 8360819
  },
  {
    "primaryName": "Robert Richardson",
    "total_ratings": 8351268
  },
  {
    "primaryName": "Philippa Boyens",
    "total_ratings": 8336233
  },
  {
    "primaryName": "Kathleen Kennedy",
    "total_ratings": 8278785
  },
  {
    "primaryName": "Matthew McConaughey",
    "total_ratings": 8271368
  },
  {
    "primaryName": "Ben Affleck",
    "total_ratings": 8268036
  },
  {
    "primaryName": "James Horner",
    "total_ratings": 8253828
  },
  {
    "primaryName": "Alan Silvestri",
    "total_ratings": 8072953
  },
  {
    "primaryName": "Chris Hemsworth",
    "total_ratings": 8058188
  },
  {
    "primaryName": "Carter Burwell",
    "total_ratings": 7997536
  },
  {
    "primaryName": "Simon Kinberg",