Ordering and Limiting

VSCode

Often when querying data the amount of data returned to look at is much smaller than the full result set, so the ordering of the data makes a big difference in what you actually see. To make things easier, Malloy has some smart defaults in the way it presents data. For the most part, you don't have to think too much about it, but in order to understand it, this document will show you how Malloy makes decisions about what to show you.

Implicit Ordering

Rule 1: Newest first

If a query stage has a dimensional column that represents a point in time, it is usually the most important concept in the query. Because the most recent data is usually the most relevant, Malloy sorts the newest data first.

run: flights -> {
  group_by: dep_month is dep_time.month
  aggregate: flight_count is count()
}

QUERY RESULTS

dep_month	flight_count
2005-12	5,770
2005-11	5,533
2005-10	6,083
2005-09	6,100
2005-08	6,415

[
  {
    "dep_month": "2005-12-01T00:00:00.000Z",
    "flight_count": 5770
  },
  {
    "dep_month": "2005-11-01T00:00:00.000Z",
    "flight_count": 5533
  },
  {
    "dep_month": "2005-10-01T00:00:00.000Z",
    "flight_count": 6083
  },
  {
    "dep_month": "2005-09-01T00:00:00.000Z",
    "flight_count": 6100
  },
  {
    "dep_month": "2005-08-01T00:00:00.000Z",
    "flight_count": 6415
  }
]

SELECT 
   DATE_TRUNC('month', flights."dep_time") as "dep_month",
   COUNT( 1) as "flight_count"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 1 desc NULLS LAST

Rule 2: Largest first

If there is a measure involved, Malloy sorts larger values first.

In the following example, Rule 1 doesn't apply, so the default behavior is to sort by first aggregate, flight_count with the largest values first.

run: flights -> {
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

carrier	flight_count
WN	88,751
US	37,683
AA	34,577
NW	33,580
UA	32,757

[
  {
    "carrier": "WN",
    "flight_count": 88751
  },
  {
    "carrier": "US",
    "flight_count": 37683
  },
  {
    "carrier": "AA",
    "flight_count": 34577
  },
  {
    "carrier": "NW",
    "flight_count": 33580
  },
  {
    "carrier": "UA",
    "flight_count": 32757
  }
]

SELECT 
   flights."carrier" as "carrier",
   COUNT( 1) as "flight_count"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 2 desc NULLS LAST

Explicit Ordering

You can be explicit about result ordering by using the order_by clause.

In the following example, the results are ordered by carrier in reverse alphabetical order.

run: flights -> {
  order_by: carrier desc
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

carrier	flight_count
WN	88,751
US	37,683
UA	32,757
TZ	3,033
RU	16,074

[
  {
    "carrier": "WN",
    "flight_count": 88751
  },
  {
    "carrier": "US",
    "flight_count": 37683
  },
  {
    "carrier": "UA",
    "flight_count": 32757
  },
  {
    "carrier": "TZ",
    "flight_count": 3033
  },
  {
    "carrier": "RU",
    "flight_count": 16074
  }
]

SELECT 
   flights."carrier" as "carrier",
   COUNT( 1) as "flight_count"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 1 desc NULLS LAST

Like in SQL, Malloy's order_by always defaults to ascending order when desc is omitted. This is true for any column of any type. In the example below, the results are ordered by carrier in alphabetical order.

run: flights -> {
  order_by: carrier
  group_by: carrier
  aggregate: flight_count is count()
}

QUERY RESULTS

carrier	flight_count
AA	34,577
AS	8,453
B6	4,842
CO	7,139
DL	32,130

[
  {
    "carrier": "AA",
    "flight_count": 34577
  },
  {
    "carrier": "AS",
    "flight_count": 8453
  },
  {
    "carrier": "B6",
    "flight_count": 4842
  },
  {
    "carrier": "CO",
    "flight_count": 7139
  },
  {
    "carrier": "DL",
    "flight_count": 32130
  }
]

SELECT 
   flights."carrier" as "carrier",
   COUNT( 1) as "flight_count"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 1 ASC NULLS LAST

Limiting

In Malloy, you can limit the number of results returned using limit: integer literal or top: integer literal. Both are provided for readability.

In the example below, the results are limited to 2 rows, which are sorted by dep_month with newest results first (due to Rule 1).

run: flights -> {
  limit: 2
  group_by: dep_month is dep_time.month
  aggregate: flight_count is count()
}

QUERY RESULTS

dep_month	flight_count
2005-12	5,770
2005-11	5,533

[
  {
    "dep_month": "2005-12-01T00:00:00.000Z",
    "flight_count": 5770
  },
  {
    "dep_month": "2005-11-01T00:00:00.000Z",
    "flight_count": 5533
  }
]

SELECT 
   DATE_TRUNC('month', flights."dep_time") as "dep_month",
   COUNT( 1) as "flight_count"
FROM '../data/flights.parquet' as flights
GROUP BY 1
ORDER BY 1 desc NULLS LAST
LIMIT 2