Visit our new documentation site! This documentation page is no longer updated.

Presto Resource Pools

Presto Resource Pools allow you to break up your available compute resources into manageable chunks. You can then organize their usage across projects, groups, or use cases.

Presto Resource Pools are helpful for the following challenges:

  1. A customer has a team of analysts who often see significant queuing during the work day because large scheduled queries consume the full account’s resources for long periods.

  2. A customer has critical SLA queries that must always have resources available to run, and wants to ensure that other queries (scheduled or ad-hoc) don’t get in the way.

Resource pools are only available for accounts with 5 or more Presto Compute Units allocated.

Setup

This feature is enabled upon customer request. Contact support or your primary account representative if you want to use Presto Resource Pools.

Resource Pool Functionality

Introduction

Based on the number of Presto Compute Units your account has, you are able to use up to a specified maximum of the following:

  • Concurrent Query Limit
  • Memory Limits (per query)
  • Split Compute Limits

Resource Pools allow you to allocate these resources as percentages of your account’s total available amount.

Your resource pools can either be strictly partitioned, or can overlap the allocation of your total available resources.

For example:

  • One customer might enable a “scheduled” pool with access to up to 70% of their account’s total resources, and an “ad-hoc” pool with access to up to 70% of the account’s resources. In this way, 40% of the account’s resources are shared between the two pools, and 30% is dedicated to each of the scheduled and ad-hoc pools.

  • Another customer may want to allocate 40% to a “scheduled” pool to maintain a stricter SLA environment, and 60% to an “ad-hoc” pool for development purposes.
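
The arithmetic behind these two examples can be sketched as follows. This is only an illustration, not the service's implementation: the function name is hypothetical, and it assumes that any allocation beyond 100% in total is the part shared by both pools.

```python
def shared_and_dedicated(pool_a_pct: int, pool_b_pct: int) -> dict:
    """Split two pool allocations into shared and dedicated percentages."""
    for pct in (pool_a_pct, pool_b_pct):
        if not 0 <= pct <= 100:
            raise ValueError("pool percentage must be between 0 and 100")
    # Assumption: whatever is allocated beyond 100% in total is shared.
    shared = max(0, pool_a_pct + pool_b_pct - 100)
    return {
        "shared": shared,
        "dedicated_a": pool_a_pct - shared,
        "dedicated_b": pool_b_pct - shared,
    }


print(shared_and_dedicated(70, 70))  # overlapping pools
print(shared_and_dedicated(40, 60))  # strictly partitioned pools
```

For the 70%/70% case this yields 40% shared and 30% dedicated to each pool; for the 40%/60% case nothing is shared.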

For the first example given, queries are prioritized between the pools based on the following logic:

  1. Highest priority queued queries are issued first

  2. First come, first served
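
The two rules above amount to a sort on priority, with arrival order as the tie-breaker. A minimal sketch, assuming that a larger number means a higher priority and that each query carries an increasing arrival counter (the class, field, and query names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class QueuedQuery:
    name: str
    priority: int      # assumption: larger number means higher priority
    submitted_at: int  # assumption: increasing arrival counter or timestamp


def issue_order(queue):
    # Highest priority first; ties fall back to first come, first served.
    return sorted(queue, key=lambda q: (-q.priority, q.submitted_at))


queue = [
    QueuedQuery("adhoc_report", priority=0, submitted_at=1),
    QueuedQuery("sla_export", priority=2, submitted_at=3),
    QueuedQuery("nightly_batch", priority=0, submitted_at=2),
]
print([q.name for q in issue_order(queue)])
```

Here `sla_export` is issued first because of its priority, followed by the two equal-priority queries in arrival order.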

Resource Limits Applied to Pools

Resource Pools divide an account’s resources as percentages of the account total. The percentages apply to:

  • Max-Anytime Splits
  • Query & Account Max Memory
  • Concurrent Queries (total allowed * Pool %, rounded up)
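
As a rough illustration, the limits of one pool could be derived from the account totals like this. The account figures and key names are made up for the example. Only the rounding of concurrent queries is stated above, so splits and memory are left unrounded here as an assumption.

```python
import math
from fractions import Fraction


def pool_limits(account: dict, pool_pct: int) -> dict:
    """Scale hypothetical account-wide limits down to one pool's share."""
    share = Fraction(pool_pct, 100)  # exact arithmetic, no float drift
    return {
        # Assumption: splits and memory scale linearly without rounding.
        "max_splits": float(account["max_splits"] * share),
        "max_memory_gb": float(account["max_memory_gb"] * share),
        # Stated above: total allowed * pool %, rounded up.
        "concurrent_queries": math.ceil(account["concurrent_queries"] * share),
    }


# Hypothetical account totals, for illustration only.
account = {"max_splits": 1000, "max_memory_gb": 100, "concurrent_queries": 8}
print(pool_limits(account, 70))
```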

Options for Presto Resource Pool Allocation

Customers can configure up to 3 resource pools, allocated with percentages of their choosing. Customers typically choose one of two configurations.

  1. A complete separation of resource usage

    In this configuration, customers allocate resources directly to each pool, so that no resources are shared between them.

    Example: 30%, 70% resource split

    A split is useful for:

    • SLA-critical workloads, such as separating “development” and “production” environments.
    • Dividing up resources between multiple teams

  2. A partial shared environment

    In this configuration, customers choose to have some overlap between multiple pools. Some resources are shared between the pools, while some are reserved for each pool.

    Example: 70%, 70% resource split.

    A shared split is useful for:

    • Analyst teams that want to maximize available resources for scheduled queries, especially during non-work hours, but want to keep some resources always available for their ad-hoc work.

Selecting Which Query Pool Your Query Will Run On

By default, resource pools can be enabled for scheduled saved queries and ad-hoc queries. If you use the default configuration, you do not need to use the following methods to select query pools. If you use a custom setup, you must select the resource pool explicitly.

CLI Option

If you are using the CLI to issue queries, you can select the pool to use by name, including any additional pools that were set up with custom names, as follows.

td query --database <database_name> --pool-name <resource_pool_name>

Select a Resource Pool

Console users can select a specific resource pool for use by adding the following comment at the top of their query:

-- set property resource_group = '<resource_pool_name>'
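
If queries are assembled programmatically before being pasted or submitted, the comment can be prepended with a small helper. This is a sketch only: the function name and the pool name `adhoc` are hypothetical, and the comment format is copied from the line above.

```python
def with_resource_pool(sql: str, pool_name: str) -> str:
    """Return the query text with the resource pool comment on the first line."""
    # Reject names that would break out of the quoted value or the comment line.
    if "'" in pool_name or "\n" in pool_name:
        raise ValueError("pool name must not contain quotes or newlines")
    return f"-- set property resource_group = '{pool_name}'\n{sql}"


print(with_resource_pool("SELECT COUNT(1) FROM www_access", "adhoc"))
```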

Frequently Asked Questions

What about resource pools for Hive?

Resource pools in Hive are available in private beta. Contact support@treasure-data.com if you are interested in trying this feature.

What is a Split Compute Unit?

Split Compute Units are Presto’s way of allocating a set amount of machine resources to a computation task. The number of splits available to your account is proportional to the total machine processing resources made available to it.

The total number of splits a query requires to run is in proportion to the amount of data scanned and the complexity of the query.

What future improvements to Presto Resource Pools are planned?

Based on user feedback, we are considering the following possible improvements:

  • Provide customers the ability to self-configure their resource pools (via CLI).
  • Give administrators the ability to control which resource pool(s) a given individual can use.
  • Enable scheduled daily changes to resource pool allocation (e.g. dedicate resources to analysts during the work day, and release those dedicated resources at night for scheduled queries).

My account limits Concurrent Queries (CQ). How do CQs get allocated across resource pools?

The allocation of concurrent queries is based on the specified resource pool percentages. The following examples show how concurrent queries are allocated across resource pools with various percentages. In these examples, the account has an overall CQ of 8:

  • If you specify an allocation of 70% ad-hoc and 70% scheduled, then the CQ for each pool is 6 (8 CQ * 0.70 = 5.6, rounded up to 6).
  • If you specify an allocation of 60% ad-hoc and 40% scheduled, the CQ on the ad-hoc pool is 5 (8 CQ * 0.6 = 4.8, rounded up to 5) and the CQ on the scheduled pool is 4 (8 CQ * 0.4 = 3.2, rounded up to 4).

The CQ for the overall account still applies. So in the second example, if you use all 5 queries allowed in the ad-hoc pool, the scheduled pool is limited to 3 queries until one of the ad-hoc queries finishes, because the account as a whole is limited to 8 concurrent queries.
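
The interaction between the per-pool limit and the account-wide limit can be sketched as taking the smaller of the two remaining capacities. The function names are hypothetical, and this models only the behavior described above, not the actual scheduler.

```python
def pool_cq(account_cq: int, pool_pct: int) -> int:
    """Pool limit: account CQ * pool %, rounded up (integer ceiling)."""
    return -(-account_cq * pool_pct // 100)


def available_slots(account_cq, pool_pct, running_in_pool, running_elsewhere):
    """Queries this pool can still start right now."""
    pool_room = pool_cq(account_cq, pool_pct) - running_in_pool
    account_room = account_cq - running_in_pool - running_elsewhere
    # Both the pool limit and the account-wide limit must be respected.
    return max(0, min(pool_room, account_room))


# Second example: 8 CQ overall, 60% ad-hoc and 40% scheduled.
print(pool_cq(8, 60), pool_cq(8, 40))
# Ad-hoc pool is full with 5 running queries; scheduled pool is idle.
print(available_slots(8, 40, running_in_pool=0, running_elsewhere=5))
```

With the ad-hoc pool full, the scheduled pool can start 3 queries rather than its nominal 4, which matches the example above.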

Further Support

Contact support@treasure-data.com if you have questions about this feature.


Last modified: Sep 19 2017 18:41:03 UTC
