
Legacy metadata API

This document describes the legacy API endpoints to retrieve datasource metadata from Apache Druid. Use the SQL metadata tables to retrieve datasource metadata instead.

Segment loading

GET /druid/coordinator/v1/loadstatus

Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster.

GET /druid/coordinator/v1/loadstatus?simple

Returns the number of segments left to load until segments that should be loaded in the cluster are available for queries. This does not include segment replication counts.

GET /druid/coordinator/v1/loadstatus?full

Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available. This includes segment replication counts.

GET /druid/coordinator/v1/loadstatus?full&computeUsingClusterView

Returns the number of segments not yet loaded for each tier until all segments that should be loaded in the cluster are available. The result includes segment replication counts. When computing the number of segments left to load, it also factors in how many nodes are currently available of a service type that can load the segment. A segment is considered fully loaded when:

  • Druid has replicated it the number of times configured in the corresponding load rule.
  • Or the number of replicas of the segment in each tier where it is configured to be replicated equals the number of available nodes in that tier that are of a service type currently allowed to load the segment.
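The following Python sketch shows how the cluster-wide loadstatus options map onto request URLs. It assumes a Coordinator reachable at http://localhost:8081 (8081 is Druid's default Coordinator port); the helper names loadstatus_url and fetch_json are hypothetical. Because simple, full, and computeUsingClusterView are bare flags with no value, the sketch joins them by hand instead of using urlencode.

```python
import json
import urllib.request

# Assumed Coordinator address; 8081 is Druid's default Coordinator port.
COORDINATOR = "http://localhost:8081"


def loadstatus_url(mode=None, compute_using_cluster_view=False, base=COORDINATOR):
    """Build a cluster-wide loadstatus URL.

    mode is None, "simple", or "full". These options are bare flags
    (?simple, ?full), so they are joined by hand rather than with urlencode.
    """
    if mode not in (None, "simple", "full"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if compute_using_cluster_view and mode != "full":
        # This page documents computeUsingClusterView only together with ?full.
        raise ValueError("computeUsingClusterView requires mode='full'")
    flags = [mode] if mode else []
    if compute_using_cluster_view:
        flags.append("computeUsingClusterView")
    url = f"{base}/druid/coordinator/v1/loadstatus"
    return url + ("?" + "&".join(flags) if flags else "")


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, fetch_json(loadstatus_url("full", compute_using_cluster_view=True)) requests the per-tier counts that also factor in the available nodes.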

GET /druid/coordinator/v1/loadqueue

Returns the IDs of segments to load and drop for each Historical process.

GET /druid/coordinator/v1/loadqueue?simple

Returns the number of segments to load and drop, as well as the total segment load and drop size in bytes for each Historical process.

GET /druid/coordinator/v1/loadqueue?full

Returns the serialized JSON of segments to load and drop for each Historical process.
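To get a cluster-wide total from loadqueue?simple, you can sum the per-Historical values. This sketch assumes the response is a JSON object keyed by Historical server, with each value an object of numeric counts and byte sizes. It deliberately does not hard-code field names, so it totals whatever numeric fields your Druid version returns. The Coordinator address and function names are hypothetical.

```python
import json
import urllib.request
from collections import Counter

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def total_load_queue(per_server):
    """Sum every numeric field across all servers in a loadqueue?simple response.

    Assumes the shape {serverName: {fieldName: number, ...}}.
    """
    totals = Counter()
    for stats in per_server.values():
        for field, value in stats.items():
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[field] += value
    return dict(totals)


def fetch_load_queue(base=COORDINATOR, timeout=10):
    """Fetch the per-Historical load queue summary."""
    url = f"{base}/druid/coordinator/v1/loadqueue?simple"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: total_load_queue(fetch_load_queue()) returns one dictionary of summed values for the whole cluster.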

Segment loading by datasource

Note that all interval query parameters are ISO 8601 strings, for example, 2016-06-27/2016-06-28. Also note that these APIs only guarantee that the segments are available at the time of the call. Segments can still go missing afterward because of Historical process failures or other reasons.

GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?forceMetadataRefresh={boolean}&interval={myInterval}

Returns the percentage of segments actually loaded in the cluster versus segments that should be loaded in the cluster for the given datasource over the given interval (or the last 2 weeks if interval is not given). The forceMetadataRefresh parameter is required.

  • Setting forceMetadataRefresh to true forces the Coordinator to poll the latest segment metadata from the metadata store. Note that forceMetadataRefresh=true refreshes the Coordinator's metadata cache for all datasources. This can put a heavy load on the metadata store, but it may be necessary to verify the load status of all the latest segments.
  • Setting forceMetadataRefresh to false uses the metadata cached on the Coordinator from the last forced or periodic refresh.

If no used segments are found for the given inputs, this API returns 204 No Content.

GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?simple&forceMetadataRefresh={boolean}&interval={myInterval}

Returns the number of segments left to load until segments that should be loaded in the cluster are available for the given datasource over the given interval (or the last 2 weeks if interval is not given). This does not include segment replication counts. The forceMetadataRefresh parameter is required.

  • Setting forceMetadataRefresh to true forces the Coordinator to poll the latest segment metadata from the metadata store. Note that forceMetadataRefresh=true refreshes the Coordinator's metadata cache for all datasources. This can put a heavy load on the metadata store, but it may be necessary to verify the load status of all the latest segments.
  • Setting forceMetadataRefresh to false uses the metadata cached on the Coordinator from the last forced or periodic refresh.

If no used segments are found for the given inputs, this API returns 204 No Content.

GET /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus?full&forceMetadataRefresh={boolean}&interval={myInterval}

Returns the number of segments left to load in each tier until segments that should be loaded in the cluster are all available for the given datasource over the given interval (or the last 2 weeks if interval is not given). This includes segment replication counts. The forceMetadataRefresh parameter is required.

  • Setting forceMetadataRefresh to true forces the Coordinator to poll the latest segment metadata from the metadata store. Note that forceMetadataRefresh=true refreshes the Coordinator's metadata cache for all datasources. This can put a heavy load on the metadata store, but it may be necessary to verify the load status of all the latest segments.
  • Setting forceMetadataRefresh to false uses the metadata cached on the Coordinator from the last forced or periodic refresh.

You can pass the optional query parameter computeUsingClusterView to factor in the available cluster services when calculating the segments left to load. See Coordinator Segment Loading for details. If no used segments are found for the given inputs, this API returns 204 No Content.
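A minimal Python sketch of calling the per-datasource loadstatus endpoint, assuming a Coordinator at http://localhost:8081 and hypothetical helper names. It always sends forceMetadataRefresh because the parameter is required, passes the interval in its ISO 8601 start/end form, and treats 204 No Content as "no used segments found" rather than as an error.

```python
import json
import urllib.parse
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def datasource_loadstatus_url(datasource, force_refresh, interval=None, mode=None,
                              compute_using_cluster_view=False, base=COORDINATOR):
    """Build /datasources/{name}/loadstatus with the documented parameters.

    interval is an ISO 8601 "start/end" string; when omitted, the
    Coordinator falls back to the last 2 weeks.
    """
    if mode not in (None, "simple", "full"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if compute_using_cluster_view and mode != "full":
        raise ValueError("computeUsingClusterView is documented only with mode='full'")
    flags = [mode] if mode else []  # bare flags such as ?simple or ?full
    if compute_using_cluster_view:
        flags.append("computeUsingClusterView")
    params = {"forceMetadataRefresh": "true" if force_refresh else "false"}
    if interval is not None:
        params["interval"] = interval  # urlencode percent-encodes the "/"
    query = "&".join(flags + [urllib.parse.urlencode(params)])
    name = urllib.parse.quote(datasource, safe="")
    return f"{base}/druid/coordinator/v1/datasources/{name}/loadstatus?{query}"


def fetch_datasource_loadstatus(url, timeout=10):
    """Return the decoded JSON body, or None on 204 No Content."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status == 204:
            return None  # no used segments matched the datasource and interval
        return json.loads(resp.read().decode("utf-8"))
```

For frequent polling, force_refresh=False is the lighter choice, since forceMetadataRefresh=true refreshes the Coordinator's cache for every datasource.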

Metadata store information

Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL sys.segments table.

GET /druid/coordinator/v1/metadata/segments

Returns a list of all segments for each datasource enabled in the cluster.

GET /druid/coordinator/v1/metadata/segments?datasources={dataSourceName1}&datasources={dataSourceName2}

Returns a list of all segments for one or more specific datasources enabled in the cluster.

GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus

Returns a list of all segments for each datasource with the full segment metadata and an extra field, overshadowed, that indicates whether each segment is overshadowed.

GET /druid/coordinator/v1/metadata/segments?includeOvershadowedStatus&datasources={dataSourceName1}&datasources={dataSourceName2}

Returns a list of all segments for one or more specific datasources with the full segment metadata and an extra field, overshadowed, that indicates whether each segment is overshadowed.
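The datasources parameter is repeated once per datasource rather than comma-separated. In Python, urlencode with doseq=True produces that form. This sketch (hypothetical helper names, Coordinator address assumed) builds the URL and keeps only the entries whose overshadowed field is false.

```python
import json
import urllib.parse
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def metadata_segments_url(datasources=(), include_overshadowed_status=False, base=COORDINATOR):
    """Build /metadata/segments, repeating datasources= once per name."""
    parts = ["includeOvershadowedStatus"] if include_overshadowed_status else []
    if datasources:
        # doseq=True expands a list into repeated key=value pairs
        parts.append(urllib.parse.urlencode({"datasources": list(datasources)}, doseq=True))
    url = f"{base}/druid/coordinator/v1/metadata/segments"
    return url + ("?" + "&".join(parts) if parts else "")


def visible_segments(segments):
    """Keep only entries whose overshadowed field is false (or absent)."""
    return [s for s in segments if not s.get("overshadowed", False)]


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: visible_segments(fetch_json(metadata_segments_url(["ds1", "ds2"], True))), where ds1 and ds2 stand in for your datasource names.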

GET /druid/coordinator/v1/metadata/datasources

Returns a list of the names of datasources with at least one used segment in the cluster, retrieved from the metadata database. Use this API to get the state that the system will eventually be in.

GET /druid/coordinator/v1/metadata/datasources?includeUnused

Returns a list of the names of datasources, regardless of whether there are used segments belonging to those datasources in the cluster or not.

GET /druid/coordinator/v1/metadata/datasources?includeDisabled

Returns a list of the names of datasources, regardless of whether the datasource is disabled or not.

GET /druid/coordinator/v1/metadata/datasources?full

Returns a list of all datasources with at least one used segment in the cluster. Returns all metadata about those datasources as stored in the metadata store.

GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}

Returns full metadata for a datasource as stored in the metadata store.

GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments

Returns a list of all segments for a datasource as stored in the metadata store.

GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full

Returns a list of all segments for a datasource with the full segment metadata as stored in the metadata store.

GET /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments/{segmentId}

Returns full segment metadata for a specific segment as stored in the metadata store, if the segment is used. If the segment is unused, or is unknown, a 404 response is returned.
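Because this endpoint signals an unused or unknown segment with a 404, a client can map that status to "not found" instead of treating it as a failure. A sketch assuming a Coordinator at http://localhost:8081; the helper names are hypothetical. Segment IDs include characters such as ":", so the sketch percent-encodes both path components to be safe.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def used_segment_url(datasource, segment_id, base=COORDINATOR):
    """Build the metadata URL for one segment, percent-encoding both path parts."""
    ds = urllib.parse.quote(datasource, safe="")
    seg = urllib.parse.quote(segment_id, safe="")
    return f"{base}/druid/coordinator/v1/metadata/datasources/{ds}/segments/{seg}"


def get_used_segment(datasource, segment_id, timeout=10):
    """Return segment metadata, or None if the segment is unused or unknown (HTTP 404)."""
    try:
        with urllib.request.urlopen(used_segment_url(datasource, segment_id), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are real failures
```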

POST /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments

Returns a list of all segments that overlap any of the given intervals for a datasource, as stored in the metadata store. The request body is a JSON array of ISO 8601 interval strings, like [interval1, interval2, ...], for example, ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"].

POST /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full

Returns a list of all segments that overlap any of the given intervals for a datasource, with the full segment metadata as stored in the metadata store. The request body is a JSON array of ISO 8601 interval strings, like [interval1, interval2, ...], for example, ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000", "2012-01-05T00:00:00.000/2012-01-07T00:00:00.000"].
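Because this endpoint reads the interval list from the request body, the sketch below sends it with POST, the method the Coordinator's MetadataResource declares for this path (verify against your Druid version). It assumes a Coordinator at http://localhost:8081, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def segments_for_intervals_request(datasource, intervals, full=False, base=COORDINATOR):
    """Build a POST request whose JSON body is a list of ISO 8601 interval strings."""
    ds = urllib.parse.quote(datasource, safe="")
    url = f"{base}/druid/coordinator/v1/metadata/datasources/{ds}/segments"
    if full:
        url += "?full"  # ask for the full segment metadata
    body = json.dumps(list(intervals)).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def fetch_segments_for_intervals(datasource, intervals, full=False, timeout=10):
    """Send the request and decode the JSON list of segments."""
    req = segments_for_intervals_request(datasource, intervals, full)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```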

Datasources

Note that all interval URL parameters are ISO 8601 strings delimited by a _ instead of a /—for example, 2016-06-27_2016-06-28.
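The _ delimiter applies to intervals embedded in the URL path. A small helper, with hypothetical names and an assumed Coordinator address, can convert an ISO 8601 start/end interval into that form before building a path such as /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}.

```python
import urllib.parse

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def to_url_interval(iso_interval):
    """Turn 'start/end' into 'start_end' for use in a URL path segment."""
    start, sep, end = iso_interval.partition("/")
    if not sep or not start or not end or "/" in end:
        raise ValueError(f"expected 'start/end', got {iso_interval!r}")
    return f"{start}_{end}"


def datasource_interval_url(datasource, iso_interval, mode=None, base=COORDINATOR):
    """Build /datasources/{name}/intervals/{interval}, optionally with ?simple or ?full."""
    if mode not in (None, "simple", "full"):
        raise ValueError(f"unsupported mode: {mode!r}")
    ds = urllib.parse.quote(datasource, safe="")
    url = f"{base}/druid/coordinator/v1/datasources/{ds}/intervals/{to_url_interval(iso_interval)}"
    return url + (f"?{mode}" if mode else "")
```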

GET /druid/coordinator/v1/datasources

Returns a list of datasource names found in the cluster as seen by the coordinator. This view is updated every druid.coordinator.period.

GET /druid/coordinator/v1/datasources?simple

Returns a list of JSON objects containing the name and properties of datasources found in the cluster. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.

GET /druid/coordinator/v1/datasources?full

Returns a list of datasource names found in the cluster with all metadata about those datasources.

GET /druid/coordinator/v1/datasources/{dataSourceName}

Returns a JSON object containing the name and properties of a datasource. Properties include segment count, total segment byte size, replicated total segment byte size, minTime, and maxTime.

GET /druid/coordinator/v1/datasources/{dataSourceName}?full

Returns full metadata for a datasource.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals

Returns a set of segment intervals.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals?simple

Returns a map of an interval to a JSON object containing the total byte size of segments and number of segments for that interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals?full

Returns a map of an interval to a map of segment metadata to a set of server names that contain the segment for that interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}

Returns a set of segment IDs for an interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?simple

Returns a map of segment intervals contained within the specified interval to a JSON object containing the total byte size of segments and number of segments for an interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}?full

Returns a map of segment intervals contained within the specified interval to a map of segment metadata to a set of server names that contain the segment for an interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}/serverview

Returns a map of segment intervals contained within the specified interval to information about the servers that contain the segment for an interval.

GET /druid/coordinator/v1/datasources/{dataSourceName}/segments

Returns a list of all segments for a datasource in the cluster.

GET /druid/coordinator/v1/datasources/{dataSourceName}/segments?full

Returns a list of all segments for a datasource in the cluster with the full segment metadata.

GET /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}

Returns full segment metadata for a specific segment in the cluster.

GET /druid/coordinator/v1/datasources/{dataSourceName}/tiers

Returns the tiers that a datasource exists in.

Intervals

Note that all interval URL parameters are ISO 8601 strings delimited by a _ instead of a / as in 2016-06-27_2016-06-28.

GET /druid/coordinator/v1/intervals

Returns all intervals for all datasources with total size and count.

GET /druid/coordinator/v1/intervals/{interval}

Returns the aggregated total size and count for all intervals that intersect the given ISO 8601 interval.

GET /druid/coordinator/v1/intervals/{interval}?simple

Returns the total size and count for each interval within the given ISO 8601 interval.

GET /druid/coordinator/v1/intervals/{interval}?full

Returns the total size and count for each datasource for each interval within the given ISO 8601 interval.

Server information

GET /druid/coordinator/v1/servers

Returns a list of server URLs in the format {hostname}:{port}. Note that processes that run with different types appear multiple times, with different ports.

GET /druid/coordinator/v1/servers?simple

Returns a list of server data objects in which each object has the following keys:

  • host: host URL, including the port ({hostname}:{port})
  • type: process type (indexer-executor, historical)
  • currSize: storage size currently used
  • maxSize: maximum storage size
  • priority
  • tier
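As an illustration of the servers?simple fields, the sketch below estimates remaining capacity per tier by summing maxSize - currSize over Historical servers. It assumes both sizes use the same unit (bytes) and a Coordinator at http://localhost:8081; the function names are hypothetical.

```python
import json
import urllib.request
from collections import defaultdict

COORDINATOR = "http://localhost:8081"  # assumed Coordinator address


def free_capacity_by_tier(servers, server_type="historical"):
    """Sum maxSize - currSize per tier for servers of the given type."""
    free = defaultdict(int)
    for server in servers:
        if server.get("type") == server_type:
            free[server["tier"]] += server["maxSize"] - server["currSize"]
    return dict(free)


def fetch_servers(base=COORDINATOR, timeout=10):
    """Fetch the servers?simple list from the Coordinator."""
    url = f"{base}/druid/coordinator/v1/servers?simple"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: free_capacity_by_tier(fetch_servers()) returns a mapping from tier name to unused storage. Filtering on type keeps indexer-executor entries out of the Historical capacity estimate.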

Query server

This section documents the API endpoints for the processes that reside on Query servers (Brokers) in the suggested three-server configuration.

Broker

Datasource information

Note that all interval URL parameters are ISO 8601 strings delimited by a _ instead of a / as in 2016-06-27_2016-06-28.

Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL INFORMATION_SCHEMA.TABLES, INFORMATION_SCHEMA.COLUMNS, and sys.segments tables.

GET /druid/v2/datasources

Returns a list of queryable datasources.

GET /druid/v2/datasources/{dataSourceName}

Returns the dimensions and metrics of the datasource. Optionally, you can provide the query parameter "full" to get a list of served intervals, with the dimensions and metrics served for each of those intervals. You can also provide the query parameter "interval" explicitly to refer to a particular interval.

If no interval is specified, Druid uses a default interval spanning a configurable period before the current time. The duration of this default interval is set in ISO 8601 duration format by druid.query.segmentMetadata.defaultHistory.

GET /druid/v2/datasources/{dataSourceName}/dimensions

This API is deprecated and will be removed in a future release. Use a SegmentMetadataQuery instead, which provides more comprehensive information and supports all datasource types, including streaming datasources. If you use SQL, you can also query the INFORMATION_SCHEMA tables.

Returns the dimensions of the datasource.

GET /druid/v2/datasources/{dataSourceName}/metrics

This API is deprecated and will be removed in a future release. Use a SegmentMetadataQuery instead, which provides more comprehensive information and supports all datasource types, including streaming datasources. If you use SQL, you can also query the INFORMATION_SCHEMA tables.

Returns the metrics of the datasource.

GET /druid/v2/datasources/{dataSourceName}/candidates?intervals={comma-separated-intervals}&numCandidates={numCandidates}

Returns lists of segment information, including server locations, for the given datasource and intervals. If numCandidates is not specified, the response includes all servers for each interval.


Copyright © 2022 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.