Databricks Plugin documentation

Report an issue

Back to destination list

Official

Databricks

Sync your data from any supported CloudQuery source into the Databricks Data Intelligence Platform.

Publisher

cloudquery

Latest version

v1.3.14

Type

Destination

Platforms

Date Published

Documentation Changelog

Overview Types Licenses

Overview #

Databricks destination plugin

This destination plugin lets you sync data from a CloudQuery source to Databricks.

Supported Databricks versions: >= 12

Configuration #

Example #

kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.3.14"
  write_mode: "append"
  spec:
    hostname: ${DATABRICKS_HOSTNAME} # optionally it can include protocol like https://abc.cloud.databricks.com
    http_path: ${DATABRICKS_HTTP_PATH} # HTTP path for SQL compute
    staging_path: ${DATABRICKS_STAGING_PATH} # Databricks FileStore or Unity volume path to store temporary files for staging
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    # Optional parameters
    # protocol: https
    # port: 443
    # catalog: ""
    # schema: "default"
    # migration_concurrency: 10
    # timeout: 1m
    # batch:
    #   size: 10000
    #   bytes: 5242880 # 5 MiB
    #   timeout: 20s

The (top level) spec section is described in the Destination Spec Reference.

Make sure you use environment variable expansion in production instead of committing the credentials to the configuration file directly.

Databricks spec #

This is the (nested) spec used by the Databricks destination plugin.

hostname (string) (required)
SQL compute hostname. May optionally include protocol value as well (like https://server.databricks.com).
http_path (string) (required)
SQL compute HTTP path.
staging_path (string) (required)
Unity volume path where temporary (staging) files should be uploaded to.
auth (Auth spec) (required)
Authentication options.
catalog (string) (required)
Catalog to be used.
protocol (string) (optional) (default: https)
Protocol for connecting to Databricks. Can be also specified in the hostname.
port (integer) (optional) (default: 443)
Port for connecting to Databricks.
schema (string) (optional) (default: cloudquery)
Schema to be used. If it doesn't exist, it will be created.
batch (Batching spec) (optional)
Batching options.
migration_concurrency (integer) (optional) (default: 10)
How many table operations will be performed in parallel during migration.
timeout (duration) (optional) (default: 1m (= 1 minute))
Timeout for the queries.

Databricks authentication spec

This section allows specifying authentication method to connect to Databricks. Currently only personal access tokens are supported.

access_token (string) (required)
Personal access token.

Batching spec

This section controls how data is batched for writing.

size (integer) (optional) (default: 10000)
Maximum number of items that may be grouped together to be written in a single write.
bytes (integer) (optional) (default: 5242880 (= 5 MiB))
Maximum size of items that may be grouped together to be written in a single write.
timeout (duration) (optional) (default: 1m (= 1 minute))
Maximum interval between batch writes.

Types #

Apache Arrow type conversion #

The Databricks destination plugin supports most of Apache Arrow types. The following table shows the supported types and how they are mapped to Databricks data types.

Unsupported types are always mapped to STRING.

Arrow Column Type	Databricks Type
Binary	BINARY
Binary View	BINARY
Boolean	BOOLEAN
Date32	DATE
Date64	DATE
Decimal128 (Decimal)	DECIMAL
Decimal256	DECIMAL
Fixed Size Binary	BINARY
Fixed Size List	ARRAY
Float16	FLOAT
Float32	FLOAT
Float64	DOUBLE
Int8	TINYINT
Int16	SMALLINT
Int32	INTEGER
Int64	BIGINT
Large Binary	BINARY
Large List	ARRAY
Large String	STRING
List	ARRAY
Null	VOID
Map	MAP
String	STRING
String View	STRING
Struct	STRUCT
Time32	TIMESTAMP
Time64	TIMESTAMP
Timestamp	TIMESTAMP
UUID (CloudQuery extension)	STRING
Uint8	SMALLINT
Uint16	INTEGER
Uint32	BIGINT
Uint64	BIGINT

Nested Apache Arrow types (lists, structs, map) have their element values converted according to the aforementioned rules.

Licenses #

The following tools / packages are used in this plugin:

Name	License
github.com/JohnCGriffin/overflow	MIT
github.com/adrg/xdg	MIT
github.com/andybalholm/brotli	MIT
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apache/thrift/lib/go/thrift	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/bahlo/generic-list-go	BSD-3-Clause
github.com/buger/jsonparser	MIT
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/coreos/go-oidc/v3/oidc	Apache-2.0
github.com/databricks/databricks-sql-go	Apache-2.0
github.com/davecgh/go-spew/spew	ISC
github.com/ghodss/yaml	MIT
github.com/go-jose/go-jose/v4	Apache-2.0
github.com/go-jose/go-jose/v4/json	BSD-3-Clause
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/golang/snappy	BSD-3-Clause
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/uuid	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/huandu/xstrings	MIT
github.com/invopop/jsonschema	MIT
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/klauspost/cpuid/v2	MIT
github.com/mailru/easyjson	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pkg/browser	BSD-2-Clause
github.com/pkg/errors	BSD-2-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/shopspring/decimal	MIT
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/wk8/go-ordered-map/v2	Apache-2.0
github.com/zeebo/xxh3	BSD-2-Clause
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/crypto/pbkdf2	BSD-3-Clause
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/oauth2	BSD-3-Clause
golang.org/x/sync/errgroup	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/genproto/googleapis/api/httpbody	Apache-2.0
google.golang.org/genproto/googleapis/rpc/status	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT