Databricks
This plugin is in preview.
Sync your data from any supported CloudQuery source into the Databricks Data Intelligence Platform.
Price: Free
Databricks destination plugin
This destination plugin lets you sync data from a CloudQuery source to Databricks.
Supported Databricks versions: >= 12
Configuration
Example
```yaml
kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.0.3"
  write_mode: "append"
  spec:
    hostname: ${DATABRICKS_HOSTNAME} # optionally it can include protocol like https://abc.cloud.databricks.com
    http_path: ${DATABRICKS_HTTP_PATH} # HTTP path for SQL compute
    staging_path: ${DATABRICKS_STAGING_PATH} # Databricks FileStore or Unity volume path to store temporary files for staging
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    # Optional parameters
    # protocol: https
    # port: 443
    # catalog: ""
    # schema: "default"
    # migration_concurrency: 10
    # timeout: 1m
    # batch:
    #   size: 10000
    #   bytes: 5242880 # 5 MiB
    #   timeout: 20s
```
The (top level) spec section is described in the Destination Spec Reference.
Databricks spec
This is the (nested) spec used by the Databricks destination plugin.
- `hostname` (string) (required): SQL compute hostname. May optionally include the protocol value as well (like `https://server.databricks.com`).
- `http_path` (string) (required): SQL compute HTTP path.
- `staging_path` (string) (required): Unity volume path where temporary (staging) files should be uploaded.
- `auth` (Auth spec) (required): Authentication options.
- `protocol` (string) (optional) (default: `https`): Protocol for connecting to Databricks. Can also be specified in the `hostname`.
- `port` (integer) (optional) (default: `443`): Port for connecting to Databricks.
- `catalog` (string) (optional) (default: not used): Catalog to be used.
- `schema` (string) (optional) (default: `default`): Schema to be used.
- `batch` (Batching spec) (optional): Batching options.
- `migration_concurrency` (integer) (optional) (default: `10`): Number of table operations performed in parallel during migration.
- `timeout` (duration) (optional) (default: `1m`, i.e. 1 minute): Timeout for the queries.
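As a sketch of how the optional settings above combine, the nested spec below targets a specific Unity Catalog location. The catalog, schema, and volume names are hypothetical, and the tuned values are illustrative, not recommendations:

```yaml
# Nested (plugin-level) spec only; goes under the top-level spec.spec key
hostname: ${DATABRICKS_HOSTNAME}
http_path: ${DATABRICKS_HTTP_PATH}
staging_path: /Volumes/analytics/raw/cq_staging # hypothetical Unity volume
auth:
  access_token: ${DATABRICKS_ACCESS_TOKEN}
catalog: analytics        # write into this catalog instead of the workspace default
schema: cloudquery        # instead of the "default" schema
migration_concurrency: 5  # fewer parallel table operations than the default of 10
timeout: 2m               # allow slower queries than the default of 1m
```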
Databricks authentication spec
This section specifies the authentication method used to connect to Databricks.
Currently, only personal access tokens are supported.
- `access_token` (string) (required): Personal access token.
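The configuration example above reads its values from environment variables. A minimal setup might look like the following; all values shown are hypothetical placeholders, and a real personal access token must be generated in your Databricks workspace:

```shell
# Hypothetical values -- replace with your workspace's details and a real PAT
export DATABRICKS_HOSTNAME="abc.cloud.databricks.com"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/0123456789abcdef"
export DATABRICKS_STAGING_PATH="/Volumes/main/default/cq_staging"
export DATABRICKS_ACCESS_TOKEN="dapi0123456789abcdef"
```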
Batching spec
This section controls how data is batched for writing.
- `size` (integer) (optional) (default: `10000`): Maximum number of items that may be grouped together in a single write.
- `bytes` (integer) (optional) (default: `5242880`, i.e. 5 MiB): Maximum byte size of items that may be grouped together in a single write.
- `timeout` (duration) (optional) (default: `1m`, i.e. 1 minute): Maximum interval between batch writes.
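A `batch` section tuned for larger, less frequent writes might look like the sketch below; the numbers are illustrative, not recommendations:

```yaml
batch:
  size: 50000       # flush after 50,000 records...
  bytes: 10485760   # ...or after 10 MiB...
  timeout: 30s      # ...or after 30 seconds, whichever comes first
```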
Types
Apache Arrow type conversion
The Databricks destination plugin supports most Apache Arrow types.
The following table shows the supported types and how they are mapped
to Databricks data types.
| Arrow Column Type | Databricks Type |
|---|---|
| Binary | BINARY |
| Binary View | BINARY |
| Boolean | BOOLEAN |
| Date32 | DATE |
| Date64 | DATE |
| Decimal128 (Decimal) | DECIMAL |
| Decimal256 | DECIMAL |
| Fixed Size Binary | BINARY |
| Fixed Size List | ARRAY |
| Float16 | FLOAT |
| Float32 | FLOAT |
| Float64 | DOUBLE |
| Int8 | TINYINT |
| Int16 | SMALLINT |
| Int32 | INTEGER |
| Int64 | BIGINT |
| Large Binary | BINARY |
| Large List | ARRAY |
| Large String | STRING |
| List | ARRAY |
| Null | VOID |
| Map | MAP |
| String | STRING |
| String View | STRING |
| Struct | STRUCT |
| Time32 | TIMESTAMP |
| Time64 | TIMESTAMP |
| Timestamp | TIMESTAMP |
| UUID (CloudQuery extension) | STRING |
| Uint8 | SMALLINT |
| Uint16 | INTEGER |
| Uint32 | BIGINT |
| Uint64 | BIGINT |