Iceberg Catalog

RFC PR: datafuselabs/databend#8215
Tracking Issue: datafuselabs/databend#8216

Summary

Iceberg is a widely supported table format among data lake-houses. This RFC describes how the iceberg external catalog will behave and how we will implement it.

Motivation

The Apache Iceberg is a table format widely used in data lake-houses, it offers a more complete logical view of databases and has much higher performance of accessing as external tables, since users don't need to know about partitions and less indexing through files when accessing is required comparing to Hive.

The splitted and well-defined structure of Iceberg also makes concurrent accesssing and version managementing on data sources safer, saver and convienient.

stack of Apache Iceberg

Supporting Iceberg will empower databend as a open data lake, giving it more OLAP abilities.

Guide-level explanation

Create Iceberg Catalog

To use Iceberg catalog, users need to have an iceberg storage with neccessary permission established. Then they can create a catalog on it.

CREATE CATALOG my_iceberg
  TYPE="iceberg"
  URL='s3://path/to/iceberg'
  CONNECTION=(
    ACCESS_KEY_ID=...
    SECRET_ACCESS_KEY=...
    ...
  )

Accessing Iceberg Table's content

SELECT * FROM my_iceberg.iceberg_db.iceberg_tbl;

Joint query on normal table and Iceberg Table:

SELECT normal_tbl.book_name, my_iceberg.iceberg_db.iceberg_tbl.author FROM normal_tbl, iceberg_tbl WHERE normal_tbl.isbn = my_iceberg.iceberg_db.iceberg_tbl.isbn AND iceberg_tbl.sales > 100000;

On operating the table, all data remains still on the user-provided ends.

Time Travel

Iceberg offers a list of snapshots and its timestamps. Time travelling on it is natrually.

SELECT ...
FROM <iceberg_catalog>.<database_name>.<table_name>
AT ( { SNAPSHOT => <snapshot_id> | TIMESTAMP => <timestamp> } );

Accessing Snapshot Metadata

Users should be able to lookup the list of snapshot id in catalog:

SELECT snapshot_id FROM ICEBERG_SNAPSHOT(my_iceberg.iceberg_db.iceberg_tbl);

     snapshot_id
---------------------
 1234567890123456789
 1234567890123456790
(2 rows)

Current snapshot id the external table is reading will always be the newest. But users can using time travel with AT.

Reference-level explanation

A new catalog type ICEBERG, and table engine for reading data from Iceberg storage.

Table Engine

The table engine enables users reading data from established Apache Iceberg endpoints. All table content and metadata of external table should remain in user-provided Iceberg data sources, in Iceberg's manner.

The engine will track the last commited snapshot, and should able to read from former snapshots.

external-location

No matter where the Iceberg is, like S3, GCS or OSS, if Databend support the storage, then the Iceberg should be accessable. Users need to tell databend where the Iceberg storage is and how should databend access the storage.

Type convention

Iceberg	Note	Databend
`boolean`	True or false	`BOOLEAN`
`int`	32 bit signed integer	`INT32`
`long`	64 bit signed integer	`INT64`
`float`	32 bit IEEE 754 float point number	`FLOAT`
`double`	64 bit IEEE 754 float point number	`DOUBLE`
`decimal(P, S)`	Fixed point decimal; precision P, scale S	not supported
`date`	Calendar date without timezone or time	`DATE`
`time`	Timestamp without date, timezone	`TIMESTAMP`, convert date to today
`timestamp`	Timestamp without timezone	`TIMESTAMP`, convert timezone to GMT
`timestamptz`	Timestamp with timezone	`TIMESTAMP`
`string`	UTF-8 string	`VARCHAR`
`uuid`	16-byte fixed byte array, universally unique identifiers	`VARCHAR`
`fixed(L)`	Fixed-length byte array of length L	`VARCHAR`
`binary`	Arbitrary-length byte array	`VARCHAR`
`struct`	a tuple of typed values	`OBJECT`
`list`	a collection of values with some element type	`ARRAY`
`map`		`OBJECT`

Drawbacks

None

Rationale and alternatives

Iceberg External Table

Creating an external table from Iceberg storage:

CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name
[(
    <column_name> <data_type> [ NOT NULL | NULL] [ { DEFAULT <expr> }],
    <column_name> <data_type> [ NOT NULL | NULL] [ { DEFAULT <expr> }],
    ...
)] ENGINE=ICEBERG
ENGINE_OPTIONS=(
  DATABASE='db0'
  TABLE='tbl0'
  LOCATION=<external-location>
)

A new table engine ICEBERG will be introduced, and all data will remain still in the Iceberg storage. The external table also should support users to query its snapshot data and time travelling.

SELECT snapshot_id FROM ICEBERG_SNAPSHOT('<db_name>', '<external_table_name');

 snapshot_id
--------------
73556087355608

After discussion, the catalog way above is chosen, for its more complete support to Iceberg features, and it is more aligned to the current design of Hive catalog.

Prior art

None

Unresolved questions

None

Future possibilities

Create Table from Iceberg Snapshots

Iceberg provides a snapshot capability and tables can be created from them.

CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
    <column_name> <data_type> [ NOT NULL | NULL] [ { DEFAULT <expr> }],
    <column_name> <data_type> [ NOT NULL | NULL] [ { DEFAULT <expr> }],
    ...
) ENGINE = `ICEBERG`
ENGINE_OPTIONS = (
URL = 's3://path/to/iceberg'
DATABASE = <iceberg_db>
TABLE = <iceberg_tbl>
[ SNAPSHOT = { SNAPSHOT_ID => <snapshot_id> | TIMESTAMP => <timestamp> } ]
)

Schema Evolution

The default Iceberg snapshot use in external table should always be the newest commited one:

SELECT snapshot_id from ICEBERG_SNAPSHOT(iceberg_catalog.iceberg_db.iceberg_tbl);

  snapshot_id
---------------
 0000000000001

Supporting schema evolution making databend able to modify contents of Iceberg. For example, inserting a new record into Iceberg Table:

INSERT INTO iceberg_catalog.iceberg_db.iceberg_tbl VALUES ('datafuselabs', 'How To Be a Healthy DBA', '2022-10-14');

This will create a new snapshot in Iceberg Storage:

SELECT snapshot_id from ICEBERG_SNAPSHOT(iceberg_catalog.iceberg_db.iceberg_tbl);

  snapshot_id
---------------
 0000000000001
 0000000000002

Summary​

Motivation​

Guide-level explanation​

Create Iceberg Catalog​

Accessing Iceberg Table's content​

Time Travel​

Accessing Snapshot Metadata​

Reference-level explanation​

Table Engine​

external-location​

Type convention​

Drawbacks​

Rationale and alternatives​

Iceberg External Table​

Prior art​

Unresolved questions​

Future possibilities​

Create Table from Iceberg Snapshots​

Schema Evolution​

Summary

Motivation

Guide-level explanation

Create Iceberg Catalog

Accessing Iceberg Table's content

Time Travel

Accessing Snapshot Metadata

Reference-level explanation

Table Engine

external-location

Type convention

Drawbacks

Rationale and alternatives

Iceberg External Table

Prior art

Unresolved questions

Future possibilities

Create Table from Iceberg Snapshots

Schema Evolution