Testing Out Time Travel With DuckLake
I had the opportunity recently to look into Ducklake, an open source metadata catalog extension, that integrates with DuckDB.
The key benefit of a database catalog is to easily track schema and data changes to specific tables.
While Ducklake is still early in development, it shows some potential!
I will show you how to get started with Ducklake and familiarize you with a feature I frequently use in Snowflake: time travel.
Database time travel is very useful for auditing or debugging purposes. Have a complicated procedure updating a table and want to compare versions from the last commit? This feature is for you.
First, install DuckDB if you haven’t already.
curl https://install.duckdb.org | shThen, create a folder in a directory of your choosing.
Within that folder, git clone the repo.
Next, find the setup file, which issues the following commands:
installs the ducklake extension
attaches the ducklake
creates the my_ducklake.lake schema
creates a table from a csv hosted by motherduck
returns the top 20 rows from the table you created
Within the folder you created, issue this command, which runs the sql file.
duckdb -f setup.sqlAfter running the command, you should see a similar result, displaying the top 20 rows of the table you created.
If you navigate to the folder duckdbtest/ducklake_data/lake/who_ambient_air_quality_2024, you should see one parquet file, which is the initial csv load from the setup sql file. The purpose of the parquet is to capture each change in the table.
Next, we will insert data via this command, using the inserts.sql file.
duckdb -f inserts.sqlIf you navigate back to the who_ambient_air_quality_2024 folder, you should now see another parquet file. This means our insert statement was successful!
Because of these parquet files, we can now query different versions of the table, before and after the insert statement.
Navigate to the root directory of your folder. Then, execute this command, which runs the queries.sql file. As you can see, there are different record counts between each version of the table. Success!
This is accomplished via this DuckDB SQL syntax, allowing you to specify the version number.
SELECT count(*) as count, ‘3’ as version
FROM my_ducklake.lake.who_ambient_air_quality_2024
AT (VERSION => 3)Time travel is an interesting feature and I am glad to see it implemented in open source tools like DuckDB!
I encourage you to explore more of what DuckLake has to offer here.





