Streamlit + DuckDB Tutorial

Streamlit + DuckDB Tutorial

Streamlit + DuckDB Tutorial

The beautiful thing about learning is that nobody can take it away from you. B.B King

Motivation

In this era of Everything Web, we are basically compelled to build web interfaces for our various application to ease the access, hence, bring value.

As a Python developer , we have a lot modules & framework that helps us build web interfaces.

This is where Streamlit comes in, with it’s pythonic way of creating and web applications, it's the go to solution to build fast and with great ease web application.

We’ll use duckDB for it’s built-in OLAP capabilities and easy usage (the databases is a file, no complex infrastructure needed).

In this tutorial, you’ll learn how to build web apps using Streamlit and using duckDB as your database.

Table of content

What is Streamlit ?

Streamlit turns data scripts into shareable web apps in minutes.

All in pure Python. No front‑end experience required.

With Streamlit, we can easily create using python ONLY dynamic web application. It’s ideal for data apps as it natively has built-in function to present data in different ways (tables, charts, etc.).

You can read more here

What is duckDB ?

DuckDB is an in-process SQL OLAP database management system.

It is serverless comes with great features like (Direct Parquet & CSV querying, Vectorized Engine, Parallel query Processing). It is an alternative to SQLite purposely built for Analytics workload.

You can read more here.

Code

The full code can be access below.

https://github.com/mikekenneth/streamlit_duckdb

Data Generator

The below generates a 1000 rows data and loads to orders table in duck.db when the button is clicked:

from generate_data import generate_dataset_orders, load_file

button = st.button(label="Generate or Refresh Data")
if button:
    db = "duck.db"
    destination_table_name = "orders"
    filename = "orders.csv"
    generate_dataset_orders(filename=filename, num_rows=1000)
    load_file(db=db, infile_path=filename, table_name=destination_table_name)

Screenshot

Generate Data

I wrote a little helper to easily generate the orders data and load into duckDB. The code can be accessed here.

Extract data from DuckDB

I created a function execute_query to easily get the data using only the desired SQL query. It also allows specifying the type of output needed. You can read more in the official documentation.

def execute_query(query: str, return_type: str = "df"):
    with duckdb.connect("duck.db", read_only=True) as con:
        if return_type == "df":
            return con.execute(query).df()
        elif return_type == "arrow":
            return con.execute(query).arrow()
        elif return_type == "list":
            return con.execute(query).fetchall()

destination_table_name = "orders"
data = execute_query(f"select * from {destination_table_name}", return_type="df")

Display a sample of the data

With Streamlit we can easily display a Dataframe like a table. You can read more here.

st.write("## Sample")
st.dataframe(data.head(10), height=300)

Screenshot

Display Sample Data

Visualization

Streamlit allow us to easily display charts. We will use it to display some bar charts. You can read more here.

st.write("## Visualization")
option = st.selectbox("Select a dimension", ["product_name", "customer_name", "status"], key="option")
if option:
    st.write(f"### Bar Chart: {option} x Quantity")
    st.bar_chart(data, x=option, y="quantity")

    st.write(f"### Bar Chart: {option} x Amount")
    st.bar_chart(data, x=option, y="amount")

    st.write(f"### Bar Chart: {option} x Count")
    st.bar_chart(data[option].value_counts())

Screenshot

Visualization

Filtering Data

We can also filter out data directly from the web using the below. You can read more here.

st.write("## Filters (by Products Name)")
products_list = [
    row[0]
    for row in execute_query(
        f"select distinct(product_name) from {destination_table_name}", db=db, return_type="list"
    )
]
product_filter = st.selectbox(label="Select a Product", options=products_list, key="product_filter")
if product_filter != "--":
    result = execute_query(
        f"select * from {destination_table_name} where product_name = '{product_filter}'",
        db=db,
        return_type="df",
    )
    st.dataframe(result, height=400)

Screenshot

Filtering Data

Download / Exporting Data

We can export data to csv using the Streamlit st.download_button function.

@st.experimental_memo  # An optimization wrapper to memoize the result of the function
def export_df(df):
    return df.to_csv(index=False).encode("utf-8")

st.write("### Download Data")
st.download_button(
    label="Press to Download",
    data=export_df(result),
    file_name=f"orders - product='{product_filter}'.csv",
    mime="text/csv",
    key="download-csv",
)

Screenshot

Download / Export Data

That is a wrap. I hope this helps you. 🙂

About Me

I am a Data Engineer with 2+ years of experience and more years as a Software Engineer (5+ years). I enjoy learning and teaching (mostly learning 😎).

You can get in touch with me by email, Twitter & LinkedIn.