Building a dashboard in Python using Streamlit

You may be familiar with the phrase “A picture is worth a thousand words.” In data science, a well-chosen plot delivers similar value: it offers a different view of tabular data, whether as a simple line chart, a histogram, or a more elaborate pivot chart.

As useful as these can be, a typical chart that we see in print or on the web is most likely static. Imagine how much more engaging it would be to manipulate the variables behind these charts in an interactive dashboard.

🏂

Ready to jump right in? Here’s the dashboard app and GitHub repo.

In this blog, you’ll learn how to build a Population Dashboard app that displays data and visualizations of the US population for the years 2010-2019 as obtained from the US Census Bureau.

I'll guide you through the process of building this interactive dashboard app from scratch using Streamlit for the frontend. On the backend, Pandas handles the data wrangling, while Altair and Plotly Express produce the charts.

You’ll learn how to:

Define key metrics
Perform EDA analysis
Build the dashboard app with Streamlit

What’s inside the dashboard?

Here’s a visual breakdown of the components that make up this population dashboard:

Let’s get started!

1. Define key metrics

Before we dive into actually building the dashboard, we need to first come up with well-defined metrics to measure what matters.

1.1 Overview of key metrics

The goal of any dashboard is to surface insights that provide the basis for data-driven decision making. What is the primary purpose of the dashboard? This will guide the subsequent questions you want the dashboard to answer in the form of key metrics.

For example:

In sales, the primary goal may be to understand “How are sales teams performing?” Metrics may include total revenue by sales rep, units sold by territory, or new leads generated over time.
In marketing, the primary goal may be to understand “How is my campaign performing?” Metrics may include leading indicators such as response rate or click-through rate, and lagging indicators such as revenue conversion rate or customer acquisition cost.
In finance, the dashboard may need to answer “How profitable is our business?” Metrics might include gross profit, operating margin, and return on assets.

1.2 Key metrics selected for this app

The primary question this population dashboard aims to answer is: how do US state populations change over time?

Which questions will help us reach this goal?

How do total populations compare among different states?
How do state populations evolve over time and how do they compare to each other?
In a given year, which states experienced more than 50,000 people moving in, or out? We'll label these inbound and outbound migration metrics.
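
As a rough sketch of that labeling rule, the snippet below classifies a year-over-year population change against the 50,000 threshold. The function name `classify_change` and the label strings are hypothetical and exist only for illustration. As in the rest of this post, a net population change serves as a stand-in for migration.

```python
THRESHOLD = 50_000  # people per year, taken from the metric definition above


def classify_change(delta, threshold=THRESHOLD):
    """Label a year-over-year population change (hypothetical helper)."""
    if delta > threshold:
        return "inbound"
    if delta < -threshold:
        return "outbound"
    return "below threshold"


# Toy deltas for illustration only, not Census figures
for delta in (120_000, -75_000, 10_000):
    print(delta, classify_change(delta))
```

A change of exactly 50,000 in either direction falls into the "below threshold" bucket here, because the metric asks for more than 50,000.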

2. Perform EDA analysis

Once we have our key metrics, we need to collect the available data and gain a solid understanding of it before we can present it in a visually appealing way in our dashboard.

Exploratory data analysis (EDA) can be defined as an iterative process for understanding data by asking and answering questions through investigative analysis. In essence, your dashboard starts out as a blank canvas, and EDA provides a pragmatic approach for coming up with compelling data visuals that tell a story.

John Tukey's seminal 1977 book, Exploratory Data Analysis, set the stage for effective data communication. Here are some key takeaways:

“The greatest value of a picture is when it forces us to notice what we never expected to see.” In fact, Tukey introduced the box-and-whisker plot (aka the box plot).
Approach data with a flexible and open mindset; this is the “exploratory” part of EDA.

2.1 What data is available to you?

Here’s a sample of the dataset from the US Census Bureau we’re using for our population dashboard. There are 3 potential variables (states, year and population) that will serve as the basis for our metrics.

states, states_code, id, year, population
Alabama, AL, 1, 2010, 4785437
Alaska, AK, 2, 2010, 713910
Arizona, AZ, 4, 2010, 6407172
Arkansas, AR, 5, 2010, 2921964
California, CA, 6, 2010, 37319502

2.2 Prepare the data

Consolidate the year columns into a single unified column.

With one row per state and year, the data is easy to subset by year, and it is in the format needed for generating visualizations (e.g. a heatmap or choropleth map) and a sortable dataframe.
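
One way to do this consolidation is with `pd.melt()`. The sketch below assumes the raw table is in wide format with one column per year; the toy state names and values are made up for illustration and are not Census figures.

```python
import pandas as pd

# Toy wide-format table: one column per year (made-up values, not Census data)
df_wide = pd.DataFrame({
    "states": ["State A", "State B"],
    "states_code": ["SA", "SB"],
    "id": [1, 2],
    "2010": [100, 200],
    "2011": [110, 190],
})

# Melt the year columns into a single 'year' column with a matching 'population' column
df_long = pd.melt(
    df_wide,
    id_vars=["states", "states_code", "id"],
    var_name="year",
    value_name="population",
)

# The year labels arrive as strings (column names), so cast them to integers
df_long["year"] = df_long["year"].astype(int)

print(df_long)
```

The result has one row per state and year, which matches the column layout of the sample shown in section 2.1.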

2.3 Select charts to best visualize our key metrics

Now that we have a better understanding of the data at our fingertips and the key metrics to measure, it’s time to decide how to visualize the results on our dashboard. Here’s what was selected for our population dashboard app.

What is the comparison of total populations among different states?
- A choropleth map adds a geospatial dimension to highlight the most and least populated states.
How do populations of different states evolve over time and how do they compare to each other?
- A heatmap offers a comprehensive overview of the states with the highest and lowest population values across different years.
- Sorting the dataframe provides a quick and direct comparison of the most and least populated states, thereby eliminating the need to wander through different sections of the charts.
In a given year, what percentage of states experience inbound/outbound migration of more than 50,000 people?
- A donut chart is a pie chart with a hollow center; we’re using it to visualize the percentage of states with inbound and outbound migration.

There are countless ways to visualize your datasets!

You can discover even more visualization options in the growing collection of custom components built by the community. Here are a few that you can try out:

streamlit-extras offers a wide range of widgets that extend the native functionality of Streamlit.
streamlit-shadcn-ui provides several UI frontend components (modal, hovercard, badges, etc.) that can be incorporated into the dashboard app.
streamlit-elements allows the creation of draggable and resizable dashboard components.

3. Build your dashboard with Streamlit

3.1 Import libraries

First, we’ll start by importing the prerequisite libraries:

Streamlit - a low-code web framework
Pandas - a data analysis and wrangling tool
Altair - a data visualization library
Plotly Express - a terse and high-level API for creating figures

import streamlit as st
import pandas as pd
import altair as alt
import plotly.express as px

3.2 Page configuration

Next, we’ll define settings for the app by giving it a page title and icon that are displayed on the browser. This also defines the page content to be displayed in a wide layout that fits the page’s width as well as showing the sidebar in the expanded state.

Here, we also set the Altair color theme to dark so that the plots match the dark color theme of the app.

st.set_page_config(
    page_title="US Population Dashboard",
    page_icon="🏂",
    layout="wide",
    initial_sidebar_state="expanded")

alt.themes.enable("dark")

3.3 Load data

Next, we’ll load data into the app using Pandas’ read_csv() function as follows:

df_reshaped = pd.read_csv('data/us-population-2010-2019-reshaped.csv')

3.4 Add a sidebar

We’re now going to create the app title via st.title() and add drop-down widgets that let users select a specific year and color theme via st.selectbox(). All of this is placed in the sidebar using st.sidebar.

The selected_year (one of the available years, 2010-2019) is then used to subset the data for that year, which is displayed in-app.

The selected_color_theme determines the color scheme applied to the choropleth map and heatmap.

with st.sidebar:
    st.title('🏂 US Population Dashboard')
    
    year_list = list(df_reshaped.year.unique())[::-1]
    
    selected_year = st.selectbox('Select a year', year_list, index=len(year_list)-1)
    df_selected_year = df_reshaped[df_reshaped.year == selected_year]
    df_selected_year_sorted = df_selected_year.sort_values(by="population", ascending=False)

    color_theme_list = ['blues', 'cividis', 'greens', 'inferno', 'magma', 'plasma', 'reds', 'rainbow', 'turbo', 'viridis']
    selected_color_theme = st.selectbox('Select a color theme', color_theme_list)
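
One detail worth checking in the sidebar code is which year ends up preselected. The sketch below replays the list handling on its own, assuming the CSV lists years in ascending order from 2010 to 2019 (the variable `years_in_file` is a stand-in for `df_reshaped.year.unique()`).

```python
# Assumption: the CSV lists years in ascending order, 2010 through 2019
years_in_file = list(range(2010, 2020))

# Same reversal as in the sidebar code: newest year first
year_list = years_in_file[::-1]

# index=len(year_list)-1 points at the last entry of the reversed list
default_index = len(year_list) - 1
print(year_list[default_index])  # 2010

# index=0 would preselect the newest year instead
print(year_list[0])  # 2019
```

Under that assumption the drop-down opens on 2010, the one year with no previous year to compare against. If you would rather start on the latest year, passing `index=0` (or omitting `index`) is one option.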

3.5 Plot and chart types

Next, we’re going to define custom functions to create the various plots displayed in the dashboard.

Heatmap

A heatmap will allow us to see the population growth over the years from 2010-2019 for the 52 entries in the dataset (the 50 US states plus two additional state-level areas).

def make_heatmap(input_df, input_y, input_x, input_color, input_color_theme):
    # One rectangle per (x, y) cell, colored by the maximum value of input_color
    heatmap = alt.Chart(input_df).mark_rect().encode(
        y=alt.Y(f'{input_y}:O', axis=alt.Axis(title="Year", titleFontSize=18, titlePadding=15, titleFontWeight=900, labelAngle=0)),
        x=alt.X(f'{input_x}:O', axis=alt.Axis(title="", titleFontSize=18, titlePadding=15, titleFontWeight=900)),
        color=alt.Color(f'max({input_color}):Q',
                        legend=None,
                        scale=alt.Scale(scheme=input_color_theme)),
        stroke=alt.value('black'),
        strokeWidth=alt.value(0.25),
    ).properties(
        width=900
    ).configure_axis(
        labelFontSize=12,
        titleFontSize=12
    )
    return heatmap
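
To get a feel for what the heatmap encodes, it can help to look at the same grid as a plain table. The sketch below uses `pivot_table()` with `aggfunc="max"` to mirror the `max(...)` aggregation in the color encoding; the toy data is made up and is not Census data.

```python
import pandas as pd

# Toy long-format data (made-up values, not Census figures)
df_long = pd.DataFrame({
    "states": ["State A", "State B", "State A", "State B"],
    "year": [2010, 2010, 2011, 2011],
    "population": [100, 200, 110, 190],
})

# One row per year, one column per state, max population in each cell
grid = df_long.pivot_table(
    index="year", columns="states", values="population", aggfunc="max"
)
print(grid)
```

Each cell of this table corresponds to one rectangle in the heatmap, with the cell value driving the color.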

Choropleth map

Next, the choropleth map shows a colored map of the US for the selected year, with each state shaded by its population.

def make_choropleth(input_df, input_id, input_column, input_color_theme):
    choropleth = px.choropleth(input_df, locations=input_id, color=input_column, locationmode="USA-states",
                               color_continuous_scale=input_color_theme,
                               range_color=(0, input_df[input_column].max()),  # use the function's own input rather than a global
                               scope="usa",
                               labels={'population':'Population'}
                              )
    choropleth.update_layout(
        template='plotly_dark',
        plot_bgcolor='rgba(0, 0, 0, 0)',
        paper_bgcolor='rgba(0, 0, 0, 0)',
        margin=dict(l=0, r=0, t=0, b=0),
        height=350
    )
    return choropleth

Donut chart

Next, we’re going to create a donut chart that shows state migration as a percentage.

Specifically, it represents the percentage of states with annual inbound or outbound migration of more than 50,000 people. For instance, in 2019, this applied to 12 of the 52 states in the dataset, which corresponds to 23%.
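
The arithmetic behind that 23% can be checked in a couple of lines. The helper name `share_of_states` is hypothetical; it simply rounds the ratio to a whole-number percentage.

```python
def share_of_states(count, total):
    """Return the whole-number percentage of states (hypothetical helper)."""
    return round(count / total * 100)


print(share_of_states(12, 52))  # 23
```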

Before creating the donut chart, we’ll need to calculate the year-over-year population change for each state.

def calculate_population_difference(input_df, input_year):
  selected_year_data = input_df[input_df['year'] == input_year].reset_index()
  previous_year_data = input_df[input_df['year'] == input_year - 1].reset_index()
  selected_year_data['population_difference'] = selected_year_data.population.sub(previous_year_data.population, fill_value=0)
  return pd.concat([selected_year_data.states, selected_year_data.id, selected_year_data.population, selected_year_data.population_difference], axis=1).sort_values(by="population_difference", ascending=False)
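
To see what this function returns, here is a small run on toy data. The function is restated (line-wrapped) from above so the snippet runs on its own; the data values are made up and are not Census figures. Note that the subtraction lines rows up by position, so this sketch assumes both years list the states in the same order.

```python
import pandas as pd


def calculate_population_difference(input_df, input_year):
    # Restated from above so this snippet is self-contained
    selected_year_data = input_df[input_df['year'] == input_year].reset_index()
    previous_year_data = input_df[input_df['year'] == input_year - 1].reset_index()
    selected_year_data['population_difference'] = selected_year_data.population.sub(
        previous_year_data.population, fill_value=0)
    return pd.concat(
        [selected_year_data.states, selected_year_data.id,
         selected_year_data.population, selected_year_data.population_difference],
        axis=1).sort_values(by="population_difference", ascending=False)


# Toy data (made-up values, not Census figures)
df_toy = pd.DataFrame({
    "states": ["State A", "State B", "State A", "State B"],
    "id": [1, 2, 1, 2],
    "year": [2010, 2010, 2011, 2011],
    "population": [100, 200, 110, 190],
})

result = calculate_population_difference(df_toy, 2011)
print(result)
```

State A gained 10 and State B lost 10, so State A is listed first. For the first year in the data there is no previous year, so `fill_value=0` makes the difference equal to the population itself, which is why the layout code later only shows deltas when `selected_year > 2010`.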

The donut chart is then created from the aforementioned percentage value for states migration.

def make_donut(input_response, input_text, input_color):
    # Each color name maps to [color of the value slice, color of the remainder]
    color_pairs = {
        'blue': ['#29b5e8', '#155F7A'],
        'green': ['#27AE60', '#12783D'],
        'orange': ['#F39C12', '#875A12'],
        'red': ['#E74C3C', '#781F16'],
    }
    if input_color not in color_pairs:
        raise ValueError(f'Unsupported color: {input_color}')
    chart_color = color_pairs[input_color]

    # Foreground ring: the percentage value and its remainder out of 100
    source = pd.DataFrame({
        "Topic": ['', input_text],
        "% value": [100-input_response, input_response]
    })
    # Background ring: a full circle drawn underneath the foreground
    source_bg = pd.DataFrame({
        "Topic": ['', input_text],
        "% value": [100, 0]
    })

    plot = alt.Chart(source).mark_arc(innerRadius=45, cornerRadius=25).encode(
        theta="% value",
        color=alt.Color("Topic:N",
                        scale=alt.Scale(
                            domain=[input_text, ''],
                            range=chart_color),
                        legend=None),
    ).properties(width=130, height=130)

    # Percentage label shown in the center of the donut
    text = plot.mark_text(align='center', color="#29b5e8", font="Lato", fontSize=32, fontWeight=700, fontStyle="italic").encode(text=alt.value(f'{input_response} %'))
    plot_bg = alt.Chart(source_bg).mark_arc(innerRadius=45, cornerRadius=20).encode(
        theta="% value",
        color=alt.Color("Topic:N",
                        scale=alt.Scale(
                            domain=[input_text, ''],
                            range=chart_color),
                        legend=None),
    ).properties(width=130, height=130)
    return plot_bg + plot + text

Convert population to text

Next, we’ll create a custom function that makes population values more concise and improves the aesthetics. Specifically, instead of displaying a value such as 28,995,881 in the metrics card, it is shown in the more concise form 29.0 M. The same is applied to values in the thousands range.

Metrics cards showing states with high inbound/outbound migration in the selected year of interest (2019 in this case).

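def format_number(num):
    # Values of a million or more (in magnitude) are shown in millions
    if abs(num) >= 1000000:
        if not num % 1000000:
            return f'{num // 1000000} M'
        return f'{round(num / 1000000, 1)} M'
    # Everything else is shown in whole thousands, truncated toward zero
    # so that negative deltas are not rounded away from zero
    return f'{int(num / 1000)} K'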

3.6 App layout

Finally, it’s time to put everything together in the app.

Define the layout

Begin by creating 3 columns:

col = st.columns((1.5, 4.5, 2), gap='medium')

Specifically, the input argument (1.5, 4.5, 2) sets the relative widths of the columns: the second column is three times as wide as the first, and the third column is a little under half as wide as the second.
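
To make those proportions concrete, the sketch below converts the weights into approximate shares of the row. It ignores the space taken up by the gap between columns, so treat the numbers as a rough guide rather than exact pixel widths.

```python
# Relative widths passed to st.columns in the layout above
weights = (1.5, 4.5, 2)

total = sum(weights)  # 8.0
shares = [w / total * 100 for w in weights]

for i, share in enumerate(shares, start=1):
    print(f"Column {i}: {share:.2f}% of the row")
```

This gives 18.75%, 56.25% and 25.00% for the three columns, so the middle column with the map and heatmap takes up more than half of the page.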

Column 1

The Gains/Losses section shows metrics cards for the states with the highest inbound and outbound migration in the selected year (specified via the Select a year drop-down widget created via st.selectbox).

The States Migration section shows donut charts displaying the percentage of states with annual inbound or outbound migration of more than 50,000 people.

with col[0]:
    st.markdown('#### Gains/Losses')

    df_population_difference_sorted = calculate_population_difference(df_reshaped, selected_year)

    if selected_year > 2010:
        first_state_name = df_population_difference_sorted.states.iloc[0]
        first_state_population = format_number(df_population_difference_sorted.population.iloc[0])
        first_state_delta = format_number(df_population_difference_sorted.population_difference.iloc[0])
    else:
        first_state_name = '-'
        first_state_population = '-'
        first_state_delta = ''
    st.metric(label=first_state_name, value=first_state_population, delta=first_state_delta)

    if selected_year > 2010:
        last_state_name = df_population_difference_sorted.states.iloc[-1]
        last_state_population = format_number(df_population_difference_sorted.population.iloc[-1])   
        last_state_delta = format_number(df_population_difference_sorted.population_difference.iloc[-1])   
    else:
        last_state_name = '-'
        last_state_population = '-'
        last_state_delta = ''
    st.metric(label=last_state_name, value=last_state_population, delta=last_state_delta)

    
    st.markdown('#### States Migration')

    if selected_year > 2010:
        # Filter states whose population changed by more than 50,000 in either direction
        df_greater_50000 = df_population_difference_sorted[df_population_difference_sorted.population_difference > 50000]
        df_less_50000 = df_population_difference_sorted[df_population_difference_sorted.population_difference < -50000]
        
        # % of states above (inbound) and below (outbound) the threshold
        states_migration_greater = round((len(df_greater_50000)/df_population_difference_sorted.states.nunique())*100)
        states_migration_less = round((len(df_less_50000)/df_population_difference_sorted.states.nunique())*100)
    else:
        # No previous year to compare against, so both percentages default to 0
        states_migration_greater = 0
        states_migration_less = 0

    donut_chart_greater = make_donut(states_migration_greater, 'Inbound Migration', 'green')
    donut_chart_less = make_donut(states_migration_less, 'Outbound Migration', 'red')

    migrations_col = st.columns((0.2, 1, 0.2))
    with migrations_col[1]:
        st.write('Inbound')
        st.altair_chart(donut_chart_greater)
        st.write('Outbound')
        st.altair_chart(donut_chart_less)

Column 2

Next, the second column displays the choropleth map and heatmap using custom functions created earlier.

with col[1]:
    st.markdown('#### Total Population')
    
    choropleth = make_choropleth(df_selected_year, 'states_code', 'population', selected_color_theme)
    st.plotly_chart(choropleth, use_container_width=True)
    
    heatmap = make_heatmap(df_reshaped, 'year', 'states', 'population', selected_color_theme)
    st.altair_chart(heatmap, use_container_width=True)

Column 3

Finally, the third column shows the top states in a dataframe, where each population value is displayed as a colored progress bar via the column_config parameter of st.dataframe.

An About section is displayed via the st.expander() container to provide information on the data source and definitions for terminologies used in the dashboard.

with col[2]:
    st.markdown('#### Top States')

    st.dataframe(df_selected_year_sorted,
                 column_order=("states", "population"),
                 hide_index=True,
                 width=None,
                 column_config={
                    "states": st.column_config.TextColumn(
                        "States",
                    ),
                    "population": st.column_config.ProgressColumn(
                        "Population",
                        format="%f",
                        min_value=0,
                        max_value=max(df_selected_year_sorted.population),
                     )}
                 )
    
    with st.expander('About', expanded=True):
        st.write('''
            - Data: [U.S. Census Bureau](<https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html>).
            - :orange[**Gains/Losses**]: states with high inbound/ outbound migration for selected year
            - :orange[**States Migration**]: percentage of states with annual inbound/ outbound migration > 50,000
            ''')

3.7 Deploying the Dashboard app to the cloud

For a video walkthrough on deploying a Streamlit app, check out this tutorial on YouTube.

BONUS: 5 reminders when building dashboards

Perform EDA to gain data understanding
Identify key metrics for tracking what matters
Decide on charts to best visualize key metrics
Group related metrics together
Use clear and concise labels to describe metrics

Wrapping up

In summary, Streamlit offers a quick, efficient, and code-friendly way to build interactive dashboard apps in Python, making it a go-to tool for data scientists and developers working with data visualization.

One of the key features of Streamlit is that it reruns the app script and re-renders the output whenever a user changes an input widget or the source code is updated. This keeps the charts in sync with the selected parameters and makes it well suited to interactive data visualization.

What dashboards are you building? Share your dashboard in the comments below to inspire the community, or ask for feedback!

Follow me on X at @thedataprof, on LinkedIn at Chanin Nantasenamat, or subscribe to my Data Professor channel on YouTube!

Happy Streamlit-ing! 📊