AWS Machine Learning Blog, October 25, 2024
Enhance your Amazon Redshift cloud data warehouse with easier, simpler, and faster machine learning using Amazon SageMaker Canvas

This post presents a case study of using Amazon SageMaker Canvas to predict the loan status of bank customers. It walks through how SageMaker Canvas, Data Wrangler, Amazon Redshift, and Amazon QuickSight can be combined to build a machine learning model, using natural language and a visual interface to simplify data preparation and model deployment, ultimately producing loan status predictions.

🚀 **Loan status prediction with SageMaker Canvas** The case study shows how Amazon SageMaker Canvas can be used to predict the loan status of bank customers. By connecting to an Amazon Redshift data warehouse, importing the data into SageMaker Canvas, and using its built-in machine learning capabilities, you can easily build a predictive model.

📊 **Data preparation and model building** The post highlights Data Wrangler and the Chat for data prep feature, which let users clean data and fill in missing values using natural language, and build machine learning models through a visual interface. These features greatly simplify data preparation and model building, making them accessible even to users without machine learning expertise.

📈 **Analyzing the predictions** After the model is trained, it can run batch predictions, and the results can be visualized and analyzed with Amazon QuickSight. The integration between SageMaker Canvas and QuickSight lets users quickly generate charts and reports and extract valuable business insights.

💡 **Solution overview** The solution consists of the following steps: a business analyst signs in to SageMaker Canvas, connects to the Amazon Redshift data warehouse, pulls in the required data, builds a predictive model, obtains batch prediction results, and sends them to QuickSight for further analysis.

📦 **Prerequisites** To use this solution, you need an AWS account and IAM role, a configured Amazon Redshift data warehouse, and a SageMaker domain and (optionally) a QuickSight account. Basic familiarity with a SQL query editor is also required.

Machine learning (ML) helps organizations to increase revenue, drive business growth, and reduce costs by optimizing core business functions such as supply and demand forecasting, customer churn prediction, credit risk scoring, pricing, predicting late shipments, and many others.

Conventional ML development cycles take weeks to many months and require deep data science understanding and ML development skills. Business analysts’ ideas for using ML models often sit in prolonged backlogs because of the data engineering and data science teams’ limited bandwidth and the data preparation work involved.

In this post, we dive into a business use case for a banking institution. We will show you how a financial or business analyst at a bank can easily predict if a customer’s loan will be fully paid, charged off, or current using a machine learning model that is best for the business problem at hand. The analyst can easily pull in the data they need, use natural language to clean up and fill any missing data, and finally build and deploy a machine learning model that can accurately predict the loan status as an output, all without needing to become a machine learning expert to do so. The analyst will also be able to quickly create a business intelligence (BI) dashboard using the results from the ML model within minutes of receiving the predictions. Let’s learn about the services we will use to make this happen.

Amazon SageMaker Canvas is a web-based visual interface for building, testing, and deploying machine learning workflows. It allows data scientists and machine learning engineers to interact with their data and models and to visualize and share their work with others with just a few clicks.

SageMaker Canvas has also integrated with Data Wrangler, which helps with creating data flows and preparing and analyzing your data. Built into Data Wrangler, is the Chat for data prep option, which allows you to use natural language to explore, visualize, and transform your data in a conversational interface.

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it cost-effective to efficiently analyze all your data using your existing business intelligence tools.

Amazon QuickSight powers data-driven organizations with unified business intelligence (BI) at hyperscale. With QuickSight, all users can meet varying analytic needs from the same source of truth through modern interactive dashboards, paginated reports, embedded analytics, and natural language queries.

Solution overview

The solution architecture that follows illustrates:

    A business analyst signing in to SageMaker Canvas.
    The business analyst connecting to the Amazon Redshift data warehouse and pulling the desired data into SageMaker Canvas.
    Telling SageMaker Canvas to build a predictive analysis ML model.
    Getting batch prediction results after the model has been built.
    Sending the results to QuickSight for users to further analyze.

Prerequisites

Before you begin, make sure you have the following prerequisites in place:

    An AWS account and an IAM role with the required permissions
    A configured Amazon Redshift data warehouse
    A SageMaker domain and, optionally, a QuickSight account
    Basic familiarity with a SQL query editor

Set up the Amazon Redshift cluster

We’ve created a CloudFormation template to set up the Amazon Redshift cluster.

    Deploy the CloudFormation template to your account.
    Enter a stack name, then choose Next twice, keeping the rest of the parameters at their defaults.
    On the review page, scroll down to the Capabilities section, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Create stack.
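If you prefer to script the deployment instead of clicking through the console, the steps above map to a single create-stack call. The following sketch only assembles the request arguments (the stack name and template URL are placeholders, not values from this post); you would pass them to the AWS CLI or SDK yourself:

```python
def make_stack_request(stack_name: str, template_url: str) -> dict:
    """Build create-stack arguments equivalent to the console flow above."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Equivalent of selecting "I acknowledge that AWS CloudFormation
        # might create IAM resources" on the review page.
        "Capabilities": ["CAPABILITY_IAM"],
    }

# Placeholder values for illustration only.
request = make_stack_request(
    "redshift-canvas-demo",
    "https://example.com/redshift-cluster.yaml",
)
print(request["Capabilities"])  # ['CAPABILITY_IAM']
```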

The stack will run for 10–15 minutes. After it’s finished, you can view the outputs of the parent and nested stacks as shown in the following figures:

Parent stack

Nested stack 

Sample data

You will use a publicly available dataset of bank customers and their loans, hosted and maintained by AWS in an S3 bucket for workshop use, that includes customer demographic data and loan terms.

Implementation steps

Load data to the Amazon Redshift cluster

    Connect to your Amazon Redshift cluster using Query Editor v2. To navigate to the editor, follow the steps in Opening query editor v2.
    Create a table in your Amazon Redshift cluster using the following SQL command:

    DROP TABLE IF EXISTS public.loan_cust;
    CREATE TABLE public.loan_cust (
        loan_id bigint,
        cust_id bigint,
        loan_status character varying(256),
        loan_amount bigint,
        funded_amount_by_investors double precision,
        loan_term bigint,
        interest_rate double precision,
        installment double precision,
        grade character varying(256),
        sub_grade character varying(256),
        verification_status character varying(256),
        issued_on character varying(256),
        purpose character varying(256),
        dti double precision,
        inquiries_last_6_months bigint,
        open_credit_lines bigint,
        derogatory_public_records bigint,
        revolving_line_utilization_rate double precision,
        total_credit_lines bigint,
        city character varying(256),
        state character varying(256),
        gender character varying(256),
        ssn character varying(256),
        employment_length bigint,
        employer_title character varying(256),
        home_ownership character varying(256),
        annual_income double precision,
        age integer
    ) DISTSTYLE AUTO;

    Load data into the loan_cust table using the following COPY command:

    COPY loan_cust
    FROM 's3://redshift-demos/bootcampml/loan_cust.csv'
    iam_role default
    region 'us-east-1'
    delimiter '|'
    csv
    IGNOREHEADER 1;

    Query the table to see what the data looks like:
    SELECT * FROM loan_cust LIMIT 100;
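If the COPY fails on malformed rows, it can help to sanity-check a local extract of the pipe-delimited file first. This is a stdlib-only sketch (not part of the original walkthrough) that skips the header, as IGNOREHEADER 1 does, and verifies each data row has the 28 columns the CREATE TABLE above defines:

```python
import csv
import io

# Column count from the CREATE TABLE statement above.
EXPECTED_COLUMNS = 28

def validate_pipe_delimited(text: str) -> int:
    """Skip the header row and verify every data row has the expected
    column count; return the number of data rows checked."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    next(reader)  # header row, also skipped by IGNOREHEADER 1
    count = 0
    for lineno, row in enumerate(reader, start=2):
        if len(row) != EXPECTED_COLUMNS:
            raise ValueError(
                f"line {lineno}: expected {EXPECTED_COLUMNS} columns, got {len(row)}"
            )
        count += 1
    return count

# Tiny synthetic sample with the right column count, for illustration.
header = "|".join(f"col{i}" for i in range(EXPECTED_COLUMNS))
data_row = "|".join("x" for _ in range(EXPECTED_COLUMNS))
print(validate_pipe_delimited(header + "\n" + data_row + "\n"))  # 1
```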

Set up Chat for data prep

    To use the Chat for data prep option in SageMaker Canvas, you must first enable model access in Amazon Bedrock.
      Open the AWS Management Console, go to Amazon Bedrock, and choose Model access in the navigation pane.
      Choose Enable specific models, select Claude under Anthropic, and choose Next.
      Review the selection and choose Submit.
    Navigate to the Amazon SageMaker service in the AWS Management Console, select Canvas, and choose Open Canvas.
    Choose Datasets from the navigation pane, then choose the Import data dropdown and select Tabular.
    For Dataset name, enter redshift_loandata and choose Create.
    On the next page, choose Data Source and select Redshift as the source. Under Redshift, select + Add Connection.
    Enter the following details to establish your Amazon Redshift connection:
      Cluster Identifier: Copy the ProducerClusterName from the CloudFormation nested stack outputs (see the preceding nested stack screenshot).
      Database name: Enter dev.
      Database user: Enter awsuser.
      Unload IAM Role ARN: Copy the RedshiftDataSharingRoleName from the nested stack outputs.
      Connection Name: Enter MyRedshiftCluster.
      Choose Add connection.

    After the connection is created, expand the public schema, drag the loan_cust table into the editor, and choose Create dataset.
    Choose the redshift_loandata dataset and choose Create a data flow.
    Enter redshift_flow for the name and choose Create.
    After the flow is created, choose Chat for data prep.
    In the text box, enter summarize my data and choose the run arrow.
    The output should look something like the following:
    Now you can use natural language to prep the dataset. Enter Drop ssn and filter for ages over 17 and choose the run arrow. You will see that it handled both steps, and you can view the PySpark code that it ran. To add these steps as dataset transforms, choose Add to steps.
    Rename the step to drop ssn and filter age > 17, choose Update, and then choose Create model.
    Export the data and create the model: for Dataset name, enter loan_data_forecast_dataset; for Model name, enter loan_data_forecast; for Problem type, select Predictive analysis; for Target column, select loan_status; then choose Export and create model.
    Verify that the correct Target column and Model type are selected, and choose Quick build.
    Now the model is being created. It usually takes 14–20 minutes, depending on the size of your dataset. After the model has completed training, you will be routed to the Analyze tab. There, you can see the average prediction accuracy and each column’s impact on the prediction outcome. Note that your numbers might differ from the ones in the following figure because of the stochastic nature of the ML process.
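The chat-based prep step above ("Drop ssn and filter for ages over 17") generates PySpark behind the scenes. As a mental model of what those two transforms do, here is a plain-Python sketch using hypothetical row dicts, not the code Canvas actually produces:

```python
def drop_ssn_and_filter_age(rows):
    """Drop the ssn column and keep only customers older than 17."""
    return [
        {k: v for k, v in row.items() if k != "ssn"}
        for row in rows
        if row["age"] > 17
    ]

# Hypothetical sample rows for illustration.
sample = [
    {"cust_id": 1, "age": 16, "ssn": "123-45-6789", "loan_status": "Current"},
    {"cust_id": 2, "age": 34, "ssn": "987-65-4321", "loan_status": "Fully Paid"},
]
print(drop_ssn_and_filter_age(sample))
# [{'cust_id': 2, 'age': 34, 'loan_status': 'Fully Paid'}]
```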

Use the model to make predictions

    Now let’s use the model to make predictions for the future status of loans. Choose Predict.
    Under Choose the prediction type, select Batch prediction, then select Manual.
    Then select loan_data_forecast_dataset from the dataset list and choose Generate predictions.
    You’ll see the following after the batch prediction is complete. Choose the breadcrumb menu next to the Ready status, then choose Preview to view the results.
    You can now view the predictions and download them as CSV.
    You can also generate single predictions for one row of data at a time. Under Choose the prediction type, select Single Prediction and then change the values for any of the input fields that you’d like, and choose Update.
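Once you have downloaded the batch predictions as CSV, you can inspect them with a few lines of code before building a dashboard. This sketch uses a hypothetical excerpt of the file; the real export’s column names may differ:

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of the downloaded predictions CSV.
predictions_csv = """loan_id,predicted_loan_status,probability
1001,Fully Paid,0.91
1002,Charged Off,0.77
1003,Fully Paid,0.88
1004,Current,0.64
"""

# Count how many loans fall into each predicted status.
rows = list(csv.DictReader(io.StringIO(predictions_csv)))
status_counts = Counter(r["predicted_loan_status"] for r in rows)
print(status_counts.most_common())
# [('Fully Paid', 2), ('Charged Off', 1), ('Current', 1)]
```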

Analyze the predictions

We will now show you how to use QuickSight to visualize the prediction data from SageMaker Canvas and gain further insights. SageMaker Canvas integrates directly with QuickSight, a cloud-powered business analytics service that helps employees within an organization build visualizations, perform ad hoc analysis, and quickly get business insights from their data, anytime, on any device.

    With the preview page up, choose Send to Amazon QuickSight. Enter the QuickSight user name you want to share the results with.
    Choose Send and you should see confirmation saying the results were sent successfully.
    Now, you can create a QuickSight dashboard for predictions.
      Go to the QuickSight console by entering QuickSight in your console services search bar and choose QuickSight.
      Under Datasets, select the SageMaker Canvas dataset that was just created, then choose Edit Dataset. Under the State field, change the data type to State.
      Choose Create with Interactive sheet selected.
      Under visual types, choose Filled map. Select the State and Probability fields. Under Field wells, choose Probability, change Aggregate to Average, and change Show as to Percent.
      Choose Filter and add a filter for loan_status to include fully paid loans only. Choose Apply.
      At the top right, in the blue banner, choose Share and Publish Dashboard. We use the name Average probability for fully paid loan by state, but feel free to use your own. Choose Publish dashboard and you’re done. You can now share this dashboard and its predictions with other analysts and consumers of this data.
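The aggregate the filled map computes is simply the average probability per state, shown as a percent. As a quick sanity check against the dashboard, here is the same calculation in a stdlib-only sketch over hypothetical prediction rows (already filtered to fully paid loans, mirroring the QuickSight filter above; real column names may differ):

```python
from collections import defaultdict

# Hypothetical prediction rows, filtered to fully paid loans.
fully_paid = [
    {"state": "TX", "probability": 0.90},
    {"state": "TX", "probability": 0.80},
    {"state": "NY", "probability": 0.70},
]

# Group the probabilities by state.
by_state = defaultdict(list)
for row in fully_paid:
    by_state[row["state"]].append(row["probability"])

# Average per state, formatted as a percent (Aggregate: Average, Show as: Percent).
avg_pct = {state: f"{sum(p) / len(p):.1%}" for state, p in by_state.items()}
print(avg_pct)
# {'TX': '85.0%', 'NY': '70.0%'}
```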

Clean up

Use the following steps to avoid any extra cost to your account:

    Sign out of SageMaker Canvas when you are not using it.
    In the AWS console, delete the CloudFormation stack you launched earlier in the post.

Conclusion

We believe integrating your cloud data warehouse (Amazon Redshift) with SageMaker Canvas opens the door to producing many more robust ML solutions for your business, faster, without needing to move data, and with no ML experience required.

You now have business analysts producing valuable business insights, while letting data scientists and ML engineers help refine, tune, and extend models as needed. SageMaker Canvas integration with Amazon Redshift provides a unified environment for building and deploying machine learning models, allowing you to focus on creating value with your data rather than focusing on the technical details of building data pipelines or ML algorithms.

Additional reading:

    SageMaker Canvas Workshop re:Invent 2022 – SageMaker Canvas Hands-On Course for Business Analysts – Practical Decision Making using No-Code ML on AWS


About the Authors

Suresh Patnam is a Principal Sales Specialist, AI/ML and Generative AI, at AWS. He is passionate about helping businesses of all sizes transform into fast-moving digital organizations focusing on data, AI/ML, and generative AI.

Sohaib Katariwala is a Sr. Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. His interests are in all things data and analytics. More specifically he loves to help customers use AI in their data strategy to solve modern day challenges.

Michael Hamilton is an Analytics & AI Specialist Solutions Architect at AWS. He enjoys all things data related and helping customers find solutions for their complex use cases.

Nabil Ezzarhouni is an AI/ML and Generative AI Solutions Architect at AWS. He is based in Austin, TX and is passionate about cloud and AI/ML technologies and product management. When he is not working, he spends time with his family, looking for the best taco in Texas. Because… why not?
