AWS Machine Learning Blog, July 16, 2024
How Mixbook used generative AI to offer personalized photo book experiences

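Mixbook, an award-winning design platform, used generative AI on AWS to build a Smart Captions feature that makes creating personalized photo books easier. The feature not only interprets users' photos but also adds a creative touch that brings their stories to life. The innovation improved both the user experience and operational efficiency.

📸 Mixbook's Smart Captions feature analyzes the photos a user uploads and automatically generates emotive, creative descriptions. It combines computer vision and natural language processing to offer a personalized photo book creation experience.

🚀 Mixbook migrated its operational workloads to Amazon Web Services (AWS), a strategic move that brought significant reliability and performance benefits and gave users an elastic, scalable system.

🧠 The Smart Captions implementation has three main parts: data intake, information inference, and creative synthesis. The inference process uses AWS Lambda and Amazon SageMaker and runs independent image analysis steps in parallel, which improves cost efficiency and elasticity.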

This post is co-written with Vlad Lebedev and DJ Charles from Mixbook.

Mixbook is an award-winning design platform that gives users unrivaled creative freedom to design and share one-of-a-kind stories, transforming the lives of more than six million people. Today, Mixbook is the #1 rated photo book service in the US with 26 thousand five-star reviews.

Mixbook is empowering users to share their stories with creativity and confidence. Their mission is to assist users in celebrating the beautiful moments of their lives. Mixbook aims to foster the profound connections between users and their loved ones through sharing of their stories in both physical and digital mediums.

Years ago, Mixbook undertook a strategic initiative to transition their operational workloads to Amazon Web Services (AWS), a move that has continually yielded significant advantages. This pivotal decision has been instrumental in propelling them towards fulfilling their mission, ensuring their system operations are characterized by reliability, superior performance, and operational efficiency.

In this post we show you how Mixbook used generative artificial intelligence (AI) capabilities in AWS to personalize their photo book experiences—a step towards their mission.

Business Challenge

In today’s digital world, we have a lot of pictures that we take and share with our friends and family. Let’s consider a scenario where we have hundreds of photos from a recent family vacation, and we want to create a coffee-table photo-book to make it memorable. However, choosing the best pictures from the lot and describing them with captions can take a lot of time and effort. As we all know, a picture’s worth a thousand words, which is why trying to sum up a moment with a caption of just six to ten words can be so challenging. Mixbook really gets the problem, and they’re here to fix it.

Solution

Mixbook Smart Captions is the magical solution to the caption conundrum. It doesn’t only interpret user photos; it also adds a sprinkle of creativity, making the stories pop.

Most importantly, Smart Captions doesn’t fully automate the creative process. Instead, it provides a creative partner to enable the user’s own storytelling to imbue a book with personal flourishes. Whether it’s a selfie or a scenic shot, the goal is to make sure users’ photos speak volumes, effortlessly.

Architecture overview

The implementation of the system involves three primary components: data intake, information inference, and creative synthesis.

Caption generation relies heavily on the inference process, because the quality and meaningfulness of the comprehension output directly influence how specific and personalized the generated captions are. The following data flow diagram illustrates the caption generation process, which is described in the text that follows.

Data intake

A user uploads photos into Mixbook. The raw photos are stored in Amazon Simple Storage Service (Amazon S3).

The data intake process involves three macro components: Amazon Aurora MySQL-Compatible Edition, Amazon S3, and AWS Fargate for Amazon ECS. Aurora MySQL serves as the primary relational data store for tracking and recording media file upload sessions and their accompanying metadata. It offers flexible capacity options, ranging from serverless at one end to reserved provisioned instances for predictable long-term use at the other. Amazon S3, in turn, provides efficient, scalable, and secure storage for the media file objects themselves. Its storage classes keep recent uploads in a warm state for low-latency access, while older objects can be transitioned to Amazon S3 Glacier tiers, which reduces storage costs over time. Amazon Elastic Container Service (Amazon ECS), used together with the low-maintenance compute environment of AWS Fargate, forms a convenient orchestrator for containerized workloads and brings all the components together.
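The transition of older uploads to Amazon S3 Glacier can be expressed as an S3 lifecycle rule. The following is a minimal sketch, not Mixbook's actual configuration: the `uploads/` prefix, the 90-day threshold, and the bucket name in the comment are assumptions chosen for illustration. The function only builds the configuration dictionary in the shape that S3 lifecycle rules use, so it runs without AWS credentials.

```python
import json


def build_lifecycle_configuration(prefix: str, days_until_glacier: int) -> dict:
    """Build an S3 lifecycle configuration that archives objects under a prefix.

    The prefix and the day count are illustrative assumptions; the source
    post does not state which values Mixbook uses.
    """
    if days_until_glacier < 1:
        raise ValueError("days_until_glacier must be a positive number of days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                # Apply the rule only to objects whose key starts with the prefix.
                "Filter": {"Prefix": prefix},
                # Move objects to the Glacier storage class after N days.
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = build_lifecycle_configuration("uploads/", 90)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dictionary could be
# applied to a (hypothetical) bucket like this:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-media-bucket", LifecycleConfiguration=config)
```

Keeping the rule scoped to a prefix means that other objects in the same bucket are unaffected by the transition.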

Inference

The comprehension phase extracts essential contextual and semantic elements from the input, including image descriptions, temporal and spatial data, facial recognition, emotional sentiment, and labels. Among these, the image descriptions generated by a computer vision model offer the most fundamental understanding of the captured moments. Amazon Rekognition delivers precise detection of faces' bounding boxes and emotional expressions. The detected face bounding boxes are primarily used for optimal automatic photo placement and cropping, while the detected emotions help select a better tone for the story, for example to make it funnier or more nostalgic. Furthermore, Amazon Rekognition enhances safety by identifying potentially objectionable content.
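To make the two uses of face data concrete, here is a small sketch that post-processes a face detection result. It assumes the response shape returned by the Amazon Rekognition `DetectFaces` API when all facial attributes are requested (`FaceDetails` entries with a ratio-based `BoundingBox` and a list of `Emotions`). The sample data, the helper names, and the 50 percent confidence threshold are hypothetical, and the post does not describe how Mixbook actually consumes these fields.

```python
from collections import Counter


def face_crop_boxes(face_details, image_width, image_height):
    """Convert ratio-based bounding boxes to pixel (left, top, right, bottom)."""
    boxes = []
    for face in face_details:
        box = face["BoundingBox"]
        # Ratios can fall slightly outside 0..1 when a face is partly out of
        # frame, so clamp the result to the image bounds.
        left = max(0, round(box["Left"] * image_width))
        top = max(0, round(box["Top"] * image_height))
        right = min(image_width, round((box["Left"] + box["Width"]) * image_width))
        bottom = min(image_height, round((box["Top"] + box["Height"]) * image_height))
        boxes.append((left, top, right, bottom))
    return boxes


def dominant_emotion(face_details, min_confidence=50.0):
    """Return the most frequent top-scoring emotion across faces, or None."""
    votes = Counter()
    for face in face_details:
        emotions = face.get("Emotions", [])
        if not emotions:
            continue
        best = max(emotions, key=lambda e: e["Confidence"])
        if best["Confidence"] >= min_confidence:
            votes[best["Type"]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical sample mimicking the "FaceDetails" part of a response.
sample_faces = [
    {
        "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.25, "Height": 0.5},
        "Emotions": [{"Type": "HAPPY", "Confidence": 97.0},
                     {"Type": "CALM", "Confidence": 2.0}],
    },
    {
        "BoundingBox": {"Left": 0.6, "Top": 0.25, "Width": 0.2, "Height": 0.4},
        "Emotions": [{"Type": "HAPPY", "Confidence": 88.5},
                     {"Type": "SURPRISED", "Confidence": 9.0}],
    },
]

print(face_crop_boxes(sample_faces, image_width=1000, image_height=800))
print(dominant_emotion(sample_faces))
```

The pixel boxes could feed a layout step that avoids cropping through faces, and the dominant emotion could be passed to the caption generator as a tone hint.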

The inference pipeline is powered by an AWS Lambda-based multi-step architecture, which maximizes cost-efficiency and elasticity by running independent image analysis steps in parallel. AWS Step Functions enables the synchronization and ordering of interdependent steps.
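One way to express "independent steps in parallel, then a dependent step" in AWS Step Functions is a `Parallel` state followed by a `Task` state. The sketch below builds such a state machine definition in Amazon States Language as a Python dictionary. The state names, the branch names, and the Lambda ARNs are all hypothetical; the post does not publish Mixbook's actual workflow definition.

```python
import json


def build_state_machine(branch_lambdas: dict, merge_lambda_arn: str) -> dict:
    """Build an Amazon States Language definition with one Parallel fan-out.

    branch_lambdas maps a (hypothetical) step name to the ARN of the Lambda
    function that performs that independent analysis step.
    """
    branches = [
        {
            "StartAt": name,
            "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
        }
        for name, arn in branch_lambdas.items()
    ]
    return {
        "Comment": "Illustrative image analysis workflow (assumed structure)",
        "StartAt": "AnalyzeImage",
        "States": {
            # All branches receive the same input and run concurrently.
            "AnalyzeImage": {
                "Type": "Parallel",
                "Branches": branches,
                "Next": "MergeResults",
            },
            # Runs only after every branch has finished; its input is a list
            # of the branch outputs in branch order.
            "MergeResults": {
                "Type": "Task",
                "Resource": merge_lambda_arn,
                "End": True,
            },
        },
    }


# Placeholder account ID and function names, for illustration only.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_state_machine(
    {
        "DetectFaces": ARN_PREFIX + "detect-faces",
        "DetectLabels": ARN_PREFIX + "detect-labels",
        "ExtractMetadata": ARN_PREFIX + "extract-metadata",
    },
    merge_lambda_arn=ARN_PREFIX + "merge-results",
)
print(json.dumps(definition, indent=2))
```

Because each branch is its own Lambda invocation, a slow or failed step does not hold compute for the others, which is consistent with the cost-efficiency and elasticity goals described above.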

The image captions are generated by an Amazon SageMaker inference endpoint, which is enhanced by an Amazon ElastiCache for Redis-powered buffer. The buffer was implemented after benchmarking the captioning model’s performance. The benchmarking revealed that the model performed optimally when processing batches of images, but underperformed when analyzing individual images.
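The batching idea can be sketched as a micro-batch buffer: images accumulate until a full batch is ready, and a partial batch is released if the oldest image has waited too long. The class below is an in-memory stand-in, not the Redis-backed buffer the post describes. The batch size, the wait limit, and every name in it are assumptions for illustration.

```python
import time
from typing import Callable, List, Optional


class MicroBatchBuffer:
    """Collect image keys and release them in batches.

    In-memory illustration only; a production buffer would keep this state
    in a shared store so that several workers can contribute to one batch.
    """

    def __init__(self, batch_size: int, max_wait_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.max_wait_seconds = max_wait_seconds
        self._clock = clock
        self._items: List[str] = []
        self._oldest: Optional[float] = None

    def add(self, image_key: str) -> Optional[List[str]]:
        """Buffer one image; return a full batch as soon as one is ready."""
        if not self._items:
            self._oldest = self._clock()  # start the wait timer
        self._items.append(image_key)
        if len(self._items) >= self.batch_size:
            return self._drain()
        return None

    def flush_if_stale(self) -> Optional[List[str]]:
        """Return a partial batch once the oldest item has waited too long."""
        if self._items and self._clock() - self._oldest >= self.max_wait_seconds:
            return self._drain()
        return None

    def _drain(self) -> List[str]:
        batch, self._items, self._oldest = self._items, [], None
        return batch


buffer = MicroBatchBuffer(batch_size=3, max_wait_seconds=2.0)
for key in ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]:
    batch = buffer.add(key)
    if batch is not None:
        print("send to captioning endpoint:", batch)
```

The wait limit bounds the extra latency that batching adds for a user who uploads only one or two photos, at the cost of occasionally sending a smaller batch.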

Generation

The caption-generating mechanism behind the writing assistant feature is what turns Mixbook Studio into a natural language story-crafting tool. Powered by a Llama language model, the assistant initially used carefully engineered prompts created by AI experts. However, the Mixbook Storyarts team sought more granular control over the style and tone of the captions, so a diverse team that included an Emmy-nominated scriptwriter reviewed, adjusted, and added unique handcrafted examples. This led to a process of fine-tuning the model, moderating the modified responses, and deploying approved models for experimental and public releases. After inference, three captions are created and stored in Amazon Relational Database Service (Amazon RDS).
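As a rough illustration of how inference outputs might flow into a prompt for the language model, the sketch below assembles a text prompt from a photo's extracted context. The `PhotoContext` fields, the prompt wording, and the function name are all hypothetical; the post does not show Mixbook's prompts. Only the three-caption count and the six-to-ten-word length come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PhotoContext:
    """Hypothetical container for what the inference step extracted."""
    description: str
    labels: List[str] = field(default_factory=list)
    emotion: Optional[str] = None
    location: Optional[str] = None
    taken_on: Optional[str] = None


def build_caption_prompt(ctx: PhotoContext, tone: str, num_captions: int = 3) -> str:
    """Assemble a plain-text prompt; optional fields are skipped when empty."""
    lines = [
        f"Write {num_captions} photo book captions of six to ten words each.",
        f"Tone: {tone}.",
        f"Photo description: {ctx.description}",
    ]
    if ctx.labels:
        lines.append("Labels: " + ", ".join(ctx.labels))
    if ctx.emotion:
        lines.append(f"Dominant emotion: {ctx.emotion}")
    if ctx.location:
        lines.append(f"Location: {ctx.location}")
    if ctx.taken_on:
        lines.append(f"Date: {ctx.taken_on}")
    return "\n".join(lines)


example = PhotoContext(
    description="Two children building a sandcastle at the beach",
    labels=["beach", "sand", "children"],
    emotion="HAPPY",
)
print(build_caption_prompt(example, tone="nostalgic"))
```

Keeping the tone as an explicit parameter mirrors the idea that detected emotions can steer captions toward a funnier or more nostalgic voice.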

The following image shows the Mixbook Smart Captions feature in Mixbook Studio.

Benefits

Mixbook implemented this solution to offer new features to their customers. It improved both the user experience and operational efficiency.

User experience

System

As a result of this improved user experience, Mixbook was named an official honoree of the 2024 Webby Awards in the Apps & Software category for Best Use of AI & Machine Learning.

“AWS enables us to scale the innovations our customers love most. And now, with the new AWS generative AI capabilities, we are able to blow our customers’ minds with creative power they never thought possible. Innovations like this are why we’ve been partnered with AWS since the beta in 2006.”

– Andrew Laffoon, CEO, Mixbook

Conclusion

Mixbook started experimenting with AWS generative AI solutions to augment their existing application in early 2023. They began with a quick proof of concept to show the art of the possible. Continuous development, testing, and integration using the breadth of AWS services in compute, storage, analytics, and machine learning allowed them to iterate quickly. After they released the Smart Captions feature in beta, they were able to quickly adjust to real-world usage patterns and protect the product's value.

Try out Mixbook Studio to experience the storytelling. To learn more about AWS generative AI solutions, start with Transform your business with generative AI. To hear more from Mixbook leaders, listen to the AWS re:Think Podcast available from Art19, Apple Podcasts, and Spotify.


About the authors

Vlad Lebedev is a Senior Technology Leader at Mixbook. He leads a product-engineering team responsible for transforming Mixbook into a place for heartfelt storytelling. He draws on over a decade of hands-on experience in web development, system design, and data engineering to drive elegant solutions for complex problems. Vlad enjoys learning about both contemporary and ancient cultures, their histories, and languages.

DJ Charles is the CTO at Mixbook. He has enjoyed a 30-year career architecting interactive and e-commerce designs for top brands. Innovating broadband tech for the cable industry in the ’90s, revolutionizing supply-chain processes in the 2000s, and advancing environmental tech at Perillon led to global real-time bidding platforms for brands like Sotheby’s & eBay. Beyond tech, DJ loves learning new musical instruments, the art of songwriting, and deeply engages in music production & engineering in his spare time.

Malini Chatterjee is a Senior Solutions Architect at AWS. She provides guidance to AWS customers on their workloads across a variety of AWS technologies. She brings a breadth of expertise in Data Analytics and Machine Learning. Prior to joining AWS, she was architecting data solutions in financial industries. She is very passionate about semi-classical dancing and performs in community events. She loves traveling and spending time with her family.

Jessica Oliveira is an Account Manager at AWS who provides guidance and support to Commercial Sales in Northern California. She is passionate about building strategic collaborations to help ensure her customers’ success. Outside of work, she enjoys traveling, learning about different languages and cultures, and spending time with her family.
