AWS Machine Learning Blog | May 3, 00:26
WordFinder app: Harnessing generative AI on AWS for aphasia communication

This post describes the WordFinder mobile app, developed by the QARC team in collaboration with AWS. The app uses AWS generative AI to help people with aphasia expand their vocabulary and communicate more effectively through image recognition and semantic word association. Starting from an image, WordFinder combines services such as Amazon Rekognition and Amazon Bedrock to suggest related words that help users express their ideas. The project illustrates AWS technology applied to social good, showing how technology can improve the lives of a specific community.

📸 With WordFinder, users can take or select a photo; the app uses Amazon Rekognition to identify the objects in the image and extract labels.

💡 The recognized object labels form an initial word list; users can then select related words and explore semantically to find a more precise way to express themselves.

⚙️ At its core, WordFinder uses generative AI models available through Amazon Bedrock (such as Anthropic's Claude), invoked through API Gateway and a Lambda function, to return words semantically related to the initial list.

📱 The app is built with React Native and deployed with AWS Amplify for cross-platform support; users authenticate securely through Amazon Cognito.

In this post, we showcase how Dr. Kori Ramajoo, Dr. Sonia Brownsett, Prof. David Copland, from QARC, and Scott Harding, a person living with aphasia, used AWS services to develop WordFinder, a mobile, cloud-based solution that helps individuals with aphasia increase their independence through the use of AWS generative AI technology.

In the spirit of giving back to the community and harnessing the art of the possible for positive change, AWS hosted the Hack For Purpose event in 2023. This hackathon brought together teams from AWS customers across Queensland, Australia, to tackle pressing challenges faced by social good organizations.

The University of Queensland’s Queensland Aphasia Research Centre (QARC)’s mission is to improve access to technology for people living with aphasia, a communication disability that can impact an individual’s ability to express and understand spoken and written language.

The challenge: Overcoming communication barriers

In 2023, it was estimated that more than 140,000 people in Australia were living with aphasia. This number is expected to grow to over 300,000 by 2050. Aphasia can make everyday tasks like online banking, using social media, and trying new devices challenging. The goal was to create a mobile app that could assist people with aphasia by generating a word list of the objects in a user-selected image and extending the list with related words, enabling them to explore alternative communication methods.

Overview of the solution

The following screenshot shows an example of navigating the WordFinder app, including sign in, image selection, object definition, and related words.


    In the preceding screenshot, the following scenario unfolds: 

    1. Sign in: The first screen shows a simple sign-in page where users enter their email and password. It includes options to create an account or recover a forgotten password.
    2. Image selection: After signing in, users are prompted to Pick an image to search. This screen is initially blank.
    3. Photo access: The next screen shows a popup requesting access to the user's photos, with a grid of sample images visible in the background.
    4. Image chosen: After an image is selected (in this case, a picture of a koala), the app displays the image along with initial tags or classifications such as Animal, Bear, Mammal, Wildlife, and Koala.
    5. Related words: The final screen shows a list of related words based on the selection of Related Words next to Koala on the previous screen. This step is crucial for people with aphasia, who often have difficulty with word-finding and verbal expression. By exploring related words (such as habitat terms like tree and eucalyptus, or descriptive words like fur and marsupial), users can bridge communication gaps when the exact word they want isn't immediately accessible. This semantic network approach aligns with common aphasia therapy techniques, helping users find alternative ways to express their thoughts when specific words are difficult to recall.

This flow demonstrates how users can use the app to search for words and concepts by starting with an image, then drilling down into related terminology—a visual approach to expanding vocabulary or finding associated words.

The following diagram illustrates the solution architecture on AWS.

In the following sections, we discuss the flow and key components of the solution in more detail.

    Secure access using Route 53 and Amplify 
      The journey begins with the user accessing the WordFinder app through a domain managed by Amazon Route 53, a highly available and scalable cloud DNS web service. AWS Amplify hosts the React Native frontend, providing a seamless cross-platform experience. 
    Secure authentication with Amazon Cognito 
      Before accessing the core features, the user must securely authenticate through Amazon Cognito. Cognito provides robust user identity management and access control, making sure that only authenticated users can interact with the app’s services and resources. 
    Image capture and storage with Amplify and Amazon S3 
      After being authenticated, the user can capture an image of a scene, item, or scenario they wish to recall words from. AWS Amplify streamlines the process by automatically storing the captured image in an Amazon Simple Storage Service (Amazon S3) bucket, a highly available, cost-effective, and scalable object storage service. 
    Object recognition with Amazon Rekognition 
      As soon as the image is stored in the S3 bucket, Amazon Rekognition, a powerful computer vision and machine learning service, is triggered. Amazon Rekognition analyzes the image, identifying objects present and returning labels with confidence scores. These labels form the initial word prompt list within the WordFinder app, kickstarting the word-finding journey. 
    Semantic word associations with API Gateway and Lambda 
      While the initial word list generated by Amazon Rekognition provides a solid starting point, the user might be seeking a more specific or related word. To address this challenge, the WordFinder app sends the initial word list to an AWS Lambda function through Amazon API Gateway, a fully managed service that securely handles API requests. 
    Generative AI and prompt engineering with Lambda and Amazon Bedrock
      The Lambda function, acting as an intermediary, crafts a carefully designed prompt and submits it to Amazon Bedrock, a fully managed service that offers access to high-performing foundation models (FMs) from leading AI companies, including Anthropic's Claude. Claude's language understanding and generation capabilities produce semantically related words and concepts based on the initial word list. This process is driven by prompt engineering, where carefully crafted prompts guide the model to return relevant and contextually appropriate word associations.
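The object-recognition step in the flow above can be sketched in a few lines. This is illustrative, not the app's actual code: the 80% confidence threshold and the sample response fragment are assumptions, but the response shape matches what Rekognition's DetectLabels API returns.

```python
# Sketch of the object-recognition step: filtering a Rekognition
# DetectLabels response into the initial word list shown to the user.
# The confidence threshold and sample data are illustrative assumptions.

def labels_to_words(response: dict, min_confidence: float = 80.0) -> list[str]:
    """Return label names at or above the threshold, highest confidence first."""
    labels = sorted(response.get("Labels", []),
                    key=lambda label: label["Confidence"], reverse=True)
    return [l["Name"] for l in labels if l["Confidence"] >= min_confidence]

# A fragment in the shape Rekognition returns for the koala example:
sample = {
    "Labels": [
        {"Name": "Koala", "Confidence": 99.1},
        {"Name": "Animal", "Confidence": 98.7},
        {"Name": "Mammal", "Confidence": 98.7},
        {"Name": "Wildlife", "Confidence": 97.3},
        {"Name": "Plant", "Confidence": 55.0},  # below threshold, dropped
    ]
}
print(labels_to_words(sample))  # ['Koala', 'Animal', 'Mammal', 'Wildlife']
```

Low-confidence labels are dropped so the initial word list stays short and relevant for the user.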

WordFinder app component details

In this section, we take a closer look at the components of the WordFinder app.

React Native and Expo

WordFinder was built using React Native, a popular framework for building cross-platform mobile apps. To streamline development, the team used Expo, which provides write-once, run-anywhere capabilities across the Android and iOS operating systems.

Amplify

Amplify played a crucial role in accelerating the app's development and provisioning the necessary backend infrastructure. Amplify is a set of tools and services that enables developers to build and deploy secure, scalable, full-stack apps. In this architecture, the frontend of the WordFinder app is hosted on Amplify. The solution uses several Amplify components:

Related words

The generated initial word list is the first step toward finding the desired word, but the labels returned by Amazon Rekognition might not be the exact word someone is looking for. The project team considered how to implement a thesaurus-style lookup capability. They initially explored different programming libraries, but found this approach rigid and limited, often returning only synonyms rather than entities related to the source word. The libraries also added the overhead of packaging and maintaining the library and its dataset going forward.

To address these challenges and improve responses for related entities, the project team turned to generative AI. By using generative AI foundation models (FMs), the team offloaded the ongoing overhead of maintaining this solution while improving the flexibility and curation of the related words and entities returned to users. The team integrated this capability using the following services:

Benefits of API Gateway and Lambda

The project team briefly considered using the AWS SDK for JavaScript v3 and credentials sourced from Amazon Cognito to directly interface with Amazon Bedrock. Although this would work, there were several benefits associated with implementing API Gateway and a Lambda function:

Prompt engineering

One of the core features of WordFinder is its ability to generate related words and concepts based on a user-provided source word. This source word (obtained from the mobile app through an API request) is embedded into the following prompt by the Lambda function, replacing {word}:

prompt = "I have Aphasia. Give me the top 10 most common words that are related words to the word supplied in the prompt context. Your response should be a valid JSON array of just the words. No surrounding context. {word}"

The team tested several prompts and approaches during the hackathon, and this simple guiding prompt was found to give reliable, accurate, and repeatable results regardless of the word supplied by the user.
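The substitution and model invocation might look like the following sketch. The prompt text is the one given above; the model ID and the Messages API request shape follow Bedrock's Anthropic integration, but the specific Claude version is an assumption, and the actual call requires AWS credentials and boto3 at runtime.

```python
import json

# Illustrative sketch only: embedding the post's prompt and calling Claude
# through the Amazon Bedrock runtime. The model ID and request body shape
# are assumptions; the team's actual model version isn't stated in the post.

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed model version

PROMPT_TEMPLATE = (
    "I have Aphasia. Give me the top 10 most common words that are "
    "related words to the word supplied in the prompt context. "
    "Your response should be a valid JSON array of just the words. "
    "No surrounding context. {word}"
)

def build_request_body(word: str) -> str:
    """Substitute the source word into the prompt and build the request body."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user",
                      "content": PROMPT_TEMPLATE.format(word=word)}],
    })

def related_words(word: str) -> list[str]:
    """Invoke the model; needs AWS credentials and boto3 at runtime."""
    import boto3  # imported here so the rest of the sketch runs without the SDK
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(modelId=MODEL_ID, body=build_request_body(word))
    payload = json.loads(resp["body"].read())
    # Claude returns a JSON array of words as text, per the prompt's instruction
    return json.loads(payload["content"][0]["text"])
```

Asking for "a valid JSON array of just the words" lets the Lambda function parse the response directly, with no free-text cleanup.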

After the model responds, the Lambda function bundles the related words and returns them to the mobile app. Upon receipt of this data, the WordFinder app updates and displays the new list of words for the user who has aphasia. The user might then find their word, or drill deeper into other related words.
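The bundling step can be sketched as an API Gateway proxy-style Lambda handler. This is a hypothetical shape, not the team's code: `fetch_related_words` is a stand-in for the Bedrock call, injected so the flow can be exercised locally with a stub.

```python
import json

# Sketch of the Lambda handler shaping the response for API Gateway
# (proxy integration format). `fetch_related_words` is a hypothetical
# stand-in for the Bedrock call, injected so the flow can run locally.

def handler(event, context=None, fetch_related_words=None):
    word = json.loads(event.get("body") or "{}").get("word", "")
    if not word:
        return {"statusCode": 400,
                "body": json.dumps({"error": "word is required"})}
    words = fetch_related_words(word) if fetch_related_words else []
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"word": word, "related": words}),
    }

# Local check with a stub standing in for the real model call:
stub = lambda w: ["tree", "eucalyptus", "fur", "marsupial"]
resp = handler({"body": json.dumps({"word": "Koala"})}, fetch_related_words=stub)
print(resp["statusCode"])  # 200
```

Returning the source word alongside the related words lets the app label the new list, so the user can keep drilling into further words from there.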

To maintain efficient resource utilization and cost optimization, the architecture incorporates several resource cleanup mechanisms:

Conclusion

The QARC team and Scott Harding worked closely with AWS to develop WordFinder, a mobile app that addresses communication challenges faced by individuals living with aphasia. Their winning entry at the 2023 AWS Queensland Hackathon showcased the power of involving those with lived experiences in the development process. Harding’s insights helped the tech team understand the nuances and impact of aphasia, leading to a solution that empowers users to find their words and stay connected.


About the Authors

Kori Ramijoo is a research speech pathologist at QARC. She has extensive experience in aphasia rehabilitation, technology, and neuroscience. Kori leads the Aphasia Tech Hub at QARC, enabling people with aphasia to access technology. She provides consultations to clinicians and provides advice and support to help people with aphasia gain and maintain independence. Kori is also researching design considerations for technology development and use by people with aphasia.

Scott Harding lives with aphasia after a stroke. He has a background in Engineering and Computer Science. Scott is one of the Directors of the Australian Aphasia Association and is a consumer representative and advisor on various state government health committees and nationally funded research projects. He has interests in the use of AI in developing predictive models of aphasia recovery.

Sonia Brownsett is a speech pathologist with extensive experience in neuroscience and technology. She has been a postdoctoral researcher at QARC and led the aphasia tech hub as well as a research program on the brain mechanisms underpinning aphasia recovery after stroke and in other populations including adults with brain tumours and epilepsy.

David Copland is a speech pathologist and Director of QARC. He has worked for over 20 years in the field of aphasia rehabilitation. His work seeks to develop new ways to understand, assess and treat aphasia including the use of brain imaging and technology. He has led the creation of comprehensive aphasia treatment programs that are being implemented into health services.

Mark Promnitz is a Senior Solutions Architect at Amazon Web Services, based in Australia. In addition to helping his enterprise customers leverage the capabilities of AWS, he can often be found talking about Software as a Service (SaaS), data and cloud-native architectures on AWS.

Kurt Sterzl is a Senior Solutions Architect at Amazon Web Services, based in Australia.  He enjoys working with public sector customers like UQ QARC to support their research breakthroughs.
