The GitHub Blog, January 31
4 steps to building a natural language search tool

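This article describes a natural language search tool built for United Nations resolutions. It addresses the difficulty of manually sifting through PDF files for information by turning a tedious archival process into an efficient, intuitive search experience. The tool uses Amazon Textract to extract text, MongoDB Atlas for the database, and Vue.js for the front end, and it is deployed on AWS. Users can retrieve relevant UN resolutions quickly through natural language queries, which greatly improves retrieval efficiency for UN resolutions and similar documents and offers a blueprint that other organizations can draw on.

📄 Amazon Textract extracts the text of UN Security Council and General Assembly resolutions, and regular expressions split that text into individual resolutions for indexing.

🗄️ A MongoDB Atlas database stores the parsed resolution text as embedding vectors, so the content is structured for fast search.

💻 An intuitive single-page front end built with Vue.js lets users enter natural language queries, such as "resolutions on humanitarian access in armed conflicts", and get results quickly.

🚀 The backend uses AWS Lambda and API Gateway for scalability and smooth performance, and the application is deployed on AWS Amplify for reliability and ease of access.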

“We have a problem. Our current search method for sifting through PDFs is extremely manual and time consuming. Is there an easier way?”

As a developer, this is one of those questions that really gets me excited. I was tasked with finding a way to transform a cumbersome, archival process into an efficient, intuitive search experience. It’s a way to make a group of people’s lives easier, and because of the organizations they work for, help them be more effective in providing humanitarian assistance to people in need around the world. I couldn’t imagine a better project to be working on.

Unlocking the United Nations’ legacy for rapid action

Since 1945, the United Nations has produced resolutions and other documents that guide international peace and security efforts. Yet accessing this wealth of knowledge remains a challenge, including for organizations such as the International Committee of the Red Cross (ICRC). Currently, delegates at ICRC’s permanent observer mission to the UN advise member states and other stakeholders on international humanitarian law and humanitarian issues. When states negotiate relevant resolutions and other UN products, leaning on pre-existing humanitarian language from UN resolutions can provide precedent. This often requires sifting through PDFs to find relevant content within documents, a time-intensive, manual process ill-suited to the fast-paced world of humanitarian diplomacy.

A live, accessible, and scalable search platform

To solve this, I built a single-page application (SPA) that enables users to input natural language queries and instantly retrieve relevant UN resolutions. The solution is live now at resolutions.projectrefuge.io and serves as a robust example of how technology can simplify access to critical information.

How it works

    Text extraction and structuring
    Using Amazon Textract, I extracted raw text from decades’ worth of UN Security Council Resolutions and Presidential Statements and six years of UN General Assembly Resolutions. A Go script then parsed this text using Regex matching, segmenting it into individual resolutions for easier indexing.
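    The segmentation step can be sketched roughly as follows. The post says the author used a Go script; this sketch uses Python for brevity and is not the author's code. The header pattern ("Resolution <number> (<year>)" on its own line), the function name `split_resolutions`, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed header format: "Resolution <number> (<year>)" alone on a line.
# The real documents may need a different or more tolerant pattern.
HEADER = re.compile(r"^Resolution (\d+) \((\d{4})\)[ \t]*$", re.MULTILINE)


def split_resolutions(raw_text):
    """Split one block of extracted text into per-resolution records."""
    matches = list(HEADER.finditer(raw_text))
    records = []
    for i, match in enumerate(matches):
        # A resolution's body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        records.append({
            "number": int(match.group(1)),
            "year": int(match.group(2)),
            "text": raw_text[match.end():end].strip(),
        })
    return records


# Invented sample text, not taken from any real resolution.
sample = """Resolution 1 (2001)
The Council calls for safe access.

Resolution 2 (2002)
The Council condemns attacks on civilians.
"""
records = split_resolutions(sample)
```

    Each record can then be indexed on its own, which is what makes per-resolution search results possible.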

    Search-ready database with MongoDB Atlas
    I adapted a Node.js script from MongoDB to upload the parsed resolutions as embeddings into a MongoDB Atlas database. This step ensures the content is structured for fast and relevant searches.
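    One plausible way to query embeddings stored in Atlas is the `$vectorSearch` aggregation stage. The post does not say which index, field names, or query stage the project uses, so the index name `resolution_embeddings`, the field names `embedding`, `title`, and `text`, and the helper `build_vector_search_pipeline` are hypothetical. The sketch only builds the pipeline and does not connect to a database.

```python
def build_vector_search_pipeline(query_vector, limit=5):
    """Return an aggregation pipeline for an Atlas Vector Search query."""
    return [
        {
            "$vectorSearch": {
                "index": "resolution_embeddings",  # hypothetical index name
                "path": "embedding",               # hypothetical vector field
                "queryVector": list(query_vector),
                "numCandidates": limit * 20,       # candidate pool wider than the final limit
                "limit": limit,
            }
        },
        {
            # Return readable fields plus the similarity score, not the raw vector.
            "$project": {
                "_id": 0,
                "title": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# The query vector would come from the same embedding model used at upload time.
pipeline = build_vector_search_pipeline([0.12, -0.03, 0.88], limit=3)
```

    With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`.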

    User interface built with Vue.js
    The front end is an intuitive SPA created with Vue.js. Users simply enter semantic search queries—such as “resolutions on humanitarian access in armed conflicts”—and receive results in seconds.
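    Semantic search of this kind typically ranks documents by how close their embedding is to the query's embedding, rather than by shared keywords. The sketch below illustrates that idea with cosine similarity. The three-dimensional vectors and document titles are invented for illustration (real embeddings have hundreds of dimensions or more), and the post does not state which similarity measure the project uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, docs, top_k=2):
    """Return the titles of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Toy vectors, invented for illustration only.
docs = {
    "humanitarian access": [0.9, 0.1, 0.0],
    "peacekeeping budget": [0.0, 0.2, 0.9],
    "protection of aid workers": [0.8, 0.3, 0.1],
}
top = rank([1.0, 0.2, 0.0], docs)
```

    Here the two humanitarian documents outrank the budget document even though the ranking never compares words directly.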

    Backend hosted on AWS
    The backend relies on AWS Lambda and API Gateway, ensuring scalability and seamless performance. The entire application is hosted as a subdomain on AWS Amplify, combining reliability with ease of access.
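    A Lambda function behind an API Gateway proxy integration might look roughly like the sketch below. The post does not say which runtime the backend uses, so Python is an assumption here, as are the query parameter name `q` and the helper `search_resolutions`, which returns canned data in place of a real database lookup.

```python
import json


def search_resolutions(query, limit):
    """Hypothetical stand-in for the database lookup; returns canned results."""
    return [{"title": "Example resolution", "query": query}][:limit]


def handler(event, context):
    """Entry point for an API Gateway proxy integration."""
    # API Gateway sets queryStringParameters to None when no parameters are sent.
    params = event.get("queryStringParameters") or {}
    query = (params.get("q") or "").strip()
    if not query:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "missing query parameter 'q'"}),
        }
    results = search_resolutions(query, limit=5)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"results": results}),
    }


# Local invocation with hand-built events; no AWS account is needed to try it.
ok = handler({"queryStringParameters": {"q": "humanitarian access"}}, None)
bad = handler({"queryStringParameters": None}, None)
```

    Keeping the lookup behind one small function makes the handler easy to exercise locally before wiring it to a real database.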

This code is publicly available at projectrefuge/resolutions-search-template. This initiative will empower other organizations to adapt and expand the solution to their unique needs.

Broader implications: a blueprint for impact

The implications of this project go far beyond the ICRC’s use case with UN Resolutions. With slight modifications, the tool could index and search any collection of legal and policy documents. This approach is a blueprint for organizations aiming to leverage technology for better decision-making and more effective action. For nonprofits, this demonstrates the power of owning your code and building tailored solutions. For developers, it’s a reminder of how open source can accelerate progress in humanitarian and public policy sectors.

Build together with open source

Projects like resolutions.projectrefuge.io highlight the potential of open source to transform how we access and use information. If you’re a nonprofit, explore GitHub for Nonprofits to discover tools and resources that can help you build your own solutions. Developers eager to contribute to impactful work can browse the For Good First Issue program to find projects that align with their skills and values.

Finally, stay tuned as we work to identify other opportunities with humanitarian actors such as the ICRC to bridge the technology and humanitarian space. Together, we can build a future where knowledge is more accessible and tools are built with collaboration in mind, ensuring that humanitarian efforts are supported by cutting-edge technology.

Let’s code for good—and make a lasting impact.

If you’d like to lend your developer skills for good, check out For Good First Issue, a curated platform of open source projects that contribute to a better future for everyone.

The post 4 steps to building a natural language search tool appeared first on The GitHub Blog.
