The GitHub Blog 前天 00:08
GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

GitHub Issues 搜索功能迎来重大升级,引入了逻辑 AND/OR 运算符和嵌套括号,允许用户构建更精确的查询。为了实现这一功能,团队克服了诸多挑战,包括保持与现有搜索的向后兼容性、在高查询量下维持性能,以及设计用户友好的嵌套搜索体验。新功能通过抽象语法树(AST)解析查询,并转换为 Elasticsearch 查询,从而实现更灵活的搜索。此更新旨在满足开发者社区长期以来对更强大 Issue 搜索功能的需求,提升问题追踪和管理效率。

✨GitHub Issues搜索现在支持跨所有字段使用逻辑AND/OR运算符构建查询,并支持嵌套查询条件。例如,`is:issue state:open author:rileybroughten (type:Bug OR type:Epic)` 可以查找所有打开的、由 rileybroughten 编写的、类型为 Bug 或 Epic 的 issue。

⚙️为了实现这一功能,团队用新的搜索模块(ConditionalIssuesQuery)替换了现有的搜索模块(IssuesQuery),该模块能够处理嵌套查询,同时继续支持现有的查询格式。这涉及到重写 IssueQuery,该模块负责解析查询字符串并将其映射到 Elasticsearch 查询。

🌳新的解析方法采用抽象语法树(AST)。由于嵌套查询可能是递归的,因此将搜索字符串解析为列表已不再足够。团队使用解析库 parslet 将用户的搜索字符串解析为抽象语法树 (AST)。

🔍新的查询生成方法采用递归 AST 遍历来生成 Elasticsearch 布尔查询。解析过程中生成的嵌套结构和布尔运算符可以很好地映射到 Elasticsearch 的布尔查询,其中 AND、OR 和 NOT 运算符分别映射到 must、should 和 should_not 子句。

Originally, Issues search was limited by a simple, flat structure of queries. But with advanced search syntax, you can now construct searches using logical AND/OR operators and nested parentheses, pinpointing the exact set of issues you care about.

Building this feature presented significant challenges: ensuring backward compatibility with existing searches, maintaining performance under high query volume, and crafting a user-friendly experience for nested searches. We’re excited to take you behind the scenes to share how we took this long-requested feature from idea to production.

Here’s what you can do with the new syntax and how it works behind the scenes

Issues search now supports building queries with logical AND/OR operators across all fields, with the ability to nest query terms. For example is:issue state:open author:rileybroughten (type:Bug OR type:Epic) finds all issues that are open AND were authored by rileybroughten AND are either of type bug or epic.

How did we get here?

Previously, as mentioned, Issues search only supported a flat list of query fields and terms, which were implicitly joined by a logical AND. For example, the query assignee:@me label:support new-project translated to “give me all issues that are assigned to me AND have the label support AND contain the text new-project.

But the developer community has been asking for more flexibility in issue search, repeatedly, for nearly a decade now. They wanted to be able to find all issues that had either the label support or the label question, using the query label:support OR label:question. So, we shipped an enhancement towards this request in 2021, when we enabled an OR style search using a comma-separated list of values.

However, they still wanted the flexibility to search this way across all issue fields, and not just the labels field. So we got to work. 

Technical architecture and implementation

From an architectural perspective, we swapped out the existing search module for Issues (IssuesQuery), with a new search module (ConditionalIssuesQuery), that was capable of handling nested queries while continuing to support existing query formats.

This involved rewriting IssueQuery, the search module that parsed query strings and mapped them into Elasticsearch queries.

To build a new search module, we first needed to understand the existing search module, and how a single search query flowed through the system. At a high level, when a user performs a search, there are three stages in its execution:

    Parse: Breaking the user input string into a structure that is easier to process (like a list or a tree)Query: Transforming the parsed structure into an Elasticsearch query document, and making a query against Elasticsearch.Normalize: Mapping the results obtained from Elasticsearch (JSON) into Ruby objects for easy access and pruning the results to remove records that had since been removed from the database.

Each stage presented its own challenges, which we’ll explore in more detail below. The Normalize step remained unchanged during the re-write, so we won’t dive into that one.

Parse stage

The user input string (the search phrase) is first parsed into an intermediate structure. The search phrase could include:

 Example search phrase: 

The old parsing method: flat list

When only flat, simple queries were supported, it was sufficient to parse the user’s search string into a list of search terms and filters, which would then be passed along to the next stage of the search process.

The new parsing method: abstract syntax tree

As nested queries may be recursive, parsing the search string into a list was no longer sufficient. We changed this component to parse the user’s search string into an Abstract Syntax Tree (AST) using the parsing library parslet.

We defined a grammar (a PEG or Parsing Expression Grammar) to represent the structure of a search string. The grammar supports both the existing query syntax and the new nested query syntax, to allow for backward compatibility.

A simplified grammar for a boolean expression described by a PEG grammar for the parslet parser is shown below:

class Parser < Parslet::Parser  rule(:space)  { match[" "].repeat(1) }  rule(:space?) { space.maybe }  rule(:lparen) { str("(") >> space? }  rule(:rparen) { str(")") >> space? }  rule(:and_operator) { str("and") >> space? }  rule(:or_operator)  { str("or")  >> space? }  rule(:var) { str("var") >> match["0-9"].repeat(1).as(:var) >> space? }  # The primary rule deals with parentheses.  rule(:primary) { lparen >> or_operation >> rparen | var }  # Note that following rules are both right-recursive.  rule(:and_operation) {     (primary.as(:left) >> and_operator >>       and_operation.as(:right)).as(:and) |     primary }      rule(:or_operation)  {     (and_operation.as(:left) >> or_operator >>       or_operation.as(:right)).as(:or) |     and_operation }  # We start at the lowest precedence rule.  root(:or_operation)end

For example, this user search string:
is:issue AND (author:deborah-digges OR author:monalisa ) 
would be parsed into the following AST:

{  "root": {    "and": {      "left": {        "filter_term": {          "attribute": "is",          "value": [            {              "filter_value": "issue"            }          ]        }      },      "right": {        "or": {          "left": {            "filter_term": {              "attribute": "author",              "value": [                {                  "filter_value": "deborah-digges"                }              ]            }          },          "right": {            "filter_term": {              "attribute": "author",              "value": [                {                  "filter_value": "monalisa"                }              ]            }          }        }      }    }  }}

Query

Once the query is parsed into an intermediate structure, the next steps are to:

    Transform this intermediate structure into a query document that Elasticsearch understandsExecute the query against Elasticsearch to obtain results

Executing the query in step 2 remained the same between the old and new systems, so let’s only go over the differences in building the query document below.

The old query generation: linear mapping of filter terms using filter classes

Each filter term (Ex: label:documentation) has a class that knows how to convert it into a snippet of an Elasticsearch query document. During query document generation, the correct class for each filter term is invoked to construct the overall query document.

The new query generation: recursive AST traversal to generate Elasticsearch bool query

We recursively traversed the AST generated during parsing to build an equivalent Elasticsearch query document. The nested structure and boolean operators map nicely to Elasticsearch’s boolean query with the AND, OR, and NOT operators mapping to the must, should, and should_not clauses.

We re-used the building blocks for the smaller pieces of query generation to recursively construct a nested query document during the tree traversal.

Continuing from the example in the parsing stage, the AST would be transformed into a query document that looked like this:

{  "query": {    "bool": {      "must": [        {          "bool": {            "must": [              {                "bool": {                  "must": {                    "prefix": {                      "_index": "issues"                    }                  }                }              },              {                "bool": {                  "should": {                    "terms": {                      "author_id": [                        "<DEBORAH_DIGGES_AUTHOR_ID>",                        "<MONALISA_AUTHOR_ID>"                      ]                    }                  }                }              }            ]          }        }      ]    }    // SOME TERMS OMITTED FOR BREVITY  }}

With this new query document, we execute a search against Elasticsearch. This search now supports logical AND/OR operators and parentheses to search for issues in a more fine-grained manner.

Considerations

Issues is one of the oldest and most heavily -used features on GitHub. Changing core functionality like Issues search, a feature with an average of  nearly 2000 queries per second (QPS)—that’s almost 160M queries a day!—presented a number of challenges to overcome.

Ensuring backward compatibility

Issue searches are often bookmarked, shared among users, and linked in documents, making them important artifacts for developers and teams. Therefore, we wanted to introduce this new capability for nested search queries without breaking existing queries for users. 

We validated the new search system before it even reached users by:

Preventing performance degradation

We expected more complex nested queries to use more resources on the backend than simpler queries, so we needed to establish a realistic baseline for nested queries, while ensuring no regression in the performance of existing, simpler ones.

For 1% of Issue searches, we ran equivalent queries against both the existing and the new search systems. We used scientist, GitHub’s open source Ruby library, for carefully refactoring critical paths, to compare the performance of equivalent queries to ensure that there was no regression.

Preserving user experience

We didn’t want users to have a worse experience than before just because more complex searches were possible

We collaborated closely with product and design teams to ensure usability didn’t decrease as we added this feature by:

Minimizing risk to existing users

For a feature that is used by millions of users a day, we needed to be intentional about rolling it out in a way that minimized risk to users.

We built confidence in our system by:

And there you have it, that’s how we built, validated, and shipped the new and improved Issues search!

Feedback

Want to try out this exciting new functionality? Head to our docs to learn about how to use boolean operators and parentheses to search for the issues you care about!

If you have any feedback for this feature, please drop us a note on our community discussions.

Acknowledgements

Special thanks to AJ Schuster, Riley Broughten, Stephanie Goldstein, Eric Jorgensen Mike Melanson and Laura Lindeman for the feedback on several iterations of this blog post!

The post GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it appeared first on The GitHub Blog.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

GitHub Issues 高级搜索 Elasticsearch 抽象语法树
相关文章