Blog on Text Analytics - Provalis Research, November 27, 2024
Lowering the OOOM Impact with Text Analytics by John Ford

 

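This article describes how the author, working on a large public-sector survey, used text analytics to handle employees' automatic out-of-office replies (OOOMs) more efficiently and so calculate the survey response rate more accurately. Using the WordStat software, the author analyzed key words and phrases in the OOOMs and sorted the messages into two categories, temporary absence and permanent absence, so that permanently absent staff could be excluded and the accuracy of the survey response rate improved. The method reduced the manual review workload and also offers ideas for other survey research, such as paying attention to the themes that appear in OOOMs and adapting them to each survey's context.

🤔 **Background:** In a large public-sector survey, the many automatic out-of-office replies (OOOMs) from employees complicated the calculation of the survey response rate, because temporary absences had to be distinguished from permanent ones.

🔍 **Method:** WordStat was used to analyze the OOOMs, identify frequently occurring words and phrases, and assign them to Temporary Absence and Permanent Absence categories, producing an OOOM dictionary.

📊 **Results:** The text analysis identified 710 OOOMs as permanent absences. Removing them raised the survey response rate by 0.3%, and manual review was needed for only 13% of the OOOMs.

💡 **Takeaways:** Themes found in the OOOMs (for example retirement, accepting a new job, or medical leave) are likely to appear in other surveys as well; researchers can borrow these themes and adapt them to their own circumstances.

🗓️ **Other factors:** Survey timing, organizational culture, the economic environment, and similar factors all shape the language of OOOMs, so researchers need to watch for them and adjust accordingly.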

John Ford is a public sector research psychologist who has authored numerous reports and articles about assessment, training, and other personnel management issues. He wrote this guest blog about a recent challenge: how to handle out-of-office messages more efficiently when tabulating a survey response rate. In it, he describes the techniques he used. The process may help you in your research or spark ideas for solving similar problems. If you have a technique, experience, or revelation to share in a blog, we would like to post it.

I work for a public-sector organization that conducts periodic large-scale surveys of government employees. We recently had a behind-the-scenes task: adjusting the survey refusal rate to account for people who never saw the invitation email. Text analytics helped us accomplish this task more efficiently than a low- or no-tech process would have. We summarize our approach below in the hope that the insights will be of value to other survey researchers.

The Problem

Initial email invitations and reminders to participate in a recent web-based survey resulted in numerous Out of Office messages (OOOMs) from individuals in our survey sample.  Email account owners can create these messages and toggle their systems to return one to each sender of an incoming email.  OOOMs differ from the bounced email notifications a server sends when an email address is invalid.  Bounced emails are clear indications a potential survey participant has not seen the email—and did not actually refuse to take the survey.

OOOMs can be harder to classify.  They are created for a variety of reasons that may or may not mean users have seen a survey invitation.  Fortunately, the focused content of OOOMs makes them easier to analyze than more free-form text.  OOOMs are essentially brief form letters intended to quickly and clearly communicate a specific message.

Our task was to distinguish OOOMs indicating that an employee’s absence was Temporary from those indicating a long-term or Permanent absence. Reasons for temporary absences are often unspecified, but may include short-term illness, leave (vacation), and off-site meetings. Reasons for permanent absences include resignation, termination, retirement, surgery and recuperation, and military deployment. Accurate classification into these two categories allows survey researchers to subtract the permanent-absence count when computing the survey refusal rate, and so to determine the survey’s response rate more accurately.

The Approach

The OOOM Outlook database was imported into the WordStat software from Provalis Research (Peladeau, 2017b).  WordStat was configured to run without any previously constructed categorization dictionary, without an exclusion dictionary to remove unwanted “stop” words, and with lemmatization and stemming turned off so there would be no reduction of plurals and other word morphology to standard root forms of each word.  This project required close examination of the unmodified language used by the OOOM authors.
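WordStat is configured through its own interface, so the snippet below is not WordStat code. It is a minimal Python sketch, assuming simple regex tokenization, of what “no stop-word list, no lemmatization, no stemming” means in practice: every surface form is counted as written, so function words like “i” and “am” survive (they are needed for phrases such as i_am_no_longer_with), and “leave”, “leaving”, and “left” stay separate. The names `tokenize` and `word_frequencies` are illustrative, not part of any library.

```python
import re
from collections import Counter

# Lowercase word tokens; an internal apostrophe ("don't") stays in the token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Split a message into lowercase word tokens.

    No stop words are removed and no stemming or lemmatization is applied,
    so "i", "am", "leave", "leaving", and "left" all survive as written.
    """
    return TOKEN_RE.findall(text.lower())


def word_frequencies(messages: list[str]) -> Counter:
    """Count raw (unreduced) word forms across a collection of OOOMs."""
    counts: Counter = Counter()
    for msg in messages:
        counts.update(tokenize(msg))
    return counts


if __name__ == "__main__":
    sample = [  # hypothetical messages
        "I am out of the office on leave until Monday.",
        "I am leaving my position; Friday is my last day.",
        "I have left the agency.",
    ]
    freq = word_frequencies(sample)
    print(freq["leave"], freq["leaving"], freq["left"])  # prints: 1 1 1
```

With a stemmer or lemmatizer in the pipeline, those three counts would typically collapse into a single entry, which is exactly the information this project needed to keep.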

Identify terms. WordStat identified the frequently occurring words and phrases in the OOOM collection. For each term, we examined all occurrences using WordStat’s concordance (Key-Word-in-Context) feature, which displays each term together with the language around it. This makes it possible to quickly determine what terms mean and whether they can be used to classify. Terms that were clear and consistent indicators of either temporary or permanent absences were added to the appropriate Temporary Absence or Permanent Absence category in an OOOMs dictionary created for this project.
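The term-identification step combines two operations: counting frequent word sequences and reading each one in context. The sketch below approximates both in plain Python; it is a stand-in under stated assumptions, not a description of how WordStat’s phrase extraction or Key-Word-in-Context view actually work. Phrases are joined with underscores to match the notation in the theme tables, and the helper names and sample messages are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens with no stop-word removal or stemming."""
    return TOKEN_RE.findall(text.lower())


def phrase_frequencies(messages: list[str], n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every n-gram of length n_min..n_max, joined with underscores."""
    counts: Counter = Counter()
    for msg in messages:
        toks = tokenize(msg)
        for n in range(n_min, n_max + 1):
            for i in range(len(toks) - n + 1):
                counts["_".join(toks[i:i + n])] += 1
    return counts


def kwic(messages: list[str], term: str, width: int = 4) -> list[tuple[str, str, str]]:
    """Key-Word-in-Context: (left context, term, right context) for each hit.

    `term` may be a single word or an underscore-joined phrase; up to
    `width` tokens of context are kept on each side.
    """
    target = term.lower().split("_")
    n = len(target)
    hits = []
    for msg in messages:
        toks = tokenize(msg)
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == target:
                left = " ".join(toks[max(0, i - width):i])
                right = " ".join(toks[i + n:i + n + width])
                hits.append((left, " ".join(target), right))
    return hits


if __name__ == "__main__":
    sample = [  # hypothetical messages
        "I am out of the office and will return Monday.",
        "I am no longer with the agency. Please contact HR.",
        "I am on extended leave and will not return to the office.",
    ]
    print(phrase_frequencies(sample).most_common(5))
    for left, kw, right in kwic(sample, "return", width=3):
        print(f"{left:>20} [{kw}] {right}")
```

The two “return” lines illustrate why the context check matters: “will return” and “will not return” share a keyword but point to different categories.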

This process resulted in 957 terms (words and phrases) associated with Temporary Absence and 72 terms associated with Permanent Absence. Selected Temporary Absence and Permanent Absence themes are described in the two tables below. The difference in the number of themes between the two categories likely reflects both the greater number of temporary OOOMs in this data set and the more direct language of permanent OOOMs.

Classify OOOMs.  This two-category WordStat dictionary was used to classify each OOOM into one of the two absence categories.  OOOMs with either no dictionary hits or hits in both categories were classified manually.  Only 3,059 (13%) OOOMs needed some form of additional manual review, significantly reducing the effort required to classify the full set of OOOMs.
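The routing rule above (a hit in exactly one category classifies the message; no hits, or hits in both, send it to manual review) can be sketched as follows. The dictionary here holds only a handful of example terms taken from the theme tables, the regex-based phrase matching is an assumption rather than WordStat’s matching engine, and `OOOM_DICTIONARY` and `classify` are hypothetical names.

```python
import re

# Hypothetical excerpt of a two-category dictionary; every term is one of the
# example terms from the theme tables. Underscores mark multi-word phrases.
OOOM_DICTIONARY = {
    "TEMPORARY": ["absence", "acting_director", "back_in_the_office",
                  "expect_to_return", "limited_access", "periodically"],
    "PERMANENT": ["am_retired", "retired_effective", "i_have_left",
                  "my_last_day", "no_longer_working", "extended_leave",
                  "medical_leave", "indefinitely"],
}


def compile_term(term: str) -> re.Pattern:
    """Match a term as whole words, allowing any whitespace between words."""
    words = [re.escape(w) for w in term.split("_")]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {cat: [compile_term(t) for t in terms]
            for cat, terms in OOOM_DICTIONARY.items()}


def classify(message: str) -> str:
    """Return "TEMPORARY", "PERMANENT", or "MANUAL_REVIEW".

    A message is classified automatically only when it hits exactly one
    category; no hits, or hits in both categories, go to manual review.
    """
    hits = {cat for cat, pats in PATTERNS.items()
            if any(p.search(message) for p in pats)}
    return hits.pop() if len(hits) == 1 else "MANUAL_REVIEW"


if __name__ == "__main__":
    for msg in [  # hypothetical messages
        "I will be back in the office on Monday.",
        "I retired effective June 30.",
        "I am on medical leave; I expect to return in May.",
        "Thank you for your email.",
    ]:
        print(f"{classify(msg):14} {msg}")
```

The third sample shows why mixed hits go to a person: “medical leave” and “expect to return” pull in opposite directions.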

 

Temporary Absence Themes and Example Terms

1. The term absence is consistently used for a finite period that will end with the employee’s return. Example terms: absence, absence_from_the_office
2. Access to, or checking of, email is mentioned only if the employee has an ongoing responsibility to respond on behalf of the employing organization. Example terms: access_during_this_time, access_to_emails, access_to_my_email, checking_emails, email_access
3. Someone is identified as temporarily acting in the employee’s position only if the employee will return to resume job responsibilities. Example terms: act_on_my_behalf, acting_director
4. The employee’s schedule is specified below the day level only when short periods of temporary absence are being described. Example terms: afternoon, end_of_the_day, evenings, monday_morning, rest_of_the_day, thursday_afternoon
5. Some terms either directly reference or indirectly imply an eventual return to the office. Example terms: away_from_the_office_and_will_return, am_back_in_the_office, expect_to_return
6. Emphasis on what the employee is currently or presently doing implies a condition that will change in the near future. Example terms: currently, time frame, am_not_in_the_office_at_this_time, am_presently_out_of_the_office
7. References to the employee being in or returning to the office indicate that this will happen soon. Example terms: await_my_return, back_in_the_office, back_monday, returning, plan_to_return_to_the_office
8. Instructions about what to do or whom to contact in an emergency imply that the normal procedure is to wait for the employee’s expected return. Example terms: emergency_assistance, immediate, immediate_concerns, immediate_help, pressing_matter, urgent
9. Terms indicating that something (usually checking messages) will occur occasionally over a period of time signal ongoing responsibility and an eventual return. Example terms: infrequent, infrequently, intermittently, limited_access, mail_during_this_time, periodically, occasionally, regularly_checking, sporadically, temporarily
10. Direct references to leave or vacation indicate a temporary absence. Example terms: leave_beginning_monday, leave_from_friday, leave_the_week_of_august
11. References to a holiday indicate short-term leave. Example terms: labor_day

 

Permanent Absence Themes and Example Terms

1. The employee has retired. Example terms: retired_effective, am_retired, retiring
2. The employee is no longer working there; a reason may or may not be given. Example terms: accepted_a_position, i_have_left, i_am_no_longer_with, leaving_my_position, my_last_day, no_longer_working
3. The employee is on an outside work assignment for an extended period. Example terms: disaster_deployment, extended_deployment, i_am_currently_on_rotation
4. The employee is on leave for an extended period. Example terms: am_out_of_the_office_on_maternity_leave, extended_leave, indefinitely, medical_leave, post_surgery_recovery
5. A direct indication that email will not be seen. Yes, a few times it was really this simple. Example terms: can_no_longer_be_reached_through_this_email

 

The Outcome

Our text-analytics-enhanced review identified 710 of the OOOMs as permanent absences. This number was removed from the refusal rate for the survey, improving the rate by 0.3%. While not a large gain, the increase in reporting accuracy did contribute to the project. The 12 hours spent on this task were a better investment than the many more hours of low-tech review that would have been needed to achieve the same result, perhaps with less accuracy.
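The post does not give the sample size or the exact formula, so the following is only a sketch of the arithmetic under two stated assumptions: every eligible non-respondent is counted as a refusal, and employees identified as permanently absent are dropped from the eligible sample, so they count neither as respondents nor as refusals. The function name `adjusted_rates` is hypothetical, and apart from the 710 all figures in the usage example are hypothetical rather than the survey’s.

```python
def adjusted_rates(sample_size: int, completes: int, unreachable: int = 0) -> tuple[float, float]:
    """Return (response_rate, refusal_rate) as fractions.

    Assumption: `unreachable` sample members (here, OOOMs classified as
    permanent absences) are removed from the eligible sample entirely, and
    every remaining non-respondent is treated as a refusal.
    """
    eligible = sample_size - unreachable
    if not 0 <= completes <= eligible:
        raise ValueError("completes must be between 0 and the eligible sample size")
    response = completes / eligible
    refusal = (eligible - completes) / eligible
    return response, refusal


if __name__ == "__main__":
    # 710 is the post's permanent-absence count; the sample size and number
    # of completed surveys are hypothetical.
    before = adjusted_rates(60_000, 24_000)
    after = adjusted_rates(60_000, 24_000, unreachable=710)
    print(f"before: response {before[0]:.2%}, refusal {before[1]:.2%}")
    print(f"after:  response {after[0]:.2%}, refusal {after[1]:.2%}")
```

Because the adjustment only shrinks the denominator, its effect is small whenever the permanent-absence count is small relative to the sample, which is consistent with the modest gain reported above.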

Some Thoughts

While the processes used in this project were appropriate for our behind-the-scenes classification task, the classification dictionary could have been developed further to classify more than 87% of the OOOMs automatically. This was not a priority: reducing manual review to 13% of the OOOMs was a sufficient outcome, and it was not clear that further development would have taken less time than reviewing the remaining OOOMs. Had development continued, rules and word patterns would likely have been the next step.

The dictionary itself is unlikely to be directly useful to other survey researchers.  While OOOMs are similar across organizational settings, there is variability in the specifics of OOOM language.  For example, language in this sample was noticeably influenced by the government work context, the time of year the survey was fielded, and by the military culture in some parts of the surveyed workforce.

Themes.  What may be more useful to researchers adopting this approach are the Temporary and Permanent category themes identified in the two tables above.  Sets of terms associated with retirement, accepting another job, and medical absences are likely to be similar in other contexts.  Terms associated with absence, periodic message checking, transfer of authority, and vacation may differ somewhat, but these themes seem likely to be present in other collections of OOOMs.

Researchers should also watch for additional themes to emerge or increase in importance in other survey contexts.  Different timing of this survey, for example, would likely have resulted in a different set of holiday terms and required a somewhat different strategy for interpreting them.  An impending budget-driven government shutdown or other economic concerns would likely produce additional absence-related themes in workforce OOOMs.  The themes from this project should be considered a useful guide, rather than a complete map of the term space for similar future projects.

Leave. A few armchair-linguistics observations seem appropriate. The most interesting term in the OOOM collection is leave, along with its variations in form and context. Unlike absence, which reliably indicates a short period out of the office, leave takes its meaning from context. On its own, it can indicate a short-term absence. But it can indicate a long-term absence if it is “extended” or part of phrases like “I am leaving” or “I have left.” This highlights the importance of turning off lemmatization and other word transformations for this type of text mining task, so that differences in word form that signal important differences in meaning are captured accurately.
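The context dependence of leave is the kind of case the rules and word patterns mentioned under Some Thoughts would need to handle. Below is a minimal, hypothetical rule set written as Python regular expressions (not WordStat’s own rule syntax) that checks the long-term forms first and falls back to a short-term reading of the bare word leave. It only works because surface forms are kept: after lemmatization, “I have left” and “on leave” would reduce to the same base form and could no longer be told apart.

```python
import re
from typing import Optional

# Hypothetical rules for the word "leave"; the first matching rule wins, so
# the long-term patterns are checked before the bare-word fallback.
LEAVE_RULES = [
    # "I am leaving", "I have left", "leaving my position"
    ("PERMANENT", re.compile(
        r"\b(?:i\s+am\s+leaving|i\s+have\s+left|leaving\s+my\s+position)\b",
        re.IGNORECASE)),
    # "extended", "maternity", or "medical" immediately before "leave"
    ("PERMANENT", re.compile(r"\b(?:extended|maternity|medical)\s+leave\b",
                             re.IGNORECASE)),
    # Any other occurrence of the exact word "leave" reads as a short-term absence
    ("TEMPORARY", re.compile(r"\bleave\b", re.IGNORECASE)),
]


def classify_leave(message: str) -> Optional[str]:
    """Return the category of the first matching rule, or None if none match."""
    for category, pattern in LEAVE_RULES:
        if pattern.search(message):
            return category
    return None


if __name__ == "__main__":
    for msg in [  # hypothetical messages
        "I am on leave until Friday.",
        "I am on extended leave.",
        "I have left the agency.",
        "I will be leaving early today.",
    ]:
        print(classify_leave(msg), "-", msg)
```

The rules are deliberately crude: the fallback would also fire on “please leave a message”, and “I am leaving the office at 3” would be read as permanent. Those are exactly the false positives the Key-Word-in-Context review is meant to catch before a rule is trusted.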

The leave example also reinforces Tom Reamy’s (2017) repeated emphasis on the importance of context in text analytics and Normand Peladeau’s (2017b) recommendation of phrase analysis as a key component of WordStat text mining projects. This analysis also found many straightforward, word-level classification features. But the words by themselves aren’t everything: we must be crafty feature engineers (Zheng, 2017) beyond the word level to harvest full value from our deep oceans of seemingly unfathomable text.

John Ford is a public sector research psychologist. He can be reached by email at johnford514@yahoo.com.

References

Peladeau, N. (2017a). How to build categorization dictionaries with WordStat [Webinar]. Retrieved 6/12/2017 from https://provalisresearch.com/resources/tutorials/webinar-content-analysis-text-mining/

Peladeau, N. (2017b). WordStat 7.1.17 [Computer software]. Retrieved 6/12/2017 from https://provalisresearch.com/Download/wordstat.php

Reamy, T. (2017). Deep text: Using text analytics to conquer information overload, get real value from social media, and add big text to big data. Medford, NJ: Information Today, Inc.

Zheng, A. (2017). Mastering feature engineering: Principles and techniques for data scientists. Sebastopol, CA: O’Reilly Media.
