Artificial Fintelligence, January 17
How to hire ML engineers/researchers

This article takes a deep look at how to hire machine learning (ML) engineers and researchers effectively. It argues that attracting high-quality candidates is largely a marketing and branding problem, while the key to the interview process itself is running tests that resemble the actual work. The article recommends using problems encountered on the job as interview questions, such as debugging an ML issue or evaluating a model. It also stresses being explicit about what the company expects from researchers, for example that publishing papers is not a priority and that work is driven by product and business needs. It further advises being upfront with candidates about the company's downsides early on, to avoid the cost of a mismatch later. Finally, it distinguishes research engineers from research scientists and highlights how the two roles differ in required skills and culture fit.

🛠️ Use work sample tests: Interviews should be as close to the actual job as possible; avoid Leetcode-style questions that are disconnected from the work. Use problems that have come up in the team's work, such as debugging an ML issue, or ask candidates to find common bugs planted in code.

🧐 Focus on evaluation skills: Probe the candidate's understanding of and experience with model evaluation, and the problems that can come up when evaluating. Allow long silences, observe how the candidate thinks and reacts, and let them do most of the talking.

⚠️ Be explicit about required skills: Don't assume candidates have every "obvious" skill, such as version control. If a skill isn't actually required, don't screen for it, or you will needlessly shrink the candidate pool. For example, rather than over-emphasizing RL skills, screen for general ML expertise.

📢 Emphasize the business focus: Early in the process, tell candidates explicitly that the company is driven by product and business needs and that publishing papers is not a priority. Be upfront about the company's downsides as early as possible to avoid the cost of a later mismatch and to ensure hires fit what the company actually needs.

👩‍🔬 Distinguish research engineers from research scientists: Research engineers need to be able to implement research in large codebases, while research scientists are judged more on choosing the right problems and on culture fit. For research scientists, coding ability is less important than the ability to pick the right research direction.

I’m going to assume that you’ve figured out how to find candidates who appear great on paper and your only problem is figuring out which of them to hire. Getting high quality candidates is more of a marketing/brand/sales exercise, which I don’t have that much experience with. Getting high quality candidates to apply is a non-trivial problem in the current market, particularly if you are trying to hire anyone with more than ~3 years of experience, but, nonetheless, it is beyond the scope of this article. I’m going to discuss how you should run interview processes for ML engineers/researchers.

Before I begin, a request: I’m writing an article about human data, so if you manage/use the results of a human labelling pipeline, or use signals from your users for model training/evaluation, please get in touch.

When discussing roles, I’m going to use the DeepMind classification, which has three main technical roles and a common experience ladder:

    Software engineer (SWE), which is a standard software engineer that isn’t required to know ML or research (although it is, of course, an advantage).

    Research engineer (RE), which is basically everyone who isn’t a SWE or an RS. Most companies in the LLM era that are hiring “researchers” who are expected to be able to code their ideas in large codebases are looking for REs. Their duties can run the gamut from managing experiments, to optimizing code, to doing novel research.

    Research scientist (RS), which is someone with a PhD whose success is judged entirely on their publication record. This job is not dissimilar from being a postdoc. Some RSs spend very little time coding, and some are better coders than most REs. The key differentiating factor between an RE and an RS is that an RS typically has weaker coding skills and spends more of their time thinking about what to work on next.

The hiring process for all of these roles is broadly similar. To hire for any of them (or any role in general!) you should be running work sample tests for the specific tasks you expect each of these candidates to be able to do, while maintaining a consistently high standard in your evaluation. The hardest part, by far, about running a good hiring process is getting buy-in from the rest of your organization to continue to run a rigorous process and to maintain a high bar. Often you have an immediate need to hire to meet some goal (if you don’t, you probably shouldn’t be hiring), so it’s always tempting to relax the bar slightly. Don’t. If you do, you’ll wake up 18 months later with a mediocre team.

I’m going to focus mostly on interviewing candidates who fall in the RE bucket, as that’s what most organizations in the product-driven research era need. These are candidates who can implement all of their ideas and have the technical expertise to run large-scale experiments by themselves.


Work sample tests

You want your interviews to be as close to the job as possible. I dislike Leetcode questions for this reason. They can have their place, as they’re generally a good way to screen for competency/conscientiousness, but they tend not to work as well with researchers/ML engineers as they spend less time preparing for Leetcode.

An approach I like is to take problems that have come up during your team’s work and turn those into tasks. One that I like is debugging a real world ML problem. A question that I have used in the past is “I have a new idea to make our models better. I implement it. It doesn’t work. What should I do?” This is a common problem that happens at work all the time! I try something, it doesn’t work, and I grab a coworker to talk to them about it. Another variant is to take a script that works, and add a bunch of common bugs to it to see if they can find them, ideally bugs that have happened as part of your work.
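To make the “add common bugs to a working script” exercise concrete, here is a minimal sketch of what the interview artifact could look like, assuming PyTorch; the toy model, data, and the specific planted bugs are my own illustrative choices, not taken from any particular team’s work:

```python
# A small training script with several deliberately planted, common ML bugs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy classification data: 1,000 examples, 20 features, 3 classes.
X = torch.randn(1000, 20)
y = torch.randint(0, 3, (1000,))
train_ds = TensorDataset(X[:800], y[:800])
val_ds = TensorDataset(X[800:], y[800:])

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Bug 1: shuffle is off for the training loader, so every epoch sees identical batches.
train_loader = DataLoader(train_ds, batch_size=32, shuffle=False)
val_loader = DataLoader(val_ds, batch_size=32)

for epoch in range(5):
    model.train()
    for xb, yb in train_loader:
        # Bug 2: gradients are never zeroed, so they accumulate across steps.
        logits = model(xb)
        # Bug 3: softmax is applied before CrossEntropyLoss, which already expects raw logits.
        loss = criterion(torch.softmax(logits, dim=-1), yb)
        loss.backward()
        optimizer.step()

    # Bug 4: the model is left in train mode for evaluation, so dropout stays active.
    correct = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=-1) == yb).sum().item()
    print(f"epoch {epoch}: val acc = {correct / len(val_ds):.3f}")
```

The missing zero_grad and the softmax-before-CrossEntropyLoss are usually spotted quickly; the shuffling and train/eval-mode bugs are subtler and closer to the kind of thing that quietly degrades real experiments, which makes the discussion around them more informative.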

Another question that I like to ask is to discuss evaluation, and probe the candidate on which problems can come up with evaluation. There are many weird ways that evaluation can fail, most of which aren’t explicitly written about, so it’s a good screen to see what a candidate has experience working on.
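As one concrete example of the kind of failure worth probing for, here is a small, synthetic illustration (my own, not from the article) of how a seemingly reasonable accuracy computation can be silently biased:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
preds = labels.copy()
preds[:100] = 1 - preds[:100]  # pretend the model is wrong on the first 100 examples

batch_size = 300  # 1000 examples -> batches of 300, 300, 300, and a final batch of 100

# Biased: average the per-batch accuracies. The small final batch gets the same weight
# as the full batches, so the aggregate no longer reflects per-example accuracy.
per_batch = [
    (preds[i:i + batch_size] == labels[i:i + batch_size]).mean()
    for i in range(0, len(labels), batch_size)
]
print("mean of per-batch accuracies:", np.mean(per_batch))  # ~0.917

# Correct: pool all predictions, then compute a single accuracy.
print("pooled accuracy:", (preds == labels).mean())  # 0.900
```

Candidates with real evaluation experience tend to have a mental list of failure modes like this: uneven batch weighting, leakage between train and test sets, metrics computed with dropout still active, or prompts and labels that differ subtly between the baseline and the new model.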

When asking these questions, one useful tactic is to allow long, uncomfortable silences to develop. Your general rule of thumb should be to let the candidate talk as much as possible (a good metric is the percentage of time the interviewer spends talking, which should be as close to 0 as possible). If you ask the candidate a question, like the “new idea doesn’t work” question above, be ready to let the question hang in the air while you sit in silence until they answer. You want to 1) let the person think and 2) see how they react.

The main goal with questions like these is to get away from contrived, Leetcode-like problems that can be memorized or prepared for, and instead focus on questions that require practical experience in the role. Leetcode-style questions have value, to be clear, but I don’t think they’re as relevant for the research family of roles.

Be careful about what you include/exclude

Your candidates will have shocking gaps in knowledge. If you don’t test for a skill, you can’t assume candidates will have that skill. This is true even for “obvious” skills like “can use source control.” I have worked with very skilled researchers that barely know how to use Git, and have basically no experience working in a team on a shared codebase.

The corollary to this is that if there’s a skill that you think “everyone should have”, many people won’t, so if you screen for it, you will remove them from the candidate pool. Be careful as to whether or not you actually want to remove them from the candidate pool; if a skill is not required, you are needlessly making the candidate pool weaker.

For instance, I have some friends that run a company using reinforcement learning (RL) to control industrial facilities. They are world experts in RL. I encouraged them to not screen for RL skills, but only to screen for general ML expertise, as they are probably the best people in the world to upskill their employees in RL.

If you’re not sure what skills you want to include/exclude, particularly on the behavioural side, I would encourage you to read Talent, by Tyler Cowen and Daniel Gross. It’s a good overview. A particular skill I like to see is a track record of relentlessly doing whatever was necessary to make a project succeed, across abstraction levels. For instance, Julian Schrittwieser, the lead for AlphaZero, did everything from writing papers and coming up with research ideas to implementing the Jax training pipelines in Python and writing highly-optimized C++. On the flip side, if candidates restrict themselves to only certain subsets of a project (not having a history of cleaning data, or only engaging at the idea level and not writing any code), I would view that as an anti-signal.

Screen candidates aggressively

A common anti-pattern that I have seen is where companies will only screen for technical skills, or, if they do other, behavioural interviews, they will only focus on leadership/teamwork skills. These are really important! But an area that screws a lot of AI companies up is that they will be hiring incredibly skilled ML people coming directly from academia who do not want to work on products. Many PhDs, and a lot of master’s/undergrad graduates, are only familiar with academia, and value their publication record above all else, including compensation. When hiring general software engineers this is not typically an issue, as most software engineers want to build products that are successful and make money.

Until recently, many of the large industrial research labs (DeepMind, FAIR, MSR, etc.) were run in a manner very similar to academia, and the way to advance one’s career was to publish papers in academic journals, so many people who have spent their careers at these organizations are still immersed in the academic mindset. They have not spent any of their professional lives trying to improve business-related KPIs, and many have no experience orienting their work around organization-level business goals (like OKRs). For many product companies, particularly startups, this is exactly the opposite of what they need. It is a point of pride for some researchers that they work on “pure science” which has no apparent useful application (if this mindset is strange to you, the mathematician G. H. Hardy explains it in detail in his famous essay, A Mathematician’s Apology).

My advice is to spend the first call with the candidate addressing this explicitly, perhaps saying something like “Publishing papers is not a priority to us. Do not expect that you will ever publish a paper as part of your job. You will be expected to work on research that is driven by the needs of the product/business and will not have academic freedom to pursue whatever ideas you find interesting.” It may sound harsh, but this is true at most companies, and it’s worth making it explicit upfront.

I would generally advise that other unappealing aspects of working at your company should be mentioned in the first call as well. Matt Mochary has written about how important the anti-sell is. You want to give candidates an accurate understanding of what working at your company will be like; one of the worst outcomes for you is to hire someone, spend a lot of time onboarding/training them, and have them churn because they don’t actually like the job. Do this as early in the process as possible, ideally the first call.

Hiring scientists vs engineers

Using the DeepMind RE vs RS distinction, many companies only have what DeepMind would call REs, as you need to be able to implement your research in large codebases. The main difference is that, for research scientists, coding ability is less important, the ability to choose the right problem is much more important, and you have to focus on culture fit more.

Many people who are sticklers about the “scientist” label instead of being ok with the engineer label 1) expect to be able to publish papers and 2) expect to do “pure” research that’s not driven by product needs. That’s often not acceptable at most companies, so they will be unhappy and churn 12-18 months in. Screen for this.

Behavioural skills matter

You have to screen for behavioural skills. Once engineers/researchers hit the senior level (using the Google scale, so L5+), soft skills are more important than hard skills, and this is possibly true even earlier in their career.

Mentoring matters. Feedback matters. Connecting with your teammates matters. For more junior roles, being “teachable” matters more, but as the person gets more senior, their ability to mentor and give feedback becomes more and more important.

Additionally, as a researcher, often you are dealing with ambiguity throughout the research process, so it’s important to discuss your experiments with your coworkers. If someone is particularly disagreeable, this will not go well, which can make your team less productive.

If you’re hiring someone from a large company, it’s important to assess their ability to add more process in a reasonable way. A common failure among senior people from big tech who are too “bigtech minded” is that they will add too much unnecessary process, or expect to be able to grow their team quickly to match the staffing levels they’ve historically been used to.

Keep a rigorous process

It’s easy to think “we need someone, so let’s hire someone quick.” Don’t. Keep a high bar and encourage the rest of your team to do the same, particularly if you’re at a company paying top of market. Otherwise, you’ll wake up 12 months later with a mediocre team.

Jeff Bezos had a list of questions:

    “Will you admire this person?”

    “Will this person raise the average level of effectiveness of the group they’re entering?”

    “Along what dimension might this person be a superstar?”

I think this is the right approach. You should generally try to only hire the best people, and you can get by with a surprisingly small team. I subscribe to the Nat Friedman philosophy:

Smaller teams are better:

    Faster decisions, fewer meetings, more fun

    No need to chop up work for political reasons

    No room for mediocre people (can pay more, too!)

    Large-scale engineering projects are more soluble in IQ than they appear

    Many tech companies are 2-10x overstaffed

Thanks to Morgan McGuire, Tom McGrath, Kostis Gourgoulias, Sholto Douglas, Priya Joseph, Pavel Surmenok, and Johnny for reading drafts of this.


Finally, again: I’m writing an article about human data, so if you manage/use the results of a human labelling pipeline, or use signals from your users for model training/evaluation, please get in touch.
