Society's Backend · July 22, 23:49
ML for SWEs #60: The skills software engineers should focus on to get involved with AI

 

This article offers practical advice for software engineers looking to grow into AI. Rather than wrestling with complex machine learning model training, the author argues, engineers should focus on the engineering skills needed to productionize AI systems, such as MLOps, data engineering, distributed systems, and cloud computing. These skills are key to shipping AI applications, let software engineers contribute value faster, and position them for the many jobs AI growth will create. The article also rounds up recent AI research and product news, covering LLM context length, reliability, model architecture refinements, and AI adoption across industries.

🚀 **Focus on software engineering skills and embrace AI opportunities:** For software engineers, AI's growth is not a threat but a source of new jobs. Rather than pouring effort into research-style model training, focus on the engineering skills needed to productionize AI systems, such as machine learning infrastructure (MLOps), data engineering, distributed systems, cloud computing, and security. These skills are key to deploying AI models in real applications and let engineers start contributing in AI sooner.

⚙️ **MLOps and data engineering are key areas:** Demand for machine learning infrastructure (MLOps) engineers is strong, because good training systems are the foundation of fast model iteration. Meanwhile, roughly 90% of AI problems are data problems, so engineers who can build data pipelines, analyze data, and design scalable data systems are extremely valuable in AI.

🌐 **Distributed systems and cloud computing are AI's bedrock:** Large AI models cannot be developed without cloud and distributed training systems. Nearly every AI role calls for distributed systems or cloud computing knowledge; these are the foundation for running and deploying AI at scale.

💡 **AI models aren't perfect; focus on real-world application:** The article notes that even advanced LLMs can abandon correct answers under pressure or pushback, a reminder that production systems must account for model reliability and pair models with engineering safeguards (such as output validation and retry mechanisms) to keep AI systems stable.

📰 **AI news roundup:** The article also collects recent AI developments, including Apple Intelligence's model technology, the launch of Amazon's Kiro IDE, Meta's AI data center buildout, Google research results (LLM behavior under pressure, long-context handling), and the evolution of the Transformer architecture, giving readers a view of the industry frontier.

Welcome to Machine Learning for Software Engineers! Each week I curate insights and resources that are helpful and important for software engineers learning AI. Subscribe to get these in your inbox each week.

If you find Machine Learning for Software Engineers helpful, consider becoming a paid subscriber for just $3/month forever for a limited time.

Get 40% off forever

Everybody knows the saying: “When there's a gold rush, sell the shovels”. When people talk about this regarding AI, they're usually referring to NVIDIA. When I talk about this, I'm referring to how software engineers should focus their skillset to future-proof their career.

I get tons of questions each week from software engineers asking what they need to learn to get involved with AI. Most focus on training models and lean heavily into understanding machine learning algorithms. Both of these things are good and important, but they're fundamentally different from the skillset a software engineer already has.

These skills are research science-based instead of engineering-based. Software engineers are used to designing, planning, and building largely deterministic systems via good engineering principles. Switching from an engineering to a research science mindset is difficult and it takes a while to wrap one's head around.

But luckily, software engineers don't have to go too far into this to get involved with AI. Most of the demand for jobs, knowledge, and skills in AI is in productionizing real-world machine learning systems that work for real applications. This is what software engineers are already good at. Importantly, this is why I try to clarify that AI won't take developer jobs; it'll create a ton more of them.

So what does this actually mean for you? For you to sell the shovels, you need to focus on the software engineering skills vital for AI systems. This will get you up to speed quicker and help you contribute faster.

A lot of software engineers get overwhelmed with how much they need to learn to get started with AI because they start learning how to model. Instead, they should focus on building upon the skills they already know.

Here are some areas I think are worth focusing on:

- Machine learning infrastructure (MLOps)
- Data engineering
- Distributed systems
- Cloud computing
- Security

Now I'm not saying you shouldn't learn machine learning algorithms or how to model. Those skills will also be important.

If you’re interested in ML infra, check out the article I wrote about what makes it so interesting (also my best performing article to date!):

What I am trying to convey is that the modeling skillset is very different from what software engineers currently have. Instead of diving headfirst into modeling and becoming overwhelmed with the massive learning process, you can focus on software engineering-related skills that are going to 100% be necessary in AI.

If you have any questions, please don't hesitate to reach out. Enjoy the resources this week. 😊

If you missed us last week, we discussed why transformers might not be the future of AI:

Must Reads for Machine Learning Engineers This Week

Context Rot: How increasing input tokens impacts LLM performance: Modern LLMs support input context lengths in the millions of tokens and often score near-perfectly on benchmarks like Needle in a Haystack (NIAH), a simple direct lexical-matching task. Yet model performance degrades as input length increases, even under minimal conditions, often in surprising and non-uniform ways.

How to Ensure Reliability in LLM Applications: Large Language Models are powerful tools capable of performing a wide variety of tasks, but their stochastic outputs lead to unreliability. Ensuring output consistency involves using markup tags, output validation, and tweaking the system prompt. Error-handling strategies include implementing a retry mechanism, increasing the temperature, and having backup LLMs.
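A minimal sketch of the retry-plus-validation pattern described above, in Python. `call_llm` is a hypothetical stub standing in for a real provider API, and the validation rule (well-formed JSON carrying an expected key) is an illustrative assumption, not the article's exact recipe.

```python
import json

def call_llm(prompt, temperature=0.0):
    # Hypothetical stub; a real application would call a provider API here.
    return '{"sentiment": "positive"}'

def validate(raw):
    # Accept only well-formed JSON carrying the expected key.
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if "sentiment" in parsed else None

def reliable_call(prompt, max_retries=3):
    # Retry on validation failure, nudging the temperature up each attempt
    # so a stuck model is more likely to produce a different output.
    for attempt in range(max_retries):
        raw = call_llm(prompt, temperature=0.2 * attempt)
        parsed = validate(raw)
        if parsed is not None:
            return parsed
    raise RuntimeError("LLM output failed validation after retries")

print(reliable_call("Classify the sentiment of: 'Great issue this week!'"))
```

The same loop extends naturally to the article's other suggestions, such as falling back to a backup LLM once retries are exhausted.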

What Google's Viral AI Paper teaches us about Long Context Collapse, Agentic Evals, AI Safety, and more [Breakdowns]: Google's Gemini 2.5 paper focuses on advanced reasoning, multimodality, long context, and next-generation agentic capabilities. Takeaways from this paper include information on long context collapse, agentic evaluations, and AI safety.

The Big LLM Architecture Comparison: The original GPT architecture was developed seven years ago. Modern models like DeepSeek-V3 and Llama 4 maintain structural similarities to earlier versions such as GPT-2. Refinements include the evolution from absolute to rotary positional embeddings, the replacement of Multi-Head Attention with Grouped-Query Attention, and SwiGLU superseding activation functions like GELU.
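One of the refinements named above, SwiGLU, is easy to sketch. This toy Python version shows only the elementwise gating; in a real transformer FFN the two inputs would be learned linear projections of the same hidden state, which are assumed and omitted here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # Swish (also called SiLU): x * sigmoid(x); the "Swi" in SwiGLU.
    return x * sigmoid(x)

def swiglu(gate, value):
    # SwiGLU gates one projection with the Swish of another, elementwise:
    # out_i = swish(gate_i) * value_i
    return [swish(g) * v for g, v in zip(gate, value)]

# `gate` and `value` stand in for two linear projections of a hidden state.
print(swiglu([1.0, -2.0, 0.0], [0.5, 1.0, 2.0]))
```

Compared with a plain GELU feed-forward layer, the gating lets the network learn which components of `value` to pass through, which is one reason the variant stuck.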

Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA: Running large language model inference with stringent latency constraints requires optimizing the time-to-next-token during decode. The all-reduce collective became a significant bottleneck, accounting for 23% of end-to-end decode latency on an 8-way tensor parallel Gemma2 LLM with 8 NVIDIA H100 GPUs. A custom single-shot all-reduce algorithm, which aggregates data and performs reduction in a single stage, was implemented instead of the traditional ring algorithm.
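The stage-count argument behind that result can be shown with a toy single-process sketch: a ring all-reduce needs 2(N-1) dependent communication steps, while a single-shot algorithm reduces everything in one stage. This is an illustration of the latency trade-off, not the actual XLA/JAX kernel.

```python
def single_shot_allreduce(device_buffers):
    # One-stage all-reduce: every "device" reads all peers' buffers and
    # reduces locally. Minimal latency (one dependent step), at the cost
    # of each device receiving the full data from every peer.
    total = [sum(vals) for vals in zip(*device_buffers)]
    return [list(total) for _ in device_buffers]

def ring_allreduce_stages(n_devices):
    # The classic ring algorithm is bandwidth-optimal but serializes
    # (n-1) reduce-scatter steps plus (n-1) all-gather steps.
    return 2 * (n_devices - 1)

# 8-way tensor parallelism, as in the post's Gemma2 setup.
print(ring_allreduce_stages(8))  # -> 14 dependent communication steps
print(single_shot_allreduce([[1, 2], [3, 4], [5, 6]]))
```

For the small per-token messages of latency-bound decode, paying extra bandwidth to collapse 14 dependent steps into one is a good trade, which is the intuition behind the custom kernel.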

Other Interesting Things This Week

Apple's MLX adding CUDA support: MLX is gaining a CUDA backend through an ongoing Work In Progress (WIP) Pull Request #1983. This development enables the MLX tutorial example to run, though its functionality is currently limited. The CUDA backend has been tested on Ubuntu 22.04 with CUDA 11.6.

Amazon launches Kiro, its own Claude-powered challenger to Windsurf and Codex: Amazon launched Kiro, an agentic integrated development environment (IDE). Kiro uses Claude Sonnet 3.7 and 4.0 as default model backends and is available in public preview for macOS, Windows, and Linux. Kiro operates as a general-purpose agentic IDE supporting any platform, distinguishing it from Q Developer's more limited third-party IDE support.

Elon Musk’s Grok is making AI companions, including a goth anime girl: Grok now offers AI companions for its "Super Grok" subscribers. This feature is available for $30 per month. Current companions include Ani, an anime girl, and Bad Rudy, a 3D fox creature.

Cognition (Devin AI) to Acquire Windsurf: Cognition signed a definitive agreement to acquire Windsurf, an agentic IDE. The acquisition includes Windsurf’s IP, product, trademark, brand, and its team. Windsurf generates $82M in annual recurring revenue and serves over 350 enterprise customers.

Let AI Tune Your Voice Assistant: Voice assistants represent the complete system a user interacts with, connected to an AI Model through a Live API that manages real-time audio and data streaming. The AI Model, a Large Language Model, understands user intent and determines actions. An assistant's effectiveness hinges on instructing its underlying AI model on tool usage through function calling.

Mark Zuckerberg says Meta is building a 5GW AI data center: Meta is constructing Hyperion, an AI data center in Louisiana projected to scale to five gigawatts of computational power. A separate 1 GW super cluster named Prometheus will come online in Ohio by 2026. A Meta data center project in Newton County, Georgia, has caused water taps to run dry for some residents.

Meta's Recruiting Secret: A post suggesting top researchers join Meta not for money, but for the ambitious goal of building superintelligence.

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems: Large language models (LLMs) form, maintain, and lose confidence in their answers, exhibiting cognitive biases similar yet distinct from human biases. LLMs demonstrate initial overconfidence but rapidly change answers and lose confidence when presented with counterarguments, even if those arguments are incorrect.

Reflections on OpenAI: OpenAI expanded from over 1,000 employees to more than 3,000 within a year, starting in May 2024. This rapid growth led to significant challenges in company communication, reporting structures, and product delivery.

Why Transformers Aren't the Future of AI: A perspective exists where Large Language Models (LLMs) may not lead to Artificial General Intelligence (AGI). Arguments supporting this view often claim LLMs will not scale effectively with increased resources. A proper definition for AGI does not currently exist.

Military AI contracts awarded to Anthropic, OpenAI, Google, and xAI: The Pentagon awarded military AI contracts worth up to $800 million. Google, OpenAI, Anthropic, and xAI are among the recipients.

A summer of security: empowering cyber defenders with AI: AI innovations enhance cybersecurity, providing new tools for defenders to locate vulnerabilities. Google's Big Sleep agent, developed by Google DeepMind and Google Project Zero, actively searches for unknown security flaws. This agent found its first real-world vulnerability by November 2024 and continues to discover multiple flaws.

Call for Tech Blogs: A request for software and AI bloggers to share their work to create a master list.

Your 1M+ Context Window LLM Is Less Powerful Than You Think: Large Language Models exhibit context windows ranging from 200K (Claude) to 2M tokens (Gemini 1.5 Pro). An LLM's effective working memory can overload with relatively small inputs, occurring far before context window limits are reached. This phenomenon explains previously reported LLM failures, including an inability to detect plot holes or struggles with long stories.

Ex-Waymo engineers launch Bedrock Robotics to automate construction: Bedrock Robotics, an autonomous vehicle technology startup founded by Waymo and Segment veterans, has secured an $80 million funding round. The company develops a self-driving kit to retrofit construction and worksite vehicles. This technology upgrades existing fleets with sensors, compute, and intelligence for continuous operation.

Open Deep Research: Open Deep Research is an open source agent built on LangGraph. It connects to data sources, LLMs, and MCP servers. LangChain published information on Open Deep Research.

Google France hosted a hackathon to tackle healthcare's biggest challenges: Google France hosted a 12-hour hackathon in Paris, gathering 130 experts to prototype new medical solutions. Twenty-six teams used Google's open AI models, including Gemma and MedGemma, to address challenges such as emergency-room triage and oncology patient support. Google.org committed $5 million to organizations using AI to advance European healthcare.

The Kaitchup Index: A Leaderboard for Quantized LLMs: The Kaitchup Index provides a leaderboard for quantized LLMs, comparing formats such as GGUF, GPTQ, and AWQ, across different bitwidths. This benchmark primarily evaluates factual accuracy, world knowledge, and instruction following, with a significant multilingual component. It also incorporates a "Quantization Fidelity" metric, assessing how closely a quantized model replicates its original version based on generated tokens and sequences.
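The post does not spell out the Quantization Fidelity formula, but a token-match score of the kind it describes might look like the following sketch; the exact metric is an assumption for illustration.

```python
def token_fidelity(reference_tokens, quantized_tokens):
    # Toy fidelity score: fraction of positions where the quantized model
    # emits the same token as its full-precision reference. Length
    # mismatches count against the score.
    if not reference_tokens and not quantized_tokens:
        return 1.0
    matches = sum(a == b for a, b in zip(reference_tokens, quantized_tokens))
    return matches / max(len(reference_tokens), len(quantized_tokens))

# Reference output vs. a quantized model that diverges on the last token.
print(token_fidelity(["The", "cat", "sat"], ["The", "cat", "slept"]))
```

A score near 1.0 would indicate the quantized model closely replicates its original, which is the property the leaderboard's fidelity metric is meant to capture.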

Forward vs Backward Differentiation: Three innovations enable training neural networks with billions of parameters: function vectorization for parallelization, gradient descent for optimizing multivariate functions, and backpropagation for efficient loss gradient computation. Forward-mode differentiation computes derivatives using the chain rule, involving partial derivatives for all edges, paths from input to output, multiplication of partial variables along paths, and summing the products.
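Forward-mode differentiation can be sketched with dual numbers, which carry a derivative alongside each value and apply the chain rule at every operation; this is a minimal sketch, not a full AD library.

```python
class Dual:
    """Dual number (value, derivative) for forward-mode differentiation:
    each arithmetic op computes the result and its derivative together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed dx/dx = 1 and read the derivative off the output's dual part.
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + 3x) at x = 2 is 2x + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))
```

Note the contrast with backpropagation (reverse mode): forward mode costs one pass per input, while reverse mode gets the gradient with respect to all inputs in one backward pass, which is why training uses backpropagation.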

Context Engineering by Hand ✍️: Context engineering extends beyond prompt engineering and this walks you through it by hand. AI by Hand Workshops are scheduled for July 23, covering topics such as Agent, Transformer, and SOTA.

Hackers exploit a blind spot by hiding malware inside DNS records: Malware is being concealed within DNS records, specifically broken into hexadecimal chunks and stored in TXT records of various subdomains. This allows malicious binaries to be retrieved via DNS lookup traffic, which often bypasses common security monitoring. DomainTools researchers identified this technique hosting Joke Screenmate malware.

Apple Intelligence Foundation Language Models Tech Report 2025: Two multilingual, multimodal foundation language models power Apple Intelligence features across devices and services. One is a ~3B-parameter on-device model optimized for Apple silicon through architectural innovations. The other is a scalable server model built on a novel Parallel-Track Mixture-of-Experts transformer.

Introducing ChatGPT agent: ChatGPT agents autonomously perform tasks by leveraging advanced language understanding. They interact with external tools and APIs to automate complex workflows and execute multi-step processes.

ChatGPT agent System Card: ChatGPT agent is a new agentic model in the OpenAI o3 family, integrating Deep Research's multi-step research capabilities with Operator's remote visual browser task execution. It includes a Terminal tool for code execution and data analysis, plus access to external data via Connectors. The system's launch is treated as High capability in the Biological and Chemical domain under the Preparedness Framework, activating associated safeguards.

New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap: Google's new Gemini Embedding model is now generally available. This model ranks number one on the Massive Text Embedding Benchmark (MTEB) and is a core part of the Gemini API and Vertex AI.

Exhausted man defeats AI model in world coding championship: Polish programmer Przemysław Dębiak defeated an OpenAI AI model in the 10-hour AtCoder World Tour Finals 2025 Heuristic contest in Tokyo. The competition involved solving a single complex optimization problem, and the OpenAI model finished in second place.

Matmul on GPU/TPU by hand ✍️: Details matrix multiplication on GPUs and TPUs, including 91 frames that show how to divide large matrices into tiles for accelerators. This is a process JAX automatically handles to speed up core calculations.
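The tiling idea above is easy to show in plain Python: the output is accumulated tile by tile, the same blocking scheme accelerator kernels use so each tile fits in fast on-chip memory. This is a toy illustration, not what JAX actually emits.

```python
def tiled_matmul(A, B, tile=2):
    # Blocked matrix multiply: iterate over output tiles (i0, j0) and
    # accumulate partial products from each tile of the shared dimension.
    n, kdim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, kdim, tile):
                # Accumulate this k-tile's contribution to output tile (i0, j0).
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for k in range(k0, min(k0 + tile, kdim)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # -> [[19.0, 22.0], [43.0, 50.0]]
```

On real hardware the three inner loops become a single fused tile operation (a tensor-core or MXU instruction), and the tiling exists so those operands stay in registers or shared memory instead of round-tripping to DRAM.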

The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

Nobody knows how to build with AI yet: A project called Protocollie was developed in four days using unfamiliar languages, without the author directly touching the code. The development "system" evolved from a single architecture document into four documents, used for solving problems, repeatable workflows, and stories. The most experienced AI pair programmer has been active for at most two years.

Building Software is Easier Than Ever: A comment on the current ease of software development.

Intelligence as Efficiency: A take from François Chollet that true intelligence is the efficiency of acquiring and deploying new skills, not just a collection of them, cautioning against over-reliance on benchmarks.

Grok CLI Announced: An announcement for an open-source, hackable AI agent that brings Grok to the command line.

Writing scientific articles is an integral part of the scientific method for communicating research findings and uncovering new ideas: Handwriting can lead to widespread brain connectivity and positively affect learning and memory. Large language models can generate scientific articles and peer-review reports in minutes, but they are not considered authors as they lack accountability.

An advanced version of Gemini Deep Think achieved a gold-medal standard at the International Mathematical Olympiad (IMO): It solved five of six problems perfectly, earning 35 total points in a performance officially certified by IMO coordinators.


Thanks for reading!

Always be (machine) learning,

Logan
