Welcome to Machine Learning for Software Engineers! Each week I curate insights and resources specifically for software engineers learning AI. This article includes:
The most important lesson from this past week in AI
Must-read engineer learning resources
The most interesting things from this past week
Job postings and market updates (not this week, unfortunately; I didn't do a great job of tracking them)
All curated to be helpful and important for software engineers learning AI. Subscribe to get these in your inbox each week.
If you find Machine Learning for Software Engineers helpful, consider becoming a paid subscriber: for a limited time, you can lock in a rate of just $3/month forever.
Everybody knows the saying: “When there's a gold rush, sell the shovels”. When people talk about this regarding AI, they're usually referring to NVIDIA. When I talk about this, I'm referring to how software engineers should focus their skillset to future-proof their career.
I get tons of questions each week from software engineers asking what they need to learn to get involved with AI. Most focus on training models and lean heavily into understanding machine learning algorithms. Both of these things are good and important, but they're fundamentally different from the skillset a software engineer already has.
These skills are research science-based instead of engineering-based. Software engineers are used to designing, planning, and building largely deterministic systems via good engineering principles. Switching from an engineering to a research science mindset is difficult and it takes a while to wrap one's head around.
But luckily, software engineers don't have to go too far into this to get involved with AI. Most of the demand for jobs, knowledge, and skills in AI is in productionizing real-world machine learning systems that work for real applications. This is what software engineers are already good at. Importantly, this is why I try to clarify that AI won't take developer jobs–it'll create a ton more of them.
So what does this actually mean for you? For you to sell the shovels, you need to focus on the software engineering skills vital for AI systems. This will get you up to speed quicker and help you contribute faster.
A lot of software engineers get overwhelmed with how much they need to learn to get started with AI because they start learning how to model. Instead, they should focus on building upon the skills they already know.
Here are some areas I think are worth focusing on:
Machine Learning Infrastructure/MLOps: I frequently get contacted about job opportunities in machine learning infrastructure because every company is trying to figure theirs out right now. Behind every good model is a training system that made it possible to build. Companies pay well for good infrastructure engineers because they're the difference between modelers experimenting once every seven days versus seven times in one day.
Hardware: AI accelerators are just getting started. Many companies are working in this space because demand is so high. As AI develops further and new architectures become paramount to success, this space will only grow.
Data Engineering: I would guess 90% of interesting problems with machine learning come down to understanding and properly handling data. If you know how to build data pipelines, analyze data, investigate data, and build scalable systems with data, your skill set will be valuable working with AI.
Distributed Systems and Cloud Computing: All large models are developed in the cloud using distributed training systems. Pretty much all AI jobs that I see require some sort of distributed system or cloud knowledge.
Cybersecurity: As AI systems become more complex and different AI systems are developed, there will be a need to understand how to secure those systems and use them in vital applications with sensitive data. In a world where technology is only growing and being used more, knowing how to keep things secure and private will always be good.
Now I'm not saying you shouldn't learn machine learning algorithms or how to model. Those skills will also be important.
If you’re interested in ML infra, check out the article I wrote about what makes it so interesting (also my best performing article to date!):
What I am trying to convey is that the modeling skillset is very different from what software engineers currently have. Instead of diving headfirst into modeling and becoming overwhelmed with the massive learning process, you can focus on software engineering-related skills that are 100% going to be necessary in AI.
If you have any questions, please don't hesitate to reach out. Enjoy the resources this week. 😊
If you missed us last week, we discussed why transformers might not be the future of AI:
Must Reads for Machine Learning Engineers This Week
Context Rot: How increasing input tokens impacts LLM performance: Modern LLMs support input context lengths reaching millions of tokens and often score near-perfectly on benchmarks like Needle in a Haystack (NIAH), but NIAH is a simple, direct lexical-matching task. On more realistic tasks, model performance degrades as input length increases, even under minimal conditions, often in surprising and non-uniform ways.
How to Ensure Reliability in LLM Applications: Large Language Models are powerful tools capable of performing a wide variety of tasks, but their stochastic outputs lead to unreliability. Ensuring output consistency involves using markup-tags, output validation, and tweaking the system prompt. Error handling strategies include implementing a retry mechanism, increasing the temperature, and having backup LLMs.
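As a rough illustration of that retry-and-fallback pattern, here's a minimal sketch, assuming a hypothetical `call_llm` client and a trivial validator (both stand-ins, not a real API):

```python
import random

def call_llm(prompt, temperature):
    """Stand-in for a real LLM API call (hypothetical)."""
    # Simulate stochastic output: invalid roughly half the time.
    return "ok" if random.random() < 0.5 else "invalid"

def validate(output):
    """Output validation: accept only well-formed responses."""
    return output == "ok"

def reliable_call(prompt, max_retries=3, base_temperature=0.2, fallback=None):
    """Retry with a temperature bump, then fall back to a backup model."""
    temperature = base_temperature
    for _ in range(max_retries):
        output = call_llm(prompt, temperature)
        if validate(output):
            return output
        temperature += 0.3  # nudge the model away from the failing pattern
    if fallback is not None:
        return fallback(prompt)  # e.g. a second, more reliable model
    raise RuntimeError("all retries and the fallback failed")
```

In a real application, `validate` would check for the expected markup tags or schema, and `fallback` would route the prompt to a backup LLM.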
What Google's Viral AI Paper teaches us about Long Context Collapse, Agentic Evals, AI Safety, and more [Breakdowns]: Google's Gemini 2.5 paper focuses on advanced reasoning, multimodality, long context, and next-generation agentic capabilities. Takeaways from this paper include information on long context collapse, agentic evaluations, and AI safety.
The Big LLM Architecture Comparison: The original GPT architecture was developed seven years ago. Modern models like DeepSeek-V3 and Llama 4 maintain structural similarities to earlier versions such as GPT-2. Refinements include the evolution from absolute to rotary positional embeddings, the replacement of Multi-Head Attention with Grouped-Query Attention, and SwiGLU superseding activation functions like GELU.
Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA: Running large language model inference with stringent latency constraints requires optimizing the time-to-next-token during decode. The all-reduce collective became a significant bottleneck, accounting for 23% of end-to-end decode latency on an 8-way tensor parallel Gemma2 LLM with 8 NVIDIA H100 GPUs. A custom single-shot all-reduce algorithm, which aggregates data and performs reduction in a single stage, was implemented instead of the traditional ring algorithm.
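The single-shot idea can be simulated in a few lines of NumPy (a toy illustration of the communication pattern, not the actual XLA kernel):

```python
import numpy as np

n_devices = 8
# Each "device" holds a partial activation that must be summed across peers.
buffers = [np.full(4, float(i)) for i in range(n_devices)]

def single_shot_all_reduce(bufs):
    """Every device pulls all peers' buffers and reduces locally in one
    stage, instead of the 2*(N-1) dependent steps of a ring all-reduce.
    This trades extra bandwidth for far fewer latency-bound hops, which
    is what matters for the small messages of low-latency decode."""
    n = len(bufs)
    # All devices do this concurrently; simulated sequentially here.
    return [sum(bufs[j] for j in range(n)) for _ in range(n)]

reduced = single_shot_all_reduce(buffers)
# Every device ends with the same fully reduced tensor: 0+1+...+7 = 28.
assert all(np.allclose(r, 28.0) for r in reduced)
```

The ring algorithm is bandwidth-optimal for large tensors, but during decode the per-token messages are small, so the stage count dominates.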
Other Interesting Things This Week
Apple's MLX adding CUDA support: MLX is gaining a CUDA backend through an ongoing Work In Progress (WIP) Pull Request #1983. This development enables the MLX tutorial example to run, though its functionality is currently limited. The CUDA backend has been tested on Ubuntu 22.04 with CUDA 11.6.
Amazon launches Kiro, its own Claude-powered challenger to Windsurf and Codex: Amazon launched Kiro, an agentic integrated development environment (IDE). Kiro uses Claude Sonnet 3.7 and 4.0 as default model backends and is available in public preview for macOS, Windows, and Linux. Kiro operates as a general-purpose agentic IDE supporting any platform, distinguishing it from Q Developer's more limited third-party IDE support.
Elon Musk’s Grok is making AI companions, including a goth anime girl: Grok now offers AI companions for its "Super Grok" subscribers. This feature is available for $30 per month. Current companions include Ani, an anime girl, and Bad Rudy, a 3D fox creature.
Cognition (Devin AI) to Acquire Windsurf: Cognition signed a definitive agreement to acquire Windsurf, an agentic IDE. The acquisition includes Windsurf’s IP, product, trademark, brand, and its team. Windsurf generates $82M in annual recurring revenue and serves over 350 enterprise customers.
Let AI Tune Your Voice Assistant: Voice assistants represent the complete system a user interacts with, connected to an AI Model through a Live API that manages real-time audio and data streaming. The AI Model, a Large Language Model, understands user intent and determines actions. An assistant's effectiveness hinges on instructing its underlying AI model on tool usage through function calling.
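As a sketch of what that tool wiring looks like, here's a minimal, hypothetical function declaration and dispatcher (the schema shape follows common function-calling conventions; the names are made up):

```python
# A hypothetical tool declaration of the kind a voice assistant passes
# to its LLM so the model can decide when to call the tool.
set_timer_tool = {
    "name": "set_timer",
    "description": "Start a countdown timer for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "seconds": {"type": "integer",
                        "description": "Timer length in seconds."},
        },
        "required": ["seconds"],
    },
}

def dispatch(tool_call):
    """Route a model-emitted tool call to the real implementation."""
    if tool_call["name"] == "set_timer":
        return f"timer set for {tool_call['arguments']['seconds']}s"
    raise ValueError("unknown tool")

print(dispatch({"name": "set_timer", "arguments": {"seconds": 300}}))
```

The quality of the `description` fields is effectively part of the prompt: it is how the model learns when and how to use each tool.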
Mark Zuckerberg says Meta is building a 5GW AI data center: Meta is constructing Hyperion, an AI data center in Louisiana projected to scale to five gigawatts of computational power. A separate 1 GW super cluster named Prometheus will come online in Ohio by 2026. A Meta data center project in Newton County, Georgia, has caused water taps to run dry for some residents.
Meta's Recruiting Secret: A post suggesting top researchers join Meta not for money, but for the ambitious goal of building superintelligence.
Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems: Large language models (LLMs) form, maintain, and lose confidence in their answers, exhibiting cognitive biases similar yet distinct from human biases. LLMs demonstrate initial overconfidence but rapidly change answers and lose confidence when presented with counterarguments, even if those arguments are incorrect.
Reflections on OpenAI: OpenAI expanded from over 1,000 employees to more than 3,000 within a year, starting in May 2024. This rapid growth led to significant challenges in company communication, reporting structures, and product delivery.
Why Transformers Aren't the Future of AI: A perspective exists where Large Language Models (LLMs) may not lead to Artificial General Intelligence (AGI). Arguments supporting this view often claim LLMs will not scale effectively with increased resources. A proper definition for AGI does not currently exist.
Military AI contracts awarded to Anthropic, OpenAI, Google, and xAI: The Pentagon awarded military AI contracts worth up to $800 million. Google, OpenAI, Anthropic, and xAI are among the recipients.
A summer of security: empowering cyber defenders with AI: AI innovations enhance cybersecurity, providing new tools for defenders to locate vulnerabilities. Google's Big Sleep agent, developed by Google DeepMind and Google Project Zero, actively searches for unknown security flaws. This agent found its first real-world vulnerability by November 2024 and continues to discover multiple flaws.
Call for Tech Blogs: A request for software and AI bloggers to share their work to create a master list.
Your 1M+ Context Window LLM Is Less Powerful Than You Think: Large Language Models exhibit context windows ranging from 200K (Claude) to 2M tokens (Gemini 1.5 Pro). An LLM's effective working memory can overload with relatively small inputs, occurring far before context window limits are reached. This phenomenon explains previously reported LLM failures, including an inability to detect plot holes or struggles with long stories.
Ex-Waymo engineers launch Bedrock Robotics to automate construction: Bedrock Robotics, an autonomous vehicle technology startup founded by Waymo and Segment veterans, has secured an $80 million funding round. The company develops a self-driving kit to retrofit construction and worksite vehicles. This technology upgrades existing fleets with sensors, compute, and intelligence for continuous operation.
Open Deep Research: Open Deep Research is an open source agent built on LangGraph. It connects to data sources, LLMs, and MCP servers. LangChain published information on Open Deep Research.
Google France hosted a hackathon to tackle healthcare's biggest challenges: Google France hosted a 12-hour hackathon in Paris, gathering 130 experts to prototype new medical solutions. Twenty-six teams used Google's open AI models, including Gemma and MedGemma, to address challenges such as emergency-room triage and oncology patient support. Google.org committed $5 million to organizations using AI to advance European healthcare.
The Kaitchup Index: A Leaderboard for Quantized LLMs: The Kaitchup Index provides a leaderboard for quantized LLMs, comparing formats such as GGUF, GPTQ, and AWQ, across different bitwidths. This benchmark primarily evaluates factual accuracy, world knowledge, and instruction following, with a significant multilingual component. It also incorporates a "Quantization Fidelity" metric, assessing how closely a quantized model replicates its original version based on generated tokens and sequences.
Forward vs Backward Differentiation: Three innovations enable training neural networks with billions of parameters: function vectorization for parallelization, gradient descent for optimizing multivariate functions, and backpropagation for efficient loss gradient computation. Forward-mode differentiation computes derivatives using the chain rule, involving partial derivatives for all edges, paths from input to output, multiplication of partial variables along paths, and summing the products.
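Forward-mode differentiation is easy to sketch with dual numbers, which carry a value and its derivative through every operation (the chain rule applied edge by edge along each path):

```python
class Dual:
    """Minimal dual number for forward-mode differentiation."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot  # value and its derivative

    def __add__(self, other):
        # Sum rule: d(u + v) = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: d(uv) = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x, y):
    return x * x + x * y  # f(x, y) = x^2 + xy

# Seed x with derivative 1 to get df/dx; y is held constant (dot = 0).
x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)
out = f(x, y)
print(out.val, out.dot)  # 15.0 and df/dx = 2x + y = 8.0
```

Forward mode costs one pass per input variable, which is why reverse mode (backpropagation) wins for neural networks: billions of parameters, one scalar loss.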
Context Engineering by Hand ✍️: Context engineering extends beyond prompt engineering and this walks you through it by hand. AI by Hand Workshops are scheduled for July 23, covering topics such as Agent, Transformer, and SOTA.
Hackers exploit a blind spot by hiding malware inside DNS records: Malware is being concealed within DNS records, specifically broken into hexadecimal chunks and stored in TXT records of various subdomains. This allows malicious binaries to be retrieved via DNS lookup traffic, which often bypasses common security monitoring. DomainTools researchers identified this technique hosting Joke Screenmate malware.
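The encoding trick itself is mundane, which is exactly why it blends into normal traffic. A benign sketch of the chunk-and-reassemble scheme (domain names hypothetical):

```python
payload = b"example binary payload"

# The binary is split into hex chunks, each stored in the TXT record
# of a different subdomain (simulated here as a plain dict).
hex_str = payload.hex()
chunk_size = 8
chunks = [hex_str[i:i + chunk_size] for i in range(0, len(hex_str), chunk_size)]
txt_records = {f"{n}.example.com": chunk for n, chunk in enumerate(chunks)}

# Retrieval looks like ordinary DNS lookups; reassembly is just
# concatenating the chunks in order and decoding the hex.
reassembled = bytes.fromhex("".join(txt_records[f"{n}.example.com"]
                                    for n in range(len(chunks))))
assert reassembled == payload
```

For defenders, the takeaway is that TXT-record lookups carrying long hex strings to many numbered subdomains are a pattern worth flagging.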
Apple Intelligence Foundation Language Models Tech Report 2025: Two multilingual, multimodal foundation language models power Apple Intelligence features across devices and services. One is a ~3B-parameter on-device model optimized for Apple silicon through architectural innovations. The other is a scalable server model built on a novel Parallel-Track Mixture-of-Experts transformer.
Introducing ChatGPT agent: ChatGPT agents autonomously perform tasks by leveraging advanced language understanding. They interact with external tools and APIs to automate complex workflows and execute multi-step processes.
ChatGPT agent System Card: ChatGPT agent is a new agentic model in the OpenAI o3 family, integrating Deep Research's multi-step research capabilities with Operator's remote visual browser task execution. It includes a Terminal tool for code execution and data analysis, plus access to external data via Connectors. The system's launch is treated as High capability in the Biological and Chemical domain under the Preparedness Framework, activating associated safeguards.
New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap: Google's new Gemini Embedding model is now generally available. This model ranks number one on the Massive Text Embedding Benchmark (MTEB) and is a core part of the Gemini API and Vertex AI.
Exhausted man defeats AI model in world coding championship: Polish programmer Przemysław Dębiak defeated an OpenAI AI model in the 10-hour AtCoder World Tour Finals 2025 Heuristic contest in Tokyo. The competition involved solving a single complex optimization problem, and the OpenAI model finished in second place.
Matmul on GPU/TPU by hand ✍️: Details matrix multiplication on GPUs and TPUs, including 91 frames that show how to divide large matrices into tiles for accelerators. This is a process JAX automatically handles to speed up core calculations.
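The tiling idea can be shown in a few lines of NumPy (JAX/XLA does this automatically on accelerators; this is just an explicit sketch):

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """Block (tiled) matrix multiply: accelerators process a large matmul
    as many small tile-by-tile multiplies that fit in fast on-chip memory.
    Assumes dimensions divide evenly by the tile size for simplicity."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # One small multiply-accumulate on a tile of each operand.
                C[i:i + tile, j:j + tile] += (
                    A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile])
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

The tile size is chosen to match the accelerator's on-chip memory and matrix-unit shape, which is where most of the real-world tuning happens.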
The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
Nobody knows how to build with AI yet: A project called Protocollie was developed in four days in unfamiliar languages, without the author directly touching the code. The development "system" evolved from a single architecture document into four, used for solving problems, repeating workflows, and telling stories. Even the most experienced AI pair programmers have been at it for at most two years.
Building Software is Easier Than Ever: A comment on the current ease of software development.
Intelligence as Efficiency: A take from François Chollet that true intelligence is the efficiency of acquiring and deploying new skills, not just a collection of them, cautioning against over-reliance on benchmarks.
Grok CLI Announced: An announcement for an open-source, hackable AI agent that brings Grok to the command line.
Writing scientific articles is an integral part of the scientific method for communicating research findings and uncovering new ideas: Handwriting can lead to widespread brain connectivity and positively affect learning and memory. Large language models can generate scientific articles and peer-review reports in minutes, but they are not considered authors because they lack accountability.
An advanced version of Gemini Deep Think achieved a gold-medal standard at the International Mathematical Olympiad (IMO).: It solved five of six problems perfectly, earning 35 total points in a performance officially certified by IMO coordinators.
Thanks for reading!
Always be (machine) learning,
Logan