Write Good Enough Code, Quickly

Published on December 15, 2024 4:45 AM GMT

At the start of my Ph.D. 6 months ago, I was generally wedded to writing "good code". The kind of "good code" you learn in school and standard software engineering these days: object oriented, DRY, extensible, well-commented, and unit tested. I like writing "good code" - in undergrad I spent hours on relatively trivial assignments, continually refactoring to construct clean and intuitive abstractions. Part of programming's appeal is this kind of aesthetic sensibility - there's a deep pleasure in constructing a pseudo-platonic system of types, objects, and functions that all fit together to serve some end. More freedom than math, but more pragmatism than philosophy. This appreciation for the "art" of programming can align with more practical ends - beauty often coincides with utility. Expert programmers are attracted to Stripe because of their culture of craftsmanship, and Stripe promotes a culture of craftsmanship (in part, presumably) because "building multi-decadal abstractions" (in the words of Patrick Collison) is useful for the bottom line.

And this is all well and good, if (as is more often the case than not) you expect your code to be used in 1, 2, 10 years. But in research, this is often not the case! Projects typically last on the order of weeks to months, not years to decades. Moreover, projects typically involve a small number (often 1 or 0) of highly involved collaborators, as opposed to the large, fragmented teams typical of industry. And speed is paramount. Research is a series of bets, and you want to discover the outcome of each bet as fast as possible. Messy code might incur technical debt, but you don't have to pay if you scrap the entire project.

I had heard advice like this going into my PhD, both in the context of research and product development generally (MVP, Ballmer Peak, etc). It took me a while to internalize it though, in part, I suspect, because there's an art to writing "messy" code too. Writing error-prone spaghetti code is not the answer - you need stuff to work quickly to get results quickly. The goal is to write good enough code, efficiently, but learning what good enough means is a skill unto itself.

Principles for Good-Enough Code

Below is a first pass at some guiding principles. I focused on ML research in Python, but I suspect the lessons are generalizable.

    Future-you at time t is the target user
      where t is drawn from something like an exponential distribution with median 1 day.
    Minimize indirection - Have as much of the code as is reasonable in a single notebook

      This is the kind of advice that’s horrible for freshman CS students, but probably helpful for first-year PhD students.[1] Having everything in one place increases context - you can just read the program logic, without having to trace through various submodules and layers of abstraction. It also encourages you to constantly review code which otherwise might be tucked away, naturally helping you to catch errors, identify improvements, or notice additional axes of variation in the system.

    Only refactor when you need to - but always refactor when you need to
      When you’ve had the impulse two or three times to pull something out into a separate function or object, do it. As a default though, be very suspicious of coding activity that isn’t directly doing the thing.
    Use your context
      Lots of best practices in software engineering revolve around providing context and constraints to future readers and editors, in the form of comments, assertions, documentation, and type checks. Some of this will be useful for you 5 minutes after writing it, but a lot of it won't. So don't worry too much about using enums instead of strings, adding docstrings for each function, and avoiding magic numbers. Semantic naming and sparse commenting are useful enough.
    Copy and paste is your friend
      In keeping with minimizing indirection, it's often better to reuse a component by copying and pasting rather than sharing it across functions/scripts. Not only does this improve context, but it also promotes decoupling. If you end up needing to modify the component for a particular use case, you can do so without worrying about how the change will affect functionality elsewhere (the converse of this is that if you want to make the same modification everywhere, you have to do it twice, so, as always, user discretion is required).
    You're still allowed to think - slow is smooth and smooth is fast
      When told to prioritize speed in coding, we often imagine the rogue hacker, whizzing away at a terminal, no time wasted without a keystroke. And sure, maybe 10x engineers operate something like this. But for mere mortals, it's important to remember that you can still do a bit of planning before getting to work. For me, planning usually takes the form of pseudo-code comments, but a little diagram sketching and rubber ducking won't hurt either. The key is to efficiently execute an imperfect plan - and this requires having an imperfect plan to begin with.
    Avoid unit tests - at least early on
      The most obvious case of trading speed for reliability. In research, you should be constructing your code incrementally, running it at each step in a REPL or notebook. By the time you're done, you've basically covered the central use case (running the full experiment script), and don't have to worry about arbitrary users exploiting weird edge cases. You are the target user. And running the script is often the only (integration) test you need (do check tensor shapes though, ML debugging is hard and all).
    Use an LLM
      This should be obvious. As of December 14th 2024, I'd recommend Cursor with Sonnet 3.5 (though I occasionally use o1 to work through some math).
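To make a couple of these principles concrete, here's a sketch of planning-by-comments combined with the cheap shape checks mentioned above. All names, shapes, and the toy computation are hypothetical - the point is the pattern (plan in comments, fill in flat code, assert shapes as you go), not this particular script:

```python
import numpy as np

# Plan (pseudo-code comments first, fill in after):
# 1. load a batch of embeddings        -> (batch, d_model)
# 2. project with a random linear map  -> (batch, d_probe)
# 3. average over the batch            -> (d_probe,)

rng = np.random.default_rng(0)
batch, d_model, d_probe = 32, 64, 8

embeddings = rng.normal(size=(batch, d_model))
projection = rng.normal(size=(d_model, d_probe))

projected = embeddings @ projection
# Cheap shape asserts stand in for unit tests while iterating.
assert projected.shape == (batch, d_probe), projected.shape

mean_probe = projected.mean(axis=0)
assert mean_probe.shape == (d_probe,), mean_probe.shape
```

The asserts cost one line each and catch the most common class of ML bug immediately, without any test harness.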

Again, all this advice assumes a baseline of "standard software engineering practices" - I want to help cure you of deontic commitments like never repeating yourself. But if you don't need curing in the first place, you should probably reverse this advice.

My ML Research Workflow

With these principles in mind, I'll walk through my current research workflow. My goal is to fluidly transition back and forth between a rough experimental notebook and a full experiment pipeline with tracking, sweeps, and results visualization.

Mileage on this exact setup may vary, but thus far I’ve found it strikes a great balance between flexibility and efficiency. Most significantly, I've found my "ugh field" around moving from local experimental notebook to submitting cluster jobs has been substantially reduced.
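One way a workflow like this can hang together is to route everything through a single config object, so the notebook and the cluster job share one interface. A minimal sketch, assuming a plain dataclass (the field names and the `run_experiment` helper are hypothetical; you might equally use a dict or a config library):

```python
from dataclasses import dataclass, asdict


@dataclass
class Config:
    # Everything the experiment varies lives here, so sweeps and
    # job submission only need to construct Config instances.
    model_name: str = "gpt2"
    lr: float = 3e-4
    batch_size: int = 32
    n_steps: int = 1000
    seed: int = 0


def run_experiment(cfg: Config) -> dict:
    # ...training loop would go here...
    # Returning a plain dict (config included) keeps experiment
    # tracking and results visualization trivial.
    return {"config": asdict(cfg), "final_loss": None}


# In the notebook: edit one object, rerun one cell.
cfg = Config(lr=1e-3)
result = run_experiment(cfg)
```

A sweep is then just a loop over `Config` instances, and a cluster job is just `run_experiment` called on a serialized config, which is what shrinks the gap between notebook and pipeline.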

Conclusion

So yeah, those are my tips and basic setup. Again, they apply most strongly to early stage research, and most weakly to developing large, comprehensive pieces of infrastructure (including research infrastructure like PyTorch, Hugging Face, and TransformerLens). In some sense, the core mistake is to assume that early stage research requires novel, extensive research infrastructure.[2] Developing open source infrastructure is, to a first approximation,[3] prosocial: the gains are largely borne by other users. So by all means, develop nice open-source frameworks - the world will benefit from your work. But if you have new research ideas that you're eager to try out, the best approach is often to just try them ASAP.

Related Articles / Sources of Inspiration

  1. ^

     I was initially shocked by how “messy” this GPT training script was - now I think it's the Way

  2. ^

     This meme has been propagated to a certain extent by big labs, who make the (true) point that (infrastructure produced by) research engineers dramatically accelerates research progress. But this can be true while it is also the case that, for a small research team with a limited budget, myopically pursuing results is a better bet.

  3. ^

     Reputational gains aside


