Source Control for Prototyping and Analysis

Published on September 26, 2024 1:50 AM GMT

When I'm doing exploratory work I want to run many analyses. I'm usually optimizing for getting something quick, but I want to document what I'm doing enough that if there are questions about my analysis or I later want to draw on it I can reconstruct what I did. I've taken a few approaches to this over the years, but here's how I work these days:

For each analysis I make a local directory, ~/work/YYYY-MM-DD--topic/. These contain large files I'm copying locally to work with, temporary files, and outputs. When these get too big I delete them; they're not backed up, and I can rebuild them from things that are backed up.

Code goes in a git repo, in files named like YYYY-MM-DD--topic.py. Most of my work lately has been going into an internal repo, but if there's nothing sensitive I'll use a public one. I don't bother with meaningful commit messages; the goal is just to get the deltas backed up. If I later want to run an analysis similar to an old one I duplicate the code and make a new work directory.

Code is run from the command line in the work directory, which means that in my permanent shell history every command I ran related to topic will be tagged with ~/work/YYYY-MM-DD--topic/.
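The naming convention above can be sketched in a few lines of Python. This is only an illustration, not a script from my actual setup: the function names `analysis_paths` and `start_analysis`, and the `work_root` and `repo_root` parameters, are hypothetical.

```python
from datetime import date
from pathlib import Path


def analysis_paths(topic, day=None, work_root=None, repo_root=None):
    """Return (work_dir, code_file) named YYYY-MM-DD--topic.

    Assumption: work directories live under ~/work and code files sit at
    the top level of the repo; both defaults are illustrative.
    """
    day = day or date.today()
    work_root = Path(work_root) if work_root else Path.home() / "work"
    repo_root = Path(repo_root) if repo_root else Path(".")
    stem = f"{day.isoformat()}--{topic}"
    return work_root / stem, repo_root / f"{stem}.py"


def start_analysis(topic, day=None, work_root=None, repo_root=None):
    """Create the work directory and an empty code file for a new analysis."""
    work_dir, code_file = analysis_paths(topic, day, work_root, repo_root)
    work_dir.mkdir(parents=True, exist_ok=True)
    code_file.touch(exist_ok=True)  # leaves an existing file untouched
    return work_dir, code_file
```

Calling `start_analysis("flu-chart")` from inside the repo would create a dated work directory and a matching dated .py file, so the two stay linked by name.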

For example, the code for the figures in my recent NAO blog post on flu is in 2024-09-05--flu-chart.py and 2024-09-12--rai1pct-violins.py.

This approach is optimized for writing over reading, while maintaining enough context that I can figure out what I was doing if I need to. I'll usually link the code from documents that depend on it, but even if I forget to, it's pretty fast to figure out which code it would have been from names and dates. Running git grep and histgrep gets me a lot of what other people seem to get from LLM-autocomplete, and someday I'd like to try priming an LLM with my personal history.
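As a rough illustration of why tagging history with the work directory helps, here is a sketch of looking up every command run for a topic. It is not how histgrep works: the tab-separated "timestamp, directory, command" line format and the function name `commands_for_topic` are assumptions made for the example.

```python
def commands_for_topic(history_lines, topic):
    """Return (timestamp, command) pairs run in a work directory for `topic`.

    Assumption: each history line is "timestamp<TAB>cwd<TAB>command".
    Lines that do not fit that shape are skipped.
    """
    matches = []
    for line in history_lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue
        timestamp, cwd, command = parts
        # The last path component is the YYYY-MM-DD--topic directory name.
        dirname = cwd.rstrip("/").rsplit("/", 1)[-1]
        if dirname.endswith(f"--{topic}"):
            matches.append((timestamp, command))
    return matches
```

Because the date and topic are in the directory name, a plain text search over the history is enough to recover what was run and when.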

Often something I'm doing moves from "playing around trying to understand" to "something real that my team will continue to rely on". I try to pay attention to whether I'm getting to that point and then start taking care of the code properly, in an appropriate repo with meaningful commit messages etc.

