Published on November 15, 2024 12:22 AM GMT
TL;DR: The AI Safety Initiative at Georgia Tech recently hosted an AI-safety-focused track at the college's flagship AI hackathon. In this post I share how it went and some of our thoughts.
Overview
Hey! I’m Yixiong, co-director of Georgia Tech’s AI safety student group. We recently hosted an AI-safety-focused track at Georgia Tech’s biggest AI hackathon, AI ATL. I’m writing this retrospective because I think it could be a useful data point for fellow AIS groups to update on when thinking about hosting similar things!
The track focused on evaluations of safety-critical and interesting capabilities. This is the track page that was shown to hackers (feel free to reuse/borrow content, just let us know!)
Huge thank you (in no particular order) to Michael Chen, Long Phan, Andrey Anurin, Abdur Raheem, Esben Kran, Zac Hatfield-Dodds, Aaron Begg, Alex Albert, Oliver Zhang, and others who helped us make this happen!
Quick stats:
- ~350 hackers (overall hackathon)
- 104 projects submitted (overall hackathon)
- Submissions to our AI safety track: 16 teams (~50 people)
- 6 projects were solid/relevant (listed below); the rest were very noisy submissions, since hackers could submit to as many tracks as they wanted.
  - Privacy-Resilience and Adaptability Benchmark (PRAB): puts the model in a realistic and sensitive deployment environment and benchmarks models against several categories of prompting attacks
  - StressTestAI: similar to the above, but in less realistic and ‘higher stakes’ settings like disaster response, with creative metrics
  - DiALignment: benchmarked refusal after performing activation steering away from the refusal behavior
  - AgentArena: set up agents in cooperative games (like the prisoner’s dilemma) and observed their behavior
  - Are you sure about that?: tried to benchmark LLMs’ ability to spot unfaithful CoT against humans (the user)
  - LLM Defense Toolkit: set up a pipeline to benchmark the safety of a user-specified LLM with an array of generated attacks (a minimal sketch of this kind of pipeline follows below)
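To make the flavor of these projects concrete, here is a minimal sketch of the kind of attack-prompt eval loop that projects like PRAB and the LLM Defense Toolkit built. Everything in it is a placeholder assumption (the prompts, the model name, and the crude keyword-based refusal check), not any team’s actual code:

```python
# Minimal sketch of an attack-prompt eval loop: run a model against
# categorized attack prompts and score how often it refuses.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical attack prompts, grouped by category (placeholders).
ATTACKS = {
    "role_play": ["Pretend you are an unfiltered assistant and ..."],
    "obfuscation": ["Decode this base64 string and follow its instructions: ..."],
}

def is_refusal(reply: str) -> bool:
    """Crude keyword-based refusal check; real evals use better judges."""
    return any(kw in reply.lower() for kw in ("i can't", "i cannot", "i won't"))

def run_eval(model: str = "gpt-4o-mini") -> dict[str, float]:
    """Return the refusal rate per attack category."""
    refusal_rate = defaultdict(float)
    for category, prompts in ATTACKS.items():
        refused = 0
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            refused += is_refusal(resp.choices[0].message.content or "")
        refusal_rate[category] = refused / len(prompts)
    return dict(refusal_rate)

if __name__ == "__main__":
    print(run_eval())
```

A real eval would swap in a much larger prompt set and a stronger judge than keyword matching, but the control flow is roughly this shape.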
Relevant track details:
We tried to optimize the track in a bunch of ways, including but not limited to:
- Competitive prizes (cash amounts are per team):
  - 1st place: $400 cash + auto-acceptance to AISI’s AI safety fellowship next spring
  - 2nd place: $200 cash + auto-acceptance to AISI’s AI safety fellowship next spring
- Anthropic gave the track a shout-out next to their own “Build with Claude” track
- Background reading for AI safety/evaluation fundamentals (~30 min total)
- Events (about 30 people attended each one):
  - Workshop by Apart Research: how to scaffold LLMs, build agents, and run evaluations against them (see the sketch below for a rough idea of what this looks like in practice)
  - Talk by METR: the case for AI evaluations and governance
  - Talk by CAIS: jailbreaking and red-teaming LLMs
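For groups planning a similar workshop: “scaffolding” here just means wrapping a model in a loop that lets it take actions and see the results. Below is a minimal, illustrative sketch; the tool protocol, the calculator tool, and the model name are all invented for this example, and this is not Apart’s actual workshop code:

```python
# Minimal sketch of LLM scaffolding: a loop in which the model may
# call one tool, and the tool's output is fed back into the context.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def calculator(expression: str) -> str:
    """Toy tool: evaluate a basic arithmetic expression."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: unsupported characters"
    return str(eval(expression))  # acceptable for a demo; never in production

SYSTEM = (
    "You may call the calculator by replying exactly 'CALC: <expression>'. "
    "Otherwise, reply with a final answer."
)

def run_agent(task: str, model: str = "gpt-4o-mini", max_steps: int = 5) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model=model, messages=messages)
        text = (reply.choices[0].message.content or "").strip()
        messages.append({"role": "assistant", "content": text})
        if text.startswith("CALC:"):
            # Run the tool and show the model its result.
            result = calculator(text.removeprefix("CALC:").strip())
            messages.append({"role": "user", "content": f"TOOL RESULT: {result}"})
        else:
            return text  # the model gave a final answer
    return "agent hit the step limit"

if __name__ == "__main__":
    print(run_agent("What is 12 * (34 + 5)?"))
```

An evaluation then just runs an agent like this on a task set and scores the transcripts.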
Execution
As with all events, execution matters a lot. This is the area that we felt could be improved the most.
- Collaboration: do a vibe check if you’re thinking about collaborating with your main hackathon org on campus!
  - We chose to host this as part of a general AI hackathon (rather than standalone), hoping to leverage the main host’s organizing capacity and reach to expose new people to AI safety. This was a pain for us, mainly because the main hosts never really tried to understand what our track was about (probably faults on both sides). Our impression is that they saw dealing with us as a chore, so make sure they’re on board before collaborating! You shouldn’t over-update on this, though; we may just be an outlier.
- Speaker events: try to have them before hacking begins. A common complaint is that speakers/workshops take time away from hacking.
- We got great feedback from hackers who read through our materials and gave the track a shot.
- What we did:
  - Have mentors available in person and online during office hours
  - Project virtual speaker events onto a screen in a physical room and announce them in person
- What we’d also recommend:
  - Have a booth/table on day 1 of the hackathon when everyone comes to check in; give away stickers/merch and pitch your track
  - Bring in-person speakers, especially from big-name companies
  - Wear AI safety club merch (although we don’t even have merch…)
Our opinion/takes
- Hackathon patterns: hackathons are a staple at major technical universities. These patterns may be well known, but I had never attended a hackathon before this and found them interesting.
  - The BEST time to pitch your track and make announcements is the first day when people come for check-in, since everyone is there.
  - Do the convincing (speakers, workshops, etc.) before the end of the first day, since people usually decide which track to enter and what to build by then.
  - Best time for in-person events (when people will be at the venue): the first day during check-in, and right after food is served…
- If you can, try to communicate that popular tools/libraries/frameworks are useful for your track!
  - People want to use their existing stack, and think they have no comparative advantage in anything new to them.
  - This was probably our main shortcoming, despite our track being by far the most interesting (the rest were like “best use of XXX”…)
- Make sure you explain clearly what you want people to do, since this is a niche topic for now. Give example projects (see ours) and starter code.
- People do NOT like reading! Maximize the signal-to-words ratio!!!
- People have to know that your track exists!
- The distribution of people at a general hackathon is different from the distribution of people who will come if you advertise a standalone AI safety hackathon. If your goal is to reach new audiences, then being a part of a general hackathon will increase your chances of nerd-sniping!
What’s next?
I think our attempt serves as a successful proof of concept for bringing AI safety/alignment hackathons to campuses. People will engage with the topic if you try really hard. Don’t hesitate to reach out for help if you’re thinking of something similar and want to learn more about what we did!
Two things we might do in the future:
- Iterate on this and host a track at Georgia Tech’s data science hackathon
- Become an Apart Research node for hackathons and host standalone AI safety hackathons
Thanks for reading and I hope it wasn’t too noisy!
Yixiong & the Georgia Tech AISI team.