Roman Lutz
Responsible AI engineer
Open source maintainer
As a Responsible AI Engineer at Microsoft, I work on the AI Red Team to identify safety and security vulnerabilities in generative AI systems. A huge part of this is building and maintaining our open source AI Red Teaming toolkit, PyRIT. I am also a maintainer of the Fairlearn project.
November 2023 - present Remote (WA)
Responsible AI Engineer on the AI Red Team at Microsoft
May 2024: One of the highlights from my perspective was going to the Microsoft //Build conference in May to talk to customers about PyRIT. After just about eight (!) years at Microsoft this was my first //Build conference. My colleagues Tori Westerhoff and Pete Bryan did an amazing job talking about the work of the AI Red Team in their session.
February 2024: We released PyRIT! Since then, we have been expanding its capabilities to allow for probing multimodal generative AI systems (rather than just text-based ones). Another focus area has been state-of-the-art attack techniques. This space moves pretty fast, but we have added (or are in the process of adding) PAIR, Tree of Attacks with Pruning (TAP), GCG, Crescendo, Skeleton Key, and several others. Some of these are our own contributions, while others came about through collaborations or contributions facilitated by the open source repository.
November 2023: I have joined the AI Red Team. See these articles for some background [1], [2], and [3].
December 2021 - November 2023 Remote (MA)
Responsible AI Engineer on the Azure AI team at Microsoft
October 2023: Our new paper is on ArXiv! Titled A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications, it talks about some of the ways we have been evaluating LLMs. This was a joint effort of many teams at Microsoft and Microsoft Research. I am particularly happy with the emphasis on input from domain experts. This is merely a tool to help speed up evaluations, but the actual decisions about mitigations and whether a system is deployed remain (and should remain) with humans.
August 2023: The new Fairlearn paper is now in the Journal of Machine Learning Research (Open Source Software section)! It captures our change from being a project under Microsoft governance to being a true open source project with open governance. As of today, half the maintainers are employed by Microsoft (including myself). Also, the focus of the project has shifted significantly since the original whitepaper. Back then, the Python toolkit was the main focus, whereas now the educational materials are being prioritized. This shift acknowledges the sociotechnical nature of fairness.
December 2021: We released the Responsible AI dashboard. As one of the key contributors on the engineering side, I am really proud of this milestone. Of course, this is only where it really starts, as we can now iterate on the first version. Make sure to try it and leave some feedback! The functionality is better captured by the blog and website, but something not mentioned there that I am really excited about is that we pulled this off in the open on GitHub. That means anyone can see what goes into this, ask for features, or even contribute bugfixes. Doing impactful work is awesome, but seeing the recognition across the entire company takes this to a whole different level. For example, I have seen tweets about this by Microsoft CTO Kevin Scott and Chief Scientific Officer Eric Horvitz.
September 2021 - November 2021 Tübingen, Germany
Graduate student at the Max Planck Institute for Intelligent Systems
July 2017 - August 2021 Cambridge, MA (until 2020), Bellevue, WA (2020-2021)
Responsible AI Engineer on the Azure ML team at Microsoft
November 2019: For a little while now I have been working on Responsible AI at Microsoft. Now that Sarah Bird has announced our tools at Ignite, I can finally point to them publicly. A lot of my time over the past months went into Fairlearn, our open source toolkit for fairness assessment and unfairness mitigation. We just released v0.3.0, so there is a lot more to come in the next few months. I will be in Vancouver for NeurIPS in December to demo our tools around fairness and interpretability. Talk to me if you will be there!
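To give a flavor of what fairness assessment with Fairlearn looks like, here is a minimal sketch on synthetic data. Note that the MetricFrame API shown below comes from a later Fairlearn release than the v0.3.0 mentioned above, and the feature and group names are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # synthetic features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # synthetic labels
sex = rng.choice(["female", "male"], size=200)         # illustrative sensitive feature

clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)

# Disaggregate accuracy by group to surface performance disparities.
mf = MetricFrame(metrics=accuracy_score, y_true=y, y_pred=y_pred,
                 sensitive_features=sex)
print(mf.overall)       # accuracy on the whole dataset
print(mf.by_group)      # accuracy per group
print(mf.difference())  # largest gap between groups
```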
November 2019: Since fairness is tricky to get right, we have been meeting bi-weekly as a Responsible AI reading group. Today I had the honor of leading the discussion about "Fairness and Abstraction in Sociotechnical Systems" by Andrew D. Selbst, danah boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi. I highly encourage everyone to read this paper to avoid the abstraction traps it describes when building machine learning systems. Maybe this should be part of a mandatory checklist before releasing models... If you are interested in my slides, you may download them here.
March 2018: Participating in MIT's Breaking the Mold Hackathon for Inclusion was truly a blessing. With so many truly difficult problems to tackle, it is fantastic to see all the ideas people came up with. Big shoutout to MIT for organizing this, Microsoft for the venue (and encouraging me to go!), and Amazon for sending two inspiring mentors for my team all the way from Seattle! Thanks also to my team for fostering an environment where everybody could express their ideas. I learned a ton from all of you, and winning 3rd prize tops it all off. I hope everybody takes some time to think about machine learning bias. With ML becoming increasingly prevalent, it is more important than ever to take bias into account.
July 2017: After a year on the Office team, I moved to the Azure Machine Learning team. We are building the infrastructure and services from scratch using lots of open source (e.g., Kubernetes, Linux, .NET Core).
June 2016 - June 2017 Cambridge, MA
Software Engineer at Microsoft Office (Docs)
May 2017: I participated in the Hacking Bias in ML workshop at Microsoft's New England Research and Development Center (which happens to be my office, too). My group looked specifically at gender bias in text through word embeddings. We found plenty of evidence of gender bias, e.g., some words are used more often in connection with men ("smart") and some more often with women ("lovely"). You can play around with the tool resulting from the workshop here.
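As a rough illustration of the kind of probing we did, the sketch below compares how close a few words sit to gendered anchor words in a pretrained embedding space. This is not the actual workshop tool; the model choice, anchor words, and word list are illustrative assumptions.

```python
import gensim.downloader as api

# Load pretrained GloVe vectors (illustrative choice of model).
model = api.load("glove-wiki-gigaword-100")

words = ["smart", "lovely", "brilliant", "nurse", "engineer"]
for word in words:
    # Compare similarity to stereotypically gendered anchor words.
    bias = model.similarity(word, "he") - model.similarity(word, "she")
    direction = "male-leaning" if bias > 0 else "female-leaning"
    print(f"{word}: {bias:+.3f} ({direction})")
```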
June 2016: I have joined the Docs team within Microsoft Office. We enhance collaboration capabilities through the Share feature in all Office apps.
Fall 2015 - Spring 2016 Amherst, MA
MS in Computer Science, Focus on Distributed Systems and ML
I had the pleasure of working as a Research Assistant in the Computer Networking Lab with Professor Don Towsley and Professor Antonio Rocha on the Simulation of Cache Networks. The results remain to be published, so I will write about them if that happens. Separately, I conducted experiments with cache networks for a graduate seminar on distributed systems. You can download my project report here.
January 2016: There are several NSF-funded Future Internet Architecture research projects in the US. Their focus is mostly on improving scalability and efficiency. I am interested in how the different approaches affect (or do not affect) the privacy of users in comparison to the current Internet. My main focus was the feasibility of censorship circumvention. As an example, I picked Content-oriented Networking. See the full paper at arxiv.org/abs/1601.01278.
January 2016: Based on NFL game data, we try to predict the outcome of a play in multiple ways, including Decision and Classification Trees, Nearest Neighbors, Naive Bayes, Linear Discriminant Analysis, Support Vector Machines and Regression, and Artificial Neural Networks. An application of this is the following: by plugging in various play options, one could determine the best play for a given situation in real time. While the outcome of a play can be described in many ways, we had the most promising results with a newly defined measure that we call "progress". We see this work as a first step toward including predictive analysis in NFL playcalling. See the full paper at arxiv.org/abs/1601.00574; in collaboration with Brendan Teich and Valentin Kassarnig.
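The sketch below shows the general shape of such a model comparison with scikit-learn. The data here is synthetic stand-in data; the real play-by-play features and the binarized "progress" target from the paper are not reproduced.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))                  # stand-in for play features (down, distance, ...)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for a binarized "progress" outcome

models = {
    "decision tree": DecisionTreeClassifier(),
    "nearest neighbors": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(),
    "neural network": MLPClassifier(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} mean accuracy")
```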
June 2015 - August 2015 Eschborn, Germany
Systems Engineering Intern
Fall 2014 - Spring 2015 Amherst, MA
Graduate Exchange Student
I took part in the Baden-Württemberg Exchange between the University of Ulm and the University of Massachusetts Amherst.
May 2015: New paper! The ubiquity of professional sports, and specifically the NFL, has led to an increase in the popularity of Fantasy Football. Users have many tools at their disposal: statistics, predictions, expert rankings, and even recommendations from peers. There are issues with all of these, though. Especially since many people pay money to play, prediction tools should be enhanced, as they provide unbiased and easy-to-use assistance for users. This paper provides and discusses approaches to predict the Fantasy Football scores of quarterbacks with relatively limited data. See the full paper at arxiv.org/abs/1505.06918.
Fall 2011 - Summer 2014 Ulm, Germany
University of Ulm
After a year of studying in the Mathematics Bachelor's program with a minor in Computer Science, I decided to swap my major and minor. I still graduated with a BSc in Computer Science with honors. The courses covered basics in Systems, AI, and Theory.
The goal of my Bachelor's thesis was to implement the Adaptive Large Neighborhood Search (ALNS) heuristic and possibly come up with improvements. ALNS was first described by S. Ropke and D. Pisinger and is based on P. Shaw's Large Neighborhood Search. The idea is that some problems are difficult to solve with basic local search algorithms because of a tightly constrained search space: small changes to a solution will rarely bring improvements. As a consequence, LNS and ALNS change larger parts of the solution based on different heuristics. For this thesis, I was awarded the innoWake Award 2015. innoWake was a software modernization company based in Austin, TX, with a number of branch offices including one in Germany. They have since been acquired by Deloitte.
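For readers unfamiliar with ALNS, here is a minimal sketch of the destroy-and-repair loop with adaptive operator weights, on a toy permutation problem. The destroy and repair heuristics, the objective, and all parameters are simplifying assumptions, not the thesis implementation.

```python
import random

def cost(perm):
    # Toy objective: number of elements not at their "home" index.
    return sum(1 for i, v in enumerate(perm) if v != i)

def destroy_random(perm, k=3):
    """Remove k random elements (the 'destroy' step)."""
    idx = set(random.sample(range(len(perm)), k))
    kept = [v for i, v in enumerate(perm) if i not in idx]
    removed = [perm[i] for i in sorted(idx)]
    return kept, removed

def repair_greedy(kept, removed):
    """Reinsert each removed element at the position minimizing cost."""
    sol = list(kept)
    for v in removed:
        best_pos = min(range(len(sol) + 1),
                       key=lambda p: cost(sol[:p] + [v] + sol[p:]))
        sol.insert(best_pos, v)
    return sol

def repair_random(kept, removed):
    """Reinsert removed elements at random positions."""
    sol = list(kept)
    for v in removed:
        sol.insert(random.randrange(len(sol) + 1), v)
    return sol

def alns(n=20, iterations=500):
    current = list(range(n))
    random.shuffle(current)
    best = list(current)
    repairs = [repair_greedy, repair_random]
    weights = [1.0, 1.0]                       # adaptive weights per repair heuristic
    for _ in range(iterations):
        r = random.choices(range(len(repairs)), weights=weights)[0]
        kept, removed = destroy_random(current)
        candidate = repairs[r](kept, removed)
        if cost(candidate) < cost(current):    # simple improve-only acceptance
            current = candidate
            weights[r] += 0.1                  # reward the successful heuristic
        if cost(current) < cost(best):
            best = list(current)
        weights = [0.99 * w for w in weights]  # decay so the weights stay adaptive
    return best, cost(best)

print(alns())
```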
As a teaching assistant for Prof. Jacobo Toran, Gunnar Völkel, and Dominikus Krüger, I explained the solutions to the weekly assignments to a group of 20 students whose work I also graded. In addition, I often gave a review of the material presented in class. It made me very happy to see attendance remain consistently high throughout the semester and, especially, to receive positive feedback at the end of the course.
We can observe many kinds of behavior in animals, bacteria, and other organisms in nature where an adaptation to the specific environment has taken place due to evolution. In a way, an optimization process has taken place. This idea is the basis for so-called nature-inspired metaheuristics. The Artificial Bee Colony (ABC) metaheuristic by D. Karaboga is one such nature-inspired metaheuristic. It projects the foraging behavior of bees onto an algorithm in order to solve optimization problems.
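The following is a small sketch of the ABC idea under simplifying assumptions (a toy sphere objective, fixed parameters, employed/onlooker/scout phases only in their basic form); it is not a faithful reproduction of Karaboga's reference implementation.

```python
import random

def objective(x):
    # Toy minimization target: the sphere function.
    return sum(v * v for v in x)

def neighbor(x, population):
    """Move one coordinate toward (or away from) a random other food source."""
    other = random.choice(population)
    j = random.randrange(len(x))
    y = list(x)
    y[j] = x[j] + random.uniform(-1, 1) * (x[j] - other[j])
    return y

def abc(dim=5, n_sources=10, limit=20, iterations=200):
    sources = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_sources)]
    trials = [0] * n_sources
    best = min(sources, key=objective)
    for _ in range(iterations):
        # Employed bees: local search around every food source.
        for i in range(n_sources):
            cand = neighbor(sources[i], sources)
            if objective(cand) < objective(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Onlooker bees: prefer sources with better fitness.
        fitness = [1.0 / (1.0 + objective(s)) for s in sources]
        for _ in range(n_sources):
            i = random.choices(range(n_sources), weights=fitness)[0]
            cand = neighbor(sources[i], sources)
            if objective(cand) < objective(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Scout bees: abandon exhausted sources and explore randomly.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = [random.uniform(-5, 5) for _ in range(dim)]
                trials[i] = 0
        best = min([best] + sources, key=objective)
    return best, objective(best)

print(abc())
```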
Under the guidance of Christian Spann, I read up on different ways to implement concurrent programs in Java, from Threads, Runnables, and Executors to thread-safe versions of data structures. Finally, I presented the different approaches and techniques in a seminar talk.
Map of the US states I have visited
green = visited, light green = transit only, peach = not visited