This talk covers evals and benchmarks for compound and complex multi-agent systems, which are becoming increasingly popular. It discusses the importance of evaluating early and continuously. It covers the technical requirements as well as the non-technical requirement of making sure SMEs are heavily involved in developing the evals and benchmarks.