As part of the User Interface Software course in the Human-Computer Interaction Institute at Carnegie Mellon University, the students and instructor developed a set of seven benchmark tasks. These benchmarks are designed to be representative of a wide range of user interface styles, and they have been implemented in about 20 different toolkits on a variety of platforms and in a variety of programming languages. The students’ written reports suggest that implementing the benchmarks teaches them the strengths and weaknesses of the various systems. Implementations of the benchmarks by a number of more experienced toolkit users and developers suggest that the benchmarks also do a good job of identifying how effective each toolkit is for different kinds of user interfaces, and of pointing out deficiencies in the toolkits. Thus, benchmarks appear to be very useful both for teaching and for evaluating user interface development environments.