Data Collection and Technology Resources

Photo of Ambika Mathur presenting the Data Collection Panel

Moderator: Ambika Mathur, Wayne State University
Presenters*: Patrick Brandt, University of North Carolina Chapel Hill; Cynthia Fuhrmann, University of Massachusetts Medical School; Stephanie Watts, Michigan State University

*Three of the panelists were unable to attend the workshop due to the hurricane. This section, therefore, is limited to the report of Ambika Mathur’s presentation focusing on career outcomes data. You can view the other presentations—which describe other types of data, uses and dissemination of data, and a unique BEST Action Inventory tool developed by MSU—by downloading the slides.

Website content compiled and edited by Laura Daniel, Carol Rouzer, Ambika Mathur, Roger Chalkley, and the workshop panelists

Importance of data collection

The BEST grant awards carry two key data requirements: each site must collect, report, and evaluate its own data, and each site must participate in cross-site reporting and evaluation of trainee program participation, progress, and outcomes. The individual and cross-site data plans allow us to measure the success of a wide range of programs and strategies within and across the BEST schools. From there, we can use the results to build faculty buy-in, inform training grant needs, and develop national policies that could ultimately influence graduate and postdoctoral training across the nation. However, it is vital that the data obtained be available to all interested parties: if national and international career outcomes data cannot be mined efficiently, compared, or evaluated, the potential impact of such data collection efforts is drastically reduced.

Barriers to data collection

To generate the most comprehensive reports, data must be collected from the point of the student’s original application through his or her career outcome. It must be collected uniformly and completely in every school, department, and program of the institution. However, many universities face challenges in tracking and reporting the career outcomes of their graduate and postdoctoral alumni. One significant institutional barrier is that data are often deposited in departmental silos, which prevents convenient access by other departments. In addition, these data vary in completeness and cleanliness. For example, one might find a student’s prior academic history in the admissions office, professional development statistics in the departmental office, academic performance in the registrar’s office, research productivity in the research office, financial information in human resources, and career outcomes in the alumni office. The last of these may be a particular problem, as graduate school alumni are frequently not followed by the institution’s alumni office. Not only does this reduce the ability to create a comprehensive database of career outcomes, it can also profoundly impact the ability to support diversity and inclusion.
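The silo problem can be pictured with a minimal Python sketch. It assumes each office can export its records keyed by a shared student identifier; the office names follow the paragraph above, while the field names, identifier, and values are hypothetical and do not describe any institution's actual system.

```python
from collections import defaultdict

# Hypothetical per-office extracts. Field names and values are invented
# for illustration; only the office names come from the text.
sources = {
    "admissions": [{"student_id": "S001", "prior_degree": "BS Biology"}],
    "registrar": [{"student_id": "S001", "gpa": 3.7}],
    "alumni_office": [],  # graduate alumni are often not followed here
}

def merge_by_student(sources):
    """Combine records from several offices into one record per student."""
    merged = defaultdict(dict)
    for office, records in sources.items():
        for record in records:
            student_id = record["student_id"]
            for field, value in record.items():
                if field != "student_id":
                    # Prefix each field with its office so the origin stays visible.
                    merged[student_id][f"{office}.{field}"] = value
    return dict(merged)

def missing_sources(merged, sources):
    """For each student, list the offices that hold no record for them."""
    gaps = {}
    for student_id, fields in merged.items():
        present = {key.split(".", 1)[0] for key in fields}
        gaps[student_id] = sorted(set(sources) - present)
    return gaps

merged = merge_by_student(sources)
gaps = missing_sources(merged, sources)
```

In this sketch, `gaps` flags that student `S001` has no alumni office record, which is the kind of incompleteness the paragraph describes.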

Another barrier to data collection is that institutions and organizations struggle to create an intuitive, comprehensive, and replicable career taxonomy that succinctly and unambiguously describes the career outcomes of their alumni. A holistic approach must be taken to surmount these barriers; this requires a broad-based student-centered data system.

Student-centered data collection system

Curating the data

Information about former graduate students can be obtained from:

  • Former mentors
  • Graduate programs
  • Graduate School records
  • Alumni Relations Offices
  • Other central institutional databases
  • Cyber-sleuthing of social media accounts (LinkedIn, Google, etc.)
  • NSF’s survey of earned doctorates
  • Direct outreach by staff

For example, Wayne State University took 18 months to set up a system that curated data across all graduate programs. It used a combination of all of the above methods, which required the following staffing: one associate dean within the graduate school (0.25 FTE), one associate dean (1.0 FTE), and three undergraduate work-study students or student assistants.

The effort was launched initially using only institutional resources prior to BEST funding, and then BEST grant funding was used to partially fund the staffing and licenses required to establish a central student-lifecycle database system. Once the system was established, it became automated. Aggregated reports on student performance are now easy to generate using the database. This has improved analytics and is currently used to assist in recruitment, program assessment, funding allocation, professional development, and new program design.

The effort to make data complete and uniform should not stop at the institutional level, which would essentially make each institution its own silo. Creating a nationwide data-collection protocol is the next step. This will allow for comparisons between institutions to determine which programs work best for each type of trainee or institution. It would also make it possible to track the progress of a trainee as they move between institutions (graduate student, postdoc, and eventually staff). At present, however, institutions do not all hold the same data in the same format, so comparisons are very difficult.

BEST Practices for data collection

In order to compare outcomes among institutions and to rationalize aggregate data, a single standard taxonomy and methodology for data curation and analysis are needed. Many groups have attempted to address this problem, including BEST, the Group on Graduate Research, Education, and Training (GREAT), Future of Biomedical Graduate and Postdoctoral Training (FOBGAPT), Rescuing Biomedical Research (RBR), and the Council of Graduate Schools (CGS). In Spring 2017, the BEST Consortium formed a working group to design a taxonomy of career outcomes; the goal is to establish consistent definitions for employment sectors, career types, and job functions that will work for all biomedical trainees regardless of institution or ultimate career path. The career categories from my individual development plan (myIDP), by Science Careers, were used as a starting point for revisions, refinement, and additions. The resulting taxonomy was subsequently incorporated into another collaborative effort led by RBR, which included representatives from BEST, the Association of American Universities (AAU), the Association of American Medical Colleges (AAMC), the National Institutes of Health (NIH), and academic institutions external to the BEST Consortium. The taxonomy has now been devised and is being used by several programs and institutions.
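As a rough illustration of what consistent definitions across three tiers could look like in a database, the Python sketch below models one alumni record. The job functions listed are a subset of those named elsewhere in this report; the class name, field names, and the sector and career type values are hypothetical placeholders, not the official taxonomy definitions.

```python
from dataclasses import dataclass

# A few job functions named in this report; the full taxonomy contains more.
JOB_FUNCTIONS = {
    "Adjunct Teaching",
    "Clinical Service",
    "Entrepreneurship",
    "Regulatory Affairs",
    "Sales and Marketing",
    "Science Writing and Communication",
}

@dataclass(frozen=True)
class CareerOutcome:
    sector: str        # tier 1: employment sector (placeholder values)
    career_type: str   # tier 2: career type (placeholder values)
    job_function: str  # tier 3: must be one of JOB_FUNCTIONS

    def __post_init__(self):
        # Reject free-text entries so every institution records the same labels.
        if self.job_function not in JOB_FUNCTIONS:
            raise ValueError(f"unknown job function: {self.job_function!r}")

# Placeholder sector and career type, with a job function from the report.
outcome = CareerOutcome("For-profit", "Science-related", "Regulatory Affairs")
```

Validating against a fixed set of labels is one simple way to keep records comparable across institutions; a real system would likely validate the first two tiers in the same way.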

How to use the taxonomy data

Adoption of this taxonomy will help standardize required career classifications for training grant tracking and improve internal administrative alumni tracking. This universal taxonomy will also permit clear public representation of data, empowering prospective graduate students and postdoctoral candidates to easily compare the longitudinal career outcomes between institutions and consider that information in their decision-making process.

The employment that many trainees ultimately obtain requires job skills that they may not realize are important. Institutions can use these data to make more informed decisions about which job skills trainees need and to prepare them accordingly.

Wayne State University’s use of the taxonomy

Wayne State University surveyed all 950 biomedical doctoral alumni who graduated in the 15-year period from 1999 to 2014; of those, 874 (92%) responded. The three-tier career taxonomy for those who responded is reported below. The report does not include any Job Function in which 1% or less of the alumni are employed. These job functions are: Adjunct Teaching; Clinical Service; Data Science, Analytics, and Software Engineering; Deceased/Retired; Entrepreneurship; Faculty Member – Nontenure Track; Function that is Not Directly Related to Science; Intellectual Property and Law; Training (Other); Regulatory Affairs; Sales and Marketing; Science Education and Outreach; Science Policy and Government Affairs; and Science Writing and Communication.
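The arithmetic behind these figures, and the 1% reporting cutoff, can be sketched in a few lines of Python. The survey totals come from the paragraph above; the function name, the placeholder job function labels, and their counts are invented for illustration and are not Wayne State's results.

```python
SURVEYED = 950   # alumni surveyed, from the report
RESPONDED = 874  # alumni who responded, from the report

response_rate = RESPONDED / SURVEYED  # 874 / 950, about 0.92

def reportable(counts, total, threshold=0.01):
    """Keep job functions whose share of respondents exceeds the threshold."""
    return {job: n for job, n in counts.items() if n / total > threshold}

# Invented counts for illustration only.
example_counts = {"Function A": 400, "Function B": 9, "Function C": 8}
kept = reportable(example_counts, RESPONDED)
```

With 874 respondents, the cutoff falls between 8 and 9 alumni: a job function held by 9 respondents (about 1.03%) would be reported, while one held by 8 (about 0.92%) would not.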

Moving forward with the taxonomy

The BEST Consortium anticipates that the classification may need to be expanded for graduate programs or careers that fall outside of STEM fields. This will require adding job functions to the taxonomy; however, most of the existing taxonomy would still be applicable. It will be necessary to continue a national conversation about the taxonomy, recognizing that the changing economy will necessitate revisions to the Job Function categories to incorporate burgeoning fields over time.

As the system is adopted, it is important that we decide how long to track trainees. In general, a consensus appears to be forming around 15 years past graduation; NIH T32 training grants and RBR have both required this time frame.

Many individuals, groups, and institutions have laid the groundwork for this taxonomy, several of whom also participated in its refinement. The Job Functions are adapted from the Science Careers myIDP career categories; the “Workforce” and “Career Type” categories are related to work published by Wayne State University in Feig, et al. (2016), Using Longitudinal Data on Career Outcomes to Promote Improvements and Diversity in Graduate Education, Change: The Magazine of Higher Learning, 48:6, 42–49, and by the University of California San Francisco in Silva, et al. (2016), Tracking Career Outcomes for Postdoctoral Scholars: A Call to Action, PLoS Biology, 14(5), e1002458.
