Reflections on RStudio Instructor Training

tl;dr You should definitely look into it.

Photo by oatawa on iStock

A while back I wrote about how we need more data science training as grad students in psychology and that one of the best ways for us to get this training is to learn from each other. This is just one of many reasons why I’m so humbled and excited to have recently become an RStudio Certified Instructor in the tidyverse.

I’m looking forward to implementing and sharing what I learned with my fellow grad students, especially at the end of this month when I will be leading an introductory R workshop for the new cohort of first year PhD and master’s students in my department.

As a small way of paying it forward, I wanted to offer a reflection on what I think makes this training so unique and worthwhile along with a summary of what’s involved and some resources for those who might be interested in knowing more.

The what

The best place to start is to read about the RStudio Instructor Training and Certification Program here. The RStudio Education Blog also has lots of helpful posts about the program.

The process boils down to three steps.

Training Course

The first step is to sign up for the training course, which will likely be held over Zoom and chunked into 3-4 hour segments across 2-3 days. You can access the training materials here.

The training course is an interactive introduction to applying evidence-based teaching methods to data science education, such as learner personas, concept maps, and formative assessment (i.e. short, diagnostic questions or exercises to figure out if learners are forming accurate mental models). The course also covers how to design teaching materials with reverse instructional design that takes into account cognitive load, multiple learning strategies, and issues of inclusivity and student motivation in the classroom.

Technical Exam

The technical exam assesses your proficiency in the topic for which you are seeking certification. Currently there are options to become a certified instructor in tidyverse and shiny, and each has its own accompanying technical exam.

I took the tidyverse exam, which, broadly speaking, consists of a series of live coding challenges related to using core tidyverse packages for data cleaning and wrangling, data visualization, string manipulation, functional programming, basic statistical modeling, and creating reproducible documents with R Markdown.

As many others have suggested, a great way to prepare for this exam is to work through the exercises in R for Data Science, particularly for topics that feel rusty to you, and review the community-contributed solutions.

I highly recommend going through these sample exams from the RStudio Education Blog start to finish to get a sense of what you might need to review.

Teaching Exam

The first certification exam assesses pedagogical skills related to teaching data science with R. It requires giving a 15-minute demonstration lesson on a topic of your choice, followed by a series of applied questions. These will likely involve creating formative assessments on unseen material (e.g. multiple choice questions and fill-in-the-blank coding exercises), developing concept maps on data science topics, and giving feedback on example teaching based on pedagogical theory.

There are also sample teaching exams available.

Demonstration Lesson: Example

Column-wise operations with dplyr: Old and New

If you’d like to see an example of a demonstration lesson, below are the materials I created for this portion of the teaching exam. I used penguins from the {palmerpenguins} package as an example data set. (why? because penguins 🐧 🐧 🐧).

You can find all of the materials for this lesson and the accompanying code on GitHub. Feel free to share, adapt and re-use for your own teaching.


I made heavy use of Yihui Xie’s {xaringan} 📦, Garrick Aden-Buie’s {xaringanExtra} 📦, and Kelly Bodwin’s {flair} 📦, along with Allison Horst’s unbeatable artwork. For an excellent {xaringan} tutorial, I recommend you check out these slides, from the R Markdown whisperer herself, Alison Hill. Note: you absolutely do not have to use {xaringan} to make your slides, and if your lesson includes more images than code, another method for delivering your slides might be better.

Full slides available here.
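For readers who haven’t met the topic before, here is a minimal sketch of what “old” and “new” column-wise operations can look like. It is an illustration rather than an excerpt from the slides, and it assumes dplyr 1.0.0 or later (for across()) plus the {palmerpenguins} package; the object names old_way and new_way are made up for this example.

```r
library(dplyr)
library(palmerpenguins)

# "Old" style: a scoped verb. summarise_if() applies mean() to every
# column for which is.numeric() returns TRUE, within each species.
old_way <- penguins %>%
  group_by(species) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

# "New" style: across() inside a regular summarise() call.
# where(is.numeric) selects the same columns; the lambda drops NAs.
new_way <- penguins %>%
  group_by(species) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

new_way
```

Both versions should give one row per species with the mean of each numeric column. The across() form is the one the dplyr documentation now recommends, since it works inside ordinary verbs instead of requiring a separate _if, _at, or _all variant.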

❖ ❖ ❖

Concept Map

For other community-contributed data science concept maps you can use in your teaching and/or lesson prep, see here.

❖ ❖ ❖

Learner Persona

For a list of other example learner personas, see here.

❖ ❖ ❖

Formative Assessment

I created these interactive exercises using the learnr package, which I highly recommend you check out. It’s quite powerful and versatile.

Here’s a quick look.

The why

Ok, this might all seem like quite a bit of time and effort. Why go to the trouble of doing this training? In a word, Greg Wilson.

Greg, who co-founded Software Carpentry, has over 35 years of experience in education in data science and software engineering, and it shows. He is now part of the RStudio Education team, where he runs the instructor training and certification program. One of the reasons this program stands out is that it benefits from Greg’s unique expertise and careful curation of decades of research on evidence-based teaching methods that he has translated into clear and actionable advice. I can guarantee that you will learn a LOT from him.

Greg’s most important advice for teaching, in my opinion:

“Be kind: all else is details.”

Now, here are some other reasons why you should do this training…

Surge in online teaching

Interest in data science education seems to be ever-increasing. The fact that COVID-19 has forced most education to go online might actually present an opportunity to meet this demand in a more scalable and (hopefully) more accessible way that doesn’t incur the traditional limitations of travel costs or room capacity. Of course, online education comes with a host of inherent challenges. The training course includes a whole section on this. I also recommend you check out this RStudio webinar and accompanying blog post along with answers to some frequently asked questions about teaching online.

As online data science education is becoming increasingly the norm, it seems natural to assume that there will be a need for more certified instructors to meet the growing demand.

❖ ❖ ❖

Teaching resources galore

Another great reason to become a certified instructor is that, as a data science educator, you have a huge and ever-increasing bank of resources at your disposal. What’s more, as a certified instructor, you are eligible for free licenses to RStudio Pro products and a significant discount for RStudio Cloud. Here are just some of the great teaching tools from RStudio and the #rstats community.

The RStudio Education Blog is a 💎 TREASURE TROVE 💎 of resources. Add it to your bookmarks immediately.

Teaching with RStudio Cloud

Interactive lessons with learnr

Openly licensed teaching materials


❖ ❖ ❖

Tried and tested

The instructor training program started back in February 2019 and as of August 2020 there are almost 150 certified tidyverse instructors and 20 shiny instructors. This means that the program has gone through multiple iterations and has made data-driven improvements based on feedback from participants – especially in the realm of supporting online teaching in the aftermath of COVID-19. So you can rest assured that, while it is still a relatively new program, all the kinks have been worked out.

Plus, I’m sure that the content and structure of the training will continue to adapt to the needs and priorities of the community, and you might even be lucky enough to catch a special guest presentation. For example…

❖ ❖ ❖

A focus on inclusivity

The focus of this training is not technical competency – it’s how to be an effective teacher. One of the most critical components of teaching effectively is to be inclusive of all learners, regardless of race, religion, sexual orientation, gender identity, disability, etc.

The #rstats learning community is known for being welcoming and inclusive, so it’s no surprise that the training course emphasizes these values as well. What I appreciate most about this aspect of the training is that it will challenge you to think about questions and hypothetical scenarios to which there are no easy answers.

However, it is extremely important to be pushed out of your comfort zone to consciously and proactively reflect on how you will confront issues such as systemic racism and institutionalized violence against BIPOC communities, sexism and a deeply ingrained culture of sexual harassment. These issues will inevitably arise in one form or another in your classroom or teaching setting, and it’s absolutely necessary that we confront these challenges now more than ever. Check out this slide and this talk on effective allyship when you get a chance.

For further reading on inclusivity and social justice in data science education, I recommend you read this post by Nicole Thompson Gonzalez and this one by Yim Register. Also check out the amazing work that JooYoung Seo, the first blind RStudio Certified Instructor, has been doing to make data science tools more accessible.

Another exciting feature is that the training materials are now available in Spanish, courtesy of Laura Acion, and hopefully other languages soon, as interest in the training seems to be growing around the world. A similar ongoing project to check out is glosario, an open source glossary of data science terms translated into multiple languages that can be used for teaching (read more here).

❖ ❖ ❖

Community of practice

To extend the idea of including everyone who wants to learn data science, we must be active in building teaching communities that extend beyond just the walls of academic institutions.

RStudio’s mission is to

“equip everyone, regardless of means, to participate in a global economy that rewards data literacy.”

A more concrete goal, put forth by Carl Howe, Director of Education at RStudio, is to train the next million R users. In becoming an RStudio certified instructor, you can better position yourself to actively participate in reaching this goal. But it’s worth reflecting on the fact that teaching and learning don’t happen in a vacuum – this is where the idea of community comes in. In my opinion, the fact that R users around the world already have a strong sense of community will make it that much easier to welcome new learners into the fold and make it more likely that they themselves will start to train others one day.

Read more about building a community of practice here.

P.S. If you need yet another reason to do this training, you get a fancy certificate at the end. ✨

Get in touch

Please feel free to reach out if you are thinking of participating in the training yourself and want to hear more from someone who’s gone through it recently. I would be glad to chat any time!

Brendan Cullen
Doctoral Student | NSF GRFP Fellow

Psychology PhD student and aspiring data scientist studying precision medicine approaches to health behavior change.