Data Science Curriculum Roadmap
We venture to suggest a curriculum roadmap after receiving multiple requests for one from academic partners. As a group, we have spent the vast majority of our time in industry, although many of us have also spent time in one academic capacity or another. What follows is a set of broad recommendations, and it will inevitably require adjustments in each implementation. Given that caveat, here are our curriculum recommendations.
More application than theory
We want to lead by emphasizing that the single most important factor in preparing students to apply their knowledge in an industry setting is application-centric learning. Working with realistic data to answer realistic questions is their best preparation. It grounds abstract concepts in hands-on experience, and it teaches data mechanics and data intuition at the same time, something that is impossible to do in isolation.
With that as a foundation, we present a list of topics that prepare one well to practice data science.
Curriculum archetypes
The types of data science and data-centric academic programs closely mirror the major skill areas we have identified in our work. There are programs that emphasize engineering, programs that emphasize analytics, and programs that emphasize modeling. The distinction is that analytics focuses on what we can learn from the data we have, modeling focuses on estimating the data we wish we had, and engineering focuses on making it all run faster, more efficiently, and more robustly.
There are also general data science programs that cover all these areas to some degree. In addition, there are quite a few domain-specific programs, which teach a subset of engineering, analytics, and modeling skills specific to a given field.
The curriculum recommendations for each of these program archetypes will differ, although all of them share some core topics. Analytics-, engineering-, and modeling-centric programs then have additional topic areas of their own. A general curriculum will include some aspects of the analytics, engineering, and modeling curricula, although perhaps not to the same depth. It is common for students to self-select courses from any combination of the three areas.
Curricula for domain-specific programs look similar to a general program, except that topics, and even entire courses, will focus on skills common to the field. For instance, an actuarial-focused data analytics program would likely include the software tools most commonly used in insurance companies, time series and rare-event prediction algorithms, and visualization methods that are accepted throughout the insurance industry. Students can best practice these skills through projects based on real domain-specific data, so hands-on projects or internships are highly recommended. When designing such programs, institutions may also consider offering interdisciplinary degrees, since domain-specific programs often combine courses from multiple departments or colleges.
Here are the major topics we suggest including in each area, with some of the particularly important subtopics enumerated.
Foundational topics
 Programming
   File and data manipulation
   Scripting
   Plotting
   Basic database queries
 Probability and statistics
   Probability distributions
   Hypothesis testing
   Confidence intervals
   Statistical significance
 Algebra
 Data ethics
 Data interpretation and communication
   Presentation
   Technical writing
   Data concepts for non-technical audiences
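As one concrete illustration of the foundational level, consider the kind of exercise that ties programming and basic statistics together: computing a confidence interval for a sample mean. The data below is invented for illustration.

```python
# A 95% confidence interval for a sample mean, using only the standard library.
# The sample values are hypothetical (e.g., measured response times in seconds).
from statistics import mean, stdev, NormalDist

sample = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4, 4.1]

m = mean(sample)
se = stdev(sample) / len(sample) ** 0.5   # standard error of the mean
z = NormalDist().inv_cdf(0.975)           # ~1.96 for a two-sided 95% interval
ci = (m - z * se, m + z * se)
print(f"mean={m:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

An exercise like this grounds "confidence intervals" in code the student writes and runs, rather than leaving it as a formula on a slide.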
Analytics topics
 Advanced statistics
   Experiment design
   Statistical power
   A/B testing
   Bayesian inference
   Causal inference
 Calculus
 Applications
   Cost-benefit tradeoffs
   Practical significance
 Visualization
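To make the analytics level concrete, here is a sketch of the sort of A/B testing exercise such a course might assign: a two-proportion z-test comparing conversion rates between a control and a variant. The visitor and conversion counts are invented.

```python
# Two-proportion z-test for an A/B test, standard library only.
# Counts are hypothetical: conversions and visitors for control (A) and variant (B).
from statistics import NormalDist

conv_a, n_a = 200, 5000
conv_b, n_b = 260, 5000

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
print(f"z={z:.2f}, p={p_value:.4f}")
```

A natural follow-up in class is the "practical significance" question from the list above: even when the p-value is small, is the lift large enough to justify shipping the variant?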
Engineering topics
 Software engineering
   Collaborative development
   Version control and reproducibility
   Processing data streams
 Production engineering
   Pipeline construction
   Debugging and unit testing
 Software systems and infrastructure
   Parallel and distributed processing
   Client-server architectures
   Cloud computing
   Computational complexity
   Data structures
 Databases
   Design
   Data modeling
   Advanced database queries
 Data management
   Security
   Privacy
   Governance
   Regulatory compliance
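At the engineering level, pipeline construction and unit testing are naturally taught together. The following sketch, with invented record formats and data, shows a tiny pipeline stage that tolerates malformed input, plus the unit-test style check a course might require alongside it.

```python
# Hypothetical pipeline stage: parse raw comma-separated records, skip
# malformed rows, and aggregate amounts per user.
def parse_record(line: str):
    """Return (user_id, amount) or None for malformed input."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    try:
        return parts[0], float(parts[1])
    except ValueError:
        return None

def total_by_user(lines):
    totals = {}
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue  # drop bad rows rather than crash the whole pipeline
        user, amount = rec
        totals[user] = totals.get(user, 0.0) + amount
    return totals

# A minimal unit test for the stage, including a malformed row.
raw = ["alice,3.50", "bob,2.00", "garbage", "alice,1.50"]
assert total_by_user(raw) == {"alice": 5.0, "bob": 2.0}
```

Exercises like this teach the production mindset directly: bad data is the normal case, and every stage needs tests that include it.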
Modeling topics
 Linear algebra
 Supervised learning
   Classification
   Regression
 Unsupervised learning
   Clustering
   Dimensionality reduction
 Neural networks
   Multilayer perceptrons
   Convolutional neural networks
   Recurrent neural networks
 Feature engineering
 Natural language processing
 Computer vision
 Algorithm design
 Optimization
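For the modeling level, a classic from-scratch exercise is implementing a simple classifier before reaching for libraries. The sketch below, on an invented toy dataset, is a one-nearest-neighbor classifier of the kind a supervised learning course might assign.

```python
# One-nearest-neighbor classification on a toy dataset (invented points).
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train, query):
    """train: list of (features, label) pairs; return the nearest neighbor's label."""
    nearest = min(train, key=lambda pair: euclidean(pair[0], query))
    return nearest[1]

train = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]
print(predict_1nn(train, (1.1, 0.9)))  # a point near the "small" cluster
```

Writing the distance function and the search by hand makes the later jump to library implementations, and to questions of computational complexity, much more concrete.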
Note that for each topic and subtopic, there are many effective ways to split it into courses. The best way for your institution will depend on many factors, including length of term, hours per class, existing departmental boundaries, instructor availability, and the rate at which your students are expected to absorb information. These recommendations assume a two-year master's program with the primary goal of preparing students for employment and continued career growth, although they can certainly be scaled up or down to fit the scope of other programs.
It bears repeating that application-focused instruction will best prepare students for professional positions. The more theory is grounded in concrete examples, and the more specific skills are exercised in the context of solving a larger problem, the deeper the student's understanding of how a method works and where to apply it.
Original. Reposted with permission.