Resources
Here are some online resources I find useful for empirical work. Nowadays it is easy to find a myriad of similar lists, but to me many are of poor quality, and some are even misleading and may lead to bad habits. Similar-sounding names also make Googling hard. This motivates me to keep maintaining this list so that I can consult it anywhere, anytime, without having to memorize it.
Computing
- R Packages: Organize, Test, Document, and Share Your Code by Hadley Wickham
- A data.table and dplyr tour by Atrebas
- Org-Mode Reference Card
- Code and Data for the Social Sciences by Matthew Gentzkow and Jesse Shapiro
- The Plain Person’s Guide to Plain Text Social Science by Kieran Healy
Program
- R for Data Science by Hadley Wickham and Garrett Grolemund
  - Hadley Wickham’s sermon for the tidyverse revolution
  - Core concepts: Transformation / Tidy Data / Functions / Vectors & Lists / Visualisation
- R Programming for Data Science by Roger Peng
  - Some useful parts: The apply Family / Regular Expressions / Parallel Computing
- Exploratory Data Analysis with R by Roger Peng
  - Some useful parts: The ggplot2 Plotting System / Color Palettes
- Advanced R by Hadley Wickham
  - Discusses advanced topics and reveals the spirit of R: Functional Programming
- Efficient R Programming by Colin Gillespie and Robin Lovelace
- Automate the Boring Stuff with Python by Al Sweigart
- A Byte of Python by Swaroop CH
- The Hitchhiker’s Guide to Python by Kenneth Reitz
- Python Data Science Handbook by Jake VanderPlas
- Stata: Masayuki Kudamatsu’s useful links
Wrangle
- dplyr: The black magic that lifts R above its base
  - Useful: filter() / select() / group_by() / summarise() / mutate() / nest() / …
- data.table: Convenient for working with large datasets
  - Useful: rbindlist() / fread() / fwrite() / …
- magrittr: The pipe operator %>% enhances readability
- Teach the tidyverse to Beginners by David Robinson
Scrape
- Automated Collection of Web and Social Data taught by Pablo Barberá and Dan Cervone at ECPR Summer School
- rvest / RSelenium / tabulizer / urllib / BeautifulSoup / Selenium
Visualize
- Introduction to R graphics with ggplot2 taught by Ista Zahn at Harvard IQSS Workshop
- A Layered Grammar of Graphics by Hadley Wickham
- Data Visualization for Social Science by Kieran Healy
  - Explains every useful aspect of ggplot2, along with some useful packages such as ggrepel for adding labels and maps for creating maps
  - Some best parts: How ggplot Works / Geoms / Scales, Guides, and Themes / Labels / Maps / Colors / Models
Automate
- Learn Enough Command Line to Be Dangerous by Michael Hartl
- Working with CSVs on the Command Line by Brian Connelly
- Pandoc User’s Guide
- Why Use Make by Mike Bostock
- Minimal Make Tutorial by Karl Broman
- The Unix Workbench by Sean Kross
Collaborate
- Git vs. Dropbox from a Researcher’s Perspective by Michael Stepner
- An Introduction to Version Control Using GitHub Desktop by Daniel van Strien
- Git/Github Guide: A Minimal Tutorial by Karl Broman
- Happy Git and GitHub for the useR by Jenny Bryan and TAs
- Pro Git by Scott Chacon and Ben Straub
Communicate
- R Markdown Quick Tour
- Basics of Jupyter Notebook
- Literate Programming and knitr by Roger Peng
- Bookdown by Yihui Xie
- Blogdown by Yihui Xie, Amber Thomas, and Alison Presmanes Hill
Typeset
Text Analysis
- Text as Data by Justin Grimmer and Brandon Stewart
- Text as Data by Matthew Gentzkow, Bryan Kelly, and Matt Taddy
- Texts as Data taught by Justin Grimmer at Stanford
- Text Mining with R by Julia Silge and David Robinson
  - Useful: tf-idf / N-grams / Sentiment Analysis / Topic Modeling
- Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
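
As a quick reminder of two of the topics flagged as useful above, tf-idf and n-grams, here is a minimal Python sketch using only the standard library. It assumes one common definition (term frequency as count over document length, inverse document frequency as the natural log of the number of documents over the number of documents containing the term); the books above may use other variants. The function names `ngrams` and `tf_idf` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    Assumed definition: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
docs = [text.split() for text in corpus]  # naive whitespace tokenization

weights = tf_idf(docs)
bigrams = ngrams(docs[0], 2)

# Terms that appear in few documents get a higher weight than common ones.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}  {weight:.3f}")
print(bigrams)
```

In this toy corpus, "cat" and "mat" outrank "the" in the first document even though "the" occurs twice, because "the" also appears in the second document. Real work would need proper tokenization, lowercasing, and stop-word handling, which the resources above cover.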
Statistical Learning
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
- The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi
- Computer Age Statistical Inference by Bradley Efron and Trevor Hastie
- Big Data taught by Matt Taddy at Chicago Booth
- Statistical Machine Learning taught by Larry Wasserman at CMU
Style Guides
- R Style Guide / lintr
- Python Style Guide / pycodestyle
- SQL Style Guide / sqlint
- Towards LaTeX Coding Standards by Didier Verna
- Markdown Style Guide by Google
Hybrid Materials
- Data Science and Social Science taught by Pablo Barberá and Dan Cervone at NYU
- Computing for the Social Sciences taught by Benjamin Soltoff at Chicago
- Computational Tools for Social Science taught by Rochelle Terman at Chicago
- Data Wrangling, Exploration, and Analysis with R taught by Jenny Bryan at UBC