Scraping Xi's Talks

 

· scrape · R

On Facebook I came across an online database of Xi Jinping’s important talks since 2012. The database is maintained by People’s Daily, the Communist Party’s official newspaper.

I thought scraping it would be good practice. The resulting code and data are in this GitHub Repository. The data set contains a total of 224 talks, dating from 2012-11-19 to 2017-09-27. It also comes with eight pre-defined categories: economics, politics, culture, society, ecology, party, defense, and diplomacy.

One small challenge was working out when to stop following “next page” within each category. The other was that if one requests pages too frequently, the connection may be blocked, so I had to define a function that waits a couple of seconds for the connection to come back and then re-scrapes the page. Overall, though, it was quite simple.
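For readers who want to see the shape of those two pieces, here is a minimal sketch in base R. It is not the code from the repository. The function names, the `wait` and `max_tries` defaults, and the stopping rule (stop when a page yields no links that have not been seen before) are all assumptions made for illustration. The page fetcher and the link extractor are passed in as arguments, so nothing here depends on the real site's URL layout or HTML structure.

```r
# Hypothetical helper: call fetch(url), and if it fails, pause and try again.
# `fetch` is any function that takes a URL and returns the page.
fetch_with_retry <- function(url, fetch, wait = 2, max_tries = 5) {
  for (i in seq_len(max_tries)) {
    page <- tryCatch(fetch(url), error = function(e) NULL)
    if (!is.null(page)) return(page)
    Sys.sleep(wait)  # give the connection a couple of seconds to recover
  }
  stop("Giving up on ", url, " after ", max_tries, " attempts")
}

# Hypothetical helper: walk the pages of one category and collect talk links.
# `page_url` maps a page number to a URL; `extract_links` maps a fetched page
# to a character vector of links.
scrape_category <- function(page_url, fetch, extract_links,
                            max_pages = 100, wait = 2) {
  links <- character(0)
  for (p in seq_len(max_pages)) {
    page  <- fetch_with_retry(page_url(p), fetch, wait = wait)
    found <- extract_links(page)
    new   <- setdiff(found, links)
    # Assumed stopping rule: a page with nothing new (empty, or a repeat of
    # the last page) means the category is exhausted.
    if (length(new) == 0) break
    links <- c(links, new)
  }
  links
}
```

In practice `fetch` could be something like `xml2::read_html`, and `extract_links` a small function built on `rvest`. Which CSS selector picks out the talk links depends on the site's markup, which is not shown here. The `max_pages` cap is only a safety net in case the stopping rule never fires.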

I have not yet come up with any specific questions to ask of the data, though I do think knowing more about this strongman is quite important. Please feel free to get in touch if you have thoughts or comments.