Self-supervised Learning

Characterizing Image Sharing Behaviors in US Politically Engaged, Random, and Demographic Audience Segments

This work advances our understanding of image-sharing behavior on Twitter across race, gender, age, and political engagement. We infer account-level demographic measures from the profile pictures of US Twitter accounts and characterize 20 types of shared images. Using account-level logistic regression models, we find that around half of the learned clusters (e.g., infographics, natural scenery, sports) are predictive of the user's age, race, or gender, while several other clusters (e.g., images of groups and images of single individuals, which often contain politicians) appear to be popular among politically engaged accounts. Our findings suggest that certain audiences can be characterized by the types of visual imagery they share, which has implications for information quality, online engagement, and communications.
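As a rough illustration of what an account-level model of this kind could look like, the sketch below turns each account's images into a vector of image-type shares and fits a binary logistic regression by gradient descent. Everything here is an assumption made for illustration: the toy accounts, the binary label, the helper names (`type_shares`, `fit_logistic`, `predict_proba`), and the choice of shares as features are hypothetical and are not taken from the study itself.

```python
import math
from collections import Counter

N_TYPES = 20  # number of image types mentioned in the abstract


def type_shares(image_labels, n_types=N_TYPES):
    """Fraction of an account's images that fall in each image type."""
    counts = Counter(image_labels)
    total = len(image_labels)
    return [counts.get(k, 0) / total for k in range(n_types)]


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one account."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: each account is a list of image-type ids it shared.
accounts = [
    [0, 0, 0, 1],  # mostly type 0
    [0, 0, 2, 0],
    [5, 5, 5, 1],  # mostly type 5
    [5, 2, 5, 5],
]
labels = [1, 1, 0, 0]  # made-up binary demographic label

X = [type_shares(a) for a in accounts]
w, b = fit_logistic(X, labels)
```

In this toy setup the fitted weight on a given image type indicates whether a higher share of that type raises or lowers the predicted probability, which is one way to read "a cluster is predictive of a demographic attribute".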

Mapping Visual Themes among Authentic and Coordinated Memes

What distinguishes authentic memes from those created by state actors? I use a self-supervised vision model, DeepCluster, to learn low-dimensional visual embeddings of memes and apply K-means to jointly cluster authentic and coordinated memes without additional inputs. I find that authentic and coordinated memes share a large fraction of visual themes, though to varying degrees. Coordinated memes from Russian Internet Research Agency (IRA) accounts promote more themes around celebrities, quotes, screenshots, military, and gender. Authentic Reddit memes include more themes with comics and movie characters. A simple logistic regression on the low-dimensional embeddings distinguishes IRA memes from Reddit memes with an out-of-sample test accuracy of 0.84.
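The joint-clustering step can be sketched as follows: pool the embeddings from both sources, run K-means once on the pooled set, then compare how each source's memes are distributed over the shared clusters. This is a minimal sketch under stated assumptions, not the pipeline used in the project: the 2-D points are synthetic stand-ins for DeepCluster embeddings, the K-means is a plain Lloyd's-algorithm implementation, and the names (`kmeans`, `cluster_shares`, `blob`) and all parameter values are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    k = len(centroids)
    return [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def cluster_shares(labels, k):
    """Fraction of a source's items that fall in each cluster."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(k)]


def blob(rng, center, n):
    """n synthetic 2-D points scattered around a center."""
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)] for _ in range(n)]


# Synthetic stand-ins for embeddings: the two sources overlap on one theme.
rng = random.Random(1)
authentic = blob(rng, (0, 0), 40) + blob(rng, (5, 5), 20)
coordinated = blob(rng, (5, 5), 20) + blob(rng, (10, 0), 40)

K = 3
pooled = authentic + coordinated  # cluster both sources jointly
centroids, labels = kmeans(pooled, K)

# Split the joint labels back by source and compare cluster usage.
authentic_shares = cluster_shares(labels[:len(authentic)], K)
coordinated_shares = cluster_shares(labels[len(authentic):], K)
```

Clustering the pooled set, rather than each source separately, is what makes the per-cluster shares directly comparable: a cluster where both sources have sizeable shares corresponds to a shared theme, while a cluster dominated by one source corresponds to a theme that source uses more.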