A model for detecting bots on Twitter via likes

Important features of the Like graph

Likes are directed, i.e., a tweet is posted by user $A$. Then a follower $F$ of user $A$ read the tweet and choose to like, retweet or reply to it, which leads to exposure of the tweet to the followers of $F$. Thus, in this network the users are represented by a node, and there is an edge between users $A$ and $B$ if $A$ liked a tweet from $B$ in the past days.
Users have different type of interests. For example, I might be interested in my former class mates, but at the same I am interested in jazz music, politics and many more. The group of people that share some interest we call a community. Thus, reasonably, a good model for Twitter should allow for overlapping communities, where the communities can have an arbitrary internal structure. Hence, if we can specify any internal (dense/sparse) structure, we can distinguish between for example school classes, families, or larger communities as political interest.
There are very important accounts on Twitter, who are present in a large community. These accounts have substantially more followers and thus also receive more likes, this should be a feature that is present in some of the large communities. There might be a so-called sybil group $S$ of coorporating (sybil) accounts that try to influence the public opinion of the rest of the community, the so called non-sybil group $N$, without that an accounts in $S$ receive many likes from $N$. The most efficient way to influence this opinion, is to reply and like tweets coming from the people that have the most followers as these have most exposure in the non-sybil group.
On the one hand, a sybil account might not be controlled by a human. On the other hand, a single person might control multiple sybil accounts to influence the public opinion. Considering these facts, arguably, it is hard for a sybil account to obtain likes from real users, as this would require a lot of energy by the maintainer of the account. To not be too different from real users in terms of the number of connections, we expect that a sybil group is relatively well connected, and that there are not many links from non-sybil accounts to sybil accounts.

Directed Random Intersection Graph with Communitities and Sybils

There are plenty of models that aim to resemble (directed/undirected) social networks, although no network matches all of the above features. A model that comes very close, i.e., it allows for overlapping communities with arbitrary internal structure is the Random Intersection Graph with Communities, as introduced by Viktoria Vadon and co-authors in this paper. See for the more mathematical papers the preprints at arXiv (paper 1, paper 2).

In this model there are $N$ vertices (representing an account on Twitter) and $M$ communities, represented by a graph, whose internal structure is specified as input. The final graph is then obtained by a matching procedure by assigning the vertices to a number of communities, which we describe by this example. To do this, every vertex is equipped with some so-called membership tokens, whose number represent the number of communities in which this vertex is active. Every community (defined as a graph on $|V|$ vertices) has $|V|$ community-roles. The authors of the original paper describe an example of the construction very clearly here:

construction random graph with communities “A small example for constructing the RIGC model. From left to right, we see each step of the construction. Leftmost, we represent the list of community graphs in different colors, and the individuals by the numbered vertices; the number of stubs drawn for each individual corresponds to the number of membership tokens. The second subfigure represents the matching, with the imaginary edges drawn dashed. An imaginary edge between an individual and a vertex in a community means that one of the tokens of that individual was matched with that community role. The third subfigure is simply a redrawing of the second for the sake of clarity. Rightmost, we see the resulting RIGC, after the imaginary edges have been contracted.”

Direction and community types

Since we want to apply the model to the directed graph of Twitter, we modify the community-graphs and make them directed, without changing the matching procedure. For the internal community structure we identified two possibilities.

Joost Jorritsma

A model for detecting bots on Twitter via likes

Important features of the Like graph

Directed Random Intersection Graph with Communitities and Sybils

Direction and community types

Infer communities from real data

Use different models that generate graphs

Addition of sybil accounts

Finding sybil accounts in an artificial graph

Verification on Twitter data