Paper Reading: Uncovering Coordinated Networks on Social Media: Methods and Case Studies

Link: https://arxiv.org/pdf/2001.05658.pdf

Table of contents

Summary

Introduction

Methods

Case Study 1: Account Handle Sharing

Coordination Detection

Analysis

Case Study 2: Image Coordination

Coordination Detection

Analysis

Case Study 3: Hashtag Sequences

Coordination Detection

Analysis

Case Study 4: Co-Retweets

Coordination Detection

Analysis

Case Study 5: Synchronized Actions

Coordination Detection

Analysis

Discussion


Summary:

        Coordinated campaigns are used to influence and manipulate social media platforms and their users, posing a serious challenge to the free exchange of information online.

Unsupervised, Network-Based Approach

        Here, we introduce a general, unsupervised network-based approach to discover potentially coordinating groups of accounts. The proposed method builds a coordination network based on arbitrary behavior trajectories shared among accounts.

Detecting coordinated Twitter accounts

        We provide five case studies of influence campaigns, four of which took place in the contexts of US elections, the Hong Kong protests, the Syrian civil war, and cryptocurrency manipulation. In each case, we detect coordinated Twitter account networks by examining account identities, images, hashtag sequences, retweets, or temporal patterns. The proposed method proves broadly applicable for discovering different types of coordination across information-warfare scenarios.

Introduction

Background

        Online social media has revolutionized the way people get news and information and form opinions. By enabling communication unhindered by geographical barriers and reducing the cost of information production and consumption, social media has greatly expanded participation in civic and political discourse.

        While this may strengthen democratic processes, there is growing evidence that malicious actors pollute the information ecosystem with disinformation and manipulation (Lazer et al., 2018; Vosoughi, Roy, and Aral, 2018; Bessi and Ferrara, 2016; Shao et al., 2018; Ferrara 2017; Stella, Ferrara, and De Domenico 2018; Deb et al. 2019; Bovet and Makse 2019; Grinberg et al. 2019).

        While influence campaigns, misinformation, and propaganda have always existed (Jowett and O'Donnell 2018), social media has created new vulnerabilities and opportunities for abuse. Just as like-minded users can easily connect in support of legitimate causes, groups with fringe, conspiratorial, or extremist beliefs can reach critical mass and become impervious to expert or moderate views. Platform APIs and commoditized fake accounts make it simple to develop software that impersonates users and hides the identities of those who control these social bots—whether they are fraudsters spreading spam, politicians amplifying misleading narratives, or nation-states waging online warfare (Ferrara et al. 2016). Cognitive and social biases make us more vulnerable to manipulation by social bots: limited attention facilitates the spread of unchecked claims, confirmation bias causes us to ignore facts, groupthink and echo chambers distort perceptions of norms, and the bandwagon effect draws our attention to memes amplified by bots (Weng et al. 2012; Hills 2019; Ciampaglia et al. 2018; Lazer et al. 2018; Pennycook et al. 2019).

        Despite advances in countermeasures, such as machine learning algorithms and human fact-checkers employed by social media platforms to detect misinformation and inauthentic accounts, malicious actors continue to effectively deceive the public, amplify misinformation, and drive polarization (Barrett 2019). We observe an arms race in which the sophistication of attacks evolves to evade detection.

        Most machine learning tools for countering online abuse target the detection of social bots, mainly using methods that analyze individual accounts (Davis et al. 2016; Varol et al. 2017; Yang et al. 2019; 2020; Sayyadiharikandeh et al. 2020). However, malicious groups may employ coordinated tactics that appear innocuous at the level of individual accounts; their suspicious behavior can only be detected by observing the network of interactions between accounts. For example, it might be normal for one account to change its handle, but it is unlikely to be a coincidence when a group of accounts takes turns adopting the same handles.

Sparse similarity matrices

        Here, we propose a method to reveal coordinated behavior among multiple actors, regardless of their automated/organic nature or malicious/benign intentions. The idea is to extract features from social media data to build coordination networks, where two accounts are strongly connected if they exhibit unexpectedly similar behavior. These similarities can originate from any metadata, such as content entities and profile characteristics. Networks provide efficient representations for sparse similarity matrices, as well as natural methods for detecting important clusters of coordinated accounts. Our main contributions are:

        We propose a general method for detecting coordination that can in principle be applied to any social media platform for which data are available. Since the method is completely unsupervised, no labeled training data is required.

        Using Twitter data, we present five case studies by instantiating the method to detect different types of coordination based on (i) handle changes, (ii) image sharing, (iii) sequential use of hashtags, (iv) co-retweets, and (v) synchronization.

        The case studies illustrate the generality and effectiveness of our approach: we are able to detect coordination based on how accounts present their identities, the images they share, the text they write, what they retweet, or when they act.

        We show that coordinated behavior does not necessarily imply automation. In case studies, we detected possible bot and human accounts working together in malicious campaigns.

        • Code and data are available at github.com/IUNetSci/coordination-detection to reproduce the current results and apply our method to other cases.

Methods

        The proposed method for detecting accounts that coordinate actions on social media is shown in Figure 1. It can be divided into four stages:

 (Figure 1: Coordination detection approach. On the left, behavioral traces that can be extracted from social media profiles and messages. The four steps described in the paper identify groups of suspicious accounts.)

        1. Behavior trace extraction:

        The starting point for coordination detection is a conjecture about suspicious behavior. Assuming that authentic users are largely independent of each other, we consider a surprising lack of independence as evidence of coordination. The implementation of this approach is guided by the selection of traces that capture such suspicious behavior. For example, if we suspect that accounts are controlled by an entity intent on amplifying the exposure of a disinformation source, we can extract shared URLs as traces. Coordination scenarios can be associated with several broad categories of suspicious traces:

        a) Content: If coordination is based on shared content, suspect traces may include words, ngrams, hashtags, media, links, user mentions, etc.

         (b) Activity: Coordination can be revealed through spatiotemporal patterns of activity. Examples of traces that can reveal suspicious behavior include timestamps, locations, and geographic coordinates.

         (c) Identity: Accounts can be coordinated based on roles or groups. Traces of identity descriptors can be used to detect these types of coordination: name, handle, description, profile picture, home page, account creation date, etc.

        (d) Combination: Detecting coordination may require combining multiple dimensions. For example, instead of tracking only which hashtags are used or only when an account is active, one could combine the two types of traces and consider which hashtags are used at which times. The combined trace is more restrictive than a content-based or activity-based trace alone, and thus reduces the number of false positives.

        Once traces of interest are identified, we can build a network of accounts based on similar behavioral traces. Preliminary data cleaning can be applied by filtering out nodes that lack support—low activity or little interaction with the chosen trace—because there is not enough evidence to establish their coordination. For example, an account that shares only a few images does not allow image-based similarity to be computed reliably.
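        As an illustration of the support pre-filtering step, here is a minimal Python sketch; it assumes the traces are available as (account, trace) pairs, and the threshold of five is just an example, not a value prescribed by the paper.

```python
from collections import Counter

def filter_by_support(records, min_support=5):
    """Keep only accounts that produced at least `min_support` traces.

    `records` is an iterable of (account_id, trace) pairs, e.g. (user, image_hash).
    """
    counts = Counter(account for account, _ in records)
    supported = {a for a, c in counts.items() if c >= min_support}
    return [(a, t) for a, t in records if a in supported]

# Example: only "u1" has enough traces to be kept for coordination analysis.
records = [("u1", "imgA")] * 6 + [("u2", "imgB")] * 2
print(filter_by_support(records, min_support=5))
```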

        2. Bipartite network construction:

        The next step is to build a bipartite network linking accounts to the features extracted from their profiles and messages.

        At this stage, we can use the behavioral traces themselves as features, or design new features derived from the traces. For example, content analysis can produce features based on sentiment, stance, and narrative framing.

        Temporal characteristics such as hour of day and day of week can be inferred from timestamp metadata.

        Features can be designed by aggregating traces, such as combining locations into countries or images into color profiles. More complex features can be designed by considering sets or sequences of trajectories.

        Bipartite networks can be weighted according to the strength of association between accounts and features—sharing the same image multiple times is a stronger signal than sharing it just once. Weights can incorporate normalization (e.g., IDF) to account for popular features; it is not suspicious if many accounts mention the same celebrity.
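        A minimal sketch of this weighting step in Python: the toy records and the function name bipartite_tfidf are our own illustration, assuming each trace is an (account, feature) pair; the paper does not prescribe this exact formulation.

```python
import math
from collections import Counter, defaultdict

def bipartite_tfidf(records):
    """Compute TF-IDF-style weights for account->feature edges."""
    tf = defaultdict(Counter)          # tf[account][feature] = times account used feature
    df = Counter()                     # df[feature] = number of distinct accounts using it
    for account, feature in records:
        tf[account][feature] += 1
    for account, feats in tf.items():
        for feature in feats:
            df[feature] += 1
    n_accounts = len(tf)
    weights = {}                       # (account, feature) -> edge weight
    for account, feats in tf.items():
        for feature, count in feats.items():
            idf = math.log(n_accounts / df[feature])
            weights[(account, feature)] = count * idf
    return weights

# A feature used by every account (e.g. a celebrity mention) gets weight ~0,
# while a rare feature used repeatedly by one account gets a high weight.
records = [("u1", "#tagA"), ("u2", "#tagA"), ("u1", "#rare"), ("u1", "#rare")]
print(bipartite_tfidf(records))
```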

        

        3. Project to account network:

        Project the bipartite network onto a network whose nodes are accounts, adding edges between nodes based on some similarity measure over features. The weights of the edges in the resulting undirected coordination network can be computed by simple co-occurrence, the Jaccard coefficient, cosine similarity, or more complex statistical measures such as mutual information or χ². In some cases, every edge in the coordination network is suspicious by construction.

        In other cases, edges may provide noisy signals about coordination between accounts, leading to false positives.

        For example, accounts sharing several of the same memes are not necessarily suspicious if those memes are very popular. In such cases, low-weight edges in the coordination network may need to be filtered out to focus on the most suspicious interactions. One way to do this is to keep only the edges in the highest weight percentile. The Discussion section presents the distribution of edge weights in some of the case studies, illustrating how aggressive filtering allows one to prioritize precision over recall, thereby minimizing false positives.
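        The projection and filtering steps might look like the following sketch (our own illustration, not the paper's released code): accounts are rows of a weighted account-by-feature matrix, edge weights are cosine similarities, and only the top-percentile edges are kept.

```python
import numpy as np

def project_and_filter(account_feature_matrix, accounts, keep_top_pct=1.0):
    """Project a weighted account x feature matrix onto an account-account network.

    Edge weight = cosine similarity between account rows; only the top
    `keep_top_pct` percent of edges by weight are kept.
    """
    X = np.asarray(account_feature_matrix, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    sim = (X / norms) @ (X / norms).T            # pairwise cosine similarities
    edges = []
    n = len(accounts)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > 0:
                edges.append((accounts[i], accounts[j], sim[i, j]))
    if not edges:
        return []
    threshold = np.percentile([w for _, _, w in edges], 100 - keep_top_pct)
    return [(a, b, w) for a, b, w in edges if w >= threshold]

# Toy example: u1 and u2 behave identically, u3 is unrelated.
X = [[3, 0, 1], [3, 0, 1], [0, 5, 0]]
print(project_and_filter(X, ["u1", "u2", "u3"], keep_top_pct=50))
```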

        4. Cluster analysis:

        The final step is to find groups of accounts whose behavior is likely to be coordinated across the network of accounts. Network community detection algorithms that can be used for this purpose include connected components, k-core, k-cliques, modularity maximization, and label propagation, among others (Fortunato 2010). In the case studies presented here, we use connected components because we only consider suspicious edges (either by design or filtering).
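        Since the case studies extract coordinated groups as connected components, a short sketch with networkx (our choice of library; the edge list is illustrative) shows this final step.

```python
import networkx as nx

# Filtered, weighted edges of the coordination network
# (account pairs that survived the similarity-threshold step).
edges = [("u1", "u2", 0.98), ("u2", "u3", 0.95), ("u7", "u8", 0.99)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

# Each connected component is reported as one suspicious coordination group.
groups = [sorted(c) for c in nx.connected_components(G)]
print(groups)   # [['u1', 'u2', 'u3'], ['u7', 'u8']]
```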

        In summary, the four phases of the proposed coordination detection method translate into eight actionable steps: (i) formulate a conjecture about suspicious behavior; (ii) select traces of such behavior, or (iii) design features from them if necessary; (iv) pre-filter the dataset according to support; choose (v) the weights of the bipartite network and (vi) the similarity measure used to weight the account coordination network; (vii) filter out low-weight edges; and finally (viii) extract the coordinated groups. Although the proposed method is unsupervised and thus does not require labeled training data, we recommend manual inspection of suspicious clusters and their content. Such analysis provides validation of the method and evidence of whether a coordinated group is malicious and/or automated. In the following sections, we present five case studies in which we implement the proposed method to detect coordination through shared identities, images, hashtag sequences, co-retweets, and activity patterns.

Case Study 1: Account Handle Sharing

        On Twitter and some other social media platforms, each user account has an immutable ID, but many interactions are based on account handles (also called screen names), which are mutable and often reusable. One exception is that handles of suspended accounts cannot be reused on Twitter. Users may have legitimate reasons to change their handle. However, the possibility of changing and reusing handles exposes users to abuses such as username squatting and impersonation (Mariconti et al. 2017). In a recent example, the same Twitter account used handles associated with different personas to spread the name of the alleged whistleblower in the US President's impeachment trial.

        For a concrete example of how handle changes can be exploited, consider the following chronological events: 1. User 1 (named @super cat) follows User 2 (named @kittie) who posts pictures of felines.

         2. User 3 (named @superdog) posts photos of dogs.

        3. User 1 tweets mentioning User 2: "I love @kittie". Mentions on Twitter create a link to the profile of the account mentioned. So, at time step 3, user 1's tweet links to user 2's profile page.

         4. User 2 renames his handle to @tiger.

        5. User 3 renames its handle to @kittie, reusing User 2's handle.

        Although User 1's social network does not change as a result of these renames (User 1 still follows User 2), the changes are not reflected in previous posts, so anyone who clicks the mention link from step 3 is redirected to User 3's profile instead of User 2's, as User 1 originally intended. This type of squatting, paired with multiple accounts, can be used to promote entities, run "follow-back" campaigns, infiltrate communities, and even foster polarization (Mariconti et al. 2017). Since social media posts are often indexed by search engines, these actions can be used to promote content beyond the boundaries of social media.

        To detect this kind of coordination on Twitter, we apply the proposed approach using an identity trace (i.e., Twitter handles). We start from the request logs of Botometer.org, the social bot detection service of the Observatory on Social Media at Indiana University (Yang et al. 2019). Each log record contains a timestamp, a Twitter user ID and handle, and a bot score. We consider users with at least ten entries (queries), so that multiple handle changes can be observed. This yielded 54 million records and 1.9 million handles. See Table 1 for details.

 Coordination Detection

        We create a bipartite network of suspicious handles and accounts. We consider a handle suspicious if it is shared by at least two accounts, and an account suspicious if it uses at least one suspicious handle. Therefore no edges are filtered. More restrictive criteria could be adopted, such as treating an account as suspicious only if it adopts multiple suspicious handles.

        To detect suspicious clusters, we project the network, connecting accounts based on the number of handles they share. This is equivalent to using co-occurrence, the simplest similarity measure. Each connected component in the resulting network identifies a set of coordinated accounts and the set of handles they share. Table 1 summarizes the methodological decisions.
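        A minimal sketch of this projection, assuming the Botometer log is available as (account_id, handle) pairs; the data and function name are illustrative, not taken from the paper's released code.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx

def handle_sharing_network(log_records):
    """Connect accounts that used the same handle; edge weight = number of shared handles."""
    accounts_per_handle = defaultdict(set)
    for account_id, handle in log_records:
        accounts_per_handle[handle].add(account_id)

    G = nx.Graph()
    for handle, accounts in accounts_per_handle.items():
        if len(accounts) < 2:           # a handle is suspicious only if >= 2 accounts used it
            continue
        for a, b in combinations(sorted(accounts), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1  # co-occurrence count of shared handles
            else:
                G.add_edge(a, b, weight=1)
    return G

logs = [(1, "kittie"), (2, "kittie"), (2, "tiger"), (3, "tiger"), (4, "lonely_handle")]
G = handle_sharing_network(logs)
print(list(nx.connected_components(G)))   # one coordinated group: {1, 2, 3}
```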

Analysis

        Figure 2 shows the handle-sharing network. It is a weighted, undirected network with 7,879 nodes (Twitter accounts). We can divide its components into three categories:

        1. Hub-and-spoke components capture a main account (the hub) engaged in handle squatting and/or hijacking. To confirm this, we analyzed the time series of handle transitions involving the star-shaped components. Typically, a handle is switched from an account (presumably the victim) to the hub, and then (presumably after some form of ransom payment) switched from the hub back to the original account. These back-and-forth switches occur 12 times more frequently in stars than in any other component.

 (Figure 2: Handle-sharing network. A node represents a Twitter account, and its size is proportional to the number of accounts that share a handle with it. The weight of an edge is the number of unique handles shared by two accounts. Suspicious coordinated groups are identified by different colors. We report characteristics of several coordinated groups: the number of accounts, the number of shared handles, the average number of accounts sharing a handle, and the maximum and median number of times handles are switched between accounts. The number of switches is a lower bound estimated from our data sample. We also show tweets from independent individuals who exposed malicious behavior, which are discussed in the main text.)

        2. The giant component includes 722 accounts sharing 181 handles (orange group in the center of Figure 2). Using the Louvain community detection algorithm (Blondel et al. 2008), we further partition the giant component into 13 subgroups. We suspect that they represent temporal clusters corresponding to distinct coordinated activities of the same group; this investigation is left for future research.

        3. Other components may represent different situations requiring further investigation, as described below. Figure 2 presents stories of malicious behavior corresponding to two coordinated handle-sharing groups that have been exposed by others. In June 2015, the @GullyMN49 account made news for tweets attacking President Obama. More than a year later, the same account was still posting similar content. In March 2017, we observed 23 different accounts taking over this handle within 5 days. We speculate that this may have been an attempt to keep the persona created in 2015 alive and to evade suspension after abuse on the platform was reported. Currently, the @GullyMN49 account is suspended, but 21 of the 23 accounts are still active.

        The second example in Figure 2 shows a cluster of six accounts sharing seven handles. They have all been suspended since then. Interestingly, the cluster is sharing handles that appear to belong to conflicting political groups such as @ProTrumpMvmt and @AntiTrumpMvmt. Over time, some dubious accounts kept changing positions. Further investigation revealed that the accounts were highly active; they created the appearance of a political fundraiser in an attempt to extract money from both parties.

Case Study 2: Image Coordination

        Images make up a large portion of social media content. A group of accounts posting many identical or similar images may reveal suspicious coordinated behavior. In this case study, we use media images as content traces to identify such groups on Twitter in the context of the 2019 Hong Kong protest movement. We used the BotSlayer tool (Hui et al. 2019) to collect tweets matching dozens of protest-related hashtags in six languages, and subsequently downloaded all images and thumbnails in the collected tweets. We focus on 31,772 tweets containing one or more images and remove all retweets to avoid trivial duplication of the same image. See Table 2 for more information on data sources.

 (Figure 3: Coordination network of accounts tweeting about the Hong Kong protests. Nodes represent accounts, with size proportional to their degree. On the left, accounts are colored blue if they are likely to be coordinated, otherwise gray. On the right, we focus on the connected components corresponding to likely coordinated groups. The three largest components are colored according to the content of their images—one pro-protest cluster and two anti-protest clusters (purple and orange). We show some sample images shared by these groups and the corresponding numbers of distinct URLs.)

Coordination Detection

        Every time an image is published, it is assigned a different URL. Therefore, detecting identical or similar images is not as simple as comparing URLs; the actual image content must be analyzed. We represent each image with an RGB color histogram, dividing each channel into 128 bins and producing a 384-dimensional vector. Binned histograms allow variants to be matched: images with the same vector are either identical or similar, and therefore correspond to the same feature. While coarser bins would match more variants, we want the feature space to remain sparse enough to maintain high matching precision.
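        The following Python sketch (using PIL and NumPy, our choice of libraries) shows one way such a binned histogram feature could be computed; the function name and file path are illustrative, not from the paper's code.

```python
import numpy as np
from PIL import Image

def image_feature(path, bins_per_channel=128):
    """384-dimensional binned RGB histogram used as the image's feature key.

    Identical and near-identical images (small brightness or crop changes) tend
    to fall into the same binned histogram, so they map to the same feature.
    """
    img = np.asarray(Image.open(path).convert("RGB"))
    channels = [img[..., c].ravel() for c in range(3)]
    hists = [np.histogram(ch, bins=bins_per_channel, range=(0, 256))[0] for ch in channels]
    return np.concatenate(hists)        # length 3 * 128 = 384

# Two images are treated as "the same content" if their feature vectors match:
# feature_key = tuple(image_feature("tweet_media.jpg"))
```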

        We exclude accounts that tweeted fewer than five images to reduce the noise caused by insufficient support. One can tune precision and recall by adjusting this support threshold; we set it to maximize precision while maintaining reasonable recall. The sensitivity of precision to the support threshold is analyzed in the Discussion section. We then construct an unweighted bipartite network of accounts and image features by linking accounts to the feature vectors of the images they share. We project the bipartite network to obtain a weighted account coordination network whose edge weights are given by the Jaccard coefficient. We consider accounts that are highly similar in the images they share to be coordinated. To this end, we keep the 1% of edges with the largest weights (see Figure 11). Excluding singletons (accounts with no evidence of coordination), we rank the connected components of the network by size. Table 2 summarizes the methodological decisions in this case.
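        As a small illustration, the Jaccard coefficient between two accounts' sets of image-feature keys can be computed as follows; the toy feature keys are ours.

```python
def jaccard(a, b):
    """Jaccard coefficient between two accounts' sets of image-feature keys."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Accounts sharing most of their (binned-histogram) image features get a high edge weight.
features_u1 = {"f1", "f2", "f3", "f4"}
features_u2 = {"f1", "f2", "f3", "f9"}
print(jaccard(features_u1, features_u2))   # 0.6
```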

        

 

 (Fig. 11: Coordinated network weight distributions for the three case studies. Dashed lines represent edge filters: we keep the top 1% weighted edges in case 2, and the top 0.5% weighted edges in cases 4 and 5)

Analysis

        Figure 3 shows the account coordination network. We find three suspicious clusters involving 315 accounts posting images that support or oppose the protests. The anti-protest groups shared images with Chinese text, targeting a Chinese-speaking audience, while the pro-protest group shared images with English text. We observe that some shared image features correspond to exactly the same image, while others correspond to slightly different versions. For example, 59 image URLs mapping to the same feature in the pro-protest cluster contain subtle variations in brightness and cropping; the same was true of 61 URLs corresponding to an anti-protest image.

        While this approach identified the coordination of accounts, it did not characterize the coordination as malicious or benign, nor as automatic or organic. In fact, there are many coordinated accounts that behave like humans (see Discussion). These groups were identified because their constituent accounts spread the same graphic content more often than other groups.

Case Study 3: Hashtag Sequences

        A key element of a disinformation campaign is reaching a large audience. To spread beyond their followers, malicious actors can use hashtags to target other users who are interested in a topic and may be searching for related tweets.

        If a group of automated accounts posts messages with identical text, this looks suspicious and is easily detected by the platform's anti-spam measures. It is easy to imagine malicious users exploiting language models (such as GPT-2) to paraphrase their messages. Detection is also made more difficult when an app posts paraphrased text on behalf of its users. An example of this kind of behavior is the "Backfire Trump" Twitter app, which tweeted at President Trump whenever gun violence resulted in a death. However, we conjecture that even paraphrased texts will contain the same hashtags, given the goals of a coordinated campaign. Therefore, in this case study, we explore how to identify coordinated accounts that post highly similar hashtag sequences across multiple tweets.

        We evaluate this approach on a dataset of original tweets (without retweets) collected around the 2018 US midterm elections. See Table 3 for more information on data sources. Before applying our framework, we partition the dataset into daily intervals to detect the days on which accounts coordinate.

 Coordination Detection

        A data preprocessing step filters out accounts with few tweets and hashtags. The thresholds depend on the time period of the evaluation. In this case, we require at least 5 tweets and 5 unique hashtags within a 24-hour period to ensure sufficient support for possible coordination. More stringent filtering can be applied to reduce the likelihood of two accounts producing similar sequences by chance.

        In this case, we design a feature that combines content (hashtag) and activity (timestamp) traces. In particular, we use the ordered sequence of hashtags tweeted by each user (Figure 4). The bipartite network consists of accounts in one layer and hashtag sequences in the other. In the projection phase, we draw an edge between two accounts with identical hashtag sequences. These edges are unweighted and we do not apply any filtering, based on the assumption that two independent users are unlikely to post the same sequence of five or more hashtags on the same day. We also considered a fuzzy approach that matches accounts with slightly different sequences, and found similar results.

 (Figure 4: Hashtag sequence features. Hashtags and their positions are extracted from tweet metadata. Accounts tweeting the same hashtag sequence are easily identifiable.)
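        A sketch of how such daily hashtag-sequence features might be extracted and matched: the tuple layout and thresholds mirror the description above, but the code itself is our illustration rather than the authors' implementation.

```python
from collections import defaultdict

def daily_hashtag_sequences(tweets, min_tweets=5, min_unique_tags=5):
    """Group accounts by their ordered daily hashtag sequence.

    `tweets` is an iterable of (account_id, date, timestamp, [hashtags]) tuples.
    Accounts posting an identical hashtag sequence on the same day end up in one group.
    """
    sequences = defaultdict(list)            # (account, date) -> ordered hashtags
    tweet_counts = defaultdict(int)
    for account, date, ts, tags in sorted(tweets, key=lambda t: t[2]):
        sequences[(account, date)].extend(tags)
        tweet_counts[(account, date)] += 1

    groups = defaultdict(set)                # (date, hashtag sequence) -> accounts
    for (account, date), seq in sequences.items():
        if tweet_counts[(account, date)] < min_tweets or len(set(seq)) < min_unique_tags:
            continue                          # not enough support for coordination
        groups[(date, tuple(seq))].add(account)

    # Keep only sequences shared by at least two accounts (coordination candidates).
    return {k: v for k, v in groups.items() if len(v) >= 2}
```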

Analysis

        We identified 617 instances of daily coordination carried out by 1,809 unique accounts. Figure 5 shows the 32 suspicious groups identified on one day. The largest component consists of 404 nodes that sent a series of tweets advocating for stricter gun control laws through the "Backfire Trump" Twitter app, which is no longer active. Some of the claims in those tweets were at odds with a report from the nonprofit Gun Violence Archive. The smallest groups consist of pairs of accounts. One pair tweeted links to a now-defunct page advertising online casino bonuses. Another pair of accounts shared links to a list of candidates for elected office endorsed by the Humane Society Legislative Fund. Of course, longer time windows could be used, potentially revealing larger coordination networks. For example, the Backfire Trump cluster in Figure 5 is part of a larger network of 1,175 accounts.

Case Study 4: Co-Retweets

        Amplification of information sources is perhaps the most common form of manipulation. On Twitter, a group of accounts retweeting the same tweet or the same group of accounts may indicate coordinated behavior. Therefore, we focus on retweeting in this case study.

        We apply the proposed method to detect coordinated accounts that amplify narratives related to the "White Helmets", a volunteer organization that was targeted by disinformation campaigns during the Syrian civil war. Recent reports have identified Russian sources behind these activities (Wilson and Starbird 2020). Data was collected from Twitter using English and Arabic keywords. See Table 4 for more details on the data.

Coordination Detection

        We build a bipartite network between retweeting accounts and retweeted messages, excluding accounts that retweet themselves and accounts with fewer than 10 retweets. The network is weighted using TF-IDF to discount the contribution of popular tweets. Thus, each account is represented as a TF-IDF vector over the IDs of the tweets it retweeted. The projected co-retweet network is then weighted by the cosine similarity between account vectors. Finally, we keep only the most suspicious 0.5% of edges to focus on evidence of potential coordination (see Figure 11). These parameters can be tuned to trade off precision and recall; the effect of the threshold on precision is analyzed in the Discussion section. Table 4 summarizes the methodological decisions.
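        A minimal sketch of this representation using scikit-learn (our choice of library; the account names and tweet IDs are toy data, not from the White Helmets dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each "document" is the list of tweet IDs an account has retweeted.
retweets_per_account = {
    "u1": ["t1", "t2", "t3", "t3"],
    "u2": ["t1", "t2", "t3"],
    "u3": ["t9"],
}
accounts = list(retweets_per_account)
docs = [retweets_per_account[a] for a in accounts]

# TF-IDF discounts very popular tweets; analyzer=identity keeps tweet IDs as tokens.
vectorizer = TfidfVectorizer(analyzer=lambda doc: doc)
X = vectorizer.fit_transform(docs)

# Pairwise cosine similarity between account vectors = co-retweet edge weights.
sim = cosine_similarity(X)
print(sim[accounts.index("u1"), accounts.index("u2")])   # high: near-identical retweet sets
```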

Analysis

        Figure 6 shows the co-retweet network and highlights two groups of coordinated accounts. Accounts in the orange and purple groups retweeted messages supporting and opposing the White Helmets, respectively. The example tweets shown in the figure are no longer publicly available.

 Case Study 5: Synchronized Actions

        A "pump and dump" is a shady scheme to inflate a stock's price by making a false statement (a pump) to simulate a surge in buyer interest and sell a cheaply bought stock at a higher price (a dump). Investors are vulnerable to this manipulation because they want to act quickly when buying stocks that appear to promise high future profits. By exposing investors to information from seemingly disparate sources in a short period of time, fraudsters create a false sense of urgency that prompts victims to take action. Social media provides fertile ground for such scams (Mirtaheri et al., 2019). We study the effectiveness of our method in detecting coordinated cryptocurrency pump and dump activity on Twitter. The data was collected using keywords related to 25 vulnerable cryptocurrencies and cash tags (eg, $BTC) as query terms. We consider both original tweets and retweets because they both add to the feed considered by potential buyers. See Table 5 for more details on the dataset.

 Coordination Detection

        We assume that a coordinated pump-and-dump campaign uses software to have multiple accounts post pump messages close together in time. Therefore, tweet timestamps are used as the behavioral traces of accounts. The shorter the time between two tweets, the less likely they are to be coincidental. However, shorter time intervals yield significantly fewer matches and increase computation time, while longer intervals (such as a day) produce many false-positive matches. To balance these issues, we use 30-minute intervals. Intuitively, any two users may post one or two tweets within the same interval; this is far less likely for a larger set of tweets. To focus on accounts with sufficient support for coordination, we only keep accounts that posted at least 8 messages. This particular support threshold was chosen to minimize false-positive matches, as shown in the Discussion section.

        Tweets are then grouped by the time interval in which they were posted. These temporal features are used to build a bipartite network of accounts and tweet time intervals. Edges are weighted using TF-IDF. As in the previous case, the projected account coordination network is weighted by the cosine similarity between TF-IDF vectors. After manual inspection, we found that many of the tweets shared in this network had nothing to do with cryptocurrencies, and only a small percentage were even marginally related to the topic; such edges can nevertheless have high similarity and generate strong coordination signals. Therefore, we only keep the 0.5% of edges with the largest cosine similarity (see Figure 11). Table 5 summarizes the methodological decisions.
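        A small sketch of the temporal feature, assuming Unix timestamps (the function and example times are ours); the TF-IDF weighting and cosine projection then proceed as in Case Study 4.

```python
from datetime import datetime, timezone

def time_bin(unix_timestamp, minutes=30):
    """Map a tweet's Unix timestamp to its 30-minute interval ID (the temporal feature)."""
    return int(unix_timestamp // (minutes * 60))

# Tweets posted within the same 30-minute window share a feature in the
# bipartite account-interval network.
t1 = datetime(2017, 12, 18, 14, 5, tzinfo=timezone.utc).timestamp()
t2 = datetime(2017, 12, 18, 14, 25, tzinfo=timezone.utc).timestamp()
print(time_bin(t1) == time_bin(t2))   # True: both fall in the 14:00-14:30 bin
```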

Analysis

        Figure 7 shows the synchronized-action network. We qualitatively analyzed the connected components of the network to assess precision. The purple subgraph marks clusters of coordinated accounts for which suspicious pump-and-dump schemes were observed. We found many such examples involving different cryptocurrencies.

 (Figure 7: Temporal coordination network. Nodes (accounts) are connected if they post or retweet within the same 30-minute interval. Singletons are omitted. Accounts in the purple cluster and in the small yellow cluster at the 8 o'clock position are highly suspected of running pump-and-dump schemes. Some tweet excerpts are shown; these tweets are no longer public.)

        Market movements, especially for cryptocurrencies subject to heavy short-term trading, can be difficult to capture because of volatility. It is also hard to attribute price changes to a single cause, such as Twitter activity related to pumps and dumps. This makes quantitative validation of our results difficult. However, during the week of December 15-21, 2017, the tokens Verge (XVG), Enjin (ENJ), and DigiByte (DGB) were all on a daily upward trend. Each day, the price spiked after a flood of synchronized tweets commenting on its movement. These trends preceded the record prices of these tokens to date, reached by XVG on December 23, 2017, and by ENJ and DGB on January 7, 2018. Clusters of accounts heavily pushing these three tokens are highlighted in yellow in Figure 7.

Discussion

        The five case studies presented in this paper merely illustrate how our proposed method can be implemented to find coordination. The approach could in principle be applied to other social media platforms besides Twitter. For example, image coordination methods can be applied to Instagram, while coordination between Facebook pages can be discovered through the content they share.

        Several of the unsupervised methods discussed in the related work section, like the five applications of our method presented here, focus on different types of coordination. Therefore, these methods cannot be compared directly. A key contribution of this paper is to provide a flexible and general approach that describes these different methods within a unified scheme. For example, Debot (Chavoshi, Hamooni, and Mueen 2016) can be described as a special case of our method based on a sophisticated time-hashing scheme that preserves dynamic time-warping distances (Keogh and Ratanamahatana 2005), while SynchroTrap (Cao et al. 2014) exploits synchronization by matching actions within a time window. The methods of Giglietto et al. (2020) and Chen and Subramanian (2018) are special cases using similarity based on shared links. The method of Ahmed and Abulaish (2013) uses a contingency table of accounts and features that is equivalent to our bipartite network.

        Our approach aims to identify coordination among accounts, but it does not characterize the intent or authenticity of the coordination, nor does it uncover the underlying mechanisms. Recent news reports highlighted an example of malicious intent in which a coordinated network of teens posted false narratives about the election. However, it is important to remember that coordinated activity may also be initiated by real users with benign intentions. For example, social movement participants use hashtags in a coordinated fashion to raise awareness for their causes.

        Figure 8 shows the distribution of bot scores in Case Studies 1-3. (Due to anonymization of the datasets, we were unable to analyze bot scores in Cases 4-5.) We observe that, while coordinated accounts are more likely to have high bot scores, many coordinated accounts have low (human-like) scores, and in two of the three cases they are the majority. Therefore, detecting social bots is not sufficient for detecting coordinated activity.

 (Figure 8: Bot scores of suspicious and non-suspicious accounts. Histograms of bot scores for suspicious accounts identified by our method versus other accounts. The top, middle, and bottom panels correspond to account handle sharing (Case Study 1), image coordination (Case Study 2), and hashtag sequences (Case Study 3). Bot scores for Case Study 1 were obtained from Botometer version 3 (Yang et al. 2019), collected from May 2018 to April 2019. For the other two cases, bot scores were obtained from BotometerLite (Yang et al. 2020). The dataset may include multiple scores for the same account.)

        Although the case studies presented here are based on data from different sources, they were not selected to inflate the apparent effectiveness of the proposed method, nor do they focus only on malicious accounts. Figure 9 shows that the distributions of bot scores for the sets of accounts analyzed in Case Studies 1 and 3 are consistent with the distribution of scores obtained from a random sample of tweets. We note that this is not a random sample of accounts; it is biased toward more active accounts. Case Study 2 is an exception; we surmise that bots were used to post large numbers of images during the Hong Kong protests.

 (Figure 9: Distributions of bot scores. The Q-Q plots compare the distribution of bot scores in the three case studies to the distribution of scores obtained from a 24-hour 1% random sample of tweets. The sources of the bot scores are as in Figure 8. All distributions are heavily skewed toward lower bot-score values (i.e., more humans than bots), except for Case Study 2, where bots score higher and the distribution is close to uniform.)


Origin blog.csdn.net/qq_40671063/article/details/132075747