Notes on Using Radical Environmentalist Texts to Uncover Network Structure and Network Features

Using Radical Environmentalist Texts to Uncover Network Structure and Network Features

Sociological Methods & Research 2019, Vol. 48(4) 905-960

Authors: Zack W. Almquist and Benjamin E. Bagozzi

DOI: 10.1177/0049124117729696

Abstract

Radical social movements are broadly engaged in, and dedicated to, promoting change in their social environment. In their corresponding efforts to call attention to various causes, communicate with like-minded groups, and mobilize support for their activities, radical social movements also produce an enormous amount of text. These texts, like radical social movements themselves, are often (i) densely connected and (ii) highly variable in advocated protest activities. Given a corpus of radical social movement texts, can one uncover the underlying network structure of the radical activist groups involved in this movement? If so, can one then also identify which groups (and which subnetworks) are more prone to radical versus mainstream protest activities? Using a large corpus of British radical environmentalist texts (1992–2003), we seek to answer these questions through a novel integration of network discovery and unsupervised topic modeling. In doing so, we apply classic network descriptives (e.g., centrality measures) and more modern statistical models (e.g., exponential random graph models) to carefully parse apart these questions. Our findings provide a number of revealing insights into the networks and nature of radical environmentalists and their texts.

Quotes

Extant research on radical social movements has also been limited by a paucity of data pertaining to the strategies, linkages, and agendas of these organizations. This deficiency is unsurprising. Many radical movements are short-lived or are highly volatile in their ideology, strategies, and membership (Smith and Damphousse 2009; Simi and Futrell 2015). The groups involved in these movements are also typically nonhierarchical in structure and anonymous in membership (Fitzgerald and Rodgers 2000; Joossee 2007), which limits researchers’ abilities to identify and compare membership structures and patterns across groups. By virtue of occupying the fringes of society, the viewpoints of radicals, and the literature they produce, commonly fail to make it into mainstream media sources or public archives.

The ideology of radical activist groups, the illegal (or antigovernment) protest tactics that they often favor, and past countermovement efforts led by government actors have together given radical activist groups the incentives to conceal, obfuscate, and misrepresent their membership and ties with other radical groups as well as their favored tactics and ideologies (Plows et al. 2001, 2004; Simi and Futrell 2015). As a consequence of these tendencies, research on radical activism has been largely limited to qualitative case studies and small-N research.

We thus believe that a specific subset of automated content analysis techniques known as topic models, which allow one to systematically unpack a corpus of text documents by identifying words and phrases that commonly group together across documents, can help researchers to uncover the latent topics, or common themes, that arise across radical environmentalist texts.

The topics identified by topic modeling can be viewed as representing the ideology, tactics, and foci of radical groups themselves. As such, whereas the texts produced by radical groups offer a window into their activities and interests, topic models provide us with a means of systematically accessing this window.

Social network analysis (SNA) offers an additional, and complementary, suite of tools for the systematic study of radical groups. Indeed, group interaction, cohesion, and coordination have long been of interest to the social science community and lie at the heart of radical group behaviors. Over the last century, methods and theories to both describe and predict such interaction have largely been developed under the umbrella of SNA. This body of theory includes formal mathematical models for describing complex relationships between entities (e.g., organizational coordination), and such methods have now been widely applied to studies of (radical) social movements.

However, while this technique has much potential for understanding radicalism, it has been employed rarely in this area of research due to the complexity of gathering high-quality network data for the covert actors described above. A key contribution of our article rests in the development of a means to overcome this limitation: We argue below, and show, that quantitative text analysis techniques can help to rectify data limitation challenges for social network analyses of covert groups.

Specifically, our proposed approach details how one can combine the use of co-occurrence counts, social network statistics, and structural topic models (STM) to extract an environmental group network, and the full set of protest strategies used within that network, from a corpus of self-produced (and often highly variable) radical environmentalist texts.

In full, our approach enables us to explain why, when, and how radical groups may cooperate with one another in order to achieve their respective aims as well as the manners in which these groups choose to cooperate.

This finding may help to explain the decline in the (UK) radical environmental protest movement during the late 1990s and early 2000s, as the centrality of nonenvironmentalist actors within our identified network suggests that the persistence of this network, and the groups therein, may have primarily rested on broader support for nonenvironmental leftist groups and views (e.g., anarchism) which was also waning during this time period.

[n]etworking brought with it the influence of the ideologies, issues foci, repertoires of action, strategies and articulations of existing green networks [ . . . ], mass NVDA repertoires were derived from the peace and Australian rainforest movements. Repertoires of sabotage, in turn, were derived from animal liberation militants and to a lesser extent EF! (Foreman and Haywood 1993). (Wall 1999b:90)

Preprocessing

Our analysis requires that we create two intermediate input quantities from the text files described above: (i) a list of relevant (radical) UK environmental groups and (ii) a corpus of fully preprocessed text “documents.” The former is used for identifying the pairs of radical groups that co-occur within our text corpus and the underlying network of radical groups that arises from these co-occurrences. The latter is used for estimating the latent strategies that are discussed across our corpus and the variation in these latent strategies vis-à-vis our identified group network. We directly derive each intermediate input quantity from our text sample of Do or Die (DoD) issues 1–10 and discuss each process in turn below, beginning with our radical UK group list.

We used information contained within DoD itself to identify and construct our list of relevant UK groups.

The final three to four pages of each DoD issue provide comprehensive contact information (e.g., names and addresses) for a wide range of radical leftist groups, typically separated into international and domestic (UK) groups as well as by group type.

Our analytic framework follows past scholarship (e.g., Saunders 2007b) in analyzing our environmental group network through a relational approach, which allows the group interactions and behaviors within our texts to inform our network, rather than through a positional approach that defines our network based upon a (subjective) a priori classification of our groups into specific issue areas (or categories).

Saunders, Clare. 2007b. “Using Social Network Analysis to Explore Social Movements: A Relational Approach.” Social Movement Studies 3:227-43.

While many of our 143 identified groups have historically advocated for issues that fall outside of the environmental arena (e.g., anarchism), we continue to characterize all 143 groups as (radical) environmental groups in our ensuing discussions, given that all groups in our sample were (i) at least tangentially involved in environmental issues and (ii) listed as key contacts, and frequently referenced within the texts, of a radical environmentalist publication. We believe this to be most consistent with our relational approach; however, we do take care to note those instances where a specific group’s focus extended beyond radical environmentalism.

Our final 143-group list also contains a number of UK groups whose tactics did not traditionally encompass radical environmental direct action. Rather than arbitrarily remove these potentially nonradical groups from our UK-group list ex post, we maintain these groups in our sample and use this variation (in known group strategies) to validate our protest strategy extraction methods below by assessing whether our unsupervised text analysis approach is indeed correctly identifying those groups that have been known to use, or not use, radical environmental direct action tactics.

Because the unsupervised topic modeling techniques used below require that we apply these methods to a collection of text documents, we must first define what a standard document should be for this stage of our text analysis. Based upon our substantive knowledge of the DoD corpus, as well as previous applications of topic models to social science texts, plausible document designations include each individual DoD issue, each individual page of text within our DoD sample, individual sentences or paragraphs, or each individual story entry within the corpus. Extant social science research has applied topic models similar to those used below to documents ranging in size from individual tweets (Barberá et al. 2014) to individual books (Blaydes, Grimmer, and McQueen 2015). Others have used more arbitrary text breaks, such as page breaks, sentences, multisentence sequences, or paragraphs, to define documents (Bagozzi and Schrodt 2012; Brown 2012; Chen et al. 2013). Hence, for the methods applied below, a great deal of flexibility can be afforded in defining one’s documents.

Before constructing these sentence sequence documents, we first removed the aforementioned UK (and international) group contact lists from the final pages of each DoD issue. While we use these contacts for identifying our groups of interest, we wish to avoid treating this text as actual content text during either the extraction of group co-occurrence information or the topic modeling stages of our analysis, given that the contact lists included within DoD provide little to no surrounding content text aside from the listed names and contact information for each group.

Altogether, these preprocessing steps created a corpus with 3,210 unique documents and 2,082 unique word stems. Further below, these preprocessed documents are used within topic models to discover the latent strategies that underlie the DoD corpus and to examine how these strategies vary in relation to our (separately derived) radical UK group network.

The use of co-occurrences for building networks in this fashion is well established (Chang, Boyd-Graber, and Blei 2009; Culotta et al. 2004; Davidov et al. 2007), and using this literature as a guide, alongside information on the typical document length for our corpus, we believe 12-sentence sequences to be a reasonable balance between the identification of true co-occurrences and the avoidance of false co-occurrences.
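The 12-sentence window can be illustrated with a short Python sketch. It assumes non-overlapping windows and a naive regular-expression sentence splitter; neither detail is specified in these notes, so both are assumptions rather than the authors' pipeline. A splitter this naive would also break after a name such as "EF!", which is one practical reason to standardize group names into punctuation-free tokens before segmenting.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def make_documents(text, window=12):
    """Group consecutive sentences into non-overlapping documents of `window` sentences."""
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + window]) for i in range(0, len(sentences), window)]
```

With 30 short sentences, `make_documents` yields three documents of 12, 12, and 6 sentences; whether a short trailing window is kept or merged is another choice these notes leave open.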

With this set of group name variations in hand, we implemented a processing script that individually standardized each group name within our unprocessed documents, while also incorporating unique features into some group names so as to ensure that groups with closely overlapping names were not incorrectly counted as (co-)occurring in instances when only a similarly named group was mentioned in a document.
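A minimal Python sketch of the name-standardization step, assuming a hand-built table that maps surface variants to canonical tokens. The table entries and the `GRP_` token format are hypothetical, not the authors' list. Matching the longest variants first, and requiring a non-word character (or text boundary) on both sides, keeps a short name from being counted inside a longer one (here, "Ecologist" inside "The Ecologist") or inside an unrelated word such as "Ecologists".

```python
import re

# Hypothetical variant table: surface form -> canonical, punctuation-free token.
VARIANTS = {
    "Earth First!": "GRP_EARTHFIRST",
    "EF!": "GRP_EARTHFIRST",
    "Class War": "GRP_CLASSWAR",
    "The Ecologist": "GRP_ECOLOGIST",
    "Ecologist": "GRP_ECOLOGIST",
}

# Longest variants first, so "The Ecologist" is consumed before "Ecologist" can match inside it.
_ordered = sorted(VARIANTS, key=len, reverse=True)
_pattern = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(v) for v in _ordered) + r")(?!\w)"
)

def standardize(text):
    """Replace every known name variant with its canonical token."""
    return _pattern.sub(lambda m: VARIANTS[m.group(0)], text)
```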

Environmental Group Networks

Social network methods have a long history in the social sciences, dating back to the 1930s (Freeman 2004). Co-occurrence networks have been used effectively in the bioinformatics literature (e.g., Cohen et al. 2005), computer science and engineering (e.g., Matsuo and Ishizuka 2004), and the bibliometric literature (for a review, see King 1987). Social networks have been employed for related theoretical development in explaining social movements (e.g., Byrd and Jasny 2010) and organizational activities (e.g., Burt 2000; Spiro, Almquist, and Butts 2016). Further, a collection of individuals engaged in shared activity can be represented as a single entity that engages in collaborative activities, as in citation networks between blogs or disaster relief networks (e.g., see Almquist and Butts 2013; Almquist, Spiro, and Butts 2016). This work builds on this literature and extends it in several key dimensions. First, we directly incorporate text information to understand the diffusion of information, ideas, and activities that occur through these networks. Second, we explore the endogenous nature of the network and of the concepts that create these complex social interactions. Third, we empirically validate these measures in the policy-relevant setting of modern environmental movements.

Here, we adopt the social network nomenclature of representing a network as a mathematical object known as a graph (G). A graph is defined by two sets, G = (E, V): (i) an edge set (E), which represents a relationship, such as friendship, and (ii) a vertex set (V), which represents entities, such as individuals or organizations. In much of the social network literature, edges are also referred to as links or ties, and vertices as nodes or actors, interchangeably (Wasserman and Faust 1994); we use the same conventions. Edges can be either directed (e.g., A → B) or undirected (e.g., A ↔ B). Further, for analysis purposes, a graph can be fully identified by its adjacency matrix. An adjacency matrix (Y) is a square (dyadic) matrix whose entry y_ij records the relation between actor i and actor j: 1 if the relationship is present and 0 if it is absent.
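The graph-to-matrix correspondence can be made concrete with a small Python sketch (the function name `adjacency_matrix` is illustrative). For an undirected relation each tie is written into both y_ij and y_ji, so the resulting matrix is symmetric; for a directed relation only y_ij is set.

```python
import numpy as np

def adjacency_matrix(vertices, edges, directed=False):
    """Build a 0/1 adjacency matrix Y from a vertex list and an edge list."""
    index = {v: k for k, v in enumerate(vertices)}
    Y = np.zeros((len(vertices), len(vertices)), dtype=int)
    for a, b in edges:
        i, j = index[a], index[b]
        Y[i, j] = 1
        if not directed:
            Y[j, i] = 1   # undirected: y_ij = y_ji
    return Y
```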

The network literature comprises both theoretical and methodological insights. In the context of radical environmental group networks we are particularly interested in descriptive analysis (Almquist 2012; Wasserman and Faust 1994), metrics of power (also known as centrality indices, see Freeman 1979), and group analysis (i.e., community detection or clustering, see, e.g., Fortunato 2010; Mucha et al. 2010).

The Network and Descriptive Statistics

Typically network analysis begins by defining the bounds of the network and the relation of interest (Wasserman and Faust 1994). The group network analyzed in this article consists of 143 radical environmental groups identified in the text analysis portion of this research.

The edge set is defined, as mentioned in the fourth section, by groups i and j co-occurring in at least one document. We treat this as a symmetric (or undirected) relation, such that y_ij = y_ji. To visualize the network and compute the descriptive statistics, we employ the sna package, an R (R Core Team 2015) package dedicated to network analysis (see Butts 2008, for more details).
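The co-occurrence rule can be sketched in Python as follows, assuming each document has already been reduced to the set of standardized group names it mentions (`doc_groups` is a hypothetical input format). The sketch keeps raw co-occurrence counts and then dichotomizes them, so y_ij = 1 whenever groups i and j appear together in at least one document.

```python
from itertools import combinations
import numpy as np

def cooccurrence_network(doc_groups, groups):
    """Return (count matrix, binary symmetric adjacency matrix) of group co-occurrences."""
    index = {g: k for k, g in enumerate(groups)}
    n = len(groups)
    counts = np.zeros((n, n), dtype=int)
    for mentioned in doc_groups:
        # every unordered pair of distinct groups named in this document
        for a, b in combinations(sorted(set(mentioned)), 2):
            i, j = index[a], index[b]
            counts[i, j] += 1
            counts[j, i] += 1
    Y = (counts >= 1).astype(int)   # tie if the pair co-occurs at least once
    return counts, Y
```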

Our observation of a highly stratified network, split between active and nonactive members, is consistent with Saunders's (2007b) relational network analysis of London-based environmental organizations, which found that environmental radicals exhibited a small number of ties, relative to other environmental organizations, and did not collaborate with more mainstream conservationist groups. Hence, our broader network appears to be consistent with existing understandings of radical environmentalist networks. In the next section, we use node-level centrality measures, which are often interpreted as indicators of influence or power in the network, to discuss its most central members.

Centrality

In SNA, the concept of node centrality is an important and classic area of study within the field (Freeman 1979). There exist within the social network literature a large set of centrality metrics and measures of power.

We choose to focus on the four most popular measures (Freeman 1979; Wasserman and Faust 1994): degree, eigen, betweenness, and closeness centrality. Each captures a slightly different, but important, aspect of a node's position within the network. Degree centrality is the most straightforward: it counts the ties a given actor has to the other actors in the network (Anderson, Butts, and Carley 1999).
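For a symmetric 0/1 adjacency matrix, degree centrality reduces to row sums. A minimal Python sketch; the optional division by n - 1 (the largest possible degree) is one common normalization and is an assumption here, not necessarily the scaling the authors report.

```python
import numpy as np

def degree_centrality(Y, normalize=False):
    """Number of ties per actor (row sums); optionally divided by n - 1."""
    deg = np.asarray(Y).sum(axis=1)
    return deg / (len(deg) - 1) if normalize else deg
```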

Eigen centrality (or eigenvector centrality) corresponds to the entries of the eigenvector associated with the largest eigenvalue of the adjacency matrix; this score is related to both Bonacich’s power centrality measure and PageRank and can be thought of as a core–periphery measure (Anderson et al. 1999; Bonacich 1972).
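One way to compute eigenvector centrality for an undirected network, sketched in Python with NumPy: take the eigenvector of the largest eigenvalue of the symmetric adjacency matrix. Eigenvector signs are arbitrary, so the sketch takes absolute values and rescales to unit length; packages may rescale differently (for example, to a maximum of 1). On a disconnected network the leading eigenvector can concentrate on a single component, so scores are most interpretable within a connected component.

```python
import numpy as np

def eigenvector_centrality(Y):
    """Leading-eigenvector scores of a symmetric adjacency matrix, unit length."""
    A = np.asarray(Y, dtype=float)
    vals, vecs = np.linalg.eigh(A)   # eigh: symmetric input, eigenvalues ascending
    v = np.abs(vecs[:, -1])          # eigenvector of the largest eigenvalue, sign fixed
    return v / np.linalg.norm(v)
```

For a star with three leaves the center scores 1/√2 ≈ 0.71 and each leaf 1/√6 ≈ 0.41, reflecting the center's core position.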

Betweenness is an (often normalized) measure based on shortest paths, or geodesics: it reflects how many of the shortest paths between other pairs of nodes pass through a given node. Conceptually, a node with high betweenness lies on a large number of the shortest paths connecting other nodes and can be interpreted as a “bridge” or “boundary spanner” (Freeman 1977). The last centrality measure we consider is closeness, which is another geodesic distance–based measure. Here we use the Freeman (1979) version, which is defined for disconnected graphs and has largely the same properties as the classic closeness measure.
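A self-contained Python sketch of both geodesic measures for an unweighted, undirected network. Betweenness follows Brandes's accumulation algorithm, counting each unordered pair once and leaving scores unnormalized. For closeness the sketch uses the mean inverse geodesic distance, in which unreachable nodes contribute zero; this is one widely used variant that stays defined on disconnected graphs, shown as an assumption rather than as the exact formula the authors used.

```python
from collections import deque
import numpy as np

def _bfs(A, s):
    """Unweighted BFS from s: distances, shortest-path counts, predecessors, visit order."""
    n = len(A)
    dist, sigma = [-1] * n, [0] * n
    preds = [[] for _ in range(n)]
    order = []
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in np.flatnonzero(A[v]):
            if dist[w] < 0:              # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def betweenness(Y):
    """Brandes-style betweenness for an undirected graph (unnormalized)."""
    A = np.asarray(Y)
    n = len(A)
    bc = np.zeros(n)
    for s in range(n):
        dist, sigma, preds, order = _bfs(A, s)
        delta = np.zeros(n)
        for w in reversed(order):        # accumulate dependencies from farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc / 2                        # each pair (s, t) was counted from both ends

def harmonic_closeness(Y):
    """Mean inverse geodesic distance; unreachable nodes contribute 0."""
    A = np.asarray(Y)
    n = len(A)
    out = np.zeros(n)
    for s in range(n):
        dist = _bfs(A, s)[0]
        out[s] = sum(1.0 / d for d in dist if d > 0)
    return out / (n - 1)
```

On the path 0-1-2, the middle node has betweenness 1 and closeness 1.0, while each end node has closeness (1 + 1/2)/2 = 0.75.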

Radical Environmental Group Tactics

We next seek to discover the underlying protest tactics that are pursued by environmental groups within this network’s connected component and to evaluate whether these methods can accurately identify which groups actively pursued radical protest, and which did not, in an unsupervised fashion. To do so, we apply unsupervised topic models to our preprocessed text documents, so as to simultaneously (i) uncover the latent themes or “topics” that are discussed across documents and (ii) associate these topics with the group ties and clusters that we discussed above. Topic models allow one to recover the former quantity by treating one’s documents as a combination of multiple overlapping topics, each with a representative set of words.
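The mixture idea can be shown with a toy Python example. In LDA-style topic models, a document's expected word distribution is its topic proportions θ_d weighted over the topics' word distributions β_k, i.e., p(w | d) = Σ_k θ_dk β_kw. The vocabulary stems and all numbers below are invented for illustration and are not estimates from the DoD corpus.

```python
import numpy as np

def document_word_distribution(theta_d, beta):
    """p(w | d) = sum_k theta_dk * beta_kw for one document."""
    return np.asarray(theta_d) @ np.asarray(beta)

# Hypothetical toy model: 2 topics over a 4-stem vocabulary (illustrative values only).
vocab = ["camp", "road", "sabotag", "book"]
beta = np.array([[0.5, 0.4, 0.1, 0.0],    # topic 0 word distribution
                 [0.0, 0.1, 0.2, 0.7]])   # topic 1 word distribution
theta_d = np.array([0.75, 0.25])          # one document's topic proportions

p = document_word_distribution(theta_d, beta)   # [0.375, 0.325, 0.125, 0.175]
```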

 

Topic model extensions that incorporate predetermined network information have largely focused on conditioning topic estimation upon network structures (and documents) linked by authorship/recipient or citations (e.g., Chang and Blei 2009; Dietz, Bickel, and Scheffer 2007), although Chang et al. (2009) and Krafft et al. (2014) have developed more flexible network-oriented topic models that infer topic descriptions and their relationships from documents that are indexed by a network’s entities and/or entity pairs.

The vast majority of our (12-sentence sequence) documents are not associated with specific network entities, entity pairs, or clusters. Only a small proportion of these documents contain relevant network information, which effectively precludes us from using the models discussed above to extract topic-based information concerning our network’s underlying strategies and tactics.

Accordingly, we favor a more recently developed approach for incorporating external document-level information and structure into unsupervised topic model analyses, known as the structural topic model (STM; Roberts et al. 2014). The STM estimates latent topics in a hierarchical manner similar to that of the general discussion provided earlier, while also incorporating document-level information via external covariates into one’s prior distributions for document topics or topic words. In this manner, one can use the STM not only to identify a set of shared latent topics across a corpus but also to evaluate potential relationships between document-level covariates and the prevalence of a given topic within and across documents. As such, the STM has been effectively used to estimate the effects of survey respondent characteristics upon variation in respondents’ open-ended responses (Roberts et al. 2014) as well as the effects of country-level characteristics (e.g., regime type) upon the topical attention of U.S. State Department Country Reports on Human Rights Practices (Bagozzi and Berliner 2017). For our application, the STM’s advantages intuitively lie in its ability to incorporate structural network information (namely, group tie presence and cluster member presence) as binary predictors of variation in attention toward different radical protest strategies and tactics across documents.
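In the STM, topic prevalence covariates enter through a logistic-normal prior: a document's K - 1 free log-ratios are drawn as η_d ~ N(Γᵀx_d, Σ), and θ_d is the softmax of η_d with the K-th topic fixed at zero. The Python sketch below evaluates only the softmax of the prior mean (it ignores Σ, so it is not the expected θ_d) for a hypothetical design with an intercept, a group-tie indicator, and a cluster-membership indicator; all coefficient values are made up.

```python
import numpy as np

def prior_mean_proportions(x_d, Gamma):
    """Softmax of the STM prevalence prior mean for one document.

    x_d   : covariate vector of length P (e.g., intercept, tie present, cluster member)
    Gamma : P x (K-1) prevalence coefficients; topic K is the zero reference
    """
    eta = np.append(np.asarray(x_d, float) @ np.asarray(Gamma, float), 0.0)
    e = np.exp(eta - eta.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical 3-topic example: the tie indicator raises the log-ratio of topic 0.
Gamma = np.array([[0.0, 0.0],    # intercept
                  [1.0, 0.0],    # group tie present
                  [0.0, 0.5]])   # cluster member
no_tie = prior_mean_proportions([1, 0, 0], Gamma)
with_tie = prior_mean_proportions([1, 1, 0], Gamma)
```

Here the tie indicator moves topic 0's share from 1/3 to e/(e + 2) ≈ 0.58 at the prior mean, which is the kind of covariate effect on topic prevalence the STM estimates.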

While we rely on the top words to define our topics above, note that we also use the STM to identify a sample of 10 highly representative documents for each topic and have used these sample documents to qualitatively guide our topic labeling efforts.

As one would expect, we find in this case that the Class War and The Ecologist group pairing is not significantly associated with any of the four identified protest tactics. This implies that, within DoD, these two groups are not associated along radical protest dimensions and that our proposed method is capable of avoiding false positives with respect to the association of radical protest strategies with specific (nonradical) group pairs.

A number of anarchist bookstores and community organizing centers, which often serve as organizing venues for EF! and related group activities, also appear in this cluster. We therefore expected that cluster 3’s shared tactics would largely fall within the protest camps– and direct action/ecotage–type tactics, as these are the activities that cluster 3’s groups are most often and most likely to coordinate on, which is indeed the case in Figure 6. Hence, these findings further demonstrate that our combined network and text analysis strategy is able to uncover useful and theoretically consistent information with respect to radical protest tactics and strategies.

The results presented in Figure 6 also reveal a number of novel theoretical insights that together help to sharpen our understandings of social and environmental movements in the United Kingdom. For instance, while we find above that both clusters 1 and 2 are positively associated with our direct action/ecotage topic, we find that cluster 2 is marginally less likely to coordinate on violent protest, whereas cluster 1 is significantly more likely than not to coordinate on violent protest.

Network Discovery/Classification via Topic Models

A common problem in network analysis is that of network discovery, or network classification, of a given relation (see, e.g., Wasserman and Faust 1994). This is especially a problem for networks derived from text, such as co-occurrence networks. The methods employed in this article can be used for a twofold analysis of this issue. First, the topic models can be used to gain a qualitative understanding of the co-occurrence relationships (e.g., coordination or communication over direct action or a media campaign). Second, these models can be employed as classifiers to select out relations of interest. (Here we use “classifier” as computer scientists use the term, to mean a method that identifies categorical items of interest; for a review, see Vapnik and Vapnik 1998.)
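These notes do not state the exact rule used to turn topic estimates into edge classes, so the Python sketch below uses a simple hypothetical one: average a chosen topic's proportion θ_dk over every document in which a group pair co-occurs, and keep the pair in that topic's subnetwork if the average reaches a threshold. The `classify_edges` name and the 0.5 default threshold are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def classify_edges(doc_groups, theta, topic, threshold=0.5):
    """Hypothetical rule: keep a co-occurring pair if its mean share of `topic` >= threshold.

    doc_groups : list of sets of standardized group names, one per document
    theta      : (documents x topics) array of estimated topic proportions
    """
    totals, counts = {}, {}
    for d, mentioned in enumerate(doc_groups):
        for pair in combinations(sorted(mentioned), 2):
            totals[pair] = totals.get(pair, 0.0) + theta[d, topic]
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair in totals if totals[pair] / counts[pair] >= threshold}
```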

Under the Walktrap clustering algorithm, with the number of clusters selected by modularity score (Gallos 2004) and isolates ignored, we find three core groups for the direct action/ecotage network and five groups for the eco-literature classified network. Further, we can test substantive network behavior through statistical models of these social networks, as discussed in the next subsection.
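Walktrap itself is available in standard libraries (for example, python-igraph's `Graph.community_walktrap`, whose dendrogram can be cut at the modularity-maximizing level with `as_clustering()`). The quantity used to select the cut is Newman's modularity, Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j), which this short NumPy sketch computes for a given partition.

```python
import numpy as np

def modularity(Y, membership):
    """Newman modularity of a partition of an undirected 0/1 network."""
    A = np.asarray(Y, dtype=float)
    k = A.sum(axis=1)                   # degrees
    two_m = k.sum()                     # twice the number of edges
    labels = np.asarray(membership)
    same = np.equal.outer(labels, labels)   # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)
```

Two disconnected triangles split into their natural groups give Q = 0.5, while lumping all six nodes into one group gives Q = 0.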

Statistical Analysis of Direct Action/Ecotage Network and Eco-Literature Network

Here, we consider probabilistic models of social networks fit by maximum likelihood estimation (MLE). These models are derived from exponential family distributions and are therefore referred to as exponential-family random graph models (ERGMs).
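For reference, the general ERGM form is written below, where g(y) is a vector of network statistics (edge count, edgewise shared partner terms, dyadic covariates, and so on), θ holds their coefficients, and the normalizing constant κ(θ) sums over all possible graphs on the same node set. With the edge count as the only statistic, the model reduces to a random graph in which every tie is present independently with the same probability.

```latex
P_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}

% Edges-only special case: g(y) = \sum_{i<j} y_{ij}, so ties are independent with
P_{\theta}(Y_{ij} = 1) = \frac{e^{\theta}}{1 + e^{\theta}}
```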

Analysis of direct action/ecotage network. In line with the typical ERGM (and larger statistical model) literature, we begin with an edges-only model (i.e., a random graph model; Bayesian information criterion [BIC] = 200.69), extend it to encompass more complex effects such as edgewise shared partners (a measure of clustering; Hunter and Handcock 2012; BIC = 197.42), and then control for topic similarity through our cosine similarity metric.
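Because the edges-only model is a Bernoulli random graph, its maximum likelihood fit has a closed form: θ̂ = logit(m / N) for m observed ties among N = n(n - 1)/2 dyads. The Python sketch below computes that estimate, the log-likelihood, and a one-parameter BIC, assuming the number of dyads is used as the sample size (our understanding of how statnet's `ergm` reports BIC). It shows how such a baseline BIC arises; it is not meant to reproduce the values reported above.

```python
import math

def edges_model(n_nodes, n_edges):
    """Closed-form MLE, log-likelihood, and BIC of an undirected edges-only model."""
    N = n_nodes * (n_nodes - 1) // 2          # number of undirected dyads
    if not 0 < n_edges < N:
        raise ValueError("need 0 < n_edges < number of dyads for a finite MLE")
    p = n_edges / N                           # MLE of the tie probability
    theta = math.log(p / (1 - p))             # edges coefficient = logit(density)
    loglik = n_edges * math.log(p) + (N - n_edges) * math.log(1 - p)
    bic = -2 * loglik + 1 * math.log(N)       # one parameter; sample size = dyads (assumed)
    return theta, loglik, bic
```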

Analysis of eco-literature network. Again following the typical approach, we begin with a random graph model, extend it to encompass more complex effects such as edgewise shared partners (a measure of clustering; Hunter and Handcock 2012), and then control for topic similarity through our cosine similarity metric.
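Both ingredients of this specification can be computed directly from matrices. In the Python sketch below, `esp_distribution` counts how many ties have exactly k shared partners (the distribution that geometrically weighted ESP, or GWESP, terms in ERGMs summarize), and `cosine_similarity_matrix` builds a dyadic covariate from per-group topic vectors. How the authors constructed each group's topic vector is not stated in these notes; rows of `T` might, for instance, be a group's average topic proportions, which is an assumption.

```python
import numpy as np

def esp_distribution(Y):
    """Entry k = number of ties whose two endpoints share exactly k partners."""
    A = np.asarray(Y, dtype=int)
    shared = A @ A                        # (A @ A)[i, j] = number of common neighbours
    iu = np.triu_indices_from(A, k=1)     # each undirected dyad once
    on_edges = shared[iu][A[iu] == 1]     # keep dyads that are ties
    return np.bincount(on_edges, minlength=len(A) - 1)

def cosine_similarity_matrix(T):
    """Pairwise cosine similarity between rows of a (groups x topics) matrix."""
    T = np.asarray(T, dtype=float)
    unit = T / np.linalg.norm(T, axis=1, keepdims=True)
    return unit @ unit.T
```

For a triangle with one pendant node attached, three ties have one shared partner and the pendant tie has none, giving the distribution [1, 3, 0].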

Discussion and Summary of Topic-Based Network Classification

We have found that our favored topic model approaches can be used to carefully classify subnetworks from the core co-occurrence network for further analysis and comparison. Without direct human coding, these methods both provide a better qualitative understanding of a given edge and identify salient subnetworks within the larger network.

Further, we can individually analyze these networks to uncover distinct substantive findings. For example, in doing so above, we found that the direct action/ecotage network can be largely explained through edgewise shared partner clustering and topic similarity, whereas the eco-literature network requires the extra information of local clustering in order to model it correctly.

This suggests that there are distinct local groups within the eco-literature network, whereas the direct action/ecotage network appears to be much more cohesive. The latter finding lends support to past anecdotes of high inbreeding among radical environmental groups in the United Kingdom (where prior work noted that insufficient data on radical UK environmental groups limited systematic conclusions in this regard; Saunders 2007b:237), as well as to findings concerning the tendencies of subversive and radical groups to disproportionately seek out closer and/or more like-minded collaborators when coordinating on potentially illegal or violent actions (Almquist and Bagozzi 2016).

Altogether, we find this to be an effective technique for improving our understanding of networks and subnetworks through topic models and modern text analysis approaches.

Summary

This article presents a suite of tools for the automated discovery of radical activist group networks and tactics from raw unstructured text and applies these methods to a collection of radical UK environmentalist publications. While radical groups are often covert in their networks, membership, and tactics, they produce a great deal of text during their efforts to publicize their concerns, communicate with like-minded groups, and mobilize support for their activities. The advent of the World Wide Web, along with more recent advancements in automated text analysis and SNA tools, enables scholars to take advantage of these self-produced texts in novel ways. As we show, such methods not only allow one to uncover detailed network information from unstructured environmentalist texts but can also identify the underlying tactics that are discussed in these texts and help to pair these tactics with one’s uncovered network.

Our UK environmental group application provides scholars with a detailed guide for implementing these techniques and offers a number of insights into the network and tactics of radical leftist and environmentalist groups within the UK.

Specifically, we find that the UK environmental movement of the 1990s and early 2000s, while most commonly associated with the UK EF! organization, is actually embedded within a larger network of radical leftist groups and venues. Surprisingly, the most central members of this network share close ties to the EF! movement but primarily advocate for a more general set of leftist ideals, including opposition to globalization, the promotion of workers’ rights, and anarchism. This is consistent with characterizations of leftist global social movements’ increasing de-environmentalization, and subsequent reorientation toward social justice and economic issues during the late 1990s and onward (Buttel and Gould 2004). With respect to the UK specifically, scholars have often noted the decline of environmental protest in the UK during the late 1990s (Rootes 2000:49), which was shortly followed by the demise of the DoD publication itself. We believe that the highly central, but nonenvironmentalist, radical leftist groups that we identify, and variation in support for their respective movements and causes, may help to explain this decline.