Tools & Code

Open-source classifiers, datasets, and reproducible pipelines

Computational tools and code by Alex Newhouse for political-text classification, network analysis, and extremism research.

I build and release computational tools that operationalize political-science theory at scale. The artifacts below sit at the intersection of academic research and applied trust-and-safety work — the same pipelines I have deployed in partnerships with gaming platforms, federal investigators, and policy groups.

Last updated: April 2026

For researchers

Replication-ready workflows for text classification, network analysis, and time-series designs in sensitive-data settings.

For trust & safety teams

Applied methods for turning ambiguous policy categories into measurable model outputs and review pipelines.

For students

Examples of how to move from social-science theory to transparent code, validation, and responsible interpretation.

Featured projects

DistilBERT for Political Text Classification

Transformer-based classifier for extremist content in political discussion. 94% F1 on a labeled corpus; full PyTorch + Hugging Face pipeline with data augmentation.

Python PyTorch Transformers NLP

View project →
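The 94% figure is an F1 score, the harmonic mean of precision and recall. The page does not say whether it is binary F1 on the extremist class or a macro average across classes, so this minimal sketch assumes the binary case. The function name and toy labels are illustrative and are not taken from the project code.

```python
def binary_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Compute F1 = 2PR / (P + R) for a single positive class (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    # Count true positives, false positives, and false negatives for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = extremist content, 0 = other
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(binary_f1(gold, pred))  # 3 TP, 1 FP, 1 FN -> P = R = 0.75 -> 0.75
```

In practice, scikit-learn's `f1_score` computes the same quantity and also accepts `average="macro"` for the multi-class case.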

Apocalyptic Rhetoric Classifier (in prep)

Fine-tuned transformer model for detecting apocalyptic and millenarian rhetoric in online political forums. Used in the dissertation's Chapter 2 study of 4chan /pol/ reactions to mass-casualty attacks.

Python Transformers Time-series

Coming with paper release

Network Analyses of Neo-Fascist Coalitions

Reproducible R + igraph pipelines accompanying the CTC Sentinel and GNET reports on multi-node accelerationist networks.

R igraph Networks

On GitHub →
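The released pipelines use R and igraph. As a language-neutral illustration of the basic measures such pipelines compute, here is a minimal pure-Python sketch that builds an undirected graph from an edge list and reports normalized degree centrality and connected components. The edge list is hypothetical toy data and the function names are illustrative, not drawn from the repositories; in R igraph the corresponding calls are `degree()` and `components()`.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected adjacency map (node -> set of neighbours)."""
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the same normalization NetworkX uses."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def connected_components(adj):
    """Return a list of node sets, found by iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical toy edges between accounts or groups
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "F")]
g = build_graph(edges)
print(degree_centrality(g)["C"])  # 3 neighbours / (6 - 1) -> 0.6
print(sorted(sorted(c) for c in connected_components(g)))
```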

Methods stack

Layer	Tools
Languages	R, Python, SQL
ML / NLP	PyTorch, Hugging Face Transformers, scikit-learn, spaCy
Networks	`igraph`, `statnet`, NetworkX
Causal inference	Interrupted time-series, intervention analysis, regression
Data viz	`ggplot2`, `plotly`, D3
Infra	Git, Docker, HPC (SLURM), Splunk
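Interrupted time-series designs are commonly estimated as a segmented regression: a baseline level and trend, plus a level change and a slope change at the intervention point. The sketch below fits that model by ordinary least squares with NumPy. It assumes a single known interruption at index `t0`, independent errors, and no seasonality, all of which real studies often need to relax (for example with autocorrelation-robust standard errors or ARIMA-style intervention models). The function name and synthetic series are illustrative only.

```python
import numpy as np


def segmented_regression(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t, with D_t = 1 for t >= t0.

    Returns [b0, b1, b2, b3]: baseline level, pre-intervention slope,
    immediate level change, and change in slope after the interruption.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)   # D_t: intervention indicator
    time_since = (t - t0) * post     # 0 before t0, then 0, 1, 2, ...
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic series: level 10, slope 0.5, then a jump of 4 and extra slope 1.5 from t0 = 20
t = np.arange(40)
t0 = 20
y = 10 + 0.5 * t + 4 * (t >= t0) + 1.5 * (t - t0) * (t >= t0)
rng = np.random.default_rng(0)
noisy = y + rng.normal(scale=1.0, size=t.size)
print(np.round(segmented_regression(noisy, t0), 2))
```

This version returns point estimates only; passing the same design matrix to `statsmodels.api.OLS` would also give standard errors for the level- and slope-change terms.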

Datasets & resources

Glossary of right-wing terminology, slang, and imagery — developed at CTEC. Available on request for vetted academic use.
Sainthood corpus — 11 years of 4chan /pol/ posts annotated for canonization rhetoric (in development; dissertation Ch. 4).
Mass-casualty attacks dataset (OECD, 20 yr) — derived from ACLED, GTD, RTV; in preparation alongside the dissertation.

Industry & policy collaborations

Roblox — detection and mitigation of violent and hateful user networks.
Spectrum Labs — multilingual datasets of online toxicity across 7 languages.
U.S. House Select Committee on January 6th — cross-platform investigation, hearing material, and final-report contributions.
Department of Homeland Security — two TVTP grants ($1.33M total) supporting extremism and gaming research.

Code & repos

GitHub: github.com/alexbnewhouse

If you’d like access to data, replication code, or a model not yet public, email me — I’m happy to share with vetted researchers.