Tools & Code

Open-source classifiers, datasets, and reproducible pipelines

Computational tools and code by Alex Newhouse for political-text classification, network analysis, and extremism research.

I build and release computational tools that operationalize political-science theory at scale. The artifacts below sit at the intersection of academic research and applied trust-and-safety work — the same pipelines I have deployed in partnerships with gaming platforms, federal investigators, and policy groups.

Last updated: April 2026

Methods stack

Layer            | Tools
Languages        | R, Python, SQL
ML / NLP         | PyTorch, Hugging Face Transformers, Scikit-Learn, spaCy
Networks         | igraph, statnet, NetworkX
Causal inference | Interrupted time-series, intervention analysis, regression
Data viz         | ggplot2, plotly, D3
Infra            | Git, Docker, HPC (SLURM), Splunk

Datasets & resources

  • Glossary of right-wing terminology, slang, and imagery — developed at CTEC. Available on request for vetted academic use.
  • Sainthood corpus — 11 years of 4chan /pol/ posts annotated for canonization rhetoric (in development; dissertation Ch. 4).
  • Mass-casualty attacks dataset (OECD, 20 yr) — derived from ACLED, GTD, RTV; in preparation alongside the dissertation.

Industry & policy collaborations

  • Roblox — detection and mitigation of violent and hateful user networks.
  • Spectrum Labs — multilingual datasets of online toxicity across 7 languages.
  • U.S. House Select Committee on January 6th — cross-platform investigation, hearing material, and final-report contributions.
  • Department of Homeland Security — two TVTP grants ($1.33M total) supporting extremism and gaming research.

Code & repos

GitHub: github.com/alexbnewhouse

If you’d like access to data, replication code, or a model not yet public, email me — I’m happy to share with vetted researchers.
