Projects

Selected work

Selected work in AI infrastructure, NLP, and applied research.

I — In production

N° 01

Molcajete

AI-powered transcription and analysis pipeline for political focus group research.

A full audio-to-insight pipeline replacing error-prone transcription and hours of note-taking per project. Speaker diarization, transcription, theme classification, and integrated reporting — all surfaced through a tooling layer that researchers actually use.

1,300+

hours of audio processed

<60 min

turnaround per project

N° 02

Adapta

Data preprocessing pipeline and LLM fine-tuning infrastructure for Mexican Spanish political analysis.

Specialized LLMs produced by a reproducible fine-tuning and evaluation pipeline, built for empirical comparison of base models and prompts.

40+

evaluation metrics

100+

training runs

N° 03

Nopalero

Automated participant screening system for qualitative recruitment.

Automated intake pipeline that replaces hours of manual data entry per project. Combines OCR, fraud detection, and socioeconomic classification — so analysts focus on the decisions, not the paperwork.

validation checks

manual data entry

II — Open source

N° 01

Judex

noah-art3mis/judex-mini

Python scraper and parser for Brazilian Supreme Court (STF) case data.

Typer-based CLI with three cache-first stages (scrape, download, extract) that supports heavily sharded runs with proxy rotation and feeds a DuckDB warehouse. Multiple OCR backends, including self-hosted Tesseract on Fly.io for cheap inference.
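As a rough illustration of what "cache-first" can mean for the three stages, the sketch below checks an on-disk cache before doing any work and only calls the expensive step on a miss. It is not Judex's actual code: `cache_first`, `run_pipeline`, the per-stage directory layout, and the stand-in fetch and extract callables are all assumptions made for the example, and it uses only the Python standard library rather than Typer or DuckDB.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_first(cache_dir: Path, key: str, produce: Callable[[], bytes]) -> bytes:
    """Return cached bytes for `key`; call `produce` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()
    data = produce()
    # Write to a temporary file, then rename, so an interrupted run
    # does not leave a half-written cache entry behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data


def run_pipeline(
    case_id: str,
    root: Path,
    fetch_page: Callable[[str], bytes],     # hypothetical scrape step
    fetch_pdf: Callable[[bytes], bytes],    # hypothetical download step
    extract_text: Callable[[bytes], bytes], # hypothetical extract/OCR step
) -> str:
    """Run scrape -> download -> extract, each stage backed by its own cache."""
    page = cache_first(root / "scrape", case_id, lambda: fetch_page(case_id))
    pdf = cache_first(root / "download", case_id, lambda: fetch_pdf(page))
    text = cache_first(root / "extract", case_id, lambda: extract_text(pdf))
    return text.decode("utf-8")
```

Under these assumptions, running `run_pipeline` a second time for the same case identifier reads all three stages from disk and never calls the fetch or extract functions, which is what makes re-runs and sharded restarts cheap.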

R$ 52

yearly HC sweep

0.93/s

PDFs sustained

0.28/s

cases sustained

OCR backends

Have a problem that doesn't fit a template?

Most of the work above started as someone saying exactly that.
