Projects

Selected work

Selected work in AI infrastructure, NLP, and applied research.

I — In production
N° 01

Molcajete

AI-powered transcription and analysis pipeline for political focus group research.

A full audio-to-insight pipeline replacing error-prone transcription and hours of note-taking per project. Speaker diarization, transcription, theme classification, and integrated reporting — all surfaced through a tooling layer that researchers actually use.

1,300+ hours of audio processed
<60 min turnaround per project

N° 02

Adapta

Data preprocessing pipeline and LLM fine-tuning infrastructure for Mexican Spanish political analysis.

Specialized LLMs, produced by a reproducible fine-tuning and evaluation pipeline. Built for empirical comparison of base models and prompts.

40+ evaluation metrics
100+ training runs

N° 03

Nopalero

Automated participant screening system for qualitative recruitment.

Automated intake pipeline that replaces hours of manual data entry per project. Combines OCR, fraud detection, and socioeconomic classification — so analysts focus on the decisions, not the paperwork.

48 validation checks
Zero manual data entry

II — Open source

Python scraper and parser for Brazilian Supreme Court (STF) case data.

Typer-based CLI with three cache-first stages — scrape, download, extract — supporting heavily sharded runs with proxy rotation, feeding a DuckDB warehouse. Multiple OCR backends, including self-hosted Tesseract on fly.io for cheap inference.

R$ 52 per yearly HC sweep
0.93 PDFs/s sustained
0.28 cases/s sustained
4 OCR backends
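
The cache-first stage pattern described for the STF scraper can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names `cache_first` and `shard_of`, the JSON-on-disk cache layout, and the hash-based shard assignment are all hypothetical. Only the three stage names come from the description above.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

# Stage names taken from the project description; everything else is assumed.
STAGES = ("scrape", "download", "extract")


def cache_first(cache_dir: Path, stage: str, key: str,
                compute: Callable[[str], dict]) -> dict:
    """Return the cached result for (stage, key); compute and store it on a miss."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    path = cache_dir / stage / f"{digest}.json"
    if path.exists():
        # Cache hit: skip the expensive network or OCR work entirely.
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute(key)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then rename, so an interrupted run
    # never leaves a half-written cache entry behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result), encoding="utf-8")
    tmp.replace(path)
    return result


def shard_of(key: str, num_shards: int) -> int:
    """Assign a key to a shard deterministically (hypothetical sharding scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

In a sketch like this, each stage wraps its work in `cache_first`, so rerunning a shard after a failure only repeats the items that never reached the cache. A stable hash (rather than Python's built-in `hash`, which is salted per process) keeps shard assignment consistent across workers.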

Have a problem that doesn't fit a template?

Most of the work above started as someone saying exactly that.

Start a conversation