Arnaldo Vitaliano
Career Portfolio · April 2026

Senior Data Engineer — 18+ years across the data lifecycle. Builds the foundations and the agents that sit on top of them.

Six product areas in four years at Meta

  • Consumer-social data
  • Monetization business ops
  • Monetization reliability
  • Accessibility compliance
  • Controls & compliance
  • Abuse / scraping detection

Professional Summary

The shape of the work

A senior data practitioner spanning monetary policy, financial services, and large-scale consumer social sectors. At Meta, has delivered production data products across six distinct product areas in four years — consumer-social, monetization business operations, monetization reliability, accessibility compliance, controls & compliance measurement, and abuse / scraping detection — with a parallel investment in AI-powered analytics tooling. Equally fluent at building the underlying data foundations (pipelines, dimensional models, semantic layers) and the leadership-facing dashboards and AI agents that make those foundations actionable.

Key Achievements

Six things that shipped

No. 01

Built a Compliance Measurement Framework for a Global-Scale Accessibility Program

What
Designed and shipped the dimensional model, daily and unified data pipelines, and central dashboards used by leadership to track regulatory accessibility compliance across the company's product surfaces.
Scale
Coverage spanning a flagship consumer product (billions of users) and dozens of regulated user-facing flows, defects, and remediation milestones.
How
Distributed SQL query engine, data pipeline orchestrator, dimensional modeling (Kimball), an internal BI dashboarding platform, and a dynamic configuration system for milestone-aware reporting.
Result
Single source of truth for accessibility compliance reporting in H2 2025 — adopted by program managers, engineering leadership, and legal / compliance stakeholders. 100+ landed changes covering defects, customer reports, prevalence heat maps, and regulation-milestone modeling.
No. 02

Launched Reliability Insights for a Multi-Billion-Dollar Monetization Platform

What
Built the first reliability and revenue-fluctuation analytics dashboard suite for a monetization business unit, enabling engineering teams and stack owners to identify hotspots and prioritize reliability investments during planning.
Scale
Dashboards used across the entire monetization org; the underlying schema was templated and reused by a peer team on an adjacent monetization surface.
How
Distributed SQL engines, real-time analytics platform, internal BI dashboarding tooling; cube-based pivot models for fast slice-and-dice across team / stack / incident-severity / pillar dimensions.
Result
Established a repeatable playbook for reliability data products across the org and informed planning and roadmap decisions on where to invest in reliability work.
No. 03

Designed and Bootstrapped a Cross-Domain Semantic-Model Platform

What
Co-led adoption of an internal semantic-model platform that powers natural-language analytics agents across multiple risk and compliance domains. Bootstrapped domain-specific semantic models, authored an end-to-end scaffolding skill, and shipped the customer-facing semantic-model registry dashboard with grades, freshness indicators, and privacy guardrails.
Scale
Semantic models covering multiple top-level risk domains for an org responsible for billions of user interactions; the production-driven evaluation pattern was adopted as the team standard.
How
Semantic-model registry, LLM evaluation framework (production-driven evals, fuzzy-match graders), LLM-driven analytics agents, dimensional modeling, server-side dashboard framework.
Result
Registry dashboard launched as the canonical landing surface for semantic models in the org; production-driven eval pattern reduced regression risk and gave model owners actionable per-grade visibility.
No. 04

Modernized an Abuse / Scraping Detection Pipeline

What
Owned the modernization of the inference pipeline behind a large-scale abuse-detection (scraping) system: schema-evolved the unified session funnel, reduced compute footprint, and built the operational monitoring views that the on-call team uses day-to-day.
Scale
Hourly scoring workflows over high-volume session data covering the full session funnel of a flagship consumer social platform.
How
Distributed SQL engines, data pipeline orchestrator, container orchestration entitlements, and multi-page operational dashboards with story-driven layout.
Result
Cut hourly scoring memory footprint by ~70% (200 GB → 64 GB per task), isolated the workflow with a dedicated entitlement, and shipped a 6-page ML-pipeline monitor giving on-call end-to-end visibility in minutes.
No. 05

Built AI-Powered Analytics Agents for Risk & Compliance

What
Co-developed multiple LLM-powered agents (a general-purpose analytics agent, a controls-maturity agent that turns complex risk data into actionable briefings for senior stakeholders, and a SQL-documentation agent) and presented them at internal AI learning events.
Scale
Used by analysts and engineers across a large risk-and-compliance org; demonstrated end-to-end pipelines from data discovery to knowledge scaling.
How
Retrieval-augmented generation, semantic models as a grounding layer, prompt engineering with reusable "recipes" / "cookbooks", and custom skills and slash commands integrated into developer workflows.
Result
Took agents from prototype to demo-ready; presented an EMEA deep-dive at an org-wide AI learning event. Pioneered a hackathon approach using LLMs to map critical user flows to engagement-weighted risk profiles, finding that ~70% of regulated flows mapped to low-risk surfaces, which informed the following half's planning.
No. 06

Operated Across Six Product Areas in Four Years

What
Took on data-engineering ownership in six distinct product areas during the Meta tenure — consumer-social (creator and relevance data), monetization business operations, monetization reliability, accessibility compliance, controls & compliance measurement, and abuse / scraping detection — repeatedly ramping into new domains, building the data foundation, and handing off cleanly.
Scale
Each area required understanding a different stakeholder set (creators, ad sales, compliance officers, integrity / abuse PMs, risk leadership) and a different data domain (engagement events, revenue signals, accessibility defects, control attestations, session funnels).
How
A consistent playbook: design-doc-first ramping, dimensional model definition, pipeline ownership, dashboard delivery, then on-call ownership of the resulting surfaces. Disciplined cross-functional partnership with product, engineering, legal / compliance, and program management.
Result
Trusted as a fast-ramping cross-domain DE; consistently positioned as the data partner brought in when a new measurement problem needs a foundation. On top of area rotation, took rotating on-call ownership for a privacy-and-risk data team (multi-week shifts, 25–45 tasks each, 0 incident escalations).
Ways of Working with AI

Operating principles

A deliberate investment in working with LLMs, not just on them — building the personal infrastructure, the team patterns, and the evaluation discipline that turn agentic tools into reliable colleagues.

01

Second-brain-first

Persistent, curated knowledge is the substrate for every useful agent. PARA-style personal workspaces, durable team brains, end-of-day digests. Agents grounded in well-tended memory feel like collaborators who remember the project.

02

Semantic models as the grounding layer

An LLM is only as good as the metadata it sits on. Domain definitions, dimension dictionaries, grade and freshness signals. The semantic model is the contract between humans and machines.
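
One way to picture that contract is a minimal Python sketch. The class, field names, and sample values below are hypothetical illustrations, not the internal platform's schema: an agent checks freshness and the dimension dictionary before answering a question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SemanticModel:
    """Hypothetical semantic-model record: definitions plus grade and freshness."""
    domain: str
    dimensions: dict[str, str]          # dimension name -> human-readable definition
    grade: str                          # quality grade assigned by the model owner
    last_refreshed: datetime
    max_age: timedelta = timedelta(days=1)

    def is_fresh(self, now: datetime) -> bool:
        # Fresh when the last refresh is no older than max_age.
        return now - self.last_refreshed <= self.max_age

    def unknown_dimensions(self, requested: list[str]) -> list[str]:
        # Dimensions the agent asked for that the model does not define.
        return [d for d in requested if d not in self.dimensions]


now = datetime(2026, 4, 1, 12, tzinfo=timezone.utc)
model = SemanticModel(
    domain="accessibility",
    dimensions={"surface": "Product surface under test",
                "severity": "Defect severity level"},
    grade="A",
    last_refreshed=now - timedelta(hours=6),
)
# An agent would refuse or ask for clarification when either check fails.
can_answer = model.is_fresh(now) and not model.unknown_dimensions(["surface", "severity"])
```

Keeping both checks on the model itself means every agent that reads it applies the same guardrails.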

03

Production-driven evaluation

Evals against real production prompts and outputs, not synthetic toys. Fuzzy-match graders, golden-set regression checks, per-grade visibility. Ship the eval framework before the agent.
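
A fuzzy-match grader with a golden-set regression check can be sketched in a few lines of Python. This is an illustration under assumptions, not the team's framework: the function names, the 0.8 threshold, and the use of the standard library's `difflib` for string similarity are all hypothetical choices.

```python
from difflib import SequenceMatcher


def fuzzy_score(expected: str, actual: str) -> float:
    """Similarity in [0, 1] after normalising case and whitespace."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(expected), norm(actual)).ratio()


def regression_check(golden_set, answer_fn, threshold=0.8):
    """Return (prompt, score) for every golden case scoring below threshold.

    golden_set: list of {"prompt": ..., "expected": ...} taken from production.
    answer_fn:  callable mapping a prompt to the agent's answer (assumed).
    """
    failures = []
    for case in golden_set:
        score = fuzzy_score(case["expected"], answer_fn(case["prompt"]))
        if score < threshold:
            failures.append((case["prompt"], round(score, 2)))
    return failures


# Stub agent standing in for a real model call.
golden = [{"prompt": "p1", "expected": "Weekly defects fell by 12 percent"}]
failures = regression_check(golden, lambda prompt: "weekly  defects fell by 12 percent")
```

An empty `failures` list means no golden case regressed. A production grader would likely use a stronger similarity measure than character matching.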

04

Agentic developer workflow

Operate primarily inside agentic IDEs. Author custom skills and slash commands for recurring tasks. Compose specialized agents — a researcher, a writer, a reviewer — and orchestrate them.

05

Recipes and cookbooks, not bare prompts

Capture working prompt patterns as reusable, parameterized recipes — versioned, named, shareable. A library of well-tested recipes beats one-off prompt engineering, especially for cross-functional partners.
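
A recipe in this sense can be as small as a named, versioned template with explicit parameters. The Python sketch below is illustrative only; `Recipe`, `register`, and the sample template are hypothetical names, not an existing library.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Recipe:
    """A named, versioned, parameterized prompt pattern."""
    name: str
    version: str
    template: str

    def render(self, **params: str) -> str:
        # Template.substitute raises KeyError when a parameter is missing,
        # so an incomplete recipe call fails loudly instead of silently.
        return Template(self.template).substitute(**params)


RECIPES: dict[tuple[str, str], Recipe] = {}


def register(recipe: Recipe) -> Recipe:
    # Key by (name, version) so older versions stay addressable.
    RECIPES[(recipe.name, recipe.version)] = recipe
    return recipe


summarise = register(Recipe(
    name="metric_summary",
    version="1.0",
    template="Summarise $metric for $audience in at most $max_sentences sentences.",
))
prompt = summarise.render(
    metric="weekly defects", audience="program managers", max_sentences="3"
)
```

Cross-functional partners then pick a recipe by name and fill in parameters rather than writing a prompt from scratch.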

06

MCP as the integration spine

Model Context Protocol servers as the durable interface between agents and the tool surface (warehouses, ticketing, docs, chat). Stable contracts at the boundary; agents stay portable across model changes.

07

Teaching is the multiplier

Internal AI learning days, deep dives, hackathons, live demos. Default to teaching, not hoarding — this is how one contributor's AI productivity gains become an organisation's.

Technical Skills

Tools of the trade

Languages & Frameworks

  • SQL — advanced
  • Python — pipelines, ML/LLM tooling
  • Typed server-side languages
  • Notebook environments

Infrastructure & Systems

  • Distributed SQL query engines
  • Large-scale columnar warehouses
  • Pipeline orchestrators (Airflow-class)
  • Real-time analytics platforms
  • Time-series monitoring systems
  • BI dashboarding (cube-based)
  • Dynamic config management
  • Monorepo VCS workflows
  • Hermetic build systems
  • Container orchestration

AI / LLM Tooling

  • Agentic IDEs (Claude Code-class)
  • Custom skills + slash commands
  • RAG pipelines
  • Semantic-model grounding
  • Production-driven LLM evals
  • MCP server design + integration
  • Multi-agent orchestration
  • PARA / second-brain workflows

Domains

  • Consumer-internet-scale data
  • Monetization reliability analytics
  • Accessibility compliance measurement
  • Abuse / fraud / scraping detection
  • Risk & controls measurement
  • AI-augmented analytics
  • Cross-functional partnership
Career Trajectory

Eighteen years, traced

'08–'19

Banco Central do Brasil — Data Engineer

Eleven years at Brazil's central bank. Delivered the Credit Risk Datamart ETL (Informatica, DB2, Control-M) processing 750M credit operations per month — in continuous production since 2012. Built a Master Data Warehouse integrating 220M citizen records and 1M geographic entities, real-time ETL flows that cut mainframe costs by 20%, a Python address-standardization algorithm across the 220M-record SSN base, and an ML default-probability model covering 85M borrowers. Authored datRprofile, an R package for automated data profiling.

'19–'21

Banco Central do Brasil — Analytics & Research Manager

Led the data work for Brazil's Financial Citizenship Report 2021, surfacing systemic gaps in financial inclusion and education. Coordinated a nationwide study on risky indebtedness that revealed 5M borrowers exceeding their repayment capacity. Automated the Financial Citizenship Index pipeline, computing results for all 27 Brazilian states from 85M borrowers, 40M low-income individuals, and 23M social-security contributors.

'21–'22

Banco Central do Brasil — Business Intelligence Manager

Owned the BI surface for Pix, Brazil's instant-payment scheme — 100M+ users, 1B+ transactions, BRL 500B/month in flows. Built dashboards, graphs, and reports used by the central bank's payments team and published Statistics of Retail Payments and the Card Market in Brazil. Relocated from Brazil to the UK in mid-2022 to join Meta.

'22–'23

Meta — Onboarding / Consumer Social Data (London)

Joined as a Data Engineer in the consumer social product org. Ramped on internal data infrastructure; contributed to creator-facing data products and analytics for the relevance and ranking surfaces.

'23

Monetization Business Operations — DE

Moved into the monetization business unit's data engineering team. Launched the org's first reliability and revenue-fluctuation insights dashboard suite, establishing a pattern that adjacent monetization teams adopted.

'24

Monetization Reliability Scale-up

Scaled the reliability and incident data products: hardened pipelines, built reliability cubes for engineering and management views, and ran a large-scale documentation and codemod campaign across the data assets.

'25

Privacy & Risk / Accessibility — Senior IC

Moved to the Privacy & Risk org as the data-engineering owner for accessibility compliance measurement. Designed the dimensional model, pipelines, and central dashboards adopted as the source of truth for accessibility compliance reporting in H2 2025. Took on rotating on-call ownership.

'25–'26

Controls Maturity & AI Agents

Co-authored the v1 metrics for an internal controls-maturity measurement framework. Built leadership dashboards and an LLM agent that turn complex risk data into briefings for senior stakeholders. Presented an analytics-agent deep dive at an org-wide AI learning event (Mar 2026).

'26 →

Abuse Detection + Semantic Models (current)

Leads data engineering for an abuse-detection (scraping) modernization initiative and drives semantic-model adoption across the risk org. Built the V2 unified session funnel schema, modernized hourly scoring (~70% memory reduction), shipped the semantic-model registry dashboard, and authored a reusable bootstrap skill.

Selected Talks & Internal Publications

On stage and on paper

"Analytics Agent Deep Dive — From Data Discovery to Knowledge Scaling"
Internal AI Learning Day · EMEA · March 2026
"Going Beyond Simple Prompts: Recipes, Semantic Models, and Cookbooks"
Internal AI Knowledge Sharing series · March 2026
"Controls-Maturity Agent: Turning Complex Risk Data into Actionable Intelligence"
Internal Risk Engineering Intelligence series · 2026
"Unified Session Funnel — Data Foundation Design"
Internal design doc · 2026
AI Hackathon co-presenter
December 2025
Working Style

How the work happens

Operates equally as a hands-on engineer and as a connector across product, engineering, legal/compliance, and program management.

Strong written communicator. Relies on design docs and one-pagers to align cross-functional stakeholders before building.

Comfortable ramping into new domains quickly: six product areas in four years.

Bias toward shipping in stacks of small, reviewable changes.

Advocate for AI-augmented developer workflows; early adopter of LLM-driven analytics tooling.