Data & AI Architect
Experienced architect specializing in agentic AI systems and large-scale data platforms. Deep focus on multi-agent architectures, real-time data pipelines, and cloud-native solutions, combining data engineering foundations with cutting-edge AI to design systems that operate autonomously at scale.
Creating intelligent agents for data processing and analysis that combine traditional data engineering with autonomous decision-making capabilities.
Designing autonomous systems for data pipeline management that can adapt and optimize themselves based on changing requirements.
Integrating LLMs with traditional data engineering patterns to create next-generation data processing systems.
Developing custom data agents based on ADK samples, combining my data expertise with agentic AI capabilities to create intelligent data processing workflows.
As a thought leader in data analytics and AI, I actively contribute to the developer community through speaking engagements and knowledge sharing.
Demonstrated practical applications of Generative AI in BigQuery, showcasing how to leverage Colab Data Science Agents and BigFrames for advanced data analytics workflows. Explored the integration of AI-powered tools with BigQuery to enable data scientists and analysts to build intelligent data processing pipelines with natural language interfaces and automated insights generation.
Presented insights on Google Cloud's latest data analytics innovations from Next '25, focusing on AI integration with BigQuery and the crucial role of metadata in enabling AI agents. Covered specialized AI agents for various user roles, AI-assisted notebooks, and the BigQuery AI Query Engine's capabilities with both structured and unstructured data.
Explored practical techniques for performing real-time inference on streaming data using large language models (LLMs) and SQL. Demonstrated seamless integration of LLMs into existing application workflows, enabling real-time insights, predictions, and classifications directly within familiar SQL environments.
An AI-powered data analysis system combining BigQuery with the Google Agent Development Kit (ADK). Features multi-agent orchestration with specialized sub-agents for data retrieval, data science workflows, and BQML operations. Includes RAG corpus integration for BQML documentation and Model Context Protocol (MCP) support.
A comprehensive tutorial for deploying MCP (Model Context Protocol) servers to Google Cloud Run, featuring a zoo animal database with interactive tools. Demonstrates modern AI integration patterns with cloud-native deployment.
Production-ready MDM solution with 5-strategy AI matching for batch processing and 4-strategy real-time matching for streaming. Features vector embeddings with Gemini, fuzzy matching, business rules, and AI natural language reasoning. Unified batch and streaming architecture with BigQuery and Spanner.
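The idea of combining several matching strategies into one score can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual code: the weights, the field names (`name`, `email`, `embedding`), and the email rule are hypothetical, the embeddings are assumed to be precomputed vectors, and only three of the strategies are shown.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical weights for illustration; a real system would tune these.
WEIGHTS = {"fuzzy": 0.4, "vector": 0.4, "rules": 0.2}


def fuzzy_score(a: str, b: str) -> float:
    """String similarity in [0, 1] on normalized names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def cosine_similarity(u: list, v: list) -> float:
    """Cosine similarity of two precomputed embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rule_score(a: dict, b: dict) -> float:
    """Hypothetical business rule: an identical non-empty email is a strong signal."""
    email_a = a.get("email", "").lower()
    email_b = b.get("email", "").lower()
    return 1.0 if email_a and email_a == email_b else 0.0


def match_score(a: dict, b: dict) -> float:
    """Weighted combination of the individual strategy scores."""
    scores = {
        "fuzzy": fuzzy_score(a["name"], b["name"]),
        "vector": cosine_similarity(a["embedding"], b["embedding"]),
        "rules": rule_score(a, b),
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

A pair of records would then be treated as a match when `match_score` exceeds a chosen threshold; that threshold is another tuning decision this sketch leaves open.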
Comprehensive BigQuery Data Clean Room implementation with Analytics Hub integration. Demonstrates privacy-preserving analytics, BQML collaborative ML, and secure data sharing patterns with automated setup scripts for both DCR and DCX deployments.
Production-ready BigQuery tools and demos covering advanced analytics patterns. Includes FinOps cost optimization, geospatial routing, Places Insights competitive analysis, RLS/CLS security with Dataform, Firebase Analytics integration, Streaming CDC pipelines, and dbt migration workflows.
Curated collection of AI agent configurations, coding standards, and workspace architecture guides for multi-model agentic workflows. Includes OpenClaw workspace architecture guides for Anthropic and Gemini, Google-style coding standards for AI-generated code, BigQuery data science agent prompt libraries, and opencode configuration scripts.
Comprehensive solution for Spark integration with BigLake Metastore and Apache Iceberg, supporting both Dataproc and Docker-based deployments. Demonstrates hybrid cloud computing patterns for modern data lakes.
Enhanced fork of Google Cloud Platform's utility for identifying and rewriting common anti-patterns in BigQuery SQL. Added query grouping functionality and clustering optimization patterns for improved performance analysis.
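To make the two ideas concrete, here is a deliberately naive sketch of anti-pattern detection and query grouping. It is not the utility's implementation: `SELECT *` is chosen only as an illustrative anti-pattern, the regex-based check is an assumption that a real tool would replace with a SQL parser, and grouping by a literal-stripped query shape is one plausible reading of "query grouping".

```python
import re
from collections import defaultdict

# Naive check: flags "SELECT *" but not "SELECT COUNT(*)".
# It would also flag "SELECT * EXCEPT (...)", which a parser-based tool could treat differently.
SELECT_STAR = re.compile(r"\bselect\s+\*", re.IGNORECASE)


def has_select_star(sql: str) -> bool:
    """Return True if the query selects all columns."""
    return bool(SELECT_STAR.search(sql))


def normalize(sql: str) -> str:
    """Reduce a query to its shape by replacing literals and collapsing whitespace."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)    # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def group_queries(queries: list) -> dict:
    """Group queries that differ only in literals, case, or whitespace."""
    groups = defaultdict(list)
    for query in queries:
        groups[normalize(query)].append(query)
    return dict(groups)
```

Grouping first means each distinct query shape only needs to be analyzed once, however many times it appears in a job history.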
Integration of Google Sheets as a data source for PySpark on Dataproc Serverless. Includes Airflow demo for scheduling notebook execution with three deployment options: PythonVirtualenvOperator, Vertex AI Custom Training, and Dataproc Serverless.
Comprehensive Dataflow examples for streaming Kafka data to BigQuery. Features multi-branch processing, Beam SQL aggregations, multi-stream joins, and both custom Java pipelines and Flex Templates for different deployment scenarios.
Demonstration of Apache Beam with standard BigQueryIO and Managed I/O for BigQuery operations. Showcases 8 pipeline patterns including BigQuery Iceberg and BigLake Iceberg table operations with automatic schema handling.
Complete real-time data pipeline solution from Pub/Sub to BigQuery using Cloud Run Functions. Includes data generation, streaming processing, and automated table management.
Python streaming pipeline from Pub/Sub to BigQuery using BigQuery Storage Write API. Features micro-batching, Pub/Sub metadata capture, and partitioned tables with DirectRunner and DataflowRunner V2 support.
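Micro-batching and metadata capture can be illustrated without any cloud dependencies. The sketch below is a simplified assumption of how such a step might look, not the pipeline's code: the function names and the `message_id` / `publish_time` column names are hypothetical, and the actual write to BigQuery is left out.

```python
from typing import Iterable, Iterator, List


def with_metadata(payload: dict, message_id: str, publish_time: str) -> dict:
    """Attach Pub/Sub message metadata to a row (column names are hypothetical)."""
    return {**payload, "message_id": message_id, "publish_time": publish_time}


def micro_batches(rows: Iterable[dict], max_rows: int) -> Iterator[List[dict]]:
    """Yield lists of at most max_rows rows, flushing any remainder at the end."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded batch would then be sent as a single append request, trading a little latency for fewer, larger writes. A production version would likely also flush on a time limit so that slow streams do not hold rows indefinitely.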
Test infrastructure for diagnosing the Dataflow/BigQuery "Noisy Neighbor" throughput degradation pattern. Six rounds of testing across Pub/Sub and Kafka sources (Python + Java SDKs): 2.2 billion rows, 2.4 TB, 901k rows/sec peak, zero errors. Confirmed linear scaling and identified a shared Kafka consumer group as the root cause of production degradation. Exceeded the BigQuery Storage Write API regional quota and sustained that throughput.
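One way a shared consumer group can hurt a pipeline: within a group, Kafka assigns each partition to exactly one consumer, so two pipelines using the same group ID split the partitions between them instead of each reading the full topic. The sketch below models that with a simplified round-robin assignment. It is an illustration only; real Kafka assignors differ in detail, the pipeline names are hypothetical, and the exact mechanism behind the production degradation is not described here.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Simplified round-robin split of partitions across consumers in one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(6))

# Dedicated group: the single pipeline reads every partition.
dedicated = assign_partitions(partitions, ["pipeline-a"])

# Shared group: a second pipeline joins and each now sees only half the topic.
shared = assign_partitions(partitions, ["pipeline-a", "pipeline-b"])
```

Giving each pipeline its own consumer group ID avoids the split, since partition ownership is tracked per group.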
Automated one-command installation script for a complete development environment with NVM, Node.js, and Google's Gemini CLI. Streamlines developer onboarding for AI-powered workflows.
Agentic vision tool built as an OpenClaw skill, leveraging Gemini's native code execution sandbox for spatial grounding, visual math, and UI auditing tasks. Demonstrates OpenClaw skill architecture for vision-based agentic workflows.