I am a Senior Data Engineer and Researcher at GenomaSUS (Ministério da Saúde), working on genomics and precision medicine data infrastructure. I am a PhD Candidate at PPGCI/UNESP researching the integration of Large Language Models (LLMs), Knowledge Graphs, and Agentic AI architectures for semantic retrieval of biomedical scientific literature, with a focus on multi-agent retrieval systems, bilingual controlled vocabularies (MeSH/DeCS), RAG pipelines, and multi-dimensional evaluation frameworks applied to biomedicine. I am also a Professor at UNIMAR, teaching Applied Machine Learning, Deep Learning, and AI Algorithms. I hold an MSc in Information Science (UNESP) and an MBA in Data Science and Analytics (USP).
My work sits at the intersection of data engineering, NLP, and biomedical informatics, from production-grade genomic data pipelines and cloud infrastructure to agentic LLM workflows, RAG systems, and ML models applied to clinical and genomic data.
- biophenotype-rag — RAG application for question answering over biological and genomic phenotype data, combining information retrieval with LLM-based response generation.
- neuroPredict-precision-medicine-system — Precision medicine system integrating clinical, genomic, neuroimaging, and literature data via Knowledge Graphs, Deep Learning, and LLMs to predict treatment response in refractory epilepsy.
- diabetes-clinical-etl-pipeline — Public health data engineering pipeline for Brazilian SUS Diabetes Mellitus data (ICD-10 E10–E14): collection, standardization, validation, integration, and visualization.
- healthAPI-quality-assurance-framework — QA framework for health APIs with Grafana, SonarQube, and automated test pipelines.
- cardiovascular-ai-detection-project — Supervised ML pipeline for cardiovascular risk prediction using structured clinical data, feature engineering, model interpretability, and an interactive application.
- ml-pathogenic-genomic-variants — ML-based classification of genomic variants following the ACMG/AMP guidelines (American College of Medical Genetics and Genomics / Association for Molecular Pathology).
- precisionPsychiatry-precision-psychiatry-clinical-decision-support — Prototype AI/ML system for clinical decision support in psychiatry, integrating genetic, neurobiological, and psychosocial data.
- rna_seq_dac_project — Differential gene expression analysis of RNA-Seq data from patients with Coronary Artery Disease (CAD) using R.
- analise-de-fenotipos-com-R — Phenotypic data analysis in health using R (Shiny, Plotly, Random Forest) to identify patterns in physical characteristics and clinical biomarkers.
- genotrack-genomic-phenotypic-data-validation-system — System for validation and visualization of genetic and phenotypic patient data for genomics studies.
- cliniccare-medical-clinic-management-system — Clinic management system (scheduling, electronic health records, financial control, and interactive reports) built with Dash and Plotly.
- health-tracker — Application for monitoring health indicators, BMI calculation, and visualization of metric history.
- world-marathon-run-majors-analytics-challenge — Complete Data Engineering, Analytics, ML, and Streamlit dashboard pipeline on the Abbott World Marathon Majors (Tokyo, Boston, London, Berlin, Chicago, New York) covering 628,000+ runner records across the 2018–2025 seasons.
- counter-strike-call-of-duty-analytics-challenge-kaggle — Data Engineering, Analytics, and ML pipeline analyzing competitive performance patterns in CS:GO and Call of Duty (Kaggle competition).
- star-wars-data-visualization — Data visualization project exploring characters, species, and relationships in the Star Wars universe using various visualization techniques.
- credit-decision-LLM-RAG-platform — Enterprise platform for automated credit decision-making using LLMs and RAG, with risk assessment, decision trails, and audit support for financial institutions.
- hybrid-llm-text-detection — Hybrid ML/NLP approach for detecting LLM-generated text, inspired by the Kaggle LLM Detect AI Generated Text competition.
- acmr-rag-rename-mbausp — MBA thesis (USP/ESALQ): information retrieval system based on LLMs and RAG applied to the Brazilian RENAME essential medicines list, using embeddings, vector databases, LangChain, and RAGAS evaluation.
- llm-zoomcamp — LLM Zoomcamp coursework: OpenAI API, HuggingFace, Elasticsearch, vector search, embeddings, data ingestion with Mage, and monitoring with Grafana.
- med-neo4j-graphq — Experimentation with Neo4j and GraphQL for medical knowledge graph applications.
- ifood-data-governance-pipeline — Complete Data Governance solution with LGPD compliance, data quality, traceability, and an interactive Streamlit dashboard (Airflow, dbt, Pydantic, Redis).
- youtube-2025-data-pipeline — End-to-end pipeline for YouTube 2025 performance metrics (AWS S3, PostgreSQL, Airflow, dbt, Metabase).
- spotify-data-pipeline — Full-stack data engineering solution connecting to the Spotify Web API, storing data in PostgreSQL, transforming with dbt, and delivering insights via Metabase.
- cnpj-data-pipeline — ETL pipeline for Brazilian Federal Revenue public CNPJ data (Mage.ai, AWS S3).
- dock-financial-data-pipelines — Automated pipeline for Dock financial balance reports using Airflow, SFTP, AWS S3, and Lambda.
- redshift-to-s3-unload-dag — Airflow DAG for daily automated data export from Amazon Redshift to S3 in Parquet format.
- datamart-tables-data-type-validation — Data Engineering solution to validate column data types in PostgreSQL DataMart tables (Mage.ai).
- airflow-kpi-insertion-pipeline — Automated collection and insertion of Data Warehouse KPIs (transaction time and storage usage) using Apache Airflow.
- global-emissions-pipeline — Modular ETL framework for greenhouse gas emissions data from ClimateTrace and World Bank APIs.
- transaction_fraud_prevention_pipeline — Fraud detection and prevention system combining ML, business rules, and statistical analysis with a real-time monitoring dashboard (TensorFlow).
- machine-learning-zoomcamp — ML Zoomcamp coursework covering regression, classification, neural networks, and deployment end-to-end (scikit-learn, TensorFlow, XGBoost, Docker, AWS, Kubernetes).
- machine-learning-2025 — ML Zoomcamp 2025: regression, classification, feature engineering, ensembles, deep learning, and deployment with FastAPI, Docker, AWS, and Kubernetes.
- mlops-zoomcamp — MLOps Zoomcamp coursework: MLflow, Mage, Flask, Prometheus, Evidently, Grafana, Prefect, Terraform, GitHub Actions — experiment tracking, deployment, monitoring, and CI/CD.
- mlops-zoomcamp-project-paris-price-house — End-to-end MLOps project for Paris housing price prediction.
- vercel-app-mlops-zoomcamp-project-paris-price-house — Vercel-deployed web application for the Paris housing price prediction MLOps project.
- ai-dev-platform — Full-stack platform for tracking, managing, and demonstrating AI tools and code agents in software development workflows.
- ai-dev-tools-zoomcamp-2025 — AI Dev Tools Zoomcamp 2025: using AI tools to write better and faster code.
- stocks-analytics-2025 — Complete pipeline for equity analysis: financial data ingestion, feature engineering, time series modeling, and automated trading strategy simulation.
- stock-markets-analytics-zoomcamp — Stock Markets Analytics Zoomcamp: financial data extraction, analysis with pandas and TA-Lib, time series modeling, backtesting, and pipelines with Python, SQLite, and Airflow.
- nathfinance-sistema-controle-financeiro — Personal finance control system built with Dash/Plotly for interactive visualization of expenses, goals, and financial planning.
- cyber-risk-test — Cyber-Risk Sentinel RiskLab: study and experimentation in defensive cybersecurity, covering risk analysis, simulated vulnerabilities, security events, defensive controls, and mitigation plans.
- AWS-DMS-task-restart-and-status-checker — Python script for restarting and monitoring AWS DMS tasks using Boto3 and Mage.ai.
- DMS-CDC-task-status-validator — Automated monitoring and integrity validation of AWS DMS CDC tasks with SNS notifications.
- DMS-missing-or-duplicate-data-validation-script — Python script for detecting missing or duplicate data in AWS DMS replication tasks.
- S3-folder-cleanup — Automation for S3 bucket folder cleanup using Boto3 and Mage.ai, with logging and monitoring.
- airflow-tableau-ec2-maintenance — Airflow DAG for automated weekly maintenance of a Tableau server on EC2, with disk cleanup and SNS notifications.
- datamart-tables-data-type-validation — Data type validation for DataMart tables in PostgreSQL (Mage.ai).
- data-engineering-zoomcamp — Data Engineering Zoomcamp coursework: Docker, Terraform, BigQuery, dbt, Spark, Kafka, Kestra, Postgres, and Metabase.
- llm-zoomcamp — LLM Zoomcamp: RAG pipelines, vector search, HuggingFace, OpenAI API, Elasticsearch, and monitoring with Grafana.
- mlops-zoomcamp — MLOps Zoomcamp: experiment tracking, deployment, monitoring, and CI/CD with MLflow, Prefect, Grafana, and GitHub Actions.
- machine-learning-zoomcamp — Machine Learning Zoomcamp: end-to-end ML projects from regression to deployment.
- machine-learning-2025 — ML Zoomcamp 2025 edition with updated curriculum and FastAPI deployment.
- stock-markets-analytics-zoomcamp — Stock Markets Analytics Zoomcamp: financial analysis, time series, and trading automation.
- ai-dev-tools-zoomcamp-2025 — AI Dev Tools Zoomcamp 2025: AI-assisted software development workflows.



