Data & Analytics Engineer

Lei Zhao

Data and Analytics Engineer specializing in ELT pipeline design, dimensional data modeling, and modern data stack tooling. Built GA4 and AppsFlyer data infrastructure for products with 10M+ users and $5M+ monthly revenue. M.S. in Analytics, Northeastern University. Trilingual in English, Chinese, and Japanese (JLPT N1).

dbt · BigQuery · Airflow
Gaming Data
LLM / AI Apps
EN / ZH / JP N1
dbt · BigQuery · Airflow · Snowflake · Spark · Python · SQL · Java · GA4 · AppsFlyer · Looker
10M+ Users in Pipeline
$5M+ Monthly Revenue Tracked
6 Live Apps Shipped
3 Languages (EN/ZH/JP)
About Me

Building reliable data infrastructure

I'm a Data and Analytics Engineer with deep expertise in ELT pipeline design, dimensional data modeling, and modern data stack tooling (dbt, BigQuery, Airflow). My background spans the full data lifecycle, from event instrumentation and pipeline ingestion to transformation, modeling, and self-serve dashboards.

At Newga Network, I built and maintained GA4 and AppsFlyer data infrastructure supporting products with 10M+ users and $5M+ monthly revenue. I designed star schema dimensional models for player behavior, UA performance, and monetization, and established A/B testing frameworks that improved D7 retention by 15%.

I am currently completing my M.S. in Analytics at Northeastern University while running an independent game studio, where I own the entire data stack from instrumentation to mart-layer modeling. I am open to Data Engineer and Analytics Engineer roles across the US and Japan.

dbt · BigQuery · Airflow · Snowflake · Spark · Kafka · AWS · GCP · SQL · Python · Java · Looker · Tableau · GA4 · AppsFlyer

Pipeline Engineering

End-to-end ELT pipelines with dbt, Airflow, and BigQuery: incremental models, snapshots, data quality tests, and full lineage.

Data Modeling

Star schema dimensional modeling for player behavior, UA, and monetization. Deep expertise in gaming data architecture.

AI / LLM Integration

Built LLM-powered applications using LangChain, GPT-4o, FAISS, and Streamlit. Daily user of AI tools (Claude, Cursor).

Global Markets

Trilingual in English, Chinese, and Japanese (JLPT N1). Experience across US, China, and Japan markets.

Portfolio

Data Engineering & Apps

End-to-end data pipelines, analytics infrastructure, and live products I've built and shipped.

Data Engineering Projects

Gaming Analytics Pipeline: dbt + BigQuery + Airflow

End-to-end ELT pipeline built on real GA4 and AppsFlyer data from live mobile game titles. Covers the full modern data stack from raw event ingestion to business-ready marts.

Architecture: GA4 & AppsFlyer → BigQuery (raw) → dbt staging → dbt intermediate → dbt mart → Looker Studio dashboards

Features: Incremental models · Snapshots (SCD Type 2) · Custom generic tests · dbt docs & lineage · Airflow DAG orchestration · Data quality monitoring
dbt Cloud · BigQuery · Airflow · GA4 · AppsFlyer · Python · SQL · Looker Studio
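
To illustrate the SCD Type 2 snapshot feature listed above, here is a minimal sketch in plain Python rather than dbt. It is not the project's code: the `PlayerRow` record, the `country` attribute, and the `apply_snapshot` helper are hypothetical names chosen for the example. The idea is that a changed attribute closes the current row and opens a new one, so history is preserved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PlayerRow:
    """One version of a player record in a hypothetical SCD Type 2 table."""
    player_id: str
    country: str                      # the tracked attribute (assumed for illustration)
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_snapshot(history: list[PlayerRow], player_id: str,
                   country: str, as_of: date) -> list[PlayerRow]:
    """Close the open version if the attribute changed, then append a new version."""
    current = next(
        (r for r in history if r.player_id == player_id and r.valid_to is None),
        None,
    )
    if current is not None and current.country == country:
        return history                # nothing changed, keep the open row
    if current is not None:
        current.valid_to = as_of      # close the superseded version
    history.append(PlayerRow(player_id, country, valid_from=as_of))
    return history


history: list[PlayerRow] = []
apply_snapshot(history, "p1", "US", date(2024, 1, 1))
apply_snapshot(history, "p1", "US", date(2024, 1, 2))  # no change, no new row
apply_snapshot(history, "p1", "JP", date(2024, 2, 1))  # change, new version
for row in history:
    print(row)
```

In dbt itself this behavior is configured declaratively through snapshots; the sketch only shows the row-versioning logic that such a snapshot maintains.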
Live

AI E-Commerce Chatbot: LangChain + GPT-4o + FAISS

LLM-powered conversational chatbot for e-commerce product discovery. Built on a RAG architecture, using a FAISS vector store for semantic search over product catalog data.

Architecture: Product catalog → FAISS vector index → LangChain RAG → GPT-4o → Streamlit UI

Features: Semantic product search · Conversational memory · Streamlit interface · Deployed end-to-end
LangChain · GPT-4o · FAISS · Streamlit · Python · RAG
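
The retrieval step of a RAG flow like the one above can be sketched without any external libraries. This is an illustration under stated assumptions, not the chatbot's implementation: a toy bag-of-words vector and brute-force cosine similarity stand in for a learned embedding model and the FAISS index, and the catalog entries, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, catalog: list[str], k: int = 2) -> list[str]:
    """Return the k catalog entries most similar to the query (brute force, no index)."""
    q = embed(query)
    return sorted(catalog, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble retrieved entries and the user question into a single LLM prompt."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these products:\n{lines}\n\nQuestion: {query}"


catalog = [
    "red running shoes for trail",
    "wireless noise cancelling headphones",
    "waterproof trail running jacket",
]
query = "trail running shoes"
prompt = build_prompt(query, retrieve(query, catalog))
print(prompt)
```

The prompt would then be sent to the language model; a vector index such as FAISS replaces the brute-force sort once the catalog is too large to scan on every query.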
Live

Game Analytics Infrastructure: Newga Network

Designed and maintained the full analytics data infrastructure for mobile games serving 10M+ global users. Owned everything from event instrumentation to executive dashboards.

Architecture: GA4 & AppsFlyer events → BigQuery → dimensional models → Tableau / Looker dashboards

Impact: D7 retention +15% · ARPU +10-20% · Analytics turnaround -30% · Supported $15M+ acquisition
BigQuery · SQL · Python · Tableau · Looker · GA4 · AppsFlyer
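
For readers unfamiliar with the metric, D7 retention can be sketched as the share of an install cohort that is active again exactly seven days after install. This is one common definition, assumed here for illustration; the page does not say which variant was used, and the function name and sample data below are hypothetical.

```python
from datetime import date, timedelta


def d7_retention(installs: dict[str, date],
                 activity: set[tuple[str, date]]) -> float:
    """Fraction of installed users with an activity record on install date + 7 days."""
    if not installs:
        return 0.0
    retained = sum(
        1 for user_id, installed_on in installs.items()
        if (user_id, installed_on + timedelta(days=7)) in activity
    )
    return retained / len(installs)


# Hypothetical cohort: four installs on the same day, one user returns on day 7.
installs = {u: date(2024, 3, 1) for u in ("a", "b", "c", "d")}
activity = {
    ("a", date(2024, 3, 8)),   # day 7, counts as retained
    ("b", date(2024, 3, 5)),   # day 4, does not count under this definition
}
print(d7_retention(installs, activity))
```

In a warehouse this would typically be a join between an installs table and a daily activity table, grouped by install cohort.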
Live Apps: Independent Studio (CwGames)

I build and ship mobile games and AI apps independently, and own the full data infrastructure for each title, from GA4 instrumentation to dbt-modeled analytics marts.

Live

SnaPet

AI Social App · Independent

A consumer social app where pets come alive through AI. LLM + computer vision give each pet a unique personality that responds to photos and chats.

LLM Integration · Computer Vision · GA4 Instrumented
Live

Poker Mahjong

Casual Puzzle · Independent

A hybrid casual puzzle fusing poker and mahjong tile-matching. Full product ownership from design through live operations and analytics.

Product Lead · Casual · AppsFlyer + GA4
Live

21 Saga

Casual Puzzle · Independent

Number-matching puzzle combining blackjack mechanics with satisfying tile-clear gameplay. Owned full product lifecycle from concept through analytics.

Product Lead · Casual · AppsFlyer + GA4
Live

Tile Voyage

Casual Puzzle · Independent

Zen tile-matching with island-building progression. Designed engagement loops and retention mechanics, tracked via a full analytics pipeline.

Product Lead · Retention Design · AppsFlyer + GA4
iOS Review

Mahjong Monopoly

Casual Strategy · Independent

Strategic mahjong fused with board game territory mechanics. Android live; iOS under App Store review.

Product Lead · Strategy · AppsFlyer + GA4
Analytics & Data Science
Data Storytelling

Cinema Analytics: Multi-Stakeholder Perspective

Analyzed box office trends through three business lenses. Produced a film where each character visualizes the same dataset differently.

Business Intelligence · Visualization · Storytelling
Machine Learning

Stock Price Forecasting & Investment Strategy

Time series modeling of AAPL and HON using ARIMA, regression, and moving averages. Compared simulated trading strategies with comprehensive performance analysis.

R · ARIMA · Time Series
↗ GitHub
Statistics

Sports Betting Strategy: Statistical Modeling

Monte Carlo simulation and chi-square testing to find optimal betting strategies across best-of-3/5/7 series formats.

R · Monte Carlo · Probability
↗ GitHub
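
As a rough illustration of the Monte Carlo side of this project (the original analysis was done in R, and this is not its code), the sketch below estimates the chance of winning a best-of-n series given a per-game win probability and checks it against the exact binomial result. The function names and the 0.55 win probability are assumptions for the example.

```python
import random
from math import comb


def simulate_series(p: float, best_of: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability of winning a best-of-n series."""
    rng = random.Random(seed)
    wins_needed = best_of // 2 + 1
    series_wins = 0
    for _ in range(trials):
        wins = losses = 0
        # Play games until one side reaches the required number of wins.
        while wins < wins_needed and losses < wins_needed:
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
        series_wins += wins == wins_needed
    return series_wins / trials


def exact_series(p: float, best_of: int) -> float:
    """Exact probability: win at least a majority if all n games were played."""
    wins_needed = best_of // 2 + 1
    return sum(
        comb(best_of, i) * p**i * (1 - p) ** (best_of - i)
        for i in range(wins_needed, best_of + 1)
    )


for n in (3, 5, 7):
    print(n, round(simulate_series(0.55, n, 20_000), 3), round(exact_series(0.55, n), 3))
```

Longer series amplify a small per-game edge, which is why the series format matters when comparing betting strategies.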
Forecasting

Coffee Price Analysis: Statistical Forecasting

EDA, ANOVA, and Lasso regression on coffee bean prices. Includes STL decomposition and MAPE-evaluated forecasting models.

R · Lasso · Forecasting
↗ GitHub
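
MAPE, the evaluation metric mentioned above, is the mean of the absolute percentage errors between actual and forecast values. A minimal sketch follows; the function name and sample prices are hypothetical, and the original project was written in R.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent. Undefined when an actual value is zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical prices: each forecast is off by 10% of the actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because errors are scaled by the actual value, MAPE lets forecasts be compared across series with different price levels, at the cost of being unusable near zero.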
Machine Learning

Model Validation: K-Fold Cross-Validation

Compared Logistic Regression, Random Forest, and SVM using K-Fold and Repeated K-Fold to investigate performance stability.

Python · Scikit-learn · Cross-Validation
↗ GitHub
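
The difference between K-Fold and Repeated K-Fold can be sketched with the splitting logic alone. The project itself used scikit-learn; the dependency-free version below is an illustration with hypothetical function names, not the project's code.

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_k_fold(n_samples: int, k: int, repeats: int,
                    seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Run K-Fold several times with a different shuffle on each repeat."""
    for r in range(repeats):
        yield from k_fold_indices(n_samples, k, seed=seed + r)


for train, test in k_fold_indices(10, 5):
    print(sorted(test), len(train))
```

Repeating the procedure with different shuffles gives more score estimates per model, which makes it easier to see how much of a performance gap is due to the particular split.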
Dashboard

UK Bicycle Accident Analysis Dashboard

Interactive Looker Studio dashboard analyzing 40 years of UK bicycle accident data to surface policy-relevant insights.

Looker Studio · Dashboard · Transportation
↗ GitHub
Contact

Let's connect

Open to Data Engineer and Analytics Engineer roles in the US and Japan. Based in San Jose, CA; open to relocation.

What I bring

I build reliable data infrastructure that teams can trust. From ELT pipeline design and dimensional modeling to self-serve dashboards, I own the full data stack end-to-end.

10M+ users in pipeline · $5M+ monthly revenue tracked · 6 live apps shipped · Gaming domain expertise.


San Jose, CA · Open to relocation · EN / ZH / JP (N1)