Data Scientist / Data Analyst

Master of Data Science (Monash) graduate proficient in deep learning and machine learning, with hands-on experience training models and building interactive dashboards. A collaborative team player who is continually learning new tools, I am driven by a passion for leveraging data to build innovative solutions that serve communities.

Melbourne · Monash University · Data Science

Resume LinkedIn GitHub Book an interview

Projects

Illicit Content Detection — LLM vs Classical

Unified text-classification pipeline comparing BERT/Llama/Gemma against SVM/NB baselines. Reproducible CLI, YAML configs, tests, CI.

2025

Binary & 40-class setups with class weights & robust splits
BERT fine-tuning + PEFT stubs for Llama/Gemma
Packaging, unit tests, GitHub Actions

NLP · Classification · LLM

Code Notebook

Mental Health & Happiness — Kaggle (Reg + 5-class)

Reframed competition work: out-of-fold (OOF) stacking, F1-based model selection, CatBoost/XGBoost/LightGBM baselines, and tidy notebooks.

2025

Ordinal-aware classification with threshold tuning
OOF stacks + Optuna-ready configs
Reproducible environment & CI

Kaggle · Regression · Classification · Ensembling

Code Notebook

Youth Offender Dashboard (R Shiny)

Interactive choropleth map, time series, gender/age bubble chart and screen-use heatmap. Clean state-name harmonisation and guardrails.

2024

sf + leaflet + plotly, pre-simplified ASGS shapes
Data-health banner & causality disclaimers
Fast reactivity via leafletProxy

Data Viz · Shiny · Geospatial

Code Notebook

Big Data Fraud Detection — Spark

PySpark pipelines: explicit schemas, feature engineering (L1/L2/L3 actions), random forest (RF) vs gradient-boosted trees (GBT) evaluated with ROC/AUC, and KMeans clustering assessed by silhouette score.

2024

Spec-compliant SparkConf + ≤16 MB partition bytes
No pandas in core ETL; MLlib-only modelling
Model persistence for data streaming

Spark · Classification

Code Notebook

Big Data ETL & EDA

Exploratory analysis + schema-first ingestion for retail data. Clean joins, QA checks, and reproducible visuals.

2024

Typed schemas, null-safety, profiling
EDA figures and automated summaries
Notebook as report; code modules for reuse

Spark · Data Wrangling

Code Notebook

Advanced Wrangling — Retail Transactions

Robust cleaning of semi-structured sources, feature derivation, and validation tests.

2024

Lightweight Great Expectations-style checks
Column lineage & tidy transforms
CSV → Parquet pipeline

Data Wrangling

Code Notebook

Experience

Market Researcher — EY Sweeney

Sep 2024 – Present

Conducted face-to-face interviews and surveys across Melbourne, ensuring high-integrity primary data collection
Maintained clean datasets with market research tools; collaborated to meet collection targets

Project Management Office — Senior Associate — Shopee

Apr 2022 – Jul 2023

SQL/Python analysis to inform initiatives; KPI design and Power BI tracking
P&L initiative cut team costs by ~45% (>$500K) and increased profit by >$1M within 2 months; top 5% performance rating in 2022

Category Management — Associate — Shopee

Jul 2021 – Apr 2022

Owned revenue growth for Books & Automotive; targeted subcategories using weekly/monthly data
Grew average daily orders in the Books category by 38% within 6 months

Business Development — Management Trainee — DHL Supply Chain

Sep 2019 – Jun 2021

Customer discovery, volume analysis, and solution scoping across logistics opportunities
Supported wins including Nestlé Central distribution and Yeah1 Group transportation

Education

Master of Data Science (with Distinction)

Monash University

Aug 2023 – Jul 2025

GPA 3.6/4; Dean’s List S1 2024; Research: LLMs for illicit content classification

B.A. Economics

Colorado State University

Aug 2017 – Dec 2018

GPA 3.8/4; Dean’s List S1 2018; scholarship for academic excellence

B.A. International Economics

Foreign Trade University

Aug 2014 – Jul 2017

Foundational coursework in international economics

Certifications

AWS Certified Machine Learning Engineer - Associate

Amazon Web Services

2025

View credential

Google Analytics

Google

2018

View credential

Publications

Using LLM to detect illicit content on online marketplaces

Quoc Khoa Tran — In submission

2025

Research project fine-tuning large language models such as Llama 3.2 and Gemma 3 for detecting illicit product listings in online marketplaces. Currently under review for publication.

About

Hi, I'm Quoc Khoa Tran (Kevin Tran), a data scientist who loves working with numbers and turning data into stories that make sense. Outside of work, I spend a lot of time at the gym, play tennis as a beginner always trying to improve my swing, and travel whenever I get the chance.

Interests: AI & ML systems, LLMs for text, Spark pipelines, and human-friendly analytics.

Tech I use

Python (pandas, PySpark, scikit-learn; PyTorch, TensorFlow)
NLP: Transformers (Hugging Face), tokenizers, PEFT
Data engineering: Apache Kafka, Spark Streaming, Snowflake (basics)
Visualisation: Power BI, Tableau, R Shiny (leaflet, sf, plotly)
SQL, dbt basics, data modelling

What I’m looking for

Data Science / ML Engineer internships and graduate roles. Open to collaborations on applied NLP and analytics.

Contact

kevintran031096@gmail.com LinkedIn Book an interview