Data Scientist / Data Analyst

Master of Data Science (Monash) graduate proficient in deep learning and machine learning, with hands-on experience training models and building interactive dashboards. A collaborative team player who is continually learning new tools, I am driven by a passion for using data to build innovative solutions that serve communities.

Melbourne · Monash University · Data Science

Projects

Illicit Content Detection — LLM vs Classical

Unified text-classification pipeline comparing BERT/Llama/Gemma against SVM/NB baselines. Reproducible CLI, YAML configs, tests, CI.

2025
  • Binary & 40-class setups with class weights & robust splits
  • BERT fine-tuning + PEFT stubs for Llama/Gemma
  • Packaging, unit tests, GitHub Actions
NLP · Classification · LLM
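
The class weights mentioned in the first bullet can be derived in several ways. One common choice is the inverse-frequency ("balanced") heuristic, where each class gets weight n_samples / (n_classes * class_count). The sketch below assumes that heuristic and uses made-up labels; the function name `balanced_class_weights` is hypothetical and is not taken from the project code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, weight = n_samples / (n_classes * class_count).

    Assumption: inverse-frequency weighting; the project may use another scheme.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy, made-up labels: 8 "benign" listings and 2 "illicit" ones.
toy_labels = ["benign"] * 8 + ["illicit"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

Rare classes receive larger weights, so misclassifying them costs more during training. The same dictionary shape works for the binary and the 40-class setup.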

Mental Health & Happiness — Kaggle (Reg + 5-class)

Reframed competition work: OOF stacking, F1-based selection, CatBoost/XGB/LGBM baselines, tidy notebooks.

2025
  • Ordinal-aware classification with threshold tuning
  • OOF stacks + Optuna-ready configs
  • Reproducible environment & CI
Kaggle · Regression · Classification · Ensembling
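
Threshold tuning for an ordinal target, as named in the first bullet, can be sketched as follows: a continuous score is cut into 5 classes by 4 ascending thresholds, and the cut points are adjusted one at a time to raise macro F1. This is an illustration under stated assumptions, not the competition code: the helper names, the toy data, the candidate grid, and the choice of macro F1 with a simple coordinate search are all hypothetical.

```python
from bisect import bisect_right


def scores_to_classes(scores, thresholds):
    """Map each score to a class index 0..len(thresholds).

    thresholds must be sorted ascending; a score equal to a threshold
    falls into the higher class.
    """
    return [bisect_right(thresholds, s) for s in scores]


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 (a class with no TP, FP or FN scores 0)."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / n_classes


def tune_thresholds(scores, y_true, thresholds, candidates, n_rounds=2):
    """Coordinate search: move one cut point at a time, keep strict F1 gains."""
    best = list(thresholds)
    n_classes = len(best) + 1
    best_f1 = macro_f1(y_true, scores_to_classes(scores, best), n_classes)
    for _ in range(n_rounds):
        for i in range(len(best)):
            for cand in candidates:
                trial = best[:i] + [cand] + best[i + 1:]
                if trial != sorted(trial):
                    continue  # keep cut points in ascending order
                f1 = macro_f1(y_true, scores_to_classes(scores, trial), n_classes)
                if f1 > best_f1:
                    best, best_f1 = trial, f1
    return best, best_f1


# Toy, made-up out-of-fold scores and true classes (0..4).
scores = [0.2, 0.9, 1.4, 2.1, 2.6, 3.4, 3.9]
y_true = [0, 0, 1, 2, 2, 3, 4]
start = [0.5, 1.5, 2.5, 3.5]            # naive "round to nearest" cut points
grid = [x / 10 for x in range(0, 41)]   # candidate cut points 0.0 .. 4.0

start_f1 = macro_f1(y_true, scores_to_classes(scores, start), 5)
tuned, tuned_f1 = tune_thresholds(scores, y_true, start, grid)
print(start_f1, tuned, tuned_f1)
```

In practice the thresholds would be tuned on out-of-fold predictions rather than on the data used for the final score, otherwise the reported F1 is optimistic.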

Youth Offender Dashboard (R Shiny)

Interactive choropleth + time-series + gender/age bubble + screen-use heatmap. Clean state-name harmonisation & guardrails.

2024
  • sf + leaflet + plotly, pre-simplified ASGS shapes
  • Data-health banner & causality disclaimers
  • Fast reactivity via leafletProxy
Data Viz · Shiny · Geospatial

Big Data Fraud Detection — Spark

PySpark pipelines: explicit schemas, feature engineering (L1/L2/L3 actions), RF vs GBT, ROC/AUC, KMeans with silhouette.

2024
  • Spec-compliant SparkConf + ≤16MB partition bytes
  • No Pandas in core ETL; MLlib-only modelling
  • Model persistence for data streaming
Spark · Classification

Big Data ETL & EDA

Exploratory analysis + schema-first ingestion for retail data. Clean joins, QA checks, and reproducible visuals.

2024
  • Typed schemas, null-safety, profiling
  • EDA figures and automated summaries
  • Notebook as report; code modules for reuse
Spark · Data Wrangling

Advanced Wrangling — Retail Transactions

Robust cleaning of semi-structured sources, feature derivation, and validation tests.

2024
  • Great Expectations-style checks (lightweight)
  • Column lineage & tidy transforms
  • CSV → Parquet pipeline
Data Wrangling

Experience

Market Researcher · EY Sweeney
Sep 2024 – Present
  • Face-to-face interviews & surveys across Melbourne; high-integrity primary data collection
  • Maintained clean datasets with market research tools; collaborated to meet collection targets
Project Management Office — Senior Associate · Shopee
Apr 2022 – Jul 2023
  • SQL/Python analysis to inform initiatives; KPI design and Power BI tracking
  • P&L initiative saved ~45% team cost (> $500K) and increased profit by > $1M in 2 months; Top 5% performance in 2022
Category Management — Associate · Shopee
Jul 2021 – Apr 2022
  • Owned revenue growth for Books & Automotive; subcategory targeting via weekly/monthly data
  • Average daily orders in the Books category grew 38% in 6 months
Business Development — Management Trainee · DHL Supply Chain
Sep 2019 – Jun 2021
  • Customer discovery, volume analysis, and solution scoping across logistics opportunities
  • Supported wins incl. Nestlé Central distribution and Yeah1 Group transportation

Education

Master of Data Science (with Distinction)
Monash University
Aug 2023 – Jul 2025

GPA 3.6/4; Dean’s List S1 2024; Research: LLMs for illicit content classification

B.A. Economics
Colorado State University
Aug 2017 – Dec 2018

GPA 3.8/4; Dean’s List S1 2018; Scholarship for excellent academic performance

B.A. International Economics
Foreign Trade University
Aug 2014 – Jul 2017

Foundational coursework in international economics

Certifications

AWS Certified Machine Learning Engineer - Associate
Amazon Web Services
2025
Google Analytics
Google
2018

Publications

Using LLM to detect illicit content on online marketplaces
Quoc Khoa Tran · In submission
2025

Research project fine-tuning large language models such as Llama 3.2 and Gemma 3 for detecting illicit product listings in online marketplaces. Currently under review for publication.

About

Hi, I'm Quoc Khoa Tran (Kevin Tran), a data scientist who loves working with numbers and turning data into stories that make sense. Outside of work, I spend a lot of time at the gym, play tennis as a beginner always trying to improve my swing, and travel whenever I get the chance.

Interests: AI & ML systems, LLMs for text, Spark pipelines, and human-friendly analytics.

Tech I use
  • Python (pandas, PySpark, scikit-learn; PyTorch, TensorFlow)
  • NLP: Transformers (Hugging Face), tokenizers, PEFT
  • Data engineering: Apache Kafka, Spark Streaming, Snowflake (basics)
  • Visualisation: Power BI, Tableau, R Shiny (leaflet, sf, plotly)
  • SQL, dbt basics, data modeling
What I’m looking for

Data Science / ML Engineer internships and grad roles. Open to collaborations on applied NLP and analytics.

Contact