Data Scientist / Data Analyst
Master of Data Science (Monash) graduate proficient in machine learning and deep learning, with hands-on experience training models and building interactive dashboards. A collaborative team player who continually learns new tools, driven by a passion for leveraging data to build innovative solutions that serve communities.
Projects
Illicit Content Detection — LLM vs Classical
Unified text-classification pipeline comparing BERT/Llama/Gemma against SVM/NB baselines. Reproducible CLI, YAML configs, tests, CI.
- Binary & 40-class setups with class weights & robust splits (class-weighted fine-tuning sketched below)
- BERT fine-tuning + PEFT stubs for Llama/Gemma
- Packaging, unit tests, GitHub Actions
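A minimal sketch of the class-weighted fine-tuning pattern, assuming a Hugging Face Trainer setup; the WeightedLossTrainer name, checkpoint, and weight handling are illustrative rather than the project's actual code.

```python
from torch import nn
from transformers import AutoModelForSequenceClassification, Trainer

class WeightedLossTrainer(Trainer):
    """Trainer variant that applies per-class weights to counter label imbalance."""

    def __init__(self, class_weights, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights  # e.g. inverse class frequencies as a torch tensor

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits.view(-1, model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

# 40-class head; the binary setup uses the same pattern with num_labels=2.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=40)
```

The Llama/Gemma PEFT stubs would slot into the same Trainer interface, typically by attaching LoRA-style adapters to the base model before training.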
Mental Health & Happiness — Kaggle (Reg + 5-class)
Reframed competition work: out-of-fold (OOF) stacking, F1-based model selection, CatBoost/XGB/LGBM baselines, tidy notebooks.
- Ordinal-aware classification with threshold tuning (threshold tuner sketched below)
- OOF stacks + Optuna-ready configs
- Reproducible environment & CI
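A minimal sketch of the ordinal threshold tuning, assuming out-of-fold regression-style scores that are cut into the five classes at points chosen to maximise macro-F1; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import f1_score

def apply_thresholds(scores, thresholds):
    """Map continuous scores to ordinal classes 0..K-1 via sorted cut points."""
    return np.digitize(scores, np.sort(thresholds))

def tune_thresholds(oof_scores, y_true, n_classes=5):
    """Search for cut points that maximise macro-F1 on out-of-fold predictions."""
    init = np.linspace(oof_scores.min(), oof_scores.max(), n_classes + 1)[1:-1]
    neg_f1 = lambda t: -f1_score(y_true, apply_thresholds(oof_scores, t), average="macro")
    result = minimize(neg_f1, init, method="Nelder-Mead")  # derivative-free; F1 is not smooth in the cut points
    return np.sort(result.x)
```

The tuned cut points would then be reused to convert the stacked model's test predictions into class labels.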
Youth Offender Dashboard (R Shiny)
Interactive choropleth + time-series + gender/age bubble + screen-use heatmap. Clean state-name harmonisation & guardrails.
- sf + leaflet + plotly, pre-simplified ASGS shapes
- Data-health banner & causality disclaimers
- Fast reactivity via leafletProxy
Big Data Fraud Detection — Spark
PySpark pipelines: explicit schemas, feature engineering (L1/L2/L3 actions), RF vs GBT, ROC/AUC, KMeans with silhouette.
- Spec-compliant SparkConf + ≤16MB partition bytes
- No Pandas in core ETL; MLlib-only modelling
- Model persistence for reuse in data streaming (pipeline sketched below)
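A minimal sketch of the schema-first RF vs GBT comparison, assuming a binary label column; the file path, column names, and app name are placeholders, not the assignment's actual spec.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier, GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("fraud-detection").getOrCreate()

# Explicit schema: no type inference at read time (column names are illustrative).
schema = StructType([
    StructField("txn_id", StringType(), True),
    StructField("amount", DoubleType(), True),
    StructField("n_actions", DoubleType(), True),
    StructField("label", DoubleType(), True),
])
df = spark.read.csv("transactions.csv", header=True, schema=schema)

assembler = VectorAssembler(inputCols=["amount", "n_actions"], outputCol="features")
evaluator = BinaryClassificationEvaluator(labelCol="label", metricName="areaUnderROC")
train, test = df.randomSplit([0.8, 0.2], seed=42)

for clf in (RandomForestClassifier(labelCol="label"), GBTClassifier(labelCol="label")):
    fitted = Pipeline(stages=[assembler, clf]).fit(train)
    print(type(clf).__name__, "AUC:", evaluator.evaluate(fitted.transform(test)))

# The last fitted pipeline is persisted so it can be reloaded for streaming scoring.
fitted.write().overwrite().save("models/fraud_gbt")
```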
Big Data ETL & EDA
Exploratory analysis + schema-first ingestion for retail data. Clean joins, QA checks, and reproducible visuals.
- Typed schemas, null-safety, profiling (ingestion sketched below)
- EDA figures and automated summaries
- Notebook as report; code modules for reuse
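A minimal sketch of the typed ingestion and QA checks, assuming retail orders arrive as CSV; the table and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, DateType

spark = SparkSession.builder.appName("retail-etl").getOrCreate()

# Typed schema so malformed values surface at load time instead of silently becoming strings.
schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("store_id", IntegerType(), True),
    StructField("order_date", DateType(), True),
    StructField("revenue", DoubleType(), True),
])
orders = spark.read.csv("retail/orders.csv", header=True, schema=schema)

# Profiling: per-column null counts plus basic numeric stats.
orders.select([F.sum(F.col(c).isNull().cast("int")).alias(c) for c in orders.columns]).show()
orders.describe("revenue").show()

# Null-safe, duplicate-free join key before joining to dimension tables.
clean = orders.dropna(subset=["order_id"]).dropDuplicates(["order_id"])
```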
Experience
- Conduct face-to-face interviews and surveys across Melbourne, collecting high-quality primary data, directly supporting client delivery and insight generation.
- Ensure data integrity and accuracy in a fast-paced environment; manage data with market research tools to prepare clean datasets for analysis.
- Collaborate with cross-functional teams to meet data collection targets, demonstrating a consulting mindset in meeting client needs.
- Executed end-to-end analytics, from data sourcing and data-warehouse query optimization (SQL) to advanced quantitative analysis (Python), providing data-driven decision support for new initiatives and business problems.
- Developed performance reporting and predictive analytics (e.g., forecasting) using Power BI dashboards for senior stakeholders, supporting continuous monitoring and optimisation of product performance.
- Acted as a key analytical partner, managing stakeholder relationships across multiple teams to roll out projects, manage timelines, and recommend strategic, system-related solutions based on data insights.
- Achievement: Led a P&L initiative that saved 45% of total team cost (more than 500K USD) and increased profit by more than 1M USD in 2 months; awarded A+ (top 5% of company) in 2022
- Communicated with customers to build a deep understanding of their problems.
- Analyzed customer volume data to identify missing information and clarify ambiguities, upholding high standards of data review.
- Achievements: Won Nestlé distribution project (3M USD value); won Unilever warehouse project (1M USD value).
Education
GPA 3.6/4; Dean’s List S1 2024; Research: Fine-tuning LLMs for illicit content detection on online marketplaces
GPA 3.8/4; Dean’s List S1 2018; Scholarship for excellent academic performance
Certifications
Publications
Research project fine-tuning large language models such as Llama 3.2 and Gemma 3 to detect illicit product listings in online marketplaces; accepted for oral presentation and publication at the International Conference of Natural Language Processing 2026.
About
Hi, I'm Quoc Khoa Tran (Kevin Tran), a data scientist who loves working with numbers and turning data into stories that make sense. Outside of work, I spend a lot of time at the gym, play tennis as a beginner always trying to improve my swing, and enjoy traveling whenever I get the chance.
Interests: AI & ML systems, LLMs for text, Spark pipelines, and human-friendly analytics.
- Python (pandas, PySpark, scikit-learn; PyTorch, TensorFlow)
- NLP: Transformers (Hugging Face), tokenizers, PEFT
- Data engineering: Apache Kafka, Spark Streaming, Snowflake (basics)
- Visualisation: Power BI, Tableau, R Shiny (leaflet, sf, plotly)
- SQL, dbt basics, data modeling
Seeking Data Science / ML Engineer internships and graduate roles. Open to collaborations on applied NLP and analytics.