
Data Scientist / Data Analyst

Master of Data Science (Monash) graduate proficient in deep learning and machine learning, with hands-on experience training models and building interactive dashboards. A collaborative team player who is continually learning new tools, I am driven by a passion for using data to build innovative solutions that serve communities.

Melbourne, Monash University, Data Science

Projects

Illicit Content Detection — LLM vs Classical

Unified text-classification pipeline comparing BERT/Llama/Gemma against SVM/NB baselines. Reproducible CLI, YAML configs, tests, CI.

2025
  • Binary & 40-class setups with class weights & robust splits
  • BERT fine-tuning + PEFT stubs for Llama/Gemma
  • Packaging, unit tests, GitHub Actions
NLP, Classification, LLM

Mental Health & Happiness — Kaggle (Reg + 5-class)

Reframed competition work: OOF stacking, F1-based selection, CatBoost/XGB/LGBM baselines, tidy notebooks.

2025
  • Ordinal-aware classification with threshold tuning
  • OOF stacks + Optuna-ready configs
  • Reproducible environment & CI
Kaggle, Regression, Classification, Ensembling

Youth Offender Dashboard (R Shiny)

Interactive choropleth + time-series + gender/age bubble + screen-use heatmap. Clean state-name harmonisation & guardrails.

2024
  • sf + leaflet + plotly, pre-simplified ASGS shapes
  • Data-health banner & causality disclaimers
  • Fast reactivity via leafletProxy
Data Viz, Shiny, Geospatial

Big Data Fraud Detection — Spark

PySpark pipelines: explicit schemas, feature engineering (L1/L2/L3 actions), RF vs GBT, ROC/AUC, KMeans with silhouette.

2024
  • Spec-compliant SparkConf + ≤16MB partition bytes
  • No Pandas in core ETL; MLlib-only modelling
  • Model persistence for data streaming
Spark, Classification

Big Data ETL & EDA

Exploratory analysis + schema-first ingestion for retail data. Clean joins, QA checks, and reproducible visuals.

2024
  • Typed schemas, null-safety, profiling
  • EDA figures and automated summaries
  • Notebook as report; code modules for reuse
Spark, Data Wrangling

Advanced Wrangling — Retail Transactions

Robust cleaning of semi-structured sources, feature derivation, and validation tests.

2024
  • Great Expectations-style checks (lightweight)
  • Column lineage & tidy transforms
  • CSV → Parquet pipeline
Data Wrangling

Experience

Market Researcher
EY Sweeney
Sep 2024 – Present
  • Conduct face-to-face interviews and surveys across Melbourne, collecting high-quality primary data that directly supports client delivery and insight generation.
  • Maintain data integrity and accuracy in a fast-paced environment; manage data with market research tools to prepare clean datasets for analysis.
  • Collaborate with cross-functional teams to meet data collection targets, demonstrating a consulting mindset in meeting client needs.
Data Analyst — Senior Associate
Shopee
Jul 2021 – Jul 2023
  • Executed end-to-end analytics, from data sourcing and data-warehouse query optimisation (SQL) to advanced quantitative analysis (Python), to support data-driven decisions on new initiatives and business problems.
  • Developed performance reporting and predictive analytics (e.g., forecasting) using Power BI dashboards for senior stakeholders, supporting continuous monitoring and optimisation of product performance.
  • Acted as a key analytical partner across multiple teams, managing stakeholders and timelines to roll out projects and recommending strategic, system-related solutions based on data insights.
  • Achievement: Led a P&L initiative that cut total team cost by 45% (more than 500K USD) and increased profit by more than 1M USD in 2 months; awarded A+ (top 5% of company) in 2022.
Business Development — Management Trainee
DHL Supply Chain
Sep 2019 – Jun 2021
  • Worked closely with customers to understand their problems in depth.
  • Analysed customers' volume data to identify missing information and resolve ambiguities, upholding high standards of data review.
  • Achievements: Won Nestlé distribution project (3M USD value); won Unilever warehouse project (1M USD value).

Education

Master of Data Science (with Distinction)
Monash University
Aug 2023 – Jul 2025

GPA 3.6/4; Dean’s List S1 2024; Research: Fine-tuning LLMs for illicit content detection on online marketplaces

B.A. Economics
Colorado State University
Aug 2017 – Dec 2018

GPA 3.8/4; Dean’s List S1 2018; Scholarship for excellent academic performance

B.A. International Economics
Foreign Trade University
Aug 2014 – Jul 2017

Certifications

Australian Federal Police Acknowledgement of Contribution to AI for Law Enforcement and Community Safety
Australian Federal Police
2025
AWS Certified Machine Learning Engineer - Associate
Amazon Web Services
2025
Google Analytics
Google
2018

Publications

Fine-tuning LLM to detect illicit content on online marketplaces
Quoc Khoa Tran
International Conference of Natural Language Processing 2026
2025

Research project on fine-tuning large language models such as Llama 3.2 and Gemma 3 to detect illicit product listings in online marketplaces. The paper has been accepted for oral presentation and publication at the International Conference of Natural Language Processing 2026.

About

Hi, I'm Quoc Khoa Tran (Kevin Tran), a data scientist who loves working with numbers and turning data into stories that make sense. Outside of work, I spend a lot of time at the gym, play tennis as a beginner always trying to improve my swing, and travel whenever I get the chance.

Interests: AI & ML systems, LLMs for text, Spark pipelines, and human-friendly analytics.

Tech I use
  • Python (pandas, PySpark, scikit-learn; PyTorch, TensorFlow)
  • NLP: Transformers (Hugging Face), tokenizers, PEFT
  • Data engineering: Apache Kafka, Spark Streaming, Snowflake (basics)
  • Visualisation: Power BI, Tableau, R Shiny (leaflet, sf, plotly)
  • SQL, dbt basics, data modeling
What I’m looking for

Data Science / ML Engineer internships and grad roles. Open to collaborations on applied NLP and analytics.

Contact