Data Engineer

Architecting the backbone of data-driven success

About Me

I am a data engineer passionate about building robust, scalable data infrastructure that supports better data-driven decision-making. Currently at Ruan Transportation Management Systems, I design and maintain enterprise-scale data pipelines using PySpark and SparkSQL in Azure Synapse Analytics, with a focus on dimensional modeling, data ingestion and management, and data governance, including data quality and lineage tracking through Microsoft Purview. I have also participated in several studies on Social Determinants of Health, modeling and analyzing over 20 million NIH-provided patient records across 80+ healthcare sites, and built data analytics infrastructure and Power BI dashboards for CIVCO Medical Solutions to track key business insights.

My passion lies in building robust data infrastructure and ensuring data quality, so that solutions not only solve immediate challenges but also scale to meet future needs, making data operations more efficient and reliable for everyone. I am also interested in using data to find critical insights and solve real-world problems.

When I'm not analyzing datasets, you'll find me looking for good restaurants, enjoying a good cup of coffee while listening to music, or playing songs myself on guitar or piano.

Austin Lee profile photo

Skills

Programming Languages

Python, MS SQL, T-SQL, PostgreSQL, R, Java

Tools & Platforms

Microsoft Azure, Azure Synapse Analytics, Microsoft Purview, Microsoft Fabric, Azure DevOps, Apache Airflow, dbt, DuckDB, Docker, Git, Tableau, Power BI, Microsoft Office, Palantir Foundry, TriNetX, SSMS, DBeaver

Libraries & Frameworks

Pandas, NumPy, Matplotlib, Scikit-learn, SciPy, PyTorch, PySpark, SparkSQL, Delta Lake

Technical Skills

ETL/ELT, Dimensional Modeling, Data Warehousing, Data Pipelines, Data Governance, Data Lineage, Data Quality, CI/CD, Machine Learning, Data Visualization, Data Cleaning, Statistical Analysis, A/B Testing

Languages

Korean (Native), English (Fluent), Japanese (Advanced; JLPT N2)

Soft Skills

Problem Solving, Critical Thinking, Communication, Team Collaboration, Time Management, Adaptability

Professional Experience

Associate Data Engineer

Ruan Transportation Management Systems
July 2025 - Present

  • Engineered 20+ dimensional model schema additions across core data warehouse fact and dimension tables stored as Delta tables using PySpark and SparkSQL in Azure Synapse Analytics, enabling more granular operational reporting for business stakeholders
  • Identified and resolved critical data integrity issues including duplicate records, null metrics, incorrect aggregations, and bad key lookups across multiple production fact tables in Azure Synapse Analytics, restoring stakeholder trust in Power BI reporting
  • Researched and prototyped enterprise data governance capabilities in Microsoft Purview, developing REST API-based solutions for bulk metadata management and customized data lineage management
  • Developed end-to-end data pipelines for multiple new data products using Delta tables and Azure Synapse Analytics, including a client-specific data product and a third-party time tracking integration
  • Manage and deploy data pipeline changes through Azure DevOps CI/CD pipelines, ensuring consistent and reliable delivery of updates to production environments

Research Assistant

University of Iowa
July 2023 - June 2025

  • Participated in 3+ NIH-funded healthcare studies utilizing Palantir to analyze N3C data
  • Transformed 20M+ patient records using SQL and R, enhancing data analytics capabilities
  • Conducted healthcare disparity analysis using TriNetX analytics platform to identify key insights
  • Presented critical findings to a research team spanning 7 universities, visualizing data with Tableau and Excel
  • Contributed to research analyzing patterns across 80+ healthcare sites, enabling identification of key risk factors for adverse COVID-19 outcomes

Data Analysis Consultant

CIVCO Medical Solutions
March 2025 - May 2025

  • Evaluated sales performance for CIVCO Medical Solutions to identify revenue drivers and customer patterns
  • Transformed and cleaned the provided raw CRM and ERP data in PostgreSQL, working through DBeaver
  • Built Power BI dashboards integrating customer data, revenue metrics, and customer retention analytics
  • Discovered a significant business insight: the company's successful newly launched product had an 86% expansion rate
  • Presented critical findings to the company's product and sales operations team

Education

Graduate Education

University of Iowa

MS in Data Science — Graduated May 2025

GPA: 3.93/4.00

Undergraduate Education

University of Iowa

BA in Computer Science — Graduated May 2024

BA in Psychology — Graduated May 2024

Minor in Japanese Language and Literature

GPA: 3.81/4.00

Projects

All-Time Spotify Wrapped

A comprehensive data pipeline that extracts, transforms, and visualizes complete Spotify listening history using modern data engineering tools including Apache Airflow, dbt, DuckDB, and OpenLineage for data lineage tracking.

Data Engineering, Apache Airflow, dbt, DuckDB, Data Lineage

Racial Disparities in Diabetes Care

Analyzed 115M+ patient records to investigate racial disparities in diabetes care outcomes for visually impaired patients, revealing significant differences in CKD risk ratios and care standards across racial groups.

Data Analysis, Healthcare, Statistical Analysis, Research

CSAS 2025: Quantifying MLB Home Run Attempts

Analyzed MLB Statcast data to identify key factors influencing home run attempts using bat speed, swing length, and advanced statistical modeling.

Data Analysis, Sports Analytics, Statistical Modeling, Machine Learning

UK E-commerce Retention Analysis

Analyzed UK e-commerce data to uncover retention trends and actionable strategies for improving new customer retention.

Cohort Analysis, Retention, E-commerce, SQL

Publications

Madlock-Brown, C., Austin Lee, Seltzer, J., Solomonides, A., Mathews, N., Phuong, J., Weiskopf, N., Adams, W. G., Lehmann, H., & Espinoza, J. (2024). Racial Disparities in Diabetes Care and Outcomes for Patients with Visual Impairment: A Descriptive Analysis of the TriNetX Research Network. Research Square, rs.3.rs-3901158. (Under review)

Contact

Des Moines, IA