Architecting the backbone of data-driven success
I am a data engineer passionate about building robust, scalable data infrastructure that supports better data-driven decision-making. Currently at Ruan Transportation Management Systems, I design and maintain enterprise-scale data pipelines using PySpark and SparkSQL in Azure Synapse Analytics, with a focus on dimensional modeling, data ingestion and management, and data governance, including data quality and lineage tracking through Microsoft Purview. I have also contributed to several studies on Social Determinants of Health, modeling and analyzing over 20 million patient records, provided by NIH, across 80+ healthcare sites. In addition, I built data analytics infrastructure and Power BI dashboards for CIVCO Medical Solutions to track critical business insights.
My passion lies in building robust data infrastructure and ensuring data quality, so that solutions not only solve immediate challenges but also scale to meet future needs, making data operations more efficient and reliable for everyone. I am also interested in using data to find critical insights and solve real-world problems.
When I'm not analyzing datasets, you'll find me looking for good restaurants, enjoying a good cup of coffee while listening to music, or playing songs on guitar or piano.
Python, MS SQL, T-SQL, PostgreSQL, R, Java
Microsoft Azure, Azure Synapse Analytics, Microsoft Purview, Microsoft Fabric, Azure DevOps, Apache Airflow, dbt, DuckDB, Docker, Git, Tableau, Power BI, Microsoft Office, Palantir Foundry, TriNetX, SSMS, DBeaver
Pandas, NumPy, Matplotlib, Scikit-learn, SciPy, PyTorch, PySpark, SparkSQL, Delta Lake
ETL/ELT, Dimensional Modeling, Data Warehousing, Data Pipelines, Data Governance, Data Lineage, Data Quality, CI/CD, Machine Learning, Data Visualization, Data Cleaning, Statistical Analysis, A/B Testing
Korean (Native), English (Fluent), Japanese (Advanced; JLPT N2)
Problem Solving, Critical Thinking, Communication, Team Collaboration, Time Management, Adaptability
Ruan Transportation Management Systems
July 2025 - Present
University of Iowa
July 2023 - June 2025
CIVCO Medical Solutions
March 2025 - May 2025
MS in Data Science — Graduated May 2025
GPA: 3.93/4.00
BA in Computer Science — Graduated May 2024
BA in Psychology — Graduated May 2024
Minor in Japanese Language and Literature
GPA: 3.81/4.00
A comprehensive data pipeline that extracts, transforms, and visualizes complete Spotify listening history using modern data engineering tools including Apache Airflow, dbt, DuckDB, and OpenLineage for data lineage tracking.
Analyzed 115M+ patient records to investigate racial disparities in diabetes care outcomes for visually impaired patients, revealing significant differences in CKD risk ratios and care standards across racial groups.
Analyzed MLB Statcast data to identify key factors influencing home run attempts using bat speed, swing length, and advanced statistical modeling.
Analyzed UK e-commerce data to uncover retention trends and actionable strategies for improving new customer retention.
Madlock-Brown, C., Austin Lee, Seltzer, J., Solomonides, A., Mathews, N., Phuong, J., Weiskopf, N., Adams, W. G., Lehmann, H., & Espinoza, J. (2024). Racial Disparities in Diabetes Care and Outcomes for Patients with Visual Impairment: A Descriptive Analysis of the TriNetX Research Network. Research Square, rs.3.rs-3901158. (Under review)
Des Moines, IA