Bing O'Dowd

Data Scientist

In my PhD research, I developed a model that cut labor and supply costs by ruling out 31% of the candidate molecules, allowing the team to focus on more promising ones. At Dow AgroSciences, I built software to improve the data collection and data cleansing pipeline so that database updates could run weekly and automatically rather than manually and annually. In another project, I found that the evaluation metric chosen by the organizers of a machine learning competition was flawed, and I used alternative metrics to build a model with 11% better precision and 85% better recall than a model that looked better by the competition metric.

$90 USD per hour

3 years experience

3 hours available weekly

0 hours worked

WORK AVAILABILITY

Contract projects
Full-time employment

SKILLS

Experience

University of Illinois at Urbana-Champaign - Research Assistant

Ruled out 31% of possible molecules to synthesize by developing a generalized linear model to direct synthetic chemistry efforts and reduce time and financial costs

Developed the generalized linear model using cheminformatic techniques and PCA to identify important molecular features (Python, RDKit, scikit-learn)

Discovered a new class of bacterial enzyme and new compound activity, and demonstrated a previously unknown mechanism of action of a class of compounds against cancer cell lines

DrivenData "Pover-T" (Kaggle-style competition)

Finished in the top 9% (>2000 competitors) in a binary classification contest with imbalanced targets and obfuscated features

Improved log loss by ~67% over benchmark Random Forest models using single XGBoost models

Developed a model with improved precision (0.11) and recall (0.85), despite higher log loss than my best model, by identifying a flaw in the competition organizers' choice of evaluation metric

Dow AgroSciences - Data Scientist

Developed a data cleaning pipeline to speed up the data ingestion process, from annual and manual updates to weekly and automated updates

Sped up 3D molecular similarity searches (from 1 week to <4 hours) by automating and building one interface to three separate programs, and by parallelizing the computation across multiple processors

Diagnosed a bottleneck in the molecule standardization pipeline, and identified problematic molecules with a Random Forest classifier