An Interview with Bojan Tunguz

Bojan Tunguz

Senior Machine Learning Modeler at Nvidia

Key Topics: Data Science, Machine Learning, Machine Learning for Tabular Data, Artificial Intelligence, GPUs

Location: Indiana, USA

Bio:

Bojan is a Machine Learning Modeler at NVIDIA. He has been working in the Machine Learning and Data Science fields for seven years and has experience with real-world FinTech problems. He is a Quadruple Kaggle Grandmaster and is the first person to be ranked in the top 10 in all four Kaggle categories simultaneously.

How did you get to become an expert in your key topics?

When I first made a career switch from Physics to Data Science, I struggled for a while to find the best way to learn new marketable skills and find a footing in my new professional world. Fortunately, relatively early on I discovered Kaggle, and it became the go-to place for me to acquire and improve my skills, showcase my success, and network with the relevant ML/DS practitioners. Eventually, my interests and work moved to some high-impact real-world ML problems in the fields of genomics and large structured data.

What sub-topics are you most passionate about?

My interests have evolved somewhat over the years. For a while, I was really interested in NLP problems, but that field has become very hard to keep pace with. These days I am most passionate about exploring ML for tabular data. Although it is the “quintessential” DS/ML field, we have remarkably little understanding of why certain canonical approaches work as well as they do, and we have made very little progress here for years.

Who influences you within these topics?

Unfortunately, there is very little high-level research on the topic of ML for tabular data. I am still most influenced by the “classic” Kaggle problems in this space, and by the top Kaggle Grandmasters who have made a significant impact here in the past.

What challenges are brands facing in this space?

Tabular data is still the most widely used data in the industry. However, setting up a proper tabular data problem for ML is a major challenge. It requires a lot of skill and oftentimes years of experience to even make a proper formulation of the problem. After the problem has been “solved” to the satisfaction of the major stakeholders, the next big challenge is to make the most out of the solution. This often means putting the modeling pipeline into production. Unfortunately, tools for the best tabular data modeling pipelines are still behind those that have been built for the Deep Learning pipelines.

What do you think the future holds in this space?

I believe that we are still only scratching the surface of what good algorithms for tabular data can deliver. I also believe that there will be a significant leap forward in the use of AutoML tools in this domain.

What brands are leading the way in this space?

AutoML startups like DataRobot and H2O probably have some of the best in-house expertise in this domain.

If a brand wanted to work with you, which activities would you be most interested in collaborating on?

Whitepapers, blogs, written interviews, social media promotion.

What are your passions outside of work?

I am a voracious reader, a health and fitness enthusiast, and am really into digital photography.

What would be the best way for a brand to contact you?

Email.
