Solving the AI Diversity Problem with Paul van der Boor of Prosus Group/Data Science for Social Good (Podcast)

Show Summary

AI and machine learning continue to receive tons of interest and scrutiny, especially in the DEI and talent management fields. That’s why we were thrilled to meet data science expert Paul van der Boor during a recent Firefly leadership training session at Prosus Group, a global consumer internet group and technology investor based in Amsterdam, the Netherlands. As a senior data science director there as well as a co-founder and board member of the Data Science for Social Good Foundation, Paul shared interesting insights and developments in the field that impact us all – whether we’re hiring and managing talent, applying for credit, seeking healthcare... the list goes on.

We knew we had to take this conversation out of the “Zoom Room” and into the broader light to encourage more understanding of the AI and machine learning field for those of us pursuing fairness and equitable treatment across all decision processes.

Transformational Moments in this Episode

How AI is currently used in talent systems – and its limitations
Putting checks and balances into the system – how do we define “fairness”?
Understanding data as a representation of people
The importance of transparency – knowing where & why AI is being implemented in systems
A call to action for the DEI community
A call to action for the AI/Data Science/Machine Learning community

Hear the Full Episode On:

SPOTIFY

APPLE

SUBSTACK

Resources

Data Science for Social Good Foundation

NeurIPS Conference
Grace Hopper Celebration of Women in Computing (AnitaB.org)
Dr. Fei-Fei Li – Sequoia Professor in the Computer Science Department at Stanford University, and Co-Director of Stanford’s Human-Centered AI Institute
Hilary Mason – Founder & CEO of Fast Forward Labs, Data Scientist in Residence at Accel Partners.

Full Transcript

Oana Amaria: So, Paul, thanks so much for jumping on. As I shared with you, our inspiration for this podcast is really because Jason and I get to meet all these amazing people in our sessions and I felt like, wow, we are so privileged to have this type of access. And so we wanted to be able to create this platform to really be able to share you with D and I enthusiasts or practitioners, people that may not necessarily have access to the type of insights that you have. And I really loved, I felt like in our session you kind of blew it up a little bit with your insights, because you took it to a different level that was just so refreshing and so interesting to us.

Oana Amaria: And so especially in the context of this intersectionality with AI and inclusive design, which I feel like often doesn’t get covered in D & I sessions. We talk a lot about talent and everybody goes straight to recruiting, but I think this really takes it to the next level, and so that to me is very, very exciting. So, if you could just introduce yourself, however you’d like to do that. I know you’re involved in a lot of things, not just in your day job. And then also what’s something that’s really cool that you’re working on? That I think people would be interested to hear.

Oana Amaria: Because I wonder if we have any concept of the AI field in general, and kind of what’s going on, so we’d love to hear from you.

Paul van der Boor: Yes, so, thanks first of all for having me, Oana and Jason, it was a great session we had together on D & I and I think these kinds of sessions and discussions where we bring together different fields are for me the most interesting because that’s the kind of discussions where we all stand to learn most. So, first of all thanks for that. Maybe for the quick intro.

Paul van der Boor: So, I’m Paul van der Boor, I’m currently based out of Amsterdam in The Netherlands. I am a senior director of data science at Prosus Group, and this is one of the largest tech investors globally. We predominantly invest in food delivery, educational technology, and many other different platforms. And I’m part of the AI team at Prosus, which we started a couple of years ago. And in that role, I basically help apply AI better, smarter, more responsibly throughout the group. But also what I still spend a lot of time on today is a foundation called Data Science For Social Good, which I got involved in back in the days when I was still in my PhD in the US at Carnegie Mellon. That was at that time a group of people that basically aimed to bring data science skills and capabilities to problems that didn’t have access to these kinds of skills and predominantly in the social good area, which basically means predominantly small government agencies, federal agencies, other government departments, school districts, NGOs, non-profits, that were working on meaningful and important problems that didn’t – especially when we started five, six years ago – didn’t have access to the data science talent to help them do their work better.

Paul van der Boor: And so, I’m still part of that organization, I’m a board member of the Data Science For Social Good Foundation. And under that we initially started with an educational program we called the Summer Fellowship, and that has since grown to various locations over the world and several other activities that we do under the foundation. And maybe 30 seconds on my background.

Paul van der Boor: So, I started in engineering, aerospace engineering, and then actually worked in that field for a little bit at Siemens in India many, many years ago. And then moved to work on my PhD in the US in engineering, spent a couple of years in consulting, worked on Data Science For Social Good, and then eventually moved to Prosus where I still am today.

Oana Amaria: Paul, I feel like you should have a cape to go with the Data Science For Good. It’s just amazing, I mean, it really is quite impressive. And especially we were sending back articles of what’s going on in the context of ethics and this is truly the new world that I think many of us don’t understand, and so the fact that we have amazing people like you working on this in itself is no small thing. So, you should get a cape, I think that would be… Just to go with all the…

Paul van der Boor: And many more along with me doing this work.

Oana Amaria: Yeah, for sure, for sure. Thank you for that.

Jason Rebello: So, Paul, I have a question for you. So, in our prototyping inclusion session, we talked about the pitfalls of bias in AI and what we know is that data reflects the social, historical and political conditions in which it was created, so artificial intelligence systems essentially learn based on the data that they’re given. And we see this bias play out in various different systems, like facial recognition software and even a recruitment tool that taught itself to dislike women. For those of our audience who aren’t familiar with these types of examples, can you share some of the pitfalls of AI and how it’s being incorporated? And what you think are some of the greatest challenges in this work.

Paul van der Boor: Yeah, I’m happy to talk about that, although maybe I want to state upfront, I’m an optimist when it comes to the applications of data science and spend a lot of time thinking about how we can do data science well and better and for good impact, in a responsible way. That of course doesn’t mean there aren’t many examples, like some of the ones you mentioned, where indeed data science and a lot of the techniques around data science have been used in ways they shouldn’t have, and actually that list unfortunately is growing because the areas and the vast range of areas in which machine learning is being applied increases over time.

Paul van der Boor: So, yeah, there are many regrettable instances of use of data science and continue to be, and I think there’s lots of discussion that is going on in the community around how you deal with that and what the biggest areas are. But the way I try to look at that is actually when you think about whether it’s in the data that represents historic bias in the system, or whether it’s the lack of good training of the model, or the deployment, I like to think of it kind of as a system.

Paul van der Boor: So you don’t just take a model and put it in production and blame the model for being biased or then blame it on the data or even on the engineer that was involved and say that engineer had a biased intention. It’s the entire system. So there’s a decision process and whether that involves, let’s say, hiring a candidate, a good candidate for a role, there are many steps in that process. And that includes, it starts with, the job description, and how you formulate that.

Paul van der Boor: And you probably know there are tons of tools out there that actually help you write and formulate the job description in a way that maximizes a diverse pool of applicants. And those tools are AI based, some of them. And they look at how you phrase a certain part of the job responsibilities that would maybe appeal more to a man or a woman or different parts of the population, and making sure you do that in a way that you attract a diverse group of applicants which is of course the benefit of the hiring team.

Paul van der Boor: Then it has to do with the interview process, what are the processes that you follow, the questions you ask, how methodical are you about that, where is there room for bias to creep in there? And then eventually how you select the final candidates, and so on. So that’s an entire process, and if machine learning is part of that and the regrettable example, I think you’re alluding to the Amazon one where they trained a model on resumes and tried to figure out, okay, what were the predictors of success at Amazon?

Paul van der Boor: And then they found out that because of the vastly male population there, it ended up treating being male as a predictor of a higher probability of being successful, being hired. And so the model should never under any condition have full responsibility, full authority to decide on the candidate. First of all because it can’t, but also because there is this entire system that is the hiring pipeline, the practice of hiring somebody. And in that system, you can do lots of things using AI, also there are some things you can’t use AI for, for both good and bad.

Paul van der Boor: So being aware of that, where and how you can use AI to hire the right candidates, and thinking about it as a system and not just a model is very important I think.

Oana Amaria: Yeah, it’s so interesting you mention Textio and we talk about a lot of these different tools and being able to really leverage AI for good to your optimism, and what I loved in our session is you talked about how you in your reflection, what can you do to go in and create the parameters for things like fairness or equity. And it’s so important to know that people like you are working on this, and so I don’t know if you mind sharing again just some of your insights around how do we establish or how do we go back and validate for this? To be able to put those checks and balances into the system that you were referring to.

Paul van der Boor: Yeah. No, absolutely, and I think that’s a topic where I believe we’ve actually made quite a bit of progress in the recent years in terms of being able to, one, incorporate fairness and bias into machine learning applications but also building tools around making it easier to measure and quantify how biased or fair a model is or isn’t. And what we’ve learned in doing that, and we’ve done some work inside the Data Science For Social Good Foundation releasing open source tools such as Aequitas which allow data scientists to measure bias and fairness in their models is that it’s very, very important to define what kind of fairness we mean.

Paul van der Boor: I mean, depending on the application you might care about being fair across different types of populations. And thinking about how you want to protect the person that the model is going to affect, are you thinking about fairness across age or gender or ethnicity or all of the above, and how do you make those trade-offs in a very conscious way? Well, we now have tools that help us measure how fair a model is, because there are many definitions of fairness. And probably depending on the use case, those definitions are very, very different, and it can be about equal opportunity or equal outcomes or equal likelihood of getting some kind of intervention.

Paul van der Boor: Like when we talk about in Data Science For Social Good we do a lot of work on inspections related, so think about government agencies going out and inspecting certain facilities for waste handling or things like that, the likelihood of being inspected is obviously correlated to you getting found in violation of some kind of code of conduct or whatever. And so being fair along with the dimensions of getting caught, let’s say, is another way of measuring fairness.

Paul van der Boor: And so a lot of that part of the discussion happens outside of, let’s say, the modeling efforts. It has to do with, okay, again it’s a system, so with a system we’re either trying to figure out which students need extracurricular support because they’re at higher risk of failing, or which patients need some kind of intervention because they’re at higher risk of getting readmitted to the hospital, and so on. And each of those different situations will mean that you want to have a different definition of fairness to start with, and second they will also, which is really, really important in the case of machine learning because, as a sort of segue, no machine learning model is perfect.

Paul van der Boor: Never, there’s always going to be some mistakes. And so you need to think about in this fairness discussion, what are the costs of a mistake? So, if you make a mistake and you send, I don’t know, to the case I was making, if you make a mistake and shortlist a student for extracurricular support, after school support, that might actually not be too bad because they just get extra support, maybe the opportunity cost is that another student that really needed it didn’t get it. But it’s not harmful to the student that receives it, it’s harmful to the student that should’ve received it that didn’t get it.

Paul van der Boor: And so the inequality and the harms of the mistakes, thinking about false positives and false negatives which are terms that we use to denominate these mistakes, are really, really important to think through carefully because then if you think about credit decisions, for example, if you give somebody credit that shouldn’t get it or didn’t give them credit when they should’ve got it, those are very, very fundamental, and the costs, let’s say the harm that might be inflicted, translates very differently depending on the situation and so thinking about that and the definitions of fairness and then how we deal that as a machine learning data science community, we can’t do that on our own. It has to be in the context of the entire system.
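To make the idea of comparing false positives and false negatives across groups concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the group labels, the toy records, and the function name `error_rates_by_group` are all made up for this example, and it does not reproduce the API of Aequitas or any other tool mentioned in the conversation.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, actual, predicted) tuples,
    where actual and predicted are 0 or 1. All names here are
    hypothetical and chosen for illustration only.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual == 1:
            c["pos"] += 1
            if predicted == 0:
                c["fn"] += 1  # should have been selected, was not
        else:
            c["neg"] += 1
            if predicted == 1:
                c["fp"] += 1  # was selected, should not have been

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None when a group has no actual negatives / positives
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Made-up toy data: (group, actual outcome, model prediction)
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]
rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data the model makes both kinds of mistakes for group A and none for group B. Which of those gaps matters more depends on the definition of fairness chosen for the use case, which is the point being made in the conversation.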

Paul van der Boor: I think it’s something that comes up a lot is to say, okay, well, these data points are people and there’s no such thing as data, actually, data is a representation of somebody or something that happened to that person, and you’re trying to piece it together to predict what you want to do and how you want to help that person or sell that person something or whatever.

Paul van der Boor: And so, all of a sudden when it’s about my data, I start thinking about it differently. When my data is being used in Facebook in certain ways and I see it, hey, wait a second. And, I mean, I’m involved in this stuff in my professional world and then when I’m the subject of it it’s like, wait a second, it’s almost like you experienced it in different ways and we have to be careful not to sort of lose that perspective as data scientists.

Oana Amaria: It’s not fun to think of yourself as the product, right?

Paul van der Boor: Oh, not at all, no. It’s not good for us as data scientists because a lot of these data sharing issues that some of the companies are bringing to the front page make it harder for us to do our work right. I mean, listen, I work at a tech company doing data science, so somebody might say, “Is what you’re doing different?” Well, yes, it is. At least I’d like to answer that we’re trying to make it very different, so, yeah.

Oana Amaria: I saw this Tweet with the comparison of the kind of data that’s tracked with the iPhone versus WhatsApp.

Paul van der Boor: Yeah, it’s crazy.

Oana Amaria: The different nutritional list, version, right?

Paul van der Boor: Yeah.

Oana Amaria: And it’s tough, I have a lot of family in Romania and so one of things that keeps me on WhatsApp is like, what am I going to do? I’m not on Facebook anymore, but I am on WhatsApp, and it’s really interesting, because it ingrains itself in your life and then you have to do this full lift of this mature tree of efficiencies in your life that make it very hard. So it’s a tricky thing, right?

Paul van der Boor: Yeah, it’s absolutely tricky and I’ll also be honest that I have some school groups and whatever for my son that are still sticking around on WhatsApp, and I think everybody has that. And especially I have these discussions with my family and others not living in the US or whatever, and it’s like, “Well, I have nothing to hide, right?” They say, “So why should I be worried about…”

Paul van der Boor: It’s like, well, it’s not about that, it’s also about the fact that we have no choice, so Facebook is just one example, there’s others doing other things wrong. But you have no choice, what do you go to if you want to have an alternative for these products? There’s too much market power.

Oana Amaria: Then go to Signal, go to Signal, it’s open source.

Paul van der Boor: Yeah, that’s what I have, yeah. It works well, yeah. But, yeah, so, no, I think we could talk about it forever, I think that there’s just lots of little intricacies here. There’s one more example, by the way, which I thought for me was recent that was an interesting one on the perspective on the use of machine learning and how it’s so hard if you talk about me creating empathy for the user, it’s so hard for me to do that sitting in Amsterdam.

Paul van der Boor: So, we work with credit models, and we were discussing, okay, so there is regulatory requirements about how you’re open about some decisions that you’re making with regard to credit. And in my world, when you get denied credit in the Netherlands, you have the right to an explanation why you got denied credit. And that’s kind of how the regulatory system has been set up in Europe, so when you get denied credit, it’s sort of as a potential opportunity has been taken away from you, so that should be really justified and there’s this very strict process and blah blah blah.

Paul van der Boor: If you go to South Africa where we have also business, it’s completely different. Because what has happened in South Africa, banks actually have had in the past developed a practice that was basically getting people to buy their credit, knowing that they would default, knowing that they would then be able to seize their assets, knowing that they would then make a lot of money down the road because they could claim a whole bunch of collections based on that defaulted credit. And so the protection of the South African law on credit is much more around if you give somebody credit, you better know that they understood that they will not default, that you understood that they will not default, and that they also understood that all of the consequences of what happens if they do default.

Paul van der Boor: Because people aren’t so literate around credit, and so the practices were very different, and I think that translated into a very different, it’s like asymmetry. So, a cost of a mistake in the Netherlands is around getting denied credit. The cost of a mistake in South Africa, on the model, is if you give somebody credit that shouldn’t have, so the false positive. And, anyway, these kinds of contextual things are very hard to sort of be able to guess if you’re not very, very in tune with the local practices and the system and whatnot, so.
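The asymmetry described here can be sketched in a few lines of Python. This is a hypothetical illustration, not the approach used at Prosus or required by any regulator: the applicant scores, the candidate thresholds, and the cost numbers are all invented, and "false approval" and "false denial" are names chosen for this example. The sketch only shows that the same scores can lead to a different approval threshold once the two kinds of mistake are weighted differently.

```python
def total_cost(applicants, threshold, cost_false_approval, cost_false_denial):
    """Sum the cost of mistakes when approving scores >= threshold.

    `applicants` is a list of (estimated_repay_probability, repaid) pairs.
    A false approval is credit given to someone who did not repay;
    a false denial is credit refused to someone who would have repaid.
    """
    cost = 0
    for p_repay, repaid in applicants:
        approved = p_repay >= threshold
        if approved and not repaid:
            cost += cost_false_approval
        elif not approved and repaid:
            cost += cost_false_denial
    return cost


def best_threshold(applicants, thresholds, cost_false_approval, cost_false_denial):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        thresholds,
        key=lambda t: total_cost(
            applicants, t, cost_false_approval, cost_false_denial
        ),
    )


# Made-up toy data: (model's estimated probability of repaying, actually repaid)
applicants = [
    (0.9, True), (0.8, True), (0.7, False),
    (0.6, True), (0.4, False), (0.3, True),
]
thresholds = [0.2, 0.5, 0.75]

# Context where wrongly denying credit is treated as the costlier mistake
lenient = best_threshold(applicants, thresholds,
                         cost_false_approval=1, cost_false_denial=5)

# Context where wrongly granting credit is treated as the costlier mistake
strict = best_threshold(applicants, thresholds,
                        cost_false_approval=5, cost_false_denial=1)

print(lenient, strict)
```

With these invented numbers, the first context favors the lowest threshold and the second favors the highest, even though the model and the data are identical. The cost values themselves are exactly the kind of local, contextual judgment that cannot be guessed from outside the system.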

Oana Amaria: It sounds like such a shake up, literally someone’s shaking you up for your money.

Paul van der Boor: Yeah.

Oana Amaria: That’s a fascinating example.

Jason Rebello: Paul, so, there’s so much to continue to unpack here. And I want to make sure, to your earlier point, that we don’t make AI out to be the boogeyman, so to speak. Can you share a little bit about the types of tools or even efforts in the AI field that are really making you hopeful and excited about just, A, the work that it’s doing in general, but also to your point, the ability for it to combat some of the very things that we brought up on our chat today?

Paul van der Boor: Yeah, no, and to my comment earlier, I’m also going to be the last to say that we don’t have a lot of challenges and a lot of room for pitfalls still, not least of all the fact that as a community we’re just starting to appreciate this, and I’m a little bit biased because I work with people that are thinking about this a lot but there’s still the machine learning community out there that’s applying a lot of this stuff in the real world, and I think many practitioners just don’t have this on their daily, let’s say, list of things to think about.

Paul van der Boor: So there is still a lot of work to make sure this becomes a broadly accepted set of things to think about when we think about use of AI, fairness, transparency, privacy, those are the things that we’re working on, we’re making a lot of progress, but I think broader support and awareness of that is really needed, and in fact, when we talk about diversity and inclusion, I think that’s a huge, if you look at the teams doing machine learning, there’s a lack of diversity and therefore inclusion.

Paul van der Boor: And the work you guys are doing and the session we did together to me was a reminder of how much we can still do on that front and in particular when we think about machine learning teams that are affecting many decisions of that product and people that use those products, it’s super important for those teams to be diverse, because then they can represent the voice of the users that are impacted by that. So, anyway, I just wanted to also say that I recognize there are a couple of very, very big challenges that we still need to overcome. Now, the things I’m excited about is that we’re also doing things in the community to make it easier and almost mandatory to think about these things. I’ll give you an example.

Paul van der Boor: So, NeurIPS, which is the biggest AI conference globally, brings together leading researchers that are working on AI research problems. This year it took place virtually, of course, but this was also the first year that it was required for all paper submissions to include a broader impact statement, so that the researchers writing the paper should also at least think about, okay, how could this model be vulnerable? How could it be misused? How could these techniques potentially have repercussions down the line?

Paul van der Boor: What data was it trained on? Because in machine learning, especially in research, we train a lot of the models on public datasets. And those are generated by somebody else. And often those datasets have, let’s say, a lineage, they were compiled of other datasets and other datasets. And in fact the face recognition issue you pointed out earlier, Jason, the reason that was an issue is because the data sets didn’t have the right representation for the target audience they were used on, in this case law enforcement.

Paul van der Boor: And that’s a huge issue, think about the cost of a mistake there. To our earlier point, if you target the wrong people or are unable to recognize people then that’s very, very harmful. And so as part of the NeurIPS submissions requiring people to think about that sort of forces all these academic researchers to start thinking about this now. This is not optional anymore. So we need to start thinking about how those models might have impacts on other things than just the direct research questions we’re trying to answer.

Paul van der Boor: But there are many other things. I think there are whole lines or groups of researchers working on privacy-preserving machine learning. So, how do you make sure that as you sample from your data, as you train the models, you test sufficiently, you basically have a good understanding of the representativeness of the training data versus the data you’re actually going to be using in real life. And there is a pretty big, I shouldn’t call it movement, a group of people who are now increasingly using synthetic data to basically train machine learning models, not only to train machine learning models, for all sorts of purposes.

Paul van der Boor: But the way synthetic data works, I’ll just also describe that, is basically it takes a version of data that, for example, describes your customers and the transactions they made or something like that, and is able to recreate synthetically a new dataset that has all the same statistical properties but doesn’t include real people. And you cannot then find those real people in that synthetic dataset, but that synthetic dataset can still be used to train machine learning algorithms.

Paul van der Boor: And so it also makes it easier to then share that dataset throughout the company, to say, okay, well, we’ve got this data about our customers but because of regulations like GDPR it’s sometimes harder to actually learn from the different datasets or even access them because they’ve got private information, and so these synthetic datasets can now be shared to actually start training models in a way that doesn’t risk exposing the users’ data in the process.

Paul van der Boor: And there’s a whole range of these kinds of techniques that are now becoming available and mature enough for us as machine learning practitioners to actually start working with in a way that actually helps the owner of that data, or the person that’s describing that data.

Jason Rebello: Based on what you were saying, something that comes up in our trainings a lot is the impact of transparency. And it sounds like some of the themes that you’re talking about tie into the importance of transparency in how AI is implemented, where the humans are involved, they call it humans in the loop, where the humans are in the loop, who the humans are that are in the loop. I just want to give you an opportunity to kind of tease that out a little bit more and talk a little bit more about just your view on the impact and power of transparency in the AI space.

Paul van der Boor: Yeah. This is one that is easier to talk about than it is sometimes to work with, on a day to day basis, because for me, transparency in contrast to fairness and bias is harder to measure sometimes. It’s like if you think about responsible AI, there’s many things we think about in that context. So, transparency is one, bias and fairness is another, privacy is another one, explainability is another one, and there’s a couple of these themes.

Paul van der Boor: And some of them are much easier to talk about and agree on a conversation like this and say everything, transparency is important. And nobody would disagree with that, right?

Jason Rebello: Right.

Paul van der Boor: But then when you actually start working with people that are working on these kind of projects, then it’s like, okay, well, what does that mean? The same is true with explainability as an example. But in general I think just to comment on some thoughts regardless that it’s hard to measure and I think the practices are less mature in general because it’s hard to define as well, some things that come back, if I am a user that is experiencing a decision that was made by a model, first of all I want to actually know that a model is involved there.

Paul van der Boor: And then second of all I also want to know, let’s go to the credit case, well, what are the reasons that I didn’t qualify for that credit? So, was that because my income wasn’t high enough, or because I defaulted on some debt in the past? Or any of the other reasons you would typically be entitled, especially in credit, it’s well regulated, to sort of contest or figure out why you got denied a loan.

Paul van der Boor: So, in that space, it’s reasonably well, let’s say, regulated that the decisions of some of the models should be transparently available to the people on the receiving end of those decisions. But you also want to know, what else impacted my decision? Was the final decision made by an algorithm? Probably not, because it’s not allowed, so you probably had a human in the loop like you mentioned, Jason. So what did that human actually do? Did they get some sort of recommendation, like a score? A green, orange, red kind of thing, and then they had to based on some other information make a final decision?

Paul van der Boor: Or did they just sort of approve the decision of the model? So, thinking about that entire process, and these little things matter because if you kind of say binary yes or no, so, okay, the model says this person shouldn’t get credit or the model says this is the probability of default that we are able to give. Or the model says this is the probability of default that we give with this much confidence, these are the error bars in our decision, turns out we know nothing about Paul.

Paul van der Boor: So the machine gives you an answer, because it always will, but it’s equivalent to flipping a coin. So all these little things, if you think about transparency, based on what information was the decision made? How much uncertainty was involved? Who else was involved that, regardless of the model, let’s say, that might have affected that? So, again, you look at the system. And machine learning is an easy way to hide part of the decision making in a way, and I think that’s the risk that we run is that previously you had a set of processes and forms you fill out and then a human sort of made the decision, an expert or whatever, and then we kind of could live with that.

Paul van der Boor: Now there’s a next, additional, new thing which is this model, and people call it black box and sometimes it is a black box, really sometimes it isn’t depending on what technique you use, but still it might seem so from the user’s point of view because they can’t access the model or contest the decision of the model. And so that’s where, let’s say, because now there’s a model as part of the decision process, the system, you want to know that, you want to know what the model does, what it learned from, how accurate it was, how confident it was, and so forth, and all these other things that would have, let’s say, a real impact or consequence on you on the receiving end of that model.

Oana Amaria: This is so interesting to me because just like you said with transparency or how you qualify fairness or how you qualify a lot of these pillars of the system, it’s similar to in our space, how do you define inclusion? What’s inclusive? And we go on and on about competencies or behaviors and identify behaviors and I think that’s actually super fascinating, because we don’t necessarily have it figured out yet.

Oana Amaria: And with the false positives, how many clients do we have that are like, “Look, we have such a huge trust score.” It’s like, yeah, but what does that mean? And who did you ask? And how did you ask? So I love your reference of systems, because that’s a huge part of our work and operating within that system just as an individual, your individual system, but also as a part of an organization, as a part of a team. And this piece around decision making, it’s kind of fascinating, not just within machine learning but also as human beings, we always want to know why.

Oana Amaria: And whether or not you get credit is one thing, and then what’s the decision process of me falsely getting incarcerated? I mean, taking it down the gamut of impact, to your point, I don’t want to go down that rabbit hole because I think that’s a whole other piece, but I did want to ask you because I thought that our conversation was so profound and it gave me a chance to reflect as a D and I practitioner and say I think people like Paul have a lot of great insights for us as practitioners, so one of my questions to you is what is your challenge for us or your request?

Jason Rebello: Your ask.

Oana Amaria: Your ask for D and I practitioners to say, “Hey, we need you to level up and get to this level because this is what it’s going to take to impact and to join this fight like all the other people that are part of this fight”, what would that be for you?

Paul van der Boor: That’s a really tough question, Oana. So, I was going to try and give you my perspective, which is probably, I’m not sure how much it’s worth given that that’s not my expertise. But I think maybe I would go back to the point that I mentioned earlier which is that because, let’s say, in the machine learning community we for a long time, and others by the way too, for a long time said, “Well, it’s the data that’s biased, and if only we fixed the data, then the model will be fine and everything will be fine.”

Paul van der Boor: And we learned, through many mistakes, and are also recognizing, hopefully as a broader community, that machine learning is gaining traction, is becoming part of more and more important decisions like you mentioned, bail, whether you get bail or not, whether you get treatment from some sort of medicines, all these different things that are much more than just, say, a recommendation engine on Amazon, let’s say.

Paul van der Boor: We recognize that we can’t just wave it away and say, “Okay, it’s the data that’s biased”, there’s a whole set of other factors that affect how good or bad a model can contribute to us, as a community and a society. And that means that we start to recognize other things that we need to fix like the lack of diversity in some of the teams that develop these models, with respect, if you compare to the population that actually receives these models. And how do you actually get sort of organizational empathy for the people that you’re in the end affecting?

Paul van der Boor: Some of them are like me, many of them are not like me. And so how do I make sure that I test my model and the system around my model and design all of that in a way that I think meets the bar for a responsible use of AI? And it shouldn’t just be my decision, it should also be the user’s decision. They should have, again to your point, the transparency and the understanding to some extent and the ability to contest and to inquire about how that model is being used.

Paul van der Boor: So, I think my call would be, help us become more of a diverse and inclusive community. I think that’s the challenge that you probably still see a lot of resistance against, and that’s only the start. Because once you have a more diverse and inclusive community and teams working on these kinds of technical problems, hopefully down the line you also get better models and you end up doing better things in the world.

Oana Amaria: Yeah. I’m going to throw this out here, but I wonder, and we ask this question a lot in our sessions, what do you think is disincentivizing from getting that diverse team? And I know there’s a lot in the context of the pipeline, and I know we’re stealing from, “Did you do physics? Okay, well, we’ll take you into AI”, there’s very creative ways of sourcing some of that talent. But what’s working against you? Other than time and we’ve talked about bias for action and how there’s never enough time to fill roles, so…

Paul van der Boor: Yeah. Well, I think the first thing working against me is I’m a white male. So I think I’m not part of, let’s say, the diversity I want to attract to this field. I’m on the side that we have too many of already. But that’s not necessarily a problem, I think this is a truly complex problem, for me personally, I’ll be really honest about this. Because, and I mean, the problem of getting more diverse teams, and I have gone through so many cycles on this, one that says, well, the pool of people out there is not diverse enough.

Paul van der Boor: And that’s partially true, I mean, it’s about 15% of the data science professionals out there are women, depending on which country you look at and if you look at university graduates or whatever, but that’s more or less the number. So that’s not a good starting point. But it would also be too easy for me to say, well, that means I’m only going to go for 15%, because the other way to look at it is, well, listen, let’s say I’ve got a team of 10 people and I only need to find five good data scientists to make 50% of my team. Those are there. I mean, they’re definitely there, so what should I do to get those there?

Paul van der Boor: Should I make a more attractive proposition? Do all those things we talked about earlier, make sure their resumes are given appropriate screening, give it enough time. And I think there are many ways that you can actually do that and I’m definitely not the best one that can tell you how to do it, but I’ve seen teams, one of the fun things we get to do is look at startups in this space, and there are startups out there that basically from the get go, and I mean AI startups, commit to saying, “Okay, we want to have 50% diversity on gender.”

Paul van der Boor: And they’re able to do that. So I think it is a complex issue because, yes, there is a lack of, for example, gender diversity in the talent pool, but you can do things locally, and I mean locally in your company or in your own teams to beat the odds and then we also know that actually diversity is only the beginning. I learned that from you guys. That there’s also, once you have a diverse team, you still need to do lots of things to make sure that everybody on the team is heard equally, and that there is inclusion and that people feel belonging.

Paul van der Boor: All new words that I learned from you guys, what they actually mean, and that you can also measure them, there’s ways to measure those. So, and also, by the way, that diversity isn’t just on one dimension. There’s lots of dimensions you can measure diversity on, diversity of thought, and so forth.

Paul van der Boor: And so, I think you need to, one, recognize that there are some things you can’t affect out there, and that are playing against you, like the field, the talent pool is not diverse, on the other hand there are many things that you can do that are directly impacting your team or mentoring women around you or other kinds of minorities that are trying to get into the field, helping people making sure that you can, let’s say, if you take a longer run, let’s say I hope to be in this field still in 10 years, then how do I want the field to look in 10 years?

Paul van der Boor: Because if you take that horizon, all of a sudden you have a whole new set of options. Make sure you talk to people that want to get into the field, maybe inspire younger groups of women and girls to get into this area. So I think there are lots of different ways to look at it, again, I’m an optimist so I think you should do a little bit of everything. But at the same time it’s also a complex problem.

Oana Amaria: You know, it’s so interesting, it made me think of, I’m sure you’ve heard of the organization Girls Who Code?

Paul van der Boor: Yeah.

Oana Amaria: And so it’s this whole initiative for the future programmers and future leaders, and it makes me think of, anyone that’s listening, what’s your version of that for AI? Not you particularly, Paul, but I feel like that sounds like an opportunity in the field to say how do we make it cool? Just like you have robotics competitions, let’s make this cool, let’s talk about the cape that I was saying before. You could be a true superhero in our world and what the future looks like.

Oana Amaria: And I think, I mean, my daughter is almost four, and I just think about 10 years from now, your 10 years, she’s a prime target audience for when you become impressionable enough to be interested in something like that. And to me that’s super fascinating.

Paul van der Boor: Yeah. I mean, and there are many examples in the field, not enough, but there are many examples of female role models that have done amazing things. Some of them started earlier like Fei-Fei Li, who was one of the pioneers in the field of computer vision. There are conferences like the Grace Hopper Conference For Females In Computer Science. So there are lots of these initiatives that in a way we also try to support by sending people from, for example, the Prosus team there.

Paul van der Boor: But with Data Science For Social Good that’s a good example where we had the luxury that we always had so many applicants. So as I mentioned, the Summer Fellowship, where people would come and spend three months with us and we’d work with them on real projects and teach them about how to do data science in a social good context, we had so many applicants that we had the luxury to pick a diverse pool. And we could’ve actually said we want 50% men and women, we want a representative group of people geographically and ethnically and also from an educational background, so we had social scientists, computer scientists, mathematicians from different seniority, some PhD, some undergrads.

Paul van der Boor: And that’s also where my personal conviction comes from that having diverse teams gives better outcomes. And especially in something like machine learning and AI. And so making sure that we create those kinds of forums, whether it’s Grace Hopper or Data Science For Social Good or helping people recognize the female role models in the field, and there are plenty, really, really enough, making sure that we invite those to our conferences to speak and giving them the visibility they deserve because they’ve been having to fight for years.

Paul van der Boor: Imagine someone like Fei-Fei Li or Hilary Mason, there’s a whole list of these women, they were already accomplished in the field 10 years ago, what they had to go through to get there. And so I see how hard it is still today for some of them and I can only see part of it because I’m not experiencing it directly myself, there’s still a long way to go. And you still see, unfortunately, lots of unfortunate examples of that, with the Google ethics team recently you might have heard of and things like that, that sometimes are discouraging and make you feel like we’re regressing, we were making progress, two steps forward and one step backward.

Paul van der Boor: And so definitely we’re not there and we need to keep doing that. Luckily I think for your daughter there are many female role models out there, if she wants to meet some I’m happy to make sure she gets to speak to them when she wants to.

Oana Amaria: I will take you up on that for sure. No, I so appreciate it and I thank you so much for all that you have shared with us and on a positive note I think even with the ethics and the reckoning that’s happening at Google, I think there is a benefit and the benefit is that you have a lot of people that have been maybe awakened or had to take a stand and make a choice on how they feel about that publicly. So if you look at her Twitter feed and if you look at just the conversations that it has ignited, I think that in itself is not a small thing.

Oana Amaria: So, as painful as that process may be, I think there is a benefit in creating awareness around what’s happening, even though it’s continuing. The story evolves. But I think there is a lot of hope in that too.

Paul van der Boor: Yeah, well, yes and no, I think indeed it does awaken people to the fact that these are issues, but also we want to make sure that we don’t kind of look around and say, okay, well, Google has an AI ethics team and Microsoft has their fairness and transparency and ethics AI team, and Facebook has whatever, and I don’t think Facebook even bothers by the way. But anyway, so they have these groups, and they pretend to actually want to be part of the debate, but actually they’re not.

Paul van der Boor: I don’t want to say it’s guilt washing, I don’t think it’s that deliberate. But I do think that given the disproportionate amount of, in a way, power and sway that these companies have in the community, they should be doing way more. And they also reap so much benefit from the community that, yes, I do think that this recent event actually does force people to recognize we’re not there yet, but I don’t think that’s enough. So I think we should aim for more.

Oana Amaria: Yeah. Fair enough. Awesome. Well, thank you, thank you so much, Paul.

Paul van der Boor: Thank you for having me, and great to speak to both of you again.

Jason Rebello: Thanks a lot, Paul.

Podcast, Stories from the Field

February 19, 2021