Stop Calling Data the New Oil for AI - What Really Drives AI Success

00:00:00: Welcome to Designing AI Heroes, where AI and people align to drive productivity and innovation.

00:00:10: This is the podcast that empowers businesses and individuals to integrate AI into their

00:00:15: workflows and workplace, unlocking their full potential in the digital age.

00:00:21: We bring you insights, strategies, and real-world applications to help you be an AI hero and

00:00:27: stay ahead.

00:00:29: Let's dive in.

00:00:33: Welcome to another episode of the Designing AI Heroes podcast. We often hear that data

00:00:40: is the new oil when it comes to AI, but is this really true, or is it just a catchy new

00:00:46: slogan?

00:00:47: Today, we are unpacking the myths and realities of data in the age of AI, and I'm very happy

00:00:55: that I can speak for the second time with my guest, Sofia Raffa.

00:00:59: She's really an expert when it comes to data in AI, and Sofia, welcome back.

00:01:06: And please introduce yourself again, and also explain briefly why today's topic

00:01:11: really matters.

00:01:13: So hi, Nadine.

00:01:14: Hi, everyone.

00:01:15: I'm Sofia Raffa, a digital marketing and AI consultant, helping startups, enterprises,

00:01:20: and mid-sized companies to grow with digitalization.

00:01:24: Today's topic really matters because nowadays people say that data is

00:01:31: the new oil. This quotation comes from Clive Humby, from

00:01:37: 2006, but it's still valid in 2025.

00:01:42: So why was oil valuable?

00:01:46: Because once oil is refined, it's great for powering cars, engines, and

00:01:54: infrastructure.

00:01:55: And this also applies to data.

00:01:58: Raw data is nice to have, but basically you can't do anything with it until you refine

00:02:03: it, and then it becomes powerful.

00:02:09: So data is everywhere, but without management and some structure, you cannot really use it.

00:02:15: You can't see the value and you can't utilize it.

00:02:19: That's why refining the data, building data pipelines, engines, and governance, and

00:02:25: then later on using it in AI is the real magic.

00:02:29: So comparing data with oil: raw data is like crude oil, thick,

00:02:36: messy, and actually unusable, but once you refine it in order to use it, it fuels

00:02:44: innovation and the competitive edge.

00:02:47: So unlike oil, data is renewable.

00:02:50: So the more you use it, the more information you can get out of it.

00:02:54: So the analogy is useful, but it's incomplete. As a takeaway, data is more like

00:03:01: renewable energy than oil.

00:03:02: Exactly.

00:03:03: So you can use it again and again; it's not gone when you use it.

00:03:09: The energy is not used up or transformed into something different.

00:03:14: Exactly.

00:03:15: So the better you can define the context in which you are using the data, the more value

00:03:21: you can also generate out of it.

00:03:25: From your experience, Nadine, because you are also a consultant when it comes to business

00:03:30: and AI, what distinguishes good data from bad data when it comes to AI applications?

00:03:38: What I see with my clients and companies is that the biggest mistake, when

00:03:44: it comes to an AI project, is that they choose quantity over quality.

00:03:50: It's not about collecting as much data as you possibly can, but about identifying the

00:03:56: high-quality data you can use for the AI project, because when you don't have the right data,

00:04:02: it's garbage in, garbage out.

00:04:04: Yeah, it's like prompting garbage in, garbage out.

00:04:07: It's the same with data: if you put garbage data in, you get garbage out of your AI project.

00:04:14: So that's the biggest mistake I see with companies. They say, "Oh, we need as much

00:04:18: data as we can get," and I say, "Okay, start small, identify one focus, and then build

00:04:25: from there, clean your data."

00:04:30: And many don't know where the data is coming from, from which system.

00:04:34: So first they really have to identify, "Okay, this data is here, this data is there."

00:04:39: So there are different systems, and then the challenge is how to get

00:04:45: the data out of the systems and then clean it so that you can use it for an AI project.

00:04:50: So good data is clean, accurate, and consistent, relevant, and representative, you can say.

00:04:55: Absolutely.

00:04:56: And bad data is incomplete, biased, outdated, or mislabeled, and it really misleads when

00:05:02: it comes to AI.

00:05:04: Yeah, at this point, I just want to give a little overview of what labeled data

00:05:09: means, because many times people think, "Oh, I have tons of data, and yeah, this is social

00:05:14: data, and this is market research data, or this is service data," but labeling means

00:05:21: that you are giving tags to the data, so AI can also understand it.

00:05:28: And the better it understands, the better the results it's going to generate.

00:05:31: Learning from labeled data is what we call supervised learning.

00:05:34: So the less labeling all this data gets, the less accurate the results can be.

00:05:40: There are also unsupervised machine learning methods for this, but at

00:05:46: the end of the day, the machine is going to figure out some patterns itself instead of using

00:05:52: already available accurate labels, and then you are going to get the outcome

00:05:58: based on these assumptions.

00:06:00: Specifically in my area, marketing, we have tons of data everywhere:

00:06:05: CRM, social data, customer reviews, all types of data, and some of it is labeled,

00:06:14: some of it is not.

00:06:15: That's why there is a really good machine learning technique, pseudo-labeling, which

00:06:21: means that we use a small amount of labeled data, and then we generate labels for the

00:06:29: unlabeled data based on it.

00:06:31: So that's a really good way to use 90% of your data for something meaningful

00:06:38: and useful.
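
The pseudo-labeling idea described here can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in any particular project: the two numeric features per customer review, the nearest-centroid classifier, the confidence formula, and the 0.8 threshold are all assumptions chosen to keep the example small and dependency-free.

```python
import math

def centroids(points, labels):
    """Average the feature vectors of each label (a tiny nearest-centroid 'model')."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(point, cents):
    """Return (label, confidence); confidence is near 1.0 when one centroid is much closer."""
    dists = sorted((math.dist(point, c), label) for label, c in cents.items())
    (best, label), (second, _) = dists[0], dists[1]
    total = best + second
    confidence = 1.0 if total == 0 else 1.0 - best / total
    return label, confidence

def pseudo_label(labeled_x, labeled_y, unlabeled_x, threshold=0.8):
    """Train on the small labeled set, label confident unlabeled points, then retrain."""
    cents = centroids(labeled_x, labeled_y)
    accepted, rejected = [], []
    for point in unlabeled_x:
        label, confidence = predict(point, cents)
        if confidence >= threshold:
            accepted.append((point, label))   # keep the generated label
        else:
            rejected.append(point)            # too uncertain, leave for human review
    new_x = labeled_x + [p for p, _ in accepted]
    new_y = labeled_y + [l for _, l in accepted]
    return centroids(new_x, new_y), accepted, rejected

# Hypothetical features per review: (share of positive words, normalized star rating).
labeled_x = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labeled_y = ["positive", "positive", "negative", "negative"]
unlabeled_x = [(0.95, 0.9), (0.1, 0.05), (0.5, 0.55)]

model, accepted, rejected = pseudo_label(labeled_x, labeled_y, unlabeled_x)
print(accepted)
print(rejected)
```

The ambiguous review in the middle of the feature space is deliberately not labeled: keeping a confidence threshold is what stops pseudo-labeling from feeding its own mistakes back into the model.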

00:06:41: And when it comes to data usage and data handling, what do you think are the biggest mistakes?

00:06:48: Do you have anything from your experience?

00:06:51: Oh, yeah.

00:06:53: If you think about it, we are here in the middle of Europe, in Germany, and the GDPR is

00:07:00: applied most strictly here.

00:07:01: So we have to do a double opt-in in order to define, okay, which user wants what type of

00:07:08: data.

00:07:09: Many times what I hear from marketing people is, "Please don't touch it."

00:07:13: And then I say, okay, but then why do you collect it?

00:07:17: So many companies are collecting data, and then they simply say, "Oh, please don't

00:07:21: touch it, because it's super sensitive."

00:07:24: But if you don't use it, if you don't generate something meaningful, if you don't know where

00:07:30: the data comes from, and here I'm referring back to your point about data mapping and inventory,

00:07:36: so where is my data, where does it come from, and where do I save it?

00:07:42: If you don't know this, then you are going to be in big trouble.

00:07:47: And also, many times, because the teams do not really want to

00:07:52: do too much with this so-called sensitive data, the data is dirty.

00:07:57: That means there are tons of duplicates, missing fields, and outdated records.
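
As a rough illustration of what checking for these three problems could look like, here is a minimal Python sketch. The record layout (`email`, `country`, `updated`), the one-year freshness limit, and the function name `audit_records` are assumptions made up for this example, not something taken from the conversation.

```python
from datetime import date

def audit_records(records, required_fields, today, max_age_days=365):
    """Flag duplicates, missing fields, and outdated records in a list of dicts."""
    seen = set()
    report = {"duplicates": [], "missing": [], "outdated": []}
    for index, record in enumerate(records):
        # Normalize the email so "Anna@Example.com " and "anna@example.com" match.
        key = (record.get("email") or "").strip().lower()
        if key and key in seen:
            report["duplicates"].append(index)
        seen.add(key)
        # A field counts as missing when it is absent, None, or empty.
        if any(not record.get(field) for field in required_fields):
            report["missing"].append(index)
        # A record counts as outdated when its last update is older than the limit.
        updated = record.get("updated")
        if updated is not None and (today - updated).days > max_age_days:
            report["outdated"].append(index)
    return report

# Hypothetical CRM export with one problem of each kind.
crm = [
    {"email": "anna@example.com", "country": "DE", "updated": date(2025, 3, 1)},
    {"email": "Anna@Example.com ", "country": "DE", "updated": date(2025, 4, 1)},
    {"email": "ben@example.com", "country": "", "updated": date(2025, 5, 1)},
    {"email": "cara@example.com", "country": "AT", "updated": date(2021, 1, 15)},
]

report = audit_records(crm, ["email", "country"], today=date(2025, 6, 1))
print(report)
```

A report like this only lists row positions; deciding whether to merge, fill in, or delete the flagged rows is still a decision for the team that owns the data.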

00:08:04: And then we must also talk about legacy data and legacy tools, because nowadays

00:08:12: people, especially at bigger companies and corporations, tend to forget that, oh, we have some

00:08:17: other database which no one really touches, but somehow our data still flows there.

00:08:25: That's also a really big issue, because then you don't have the whole overview of

00:08:30: all your data.

00:08:32: So AI cannot really fix broken data foundations. If your data house is messy, and if you

00:08:40: don't know where your data is, then AI just scales and expresses the mess faster.

00:08:47: Yeah.

00:08:48: I also see what you said, that there's a lot of fear of using data because, oh, we have

00:08:53: data about people, it's maybe private, we don't know if it's private, so we don't touch it.

00:09:00: And what I also see is data silos: there are different departments with a lot

00:09:08: of good data that you could use in one AI project, but they don't share it because they are afraid

00:09:15: to share it: "I don't know what you're doing with my data."

00:09:19: But an AI project is about breaking the silos down in companies, when we really think

00:09:25: about, okay, one department has this data, the other department has that data, and it's

00:09:30: only available for this project when we combine it and bring it together in one AI project.

00:09:37: That's also the fear I see in companies.

00:09:40: Yeah.

00:09:41: So this is a kind of change management on every level.

00:09:45: So change management when it comes to...

00:09:47: business ethics, change management when it

00:09:51: comes to business processes,

00:09:53: and change management when it comes to behaviors.

00:09:56: So you say, yeah,

00:09:59: this is our data, and we should do something with it

00:10:04: and put it to use.

00:10:09: But talking about data ethics, Nadine,

00:10:13: what do you think, how important is

00:10:15: data ethics in the development and deployment of AI systems?

00:10:19: It's absolutely crucial.

00:10:21: Without AI ethics, AI will backfire.

00:10:24: That's the risk that we see. I don't say don't use the data;

00:10:30: it's okay when we have identified the right data,

00:10:33: but just don't use it without thinking.

00:10:36: First, the data should have a purpose.

00:10:39: You should know what decision it supports,

00:10:42: or what you would decide with the data.

00:10:45: That's one thing.

00:10:46: So start with one focus, with one area.

00:10:50: Then the problem is the risk that the data can be

00:10:53: biased and lead to unfair decisions.

00:10:58: It may prefer one decision over

00:11:02: another, but decisions should be neutral,

00:11:05: and AI can only make...

00:11:07: So people say AI is biased, but

00:11:10: the AI system is not biased, the data is biased.

00:11:12: Exactly.

00:11:14: Yeah, the technology is not bad or biased,

00:11:16: but the data is biased.

00:11:18: So you really have to identify if the data is

00:11:20: neutral and can support the neutral decisions you would like to make.

00:11:24: Then it can also hurt privacy.

00:11:27: That means you lose a lot of trust from

00:11:29: your customers when they find out that you use

00:11:32: private data for another purpose.

00:11:34: Customers, and also employees, give you

00:11:38: their data for a given purpose.

00:11:41: Here in Europe, you have to tell people why

00:11:44: you collect the data and what purpose it has.

00:11:48: So this data is only for this purpose,

00:11:52: and if you use it for another purpose,

00:11:54: then you can lose a lot of trust.

00:11:56: Exactly.

00:11:57: So you have to have the right strategy for how to use the data.

00:12:01: Exactly. I also see a really big need for explainable AI.

00:12:07: That's also part of the EU AI Act,

00:12:10: which means that AI decisions must be

00:12:14: transparent and must be understandable to all of us,

00:12:19: so to humans and not only to the robots.

00:12:22: Take credit scoring:

00:12:25: AI should explain why a loan was denied,

00:12:28: not just give a number saying that you are not good

00:12:31: enough to get a bank loan.

00:12:33: Without that explainability,

00:12:35: AI is really like a black box, and it can really destroy trust

00:12:40: if you don't check what the outcome is and why you got this outcome.
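
To make the credit-scoring example concrete, here is a minimal Python sketch of a score that comes with its reasons. It assumes a simple linear scoring model; the feature names, weights, baseline, and approval threshold are invented for illustration and do not reflect any real bank's or regulator's rules.

```python
# Hypothetical weights of a linear scoring model (illustrative values only).
WEIGHTS = {"income_k_eur": 0.5, "years_employed": 2.0, "missed_payments": -15.0}
BASELINE = 20.0
APPROVAL_THRESHOLD = 50.0

def score_with_reasons(applicant):
    """Return the score together with the per-feature contributions behind it."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BASELINE + sum(contributions.values())
    # Reasons are the features that pulled the score down, strongest first.
    negative = sorted((value, name) for name, value in contributions.items() if value < 0)
    return {
        "score": score,
        "approved": score >= APPROVAL_THRESHOLD,
        "reasons": [name for _, name in negative],
        "contributions": contributions,
    }

applicant = {"income_k_eur": 40, "years_employed": 3, "missed_payments": 2}
result = score_with_reasons(applicant)
print(result["approved"], result["score"], result["reasons"])
```

With a linear model the explanation falls out of the arithmetic directly. For more complex models, separate explanation techniques are needed, which is exactly why explainability has to be planned for rather than assumed.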

00:12:45: Yeah. When it comes to AI agents,

00:12:47: an AI agent can make a decision itself.

00:12:49: So maybe it can also decide who to hire,

00:12:53: who to fire, or deploy a marketing campaign itself,

00:13:01: and that's what you really have to think about:

00:13:03: where should the human be in the loop when it comes to decisions?

00:13:08: Can the AI make the decision itself?

00:13:11: If that is too risky,

00:13:15: then you should keep oversight,

00:13:17: and when you let the AI make decisions,

00:13:19: then you need really strict data governance and compliance so that you can avoid these risks.

00:13:28: Yeah. Nadine, how do you see

00:13:31: companies implementing data governance,

00:13:34: and how well do you see it working currently?

00:13:38: It varies a lot,

00:13:40: from bigger companies to smaller companies.

00:13:47: I really see it in bigger companies, and it also depends on the industry.

00:13:52: I also work with industries

00:13:55: that, when it comes to the AI Act,

00:13:59: face real restrictions, like

00:14:03: the finance industry or the insurance industry,

00:14:07: where we really have personal data of

00:14:11: people for high-class or high-risk insurance policies.

00:14:16: So data governance really means having accountability,

00:14:22: ownership, and rules for managing data,

00:14:25: and it really comes down to the industry.

00:14:28: Some industries say, okay,

00:14:30: we are aware that we have really private data,

00:14:32: we need strict governance, and other industries

00:14:35: experiment a bit more.

00:14:37: But without governance,

00:14:39: you also get chaos, duplication, and no trust in data.

00:14:43: So you need somebody who says,

00:14:45: okay, we have an AI project and these are

00:14:48: the steps to clean and structure and refine

00:14:50: our data because people don't know how to do that.

00:14:53: So an example framework would start with who owns the data.

00:14:58: That's one question.

00:14:59: The second question is how data is created,

00:15:02: cleaned, and shared. Then the policies:

00:15:05: privacy, security, and compliance rules.

00:15:08: And then the platforms that use the data:

00:15:12: how secure the data is

00:15:13: and how the data is used in the platforms.
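
The framework questions above (owner, origin, policies, platforms) can be captured in a small data inventory. The Python sketch below is one possible shape for such a record, under stated assumptions: the field names, the `governance_gaps` and `may_use` helpers, and the example datasets are hypothetical and only meant to show the idea of answering the questions before an AI project touches the data.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in a data inventory; every field answers one governance question."""
    name: str
    owner: str = ""                                        # who owns the data?
    source_system: str = ""                                # where is it created?
    allowed_purposes: set = field(default_factory=set)     # what was it collected for?
    platforms: list = field(default_factory=list)          # which platforms use it?

def governance_gaps(record):
    """List the governance questions that are still unanswered for a dataset."""
    gaps = []
    if not record.owner:
        gaps.append("owner")
    if not record.source_system:
        gaps.append("source_system")
    if not record.allowed_purposes:
        gaps.append("allowed_purposes")
    if not record.platforms:
        gaps.append("platforms")
    return gaps

def may_use(record, purpose):
    """Allow a use only if the record is complete and the purpose was agreed to."""
    return not governance_gaps(record) and purpose in record.allowed_purposes

newsletter = DatasetRecord(
    name="newsletter_subscribers",
    owner="marketing",
    source_system="crm",
    allowed_purposes={"newsletter"},
    platforms=["email_tool"],
)
legacy = DatasetRecord(name="old_webshop_orders", source_system="legacy_db")

print(may_use(newsletter, "newsletter"))       # purpose matches what was collected for
print(may_use(newsletter, "model_training"))   # a different purpose is refused
print(governance_gaps(legacy))                 # unanswered questions for the legacy data
```

The legacy dataset is the interesting case: it is not forbidden, it is simply not ready, and the list of gaps tells the responsible person what to clarify first.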

00:15:15: This is really an expert role,

00:15:19: and I think companies should have it.

00:15:22: It's a new role when it comes to AI:

00:15:24: a data governance role,

00:15:27: a person who is responsible for data governance and

00:15:30: explains to people what to do with their data.

00:15:33: So it's really a new role,

00:15:36: and when it comes to bigger companies,

00:15:38: I see they now implement these roles,

00:15:40: or there is an AI governance function

00:15:42: that takes over this role,

00:15:44: and the smaller companies

00:15:46: struggle with it; there, maybe this is the CAO.

00:15:48: But there should be one responsible person.

00:15:50: It can be the CAO, or it can be

00:15:53: an AI governance or a data governance lead.

00:15:55: >> Yeah, I totally agree with you.

00:15:57: But on the other hand,

00:15:59: what I see many times is that

00:16:00: many users, many employees say,

00:16:03: "Oh, I'd rather not touch it,

00:16:06: so I don't have any issues."

00:16:08: But I think a data governance person or

00:16:10: a responsible person should also encourage

00:16:13: people to use it, but in a certain way.

00:16:16: In a way that lets you generate value with it

00:16:19: and also somehow promotes a data-driven culture,

00:16:23: where decisions are based on accurate data,

00:16:26: which also moves things forward.

00:16:30: I think that's also a really important point

00:16:33: in the whole data process.

00:16:35: >> Yeah, and I also hear,

00:16:37: "Oh, we don't do this AI project

00:16:39: because we are afraid of our data."

00:16:41: >> That shouldn't be the case.

00:16:44: Just take some time,

00:16:46: identify the sources, clean the data,

00:16:48: check it, and then really start small with

00:16:52: one focus and build from there.

00:16:55: But don't be afraid to deploy AI projects

00:16:59: because you are afraid of your data.

00:17:01: I mean, if you collected it,

00:17:02: it has a purpose.

00:17:04: When the AI serves the purpose

00:17:07: you collected the data for,

00:17:08: then I think not that much can go wrong.

00:17:12: >> Exactly, I totally agree with you.

00:17:15: Nadine, last but not least,

00:17:17: if you would give companies just one piece of advice

00:17:21: about how to work with data and AI,

00:17:25: then what would you say?

00:17:27: >> Start with what you have,

00:17:29: learn from that, improve along the way.

00:17:33: Never wait for perfect data,

00:17:35: just experiment and learn fast.

00:17:38: Also treat data as a product,

00:17:41: not as a by-product of AI.

00:17:43: It's really the most important thing in AI.

00:17:46: It's not a by-product, or "we need some data

00:17:49: to get the system running," because

00:17:50: the system is running on this data,

00:17:52: so it's a product.

00:17:54: >> Exactly.

00:17:55: >> There are more takeaways than one takeaway,

00:17:58: but it's really difficult to break it down to one sentence.

00:18:04: >> Yeah, I totally understand,

00:18:06: and we also have tons of experience with everything,

00:18:09: so picking only one piece of advice is challenging indeed.

00:18:15: I would also say that AI is a colleague,

00:18:18: so use it as a robot colleague,

00:18:22: and also share some data with it,

00:18:24: but keep in mind that it's only a colleague,

00:18:28: so not your second body or something.

00:18:31: That's also a really important point here.

00:18:34: >> Sofia, many thanks for this very interesting episode.

00:18:40: I hope it could give you some insight when it comes to data:

00:18:43: what you have to keep in mind,

00:18:45: what to do, and what risks it has.

00:18:48: But the message is really: identify one focus,

00:18:52: one area, clean your data,

00:18:54: build from there, learn and improve along the way.

00:18:57: I think that's the main message.

00:18:59: >> Exactly. Let's make things sophisticated together.

00:19:02: >> Sophisticated, I like this.

00:19:03: >> Okay. See you. Bye.

00:19:05: >> Thank you.

00:19:06: >> Ciao.

00:19:07: >> Ciao.
