Exploring The Integration of AI & Blockchain
Show notes
Welcome to Beyond The Screen: An IONOS Podcast, hosted by Joe Nash. Our podcast is your go-to source for tips and insights to scale your business’s online presence and e-commerce vertical. We cover all tech trends that impact company culture, design, accessibility, and scalability challenges – without all the complicated technical jargon.
Our guest today is Evangelos Pappas, Head of Technology at IndiGG, an NFT-based play-to-earn game in the web3 frontier. Join us as we discuss the new frontier of integrating AI and blockchain technologies. Discover how these powerful tools align to revolutionize data management, cloud computing, and more. Evangelos also shares insights on data privacy, open access to data, and the future of AI. Join us as we discuss the following:
- The role of data in AI algorithms
- What are data lakes?
- Using AI strategies to manage data lakes
- A closer look at the integration of AI and blockchain technologies
- Managing data privacy in AI and machine learning models
- Tips for businesses looking to optimize their cloud data for AI applications
Evangelos has over 14 years of experience in IT startups, developing software, and leading and delivering B2B platform software. He has worked as a Director and CTO and delivered IT solutions for multi-billion dollar companies in the FTSE 250. His skill set includes experience in the AdTech, Energy, and Data Analytics industries. He is also the founder and CEO of Ocyan, an enterprise working on a cloud operating system for enterprise blockchains. His current role is Head of Technology at IndiGG.
Show transcript
Evangelos Pappas Transcript
Intro - 00: 00:01: Welcome to Beyond The Screen: An IONOS Podcast, where we share insights and tips to help you scale your business's online presence. Hosting genuine conversations with the best in the web and IT industry, and exploring how the IONOS brand can help professionals and customers with their hosting and cloud issues. We're your hosts, Joe Nash and Liz Moy.
Joe - 00: 00:22: Welcome to another episode of Beyond The Screen: An IONOS Podcast. Today we're joined by Evangelos Pappas, who brings with him over 14 years of experience in IT startups, software development and delivering B2B platform software. He has a rich history in leading and delivering solutions for tech-IP for numerous companies, making significant contributions in the AdTech, Energy and Data Analytics industries. Evangelos is currently the Head of Technology at IndiGG, a revolutionary organization aiming to create the world's largest Web3 Gaming DAO. Simultaneously, he's also the Founder and CEO of Ocyan, an enterprise working towards providing an end-to-end Cloud Operating System for enterprise blockchains. And if that isn't impressive enough, he has his own venture, Evalonlabs, which focuses on developing scalable web platforms and messaging backends. As someone who's been on the front lines of technological innovation, his insights into how Cloud data should evolve with the new wave of AI will be invaluable. Evangelos, welcome to the show.
Evangelos - 00: 01:16: Hey Joe, thanks so much for the invitation, and I'm looking forward to the interesting chat.
Joe - 00: 01:20: Yeah, absolutely. And we went through a whole bunch of things in the intro. I love it when we get guests with big beefy intros, lots of things going on, because it leads me to ask: your career spans lots of roles, currently and also in the past. That's a lot of balls you're juggling, from Head of Technology at IndiGG through to running your own company. What is your journey and what drove you towards this particular sector that you're in now?
Evangelos - 00: 01:40: So I joined IndiGG after they acquired Metanomic, which was the startup where I was, at the beginning, the CPO. The whole vision of IndiGG is to create the largest Web3 Gaming DAO in India, with the aspiration to go global. And when you want to do things like that, you know, that becomes your primary focus: how you want to cohort your users, target them, understand them, serve the right content to them. So I would say the core of everything is, at the end of the day, data.
Joe - 00: 02:07: Yeah, interesting. And so you've worked in a lot of industries where, as you said, data is the core there from AdTech, Data Analytics, Gaming. What are some of the key moments or decisions that have shaped how you think about and handle that data?
Evangelos - 00: 02:22: As always, with all the technical answers, it's always "it depends", right? Data is pretty much nothing but numbers, right? And every time, you're trying to give meaning to the numbers that you have received, so the question is: what is the meaning that you are looking for in your use case? So you might receive the same data. And if you are a gaming company, you might want to understand why your user spends 10 minutes on level one and 13 minutes on level two, for example. But if you receive the same data and you're a brand or an advertising company, based on the same data, you would say, oh, on level two I have three more minutes of attention. So what can I do with those three more minutes, right? So data, as I said, is always about perspective. And this is something that we tried to do at Metanomic before: we tried to shift the perspective to how we create behavioral profiles for the users, right? So not just creating cohorts of what people usually like in commercial terms, as usually happens in the AdTech space, but of what people tend to do in terms of behaviors, right? So for example, does a user usually play at night instead of playing in the morning? And if that's the case, maybe that's a user that probably likes to buy more Red Bulls rather than shoes, for example. So that's a lesson that I learned from data: you might be receiving the same data, but your perspective always depends on your use case.
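A minimal sketch of the behavioral cohort idea described above, in Python. This is not Metanomic's actual method: the session log, the user names and the "night" cut-off hours are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical session log: (user_id, hour of day at which the session started).
SESSIONS = [
    ("ana", 23), ("ana", 1), ("ana", 22),
    ("raj", 8), ("raj", 9), ("raj", 21),
]

def time_bucket(hour: int) -> str:
    """Assumed cut-offs: 'night' is 20:00-04:59, everything else is 'day'."""
    return "night" if hour >= 20 or hour < 5 else "day"

def behavioral_cohorts(sessions):
    """Put each user in the bucket where most of their sessions start."""
    per_user = defaultdict(Counter)
    for user, hour in sessions:
        per_user[user][time_bucket(hour)] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}

cohorts = behavioral_cohorts(SESSIONS)
print(cohorts)
```

The same session data could feed a gaming team (which levels keep night players engaged) or an advertiser (what to show night players), which is the point about perspective.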
Joe - 00: 03:38: Right. Very interesting. So I guess talking about part of the use case of that data there, one that's very topical at the moment is the use of data for developing various automated algorithms or AI algorithms. How do you think the current Cloud landscape supports preparing and handling data towards building out those products and actually managing to create those curated experiences?
Evangelos - 00: 03:58: Yeah, well, actually at the moment, we are definitely seeing a pivotal moment for AI. So far, we have been delivering automation on data in the shape and form of machine learning algorithms, which is pretty much automation: you automate how you aggregate your data, or how you cohort your data, or how you do very specific predictions and things like that. But now we are at a moment of not just doing statistical analysis on what you're receiving, but doing something intelligent. And what does intelligence mean? Intelligence means something that you would otherwise need a person to do: to go and look up the answer and give you more context about it. So for example, there's a peak in your traffic, and you can definitely see that with this kind of analysis. But why is there a peak in your traffic? An AI will tell you, well, today there is a peak in your traffic because there's an event happening, right?
Joe - 00: 04:50: It's the interpretation of the data as well, right?
Evangelos - 00: 04:52: Exactly. So this is the next step in terms of solutions and in terms of how people are handling this data at the moment. We are quite mature in being able to do the first level of what I described, like statistical analysis and being able to get easy answers from the data that we are capturing. But the next step that we will see as a trend is much of the latter, like how do you use your data to receive more intelligence to your use cases?
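As an illustration of the "first level" described above, here is a small Python sketch of purely statistical peak detection. The traffic numbers and the z-score threshold are assumptions; it only flags that a peak exists, and says nothing about why, which is the gap the conversation assigns to AI.

```python
from statistics import mean, pstdev

def find_peaks(hourly_visits, threshold=2.0):
    """Return indexes whose z-score exceeds the (assumed) threshold."""
    mu = mean(hourly_visits)
    sigma = pstdev(hourly_visits)
    if sigma == 0:
        return []  # perfectly flat traffic has no peaks
    return [i for i, v in enumerate(hourly_visits) if (v - mu) / sigma > threshold]

# Hypothetical visits per hour; hour 6 is the spike.
visits = [100, 110, 95, 105, 100, 98, 400, 102]
peaks = find_peaks(visits)
print(peaks)
```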
Joe - 00: 05:18: I imagine the Cloud and the access to the compute that you need is a big driving factor in your capabilities. And to what extent you're able to achieve this intelligence. How are you finding that the current landscape is defining or supporting where you want to take those products?
Evangelos - 00: 05:32: Currently, I would say this is the primary challenge in any work that you need to do with data in a Cloud. It's actually the number one question. The main reason is that in order to do either of these two kinds of solutions, you have to do them where your data is, right? Where is it physically? Because obviously, you cannot easily have your interpretation algorithms in one Cloud or in one place and have your petabytes of data in another.
Joe - 00: 05:56: Yeah, just give me one minute while I transfer a petabyte of data between two Cloud providers. That's not a thing.
Evangelos - 00: 06:01: Exactly. Probably a date. So pretty much, there's a lot of vendor lock-in, or vendor fear, I would say: if I upload my petabytes of data to this Cloud provider, I'm probably quite stuck. Not because of the technology, but because I have a certain large commodity there. If, for example, Azure buys a better AI five years from now, or in the next decade, or Google or whoever else has a better product, I'm going to be stuck with less capable solutions while I have the right data to have achieved something better. So yeah, this is literally a fear that all the chief technology, data or information officers have.
Joe - 00: 06:33: I mean, obviously when you're dealing with data at large scale, the storage of it is in itself inordinately complex, but it's really interesting just to think that, as you said, years down the line, your capability has been defined by the simple decision of where am I parking this data, right? Like that's such a simple, relatively banal requirement compared to the processing of it, but it's such a pivotal decision. Very interesting. So in terms of the current Cloud landscape and the capabilities, you know, we mentioned looking towards the future and always thinking about what the capabilities of my platform would be in the future. For people who are making these decisions today, what are some of the trends they should be thinking about for their future providers as they're making those calls?
Evangelos - 00: 07:11: Yeah, absolutely. When it comes to Cloud, I would say pricing is always definitely the number one, right? Because when you are in front of your stakeholders' meeting and you're going to say that, well, we chose this provider X and it's going to cost us three times more than that one, you'd better have a good excuse. So pricing is the number one. Secondly, as I said, does this Cloud provider have a good record of meeting the needs of your industry? The key words here are "in your industry", because we do see different providers. Although, for example, AWS has the largest market share in the Cloud, there are very niche solutions that thrive better in different industries, for example.
Joe - 00: 07:46: And also, different industries have different regulatory environments.
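To put the petabyte transfer in rough numbers, here is a back-of-the-envelope calculation in Python. The assumptions are mine, not the speakers': a decimal petabyte (10^15 bytes), a dedicated 10 Gbit/s link that is fully used the whole time, and no protocol overhead or egress limits.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealised transfer time in days for a fully saturated link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day

PETABYTE = 10**15            # decimal petabyte, assumed
TEN_GBIT = 10 * 10**9        # hypothetical sustained link speed in bit/s

days = transfer_days(PETABYTE, TEN_GBIT)
print(f"{days:.1f} days")
```

Even under these generous assumptions the copy takes over nine days, which is why the location of the data tends to decide where the processing happens.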
Evangelos - 00: 07:48: Exactly.
Joe - 00: 07:48: Yeah, yeah.
Evangelos - 00: 07:49: It's actually regulation, right? And the last thing is location, right? We in the West have quite good support in terms of data centers from pretty much every provider out there. But look at the data centers that exist in Africa, for example, or in Asia. This will have an impact on your data transfer: you obviously won't be able to physically transfer petabytes between Indian and German data centers, and because of regulation, you may not be allowed to either. And thirdly, where are your users, right? Your users would definitely see a glitch of 50 milliseconds if your data is being sent from Germany to the local Indian data center, for example. So, I would say these are the three main factors in choosing the right provider.
Joe - 00: 08:29: Right. And I imagine things like topics like latency, especially for your industry, that's very gameplay limiting if you're going to have any kind of latency, which brings me on to, I guess, a bit of a pivot to how IndiGG are tackling these topics. So, we've heard a bit about the concept of a data mesh that is being played with at IndiGG. Can you tell us a little bit about this and how it's being used and what it's been developed for?
Evangelos - 00: 08:48: Yeah. So, this is pretty much my vision. In terms of the data mesh, this is the latest archetype in the data world, right? If you look at the history of how data has been managed so far in any industry, previously we had large databases that were just this kind of monolith. Databases with better data governance and better management, we started calling warehouses, and these warehouses moved from internal data centers to the Cloud. And then with the latest trends of the Cloud providers, we started evolving that into the notion of data lakes, right? The problem with data warehouses is that in order to store something, you already had to have very strict governance of your entities, very strict governance of your quality. And if something just wasn't fitting their logic, it was just breaking your whole application. That wasn't particularly applicable when it came to large data or analytical data, because what the last decade taught us is that you need to capture as much data as possible and then find out what you want to do with it, right?
Joe - 00: 09:46: And sometimes that's ad hoc and you don't know the current usage yet.
Evangelos - 00: 09:49: Exactly, right? So, this brought us to the data lakes, right? So, we started pretty much creating larger and larger data lakes, to the point that most of the data lakes became unmanageable, because we started collecting so much data that cataloging it became this unmanageable thing, right? Like, we have all these data points now and we're not sure what to do with them, right?
Joe - 00: 10:08: You go to the extreme opposite problem where the warehouse is too much structured and the lake is just like a complete unstructured, yeah.
Evangelos - 00: 10:13: Exactly, right? So, a data mesh is pretty much a kind of evolution of those two, a mixture, I would say. So, instead of creating just monoliths in these two worlds, why don't you create a kind of mesh of mini lakes and warehouses per domain or department? And department, again, varies depending on your business and industry and everything. Like for example, you are collecting data from Web3, you are collecting data from games, you are collecting data from commerce or your e-shop. So, why don't you localize this collection and this indexing, and then create a catalog that correctly routes each request? So, for example, you want to create a new application that fetches gamers depending on what they have bought, right? So, instead of having one large database with all this information in it, you get part of the information from this department, which has its own database, and the other information from the other database, merge them, and then serve the result. So, this is the data mesh: distributing your governance into more manageable mini-pots and then having a routing solution to serve it. It does require a bit more maturity in your data team, but it gives way more flexibility. So, the idea behind the data mesh is that it gives you much better flexibility to grow when you have multiple domains or industries to manage.
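The catalog-and-routing idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not IndiGG's architecture: plain dictionaries stand in for each domain's own database, and the store names, field names and user IDs are all hypothetical.

```python
# Each domain team owns its own small store (dicts stand in for databases).
GAMES_STORE = {
    "u1": {"favourite_game": "Axie Infinity"},
    "u2": {"favourite_game": "chess"},
}
SHOP_STORE = {
    "u1": {"last_purchase": "headset"},
    "u3": {"last_purchase": "mouse"},
}

# The catalog records which domain store owns which field.
CATALOG = {
    "favourite_game": GAMES_STORE,
    "last_purchase": SHOP_STORE,
}

def fetch_profile(user_id, fields):
    """Route each requested field to its owning domain and merge the results."""
    profile = {"user_id": user_id}
    for field in fields:
        store = CATALOG[field]               # routing step
        record = store.get(user_id, {})      # the domain answers from its own data
        profile[field] = record.get(field)   # None when the domain has no data
    return profile

profile = fetch_profile("u1", ["favourite_game", "last_purchase"])
print(profile)
```

The caller never needs to know where a field lives; adding a new domain means adding a store and a catalog entry rather than reshaping one large database.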
Joe - 00: 11:20: Right, fascinating. And to follow that through, that flexibility and that ability to gather data from those various domains, how's that influencing your AI strategy?
Evangelos - 00: 11:27: So AI is pretty much on top of it, right? So you wouldn't consider AI before those issues, right? So AI is pretty much this hungry kid who wants to eat more, right? So before you feed this thing, you need to cook, right? And data strategy and data governance are about cooking the right food. So AI is something that I'm personally, you know, still growing, to be honest. Like, we haven't managed to completely scale it, but the idea is, you know, as we said, right? Collecting the data is step one. Step two is making the right statistical models for you to understand what you have. And what does understanding mean? It means that you can use your data to start answering simple questions, right? Like, who are my current users? What is the trajectory? What are the cohorts, right? And things like that. Then you have the right maturity to go to the next step and say, listen, let's take the current cohort and ask something more, right? Like, take the cohort of the gamers that usually play at night and create some blog posts based on the activity that we see in our system, right? So this could be something that you could do with the latest AI trends, right? For example, you could use, you know, the ChatGPT API and say, I have this kind of users with this kind of behavior; what kind of blog post are you able to create to fit this specific persona of my users, right? So yes, AI is the next step in the maturity of your team.
Joe - 00: 12:37: Interesting. So to change gears a little bit and continue our tour of your experience, you're kind of astride these two big moments in the Tech Industry, you know, the current AI hype and the slightly earlier, but still very contemporary, blockchain and Web3. How have you found the integration of those two things and how has that affected your data management?
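A hedged sketch of that last step: turning a cohort you already understand into a prompt for a language model. The function name, cohort name, behaviors and audience size are hypothetical, and the sketch deliberately stops at building the text; sending it to any particular provider's API is left out.

```python
def build_blog_prompt(cohort_name, behaviours, audience_size):
    """Describe a cohort in plain text so a language model can write for it."""
    behaviour_lines = "\n".join(f"- {b}" for b in behaviours)
    return (
        f"We have a cohort of {audience_size} users called '{cohort_name}'.\n"
        f"Observed behaviour:\n{behaviour_lines}\n"
        "Suggest three blog post ideas that would fit this persona."
    )

prompt = build_blog_prompt(
    "night gamers",
    ["usually play after 22:00", "prefer RPG titles"],
    1200,
)
print(prompt)
```

The prompt is only as good as the cohort statistics behind it, which is why the statistical groundwork comes first.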
Evangelos - 00: 12:54: Yeah, so these are two technologies that don't have to be interlinked, but you can do great things with them together, right? So blockchain is a network of interactions, right? It's the way that you connect either people or computers or organizations together, right? So this is what blockchain answers. Now, why do you connect them? It's pretty much a double property. So for example, you want to create a new smart contract, right? And if this smart contract is just happening on the blockchain, you don't have a data problem. But if that smart contract needs to do something that is related to the outside world, then you have a data problem, right? Because you need to first understand the outside world and then send the right signal to this smart contract. So there are solutions where you understand the outside world with your machine learning models, and then you interact with your smart contracts through the architecture pattern that we call an Oracle. So, for example, you can create an Oracle that understands the signals from the outside world and then passes the right message to your smart contract, and the smart contract pretty much reacts to it. Or you can pretty much turn the tables and use blockchain as the network of your data points, the source of your data points, and say, hey, you know what, my users are interacting with me, but they're also interacting with other games. What do they do in those other games? What do they tend to like? And can I offer something on my platform that they also like in the outside world? I can see people are playing Axie Infinity. Maybe they like some other RPG kind of games or NFT-heavy kind of games. So you can write recommendation engines based on other blockchain interactions that you see in the network. So these are the kind of archetypes in which you can marry those two worlds.
But there are many interesting ideas that I see popping up, to be honest, like people trying to decentralize AI and things like that using blockchain. I don't have a specific project in my mind at the moment, but it's certainly one of those things that we read about every so often. I remain hopeful that something will come up quite soon. I would say decentralizing AI, whereby blockchain plays a pivotal role, is something where we're at a very early stage. So I would say the marriage between the blockchain world and AI is, for now, in the first two use cases that I described here.
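The Oracle pattern from the first half of this answer can be illustrated with a small off-chain simulation in Python. Nothing here touches a real blockchain: `RewardContract`, `Oracle` and `toy_model` are hypothetical stand-ins, and the "model" is a fixed threshold rather than real machine learning.

```python
class RewardContract:
    """Stand-in for a smart contract: it only reacts to messages it receives."""

    def __init__(self, trusted_oracle):
        self.trusted_oracle = trusted_oracle
        self.bonus_active = False

    def report_event(self, sender, event_detected):
        # The contract cannot see the outside world; it trusts one oracle.
        if sender != self.trusted_oracle:
            raise PermissionError("only the registered oracle may report")
        self.bonus_active = event_detected


class Oracle:
    """Observes the outside world, interprets it, and messages the contract."""

    def __init__(self, name, model):
        self.name = name
        self.model = model

    def relay(self, observation, contract):
        contract.report_event(self.name, self.model(observation))


def toy_model(visits_per_hour):
    """Placeholder for an ML model: flags an 'event' above a fixed threshold."""
    return visits_per_hour > 1000


contract = RewardContract(trusted_oracle="oracle-1")
oracle = Oracle("oracle-1", toy_model)
oracle.relay(1500, contract)
print(contract.bonus_active)
```

The design point is the separation: interpretation of outside data happens off-chain in the oracle, while the contract holds only the rule for how to react.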
Joe - 00: 14:52: I don't want to dwell on that too much, but now I'm kind of intrigued. From my limited understanding of both worlds, what data are they talking about putting on the blockchain in that case? The model parameters that we talk in, like the original training set, like what is it that lives on the blockchain in that case?
Evangelos - 00: 15:04: Again, right? So, you know, as I said, right?
Joe - 00: 15:06: All of the above.
Evangelos - 00: 15:07: Yeah, it depends, right? You said, you know, you're also taking care, right? So let's try to brainstorm, right? Like what if you had the prompt whereby this one was set in a smart contract? If you were to change it, you'd have to have a set of people to vote on it, you know, to vote on the change, right? And everyone would be sharing an open prompt that is defined on the smart contract, for example, right?
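Taking that brainstorm literally, here is a toy Python simulation of a shared prompt that can only change by majority vote. It is an off-chain sketch under my own assumptions (the `PromptRegistry` class, the simple-majority rule and the voter names are all hypothetical), with a SHA-256 fingerprint so anyone can check which prompt version they are using.

```python
import hashlib


class PromptRegistry:
    """Toy stand-in for a contract holding one shared, vote-governed prompt."""

    def __init__(self, prompt, voters):
        self.prompt = prompt
        self.voters = set(voters)
        self.proposal = None
        self.approvals = set()

    def fingerprint(self):
        """Hash of the current prompt, usable as a public version identifier."""
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()

    def propose(self, new_prompt):
        self.proposal = new_prompt
        self.approvals = set()

    def approve(self, voter):
        if voter not in self.voters or self.proposal is None:
            return  # ignore unknown voters and votes with nothing proposed
        self.approvals.add(voter)
        # Assumed rule: a strict majority of voters adopts the proposal.
        if len(self.approvals) * 2 > len(self.voters):
            self.prompt = self.proposal
            self.proposal = None
            self.approvals = set()


registry = PromptRegistry("You are a helpful game guide.", ["a", "b", "c"])
before = registry.fingerprint()
registry.propose("You are a concise game guide.")
registry.approve("a")
after_one_vote = registry.prompt   # still the old prompt: 1 of 3 is no majority
registry.approve("b")
print(registry.prompt, registry.fingerprint() != before)
```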
Joe - 00: 15:24: One of the issues that's been affecting, for example, the current ChatGPT, LLM thing, is people using a version of the OpenAI API from a month ago with a particular prompt, and then they update it ephemerally behind the scenes, and now that prompt no longer produces the result that it did. With this kind of putting a prompt on the blockchain, could that work both ways? Where if I know that I've got my prompt on a blockchain at this point in time, I can equally expect that the output shouldn't change for that prompt?
Evangelos - 00: 15:53: So this would definitely work on the Open Source model side. So if you're using Falcon, for example, as your AI model, this would definitely work, to be honest, and this would definitely create some great ideas. The specific problem with OpenAI is that they're not openly versioning the updates of GPT. So this is why you see people complaining about the drift of its performance.
Joe - 00: 16:12: And for Midjourney as well, yeah.
Evangelos - 00: 16:14: Yeah, exactly, right. So they're not openly versioning it for their own reasons. So I definitely think that someone will challenge this. For example, recently we saw Anthropic coming up with their own very strong competitor, the Claude 2 model, right? So Anthropic has a great chance here to actually challenge the openness of OpenAI and pretty much give us better versioning, right? But as of now, what you said is definitely achievable with Open Source. And I think Open Source models definitely have a competitive edge over the closed source providers because of that.
Joe - 00: 16:42: Yeah, I guess it's not surprising given that a lot of the talk about generative models over the last couple of years has been about the fact that they are black boxes, but it is interesting that it seems that reproducibility is gonna become a competitive edge in the short-term future. Everyone's kind of always thinking of them in terms of escalating reasoning abilities or intelligence abilities, like their ability to do surprising things that aren't just generating text, right? But actually, just getting it to do the same thing twice seems like the dominant real-world use case challenge.
Evangelos - 00: 17:07: Exactly.
Joe - 00: 17:08: Yeah. So we've been talking about ways of gathering lots of data and using lots of data. Obviously, privacy is a huge issue in this and there are techniques for managing use of privacy in terms of your data collection, but also in terms of training and using AIs. Can you talk a little bit about how you think about the importance of privacy preserving when it comes to using things like AI and machine learning and how you're managing that?
Evangelos - 00: 17:31: Yeah, absolutely. So again, it depends. At the moment, they rarely have a common point. Very few people are speaking of a privacy-first AI. And actually the blockchain world is tending to be privacy-first, right? So the next steps of the blockchain protocols actually might even prohibit the use case that I mentioned before, right? Which is for you to be able to have open access to all these data points that you can infer from the users. So these are definitely two competing worlds that we're speaking about. This is definitely an unanswered question. There are definitely startups that are trying to answer this, and it's definitely, I would say, maybe the next billion-dollar startup if someone manages to nail it, and it would definitely be worth it for someone to tackle this kind of issue. Because yeah, it's an actual issue, right? Because at the moment, if you want to use users' personal data in your prompt, unfortunately, you have to pass this information to OpenAI.
Joe - 00: 18:17: Then what happens to it, right?
Evangelos - 00: 18:19: Yeah, exactly. And maybe the answer is again in the Open Source models, because you can achieve better privacy or better-controlled privacy.
Joe - 00: 18:25: Standard, you know, if you're running the model yourself, you know exactly what's happening.
Evangelos - 00: 18:29: Exactly, right? So maybe the answer is from the Open Source models, right? But this is literally the next phase. So far, we have been having, you know, GDPR, because until now the data flows were much better managed by the engineers, right? Because, you know, as engineers, we like to be more certain about things that are happening. We are control freaks, right? But now AI, you know, is challenging that. It's bringing randomness to our flows, right? Achieving privacy while you are not certain what's happening there: if someone is listening and they want to have the next billion-dollar startup, that's literally the topic.
Joe - 00: 18:56: Yeah, this is kind of another surprising element of the whole thing. So much has been said about user data collection over the years. And the data privacy issue that's really the straw that broke the camel's back for generative AI is not the data in the training sets or where any of that came from. It's the fact that company employees are putting IP into these chat boxes. And also the fact that that's not unusual. Like, at all of these companies, employees are signing up for 20 Cloud services a week. We're all going and signing up for the next smart to-do list app and putting all our company's plans in it, right? And it's only now with AI that suddenly we want to be careful which SaaS services we put our company's data in. It's very interesting.
Evangelos - 00: 19:31: Yeah, absolutely. This will definitely be the question. Like once people start having the maturity and understanding that this is exactly the right question that we should be asking, this is literally the demand that will be created now and for the next three years.
Joe - 00: 19:42: You mentioned Open Source a couple of times, and I guess it's kind of a brief step from there, from Open Source to open standards. So obviously one of the things we've seen with the AI conversation and the new Open Source LLMs coming out is also kind of a talk about standards for the data. Like how do we transfer this data? How do we keep the training set data in common formats? Have you seen any interesting movement there? Are there any standards for managing the data that we give to AIs that you find useful or helpful in your work?
Evangelos - 00: 20:07: Oh, that's an interesting one. So in terms of training your AI in a standard way, it will depend on the kind of embeddings that you're using, right? Speaking for LLM applications, right? Depending on your use case, depending on the content that you want to train your model with, and depending on what you want to base it on, I would say that the first deciding factor will be which embedding space you use, and this will drive your next choices.
Joe - 00: 20:30: Cool. We are coming towards the end of our time. So we like to tap our very experienced guests for the juicy nuggets that we can apply to our own work and other businesses. From your perspective, what do you think would be the single most important piece of advice for businesses aiming to optimize their data on the Cloud for AI?
Evangelos - 00: 20:47: As we said, AI is the final frontier of your data, right? And you have to have the maturity before you reach that. And although many are there, others are not, right? Or many think they are, but they aren't. So do you have your data in such a way that you can easily fit it into a prompt? Do you understand your data, right? Have you asked the simple questions before you go to the more complicated ones, right? And do you need to? Maybe you don't need to, right? And the next step is that you need to understand what your competitive edge is in the data that you already have. Maybe you don't need to use your data. Maybe there is something already out there, you know, a model that is more or less what you're expecting. So for example, if you have an e-shop and maybe you haven't been so good at collecting data, maybe it isn't worth spending millions now on building a data structure. Maybe there are already Open Source e-commerce models, even on Kaggle, for example, that provide Open Source data points that you can use to give basic training to your model. So you can start from there as a checkpoint and then continue, right? It always depends. These are not straightforward answers.
Joe - 00: 21:46: Yeah, we like "it depends" as an answer. You know, the interesting things are going to come when you start doing "it depends". We like to hear that. Perfect. That's awesome. Well, thank you so much for joining me today. We definitely covered a lot of ground. Thank you for joining us on the show, Evangelos.
Evangelos - 00: 21:57: That was interesting. Thank you, as well.
Outro - 00: 22:01: Beyond The Screen: An IONOS Podcast. To find out more about IONOS and how we're the go-to source for cutting-edge solutions and web development, visit ionos.com and then make sure to search for IONOS in Apple Podcasts, Spotify and Google Podcasts, or anywhere else podcasts are found. Don't forget to click subscribe so you don't miss any future episodes. On behalf of the team here at IONOS, thanks for listening.