Recapping the Think 2019 “Journey to AI” community CrowdChat: AI everywhere
Artificial intelligence is transforming every business process. Developers are incorporating AI – in the form of deep learning, machine learning and kindred technologies – into cloud-native applications and business processes through tools that enable them to compose these features as data-driven microservices.
On Thursday, IBM Corp. and SiliconANGLE’s sister market research firm Wikibon held a #Think2019 conference community CrowdChat to discuss how enterprises can make the journey to AI in the cloud. The hourlong online session was well attended and featured vibrant discussion of many issues related to that journey.
The CrowdChat featured the following IBM AI subject matter experts: Carlo Appugliese, Data Science and Machine Learning; Matthias Funke, Hybrid Data Management; Madhu Kochar, Cross-analytics; Hemanth Manda, IBM Cloud Private for Data; Anantha Narasimhan, UGI; and Jason Tavoularis, Business Analytics.
Here are the most noteworthy responses from these and other participants to each of the CrowdChat questions:
Q: Rob Thomas, general manager of IBM Analytics, has said there is no AI without an IA or information architecture. How are you modernizing your data estate — the organization of your data assets — to get ready for an AI and multicloud world?
Katie Schafer: “If anyone is looking to learn more about ICP for Data, be sure to check out session #2571, titled Change the Game: Learn How to Win with AI, happening on Wednesday, February 13th at 1:30pmPST in the Large Theater on the Data & AI Campus.”
Hemanth Manda: “Having talked to a number of customers and business partners, this is an issue everyone is grappling with and we are addressing it through our new platform offering ICP for Data, an integrated data and AI platform for multi-cloud”
Madhu Kochar: “Every client discussion starts with this dialog … and very critical to have a trusted analytics data foundation. It starts with know your data, trust your data and use your data to further drive insights”
Matthias Funke: “I see this question come up everywhere. Modernization to gain agility, new insights faster, and have more people and business application benefit from it … very often one needs to start at the bottom of the AI ladder and the question: How can I collect all the data I need, and make it accessible to the right people, at the right time? And how can I integrate data assets across different locations and data sources?”
Carlo Appugliese: “We work with clients on their Data Science Journey and the biggest factor to winning with AI is to make sure you account for 3 things … The right skills, the right process/culture and finally the correct tools.”
Jason Tavoularis: “Of course! AI requires data. If there’s no infrastructure, there can’t be much data, so you can’t expect the AI to be very smart.”
Anantha Narasimhan: “Our customers are looking at AI to help drive digital and potentially business transformation. At the core of AI are a) People & Culture, b) Process, c) Data … With data present all across the organization, getting a good handle on it is the very first step … Collect, Organize and then Analyze data. and then Infuse AI models in order to operationalize … ML is a great enabler for AI. We need to remember that AI can help us win quickly.. or fall flat quickly. Because if the data is not of good quality, the models will throw up bad insights”
Tanmay Sinha: “Quality of AI models is directly proportional to the quality of data used to train the model. Without an information architecture to serve high-quality data, the AI models can be inconsistent, irrelevant or worse biased.”
John Furrier: “I think that he’s really nailing the core AI (and ML) angle: metadata or information that feeds AI engines is super important. If companies get this right then ML and AI soar to new heights”
James Kobielus: “There’s no practical AI without data quality, governance, prep, and training in a high-performance data lake. Modernizing your data estate in the multicloud for AI demands an industrialized DevOps approach that automates much/most of these processes … AI can’t be smart if data scientists can’t find the right data to drive feature engineering etc. Likewise, AI models can’t do their jobs with high confidence without upfront and ongoing training from fresh operational data … Infusing AI into the business requires an operationalized data science pipeline with a strong real-time/streaming CI/CD workflow.”
David Floyer: “IMO, the future for analytics is real-time results. This means fast execution of operational AI/Analytics near the data. It also means low-latency connections between applications wanting to automate processes and the AI/analytics required … For example, if you are wanting to ensure that only employees are entering enterprise premises, there will be many enterprises with the same problem, and many solutions to purchase … 1. There are two sources of AI solutions – internal, and external, the normal make or buy decision. For products and services owned, it is vital that data is collected about those services in IA. However, there are many technologies it would be easier to buy.”
Q: How are you increasing workload and consumption flexibility in your analytics systems?
Katie Schafer: “To learn more about how you can build a proper data architecture to improve data accessibility, don’t miss The Road to AI—A Journey to Modernize Your Data Architecture session on Wednesday, February 13th at 3:30pmPST on the Data & AI Campus.”
Carlo Appugliese: “In Data Science … The key to success is full access to all data.. In my experience, this is a hot topic and there is a balance they have to strike between Security and Innovation….My experience is full access to data for your Data Scientists and Data Engineers is critical to your business innovating….Here is a session at #IBMThink where Experian will go into detail about their AI journey. https://myibm.ibm.com/events/think/all-sessions/session/6869A …. Here is a blog where I explained a recent AI project working with Experian. https://www.ibmbigdatahub.com/blog/how-data-science-elite-helped-uncover-gold-mine-experian”
Matthias Funke: “This is gold to me. Having a catalog of data assets at my disposal without worrying about where data resides. Avoid or minimize data movement to avoid lag and cost is of tremendous value.”
Madhu Kochar: “As I talk to multiple clients, access to data, especially dark data, is critical. It is also important that they have a good data virtualization story, meaning you do not always have to move your data…. The capability to join your traditional data with IoT data and real-time streaming data is critical to drive new analytics insights”
Jennifer Shin: “the notion of being able to access all data sounds like a dream I had once… then I woke up and remembered I work with people and data is a mess. The reality is we can have all the data in the world, but it’s useless if it’s inaccurate or of poor quality…. In a competitive market, there will always be businesses and both internal and external clients who want their data to be kept private if it provides an advantage. Being able to access the data I need when I need it is more important than having access to all of it.”
David Floyer: “It is essential to have multiple sources of data around key business processes, products and services. The quality of AI/Advanced Analytics will be dictated by the quality of the data sources.”
Q: Does your analytics strategy presume to move data to analytics or analytics to data? Why?
Anantha Narasimhan: “Definitely analytics coming to data – so faster decision can be taken at source or close to it…btw, there is an exciting session on Data Modernization strategy in a Multicloud World – by Madhu Kochar: https://myibm.ibm.com/events/think/all-sessions/session/7235A and virtualization: https://myibm.ibm.com/events/think/all-sessions/session/7223A”
Katie Schafer: “For more on business-ready data, don’t miss the Digital Transformation: A Business Ready Data Hub for Advanced Analytics session at Think 2019 happening Friday, February 15th at 9:30amPST on the Data & AI Campus.”
Madhu Kochar: “Data Gravity rules! You bring analytics to data, that is the most optimal…. Especially in the world of multi-cloud strategy, it is critical that we keep data where it is; thus technology like data virtualization, with governance built in to trust the data, drives toward trusted AI”
Matthias Funke: “Analytics to data. Any data movement or copying is expensive and leads to all kinds of issues (lineage, quality, latency, higher resource utilization and cost)”
Hemanth Manda: “always move Analytics to Data .. that’s been our mantra . Data gravity should dictate your strategy. Moving against the gravity means you would end up spending a ton of resources / money & is not sustainable”
Carlo Appugliese: “in my opinion, do your analytics where the data is if you can.. There is no value in moving lots of data, but there is significant business value in doing more analytics with your data. It’s all about rate and pace of AI projects.”
Tanmay Sinha: “Data is growing exponentially within an enterprise. Moving becomes an avoidable expense if you can bring analytics to your data!”
Sarbjeet Johal: “When doing #ML #AI, for compute intensive scenarios like human genome sequencing take data to compute. For data intensive scenarios (especially input), bring compute to data. #rethink”
Jennifer Shin: “In my experience, companies already collecting data find value in turning data into analytics, whereas companies developing new products or services find more value in using analytics for data. The best #datascience teams need to find the balance in doing both”
James Kobielus: “In-situ/in-database analytics is a key foundation of the big data revolution. Data gravity. Now with the edge looming larger as a data source, analytics is moving closer to those nodes and getting more sophisticated there. Distributed AI.”
David Floyer: “Data in volume is costly to move & takes a lot of time. Data loses value over time. so, it is usually much cheaper to move code to data than data to code. This is especially true for operational AI/analytics, which should be moved close to data source where possible.…It is interesting to observe that when AI systems are deployed, 90%+ of the code is in operational AI, rather than ML model development.”
Q: How policy-driven are your data analytics visibility, detection and reporting activities?
Carlo Appugliese: “One of the biggest I’ve seen is that companies think they are behind vs other companies.. What companies need to understand is that it’s a journey and they just need to start. Most companies are learning and growing in this space…..I recommend, pick one use case, put a small team on the project and start. If it fails, that is normal. Just go to the next one, and the wins will cancel out many failed projects.”
Tanmay Sinha: “To ensure unbiased AI models, policies on data analytics are more important than ever.”
Hemanth Manda: “very little to be honest & I think this is a huge issue given increased and diverse regulations, GDPR being the latest…Here is a session on Data Virtualization @ THINK 2019 that would be very valuable to attend: https://myibm.ibm.com/events/think/all-sessions/session/7181A”
Matthias Funke: “How important are good policies if their ratification is not automated? Deep integration across the analytics ‘stack’ can solve for that”
Madhu Kochar: “Every CDO would want to say YES to this. Need ML/AI based solutions to automate these activities, and we in IBM analytics have solutions to make this EASY (a hard problem)”
Jennifer Shin: “there’s always a policy, but the restrictions depend on the purpose associated with how the data is being used. When my #datascience team built models for negotiation purposes, even our internal status reports listed our work as confidential.”
John Furrier: “Policy driven will be a very important portion of a machine driven future. Getting policy down and having machines figure out new policies on the fly address both on demand AI and real time AI”
Sarbjeet Johal: “ML and AI are next frontiers in Data Governance Platforms and these models will work in conjunction with policies! So it’s “policy driven ML enabled” approach which seems most practical with the tools we have today!”
Q: Are protection and compliance regimes built into your analytics systems, or bolted on? Why?
Matthias Funke: “I see it as a never-ending journey. One is never done. There is a legacy to begin with, but every moment, new data (sources) may get added to your current landscape. Fun!”
Tanmay Sinha: “Data privacy regulations are coming whether we like them or not. GDPR is already here, CCPA is coming soon. Enterprises, small and large, have to start thinking about the data being collected and shared.”
Jennifer Shin: “#analytics systems typically have several layers of protection and compliance regimes. Accessing the platform is at a system level whereas anonymizing data depends on the data set (as well as any contracts associated with it)”
David Floyer: “Early days for establishing compliance and protection policies. It will probably need a company to have a Wall Street Journal disaster to focus minds on this issue!”
Q: How does your organization administer profiling, cleansing and cataloging of data?
Anantha Narasimhan: “this is perhaps the core of organization’s journey to AI or even to a successful Data Lake, Data Science…. there is an excellent session at THINK, hosted by Jay @jaylimburn -https://myibm.ibm.com/events/think/all-sessions/session/6913A …. some organizations refer to this as Data Preparation or Data Curation…. Here’s a good session at THINK, in case you are interested: https://myibm.ibm.com/events/think/all-sessions/session/6912A”
Carlo Appugliese: “In the area of Data Science, typically we include a Data Engineer who works side by side with the Data Scientist and is critical to taking findings and putting them into the Catalog as well as providing key features needed for the modeling phase…. You need a combination of a cross-functional team, the right access to data and tools to build your AI foundation…. One of the big areas we see in AI is the ability to explain what your predictive models are doing and whether you trust them.. Let me ask everyone, Do you trust the decision made by an AI/ML model?…Model bias is something we are very focused on, especially from a dev ops perspective. Understanding this is important and critical to your organization’s future as you incorporate key decisions using AI. So Trust AI but verify :)”
Sarbjeet Johal: “it’s mainly done at LOB level in most of the companies I have worked with in advisory capacity. Central tools, policies and procedures need to be built for data governance. I believe the WHAT of data cleansing and cataloging must stay with LOB and HOW with IT.”
Hemanth Manda: “as usual, there are multiple solutions to handle this, but ICP for Data is a platform that includes and enforces these capabilities by default .. Learn more @ this THINK session : https://myibm.ibm.com/events/think/all-sessions/session/5478A ….here is a 3rd party listing of vendors offering cleansing tools : https://www.analyticsindiamag.com/10-best-data-cleaning-tools-get-data/”
Madhu Kochar: “Besides profiling, cleansing and cataloging, data classification is another critical attribute. Here is where ML automation can go a long way. IBM Information Server provides a complete solution”
Pouya Fakhari: “An edge computing approach is made for the concept of the data warehouse, while pure cloud computing fundamentally contradicts the concept. It is generally accepted that only edge computing makes sense for systems that collect data on a massive scale thoughts hybrid cloud edge…. E.g., an Edge Computing Device can outsource simple computing tasks to a cloud using a Function-As-A-Service concept. Here, the cloud does not store anything and no backend is set up on it. The cloud only offers computing power for any functions that are transmitted on the fly”
Matthias Funke: “Would agree if you think about IoT use cases with massive volumes of data points continuously produced. Aggregation and storage can happen at the edge. It’s not just data warehousing though.”
Jennifer Shin: “I have yet to see an organization that has this process streamlined. Most established companies have many, many meetings about how a data set is going to be used internally and the logistics around it…. one of the advantages of building cutting edge tech and creating new data products/services is that this is dealt with further down the line”
David Floyer: “This is an important requirement in the maturing of AI/advanced analytics. Solutions should support distributed and multi-cloud data, and ideally support orchestration and optimization of moving code to data or vice versa.”
John Furrier: “Clean data in —> great ML and AI; not clean data in –> lots of cleanup. Just say no to data pollution!!”
Q: What resources help your enterprise deploy models anywhere, securely?
Madhu Kochar: “Built-in governance for these models is critical as well.. so you really need data engineers, data scientists and data stewards to collaborate”
Carlo Appugliese: “Using Watson Machine Learning really gives you the ability to train, deploy and monitor your models.. This really gives you model portability so you can train and deploy anywhere..”
Sarbjeet Johal: “Data Governance Policies + Data Governance Skills + Stated Policies. That covers all people processes and tech aspects.”
Carlo Appugliese: “If you’re looking to build a new Data Science Team?…Here is a blog I put out on how to build a rock star Data Science Team! https://www.ibm.com/blogs/business-analytics/rock-star-ibm-data-science-elite-team/”
Jennifer Shin: “In my experience, IT and operations teams are very important when you need to confirm that certain governance is in place within an #analytics system or need a new policy to be put in place… the best resource for deploying models anywhere, securely is an IT or technology team that is knowledgeable, experienced and responsive!”
James Kobielus: “The core platform that enables enterprises to deploy models anywhere is a data-science CI/CD toolchain that can serve to any target device, node, hardware, container, and runtime environment. The “securely” requires tight access and integrity controls throughout.”
David Floyer: “End-to-end security from development, deployment, and updating is important, and not yet at all common!”
Q: How are your analytics users using data visualization and low-code development tooling?
Katie Schafer: “Here’s a great session that will showcase the new capabilities in IBM Cognos Analytics 11.1 and how it uses AI to provide smarter self-service analytics: https://myibm.ibm.com/events/think/all-sessions/session/3651A”
Anantha Narasimhan: “based on prior experience, when we want to accelerate self-service analytics, low code/no code become important… with Cognos Analytics 11.1, Business Users can use natural language queries to get insights into data.. and stunning visualization to clearly state trends or issues (sorry – shameless plug in) :)”
Matthias Funke: “I see two categories of analytics users: Data Scientists using dev tooling like jupyter notebooks and OSS visualization libraries, vs LoB users using canned reports and dashboards.”
Hemanth Manda: “I tried using Tableau, but gave up after a few days. Nothing beats Cognos especially after the latest improvements in 11.1”
James Kobielus: “Increasingly, analytics developers are using declarative, visual, low-code tooling to program AI/ML, with the tooling leveraging auto-ML to compile models for optimized execution on target platforms…. Analytics business users are also using self-service, visual tooling to build predictive and other advanced analytics for decision support–eg Cognos…. ML-driven augmented programming, leveraging low-code visual front-ends, is a huge research focus here at Wikibon. See my report from a year ago: https://wikibon.com/augmented-programming-ml-development/ ”
Jennifer Shin: “I find more teams are using #datavisualization across an organization, ranging from creating a realtime dashboard for the c suite to using it as a tracking tool for day-to-day operations.”
Q: What is your organization doing to manage and mitigate bias in your models?
Katie Schafer: “Here’s a session happening at Think 2019 that will dive into Detecting and Mitigating Bias in AI: https://myibm.ibm.com/events/think/all-sessions/session/3449A”
Carlo Appugliese: “What I’ve seen is companies are doing this manually but after the fact and really fall short.. This topic needs to be evaluated at the beginning of your model development. We can really help companies with this using tools.”
Madhu Kochar: “Bias in AI is a very hot topic and critical. There are great examples, which I will share later, of how many societal biases are in our datasets. So we really need tools and technology to help with data traceability and explainability”
Jennifer Shin: “All models will have bias because we live in a world without perfect information, which is why being able to communicate the extent to which the bias poses a risk is so essential in #AI….The best way to manage and mitigate bias in your model is to understand #statistics, #mathematics, #data, #science, #engineering and people…. algorithms aren’t in and of themselves biased, but they can increase the bias depending on how they are designed… Developing appropriate reporting and monitoring for models and algorithms implemented in production is essential for limiting bias”
Steve Ardire: “Most people think algorithms are objective but in large part they’re opinions embedded in code. AI systems are black boxes; the data goes in and the answer comes out without an explanation for the decision. Algorithms that learn are supposed to become more accurate unless bias intrudes and amplifies stereotypes….Current ML models understand what’s explicitly stated, but less good at anticipating what’s not said or implied…@DameWendyDBE University of Southampton, Growing role of #AI in our lives is ‘too important to leave to men’ …Must develop effective mechanisms in algorithms to filter out biases and build ethics into AI with ability to read between the lines or what requires common sense.”
James Kobielus: “Debiasing models starts with debiasing data. Here’s a piece I published on the emerging best practices in this. From last year: https://www.informationweek.com/big-data/ai-machine-learning/debiasing-our-statistical-algorithms-down-to-their-roots/a/d-id/1331852”
David Floyer: “This is an important trust issue! If a company is shown not to have addressed this issue, there is a severe risk of brand damage. E.g., a store with cameras with AI to help employees meet, greet or challenge customers entering the store should be especially careful!”
Sarbjeet Johal: “always be training your models! Context injection mechanisms are poor with current tooling but we are aware of this problem, which means we are on our way to solving it!…. you have to remove bias from data input! Algos aren’t biased, data is! Always keep that in mind!”
Image: Marcus Spiske/Unsplash