Is it accurate? Along with "Is my data safe?", this is the most common question about AI. We've all had mild brushes with inaccuracy, possibly even outright hallucination, from AI models.
So how accurate is AI? Let's get a bit more specific.
General AI Assistants and Narrow AI Applications
First, let's categorise them into two buckets: general AI assistants (like ChatGPT, MS Copilot and Google Gemini), and narrow AI applications, like a custom tool that validates whether a new contract has any terms not compliant with your standard policy. The narrow application has been deliberately grounded on your own data and business rules.
General AI assistants are 'roughly right'. They're great at trends, summaries and advice. If I asked a group of sales executives what best-practice sales process looks like, I'd get a different answer from each one. None of them would be wrong; the advice is generalised, and each is coming at it from their own viewpoint. You didn't ask a precise question like "What's the temperature right now?", you asked "How's the weather?", and got answers like "It's cool", "Pleasant", "Chilly but sunny". All correct, but different.
Compare this to a question that requires a precise answer: "What regulations am I required to comply with when selling in the EU?" A general answer won't do here; you need a list of specific clauses, and likely precise answers to specific follow-up questions. Or: "What are the visa application requirements for India?"
General AI assistants like ChatGPT are pretty good at most things, as long as the task doesn't require precision. Take the India visa example: if you ask ChatGPT this right now, it gets it wrong, suggesting items that are no longer required. The nature of the training process is that it draws on petabytes of information, including blogs, archived pages, and perhaps an old attachment buried somewhere on the government website that contradicts what's published on the current page. It draws on all of it.
We can see new user-experience patterns emerging in Google Gemini, where it offers the option to "verify with Google" and runs a quick Google search to validate the information. This improves accuracy somewhat, but not completely. The more you can ground AI in deliberate, specific sources, the higher the level of precision.
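To make the grounding idea concrete, here's a minimal sketch in Python. The `fetch_page` and `ask_llm` helpers are hypothetical stand-ins for whatever HTTP client and model API you use, and the URL is illustrative; the point is the pattern, which is to retrieve the authoritative source first, then have the model answer only from that text.

```python
# A minimal grounding sketch. fetch_page() and ask_llm() are hypothetical
# placeholders for your HTTP client and model API of choice; the URL in the
# usage note is illustrative, not a real endpoint.

def fetch_page(url: str) -> str:
    """Hypothetical helper: download the current text of an authoritative
    page, e.g. via requests.get(url).text plus some HTML clean-up."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Hypothetical helper: one call to whichever LLM API you use."""
    raise NotImplementedError

def grounded_answer(question: str, source_url: str) -> str:
    # Fetch the live, authoritative source instead of trusting training data.
    source_text = fetch_page(source_url)
    prompt = (
        "Answer the question using ONLY the source text below. "
        "If the source does not contain the answer, say so.\n\n"
        f"SOURCE:\n{source_text}\n\nQUESTION: {question}"
    )
    return ask_llm(prompt)

# Usage (illustrative): ground the visa question in the official page,
# not the model's memory.
# grounded_answer("What are the visa application requirements for India?",
#                 "https://example.gov/visa-requirements")
```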
The more precise an answer you need, the more specific a solution you should seek, or create your own. Platforms like Redline, Josef Q and Harvey are specialist narrow applications for lawyers. Most still use GPT-4 as the underlying model, but the user experience and technology have been tweaked, fine-tuned and grounded in a much more precise process.
Roughly Right
Where is a 20% difference in opinion between two people OK?
Those are the things I would recommend people use ChatGPT, Copilot or Gemini for without hesitation.
Where more precision is necessary, the output will require further review, or a specific solution. AI can do the first draft, but if you rely on the raw output, it will have problems.
There are workarounds for these precision problems, starting with complex prompt engineering; however, the best solution is usually to choose or build a narrow AI solution for what you're trying to do.
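As a sketch of what "narrow" can look like, take the contract example from earlier. Everything here is illustrative: the policy rules are made up, and `check_clause` reuses the hypothetical `ask_llm` helper from the grounding sketch above.

```python
# Illustrative narrow-application sketch: validate each contract clause
# against explicit policy rules, producing a reviewable list rather than
# one broad chat answer. ask_llm() is the hypothetical helper from above.

POLICY_RULES = [
    "Payment terms must not exceed 30 days.",
    "Liability caps must be at least 12 months of fees.",
]  # in a real build these come from your own standards document

def check_clause(clause: str, rule: str) -> dict:
    # Ask for a structured verdict on one clause against one rule.
    prompt = (
        f"RULE: {rule}\nCLAUSE: {clause}\n"
        "Reply COMPLIANT or NON-COMPLIANT, with a one-line reason."
    )
    return {"clause": clause, "rule": rule, "verdict": ask_llm(prompt)}

def review_contract(clauses: list[str]) -> list[dict]:
    # Narrow scope: every clause is tested against every explicit rule.
    return [check_clause(c, r) for c in clauses for r in POLICY_RULES]
```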
There's a ton of potential. Start tinkering in the roughly-right domain, then identify the things where you need a specific solution and where the business case stacks up, and find or build the right approach.
The good news is that there are 1,000+ narrow AI solutions listed on websites like theresanaiforthat.com. Note that the vast majority are, honestly, very average: a pretty user interface on some complex prompt engineering, without any clear grounding in a custom data set.
Most of these startups will die, so you'll need to try a few before figuring out whether a custom solution is the right approach. As with everything in AI, test-and-learn and trial-and-error are part of the process.
For all the hype around AI, it's often not clear to businesses where they should get started. Some convince themselves they have a plan; you're in this category if you say "We've given some people access to MS Copilot". Let me guess: they seem to like the MS Teams meeting summarisation, right?
I don't know if it's simply a convenient answer while there are hundreds of other competing priorities, or a misinformed opinion, but "our data isn't ready yet to take advantage of AI" is something we hear regularly. Or variants like needing to spend 2024 getting data ready, and then reassessing AI in 2025.
Bots, like people, have two types of usefulness: knowledge and skills. Things they know, and work they can do. In this blog, Dawid Naude of Pathfindr provides practical advice to help you roll out or review your deployment of co-pilots and chatbots within your organisation, to generate both short-term impact and longer-term strategic value.
The first step your company should take is learning how to use ChatGPT properly. Very few do. There's a lot of talk about AI, and businesses need to start tinkering right away. There is a huge opportunity accessible right now, right in front of you: you can automate processes with autonomous agents, have content created automatically for your learning management platform, and even stand up the equivalent of a data science team with a few clicks.
Chat is only incremental
The biggest mistake you're making right now is thinking about AI in terms of ChatGPT, or chat interfaces. These assistant tools are great generalist support applications and a demonstration of the incredible power of LLMs. But they are a single step in a process, typically involving a lot of Ctrl+C, Ctrl+V, a lot of prompting, and trial and error to get a consistent outcome.
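The alternative to the copy-paste loop is calling the model inside the process itself. A rough sketch follows, again using the hypothetical `ask_llm` helper from earlier; the folder names are illustrative. The same prompt runs over every file, unattended, and the output lands where the next step needs it.

```python
# Sketch: the summarisation you'd otherwise do by pasting into a chat
# window, run unattended over a whole folder. ask_llm() is the
# hypothetical placeholder defined in the earlier sketch.
from pathlib import Path

def summarise_folder(in_dir: str, out_dir: str) -> None:
    Path(out_dir).mkdir(exist_ok=True)
    for doc in Path(in_dir).glob("*.txt"):
        summary = ask_llm(f"Summarise in five bullet points:\n\n{doc.read_text()}")
        # The output lands where the next step needs it - no Ctrl+C, Ctrl+V.
        (Path(out_dir) / doc.name).write_text(summary)

# Usage (illustrative): summarise_folder("meeting_notes", "summaries")
```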
Whilst we talk about ethics, avatars, deepfakes and other magical technology, the reality is that for most businesses, the value in AI will come from solving very boring problems. The average desk worker has a lot of boring tasks: a substantial amount of download, copy, paste, save-as; a lot of responding to messages whose answers are already on the website; a lot of VLOOKUPs and pivot tables, meeting minutes and deal updates. We love solving boring problems, and get less excited about deepfakes and avatars.
Checking, marking, critiquing, assessing, scoring
We recently reflected on a common theme in the work we're delivering. The projects had different names, like "accreditation certifier", "clause validator" or "skilled migrant assessment", but they were all doing the same thing: assessing something. What we also realised is that AI is often great at a first draft, or a last pass. In this case, an initial assessment.
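The pattern is simple to sketch. The rubric below is made up, and `ask_llm` is the same hypothetical helper as before; the key design choice is that the model produces the first-pass assessment and a human makes the final call.

```python
# Sketch of the "first draft assessment" pattern: score a submission
# against an explicit rubric, then hand the result to a human reviewer.
RUBRIC = {
    "completeness": "Are all required sections present?",
    "compliance": "Does the submission meet the stated criteria?",
}  # illustrative; a real rubric comes from the accreditation or policy rules

def initial_assessment(submission: str) -> dict:
    scores = {}
    for criterion, question in RUBRIC.items():
        prompt = (
            f"CRITERION: {question}\nSUBMISSION:\n{submission}\n"
            "Score 1-5 with a one-line justification."
        )
        scores[criterion] = ask_llm(prompt)  # hypothetical helper, as before
    return scores  # a first pass for the human assessor, not a final verdict
```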