Weekly Digest #1: Claude beats ChatGPT

PLUS: A deep dive on building highly reliable Gen AI systems

Hey 👋

I'm Saurabh, and I'm thrilled to welcome you to Apperture Focus. This isn't just another newsletter – it's the culmination of our team's year-long deep dive into the intersection of AI and finance.

Now, I'll be honest – we've taken a bit of a liberty by adding you to this mailing list for our very first edition. But here's why: In our journey, we've uncovered insights so compelling, so potentially game-changing, that we couldn't wait to share them with minds like yours.

Every week, we'll be sending you a 5-minute read that distills the noise into actionable insights. It's essentially an invitation into our world – a behind-the-scenes look at our latest research, the "aha!" moments, and the trends that keep us up at night (in a good way).

I'd love to hear your thoughts. What resonates? What doesn't? Your feedback will shape the evolution of Apperture Focus, ensuring it delivers real value to you.

Let’s begin.

Claude 3.5 Sonnet beats GPT-4o on Financial Data Extraction

Anthropic, a rising star in the AI landscape, is defying expectations. With just 375 employees, this underdog is punching far above its weight against industry giants like DeepMind and OpenAI. It just released Claude 3.5 Sonnet, a model that outperforms GPT-4o hands down on multiple benchmarks.

We were especially interested in how it performs on financial data extraction - the Achilles' heel of every finance data professional.

We tried extracting financial data from complex tables and charts in PDFs and found that Claude 3.5 Sonnet was notably more accurate. What's even better is that it's also more cost-effective in terms of token pricing, making it a win-win solution.
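If you want to try this yourself, here's a minimal sketch of the kind of extraction call we mean, using Anthropic's Python SDK. The file name, the prompt, and the assumption that you've already rendered the PDF page to a PNG (e.g. with pdf2image) are illustrative - this isn't our production pipeline.

```python
# Minimal sketch: extract a financial table from a rendered PDF page
# with Claude 3.5 Sonnet. File name and prompt are illustrative.
import base64
import anthropic

# Assumes the PDF page has already been rendered to a PNG.
with open("balance_sheet_page.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": img_b64}},
            {"type": "text",
             "text": "Extract every table on this page as JSON: "
                     "one object per row, column headers as keys, "
                     "numbers as plain floats (no currency symbols)."},
        ],
    }],
)

print(message.content[0].text)  # the extracted JSON, ready for validation
```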

Here is an interesting case study: Hanane on LinkedIn compared Claude 3.5 Sonnet, Opus, and GPT-4o on complex financial data extraction. The results were clear: Claude 3.5 Sonnet excels at managing intricate financial datasets, significantly reducing the manual labor involved. Read more

While these benchmarks don't translate directly to production applications, they offer some insight into the model's potential:

[Benchmark chart. Source: Anthropic]

Fina.money

Imagine if Notion, a chartered accountant, and a financial analyst had a baby together - it would be Fina Money. To me, it stands as one of the best implementations of AI in personal finance.

Here are some use cases that blew my mind, and why:

The first step with Fina Money is bringing in your financial data through over 12,000 integrations. You can then clean this data and categorise it any way you like, all presented in a visually appealing way.

Fina Money's standout feature is its customisable dashboards. Create anything from simple cash-flow trackers to complex debt-management systems using live data.

The cherry on top? An integrated AI that answers any money question, offering instant insights from your data - like a 24/7 personal financial advisor.

Building highly reliable Gen AI systems

Hey folks, I'm excited to share something that's been consuming my world lately - our journey to build a Gen AI system to fetch financial data with over 90% accuracy. It's been a wild ride, and I want to take you through it.

Let's start with the elephant in the room: accuracy in LLMs.

It's like trying to tame a brilliant but wildly unpredictable beast. These models are incredibly powerful, trained on massive datasets that become their internal memory. But here's the kicker - changing that memory is about as easy as teaching an old dog new tricks. It's time-consuming and costly.

In our recent project, we faced this head-on. We needed to fetch data from a structured database, create beautiful visualisations, and on top of that generate insights - trying to answer things like:

  • Are there any seasonality considerations in Uber's sales?

  • What has been the primary driver of cost increases at Instamart?

  • Are there correlations between discounts and growth in Food Delivery?

We ended up creating a system of 7 agents to do this. Now, I know what you're thinking - "7 agents? Isn't that overkill?" Well, sometimes you need a whole team to win the game.
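To make that concrete, here's a heavily simplified sketch of what such a pipeline can look like. The agent roles, prompts, `Context` fields, and the `llm` hook are illustrative guesses rather than our actual design, and it shows four representative stages rather than all seven.

```python
# A simplified sketch of a multi-agent text-to-SQL pipeline.
# Agent roles, prompts, and the `llm` hook are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str, str], str]  # (system_prompt, user_prompt) -> completion

@dataclass
class Context:
    question: str        # e.g. "Are there seasonality effects in Uber's sales?"
    schema: str          # table DDL / column descriptions, for grounding
    metrics: str = ""    # user terms resolved to concrete columns
    sql: str = ""
    critique: str = ""
    insight: str = ""

def match_metrics(ctx: Context, llm: LLM) -> Context:
    # Map fuzzy user language ("sales", "growth") onto real schema columns.
    ctx.metrics = llm("Map the question's terms to schema columns.",
                      f"Schema:\n{ctx.schema}\n\nQuestion: {ctx.question}")
    return ctx

def write_sql(ctx: Context, llm: LLM) -> Context:
    ctx.sql = llm("Write a single SQL query. Return SQL only.",
                  f"Schema:\n{ctx.schema}\nMetrics: {ctx.metrics}\n"
                  f"Question: {ctx.question}")
    return ctx

def review_sql(ctx: Context, llm: LLM) -> Context:
    # A critic agent: a cheap way to catch hallucinated columns pre-execution.
    ctx.critique = llm("Flag invalid columns or joins; reply OK if none.",
                       f"Schema:\n{ctx.schema}\nSQL:\n{ctx.sql}")
    return ctx

def summarise(ctx: Context, llm: LLM) -> Context:
    # In a real pipeline an executor agent would run ctx.sql first and the
    # result rows would be summarised; we omit the database hop here.
    ctx.insight = llm("Draft the insight the query is designed to answer.",
                      f"Question: {ctx.question}\nSQL:\n{ctx.sql}")
    return ctx

PIPELINE = [match_metrics, write_sql, review_sql, summarise]

def run(question: str, schema: str, llm: LLM) -> Context:
    ctx = Context(question=question, schema=schema)
    for agent in PIPELINE:
        ctx = agent(ctx, llm)   # each agent reads and enriches shared state
    return ctx
```

Each agent reads and enriches a shared context, so a failure at any stage - say, the critic flagging a hallucinated column - can be caught before it poisons the final answer.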

The result?

We hit over 90% accuracy. To put that in perspective, the best text-to-SQL LLM out there manages about 65% in production. We're talking near-human performance here, folks!

But here's where it gets interesting. Is this the best we can do? Hell no!

Our approach was a bit like using a sword when a small knife would do. We had LLMs doing things like metric matching, which added to both latency and cost. It's like hiring a Michelin-star chef to make you a sandwich - it works, but it's not exactly efficient.

So, here's our playbook:

  1. Deliver a quick MVP using agents that leverage powerful models

  2. Open it up to users and see if the offering resonates

  3. Generate high-quality data based on usage

  4. Use that data to fine-tune smaller models and reduce cost, latency, and inaccuracies (see the sketch after this list)

  5. Rinse and repeat until we've built something kick-ass that is cheap, effective and super accurate
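Steps 3 and 4 are the heart of the flywheel, so here's a hedged sketch of what they can look like in code: log every production interaction with a user-feedback flag, then export only the accepted question-to-SQL pairs as a JSONL training set for a smaller model. The file names and record shape are assumptions, not a spec.

```python
# A hedged sketch of the data flywheel in steps 3-4. The log format,
# file names, and prompt/completion record shape are assumptions.
import json
from pathlib import Path

LOG = Path("usage_log.jsonl")          # hypothetical production log
TRAIN = Path("finetune_train.jsonl")   # training set for a smaller model

def log_interaction(question: str, sql: str, accepted: bool) -> None:
    """Append one production interaction; `accepted` is the user's verdict."""
    with LOG.open("a") as f:
        f.write(json.dumps({"question": question, "sql": sql,
                            "accepted": accepted}) + "\n")

def export_training_set() -> int:
    """Keep only user-accepted pairs as prompt/completion examples."""
    kept = 0
    with LOG.open() as src, TRAIN.open("w") as dst:
        for line in src:
            rec = json.loads(line)
            if rec["accepted"]:
                dst.write(json.dumps({"prompt": rec["question"],
                                      "completion": rec["sql"]}) + "\n")
                kept += 1
    return kept

# Example: log_interaction("Uber sales seasonality?", "SELECT ...", True),
# then export_training_set() yields the JSONL to fine-tune on.
```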

Would love to hear whether this iterative approach - shipping MVPs fast, generating datasets from usage, then fine-tuning - resonates with you.

P.S. For those interested in the nitty-gritty, check out IBM's research on LLM text-to-SQL and the SQL LLM accuracy benchmarks. It's fascinating stuff!

That’s all.

Stay curious, leaders! See you next week.

How did you like today's email?

  • ❤️ Loved it

  • 💪 Pretty good

  • 💢 Could do better

We'd greatly appreciate your thoughts on the structure, content, and insights provided. Your feedback will help us refine and improve future editions to better serve your needs and interests.
