Weekly Digest #5: ChatGPT Finds Its Voice

PLUS: Deep dive on why we are betting on Small Language Models

Hey 👋

I'm Saurabh, and I'm thrilled to welcome you to Apperture Focus. This isn't just another newsletter – it's the culmination of our team's year-long deep dive into the intersection of AI and finance.

Every week, we'll be sending you a 5-minute read that distills the noise into actionable insights. It's essentially an invitation into our world – a behind-the-scenes look at our latest research, the "aha!" moments, and the trends that keep us up at night (in a good way).

I'd love to hear your thoughts. What resonates? What doesn't? Your feedback will shape the evolution of Apperture Focus, ensuring it delivers real value to you.

Let’s begin.

ChatGPT's Advanced Voice Mode - The AI Polyglot in Your Pocket

OpenAI has released Advanced Voice Mode for ChatGPT, currently in limited access. Our team has tested this feature, and here are our key observations:

  • Multilingual Capabilities: Proficient in multiple languages and accents.

  • Adaptive Interaction: Responds to real-time feedback, adjusting communication style.

  • Versatility: Performs various tasks with notable vocal range.

  • Natural Speech Patterns: Incorporates pauses for more realistic speech.

Evolution from Previous Versions

Having used ChatGPT 4.0's voice mode, we've noticed significant improvements. The new version interrupts less frequently during conversations, though some interruptions still occur. Given its early access status, we expect further refinements.

Early User Experiences:

We scoured Twitter for real-world applications. Here's a snapshot of what users are doing:

  • Language tutor with multiple accents (link)

  • Joke teller with volume control (link)

  • Adaptive text recitation (link)

  • Storyteller with sound effects (link)

  • Multi-character dialogue performer (link)

  • Singer in various styles (link)

Why this is a big deal - and why we're only cautiously optimistic

While ChatGPT's Advanced Voice Mode is generating buzz, it's important to consider its current limitations and uncertainties from a business standpoint:

  • Underlying Technology: The audio-to-text and text-to-audio models still contain significant inaccuracies, potentially limiting real-world applications.

  • Availability Uncertainty: OpenAI has not provided a public roadmap for API availability, making it difficult for businesses to plan for integration.

  • Unclear Pricing: The cost structure for using these advanced voice features in a business context remains undefined.

  • Limited Alternatives: Our research on platforms like HuggingFace has not yielded comparable open-source models with similar capabilities.

  • Current State: At present, Advanced Voice Mode appears to be more of a proof of concept than a business-ready tool. Its applications are largely limited to entertainment purposes like generating rhymes or jokes.

While the technology shows promise, the lack of a clear path to practical, cost-effective business implementation means it remains in the realm of novelty for now. Businesses should monitor developments but may need to wait for more concrete information on accuracy improvements, API availability, and pricing before considering serious adoption plans.

Sana AI Assistant - Your Intelligent Personal Companion

After using the Sana AI assistant for nearly five months, I can confidently say it stands out in the crowded field of AI-enabled personal assistants.

Imagine having a personal assistant who not only manages your files, meetings, and tasks, but also understands context, learns from your work patterns, and provides insights just when you need them. That's what Sana AI assistant aims to be, and it largely succeeds.

Source: Sana

Key Features That Caught Our Eye:

  1. Meeting Assistant: Records, summarises, and enables post-meeting queries.

  2. Knowledge Integration: Centralises diverse file types for easy access.

  3. Extensive Integrations: Works with GitHub, Mixpanel, Figma, Google Workspace, Microsoft, Salesforce, and more.

  4. Task Automation: Streamlines repetitive tasks across platforms.

  5. Contextual Search: Delivers relevant answers from all information sources.

  6. Customisable AI Agents: Creates specialised assistants for different functions.

  7. Collaboration Tools: Facilitates team knowledge sharing.

Our Take:

In my day-to-day use, Sana has significantly reduced time spent on administrative tasks. Its meeting summaries and ability to answer specific questions about past meetings have been particularly valuable.

From a product management perspective, Sana's implementation of AI in the productivity space is impressive. They've thoughtfully considered the user experience, creating a tool that feels intuitive and adapts to individual work styles.

While not perfect - AI tools always have room for improvement - Sana has demonstrably increased my productivity and improved my ability to manage information effectively.

For CEOs and CXOs looking to streamline their workflow and enhance their team's productivity, Sana AI assistant is worth considering. It's not just another digital tool, but a comprehensive system that adapts to your work style and helps focus on high-value tasks.

Our bet on Small Language Models: Why Enterprises Should Think Big by Going Small

In the last year, we've seen a flurry of new large language models (LLMs) released by tech giants like Meta, OpenAI, and Google, as well as newer entrants Mistral and Anthropic. Building solutions on top of these models for multiple clients across diverse use cases, we keep running into the question bothering the industry at large: can generative AI really create the business impact it promises?

Here are our two cents!

You pay an arm and a leg for AI and still get inaccurate answers. Why?

Through our extensive work with enterprises, the biggest concern we hear from product leaders and CXOs is that these systems give great answers most of the time - and wildly inaccurate ones once in a while. Can't we just switch off the dumb answers? I wish it were that simple.

But let's start by understanding why hallucinations occur:

  1. Missing Pieces: In most use cases, we are trying to use a model trained on public data to answer specific questions that are rooted in the enterprise's context. The AI model's knowledge obviously has gaps, so it tries to fill them with guesses. Think of trying to complete a puzzle with missing pieces by imagining them.

  2. Mixed Messages: In many instances, information available publicly that models have been trained on conflicts with enterprise information. The AI might blend incompatible facts. It's like following two different recipes at once.

  3. Attention Glitches: AI models have limited context windows and can fixate on the wrong parts of the information. It's like skipping part of a complex question and answering the truncated version, which fundamentally changes its meaning.

  4. Random noise: LLMs introduce small variations during text generation to ensure diversity. However, these variations can sometimes result in plausible-sounding but factually incorrect statements, similar to how a message can become distorted in a game of "telephone."
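To make the "random noise" point concrete, here's a toy sketch of temperature-scaled sampling - the mechanism behind those small variations. The tokens and scores below are made up for illustration; real models work over vocabularies of tens of thousands of tokens:

```python
import math
import random

def softmax_temperature(logits, temperature):
    """Convert raw scores into probabilities; a lower temperature
    concentrates probability mass on the top-scoring token."""
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}

# Hypothetical next-token scores for "The capital of France is ..."
logits = {"Paris": 3.0, "Lyon": 1.5, "Prague": 1.0}

cold = softmax_temperature(logits, 0.2)  # near-deterministic
warm = softmax_temperature(logits, 2.0)  # diverse, but riskier

print(f"T=0.2 -> P(Paris) = {cold['Paris']:.1%}")
print(f"T=2.0 -> P(Paris) = {warm['Paris']:.1%}")

# At the higher temperature, a weighted draw occasionally picks a
# plausible-sounding but wrong continuation:
rng = random.Random(0)
draws = rng.choices(list(warm), weights=warm.values(), k=20)
```

At a low temperature the model answers "Paris" almost every time; turn the temperature up and "Lyon" starts slipping through - confidently phrased, factually wrong.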

The Small Language Model (SLM) Advantage

After months of attempting to mitigate these problems through prompt engineering, advanced RAG, and agentic workflows, we still felt we could not get to the kind of accuracy that we wanted. This realisation led us to explore Small Language Models (SLMs).

A major current advantage: fine-tuning SLMs is becoming cheaper daily, thanks to researchers developing creative ways to reduce the compute and effort required in training these models.
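One of those cost-reduction tricks is low-rank adaptation (LoRA-style fine-tuning): instead of updating a full weight matrix, you train a small adapter alongside it. A back-of-the-envelope sketch of why that's cheap - the layer width and rank below are illustrative, not taken from any specific model:

```python
def lora_param_ratio(d: int, r: int) -> float:
    """Fraction of a d x d weight matrix's parameters that a rank-r
    adapter actually trains (two factors: d x r and r x d)."""
    full_params = d * d
    adapter_params = 2 * d * r
    return adapter_params / full_params

# A hypothetical 4096-wide layer with a rank-8 adapter:
ratio = lora_param_ratio(4096, 8)
print(f"trainable fraction of the layer: {ratio:.2%}")  # ~0.39%
```

Training well under 1% of the parameters per layer is why a fine-tuning run that once needed a GPU cluster can now fit on a single card.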

Once fine-tuned, these models become highly specialized tools:

  • Highly Reliable: Gives consistent answers for similar instructions.

  • Cost Efficient: Dramatically reduced token costs - in our experience, as low as 5% of the equivalent LLM cost.

  • Speed: Faster inference times.

  • Security: Enhanced data control with on-premises hosting.

  • Customisation: Models tailored to specific business needs and terminology.

In one of our use cases, switching to SLMs from LLMs achieved:

  • Increased accuracy from 85% to 95%+

  • Decreased cost by 72%

  • Decreased query response time by 40%

This combination of cost savings and improved accuracy presents a win-win situation for enterprises, addressing key challenges we've observed with LLMs in real-world applications.

Our take: Own Your AI

At Apperture, we're betting on self-hosted, open-source Small Language Models (SLMs) trained on proprietary data as the future of enterprise AI. Imagine an AI that truly understands your business, its processes, and jargon. Based on our experience, SLMs offer a more practical, efficient, and reliable solution to the challenges posed by LLMs. For CEOs considering AI implementation, this shift is crucial to understand. While LLMs have their place, SLMs provide a more targeted, cost-effective approach for many business applications.

That’s all.

Stay curious, leaders! See you next week.

How did you like today's email?

  • ❤️ Loved it

  • 💪 Pretty good

  • 💢 Could do better

We'd greatly appreciate your thoughts on the structure, content, and insights provided. Your feedback will help us refine and improve future editions to better serve your needs and interests.
