How I Use AI Daily: My Personal Voice-to-Insight System

I’ve built a system that uses AI to help manage my thoughts and emotions. It’s not perfect, but it works for me. Here’s how I use it every day.

Process Overview

It’s pretty straightforward:

  1. I talk to my watch, just like leaving a voice message.
  2. My home server picks up the new audio file.
  3. It turns my words into text.
  4. The text goes through an AI that “thinks” about what I said.
  5. Both the original text and the AI’s thoughts end up in my note-taking app.
  6. My watch receives the AI’s response within a few seconds.

All of this happens automatically, locally, without me doing anything beyond the initial voice memo.

Challenges Addressed

I often have thoughts or ideas that I need to process, especially when it comes to managing stress or negative thinking patterns. Writing them down or typing them out isn’t always convenient or quick enough. I needed a way to capture these thoughts effortlessly and process them automatically.

My Voice-to-Insight Pipeline

Here’s what I built:

  1. Record: I speak a voice memo into my Samsung Watch.
  2. Sync: The recording syncs to my smartphone automatically.
  3. Transfer: Folder Sync Pro sends the file to my Unraid server on my home network.
  4. Transcribe: A bash script on my server sends the audio to a local Whisper ASR service for transcription.
  5. Process: The transcription goes to a Node-RED flow on my network.
  6. Enhance: A local Large Language Model (LLM) analyzes and enhances the transcription.
  7. Store: Both the original transcription and the AI-enhanced version end up in my self-hosted Memos application.
  8. Notify: A Pushover notification delivers the LLM's response back to my watch.
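The notify step is a single HTTP POST to Pushover's standard messages endpoint. Here's a minimal sketch; the token and user variables are placeholders for real credentials, and since Pushover caps messages at 1024 characters, the LLM response gets trimmed first:

```shell
#!/bin/bash

# Pushover limits messages to 1024 characters; trim before sending.
trim_message() {
    printf '%.1024s' "$1"
}

# Send the LLM's response as a push notification via Pushover.
# PUSHOVER_TOKEN and PUSHOVER_USER are placeholders for real credentials.
notify_pushover() {
    message="$(trim_message "$1")"
    curl -s --max-time 15 \
        -F "token=$PUSHOVER_TOKEN" \
        -F "user=$PUSHOVER_USER" \
        -F "title=Voice Note Insight" \
        -F "message=$message" \
        https://api.pushover.net/1/messages.json
}
```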

The Tech Behind It

Here’s what I’m using:

Hardware:

All of this is on my local network. I keep my data at home.

Software:

Which LLM Model?

I rely on Meta’s Llama 3.1 8B for quick processing of my initial voice notes; it hits the “good enough” mark about 80% of the time. When it doesn’t, I revisit the note and decide whether to run it through a larger (still local) model like Gemma 2 27B.
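Today that escalation is a manual judgment call, but a size-based version could be automated. A hypothetical sketch, assuming Ollama-style model tags and a made-up word-count threshold:

```shell
#!/bin/bash

# Choose a local model tag based on note length.
# The 400-word threshold and the Ollama-style tags are illustrative only.
pick_model() {
    transcript="$1"
    words=$(printf '%s' "$transcript" | wc -w)
    if [ "$words" -gt 400 ]; then
        echo "gemma2:27b"      # longer notes go to the larger model
    else
        echo "llama3.1:8b"     # quick pass for short notes
    fi
}
```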

If I’m comfortable sharing personal content, or if a voice note is especially long, Claude (paid) is my go-to for more detailed processing and synthesis into actionable playbooks.

Key Features

  1. Effortless: I just speak, and everything else happens automatically.
  2. Private: All my data stays on my network.
  3. Customizable: I can tweak any part of the system.
  4. AI-enhanced: The AI adds context and suggestions to my thoughts.

How I Use It

I’ve set this up primarily for CBT-style (Cognitive Behavioral Therapy) thought processing. When I’m stressed or notice a negative thought pattern, I speak it into my watch. By the time I check my notes, the AI has already offered a different perspective or pointed out cognitive distortions.
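My actual prompt isn't shown here, but to give a sense of the shape, a CBT-style instruction for the LLM might look something like this (the wording is entirely hypothetical):

```shell
#!/bin/bash

# Hypothetical CBT-style instruction prepended to each transcript
# before it is sent to the local LLM (my real prompt may differ).
CBT_PROMPT='You are a supportive companion familiar with CBT techniques.
Read the note below, acknowledge the feeling, point out any cognitive
distortions you notice, and suggest one alternative perspective.
Do not give medical advice.'

# The final prompt is the instruction plus the raw transcript.
build_cbt_prompt() {
    printf '%s\n\nNote:\n%s\n' "$CBT_PROMPT" "$1"
}
```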

It’s not about replacing my own thinking. It’s about augmenting it, challenging it, and giving me a starting point for further reflection.

What’s Next

I’m always tinkering with this system. Here are some changes I’m considering:

  1. Building a custom app: To replace Node-RED and streamline the process.
  2. Adding a printer: For physical copies of important insights.
  3. Keyword-based processing: To handle different types of thoughts in specialized ways.
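The keyword-based processing in item 3 could start as a simple dispatcher on the transcript's opening words. A sketch of the idea, with made-up categories and trigger keywords:

```shell
#!/bin/bash

# Route a transcript to a processing style based on a leading keyword.
# Categories and trigger words are illustrative only.
route_note() {
    case "$1" in
        [Ii]dea:*)     echo "brainstorm" ;;   # expand and list next steps
        [Tt]odo:*)     echo "task" ;;         # extract actionable items
        [Gg]rateful:*) echo "gratitude" ;;    # journal-style summary
        *)             echo "cbt" ;;          # default: CBT-style reflection
    esac
}
```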

Closing Thoughts

This system has become an integral part of my daily life. It helps me manage stress, catch negative thought patterns quickly, and gain insights I might have missed.

At its core, this project is about using technology to support personal growth. It’s an example of how we can create tools that adapt to our specific needs, rather than the other way around.

Personal Example

To give you a better idea of how this system works in practice, here’s an example based on a real event that happened to me, recreated for the purpose of this post.

Voice Transcript:

I went to Starbucks earlier for a little bit, and I was trying to get maybe an hour of work done. But there was this toddler that was grabbing a chain for the window shade and just making all the noise possible with it. The mom and dad were distracted with the real estate agent, and I didn’t really know how to process any of it in the moment. I didn’t want to have any negative feelings towards a little toddler, and I can understand that the parents are maybe distracted, but at the same time, I just had mixed feelings about the whole thing. And I ended up leaving earlier than I wanted to.

AI-Enhanced Reflection:

The AI generated the following response for me to reflect on:

It can be really frustrating when you’re trying to focus on work or relax in a public space, only to have it disrupted by something like that.

First of all, kudos to you for not wanting to have any negative feelings towards the toddler! It’s completely understandable to feel a bit annoyed or frustrated in that situation, but choosing to react with compassion and understanding is really great.

It sounds like the parents were also dealing with their own stressors, what with the real estate agent there. Sometimes it feels like everyone around us is stressed out and overwhelmed, which can be draining to deal with.

Leaving earlier than you wanted to is totally okay! You took care of yourself in that moment, even if it meant not getting as much work done as you had hoped. Self-care is important too!

If you don’t mind me asking, what would have made your experience better? Would you have liked the parents to intervene or find a way to soothe their little one more quickly? Or was it just a matter of needing some quiet time yourself?

Scripts

Node-RED Flow

You can find it here.

Transcribe new audio files

#!/bin/bash

# Set the path to the watched folder
watched_folder="/mnt/user/Inbox/FromWatch"

# Set the URL of the Whisper ASR service
whisper_url="http://yourwhisperserver.home.arpa:6234/v1/audio/transcriptions"

# Set the URL of your Node-RED endpoint
node_red_url="http://yournoderedserver.home.arpa:1880/transcription"

# Set up logging
log_file="/var/log/whisper_transcription.log"

# Function for logging
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$log_file"
}

# Function to transcribe an audio file using the Whisper ASR service
transcribe_file() {
    local audio_file="$1"
    local json_file="$2"
    log "Transcribing file: $audio_file"
    # Note: curl sets the multipart Content-Type (with boundary) itself when -F is used
    response=$(curl -s -w "\n%{http_code}" -X POST \
         -F "file=@$audio_file" \
         -F "language=en" \
         "$whisper_url")
    http_code=$(echo "$response" | tail -n1)
    content=$(echo "$response" | sed '$d')

    if [ "$http_code" -eq 200 ]; then
        echo "$content" > "$json_file"
        log "Transcription successful. JSON saved to: $json_file"
        return 0
    else
        log "Error transcribing file. HTTP Code: $http_code, Response: $content"
        return 1
    fi
}

# Function to send the transcription result to Node-RED
send_to_node_red() {
    local json_file="$1"
    log "Sending transcription to Node-RED: $json_file"
    response=$(curl -s -w "\n%{http_code}" -X POST -H "Content-Type: application/json" -d "@$json_file" --max-time 30 "$node_red_url")
    http_code=$(echo "$response" | tail -n1)
    content=$(echo "$response" | sed '$d')

    if [ "$http_code" -eq 200 ]; then
        log "Successfully sent to Node-RED"
        return 0
    else
        log "Error sending to Node-RED. HTTP Code: $http_code, Response: $content"
        return 1
    fi
}

# Function to process a single file
process_file() {
    local file="$1"
    log "Received file for processing: $file"
    if [[ "$file" =~ \.(mp3|wav)$ ]]; then
        log "Processing new audio file: $file"
        local audio_file="$watched_folder/$file"
        local json_file="$watched_folder/${file%.*}.json"

        if transcribe_file "$audio_file" "$json_file"; then
            if send_to_node_red "$json_file"; then
                log "File processed successfully: $file"
            else
                log "Failed to send to Node-RED: $file"
            fi
        else
            log "Failed to transcribe file: $file"
        fi
    else
        log "Ignoring non-audio file: $file"
    fi
    log "Finished processing file: $file"
}

# Main script
log "Starting Whisper transcription script"
log "Watching folder: $watched_folder"

# Watch for new files
log "Starting inotifywait"
inotifywait -m -e close_write -e moved_to --format '%f' "$watched_folder" | while read -r file; do
    log "inotifywait detected new file: $file"
    process_file "$file"
    log "Finished processing. Waiting for next file..."
done
log "inotifywait loop ended unexpectedly"
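The script assumes curl, tee, and inotifywait (from the inotify-tools package) are on the PATH. A small preflight helper, not part of the original script, can catch a missing binary before the watch loop starts:

```shell
#!/bin/bash

# Report any required commands that are missing from PATH.
# Returns an empty string when everything is available.
check_deps() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    printf '%s' "$missing"
}

# Example usage before starting the watcher:
#   missing=$(check_deps curl tee inotifywait)
#   [ -n "$missing" ] && { echo "Missing:$missing"; exit 1; }
```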