UpNorthClark




  • Gen AI – Google’s Gemini

    So among other things, I am a Kaggler. Not sure what that means, but I had spent some time in some Jupyter notebooks, and when an email came in one day about a free course from Google, I took the bait.

    https://rsvp.withgoogle.com/events/google-generative-ai-intensive_2025q1/home

    After five days there was an optional capstone project for the quarter-million (yes, that's 250,000) participants to demonstrate what they had learned. (Let us pause here for a moment and consider how much energy the Google servers consumed teaching that many people about Generative AI. There were little meters on the notebooks showing CPU usage and whatnot, so I can't imagine that someone at Google hasn't tracked it. It would be interesting to know how much energy it was, and to relate it to something tangible, like tanks of gas in an SUV, or seconds of travel for an aircraft carrier.)

    Anyway, I couldn't think of anything sexy, but I did have a little pet project: a stack of PDFs I would otherwise have to manually extract information from for a little database, or a "library" as my fellow volunteers call it. Below is the story of the operational pipeline I built using Google's Gemini model in Kaggle's notebook web UI. Amazing!

    PDFs in need of summarization

    I have a library of PDFs that require specific information to be extracted from them, and I was going to read each one manually to pull that information out. When I took this course, I realized I could have a Gemini generative model "read" the PDFs, extract the information I would otherwise have typed into the database by hand, and give it to me.

    The Start

    I used the examples from the first two days to test all the pieces (i.e., read a PDF and summarize it, then progress to crafting a prompt that would extract the specific data I wanted). I noticed that even with the temperature set to 0.0, the output varied from run to run for certain kinds of requests; in particular, the course examples that simply asked for a summarization of a PDF certainly did vary. (For this notebook I used PDFs referenced in various lab examples throughout the Gen AI Intensive course.)
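
    For context, here is roughly what that early test looked like. This is a minimal sketch, not my actual notebook code: it assumes the google-genai Python SDK, a placeholder API key, a placeholder PDF URL, and the gemini-2.0-flash model name.

        import requests
        from google import genai
        from google.genai import types

        client = genai.Client(api_key="YOUR_API_KEY")     # placeholder key

        # Fetch a sample PDF and ask for a plain summary.
        pdf_bytes = requests.get("https://example.com/sample.pdf", timeout=60).content

        response = client.models.generate_content(
            model="gemini-2.0-flash",
            contents=[
                types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
                "Summarize this document in three sentences.",
            ],
            # Temperature 0.0 should be as deterministic as it gets,
            # yet the summaries still varied from run to run.
            config=types.GenerateContentConfig(temperature=0.0),
        )
        print(response.text)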

    Going from an example to an ‘ops’ pipeline

    Moving from a one-off example to a pipeline for an "ops" situation took quite a bit of work.

    API drives and feeds the pipeline

    First, I built an API that serves a simple JSON array of objects, each with a url property, and wrote the notebook code to visit each URL in a loop: https://pacp.ca/kaggi/api/
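
    Something like this, to give the idea. The endpoint is the one above; the function name and the exact JSON shape are my illustration of the "array of objects with a url property" that the API serves.

        import requests

        API_INDEX = "https://pacp.ca/kaggi/api/"          # the endpoint above

        def fetch_pdf_urls():
            # The API serves a JSON array of objects, each with a "url" property,
            # e.g. [{"url": "https://example.com/report.pdf"}, ...]
            resp = requests.get(API_INDEX, timeout=30)
            resp.raise_for_status()
            return [item["url"] for item in resp.json()]

        for pdf_url in fetch_pdf_urls():
            print(pdf_url)                                # each URL feeds the loop below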

    Loopy loops loop lovingly

    Then, after much trial and error (assisted by Gemini AND ChatGPT), I was able to write loop code that asks the model to find and "generate" the information needed from each PDF.
    I added some code to check that each URL was valid and truly pointed to a PDF.
    As the loop does its job, the results are displayed for the user, a nice thing for debugging.
    But the key was storing that information in an array and, at the end, sending it back to the API with the results, completing the loop. A sketch of the whole loop follows.
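
    Again, this is a sketch rather than my actual capstone code: the extraction prompt is a stand-in for the real one, the POST target is hypothetical (I won't name the exact endpoint the results go to here), and fetch_pdf_urls() is the helper sketched above.

        import requests
        from google import genai
        from google.genai import types

        client = genai.Client(api_key="YOUR_API_KEY")     # placeholder key
        RESULTS_ENDPOINT = "https://pacp.ca/kaggi/api/"   # hypothetical POST target

        results = []
        for url in fetch_pdf_urls():                      # helper sketched above
            # Validate the URL and make sure it is truly a PDF (magic-number check).
            resp = requests.get(url, timeout=60)
            if resp.status_code != 200 or not resp.content.startswith(b"%PDF"):
                print(f"skipping {url}: invalid URL or not a PDF")
                continue

            # Ask the model to "generate" the needed information from the PDF.
            answer = client.models.generate_content(
                model="gemini-2.0-flash",
                contents=[
                    types.Part.from_bytes(data=resp.content, mime_type="application/pdf"),
                    "Extract the fields I need, as JSON.", # stand-in for the real prompt
                ],
                config=types.GenerateContentConfig(temperature=0.0),
            )
            print(answer.text)                            # shown as we go; nice for debugging
            results.append({"url": url, "extracted": answer.text})

        # Complete the loop: send everything back to the API in one POST.
        requests.post(RESULTS_ENDPOINT, json=results, timeout=30)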

    API receives the results

    I store the results in a separate file, and they can be inspected by calling the API here: https://pacp.ca/kaggi/api/postindex.php

    If you run this notebook again, you'll see two new entries in it.

    Extra notes – ethics

    Calling an external API from an LLM notebook is not something my particular API wanted to allow; the requests smelled too much like a "bot," I suppose.

    Ironically, when I consulted Gemini and ChatGPT, I got advice on how to "work around" this by pretending to be a browser. Neither said anything about this being deceptive and a potential abuse, but since it was my own API and I had permission, I proceeded with the hackaround, and it worked. This means I definitely need rate limiting and other "measures" to prevent abuse. CORS won't solve it on its own, since it only constrains browsers, not server-side scripts like mine, and I turned that stuff off for this proof of concept anyway.
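
    The workaround amounts to sending a browser-like User-Agent header with the request. The exact string below is illustrative, and the endpoint and payload are the ones from the loop sketch. Only do this against an API you own or have explicit permission to call this way.

        import requests

        # Pretend to be a browser so the server stops rejecting the request as a bot.
        browser_headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
        }

        requests.post(RESULTS_ENDPOINT, json=results, headers=browser_headers, timeout=30)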

    Also, one LLM gave me a piece of advice I thought was really good: I randomized a pause in the loop so that I would "mimic natural usage," whatever THAT means!
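
    Inside the loop, that pause is just a couple of lines. The two-to-eight-second range here is my illustration, not a number anyone prescribed.

        import random
        import time

        # At the bottom of each loop iteration: pause a random 2 to 8 seconds
        # between PDFs to "mimic natural usage" and go easy on both APIs.
        time.sleep(random.uniform(2, 8))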

    Summary

    This simple little pipeline of abilities (GenAI summarization, reading PDFs, reading and writing JSON to APIs) is a handy little tool that I'm hoping will save me many minutes of manual labour. Wish me luck!
