Gen AI – Google’s Gemini

So among other things, I am a Kaggler – not sure what that means exactly, but I've spent some time in Jupyter notebooks, and well – an email came in one day about a free course from Google – and I took the bait.

https://rsvp.withgoogle.com/events/google-generative-ai-intensive_2025q1/home

After 5 days – there was an optional capstone project for the 1/4 million (yes, that's 250,000) participants to take on, to demonstrate what they had learned. (Let us pause here for a moment and consider how much energy was consumed by the Google servers to teach that many people about Generative AI – there were little meters on the notebooks that showed CPU usage and whatnot – so I can't imagine that someone at Google hasn't tracked that. It would be interesting to know how much energy it was – and relate it to something tangible – like tanks of gas in an SUV, or seconds of travel for an aircraft carrier, or whatnot.)

Anyway… I couldn't think of anything sexy – but I did have a little pet project where I had some PDFs that I'd have to manually extract information from for a little database – or a "library" as my fellow volunteers call it. Below is the story of the operational pipeline I built using Google's Gemini model with Kaggle's notebook code web UI. Amazing!

PDFs in need of summarization

I had a library of PDFs that required specific information extracted from them, and I was going to manually read each one to extract that information. When I took this course I realized that I could use the Gemini generative models to "read" the PDFs – extract the information I would otherwise have had to manually read and type into a database – and give it to me.

The Start

I used the examples from the first two days to test all the pieces (i.e. read a PDF and summarize it, then progress to crafting a prompt that would extract the specific data I wished for). I noted that even when I had the temperature set to 0.0, the output varied from time to time for certain types of data – particularly the examples given in the course that just asked for a summarization of a PDF certainly did vary. (For this notebook I used PDFs referenced in various lab examples throughout the Gen AI Intensive course.)

Going from an example to an ‘ops’ pipeline

There was quite a bit of difficulty moving from a one-off to a pipeline for an “ops” situation.

API drives and feeds the pipeline

First, I crafted an API that serves a simple JSON array of objects with a url property, and crafted the notebook code to look at each url in a loop. https://pacp.ca/kaggi/api/
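In case it helps to picture it: the feed is just a JSON array of objects, each with a url property, and the notebook loops over the urls. A minimal Python sketch – the sample body below is hypothetical, see the real endpoint above for actual data:

```python
import json

# Hypothetical example of the API's response body: a JSON array of
# objects, each carrying a "url" property for the notebook to process.
SAMPLE_BODY = '[{"url": "https://example.com/a.pdf"}, {"url": "https://example.com/b.pdf"}]'

def extract_urls(body: str) -> list:
    """Parse the API response and return the list of PDF URLs to loop over."""
    return [item["url"] for item in json.loads(body)]

for url in extract_urls(SAMPLE_BODY):
    print(url)  # in the notebook, each url gets fetched and handed to the model
```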

Loopy loops loop lovingly

Then, after much trial and error (assisted by Gemini AND ChatGPT), I was able to write loop code that would ask the model to find and "generate" the information that was needed for each PDF.
I added some code to check that each URL was valid and that it truly pointed at a PDF.
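That validation boils down to two cheap checks – a sanity check on the URL itself and a look at the file's first bytes (every well-formed PDF starts with the `%PDF-` magic header). A sketch of the idea, not the notebook's exact code:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Cheap sanity check: http(s) scheme and a host present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def looks_like_pdf(first_bytes: bytes) -> bool:
    """Every well-formed PDF file begins with the magic header %PDF-."""
    return first_bytes.startswith(b"%PDF-")
```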
As the loop does its job – the results are displayed for the user – a nice thing for debugging.
But the key was storing that information in an array and, at the end, sending it back to the API with the results – completing the loop.
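The collect-and-return step is just an accumulating list that gets POSTed back once at the end. A hedged sketch – the field names and sample data here are placeholders, not the real API's contract:

```python
import json

results = []  # accumulated as the loop processes each PDF

def record_result(url: str, extracted: dict) -> None:
    """Store what the model extracted for one PDF."""
    results.append({"url": url, "data": extracted})

record_result("https://example.com/a.pdf", {"title": "Minutes, March"})
record_result("https://example.com/b.pdf", {"title": "Minutes, April"})

# At the end of the loop the whole batch goes back to the API in one POST,
# e.g. via urllib.request or the requests library.
payload = json.dumps(results)
print(payload)
```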

API receives the results

I stored the results in a separate file – and they can be inspected by calling the API here: https://pacp.ca/kaggi/api/postindex.php

If you run this notebook again – you’ll see two new entries in it.

Extra notes – ethics

Calling an external API from an LLM notebook is not something my particular API wanted to allow – it smelled too much like a "bot", I suppose.

Ironically, when I consulted with Gemini and ChatGPT – I got advice on how to "work around" it by pretending to be a browser. Of course there was nothing said about this being deceptive and a potential abuse – but since it was my own API and I had permission – I proceeded with the hackaround and it worked. This means that I definitely need to add rate limiting and other "measures" to prevent abuse – although CORS theoretically should solve it, I turned that shtuff off for this proof of concept.

Also – one LLM gave me advice I thought was really good – and I randomized a pause in the loop – so that I would "mimic natural usage", whatever THAT means!
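For what it's worth, "mimic natural usage" boils down to a randomized sleep between iterations – something like this sketch (the interval values are my own guess, not the LLM's):

```python
import random
import time

def polite_pause(min_s: float = 1.0, max_s: float = 4.0) -> float:
    """Sleep a random interval between requests so the loop doesn't
    hit the server at machine-gun cadence. Returns the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```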

Summary

This simple little pipeline of abilities – the GenAI summary ability, the ability to read PDFs, the ability to read and write JSON to APIs – is a handy little tool that I'm hoping will save me many minutes of manual labour. Wish me luck!

PDF Processing Take II

Look Mom! All in JavaScript!! Going places!

Going places with JavaScript I never realized I wanted to go – and REALLY HAVING A BLAST!!!

So my experience with learning via LLM has allowed me to try things that in a previous world would have had me looking at a calendar instead of my watch. This is one of them.

I had a situation where I needed to add a watermark to a PDF, and all my experience had been in C# tooling. BUT… I had been learning a few things lately: a) LLMs can get me access to working examples of code I might have previously shied away from because the styling was wonky – and now my path from PoC (Proof of Concept) to V1.0 was only a few steps. Basically, V1.0 was not a PoS because my PoC was polished right from the start – all thanks to LLM styling proficiency (and the ascendancy of cool styling options like Tailwind CSS); and
b) JavaScript (and the browsers and processing power in consumers' hands) has matured much more than I had realized. A real "schooling" I'm glad I got!

So within hours (not days) I was able to find, test, hone and add features that would otherwise have been out of scope, because I was getting results within minutes instead of hours or days. It's little projects like this that have really impressed me with the value-add of LLMs. I realized that it's a goldilocks story – not every use case and "wanna feature" fits the situation, but this one was "just right" all the way through.

How it works

I used only two libraries: Tailwind CSS (of course) and pdf-lib – the library recommended by the LLMs. (I did a few minutes of vetting to make sure it wasn't someone's CS110 homework on GitHub.) I specified the functionality I wanted to the LLM and then integrated the output into a single .html file. (Look at the code behind the URL to see the value that the LLMs provided me.)

Three easy features in quick succession:

Large Watermark:

Add a large, page-sized watermark. Instant results. I then tweaked it to allow the user to specify the text of the watermark – which could give undesired results in the output – but that's the use case I wanted: it's for a power user who's willing to craft it a bit to get a better experience.

The LLM gave me working code that tested out so quickly I started expanding the scope:

Re-order Pages:

I had a wonky tool that allowed me to move, reorder and delete pages in a PDF, and I never liked it – so I tried this instead – and wow! it was pretty painless. I regularly scan documents that are double-sided booklets, and the scanning renders the pages in the wrong order. But since they are consistent, I was able to make preset page orders for 12- and 8-page "raw" scans – and the output re-arranges them into the correct order. Win-Win!
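The tool itself is JavaScript, but the preset idea is language-neutral, so here's a small Python sketch of it. The two presets below are my own guess at a saddle-stitched duplex scan order (outermost sheet first) – not the tool's actual presets:

```python
# Hypothetical presets: PRESETS[n][i] is the reading-order page number that
# a raw n-page booklet scan puts at scan position i (outermost sheet first).
PRESETS = {
    8:  [8, 1, 2, 7, 6, 3, 4, 5],
    12: [12, 1, 2, 11, 10, 3, 4, 9, 8, 5, 6, 7],
}

def reorder(pages: list, preset: list) -> list:
    """Rearrange scanned pages back into reading order using a preset."""
    out = [None] * len(pages)
    for scan_pos, reading_page in enumerate(preset):
        out[reading_page - 1] = pages[scan_pos]
    return out

print(reorder([8, 1, 2, 7, 6, 3, 4, 5], PRESETS[8]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the scans are consistent, one fixed mapping per booklet size is all it takes – no per-document fiddling.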

Tiny Watermark:

And then, in the time I had left, I really scope-creeped this little project when I imagined a nearly invisible watermark to deal with "edge case" PDF uses. This worked very well. Win-Win-Win!

Imagined Future Features:

  1. Adding metadata to the PDF
  2. Form filling
  3. HTML or markup to PDF
  4. Form creation

Summary

This tiny project blew my socks off in a way that later left me almost embarrassed at my naivety! I had written PDF manipulation tools before, and the coding and testing took days, and while the output was what I wanted – I hated the UI I had jury-rigged together. This was a re-imagining of the entire process, and I'm ashamed I hadn't given JavaScript a serious look until LLMs pointed me to it. One of my bonus design goals was to never share the PDF with the server – all the data and all the work stayed in the browser – and I was able to stay true to that. And it didn't hurt that my HTML UI crafting had improved since those days – culminating in the "aha! it all comes together" moment.

C# PDF fun…

So… I was looking at some cool JavaScript libraries that edit PDFs – and I was blown away! I'll make a post after I have something to show – but this reminded me of something I did almost 10 years ago with an older open source library called iTextSharp – something originally created in Java and then ported to C#. It was for a one-time project, and when I opened it up again after many, many years – I realized that I had a backup plan: if the JavaScript just didn't cut it (or if my IIS hosting stack didn't properly support it), I'd have this in my back pocket.

Here’s the public repo: https://github.com/gitberry/pdfTool

There's not a lot to say except that my original use case was heavy on copying individual pages around and filling forms – so the functionality of this library was not fully utilized. There's functionality that allows images and QR codes to be inserted – and it was cool to play with that again. It's a plain ol' Windows app – have fun!

C.R.U.D. & More!!!!

When I was a kid – crud was stuff that got stuck in the sink and needed a brush to remove – today we work with something much more positive! Create, Read, Update and Delete – the basic building blocks of truly functional web sites – and I was ready to see what I could do with VueJS. I've been around the block a couple of times, and like real life – this project expanded to fill needs that go beyond the basic CRUD examples you see out there and dips its toes into "real life" issues – like authentication, permissions, mobile-first design, and delete vs. inactivate. Read my story – or go to the bottom to see the links to run the demo or view the code.

Front-End

Building on previous VueJS work I'd demonstrated – this also uses Vue 3, Vite and Pinia to build out the front-end functionality of Edit, Create, Delete (and Un-delete) of data that a backend provides. Each component was designed with the browser emulating a smaller phone screen – so it was obvious if something didn't work on the small screen. A version 2.0 may use "cards" – which can look good on small, medium and large screens alike.

Needless to say – the two 'ends' know the structure of the data, and this started as a simple demonstration of that functionality that extended into authentication and permissions. Certain users can update – others can only view.

Back-End

A standard .NET API app – Entity Framework, JWT token handling, etc., with the common element being the structure of the data – which in this case started with a POCO (Plain Old CLR Object) and evolved to include odd tricks, because I designed a general-purpose SQL table that I wanted to use for multiple classes. Don't do that unless you have a good reason – my reason is that I knew I'd be building more demos and just wanted to reduce table proliferation on my SQL servers…

When I was starting – I was thinking big, so I added class libraries to separate out the various functionality – JWT, crypto, users, data. That was an exercise in over-engineering – but I'm not really sorry – I intend to re-use them in the future.

This was an exercise in “let’s build everything from first principles – ground up” – so I won’t try to suggest that my approach was superior. (My approach certainly was not superior when it came to my data model – more on that later.)

Data

If I had wanted to make a quick little demo, then this would have been a failure – but my goal was to learn and expand my skillset. Like a lot of projects, this evolved into working through a few little features that I wanted to take beyond simple. For the data – yes, I should have just created a simple table, made an EF model of it, and just coded the darn thing – but I decided to make something that would use a common table and then "magically" transform it into a JSON object (or a list/array of JSON objects), and well… you can look at the code for yourself to see how complicated that got. After doing it, I wanted to redo it leveraging the C# JToken (I intend to report on that another time) – but basically that wasn't super clean either – mainly because simplicity was at cross purposes with the desire to build a data handling library that allows me to "code-once-deploy-many" – with only the configuration files specifying the data models. The search for flexibility and simplicity continues…
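The real code is C# over SQL, but the shape of the trick can be sketched in a few lines of Python: rows from one general-purpose (entity, attribute, value)-style table get folded into per-entity JSON objects. This is my illustration of the idea, not the project's actual table layout:

```python
def rows_to_objects(rows):
    """Fold (entity_id, attribute, value) rows from one general-purpose
    table into a list of JSON-ready dicts, grouped by entity."""
    objects = {}
    for entity_id, attr, value in rows:
        objects.setdefault(entity_id, {"id": entity_id})[attr] = value
    return list(objects.values())

# Hypothetical rows, in the spirit of the demo's funny-song data:
rows = [
    (1, "title", "Werewolves of London"),
    (1, "publisher", "Asylum"),
    (2, "title", "Short People"),
]
print(rows_to_objects(rows))
```

The appeal is that one table serves any class; the cost is that every query and every write has to go through a transform like this instead of a plain EF model.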

Permissions

This was another feature that ended up within the "code-once-deploy-many" goal. Because I hadn't settled on a user infrastructure – and I really didn't want to add another database structure to this demo – I create the users dynamically from a string in configuration. Pros and cons for sure – and those users had to hold the permissions for data elements, and because I was determined not to add extra tables (i.e. the normalized permissions that a fully functioning database would typically provide), they ended up being config strings inside of config strings. A kludgey solution that I'd not recommend for a real database-oriented backend.
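To make "config strings inside of config strings" concrete, here's a Python sketch of the kind of parsing involved. The format ("user:perm,perm;user:perm") is my invention for illustration – the demo's real config keys live in its web.config:

```python
def parse_user_permissions(config: str) -> dict:
    """Parse users and their permissions out of a single config string.
    Hypothetical format: "user:perm,perm;user:perm"."""
    users = {}
    for entry in filter(None, config.split(";")):
        name, _, perms = entry.partition(":")
        users[name.strip()] = {p.strip() for p in perms.split(",") if p.strip()}
    return users

# One string configures everyone – kludgey, but zero extra tables:
USERS = parse_user_permissions("happy:view,edit;sad:view")
```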

JWT & Crypto

This was demonstrated in my earlier JSON Web Token demo: https://demo.northberry.net/jwt – what's different is that I moved them into two separate projects to keep the code "clean". In hindsight – that was probably overkill – a simple folder of classes would probably have sufficed. It made the testing full of references and made the project heavier and more complicated than necessary.

SQL

I love data – so the database part was a welcome break into the fun: creating random data sets, merged with a real list of songs with odd or funny names, so that the data had proper publishers, creators, created & edited dates, etc.

Working together

So in the real world – you don't just have CRUD – a user needs to be authenticated (or not – as per Contact Us forms), and permissions granted and enabled/disabled as per the site's specs. So not only is basic CRUD demonstrated, but authentication (JWT) is used to enforce a minimum permission level. I also attempted to build configurable permissions (via web.config) from the ground up on first principles. After doing it I'm not sure I like it, but part of that was trying to keep the scope of a demo project low and resorting to "static" users. (Albeit configurable static users – which may be an oxymoronic phrase.)

If you look at the API backend code you'll see that, at deployment, users can be configured with access (or not) to editing features. For this example, try editing with the user "sad" and you'll discover that editing doesn't work! There's a delicate play between the frontend and the backend – for example, the user holds the permissions for editing, but the frontend never knows that – it just tries to edit and save, and if it works – it works. This is inspiring a 2.0 version that communicates the failure back to the user – for a better experience. Nonetheless – this is a solved problem in more mature systems – this was simply an experiment in moving beyond basic CRUD and into the practical problems of mature, fuller-featured sites.

Before you look at the Code: A (non) apology for naming…

Over the years I've found that frameworks often usurp common names, and trying to use common English words such as object, list, token, crypto, data, database, etc. has regularly put me afoul of Microsoft & team's frameworks – so I INTENTIONALLY MISSPELL classes and functions. They don't conflict with common frameworks and are a lot easier to find when searching. Anyway, have fun perusing the code:

The frontend code: https://github.com/gitberry/crud-demo-frontend

The backend code: https://github.com/gitberry/crud-demo-api

And to see it all work together on a real site – here’s my deployment of it:

https://demo.northberry.net/crud/


JWT – Json Web Tokens – a little example

I'd read about JWTs over the years – but all the frameworks I'd been working in had their own thing going, and I never had a chance to dive in and try it myself. But I'm curious, and I had some frontend tech I was learning, so I decided to get my "hands dirty" and build something. Here's my short story of building a duo: an HTML5 frontend paired with a .NET 4.7 backend, demonstrating JSON Web Token authentication, generation, and basic use cases.

Frontend

I'm exploring Vue 3, Vite and Pinia, and Jason Watmore's was by far the best frontend example I could find (as of January 2024). So the frontend code was cloned from Jason's lovely example from May of 2022 (https://jasonwatmore.com/post/2022/05/26/vue-3-pinia-jwt-authentication-tutorial-example; GitHub here: https://github.com/cornflourblue/vue-3-pinia-jwt-authentication-example).

I wanted to learn the backend as well – so I commented out Jason's faked backend code (which worked wonderfully, by the way – I just wanted to test the backend I was writing) and added some funny song names in order to demonstrate how well his code handled authentication via a backend API. If you're so inclined – set the minutes-to-live on your API to 1 minute and watch what happens 1 minute later: it gracefully logs out the user and gives gentle suggestions about logging in – and if they click on anything, it takes them directly to the login dialog. (The demo I made available to the public is set to 2 minutes, by the way, if you don't want to load the code and deploy it just to see this work.)

Something I wanted to test was what happens if someone starts messing with the URLs and tries to log in WHILE LOGGED IN – and initially it let them get a new token. I wasn't sure that this was a great thing (and it's debatable – so I won't get all rigid on you if you disagree), so I modified my backend to NOT grant a token while logged in and to throw a 417 (Expectation Failed), and then modified Jason's example to handle it by giving the user gentle suggestions about logging in while already logged in… (I also learned some weirdness which I can only attribute to various servers' idiosyncrasies around status codes and messages.)

Backend

I have a hosted IIS server plan – and although they recently started hosting the newer .NET Core frameworks – I decided, for my first attempt, to write the API in a stack I had more experience with – and since my coding and deployment skills evolved along with .NET into the current 4.* framework over the past 2 decades – I chose that to keep my learning curve somewhat manageable. I was inspired by Bilal Shahzad's posting https://www.c-sharpcorner.com/article/asp-net-web-api-2-creating-and-validating-jwt-json-web-token/, and used that as my starting point. When I had questions, I cross-referenced it with a posting I found by "Sean" https://decatechlabs.com/secure-webapi-using-jwt, who had an actual repository I could peruse and test: https://github.com/seanonline/Webapi_JWT_Authentication.

Once the authentication tested out – I threw a list of funny songs into a data structure and used it in a new API path for testing the frontend behavior when tokens expire or are tampered with.
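The backend is .NET, but the HS256 mechanics are the same in any language, so here's a stdlib-only Python sketch of what creating and validating a JWT amounts to: base64url(header).base64url(payload), signed with HMAC-SHA256, plus an exp check on the way back in. A teaching sketch, not the demo's actual code:

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _unb64url(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def make_jwt(payload: dict, secret: bytes) -> str:
    """header.payload signed with HMAC-SHA256 (the HS256 alg)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)

def verify_jwt(token, secret):
    """Return the payload dict if the signature and exp check out, else None."""
    signing_input, _, sig_b64 = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64url(sig_b64)):
        return None  # tampered token or wrong secret
    payload = json.loads(_unb64url(signing_input.split(".")[1]))
    if "exp" in payload and payload["exp"] < time.time():
        return None  # expired
    return payload
```

A token produced this way pastes straight into jwt.io for inspection, like any other HS256 token.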

Working Together

I stubbed out an edit piece in the frontend to make the demo feel complete. As you see it today, the app demonstrates authentication, JWT token generation, presentation of the token by the frontend in the header, validation of the token in the header by the API when data is requested, and handling of some edge-case situations. I made the frontend show an Editing component – but stubbed it out so it doesn't actually save any data. (I did not want my demo to actually talk to a database – that'd be beyond the scope of this particular demo – check out my longer demo project on mobile-first CRUD.)

The goal was to make the two apps work on the same domain in order to avoid Cross-Origin Resource Sharing (CORS) issues. But as it was – I had to code for that possibility, because the IDE running the .NET 4.7 backend could not (or rather SHOULD not) run on the same port as the IDE running the Vue 3 dev environment. So if you look in the backend code – devs can specifically enable web calls from external sites via config settings.

So today – we have two little apps coexisting in the same location (a neat trick in its own right – look in the web.config of the HTML app) that show a user how to log in and view data (which requires a valid token), offer an about page available to both unauthenticated and authenticated users, and demonstrate to interested developers how editing might look on a simple app like this if they chose to further develop their frontend and backend API.

If you care to look at my code:

Testing tools:

  • https://jwt.io – helps inspect the tokens – particularly nice because you don't need to decode them yourself: leave them base64-encoded as generated and paste them right in to be reviewed.
  • https://www.epochconverter.com – helps with the exp(iry) value – a Unix time value…
  • I would have had to use something like Postman or Fiddler if I hadn't had my own demo frontend, where I could use the native browser dev tools to inspect values etc…