Rahul Gopinath
rahul@gopinath.org
Lecturer at the University of Sydney, Australia. ശ്രീദേവി's Dad. I work in the junction between Software Engineering and Cybersecurity. Interested in Program Analysis, Automatic Repair, Mutation Analysis, Specification Mining, Grammar Based Generators and Parsing. My website is at https://rahul.gopinath.org

NotebookLM is awesome, but very much a work in progress. First, while I can provide multiple sources, there is no way to preview PDFs, or even download them, once I have added them as source material. Second, the source guide's text extraction leaves a lot to be desired. While the RAG is very useful, when I click on a citation, I wish it took me to the actual document rather than an extracted summary of it. I also wish there were a way to query individual documents separately.

Danushka Liyanage, postdoctoral researcher in the Software Engineering group at the University of Sydney, at his convocation.

Congratulations Dr. Liyanage!


When I first attended a conference (OOPSLA 2002), my parents paid for my economy flights from NZ to the USA, I got a spot as a student volunteer with free rego, and I stayed at a friend's house a one-hour bus ride away from Seattle. To have to pay $1K extra to present is insane!!!

It costs $0 to publish a paper in POPL/PLDI/ICFP/OOPSLA: if accepted, it goes into the PACMPL journal, and you can choose to present at the conference if you can afford it. It also costs $0 to publish a paper in TOPLAS. These are by far the top PL venues in the entire world.

Google has just published their scholar ranking for publication venues here (The link is for software systems).

I am somewhat new to the world of complex spreadsheets (Never had to use them before except as pretty CSV viewers). I am surprised that Excel does not allow us to rename columns to intuitive names so that I can say `=Total/Max` rather than `=S2/T2`. Is there any spreadsheet that allows this?

I have posted about a related issue before, but it seems that it is time to take a harder look at using CVEs as the touchstone for the effectiveness of security tools. It has become far too easy to produce CVEs (even high-severity ones) because there is limited oversight in the whole process. If you are a security researcher wondering how to evaluate your tool, please consider using Mutation Analysis as the metric. It is a well-researched technique that can reliably show how your tool performs, and give you insight into where it can improve.
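To make the idea concrete, here is a minimal sketch of mutation analysis as a metric: inject small faults (mutants) into a program, run the tool on each mutant, and report the fraction it detects. Everything in it is an illustrative assumption: the `in_bounds` subject program, the single "swap the comparison operator" mutation, and the two toy checkers `weak_tool` and `strong_tool` that stand in for a real security tool. It needs Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical subject program: a bounds check, where off-by-one faults matter.
SOURCE = """
def in_bounds(index, length):
    return 0 <= index and index < length
"""

# One simple mutation operator: relax or tighten a comparison.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

def _sites(tree):
    """All (Compare node, operator index) pairs that the operator can mutate."""
    return [(node, i)
            for node in ast.walk(tree) if isinstance(node, ast.Compare)
            for i, op in enumerate(node.ops) if type(op) in SWAPS]

def generate_mutants(source):
    """Yield one mutated source string per mutable comparison."""
    for k in range(len(_sites(ast.parse(source)))):
        tree = ast.parse(source)          # fresh tree, so each mutant has one fault
        node, i = _sites(tree)[k]
        node.ops[i] = SWAPS[type(node.ops[i])]()
        yield ast.unparse(tree)

def load(source):
    """Execute a source string and return its in_bounds function."""
    namespace = {}
    exec(source, namespace)
    return namespace["in_bounds"]

def weak_tool(fn):
    """Stand-in tool: reports a fault, but only probes far from the boundaries."""
    return not (fn(1, 3) is True and fn(7, 3) is False)

def strong_tool(fn):
    """Stand-in tool: also probes the boundary values."""
    return not (fn(0, 3) is True and fn(2, 3) is True
                and fn(3, 3) is False and fn(-1, 3) is False)

def mutation_score(source, tool):
    """Fraction of mutants on which the tool reports a fault."""
    mutants = list(generate_mutants(source))
    detected = sum(1 for mutant in mutants if tool(load(mutant)))
    return detected / len(mutants)

if __name__ == "__main__":
    print("weak tool:  ", mutation_score(SOURCE, weak_tool))
    print("strong tool:", mutation_score(SOURCE, strong_tool))
```

Neither toy checker flags the original program, yet only the second one notices the injected off-by-one faults. The mutants that survive also point at exactly which inputs a tool is failing to exercise.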

We have reworked the integration with the #mybinder platform, and you can now again interact with notebooks right in your browser (now using #JupyterLab instead of Jupyter Notebook).
As an example, here’s the notebook on fuzzing with grammars: mybinder.org/v2/gh/uds-se/fuzz

You can access these from any chapter in fuzzingbook.org via “Resources” → “Edit as Notebook”. Enjoy!

Just recently read the paper "Delving into ChatGPT usage in academic writing through excess vocabulary" by Kobak et al. Their premise (from the abstract) is that the [models] can produce inaccurate information, reinforce existing biases, and can easily be misused. So, the authors analyse PubMed abstracts for vocabulary changes, and identify certain words that have become more common post-LLM. They find that words such as "delves", "showcasing", "underscores", "intricate", "excel", "pivotal", "encompassing", "enhancing" all show increased usage, and are hence suspect.
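A rough sketch of the general idea, and not the authors' actual pipeline: compare the fraction of abstracts that contain each word before and after some cutoff, and flag the words whose share grew the most. The two tiny corpora below are invented for illustration, and `doc_frequency`, `excess_words`, and the `min_gap` threshold are hypothetical names and choices of mine.

```python
import re

def doc_frequency(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = {}
    for text in abstracts:
        for word in set(re.findall(r"[a-z]+", text.lower())):
            counts[word] = counts.get(word, 0) + 1
    return {word: n / len(abstracts) for word, n in counts.items()}

def excess_words(before, after, min_gap=0.4):
    """Words whose document frequency rose by at least min_gap, largest rise first."""
    p_before, p_after = doc_frequency(before), doc_frequency(after)
    gaps = {word: p - p_before.get(word, 0.0) for word, p in p_after.items()}
    flagged = [word for word, gap in gaps.items() if gap >= min_gap]
    return sorted(flagged, key=gaps.get, reverse=True)

# Invented toy data, NOT real PubMed abstracts.
BEFORE = [
    "We study the effect of drug X on mice.",
    "This paper reports the results of a trial.",
    "We measure the effect of diet on sleep.",
    "The results show a small effect.",
]
AFTER = [
    "This study delves into the effect of drug X on mice.",
    "We report results showcasing a pivotal effect.",
    "Our analysis delves deeper and finds an intricate effect of diet.",
    "The results show a small effect.",
]

if __name__ == "__main__":
    print(excess_words(BEFORE, AFTER))
```

On a corpus this small the threshold is arbitrary. With real data one would want a baseline for how much word frequencies drift from year to year anyway, so that ordinary topical shifts are not mistaken for LLM influence.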

While this data is indeed interesting, I wonder why LLMs tend to use these words. Aren't LLM outputs supposed to be more of a reflection of the data they are fed in training? Surely that means that these words are more common in some data set than we expect?