Rahul Gopinath
rahul@gopinath.org
Lecturer at the University of Sydney, Australia. ശ്രീദേവി's Dad. I work at the intersection of Software Engineering and Cybersecurity. Interested in Program Analysis, Automatic Repair, Mutation Analysis, Specification Mining, Grammar-Based Generators, and Parsing. My website is at https://rahul.gopinath.org

I am somewhat new to the world of complex spreadsheets (I never had to use them before, except as pretty CSV viewers). I am surprised that Excel does not let us rename columns to intuitive names so that I can write `=Total/Max` rather than `=S2/T2`. Is there any spreadsheet that allows this?

I have posted about a related issue before, but it seems it is time to take a harder look at using CVEs as the touchstone for the effectiveness of security tools. It has become far too easy to produce CVEs (even high-severity ones) because there is limited oversight in the whole process. If you are a security researcher wondering how to evaluate your tool, please consider using Mutation Analysis as the metric. It is a well-researched technique that can reliably show how your tool performs and provide you with insights into where you can improve.
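
For anyone who has not used it before, here is a rough Python sketch of what mutation analysis boils down to; the tiny program, the operator flips, and the deliberately weak test are my own illustration, not the output of any particular tool:

  import ast

  # A rough sketch of mutation analysis (illustrative only; real mutation tools
  # do far more): seed small syntactic faults ("mutants") into a program and
  # count how many of them the test suite detects ("kills").

  SRC = "def add(a, b): return a + b"
  MUTATIONS = [ast.Sub, ast.Mult]            # a + b  ->  a - b,  a * b

  class Mutate(ast.NodeTransformer):
      def __init__(self, op):
          self.op = op
      def visit_BinOp(self, node):
          self.generic_visit(node)
          if isinstance(node.op, ast.Add):   # replace + with the chosen operator
              node.op = self.op()
          return node

  def make_mutant(src, op):
      tree = Mutate(op).visit(ast.parse(src))
      ast.fix_missing_locations(tree)
      return compile(tree, '<mutant>', 'exec')

  def test_suite(env):
      # A deliberately weak test: it cannot tell addition from multiplication.
      assert env['add'](2, 2) == 4

  killed = 0
  for op in MUTATIONS:
      env = {}
      exec(make_mutant(SRC, op), env)
      try:
          test_suite(env)                    # mutant survives: a gap in the tests
      except AssertionError:
          killed += 1                        # mutant killed: the fault was caught

  print(f'mutation score: {killed}/{len(MUTATIONS)}')   # 1/2 for this weak test

A surviving mutant points exactly at what the tool or test suite fails to distinguish, which is the kind of actionable feedback a CVE count cannot give you.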

We have reworked the integration with the #mybinder platform, and you can once again interact with notebooks right in your browser (now using #JupyterLab instead of Jupyter Notebook).
As an example, here’s the notebook on fuzzing with grammars: mybinder.org/v2/gh/uds-se/fuzz

You can access these from any chapter in fuzzingbook.org via “Resources” → “Edit as Notebook”. Enjoy!

Just recently read the paper "Delving into ChatGPT usage in academic writing through excess vocabulary" by Kobak et al. Their premise (from the abstract) is that the [models] can produce inaccurate information, reinforce existing biases, and can easily be misused. So the authors analyse PubMed abstracts for vocabulary changes and identify certain words that have become more common post-LLM. They find that words such as "delves", "showcasing", "underscores", "intricate", "excel", "pivotal", "encompassing", and "enhancing" all show increased usage, and are hence suspect.
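
To make the idea concrete, here is a toy Python sketch of an excess-vocabulary comparison; it is not the authors' method (they fit per-year frequency trends over millions of PubMed abstracts), just a raw frequency ratio over two made-up corpora:

  import re
  from collections import Counter

  # Toy "excess vocabulary" analysis: compare how often each word appears
  # before and after a cutoff, and rank words by the increase. The corpora
  # below are invented; the paper works at PubMed scale.

  def counts(abstracts):
      c = Counter()
      for text in abstracts:
          c.update(re.findall(r"[a-z]+", text.lower()))
      return c

  def excess_ratio(word, pre, post):
      # add-one smoothing so words unseen before the cutoff do not divide by zero
      f_pre = (pre[word] + 1) / (sum(pre.values()) + 1)
      f_post = (post[word] + 1) / (sum(post.values()) + 1)
      return f_post / f_pre

  pre_2023 = counts(["we study the effect of x on y",
                     "results show a small effect of x"])
  post_2023 = counts(["we delve into the pivotal role of x",
                      "our findings underscore the intricate effect of x on y"])

  for w in sorted(post_2023, key=lambda w: excess_ratio(w, pre_2023, post_2023),
                  reverse=True)[:5]:
      print(f"{w:12s} {excess_ratio(w, pre_2023, post_2023):4.2f}x")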

While this data is indeed interesting, I wonder why LLMs tend to use these words. Aren't LLM outputs supposed to be largely a reflection of the data they are fed during training? Surely that means these words are more common in some part of the training data than we would expect?

Our paper "Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants" on evaluation on models for estimating equivalent and killable mutants were accepted by ESEM 2024.  The paper is here. #ESEM2024 #Equivalentmutants #mutationanalysis

I am visiting ANU in Canberra on Friday. If you are around and are interested in what I do, please come talk to me.

Parsing JSON is indeed a minefield. However, a commenter on HN has a suggestion: use PostScript instead of JSON. It has a binary format, has comments, and generally looks much better. Here is their provided example:

  <<
    /first_name (John)
    /last_name (Smith)
    /is_alive true
    /age 27
    /phone_numbers [
      <<
        /type (home)
        /number (212 555-1234)
      >>
      <<
        /type (office)
        /number (646 555-4567)
      >>
    ]
    /spouse null
  >>

And I agree, it is much better than JSON. There are many other interesting things to like here. For one, the keys are symbolic: a single character, `/`, marks the keys in a dictionary (itself delimited by `<<`), which reduces visual clutter a great deal. Using `<<` for dictionaries is also great: dictionaries are among the largest data-carrying units in such formats, and it makes sense to spend two characters on their delimiters. Using `()` for strings gives them distinct starting and ending delimiters, which is visually easier to parse than `"`. There are no commas in arrays or dictionaries, removing the question of trailing commas. Overall, PON (PostScript Object Notation) is much better designed than JSON for human readability.
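
Out of curiosity, here is a rough Python sketch (mine, not the commenter's) of how little code it takes to read this subset back into Python dicts and lists; real PostScript semantics, comments, and nested parentheses in strings are out of scope:

  # Minimal reader for the PON subset above: dictionaries (<< >>), arrays ([ ]),
  # (...) strings, /name keys, and the bare words true, false, null, integers.

  def tokenize(s):
      tokens, i = [], 0
      while i < len(s):
          c = s[i]
          if c.isspace():
              i += 1
          elif s.startswith(('<<', '>>'), i):
              tokens.append(s[i:i + 2]); i += 2
          elif c in '[]':
              tokens.append(c); i += 1
          elif c == '(':                               # (...) delimits a string
              j = s.index(')', i)
              tokens.append(('str', s[i + 1:j])); i = j + 1
          else:                                        # /keys and bare words
              j = i
              while j < len(s) and not s[j].isspace():
                  j += 1
              tokens.append(('word', s[i:j])); i = j
      return tokens

  def parse(tokens):
      tok = tokens.pop(0)
      if tok == '<<':                                  # dictionary
          d = {}
          while tokens[0] != '>>':
              key, value = parse(tokens), parse(tokens)
              d[key] = value
          tokens.pop(0)
          return d
      if tok == '[':                                   # array
          a = []
          while tokens[0] != ']':
              a.append(parse(tokens))
          tokens.pop(0)
          return a
      kind, val = tok
      if kind == 'str':
          return val
      if val.startswith('/'):
          return val[1:]                               # /name key -> plain string
      if val in ('true', 'false'):
          return val == 'true'
      if val == 'null':
          return None
      return int(val)

  pon_text = '''
  <<
    /first_name (John)
    /is_alive true
    /age 27
    /phone_numbers [ << /type (home) /number (212 555-1234) >> ]
    /spouse null
  >>
  '''
  record = parse(tokenize(pon_text))
  print(record['phone_numbers'][0]['number'])          # -> 212 555-1234

The symbolic `/` keys and the two-character `<<` `>>` delimiters also keep the tokenizer trivially simple, which says something about how parser-friendly the notation is.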

Are you attending the Singapore Summer School on Fuzzing? Here's what my students and I have planned for Monday, alongside more great talks and tutorials by Abhishek Arya, Marcel Böhme, Lim Min Kwang, Mathias Payer, and Thuan Pham. Details at fuzzing.comp.nus.edu.sg

I am co-founding a new startup! #InputLab creates test data for thousands of formats, from electronic invoices to retail orders, covering all input features. We just passed the initial evaluation toward up to €800k in public funding to start as a #CISPA spin-off in September.

We are #hiring – notably test experts – and we are looking for #collaboration partners and early #adopters from industry and the public sector. Check us out at inputlab.net! #startup #XML #softwaretesting