Releases: benhoyt/goawk
Version 1.17.1
Minor test fixes, no change in functionality.
Version 1.17.0
Now with proper CSV input and output support! Here's a simple example showing CSV input parsing and the new @"named-field" syntax:
$ goawk -i csv -H '{ print @"Abbreviation" }' testdata/csv/states.csv
AL
AK
AZ
...
This feature was sponsored by the library of the University of Antwerp -- many thanks!
Version 1.16.0
- Add interp.New...Execute API to speed up and reduce allocations when executing the same program multiple times. #100
- Add ExecuteContext API to support timeout and cancellation. #103
- Optimized string concatenation when concatenating more than two strings, for example x = "a" "," "b". #99
- Reduce allocations in a few other places, such as print, printf, sprintf(), and field parsing. #102
- Add proper Go 1.18 fuzzing support for fuzzing the AWK source and input. #103
Version 1.15.0
This release adds no new features. It's a significant performance improvement due to switching the internals of the interpreter from a tree-walking interpreter to a bytecode compiler with a virtual machine interpreter.
Results show that it's 18% faster overall on microbenchmarks, 13% on more real-world benchmarks. It should be fully backwards compatible -- please file an issue if you find a regression!
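For readers unfamiliar with the two designs, here is a toy Go sketch contrasting them on arithmetic expressions: walk evaluates the syntax tree recursively, while compile flattens the same tree into instructions that run executes in a loop over a value stack. All names and the instruction set are invented for this illustration and bear no relation to GoAWK's actual opcodes or internals.

```go
package main

import "fmt"

// Expr is a tiny arithmetic AST: a number literal or a binary operation.
type Expr struct {
	Op          byte    // 0 for a literal, otherwise '+' or '*'
	Value       float64 // used when Op == 0
	Left, Right *Expr   // used when Op != 0
}

// walk evaluates the tree directly, recursing into each node.
func walk(e *Expr) float64 {
	switch e.Op {
	case '+':
		return walk(e.Left) + walk(e.Right)
	case '*':
		return walk(e.Left) * walk(e.Right)
	default:
		return e.Value
	}
}

// Opcodes for the toy stack-based virtual machine.
const (
	opPush = iota // followed by an index into the constants table
	opAdd
	opMul
)

// Program is the compiled form: flat instructions plus a constants table.
type Program struct {
	Code   []int
	Consts []float64
}

// compile flattens the tree into postfix-order instructions.
func compile(e *Expr, p *Program) {
	if e.Op == 0 {
		p.Consts = append(p.Consts, e.Value)
		p.Code = append(p.Code, opPush, len(p.Consts)-1)
		return
	}
	compile(e.Left, p)
	compile(e.Right, p)
	if e.Op == '+' {
		p.Code = append(p.Code, opAdd)
	} else {
		p.Code = append(p.Code, opMul)
	}
}

// run executes the instructions in a simple loop over a value stack.
func run(p *Program) float64 {
	var stack []float64
	for i := 0; i < len(p.Code); i++ {
		switch p.Code[i] {
		case opPush:
			i++ // the next slot holds the constant's index
			stack = append(stack, p.Consts[p.Code[i]])
		case opAdd:
			n := len(stack)
			stack[n-2] += stack[n-1]
			stack = stack[:n-1]
		case opMul:
			n := len(stack)
			stack[n-2] *= stack[n-1]
			stack = stack[:n-1]
		}
	}
	return stack[0]
}

func main() {
	// (1 + 2) * 4
	expr := &Expr{Op: '*',
		Left:  &Expr{Op: '+', Left: &Expr{Value: 1}, Right: &Expr{Value: 2}},
		Right: &Expr{Value: 4},
	}
	prog := &Program{}
	compile(expr, prog)
	fmt.Println(walk(expr), run(prog)) // prints "12 12"
}
```

Both paths give the same answer; the bytecode form trades a one-time compile step for a tight dispatch loop with no recursion or pointer chasing at run time.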
Version 1.14.0
This reverts the feature from v1.11.0 which changed the builtin functions length, substr, index, and match to use character indexes instead of byte indexes (as per the POSIX spec). The reason is that the change took those functions from O(1) to O(N), which created "accidentally quadratic" behavior in scripts that expected these functions to be O(1).
For example, @xonixx's gron.awk script on a relatively large JSON input file took about 1s in bytes mode (goawk -b), but 8 minutes (!) in the new Unicode character default mode. That's extremely problematic.
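The cost difference is easy to see in a small Go sketch. Slicing a UTF-8 string by byte offset is a constant-time operation, whereas finding the N-th character means decoding from the start of the string. The helper names below are hypothetical and the bounds handling is simplified (no clamping of out-of-range indexes as AWK's substr does); this is not GoAWK's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteSubstr returns length bytes of s starting at 1-based byte index start.
// Slicing a Go string by byte offset is O(1).
// Simplified: out-of-range arguments panic instead of being clamped.
func byteSubstr(s string, start, length int) string {
	return s[start-1 : start-1+length]
}

// charSubstr returns length characters of s starting at 1-based character
// index start. Locating the character boundaries requires scanning from
// the beginning of the string, so it is O(N) in the string length.
func charSubstr(s string, start, length int) string {
	begin := charOffset(s, start-1)
	end := begin + charOffset(s[begin:], length)
	return s[begin:end]
}

// charOffset returns the byte offset just past the first n characters of s.
func charOffset(s string, n int) int {
	offset := 0
	for ; n > 0 && offset < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[offset:])
		offset += size
	}
	return offset
}

func main() {
	s := "héllo" // 'é' is two bytes in UTF-8
	fmt.Println(len(s), utf8.RuneCountInString(s)) // prints "6 5"
	fmt.Println(charSubstr(s, 2, 3))               // prints "éll"
	fmt.Println(byteSubstr(s, 2, 3))               // prints "él"
}
```

A script that walks a long string one character at a time, calling substr at every position, does O(N) work per call in character mode and so O(N²) overall, which is the kind of slowdown described above.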
Like v1.11.0, this release is again a small breaking change, but once again shouldn't affect many scripts (it will again only affect scripts that use constant indexes for substr on non-ASCII strings). I hope not many people are using interp.Config.Bytes or the goawk -b option yet, as those are gone again. Given that v1.11.0 was introduced only a few weeks ago, I think it's worth the breakage for a performance problem of this magnitude.
Fixes #93: "Major speed regression for gron.awk in goawk 1.11.0+".
Version 1.13.0
Support multi-character and regular expression RS (#86), allowing significantly more powerful text processing. This is a Gawk extension to POSIX, which says, "If RS contains more than one character, the results are unspecified."
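As a rough illustration of what a regular expression record separator means, this Go sketch splits an input into records wherever a pattern matches, which is approximately what a setting such as RS = ";+ *" asks for. It reads the whole input into memory and uses Go's regexp syntax, so it is a simplification rather than a description of how GoAWK reads records; splitRecords is a made-up name.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRecords splits input into records wherever the separator regular
// expression matches, dropping the final empty record left behind by a
// trailing separator. (Hypothetical helper for illustration.)
func splitRecords(input, separator string) []string {
	re := regexp.MustCompile(separator)
	records := re.Split(input, -1)
	if n := len(records); n > 0 && records[n-1] == "" {
		records = records[:n-1]
	}
	return records
}

func main() {
	// Records separated by one or more semicolons and optional spaces.
	input := "alpha; beta;;gamma;"
	for i, record := range splitRecords(input, `;+ *`) {
		fmt.Printf("record %d: %q\n", i+1, record)
	}
}
```

With this input the sketch yields three records: "alpha", "beta", and "gamma".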
Version 1.12.0
This release adds support for "getline lvalue" forms. See #85.
Version 1.11.0
This release changes the handling of the builtin functions length, substr, index, and match to use character indexes instead of byte indexes, as per the POSIX spec.
So this is a small backwards-incompatible change, but I think it's 1) warranted given GoAWK tries to conform to POSIX, and 2) won't break most scripts, even ones that use non-ASCII, unless they use constant indexes for substr on non-ASCII strings. To revert to the previous bytes-index behavior, set interp.Config.Bytes to true when using from Go, or use the new goawk -b option for the command-line version.
This does affect the performance of those builtins, as some operations that were O(1) are now O(N) in the length of the string. Still, v1.10.0 introduced other performance improvements, and it's pretty much a wash on the "real world" benchmarks overall.
Version 1.10.0
This release includes a performance improvement and several bug fixes:
- Only convert numeric string values to number lazily when needed: this gives an 11% average improvement on the real-world benchmarks, and around 40% improvement on my countwords script
- Support ENVIRON special array
- Show parse errors in a simpler, more standard format
- Fix ARGV numeric string handling
- Fix FILENAME numeric string handling
- Fix field numeric string handling: this brings a slight performance decrease, but thankfully the lazy numeric string improvement above more than makes up for it
- Fix parsing of string concatenation with prefix ++ and --
- Fix output synchronization by flushing appropriately
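The lazy numeric-string conversion in the first item above can be pictured with a small Go sketch: a field keeps its text and only parses it as a number the first time a numeric operation asks for it, caching the result. The field type and its methods are invented for illustration, and strconv.ParseFloat is a stand-in whose rules differ from AWK's own number parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// field holds an input field as text and parses it as a number at most
// once, and only if a numeric operation asks for it.
// (Hypothetical type for illustration; not GoAWK's value representation.)
type field struct {
	text   string
	num    float64
	parsed bool
	parses int // counts conversions, for demonstration only
}

// String returns the field text; no numeric conversion is needed.
func (f *field) String() string { return f.text }

// Num returns the numeric value, converting on first use and caching it.
func (f *field) Num() float64 {
	if !f.parsed {
		// Simplification: text with invalid number syntax becomes 0, and
		// ParseFloat accepts some forms that AWK would not.
		f.num, _ = strconv.ParseFloat(strings.TrimSpace(f.text), 64)
		f.parsed = true
		f.parses++
	}
	return f.num
}

func main() {
	fields := []*field{{text: "apple"}, {text: "42"}, {text: "3.5"}}
	// Printing only needs the text, so nothing is converted.
	for _, f := range fields {
		fmt.Println(f.String())
	}
	// Summing two fields converts just those two, once each.
	sum := fields[1].Num() + fields[2].Num()
	fmt.Println(sum)                                // prints "45.5"
	fmt.Println(fields[0].parses, fields[1].parses) // prints "0 1"
}
```

Scripts that mostly print or compare fields as strings never pay for the conversion, which is where a saving of this kind would come from.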
This release also adds the AWKGo AWK-to-Go compiler, but it's a separate executable that doesn't change GoAWK itself at all.
Version 1.9.2
Fix builds on Go versions before 1.17 due to missing +build constraint. Fixes #74.