BashSupportPro and indexing performance

jay.guidos · November 21, 2020, 6:45pm

We are still loving the new BSP, and use it extensively!

Lately I have been leveraging #shellcheck source … directives and the code completion and navigation capabilities that provides are a game changer for us.

One challenge we are starting to face is indexing performance. We have a pretty large BASH function library that we are tying into, admittedly not your average kind of ‘little bit of BASH’ projects. When I make a simple change to a script I see the CPU jump and stay up, for as much as 30 seconds or more. While this background processing is going on, while I can still type and edit there is no longer any code completion or error checking going on.

This is not an occasional thing, it happens each time I edit a file. I am sure it is because of the large amount of back referenced code that the BASH checker must grind its way through in order to finish validation and parsing, but it is becoming a usability problem. Maybe there is something we can do to improve things? The vast bulk of the BASH libraries change rarely, maybe there is some kind of parse cache alternatives?

We are open to suggestions!

Jay

jansorg · November 21, 2020, 7:47pm

Hi Jay,
thanks for reporting. This definitely isn’t the way how it’s supposed to be.
The plugin is supposed to support large source sets and I’ll do my best to improve this.

When your typing into the editor, the plugin performs steps like resolving variables and functions. This uses all previously sourced files and also all files, which source the current file. An index is used to make this fast, but it’s obviously not doing well with your project.

A few questions:

Are the sources public? If yes, then I’d test and profile this myself. But I’m assuming that this isn’t possible.
Can you tell how large your Bash codebase is, e.g. lines of code and number of files?
Is the high CPU usage only found when typing into a large file or also when typing into a small file, which sources your bash function library?
Anything else seeming wrong, e.g. exceptions popping up? Had you noticed things like a StackOverflowException?

You could post a CPU snapshot to help me diagnose this.
https://intellij-support.jetbrains.com/hc/en-us/articles/207241225-Performance-testing-plugin has details how to do that. Make sure to enable the plugin Performance Testing (available on the marketplace, if it’s not already bundled with your IDE).

Help > Diagnostic Tools > Start CPU Usage Profiling
Work with the slow bash scripts, make sure to let highlighting finish and wait until cpu usage is down again
Help > Diagnostic Tools > Stop CPU Usage Profiling
Send the cpu profile to mail@bashsupport.com.

Screenshot_0691143×1080 108 KB

Another thing you could do is this (not at the same time as CPU profiling):

Enable logging of BashSupport Pro. Add #bashpro to the debug settings at Help > Diagnostics > Debug Log Settings...
Confirm the dialog, work with the slow bash scripts, then collect the logs via Help > Collect Logs and Diagnostic Data
Send the collected data (a zip) to mail@bashsupport.com

Screenshot_067765×609 29.2 KB

Thanks!
Joachim

jay.guidos · November 21, 2020, 9:46pm

Hi there Joachim,

I sent an email with some CPU profiling and debug logs.

To answer your questions:

The sources are not public, but I included a small portion of some non-proprietary code in the email.
Our codebase for this project is perhaps 3000-5000 lines of BASH in about 15 files.
I can see heavy CPU usage when editing a BASH file only 20 lines long, with several include directives pulling in a large number of dependent functions
Nothing else seems wrong, no evidence of sickness.

To test while profiling/debugging, this is what I did:

opened a 30 line BASH script that used source include directives
selected a line that used a variable like this: “${beSeq}”
started profiling when the CPU was quiet
edited the line to remove the quotes
watched the CPU grind at high levels for about 40 seconds
saw a warning appear (use quoted variables to avoid word splitting)
ended profiling

One final thing - the library of BASH code is slurped up in a series of nested source include directives, in that when a single file is included, the included file has one or more includes of it’s own (and each subsequent include may include other include directives). So while the file I am editing may only have 1 include directive, there may be as many as 15 files that have to be scanned in order to complete the dependency traversal.

Thanks for looking into this so quickly! I am eager to hear what you might learn next.

Jay

jay.guidos · November 21, 2020, 9:54pm

One last thing that might be a factor Joachim,

Some of our BASH coders prefer NOT to us suffixes to identify their files. This admittedly causes some problems in other areas both in Linux and in Intellij, where file types are usually identified by suffix alone. You recently added the capability to recognize BASH files by looking for #shellcheck directives and/or hash bang lines in the file. This was a great feature that stopped our team’s ‘Suffix Wars’!

I don’t recall exactly when I started to see performance problems, but it has been a recent thing. Perhaps this new scanning feature is part of the problem?

Jay

jansorg · November 21, 2020, 10:32pm

Thanks for the details, that should help to track this down.
(I haven‘t received the email, the email provider (Google) might have rejected attachment data. I‘ll let you know with a better way to share the data when I need it.)

5000 lines of code shouldn‘t be an issue here, but I‘ll keep it in mind.
Files without extension are treated in the same way. IntelliJ is applying slightly more content scanning, but only when these files change.

A chain of source commands is supported, but a likely cause of this issue — especially with a non-trivial graph of dependencies. I‘ll look into this next week and hope to have a fix soon.

Joachim

jansorg · November 21, 2020, 10:43pm

One more thing, your note about the warning appearing after cpu usage was back to normal might indicate an issue with ShellCheck. The more recent versions of the plugin pass -x To make ShellCheck look into sourced files.
You could try to run the latest version of ShellCheck as shellcheck -x file.bash and watch cpu usage.

But I‘ll do that for my tests anyway.

jay.guidos · November 21, 2020, 11:20pm

I was using shellcheck 0.4.3, the latest available version is 0.7.1.

I updated to 0.7.1 and ran it using the -x option on the same file I used to test performance in IntelliJ, and the results came back in less that 1 second. I don’t think shellcheck itself is the performance bottle neck.

Just to get more information I tried removing the shellcheck include directives from my little test BASH script and then repeated the same edit test. Surprisingly, the CPU still jumped for a long time - maybe 20 seconds this time (instead of 40 seconds when there is included files). While this is an improvement there is clearly still some kind of issue, as my test script is only 30 lines of very simple BASH.

jansorg · November 23, 2020, 8:44pm

This boils down to a bunch of issues.
I’ve started to work on fixes, but it’s still unclear if these are fixing all of your problems.

BashSupport goes into files, which include the currently edited file. That’s needed to resolve variables declared in shared file and a few more things. That was most likely causing the CPU spikes of your last post.

I’d publish a snapshot version soon, if you wanted to give this a try. Are you on 2020.2?

Joachim

jay.guidos · November 23, 2020, 10:23pm

Sure Joachim, I will try out what ever you want to chuck over to me I am on the latest release, 2.2.3

“A bunch of issues” - nothing like field testing code, right?

Jay

jay.guidos · November 23, 2020, 10:24pm

Did you get my uploaded files? I did include one of our base libraries, it has a bunch of non-trivial BASH functions in them that should exercise parsers well.

J

jansorg · November 24, 2020, 2:23pm

Hi @jay.guidos,
thanks for helping to test this!
There’s now version 1.5.11 at https://plugins.jetbrains.com/plugin/13841-bashsupport-pro/versions/eap.
That’s an eap version, so you need to download the zip file and install either via Drag&Drop or with Settings > Plugins... > Install from disk.

The new version should perform much better now in projects with many sourced files.
I already have automated tests to parse the most popular GitHub shell projects, but performance tests were too few. I’ve tested this version with a project, which has 100+ files linked via source commands.

There’s a new setting at Settings > Languages&Frameworks > BashSupport Pro to control if the automatic completion popup should display items from files, which source the currently edited file. Enable this when you’re missing declarations in completions — or request completions by shortcut.
It’s off by default – but it might change for the public release.

If editing, code completion, etc, is still (too) slow for your project:

could you send me a new CPU snapshot, please?
do you see an improvement or none at all?
what hardware are you using? Slower machines or machines with few cores might be slowed down by ShellCheck running in the background. ShellCheck runs for a while for large files and/or files with many source commands.
Try to disable ShellCheck by turning off Settings>Editor>Inspections>BashSupport Pro>ShellCheck to see if it helps.
Are things better with File > Power Save Mode turned on?

Joachim

jay.guidos · November 24, 2020, 3:41pm

Hi Joachim,

I can’t seem to install the EAP:

Do I have to uninstall the current one first?

Sorry, this is my first attempt at using an EAP plugin

Jay

jansorg · November 24, 2020, 3:51pm

@jay.guidos I see what happened. You installed the first version on that list, which is for 2020.3.
Install the 2nd one, which should work for you: https://plugins.jetbrains.com/plugin/13841-bashsupport-pro/versions/eap/103128

No uninstall needed, it’s replaced after the restart.

jay.guidos · November 24, 2020, 10:56pm

Hi Joachim,

Good news! What ever you changed in the EAP seems to have fixed things. I did the same test change as before. With the last plugin CPU jumped for 20-30 seconds and no check results appeared until the CPU went back down. Now, I make a change that should issue a warning in exactly the same place in the same file, and not only does the CPU not even flicker the warning I expect appears essentially instantly.

Well done!

Jay

jansorg · November 25, 2020, 9:17am

Glad to hear that! I’ll polish things a bit and will release the fixes later this week.

Joachim

jansorg · November 28, 2020, 1:30pm

I’ve published version 1.5.12 on the stable channel. I had to change a few things of the original attempt, but also improved a few more slow pieces of the software. Overall, this version should be much better than 1.5.10. Thanks for providing feedback and support to test this!

Joachim

jay.guidos · November 28, 2020, 4:06pm

Thanks Joachim!

I did notice a few hiccups here and there in your EAP, but I figured you were focusing on the performance issues first. The parser seemed to get confused in a few places, I am sure the polished stable version will be fine.

Thanks again for all your support in getting this to work. I have said it before, this plugin is a game changer in how to write clean, reliable and error-free BASH!

Jay

jansorg · November 28, 2020, 4:52pm

There was a bug in the eap, which prevented “go to declaration” (aka ctrl+click) on some variables, which is fixed in the stable version.
If you notice other regressions or bugs, please report it and I’ll see what I can do to fix it.

Joachim

jay.guidos · November 28, 2020, 9:01pm

Hi Joachim,

I am not sure if this is related to your changes or its a new thing, but here is a problem:

The highlighted code reports a SC2154, but it isn’t even a variable so how can it be unassigned, right? If I try to do an inline disable directive, I get this:

So the parser is not clear where to put the directive, in this case in the line above the directive.

I trimmed the code down to just this function in a single file, and it seems to still show the problem. Here is the actual code, maybe you can reproduce?

#!/usr/bin/env bash
# shellcheck disable=SC2155,SC1090,SC2164

function awsSqsQueueMsgRead()
{
    local sqsUrl="${1:?}"
    local message instanceId instanceIp receiptHandle instanceState

    message=$(aws sqs receive-message --queue-url "${sqsUrl}" --max-number-of-messages 1)
    instanceId=$(echo "${message}"|jq -r .Messages[].Body)
    if [[ -z "${instanceId}" ]]; then
        echo "No instances available!"
        return 2
    fi

    receiptHandle=$(echo "${message}"|jq -r .Messages[].ReceiptHandle)
    aws sqs delete-message --queue-url "${sqsUrl}" --receipt-handle "${receiptHandle}"
    instanceState=$(aws ec2 describe-instances --instance-ids "${instanceId}" --query 'Reservations[0].Instances[0].State.Name' --output text)

    if [[ "${instanceState}" -ne "running" ]]; then
        echo "Got a dead instance (not running) off the queue: ${instanceId}"
        return 1
    fi

    instanceIp=$(
        aws \
            ec2 \
                describe-instances \
                --instance-ids "${instanceId}" \
                --query 'Reservations[0].Instances[0].NetworkInterfaces[0].PrivateIpAddress' \
                --output text
                )

    if [[ ${instanceIp^^} == "NONE" ]]; then
        echo "Got a dead instance (no IP) off the queue: ${instanceId}"
        return 1
    fi

    cat - <<JSON
{
    "instanceId" : "${instanceId}",
    "hostname" : "ip.${instanceIp//\./-}.aws-test.local",
    "ip" : "${instanceIp}"
}
JSON
}

jansorg · November 28, 2020, 10:39pm

Well, welcome to the wonderful world of Bash scripting
You almost made me report this to ShellCheck (that’s where the warning is coming from), it’s definitely confusing…

But it’s actually correct, as far as I can tell.
See the last paragraph of Bash Conditional Expressions (Bash Reference Manual)

When used with the [[ command, Arg1 and Arg2 are evaluated as arithmetic expressions

String comparision is actually [[ "${instanceState}" != "running" ]], comparing integers uses -ne and other operators (see man bash or help test).
This means that [[ 42 -ne running ]] is an integer comparison and equivalent to [[ 42 -ne $((running)) ]], which is a reference to a variable named running.

You can test this with:

# prints "not equal"
state=123; running=234; [[ "${state}" -ne "running" ]] && echo not equal || echo equal
# prints "equal"
state=123; running=123; [[ "${state}" -ne "running" ]] && echo not equal || echo equal

Write your code as [[ "${instanceState}" != "running" ]] for string comparison, and you’re good to go

(The wrong position of the inserted disable directive is a bug, I’ll fix that for the next update. The last update did not change here, so it shouldn’t be a new thing or a regression.)

Joachim