Any AI tool to analyse a git repo for malicious code?
I'm trying to feel more comfortable using random GitHub projects, basically.
I don't think "AI" is going to add anything (positive) to such a use case. And if you remove "AI" as a requirement, you'll probably get more promising candidates than if you restrict yourself to "AI" (whatever that means) solutions.
I don't care if the solution is AI based or not, indeed.
I guess I thought of it like that because AI seems well suited to the task of working out what a piece of code is probably for in a few seconds or minutes, without you having to review it yourself. I don't know how a non-AI tool could be better at that task.
Edit: so many people against the idea. Have you guys used GitHub Copilot? It understands the context of your repo to help you write the next thing... right? Well, what if you apply the same idea to simply reviewing for malicious/unexpected behaviour in third-party repos? Doesn't seem too weird to me.
AI is quite fit for the task
EXTREMELY LOUD INCORRECT BUZZER
AI is quite fit for the task of understanding what might be the purpose of code
Disagree.
I don’t know how some non-AI tool could be better for such task.
ClamAV has been filling a somewhat similar use case for a long time, and I don't think I've ever heard anyone call it "AI".
I guess Bayesian filters like the ones email providers use to catch spam could be considered "AI" (old-school AI, though, not the kind of stuff that's such a bubble now) and may possibly be applicable to your use case.
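To make the spam-filter analogy concrete, here's a toy naive Bayes classifier over code lines. The training examples and tokenisation are made up purely for illustration; a real tool would need a large labelled corpus and far better features than whitespace-split tokens.

```python
# Toy sketch of a "Bayesian spam filter" applied to lines of code.
# Everything here (training data, tokeniser) is illustrative only.
import math
from collections import Counter

def tokens(line):
    return line.replace("(", " ").replace(")", " ").split()

def train(examples):
    """examples: list of (line, label) pairs -> per-label token counts."""
    counts = {"benign": Counter(), "suspicious": Counter()}
    totals = Counter()
    for line, label in examples:
        counts[label].update(tokens(line))
        totals[label] += 1
    return counts, totals

def score(line, counts, totals):
    """Return the label with the higher naive-Bayes log-probability."""
    best, best_lp = None, -math.inf
    vocab = set(counts["benign"]) | set(counts["suspicious"])
    for label in counts:
        lp = math.log(totals[label] / sum(totals.values()))
        for tok in tokens(line):
            # Laplace smoothing so unseen tokens don't zero everything out.
            lp += math.log((counts[label][tok] + 1) /
                           (sum(counts[label].values()) + len(vocab) + 1))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

examples = [
    ("requests.post(url, data=os.environ)", "suspicious"),
    ("subprocess.run(cmd, shell=True)", "suspicious"),
    ("print(result)", "benign"),
    ("return sum(values)", "benign"),
]
counts, totals = train(examples)
label = score("requests.post(endpoint, data=os.environ)", counts, totals)
```

With this tiny corpus the query line scores as "suspicious" because it shares tokens with the exfiltration example; that brittleness is exactly why this is old-school AI rather than a production scanner.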
AI is quite fit for the task of understanding
Sure, and parrots are amazing at spotting fallacies like cherry picking...
Don't listen to the idiots downvoting you. This is absolutely a good task for AI. I suspect current AI isn't quite clever enough to detect this sort of thing reliably unless the malicious code is very blatant, but a lot of malicious code is fairly blatant if you have the time to actually read an entire codebase in detail, which of course AI can do and humans can't.
For example, the extra "." that disabled a test in xz? I think current AI would easily be capable of highlighting it as wrong. It probably wouldn't be able to figure out that it was malicious rather than a mistake yet, though.
Privado CLI will produce a list of data exfiltration points in the code.
If the JSON output file points out a bunch of endpoints you don't recognize from the README, then I wouldn't trust the project.
Privado likely won't catch a malicious binary file, but your local PC antivirus likely will.
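The "compare flagged endpoints against the README" idea can be scripted. The JSON shape below is hypothetical — check Privado's actual output format before relying on this; only the cross-referencing logic is the point.

```python
# Sketch: flag endpoints in a scanner's JSON report that the README
# never mentions. The report structure here is an invented example.
import json

report = json.loads("""
{"sinks": [
  {"file": "src/telemetry.py", "endpoint": "https://api.example.com/v1"},
  {"file": "src/utils.py",     "endpoint": "https://203.0.113.9/drop"}
]}
""")

readme_text = "This project talks to https://api.example.com/v1 only."

unexpected = [s for s in report["sinks"]
              if s["endpoint"] not in readme_text]
for s in unexpected:
    print(f"unexpected endpoint {s['endpoint']} in {s['file']}")
```

Here the raw-IP endpoint would be flagged, which is exactly the "I don't recognize this from the README" signal described above.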
The solution to what you want is not to analyze the code projects automagically, but rather to run them in a container/virtual machine. Running them in an environment which restricts what they can access limits the harm an intentional backdoor or an accidental bug can do.
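For instance, a minimal Docker invocation for running an untrusted Python project with no network access and a read-only view of the source might look like this (image name and entry command are placeholders for whatever the project actually needs):

```shell
# --network none : no outbound connections, so nothing can phone home
# --read-only    : the container's root filesystem is immutable
# --tmpfs /tmp   : scratch space in case the program needs to write
# -v ...:ro      : mount the repo into the container read-only
docker run --rm --network none --read-only --tmpfs /tmp \
  -v "$PWD":/src:ro -w /src \
  python:3.12-slim python main.py
```

Cutting off the network is the big win here: even if the code does try to exfiltrate something, the request has nowhere to go.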
There is no way to automatically analyze code for malice, or bugs with 100% reliability.
Of course, 100% reliability is impossible even with human reviewers. I just want a tool that gives me at least something, because I don't have the time or knowledge to review a full repo before executing it on my machine.
That is another tool you can use to reduce the risk of malicious code, but it isn't perfect, so using sandboxing doesn't mean you can forget about all other security tools.
There is no way to automatically analyze code for malice, or bugs with 100% reliability.
He wasn't asking for 100% reliability. 100% and 0% are not the only possibilities.
What do you consider malicious, specifically? Because AIs are not magic boxes; they are regurgitation machines prone to hallucinations. You need to train one on examples of whatever it is you want it to identify.
I just want a report that says "we detected, in line 27 of file X, a particular behavior that feels weird, as it tries to upload your environment variables to some unexpected URL".
particular behavior that feels weird
Yea, AI doesn't do feelings.
tries to upload your environment variables into some unexpected URL
Most of the time that is obfuscated and can't be detected as part of a code review. It only shows up in dynamic analysis.
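A minimal illustration of why static string-matching misses obfuscated exfiltration: grepping this source for "evil.example" finds nothing, because the URL only exists after decoding at runtime. (The domain is a made-up example.)

```python
# The suspicious URL never appears as a literal in the source;
# a static scan only sees an opaque base64 string.
import base64

encoded = "aHR0cHM6Ly9ldmlsLmV4YW1wbGUvY29sbGVjdA=="
url = base64.b64decode(encoded).decode()
# Only dynamic analysis (or an egress-blocking sandbox) would see the
# actual request made to `url`.
```

Real-world malware layers far heavier obfuscation than a single base64 pass, which is the commenter's point: the behaviour only shows up when the code runs.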
Perhaps snyk.io. I used it in the past, but I didn't find it very useful; now I have a GitHub action that upgrades dependencies every week. But you want some kind of scanner that's more involved with the actual codebase. Did you look into https://github.com/marketplace?query=security ? That's what I would do, though I've never heard of any of the tools listed there. Let us know your findings after some time if you test 'em ;) good luck!