How we test? The making of a security testing project

1 September 2019

To explain how complicated test automation is, we have to describe a few of the problems we struggled with while monitoring the activity of antivirus products and malicious software in real time. We also describe what it takes for a small company like AVLab to attract the interest of big international corporations that provide security solutions and services for business. In addition, we want to share our knowledge and show readers how many issues turned out to be complex, and how much time we had to spend on apparently insignificant problems. This is our first big project, and taking an engineering approach revealed the full extent of those problems.

All instructions regarding the methodology and tools used in tests were published on the website https://avlab.pl/en/cyber-transparency-audit/methodology/

We have been working on the project for over two years. In that time we have written scripts in NodeJS and Python that manage the whole testing procedure. The AVLab website is also very important: it visualizes interesting test data and everything that happens in the invisible layer, the backend.

Testing environment: VMware, Citrix or VirtualBox?

The beginnings were difficult. We had to start by selecting a hypervisor to manage the machines. For licensing reasons, we first chose the free VirtualBox. Initially it met our requirements: it had everything we needed, namely access to snapshots and management from the command line. In addition, VirtualBox is free for commercial purposes. However, after a few weeks of testing we had registered too many unexpected errors. The virtualizer hung at random moments and could not handle managing multiple machines at the same time. The VirtualBox technology was rough around the edges and highly inefficient. We believe that VirtualBox is a good solution, but one designed for smaller projects or for home use. Then we tried VMware Workstation Pro and Citrix XenServer. We decided on VMware because it was easier to use, has greater community support, and fit our demands better.

The application running in the terminal, which you can see on the left side of the screenshot, has been written in NodeJS and Python. Among other things, its task is to optimize the use of available resources, e.g. to prevent systems from hanging and to avoid other unplanned downtime.

Our testing system can be launched on any Linux distribution, even on a home computer that can handle at least one virtual system. The current version of the application is scalable, so it can use the resources available on a traditional dedicated server or on a cloud server. High computing power is necessary when dozens of machines operate at the same time. From experience we already know that very fast drives matter most. Hard drives such as enterprise SAS are not fast enough, and the same goes for SATA III SSDs. The best suited are commercial NVMe drives, available for example from OVH, Hetzner, or Oktawave. CPU and RAM also play a big role: the more resources, the more efficiently a guest system can work. At the moment we have a server with dozens of CPU cores, very fast drives, and enough RAM to run multiple machines at the same time.

The testing system consists of several components

The whole testing application is like one organism composed of individual elements, each of which operates separately and at the same time cooperates with the others. In effect, we have created an application that does exactly the same thing as a human, but incomparably faster and more efficiently, because it can manage an unlimited number of virtual machines at the same time. It performs a large number of operations because it is not limited by human biomechanics.

Our testing application is composed of the following elements:

1. The local DNS and HTTP/S server

The local DNS and HTTP/S server is responsible for "exposing" malware samples under a random URL, which is passed to the component that manages the tests. From this URL the sample is downloaded to the operating system through the Chrome browser.
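Exposing a sample under a random URL could be sketched as follows. This is a minimal illustration, not the actual AVLab code: the function name expose_sample, the in-memory routes dictionary, and the example host address are all assumptions made for the example.

```python
import secrets


def expose_sample(sample_name: str, host: str, routes: dict[str, str]) -> str:
    """Register a sample under an unguessable path and return its URL.

    `routes` stands in for whatever lookup table the HTTP server uses
    to map a requested path to a file (hypothetical).
    """
    token = secrets.token_urlsafe(16)  # random, URL-safe path segment
    routes[token] = sample_name
    return f"http://{host}/{token}"


if __name__ == "__main__":
    routes: dict[str, str] = {}
    # The host address is a placeholder for the local HTTP/S server.
    print(expose_sample("sample.bin", "10.0.0.1", routes))
```

The returned URL is what would be handed to the component that manages the tests, which then opens it in the browser inside the guest system.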

2. Honeypot network

Samples used in our tests come from attacks on honeypots. Honeypots are traps whose task is to simulate a target vulnerable to attacks and to capture malicious software. We use low-interaction and high-interaction honeypots. Together they emulate services such as SSH, HTTP, HTTPS, SMB, FTP, and TFTP, as well as real Windows systems and email servers.

3. Importer

This element of the application logs into the honeypots once a day and downloads the captured malicious software. Then it calculates the checksum of every file and compares it against those in the database. This way, recipients of our tests can be sure that we never analyze two identical samples.
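The deduplication step can be sketched like this. The article does not say which checksum algorithm is used, so SHA-256 is an assumption, and the set known_hashes stands in for the database lookup; both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_new_samples(paths: Iterable[Path], known_hashes: set[str]) -> list[Path]:
    """Return only files whose checksum has not been seen before.

    `known_hashes` is updated in place; in a real system it would be
    a query against the sample database (assumption).
    """
    fresh = []
    for path in paths:
        checksum = sha256_of(path)
        if checksum in known_hashes:
            continue  # identical sample already imported
        known_hashes.add(checksum)
        fresh.append(path)
    return fresh
```

Two files with different names but identical content produce the same checksum, so only the first one is kept.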

4. Manager

It is the most important element of the application. It automates the whole test procedure:

Manages the honeypot network.
Manages downloading malicious software.
Analyzes malicious software in Windows 10 to check whether each sample is suitable for tests (i.e. whether it is able to infect Windows 10).
Manages virtual machines.
Automatically tests virus samples against all security products on every machine.
Analyzes logs, passing them from the guest systems to the host system.
Manages logs of tested security solutions.
Sends diagnostic information to the database after analysis of each virus sample is completed.

The manager uses the VMware API to ensure that the machines run at the same time. It responds to errors during analysis and, if necessary, repeats operations (e.g. if the system freezes for some reason). Malicious software is placed in a queue, so we know for sure that all protection solutions are tested at the same time on the same virus sample.
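The idea of testing one sample on all machines at once, and repeating an operation when a machine fails, could be sketched as below. This is only an illustration under stated assumptions: the analyze callable is a hypothetical stand-in for the real VMware-driven procedure, a frozen guest is modeled as a RuntimeError, and threads are used only to show the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def analyze_with_retry(
    analyze: Callable[[str, str], str],
    machine: str,
    sample: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call analyze(machine, sample), repeating it if the machine fails."""
    for _ in range(max_attempts):
        try:
            return analyze(machine, sample)
        except RuntimeError:
            continue  # e.g. the guest froze; try the operation again
    return None  # gave up after max_attempts


def run_sample_on_all(
    sample: str,
    machines: Sequence[str],
    analyze: Callable[[str, str], str],
    max_attempts: int = 3,
) -> dict[str, Optional[str]]:
    """Run the same sample on every machine concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(machines))) as pool:
        futures = {
            m: pool.submit(analyze_with_retry, analyze, m, sample, max_attempts)
            for m in machines
        }
        return {m: f.result() for m, f in futures.items()}
```

Samples would be taken from the queue one at a time and passed to run_sample_on_all, so every product sees the same sample in the same round.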

5. Analyzer

This is a very important element of the testing system. First, it logs the events caused by malicious software in Windows 10. Second, it collects information about the reaction of a security product to that software.

6. Parser

Based on the implemented guidelines, it decides whether malicious software has infected a system and whether a security product has reacted to it. The script searches the logs for virus activity and for the reaction of a security product to that activity. The indicators obtained this way are sent to the database.
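A parser of this kind can be sketched as a simple rule matcher. The marker strings below are invented for illustration, since the real guidelines are not given in this article; the Verdict class and the parse_logs function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical markers; the real rules are not published in this article.
MALWARE_MARKERS = ("autorun key created", "file encrypted")
PRODUCT_MARKERS = ("moved to quarantine", "threat blocked")


@dataclass(frozen=True)
class Verdict:
    infected: bool          # malware activity was found in the logs
    product_reacted: bool   # the security product reacted to the sample


def parse_logs(lines: Iterable[str]) -> Verdict:
    """Scan log lines once and report both indicators."""
    infected = False
    reacted = False
    for line in lines:
        text = line.lower()
        infected = infected or any(m in text for m in MALWARE_MARKERS)
        reacted = reacted or any(m in text for m in PRODUCT_MARKERS)
    return Verdict(infected, reacted)
```

The two booleans correspond to the two questions the parser answers: did the sample infect the system, and did the product react.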

Detailed methodology

This article describes the genesis of the testing project. We do not want to cram it with technical information, because there is simply a lot of it. The detailed methodology can be found on the https://avlab.pl/en website in the following sections:

FAQ: https://avlab.pl/en/faq
Methodology in three steps:
- Capturing malware from honeypots: https://checklab.pl/en/methodology/capturing-malware-from-honeypots
- Analyzing system logs: https://checklab.pl/en/methodology/analyzing-system-logs
- Methods of carrying out automatic tests: https://checklab.pl/en/methodology/methods-carrying-out-automatic-tests
Recent results: https://avlab.pl/en/recent-results
Granted certificates: https://avlab.pl/en/awards

These problems have given us a hard time

It was hard, sometimes very hard. More than once we hit a dead end, but we did not give up. The work required dozens of hours of testing, checking many system settings, and improving the automation scripts.

The problems that gave us a really hard time are described below. Please bear in mind that most of these situations probably do not occur during regular use of a computer. They caught us by surprise only once we tried to automate all operations through the VMware API:

Disabling UAC in the Control Panel was not respected for some malware file extensions. The solution was to disable UAC in the Windows registry. Otherwise, we were not able to skip the UAC confirmation through the VMware API.
Downloading malicious software via the Chrome browser was one of the most challenging procedures to automate. Why? If a file has an extension, new versions of Chrome (and also Firefox) do not allow disabling the pop-up that warns about a harmful file, even when the security-related option in chrome://settings is turned off.
The machines are managed through the API of the VMware software. The problem is that we do not really know when the network service is up and running after logging into the system. Consequently, it was not possible to automate the steps from running the browser to downloading a virus with the "chrome.exe hxxp://IP/malware_sampledoc" parameter. This method did not work, because the network service was not ready within the first few seconds after logging in.
Disabling the Windows Defender antivirus was effective only after changing options in the Group Policy Editor. The antivirus would interfere with the analysis of malicious software and send files to the Microsoft cloud, and we did not want that. One of the machines acts as a sandbox system without any security product. We use it to analyze malware for malicious activity, which is why Microsoft's native protection would interfere with the tests.
Running viruses through the API was not the best way of representing a real situation. We have improved the procedure, and we now run samples the way a user runs any program in a system, by clicking on folders and applications.
Logging certain events was not possible when actions were performed with ring -1 privileges at the VMware API level. The hypervisor works "below" the Windows system, so some events cannot be recorded in Windows. We have moved away from doing anything with this method. Instead, we focus on processes and services in Windows that run malicious software natively, and we analyze their logs.
Using a free solution to analyze logs turned out to be too complicated or impossible given our requirements. We needed a tool that does particular things: not just analyzing malware events, but above all detecting the reaction of a product to a malicious file (i.e. whether the antivirus displayed a warning pop-up or moved a file to quarantine).
Testing a number of security products required a dedicated server, which unfortunately increased the project costs significantly. At present, our testing application manages the available resources and runs the optimal number of machines at one time, to avoid an unplanned freeze of the system or the hypervisor.
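One common way to deal with a service that is not ready right after login is to poll for it with a timeout before taking the next step. The sketch below shows that general technique; it is not a description of the solution actually used in the project. The names wait_until and port_open are hypothetical, and the idea of probing the local HTTP server's port is an assumption.

```python
import socket
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def port_open(host: str, port: int, connect_timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address of the local server that exposes the samples.
    ready = wait_until(lambda: port_open("10.0.0.1", 80), timeout=5.0)
    print("network ready" if ready else "network not ready, repeat later")
```

Only once wait_until returns True would the browser be started with the sample URL; otherwise the step can be repeated or reported as an error.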

avlab.pl/en — the results of over 1000 working hours

The results of our work can be seen on the website https://avlab.pl/en.

We attach a few screenshots from the backend. We can't show everything: programming some components cost a lot of money, so they are covered by trade secret.

Special thanks

Przemysław Czekaj from ProCoder.pl

We would like to express our special thanks to Przemysław Czekaj from ProCoder.pl for programming the whole backend on the basis of our guidelines, as well as for additional know-how. His professional approach to the project from the planning phase through testing and implementation, his active help, and his consulting on the functioning and improvement of the application are to be commended. Przemysław's commitment to the creation and development of the project means that we continue to cooperate. We highly recommend Przemysław for his very good knowledge of the Linux system, and as an expert in designing applications, programming, and using third-party APIs.

Rafał Pogroszewski from CreLab.pl

We thank Rafał Pogroszewski for the graphic design of the website, logo, and certificates. We actively work with Rafał at AVLab. We recommend his company CreLab for its very good approach to the customer: his examination of requirements before and during the implementation of a project is thorough.

Robert Grzegorzak from RoburStudio.com

We thank Robert Grzegorzak for assistance with the configuration of the website. We have been working with Robert at AVLab for a few years.

SmartBees.pl company

We thank the SmartBees.pl company, which linked the website with the backend. Based on the entities in the database, they implemented the visualization of charts, tables, and other details. Thanks to their work, the results can be viewed in charts and tables that are understandable to anyone.

Bartłomiej Hawrot from AVLab.pl

We thank Bartłomiej Hawrot for preparing a dedicated server in different configurations many times, and for continuous technical support over a few years already. His knowledge is invaluable, and his experience gained in managing and configuring government websites translates into the security and performance of our servers.

Mateusz Kurlit from AVLab.pl

Our special thanks go to Mateusz Kurlit for preparing the English version of the website. We have been working with Mateusz for several years on the translation of almost all our articles and tests.

Regards also go to other people who shared their knowledge and wanted to remain anonymous.

What are our plans for the future?

The plans are ambitious, but they require time and money. First of all, we want to provide companies with additional services. In the tests, we want to extend the malware delivery vector with additional protocols. This will certainly complicate the automation, so we need time and additional budget. We want to improve the functionality of the virus analyzer and the logging of events in Windows using the publicly available Sysmon tool. We would like to provide users with an interface for submitting files, which could then be added to our tests. But most of all, we do not want to create a second VirusTotal.

VirusTotal is not suitable for giving an opinion on whether or not an antivirus detects anything. We will do our best to prove this below.

During various spam campaigns, many technical websites report that "only a few antivirus applications detect a threat". It is a half-truth, because VirusTotal should not be used to give such opinions. A non-technical person who reads such information can conclude that it is not worth investing in security.

We will show that the results of scanning on VirusTotal have no relation to the actual reaction to a threat run on a real operating system. We conducted similar research in April 2019. It came in handy when we presented facts regarding fair testing. At the Check Point Experience 2019 conference in Poland, we showed that VirusTotal is helpful in analyzing malicious software, but it is not at all suitable for carrying out even amateur tests, because:

Antivirus engines implemented on VirusTotal operate from the command line. As a result, they may not be able to access functionality that forms part of real security suites. In practice this matters: for example, malware that would be blocked by a firewall module in a realistic scenario will not be blocked by an antivirus engine on VirusTotal.

As we read in the official document, antivirus engines on VirusTotal are binary versions operating from the command line. They will not behave exactly the same as the versions we install on computers. In other words, engines implemented on VirusTotal usually do not have a firewall, cloud scanning, a sandbox, HIPS, DLP, script virus blocking, and other modules.

We are tired of repeating that VirusTotal was not designed as a tool to perform antivirus comparative analyses, but as a tool that checks suspicious samples with several antivirus solutions and helps antivirus labs by sending them the malware they have failed to detect. Those who use VirusTotal to perform antivirus comparative analyses should know that they are making many implicit errors in their methodology.
– source: https://support.virustotal.com/hc/en-us/articles/115002094589-Why-do-not-you-include-statistics-comparing-antivirus-performance

For this reason, we join VirusTotal's request and remind everyone not to issue false opinions about security products.

This is the first real reason not to rely on opinions based on VirusTotal scanning. The second reason is probably more relevant, but it is not documented:

In certain cases, malware uploaded to the VirusTotal service through the online panel IS NOT LAUNCHED. The service performs a static analysis of a file, i.e. checksums are calculated, DLLs are extracted, Windows API functions are disclosed, and links with other malicious campaigns are revealed. Every file is scanned by the antivirus engines, but a dynamic analysis is performed only for binary files. Consequently, analyzed EXE files will show virus activity, but VBS scripts, malicious invoices, PDF files, or macro viruses in DOCX files, for instance, will not always do so.

In our security tests we launch every malicious file and use genuine versions of antivirus software, that is, the ones users install on their computers. Next, we collect logs from the whole Windows system, covering both the activity of malware and the response of the tested product to a threat. This is why our tests are popular and enjoy a good reputation among the community and among developers of protection solutions themselves.

Just one more thing…

The total cost of the project slightly exceeded 100 000 PLN. It is hard to tell whether that is a lot or a little; it depends on the size of the budget. On our micro scale, it is a lot of money. The investment was funded from private money, without any subsidies. Now we are waiting to put the tests into practice for the benefit of users, developers, and our small business.

We encourage you to visit the https://avlab.pl/en website, which is entirely dedicated to security tests. And do not forget about https://avlab.pl, which has played an important educational role continuously since 2012.
