Copyleaks services were designed from the ground up to scale, so that they can handle heavy workloads while maintaining high performance. The scan process itself was built to minimize bottlenecks.
To take full advantage of these capabilities, follow this article, which covers everything you need to get the best performance from your integration.
Once the scan is complete, the "Completed" webhook informs you that the results are ready. In most cases, you will then download all the materials (results, source document, PDF report, etc.) to your servers.
Our traditional method for downloading scan results was to send a separate REST HTTP call for each result you wanted to download. This option is slow, so we introduced a solution with much better performance: the "Export" method.
With the "Export" method, you specify exactly which scan information you would like to download, and the Copyleaks API pushes it directly to your servers.
Some more tips to gain even higher speed:
- If your storage system also exposes a REST HTTP interface (for example, Google Cloud Storage, Microsoft Azure Storage, or AWS buckets), we can push the data directly into it without putting any significant load on your servers. Point us directly to your storage system and we will store the data there for you. You can also use signed URLs according to your needs.
- Does your storage also support gzip on requests? Add the "Accept-Encoding: gzip, compress" header to the export headers. In this case, we will compress the data before transmitting it over the network.
Overall, using the "Export" method substantially reduces download time and cuts the processing resources needed on both sides.
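As a rough illustration of the tips above, the sketch below assembles an export request body that points each result at a signed storage URL and asks for compressed transfer. The field names (results, id, verb, endpoint, headers), the build_export_body helper, and the example URL are assumptions made for illustration only; the exact schema is defined in the Export documentation.

```python
import json


def build_export_body(result_ids, signed_url_for, use_gzip=True):
    """Build a hypothetical export request body.

    result_ids     -- IDs of the scan results to export
    signed_url_for -- callable that maps a result ID to a signed upload URL
    use_gzip       -- add the Accept-Encoding header described in the tips
    """
    headers = [["Content-Type", "application/json"]]
    if use_gzip:
        headers.append(["Accept-Encoding", "gzip, compress"])

    return {
        # Assumed field names; verify them against the Export documentation.
        "results": [
            {
                "id": result_id,
                "verb": "PUT",
                "endpoint": signed_url_for(result_id),
                "headers": [list(pair) for pair in headers],
            }
            for result_id in result_ids
        ]
    }


if __name__ == "__main__":
    body = build_export_body(
        ["result-1", "result-2"],
        lambda rid: f"https://storage.example.com/scans/{rid}.json?sig=PLACEHOLDER",
    )
    print(json.dumps(body, indent=2))
```

Because each result carries its own endpoint, every item can go to a different signed URL in your storage system, so the data does not have to pass through your application servers.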
Network data compression
Network communication between two servers is relatively slow, and transmitting data over the internet takes time.
Compressing the data before transfer can reduce its size by around 70%.
Copyleaks supports compressing payloads over HTTP requests and responses.
Compression over request
The Copyleaks server supports compression of the client request.
This is especially important when sending large payloads to the server, which mostly happens when submitting files to the Copyleaks servers.
In order to activate this, you will need to compress (gzip) your request payload and add the following header to your request:
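A minimal sketch of this step is shown below. It gzips a JSON payload and attaches it to a request object without sending it. The sketch assumes the standard HTTP Content-Encoding: gzip header is what marks the body as compressed; the helper name and the URL are hypothetical.

```python
import gzip
import json
import urllib.request


def build_compressed_request(url, payload):
    """Return a urllib Request whose JSON body is gzip-compressed."""
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)

    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Content-Type", "application/json")
    # Assumption: the standard HTTP header for a gzip-compressed body.
    request.add_header("Content-Encoding", "gzip")
    return request


if __name__ == "__main__":
    payload = {"base64": "aGVsbG8gd29ybGQ=" * 200, "filename": "file.txt"}
    req = build_compressed_request("https://api.example.com/submit", payload)
    original = len(json.dumps(payload).encode("utf-8"))
    print(f"original: {original} bytes, compressed: {len(req.data)} bytes")
```

The saving depends on the content: repetitive text compresses well, while files that are already compressed gain very little.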
Compression over response
Our servers support compression of the response payload sent to the client.
To enable this compression, state that you support it by adding the Accept-Encoding header to the request:
Accept-Encoding: gzip, compress
When our servers detect that header, they automatically respond to the client with a compressed payload.
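On the client side, the compressed response then has to be decoded. The sketch below advertises only gzip, because the Python standard library cannot decode the "compress" encoding, and it inflates the body only when the response's Content-Encoding header says gzip was used. The helper names and the URL are hypothetical.

```python
import gzip
import urllib.request


def build_request(url):
    """Create a request that tells the server gzip responses are accepted."""
    request = urllib.request.Request(url)
    request.add_header("Accept-Encoding", "gzip")
    return request


def decode_body(body, content_encoding):
    """Return the plain bytes of a response body.

    body             -- raw bytes read from the response
    content_encoding -- value of the Content-Encoding response header, or None
    """
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body  # the server chose not to compress


if __name__ == "__main__":
    # Simulated server response, so the example runs without network access.
    simulated = gzip.compress(b'{"status": "ok"}')
    print(decode_body(simulated, "gzip"))
```

Many HTTP client libraries perform this decoding automatically, so check your library before adding it by hand.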
Turn off unused features
Copyleaks services were designed from scratch for high flexibility. Each product feature you enable adds another task to be processed on our side, and each task takes time. It is not rare to see clients enable features they don't need, which slows down their scans.
It is important to turn off all unwanted features to make sure you're optimizing your usage of Copyleaks.
Some examples of features that you should disable if not used:
- properties.includeHtml: If the textual version (text, not HTML) is enough for you, don't enable the HTML version.
- properties.pdf.create: Not using the PDF report? Turn it off.
- properties.expiration: Use a short expiration time for your scans. In most cases, you will store a backup of the scans on your own servers, so a long expiration time won't help and will slow down the overall process. In most cases, setting the expiration to 7 days is enough.
- properties.filters: A group of features that tell us which types of results to look for. Use the filters to narrow the search area and improve search speed.
This list is not complete. The complete list is on the Submit documentation page.
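As an illustration, the helper below builds a properties object with the optional features from the list switched off unless they are requested. The property names come from the list above; the nested JSON shape, the helper name, and the assumption that expiration is expressed in hours are not confirmed here and should be checked against the Submit documentation.

```python
import json


def lean_scan_properties(need_html=False, need_pdf=False, expiration_days=7):
    """Return scan properties with optional features off by default."""
    return {
        "includeHtml": need_html,
        "pdf": {"create": need_pdf},
        # Assumption: the API expects the expiration in hours.
        "expiration": expiration_days * 24,
    }


if __name__ == "__main__":
    request_body = {"properties": lean_scan_properties()}
    print(json.dumps(request_body, indent=2))
```

Keeping the defaults in one helper makes it harder for a feature to be enabled by accident in one code path and forgotten.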
Simultaneous Scan Limitations
Copyleaks services are fully deployed on the Cloud. Using the cloud allows us to dynamically allocate resources to support the scan load.
It’s important for you to understand the way you should submit your content. Understanding the mechanism will allow you to scan at a high rate and get short response times.
Suppose, for example, that you have 1 million documents to scan. Scanning them all at once won't be possible because of the rate limit policy that we enforce. Instead, submit your content at a rate of N calls per second (where N is the maximum rate limit).
Another key aspect is to avoid fast starts. Our system scales according to the load on the service, and the scaling takes a little time to kick in. Flooding our service with a sudden load will result in poor performance. Instead, we strongly recommend that you start at a low rate and keep feeding the service a similar number of tasks per second until completion.
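One possible way to apply this advice is sketched below: submissions start at a low rate, rise linearly to the maximum rate, and then hold steady. The linear ramp, the one-batch-per-second pacing, and all names (paced_rates, submit_all, the submit callable) are assumptions for illustration, not a prescribed algorithm.

```python
import time


def paced_rates(max_rate, start_rate=1.0, ramp_seconds=60):
    """Yield a target calls-per-second value for each elapsed second.

    The rate rises linearly from start_rate to max_rate over ramp_seconds
    and then stays at max_rate.
    """
    second = 0
    while True:
        if ramp_seconds <= 0:
            progress = 1.0
        else:
            progress = min(second / ramp_seconds, 1.0)
        yield start_rate + (max_rate - start_rate) * progress
        second += 1


def submit_all(documents, submit, max_rate, start_rate=1.0,
               ramp_seconds=60, sleep=time.sleep):
    """Submit documents in one batch per second, never above max_rate."""
    queue = list(documents)
    position = 0
    for rate in paced_rates(max_rate, start_rate, ramp_seconds):
        if position >= len(queue):
            break
        batch = queue[position:position + max(1, int(rate))]
        for document in batch:
            submit(document)  # hypothetical call that sends one document
        position += len(batch)
        sleep(1.0)  # crude pacing: wait before sending the next batch
    return position


if __name__ == "__main__":
    sent = []
    # sleep is replaced with a no-op so the demo finishes instantly.
    submit_all(range(25), sent.append, max_rate=10, start_rate=2,
               ramp_seconds=4, sleep=lambda seconds: None)
    print(len(sent), "documents submitted")
```

Because the time spent inside submit is not subtracted from the one-second wait, the real rate stays at or below the target, which errs on the safe side of the rate limit.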
Our general rate limit policy is enough for most users, but we understand that it may not be enough for your needs. For those cases, we offer a variety of advanced plans for higher volumes, including resources allocated specifically to your account. Please contact our support team for more information.
Adjust the sensitivity level according to your needs
Copyleaks serves many use cases, so different users perform different tasks with the service. Some users are more sensitive to scan speed, while others care more about the comprehensiveness of the results. To provide a high-quality solution for all of our users, we added a property: properties.sensitivityLevel. With this property, you choose a level on a scale where:
1: Speed is the most important factor for me.
5: Scan comprehensiveness is the most important.
According to our research, level 3 (the default) is the best option for most users, and it is our recommendation in most cases. Feel free to change it according to your needs.
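A small sketch of setting this property is shown below. It assumes the level is an integer from 1 to 5 stored under the sensitivityLevel key of the properties object; the helper name is hypothetical.

```python
def with_sensitivity(properties, level=3):
    """Return a copy of the scan properties with the sensitivity level set."""
    if not isinstance(level, int) or not 1 <= level <= 5:
        raise ValueError("sensitivity level must be an integer from 1 to 5")
    updated = dict(properties)
    updated["sensitivityLevel"] = level
    return updated


if __name__ == "__main__":
    # Favor speed over comprehensiveness for a high-volume batch job.
    print(with_sensitivity({"includeHtml": False}, level=1))
```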
Reuse login token
To communicate safely with the Copyleaks server, add the Authorization header to your requests. The value of this header is a JWT token, which expires after 48 hours. During this period, there is no need to send new Login calls; keep using your current JWT token.
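Token reuse can be handled with a small cache, sketched below. It calls a login function only when no token is held or the 48-hour lifetime is about to end. The TokenCache class, the injected login callable, and the five-minute safety margin are assumptions for illustration.

```python
import time


class TokenCache:
    """Keep one login token and refresh it shortly before it expires."""

    LIFETIME_SECONDS = 48 * 60 * 60  # token lifetime stated above

    def __init__(self, login, clock=time.time, safety_margin=300):
        self._login = login          # hypothetical callable returning a JWT
        self._clock = clock          # injectable for testing
        self._margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, logging in again only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._login()
            self._expires_at = now + self.LIFETIME_SECONDS
        return self._token


if __name__ == "__main__":
    cache = TokenCache(login=lambda: "example-jwt")
    first = cache.get()
    second = cache.get()  # served from the cache, no second login call
    print(first == second)
```

Share one cache instance across all workers in a process so that parallel submissions do not each trigger their own Login call.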
Do you have a technical question?
Use stackoverflow.com to get help from our development team and other Copyleaks users.