Choosing a Scan ID Naming Convention
Copyleaks allows you to choose any scan ID that makes sense for your organization. The scheme is flexible, with the following limitations:
- Length must be between 3 and 36 characters.
- Allowed characters are lowercase letters and digits [a-z0-9] and the following symbols: !@$^&-+%=_(){}<>';:/.",~`|
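The two limitations above can be checked locally before a scan is submitted. The following is a minimal sketch, not part of any Copyleaks SDK: the function name `is_valid_scan_id` and the constants are hypothetical, and the symbol set is copied from the list above.

```python
import string

# Symbols copied from the allowed list above (the double quote is escaped).
ALLOWED_SYMBOLS = "!@$^&-+%=_(){}<>';:/.\",~`|"
ALLOWED_CHARS = set(string.ascii_lowercase + string.digits + ALLOWED_SYMBOLS)
MIN_LEN, MAX_LEN = 3, 36


def is_valid_scan_id(scan_id: str) -> bool:
    """Return True if scan_id meets the length and character limits (hypothetical helper)."""
    if not MIN_LEN <= len(scan_id) <= MAX_LEN:
        return False
    return all(ch in ALLOWED_CHARS for ch in scan_id)
```

For example, `is_valid_scan_id("studentid123-submissionid456")` passes, while an ID containing uppercase letters or more than 36 characters is rejected.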
The idea behind this flexibility is to let you use the same entity IDs from your system within the Copyleaks platform. This way, the same ID represents the same entity both in your system and in the Copyleaks system.
What happens if your IDs cannot be used as a Copyleaks scan ID?
Sometimes your own IDs are longer than allowed (for example, 128 characters) or contain characters that are not allowed (for example, uppercase letters A-Z). In that case, generate a valid Copyleaks scan ID and keep track of it on your side. You can always maintain a mapping table that defines the relationship between your entity ID and the Copyleaks scan ID.
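One way to picture the mapping table is the sketch below. It assumes a randomly generated scan ID (`uuid.uuid4().hex`, which yields 32 lowercase hexadecimal characters and therefore fits the limits) and uses an in-memory dictionary as a stand-in for a real database table. The class name `ScanIdMap` and its methods are hypothetical.

```python
import uuid


class ScanIdMap:
    """In-memory stand-in for a mapping table (a real system would use a database)."""

    def __init__(self):
        self._to_scan = {}      # your entity ID -> Copyleaks scan ID
        self._to_internal = {}  # Copyleaks scan ID -> your entity ID

    def scan_id_for(self, internal_id: str) -> str:
        # Reuse an existing scan ID so one entity always maps to one scan ID.
        if internal_id not in self._to_scan:
            scan_id = uuid.uuid4().hex  # 32 characters from [0-9a-f]
            self._to_scan[internal_id] = scan_id
            self._to_internal[scan_id] = internal_id
        return self._to_scan[internal_id]

    def internal_id_for(self, scan_id: str) -> str:
        # Look up your entity ID when a scan result comes back.
        return self._to_internal[scan_id]
```

A random ID keeps the scan ID independent of the original ID's length and characters; the trade-off is that the mapping must be stored, because the original ID cannot be recovered from the scan ID alone.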
Other options
In contrast to this approach, you might need to pick another naming strategy if you are using the Copyleaks Internal Database and/or Copyleaks Repositories and you want to exclude results based on their scan ID.
For example, suppose your application allows students to submit their materials and later replace them with new submissions. The “old” submission would then be returned in the results and affect the score of the entire work. A common request is to ignore submissions from the same student. You can implement this with the properties.scanning.exclude.idPattern field. To do so, choose an ID that looks similar to this:
CopyleaksScanID = {StudentId}-{SubmissionId}
For instance: studentid123-submissionid456
Then, you can set the exclude.idPattern field to exclude all of that student's previous submissions. For instance: studentid123-*
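The naming scheme and the matching exclusion pattern can be sketched as follows. The helper names are hypothetical, and the nested dictionary only mirrors the dotted path properties.scanning.exclude.idPattern mentioned above; the rest of the submit request is omitted, and the IDs are assumed to already satisfy the scan ID limits.

```python
def build_scan_id(student_id: str, submission_id: str) -> str:
    """Compose {StudentId}-{SubmissionId} (hypothetical helper)."""
    return f"{student_id}-{submission_id}"


def exclude_pattern_for(student_id: str) -> str:
    """Pattern intended to match every scan ID that belongs to one student."""
    return f"{student_id}-*"


def exclude_properties(student_id: str) -> dict:
    # Only the exclusion fragment is shown; other request fields are omitted.
    return {
        "properties": {
            "scanning": {
                "exclude": {"idPattern": exclude_pattern_for(student_id)}
            }
        }
    }
```

Because the pattern relies on the hyphen to mark the end of the student ID, it is safest to pick a separator that never appears inside your student IDs.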

Do you have a technical question?
Use stackoverflow.com to get help from our development team and other Copyleaks users.