PUT /v3/businesses/submit/file/{scanId}

Scan files to check their originality and find where their content has been used elsewhere. With submit-file you can scan various file types for plagiarism and identify copied content. See supported formats.

You need to log in with a user and API key in order to access this method.
Add this HTTP header to your request:
Authorization: Bearer <Your-Login-Token>
Not sure how to generate your login token? Read here.
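As a minimal sketch of the header described above, the hypothetical helper `build_auth_headers` below builds the `Authorization` header from a login token you have already obtained. The `Content-Type` value is an assumption (a JSON request body), not something this page states.

```python
def build_auth_headers(login_token: str) -> dict[str, str]:
    """Return HTTP headers for an authenticated request (illustrative helper)."""
    if not login_token:
        raise ValueError("login_token must be a non-empty string")
    return {
        # Header format taken from the documentation above.
        "Authorization": f"Bearer {login_token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```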

For integration testing, use sandbox mode, which is free of charge.

Request

URL Parameters

Name
Description
scanId (REQUIRED)
A unique scan id provided by you.

We recommend using the same ID that represents the scan in your own database. This will help you to debug incidents.

Using the same ID for the same file also protects you from network problems that could otherwise lead to multiple scans of the same file.

String
Length: 3-36 characters.

Allowed characters are [a-z0-9] and the following symbols: [email protected]$^&-+%=_(){}<>';:/.",~`|

Learn more about the criteria for creating a Scan ID.
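One possible way to get a stable scan ID per file is to derive it from your own database key. The sketch below uses a name-based UUID, which yields 36 characters drawn from lowercase hex digits and hyphens, so it fits the length limit above. `SCAN_NAMESPACE` and `scan_id_for` are hypothetical names chosen for this illustration.

```python
import uuid

# Hypothetical fixed namespace for this illustration; any constant UUID works.
SCAN_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def scan_id_for(record_key: str) -> str:
    """Derive a deterministic scan ID from your own record key."""
    scan_id = str(uuid.uuid5(SCAN_NAMESPACE, record_key))
    # Length rule from the documentation above: 3-36 characters.
    if not 3 <= len(scan_id) <= 36:
        raise ValueError("scan ID must be 3-36 characters long")
    return scan_id
```

Because the ID is derived from the key rather than generated randomly, resubmitting the same record after a network failure reuses the same scan ID.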

Body Parameters

Name
Description
base64 (REQUIRED)
A base64 data string of a file. If you would like to scan plain text, encode it as base64 and submit it.
String
Example: aGVsbG8gd29ybGQ=
filename (REQUIRED)
The name of the file as it will appear in the Copyleaks scan report. Make sure to include the right extension for your file type.
String
Example: Myfile.pdf
Max length: 255 characters.
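A short sketch of preparing the `base64` and `filename` values. The helper names are hypothetical, and encoding plain text as UTF-8 before base64 is an assumption, since this page does not name a text encoding.

```python
import base64


def encode_for_submission(data: bytes) -> str:
    """Base64-encode raw file bytes for the `base64` body parameter."""
    return base64.b64encode(data).decode("ascii")


def encode_text(text: str) -> str:
    """Encode plain text for submission (assumes UTF-8)."""
    return encode_for_submission(text.encode("utf-8"))


def check_filename(filename: str) -> str:
    """Enforce the documented 255-character limit on `filename`."""
    if not filename or len(filename) > 255:
        raise ValueError("filename must be 1-255 characters long")
    return filename
```

With these helpers, `encode_text("hello world")` produces the example value shown above, `aGVsbG8gd29ybGQ=`.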
properties.action
Types of content submission actions.

Possible values:

  • Scan: Start scan immediately.
  • Check Credits: Check how many credits will be used for this scan.
  • Index Only: Only index the file in the Copyleaks internal database or a Copyleaks Repository (depending on your submit request). No credits will be used.
Integer (enum)
Default: 0

Optional Values:
0 : Scan
1 : Check Credits
2 : Index Only
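The integer values above can be given names on the client side. This is a sketch only: `SubmitAction` and `action_properties` are hypothetical names, not part of any Copyleaks SDK.

```python
from enum import IntEnum


class SubmitAction(IntEnum):
    """Client-side names for the documented `properties.action` values."""
    SCAN = 0
    CHECK_CREDITS = 1
    INDEX_ONLY = 2


def action_properties(action: SubmitAction = SubmitAction.SCAN) -> dict:
    """Return the fragment of `properties` that selects the action."""
    return {"action": int(action)}
```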

properties.includeHtml
By default, Copyleaks will present the report in text format. If set to true, Copyleaks will also include HTML format.
Boolean
Default: false

Possible values:
True : results will be generated in HTML format, if possible. Otherwise, they will be generated in text format.
False : results will be generated in text format.

properties.developerPayload
Add a custom developer payload that will then be provided on the webhooks.
String
Length: up to 512 characters.

Default: null

properties.sandbox
You can test the integration with the Copyleaks API for free using the sandbox mode.

You will be able to submit content for a scan and get back mock results, simulating the way Copyleaks will work to make sure that you successfully integrated with the API.

Turn this feature off in production environments.

Boolean
Default: false

Rate Limiting: This method has a maximum call rate limit of 100 sandbox scans within 1 hour. See the 429 Response code section at the bottom of this page.

properties.expiration
Specify the maximum life span of a scan in hours on the Copyleaks servers.

When expired, the scan will be deleted and will no longer be accessible.

Integer
Default: 2800

Range: 1 to 2800
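Since the value is expressed in hours, a small conversion helper can avoid out-of-range requests. A sketch, with the hypothetical name `expiration_hours`:

```python
def expiration_hours(days: float) -> int:
    """Convert a retention period in days to the `expiration` value in hours."""
    hours = round(days * 24)
    # Range taken from the documentation above.
    if not 1 <= hours <= 2800:
        raise ValueError("expiration must be between 1 and 2800 hours")
    return hours
```

For example, a seven-day retention period corresponds to `expiration_hours(7)`, which is 168 hours.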

properties.scanMethodAlgorithm
Choose the algorithm goal. You can set this value depending on your use-case.
Integer (enum)
Default: 0 - MaximumCoverage.

Available Options:
0 - MaximumCoverage: prioritize higher similarity score.
1 - MaximumResults: prioritize finding more sources.

properties.customMetadata
Add custom properties that will be attached to your document in a Copyleaks repository.

If this document is found as a repository result, your custom properties will be added to the result.
Object Array
Default: []

Example:
[
  {
    "key":"Test1",
    "value":"Test1"
  },
  ...
]
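If your metadata lives in a dictionary, it can be converted to the key/value object array shown above. `to_custom_metadata` is a hypothetical helper for illustration.

```python
def to_custom_metadata(pairs: dict[str, str]) -> list[dict[str, str]]:
    """Convert a mapping into the documented array of key/value objects."""
    return [{"key": key, "value": value} for key, value in pairs.items()]
```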
properties.author.id
A unique identifier that represents the author of the content. Make sure to use the same ID for the same author.

Using this feature Copyleaks can detect the author's writing patterns and get better results.

String
Default: null
properties.webhooks.newResult
HTTP endpoint to be triggered while the scan is still running and a new result is found. This is useful when the user is viewing the report in real time, so that results load gradually as they are found.
String (uri)
Default: null

Example: https://yoursite.com/webhook/new-result

properties.webhooks.newResultHeaders
Adds headers to the webhook.
Array of String Arrays
Example:
[
  [
    "header-key",
    "header-value"
  ],
  ...
]
properties.webhooks.status (REQUIRED)
This webhook event is triggered once the scan status changes.

Use the special token {STATUS} to track the current scan status. The Copyleaks servers automatically replace this token with one of the following values: completed, error, creditsChecked or indexed.

Read more about webhooks.

String (uri)
Example: https://yoursite.com/webhook/{STATUS}
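The sketch below builds a status webhook URL containing the literal `{STATUS}` token and lists the URLs your server should expect once the token is substituted. The substitution is simulated locally for illustration; the helper names are hypothetical, and appending the scan ID to the path is a convention borrowed from the request example on this page, not a requirement.

```python
STATUS_TOKEN = "{STATUS}"
# Status values listed in the documentation above.
KNOWN_STATUSES = ("completed", "error", "creditsChecked", "indexed")


def status_webhook_url(base_url: str, scan_id: str) -> str:
    """Build a webhook URL that keeps the literal {STATUS} token."""
    return f"{base_url.rstrip('/')}/{STATUS_TOKEN}/{scan_id}"


def expected_callback_urls(template: str) -> dict[str, str]:
    """Map each status to the URL your server should be ready to handle."""
    return {s: template.replace(STATUS_TOKEN, s) for s in KNOWN_STATUSES}
```

Routing on the substituted path segment lets a single handler tell completed scans apart from errors without parsing the request body first.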
properties.webhooks.statusHeaders
Adds headers to the webhook.
Array of String Arrays
Example:
[
  [
    "header-key",
    "header-value"
  ],
  ...
]
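Both `newResultHeaders` and `statusHeaders` take an array of two-element string arrays rather than an object. A sketch of the conversion from an ordinary dictionary, using the hypothetical helper `to_webhook_headers`:

```python
def to_webhook_headers(headers: dict[str, str]) -> list[list[str]]:
    """Convert a header mapping into the documented array-of-arrays shape."""
    return [[name, value] for name, value in headers.items()]
```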
properties.filters.identicalEnabled
Enable matching of exact words in the text.
Boolean
Default: true
properties.filters.minorChangesEnabled
Enable matching of nearly identical words with small differences, such as "slow" becoming "slowly".
Boolean
Default: true
properties.filters.relatedMeaningEnabled
Enable matching of paraphrased content stating similar ideas with different words.
Boolean
Default: true
properties.filters.minCopiedWords
Select results with at least minCopiedWords copied words.
Unsigned Integer
Default: null
properties.filters.safeSearch
Block explicit adult content from the scan results such as web pages containing inappropriate images and videos. SafeSearch is not 100% effective with all websites.
Boolean
Default: false
properties.filters.domains
A list of domains to either include or exclude from the scan, depending on the value of domainsMode.
String Array
Default: []
properties.filters.domainsMode
Include or exclude the list of domains you specified under the domains property.

When Include is selected, Copyleaks will filter out all results that are not part of the properties.filters.domains list.

When Exclude is selected, Copyleaks will only find results outside of the properties.filters.domains list.

Integer (enum)
Default: 1
Optional Values:
0 : Include
1 : Exclude
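A sketch of building the `domains` and `domainsMode` values together, so the mode is always explicit next to the list it applies to. `DomainsMode` and `domain_filters` are hypothetical names.

```python
from enum import IntEnum


class DomainsMode(IntEnum):
    """Client-side names for the documented `domainsMode` values."""
    INCLUDE = 0
    EXCLUDE = 1


def domain_filters(domains: list[str],
                   mode: DomainsMode = DomainsMode.EXCLUDE) -> dict:
    """Return the fragment of `properties.filters` that restricts domains."""
    return {"domains": list(domains), "domainsMode": int(mode)}
```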
properties.scanning.internet
Compare your content with online sources.
Boolean
Default: true
properties.scanning.exclude.idPattern
Exclude your submissions from results if their id matches the supplied pattern. Matched submissions will be excluded from batch, internal database and repositories results.

Supported pattern wildcards:
* : Matches any, zero or more, characters.
. : Matches a single (non whitespace) character.
String
Default: null

Example:
abc* : will exclude any submissions that have an id starting with 'abc'.
ab.. : will exclude any submissions with an id of exactly 4 characters starting with 'ab'.
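To preview which of your own IDs a pattern would exclude, the wildcard rules above can be approximated locally with a regular expression. This is an illustration of the documented semantics, not the matching code Copyleaks runs; it assumes the pattern must match the whole ID, and the function names are hypothetical.

```python
import re


def id_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate the documented wildcards into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")    # any characters, zero or more
        elif ch == ".":
            parts.append(r"\S")   # exactly one non-whitespace character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def is_excluded(scan_id: str, pattern: str) -> bool:
    """Return True if the whole scan ID matches the pattern (assumption)."""
    return id_pattern_to_regex(pattern).fullmatch(scan_id) is not None
```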
properties.scanning.repositories[]
Specify which repositories to scan the document against.
Object Array
Default: []
properties.scanning.repositories[].id
Id of a repository to scan the submitted document against.
String
Default: null
properties.scanning.repositories[].includeMySubmissions
Compare the scanned document against MY submissions in the repository.
Boolean
Default: false
properties.scanning.repositories[].includeOthersSubmissions
Compare the scanned document against OTHER users' submissions in the repository.
Boolean
Default: false
properties.scanning.crossLanguages.languages[]
Cross-language plagiarism detection. Choose which additional languages to scan your content against. For each additional language chosen, pages are deducted from your balance for every page submitted. The language of the submitted document is always scanned, so it should not be included in the additional languages.
Supported languages list.
Object Array
Default: []
Max length: 5
properties.scanning.crossLanguages.languages[].code
Language code for cross language plagiarism detection.
String
Default: null
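A sketch of building the `crossLanguages` object while enforcing the documented maximum of five entries. `cross_languages` is a hypothetical helper, and the language codes used with it must come from the supported languages list; none are assumed here.

```python
def cross_languages(codes: list[str]) -> dict:
    """Return the `properties.scanning.crossLanguages` fragment."""
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(codes))
    # Limit taken from the documentation above.
    if len(unique) > 5:
        raise ValueError("at most 5 additional languages are allowed")
    return {"languages": [{"code": code} for code in unique]}
```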
properties.indexing.repositories[]
Specify which repositories to index the scanned document to.
Object Array
Default: []
properties.indexing.repositories[].id
Id of a repository to add the scanned document to.
String
Default: null
properties.indexing.repositories[].maskingPolicy
Allows you to specify a masking policy at the document level.

If the repository has its own masking policy, the stricter policy will be applied to results from this document.
Integer (enum)
Default: 0

Available policies:
0 : Don't mask results from this document.
1 : Mask all results coming from this document, unless the requesting user owns this file.
2 : Mask all results from this document.
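The "stricter policy wins" rule can be pictured as taking the larger of two values. This sketch assumes that a higher number means a stricter policy, which matches the descriptions above but is not stated explicitly; `MaskingPolicy` and `effective_policy` are hypothetical names.

```python
from enum import IntEnum


class MaskingPolicy(IntEnum):
    """Client-side names for the documented masking policy values."""
    NO_MASK = 0
    MASK_OTHER_USERS = 1
    MASK_ALL = 2


def effective_policy(document: MaskingPolicy,
                     repository: MaskingPolicy) -> MaskingPolicy:
    """Pick the stricter policy (assumes a higher value is stricter)."""
    return max(document, repository)
```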

properties.exclude.quotes
Exclude quoted text from the scan.
Boolean
Default: false
properties.exclude.citations
Exclude citations from the scan.
Boolean
Default: false
properties.exclude.tableOfContents
Exclude table of contents from the scan.
Boolean
Default: false
properties.exclude.titles
Exclude titles from the scan.
Boolean
Default: false
properties.exclude.htmlTemplate
When the scanned document is an HTML document, exclude irrelevant text that appears across the site like the website footer or header.
Boolean
Default: false
properties.pdf.create
Request a customizable export of the scan report in PDF format.

Set to true in order to generate a PDF report for this scan.

Boolean
Default: false
properties.pdf.title
Customize the title for the PDF report.
String
Default: null
Max length: 256 characters.
properties.pdf.largeLogo
Customize the logo image in the PDF report.

We only support png format.
String (base64)
Default: null

Max size: 100kb

properties.pdf.rtl
When set to true the text in the report will be aligned from right to left.
Boolean
Default: false
properties.sensitivityLevel
Control how sensitive the plagiarism detection is, trading scan speed for thoroughness. Choose 1 for a faster scan that returns the results containing the highest amount of plagiarism. Choose 5 for a slower, more comprehensive scan that will also detect the smallest instances.
Integer
Default: 3

Optional Values:

Range between 1 (faster) to 5 (slower but more comprehensive)

properties.cheatDetection
When set to true, the submitted document will be checked for cheating. If cheating is detected, a scan alert will be added to the completed webhook.
Boolean
Default: false
properties.aiGeneratedText.detect (BETA)
Detects whether the text was written by an AI.

Upon detection a scan alert of type "suspected-ai-text" will be added to the scan completion webhook.
Boolean
Default: false
properties.sensitiveDataProtection.driversLicense
Mask driver's license numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
  • Australia driver's license number
  • Canada driver's license number
  • United Kingdom driver's license number
  • USA driver's license number
  • Japan driver's license number
  • Spain driver's license number
  • Germany driver's license number
Boolean
Default: false
properties.sensitiveDataProtection.credentials
Mask credentials from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
  • Authentication token
  • Amazon Web Services credentials
  • Azure JSON Web Token
  • HTTP basic authentication header
  • Google Cloud Platform service account credentials
  • Google Cloud Platform API key
  • JSON Web Token
  • Encryption key
  • Password
Boolean
Default: false
properties.sensitiveDataProtection.passport
Mask passports from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
  • Canada passport number
  • China passport number
  • France passport number
  • Germany passport number
  • Ireland passport number
  • Japan passport number
  • Korea passport number
  • Mexico passport number
  • Spain passport number
  • United Kingdom passport number
  • USA passport number
  • Netherlands passport number
  • Poland passport number
  • Sweden passport number
  • Australia passport number
  • Singapore passport number
  • Taiwan passport number
Boolean
Default: false
properties.sensitiveDataProtection.network
Mask network identifiers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
  • IP address
  • Local MAC address
  • MAC address
Boolean
Default: false
properties.sensitiveDataProtection.url
Mask URLs from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Boolean
Default: false
properties.sensitiveDataProtection.emailAddress
Mask email addresses from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Boolean
Default: false
properties.sensitiveDataProtection.creditCard
Mask credit card numbers and credit card track numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Boolean
Default: false
properties.sensitiveDataProtection.phoneNumber
Mask phone numbers from the scanned document with # characters. Available for users on a plan for 2500 pages or more.
Boolean
Default: false
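The eight masking flags above can be collected into one object. In this sketch, `sensitive_data_protection` is a hypothetical helper that rejects misspelled flag names locally instead of sending them to the API.

```python
# Flag names taken from the parameter list above.
SENSITIVE_DATA_FLAGS = (
    "driversLicense", "credentials", "passport", "network",
    "url", "emailAddress", "creditCard", "phoneNumber",
)


def sensitive_data_protection(*enabled: str) -> dict[str, bool]:
    """Return the `properties.sensitiveDataProtection` fragment."""
    unknown = set(enabled) - set(SENSITIVE_DATA_FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return {flag: flag in enabled for flag in SENSITIVE_DATA_FLAGS}
```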

Request Example

PUT https://api.copyleaks.com/v3/businesses/submit/file/my-custom-id

Headers
Authorization: Bearer <Your-Login-Token>

Body
{
  "base64": "SGVsbG8gd29ybGQh",
  "filename": "file.txt",
  "properties": {
    "webhooks": {
      "status": "https://yoursite.com/webhook/{STATUS}/my-custom-id"
    }
  }
}
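The same request can be sketched in Python with only the standard library. The function below builds the request object without sending it. It assumes a JSON body with a `Content-Type: application/json` header, and it turns sandbox mode on by default so that experiments are free; `build_submit_request` and its parameters are hypothetical names for this illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

API_URL = "https://api.copyleaks.com/v3/businesses/submit/file/"


def build_submit_request(scan_id: str, filename: str, data: bytes,
                         status_webhook: str, login_token: str,
                         sandbox: bool = True) -> urllib.request.Request:
    """Build (but do not send) a PUT request for the submit-file endpoint."""
    body = {
        "base64": base64.b64encode(data).decode("ascii"),
        "filename": filename,
        "properties": {
            "webhooks": {"status": status_webhook},
            "sandbox": sandbox,
        },
    }
    return urllib.request.Request(
        # Percent-encode the scan ID so special characters stay in one path segment.
        API_URL + urllib.parse.quote(scan_id, safe=""),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {login_token}",
            "Content-Type": "application/json",  # assumption: JSON body
        },
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Per the response table on this page, a 201 status means the scan was created.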

Response

Codes

Status Code
Description
Example
201
The scan was created.
400
Bad request.
{
  "properties.webhooks.status": [
    "The field is required."
  ]
}
401
Unauthorized

Authorization has been denied for this request.

409
A scan with the same Id already exists in the system.
429
Too many requests have been sent. The request has been rejected.

This may happen when sending too many scans in Sandbox mode.

Other resources:

  • Performance Considerations (Important!) - How to improve your scan performance.
  • Exponential Backoff - Algorithm that helps applications define a retry strategy for consuming a network service.
  • Technical Specifications - See the API's limits and supported formats.
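As a rough sketch of the exponential backoff idea applied to the 429 response, the function below retries a caller-supplied `send` callable, doubling the wait after each rejected attempt and adding random jitter. The function name, the delay values and the attempt limit are illustrative assumptions, not values recommended by Copyleaks.

```python
import random
import time
from typing import Callable


def submit_with_backoff(send: Callable[[], int],
                        max_attempts: int = 5,
                        base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns a status other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Wait base_delay * 2^attempt seconds, plus jitter to spread retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return status
```

When the same scan ID is reused for every retry, a 409 response to a later attempt indicates that a scan with that ID already exists, so it can be handled as "already submitted" rather than as a failure.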

Copyleaks, Inc.

700 Canal St.
Stamford, CT 06902 USA

Copyleaks, Inc. All rights reserved. Use of this website signifies your agreement to the Terms of Use.