Detailed Record
| Title: |
ML-BASED ENCRYPTED FILE CLASSIFICATION FOR IDENTIFYING ENCRYPTED DATA MOVEMENT |
| Document Number: |
20240012912 |
| Publication Date: |
January 11, 2024 |
| Appl. No: |
17/860037 |
| Application Filed: |
July 07, 2022 |
| Abstract: |
The disclosed technology teaches facilitating User and Entity Behavior Analytics (UEBA) by classifying a file being transferred as encrypted or not. The technology involves monitoring movement of files by a user over a wide area network; detecting file encryption for the files using a trained classifier, wherein the detecting includes processing by the classifier some or all of the following features extracted from each of the files: a chi-square randomness test, an arithmetic mean test, a serial correlation coefficient test, a Monte Carlo-Pi test, and a Shannon entropy test; counting a number of the encrypted files moved by the user in a predetermined period; comparing a predetermined maximum number of encrypted files allowed in the predetermined period to the count of the encrypted files moved by the user; detecting that the user has moved more encrypted files than the predetermined maximum number; and generating an alert. |
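Three of the features named in the abstract (chi-square, arithmetic mean, Shannon entropy) can be computed from a file's byte histogram alone. The Python sketch below is an illustration, not Netskope's implementation. It assumes the "expected" distribution is uniform over the 256 byte values, so the expected mean is 127.5 (the convention used by the open-source `ent` randomness utility). All function names are hypothetical.

```python
import math
from collections import Counter

EXPECTED_MEAN = 127.5  # assumption: mean of uniformly distributed byte values 0..255


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for a perfectly flat histogram."""
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against an assumed uniform expectation."""
    n = len(data)
    if n == 0:
        return 0.0
    expected = n / 256
    counts = Counter(data)
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))


def arithmetic_mean(data: bytes) -> float:
    """Mean byte value of the sample."""
    return sum(data) / len(data) if data else 0.0


def mean_deviation(data: bytes) -> float:
    """Distance of the sample mean from the assumed expected mean of 127.5."""
    return abs(arithmetic_mean(data) - EXPECTED_MEAN)
```

Ciphertext usually scores close to 8 bits per byte of entropy and a chi-square value near the 255 degrees of freedom. Plain text scores much lower entropy and a far larger chi-square value.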
| Assignees: |
Netskope, Inc. (Santa Clara, CA, US) |
| Claim: |
1. A computer-implemented method of detecting exfiltration designed to defeat data loss protection (DLP) by encryption before evaluation, including: intercepting, by a network security system server interposed on a network between a cloud-based application and a user endpoint, movement of a plurality of files by a user over the network to the cloud-based application, wherein the network security system server monitors traffic on the network associated with the user endpoint of the user; detecting, by the network security system server, file encryption for each file of the plurality of files using a trained machine learning (ML) classifier, wherein the detecting comprises: for each file of the plurality of files: determining a file type of the respective file, calculating two or more metrics for the respective file, the two or more metrics selected from: a chi-square metric based on a chi-square randomness test that measures a degree to which a distribution of sampled bytes varies from an expected distribution of bytes from the respective file; an arithmetic mean metric based on an arithmetic mean test that compares an arithmetic mean of the sampled bytes to an expected mean of the bytes from the respective file; a serial correlation coefficient metric based on a serial correlation coefficient test that calculates a serial correlation coefficient between pairs of successive sampled bytes from the respective file; a Monte Carlo-Pi metric based on a Monte Carlo-Pi test that maps concatenated bytes as coordinates of a square and calculates a degree to which a proportion of the mapped concatenated bytes that fall within a circle circumscribed by the square varies from an expected proportion that corresponds to mapping from the respective file; and an entropy metric based on a Shannon entropy test of randomness of the respective file, providing the two or more metrics and the file type as input to the trained ML classifier trained to classify the respective file as encrypted or unencrypted based on the two or more metrics and the file type, and receiving a classification of the respective file from the trained ML classifier based on the input; counting, by the network security system server, a number of the plurality of files classified as encrypted and moved by the user during a predetermined period of time; determining, by the network security system server, a predetermined maximum number of encrypted files the user is allowed to move during the predetermined period of time; comparing, by the network security system server, the predetermined maximum number for the user to the number counted; detecting, by the network security system server, based on the comparing, that the user has moved more encrypted files than the predetermined maximum number; and generating, by the network security system server, an alert that the user has moved more than the predetermined maximum number of encrypted files allowed to be moved. |
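Claim 1 describes the serial correlation and Monte Carlo-Pi tests in more detail than the abstract does. The following is a minimal sketch under several assumptions. Each point uses 3-byte coordinates (6 bytes per point, as `ent` does). The circle has radius 0.5 and is inscribed in the unit square. Correlation is taken between each byte and its successor without wrap-around. Finally, the claim only speaks of a "degree" of variation, so using the relative error of the π estimate as the Monte Carlo-Pi metric is also an assumption.

```python
import math
from typing import Tuple


def serial_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and its successor (no wrap-around)."""
    if len(data) < 3:
        return 0.0
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: correlation is undefined, report 0
    return cov / math.sqrt(vx * vy)


def monte_carlo_pi(data: bytes, coord_bytes: int = 3) -> Tuple[float, float]:
    """Map concatenated bytes to points in a unit square; return (pi estimate, relative error)."""
    step = 2 * coord_bytes
    scale = float(256 ** coord_bytes - 1)
    inside = total = 0
    for i in range(0, len(data) - step + 1, step):
        x = int.from_bytes(data[i:i + coord_bytes], "big") / scale
        y = int.from_bytes(data[i + coord_bytes:i + step], "big") / scale
        total += 1
        # Circle of radius 0.5 centred in the square: uniform points fall inside with probability pi/4.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    if total == 0:
        return 0.0, 1.0
    estimate = 4.0 * inside / total
    return estimate, abs(estimate - math.pi) / math.pi
```

For random bytes, the serial correlation should be close to 0 and the π estimate close to 3.14159. Structured data tends to push both metrics away from those values.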
| Claim: |
2. The computer-implemented method of claim 1, wherein the predetermined maximum number of encrypted files allowed to be moved is based on a determined typical movement pattern. |
| Claim: |
3. The computer-implemented method of claim 2, wherein the determined typical movement pattern is based on what is typical for the user, determined by monitoring the user for at least 15 days. |
| Claim: |
4. The computer-implemented method of claim 2, wherein the determined typical movement pattern is based on what is typical for an organization, determined based on collecting at least 1000 user-days of data for users within the organization. |
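Claims 2 to 4 derive the predetermined maximum from a "typical movement pattern" learned either per user (at least 15 days) or per organization (at least 1000 user-days). The claims do not say how that pattern becomes a number. The sketch below assumes a hypothetical rule of mean plus three population standard deviations over daily counts of encrypted files, rounded up. It prefers the user's own history and otherwise falls back to the organization pool.

```python
import math
import statistics
from typing import Optional, Sequence

MIN_USER_DAYS = 15         # claim 3: monitor the user for at least 15 days
MIN_ORG_USER_DAYS = 1000   # claim 4: at least 1000 user-days across the organization
K_SIGMA = 3.0              # hypothetical sensitivity; not specified in the claims


def threshold_from_counts(daily_counts: Sequence[int], k: float = K_SIGMA) -> int:
    """Assumed rule: mean + k * population std-dev of daily encrypted-file counts, rounded up."""
    mean = statistics.fmean(daily_counts)
    spread = statistics.pstdev(daily_counts)
    return math.ceil(mean + k * spread)


def max_encrypted_files(user_daily: Sequence[int], org_user_days: Sequence[int]) -> Optional[int]:
    """Prefer the user's own baseline, fall back to the organization's, else report no baseline."""
    if len(user_daily) >= MIN_USER_DAYS:
        return threshold_from_counts(user_daily)
    if len(org_user_days) >= MIN_ORG_USER_DAYS:
        return threshold_from_counts(org_user_days)
    return None
```

Returning `None` leaves the caller to decide what to do before any baseline exists, for example applying a fixed policy limit.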
| Claim: |
5. The computer-implemented method of claim 2, wherein determining normal movement patterns requires a minimum number of user-day data points. |
| Claim: |
6. The computer-implemented method of claim 5, wherein the minimum number of user-day data points includes both workday and non-workday data points. |
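Claims 5 and 6 require a minimum number of user-day data points that covers both workdays and non-workdays, presumably so that weekend behavior is part of the baseline. A minimal check might look like the sketch below. It assumes Monday to Friday are workdays and ignores holidays. `min_points` is a hypothetical parameter because the claims give no value.

```python
from datetime import date
from typing import Iterable, Tuple


def is_workday(day: date) -> bool:
    """Assumption: Monday to Friday are workdays; holidays are ignored."""
    return day.weekday() < 5


def has_sufficient_baseline(
    points: Iterable[Tuple[date, int]],
    min_points: int,
    min_workdays: int = 1,
    min_non_workdays: int = 1,
) -> bool:
    """True once there are enough user-day points and both kinds of day are represented."""
    days = [day for day, _count in points]
    workdays = sum(1 for day in days if is_workday(day))
    non_workdays = len(days) - workdays
    return (
        len(days) >= min_points
        and workdays >= min_workdays
        and non_workdays >= min_non_workdays
    )
```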
| Claim: |
7. The computer-implemented method of claim 1, wherein the network security system server monitors the traffic using API connections to cloud-based applications to capture the traffic originating from the cloud-based applications to the user endpoint and monitors inline traffic originating from the user endpoint. |
| Claim: |
8. The computer-implemented method of claim 1, wherein the calculating the two or more metrics for the respective file comprises sampling the respective file, and wherein the sampling is between 10 KB and 250 KB in size. |
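Claim 8 bounds the sample to between 10 KB and 250 KB but does not say how the bytes are chosen. One possible approach, sketched below, is to take evenly spaced chunks across the file so that a format header does not dominate the metrics. The sketch assumes 1 KB = 1024 bytes. The 4 KB chunk size and the 64 KB default sample size are hypothetical. Files smaller than the sample size are evaluated whole.

```python
MIN_SAMPLE = 10 * 1024    # claim 8 lower bound, assuming 1 KB = 1024 bytes
MAX_SAMPLE = 250 * 1024   # claim 8 upper bound
CHUNK = 4 * 1024          # hypothetical chunk size


def sample_file(data: bytes, sample_size: int = 64 * 1024, chunk: int = CHUNK) -> bytes:
    """Return at most sample_size bytes drawn as evenly spaced chunks across the file."""
    if not MIN_SAMPLE <= sample_size <= MAX_SAMPLE:
        raise ValueError(f"sample_size must be between {MIN_SAMPLE} and {MAX_SAMPLE} bytes")
    if len(data) <= sample_size:
        return data  # small files are evaluated whole
    n_chunks = max(1, sample_size // chunk)
    chunk = sample_size // n_chunks  # stretch chunks so they fill the sample budget
    span = len(data) - chunk  # last valid chunk start offset
    if n_chunks == 1:
        starts = [0]
    else:
        starts = [span * i // (n_chunks - 1) for i in range(n_chunks)]
    return b"".join(data[s:s + chunk] for s in starts)
```

Because the file is longer than the sample whenever chunking happens, the chunks never overlap. The first chunk always starts at offset 0, and the last chunk always ends at the end of the file.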
| Claim: |
9. A non-transitory computer readable storage medium impressed with computer program instructions to detect exfiltration designed to defeat data loss protection (DLP) by encryption before evaluation, the instructions, when executed on a processor, implement a method comprising: intercepting traffic over a network between a cloud-based application and a user endpoint by a network security system interposed on the network between the cloud-based application and the user endpoint; detecting, based on the traffic, movement of a plurality of files by a user of the user endpoint over the network to the cloud-based application; detecting file encryption for each file of the plurality of files using a trained machine learning (ML) classifier, wherein the detecting comprises: for each file of the plurality of files: determining a file type of the respective file, calculating two or more metrics for the respective file, the two or more metrics selected from: a chi-square metric based on a chi-square randomness test that measures a degree to which a distribution of sampled bytes varies from an expected distribution of bytes from the respective file; an arithmetic mean metric based on an arithmetic mean test that compares an arithmetic mean of the sampled bytes to an expected mean of the bytes from the respective file; a serial correlation coefficient metric based on a serial correlation coefficient test that calculates a serial correlation coefficient between pairs of successive sampled bytes from the respective file; a Monte Carlo-Pi metric based on a Monte Carlo-Pi test that maps concatenated bytes as coordinates of a square and calculates a degree to which a proportion of the mapped concatenated bytes that fall within a circle circumscribed by the square varies from an expected proportion that corresponds to mapping from the respective file; and an entropy metric based on a Shannon entropy test of randomness of the respective file, providing the two or more metrics and the file type as input to the trained ML classifier trained to classify the respective file as encrypted or unencrypted based on the two or more metrics and the file type, and receiving a classification of the respective file from the trained ML classifier based on the input; counting a number of the plurality of files classified as encrypted and moved by the user during a predetermined period of time; determining a predetermined maximum number of encrypted files the user is allowed to move during the predetermined period of time; comparing the predetermined maximum number for the user to the number counted; detecting that the user has moved more encrypted files than the predetermined maximum number; and generating an alert that the user has moved more than the predetermined maximum number of encrypted files allowed to be moved. |
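The claims leave the type of "trained ML classifier" open. As a stand-in, this sketch trains plain logistic regression with stochastic gradient descent on two of the listed metrics (Shannon entropy and deviation of the arithmetic mean) plus a one-hot encoding of the file type. `FILE_TYPES`, `train_toy_model` and its synthetic samples are hypothetical: random bytes stand in for ciphertext and repeated words stand in for plaintext. They illustrate only the data flow, not realistic accuracy.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

FILE_TYPES = ["txt", "csv", "docx", "unknown"]  # hypothetical vocabulary; claims just say "a file type"


def shannon_entropy(data: bytes) -> float:
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values()) if n else 0.0


def mean_deviation(data: bytes) -> float:
    """Distance of the mean byte value from 127.5, scaled to [0, 1]."""
    return abs(sum(data) / len(data) - 127.5) / 127.5 if data else 1.0


def feature_vector(data: bytes, file_type: str) -> List[float]:
    """Two randomness metrics plus a one-hot file type (unseen types map to 'unknown')."""
    ft = file_type if file_type in FILE_TYPES else "unknown"
    return [shannon_entropy(data) / 8.0, mean_deviation(data)] + [1.0 if t == ft else 0.0 for t in FILE_TYPES]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticClassifier:
    def __init__(self, n_features: int, lr: float = 0.5, epochs: int = 300):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def prob_encrypted(self, x: Sequence[float]) -> float:
        return _sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticClassifier":
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                err = self.prob_encrypted(xi) - yi  # gradient of log-loss w.r.t. the logit
                self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, xi)]
                self.b -= self.lr * err
        return self

    def classify(self, data: bytes, file_type: str) -> str:
        p = self.prob_encrypted(feature_vector(data, file_type))
        return "encrypted" if p >= 0.5 else "unencrypted"


def train_toy_model(seed: int = 0) -> LogisticClassifier:
    """Hypothetical synthetic training set: random bytes labeled 1, word text labeled 0."""
    rng = random.Random(seed)
    words = [b"invoice", b"total", b"customer", b"report", b"quarter", b"the", b"and"]
    X, y = [], []
    for i in range(20):
        ft = FILE_TYPES[i % len(FILE_TYPES)]
        cipher = bytes(rng.randrange(256) for _ in range(4096))
        plain = b" ".join(rng.choice(words) for _ in range(600))
        X += [feature_vector(cipher, ft), feature_vector(plain, ft)]
        y += [1, 0]
    return LogisticClassifier(len(X[0])).fit(X, y)
```

Feeding the file type alongside the metrics lets a model learn per-format offsets. This likely matters in practice because compressed formats are already high-entropy, which a toy set like this one does not exercise.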
| Claim: |
10. The non-transitory computer readable storage medium of claim 9, wherein the predetermined maximum number of encrypted files allowed to be moved is based on a determined typical movement pattern. |
| Claim: |
11. The non-transitory computer readable storage medium of claim 10, wherein the typical movement pattern is based on one of what is typical for the user, determined by monitoring the user for at least 15 days and what is typical for an organization, determined based on collecting at least 700 user-days of data for users within the organization. |
| Claim: |
12. The non-transitory computer readable storage medium of claim 10, wherein determining normal movement patterns requires a minimum number of user-day data points. |
| Claim: |
13. The non-transitory computer readable storage medium of claim 12, wherein the minimum number of user-day data points includes both workday and non-workday data points. |
| Claim: |
14. The non-transitory computer readable storage medium of claim 9, wherein the intercepting the traffic comprises using API connections to the cloud-based application to capture the traffic originating from the cloud-based application to the user endpoint and capturing inline traffic originating from the user endpoint. |
| Claim: |
15. A system including one or more processors coupled to memory, the memory loaded with computer instructions to detect exfiltration designed to defeat data loss protection (DLP) by encryption before evaluation, the instructions, when executed on the processors, implement actions comprising: intercepting traffic over a network between a cloud-based application and a user endpoint by the system interposed on the network between the cloud-based application and the user endpoint; detecting, based on the traffic, movement of a plurality of files by a user of the user endpoint over the network to the cloud-based application; detecting file encryption for each file of the plurality of files using a trained machine learning (ML) classifier, wherein the detecting comprises: for each file of the plurality of files, determining a file type of the respective file, calculating two or more metrics for the respective file, the two or more metrics selected from: a chi-square metric based on a chi-square randomness test that measures a degree to which a distribution of sampled bytes varies from an expected distribution of bytes from the respective file; an arithmetic mean metric based on an arithmetic mean test that compares an arithmetic mean of the sampled bytes to an expected mean of the bytes from the respective file; a serial correlation coefficient metric based on a serial correlation coefficient test that calculates a serial correlation coefficient between pairs of successive sampled bytes from the respective file; a Monte Carlo-Pi metric based on a Monte Carlo-Pi test that maps concatenated bytes as coordinates of a square and calculates a degree to which a proportion of the mapped concatenated bytes that fall within a circle circumscribed by the square varies from an expected proportion that corresponds to mapping from the respective file; and an entropy metric based on a Shannon entropy test of randomness of the respective file, providing the two or more metrics and the file type as input to the trained ML classifier trained to classify the respective file as encrypted or unencrypted based on the two or more metrics and the file type, and receiving a classification of the respective file from the trained ML classifier based on the input; counting a number of the plurality of files classified as encrypted and moved by the user during a predetermined period of time; determining a predetermined maximum number of encrypted files the user is allowed to move during the predetermined period of time; comparing the predetermined maximum number for the user to the number counted; detecting that the user has moved more encrypted files than the predetermined maximum number; and generating an alert that the user has moved more than the predetermined maximum number of encrypted files allowed to be moved. |
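The closing steps of the independent claims (count files classified as encrypted within a predetermined period, compare the count to the user's maximum, alert when it is exceeded) can be sketched as a per-user sliding window. Several details are assumptions rather than claim language: events arrive in timestamp order, the period is treated as a sliding window, and per-user maximums are supplied by the caller. `EncryptedMoveMonitor` and `Alert` are hypothetical names. An alert fires only when the count is strictly greater than the maximum, matching "more encrypted files than the predetermined maximum number".

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, Dict, Optional


@dataclass
class Alert:
    user: str
    count: int
    maximum: int
    at: datetime


class EncryptedMoveMonitor:
    def __init__(self, period: timedelta, max_per_user: Dict[str, int], default_max: int):
        self.period = period
        self.max_per_user = max_per_user
        self.default_max = default_max
        self._events: Dict[str, Deque[datetime]] = defaultdict(deque)

    def record(self, user: str, at: datetime, encrypted: bool) -> Optional[Alert]:
        """Register one file movement; return an Alert if the user exceeds the maximum."""
        if not encrypted:
            return None  # only files classified as encrypted are counted
        window = self._events[user]
        window.append(at)
        # Drop movements that fall outside the sliding period ending at `at`.
        while window and window[0] <= at - self.period:
            window.popleft()
        maximum = self.max_per_user.get(user, self.default_max)
        if len(window) > maximum:
            return Alert(user, len(window), maximum, at)
        return None
```

As written, every further encrypted file inside the window raises another alert. A real deployment would probably deduplicate alerts, which the claims do not address.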
| Claim: |
16. The system of claim 15, wherein the predetermined maximum number of encrypted files allowed to be moved is based on a determined typical movement pattern. |
| Claim: |
17. The system of claim 16, wherein determining normal movement patterns requires a minimum number of user-day data points. |
| Claim: |
18. The system of claim 17, wherein the minimum number of user-day data points includes both workday and non-workday data points. |
| Claim: |
19. The system of claim 15, wherein the intercepting the traffic comprises using API connections to the cloud-based application and capturing inline traffic originating from the user endpoint. |
| Claim: |
20. The system of claim 15, wherein the calculating the two or more metrics for the respective file comprises sampling the respective file, and wherein the sampling is between 10 KB and 250 KB in size. |
| Current International Class: |
06; 06 |
| Accession Number: |
edspap.20240012912 |
| Database: |
USPTO Patent Applications |