Event deduplication using multiple stages and concurrent processing
| Title: | Event deduplication using multiple stages and concurrent processing |
|---|---|
| Patent Number: | 12,210,497 |
| Publication Date: | January 28, 2025 |
| Appl. No: | 17/936787 |
| Application Filed: | September 29, 2022 |
| Abstract: | An event deduplication system may efficiently perform event deduplication (identifying “new” or “unique” events that might be an anomaly) by using a first stage that has multiple first stage processes running in parallel (e.g., at different data centers) and a single second stage that has a second stage process that receives and processes events from the different first stage processes. The second stage process updates a global state (e.g., lookup table) and periodically publishes the global state to the first stage processes to update their local state. When the second stage process receives a possible new event from a first stage process, it may more accurately determine whether the event is actually a new event based on the global state. |
| Inventors: | Amazon Technologies, Inc. (Seattle, WA, US) |
| Assignees: | Amazon Technologies, Inc. (Seattle, WA, US) |
| Claim: | 1. A system, comprising: a plurality of first stage computing nodes of an event deduplication system, wherein the first stage computing nodes respectively comprise one or more processors and memory, wherein a given first stage computing node of the plurality of first stage computing nodes is configured to implement a first stage process to, for an event of a plurality of events obtained from an input stream: determine, based on a key for the event and a local state, whether the event is a local reoccurring event; in response to a determination that the event is not a local reoccurring event: update the local state based on the key for the event; and send the event to a second stage process of the event deduplication system as a possible new event; and in response to a determination that the event is a local reoccurring event: determine, based on the key for the event and the local state, whether to send a promote state event to the second stage process; in response to a determination to send the promote state event, send the promote state event to the second stage process; and update the local state based on the key for the event; and a second stage computing node of the event deduplication system, wherein the second stage computing node comprises one or more other processors and memory, wherein the second stage computing node is configured to implement the second stage process to: in response to reception of the event as a possible new event: determine, based on the key for the event and a global state, whether the event is a global reoccurring event; in response to a determination that the event is not a global reoccurring event, update the global state based on the key for the event and output the event as a new event; and in response to a determination that the event is a global reoccurring event, update the global state based on the key for the event; and in response to reception of the promote state event, update the global state based on the key for the event. |
| Claim: | 2. The system of claim 1, wherein the local state comprises a local lookup table, and wherein to determine that the event is not a local reoccurring event, the first stage process is configured to: determine, based on the key for the event, that an entry for the event does not exist in the local lookup table. |
| Claim: | 3. The system of claim 1, wherein the local state comprises a local lookup table, and wherein to update the local state based on the key for the event in response to a determination that the event is not a local reoccurring event, the first stage process is configured to: insert an entry into the local lookup table that corresponds to the key for the event. |
| Claim: | 4. The system of claim 1, wherein the local state comprises a local lookup table, and wherein to update the local state based on the key for the event in response to a determination that the event is a local reoccurring event, the first stage process is configured to: re-order a group of entries in the local lookup table, wherein an entry that corresponds to the key for the event is moved ahead within the group of entries according to a most recently used to least recently used ordering, or update a timestamp for the entry that corresponds to the key for the event. |
| Claim: | 5. The system of claim 1, wherein the global state comprises a global lookup table, and wherein to determine that the event is not a global reoccurring event, the second stage process is configured to: determine, based on the key for the event, that an entry for the event does not exist in the global lookup table. |
| Claim: | 6. A method, comprising: performing, by a first stage process of a computing node: obtaining an event from an input stream of events; determining, based on a key for the event and a local state, whether the event is a local reoccurring event; in response to determining that the event is not a local reoccurring event: updating the local state based on the key for the event; and sending the event to a second stage process as a possible new event; and performing, by the second stage process of another computing node: receiving the event as a possible new event; determining, based on the key for the event and a global state, whether the event is a global reoccurring event; and in response to determining that the event is not a global reoccurring event: updating the global state based on the key for the event; and outputting the event as a new event; and sending at least a portion of the global state to the first stage process to update the local state. |
| Claim: | 7. The method of claim 6, further comprising: performing, by the first stage process: obtaining another event from the input stream; determining, based on a key for the other event and the local state, that the other event is a local reoccurring event; in response to determining that the other event is a local reoccurring event: determining, based on the key for the other event and a property of the local state, whether to send a promote state event to the second stage process; in response to determining to send the promote state event, sending the promote state event to the second stage process; and updating the property of the local state based on the key for the other event; and performing, by the second stage process: receiving the promote state event; and in response to receiving the promote state event, updating a global state based on the key for the other event. |
| Claim: | 8. The method of claim 7, wherein the local state comprises a local lookup table, and wherein updating the property of the local state based on the key for the other event comprises: re-ordering a group of entries in the local lookup table, wherein an entry that corresponds to the key for the other event is moved ahead within the group of entries according to a most recently used to least recently used ordering, or updating a timestamp for the entry that corresponds to the key for the other event. |
| Claim: | 9. The method of claim 6, wherein the local state comprises a local lookup table, and wherein determining that the event is not a local reoccurring event comprises: determining, based on the key for the event, that an entry for the event does not exist in the local lookup table. |
| Claim: | 10. The method of claim 9, wherein determining, based on the key for the event, that an entry for the event does not exist in the local lookup table comprises: generating a digest based on a hash of the key; and determining that the local lookup table does not have an entry for the digest. |
| Claim: | 11. The method of claim 6, wherein the local state comprises a local lookup table, and wherein updating the local state based on the key for the event comprises: expiring an entry from the local lookup table that corresponds to a key for another event; and inserting a new entry into the local lookup table that corresponds to the key for the event. |
| Claim: | 12. The method of claim 6, further comprising: identifying a plurality of portions of data based on the obtained event; and generating the key based on the plurality of portions of data. |
| Claim: | 13. The method of claim 6, further comprising: performing, by a first stage process of a computing node: receiving, from the other computing node, at least a portion of the global state; and in response to receiving at least the portion of the global state, updating the local state based on at least the portion of the global state. |
| Claim: | 14. The method of claim 6, wherein a network comprises the computing node and another network comprises the other computing node, and wherein the network is remote from the other network, and wherein the first stage process runs concurrent with other first stage processes at other computing nodes that send other events to the second stage process as other possible new events. |
| Claim: | 15. One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors of a second stage computing node of an event deduplication system, cause the one or more processors to implement a second stage process to: receive, from different first stage processes on different first stage computing nodes of the event deduplication system, a plurality of possible new events obtained by the different first stage processes from different input streams, wherein for an event of the plurality of possible new events obtained by a given first stage process from a given input stream, the one or more processors implement the second stage process to: determine, based on a key for the event and a global state, whether the event is a global reoccurring event; in response to a determination that the event is not a global reoccurring event, update the global state based on the key for the event and output the event as a new event; and send at least a portion of the global state to the different first stage processes to update a local state associated with the different first stage processes. |
| Claim: | 16. The one or more storage media as recited in claim 15, wherein the global state comprises a global lookup table, and wherein to update the global state based on the key for the event, the program instructions when executed on or across the one or more processors further cause the one or more processors to implement the second stage process to: insert an entry into the global lookup table that corresponds to the key for the event. |
| Claim: | 17. The one or more storage media as recited in claim 15, further comprising program instructions that when executed on or across the one or more processors further cause the one or more processors to implement the second stage process to: receive another possible new event from one of the first stage processes; determine, based on a key for the other possible new event and the global state, whether the other possible new event is a global reoccurring event; and in response to a determination that the other possible new event is a global reoccurring event, update the global state based on the key for the other possible new event. |
| Claim: | 18. The one or more storage media as recited in claim 17, wherein the global state comprises a global lookup table, and wherein to update the global state based on the key for the other possible new event, the program instructions when executed on or across the one or more processors further cause the one or more processors to implement the second stage process to: re-order a group of entries in the global lookup table, wherein an entry that corresponds to the key for the other possible new event is moved ahead within the group of entries according to a most recently used to least recently used ordering, or update a timestamp for the entry that corresponds to the key for the other possible new event. |
| Claim: | 19. The one or more storage media as recited in claim 15, further comprising program instructions that when executed on or across the one or more processors further cause the one or more processors to implement the second stage process to: receive a promote state event from one of the first stage processes, wherein the promote state event corresponds to a recurring event obtained by the first stage process; and in response to reception of the promote state event, update the global state based on a key for the recurring event. |
| Claim: | 20. The one or more storage media as recited in claim 15, further comprising program instructions that when executed on or across the one or more processors further cause the one or more processors to implement the second stage process to: receive a request from one of the first stage processes for the global state; and in response to reception of the request, send at least the portion of the global state to the first stage process. |
| Patent References Cited: | 7979670 (July 2011, Saliba et al.); 9661074 (May 2017, Peake); 2015/0350260 (December 2015, Tadepalli); 2021/0406287 (December 2021, Bar-on) |
| Primary Examiner: | Truong, Cam Y T |
| Attorney, Agent or Firm: | Kowert, Robert C.; Kowert, Hood, Munyon, Rankin & Goetzel, P.C. |
| Document Code: | edspgr.12210497 |
| Database: | USPTO Patent Grants |
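
The two-stage flow described in the abstract and claims can be illustrated with a minimal Python sketch. It is not the patented implementation. The class names (`LruTable`, `FirstStage`, `SecondStage`), the use of SHA-256 for the key digest, the table capacities, and the rule of sending a promote state event on every third local repeat are all assumptions made for illustration; the claims only require that the decision to promote be based on the key and the local state. Direct method calls stand in for the network hop between computing nodes.

```python
import hashlib
from collections import OrderedDict


def digest(key: str) -> str:
    """Digest based on a hash of the event key (SHA-256 is an assumption)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class LruTable:
    """Bounded lookup table ordered from least to most recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.hits: "OrderedDict[str, int]" = OrderedDict()

    def __contains__(self, d: str) -> bool:
        return d in self.hits

    def insert(self, d: str) -> None:
        """Add a new entry, expiring the least recently used one if full."""
        if len(self.hits) >= self.capacity:
            self.hits.popitem(last=False)
        self.hits[d] = 0

    def touch(self, d: str) -> int:
        """Move an existing entry to the most recently used end; return its hit count."""
        self.hits[d] += 1
        self.hits.move_to_end(d)
        return self.hits[d]


class SecondStage:
    """Single second stage process holding the global state."""

    def __init__(self, capacity: int = 1000) -> None:
        self.global_state = LruTable(capacity)
        self.new_events: list = []  # events output as new

    def on_possible_new(self, key: str, event) -> None:
        d = digest(key)
        if d in self.global_state:
            self.global_state.touch(d)  # global reoccurring event: refresh only
        else:
            self.global_state.insert(d)
            self.new_events.append(event)  # output as a new event

    def on_promote(self, key: str) -> None:
        d = digest(key)
        if d in self.global_state:
            self.global_state.touch(d)
        else:
            self.global_state.insert(d)

    def publish(self) -> list:
        """Return a portion of the global state (here: every digest)."""
        return list(self.global_state.hits)


class FirstStage:
    """One of several first stage processes, each with its own local state."""

    def __init__(self, second: SecondStage, capacity: int = 100,
                 promote_every: int = 3) -> None:
        self.second = second
        self.local_state = LruTable(capacity)
        self.promote_every = promote_every  # hypothetical promotion policy

    def process(self, key: str, event) -> None:
        d = digest(key)
        if d not in self.local_state:
            self.local_state.insert(d)
            self.second.on_possible_new(key, event)  # possible new event
        elif self.local_state.touch(d) % self.promote_every == 0:
            self.second.on_promote(key)  # promote state event

    def apply_global(self, digests: list) -> None:
        """Merge published global state into the local state."""
        for d in digests:
            if d not in self.local_state:
                self.local_state.insert(d)


second = SecondStage()
east, west = FirstStage(second), FirstStage(second)
east.process("login:alice", "e1")    # new locally and globally: output
west.process("login:alice", "e2")    # new at west, but globally reoccurring
west.apply_global(second.publish())  # periodic publish, done once here
east.process("login:bob", "e3")      # new locally and globally: output
```

In this run only `e1` and `e3` end up in `second.new_events`. The event `e2` looks new to the `west` process but is suppressed by the second stage, which is the case the global state exists to catch.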