Automated virtual machine resource management in container-supported many task computing

Bibliographic Details
Title:	Automated virtual machine resource management in container-supported many task computing
Patent Number:	11,734,064
Publication Date:	August 22, 2023
Appl. No:	17/733090
Application Filed:	April 29, 2022
Abstract:	An apparatus includes a processor to: receive a request to perform a job flow; within a performance container, based on the data dependencies among a set of tasks of the job flow, derive an order of performance of the set of tasks that includes a subset able to be performed in parallel, and derive a quantity of task containers to enable the parallel performance of the subset; based on the derived quantity of task containers, derive a quantity of virtual machines (VMs) to enable the parallel performance of the subset; provide, to a VM allocation routine, an indication of a need for provision of the quantity of VMs; and store, within a task queue, multiple task routine execution request messages to enable parallel execution of task routines within the quantity of task containers to cause the parallel performance of the subset.
Inventors:	SAS Institute Inc. (Cary, NC, US)
Assignees:	SAS Institute Inc. (Cary, NC, US)
Claim:	1. An apparatus comprising at least one processor and a storage to store instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: receive, at the at least one processor, and from a requesting device via a network, a request to perform a job flow, wherein: the job flow is defined in a job flow definition that specifies a set of tasks to be performed via execution of a corresponding set of task routines during a performance of the job flow, and that specifies data dependencies among the set of task routines; and the set of task routines are executed within a set of task containers instantiated within at least one virtual machine (VM); store, within a job queue, a job performance request message comprising an indication of receiving the request to perform the job flow; within a performance container, execute instructions of a performance routine to cause the at least one processor to, in response to the storage of the job performance request message within the job queue, perform operations comprising: based on the data dependencies among the set of tasks specified in the job flow definition, derive an order of performance of the set of tasks that specifies a subset of tasks of the set of tasks that are able to be performed in parallel, and derive a quantity of task containers within the set of task containers that enables parallel execution of a corresponding subset of task routines to cause the parallel performance of the subset of tasks; based on at least the derived quantity of task containers within the set of task containers, derive a quantity of at least one other VM that, in combination with the at least one VM, provides resources to enable instantiation of the derived quantity of task containers to enable the parallel execution of the subset of task routines; provide, to a VM allocation routine executed by the at least one processor, an indication of a need for provision of the at least one other VM, wherein execution of the VM allocation routine causes the at least one processor to allocate the at least one VM; and store, within a task queue, multiple task routine execution request messages that each comprise an identifier associated with a task of the subset of tasks to enable the parallel execution of the subset of task routines within the set of task containers; and within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, perform operations comprising: provide, to the VM allocation routine, an indication of cessation of the need for provision of the at least one other VM.
Claim:	2. The apparatus of claim 1, wherein the provision of the at least one other VM comprises at least one of instantiating a VM of the at least one other VM, or providing access to a VM of the at least one other VM.
Claim:	3. The apparatus of claim 1, wherein: the allocation of the at least one VM comprises allocation of at least one of processing resources or storage resources provided within at least one node device; and the provision of the at least one other VM comprises a further allocation of the at least one of the processing resources or storage resources provided within the at least one node device.
Claim:	4. The apparatus of claim 3, wherein: the job flow definition indicates that a task of the subset of tasks is of a task type that requires access to a particular resource that is provided by less than all node devices of the at least one node device; and in executing the VM allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, the at least one processor is caused to perform operations comprising: analyze resources provided by each node device of the at least one node device to identify a node device that provides the particular resource; and provide a VM of the at least one other VM that is instantiated within the identified node device.
Claim:	5. The apparatus of claim 4, wherein: the at least one processor executes a resource allocation routine that causes the at least one processor to dynamically allocate multiple containers based on availability of resources within each VM of the at least one VM; and in executing the resource allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, the at least one processor is caused to perform operations comprising: analyze resources provided by each VM of the at least one VM and of the at least one other VM to identify a VM that provides the particular resource; and provide a task container of the set of task containers that is instantiated within the identified VM.
Claim:	6. The apparatus of claim 4, wherein the at least one processor is caused to perform operations comprising: within each task container of the set of task containers, in response to the storage of a task routine execution request message within the task queue, analyze an identifier of the task type within the task routine execution request message to determine whether the task container supports a performance of a task of the task type; and in response to a determination that the task container supports the performance of a task of the task type, perform operations comprising: use an identifier of a task provided in the task routine execution request message to retrieve a corresponding task routine; and execute the retrieved corresponding task routine to cause the performance of the identified task.
Claim:	7. The apparatus of claim 1, wherein the at least one processor is caused to perform operations comprising: provide, to a resource allocation routine executed by the at least one processor, an indication of a need for provision of the derived quantity of task containers, wherein execution of the resource allocation routine causes the at least one processor to dynamically allocate multiple containers based on availability of at least one of processing resources or storage resources within the at least one VM.
Claim:	8. The apparatus of claim 7, wherein the at least one processor delays the provision, to the resource allocation routine, of the indication of the need for the provision of the derived quantity of task containers by a predetermined amount of time after the provision, to the VM allocation routine, of the indication of the need for provision of the at least one other VM to cause the provision of the resources to enable the instantiation of the quantity of task containers.
Claim:	9. The apparatus of claim 7, wherein the at least one processor is caused to perform operations comprising: within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, perform operations comprising: provide, to the resource allocation routine, an indication of cessation of the need for provision of the derived quantity of task containers.
Claim:	10. The apparatus of claim 7, wherein: the dynamic allocation of the multiple containers comprises dynamic allocation of multiple pods based on the availability of the at least one of processing resources or storage resources within the at least one VM; the performance container and a messaging container are instantiated within a performance pod of the multiple pods; and within the messaging container, instructions of an instance of a messaging routine are executed by the at least one processor to cause the at least one processor to provide the instance of the performance routine with access to the job queue and the task queue.
Claim:	11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause at least one processor to perform operations comprising: receive, at the at least one processor, and from a requesting device via a network, a request to perform a job flow, wherein: the job flow is defined in a job flow definition that specifies a set of tasks to be performed via execution of a corresponding set of task routines during a performance of the job flow, and that specifies data dependencies among the set of task routines; and the set of task routines are executed within a set of task containers instantiated within at least one virtual machine (VM); store, within a job queue, a job performance request message comprising an indication of receiving the request to perform the job flow; within a performance container, execute instructions of a performance routine to cause the at least one processor to, in response to the storage of the job performance request message within the job queue, perform operations comprising: based on the data dependencies among the set of tasks specified in the job flow definition, derive an order of performance of the set of tasks that specifies a subset of tasks of the set of tasks that are able to be performed in parallel, and derive a quantity of task containers within the set of task containers that enables parallel execution of a corresponding subset of task routines to cause the parallel performance of the subset of tasks; based on at least the derived quantity of task containers within the set of task containers, derive a quantity of at least one other VM that, in combination with the at least one VM, provides resources to enable instantiation of the derived quantity of task containers to enable the parallel execution of the subset of task routines; provide, to a VM allocation routine executed by the at least one processor, an indication of a need for provision of the at least one other VM, wherein execution of the VM allocation routine causes the at least one processor to allocate the at least one VM; and store, within a task queue, multiple task routine execution request messages that each comprise an identifier associated with a task of the subset of tasks to enable the parallel execution of the subset of task routines within the set of task containers; and within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, perform operations comprising: provide, to the VM allocation routine, an indication of cessation of the need for provision of the at least one other VM.
Claim:	12. The computer-program product of claim 11, wherein the provision of the at least one other VM comprises at least one of instantiating a VM of the at least one other VM, or providing access to a VM of the at least one other VM.
Claim:	13. The computer-program product of claim 11, wherein: the allocation of the at least one VM comprises allocation of at least one of processing resources or storage resources provided within at least one node device; and the provision of the at least one other VM comprises a further allocation of the at least one of the processing resources or storage resources provided within the at least one node device.
Claim:	14. The computer-program product of claim 13, wherein: the job flow definition indicates that a task of the subset of tasks is of a task type that requires access to a particular resource that is provided by less than all node devices of the at least one node device; and in executing the VM allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, the at least one processor is caused to perform operations comprising: analyze resources provided by each node device of the at least one node device to identify a node device that provides the particular resource; and provide a VM of the at least one other VM that is instantiated within the identified node device.
Claim:	15. The computer-program product of claim 14, wherein: the at least one processor executes a resource allocation routine that causes the at least one processor to dynamically allocate multiple containers based on availability of resources within each VM of the at least one VM; and in executing the resource allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, the at least one processor is caused to perform operations comprising: analyze resources provided by each VM of the at least one VM and of the at least one other VM to identify a VM that provides the particular resource; and provide a task container of the set of task containers that is instantiated within the identified VM.
Claim:	16. The computer-program product of claim 14, wherein the at least one processor is caused to perform operations comprising: within each task container of the set of task containers, in response to the storage of a task routine execution request message within the task queue, analyze an identifier of the task type within the task routine execution request message to determine whether the task container supports a performance of a task of the task type; and in response to a determination that the task container supports the performance of a task of the task type, perform operations comprising: use an identifier of a task provided in the task routine execution request message to retrieve a corresponding task routine; and execute the retrieved corresponding task routine to cause the performance of the identified task.
Claim:	17. The computer-program product of claim 11, wherein the at least one processor is caused to perform operations comprising: provide, to a resource allocation routine executed by the at least one processor, an indication of a need for provision of the derived quantity of task containers, wherein execution of the resource allocation routine causes the at least one processor to dynamically allocate multiple containers based on availability of at least one of processing resources or storage resources within the at least one VM.
Claim:	18. The computer-program product of claim 17, wherein the at least one processor delays the provision, to the resource allocation routine, of the indication of the need for the provision of the derived quantity of task containers by a predetermined amount of time after the provision, to the VM allocation routine, of the indication of the need for provision of the at least one other VM to cause the provision of the resources to enable the instantiation of the quantity of task containers.
Claim:	19. The computer-program product of claim 17, wherein the at least one processor is caused to perform operations comprising: within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, perform operations comprising: provide, to the resource allocation routine, an indication of cessation of the need for provision of the derived quantity of task containers.
Claim:	20. The computer-program product of claim 17, wherein: the dynamic allocation of the multiple containers comprises dynamic allocation of multiple pods based on the availability of the at least one of processing resources or storage resources within the at least one VM; the performance container and a messaging container are instantiated within a performance pod of the multiple pods; and within the messaging container, instructions of an instance of a messaging routine are executed by the at least one processor to cause the at least one processor to provide the instance of the performance routine with access to the job queue and the task queue.
Claim:	21. A computer-implemented method comprising: receiving, by at least one processor, and from a requesting device via a network, a request to perform a job flow, wherein: the job flow is defined in a job flow definition that specifies a set of tasks to be performed via execution of a corresponding set of task routines during a performance of the job flow, and that specifies data dependencies among the set of task routines; and the set of task routines are executed within a set of task containers instantiated within at least one virtual machine (VM); storing, within a job queue, a job performance request message comprising an indication of receiving the request to perform the job flow; within a performance container, executing, by the at least one processor, instructions of a performance routine to cause the at least one processor to, in response to the storage of the job performance request message within the job queue, perform operations comprising: based on the data dependencies among the set of tasks specified in the job flow definition, deriving, by the at least one processor, an order of performance of the set of tasks that specifies a subset of tasks of the set of tasks that are able to be performed in parallel, and deriving, by the at least one processor, a quantity of task containers within the set of task containers that enables parallel execution of a corresponding subset of task routines to cause the parallel performance of the subset of tasks; based on at least the derived quantity of task containers within the set of task containers, deriving, by the at least one processor, a quantity of at least one other VM that, in combination with the at least one VM, provides resources to enable instantiation of the derived quantity of task containers to enable the parallel execution of the subset of task routines; providing, to a VM allocation routine executed by the at least one processor, an indication of a need for provision of the at least one other VM, wherein execution of the VM allocation routine causes the at least one processor to allocate the at least one VM; and storing, within a task queue, multiple task routine execution request messages that each comprise an identifier associated with a task of the subset of tasks to enable the parallel execution of the subset of task routines within the set of task containers; and within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, performing operations comprising: providing, to the VM allocation routine, an indication of cessation of the need for provision of the at least one other VM.
Claim:	22. The computer-implemented method of claim 21, wherein the provision of the at least one other VM comprises at least one of instantiating a VM of the at least one other VM, or providing access to a VM of the at least one other VM.
Claim:	23. The computer-implemented method of claim 21, wherein: the allocation of the at least one VM comprises allocation of at least one of processing resources or storage resources provided within at least one node device; and the provision of the at least one other VM comprises a further allocation of the at least one of the processing resources or storage resources provided within the at least one node device.
Claim:	24. The computer-implemented method of claim 23, wherein: the job flow definition indicates that a task of the subset of tasks is of a task type that requires access to a particular resource that is provided by less than all node devices of the at least one node device; and the method comprises, in executing the VM allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, performing operations comprising: analyzing, by the at least one processor, resources provided by each node device of the at least one node device to identify a node device that provides the particular resource; and providing a VM of the at least one other VM that is instantiated within the identified node device.
Claim:	25. The computer-implemented method of claim 24, wherein: the at least one processor executes a resource allocation routine that causes the at least one processor to dynamically allocate multiple containers based on availability of resources within each VM of the at least one VM; and the method comprises, in executing the resource allocation routine, and in response to the indication that a task of the subset of tasks is of the task type, performing operations comprising: analyzing, by the at least one processor, resources provided by each VM of the at least one VM and of the at least one other VM to identify a VM that provides the particular resource; and providing a task container of the set of task containers that is instantiated within the identified VM.
Claim:	26. The computer-implemented method of claim 24, comprising: within each task container of the set of task containers, in response to the storage of a task routine execution request message within the task queue, analyzing, by the at least one processor, an identifier of the task type within the task routine execution request message to determine whether the task container supports a performance of a task of the task type; and in response to a determination that the task container supports the performance of a task of the task type, performing operations comprising: using an identifier of a task provided in the task routine execution request message to retrieve a corresponding task routine; and executing, by the at least one processor, the retrieved corresponding task routine to cause the performance of the identified task.
Claim:	27. The computer-implemented method of claim 21, comprising: providing, to a resource allocation routine executed by the at least one processor, an indication of a need for provision of the derived quantity of task containers, wherein execution of the resource allocation routine causes the at least one processor to dynamically allocate multiple containers based on availability of at least one of processing resources or storage resources within the at least one VM.
Claim:	28. The computer-implemented method of claim 27, comprising delaying the provision, to the resource allocation routine, of the indication of the need for the provision of the derived quantity of task containers by a predetermined amount of time after the provision, to the VM allocation routine, of the indication of the need for provision of the at least one other VM to cause the provision of the resources to enable the instantiation of the quantity of task containers.
Claim:	29. The computer-implemented method of claim 27, comprising: within the performance container, in response to at least storage of multiple execution completion messages within the task queue that are indicative of completion of execution of the subset of task routines, performing operations comprising: providing, to the resource allocation routine, an indication of cessation of the need for provision of the derived quantity of task containers.
Claim:	30. The computer-implemented method of claim 27, wherein: the dynamic allocation of the multiple containers comprises dynamic allocation of multiple pods based on the availability of the at least one of processing resources or storage resources within the at least one VM; the performance container and a messaging container are instantiated within a performance pod of the multiple pods; and within the messaging container, instructions of an instance of a messaging routine are executed by the at least one processor to cause the at least one processor to provide the instance of the performance routine with access to the job queue and the task queue.
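
Illustrative sketch (not part of the patent record): the derivation steps recited in claims 1, 11 and 21 can be pictured with a small Python example. It groups tasks into stages from their data dependencies, takes the widest stage as the quantity of task containers, and converts that into a quantity of additional VMs. The function names, the one-task-per-container rule, and the fixed `containers_per_vm` capacity are assumptions made for illustration; the claims do not specify how these quantities are computed.

```python
import math


def derive_order(dependencies):
    """Group tasks into stages; tasks in the same stage can run in parallel.

    `dependencies` maps each task name to the tasks whose output it needs.
    (Hypothetical representation of a job flow definition.)
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    done = set()
    stages = []
    while remaining:
        # A task is ready once every task it depends on has been scheduled.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable data dependencies")
        stages.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return stages


def derive_container_quantity(stages):
    """Assume one task container per task in the widest parallel stage."""
    return max((len(stage) for stage in stages), default=0)


def derive_additional_vms(containers_needed, containers_per_vm, existing_vms):
    """Assume every VM can host a fixed number of task containers."""
    total_vms = math.ceil(containers_needed / containers_per_vm)
    return max(0, total_vms - existing_vms)


job_flow = {
    "load": [],
    "clean_a": ["load"],
    "clean_b": ["load"],
    "clean_c": ["load"],
    "report": ["clean_a", "clean_b", "clean_c"],
}
stages = derive_order(job_flow)
containers = derive_container_quantity(stages)
extra_vms = derive_additional_vms(containers, containers_per_vm=2, existing_vms=1)
print(stages, containers, extra_vms)
```

In this example the three `clean_*` tasks form the parallel subset, so three task containers are derived; with two containers per VM and one VM already allocated, one additional VM would be requested from the VM allocation routine.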
Patent References Cited:	7698427, April 2010, Lee et al.; 3024405, September 2011, Shukla et al.; 8671403, March 2014, Sundarrajan; 9313133, April 2016, Yeddanapudi; 9454323, September 2016, Dausner; 9577972, February 2017, Word; 9916135, March 2018, Dube; 9946719, April 2018, Bowman; 9984004, May 2018, Little; 9998418, June 2018, Clark; 10042886, August 2018, Saadat-Panah; 10169121, January 2019, Vibhor; 10185547, January 2019, Sun et al.; 10277603, April 2019, Ainscow; 10346780, July 2019, Deng et al.; 10360053, July 2019, Christensen; 10361919, July 2019, Yang; 10635642, April 2020, Haggerty; 10691501, June 2020, Hussain; 10838756, November 2020, Singh; 10977081, April 2021, Mandagere; 10977111, April 2021, Rungta; 11068309, July 2021, Allen; 11144363, October 2021, Francis Conde; 11171834, November 2021, Bockelmann; 11481245, October 2022, Oliver; 20020184250, December 2002, Kern et al.; 20060029068, February 2006, Frank; 20130024872, January 2013, Bobroff; 20130232497, September 2013, Jalagam et al.; 20130290979, October 2013, Kawano; 20130332612, December 2013, Cai; 20130347003, December 2013, Whitmore; 20140040905, February 2014, Tsunoda; 20140067457, March 2014, Nagendra et al.; 20150082317, March 2015, You et al.; 20150149745, May 2015, Eble; 20150205633, July 2015, Kaptur; 20170093988, March 2017, Rehaag et al.; 20170163647, June 2017, Cernoch et al.; 20170255886, September 2017, Schmidt et al.; 20200133728, April 2020, Nataraj; 20210182729, June 2021, George et al.
Other References:	Tim Dornemann et al; “On-Demand Resource Provisioning for BPEL workflows using Amazon's Elastic Compute Cloud”; IEEE 2009; (Dornemann_2009.pdf, pp. 140-147) (Year: 2009) [cited by examiner]. Chung et al; “Stratus: cost-aware container scheduling in the public cloud”; Carnegie Mellon University; ACM 2018; (Chung_2018.pdf; pp. 121-134) (Year: 2018) [cited by examiner]. Ramirez et al; “Capacity-Driven Scaling Schedules Derivation for Coordinated Elasticity of Containers and Virtual Machines”; IEEE 2019; (Ramirez_2019.pdf; pp. 177-186) (Year: 2019) [cited by examiner]. Fakhfakh et al., “Towards a Provisioning Algorithm for Dynamic Workflows in the Cloud” IEEE 2015; pp. 35-40 [cited by applicant]. Garg et al., “Adaptive workflow scheduling in grid computing based on dynamic resource availability”; Karabuk University; Engineering Science and Technology, an International Journal, 2015, pp. 256-267 [cited by applicant]. Yang et al., “A Workflow-base Computational Resource Broker with Information Monitoring in Grids”, GCC'06; IEEE 2006, pp. 1-8 [cited by applicant]. Sharma et al., “Containers and Virtual Machines at Scale: A Comparative Study,” In Proceedings of the 17th International Middleware Conference (Middleware '16). Association for Computing Machinery, New York, NY, USA, Article 1, pp. 1-13 [cited by applicant]. Zhang et al., “A Comparative Study of Containers and Virtual Machines in Big Data Environment,” Accepted by Jul. 5, 2018 IEEE International Conference On Cloud Computing, pp. 8. arXiv:1807.01842 [cs.DC] [cited by applicant]. Yildiz et al.; “Fault-Tolerance in Dataflow-based Scientific Workflow Management”; 2010 IEEE 6th World Congress on Services; (Yildiz_2010.pdf; pp. 336-343) (Year: 2010) [cited by applicant].
Primary Examiner:	Patel, Hiren P
Attorney, Agent or Firm:	KDW Firm PLLC
Accession Number:	edspgr.11734064
Database:	USPTO Patent Grants
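
Illustrative sketch (not part of the patent record): claims 6, 16 and 26 describe each task container checking the task type in an execution request message before retrieving and running the matching task routine, and claims 1, 11 and 21 tie release of the extra VMs to completion messages for the whole parallel subset. The Python below is one hedged way to picture that exchange. The `TaskRequest` and `TaskContainer` classes, the dictionary-based completion message, and the in-memory lists standing in for the task queue are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRequest:
    """Hypothetical task routine execution request message."""
    task_id: str
    task_type: str


class TaskContainer:
    """Hypothetical task container that supports only some task types."""

    def __init__(self, supported_types, routines):
        self.supported_types = set(supported_types)
        self.routines = routines  # maps task_id to a callable task routine

    def handle(self, request):
        """Run the routine and return a completion message, or None if the
        task type is not supported by this container."""
        if request.task_type not in self.supported_types:
            return None
        routine = self.routines[request.task_id]  # retrieve by task identifier
        routine()
        return {"task_id": request.task_id, "status": "complete"}


def all_complete(requests, completions):
    """True once every requested task has a completion message."""
    finished = {message["task_id"] for message in completions}
    return all(request.task_id in finished for request in requests)


executed = []
routines = {
    "score": lambda: executed.append("score"),
    "summarize": lambda: executed.append("summarize"),
}
cpu_container = TaskContainer({"cpu"}, routines)
gpu_container = TaskContainer({"cpu", "gpu"}, routines)

requests = [TaskRequest("score", "gpu"), TaskRequest("summarize", "cpu")]
completions = []
for request in requests:
    # Offer each request to the containers in turn; the first that supports
    # the task type performs the task.
    for container in (cpu_container, gpu_container):
        message = container.handle(request)
        if message is not None:
            completions.append(message)
            break

# Only after all completion messages arrive would the performance routine
# signal that the additional VMs are no longer needed.
release_extra_vms = all_complete(requests, completions)
print(executed, release_extra_vms)
```

Here the `score` task is skipped by the CPU-only container and performed by the GPU-capable one, which mirrors the type check in the claims; a real deployment would use a message broker rather than in-memory lists.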
