
Reliable Task

Framework
Raju Pandey
Questions
• Why do we need another workflow system?
• What doesn’t get solved by Oklahoma or Flyte?
• Azkaban-like DAGs
• Which workflow to choose?
• Deployment?
• How to migrate?
Organization
• Programming Model
• Execution and Failure Model
• Under the hood: Architecture
At the most basic level: DAG/Workflow
• A partially ordered set of tasks and workflows

[Diagram: T1 → {T2, T3, T4} → T5]
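The partial order above is exactly what a scheduler consumes: a task may run once all its predecessors have finished. A minimal sketch in Python using the standard library's `graphlib` (the toy DAG mirrors the slide's T1 → {T2, T3, T4} → T5; it is illustrative, not taken from any framework):

```python
from graphlib import TopologicalSorter

# Each key maps a task to the set of tasks it depends on.
dag = {
    "T2": {"T1"}, "T3": {"T1"}, "T4": {"T1"},
    "T5": {"T2", "T3", "T4"},
}

ts = TopologicalSorter(dag)
order = list(ts.static_order())  # one valid linearization of the partial order

print(order[0], order[-1])  # T1 T5
```

T1 always comes first and T5 last; T2, T3, and T4 appear in some arbitrary order between them, which is precisely the freedom a runtime exploits to execute them in parallel.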
What do we look for?
• How are DAGs defined?
• What kind of development support?
• What is the execution model?
• Failure model?
• How robust is the DAG runtime infrastructure?
• Will it scale?
• Is it reliable?
What do we look for – cont’d?
• What triggers/events can start or correlate with workflow execution?
• Can I access databases, message brokers, Hive tables, etc.?
• Can I run multiple versions? Update/unroll?
• Can I isolate resources and data?
• ML workflow needs
  • Dynamic pipelines
  • Data lineage
  • Resource management (GPU allocation, etc.)
  • Resource isolation – one task cannot affect others
  • Results caching
https://flyte.org/blog/orchestrating-data-pipelines-at-lyft-comparing-flyte-and-airflow
How do we define these DAGs?
• DAG through DSLs:
  • JSON, TOML, YAML
  • Airflow configuration using Python
• DAG through code:
  • Temporal, Cadence
  • Azure Durable Task
• Implications

[Diagram: T1 → {T2, T3, T4} → T5]
Airflow DAG definition

with DAG("id", default_args={...}, schedule=...) as dag:
    t1 = BashOperator(...)
    t2 = PythonOperator(...)
    t3 = MyOperator(...)
    t4 = ...
    t5 = ...
    t1 >> [t2, t3, t4] >> t5
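The `t1 >> [t2, t3, t4] >> t5` syntax works because Airflow overloads the shift operators on its operator classes. A toy re-implementation (not Airflow's actual classes) showing how the DSL records dependency edges:

```python
class Task:
    """Toy stand-in for an Airflow operator, showing how `>>` records edges."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # t1 >> t2  or  t1 >> [t2, t3]: record the edge(s), then return the
        # right-hand side so chains like t1 >> [t2, t3] >> t5 keep working.
        targets = other if isinstance(other, list) else [other]
        for t in targets:
            self.downstream.append(t)
        return other

    def __rrshift__(self, other):
        # [t2, t3, t4] >> t5: invoked when the left-hand side is a plain list.
        for t in other:
            t.downstream.append(self)
        return self

t1, t2, t3, t4, t5 = (Task(f"t{i}") for i in range(1, 6))
t1 >> [t2, t3, t4] >> t5  # same shape as the slide's Airflow example

print([t.task_id for t in t1.downstream])  # ['t2', 't3', 't4']
```

The implication noted on the previous slide: evaluating this Python file only *builds* the graph; it does not run any task. Execution is a separate phase driven by the scheduler.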
Execution model

[Diagram: DAG definition → Build DAG (T1 → {T2, T3, T4} → T5) → Schedule DAG]
Workflow as Code
• Workflows and tasks are defined within the application itself
• Java (Temporal):
  • Workflow interface and workflow implementation
  • Task interface and task implementation
  • Workers that execute workflows and tasks
• Identify invocations (gRPC calls)
Task (Activity in Temporal)

@ActivityInterface
public interface T1 {
    @ActivityMethod String t1(P1 x);
}

public class T1Impl implements T1 {
    @Override public String t1(P1 x) {
        // …
    }
}
// code for T2, T3, T4, T5
Workflow (Temporal)

@WorkflowInterface
public interface W1 {
    @WorkflowMethod String w1(P x);
}

public class W1Impl implements W1 {
    ActivityOptions options = ActivityOptions.newBuilder()
        .setScheduleToCloseTimeout(Duration.ofSeconds(2)).build();

    private final T1 t1 = Workflow.newActivityStub(T1.class, options);
    private final T2 t2 = Workflow.newActivityStub(T2.class, options);
    // … stubs for T3, T4, T5 created the same way

    @Override public String w1(P x) {
        t1.t1(..);                                          // execute task 1
        Promise<String> t2Val = Async.function(t2::t2, ..); // invoke t2 in parallel
        Promise<String> t3Val = Async.function(t3::t3, ..); // invoke t3 in parallel

        List<Promise<String>> promiseValList = new ArrayList<>();
        promiseValList.add(t2Val);
        promiseValList.add(t3Val);

        Promise.allOf(promiseValList).get(); // wait for all to complete
        return t5.t5();
    }
}
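Setting Temporal's API aside, the fan-out/fan-in shape of this workflow can be sketched with plain `concurrent.futures` (the task functions below are hypothetical stand-ins for t1–t5, not Temporal code):

```python
from concurrent.futures import ThreadPoolExecutor

def t1(): return "t1"
def t2(): return "t2"
def t3(): return "t3"
def t4(): return "t4"
def t5(results): return "t5 after " + ",".join(sorted(results))

t1()  # run t1 first
with ThreadPoolExecutor() as pool:
    # Fan out: t2, t3, t4 run in parallel (analogous to Async.function).
    futures = [pool.submit(f) for f in (t2, t3, t4)]
    # Fan in: block until all complete (analogous to Promise.allOf).
    results = [f.result() for f in futures]

print(t5(results))  # t5 after t2,t3,t4
```

The crucial difference from this sketch is that Temporal's `Promise` is durable: if the worker process dies mid-workflow, completed activity results are recovered from the event history rather than lost, as the execution-model slides below describe.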
Temporal: Workers that execute workflows and tasks

public class HelloWorldWorker {
    public static void main(String[] args) {
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        WorkflowClient client = WorkflowClient.newInstance(service);
        WorkerFactory factory = WorkerFactory.newInstance(client);

        Worker worker = factory.newWorker(Shared.HELLO_WORLD_TASK_QUEUE);

        worker.registerWorkflowImplementationTypes(W1Impl.class);
        worker.registerActivitiesImplementations(new T1Impl());
        worker.registerActivitiesImplementations(new T2Impl());

        factory.start();
    }
}
Execution model

[Diagram: multiple workers, each scheduling & executing workflow/tasks, collectively run the DAG T1 → {T2, T3, T4} → T5]
Why does that matter?

[Diagram: Begin → Book a car → Book a hotel → Book a flight → End (a transaction at the workflow level)]
Why does that matter?

[Diagram: Begin → Book a car → Book a hotel → Book a flight → End, with compensating transactions: Cancel a car, Cancel a hotel, Cancel a flight]

• Generating a DAG based on the results of the DAG
Why does it matter?

public class BookTripWorkflowImpl implements BookTripWorkflow {
    @Override void bookTrip(String name) {
        String carReservationID, hotelReservationID;

        try {
            carReservationID = activities.reserveCar(name);
        } catch (ActivityFailure e) {
            // undo by calling another activity
        }
        try {
            hotelReservationID = activities.bookHotel(name);
        } catch (ActivityFailure e) {
            // undo by calling another activity
            activities.cancelCar(carReservationID);
        }
        // …
    }
}

https://github.com/temporalio/samples-java/blob/main/src/main/java/io/temporal/samples/bookingsaga/TripBookingWorkflowImpl.java
Saga pattern

public class BookTripWorkflowImpl implements BookTripWorkflow {
    @Override void bookTrip(String name) {
        // Configure SAGA to run compensation activities in parallel
        Saga.Options sagaOptions = new Saga.Options.Builder()
            .setParallelCompensation(true).build();
        Saga saga = new Saga(sagaOptions);
        try {
            String carReservationID = activities.reserveCar(name);
            saga.addCompensation(activities::cancelCar, carReservationID, name);

            String hotelReservationID = activities.bookHotel(name);
            saga.addCompensation(activities::cancelHotel, hotelReservationID, name);

            String flightReservationID = activities.bookFlight(name);
            saga.addCompensation(activities::cancelFlight, flightReservationID, name);
        } catch (ActivityFailure e) {
            saga.compensate();
            throw e;
        }
    }
}

https://github.com/temporalio/samples-java/blob/main/src/main/java/io/temporal/samples/bookingsaga/TripBookingWorkflowImpl.java
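The saga pattern itself is framework-independent: register a compensation for each completed step, and if a later step fails, run the compensations in reverse. A minimal sketch (the booking functions and IDs are hypothetical; Temporal's `Saga` class additionally supports parallel compensation, as the options above show):

```python
class Saga:
    def __init__(self):
        self.compensations = []

    def add_compensation(self, fn, *args):
        self.compensations.append((fn, args))

    def compensate(self):
        # Undo completed steps in reverse order of registration.
        for fn, args in reversed(self.compensations):
            fn(*args)

log = []

def reserve_car(name): log.append("car"); return "car-1"
def cancel_car(rid): log.append(f"cancel {rid}")
def book_hotel(name): log.append("hotel"); return "hotel-1"
def cancel_hotel(rid): log.append(f"cancel {rid}")
def book_flight(name): raise RuntimeError("no seats")  # simulated failure

saga = Saga()
try:
    car = reserve_car("alice")
    saga.add_compensation(cancel_car, car)
    hotel = book_hotel("alice")
    saga.add_compensation(cancel_hotel, hotel)
    flight = book_flight("alice")
except RuntimeError:
    saga.compensate()

print(log)  # ['car', 'hotel', 'cancel hotel-1', 'cancel car-1']
```

The car and hotel bookings succeed, the flight fails, and the two registered compensations run in reverse order; the flight never gets a compensation because it never completed.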
Temporal workflows
• MPs in Java, Python, Go, …
• Integration with: Config2, InGraph, DataVault, …
• Debugging, Observability

[Diagram: workflow code + Workflow SDK packaged as an MP, deployed via mint build / mint deploy]
Execution Model
Execution Model: Airflow
• Schedule tasks
• Run tasks
• Store execution state information
• Replay a DAG
  • Use state to determine which tasks to re-execute

[Diagram: T1 → {T2, T3, T4} → T5]
Execution Model for Temporal
• Event sourcing
• Capture workflow/task events – begin, end, fail, etc.
• Replay
• Recreate program state and execution state:
• Variable values
• Stacks, Threads
• Skip what has already been executed
• Execute unknowns
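The replay steps above can be sketched as follows: the runtime keeps an append-only history of completed steps; on restart it feeds recorded results back into the workflow instead of re-executing, and only executes steps past the end of the history. (The step names and history format are illustrative, not Temporal's actual event schema.)

```python
def run_workflow(history, execute):
    """Run steps; completed steps replay from history, the rest execute."""
    steps = ["t1", "t2", "t3"]
    results = {}
    for i, step in enumerate(steps):
        if i < len(history):
            results[step] = history[i]     # replay: skip what already executed
        else:
            results[step] = execute(step)  # execute the unknowns
            history.append(results[step])  # append the new event to the log
    return results

executed = []
def execute(step):
    executed.append(step)
    return step.upper()

history = ["T1"]  # t1 completed before the crash
results = run_workflow(history, execute)
print(executed)  # ['t2', 't3']
```

Because t1's result is already in the history, only t2 and t3 actually run; this is how Temporal recreates variable values and program state deterministically after a failure.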
[Diagram: parent workflow W1 (steps 1–4, activities T1, T2) invoking child workflow W2 (steps 5–8, activities T3, T4)]
Nested Workflow – failure at 3 in parent workflow

[Diagram: parent workflow W1 (steps 1–4, activities T1, T2) failing at step 3; child workflow W2 (steps 5–8, activities T3, T4)]
Update = Change + Replay

[Diagram: updated parent workflow W1 with new workflow W3 and task T5 inserted; child workflow W2 (steps 5–8, activities T3, T4) unchanged]
Architecture
Architecture - Airflow

Key: How scalable is the scheduler in:
• Persisting state
• Managing task queues

https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/overview.html
Oklahoma Job Execution

https://docs.google.com/document/d/1sXax1Rs-hs7qt2NL1JlAt2t0WwHcNDUhDK18DqO8eU4/edit#
RTF Architecture @ LinkedIn

[Diagram: a Temporal Server Cluster of Rain instances, each running Frontend, History, Matching, and Worker services; an App & Worker Cluster of Rain instances running App Code, Worker, and Runtime; a Storage Cluster of MySQL shards; Monitoring & Alerting via InGraphs for worker and server performance]
Scalability
• Shard workflow instances (load balancing)
• Shard the DB for storing state
  • Consistent hashing to map workflow instances to specific DB partitions
• Temporal Cloud: supports 1M workflows/second

https://docs.google.com/presentation/d/1x0ETmVVJcbluTSnJGo8F2sNL1GKJPwOh-2s53x_UKLg/edit#slide=id.g1157260aeaa_0_386
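The consistent-hashing scheme above can be sketched with a hash ring: each shard owns several points on the ring, and a workflow ID is assigned to the first shard point at or after its hash. (The shard names and virtual-node count are hypothetical, not Temporal's actual configuration.)

```python
import bisect
import hashlib

def h(key):
    # Stable hash independent of Python's per-process hash randomization.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, shards, vnodes=64):
        # Place several virtual nodes per shard for an even spread.
        self.points = sorted((h(f"{s}#{i}"), s)
                             for s in shards for i in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def shard_for(self, workflow_id):
        # First point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.keys, h(workflow_id)) % len(self.keys)
        return self.points[i][1]

ring = Ring(["mysql-0", "mysql-1", "mysql-2"])
# The mapping is stable: the same workflow ID always lands on the same shard,
# so its state reads and writes go to one partition.
assert ring.shard_for("wf-42") == ring.shard_for("wf-42")
```

The payoff of the ring over `hash(id) % n` is that adding or removing a shard moves only the keys adjacent to the changed points, rather than remapping nearly everything.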
Other features
• Airflow: support for several utilities
  • Sensors: check for some condition
    • Files
    • SQLSensor
    • HivePartition
    • DateTime
  • Operators: predefined tasks
    • BashOperator
    • PythonOperator
    • Email, MySQL, Postgres
  • Make it easier to integrate with external sources
• RTF: forthcoming integration with events (Kafka, etc.)
  • DB, Hive table: none yet, but can existing integration code from other services be used?
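An Airflow sensor is essentially a task that polls a condition until it holds or a timeout expires. A toy version of that loop (not Airflow's actual `BaseSensorOperator` API; the condition function is a hypothetical stand-in for "file exists" or "Hive partition landed"):

```python
import time

def sensor(condition, poke_interval=0.01, timeout=1.0):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poke_interval)
    return False

state = {"ready": False}

def becomes_ready():
    state["ready"] = True  # in reality flipped by an external system
    return state["ready"]

print(sensor(becomes_ready))  # True
```

Real Airflow sensors add modes on top of this loop (e.g. rescheduling between pokes so a worker slot is not held while waiting), but the poke-until-true-or-timeout contract is the core idea.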
Other features
• ML workflow needs (Flyte)
  • Dynamic and high-frequency pipelines
  • Data lineage
  • Resource management (GPU allocation, etc.)
  • Resource isolation – one task cannot affect others
  • Results caching
Questions
• Why do we need another workflow system?
• What doesn’t get solved by Oklahoma or Flyte?
• Azkaban-like DAGs
• Which workflow to choose?
• Deployment?
• How to migrate?
