DataScience in Oil and Gas Engineering Projects.

Exploratory Data Analysis

by Felipe Sanchez | Towards Data Science | Sep 3, 2019 · 4 min read

[Photo by Dean Brierley on Unsplash]

Hello! Here I will explain part of my thesis research project. First I will do a classic EDA in Python. In the first step I import the usual libraries and load the file:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()
    %matplotlib inline

    # Raw string so the backslash in the Windows path is taken literally
    full = pd.read_csv(r'Documents\FullDBFrank.csv', delimiter=…)
    full.head()

[Table: output of full.head(), showing the first rows of the database. Author's creation.]

This database has the following information about overcost in oil and gas offshore engineering projects:

Project: name of the project.
Time of execution: FY (year), Quarter and Month.
Main Category and Category details: the engineering problem causing the overcost of the project.
Main Cause and Cause details: the project-management-related cause of the problem.

There is a link between these four columns, described in the Comments column.
Some problems are judged as "only engineering problems", others as "engineering and project management problems". The differentiation is based on the description given by the people who dealt with each problem and completed the database.

Amount is the total loss of money due to each of the problems.
Root Cause is the possible main cause of the problem.

The next step is to explore the database in order to check for invalid values:

    # Check for the missing values in the columns
    fig, ax = plt.subplots(figsize=(9,5))
    sns.heatmap(full.isnull(), cbar=False, cmap="YlGnBu_r")
    plt.show()

[Figure: heatmap of missing values, one column per field of the database.]

The column Root Causes is almost empty, so I will drop it. The Comments column could be useful for an NLP analysis, relating it to the categories or the causes. However, in this first example I will keep things simple: this exploration won't need the Comments column. The next step explores the amount of money that was lost in each project.

    train = full.drop(columns=["Comments", "Root_causes"])
    # Total amount per project, largest first
    train[["Project", "Amount"]].groupby(['Project'], as_index=False).sum().sort_values(by="Amount", ascending=False)

    sns.barplot(x='Project', y='Amount', data=train)
    plt.ylabel("Amount")
    plt.title("Average Loss Overcost in O&G engineering project (including recovery actions)")
    plt.xticks(rotation=90)
    plt.show()

        Project             Amount
    1   Block 15            85255.000000
    9   Pazflor             34990.100003
    7   Moho Bilondo        24338.340000
    0   ALNG                21541.000003
    2   Block 17            17506.000000
    10  Saxi                 4684.000000
    4   Epc2b                3088.000000
    5   Greater Plutonio     2037.000000
    3   EGP3B                 500.000000
    8   Oso                   500.000000
    6   Legend Metrology      214.000000

Author's creations
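The per-project totals in the table above come from a group-and-sum, while seaborn's barplot draws the mean of each group by default (hence "Average" in the chart title). The sketch below contrasts the two aggregations on a toy frame; the data and the names `toy`, `totals` and `means` are invented for illustration and are not taken from the author's database.

```python
import pandas as pd

# Toy data, invented for illustration (not the author's database).
toy = pd.DataFrame({
    "Project": ["A", "A", "B", "B", "B"],
    "Amount": [100.0, -20.0, 50.0, 30.0, 10.0],
})

# Total amount per project, sorted from largest to smallest.
totals = (
    toy.groupby("Project", as_index=False)["Amount"]
    .sum()
    .sort_values(by="Amount", ascending=False)
)

# Mean amount per project: the quantity a bar plot with a mean estimator shows.
means = toy.groupby("Project", as_index=False)["Amount"].mean()

print(totals)
print(means)
```

A project with many small entries can have a large total but a small mean, so the ranking in the table and the bar heights in the chart need not agree.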
The amount is given in thousands of dollars. When plotting the amount of losses against the project name (see chart below), we can see "negative amounts". This means that some actions were taken to correct the mistakes and recover part of the losses.

[Figure: bar chart "Average Loss Overcost in O&G engineering project (including recovery actions)"; x axis: Project, y axis: Amount. Author's creation.]

As I am interested only in positive "losses", I will keep only positive values for Amount. I define a new dataset called overcost:

    # .copy() makes overcost an independent frame, so later in-place drops are safe
    overcost = train[train.Amount > 0].copy()

    sns.barplot(x='Project', y='Amount', data=overcost)
    plt.ylabel("Amount")
    plt.title("Average Loss Overcost in O&G engineering project")
    plt.xticks(rotation=90)
    plt.show()

[Figure: bar chart "Average Loss Overcost in O&G engineering project"; x axis: Project, y axis: Amount. Author's creation.]

    # Total positive amount per project, largest first
    overcost[["Project", "Amount"]].groupby(['Project'], as_index=False).sum().sort_values(by="Amount", ascending=False)

        Project             Amount
    1   Block 15            85255.000000
    9   Pazflor             34990.100003
    7   Moho Bilondo        24338.340000
    0   ALNG                21541.000003
    2   Block 17            17506.000000
    10  Saxi                 4684.000000
    4   Epc2b                3088.000000
    5   Greater Plutonio     2037.000000
    3   EGP3B                 500.000000
    8   Oso                   500.000000
    6   Legend Metrology      214.000000

Author's creations

Now I can check the real overcost produced in each of the projects.
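Filtering out the negative rows changes the per-project totals, because recovery actions no longer offset the losses. A minimal sketch on toy data (values and the names `overcost_toy`, `net`, `gross` and `recovered` are assumptions made for this example):

```python
import pandas as pd

# Toy data, invented for illustration: negative amounts stand for recoveries.
toy = pd.DataFrame({
    "Project": ["A", "A", "B", "B"],
    "Amount": [100.0, -20.0, 50.0, -5.0],
})

# Keep only rows with a positive loss; .copy() returns an independent frame.
overcost_toy = toy[toy["Amount"] > 0].copy()

# Net total (losses minus recoveries) versus gross total (losses only).
net = toy.groupby("Project")["Amount"].sum()
gross = overcost_toy.groupby("Project")["Amount"].sum()

# The difference is what the recovery actions won back in each project.
recovered = gross - net

print(pd.DataFrame({"net": net, "gross": gross, "recovered": recovered}))
```

The gross total is never smaller than the net total, and the gap between them is the amount recovered.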
The numbers are higher compared to the table above. However, some projects still have "small" numbers; I will drop them in order to focus on the bigger projects.

    # Delete the rows of the projects EGP3B, Oso, Legend Metrology
    overcost.drop(overcost[overcost["Project"] == "EGP3B"].index, inplace=True)
    overcost.drop(overcost[overcost["Project"] == "Oso"].index, inplace=True)
    overcost.drop(overcost[overcost["Project"] == "Legend Metrology"].index, inplace=True)

    # Set the figure size before plotting so that it applies to this chart
    sns.set(rc={'figure.figsize': (19, 8.27)})
    sns.barplot(x='Project', y='Amount', hue="Main Category", data=overcost)
    plt.ylabel("Amount")
    plt.title("Average Loss Overcost in O&G engineering project")
    plt.show()

The result by category is:

[Figure: bar chart of Amount by Project, colored by Main Category. Author's creation.]

Since the losses of each project are different relative to the magnitude of the project, it's hard to compare how important each category is. To improve the visualization and have a similar view of all the projects, I normalize this data by dividing the amount of losses in each project line by the total losses of the project:

    # Normalize results: each line's Amount as a percentage of its project's total
    normalized = overcost.copy()

    normalized.loc[normalized.Project == 'ALNG', 'Amount'] = 100*overcost.loc[overcost.Project == 'A…
    normalized.loc[normalized.Project == 'Block 15', 'Amount'] = 100*overcost.loc[overcost.Project =…
    normalized.loc[normalized.Project == 'Block 17', 'Amount'] = 100*overcost.loc[overcost.Project =…
    normalized.loc[normalized.Project == 'Epc2b', 'Amount'] = 100*overcost.loc[overcost.Project == '…
    normalized.loc[normalized.Project == 'Greater Plutonio', 'Amount'] = 100*overcost.loc[overcost.P…
    normalized.loc[normalized.Project == 'Moho Bilondo', 'Amount'] = 100*overcost.loc[overcost.Proje…
    normalized.loc[normalized.Project == 'Pazflor', 'Amount'] = 100*overcost.loc[overcost.Project…
    normalized.loc[normalized.Project == 'Saxi', 'Amount'] = 100*overcost.loc[overcost.Project == 'S…

    normalized.groupby(['Project']).Amount.sum()

    #high_overcost = overcost[overcost["Amount"] > 10]
    sns.set(rc={'figure.figsize': (17.7, 8.27)})
    sns.barplot(x='Project', y='Amount', hue="Main Category", data=normalized)
    plt.ylabel("Amount")
    plt.title("Average Loss Overcost in O&G engineering project")
    plt.show()

[Figure: bar chart of normalized Amount by Project, colored by Main Category. Author's creation.]

The plot shows the relative amount of loss by project. It also shows some outliers that could be dropped (the values with Amount > 30, for instance):

    # Look for outliers, threshold at 30
    out = normalized.loc[normalized.Amount > 30, :].sort_values(by=["Amount"], ascending=False)
    normalized.drop(normalized[normalized['Amount'] > 30].index, inplace=True)

    sns.set(rc={'figure.figsize': (17.7, 8.27)})
    sns.barplot(x="Project", y="Amount", hue="FY", data=normalized)
    plt.ylabel("Amount")
    plt.title("Average Loss Overcost in O&G engineering project")
    plt.show()

[Figure: bar chart of normalized Amount by Project after removing outliers, colored by FY. Author's creation.]

This last plot shows a comparative distribution of the normalized losses by categories for the most important projects.

Finally, we can explore the other columns, for example with a jointplot. But first it is necessary to convert the categorical data, such as the detailed causes and the months, into numbers (float type).
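Returning to the normalization step described earlier (each line's amount divided by its project's total, times 100), the same idea can be written without one line per project. The sketch below is one possible formulation using `groupby(...).transform("sum")`, not necessarily the author's code; the toy data, the threshold variable and the names `normalized_toy`, `shares`, `outliers` and `kept` are assumptions for this example.

```python
import pandas as pd

# Toy data, invented for illustration (not the author's database).
toy = pd.DataFrame({
    "Project": ["A", "A", "A", "B", "B"],
    "Amount": [50.0, 30.0, 20.0, 10.0, 30.0],
})

# transform("sum") returns, for every row, the total of that row's project.
project_total = toy.groupby("Project")["Amount"].transform("sum")

# Each line expressed as a percentage of its project's total.
normalized_toy = toy.copy()
normalized_toy["Amount"] = 100 * toy["Amount"] / project_total

# Sanity check: the shares of every project should add up to 100.
shares = normalized_toy.groupby("Project")["Amount"].sum()

# Split rows above and below the outlier threshold (30, as in the text).
threshold = 30
outliers = normalized_toy[normalized_toy["Amount"] > threshold]
kept = normalized_toy[normalized_toy["Amount"] <= threshold]

print(normalized_toy)
print(shares)
```

Because the project total is computed per row, this form keeps working when projects are added or removed from the data.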
So, I create the following mapping:

    # Changing the months to numbers
    mapping = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6,
               'July': 7, 'August': 8, 'September': 9, 'October': 10, 'November': 11, 'December': 12}
    normalized["MonthN"] = normalized.Month.map(mapping)
    normalized.columns

    # Changing the causes to numbers
    mapping2 = {'1.2 Late delivery from suppliers/subcontractors': 1,
                '1.4 Ship Rescheduling/Reallocation : Change of vessel': 2,
                '… Late issue of AFC documentation': 3,
                '1.3 Late availability of ships extra costs': 4,
                '2.1 Incorrect estimate of cost in tender': 5,
                '… Improper W… Book Rates / Escalations': 6,
                '… Incorrect estimate of allowances/contingencies': 7,
                '2.5 Improper Contract/Subcontract Flowdown': 8,
                '3.1 Materials and equipment delivered out-of specs': 9,
                '3.2 Incomplete or partial delivery': 10,
                '4.1 Incorrect design engineering': 11,
                '… Incorrect installation engineering': 12,
                '… Extra costs/staff cause by final docs delay': 13,
                '4.5 Incorrect execution offshore by Acergy': 14,
                '… Incorrect execution offshore by 3rd party': 15,
                '… Incorrect onshore local logistic': 16,
                '5. EQUIPMENT BREAKDOWN': 17,
                '5.1 Lack of preventive maintenance': 18,
                '5.2 Misuse of equipment': 19}
    normalized["CauseN"] = normalized.causes_details.map(mapping2)

    sns.jointplot(x='CauseN', y='MonthN', data=normalized, kind="kde", space=0, color=…)
    #sns.jointplot(x='CauseN', y='MonthN', data=normalized, kind="hex", space=0, color=…)

As a result, the plot gives us a relationship between the cause detail and the month. As an illustration, we can see that cause #5, "equipment breakdown", has a great impact in the months around 2 and 10.

[Figure: joint KDE plot; x axis: CauseN, y axis: MonthN. Author's creation.]

Thanks for reading!!!
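The category-to-number mappings used for the joint plot rely on `Series.map` with a dictionary. Any label that is missing from the dictionary (a misspelling, for instance) silently becomes NaN, which is also why the resulting column has a float type. A small sketch of how one might check for that; the toy labels and the names `month_map` and `unmapped` are invented for this example.

```python
import pandas as pd

# Toy data, invented for illustration; "Mai" is a deliberate misspelling.
toy = pd.DataFrame({"Month": ["January", "March", "Mai"]})

# Partial mapping, enough for the example.
month_map = {"January": 1, "February": 2, "March": 3}

# Labels absent from the dictionary are mapped to NaN.
toy["MonthN"] = toy["Month"].map(month_map)

# List the original labels that did not get a number.
unmapped = toy.loc[toy["MonthN"].isna(), "Month"].tolist()

print(toy)
print("Unmapped labels:", unmapped)
```

Running a check like this before plotting helps make sure no rows drop out of the chart because of an unmatched label.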
I am open to comments and suggestions. The code for this EDA example is here.

For my project I was looking for the best Machine Learning technique to use on the Oil and Gas database. I created a post where I explain how to choose it:

How to implement the right AI technique for your digital transformation projects? Deep Learning VS Reinforcement Learning VS Bayesian Networks (towardsdatascience.com)

You can also check out an explanation of the basic Machine Learning algorithms in the next post:

Supervised Learning Algorithms: Explanaition and Simple code. "A supervised learning algorithm takes a known set of input data (the learning set) and known responses to the data (the..." (towardsdatascience.com)
