Automation Cheat Sheet 2.0
Automation Cheat Sheet 2.0
Cheat Sheet
Frank Andrade
File & Folder Regex
We use regex to create patterns that help
Other Metacharacters
\b
Operation
Word boundary
match text.
Metacharacters
\B No word boundary
Path
Import Path:
\D No digits (0-9)
Table Extraction
\w Word Character (a-z, A-Z, 0-9, _) We can use camelot to extract tables from PDFs
from pathlib import Path
import camelot
List directory content: \S No Whitespace (space, tab, new line)
>>> list(Path().iterdir())
flavor='lattice')
Joining paths: ^ Beginning of a string
>>> from pathlib import Path, PurePath
Export tables:
>>> PurePath.joinpath(Path.cwd(), 'Dataset') $ End of a string tables.export('foo.csv',
'/Users/frank/Projects/DataScience/Dataset' f='csv',
Websites
Import library:
Metadata: {3,4} Range of numbers (Min, Max)
>>> path = Path('test/expenses.csv')
import pandas as pd
>>> path.parts ( ) Group
('test', 'expenses.csv')
Excel
With Python we can send emails and WhatsApp messages. Create workbook:
from openpyxl import Workbook
Email
wb = Workbook() # create workbook
Import libraries: ws = wb.active # grab active worksheet
import smtplib ws['C1'] = 10 # assign data to a cell
import ssl wb.save("report.xlsx") # save workbook
from email.message import EmailMessage
Working with existing workbook:
Set variables: from openpyxl import load_workbook
email_sender = 'Write-sender-here'
email_password = 'Write-passwords-here' wb = load_workbook('pivot_table.xlsx')
email_receiver = 'Write-receiver-here' sheet = wb['Report'] # grab worksheet "Report"
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, 'id_name'))) The “root node” is the top node. In this example,
# Wait 5 seconds until an element is clickable <article> is the root.
Every node has exactly one “parent”, except the
Options: Headless mode, change window size root. The <h1> node’s parent is the <article> node.
from selenium.webdriver.chrome.options import Options “Siblings” are nodes with the same parent.
options = Options() One of the best ways to find an element is building
options.headless = True its XPath
options.add_argument('window-size=1920x1080')
driver = webdriver.Chrome(service=service, options=options)
XPath
We need to learn how to build an XPath to
Google Sheets
Google Sheets is a cloud-based spreadsheet application that can store data in a structured way just like most
properly work with Selenium. database management systems. We can connect Google Sheets with Python by enabling the API and
scope = ['https://www.googleapis.com/auth/spreadsheets',
Let’s check some examples to locate the article, "https://www.googleapis.com/auth/drive"]
title, and transcript elements of the HTML code we
Sharing Sheet:
sheet.share('write-your-email-here', perm_type='user', role='writer')
XPath Functions and Operators
XPath functions Save spreadsheet to specific folder (first manually share the folder with the client email)
client.create("SecondSheet", folder_id='write-id-here')
//tag[contains(@AttributeName, "Value")]
Open a spreadsheet:
sheet = client.open("SecondSheet").sheet1
XPath Operators: and, or
//tag[(expression 1) and (expression 2)] Read csv with Pandas and export df to a sheet:
df = pd.read_csv('football_news.csv')
sheet.update([df.columns.values.tolist()] + df.values.tolist())
XPath Special Characters
Convert a dataframe to a dictionary:
Select rows by index values: df.to_dict()
A 1 4 df.loc['A'] df.loc[['A', 'B']]
Save a dataframe as an Excel table:
axis 0
df = B 2 5 Select rows by position:
df.iloc[1] df.iloc[1:]
df.to_excel('output.xlsx')
C 3 6
index=['A', 'B', 'C'], ascending=[False, True]) Make a pivot tables that says how much male and
female spend in each category:
name='col1') Identify duplicate rows:
Create a dataframe: df.duplicated() df_sales.pivot_table(index='Gender',
data = [[1, 4], [2, 5], [3, 6]] columns='Product line',
Drop duplicates: values='Total',
index = ['A', 'B', 'C'] df = df.drop_duplicates(['col1']) aggfunc='sum')
df = pd.DataFrame(data, index=index,