
Automation Cheat Sheet 2.0

This document provides an automation cheat sheet covering various tools and techniques for automating tasks involving files, folders, websites, emails, and reports using Python. It includes summaries of using regular expressions, the Pathlib module, Camelot and Pandas for table extraction, sending emails, and creating Excel reports with OpenPyxl. Metacharacters, quantifiers, and groups for regex patterns are defined.

Uploaded by

Av Ri

Automation Cheat Sheet

Websites | WhatsApp | Emails | Excel | Google Sheets | Files & Folders

Frank Andrade
File & Folder

We can create folders and manipulate files in Python using Path.

Path

Import Path:
from pathlib import Path

Get current working directory:
>>> Path.cwd()
PosixPath('/Users/frank/Projects/DataScience')

List directory content:
>>> list(Path().iterdir())
[PosixPath('script1.py'), PosixPath('script2.py')]

List directory content within a folder:
>>> list(Path('Dataset').iterdir())

Joining paths:
>>> Path.cwd().joinpath('Dataset')
PosixPath('/Users/frank/Projects/DataScience/Dataset')

Create a directory:
>>> Path('Dataset2').mkdir()               # raises FileExistsError if it already exists
>>> Path('Dataset2').mkdir(exist_ok=True)  # no error if it already exists

Rename a file:
>>> current_path = Path('Data')
>>> target_path = Path('Dataset')
>>> current_path.rename(target_path)

Check existing file:
>>> check_path = Path('Dataset')
>>> check_path.exists() # True/False

Metadata:
>>> path = Path('test/expenses.csv')
>>> path.parts
('test', 'expenses.csv')
>>> path.name
'expenses.csv'
>>> path.stem
'expenses'
>>> path.suffix
'.csv'

Regex

We use regex to create patterns that help match text.

Metacharacters

\d      Digit (0-9)
\D      Non-digit
\w      Word character (a-z, A-Z, 0-9, _)
\W      Not a word character
\s      Whitespace (space, tab, new line)
\S      Non-whitespace
.       Any character except new line
\       Escapes a special character
^       Beginning of a string
$       End of a string

Other Metacharacters

\b      Word boundary
\B      Not a word boundary
\1      Backreference to group 1

Quantifiers & Groups

*       0 or more (greedy)
+       1 or more (greedy)
?       0 or 1
{3}     Exactly 3 repetitions
{n,}    n or more repetitions
{3,4}   Range of repetitions (min, max)
( )     Group
[ ]     Matches characters in brackets
[^ ]    Matches characters not in brackets
|       Or

Table Extraction

We can use camelot to extract tables from PDFs and pandas to extract tables from some websites.

PDF

Import library:
import camelot

Read PDF:
tables = camelot.read_pdf('foo.pdf',
                          pages='1',
                          flavor='lattice')

Export tables:
tables.export('foo.csv',
              f='csv',
              compress=True)

Export first table to a CSV file:
tables[0].to_csv('foo.csv')

Print as a dataframe:
print(tables[0].df)

Websites

Import library:
import pandas as pd

Read table:
tables = pd.read_html('https://xyz.com')

Printing table:
print(tables[0])
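As a rough sketch of how the Path operations and regex patterns on this page can work together, the example below scans a folder and keeps only files whose names match a pattern. The file-naming scheme (report_<year>.csv) and the helper name find_reports are assumptions made for illustration; they are not part of the original sheet.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: report_<4-digit year>.csv
# ^ and $ anchor the whole name, \d{4} matches exactly four digits,
# the parentheses capture the year, and \. matches a literal dot.
REPORT_PATTERN = re.compile(r'^report_(\d{4})\.csv$')


def find_reports(folder):
    """Return {year: filename} for files in `folder` that match the pattern."""
    reports = {}
    for path in Path(folder).iterdir():
        match = REPORT_PATTERN.match(path.name)
        if match:
            reports[int(match.group(1))] = path.name
    return reports


# Build a throwaway folder so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['report_2021.csv', 'report_2022.csv', 'notes.txt', 'report_21.csv']:
        Path(tmp, name).touch()
    found = find_reports(tmp)

print(found)
```

Only the two names with a four-digit year are kept; report_21.csv fails the {4} quantifier and notes.txt fails the literal prefix.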

Send Email & Message

With Python we can send emails and WhatsApp messages.

Email

Import libraries:
import smtplib
import ssl
from email.message import EmailMessage

Set variables:
email_sender = 'Write-sender-here'
email_password = 'Write-passwords-here'
email_receiver = 'Write-receiver-here'

subject = 'Check this out!'
body = """
I've just published a new video on YouTube
"""

Send email:
em = EmailMessage()
em['From'] = email_sender
em['To'] = email_receiver
em['Subject'] = subject
em.set_content(body)
context = ssl.create_default_context()

with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=context) as smtp:
    smtp.login(email_sender, email_password)
    smtp.sendmail(email_sender, email_receiver, em.as_string())

WhatsApp

Import libraries:
import pywhatkit

Send message to a contact:
# syntax: phone number with country code, message, hour and minutes
pywhatkit.sendwhatmsg('+1xxxxxxxx', 'Message 1', 18, 52)

Send message to a contact and close tab after 2 seconds:
# syntax: same as above plus wait_time, tab_close and close_time
pywhatkit.sendwhatmsg('+1xxxxxxx', 'Message 2', 18, 55, 15, True, 2)

Send message to a group:
# syntax: group id, message, hour and minutes
pywhatkit.sendwhatmsg_to_group("write-id-here", "Message 3", 19, 2)

Create Reports

We can create an Excel report in Python using openpyxl.

Excel

Create workbook:
from openpyxl import Workbook

wb = Workbook() # create workbook
ws = wb.active # grab active worksheet
ws['C1'] = 10 # assign data to a cell
wb.save("report.xlsx") # save workbook

Working with existing workbook:
from openpyxl import load_workbook

wb = load_workbook('pivot_table.xlsx')
sheet = wb['Report'] # grab worksheet "Report"

Cell references:
# read the bounds from the same worksheet the chart will use
min_column = sheet.min_column
max_column = sheet.max_column
min_row = sheet.min_row
max_row = sheet.max_row

Create Barchart:
from openpyxl.chart import BarChart, Reference
barchart = BarChart()

Locate data:
data = Reference(sheet,
                 min_col=min_column+1,
                 max_col=max_column,
                 min_row=min_row,
                 max_row=max_row)

Locate categories:
categories = Reference(sheet,
                       min_col=min_column,
                       max_col=min_column,
                       min_row=min_row+1,
                       max_row=max_row)

Add data and categories:
barchart.add_data(data, titles_from_data=True)
barchart.set_categories(categories)

Add chart:
sheet.add_chart(barchart, "B12")

Save existing workbook:
wb.save('report_2021.xlsx')
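The email steps on this page can be split into two parts: building the message, which needs no network, and sending it. The sketch below is one way to arrange that so the message can be inspected before anything is sent. The function names, the example.com addresses, and the EMAIL_PASSWORD environment variable are assumptions for illustration; the sheet itself hard-codes the credentials as placeholders.

```python
import os
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, receiver, subject, body):
    """Assemble an EmailMessage without sending it."""
    em = EmailMessage()
    em['From'] = sender
    em['To'] = receiver
    em['Subject'] = subject
    em.set_content(body)
    return em


def send(em, sender, password):
    """Send through Gmail's SSL endpoint as in the sheet (not called here)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=context) as smtp:
        smtp.login(sender, password)
        smtp.sendmail(sender, em['To'], em.as_string())


# Assumed variable name; reading it from the environment keeps the
# password out of the script. It is None if the variable is not set.
email_password = os.environ.get('EMAIL_PASSWORD')

em = build_message('sender@example.com', 'receiver@example.com',
                   'Check this out!',
                   "I've just published a new video on YouTube")
print(em['Subject'])
```

Calling send(em, 'sender@example.com', email_password) would perform the actual delivery; it is left out so the sketch runs without credentials.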
Web Automation

Web automation is the process of automating web actions like clicking on buttons, selecting elements within dropdowns, etc.

The most popular tool to do this in Python is Selenium.

Selenium 4

Note that there are a few changes between Selenium 3.x versions and Selenium 4.

Import libraries:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

web = "https://www.google.com"
path = 'introduce chromedriver path'
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service)
driver.get(web)

Find an element
driver.find_element(by="id", value="...")

Find elements
driver.find_elements(by="xpath", value="...") # returns a list

Quit driver
driver.quit()

Getting the text
data = element.text

Fixed Waits
import time
time.sleep(2) # always pauses 2 seconds (Selenium's own implicit wait is driver.implicitly_wait(2))

Explicit Waits
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.ID, 'id_name')))
# Wait up to 5 seconds until an element is clickable

Options: Headless mode, change window size
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless') # older Selenium 4 releases: options.headless = True
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(service=service, options=options)

HTML for Web Automation

Let's take a look at the HTML element syntax.

<h1 class="title"> Titanic (1997) </h1>

Tag name:           h1
Attribute name:     class
Attribute value:    "title"
Attribute:          class="title"
Affected content:   Titanic (1997)
End tag:            </h1>

This is a single HTML element, but the HTML code behind a website has hundreds of them.

HTML code example

<article class="main-article">
    <h1> Titanic (1997) </h1>
    <p class="plot"> 84 years later ... </p>
    <div class="full-script"> 13 meters. You ... </div>
</article>

The HTML code is structured with “nodes”. Each entry in the tree below represents a node (element, attribute and text nodes).

<article>  (root element, parent node)
    class="main-article"  (attribute)
    <h1>  (element)
        Titanic (1997)  (text)
    <p>  (element)
        class="plot"  (attribute)
        84 years later ...  (text)
    <div>  (element)
        class="full-script"  (attribute)
        13 meters. You ...  (text)

The “root node” is the top node. In this example, <article> is the root.

Every node has exactly one “parent”, except the root. The <h1> node’s parent is the <article> node.

“Siblings” are nodes with the same parent. Here <h1>, <p> and <div> are siblings.

One of the best ways to find an element is building its XPath.
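The node relationships described for the sample HTML (root, parent, siblings) can be checked without a browser. The sketch below parses the sheet's sample markup with Python's standard-library xml.etree.ElementTree, which is an assumption of convenience: it works here only because the snippet happens to be well-formed XML, and real web pages usually need a proper HTML parser or Selenium itself.

```python
import xml.etree.ElementTree as ET

html = """
<article class="main-article">
    <h1> Titanic (1997) </h1>
    <p class="plot"> 84 years later ... </p>
    <div class="full-script"> 13 meters. You ... </div>
</article>
""".strip()

root = ET.fromstring(html)           # <article> is the root node

# Children of the root are siblings of each other.
child_tags = [child.tag for child in root]

# ElementTree keeps no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

h1 = root.find('h1')
print(root.tag, root.attrib)         # element node and its attribute node
print(child_tags)                    # the three sibling elements
print(parent_of[h1].tag)             # parent of <h1>
print(h1.text.strip())               # text node inside <h1>
```

The root is absent from parent_of, which matches the rule that every node except the root has exactly one parent.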
XPath

We need to learn how to build an XPath to properly work with Selenium.

XPath Syntax

An XPath usually contains a tag name, attribute name, and attribute value.

//tagName[@AttributeName="Value"]

Let’s check some examples to locate the article, title, and transcript elements of the HTML code we used before.

//article[@class="main-article"]
//h1
//div[@class="full-script"]

XPath Functions and Operators

XPath functions

//tag[contains(@AttributeName, "Value")]

XPath Operators: and, or

//tag[(expression 1) and (expression 2)]

XPath Special Characters

/       Selects the children from the node set on the left side of this character
//      Specifies that the matching node set should be located at any level within the document
.       Specifies the current context should be used (refers to present node)
..      Refers to a parent node
*       A wildcard character that selects all elements or attributes regardless of names
@       Select an attribute
()      Grouping an XPath expression
[n]     Indicates that a node with index "n" should be selected

Google Sheets

Google Sheets is a cloud-based spreadsheet application that can store data in a structured way just like most database management systems. We can connect Google Sheets with Python by enabling the API and downloading our credentials.

Import libraries:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

Connect to Google Sheets:
scope = ['https://www.googleapis.com/auth/spreadsheets',
         "https://www.googleapis.com/auth/drive"]

credentials = ServiceAccountCredentials.from_json_keyfile_name("credentials.json",
                                                               scope)
client = gspread.authorize(credentials)

Create a blank spreadsheet:
sheet = client.create("FirstSheet")

Sharing Sheet:
sheet.share('write-your-email-here', perm_type='user', role='writer')

Save spreadsheet to specific folder (first manually share the folder with the client email):
client.create("SecondSheet", folder_id='write-id-here')

Open a spreadsheet:
sheet = client.open("SecondSheet").sheet1

Read csv with Pandas and export df to a sheet:
import pandas as pd
df = pd.read_csv('football_news.csv')
sheet.update([df.columns.values.tolist()] + df.values.tolist())

Print all the data:
sheet.get_all_records()

Append a new row:
new_row = ['0', 'title0', 'subtitle0', 'link0']
sheet.append_row(new_row)

Insert a new row at index 2:
sheet.insert_row(new_row, index=2)

Update a cell using A1 notation:
sheet.update('A54', 'Hello World')

Update a range:
sheet.update('A54:D54', [['51', 'title51', 'subtitle51', 'link51']])

Update cell using row and column coordinates:
sheet.update_cell(54, 1, 'Updated Data')
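The XPath syntax summarized on this page can be tried out without launching a browser. The sketch below runs a few of the listed expressions against the sample HTML using the standard-library xml.etree.ElementTree, which implements only a subset of XPath: paths must start with "." instead of "/", and contains() and the and/or operators are not supported there, although Selenium accepts them. The surrounding <body> tag is an assumption added so that <article> is not the document root.

```python
import xml.etree.ElementTree as ET

# <body> is a hypothetical wrapper; the inner markup is the sheet's sample.
page = ET.fromstring("""
<body>
    <article class="main-article">
        <h1> Titanic (1997) </h1>
        <p class="plot"> 84 years later ... </p>
        <div class="full-script"> 13 meters. You ... </div>
    </article>
</body>
""".strip())

# //tagName[@AttributeName="Value"]
article = page.find('.//article[@class="main-article"]')
script = page.find('.//div[@class="full-script"]')

# // locates a tag at any level below the current node
title = page.find('.//h1')

# /* selects all children, [n] picks by 1-based position, .. goes to the parent
children = [node.tag for node in page.findall('.//article/*')]
first_p = page.find('.//article/p[1]')
parent_of_title = page.find('.//h1/..')

print(title.text.strip())
print(script.text.strip())
print(children)
```

In Selenium the same expressions would be written with a leading // and passed to driver.find_element(by="xpath", value="...").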
Pandas Cheat Sheet

Pandas provides data analysis tools for Python. All of the following code examples refer to the dataframe below.

df =
         col1  col2
    A       1     4
    B       2     5
    C       3     6

axis 0 runs down the rows; axis 1 runs across the columns.

Getting Started

Import pandas:
import pandas as pd

Create a series:
s = pd.Series([1, 2, 3],
              index=['A', 'B', 'C'],
              name='col1')

Create a dataframe:
data = [[1, 4], [2, 5], [3, 6]]
index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
                  columns=['col1', 'col2'])

Read a csv file with pandas:
df = pd.read_csv('filename.csv')

Advanced parameters:
df = pd.read_csv('filename.csv', sep=',',
                 names=['col1', 'col2'],
                 index_col=0,
                 encoding='utf-8',
                 nrows=3)

Selecting rows and columns

Select single column:
df['col1']

Select multiple columns:
df[['col1', 'col2']]

Show first/last n rows:
df.head(2)
df.tail(2)

Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]

Select rows by position:
df.iloc[1]
df.iloc[1:]

Data wrangling

Filter by value:
df[df['col1'] > 1]

Sort by one column:
df.sort_values('col1')

Sort by columns:
df.sort_values(['col1', 'col2'],
               ascending=[False, True])

Identify duplicate rows:
df.duplicated()

Drop duplicates:
df = df.drop_duplicates(['col1'])

Clone a data frame:
clone = df.copy()

Concatenate multiple data frames vertically:
df2 = df + 5 # new dataframe
pd.concat([df,df2])

Concatenate multiple data frames horizontally:
df3 = pd.DataFrame([[7],[8],[9]],
                   index=['A','B','C'],
                   columns=['col3'])
pd.concat([df,df3], axis=1)

Data export

Data as NumPy array:
df.values

Save data as CSV file:
df.to_csv('output.csv', sep=",")

Format a dataframe as tabular string:
df.to_string()

Convert a dataframe to a dictionary:
df.to_dict()

Save a dataframe as an Excel table:
df.to_excel('output.xlsx')

Pivot and Pivot Table

Read Excel file:
df_sales = pd.read_excel(
    'supermarket_sales.xlsx')

Make pivot table:
df_sales.pivot_table(index='Gender',
                     aggfunc='sum')

Make a pivot table that shows how much male and female customers spend in each category:
df_sales.pivot_table(index='Gender',
                     columns='Product line',
                     values='Total',
                     aggfunc='sum')

Below are my guides, tutorials and complete web scraping course:
- Medium Guides
- YouTube Tutorials
- Data Science Course
- Automation Course
- Web Scraping Course
- Make Money Using Your Programming & Data Science Skills
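To see several of the pandas operations from this sheet working on actual data, the sketch below rebuilds the sample dataframe and applies filtering, sorting, both kinds of concatenation, and a pivot table. The df_sales values are invented stand-ins for supermarket_sales.xlsx, which is not available here; only the column names Gender, Product line and Total come from the sheet.

```python
import pandas as pd

# The sample dataframe used throughout the sheet.
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])

filtered = df[df['col1'] > 1]                      # keeps rows B and C
ordered = df.sort_values('col1', ascending=False)  # rows in order C, B, A

df2 = df + 5                                       # new dataframe
stacked = pd.concat([df, df2])                     # vertical: 6 rows, 2 columns

df3 = pd.DataFrame([[7], [8], [9]],
                   index=['A', 'B', 'C'],
                   columns=['col3'])
wide = pd.concat([df, df3], axis=1)                # horizontal: adds col3

# Hypothetical stand-in for supermarket_sales.xlsx (values are made up).
df_sales = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Product line': ['Food', 'Food', 'Sports', 'Food'],
    'Total': [10.0, 20.0, 30.0, 5.0],
})
pivot = df_sales.pivot_table(index='Gender',
                             columns='Product line',
                             values='Total',
                             aggfunc='sum')
print(pivot)
```

Combinations with no matching rows, such as Male and Sports in this made-up data, appear as NaN in the pivot table.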
Made by Frank Andrade frank-andrade.medium.com
