Skip to content

ArshKA/LinkedIn-Job-Scraper

Repository files navigation

LinkedIn Job Scraper

Program to scrape and store a constant stream of LinkedIn job postings and dozens of their respective attributes

Download the polished dataset and view insights at - https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

User Configurations

Required

  • logins.csv
    • Populate with multiple LinkedIn logins
    • Specify the purpose of the login (search or detail retreiever)
    • I recommend 1-3 logins for search and the remaining for more expensive attribute retrieval

Optional

  • details_retriever.py
    • MAX_UPDATES: - Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
    • SLEEP_TIME: - Seconds to sleep between every iteration (default = 60)

Running

This program consists of 2 main scripts, running in parallel.

python search_retriever.py - discovers new job postings and insert the most recent IDs and minimal attributes into the database

python details_retriever.py - populates tables with complete job attributes

It's important to note that while search_retriever.py typically runs smoothly, even through your personal IP and a singular account, details_retriever.py can be a bit finicky. Each search generates approximately 25-50 results, all of which must be individually queried to obtain their attributes. To enhance its performance, I recommend the following strategies:

  • Utilize multiple proxies and accounts when running details_retriever.py.
  • Experiment with different time delays to find the optimal settings.
  • Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, to catch up with the progress of search_retriever.py. This will ensure that both processes remain synchronized and up to date.

Converting Database to CSV

python to_csv.py --folder <destination folder> --database <linkedin_jobs.db>

Creates a CSV file for each database, along with minimal preprocessing

Database Structure

You can find the structure of the database here

About

LinkedIn scraper to retrieve and store a live stream of job postings

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy