Sunday, January 15, 2023

How to extract data from the web with Python: data scientist salaries

Salaries of data scientists by country and city.


First we have to find reliable websites. In my case I will use ZipRecruiter for the US and Indeed for Canada.

US

https://www.ziprecruiter.com/Salaries/What-Is-the-Average-DATA-Scientist-Salary-by-State

    
Canada

https://ca.indeed.com/career/data-scientist/salaries

Because these numbers change over time, we will extract the most recent ones every time the code runs.

Method 1:

Use Beautiful Soup to collect the text of all the fields

# Import the libraries needed
import requests
from bs4 import BeautifulSoup


# Set the URL you want to scrape. The output shown in this post comes from
# the ZipRecruiter page; the Indeed URL above can be swapped in for Canada.
url = 'https://www.ziprecruiter.com/Salaries/What-Is-the-Average-DATA-Scientist-Salary-by-State'

# Connect to the URL
response = requests.get(url)

# Parse the HTML and save it to a BeautifulSoup object
soup = BeautifulSoup(response.text, "html.parser")

# Collect the text of every <span> element
# (note: the name "all" shadows Python's built-in all() function)
all = []

for i in soup.find_all("span"):
    all.append(i.text)

print(all)



The code above prints the desired quantities, which are refreshed every time you run it.
From there we can pick out just the average:

>>> The average salary is $120,039 
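
One way to pick out the average automatically is a regular expression. This is only a minimal sketch: the sample strings are copied from the span output in this post, while the pattern and the function name find_average_salary are my own assumptions, and the pattern will stop matching if the site changes its wording.

```python
import re

# A few strings copied from the span output shown in this post.
span_texts = [
    "Jobs",
    "\n          $93,500 - $106,99914% of jobs\n        ",
    "\n          The average salary is $120,039 a year$107,000 - $120,49917% of jobs\n        ",
    "$39,500",
]

# Hypothetical pattern: assumes the page keeps the wording
# "The average salary is $<amount>".
AVERAGE_PATTERN = re.compile(r"The average salary is \$([\d,]+)")


def find_average_salary(texts):
    """Return the average salary as an int, or None if no text mentions it."""
    for text in texts:
        match = AVERAGE_PATTERN.search(text)
        if match:
            # Drop the thousands separator before converting to a number.
            return int(match.group(1).replace(",", ""))
    return None


average = find_average_salary(span_texts)
print(average)
```

With the live page you would pass the list built by the scraping loop instead of span_texts.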





The full output, with the text of every span, is below:

>>> print(all)
['Jobs', 'Salaries', 'Messages', 'Sign In', '', 'Post a Job', 'Profile', 'All Salaries', 'DATA Scientist Salary', 'What Is the Average DATA Scientist Salary by State', 'Within 25 miles of Toronto, CA', '\n          $39,500 - $52,9991% of jobs\n        ', '\n          $53,000 - $66,4993% of jobs\n        ', '\n          $66,500 - $79,9997% of jobs\n        ', '', '\n          $92,500 is the 25th percentile. Salaries below this are outliers.$80,000 - $93,49914% of jobs\n        ', '\n          $93,500 - $106,99914% of jobs\n        ', '\n          The average salary is $120,039 a year$107,000 - $120,49917% of jobs\n        ', '\n          $120,500 - $133,99913% of jobs\n        ', '', '\n          $141,000 is the 75th percentile. Salaries above this are outliers.$134,000 - $147,49911% of jobs\n        ', '\n          $147,500 - $160,9997% of jobs\n        ', '', '\n          $169,000 is the 90th percentile. Salaries above this are outliers.$161,000 - $174,4994% of jobs\n        ', '\n          $174,500 - $188,0003% of jobs\n        ', '$39,500', '$120,039\n      /year\n', '/year', '$188,000', 'Data Scientist', 'Altocloud', 'Toronto, ON', 'Data Scientist, Risk', 'Square', 'Toronto, ON', 'Senior Data Scientist, Business Intelligence (English Services)', 'CBC Radio Canada', 'Toronto, ON', 'Associate/Senior Associate, Data Scientist, Portfolio Value Creation', 'CPP Investments', 'Toronto, ON', 'Data Scientist, Consultant', 'Project X', 'Toronto, ON', 'Senior Machine Learning Engineer / Data Scientist', 'Paytm Labs', 'Toronto, ON', 'Data Scientist', 'CARFAX', 'Toronto, ON', 'Data Scientist, MIR (English Services)', 'CBC Radio Canada', 'Toronto, ON', 'Senior Data Scientist', 'Borrowell', 'Toronto, ON', 'Lead Data Scientist (Predictive Maintenance)', 'Fusemachines', 'Toronto, ON', ' in the Toronto, CA area', ' in the Toronto, CA area:', 'ZipRecruiter, Inc. 
© 2023 All Rights Reserved Worldwide', 'New York', '$133,172', '$11,097', '$2,561', '$64.03', 'Idaho', '$129,201', '$10,766', '$2,484', '$62.12', 'California', '$127,575', '$10,631', '$2,453', '$61.33', 'New Hampshire', '$124,679', '$10,389', '$2,397', '$59.94', 'Vermont', '$121,280', '$10,106', '$2,332', '$58.31', 'Maine', '$120,390', '$10,032', '$2,315', '$57.88', 'Massachusetts', '$118,391', '$9,865', '$2,276', '$56.92', 'Hawaii', '$117,548', '$9,795', '$2,260', '$56.51', 'Tennessee', '$116,638', '$9,719', '$2,243', '$56.08', 'Nevada', '$116,564', '$9,713', '$2,241', '$56.04', 'Wyoming', '$116,434', '$9,702', '$2,239', '$55.98', 'Washington', '$116,253', '$9,687', '$2,235', '$55.89', 'Arizona', '$116,252', '$9,687', '$2,235', '$55.89', 'Connecticut', '$113,536', '$9,461', '$2,183', '$54.58', 'Montana', '$113,325', '$9,443', '$2,179', '$54.48', 'Rhode Island', '$112,851', '$9,404', '$2,170', '$54.26', 'Indiana', '$112,813', '$9,401', '$2,169', '$54.24', 'New Jersey', '$112,746', '$9,395', '$2,168', '$54.20', 'Alaska', '$112,238', '$9,353', '$2,158', '$53.96', 'Minnesota', '$112,151', '$9,345', '$2,156', '$53.92', 'West Virginia', '$112,113', '$9,342', '$2,156', '$53.90', 'Oregon', '$111,346', '$9,278', '$2,141', '$53.53', 'Maryland', '$109,286', '$9,107', '$2,101', '$52.54', 'North Dakota', '$109,256', '$9,104', '$2,101', '$52.53', 'Pennsylvania', '$108,610', '$9,050', '$2,088', '$52.22', 'Wisconsin', '$106,369', '$8,864', '$2,045', '$51.14', 'Virginia', '$106,296', '$8,858', '$2,044', '$51.10', 'Ohio', '$105,665', '$8,805', '$2,032', '$50.80', 'Iowa', '$104,159', '$8,679', '$2,003', '$50.08', 'Nebraska', '$103,247', '$8,603', '$1,985', '$49.64', 'South Dakota', '$103,087', '$8,590', '$1,982', '$49.56', 'Colorado', '$102,923', '$8,576', '$1,979', '$49.48', 'Kentucky', '$102,736', '$8,561', '$1,975', '$49.39', 'Delaware', '$102,461', '$8,538', '$1,970', '$49.26', 'Utah', '$101,968', '$8,497', '$1,960', '$49.02', 'Alabama', '$101,620', '$8,468', '$1,954', '$48.86', 
'New Mexico', '$101,284', '$8,440', '$1,947', '$48.69', 'South Carolina', '$101,195', '$8,432', '$1,946', '$48.65', 'Kansas', '$99,712', '$8,309', '$1,917', '$47.94', 'Florida', '$99,276', '$8,273', '$1,909', '$47.73', 'Arkansas', '$98,776', '$8,231', '$1,899', '$47.49', 'Oklahoma', '$98,052', '$8,171', '$1,885', '$47.14', 'Mississippi', '$97,389', '$8,115', '$1,872', '$46.82', 'Michigan', '$96,564', '$8,047', '$1,857', '$46.43', 'Missouri', '$94,548', '$7,879', '$1,818', '$45.46', 'Texas', '$94,428', '$7,869', '$1,815', '$45.40', 'Georgia', '$93,556', '$7,796', '$1,799', '$44.98', 'Illinois', '$93,085', '$7,757', '$1,790', '$44.75', 'Louisiana', '$89,464', '$7,455', '$1,720', '$43.01', 'North Carolina', '$84,706', '$7,058', '$1,628', '$40.72']



You can also try

table = soup.find('table', {'class': 'salary_by_state_table'})


to get the values organized as an HTML table:

<table class="salary_by_state_table">
<thead>
<tr>
<th class="col1">State</th>
<th class="col2">Annual Salary</th>
<th class="col3">Monthly Pay</th>
<th class="col4">Weekly Pay</th>
<th class="col5">Hourly Wage</th>
</tr>
</thead>
<tbody>
<tr>
<td class="col1">New York</td>
<td class="col2">$133,172</td>
<td class="col3">$11,097</td>
<td class="col4">$2,561</td>
<td class="col5">$64.03</td>
</tr>
<tr>
<td class="col1">Idaho</td>
<td class="col2">$129,201</td>
<td class="col3">$10,766</td>
<td class="col4">$2,484</td>
<td class="col5">$62.12</td>
</tr>
<tr>
<td class="col1">California</td>
<td class="col2">$127,575</td>
<td class="col3">$10,631</td>
<td class="col4">$2,453</td>
<td class="col5">$61.33</td>
</tr>
<tr>
<td class="col1">New Hampshire</td>
<td class="col2">$124,679</td>
<td class="col3">$10,389</td>
<td class="col4">$2,397</td>
<td class="col5">$59.94</td>
</tr>

.......
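
The table element can also be turned into plain Python lists, one per row. The sketch below assumes the markup looks like the excerpt above; to keep it self-contained it parses a two-row sample string (sample_html is my own name) instead of the live page.

```python
from bs4 import BeautifulSoup

# Two rows copied from the table shown in this post; the real page
# has one row per state.
sample_html = """
<table class="salary_by_state_table">
<thead>
<tr>
<th class="col1">State</th>
<th class="col2">Annual Salary</th>
<th class="col3">Monthly Pay</th>
<th class="col4">Weekly Pay</th>
<th class="col5">Hourly Wage</th>
</tr>
</thead>
<tbody>
<tr>
<td class="col1">New York</td>
<td class="col2">$133,172</td>
<td class="col3">$11,097</td>
<td class="col4">$2,561</td>
<td class="col5">$64.03</td>
</tr>
<tr>
<td class="col1">Idaho</td>
<td class="col2">$129,201</td>
<td class="col3">$10,766</td>
<td class="col4">$2,484</td>
<td class="col5">$62.12</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", {"class": "salary_by_state_table"})

# Column names come from the <th> cells of the header row.
header = [th.get_text(strip=True) for th in table.find_all("th")]

# Each body row becomes a list of cell strings.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped here
        rows.append(cells)

print(header)
print(rows)
```
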


From here you can convert the data to CSV.
The website below can be useful for converting an HTML table to CSV:
https://www.convertcsv.com/html-table-to-csv.htm

State,Annual Salary,Monthly Pay,Weekly Pay,Hourly Wage
New York,"$133,172","$11,097","$2,561",$64.03
Idaho,"$129,201","$10,766","$2,484",$62.12
California,"$127,575","$10,631","$2,453",$61.33
New Hampshire,"$124,679","$10,389","$2,397",$59.94
Vermont,"$121,280","$10,106","$2,332",$58.31
Maine,"$120,390","$10,032","$2,315",$57.88
Massachusetts,"$118,391","$9,865","$2,276",$56.92
Hawaii,"$117,548","$9,795","$2,260",$56.51
Tennessee,"$116,638","$9,719","$2,243",$56.08
Nevada,"$116,564","$9,713","$2,241",$56.04
Wyoming,"$116,434","$9,702","$2,239",$55.98
Washington,"$116,253","$9,687","$2,235",$55.89
Arizona,"$116,252","$9,687","$2,235",$55.89
Connecticut,"$113,536","$9,461","$2,183",$54.58
Montana,"$113,325","$9,443","$2,179",$54.48
Rhode Island,"$112,851","$9,404","$2,170",$54.26
Indiana,"$112,813","$9,401","$2,169",$54.24
New Jersey,"$112,746","$9,395","$2,168",$54.20
Alaska,"$112,238","$9,353","$2,158",$53.96
Minnesota,"$112,151","$9,345","$2,156",$53.92
West Virginia,"$112,113","$9,342","$2,156",$53.90
Oregon,"$111,346","$9,278","$2,141",$53.53
Maryland,"$109,286","$9,107","$2,101",$52.54
North Dakota,"$109,256","$9,104","$2,101",$52.53
Pennsylvania,"$108,610","$9,050","$2,088",$52.22
Wisconsin,"$106,369","$8,864","$2,045",$51.14
Virginia,"$106,296","$8,858","$2,044",$51.10
Ohio,"$105,665","$8,805","$2,032",$50.80
Iowa,"$104,159","$8,679","$2,003",$50.08
Nebraska,"$103,247","$8,603","$1,985",$49.64
South Dakota,"$103,087","$8,590","$1,982",$49.56
Colorado,"$102,923","$8,576","$1,979",$49.48
Kentucky,"$102,736","$8,561","$1,975",$49.39
Delaware,"$102,461","$8,538","$1,970",$49.26
Utah,"$101,968","$8,497","$1,960",$49.02
Alabama,"$101,620","$8,468","$1,954",$48.86
New Mexico,"$101,284","$8,440","$1,947",$48.69
South Carolina,"$101,195","$8,432","$1,946",$48.65
Kansas,"$99,712","$8,309","$1,917",$47.94
Florida,"$99,276","$8,273","$1,909",$47.73
Arkansas,"$98,776","$8,231","$1,899",$47.49
Oklahoma,"$98,052","$8,171","$1,885",$47.14
Mississippi,"$97,389","$8,115","$1,872",$46.82
Michigan,"$96,564","$8,047","$1,857",$46.43
Missouri,"$94,548","$7,879","$1,818",$45.46
Texas,"$94,428","$7,869","$1,815",$45.40
Georgia,"$93,556","$7,796","$1,799",$44.98
Illinois,"$93,085","$7,757","$1,790",$44.75
Louisiana,"$89,464","$7,455","$1,720",$43.01
North Carolina,"$84,706","$7,058","$1,628",$40.72
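
If you would rather not depend on a website for the conversion, Python's built-in csv module can produce the same kind of output. This is a rough sketch under my own assumptions: the two rows are copied from the table above, and the helper names rows_to_csv and to_number as well as the file name salaries.csv are hypothetical.

```python
import csv
import io

header = ["State", "Annual Salary", "Monthly Pay", "Weekly Pay", "Hourly Wage"]
rows = [
    ["New York", "$133,172", "$11,097", "$2,561", "$64.03"],
    ["Idaho", "$129,201", "$10,766", "$2,484", "$62.12"],
]


def rows_to_csv(header, rows):
    """Return the table as CSV text; csv.writer quotes cells that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_number(money):
    """Turn a string such as '$133,172' into a float for later analysis."""
    return float(money.replace("$", "").replace(",", ""))


csv_text = rows_to_csv(header, rows)
print(csv_text)

# To save to disk instead (hypothetical file name):
# with open("salaries.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The to_number helper is there because the dollar signs and thousands separators have to be stripped before the salaries can be sorted or averaged.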




  
