

For the challenge, I've attached a CSV file named ‘data.csv’ containing 10,000 rows, where each
row represents information on an individual businessperson. Please perform an analysis
exercise on the data, including but not limited to:
● Fill rate for all columns (the percentage of cells in each column that have a valid value,
as opposed to no value)
● For columns with a set number of possible values (industry, level, company size,
company revenue, and state), please provide a breakdown by the percentage of results that
fall in each category (e.g., 10% in Alabama, 10% in Arizona, etc.)
● The top 10 companies represented, along with the number of employees included for
each
● The top 10 most common titles represented, along with the number of employees
holding each
● The top 10 most common first names represented, along with the number of
employees with each
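
The core tasks above could be sketched with the standard library alone. This is a minimal sketch, not a reference solution: the helper names (`is_valid`, `fill_rate`, `percent_breakdown`, `top_n`, `load_rows`) and the set of strings treated as "no value" are assumptions, and the exact column names for company, title, and the categorical fields are not given in the brief, so they are passed in as parameters.

```python
import csv
from collections import Counter

# Strings treated as "no value". This set is an assumption; adjust it to the data.
MISSING = {"", "na", "n/a", "null", "none", "nan"}


def is_valid(value):
    """Return True when a cell holds something other than a missing marker."""
    return value is not None and value.strip().lower() not in MISSING


def fill_rate(rows, column):
    """Percentage of rows whose cell in `column` has a valid value."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if is_valid(row.get(column)))
    return 100.0 * filled / len(rows)


def value_counts(rows, column):
    """Count the valid values found in `column`."""
    return Counter(row[column].strip() for row in rows if is_valid(row.get(column)))


def percent_breakdown(rows, column):
    """Map each category in `column` to its share (in percent) of the valid cells."""
    counts = value_counts(rows, column)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.most_common()}


def top_n(rows, column, n=10):
    """The n most common valid values in `column`, each with its row count."""
    return value_counts(rows, column).most_common(n)


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

With the attached file, `rows = load_rows("data.csv")` followed by `fill_rate(rows, column)` for each header would give the fill rates, and `top_n(rows, column)` would cover the three top-10 questions once the real column names are known. Note that `percent_breakdown` divides by the number of valid cells rather than by all rows; dividing by `len(rows)` instead is an equally defensible choice as long as the report says which one was used.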
Bonus Challenges:
● Visual/Graphical representations of any of the above
● An analysis of the 'primarydomain' column broken down by domain type (.com, .org,
.edu, etc) along with the number of companies falling into each category
● A Python script that "guesses" possible email addresses from a person's first name,
last name, and company by using common email patterns like:
○ first@example.com, firstlast@example.com, first.last@example.com,
last@example.com, f.last@example.com, lastF@example.com,
first_last@example.com, firstL@example.com
● A Python script to find the top 5 most common email patterns like those above by
comparing the value in the ‘emailid’ column to the firstname and lastname columns
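
One possible shape for the domain and email bonus items is sketched below. It assumes that the company's email domain can be taken from the 'primarydomain' column, that distinct domains are an acceptable stand-in for distinct companies, and that names are compared in lowercase with punctuation removed. All function names and the pattern labels in `PATTERNS` are illustrative, not part of the brief.

```python
from collections import Counter

# Pattern label -> builder for the part before "@", given lowercase first/last names.
# The eight labels mirror the patterns listed in the brief.
PATTERNS = {
    "first": lambda f, l: f,
    "last": lambda f, l: l,
    "firstlast": lambda f, l: f + l,
    "first.last": lambda f, l: f + "." + l,
    "first_last": lambda f, l: f + "_" + l,
    "f.last": lambda f, l: f[0] + "." + l,
    "lastF": lambda f, l: l + f[0],
    "firstL": lambda f, l: f + l[0],
}


def clean(name):
    """Lowercase a name and keep only letters and digits."""
    return "".join(ch for ch in (name or "").lower() if ch.isalnum())


def guess_emails(first, last, domain):
    """Build one candidate address per pattern; empty list if any input is missing."""
    first, last = clean(first), clean(last)
    domain = (domain or "").strip().lower()
    if not first or not last or not domain:
        return []
    return [f"{build(first, last)}@{domain}" for build in PATTERNS.values()]


def detect_pattern(email, first, last):
    """Label of the first pattern that reproduces the email's local part, else None."""
    first, last = clean(first), clean(last)
    local = (email or "").strip().lower().partition("@")[0]
    if not first or not last or not local:
        return None
    for label, build in PATTERNS.items():
        if build(first, last) == local:
            return label
    return None


def top_patterns(rows, n=5):
    """The n most common patterns across rows with emailid, firstname, lastname."""
    counts = Counter()
    for row in rows:
        label = detect_pattern(row.get("emailid"), row.get("firstname"), row.get("lastname"))
        if label:
            counts[label] += 1
    return counts.most_common(n)


def domain_type_counts(rows):
    """Count distinct 'primarydomain' values per domain type (.com, .org, ...)."""
    domains = {(row.get("primarydomain") or "").strip().lower() for row in rows}
    return Counter("." + d.rsplit(".", 1)[-1] for d in domains if "." in d)
```

Some names make two patterns produce the same local part (a one-letter last name makes "firstlast" and "firstL" identical, for example); `detect_pattern` simply returns whichever label comes first in `PATTERNS`, so the ordering of that dictionary is a design decision worth stating in the write-up. Addresses that match none of the patterns are skipped rather than counted, which means the reported top 5 covers only the recognised share of the 'emailid' column.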
The purpose of the challenge is to assess both your technical know-how and your creative
problem-solving abilities (both equally), so have fun with it! Please feel free to reach out if you
have any questions. Good luck!