For the challenge, I've attached a CSV file named ‘data.csv’ containing 10,000 rows, where each row represents information on an individual businessperson. Please perform an analysis exercise on the data, including but not limited to:

● The fill rate for all columns (the percentage of cells in that column that have a valid value as opposed to no value)

● For columns with a set number of possible values (industry, level, company size, company revenue, and state), a breakdown by percent of results that fall in each category (e.g., 10% in Alabama, 10% in Arizona, etc.)

● The top 10 companies represented, along with the number of employees included for each

● The top 10 most common titles represented, along with the number of employees holding each

● The top 10 most common first names represented, along with the number of employees with each
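As a starting point for the items above, here is a minimal sketch using only the Python standard library. It assumes that a cell counts as "no value" when it is missing or blank, and the column names passed to the helper functions are placeholders, since the actual headers in ‘data.csv’ are not listed in this brief.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the CSV into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def is_filled(value):
    # Assumption: a cell is "no value" when it is missing or only whitespace.
    return value is not None and value.strip() != ""


def fill_rates(rows):
    """Percentage of non-empty cells for every column."""
    if not rows:
        return {}
    total = len(rows)
    return {
        col: 100.0 * sum(is_filled(r.get(col)) for r in rows) / total
        for col in rows[0]
        if col is not None
    }


def filled_values(rows, column):
    """All non-empty values of one column, stripped of surrounding spaces."""
    return [r[column].strip() for r in rows if is_filled(r.get(column))]


def percent_breakdown(rows, column):
    """Share (in percent) of filled rows falling in each category."""
    values = filled_values(rows, column)
    counts = Counter(values)
    return {k: 100.0 * n / len(values) for k, n in counts.most_common()}


def top_n(rows, column, n=10):
    """The n most common values of a column with their counts."""
    return Counter(filled_values(rows, column)).most_common(n)
```

With the file loaded via load_rows("data.csv"), calls such as percent_breakdown(rows, "state") or top_n(rows, "firstname") would cover the breakdown and top-10 items, once the real column names are substituted. Note that the breakdown here is computed over filled rows only; dividing by all rows instead is an equally reasonable choice.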

Bonus Challenges:

● Visual/graphical representations of any of the above

● An analysis of the 'primarydomain' column broken down by domain type (.com, .org, .edu, etc.), along with the number of companies falling into each category

● A Python script that "guesses" possible email addresses from a person's first name, last name and company by using common email patterns like:

○ first@example.com, firstlast@example.com, first.last@example.com, last@example.com, f.last@example.com, lastF@example.com, first_last@example.com, firstL@example.com

● A Python script to find the top 5 most common email patterns like those above by comparing the value in the ‘emailid’ column to the firstname and lastname columns
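One possible way to approach the domain and email bonus items is sketched below. It is an illustration under stated assumptions, not a reference solution: the pattern names are labels made up here to mirror the examples in the list, names are normalized to lowercase letters only, each unique ‘primarydomain’ value is treated as one company, and the domain type is taken to be the last dot-separated label (so example.co.uk would count as .uk).

```python
import re
from collections import Counter

# Hypothetical pattern labels; each builds the part before "@" from
# a cleaned first name f and last name l.
PATTERNS = {
    "first": lambda f, l: f,
    "last": lambda f, l: l,
    "firstlast": lambda f, l: f + l,
    "first.last": lambda f, l: f + "." + l,
    "first_last": lambda f, l: f + "_" + l,
    "f.last": lambda f, l: f[0] + "." + l,
    "lastF": lambda f, l: l + f[0],
    "firstL": lambda f, l: f + l[0],
}


def _clean(name):
    # Assumption: keep ASCII letters only, lowercased ("O'Brien" -> "obrien").
    return re.sub(r"[^a-z]", "", (name or "").lower())


def guess_emails(first, last, domain):
    """Candidate addresses for one person, one per pattern."""
    f, l = _clean(first), _clean(last)
    if not f or not l or not domain:
        return []
    return [f"{make(f, l)}@{domain.strip().lower()}" for make in PATTERNS.values()]


def detect_pattern(email, first, last):
    """Label of the first pattern that reproduces the local part, else None."""
    f, l = _clean(first), _clean(last)
    if not f or not l or "@" not in (email or ""):
        return None
    local = email.split("@", 1)[0].lower()
    for name, make in PATTERNS.items():
        if make(f, l) == local:
            return name
    return None


def top_patterns(rows, n=5):
    """Most common patterns across rows with emailid/firstname/lastname keys."""
    counts = Counter(
        detect_pattern(r.get("emailid"), r.get("firstname"), r.get("lastname"))
        for r in rows
    )
    counts.pop(None, None)  # drop rows that matched no known pattern
    return counts.most_common(n)


def tld_counts(rows):
    """Number of unique domains per domain type (.com, .org, ...)."""
    domains = {(r.get("primarydomain") or "").strip().lower() for r in rows}
    return Counter(
        "." + d.rsplit(".", 1)[-1] for d in domains if "." in d
    ).most_common()
```

For example, guess_emails("Jane", "Doe", "example.com") yields eight candidates such as jane.doe@example.com and doej@example.com. When a name is ambiguous (a one-letter first name matches both "f.last" and "first.last"), detect_pattern reports whichever pattern comes first in PATTERNS, so the ordering of that dictionary is itself a design choice.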

The purpose of the challenge is to assess both your technical know-how and your creative problem-solving abilities (both equally), so have fun with it! Please feel free to reach out if you have any questions. Good luck!
