

CS236 Class Project (Due Thursday, June 11th) 
Spring 2020 
TA: Pritom Ahmed () 
 
Project Summary 
For this project, you will be provided with two real datasets. The first dataset contains station 
information from weather stations across the world. The second provides individual recordings 
for the stations over a 4-year period. The goal of the project is to find out which US states have 
the most stable rainfall. That is, the result should list the US states in ascending order of D, 
where D is the difference between the average precipitation of the months with the highest and 
lowest rainfall. You will do the project individually, and it should be completed on Spark. You 
will be using a local instance of Spark; the TA will upload instructions on how to quickly set up 
local Spark in a Unix environment. 
 
Suggested Approach 
Task 1 (25 points): As your datasets contain data from stations around the world while the 
question concerns only US states, you will first need to identify which stations are within the 
United States. Then you will need to group stations by state. 
Task 2 (25 points): For each state with readings, you will first need to find the average 
precipitation recorded for each month (ignoring year). 
Task 3 (25 points): Then find the months with the highest and lowest averages for each 
state. 
Task 4 (25 points): You will need to order the states by the difference between the highest 
and lowest monthly averages, in ascending order. 
In your result, for each state, you should return: 
● The state abbreviation, e.g. “CA” 
● The average precipitation and name of the highest month, e.g. “90, July” 
● The average precipitation and name of the lowest month, e.g. “50, January” 
● The difference between the two, e.g. “40” 
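The aggregation behind Tasks 2-4 can be sketched in plain Python (in Spark, the per-(state, month) sums would be `reduceByKey` steps and the final ordering a `sortBy`). This assumes the recordings have already been joined to stations as in Task 1; the function name `stable_rainfall` and the input tuple shape are illustrative, not prescribed:

```python
from collections import defaultdict

CALENDAR = ["January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December"]

def stable_rainfall(records):
    """records: iterable of (state, month_1_to_12, precipitation).
    Returns rows ordered ascending by the high/low monthly-average gap."""
    # Task 2: sum and count per (state, month), ignoring year.
    sums = defaultdict(lambda: [0.0, 0])
    for state, month, prcp in records:
        entry = sums[(state, month)]
        entry[0] += prcp
        entry[1] += 1

    # Monthly averages grouped by state.
    per_state = defaultdict(dict)
    for (state, month), (total, count) in sums.items():
        per_state[state][month] = total / count

    # Tasks 3-4: highest/lowest month per state, ordered by the difference.
    rows = []
    for state, by_month in per_state.items():
        hi = max(by_month, key=by_month.get)
        lo = min(by_month, key=by_month.get)
        rows.append((state, by_month[hi], CALENDAR[hi - 1],
                     by_month[lo], CALENDAR[lo - 1], by_month[hi] - by_month[lo]))
    rows.sort(key=lambda r: r[-1])
    return rows
```

Each returned row carries exactly the four result fields listed above: abbreviation, highest month with its average, lowest month with its average, and the difference.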
 
Dataset Information 
Please find the dataset here. 
 
The LOCATIONS dataset is a single .csv file (WeatherStationLocations.csv), containing the 
metadata for every station across the world. To identify that a station is in the US, look for 
stations where the “CTRY” field is “US” and the “ST” field is non-empty. Keep in mind 
that the first row of this file is a header. Here are the fields for this dataset: 
USAF = Air Force station ID. May contain a letter in the first position. 
WBAN = NCDC WBAN number 
STATION NAME = Station name (present in the sample row below) 
CTRY = FIPS country ID 
ST = State for US stations 
LAT = Latitude in thousandths of decimal degrees 
LON = Longitude in thousandths of decimal degrees 
ELEV = Elevation in meters 
BEGIN = Beginning Period Of Record (YYYYMMDD) 
END = Ending Period Of Record (YYYYMMDD) 
Sample Row: 
"724920","23237","STOCKTON METROPOLITAN 
AIRPORT","US","CA","+37.889","-121.226","+0007.9","20050101","20140403” 
 
The RECORDINGS dataset is contained in four files, one for each year (2006-2009). The 
“STN---” value will match the “USAF” field in the locations dataset. These files are 
concatenated from many small files, so keep in mind that there will be header lines throughout 
the files. Here are the fields for this dataset: 
STN--- = The station ID (USAF) 
WBAN = NCDC WBAN number 
YEARMODA = The datestamp (YYYYMMDD) 
TEMP = Ignore for this project 
DEWP = Ignore for this project 
SLP = Ignore for this project 
STP = Ignore for this project 
VISIB = Ignore for this project (Visibility) 
WDSP = Ignore for this project 
MXSPD = Ignore for this project 
GUST = Ignore for this project 
MAX = Ignore for this project (Max Temperature for the day) 
MIN = Ignore for this project (Min Temperature for the day) 
PRCP = Precipitation 
SNDP = Ignore for this project 
FRSHTT = Ignore for this project 
 
Sample Row: 
997781 99999 20061121 42.4 13 9999.9 0 9999.9 0 9999.9 0 999.9 0 17.5 13 22.0 
999.9 46.2* 39.0* 0.00I 999.9 000000 
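A hedged parsing sketch for one recording line, based on the sample row above. The function name `parse_recording` is illustrative, and the column positions are an assumption from the sample-row layout: some fields carry an extra observation-count token, so PRCP is taken as the third whitespace-delimited token from the end rather than by a fixed index. Confirm this against the real files:

```python
def parse_recording(line):
    """Parse one whitespace-delimited recording line into
    (stn, yearmoda, prcp_value, prcp_flag), or None for header/blank lines."""
    tokens = line.split()
    if not tokens or tokens[0] == "STN---":   # repeated header lines in the files
        return None
    stn, yearmoda = tokens[0], tokens[2]
    raw = tokens[-3]                          # e.g. "0.00I" or "99.99"
    flag = raw[-1] if raw[-1].isalpha() else ""
    value = float(raw.rstrip("ABCDEFGHI"))    # strip the accumulation letter
    return (stn, yearmoda, value, flag)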
 
Calculation of Precipitation 
 
A value of 99.99 means that there is no data. You can either treat these as 0 or exclude them 
from your calculations, though the latter is the more correct option. 
There is a letter at the end of each precipitation recording. Here is a table of what the letters 
mean (basically, the letter tells you how long precipitation was accumulated before recording): 
A - 6 hours worth of precipitation 
B - 12 hours... 
C - 18 hours... 
D - 24 hours... 
E - 12 hours... (slightly different from B but the same for this project). 
F - 24 hours ... (slightly different from D but the same for this project). 
G - 24 hours ... (slightly different from D but the same for this project). 
H - station recorded a 0 for the day (although there was some recorded instance of 
precipitation). 
I - station recorded a 0 for the day (and there was NO recorded instance of precipitation). 
A simple solution would be to scale by multiplication. For instance, if a station recorded 12 
hours' worth of precipitation, multiply it by 2 to extrapolate a 24-hour total. How you treat these 
flags is up to you (just document your choice in your README). 
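The multiply-to-24-hours approach above can be sketched as follows. The function name `daily_precip` and the choice to exclude missing values (returning `None`) are illustrative; as noted, treating 99.99 as 0 is also acceptable if documented:

```python
# Hours of accumulation behind each PRCP flag, per the table above.
# E and F/G differ slightly from B and D in the source data,
# but count as 12 and 24 hours respectively for this project.
HOURS = {"A": 6, "B": 12, "C": 18, "D": 24, "E": 12, "F": 24, "G": 24}

def daily_precip(value, flag):
    """Scale one reading to a 24-hour total. Returns None for missing data."""
    if value == 99.99:
        return None              # no data; exclude from the averages
    if flag in ("H", "I"):
        return 0.0               # station recorded a 0 for the day
    hours = HOURS.get(flag, 24)  # no flag: assume a full-day total
    return value * (24 / hours)
```

Whatever policy you pick, apply it before the monthly averaging so that partial-day readings do not bias the averages downward.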
 
Due Date 
This project is due Thursday, June 11th, before midnight. Email your submission to the TA. 
 
Deliverables 
A zipped file containing: 
1. The script to run your job 
2. The full source code 
3. A README describing your project, including: 
1. An overall description of how you chose to divide the problem into different Spark 
operations, with reasoning. 
2. A description of each step/job. 
3. Total runtime for the whole project. 
4. A description of anything extra you did for the project, such as adding a combiner. If 
there is anything that you feel deserves extra credit, put it here. 
Do not include the two original datasets. 
The script should take as input three things: 
1. A folder containing the Locations file 
2. A folder containing the Recordings files 
3. An output folder 
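As a sketch of that three-argument interface (the script structure and argument names are placeholders, not a required layout), the driver could read its inputs with `argparse`:

```python
import argparse

def parse_args(argv):
    """Three positional inputs, in the order the deliverables specify."""
    parser = argparse.ArgumentParser(description="CS236 rainfall-stability job")
    parser.add_argument("locations_dir", help="folder containing the Locations file")
    parser.add_argument("recordings_dir", help="folder containing the Recordings files")
    parser.add_argument("output_dir", help="folder to write the job output to")
    return parser.parse_args(argv)
```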
 
Hints 
1. Make sure you parse correctly. One file is a CSV, the other is not. One file has a single 
header row; the other has many header lines. 
2. Think about how many steps you should use. How much work should each step do? 
3. It is a good idea to read about the different types of RDDs and the various operations 
they allow. 
4. You may want to create a subset of the recordings dataset and use that while you are 
developing, instead of the whole dataset. This can save some time. 
5. Make sure you start early. 
Potential Extra Credit (Optional) 
Please feel free to beef up your project for extra credit. There are many ways that you can 
do this. Here are a few examples: 
● A clever way to achieve faster execution time (fewer steps); if you do this, show how 
much improvement you get. 
● Bigger Bonus: Include the stations with “CTRY” set to “US” that do not have a state tag, 
finding a way to estimate the state using spatial distance to the stations with known 
states. Some of these stations are ocean buoys, so you may want to require a maximum 
distance for a station to be assigned to a state, or you could create separate “states” 
representing the Pacific and Atlantic oceans (checked using coordinates). 
There is a lot of potential work here, so the extra credit could be large. 
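A minimal sketch of that nearest-station assignment, using the haversine great-circle distance. The function names, the `max_km` cutoff value, and the linear scan over tagged stations are all illustrative choices, not part of the assignment:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_state(lat, lon, tagged, max_km=300.0):
    """tagged: iterable of (state, lat, lon) for stations with a known state.
    Returns the state of the closest station within max_km, else None
    (e.g. for ocean buoys far from any state-tagged station)."""
    best = min(tagged, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    if haversine_km(lat, lon, best[1], best[2]) <= max_km:
        return best[0]
    return None
```

In Spark this could be done with a broadcast of the (relatively small) list of state-tagged station coordinates, mapped over the untagged stations.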
Whatever you try to do, let the TA know in your README. Make sure that you complete the 
base project before going for extra credit. If you aren’t sure whether your idea is worth extra 
credit, just email the TA. 
 