# Overview ----------------------------------------------------------------

# Assignment 1: Analysis of the protest data from Count Love

# For each question/prompt, write the necessary code to calculate the answer.

# For grading, it's important that you store your answers in the variable names

# listed with each question in `backticks`.

# For each prompt marked `Reflection`, please write a response

# in your `README.md` file.

# Part 1: Set up ----------------------------------------------------------

# In this section, you're loading the data and necessary packages.

# Load the `stringr` package, which you'll use later.

# Load the data from https://countlove.org/data/data.csv

# into a variable called `protests`

# How many protests are in the dataset? `num_protests`

# How much information is available about each protest? `num_features`
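
# A minimal sketch of the set-up step. To stay runnable offline it reads a
# made-up three-row sample through the `text` argument of `read.csv()`; the
# sample values are hypothetical and are not taken from the real dataset.
# For the real data you would pass the URL above to `read.csv()` instead.

```r
# Hypothetical sample standing in for the real CSV (values are invented).
sample_csv <- 'Date,Location,Attendees
2020-05-31,"Seattle, WA",500
2020-05-31,"Minneapolis, MN",
2019-07-04,"Washington, DC",120'

protests <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

num_protests <- nrow(protests)  # one row per protest
num_features <- ncol(protests)  # one column per piece of information
```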

# Part 2: Attendees -------------------------------------------------------

# In this section, you're exploring the number of attendees.

# Extract the `Attendees` column into a variable called `num_attendees`

# What is the lowest number of attendees? `min_attendees`

# (hint for this and other calculations: you'll need to consider missing values)

# What is the highest number of attendees? `max_attendees`

# What is the mean number of attendees? `mean_attendees`

# What is the median number of attendees? `median_attendees`

# What is the difference between the mean and median number of attendees?

# `mean_median_diff`
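
# A minimal sketch of the summary statistics, assuming a made-up vector in
# place of the real `Attendees` column. The point is `na.rm = TRUE`: without
# it, a single missing value makes each of these calls return NA.

```r
# Hypothetical attendee counts, including one missing value.
num_attendees <- c(10, NA, 250, 40, 10000)

min_attendees <- min(num_attendees, na.rm = TRUE)
max_attendees <- max(num_attendees, na.rm = TRUE)
mean_attendees <- mean(num_attendees, na.rm = TRUE)
median_attendees <- median(num_attendees, na.rm = TRUE)

# A large positive gap suggests a few very large events pull the mean up.
mean_median_diff <- mean_attendees - median_attendees
```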

# Reflection: What does the difference between the mean and the median

# tell you about the distribution of the data? (if you're unfamiliar with

# working with distributions, feel free to ask your TA for clarification)

# To further assess the distribution of values, create a boxplot of the number

# of attendees using the `boxplot()` function.

# Store the plot in a variable called `attendess_distribution`

# (Note, we'll use much more refined plotting methods, and pay far

# more attention to detail later in the course)

# Create another boxplot of the log of the number of attendees.

# Store the plot in a variable `log_attendees_distribution`.

# (note, you will see a warning in the console, which is expected)
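
# A minimal sketch of the two boxplots, assuming made-up attendee counts.
# `boxplot()` draws the plot and invisibly returns a list of the statistics
# it used, so that list is what ends up in each variable. The zero in the
# sample is there on purpose: log(0) is -Inf, which cannot be drawn, and
# that is one plausible source of the expected warning.

```r
# Hypothetical attendee counts, including a zero and a missing value.
num_attendees <- c(0, 10, NA, 250, 40, 10000)

# Variable name spelled as the assignment requires.
attendess_distribution <- boxplot(num_attendees)

# Taking the log compresses the long right tail of the distribution.
log_attendees_distribution <- boxplot(log(num_attendees))
```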

# Part 3: Locations -------------------------------------------------------

# In this section, you're exploring where protests happened.

# Extract the `Location` column into a variable called `locations`

# How many unique locations are in the dataset? `num_locations`

# How many protests occurred in Washington? `num_in_wa`

# (hint: use a function from the stringr package to detect the letters "WA")

# What proportion of protests occurred in Washington? `prop_in_wa`
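
# A minimal sketch of the location counts, assuming a made-up `locations`
# vector. `str_detect()` returns one TRUE/FALSE per element, and summing a
# logical vector counts the TRUE values. The match is case sensitive, so
# "Washington, DC" does not match "WA".

```r
library(stringr)

# Hypothetical locations in the "City, ST" style.
locations <- c("Seattle, WA", "Spokane, WA", "Washington, DC", "Minneapolis, MN")

num_locations <- length(unique(locations))

in_wa <- str_detect(locations, "WA")    # logical vector, one entry per protest
num_in_wa <- sum(in_wa, na.rm = TRUE)   # count of TRUE values
prop_in_wa <- num_in_wa / length(locations)
```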

# Reflection: Does the number of protests in Washington surprise you?

# Why or why not?

# Write a function `count_in_location()` that accepts (as a parameter)

# a `location` name, and returns the sentence (note: spacing and punctuation):

# "There were N protests in LOCATION.", where N is the number of

# protests that occurred in that location, and LOCATION is the parameter that

# was provided into the function.

# Note, you should count the number of locations that match the parameter

# put into the function, so `Seattle` should be a match for "Seattle, WA"

# Use your function above to describe the number of protests in "Washington, DC"

# `dc_summary`

# Use your function above to describe the number of protests in "Minneapolis"

# `minneapolis_summary`
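
# One possible shape for `count_in_location()`, assuming a made-up
# `locations` vector that the function reads from the global environment.
# Wrapping the pattern in `fixed()` is a choice made here so the location is
# matched literally rather than as a regular expression.

```r
library(stringr)

# Hypothetical locations (invented for illustration).
locations <- c("Seattle, WA", "Washington, DC", "Minneapolis, MN", "Minneapolis, MN")

count_in_location <- function(location) {
  # Partial matches count, so "Minneapolis" matches "Minneapolis, MN".
  n <- sum(str_detect(locations, fixed(location)), na.rm = TRUE)
  paste0("There were ", n, " protests in ", location, ".")
}

dc_summary <- count_in_location("Washington, DC")
minneapolis_summary <- count_in_location("Minneapolis")
```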

# Create a new vector `states` which is the last two characters of each

# value in the `locations` vector. Hint, you may want to again use the

# `stringr` package

# Create a vector of the unique states in your dataset. `uniq_states`

# Create a summary sentence for each state by passing your `uniq_states`

# variable and your `count_in_location` function to the `sapply()` function.

# Store your results in `state_summary`

# (don't miss how amazing this is! Very powerful to apply your function to an

# entire vector at once with `sapply()`)
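
# A minimal sketch of the state steps, assuming made-up locations and a
# `count_in_location()` like the one described above (redefined here so the
# example runs on its own). Negative positions in `str_sub()` count from the
# end of the string, so -2 to -1 selects the last two characters.

```r
library(stringr)

# Hypothetical locations (invented for illustration).
locations <- c("Seattle, WA", "Spokane, WA", "Washington, DC")

count_in_location <- function(location) {
  n <- sum(str_detect(locations, fixed(location)), na.rm = TRUE)
  paste0("There were ", n, " protests in ", location, ".")
}

states <- str_sub(locations, -2, -1)  # last two characters of each value
uniq_states <- unique(states)

# sapply() calls the function once per state and collects the sentences
# in a character vector named by state.
state_summary <- sapply(uniq_states, count_in_location)
```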

# Create a summary table by passing your `states` variable to the `table()`

# function, and storing the result in a variable `state_table`.

# Optional: use the View() function to more easily read the table

# Reflection: Looking at the `state_table` variable, what data quality issues

# do you notice, and how would you use that to change your analysis (no need

# to actually change your analysis)?

# What was the maximum number of protests in a state? `max_in_state`

# (hint: use your `state_table` variable)
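
# A minimal sketch of the table step, assuming a made-up `states` vector.
# `table()` counts how often each distinct value appears, and `max()` on the
# table returns the largest count.

```r
# Hypothetical state codes (invented for illustration).
states <- c("WA", "WA", "DC", "MN", "WA", "MN")

state_table <- table(states)
max_in_state <- max(state_table)

# Optional extra: the name of the state with that maximum.
top_state <- names(state_table)[which.max(state_table)]
```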

# Part 4: Dates -----------------------------------------------------------

# In this section, you're exploring when protests happened.

# Extract the `Date` column into a variable called `dates` by passing the

# column to the `as.Date()` function (this will process the values as dates,

# which are luckily already in an optimal format for parsing)

# What is the most recent date in the dataset? `most_recent`

# What is the earliest date in the dataset? `earliest`

# What is the length of the timespan of the dataset? `time_span`

# hint: R can do math with dates pretty well by default!
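
# A minimal sketch of the date steps, assuming made-up date strings in the
# YYYY-MM-DD format that `as.Date()` parses without a format argument.
# Subtracting two dates gives a `difftime` value measured in days.

```r
# Hypothetical date strings (invented for illustration).
date_strings <- c("2019-07-04", "2020-05-24", "2020-05-31", "2017-01-15")
dates <- as.Date(date_strings)

most_recent <- max(dates, na.rm = TRUE)
earliest <- min(dates, na.rm = TRUE)

time_span <- most_recent - earliest  # a difftime, in days
```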

# Create a vector of the dates that are in 2020 `in_2020`

# Create a vector of the dates that are in 2019. `in_2019`

# What is the ratio of the number of protests in 2020 compared to 2019?

# `ratio_2020_2019`
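
# A minimal sketch of the year filters, assuming made-up dates. Extracting
# the year with `format(dates, "%Y")` is one option among several; comparing
# against the first and last day of the year would work as well.

```r
# Hypothetical dates (invented for illustration).
dates <- as.Date(c("2019-07-04", "2019-12-01", "2020-05-24",
                   "2020-05-31", "2020-07-10", "2020-07-11"))

year_of <- format(dates, "%Y")  # year of each date, as a character string

in_2020 <- dates[year_of == "2020"]
in_2019 <- dates[year_of == "2019"]

ratio_2020_2019 <- length(in_2020) / length(in_2019)
```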

# Reflection: Does the change in the number of protests from 2019 to 2020

# surprise you? Why or why not?

# Write a function `count_on_date()` that accepts as a parameter a `date`,

# and returns the sentence:

# "There were N protests on DATE.", where N is the number of protests on that

# date, and DATE is the date provided

# Using your function you just wrote, how many protests were there on

# May 24th, 2020? `num_may_24`

# Using your function you just wrote, how many protests were there on

# May 31st, 2020? `num_on_may_31`

# For more on this timeline, see:

# https://www.nytimes.com/article/george-floyd-protests-timeline.html
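
# One possible shape for `count_on_date()`, assuming a made-up `dates`
# vector read from the global environment and a `date` passed as a
# "YYYY-MM-DD" string. Note that the function returns the sentence, so the
# two variables below hold sentences rather than bare numbers.

```r
# Hypothetical dates (invented for illustration).
dates <- as.Date(c("2020-05-24", "2020-05-31", "2020-05-31", "2020-05-31"))

count_on_date <- function(date) {
  # Convert the input so strings and Date objects are both accepted.
  n <- sum(dates == as.Date(date), na.rm = TRUE)
  paste0("There were ", n, " protests on ", date, ".")
}

num_may_24 <- count_on_date("2020-05-24")
num_on_may_31 <- count_on_date("2020-05-31")
```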

# How many protests occurred each month in 2020? `by_month_table`

# Hint: use the `months()` function, your `in_2020` dates, and the `table()`

# function. If you like, you can do this in multiple different steps.
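
# A minimal sketch of the monthly table, assuming a made-up `in_2020`
# vector. `months()` returns month names in the current locale, and
# `table()` orders its entries alphabetically rather than by calendar month.

```r
# Hypothetical 2020 dates (invented for illustration).
in_2020 <- as.Date(c("2020-05-24", "2020-05-31", "2020-06-01", "2020-07-10"))

month_names <- months(in_2020)        # month name for each date
by_month_table <- table(month_names)  # count of protests per month
```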

# As a comparison, let's assess the change between July 2019 and July 2020.

# What is the difference in the number of protests between July 2020 and

# July 2019? You'll want to do this in multiple steps as you see fit, though

# your answer should be stored in the variable `change_july_protests`.
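
# A minimal sketch of the July comparison, assuming made-up dates.
# Formatting each date as "YYYY-MM" is a choice made here because it avoids
# locale-dependent month names; filtering with `months()` is another option.

```r
# Hypothetical dates (invented for illustration).
dates <- as.Date(c("2019-07-04", "2019-07-20", "2019-08-01", "2020-06-15",
                   "2020-07-01", "2020-07-10", "2020-07-11", "2020-07-30"))

year_month <- format(dates, "%Y-%m")  # e.g. "2020-07"

july_2020 <- sum(year_month == "2020-07")
july_2019 <- sum(year_month == "2019-07")

change_july_protests <- july_2020 - july_2019
```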

# Reflection: do a bit of research. Find at least two specific policies that

# have been changed as a result of protests in 2020. These may be at the

# city, state, or University level. Please provide a basic summary, as well as a

# link to each article.

# Part 5: Protest Purpose -------------------------------------------------

# In this section, you're exploring why protests happened

# Extract the `Event..legacy..see.tags.` column into a variable called `purpose`

# How many different purposes are listed in the dataset? `num_purposes`

# That's quite a few -- if you look at -- View() -- the vector, you'll notice

# a common pattern for each purpose. It's listed as:

# SOME_PURPOSE (additional_detail)

# To get a higher level summary, create a variable `high_level_purpse` by

# extracting everything before the first parenthesis in each value

# in the vector. For example, from "Civil Rights (Black Women's March)"

# you would extract "Civil Rights". You'll also have to remove the space

# before the first parenthesis.

# Hint: this will take a little bit of googling // trial and error. Be patient!

# How many "high level" purposes have you identified? `num_high_level`

# Create a table that counts the number of protests for each high level purpose

# `high_level_table`
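
# A minimal sketch of the purpose steps, assuming a made-up `purpose`
# vector. The regular expression "\\(.*" matches from the first opening
# parenthesis to the end of the string; removing that match and trimming the
# leftover space is one approach among several.

```r
library(stringr)

# Hypothetical purposes (invented, apart from the example quoted above).
purpose <- c("Civil Rights (Black Women's March)", "Civil Rights (Racial Justice)",
             "Environment (Climate)", "Healthcare")

num_purposes <- length(unique(purpose))

# Drop everything from the first "(" onward, then trim trailing whitespace.
# Variable name spelled as the assignment requires.
high_level_purpse <- str_trim(str_replace(purpose, "\\(.*", ""))

num_high_level <- length(unique(high_level_purpse))
high_level_table <- table(high_level_purpse)
```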

# Reflection: Take a look at (`View()`) your `high_level_table` variable. What

# picture does this paint of the U.S.?

# Part 6: Independent Exploration -----------------------------------------

# As a last step, you should write your own function that allows you to

# quickly ask questions of the dataset. For example, in the above sections,

# you wrote functions to ask the same question about different months, or

# locations. If you need any guidance here, feel free to ask!
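
# As one illustration of what such a function could look like, the sketch
# below counts protests in a given year. The name `count_in_year()` and the
# sample dates are hypothetical; any question you can phrase as "filter,
# then count" fits the same pattern.

```r
# Hypothetical dates (invented for illustration).
dates <- as.Date(c("2019-07-04", "2020-05-24", "2020-05-31", "2020-07-10"))

# Hypothetical helper: not part of the assignment's required variables.
count_in_year <- function(year) {
  n <- sum(format(dates, "%Y") == as.character(year), na.rm = TRUE)
  paste0("There were ", n, " protests in ", year, ".")
}

summary_2020 <- count_in_year(2020)
summary_2018 <- count_in_year(2018)
```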
