# Overview ---------------------------------------------------------------- # Assignment...

60.1K

Verified Solution

Question

Accounting

# Overview ----------------------------------------------------------------
# Assignment 1: Analysis of the protest data from Crowd Love
# For each question/prompt, write the necessary code to calculate the answer.
# For grading, it's important that you store your answers in the variable names
# listed with each question in `backtics`.
# For each prompt marked `Reflection`, please write a response
# in your `README.md` file.
# Part 1:Set up -----------------------------------------------------------
# In this section, you're loading the data and necessary packages.
# Load the `stringr` package, which you'll use later.
# Load the data from https://countlove.org/data/data.csv
# into a variable called `protests`
# How many protests are in the dataset? `num_protests`
# How much information is available about each protest? `num_features`
# Part 2: Attendees -------------------------------------------------------
# In this section, you're exploring the number of attendees.
# Extract the `Attendees` column into a variable called `num_attendees`
# What is the lowest number of attendees? `min_attendees`
# (hint for this and other calculations: you'll need to consider missing values)
# What is the highest number of attendees? `max_attendees`
# What is the mean number of attendees? `mean_attendees`
# What is the median number of attendees? `median_attendees`
# What is the difference between the mean and median number of attendees?
# `mean_median_diff`
# Reflection: What does the difference between the mean and the median
# tell you about the *distribution* of the data? (if you're unfamiliar with
# working with distibutions, feel free to ask your TA for clarification)
# To further assess the distribution of values, create a boxplot of the number
# of attendees using the `boxplot()` function.
# Store the plot in a variable called `attendess_distribution`
# (Note, we'll use much more refined plotting methods, and pay far
# more attention to detail later in the course)
# Create another boxplot of the *log* of the number of attendees.
# Store the plot in a variable `log_attendees_distribution`.
# (note, you will see a warning in the console, which is expected)
# Part 3: Locations -------------------------------------------------------
# In this section, you're exploring where protests happened.
# Extract the `Location` column into a variable called `locations`
# How many *unique* locations are in the dataset? `num_locations`
# How many protests occured in Washington? `num_in_wa`
# (hint: use a function from the stringr package to detect the letters "WA")
# What proportion of protests occured in Washington? `prop_in_wa`
# Reflection: Does the number of protests in Washington surprise you?
# Why or why not?
# Write a function `count_in_location()` that accepts (as a parameter)
# a `location` name, and returns the sentence (note: spacing and punctuation):
# "There were N protests in LOCATION.", where N is the number of
# protests that occured in that location, and LOCATION is the parameter that
# was provided into the function.
# Note, you should count the number of locations that *match* the parameter
# put into the function, so `Seattle` should be a match for "Seattle, WA"
# Use your function above to describe the number of protests in "Washington, DC"
# `dc_summary`
# Use your function above to describe the number of protests in "Minneapolis"
# `minneapolis_summary`
# Create a new vector `states` which is the last two characters of each
# value in the `locations` vector. Hint, you may want to again use the
# `stringr` package
# Create a vector of the unique states in your dataset. `uniq_states`
# Create a summary sentence for each state by passing your `uniq_states`
# variable and `count_in_location` variables to the `sapply()` function.
# Store your results in `state_summary`
# (don't miss how amazing this is! Very powerful to apply your function to an
# entire vector *at once* with `sapply()`)
# Create a summary table by passing your `states` variable to the `table()`
# funciton, and storing the result in a variable `state_table`.
# Optional: use the View() function to more easily read the table
# Reflection: Looking at the `state_table` variable, what data quality issues
# do you notice, and how would you use that to change your analysis (no need
# to actually change your analysis)?
# What was the maximum number of protests in a state? `max_in_state`
# (hint: use your `state_table` variable)
# Part 4: Dates -----------------------------------------------------------
# In this section, you're exploring *when* protests happened.
# Extract the `Date` column into a variable called `dates` by passing the
# column to the `as.Date()` function (this will process the values as dates,
# which are *luckily* already in an optimal format for parsing)
# What is the most recent date in the dataset? `most_recent`
# What is the earliest date in the dataset? `earliest`
# What is the length of the timespan of the dataset? `time_span`
# hint: R can do math with dates pretty well by default!
# Create a vector of the dates that are in 2020 `in_2020`
# Create a vector of the dates that are in 2019. `in_2019`
# What is the ratio of the number of protests in 2020 comparted to 2019?
# `ratio_2020_2019`
# Reflection: Does the change in the number of protests from 2019 to 2020
# surprise you? Why or why not?
# Write a function `count_on_date()` that accecpts as a parameter a `date`,
# and returns the sentence:
# "There were N protests on DATE.", where N is the number of protests on that
# date, and DATE is the date provided
# Using your function you just wrote, how many protests were there on
# May 24th, 2020? `num_may_24`
# Using your function you just wrote, how many protests were there on
# May 31th, 2020? `num_on_may_31`
# For more on this timeline, see:
# https://www.nytimes.com/article/george-floyd-protests-timeline.html
# How many protests occured each month in 2020? `by_month_table`
# Hint: use the `months()` function, your `in_2020` dates, and the `table()`
# Function. If you like, you can do this in multiple different steps.
# As a comparison, let's assess the change between July 2019 and July 2020.
# What is the *difference* in the number of protests between July 2020 and
# July 2019? You'll want to do this in multiple steps as you see fit, though
# your answer should be stored in the variable `change_july_protests`.
# Reflection: do a bit of research. Find at least *two specific policies* that
# have been changed as a result of protests in 2020. These may be at the
# city, state, or University level. Please provide a basic summary, as well as a
# link to each article.
# Part 5: Protest Purpose -------------------------------------------------
# In this section, you're exploring *why* protests happened
# Extract the `Event..legacy..see.tags.` column into a variable called `purpose`
# How many different purposes are listed in the dataset? `num_purposes`
# That's quite a few -- if you look at -- View() -- the vector, you'll notice
# a common pattern for each purpose. It's listed as:
# SOME_PURPOSE (additiona_detail)
# To get a higher level summary, create a variable `high_level_purpse` by
# extracting *everything before the first parenthesis* in each value
# in the vector. For example, from "Civil Rights (Black Women's March)"
# you would extract "Civil Rights". You'll also have to *remove the space*
# before the first parenthasis.
# Hint: this will take a little bit of googling // trial and error. Be patient!
# How many "high level" purposes have you identified? `num_high_level`
# Create a table that counts the number of protests for each high level purpose
# `high_level_table`
# Reflection: Take a look (`View()`) your `high_level_table` variable. What
# picture does this paint of the U.S.?
# Part 6: Independent Exploration -----------------------------------------
# As a last step, you should write your own function that allows you to
# quickly ask questions of the dataset. For example, in the above sections,
# you wrote functions to ask the same question about different months, or
# locations. If you need any guidance here, feel free to ask!

Answer & Explanation Solved by verified expert
Get Answers to Unlimited Questions

Join us to gain access to millions of questions and expert answers. Enjoy exclusive benefits tailored just for you!

Membership Benefits:
  • Unlimited Question Access with detailed Answers
  • Zin AI - 3 Million Words
  • 10 Dall-E 3 Images
  • 20 Plot Generations
  • Conversation with Dialogue Memory
  • No Ads, Ever!
  • Access to Our Best AI Platform: Flex AI - Your personal assistant for all your inquiries!
Become a Member

Other questions asked by students