Question

Accounting

Please solve and explain.

Please check: this is a COE topics homework problem.

Question 4 (6 points): Consider a 2×3 game world that has 6 states {A, B, C, D, E, F} and four actions (up, down, left, right), as shown below.

[Figure: 2×3 grid of states A to F; layout not recoverable from the extracted text.]

For every new episode, the game starts by choosing a random state, and it ends at the terminal state F. When node F is reached, the player receives a reward of +10 and the game ends. For all other actions that do not lead to state F, the reward is -1. Assume that the greedy policy is used after training. Also, assume that α = 1 and γ = 0.9 (the two symbols are missing in the source; they are presumably the learning rate α and the discount factor γ).

Assume that the Q-learning algorithm was applied, and the following is the initial Q function Q(s, a), where s is a state and a is an action.

State \ Action    Up    Down    Left    Right
[Table values not present in the extracted text.]

A. Using the initial Q function, perform one action (B, up) and update the Q function. [2 pts]

B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as a full game from a given state until the game ends. [4 pts]
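Both parts of the question apply the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)], once per action taken. The sketch below shows only the mechanics of that update. The grid layout and the initial Q values are not available in the text, so the all-zero table and the transitions D → E and E → F used here are hypothetical placeholders, not the homework's data. The function name `q_update` and the values α = 1, γ = 0.9 are likewise assumptions.

```python
ACTIONS = ("up", "down", "left", "right")


def q_update(q, state, action, reward, next_state, alpha, gamma, terminal):
    """Apply one tabular Q-learning update in place and return the new Q(state, action)."""
    # A terminal next state has no future value, so its bootstrap term is 0.
    best_next = 0.0 if terminal else max(q[next_state][a] for a in ACTIONS)
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Placeholder table: all zeros. This is NOT the initial Q function from the homework.
q = {s: {a: 0.0 for a in ACTIONS} for s in "ABCDEF"}

# Hypothetical step that reaches the terminal state F (reward +10).
v_terminal = q_update(q, "E", "right", reward=10.0, next_state="F",
                      alpha=1.0, gamma=0.9, terminal=True)

# Hypothetical non-terminal step (reward -1) that bootstraps from Q(E, .).
v_step = q_update(q, "D", "right", reward=-1.0, next_state="E",
                  alpha=1.0, gamma=0.9, terminal=False)

print(v_terminal, v_step)
```

With α = 1 the old estimate is discarded entirely and each entry is overwritten by its target r + γ max Q(s', ·). To work part A or part B, replace the placeholder table and transitions with the values from the original figure and table, then call `q_update` once per action in the episode.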


