Problem
In the file imdb.py, write a program that:
- Reads a file containing movie information, and stores the information in appropriate data structures, which are specified further in this document. (You will only store the information from the file that is relevant to answering questions of the form described above.)
- Reads in from the user (with a prompt) the name of an actor or actress.
- Determines the actor or actress who has appeared in the most movies with the named actor/actress, and prints out the result.
- Prompts the user again for the name of another actor/actress. The program continues until the user enters an empty string in response to the prompt.
Your program should behave exactly as shown further down in this problem statement. That is, the wording and spacing of what is printed by your program should be the same as shown below.
The dataset files maintained by the IMDb are very large and somewhat tedious to read, so you will be working with a single smaller file, named “imdb_data.csv”, that contains information on about 10000 popular movies released since around 1960. The information includes the names of the movies and the principal actors/actresses in them.
“imdb_data.csv” is (as the extension implies) a csv (comma separated value) file. csv files are plain text files that can be viewed with a plain text editor. Each record in a csv file is contained on a separate line, and the fields in a line are separated by commas. Fields containing commas must be escaped. This is typically done by embedding the field inside of double quotes. The first line of a csv file typically contains the labels of the fields. You can view a csv file using a spreadsheet program, like Excel, or Numbers (on a Mac).
The first line of “imdb_data.csv” contains the labels of the fields, and fields containing commas are embedded in double quotes. I suggest that you open “imdb_data.csv” with a spreadsheet program and look at the fields. For purposes of this PSA, the only fields you need to be concerned with are the “original_title” field and the “cast” field. (Note that even with the limited information presented in this file, there are lots of interesting questions you can pose about the data. This PSA has you answering just one such question.)
Since a csv is a plain text file, having your program read it should be straightforward, but there is one complication – those embedded commas within a field. If you read a line into a string, and use the python split method to break it into fields (using the comma as the separator character), you will break some fields into multiple fields. With some careful programming, you can get around this problem, but it is tedious. To avoid this tedium, you can (and you will) let the python module csv handle reading from a csv file for you. Since it is good practice for you to learn to read documentation, I refer you to https://docs.python.org/3/library/csv.html#csv-fmt… to learn about how to use the csv module to read a csv file. You can also Google the topic to find tutorials on it.
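To make the embedded-comma problem concrete, here is a minimal sketch comparing the split method with the csv module. The two-line sample data is made up for illustration (the real file has more columns), and it is read from an in-memory string rather than from “imdb_data.csv”.

```python
import csv
import io

# Hypothetical sample: a header line and one record whose title contains a comma.
SAMPLE = (
    "original_title,cast\n"
    '"Monsters, Inc.",John Goodman\n'
)

lines = SAMPLE.splitlines()

# Naive splitting on commas breaks the quoted title into two pieces.
naive_fields = lines[1].split(",")

# csv.DictReader honors the double quotes and keys each row by the header labels.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

print(naive_fields)               # three pieces instead of two fields
print(rows[0]["original_title"])  # the title as a single field
```

When reading from a real file instead of a string, the csv documentation recommends opening it with newline="" and passing the file object to the reader.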
The program you write should behave (the wording of what it prints, including space characters) exactly as illustrated in this sample run:
Enter a name: Jennifer Lawrence
4 actor(s) have been in 4 common movies with Jennifer Lawrence. They are:
Bradley Cooper:
American Hustle
Joy
Serena
Silver Linings Playbook

Josh Hutcherson:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay - Part 1
The Hunger Games: Mockingjay - Part 2

Liam Hemsworth:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay - Part 1
The Hunger Games: Mockingjay - Part 2

Woody Harrelson:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay - Part 1
The Hunger Games: Mockingjay - Part 2

Enter a name: Emma Watson
2 actor(s) have been in 8 common movies with Emma Watson. They are:
Daniel Radcliffe:
Harry Potter and the Chamber of Secrets
Harry Potter and the Deathly Hallows: Part 1
Harry Potter and the Deathly Hallows: Part 2
Harry Potter and the Goblet of Fire
Harry Potter and the Half-Blood Prince
Harry Potter and the Order of the Phoenix
Harry Potter and the Philosopher's Stone
Harry Potter and the Prisoner of Azkaban

Rupert Grint:
Harry Potter and the Chamber of Secrets
Harry Potter and the Deathly Hallows: Part 1
Harry Potter and the Deathly Hallows: Part 2
Harry Potter and the Goblet of Fire
Harry Potter and the Half-Blood Prince
Harry Potter and the Order of the Phoenix
Harry Potter and the Philosopher's Stone
Harry Potter and the Prisoner of Azkaban

Enter a name: amy adams
1 actor(s) have been in 3 common movies with amy adams. They are:
Philip Seymour Hoffman:
Charlie Wilson's War
Doubt
The Master

Enter a name: Gwyneth Paltrow 
2 actor(s) have been in 3 common movies with Gwyneth Paltrow. They are:
Jude Law:
Contagion
Sky Captain and the World of Tomorrow
The Talented Mr. Ripley

Robert Downey Jr.:
Iron Man
Iron Man 2
Iron Man 3

Enter a name: John Doe
John Doe is not a known actor

Enter a name: moRgaN FReEmAn
1 actor(s) have been in 4 common movies with Morgan Freeman. They are:
Ashley Judd:
Dolphin Tale
Dolphin Tale 2
High Crimes
Kiss the Girls

Enter a name:
Some things to note about how your program must work:
- If multiple actors have appeared in the most common movies with the query actor, then all of those actors must be listed.
- For each actor that has appeared in the most common movies with the query actor, all common movies must be listed.
- Actor names in the database file are capitalized (first letter uppercase, remaining letters lowercase), but when entering a query, the user can enter the letters of the name in any case, and your program should find the actor. (For examples, see the queries for Amy Adams and Morgan Freeman above.)
- If multiple actors have appeared in the most movies with an actor, then those actors should be listed in lexicographic order of their names. (For examples, see the queries above except Amy Adams and Morgan Freeman.) (A lexicographic ordering is like the ordering of words in a dictionary. In Python, the relational operators, like <, give the lexicographic order of strings. So if you have two strings s1 and s2, and if s1 < s2, then s1 appears before s2 in a lexicographic ordering. This means that if you have a list of strings in Python, and you sort it, then they will be put in lexicographic order.)
- The common movies an actor has with the query actor should be listed in lexicographic order. (As illustrated by all of the examples above.)
- The wording and spacing in the output of your program must be exactly as illustrated in the examples above. I will be testing your programs with software, and any deviations from the output shown above will cause your output to be flagged as incorrect. The imdbtest.py program included in the repository will check the output of your program. Make sure all test cases are flagged as correct.
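The tie-handling and ordering rules in the notes above can be sketched as follows. This is only an illustration under assumptions: the function name best_costars and the intermediate dictionary common (co-star name mapped to shared titles) are hypothetical, and the entry for "Hypothetical Actor" is made up so that there is something to filter out.

```python
def best_costars(common):
    """Return (count, names) for the co-stars tied for the most shared movies.

    `common` maps a co-star's name to the list of titles shared with the
    query actor. This intermediate structure is an assumption of the sketch.
    """
    if not common:
        return 0, []
    most = max(len(titles) for titles in common.values())
    # sorted() compares strings with <, which gives lexicographic order.
    names = sorted(name for name, titles in common.items() if len(titles) == most)
    return most, names


# Titles taken from the sample run, deliberately stored out of order.
common = {
    "Robert Downey Jr.": ["Iron Man 3", "Iron Man", "Iron Man 2"],
    "Jude Law": ["The Talented Mr. Ripley", "Contagion",
                 "Sky Captain and the World of Tomorrow"],
    "Hypothetical Actor": ["Contagion"],  # made-up entry with fewer movies
}

most, names = best_costars(common)
header = (f"{len(names)} actor(s) have been in {most} common movies "
          f"with Gwyneth Paltrow. They are:")
print(header)
for name in names:
    print(f"{name}:")
    for title in sorted(common[name]):
        print(title)
    print()
```

Note that the comparison is by character code, so an uppercase letter sorts before any lowercase letter.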
The program you write should conform to the following requirements:
- The name of the file containing your program should be imdb.py. The repository contains this file, and this file contains some starter code, which you should not modify.
- It should take one command line argument that is the name of the csv file containing the movie information. The starter code in imdb.py reflects this requirement.
- It should contain the definitions of a Movie class and an Actor class.
- The instance variables of the Movie class should be the name of the movie, and a list of actors who appear in the movie. This actor list should be a list of Actor objects, and not a list of Actor names.
- The instance variables of the Actor class should include the name of the actor, and a list of movies that the actor has appeared in. This movie list should be a list of Movie objects, and not a list of movie names. You may want to add additional instance variables to support a user query.
- It should contain the definition of an Imdb class. The instance variables of this class should be:
- a dictionary of movies, where the keys are the names of movies, and the values are the Movie objects.
- a dictionary of actors, where the keys are the names of actors, and the values are the Actor objects.
- The Imdb class should have a run method that reads in from the user (with prompting) the name of an actor/actress, and calls another method query with that name as parameter. query should return a string representation of the result of the query, which run should print out. run continues to prompt the user for actor names, until the user enters an empty string in response to the prompt.
- The query method of the Imdb class should be as efficient as possible (in terms of running time).
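One way to picture why the Movie and Actor classes hold references to each other's objects is the sketch below. It is not the starter code and not a complete solution: the helper names link and common_movies are hypothetical, and the extra co-star in the example data is made up. The point it illustrates is that a query can walk only the query actor's own movies instead of scanning every movie in the file.

```python
class Movie:
    def __init__(self, name):
        self.name = name
        self.actors = []   # Actor objects, not names


class Actor:
    def __init__(self, name):
        self.name = name
        self.movies = []   # Movie objects, not names


def link(actor, movie):
    """Record that `actor` appears in `movie`, in both directions (hypothetical helper)."""
    actor.movies.append(movie)
    movie.actors.append(actor)


def common_movies(actor):
    """Map each co-star's name to the titles shared with `actor` (hypothetical helper)."""
    shared = {}
    for movie in actor.movies:          # only this actor's movies are visited
        for other in movie.actors:
            if other is not actor:
                shared.setdefault(other.name, []).append(movie.name)
    return shared


# Small hand-built example using titles from the sample run.
amy = Actor("Amy Adams")
philip = Actor("Philip Seymour Hoffman")
extra = Actor("Hypothetical Actor")     # made-up co-star for contrast
for title in ["Doubt", "The Master", "Charlie Wilson's War"]:
    movie = Movie(title)
    link(amy, movie)
    link(philip, movie)
link(extra, amy.movies[0])              # shares only "Doubt" with Amy Adams

shared = common_movies(amy)
print(sorted(shared["Philip Seymour Hoffman"]))
```

The work done by common_movies is proportional to the total cast size of the query actor's movies, which is typically far smaller than the whole dataset.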
Why store information about movies and actors in dictionaries? Because storing a new value in a dictionary and looking up a key in a dictionary are, on average, constant time operations, and you need that efficiency when building your data structures, especially when the database file is large.
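As a rough illustration of how dictionary lookups help while the data structures are being built, the sketch below creates an Actor object the first time a name is seen and returns the same object every time after that. The function names and the extra lower-cased index used for case-insensitive queries are assumptions of this sketch, not something the assignment prescribes.

```python
class Actor:
    def __init__(self, name):
        self.name = name
        self.movies = []


actors = {}         # name as written in the file -> Actor
by_lowercase = {}   # hypothetical helper index: lower-cased name -> Actor


def get_or_create_actor(name):
    """Return the Actor for `name`, creating it on first sight."""
    actor = actors.get(name)          # average constant-time lookup
    if actor is None:
        actor = Actor(name)
        actors[name] = actor          # average constant-time insert
        by_lowercase[name.lower()] = actor
    return actor


def find_actor(query):
    """Look up a user-typed name regardless of letter case or stray spaces."""
    return by_lowercase.get(query.strip().lower())


first = get_or_create_actor("Morgan Freeman")
again = get_or_create_actor("Morgan Freeman")   # no second object is created
found = find_actor("moRgaN FReEmAn")
missing = find_actor("John Doe")
print(first is again, found is first, missing)
```

Because each name maps to exactly one object, every movie that lists the same actor ends up sharing that one Actor object.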








Jermaine Byrant
Nicole Johnson



