I got to spend some time with my son this past summer. He’s growing up and has his first job this year, so my opportunities for spending blocks of time with him are dwindling. He is a budding cinephile, so one of the ways we connect is by discussing movies.

While he was down here we started discussing “the best movies” and what the criteria for inclusion might be. I was recalling that conversation last week and decided to do some data collection and machine learning to create a Top 100 Movies of All Time list.

I collected reviews, ratings, data sets – all the data I could get my hands on that had numerical representations of “movie scores”.  I normalized that data and then ran it through some machine learning techniques that I have been studying the last few months.

The Top 100 Movies of All Time:

Rating Movie Name Year
9.56 Shawshank Redemption, The 1994
9.38 Godfather, The 1972
9.32 Usual Suspects, The 1995
9.27 Schindler’s List 1993
9.19 Godfather: Part II, The 1974
9.19 Seven Samurai 1954
9.18 Rear Window 1954
9.17 Band of Brothers 2001
9.16 Casablanca 1942
9.15 Sunset Blvd. 1950
9.13 One Flew Over the Cuckoo’s Nest 1975
9.13 Dr. Strangelove 1964
9.13 Third Man, The 1949
9.11 City of God (Cidade de Deus) 2002
9.10 Lives of Others, The 2006
9.10 North by Northwest 1959
9.10 Paths of Glory 1957
9.09 Fight Club 1999
9.08 Double Indemnity 1944
9.08 12 Angry Men 1957
9.07 Cosmos 1980
9.07 Dark Knight, The 2008
9.07 Raiders of the Lost Ark 1981
9.06 Yojimbo 1961
9.05 Big Sleep, The 1946
9.04 All About Eve 1950
9.04 Spirited Away 2001
9.03 Chinatown 1974
9.03 Notorious 1946
9.02 Amelie 2001
9.02 M 1931
9.01 Star Wars: Episode IV – A New Hope 1977
9.01 To Kill a Mockingbird 1962
9.00 The Empire Strikes Back 1980
9.00 Maltese Falcon, The 1941
9.00 Matrix, The 1999
9.00 Thin Man, The 1934
8.99 Goodfellas 1990
8.99 Touch of Evil 1958
8.99 Black Mirror 2011
8.99 Wallace & Gromit: The Wrong Trousers 1993
8.98 Memento 2000
8.98 Silence of the Lambs, The 1991
8.98 Princess Bride, The 1987
8.98 Rashomon (Rashômon) 1950
8.98 Life Is Beautiful (La Vita è bella) 1997
8.97 Pulp Fiction 1994
8.97 Monty Python and the Holy Grail 1975
8.97 Decalogue, The (Dekalog) 1989
8.97 City Lights 1931
8.97 Ran 1985
8.97 Sting, The 1973
8.97 Philadelphia Story, The 1940
8.97 It Happened One Night 1934
8.96 Wallace & Gromit: A Close Shave 1995
8.95 On the Waterfront 1954
8.95 General, The 1926
8.95 Lawrence of Arabia 1962
8.95 Treasure of the Sierra Madre, The 1948
8.94 Shadow of a Doubt 1943
8.94 Strangers on a Train 1951
8.94 When We Were Kings 1996
8.94 Inception 2010
8.94 American Beauty 1999
8.93 Cinema Paradiso 1989
8.93 His Girl Friday 1940
8.93 American History X 1998
8.92 Grand Illusion 1937
8.92 My Neighbor Totoro 1988
8.92 Vertigo 1958
8.92 Witness for the Prosecution 1957
8.91 Raise the Red Lantern 1991
8.91 Bicycle Thieves 1948
8.91 Modern Times 1936
8.91 The Return of the King 2003
8.90 Song of the Little Road 1955
8.90 Boot, Das (Boat, The) 1981
8.90 Jean de Florette 1986
8.90 Nights of Cabiria 1957
8.90 The Fellowship of the Ring 2001
8.90 Great Escape, The 1963
8.89 400 Blows, The 1959
8.89 Blade Runner 1982
8.89 Man for All Seasons, A 1966
8.88 Intouchables 2011
8.88 Citizen Kane 1941
8.88 Lady Eve, The 1941
8.88 Duck Soup 1933
8.88 Thin Blue Line, The 1988
8.88 Stop Making Sense 1984
8.88 Once Upon a Time in the West 1968
8.88 Manchurian Candidate, The 1962
8.88 Fawlty Towers 1979
8.87 World of Apu, The 1959
8.87 Hustler, The 1961
8.87 Ikiru 1952
8.86 Celebration, The (Festen) 1998
8.86 Good, the Bad and the Ugly, The 1966
8.86 Henry V 1989
8.86 Creature Comforts 1989

The ratings have been rounded for display, so some movies appear tied, but the list is in the right order. I tried to find a mix of data (1-10 stars, 1-100 percent rankings, thumbs up or down) from both critics and moviegoers.
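To give a feel for what putting those different scales on a common footing can look like, here is a minimal sketch using simple linear (min-max) rescaling onto a 0-10 range. It is only an illustration, not the exact normalization I used: the function name `to_ten_point`, the sample numbers, and the plain average at the end are all assumptions made up for the example.

```python
def to_ten_point(value, low, high):
    """Linearly rescale value from the range [low, high] onto [0, 10]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return 10.0 * (value - low) / (high - low)


# Hypothetical scores for a single film from three kinds of sources.
star_score = to_ten_point(8.5, 1, 10)       # 1-10 star rating
percent_score = to_ten_point(92, 1, 100)    # 1-100 percent ranking
thumbs_score = to_ten_point(0.88, 0, 1)     # share of thumbs-up votes, 0 to 1

# Assumption: an unweighted mean; a real pipeline might weight sources differently.
combined = (star_score + percent_score + thumbs_score) / 3
print(round(combined, 2))
```

Linear rescaling keeps the relative spacing within each source, but it does not correct for sources that rate generously or harshly overall; standardizing each source (subtracting its mean and dividing by its standard deviation) is a common alternative when that matters.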

The biggest part of this project was collating and cleaning data. It is not as sexy as machine learning, but it is just as important. Without clean data and quality feature engineering, even the best deep learning algorithms are not particularly useful.
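As one small example of the kind of cleaning involved, titles often need to be brought into a single form before records from different sources can be matched. Some data sets move the leading article to the end ("Shawshank Redemption, The") while others do not. The sketch below is an illustration under that assumption, not the code I actually ran; `canonical_title` and `match_key` are hypothetical helper names.

```python
import re

# Articles that some data sets move to the end of the title.
_ARTICLES = ("The", "A", "An")


def canonical_title(title):
    """Collapse whitespace and move a trailing article back to the front."""
    title = re.sub(r"\s+", " ", title).strip()
    for article in _ARTICLES:
        suffix = ", " + article
        if title.endswith(suffix):
            return article + " " + title[: -len(suffix)]
    return title


def match_key(title, year):
    """Build a case-insensitive (title, year) key for joining sources."""
    return (canonical_title(title).casefold(), int(year))


print(canonical_title("Shawshank Redemption, The"))
print(match_key("Godfather, The", "1972") == match_key("The  Godfather", 1972))
```

Including the year in the key helps keep remakes and same-named films apart. This sketch does not handle titles that carry an alternate name in parentheses after the article, such as "Celebration, The (Festen)"; those would need an extra step.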

If you have data you need cleaned up

