Student Performance Analysis And Prediction (updated March 2024)



This article was published as a part of the Data Science Blogathon.

Understanding the Problem Statement

This project examines how student performance (test scores) is affected by other variables such as gender, ethnicity, parental level of education, lunch, and test preparation course.

The primary objective of higher education institutions is to impart quality education to their students. To achieve the highest level of quality in the education system, knowledge must be discovered to predict student enrollment in specific courses, identify issues with traditional classroom teaching models, detect unfair means used in online examinations, detect abnormal values in student result sheets, and predict student performance. This knowledge is hidden within educational datasets and can be extracted through data mining techniques.

Data Collection

Dataset source: Students Performance dataset. The data consists of 8 columns and 1000 rows.

Import Data and Required Packages

Import the Pandas, NumPy, Matplotlib, Seaborn, and Warnings libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")

Import the CSV Data as Pandas DataFrame

df = pd.read_csv("data/StudentsPerformance.csv")

Show the Top 5 Records

df.head()

This shows the top 5 records of the dataset so we can look at the features.

To See the Shape of the Dataset

df.shape

This returns the shape of the dataset (number of rows and columns).

Dataset Information

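gender: sex of the student (male or female)

race/ethnicity: ethnicity of the student (group A, B, C, D, or E)

parental level of education: parents' highest education (some high school, high school, some college, associate's degree, bachelor's degree, master's degree)

lunch: having lunch before test (standard or free/reduced)

test preparation course: completed or not completed before the test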

math score

reading score

writing score

Next, we check the data. The dataset contains several categorical features, so we look for missing values and duplicate values, check the data types, and count the unique values in each column.

Data Checks to Perform

Check Missing values

Check Duplicates

Check data type

Check the number of unique values in each column

Check the statistics of the data set

Check the categories present in each categorical column

Check Missing Values

Check every column for missing (null) values.

df.isna().sum()

There are no missing values in the dataset.

Check Duplicates

Check whether the dataset contains any duplicated rows.

df.duplicated().sum()

There are no duplicate values in the dataset.

Check the Data Types

Check the information of the dataset, such as the data types and any null values present.

# check the nulls and dtypes
df.info()

Check the Number of Unique Values in Each Column

df.nunique()

Check Statistics of the Data Set

Examine the summary statistics of the numerical columns.

df.describe()


The numerical data shown above shows that all means are fairly similar to one another, falling between 66 and 68.05.

The range of all standard deviations, between 14.6 and 15.19, is also narrow.

While there is a minimum score of 0 for math, the minimums for writing and reading are substantially higher at 10 and 17, respectively.

There are no duplicate or missing values. The following code lists the categories present in each categorical column.

Exploring Data

print("Categories in 'gender' variable: ", end=" ")
print(df["gender"].unique())

print("Categories in 'race/ethnicity' variable: ", end=" ")
print(df["race/ethnicity"].unique())

print("Categories in 'parental level of education' variable: ", end=" ")
print(df["parental level of education"].unique())

print("Categories in 'lunch' variable: ", end=" ")
print(df["lunch"].unique())

print("Categories in 'test preparation course' variable: ", end=" ")
print(df["test preparation course"].unique())

The code above prints the unique values of each categorical column in a readable form.

We define the numerical and categorical columns:

# define numerical and categorical columns
numeric_features = [feature for feature in df.columns if df[feature].dtype != "object"]
categorical_features = [feature for feature in df.columns if df[feature].dtype == "object"]

print("We have {} numerical features: {}".format(len(numeric_features), numeric_features))
print("We have {} categorical features: {}".format(len(categorical_features), categorical_features))

The code above separates the numerical and categorical features and prints how many of each there are.
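As a small sketch of the same split, `select_dtypes` can pick the columns by kind instead of comparing against `"object"`. The tiny frame below is made up for illustration and only borrows a few column names from the dataset; it is an alternative, not the approach used in the original notebook.

```python
import pandas as pd

# Tiny illustrative frame (made-up rows, same column names as the dataset)
sample = pd.DataFrame({
    "gender": ["female", "male"],
    "lunch": ["standard", "free/reduced"],
    "math score": [72, 47],
    "reading score": [72, 57],
})

# "number" matches every numeric dtype; everything else is treated as categorical
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Selecting on `"number"` keeps working if the text columns are stored with a string dtype rather than `object`.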

Exploring Data (Visualization)

Visualize Average Score Distribution to Make Some Conclusion

Kernel Density Estimate (KDE)

Histogram & KDE

Gender Column

How is gender distributed?

Does gender have any impact on student performance?

# Create a figure with two subplots
f, ax = plt.subplots(1, 2, figsize=(8, 6))

# Create a countplot of the 'gender' column and add labels to the bars
sns.countplot(x=df['gender'], data=df, palette='bright', ax=ax[0], saturation=0.95)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=15)

# Set font size of x-axis and y-axis labels and tick labels
ax[0].set_xlabel('Gender', fontsize=14)
ax[0].set_ylabel('Count', fontsize=14)
ax[0].tick_params(labelsize=14)

# Create a pie chart of the 'gender' column; the labels are taken from
# value_counts() so that each slice is labelled with its own category
gender_counts = df['gender'].value_counts()
plt.pie(x=gender_counts, labels=gender_counts.index, explode=[0, 0.1], autopct='%1.1f%%', shadow=True, colors=['#ff4d4d', '#ff8000'], textprops={'fontsize': 14})

# Display the plot
plt.show()

Gender is balanced: there are 518 female students (52%) and 482 male students (48%).

Race/Ethnicity Column

# Define a color palette for the countplot
# (blue, orange, green, red, and purple, respectively)
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# Create a figure with two subplots
f, ax = plt.subplots(1, 2, figsize=(12, 6))

# Create a countplot of the 'race/ethnicity' column and add labels to the bars
sns.countplot(x=df['race/ethnicity'], data=df, palette=colors, ax=ax[0], saturation=0.95)
for container in ax[0].containers:
    ax[0].bar_label(container, color='black', size=14)

# Set font size of x-axis and y-axis labels and tick labels
ax[0].set_xlabel('Race/Ethnicity', fontsize=14)
ax[0].set_ylabel('Count', fontsize=14)
ax[0].tick_params(labelsize=14)

# Create a dictionary that maps category names to colors in the color palette
color_dict = dict(zip(df['race/ethnicity'].unique(), colors))

# Map the colors to the pie chart slices
pie_colors = [color_dict[race] for race in df['race/ethnicity'].value_counts().index]

# Create a pie chart of the 'race/ethnicity' column and add labels to the slices
plt.pie(x=df['race/ethnicity'].value_counts(), labels=df['race/ethnicity'].value_counts().index, explode=[0.1, 0, 0, 0, 0], autopct='%1.1f%%', shadow=True, colors=pie_colors, textprops={'fontsize': 14})

# Set the aspect ratio of the pie chart to 'equal' to make it a circle
plt.axis('equal')

# Display the plot
plt.show()

Most of the students belong to group C or group D.

The lowest number of students belong to group A.

Parental Level of Education Column

plt.rcParams['figure.figsize'] = (15, 9)
plt.style.use('fivethirtyeight')
sns.histplot(df["parental level of education"], palette='Blues')
plt.title('Comparison of Parental Education', fontweight=30, fontsize=20)
plt.xlabel('Degree')
plt.ylabel('count')
plt.show()

The largest group of parents has some college education.

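Bivariate Analysis

# numeric_only=True restricts the mean to the score columns; recent pandas
# versions raise an error when asked to average the text columns
df.groupby('parental level of education').mean(numeric_only=True).plot(kind='barh', figsize=(10, 10))
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Students whose parents hold a master's or bachelor's degree score higher than the others.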

Maximum Score of Students in All Three Subjects

plt.figure(figsize=(18, 8))

plt.subplot(1, 4, 1)
plt.title('MATH SCORES')
sns.violinplot(y='math score', data=df, color='red', linewidth=3)

plt.subplot(1, 4, 2)
plt.title('READING SCORES')
sns.violinplot(y='reading score', data=df, color='green', linewidth=3)

plt.subplot(1, 4, 3)
plt.title('WRITING SCORES')
sns.violinplot(y='writing score', data=df, color='blue', linewidth=3)

From the three plots above, it is clearly visible that most students score between 60 and 80 in math, whereas in reading and writing most of them score between 50 and 80.

Multivariate Analysis Using Pie Plot

# Set figure size
plt.rcParams['figure.figsize'] = (12, 9)

# First row of pie charts
plt.subplot(2, 3, 1)
size = df['gender'].value_counts()
labels = 'Female', 'Male'
color = ['red', 'green']
plt.pie(size, colors=color, labels=labels, autopct='%.2f%%')
plt.title('Gender', fontsize=20)
plt.axis('off')

plt.subplot(2, 3, 2)
size = df['race/ethnicity'].value_counts()
labels = 'Group C', 'Group D', 'Group B', 'Group E', 'Group A'
color = ['red', 'green', 'blue', 'cyan', 'orange']
plt.pie(size, colors=color, labels=labels, autopct='%.2f%%')
plt.title('Race/Ethnicity', fontsize=20)
plt.axis('off')

plt.subplot(2, 3, 3)
size = df['lunch'].value_counts()
labels = 'Standard', 'Free'
color = ['red', 'green']
plt.pie(size, colors=color, labels=labels, autopct='%.2f%%')
plt.title('Lunch', fontsize=20)
plt.axis('off')

# Second row of pie charts
plt.subplot(2, 3, 4)
size = df['test preparation course'].value_counts()
labels = 'None', 'Completed'
color = ['red', 'green']
plt.pie(size, colors=color, labels=labels, autopct='%.2f%%')
plt.title('Test Course', fontsize=20)
plt.axis('off')

plt.subplot(2, 3, 5)
size = df['parental level of education'].value_counts()
labels = 'Some College', "Associate's Degree", 'High School', 'Some High School', "Bachelor's Degree", "Master's Degree"
color = ['red', 'green', 'blue', 'cyan', 'orange', 'grey']
plt.pie(size, colors=color, labels=labels, autopct='%.2f%%')
plt.title('Parental Education', fontsize=20)
plt.axis('off')

# Remove the extra subplot: there are only 5 pie charts in the 2x3 grid,
# so the 6th position is removed to avoid showing an empty subplot
plt.subplot(2, 3, 6).remove()

# Add super title
plt.suptitle('Comparison of Student Attributes', fontsize=20, fontweight='bold')

# Adjust layout and show plot
plt.tight_layout()
plt.subplots_adjust(top=0.85)
plt.show()

The number of Male and Female students is almost equal.

The number of students is higher in Group C.

The number of students who have standard lunch is greater.

The number of students who have not enrolled in any test preparation course is greater.

The number of students whose parental education is “Some College” is greater followed closely by “Associate’s Degree”.

From the above plot, it is clear that all the scores increase linearly with each other.

Student performance is related to lunch, race, and parental level of education.

Females lead in pass percentage and are also the top scorers.

Student performance is not strongly related to the test preparation course.

Finishing the test preparation course is beneficial.

Model Training

Import Data and Required Packages

Import the regression algorithms and metrics from scikit-learn, CatBoost, and XGBoost.

# Modelling
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import RandomizedSearchCV
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
import warnings

Splitting the X and Y Variables

Separating the dependent variable (y) from the independent variables (X) is one of the most important steps in the project. We use the math score as the dependent variable because many students struggle with math: roughly 60% to 70% of students in classes 7-10 fear the subject, which is why I chose the math score as the target.

The model can be used to help improve math scores, raise students' grades, and reduce the fear of math.

X = df.drop(columns="math score")
y = df["math score"]

Create Column Transformer with 3 Types of Transformers

num_features = X.select_dtypes(exclude="object").columns
cat_features = X.select_dtypes(include="object").columns

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

numeric_transformer = StandardScaler()
oh_transformer = OneHotEncoder()

preprocessor = ColumnTransformer(
    [
        ("OneHotEncoder", oh_transformer, cat_features),
        ("StandardScaler", numeric_transformer, num_features),
    ]
)

X = preprocessor.fit_transform(X)

Separate Dataset into Train and Test

Split the dataset into training and test sets, then check the size of each.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape
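One possible variation, sketched here on a small synthetic frame rather than the real dataset, is to split first and wrap the preprocessing and the model in a scikit-learn `Pipeline`, so that the encoder and scaler are fitted on the training rows only. The frame `toy`, its made-up values, and the names `preprocess` and `pipe` are assumptions for illustration; this is not the workflow used in the article.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the student data (all values are made up)
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "lunch": rng.choice(["standard", "free/reduced"], size=n),
    "reading score": rng.integers(40, 100, size=n),
    "writing score": rng.integers(40, 100, size=n),
})
toy["math score"] = (
    0.5 * toy["reading score"]
    + 0.4 * toy["writing score"]
    + np.where(toy["lunch"] == "standard", 5, 0)
    + rng.normal(0, 3, size=n)
)

X_toy = toy.drop(columns="math score")
y_toy = toy["math score"]

# Split BEFORE fitting any transformer
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["lunch"]),
    ("scale", StandardScaler(), ["reading score", "writing score"]),
])
pipe = Pipeline([("preprocess", preprocess), ("model", LinearRegression())])

pipe.fit(X_tr, y_tr)              # encoder and scaler see the training rows only
test_r2 = pipe.score(X_te, y_te)  # R2 on the held-out rows
print(X_tr.shape, X_te.shape, round(test_r2, 3))
```

Keeping the transformers inside the pipeline also means a single object can later be applied to new, unprocessed rows.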

Create an Evaluate Function for Model Training

This function is used to evaluate each model so that we can compare them and pick a good one.

def evaluate_model(true, predicted):
    mae = mean_absolute_error(true, predicted)
    mse = mean_squared_error(true, predicted)
    rmse = np.sqrt(mean_squared_error(true, predicted))
    r2_square = r2_score(true, predicted)
    return mae, mse, rmse, r2_square
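To make the four metrics concrete, here is a small dependency-free sketch that computes them by hand. The helper name `regression_metrics` and the sample scores are illustrative assumptions, not part of the original project.

```python
import math

def regression_metrics(true, predicted):
    """Return (mae, mse, rmse, r2) computed from first principles."""
    n = len(true)
    errors = [t - p for t, p in zip(true, predicted)]
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mse = sum(e * e for e in errors) / n         # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mean_true = sum(true) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

# Hypothetical actual and predicted math scores
actual = [50, 60, 70, 80]
predicted = [52, 58, 73, 77]

mae, mse, rmse, r2 = regression_metrics(actual, predicted)
print(mae, mse, round(rmse, 4), round(r2, 3))  # 2.5 6.5 2.5495 0.948
```

An R2 of 0.948 here means the predictions explain about 94.8% of the variance in the actual scores, which is how the R2 values reported later in the article should be read.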

Create a models variable in the form of a dictionary.

models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(),
    "K-Neighbors Regressor": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest Regressor": RandomForestRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(),
    "XGBRegressor": XGBRegressor(),
    "CatBoosting Regressor": CatBoostRegressor(verbose=False),
    "AdaBoost Regressor": AdaBoostRegressor()
}
model_list = []
r2_list = []

for i in range(len(list(models))):
    model = list(models.values())[i]
    model.fit(X_train, y_train)  # Train model

    # Make predictions
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)

    # Evaluate Train and Test dataset
    model_train_mae, model_train_mse, model_train_rmse, model_train_r2 = evaluate_model(y_train, y_train_pred)
    model_test_mae, model_test_mse, model_test_rmse, model_test_r2 = evaluate_model(y_test, y_test_pred)

    print(list(models.keys())[i])
    model_list.append(list(models.keys())[i])

    print('Model performance for Training set')
    print("- Root Mean Squared Error: {:.4f}".format(model_train_rmse))
    print("- Mean Squared Error: {:.4f}".format(model_train_mse))
    print("- Mean Absolute Error: {:.4f}".format(model_train_mae))
    print("- R2 Score: {:.4f}".format(model_train_r2))

    print('Model performance for Test set')
    print("- Root Mean Squared Error: {:.4f}".format(model_test_rmse))
    print("- Mean Squared Error: {:.4f}".format(model_test_mse))
    print("- Mean Absolute Error: {:.4f}".format(model_test_mae))
    print("- R2 Score: {:.4f}".format(model_test_r2))
    r2_list.append(model_test_r2)

    print('='*35)
    print('\n')

This prints the results before any hyperparameter tuning: the RMSE, MSE, MAE, and R2 score of every algorithm on the training and test data.

Hyperparameter Tuning

Hyperparameter tuning searches for the settings that give each model its most accurate predictions.

The search returns the hyperparameter values that maximize the model's cross-validated score.
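Grid search evaluates every combination of the listed values with cross-validation, so the size of the grid drives the run time. As a rough sketch, the count for the Random Forest grid used in this section (six `n_estimators` values, three `max_depth` values, `cv=5`) can be worked out with the standard library alone; the variable names are illustrative.

```python
from itertools import product

# Same values as the Random Forest entry of the tuning grid in this section
rf_grid = {
    "n_estimators": [8, 16, 32, 64, 128, 256],
    "max_depth": [3, 5, 7],
}
cv_folds = 5

# Every combination of the listed values is one candidate setting
names = list(rf_grid)
candidates = [dict(zip(names, values)) for values in product(*rf_grid.values())]

n_candidates = len(candidates)
n_fits = n_candidates * cv_folds  # one fit per candidate per fold

print(n_candidates, n_fits)  # 18 90
print(candidates[0])         # {'n_estimators': 8, 'max_depth': 3}
```

With scikit-learn's default `refit=True`, `GridSearchCV` performs one additional fit of the best candidate on the whole training set after the search.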

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import make_scorer

# Define hyperparameter ranges for each model
param_grid = {
    "Linear Regression": {},
    "Lasso": {"alpha": [1]},
    "K-Neighbors Regressor": {"n_neighbors": [3, 5, 7]},
    "Decision Tree": {"max_depth": [3, 5, 7], 'criterion': ['squared_error', 'friedman_mse', 'absolute_error', 'poisson']},
    "Random Forest Regressor": {'n_estimators': [8, 16, 32, 64, 128, 256], "max_depth": [3, 5, 7]},
    "Gradient Boosting": {'learning_rate': [.1, .01, .05, .001], 'subsample': [0.6, 0.7, 0.75, 0.8, 0.85, 0.9], 'n_estimators': [8, 16, 32, 64, 128, 256]},
    # XGBRegressor uses max_depth / n_estimators; 'depth' and 'iterations' are CatBoost names
    "XGBRegressor": {'max_depth': [6, 8, 10], 'learning_rate': [0.01, 0.05, 0.1], 'n_estimators': [30, 50, 100]},
    "CatBoosting Regressor": {"iterations": [100, 500], "depth": [3, 5, 7]},
    "AdaBoost Regressor": {'learning_rate': [.1, .01, 0.5, .001], 'n_estimators': [8, 16, 32, 64, 128, 256]}
}

model_list = []
r2_list = []

for model_name, model in models.items():
    # Create a scorer object to use in grid search
    scorer = make_scorer(r2_score)

    # Perform grid search to find the best hyperparameters
    grid_search = GridSearchCV(
        model,
        param_grid[model_name],
        scoring=scorer,
        cv=5,
        n_jobs=-1
    )

    # Train the model with the best hyperparameters
    # (the rest of the loop body is truncated in the source; the lines below
    # are a minimal reconstruction that mirrors the untuned loop)
    grid_search.fit(X_train, y_train)
    best_model = grid_search.best_estimator_
    y_test_pred = best_model.predict(X_test)
    _, _, _, model_test_r2 = evaluate_model(y_test, y_test_pred)

    print(model_name)
    model_list.append(model_name)
    print("- R2 Score: {:.4f}".format(model_test_r2))
    r2_list.append(model_test_r2)

    print('='*35)
    print('\n')

Outputs

This prints the results after tuning the hyperparameters of all the algorithms: the RMSE, MSE, MAE, and R2 score for the training and test data.

We choose linear regression as the final model because it reaches an R2 score of 87.42% on the training set and 88.03% on the test set.

Model Selection

This step selects the best model among all of the regression algorithms.

Linear regression achieved the highest score of all the regression models, an R2 of 88.03% on the test set, which is why we choose it.

pd.DataFrame(list(zip(model_list, r2_list)), columns=['Model Name', 'R2_Score']).sort_values(by=["R2_Score"], ascending=False)

The R2 score of the model is 88.03%.

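# Refit the selected model and predict on the test set
# (y_pred is not defined anywhere in the code shown so far)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')

sns.regplot(x=y_test, y=y_pred, ci=None, color='red')

Difference Between Actual and Predicted Values

pred_df = pd.DataFrame({'Actual Value': y_test, 'Predicted Value': y_pred, 'Difference': y_test - y_pred})
pred_df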

Convert the Model to Pickle File

# loading library
import pickle

# open the file model_pkl in binary write mode and save the model
with open('model_pkl', 'wb') as files:
    pickle.dump(model, files)

# load saved model
with open('model_pkl', 'rb') as f:
    lr = pickle.load(f)

Conclusion

This brings us to the end of the student performance prediction. Let us review our work. First, we started by defining our problem statement, looking into the algorithms we were going to use and the regression implementation pipeline. Then we moved on to practically implementing regression algorithms like Linear Regression, Lasso, K-Neighbors Regressor, Decision Tree, Random Forest Regressor, XGBRegressor, CatBoosting Regressor, and AdaBoost Regressor. Moving forward, we compared the performances of these models. Lastly, we built a linear regression model, which worked best for this student performance prediction problem.

The key takeaways from this student performance prediction are:

Predicting student performance is important for many institutions.

Linear regression scores better than the other regression models tried here.

Linear regression is the best fit for the problem.

Linear regression reaches an R2 score of 88%, the most accurate result among the models.

I hope you like my article on “Student performance analysis and prediction.” The entire code can be found in my GitHub repository. You can connect with me here on LinkedIn.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 



Conflict And Levels Of Analysis

Conflict arises when two or more parties believe their interests are incompatible, express a hostile attitude, or pursue their interests through actions that harm the other party. Our modern world experiences varying degrees of conflict that involve religious, tribal, national, racial, and sociocultural tendencies. These macro aspects of the conflict are overt or outward expressions of the offended group’s dissatisfaction, designed to harm the other group(s) or to diminish, if not eliminate, existing intergroup relationships.

Once a conflict has moved to the macro level, it becomes too complicated for the parties involved to resolve their differences on their own. Therefore, this article examines effective conflict resolution and management styles from a macro-level perspective.

Levels of Analysis in Conflict Resolution between Individuals

Life is full of conflict. Most conflicts between people occur in the context of human society. The exception is if the two are on a deserted island. Conflict resolution methods can be classified according to the degree to which the surrounding social context determines the outcome.


Morals

Morals are a set of rules for resolving conflicts that almost everyone in society agrees on. Therefore, this is the highest level of conflict resolution.

Politics

Politics is the process of winning over as many people as possible. The general idea is that the more people agree with you, the more likely you will be the winner.

Direct Action

Direct action refers to methods of conflict resolution that either ignore or positively avoid interaction with society outside of the conflicting parties (of course, all members of society can be involved in a major conflict, in which case there is no external intervention). Hence it is the lowest level of conflict resolution.

Levels of Conflict in the Workplace

The levels are as follows.

Intrapersonal

This level relates to an internal dispute and affects only one person. This conflict arises from your thoughts, emotions, ideas, values, and dispositions. For example, this can happen when you struggle between what you “want” to do and what you “should” do.

Interpersonal

This conflict occurs between two or more people in a larger organization. This may result from different personalities or views on goal attainment. Interpersonal conflicts can even arise without either party realizing that the conflict has ever happened.

Intragroup

This conflict occurs between members of the same group when multiple people with different opinions and backgrounds work towards a common goal. While they may all want to achieve the same goal, they have differing views on how to reach it. Conflicts within the group can also arise when team members have different communication styles and personalities.

Intergroup

This conflict occurs between groups within a larger organization or groups with different common goals.

Macro & Micro Level

Conflicts can occur at either the macro or micro level or between the two. Macro refers to large-scale, global events, such as conflicts involving two or more countries. Micro conflicts occur between two people or between small groups. Most peace and conflict studies academics study at either the macro or micro level, but they occasionally try to generalise back and forth between them. However, in recent years, several researchers have focused on the level between macro and micro. Conflict resolution practitioners deal with big groups and organisations and small groups and interpersonal interactions.

Dealing with Conflict at All Levels

Conflict can be constructive in the workplace because it opens up new ideas and perspectives for employees and creates opportunities to find new and unique solutions to problems. Here are a few steps to resolve any level of conflict in the workplace.

Dealing with Intrapersonal Conflict

Intrapersonal conflict can occur daily, but managing it can sharpen your critical thinking and decision-making skills. To deal with intrapersonal conflict −

Follow Your Values: Determine how conflict affects your core values and what is essential to your productivity in the workplace. Then, think of solutions that match your beliefs and motivations.

Check Company Policies: If applicable, check company policies related to the conflict. Then, follow established procedures or ask your manager for direction.

Write down the Conflict: Review the pros and cons of your conflict and anticipate the results of different resolutions. Then, consider choosing a resolution that offers more benefits or better results.

Do not Forget the Time: Think about how much time you have to find a solution. Then, consider setting a deadline to ensure the conflict is resolved quickly.

Interpersonal Conflict Management

Interpersonal conflict management enables team members to work together towards a solution. Colleagues can improve their relationships and even develop entirely new strategies for problem-solving. Here are four steps you can use to resolve interpersonal conflict in the workplace −

Define the Conflict: Begin by identifying exactly what the conflict is, including what caused it and how each party reacted to the situation. Then, look at the situation from each person’s perspective to determine what each party wants and needs in order to resolve it.

Put the Conflict in Context: Discuss the impact of the conflict on each party, project, and workplace. This step can help each side understand the importance of resolving the conflict and motivate them to work together to find a solution.

Create Options: Let each side come up with an idea for solving the conflict, with the sides taking turns. This step allows each side to determine how the conflict can be resolved peacefully. The parties can also brainstorm together to find solutions that benefit all parties.

Agree on a Decision: As a group, find a decision that will benefit each party. Consider including a goal setting in this step to evaluate and measure the progress of the solution.

Managing Intragroup Conflict

Managing intragroup conflict can help keep employees productive and ensure teams achieve group goals. Here are three steps you can take to resolve intragroup conflicts effectively −

Discuss the Conflict as a Team: Openly discuss what caused the conflict and what each side thinks about it. This step ensures that everyone is involved in finding a solution and can discuss the problem honestly. Next, ask each team member to explain why they hold their position and discuss the information behind those beliefs.

Small Group Collaboration: Divide the team into smaller groups with different viewpoints. Analyze the conflict and discuss the pros and cons of different solutions. Then, get together as a team and ask the groups to share their ideas. Smaller groups can allow for a more thorough discussion since fewer people are trying to argue their points simultaneously.

Make a Decision: As a team, decide what course of action to take or determine if further brainstorming is needed. Finally, ensure everyone is happy with the decision and committed to the proposed strategy.

Managing Conflicts between Groups

You can use conflicts between groups to build relationships between teams, generate new creative ideas, and increase employees’ confidence in overcoming future conflicts. Here are three steps to help get you started −

Discuss the Issue with all Relevant Parties: You may engage in conversation with large groups, such as in an open forum. This situation may work for issues that affect a large group of people and can be used to hear various perspectives, ideas and concerns with a smaller group of stakeholders.

Gather a variety of Possible Solutions: Encourage each side to hold meetings to discuss issues as they arise. You can move team members from one team to another so they can better see the issue from the other team’s perspective. Then brainstorm together the solutions that will have the most positive impact. Finally, to come to a decision, you should conduct a poll to assess each party’s interest in the proposed solutions.


Conclusion

Conflicts erupt in all areas of life and can be effectively dealt with by addressing the root cause of the problem and gradually resolving the conflict. Conflicts occur at numerous levels. These levels are interrelated and consistently linked to one another and should not be considered independent entities. Furthermore, conflict and violence can exist on more than one level, and when this occurs, the attempts to address them must be multilayered as well. Conflicts can occur at either the macro or micro level, or in between. Individual, social, international, and planetary levels of conflict can be distinguished from the macro and micro levels of conflict. Another method for analysing conflict levels is to consider the current problem, the relational aspect, the subsystem, and the systemic concerns.

Movie Recommendation And Rating Prediction Using K

Recommendation systems are becoming increasingly important in today’s hectic world. People are always on the lookout for products and services that are best suited to them. Recommendation systems are therefore important because they help people make the right choices without having to expend their cognitive resources. In this blog, we will understand the basics of a kNN recommender system and learn how to build a Movie Recommendation System using collaborative filtering by implementing the K-Nearest Neighbors algorithm. We will also predict the rating of a given movie based on its neighbors and compare it with the actual rating.
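As a rough sketch of the rating-prediction idea, the snippet below averages the ratings of the k movies closest to a target movie. The feature vectors (binary genre flags), the Euclidean distance, the titles, and the ratings are all illustrative assumptions, not the features or distance measure this blog goes on to use.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_rating(target, movies, k=3):
    """Average the ratings of the k movies closest to `target`."""
    nearest = sorted(movies, key=lambda m: euclidean(target, m["features"]))[:k]
    rating = sum(m["rating"] for m in nearest) / k
    return rating, [m["title"] for m in nearest]

# Hypothetical catalog: features are binary flags for (action, comedy, drama)
catalog = [
    {"title": "A", "features": [1, 0, 0], "rating": 7.0},
    {"title": "B", "features": [1, 0, 1], "rating": 8.0},
    {"title": "C", "features": [0, 1, 0], "rating": 5.0},
    {"title": "D", "features": [1, 1, 0], "rating": 6.0},
    {"title": "E", "features": [0, 0, 1], "rating": 9.0},
]

predicted, neighbours = predict_rating([1, 0, 0], catalog, k=3)
print(predicted, neighbours)  # 7.0 ['A', 'B', 'D']
```

The predicted value can then be compared with the movie's actual rating to judge how well the neighbours describe it.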

Types of kNN Recommender System

Recommendation systems can be broadly classified into 3 types —

Collaborative Filtering

Content-Based Filtering

Hybrid Recommendation Systems

Collaborative Filtering

Further, there are several types of collaborative filtering algorithms —

User-User Collaborative Filtering: Try to search for lookalike customers and offer products based on what his/her lookalike has chosen.

Item-Item Collaborative Filtering: It is very similar to the previous algorithm, but instead of finding a customer lookalike, we try finding item lookalike. Once we have an item lookalike matrix, we can easily recommend alike items to a customer who has purchased an item from the store.

Other algorithms: There are other approaches like market basket analysis, which works by looking for combinations of items that occur together frequently in transactions.
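The item-item idea can be sketched with a tiny, made-up rating table and cosine similarity between item columns. The user names, item names, and ratings are hypothetical, and treating an unrated item as 0 is a simplification chosen for brevity.

```python
import math

# Hypothetical ratings: one row per user, 0 means "not rated"
ratings = {
    "alice": {"item1": 5, "item2": 4, "item3": 0},
    "bob":   {"item1": 4, "item2": 5, "item3": 1},
    "carol": {"item1": 1, "item2": 0, "item3": 5},
}

def item_vector(item):
    """Ratings given to `item`, in a fixed user order."""
    return [ratings[user][item] for user in sorted(ratings)]

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

items = ["item1", "item2", "item3"]

# Item lookalike matrix, stored as a dict keyed by (item, other item)
similarity = {
    (i, j): cosine(item_vector(i), item_vector(j))
    for i in items for j in items if i != j
}

# Item most similar to item1: the candidate to recommend to item1 buyers
most_similar_to_item1 = max(
    (j for j in items if j != "item1"),
    key=lambda j: similarity[("item1", j)],
)
print(most_similar_to_item1, round(similarity[("item1", "item2")], 3))
```

User-user filtering follows the same pattern with the roles swapped: the vectors are the rows (users) instead of the columns (items).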

Collaborative v/s Content-based filtering illustration

Content-based Filtering

These filtering methods are based on the description of an item and a profile of the user’s preferred choices. In a content-based recommendation system, keywords are used to describe the items, besides, a user profile is built to state the type of item this user likes. In other words, the algorithms try to recommend products that are similar to the ones that a user has liked in the past.
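A minimal sketch of this idea, assuming items are described by keyword sets and similarity is measured with the Jaccard index (one common choice among several). The titles and keywords are invented for illustration.

```python
def jaccard(a, b):
    """Share of keywords two sets have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical item descriptions as keyword sets
item_keywords = {
    "Space Saga": {"space", "adventure", "aliens"},
    "Star Comedy": {"space", "comedy", "aliens"},
    "Court Drama": {"law", "drama"},
}

# User profile: the keywords of everything the user liked in the past
liked = ["Space Saga"]
profile = set().union(*(item_keywords[title] for title in liked))

# Score every item the user has not seen against the profile
scores = {
    title: jaccard(profile, keywords)
    for title, keywords in item_keywords.items()
    if title not in liked
}
recommendation = max(scores, key=scores.get)
print(recommendation, scores)
```

Here "Star Comedy" shares two of four distinct keywords with the profile (score 0.5), while "Court Drama" shares none, so the former is recommended.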

Hybrid Recommendation Systems

Hybrid Recommendation block diagram

Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering could be more effective in some cases. Hybrid approaches can be implemented in several ways, by making content-based and collaborative-based predictions separately and then combining them, by adding content-based capabilities to a collaborative-based approach (and vice versa), or by unifying the approaches into one model.

Netflix is a good example of the use of hybrid recommender systems. The website makes recommendations by comparing the watching and searching habits of similar users (i.e. collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).
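The first hybrid strategy mentioned above, making the two predictions separately and then combining them, can be sketched as a weighted average. The scores, the movie ids, and the 50/50 weight are assumptions for illustration; this is not a description of how Netflix or any other service actually blends its models.

```python
def hybrid_score(collab, content, weight=0.5):
    """Blend two scores in [0, 1]; `weight` is the share given to collaborative filtering."""
    return weight * collab + (1 - weight) * content

# Hypothetical per-movie scores from two separate recommenders
collab_scores = {"m1": 0.9, "m2": 0.4, "m3": 0.6}
content_scores = {"m1": 0.2, "m2": 0.8, "m3": 0.7}

blended = {
    movie: hybrid_score(collab_scores[movie], content_scores[movie], weight=0.5)
    for movie in collab_scores
}

# Highest blended score first
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)  # ['m3', 'm2', 'm1']
```

Neither recommender ranks m3 first on its own, but it comes out on top once both signals are taken into account.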

Now that we’ve got a basic intuition of Recommendation Systems, let’s start with building a simple Movie Recommendation System in Python.

Find the Python notebook with the entire code along with the dataset and all the illustrations here.

TMDb — The Movie Database

The Movie Database (TMDb) is a community-built movie and TV database which has extensive data about movies and TV shows.

For simplicity and easy computation, I have used a subset of this huge dataset which is the TMDb 5000 dataset. It has information about 5000 movies, split into 2 CSV files.

tmdb_5000_movies.csv: Contains information like the score, title, date_of_release, genres, etc.

tmdb_5000_credits.csv: Contains information of the cast and crew for each movie.

The link to the Dataset is here.

Step 1 — Import the dataset

Import the required Python libraries like Pandas, Numpy, Seaborn, and Matplotlib. Then import the CSV files using the read_csv() function from Pandas.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

movies = pd.read_csv('../input/tmdb-movie-metadata/tmdb_5000_movies.csv')
credits = pd.read_csv('../input/tmdb-movie-metadata/tmdb_5000_credits.csv')

Step 2 — Data Exploration and Cleaning

We will initially use the head() and describe() functions to view the values and structure of the dataset, and then move ahead with cleaning the data.
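As a quick illustration of what these calls return, here is a minimal sketch on a tiny stand-in dataframe. The column names and values in toy_movies are made up for the example and are not taken from the TMDb files.

```python
import pandas as pd

# Hypothetical stand-in for the movies dataframe (values are illustrative only)
toy_movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "vote_average": [7.2, 6.1, 8.0],
    "budget": [100, 250, 40],
})

print(toy_movies.head(2))     # first two rows
print(toy_movies.describe())  # summary statistics for the numeric columns
print(toy_movies.shape)       # (number of rows, number of columns)
```

On the real data the same three calls work unchanged on movies and credits.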



Similarly, we can get an intuition of the credits dataframe and get an output as follows —


Checking the dataset, we can see that genres, keywords, production_companies, production_countries, spoken_languages are in the JSON format. Similarly in the other CSV file, cast and crew are in the JSON format. Now let’s convert these columns into a format that can be easily read and interpreted. We will convert them into strings and later convert them into lists for easier interpretation.

The JSON format is like a dictionary (key: value) pair embedded in a string. Generally, parsing such data is computationally expensive and time-consuming. Luckily, this dataset doesn’t have a complicated structure. A basic similarity between the columns is that they have a name key, which contains the values that we need to collect. The easiest way to do so is to parse the JSON and check for the name key on each row. Once the name key is found, store its value in a list and replace the JSON with the list.

import json

# changing the genres column from json to string
movies['genres'] = movies['genres'].apply(json.loads)  # parse each JSON string into a list of dicts
for index, i in zip(movies.index, movies['genres']):
    list1 = []
    for j in range(len(i)):
        list1.append(i[j]['name'])  # the key 'name' contains the name of the genre
    movies.loc[index, 'genres'] = str(list1)
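The same idea can be tried in isolation. The sketch below is a self-contained version on a toy column, assuming each cell is a JSON-encoded list of objects with a name key; the helper extract_names and the sample rows are illustrative and are not part of the notebook.

```python
import json
import pandas as pd

def extract_names(json_text):
    """Return the 'name' value of every object in a JSON-encoded list."""
    return [item["name"] for item in json.loads(json_text)]

# Toy rows shaped like the TMDb 'genres' column (ids are illustrative)
toy = pd.DataFrame({
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 18, "name": "Drama"}]',
        '[]',
    ]
})

toy["genres"] = toy["genres"].apply(extract_names)
print(toy["genres"].tolist())
```

Keeping the result as a real Python list, rather than converting it back to a string, avoids the later strip/replace/split round trip, at the cost of diverging from the steps shown in this article.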

In a similar fashion, we will convert the JSON to a list of strings for the columns: keywords, production_companies, cast, and crew. We will check if all the required JSON columns have been converted to strings using movies.iloc[index]

Details of the movie at index 25

Step 3 — Merge the 2 CSV files

We will merge the movies and credits dataframes and select the columns which are required and have a unified movies dataframe to work on.

movies = movies.merge(credits, left_on='id', right_on='movie_id', how='left')
# the 'director' column is assumed to have been derived from the crew column beforehand
movies = movies[['id', 'original_title', 'genres', 'cast', 'vote_average', 'director', 'keywords']]
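To see what a left merge on differently named keys does, here is a minimal sketch with two toy frames. The frames and their values are invented for illustration; only the merge() arguments mirror the call above.

```python
import pandas as pd

# Toy frames: movie 3 has no matching row in the credits frame
toy_movies = pd.DataFrame({"id": [1, 2, 3], "original_title": ["A", "B", "C"]})
toy_credits = pd.DataFrame({"movie_id": [1, 2], "cast": [["x"], ["y"]]})

# how='left' keeps every movie; missing credits become NaN
merged = toy_movies.merge(toy_credits, left_on="id", right_on="movie_id", how="left")
print(merged[["id", "original_title", "cast"]])
```

With how='left', no movie is dropped even if its credits are missing, which is why rows with empty values have to be filtered out later.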

We can check the size and attributes of movies like this —

  Step 4 — Working with the Genres column

We will clean the genre column to find the genre_list

movies['genres'] = movies['genres'].str.strip('[]').str.replace(' ','').str.replace("'",'')
movies['genres'] = movies['genres'].str.split(',')

Let’s plot the genres in terms of their occurrence to get an insight into movie genres in terms of popularity.

plt.subplots(figsize=(12,10))
list1 = []
for i in movies['genres']:
    list1.extend(i)
ax = pd.Series(list1).value_counts()[:10].sort_values(ascending=True).plot.barh(width=0.9,color=sns.color_palette('hls',10))
for i, v in enumerate(pd.Series(list1).value_counts()[:10].sort_values(ascending=True).values):
    ax.text(.8, i, v,fontsize=12,color='white',weight='bold')
plt.title('Top Genres')

Drama appears to be the most popular genre, followed by Comedy.
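If you only need the counts and not the chart, the standard library can produce the same ranking. This is a small sketch on made-up genre lists, not the TMDb data:

```python
from collections import Counter

# Illustrative per-movie genre lists
genre_lists = [["Drama", "Comedy"], ["Drama"], ["Action", "Drama"], ["Comedy"]]

# Flatten the lists and count how often each genre occurs
counts = Counter(genre for genres in genre_lists for genre in genres)
top_two = counts.most_common(2)
print(top_two)
```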

Now let’s generate a list ‘genreList’ with all possible unique genres mentioned in the dataset.

genreList = []
for index, row in movies.iterrows():
    genres = row["genres"]
    for genre in genres:
        if genre not in genreList:
            genreList.append(genre)
genreList[:10] # now we have a list with unique genres

Unique genres
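The nested loop works fine at this scale. As an alternative sketch (not the notebook's code), dict.fromkeys drops duplicates while keeping first-seen order; the toy lists below are made up.

```python
from itertools import chain

# Illustrative per-movie genre lists
genre_lists = [["Action", "Adventure"], ["Drama"], ["Action", "Drama", "Comedy"]]

# dict keys are unique and preserve insertion order
unique_genres = list(dict.fromkeys(chain.from_iterable(genre_lists)))
print(unique_genres)
```

Preserving order matters here because the position of each genre in the list later becomes its slot in the binary vector.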

One Hot Encoding for Multiple Labels

‘genreList’ now holds all the genres. But how do we know which genres each movie falls into? Some movies will be ‘Action’, some will be ‘Action, Adventure’, etc. We need to classify the movies according to their genres.

Let’s create a new column in the dataframe that holds binary values indicating whether each genre is present. First, let’s create a function that returns a list of binary values for the genres of each movie. The ‘genreList’ will be useful now to compare against the values.

Let’s say for example we have 20 unique genres in the list. Thus the below function will return a list with 20 elements, which will be either 0 or 1. Now for example we have a Movie which has genre = ‘Action’, then the new column will hold [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].

Similarly for ‘Action, Adventure’ we will have, [1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]. Converting the genres into such a list of binary values will help in easily classifying the movies by their genres.

def binary(genre_list):
    binaryList = []
    for genre in genreList:
        if genre in genre_list:
            binaryList.append(1)
        else:
            binaryList.append(0)
    return binaryList

Applying the binary() function to the ‘genres’ column to get ‘genres_bin’

We will follow the same notations for other features like the cast, director, and the keywords.

movies['genres_bin'] = movies['genres'].apply(lambda x: binary(x))
movies['genres_bin'].head()
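Because the same encoding is needed for cast, director, and keywords, it can help to pass the vocabulary in explicitly instead of relying on a global list. The following is a minimal sketch of that idea; multi_hot and the four-genre vocabulary are hypothetical names and values used only for illustration.

```python
def multi_hot(labels, vocabulary):
    """Return a 0/1 list with one slot per vocabulary entry."""
    present = set(labels)
    return [1 if term in present else 0 for term in vocabulary]

# Illustrative vocabulary, not the full genre list from the dataset
vocabulary = ["Action", "Adventure", "Comedy", "Drama"]

action_only = multi_hot(["Action"], vocabulary)
action_adventure = multi_hot(["Adventure", "Action"], vocabulary)
print(action_only)
print(action_adventure)
```

The order of labels in the input does not matter; only the order of the vocabulary decides which slot is set.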

genres_bin column values

Step 5 — Working with the Cast column

Let’s plot a graph of Actors with Highest Appearances

plt.subplots(figsize=(12,10))
list1 = []
for i in movies['cast']:
    list1.extend(i)
ax = pd.Series(list1).value_counts()[:15].sort_values(ascending=True).plot.barh(width=0.9,color=sns.color_palette('muted',40))
for i, v in enumerate(pd.Series(list1).value_counts()[:15].sort_values(ascending=True).values):
    ax.text(.8, i, v,fontsize=10,color='white',weight='bold')
plt.title('Actors with highest appearance')

Samuel L. Jackson, aka Nick Fury from Avengers, has appeared in the most movies. I initially thought that Morgan Freeman might be the actor with the most movies, but data wins over assumptions!

When I initially created the list of all the cast, it had around 50k unique values, as many movies have entries for about 15–20 actors. But do we need all of them? The answer is no. We just need the actors who have the highest contribution to the movie. For example, The Dark Knight franchise has many actors involved in each movie, but we will select only the main actors like Christian Bale, Michael Caine, and Heath Ledger. I have selected the main 4 actors from each movie.

One question that may arise in your mind is how to determine the importance of an actor in the movie. Luckily, the actors in the JSON are listed in order of their contribution to the movie.

Let’s see how we do that and create a column ‘cast_bin’

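# keep only the first 4 (top-billed) actors of each movie
for i,j in zip(movies['cast'],movies.index):
    list2 = []
    list2 = i[:4]
    movies.loc[j,'cast'] = str(list2)
movies['cast'] = movies['cast'].str.strip('[]').str.replace(' ','').str.replace("'",'')
movies['cast'] = movies['cast'].str.split(',')
# sort the names so the order is consistent across movies
for i,j in zip(movies['cast'],movies.index):
    list2 = []
    list2 = i
    list2.sort()
    movies.loc[j,'cast'] = str(list2)
movies['cast'] = movies['cast'].str.strip('[]').str.replace(' ','').str.replace("'",'')
movies['cast'] = movies['cast'].str.split(',')  # back to a list so the next loop iterates over names, not characters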

castList = []
for index, row in movies.iterrows():
    cast = row["cast"]
    for i in cast:
        if i not in castList:
            castList.append(i)

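# binary() above compares against genreList, so build the vector against castList here
movies['cast_bin'] = movies['cast'].apply(lambda x: [1 if name in x else 0 for name in castList])
movies['cast_bin'].head()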

cast_bin column values

Step 6 — Working with the Directors column

Let’s plot Directors with maximum movies

def xstr(s):
    if s is None:
        return ''
    return str(s)

movies['director'] = movies['director'].apply(xstr)

plt.subplots(figsize=(12,10))
ax = movies[movies['director']!=''].director.value_counts()[:10].sort_values(ascending=True).plot.barh(width=0.9,color=sns.color_palette('muted',40))
for i, v in enumerate(movies[movies['director']!=''].director.value_counts()[:10].sort_values(ascending=True).values):
    ax.text(.5, i, v,fontsize=12,color='white',weight='bold')
plt.title('Directors with highest movies')

We create a new column ‘director_bin’ as we have done earlier

directorList = []
for i in movies['director']:
    if i not in directorList:
        directorList.append(i)

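# one slot per director in directorList; each movie has a single director string
movies['director_bin'] = movies['director'].apply(lambda x: [1 if name == x else 0 for name in directorList])
movies.head()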

So finally, after all this work we get the movies dataset as follows

Movies dataframe after One Hot Encoding

Step 7 — Working with the Keywords column

The keywords or tags contain a lot of information about the movie, and they are a key feature in finding similar movies. For example, movies like “Avengers” and “Ant-Man” may have common keywords like superheroes or Marvel.

For analyzing keywords, we will try something different and plot a word cloud to get a better intuition:

from wordcloud import WordCloud, STOPWORDS
import nltk
from nltk.corpus import stopwords

plt.subplots(figsize=(12,12))
stop_words = set(stopwords.words('english'))
stop_words.update(',',';','!','?','.','(',')','

Above is a word cloud showing the major keywords or tags used for describing the movies.

We find ‘words_bin’ from Keywords as follows —

movies['keywords'] = movies['keywords'].str.strip('[]').str.replace(' ','').str.replace("'",'').str.replace('"','')
movies['keywords'] = movies['keywords'].str.split(',')
for i,j in zip(movies['keywords'],movies.index):
    list2 = []
    list2 = i
    movies.loc[j,'keywords'] = str(list2)
movies['keywords'] = movies['keywords'].str.strip('[]').str.replace(' ','').str.replace("'",'')
movies['keywords'] = movies['keywords'].str.split(',')
for i,j in zip(movies['keywords'],movies.index):
    list2 = []
    list2 = i
    list2.sort()
    movies.loc[j,'keywords'] = str(list2)
movies['keywords'] = movies['keywords'].str.strip('[]').str.replace(' ','').str.replace("'",'')
movies['keywords'] = movies['keywords'].str.split(',')

words_list = []
for index, row in movies.iterrows():
    genres = row["keywords"]
    for genre in genres:
        if genre not in words_list:
            words_list.append(genre)

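# binary() above compares against genreList, so build the vector against words_list here
movies['words_bin'] = movies['keywords'].apply(lambda x: [1 if word in x else 0 for word in words_list])
# removing the movies with 0 score and without director names
movies = movies[movies['vote_average'] != 0]
movies = movies[movies['director'] != '']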

Step 8 — Similarity between movies

We will be using Cosine Similarity for finding the similarity between 2 movies. How does cosine similarity work?

Let’s say we have 2 vectors. If the vectors are parallel, i.e. the angle between them is 0°, then we can say that they are “similar”, as cos(0°) = 1. Whereas if the vectors are orthogonal, then we can say that they are independent or NOT “similar”, as cos(90°) = 0.
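To make this concrete, here is a minimal sketch that computes cosine similarity by hand with NumPy on three toy genre vectors. The function name cosine_similarity and the vectors are illustrative; it assumes neither vector is all zeros, since that would divide by zero.

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|); assumes neither vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy binary genre vectors
action = [1, 0, 0, 0]
action_adventure = [1, 1, 0, 0]
drama = [0, 0, 0, 1]

same = cosine_similarity(action, action)                # identical vectors
partial = cosine_similarity(action, action_adventure)   # one shared genre out of two
unrelated = cosine_similarity(action, drama)            # no shared genre
print(same, partial, unrelated)
```

Note that scipy's spatial.distance.cosine returns a distance, i.e. 1 minus this similarity, so identical vectors give 0 and orthogonal vectors give 1.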

Recommendation System using K-Nearest Neighbors – Cosine Similarity

For a more detailed study, follow this link.

Below I have defined a function Similarity, which will check the similarity between the movies.

from scipy import spatial

def Similarity(movieId1, movieId2):
    a = movies.iloc[movieId1]
    b = movies.iloc[movieId2]

    genresA = a['genres_bin']
    genresB = b['genres_bin']
    genreDistance = spatial.distance.cosine(genresA, genresB)

    scoreA = a['cast_bin']
    scoreB = b['cast_bin']
    scoreDistance = spatial.distance.cosine(scoreA, scoreB)

    directA = a['director_bin']
    directB = b['director_bin']
    directDistance = spatial.distance.cosine(directA, directB)

    wordsA = a['words_bin']
    wordsB = b['words_bin']
    wordsDistance = spatial.distance.cosine(wordsA, wordsB)

    # each cosine distance lies between 0 and 1 for binary vectors, so the total lies between 0 and 4
    return genreDistance + directDistance + scoreDistance + wordsDistance

Let’s check the Similarity between 2 random movies

We see that the distance is about 2.068, which is high. The greater the distance, the less similar the movies are. Let’s see what these random movies actually were.

It is evident that The Dark Knight Rises and How to Train Your Dragon 2 are very different movies. Thus the distance is huge.

Step 9 — Score Predictor (the final step!)

Now that we have everything in place, we will build the score predictor. The main function working under the hood will be the Similarity() function, which calculates the distance between movies so that we can find the 10 most similar ones. These 10 movies will help in predicting the score for our desired movie: we take the average of the scores of the similar movies and use it as the predicted score.

Now the similarity between the movies will depend on our newly created columns containing binary lists. We know that features like the director or the cast will play a very important role in the movie’s success. We always assume that movies from David Fincher or Chris Nolan will fare very well. Also if they work with their favorite actors, who always fetch them success and also work on their favorite genres, then the chances of success are even higher. Using these phenomena, let’s try building our score predictor.

import operator

# Assumption: 'new_id' holds each row's position in movies (it is not created in the steps shown above)
movies['new_id'] = range(len(movies))

def predict_score():
    name = input('Enter a movie title: ')
    new_movie = movies[movies['original_title'].str.contains(name)].iloc[0].to_frame().T
    print('Selected Movie: ', new_movie.original_title.values[0])

    def getNeighbors(baseMovie, K):
        distances = []
        for index, movie in movies.iterrows():
            if movie['new_id'] != baseMovie['new_id'].values[0]:
                dist = Similarity(baseMovie['new_id'].values[0], movie['new_id'])
                distances.append((movie['new_id'], dist))
        distances.sort(key=operator.itemgetter(1))
        neighbors = []
        for x in range(K):
            neighbors.append(distances[x])
        return neighbors

    K = 10
    avgRating = 0
    neighbors = getNeighbors(new_movie, K)

    print('\nRecommended Movies: \n')
    for neighbor in neighbors:
        # look the rating up by column name rather than by position
        avgRating = avgRating + movies.iloc[neighbor[0]]['vote_average']
    print('\n')
    avgRating = avgRating / K
    print('The predicted rating for %s is: %f' % (new_movie['original_title'].values[0], avgRating))
    print('The actual rating for %s is %f' % (new_movie['original_title'].values[0], new_movie['vote_average'].values[0]))

Now simply run the function as follows and enter a movie you like to find 10 similar movies and its predicted rating.


Recommendation System using K-Nearest Neighbors: Predict Scores
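The neighbor-averaging idea can also be sketched end to end on a toy catalogue, without the dataframe. Everything below is hypothetical: the four films, their two feature vectors (genres and director), and their ratings are invented, and only two of the four distances are used to keep the example short.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical catalogue: (title, genre vector, director vector, rating)
catalogue = [
    ("Film A", [1, 1, 0], [1, 0], 8.0),
    ("Film B", [1, 0, 0], [1, 0], 7.0),
    ("Film C", [0, 0, 1], [0, 1], 5.0),
    ("Film D", [1, 1, 0], [0, 1], 6.0),
]

def total_distance(a, b):
    # Sum of per-feature distances, mirroring the structure of Similarity()
    return cosine_distance(a[1], b[1]) + cosine_distance(a[2], b[2])

def predict_rating(target, k):
    # Rank every other film by distance and average the k closest ratings
    others = [m for m in catalogue if m[0] != target[0]]
    neighbors = sorted(others, key=lambda m: total_distance(target, m))[:k]
    return neighbors, sum(m[3] for m in neighbors) / k

neighbors, predicted = predict_rating(catalogue[0], k=2)
print([m[0] for m in neighbors], predicted)
```

Film B shares the director and one genre with Film A, and Film D shares both genres, so they rank ahead of Film C, which shares nothing.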

Thus we have completed the Movie Recommendation System implementation using the K-Nearest Neighbors algorithm.

Sidenote — K Value

In this project, we have arbitrarily chosen the value K=10.

But in other applications of KNN, finding the value of K is not easy. A small value of K means that noise will have a higher influence on the result, and a large value makes it computationally expensive. Data scientists usually choose an odd K if the number of classes is 2; another simple approach is to set K = sqrt(n), where n is the number of samples.
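Both rules of thumb fit in a few lines. This is only a sketch of the heuristic: suggest_k is a hypothetical helper, and a value chosen this way is a starting point to validate, not a guaranteed optimum.

```python
import math

def suggest_k(n_samples):
    """Rule-of-thumb K: round sqrt(n), then move to the next odd number if needed."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

print(suggest_k(100))   # sqrt(100) = 10, bumped to the next odd number
print(suggest_k(5000))  # roughly the size of the dataset used here
```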


Ready to Use kNN Recommender System?

The kNN algorithm is a reliable and intuitive basis for a recommendation system: it leverages user or item similarity to provide personalized recommendations. kNN recommender systems are helpful in e-commerce, social media, and healthcare, and continue to be an important tool for generating accurate and personalized recommendations.


Apple M2 Ultra Performance, Features, And Release Date

At WWDC23, Apple unveiled the M2 Ultra – a monster of a chip that completes the M2 family. According to the tech giant, M2 Ultra is the largest and most capable chip they have ever created, making the new Mac Studio and Mac Pro the most powerful Mac desktops ever made.

I am sure all such talk about this powerhouse has got you intrigued. To provide you with all the nitty-gritty, in this article, I will take a closer look at the M2 Ultra and discuss its performance, features, pricing, and more.

Powerful build with UltraFusion Technology

The M2 Ultra features the same UltraFusion engineering as the M1 Ultra. Built on a second-generation 5-nanometer process, the chip uses a silicon interposer with over 10,000 signals to link the dies of two M2 Max chips.

So, the M2 Ultra appears as a single chip and offers 2.5TB/s of low-latency interprocessor bandwidth, giving you the performance of two M2 Max chips. It packs a total of 134 billion transistors, 20 billion more than the M1 Ultra.

CPU performance

M2 Ultra is power packed with a 24-core CPU consisting of 16 next-generation high-performance cores and 8 high-efficiency cores. The high-performance cores are designed for demanding tasks such as video editing and 3D rendering, and the CPU outperforms M1 Ultra by up to 20%.

In practice, if you use DaVinci Resolve on Mac Studio with M2 Ultra, you can get up to 50% faster video processing than on Mac Studio with M1 Ultra.

GPU improvements

You can choose between two GPU configurations, a 60-core or a 76-core GPU, each with twice as many cores as the M2 Max. This results in a massive performance boost for graphics-intensive gaming and video editing tasks, and it fits Apple’s push to boost gaming on the Mac.

This next-generation architecture offers up to a 30 percent improvement compared to the mighty GPU of the M1 Ultra. In practice, you can render 3D effects using Octane on Mac Studio with M2 Ultra 3x faster than Mac Studio with M1 Ultra.

Ultimate Unified Memory

Along with upgrading the physical architecture, Apple has extended its unified memory architecture for incredible bandwidth, low latency, and unmatched power efficiency. M2 Ultra features 800GB/s of system memory bandwidth and can be configured with a massive 192GB of unified memory. Let that sink in!

So, you can now train large machine-learning models in a single system. It’s a revolution, considering the fact that even the most potent of PCs out there can’t process this kind of workload as effortlessly.

M2 Ultra is the most powerful chip Apple has ever created. And to set a new standard for performance in the Mac lineup, Apple has pushed the boundaries of machine learning capabilities and the latest custom technologies. 

M2 Ultra has a 32-core Neural Engine, delivering 31.6 trillion operations per second. This makes it 40% faster than its predecessor, M1 Ultra, so it is ideal for image recognition and natural language processing tasks. Besides, the media engine is twice as powerful as that of M2 Max, accelerating video processing.

Moreover, you can play up to 22 streams of 8K ProRes video and enjoy 8K resolution with refresh rates of up to 240Hz. This is possible because of dedicated, hardware-enabled H.264, HEVC, and ProRes encode and decode capabilities. You can connect up to six Pro Display XDRs to further expand your workspace.

The M2 Ultra’s display engine supports driving more than 100 million pixels. Besides, in keeping with Apple’s usual priority on high-end security, it incorporates the latest Secure Enclave along with hardware-verified secure boot and runtime anti-exploitation technologies.

With M2 Ultra, Apple aims to revolutionize the computing experience for professionals and power users across the globe. You can experience the power of this SoC with the new Mac Studio and Mac Pro. The Mac Studio starts at $1,999, and the Mac Pro starts at $6,999.

Get the beast in your workspace!

So, it will surely be a game-changer for the Mac lineup and a popular choice for creative professionals and power users. And Apple has once again raised the bar, delivering an unparalleled user experience and setting new standards for performance and efficiency. 


Author Profile


Ava is an enthusiastic consumer tech writer coming from a technical background. She loves to explore and research new Apple products & accessories and help readers easily decode the tech. Along with studying, her weekend plan includes binge-watching anime.

Boldly Bring Them Back: Interventions For Student Reengagement And Dropout Prevention

Schools can work to inspire confidence in students who have fallen behind by offering consistent support, starting in the summer.

School administrators across the nation have seen a lack of student engagement during Covid-19. The school structure during the pandemic has made it easier for students to disengage from school, leading to random pop-ins on Google Meets and sporadic responses to emails. This lack of engagement and focus places students at risk of dropping out of high school. 

To prevent this, school administrators can seek out opportunities to intervene and reengage those students. Fundamentally, students want and need to feel connected and cared for. When schools uphold their responsibility to provide social and emotional support, the consistent and intentional connection with students can support engagement and prevent dropouts. Summer and early fall are good times to implement practical interventions to help disengaged students before the issue progresses.

Mobile Mondays 

Visiting students’ homes during the summer is a meaningful tool for reengagement. This can be done as often as a district likes, with a specifically designated team of administrators, teachers, and support staff.

Try visiting homes every Monday during the summer. This should not be done in the style of a typical truancy home visit—show care and interest in the student’s well-being to identify and reduce barriers to their engagement. When done with care and intention, the visit shows that the school is committed to the student and welcomes them back.

This is a great opportunity to get to know the student better. Call ahead and schedule the home visit according to the student’s and family’s availability. It’s perfectly acceptable to visit in the front yard and socially distance during the pandemic.

R&R: Reengagement at Registration

Registration for the new school year is also a great time for reengagement. Use previously collected data to create a list of students who were disengaged during the previous semester or academic year. When the student arrives at the registration event, follow up with them. If your district or school does not have a formal registration process, invite families to come to the school before the school year begins. The location of where the following steps occur is flexible and can be adjusted to fit your school’s needs.

1. Hold a conference. Choose a private space for an administrator, teacher, or support staff to hold the conference. This provides an opportunity to create a relationship with the student and their parents.

2. Collect information. Provide a brief form asking questions to gather information about the reason for the student’s disengagement and identify their needs. Providing a list of possible options (e.g., family issues, mental health, employment) can be helpful. Schools can choose to address those needs as a preventive measure.

3. Provide a list of resources. The reengagement process includes sharing information with students about services that the school offers for support and school credit recovery. Social and emotional services are essential for this population of students. The list can include local agencies that partner with the district.

4. Create a check-in system. Every student benefits from having at least one supportive adult at their school. They can go to that adult throughout the school year for assistance as needed. That adult (or the school) can send ongoing motivational and encouraging emails or texts to the student. If a student can’t think of anyone on their own, the school can identify a teacher or support staff member who is available to help. The hope is that the adult and student will develop a strong rapport as they spend time checking in together.

5. Determine the student’s level of confidence. Students who lack self-efficacy need support to build their confidence and reduce self-inflicted barriers. There are plenty of self-efficacy measures available online, but simply asking students to rate their own level of confidence about whether they think that they can pass all their classes is acceptable (e.g., on a Likert scale: very confident, somewhat confident, not confident). 

Transition Track 

Accepting that some students won’t have the capacity to attend full days of school for five days per week is a realistic perspective. It may sound counterintuitive to reduce the instructional time for students who already missed so much, but some students might need to slowly transition back into the full-time school structure.

Students are unlikely to just bounce back from being disengaged for a year or two. They might feel overwhelmed by pressure to catch up with their peers. Discovering what works for the nontraditional student is key. A half-day program or another form of an abbreviated schedule might be more appropriate. Check with your state board of education in regard to seat-time and attendance policies. Some state boards issue waivers that allow students to use competency-based education in place of seat time.

If reducing the school day isn’t an option, try starting the school year earlier. Students who were disengaged can transition back to school a week or two before the rest of the student body. The transition time could focus on executive functioning skills and social and emotional learning. A program focused on these areas helps reduce some of the anxiety connected to attending school and falling behind.

Your school can create a path for disengaged students to succeed.

Student Cyclist Killed On Comm Ave

COM grad student was talented photojournalist

A BU student cyclist was killed Thursday morning in a collision with a tractor-trailer at Commonwealth Avenue and St. Paul Street.

Christopher Weigl (COM’13), a 23-year-old BU graduate student who was pursuing a master’s in photojournalism at the College of Communication, collided with a 16-wheel tractor-trailer at about 8:30 a.m. Witnesses say Weigl and the truck both were traveling east on Comm Ave and collided when the truck made a wide right turn onto St. Paul Street. The accident is being investigated by the Boston Police Department, which cordoned off the area and closed Comm Ave eastbound. Police said Weigl was wearing a helmet and traveling in a marked bike lane. No citations have been issued.

“Chris was just a great guy,” says Sarah Ganzhorn (COM’13), a fellow graduate student in COM’s photojournalism program. “He was always smiling. He was just a really chill guy who never had anything negative to say about anything.”

Peter Smith (COM’80), a COM senior lecturer in journalism, who taught Weigl’s multimedia class, says Weigl was an extraordinary student. “He came in with a lot of strong photography skills,” Smith says. “He also had a lot of social concerns, and he had a tremendous work ethic. He was one of the best graduate students I’ve had here; he took responsibility for all of his work and met every deadline. He was the kind of student you hope for.”

BU President Robert A. Brown sent a note about the tragedy to the BU community Thursday afternoon, saying the thoughts and prayers of all of the BU community go out to the family and friends who are experiencing this terrible loss. Brown said University officials responsible for public safety will work with Boston public officials to better understand the causes of this accident.

“We are very concerned about the dangers faced by members of our community who must navigate the streets on and near our campus, especially bicyclists and pedestrians,” said Brown. “As we identify ways in which education and changes in practice can reduce risks, we will take all necessary and possible steps to do so.”

Weigl is the second BU cyclist to die in a collision with a motor vehicle in less than a month and the fifth bicyclist killed in Boston this year. On November 12, Chung-Wei Yang (CAS’15) was killed when he was struck by a number 57 MBTA bus at the busy Allston intersection of Harvard Avenue and Brighton Avenue.

Between January 1 and November 13, 2012, Boston Emergency Medical Services responded to 579 bicycle-related incidents. The Boston Globe reports that Dot Joyce, spokeswoman for the mayor’s office, says that city transportation officials will investigate the circumstances of the crash.

Weigl grew up in Southborough, Mass., where he was an Eagle Scout, and he graduated from Skidmore College in 2011. There, he was president of the Photo Club and photography editor of the student daily news website, Skidmore News. He was also a student photographer for Skidmore’s communications department, where he was nominated as employee of the year as a senior, as well as a freelance photography intern at Panorama magazine and a freelance photographer for MetroWest Daily News.

Weigl’s website attests to his extraordinary eye and his versatility as a photojournalist. He was as adept at covering the performing arts, architecture, and sports as he was at shooting portraits, weddings, and images of the natural world. He had spent much of the past few months covering the 2012 presidential campaign. A vivid portrait of President Obama at a rally on September 7 in Portsmouth, N.H., is among the more memorable images in his portfolio.

His website also reflects his love of travel. Images from his trips to New Zealand and South America in 2007, Italy in 2010, and Cambodia and Thailand in 2011 feature prominently in his portfolio.

“Photojournalism is a ticket to curiosity, a way to explore the world, meet people doing interesting things, and share their work with others,” Weigl wrote on his website. “Photography has the unique ability to tell the story of a moment in time that will never be relived. From the wedding day to a simply human-interest story, the capture of emotion in a split-second is a truly powerful, almost magical ability.”

The last project Weigl completed for Smith’s multimedia class was on the Lucy Stone House, a Unitarian Universalist cooperative in Roxbury, Mass.

“When we found out during class that Chris had been killed, I decided to show the video,” Smith says. “Chris showed so much sensitivity and so much empathy toward his subject in that video. His professionalism was beyond expectation. He was a good storyteller, and that’s the most important skill you can have as a journalist. And he loved what he did.”

Ganzhorn agrees, saying, “It seemed like he was born to his profession. It came so easily to him. Every shot was beautiful, and the connections he made with his subjects just seemed natural and easy. He was easy to trust.”

Weigl had expressed interest in working for a wire service, she says, such as the Associated Press or Reuters, or for the company Getty Images. He also wanted to spend more time working overseas.

“Chris was a photojournalism student, but he came to us as an already talented photographer and videographer,” says Thomas Fiedler (COM’71), dean of COM. “He was a common sight around our buildings behind a camera, taking photos of our student activities and our freshmen. He was very popular. Chris loved what he wanted to do—photography.”

A memorial service will be held on Sunday, December 16, at 2 p.m. at the First Congregational Church in Holliston, 725 Washington St. Visitation will be on Saturday, December 15, from 3 to 7:30 p.m. at the Chesmore Funeral Home of Holliston, 854 Washington St., Route 16. In lieu of flowers, the family asks that donations be made to the Boston University College of Communication, and designated for the Christopher Weigl Memorial Fund. The fund is intended to create opportunities for photojournalism students to “continue Christopher’s work,” helping them to discover the wonder of creating the pictures that tell a story. Donations can be made here.

Counseling is also available through the Dean of Students Office and from:

Marsh Chapel chaplains, who can be reached at 617-353-3560.

Student Health Services, whose counselors can be reached at 617-353-3575. A behavioral medicine provider can be contacted at 617-353-3569.

The Sexual Assault Response & Prevention Center, which can be reached at 617-353-7277.

The Faculty and Staff Assistance Office, which can be reached at 617-353-5381.
