
Cross-system social web user modeling personalization of recommender system

User modeling helps us predict users’ behavior and interactions. A user model derived from a user can be applied in the personalized system that the user is interacting with, for example to improve recommender systems. “Cold start” is one of the principal challenges in user modeling, personalization, and recommender systems, and it exists in all inner-system user modeling. It leaves the initial user data sparse, which leads to inaccurate forecasts of the user’s behavior and, in turn, incorrect personalization and unsuitable recommendations. To overcome this problem, it is possible to use a user’s public profiles on his or her other social media accounts; this is the idea behind cross-system modeling. The problem we are trying to solve is retrieving metadata from users’ public profiles on YouTube and Twitter in order to improve the personalization of recommender systems. We frame this around three research questions:

  1. Which features are more important in cross-system modeling on Twitter and YouTube?
  2. How accurately can we predict selected features using other features?
  3. Which user models can we plug the features’ relationships into?

DATASET

We could not use existing open-source datasets because of their ethical stance: to preserve users’ privacy, they do not provide real names, addresses, or any other personal information, so we could not look up the users’ other social accounts. We tackled this problem by starting from the most popular channels on YouTube. A questionnaire was not an option for our research either, since accessible participants with both an active YouTube channel and a Twitter account turned out to be rare. We started our dataset from an existing dataset called “Top 5000 YouTube channels.”

To reach the final dataset, we first added a channel ID to each record. These IDs let us crawl and fetch more data about each channel, including (but not limited to) the About page, the channel’s latest videos and updates, and YouTube’s analytics for the channel. We then picked out the YouTube channels that linked a Twitter account on their About page, purged broken or protected Twitter accounts, and collected as much Twitter data as the Twitter API allowed: tweets, tweet metadata, followers and followings, and metadata on them as well. A rough sketch of this collection step follows.
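As an illustration only (not the thesis’s actual scripts), the collection step could look roughly like the sketch below, which uses the current YouTube Data API v3 and Twitter API v2 clients; the API keys, channel IDs, and field names are placeholders:

    # Sketch: fetch channel statistics from the YouTube Data API and the follower
    # count of the linked Twitter account via Tweepy. Credentials are placeholders.
    from googleapiclient.discovery import build
    import tweepy

    YOUTUBE_API_KEY = "YOUR_YOUTUBE_API_KEY"      # placeholder
    TWITTER_BEARER_TOKEN = "YOUR_BEARER_TOKEN"    # placeholder

    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
    twitter = tweepy.Client(bearer_token=TWITTER_BEARER_TOKEN)

    def fetch_channel_stats(channel_id):
        """Return view, subscriber, and video counts for one YouTube channel."""
        response = youtube.channels().list(part="statistics", id=channel_id).execute()
        stats = response["items"][0]["statistics"]
        return {
            "youtube_view_count": int(stats["viewCount"]),
            "youtube_subscriber_count": int(stats.get("subscriberCount", 0)),
            "youtube_video_count": int(stats["videoCount"]),
        }

    def fetch_twitter_followers(screen_name):
        """Return the follower count of the Twitter account linked on the About page."""
        user = twitter.get_user(username=screen_name, user_fields=["public_metrics"])
        return user.data.public_metrics["followers_count"]

Rate limits on both APIs mean such a crawl has to be throttled and resumed across many runs.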

Along the way, we faced some unique challenges. Both Twitter and YouTube (Google) place heavy restrictions on their API usage, for understandable reasons, which made the test-and-implement cycle far more time-consuming than usual. And since the people in the final dataset are “internet famous,” testing the approach on ordinary users remains something for future research. In the end, we produced a dataset of roughly 300 records that intersect both platforms, Twitter and YouTube.

FEATURE SELECTION

For this research, we picked only a few of the dataset’s features, the ones we assumed would matter most in the end (a sketch of assembling them into a single table follows the list):

  • YouTube view count: Each channel on YouTube shows its total view count on the About page of that channel.
  • YouTube subscriber count: Each channel on YouTube shows a total number of subscribers, people who will be notified of new content on that channel.
  • YouTube uploaded video count: Total number of videos uploaded to a channel.
  • Twitter follower count: Total number of people who follow a person on Twitter.
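Assuming the crawled records land in one flat file, the selected features can be pulled into a pandas DataFrame as in the hypothetical sketch below; the file and column names are illustrative, not the thesis’s actual schema:

    import pandas as pd

    # Hypothetical merged dataset of channels with both YouTube and Twitter data.
    df = pd.read_csv("channels_with_twitter.csv")  # placeholder file name

    # Keep only the four features used in this study.
    features = df[[
        "youtube_view_count",
        "youtube_subscriber_count",
        "youtube_video_count",
        "twitter_follower_count",
    ]].dropna()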

RESULTS

Figure: correlation heatmap between the selected features

We used a correlation heatmap to spot potential associations among the selected features in our dataset. The following conclusions can be drawn from the heatmap (a sketch of how to reproduce it follows the list):

  • A close connection between subscriber count and total view count
  • Almost no connection between uploaded video count and total view count
  • Comparable importance of the Twitter follower count and the YouTube subscriber count
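A minimal way to reproduce such a heatmap, assuming the hypothetical features DataFrame sketched above, is with pandas and seaborn:

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Pearson correlation matrix of the four selected features.
    corr = features.corr()

    # Heatmap with the correlation coefficient annotated in each cell.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation heatmap of selected features")
    plt.tight_layout()
    plt.show()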

We then applied a regression algorithm, using the scikit-learn (sklearn) library in Python, to predict the YouTube total view count. On our dataset we obtained the following results (a sketch of the setup follows the list):

  • Average view count of 3,550,524,704.7
  • Maximum Residual Error of 104.61
  • The average absolute error of 27.27
  • The average execution time of 372 ms
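The report does not say which regressor was used beyond “the regression algorithm” in sklearn; the sketch below uses plain linear regression and scikit-learn’s built-in error metrics as one plausible reading, with the feature and target names from the hypothetical DataFrame above:

    import time

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import max_error, mean_absolute_error
    from sklearn.model_selection import train_test_split

    # Predict the total view count from the other three features.
    X = features[["youtube_subscriber_count", "youtube_video_count", "twitter_follower_count"]]
    y = features["youtube_view_count"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    start = time.perf_counter()
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    elapsed_ms = (time.perf_counter() - start) * 1000

    print("Maximum residual error:", max_error(y_test, y_pred))
    print("Mean absolute error:", mean_absolute_error(y_test, y_pred))
    print(f"Execution time: {elapsed_ms:.0f} ms")

Note that the reported errors are tiny compared to the average view count, which suggests the target was scaled or transformed in the original experiments; the sketch leaves the raw values untouched.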

It is feasible to feed the derived data into static and stereotype user models as an addition to the available user-model features and to use them for recommendation and personalization. The regression algorithm achieves acceptable runtime and accuracy relative to the average view count and the size of the dataset. In particular, we can use the Twitter follower count instead of the YouTube subscriber count to mitigate the cold-start problem for a newly joined creator who is already well established on Twitter, making their content more discoverable.
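One way to read that substitution in code, as a purely hypothetical sketch rather than the thesis’s implementation: when a new creator has no YouTube history yet, fall back to the Twitter follower count as the subscriber signal in the user model.

    def subscriber_signal(youtube_subscribers, twitter_followers):
        """Return the subscriber feature for the user model, falling back to the
        Twitter follower count when the channel is too new to have subscribers."""
        if youtube_subscribers is not None and youtube_subscribers > 0:
            return youtube_subscribers
        # Cold start: use the cross-system feature as a stand-in signal.
        return twitter_followers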

FUTURE WORK

  • Check the content of images, videos, and texts of Tweets and videos on YouTube.
  • Check YouTube links in a Tweet.
  • Check based on YouTube channel classification.
  • Check videos and tweets head to head.
  • Add other common social media like Facebook and Instagram.
  • Create a system for collecting users’ public data for use on other social media.

This is a brief report of my master’s thesis, titled “Cross-system social web user modeling personalization of recommender system,” which focuses mainly on social computing between Twitter and YouTube to help YouTube creators. It was originally written in Persian at Shahid Beheshti University under the supervision of Dr. Elaheh Homayounvala. The paper is currently being written.



Master’s thesis abstract and presentation

Abstract: User modeling helps us predict users’ behavior and interactions. A user model derived from a user can be applied in the personalized system that the user is interacting with, for example to improve recommender systems. “Cold start” is one of the principal challenges in user modeling, personalization, and recommender systems, and it exists in all inner-system user modeling. It leaves the initial user data sparse, which leads to inaccurate forecasts of the user’s behavior and, in turn, incorrect personalization and unsuitable recommendations. To overcome this problem, it is possible to use a user’s public profiles on his or her other social media accounts; this is the idea behind cross-system modeling. The problem we are trying to solve in this study is retrieving metadata from users’ public profiles on YouTube and Twitter in order to improve the personalization of recommender systems. To that end, Application Programming Interfaces (APIs) were employed to mine the data, and 5,000 YouTube social media records were collected. The structure of the mined data was reviewed and analyzed to discard outdated and outlier data. To examine the connection between a user’s features in the two systems, a regression algorithm was evaluated for precision and execution time. The results showed that a channel’s subscriber count has little to no relation to the number of videos uploaded to that channel. They also showed that a person’s Twitter follower count is comparably useful for predicting the total view count of that user’s YouTube channel. The outcome of this study can be applied to improving personalized recommendations for YouTube channels that have only just started; in those circumstances, it is feasible to use the Twitter follower count feature in place of the YouTube subscriber count feature to moderate the cold-start problem for that channel.

Keywords: Recommender systems, user modeling, social media, cross-system user modeling, personalization


First Ph.D. proposal

I just wrote my very first Ph.D. proposal. It is about building a new UI that adapts to a user’s personality and preferences, not just by looking at his or her profile but by aggregating all the data publicly available about that person across the web. I think the final result could be extremely exciting and useful.
Imagine a UI that understands you are at work and hides some games and applications on your phone. Or a website that recognizes you are studying and does not disrupt your focus time (for example, Twitter not suggesting memes at that particular moment). Or a website that, from the moment you enter, say, your email address, understands that you prefer a cozy dark theme over a comfortable light theme.
Moreover, think about Netflix or the PlayStation Store: if they could tell that I am on my day off, they could offer their new releases or suggest continuing what I was watching or playing.
All of the above can be achieved with the wealth of publicly available data. Still, a system that aggregates all of that data with privacy in mind and lets each user add or remove preferences could be a considerable improvement to the user experience.