Hi, this is Kiuchi. In the recent House of Councillors election, the Liberal Democratic Party won by obtaining the majority of the votes. Nowadays, the internet is actively used in election campaigns.
With that in mind, let’s look back on the House of Councillors election using Elasticsearch.
The analysis needs a data source. We used the Public stream (https://dev.twitter.com/streaming/public) provided by Twitter. Apache Spark received the Public stream and posted the tweets to Elasticsearch.
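As a rough illustration of the "posting to Elasticsearch" step, here is a minimal sketch of how a batch of stream items could be turned into a payload for Elasticsearch's bulk API. It leaves out Spark and the HTTP call entirely. The index name `tweets` and the helper `to_bulk_body` are made up for this example; the `message` field name matches the queries used later in this article, and `id_str`, `text` and `created_at` are fields of the tweet JSON. Older Elasticsearch versions also expect a `_type` in the action line, which is omitted here.

```python
import json

INDEX_NAME = "tweets"  # hypothetical index name, not taken from the article


def to_bulk_body(tweets, index_name=INDEX_NAME):
    """Build an NDJSON payload for the Elasticsearch _bulk endpoint.

    Each tweet becomes two lines: an action line and the document itself.
    The tweet text is stored in the `message` field.
    """
    lines = []
    for tweet in tweets:
        text = tweet.get("text")
        if not text:
            # The stream also carries non-tweet items such as delete notices.
            continue
        action = {"index": {"_index": index_name, "_id": tweet["id_str"]}}
        document = {"message": text, "created_at": tweet.get("created_at")}
        lines.append(json.dumps(action))
        lines.append(json.dumps(document, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


sample = [
    {"id_str": "1", "text": "参院選の選挙速報を見ている",
     "created_at": "Sun Jul 10 11:00:00 +0000 2016"},
    {"delete": {"status": {"id_str": "2"}}},  # skipped: not a tweet
]
bulk_body = to_bulk_body(sample)
print(bulk_body)
```

In a Spark job, a function like this would typically run once per micro-batch, and the resulting string would be sent to the `_bulk` endpoint.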
A simple system configuration is shown below.
The tweets indexed in Elasticsearch can easily be viewed with Kibana. Below is an example of Kibana displaying the Public stream tweets.
Retrieving tweets that contain the word "election" (選挙) on Twitter
Now, let’s look up various information using the saved tweets. First, we simply look up tweets containing the word "election". Entering the query shown below in the Kibana search box extracts the list of tweets whose text contains the word "election".
The results are shown below... there aren't many.
You can widen the search by adjusting the time range at the top right of the Kibana screen. The default is "last 15 minutes", so we widen it to "last 60 days" and search again.
We now get more hits. Kibana visually displays the number of hits by date and time. The search returned about 140,000 hits (tweets). We can see that there were a lot of tweets on June 18th and July 10th. July 10th was election day, so a large number of tweets makes sense, but what happened on June 18th? Clicking a bar in the graph narrows the data down to that date and time.
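The per-day bar chart that Kibana draws corresponds to a date histogram aggregation in Elasticsearch. The sketch below builds such a request body for the last 60 days and picks the busiest days out of a response. The timestamp field name `@timestamp`, the query text `message: "選挙"`, and both helper names are assumptions for illustration, and the response is a hand-made stand-in with placeholder counts, not the article's data. Recent Elasticsearch versions use `calendar_interval`; older ones use `interval` instead.

```python
def daily_histogram_body(query, timestamp_field="@timestamp", days=60):
    """Request body: match `query` within the last `days` days, bucketed per day."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {
            "bool": {
                "must": [{"query_string": {"query": query}}],
                "filter": [{"range": {timestamp_field: {"gte": f"now-{days}d"}}}],
            }
        },
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": "day",
                }
            }
        },
    }


def busiest_days(response, top_n=2):
    """Return the top_n (day, count) pairs from a date histogram response."""
    buckets = response["aggregations"]["per_day"]["buckets"]
    ranked = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    return [(b["key_as_string"], b["doc_count"]) for b in ranked[:top_n]]


body = daily_histogram_body('message: "選挙"')

# Hand-made response with placeholder counts, shaped like a real one.
fake_response = {
    "aggregations": {
        "per_day": {
            "buckets": [
                {"key_as_string": "2016-06-17T00:00:00.000Z", "doc_count": 2},
                {"key_as_string": "2016-06-18T00:00:00.000Z", "doc_count": 7},
                {"key_as_string": "2016-07-10T00:00:00.000Z", "doc_count": 9},
            ]
        }
    }
}
print(busiest_days(fake_response))
```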
It was an election, but the hits were for the "AKB48 General Election" (AKB48総選挙). That is not what we are after, so let’s filter it out. We searched the tweets again using the query below.
message: "選挙" AND -message: AKB48 AND -message: "総選挙"
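To get a feel for what this filter keeps and drops, here is a small local approximation in Python. It uses plain substring checks, whereas Elasticsearch matches on analyzed tokens, so the results can differ depending on the analyzer. The function name and the sample messages are made up for this sketch.

```python
def is_election_tweet(message):
    """Approximate the filter: contains 選挙, but neither AKB48 nor 総選挙."""
    return (
        "選挙" in message
        and "AKB48" not in message
        and "総選挙" not in message
    )


samples = [
    "参院選の選挙に行ってきた",    # kept: mentions 選挙 only
    "AKB48選抜総選挙の結果",       # dropped: AKB48 and 総選挙
    "総選挙の速報",                # dropped: 総選挙
    "今日は良い天気",              # dropped: no 選挙 at all
]
kept = [m for m in samples if is_election_tweet(m)]
print(kept)
```

One trade-off worth noting: excluding 総選挙 also drops tweets that use the word for a national general election, so the filter favors precision over recall.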
The results are now much closer to what we intended. Once again, we search the election-related tweets for the last 60 days.
We got about 100,000 hits in all. Because this is the Public stream, we did not obtain every tweet, but I believe the sample shows a trend similar to Twitter as a whole.
In addition, we can see that the most tweets were made on election day, and that the number of tweets started increasing gradually from June 22nd. June 22nd was the day the election was officially announced, so the tweets increased right as the race started.
After election day, people were probably tweeting about the results. Within about three days, the number of election-related tweets fell back to the level seen before June 22nd.
Voter turnout for this election was low. Public interest does not seem to have been very high, so attention may have quickly moved on to something else.
The relationship between the number of people elected into office and the number of tweets
Next, let’s compare the number of tweets mentioning each political party's name with the actual number of people elected. According to the Yomiuri Online page (http://www.yomiuri.co.jp/election/sangiin/2016/?from=ycnav3), the number of people elected was as follows:
| Party name | Number of people elected (total seats incl. this election) | Rate (rate of total seats) |
| --- | --- | --- |
| Liberal Democratic Party (自民党) | 55 (120) | 45.5% (49.6%) |
| Democratic Progressive Party | 32 (49) | 26.4% (20.2%) |
| Clean Government Party | 14 (25) | 11.6% (10.3%) |
| Initiatives from Osaka | 7 (12) | 5.8% (5.0%) |
| Social Democratic People's Party | 1 (2) | 0.8% (0.8%) |
| People's Life Party | 1 (2) | 0.8% (0.8%) |
Now let’s filter the election-related tweets extracted above (101,889 in total) by political party name. We run a query like the one below.
(message: "選挙" AND -message: AKB48 AND -message: "総選挙") AND (message: "自民党" OR message: "自由民主党" OR message: "自民")
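Since the same pattern is repeated for every party, the query string can be generated rather than typed by hand. The sketch below wraps the election filter and a list of party keywords into one query, and also shows a request body that could be sent to the `_count` endpoint. The helper names are made up for this example; the keywords come from the query above.

```python
BASE_QUERY = 'message: "選挙" AND -message: AKB48 AND -message: "総選挙"'


def party_query(keywords, base_query=BASE_QUERY, field="message"):
    """Combine the election filter with an OR over the party's keywords."""
    party_clause = " OR ".join(f'{field}: "{kw}"' for kw in keywords)
    return f"({base_query}) AND ({party_clause})"


def count_body(query):
    """Request body for the _count endpoint using a query_string query."""
    return {"query": {"query_string": {"query": query}}}


ldp_query = party_query(["自民党", "自由民主党", "自民"])
print(ldp_query)
print(count_body(ldp_query))
```

Calling `party_query` with another party's keywords, for example `["公明"]`, produces the corresponding query for that party.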
Listed below is the number of tweets for each political party after filtering. The denominator for the rate is the total number of tweets (25,345) that mention a political party's name.
| Party name (keywords used) | Number of tweets | Share of tweets |
| --- | --- | --- |
| Liberal Democratic Party (自民党, 自民) | 8695 | 34.3% |
| Democratic Progressive Party (民進) | 4722 | 18.6% |
| Clean Government Party (公明) | 2225 | 8.8% |
| Initiatives from Osaka (維新の会, 維新) | 1704 | 6.7% |
| Social Democratic People's Party (社民党, 社民) | 1429 | 5.6% |
| People's Life Party (生活の党, 生活) | 1495 | 5.9% |
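The percentages in the table can be reproduced by dividing each count by the 25,345 tweets that mention a party name. A short sketch of that arithmetic follows, with the counts copied from the table; the variable and function names are just for illustration.

```python
TOTAL_PARTY_TWEETS = 25345  # tweets mentioning any party name

tweet_counts = {
    "Liberal Democratic Party": 8695,
    "Democratic Progressive Party": 4722,
    "Clean Government Party": 2225,
    "Initiatives from Osaka": 1704,
    "Social Democratic People's Party": 1429,
    "People's Life Party": 1495,
}


def share_percent(count, total=TOTAL_PARTY_TWEETS):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


tweet_shares = {party: share_percent(n) for party, n in tweet_counts.items()}
for party, share in tweet_shares.items():
    print(f"{party}: {share}%")
```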
Plotted as a graph, it looks like the one below.
Surprisingly, there is a correlation between the number of members elected and the number of tweets mentioning each party's name. Looking closely, we can also see the following:
- For the Communist Party, there appears to be no correlation between the number of tweets and the number of members elected.
- Initiatives from Osaka, the Social Democratic People's Party and the People's Life Party had about the same share of tweets, but the numbers of members elected differed.
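One way to put a number on this relationship is a Pearson correlation coefficient between the share of tweets and the share of seats won. The sketch below does this for five of the parties listed in both tables, using the percentages as given. This is only an illustration on a handful of data points, not a statistical claim, and the `pearson` helper is written out here for the example.

```python
import math

# (share of tweets %, share of seats won %) copied from the two tables
shares = {
    "Liberal Democratic Party": (34.3, 45.5),
    "Democratic Progressive Party": (18.6, 26.4),
    "Clean Government Party": (8.8, 11.6),
    "Initiatives from Osaka": (6.7, 5.8),
    "Social Democratic People's Party": (5.6, 0.8),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


tweet_share = [t for t, _ in shares.values()]
seat_share = [s for _, s in shares.values()]
r = pearson(tweet_share, seat_share)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 means the two shares rise and fall together across these parties. With so few points, a single party can shift the result considerably.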
To analyze further
This is just one example of how opinions on social networks such as Twitter relate to the real world. For a closer analysis, you may get better results by considering the points below.
- Accuracy could be higher if you add sentiment analysis of the tweets. Some of the tweets that mention a political party are negative or critical of it.
- Individual councillors have their own Twitter accounts and tweet themselves. If you also analyze which councillors are tweeted about and which political parties they follow, you may get more interesting results.
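As a very rough idea of what the first point could look like, here is a toy word-list approach that separates party mentions into positive, negative and neutral. The word lists are made up for this sketch and far too small for real use; a proper analysis would need a Japanese tokenizer and a real sentiment lexicon or model.

```python
# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE_WORDS = ["支持", "期待", "応援"]
NEGATIVE_WORDS = ["反対", "批判", "最悪"]


def sentiment_score(message):
    """Count positive minus negative word occurrences in the message."""
    positive = sum(message.count(word) for word in POSITIVE_WORDS)
    negative = sum(message.count(word) for word in NEGATIVE_WORDS)
    return positive - negative


def classify(message):
    """Label a message as positive, negative or neutral by its score."""
    score = sentiment_score(message)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = ["自民党を応援している", "公明の政策には反対だ", "選挙に行ってきた"]
labels = [classify(m) for m in examples]
print(list(zip(examples, labels)))
```

Counting only the positive or neutral mentions per party would be one way to refine the tweet shares before comparing them with the election results.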
By combining Elasticsearch and Kibana, we get a powerful data store that can be searched with natural-language text. This makes a strong tool for analysis.
In addition, by using Apache Spark, you can continuously save streaming data such as tweets to Elasticsearch.
The example we used this time is a very simple one, but taking it further may yield interesting results. If you notice anything, we would appreciate it if you could point it out.
We would be happy if this article gave you any new findings.