StoryTelling with Haxe Neko

Playing with data

Hello!


  • I am Valérie (elimak)
  • Flash Developer
  • Freelance
  • Serious Games, Data-visualization, Gamification

What to expect from this talk

  1. Not technical
  2. Walk through 3 projects where Haxe Neko assisted me in the overall process

    • Do I have enough data to support my concept?
    • Is my data interesting/relevant?
    • Extracting and using really specific data from a large resource.

Data-visualization for UNESCO


A bit of background/context...

Characteristics of the data


  1. Comparison between genders on education data
  2. Data divided into 4 groups:

    • Primary
    • Secondary (lower & upper)
    • Tertiary
    • Adult

Main concept

Let the user embody the data

Global flow

Challenge 1: Lack of data


What should happen if the user is from a country lacking data?


  • Shall we disable some countries?
  • Can we fall back on a different set of data?
  • Can we offer an alternative country?

Challenge 2: Select the data


Challenge 3: Making the viz compatible with the upcoming API



[
   {
    "Indicator":"GER0T",
    "Year":2000,
    "Value":34.13281,
    "Country":"40510"
   }
]

Role of Haxe Neko



  • Parsing the data and formatting the output for compatibility with the API
  • Defining the validation rules
  • Adjusting these rules to optimize the selection of the data
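
A minimal sketch of the first two points, written in Python for illustration (it is not the Haxe Neko code from the talk). It reads records in the API shape shown above and applies one possible validation rule. The rule itself, the names `REQUIRED` and `MIN_YEAR`, the indicator code `GER1T`, the country code `40520` and the extra sample values are all assumptions.

```python
import json
from collections import defaultdict

# Hypothetical rule parameters; the talk does not list the real ones.
REQUIRED = {"GER0T", "GER1T"}  # indicator codes a country must provide
MIN_YEAR = 2000                # oldest year still accepted

def latest_values(records):
    """Map country -> indicator -> (year, value), keeping the newest year."""
    latest = defaultdict(dict)
    for rec in records:
        if rec.get("Value") is None:
            continue  # skip empty measurements
        country, indicator = rec["Country"], rec["Indicator"]
        current = latest[country].get(indicator)
        if current is None or rec["Year"] > current[0]:
            latest[country][indicator] = (rec["Year"], rec["Value"])
    return latest

def is_valid(indicators, required=REQUIRED, min_year=MIN_YEAR):
    """A country passes when every required indicator is recent enough."""
    return all(
        code in indicators and indicators[code][0] >= min_year
        for code in required
    )

# Sample input: the first record is from the slide, the others are made up.
raw = """[
  {"Indicator": "GER0T", "Year": 2000, "Value": 34.13281, "Country": "40510"},
  {"Indicator": "GER1T", "Year": 2004, "Value": 98.2, "Country": "40510"},
  {"Indicator": "GER0T", "Year": 1995, "Value": 12.5, "Country": "40520"}
]"""
latest = latest_values(json.loads(raw))
valid = sorted(c for c, ind in latest.items() if is_valid(ind))
print(valid)
```

Adjusting the rules then means changing `REQUIRED` or `MIN_YEAR` and counting how many countries still pass.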

Schematic validation rule


Identified issues


Solutions



  • Older data
  • 1 indicator became optional
  • Always fall back on regional data when available
  • Second fallback on high-income countries
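
The fallback order in the last two points could look roughly like this. It is a Python sketch for illustration, not the talk's Haxe Neko code. The function name `resolve`, the `HIGH_INCOME` key, the region names and all values except the first are assumptions.

```python
# Hypothetical group key; the real data set may name this group differently.
HIGH_INCOME = "HIGH_INCOME"

def resolve(indicator, country, region, values):
    """Return (value, source), trying country, then region, then high-income."""
    for source in (country, region, HIGH_INCOME):
        value = values.get((source, indicator))
        if value is not None:
            return value, source
    return None, None  # nothing usable at any level

# Made-up sample values, keyed by (country/region/group, indicator).
values = {
    ("40510", "GER0T"): 34.13281,
    ("EUROPE", "GER0T"): 61.0,
    (HIGH_INCOME, "GER0T"): 72.5,
}

print(resolve("GER0T", "40510", "EUROPE", values))    # country data exists
print(resolve("GER0T", "99999", "EUROPE", values))    # falls back to the region
print(resolve("GER0T", "99999", "ATLANTIS", values))  # falls back to high-income
```

Returning the source alongside the value lets the visualization tell the user which data set they are actually looking at.
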

Mind the gap: Gender & Education


Outcome


  • 110,000 unique visitors during the first 2 weeks
  • Top 3 countries: France, United States, Canada
  • 3 months after launching: 800 to 1200 daily unique visitors

NUFORC

The National UFO Reporting Center

Characteristics of the data


  1. Released as a single JSON file by InfoChimps
  2. 80 MB uncompressed
  3. No record ID in the JSON objects
  4. Full text description included
  5. Most recent record was August 2010

Online reports

Main goals


  1. Play
  2. Fun
  3. Curiosity, would I find any epic events?

Finding trends

Sightings per inhabitant between 2000 and 2010

Washington: 1/3246, Oregon: 1/4310, Arizona: 1/4587,
Vermont: 1/4629, Montana: 1/4651, New Mexico: 1/4926
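
The "1/N" figures read as one sighting per N inhabitants. A small Python sketch of that calculation, for illustration only: the function name `one_in` and the populations and counts are invented, not taken from the NUFORC or census data.

```python
def one_in(population, sightings):
    """Express a rate as "1 sighting per N inhabitants" and return N."""
    if sightings <= 0:
        raise ValueError("need at least one sighting")
    return round(population / sightings)

# Illustrative numbers only: (population, sightings in the period).
states = {"State A": (6_500_000, 2_000), "State B": (3_800_000, 900)}

# A smaller N means more sightings per inhabitant, so sort ascending.
ranking = sorted(states, key=lambda name: one_in(*states[name]))
for name in ranking:
    print(f"{name}: 1/{one_in(*states[name])}")
```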

Finding trends

Sightings per inhabitant in the past 7 decades

Looking for events


  1. Frequency per week
  2. Shape of UFO reported
    "round" = "round", "sphere","disk", "circle", "dome","oval", "egg"

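Both steps can be sketched in a few lines of Python (an illustration, not the talk's Haxe Neko code). Only the "round" group comes from the slide. The names `SHAPE_GROUPS`, `normalize_shape` and `sightings_per_week` are assumptions, and counting by ISO week is one possible choice.

```python
from collections import Counter
from datetime import date

# Only the "round" group is given in the talk; other groups would be added here.
SHAPE_GROUPS = {
    "round": ["round", "sphere", "disk", "circle", "dome", "oval", "egg"],
}
CANONICAL = {alias: group
             for group, aliases in SHAPE_GROUPS.items()
             for alias in aliases}

def normalize_shape(shape):
    """Map a reported shape to its group; unknown shapes are kept as-is."""
    key = shape.strip().lower()
    return CANONICAL.get(key, key)

def sightings_per_week(dates):
    """Count sightings per (ISO year, ISO week number)."""
    return Counter(tuple(d.isocalendar()[:2]) for d in dates)

weeks = sightings_per_week(
    [date(2005, 9, 30), date(2005, 10, 1), date(2005, 10, 5)]
)
print(weeks.most_common(1))
print(normalize_shape(" Egg "), normalize_shape("triangle"))
```

A week whose count stands far above its neighbours is a candidate for a closer look.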

Are these reports describing the same event?

  1. Date + time is similar?
  2. Duration?
  3. General description (shape, color, move)?
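
One way to approximate checks 1 and 3, sketched in Python for illustration (not the talk's Haxe Neko code). Reports are chained by time, then descriptions are compared by shared words. The one-hour gap, the function names and the sample reports are assumptions, and duration is left out.

```python
from datetime import datetime, timedelta

def group_by_time(reports, max_gap=timedelta(hours=1)):
    """Chain reports sorted by time; a gap above max_gap starts a new group."""
    groups = []
    for report in sorted(reports, key=lambda r: r["when"]):
        if groups and report["when"] - groups[-1][-1]["when"] <= max_gap:
            groups[-1].append(report)
        else:
            groups.append([report])
    return groups

def word_overlap(a, b):
    """Jaccard similarity of the two descriptions' word sets (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Made-up reports, loosely modelled on the kind of text found in the data.
reports = [
    {"when": datetime(2005, 9, 30, 23, 0), "text": "two red lights hovering"},
    {"when": datetime(2005, 9, 30, 23, 15), "text": "two red lights moving east"},
    {"when": datetime(2005, 10, 1, 0, 0), "text": "three red lights in the sky"},
    {"when": datetime(2005, 10, 5, 21, 0), "text": "white triangle over the lake"},
]
groups = group_by_time(reports)
print([len(g) for g in groups])
print(word_overlap(reports[0]["text"], reports[1]["text"]))
```

Word overlap is crude (it treats "2" and "two" as different words), so it only flags candidates for a manual read.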

Highlighted Cases

Reports:

Two red light just hovering... (duration 15 min, 2005/09/30 at 23:00)
2 red lights moving west to east... (duration 15 min, 2005/09/30 at 23:00)
Strange red lights... (duration 10 min, 2005/09/30 at 23:00)
Two red lights were dangling in the air... (duration 30 min, 2005/09/30 at 23:00)
Two bright red objects move across sky... (duration 10-15 min, 2005/09/30 at 23:00)
Two bright lights moving slowly from west to east... (duration 20 min, 2005/09/30 at 23:15)
2 Red light ufo's passing across sky... (duration 10 min, 2005/09/30 at 23:15)
The 3 red lights of south chicago... (duration 1 hour, 2005/09/30 at 23:30)
I spotted three red circular lights in the western skies... (duration 1 hour, 2005/10/01 at 00:00)
Saw three red glowing lights in the sky... (duration 15 min, 2005/10/01 at 01:00)
I saw 3 red flashing/blinking lights in sky... (duration 30 min, 2005/10/01 at 01:00)
I saw three red lights above my house... (duration unknown, 2005/10/01 at 01:10)

Arizona - March 1997


Next steps


Sentiment analysis...


  • Where are the people who are the most afraid of UFOs?
  • The most aggressive/protective?
  • The most amazed?


Discogs

User-built database of music!

More than 140,000 people have contributed

Catalog of more than 3.5 million recordings and 2.5 million artists

Characteristics of the data


  1. REST API (artists, releases, labels per ID)
  2. Monthly database dump (releases, 1.8 GB compressed)

Try opening 11.6 GB of XML...

Split it...
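
An alternative to opening the whole file is to stream it one release at a time. The sketch below is Python for illustration, not the talk's Haxe Neko code. It assumes the dump is a list of `<release id="...">` elements containing `<style>` and `<genre>` children, and the sample XML is a hand-written stand-in for the real dump.

```python
import io
import xml.etree.ElementTree as ET

def iter_releases(source):
    """Yield (id, styles, genres) per release without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "release":
            styles = [s.text for s in elem.iter("style")]
            genres = [g.text for g in elem.iter("genre")]
            yield elem.get("id"), styles, genres
            elem.clear()  # drop the children of the finished release

def format_line(release_id, styles, genres):
    """Compact one release into an "id:<n>|<styles>|<genres>" line."""
    return f"id:{release_id}|{','.join(styles)}|{','.join(genres)}"

# Tiny stand-in for the dump; the element layout is an assumption.
sample = b"""<releases>
  <release id="173204">
    <genres><genre>Electronic</genre></genres>
    <styles><style>Downtempo</style><style>Ambient</style></styles>
  </release>
  <release id="185874">
    <genres><genre>Electronic</genre></genres>
    <styles><style>House</style><style>Downtempo</style></styles>
  </release>
</releases>"""

lines = [format_line(*release) for release in iter_releases(io.BytesIO(sample))]
print("\n".join(lines))
```

`ET.iterparse` also accepts a file name, so the same loop can be pointed at a file on disk.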

Exploring the Data

Global snapshot


  1. 7.5 million releases
  2. 16 unique "genre" tags
  3. 382 unique "style" tags

Exploring the Data

Tags "genre"

Exploring the Data

Tags "style" (top 40)

Relevant info in the release data

To identify a music scene


  1. Label(s)
  2. Artist(s)
  3. Country
  4. Year

Quick benchmark result


   5.950 million "style" tags

+ 4.780 million "genre" tags

------------------------------

= 10.730 million tags extracted in 24 min

About 7450 tags/sec


Using multi-threading (4 threads)

9 min 12 s = 19438 tags/sec
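
The arithmetic behind these figures, as a quick Python check (the variable names are illustrative):

```python
styles = 5_950_000
genres = 4_780_000
tags = styles + genres                  # 10,730,000 tags in total

single_seconds = 24 * 60                # 24 min on one thread
multi_seconds = 9 * 60 + 12             # 9 min 12 s on 4 threads

single_rate = tags / single_seconds     # tags per second, one thread
multi_rate = tags / multi_seconds       # tags per second, four threads
speedup = single_seconds / multi_seconds

print(round(single_rate), round(multi_rate), round(speedup, 2))
```

Four threads give a speed-up of roughly 2.6x, not 4x.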

Main goals


  1. Exploring different music scenes
  2. Building an advanced search

The limitation of filtering

The power of associations (and exclusions)


Preliminary work

  1. Extracting all tags
  2. Searching for associated tags


[
   "id:173204|Downtempo,Ambient|Electronic",
   "id:176160|Hardstyle,Acid,Progressive Trance|Electronic",
   "id:185874|House,Downtempo|Electronic",
   "id:173207|Downtempo,Trip Hop,Experimental|Electronic"
]
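
Searching for associated tags can be sketched as a co-occurrence count over these lines. This is Python for illustration, not the talk's Haxe Neko code. The function names are assumptions, and so is the idea that tag names never contain a comma or a pipe, which real tag names may violate.

```python
from collections import Counter

def split_tags(field):
    """Split a comma-separated field, dropping blanks and stray spaces."""
    return [tag.strip() for tag in field.split(",") if tag.strip()]

def parse_line(line):
    """Split "id:<n>|<styles>|<genres>" into (id, styles, genres)."""
    release_id, styles, genres = line.split("|")
    return release_id.split(":", 1)[1], split_tags(styles), split_tags(genres)

def associated_styles(lines, seed):
    """Count the styles that appear on releases carrying every seed tag."""
    seed = set(seed)
    counts = Counter()
    for line in lines:
        _, styles, _ = parse_line(line)
        if seed <= set(styles):
            counts.update(s for s in styles if s not in seed)
    return counts

lines = [
    "id:173204|Downtempo,Ambient|Electronic",
    "id:176160|Hardstyle,Acid,Progressive Trance|Electronic",
    "id:185874|House,Downtempo|Electronic",
    "id:173207|Downtempo,Trip Hop, Experimental|Electronic",
]
print(associated_styles(lines, ["Downtempo"]))
```

Run over the full extract, the most common entries for a seed such as Downtempo + Ambient are its associated tags.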

Associated tags


Examples with Downtempo + Ambient

Associated tags


Examples with Dub + Trip Hop

Give weight to your tag associations

Examples with Dub + Trip Hop

Create a scoring system to sort the best matches


How I built mine...

  1. Exponential score for each category (the second tag weighs more than the first one and so on)
  2. Bonus when only "awesome" + "very_good" tags are listed
  3. Bonus when no "neutral" tag is listed
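
A possible reading of these three rules, sketched in Python for illustration. It is not the scoring code from the talk. Only the category names "awesome", "very_good" and "neutral" come from the slides. The point values, the doubling factor, the bonus sizes and the choice to treat unrated tags as neutral are all invented.

```python
# Hypothetical numbers; the talk does not give the real weights.
BASE = {"awesome": 8, "very_good": 4, "neutral": 0}
BONUS_ONLY_TOP = 10    # every tag is "awesome" or "very_good"
BONUS_NO_NEUTRAL = 5   # no "neutral" tag at all

def score(release_tags, tag_category):
    """Score one release from its tags and the user's rating of each tag."""
    if not release_tags:
        return 0
    seen = {}   # how many tags of each category were counted so far
    total = 0
    for tag in release_tags:
        category = tag_category.get(tag, "neutral")  # unrated tags are neutral
        n = seen.get(category, 0)
        total += BASE[category] * 2 ** n  # each further tag in a category doubles
        seen[category] = n + 1
    if all(tag_category.get(t) in ("awesome", "very_good") for t in release_tags):
        total += BONUS_ONLY_TOP
    if "neutral" not in seen:
        total += BONUS_NO_NEUTRAL
    return total

ratings = {"Dub": "awesome", "Trip Hop": "awesome", "Downtempo": "very_good"}
releases = {
    "A": ["Dub", "Trip Hop"],
    "B": ["Dub", "Downtempo", "House"],
    "C": ["Dub"],
}
best_first = sorted(releases, key=lambda r: score(releases[r], ratings), reverse=True)
print(best_first)
```

With these made-up weights, a release tagged only Dub + Trip Hop ranks above one that also carries an unrated tag.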

Amazing!

I potentially have a great selection of lovely tunes now



but wait... where do I listen to my selection?


Summary


  1. Extract Id + styles + genres
  2. Build a scoring system to allow indexing of tag associations
  3. Couple Discogs results with the Spotify API (or any other service, Bandcamp for instance)
  4. Save the search result and the statistics on the related music scene
  5. Build a website and share your result! :)


Thank you!


info@elimak.com