What would "data literature" look like?

May 19, 2017

My eldest daughter is now in secondary school and, while she enjoys and is good at Maths, what she really loves studying is History and English. Watching the critical thinking and analysis skills that she is learning and using for those subjects, I have started to wonder if we should be approaching data literacy from a different angle.

The need for children and adults to be equipped with data skills is well recognised. The Nesta paper Analytic Britain: Securing the Right Skills for the Data-Driven Economy contains some recommendations, for example. However, much of this work focuses on the development of what I would frame as data science skills: the basic skills like the ability to clean data, analyse it, display it in graphs and maps, and the more advanced skills of machine learning and interactive visualisations. Data literacy becomes equated with the ability to do things with data.

But for me, data literacy, and the skills we all need to have in our policymaking, businesses and lives, go beyond handling data. We need to know what data is capable of (even if we can’t do those things ourselves). We need to understand the limits of data, the ways it can be used for both good and ill, the implications that has on our lives and society. Understanding these things would help us use data well in government, business and our day to day lives and have more informed debate about how we use data in society.

You may remember from your own childhood studying both English Language and English Literature. English Language focuses on reading and writing, the production of material, the manipulation of language. English Literature focuses on the study of English in use, the material produced by different authors, their use of different techniques, the context in which they produced their works and the impact their work had. The two areas of study feed on each other: producing poetry enables you to understand poetry as a form, and studying great poems improves your own technique. But the focus of each is distinct. We expect children to be able to read and write when they leave school. We also expect them to understand how others’ writing has contributed to our culture and society.

Could we apply the same approach to data? Children are already taught Data Language as part of the Maths curriculum. They are taught how to collect data, record it, create basic statistics, make charts and graphs from it, even in primary school. But what about Data Literature?

What if children were taught about Florence Nightingale’s use of data? They could unpick the method of collection, the birth of new forms of visualisation and the use of data for argument and persuasion and change. They could examine the context of Nightingale’s work at the time and the repercussions through to the present day. They could create new works from her data, put together new visualisations and invent modern-day newspaper stories.

They could examine the works of great modern day data visualisers and compare and contrast their works around particular key events, such as the Iraq war or the 2016 presidential election, or on thematic topics such as climate change. They could examine commonalities in form - citation of sources, provision of values - as well as differences in style and expression. They could produce their own visualisations in the style of one of the greats, or simply copy a work to see how it’s done.

They could look at the use of data in reports, from official statistical releases, through academic papers, to sports commentary. They could look at how these have evolved over time, and the varying ways in which numbers and statistics can be used to inform and substantiate a story that is being told. They could look at the choices made about what numbers get quoted in such stories, and have exercises where they select different numbers or use different rhetorical devices (eg “almost 20%” vs “less than 20%”) to reach a different conclusion.

Children could be taught the history of census taking, from the Roman census that reportedly led to at least one birth in a stable, through the Domesday Book that redistributed land, to the modern day. They could examine different forms of census taking and the way in which the data is used. But they could also examine the way in which census taking, or indeed the gathering and use of any data, can exert power and change reality.

There are many other topics that would make rich study material: the art of fact checking; the role of open data in government transparency and accountability; the data flows in adtech; conversational interfaces with data such as Siri and Alexa; surveillance and secret data; personalisation and data ownership in smart devices.

I am not an educationalist, but I think that these kinds of topics would equip children with a much better understanding of what data really means to society. And I think it taps into the skills that those who lean towards the arts and social sciences enjoy exercising: skills such as critical thinking, context awareness and artistic appreciation. There are people who are turned off data because they don’t enjoy maths. This provides a different route to reach them.

I am sure there must be people thinking of and doing this already. I know of the Calling Bullshit course, for example. What else is there? Does this idea have legs? How could we advance it? Let me know at jeni@theodi.org.

Adding data trading to an agent-based model of the economy

May 15, 2016

It’s been a while since I gave an update about my attempt to build an agent-based model for the information economy. That’s partly because I got distracted crowdsourcing election candidate data and results for Democracy Club. It’s also partly because of this:

you think success is a straight line but it's actually a squiggle
Image from Demetri Martin’s “This is a Book”

You may recall that I constructed a basic agent-based economic model and added trade to it, based on Wilhite’s Bilateral Trade and ‘Small-World’ Networks. Then I did some sensitivity analysis on it to check that the assumptions that I’d made in coding it all up weren’t changing the outputs in any major way.

My next step was to add DATA to the mix. In the process I realised that the model wasn’t mirroring reality closely enough for me to draw conclusions from it.

Creating an economic model that includes data

Adding data to the agent-based model is pretty straightforward. It’s just the same as FOOD. Each firm has some initialData which it can supplement either by trading or by producing (based on its dataPerStep productivity) to update its currentData running total. A Firm’s utility is calculated as DATA x FOOD x GOLD rather than just FOOD x GOLD in the original scenario.
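In outline, the new state and utility calculation look something like this (a minimal sketch in Python rather than the model’s Groovy; the class and names here are illustrative, mirroring the attributes described above):

```python
# Sketch of a Firm's state once DATA is added. Illustrative only:
# the real model is written in Groovy for Repast Simphony.

class Firm:
    def __init__(self, initial_data, initial_food, initial_gold, data_per_step):
        self.current_data = initial_data
        self.current_food = initial_food
        self.current_gold = initial_gold
        self.data_per_step = data_per_step

    def produce_data(self):
        # producing adds dataPerStep to the currentData running total
        self.current_data += self.data_per_step

    def utility(self):
        # DATA x FOOD x GOLD, rather than FOOD x GOLD as in the original model
        return self.current_data * self.current_food * self.current_gold

firm = Firm(initial_data=5, initial_food=10, initial_gold=30, data_per_step=2)
firm.produce_data()
print(firm.utility())  # (5 + 2) * 10 * 30 = 2100
```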

In my initial model, the price for DATA is calculated in the same way as the price for FOOD. This isn’t fair, however, because unlike with FOOD, when DATA is sold the seller doesn’t lose any DATA. (The impact of this essential difference between data and physical goods is the thing that I wanted to explore in these models.)

If we take the same example as I gave when added trade to the basic model, but this time with DATA rather than FOOD, this is how the trade works between a Firm that starts off with 30 GOLD and 5 DATA and a Firm that has 10 GOLD and 15 DATA. The price is set, as it would be if they were trading FOOD, to 2 GOLD for each DATA. At that price, the exchange goes like this:

GOLD_A  DATA_A  U_A   mrs_A    GOLD_B  DATA_B  U_B   mrs_B
  30      5     150   6.00       10      15    150   0.67
  28      6     168   4.67       12      15    180   0.80
  26      7     182   3.71       14      15    210   0.93
  24      8     192   3.00       16      15    240   1.07
  22      9     198   2.44       18      15    270   1.20
  20     10     200   2.00       20      15    300   1.33
  18     11     198   1.64       22      15    330   1.47

In the original example, with FOOD, both Firms gain from the transaction up until 10 GOLD have been traded for 5 FOOD, at which point both have 20 GOLD, 10 FOOD and a utility of 200, an increase of 50 each. If they keep trading beyond that point, their utilities start to go down.

When DATA is involved, however, the Firm that is selling DATA does much better out of the transaction. Every step of the trade increases its utility because it is always simply adding GOLD rather than reducing its stock of DATA. So while the buyer’s utility rises from 150 to 200 in the trade, the seller’s utility doubles from 150 to 300.
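This asymmetry can be checked mechanically (a Python sketch for illustration; the starting stocks, price and utility function come from the example above):

```python
# The asymmetric DATA trade: the buyer pays 2 GOLD per DATA,
# but the seller's DATA stock never goes down.

def utility(gold, data):
    return gold * data

buyer_gold, buyer_data = 30, 5     # Firm A
seller_gold, seller_data = 10, 15  # Firm B
price = 2

for _ in range(5):  # trade one DATA at a time until the buyer's utility peaks
    buyer_gold -= price
    buyer_data += 1
    seller_gold += price  # the seller gains GOLD but keeps all its DATA

print(utility(buyer_gold, buyer_data))    # 20 * 10 = 200
print(utility(seller_gold, seller_data))  # 20 * 15 = 300
```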

Figuring out what a “fair price” for DATA would actually be, within this model, is something I want to come back to, but I thought I’d run the model with the price set in the same way as the price for FOOD is set, to give a baseline.

An economic model that includes data increases trading

Including data in the agent-based model makes some obvious differences. The first thing that’s really apparent is that, compared to the FOOD-GOLD model, a lot more trading goes on.

In the FOOD-GOLD model, each Firm initiates an average of 1.25 trades, with a range of 0 to 12. Over the 5 runs, there are a total of 3182 trades.

In the DATA-FOOD-GOLD model, each Firm initiates an average of 3.54 trades over the 20 ticks, with a range of 0 to 14, with a total of 8893 trades over the 5 runs. This is biased heavily towards DATA trading, with each firm averaging 2.61 DATA trades (ranging from 0 to 13) and 0.93 FOOD trades (ranging from 0 to 7). About 74% of the trades that go on in this model involve DATA.

Having done this, I realised that the increase in trading could just be because there’s an additional good to trade, namely DATA. So I created an alternative COAL-FOOD-GOLD model which again has an additional good, but one that operates exactly like FOOD.

In the COAL-FOOD-GOLD model, each firm initiates an average of 1.59 trades, with a range of 0 to 15 and a total of 3971 trades over the 5 runs. As you’d expect, these are pretty evenly split between COAL and FOOD: 49% of the trades involve COAL.

So the increase in trading is partly due to there being more goods to trade, but mostly because of the unique nature of DATA.

Price stabilisation with data trading

The price stabilisation graphs for food and for data are below.

price stabilisation for food

price stabilisation for data

Both prices stabilise at around 1 GOLD over time, but the prices of FOOD are a lot more variable than those for DATA. My guess is that this is because there’s less trading of FOOD than there is of DATA.

A look at inequality

One of the things that I’m particularly keen to examine in this model is whether data being in the mix changes the inequality in the set of Firms in the economy.

One thing to examine here is the relationship between the initial utility of each Firm and the final utility of each firm. In this version of the model, the initial utility is randomised (rather than being based on how much the Firm can produce, or all being initially equal). You’d expect a small correlation between what you start with and what you end up with, but not a large one as 20 steps is plenty of time to trade or produce your way out of your starting position. Here are the correlations:

model            correlation
FOOD-GOLD        11%
COAL-FOOD-GOLD   19%
DATA-FOOD-GOLD   23%

So there’s some evidence that when data is added to the mix, the starting condition of each Firm is more influential than it would otherwise be, but it’s still not a very strong correlation.

The other thing I looked at was the Gini coefficient of the economy, a measure of how unequal a society is, with 0% being perfect equality and 100% being perfect inequality (one person having all the wealth). (For reference, the UK’s Gini coefficient is about 34%.)
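For reference, here’s a standard way of computing a Gini coefficient from a list of Firm wealths (a textbook formulation sketched in Python, not the model’s own code):

```python
# Gini coefficient via the sorted-rank formula:
# G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with x sorted ascending
# and i running from 1 to n. 0.0 = perfect equality.

def gini(wealths):
    xs = sorted(wealths)
    n = len(xs)
    total = sum(xs)
    cum = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return cum / (n * total)

print(gini([1, 1, 1, 1]))              # 0.0: everyone equal
print(round(gini([0, 0, 0, 100]), 2))  # 0.75: one firm holds everything (max for n=4)
```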

model            initial Gini coefficient   final Gini coefficient
COAL-FOOD-GOLD   56%                        33%
DATA-FOOD-GOLD   56%                        36%

The Gini coefficients are roughly the same. But the thing about this result that makes me question whether the model is accurate is the fact that the Gini coefficients are decreasing from the initial state to the final state. This isn’t the case in the UK or the US, for example, where inequality is growing, and if you look at the global Gini index you’ll see it’s been increasing over time.

If the Gini coefficient in the model is decreasing, that’s a sign that the model isn’t properly reflecting inequalities that arise in real economies. That would probably be fine if I didn’t want explicitly to study inequalities. Given I do, it feels like I have to refine the model a bit more to make it better reflect reality so that we can draw conclusions from it.

Next steps

First, I need to add a mechanism to the model to measure the Gini coefficient over time. I’m currently only measuring the Gini coefficient at the very beginning (where wealth distribution is randomised) and at the very end (after 20 ticks of trading). It might be that the Gini coefficient goes down rapidly during the price stabilisation phase and then starts increasing, and therefore the model is accurately reflecting the increase of the Gini coefficient over time, once it gets going. I need to monitor it on each tick in order to work that out.

Then, if the Gini coefficient isn’t increasing, I need to add some mechanisms to the mix that are likely to increase inequality. Things that I’ve thought of are:

  • reducing FOOD and/or GOLD by a fixed amount each tick, to mirror the minimum expenses a Firm incurs simply for existing; if I do this I have to add the possibility of Firms failing (and new Firms being created) or going into debt
  • adding a mechanism that enables those Firms that have more GOLD to get more GOLD, for example by lending at interest to other Firms or by investing in increasing their own productivity
  • breaking up the economy into smaller sub-economies that only trade with each other, with some connecting Firms that can trade across those sub-economies; this was a variant in Wilhite’s original paper but I don’t know if it had an effect on inequality

If you have any other ideas, let me know.

Trying it out

As before, if you want to try out where I’ve got to so far, I’ve committed the source code to Github and there are instructions there about how to run it. Any comments/PRs will be gratefully received. The code is quite messy now and could do with a refactor.

I’ve also put all the raw data generated from the runs described in this post in spreadsheets. These are:

Feel free to copy and create your own analyses if you don’t want to run the models yourself.

Sensitivity analysis on an agent-based economic model

Apr 1, 2016

Previously, in my quest to build an agent-based model for the information economy, I constructed a basic model and added trade to it, based on Wilhite’s Bilateral Trade and ‘Small-World’ Networks.

From doing that, we’ve seen that price stabilisation occurs over roughly the first 10 cycles, with about 38% of the 500 agents being pure producers, and about 5% only responding to trade requests from others.

There are a few parts of this model where I’ve made choices that might influence the outcome. To test these out, I want to do a sensitivity analysis to double-check that I’m not drawing unwarranted conclusions from single runs.

Setting up Repast to do multiple runs

Repast can be used to do batch runs of a particular model, spawning several instances with different starting conditions and therefore different end points.

Getting this working had a few false starts. Batch runs need to include code that stops the run after a set number of cycles. This code needs to be placed in the src/informationeconomy/context/SimBuilder.groovy file, which you don’t normally see when viewing the Package Explorer in Eclipse. Getting the simulation to stop after 20 iterations requires a simple line:

public class SimBuilder implements ContextBuilder {
	
	public Context build(Context context) {
        ...	
		RunEnvironment.getInstance().endAt(20)
		...
	}
}

With that in place, the Batch Run Configuration tool enables you to run any number of concurrent “worlds”. I ran five with different random seeds. The following price stabilisation graph shows that they all reach price stabilisation after about eight iterations (pale lines are individual runs; stronger lines are the average over these runs):

graph showing price stabilisation as max and min prices converge on a mean over around 8 ticks

With 20 ticks per run, about 43% of agents spend all their time producing goods and 57% trade in some way. About 6% never initiate trade themselves but just respond to offers from other agents.

Initial FOOD and GOLD

The first area where I want to carry out some sensitivity analysis is in the initial amount of FOOD and GOLD that each agent has. In the runs described above, each agent starts with the same amount of FOOD and GOLD that they can make in a turn. There are two other options that I want to test out: one where every agent starts with one FOOD and one GOLD, and one where each agent starts with a random amount of FOOD and GOLD (between 1 and 30).

With all Firms initially having a random amount of FOOD and GOLD, there are slightly fewer pure producers (38%) and more Firms that only accept trades (10%). Prices don’t start as high and follow a smoother path to a later stabilisation (around 14 steps in), as shown here:

graph showing price stabilisation as max and min prices converge on a mean over around 16 ticks

As we’d expect, there’s no relationship between initial utility and final utility when the Firms’ initial utility is unrelated to their ability to produce goods:

graph showing final utility based on initial utility

Starting with one FOOD and one GOLD leads to more trading, with only 30% pure producers and 17% of Firms only accepting trades. Prices start higher (after no trading in the first step) but settle down in the same way as with the other kinds of starting conditions.

graph showing price stabilisation as max and min prices converge on a mean over around 16 ticks

Given the smoothness of the price stabilisation curve when Firms start with random amounts of FOOD and GOLD, I will use this version of the code going forward.

Randomising FOOD or GOLD production

On each step, each Firm currently has to decide whether to produce FOOD, produce GOLD, or trade. The code that determines which they choose to do has some built-in biases:

if (utilityMakingFood > utilityMakingGold) {
	if (trade['utility'] > utilityMakingFood) {
		action = makeTrade(trade)
	} else {
		currentFood += foodPerStep
		action = [ type: 'make', good: 'food', amount: foodPerStep, utility: currentUtility() ]
	}
} else if (trade['utility'] > utilityMakingGold) {
	action = makeTrade(trade)
} else {
	currentGold += goldPerStep
	action = [ type: 'make', good: 'gold', amount: goldPerStep, utility: currentUtility() ]
}

The Firm will only consider making FOOD if the utility of making it is greater than the utility of making GOLD. Similarly, it will only consider making a trade if the utility of trading is greater than producing either FOOD or GOLD. This should bias the Firms towards producing FOOD or GOLD, and specifically towards producing GOLD, all things being equal.

Under the initial configuration, where Firms begin with the amount of FOOD and GOLD that they can produce in a single step, 85% of Firms produce GOLD at some point, and 84% produce FOOD. Across the 5 runs, only 24 Firms produce only FOOD (never trading or producing GOLD), and even fewer (just 2) produce only GOLD.

Under a randomised initial amount of FOOD and GOLD, 79% produce GOLD and 81% produce FOOD, with 41 only producing FOOD and 21 only producing GOLD over the 5 runs.

So I don’t think that the code is biasing the results towards the producing of GOLD, but it’s hard to tell whether it’s biasing away from trade. I’ve added a bit of randomness:

def randomlyTrue = random(10000) > 5000
if (utilityMakingFood > utilityMakingGold || (utilityMakingFood == utilityMakingGold && randomlyTrue)) {
	randomlyTrue = random(10000) > 5000
	if (trade['utility'] > utilityMakingFood || (trade['utility'] == utilityMakingFood && randomlyTrue)) {
		action = makeTrade(trade)
	} else {
		currentFood += foodPerStep
		action = [ type: 'make', good: 'food', amount: foodPerStep, utility: currentUtility() ]
	}
} else if (trade['utility'] > utilityMakingGold || (trade['utility'] == utilityMakingGold && randomlyTrue)) {
	action = makeTrade(trade)
} else {
	currentGold += goldPerStep
	action = [ type: 'make', good: 'gold', amount: goldPerStep, utility: currentUtility() ]
}

As anticipated, this makes very little difference. Over the five runs, one more Firm produces FOOD than previously, one more never trades, one more only produces FOOD and five more only produce GOLD. There is a more significant increase in the number of Firms that only receive (but do not initiate) trade, rising from 257 (10%) to 273 (11%).

Price stabilisation occurs as before, though the graph does show a more regular oscillation in maximum price over the first few ticks, compared to the slightly smoother trajectory shown in the previous graphs.

graph showing price stabilisation as max and min prices converge on a mean over around 16 ticks

All in all, the model does not appear to be sensitive to the biases in the code that determine how Firms choose what to do on each step. I will keep the less biased code.

Next steps

Next, it’s time to introduce DATA to the mix. My goal for the initial experiment is simply to replace FOOD with DATA, use the same formula to work out the price for DATA, but introduce the crucial difference between FOOD and DATA, namely that when you trade DATA, you do not lose it. I want to see what happens to price stabilisation in this scenario, and look at the kinds of Firms that emerge.

Trying it out

As before, if you want to try out where I’ve got to so far, I’ve committed the source code to Github and there are instructions there about how to run it. Any comments/PRs will be gratefully received.

I’ve also put all the raw data generated from the runs described in this section in a spreadsheet which you’re welcome to copy and run your own analysis over.

Introducing trade to a basic agent-based economic model

Mar 11, 2016

My next step in building an agent-based model for the information economy is to add trade to the basic model.

The trade protocol is described in Wilhite’s Bilateral Trade and ‘Small-World’ Networks. First, you calculate the marginal rate of substitution for an agent: the amount of GOLD that they would need to exchange for a unit of FOOD in order to increase their utility. This can be calculated for each agent as:

mrs = GOLD / FOOD

or in code form:

def mrs() {
	currentGold / currentFood
}

If this is greater than 1 then the Firm would rather buy FOOD (give up GOLD for FOOD). If it’s less than 1 then the Firm would rather sell FOOD (give up FOOD for GOLD).

Working out a fair price

Now imagine you have two Firms, A and B. Firm A has 20 GOLD and 10 FOOD (a utility of 20 × 10 = 200, and a mrs of 20 / 10 = 2). Firm B has 10 GOLD and 20 FOOD (with a utility of 200 as well, but a mrs of 10 / 20 = 0.5). These two firms can beneficially trade with each other to maximise their utility. If Firm A buys 5 FOOD from Firm B for 5 GOLD, both Firms end up with 15 GOLD and 15 FOOD, a utility of 225 with a mrs of exactly 1. This is the best they can do: if Firm A buys 4 FOOD for 4 GOLD they both end up with a utility of 14 × 16 = 224, if Firm A buys 6 FOOD for 6 GOLD they also both end up with a utility of 16 × 14 = 224.
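This worked example can be checked mechanically (a Python sketch; utility is just GOLD × FOOD as above, and Firm A buys k FOOD from Firm B at 1 GOLD per FOOD):

```python
# Utilities for Firm A (20 GOLD, 10 FOOD) and Firm B (10 GOLD, 20 FOOD)
# after trading k FOOD for k GOLD. Both peak at k = 5.

def utility(gold, food):
    return gold * food

results = {}
for k in range(8):
    a = utility(20 - k, 10 + k)  # A gives up GOLD, gains FOOD
    b = utility(10 + k, 20 - k)  # B gains GOLD, gives up FOOD
    results[k] = (a, b)

print(results[4])  # (224, 224)
print(results[5])  # (225, 225): the optimum, both at 15 GOLD and 15 FOOD
print(results[6])  # (224, 224): trading further reduces utility again
```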

In another scenario, let’s say you have Firm A with 20 GOLD and 10 FOOD and Firm B with the same. In this case, both Firms want to buy FOOD — they both have a mrs of 20 / 10 = 2, above 1 — and there is no mutually beneficial trade.

Then there’s the case where Firm A has 30 GOLD and 5 FOOD (a utility of 150 and a mrs of 30 / 5 = 6) and Firm B has 10 GOLD and 15 FOOD (a utility of 150 and a mrs of 10 / 15 = 0.67). In this case, if Firm A bought 2 FOOD from Firm B for 2 GOLD, Firm A would have 28 GOLD and 7 FOOD (a utility of 196, an increase of 46). Firm B would have 12 GOLD and 13 FOOD (a utility of 156, an increase of just 6). In this case the price of 1 GOLD for 1 FOOD unfairly benefits Firm A more than Firm B.

Wilhite uses the following formula to calculate the acceptable price between two firms:

price = (GOLD_A + GOLD_B) / (FOOD_A + FOOD_B)

In the final scenario described above, the price would be:

price = (30 + 10) / (5 + 15) = 40 / 20 = 2

This means Firm A should pay 2 GOLD for each FOOD from Firm B. Let’s see how that price works out:

GOLD_A  FOOD_A  U_A   mrs_A    GOLD_B  FOOD_B  U_B   mrs_B
  30      5     150   6.00       10      15    150   0.67
  28      6     168   4.67       12      14    168   0.86
  26      7     182   3.71       14      13    182   1.08
  24      8     192   3.00       16      12    192   1.33
  22      9     198   2.44       18      11    198   1.64
  20     10     200   2.00       20      10    200   2.00
  18     11     198   1.64       22      9     198   2.44

Each step of the trade is equitable, and both achieve a maximum utility when Firm A has bought 5 FOOD for 10 GOLD, at which point they both have the same amount of GOLD and FOOD.
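The whole exchange can be reproduced in a few lines (a Python sketch using Wilhite’s price formula from above):

```python
# Firm A buys FOOD from Firm B at Wilhite's price:
# (GOLD_A + GOLD_B) / (FOOD_A + FOOD_B) = 40 / 20 = 2 GOLD per FOOD.

def price(gold_a, food_a, gold_b, food_b):
    return (gold_a + gold_b) / (food_a + food_b)

gold_a, food_a = 30, 5
gold_b, food_b = 10, 15
p = price(gold_a, food_a, gold_b, food_b)  # 2.0

for _ in range(5):  # trade one FOOD at a time until both utilities peak
    gold_a -= p
    food_a += 1
    gold_b += p
    food_b -= 1

# both end up with 20 GOLD, 10 FOOD and a utility of 200
print(gold_a * food_a, gold_b * food_b)  # 200.0 200.0
```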

Turning this into code

First, a few helper functions. We already have one to work out the marginal rate of substitution. This one works out the price that the firm should use to deal with another firm:

def priceForTrade(firm) {
	(currentGold + firm.currentGold) / (currentFood + firm.currentFood)
}

Now some functions that work out how much FOOD or GOLD to trade:

def tryBuyingFood(firm) {
	def price = priceForTrade(firm)
	def foodToBuy = Math.floor((firm.currentFood - currentFood) / 2)
	def result = [
		firm: firm,
		price: price,
		food: currentFood + foodToBuy,
		gold: currentGold - foodToBuy * price,
		utility: 0
	]
	result.put('utility', utility(result['food'], result['gold']))
	return result
}

def trySellingFood(firm) {
	def price = priceForTrade(firm)
	def foodToSell = Math.floor((currentFood - firm.currentFood) / 2)
	def result = [
		firm: firm,
		price: price,
		food: currentFood - foodToSell,
		gold: currentGold + foodToSell * price,
		utility: 0
	]
	result.put('utility', utility(result['food'], result['gold']))
	return result
}

For this version of the project, each Firm is going to try to trade with every other firm. The code to work out the best possible trade looks like this:

// work out the mrs for the firm
def mrs = mrs()

// set up a variable to record the best trade found
def trade = [
	firm: null,
	price: 0,
	food: currentFood,
	gold: currentGold,
	utility: currentUtility()
]

// cycle through each of the firms to see whether a trade is worthwhile
def thisFirm = self()
firms().each {
	def result = null
	if (mrs >= 1 && it.mrs() < 1) {
		// more GOLD than FOOD, so buy FOOD
		result = thisFirm.tryBuyingFood(it)
	} else if (mrs < 1 && it.mrs() >= 1) {
		// more FOOD than GOLD, so sell FOOD
		result = thisFirm.trySellingFood(it)
	} else {
		result = trade
	}
	// set the best trade to the result if it's a better trade than the best found so far
	if (result['firm'] == null) {
		trade = result
	} else if (result['utility'] > trade['utility']) {
		trade = result
	}
}

The final piece of the puzzle is to work out what to do given knowledge about the best possible trade, the potential utility achieved by making FOOD and the potential utility achieved by making GOLD. The decision uses this code:

if (utilityMakingFood > utilityMakingGold) {
	if (trade['utility'] > utilityMakingFood) {
		makeTrade(trade)
	} else {
		currentFood += foodPerStep
	}
} else if (trade['utility'] > utilityMakingGold) {
	makeTrade(trade)
} else {
	currentGold += goldPerStep
}

Where makeTrade() is defined as:

def makeTrade(trade) {
	trade['firm'].currentFood += currentFood - trade['food']
	trade['firm'].currentGold += currentGold - trade['gold']
	currentFood = trade['food']
	currentGold = trade['gold']
}

Adding monitoring

This code is all that’s needed to get the agents operating in a market. But to get useful data out of the model, we have to capture what’s going on for each agent. To do that, I’ve set up an actions property that holds an array of actions that the agent takes. These can be of three types: making FOOD, making GOLD and initiating trade (I don’t capture the recipient of trade). I’ve also created an activity property that is very similar but includes times when the agent is the recipient of a trade rather than the initiator. They’re both defined as arrays:

def actions = []
def activity = []

with the makeTrade() function returning the relevant trade action and adding to the activity of the firm that’s being traded with:

def makeTrade(trade) {
	def action = [ type: 'trade', food: trade['food'] - currentFood, gold: trade['gold'] - currentGold, price: trade['price'], utility: trade['utility'] ]
	trade['firm'].currentFood += currentFood - trade['food']
	trade['firm'].currentGold += currentGold - trade['gold']
	trade['firm'].activity << [ type: 'receive-trade', food: currentFood - trade['food'], gold: currentGold - trade['gold'], price: trade['price'], utility: trade['firm'].currentUtility() ]
	currentFood = trade['food']
	currentGold = trade['gold']
	return action
}

and the code that decides which action to take adding the relevant action to the actions list:

def action = []
if (utilityMakingFood > utilityMakingGold) {
	if (trade['utility'] > utilityMakingFood) {
		action = makeTrade(trade)
	} else {
		currentFood += foodPerStep
		action = [ type: 'make', good: 'food', amount: foodPerStep, utility: currentUtility() ]
	}
} else if (trade['utility'] > utilityMakingGold) {
	action = makeTrade(trade)
} else {
	currentGold += goldPerStep
	action = [ type: 'make', good: 'gold', amount: goldPerStep, utility: currentUtility() ]
}
actions << action
activity << action

Looking at the results

First achievement: this looks to work! In a single sample run of 30 ticks, some of the agents (about 38%) spend all their time producing goods, while others (about 62%) trade in some way. Of those that trade, about 5% never initiate trade themselves but just respond to offers from other agents. The price negotiated with each trade starts off quite uneven, but stabilises at around 1 FOOD for 1 GOLD:

graph showing price stabilisation as max and min prices converge on a mean over around 8 ticks

Because of the way the model is set up, with no consumption of either FOOD or GOLD once it’s created, all the Firms increase in utility over the run. We can plot the final utility of each firm against its initial utility like so:

graph showing correlation between initial utility of a Firm and its final utility

The expected utility of a firm, if it doesn’t participate in any trading, can be calculated as foodPerStep * 16 * goldPerStep * 16. This accounts for the strong diagonal line in the above graph: one dot for each Firm that is a pure producer. The dots above that line are the Firms that participate in trade, who manage to achieve a greater utility through trading than they would if they had just produced all the time. There are no dots below the line because (unlike in real life) the Firms are very sensible about only choosing actions that are going to increase their utility.
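That baseline can be written out directly (a Python sketch of the formula quoted above). One thing worth noting: it gives the same baseline to any two pure producers whose production rates have the same product, so a Firm with a skewed production ratio can only pull ahead by trading:

```python
# Pure-producer baseline from the text: foodPerStep * 16 * goldPerStep * 16
# for a Firm that never trades over the run.

def pure_producer_utility(food_per_step, gold_per_step):
    return food_per_step * 16 * gold_per_step * 16

print(pure_producer_utility(5, 5))   # 6400
print(pure_producer_utility(1, 25))  # 6400: same product, same baseline
```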

An example trading history of just the first activities for a Firm that does well out of trading looks like this:

activity                                      FOOD   GOLD    utility
(starting state)                              24     2       48
accept offer to sell 11 FOOD for 6.6 GOLD     13     8.6     111.8
produce 24 FOOD                               37     8.6     318.2
sell 12 FOOD for 18.13 GOLD                   25     26.73   668.19
produce 24 FOOD                               49     26.73   1309.65
produce 24 FOOD                               73     26.73   1951.11
sell 19 FOOD for 23.93 GOLD                   54     50.66   2735.73
produce 24 FOOD                               78     50.66   3951.61
produce 24 FOOD                               102    50.66   5167.49
produce 24 FOOD                               126    50.66   6383.36
accept offer to sell 40 FOOD for 37.66 GOLD   86     88.32   7595.29

This Firm only ever produces FOOD. It trades eight times during the 30 ticks of the simulation, initiating the trade itself half of the time, and ends up with about 9 times as much utility by the end as you’d expect given its starting condition.

It’s easy to predict which Firms will do well within the simulation: the more skewed the ratio between the amount of FOOD a Firm can produce and the amount of GOLD it can produce, the better it does. For example, a Firm that can produce only 1 FOOD per tick but 25 GOLD per tick will do a lot better than a Firm that can produce 5 FOOD and 5 GOLD per tick, despite starting with the same utility. Specialisation wins!

Next steps

There are a few parts of the model that I am not sure about. There are parts of the code that, all things being equal, favour production over trading and favour GOLD production over FOOD production. I’ve also started each Firm off with amounts of FOOD and GOLD that depend on how much they can create, rather than starting everyone off with 1 FOOD and 1 GOLD, or randomising how much they get at the start. I want to do a bit of sensitivity analysis to make sure the model is robust before I expand it to include DATA.

I also want to work out how to run the simulation multiple times so that I can aggregate the results and smooth out some of the jagged lines in the graph.

Trying it out

As before, if you want to try out where I’ve got to so far, I’ve committed the source code to GitHub and there are instructions there about how to run it. Any comments/PRs will be gratefully received.

Building a basic agent-based economic model in Repast

Feb 9, 2016

My previous post talked about building an agent-based model for the information economy.

Step one is to create the basic agent-based model described in Wilhite’s Bilateral Trade and ‘Small-World’ Networks in Repast Simphony. In fact I’m going to set myself an even smaller task for this post: setting up the Firm agents to produce FOOD and GOLD without letting them trade.

Let’s set up the agents first. At any point in time, each agent holds an amount of FOOD and an amount of GOLD. These are set in the Firm.groovy code.

def currentFood = 0
def currentGold = 0

Calculating utility

From the FOOD and GOLD we can apparently calculate the utility of the agent using a rudimentary Cobb-Douglas function like so:

def utility(food, gold) {
	food * gold
}

Utility is an economic term and, as I said, I’m not an economist. According to the Wikipedia article, utility is about commodities, not agents, so I’m not sure that this is “the agent’s utility” as such. The Cobb-Douglas function is giving the utility of the commodity generated from the FOOD and GOLD. Since that commodity doesn’t go anywhere, I’m taking it that in this model the “current utility” is more like a measure of, in layman’s terms, “current wealth”. I’d love it if someone corrected my understanding here.

The Cobb-Douglas function used here, and in Wilhite’s paper, is one particular form of a more general function:

Y = A L^β K^α

(Go, MathML! I remember a time you’d have to have a plugin in the page to render that!)

In the version we’re using L (usually labour) is FOOD and K (usually capital) is GOLD. These seem like reasonable analogies, or you could argue it the other way round.

As you can see, in the version Wilhite uses, both α and β are 1. There are some consequences to this (with α + β = 2, “returns to scale are increasing”: doubling both inputs quadruples utility) but I’m going to follow Wilhite’s model for now.
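To see what increasing returns to scale looks like in practice, here's a tiny Python translation of the general function (illustrative only, not the model's Groovy): with α = β = 1, doubling both inputs quadruples output, whereas with exponents summing to 1 it would only double.

```python
def cobb_douglas(L, K, A=1.0, beta=1.0, alpha=1.0):
    """General Cobb-Douglas: Y = A * L^beta * K^alpha."""
    return A * L**beta * K**alpha

# With alpha = beta = 1 (Wilhite's version), doubling inputs quadruples output:
assert cobb_douglas(20, 10) == 200
assert cobb_douglas(40, 20) == 800

# With alpha + beta = 1, doubling inputs only doubles output (constant returns):
small = cobb_douglas(20, 10, beta=0.5, alpha=0.5)
large = cobb_douglas(40, 20, beta=0.5, alpha=0.5)
assert round(large, 6) == round(2 * small, 6)
```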

Choosing what to do

At each step, the agents can choose whether to produce FOOD or to produce GOLD. The amount of each that they can produce per step is random and needs to be set when the agent is first created. So I need to define some variables on the agents to hold the values:

def foodPerStep = 0
def goldPerStep = 0

Then when the agents are created I need to set these randomly (between 1 and 30 is what Wilhite uses). This is in the UserObserver.groovy code as the setup function run at the beginning of the simulation:

@Setup
def setup(){
	clearAll()
	setDefaultShape(Firm, "circle")
	createFirms(500){
		setxy(randomXcor(),randomYcor())
		foodPerStep = random(30) + 1  // random(30) gives 0-29, so this is 1-30
		goldPerStep = random(30) + 1
	}
}

You’ll notice that I decided to make the Firms circles in the visualisation and distribute them randomly. Good enough for now. And I’m making 500 Firms: that’s the number Wilhite used.

On each step, each Firm will decide whether to produce FOOD or produce GOLD, based on which increases their utility the most. The choice is in this code:

def step() {
	def utilityMakingFood = utility(currentFood + foodPerStep, currentGold)
	def utilityMakingGold = utility(currentFood, currentGold + goldPerStep)
	if (utilityMakingFood > utilityMakingGold) {
		currentFood += foodPerStep
	} else {
		currentGold += goldPerStep
	}
}

Note that, because the comparison is a strict greater-than, if it doesn’t make a difference whether FOOD or GOLD gets produced, the Firm will produce GOLD.

To make that work, the UserObserver.groovy code needs to call the Firm.step() method for each Firm on each step:

@Go
def go() {
	ask(firms()){
		step()
	}
}

Monitoring

When you run this code in Repast it generates a rather pretty picture of lots of multi-coloured circles (completely meaningless of course):

Randomly distributed firms in Repast

Repast lets you monitor individual agents. So I can see that the first agent in the simulation has a foodPerStep of 12 and a goldPerStep of 1. Here’s how its food and gold change over the ticks of the simulation:

step FOOD GOLD utility
0 0 0 0
1 0 1 0
2 12 1 12
3 12 2 24
4 24 2 48
5 24 3 72
6 36 3 108

So things are working as expected (if not in any very interesting way, since the agents aren’t currently interacting with each other).
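That trace is easy to replicate outside Repast. A minimal Python mirror of the greedy step() rule (illustrative, not part of the model) reproduces the monitored table exactly:

```python
def run(food_per_step, gold_per_step, ticks):
    """Mirror of the Groovy step(): produce whichever good raises utility more."""
    food, gold = 0, 0
    history = [(food, gold, food * gold)]
    for _ in range(ticks):
        if (food + food_per_step) * gold > food * (gold + gold_per_step):
            food += food_per_step
        else:
            gold += gold_per_step  # ties fall through to GOLD
        history.append((food, gold, food * gold))
    return history

# Same agent as monitored above: foodPerStep = 12, goldPerStep = 1
assert run(12, 1, 6) == [(0, 0, 0), (0, 1, 0), (12, 1, 12), (12, 2, 24),
                         (24, 2, 48), (24, 3, 72), (36, 3, 108)]
```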

Thinking ahead

The main next step is to get trade working between the agents, which will hopefully make things more interesting. There are several areas where the model seems over-simplified:

  • I’m not sure that α and β in the Cobb-Douglas function should both be 1. I think perhaps they should sum to 1 (e.g. 0.75 and 0.25), which would give constant rather than increasing returns to scale. Maybe different agents should have slightly different values for those exponents.

  • If the projected benefits of making FOOD and GOLD are the same, perhaps the agent should randomly choose between them rather than always producing GOLD. That would introduce a bit more randomness into the simulation.

  • Currently, the amount of FOOD and GOLD that each agent can produce per step is uniformly distributed across the agents: roughly the same number of agents will be able to produce between 1-5 FOOD as between 10-15 FOOD as between 25-30 FOOD. There are various other distributions that could be used: perhaps an exponential distribution, to reflect the fact that there are many more small companies than large ones.

  • Shouldn’t the ability of an organisation to produce be related to its current holdings? I would have thought that the production ability of a firm should be proportional to its existing FOOD and GOLD rather than fixed throughout its life.

  • Shouldn’t there be some consumption of FOOD and GOLD that leads to firms going bankrupt if they don’t have minimum levels? Should there be some mechanism for new entrants to appear? If we’re looking at an area where there is innovation, these factors seem important.

I keep reminding myself that examining what happens when these tweaks are added is part of the experimentation that the model enables. The main goal at this stage is just to replicate Wilhite.

The one piece where I will need to make a decision is how to fit DATA’s role into the Cobb-Douglas function to represent the value of information or knowledge. I’d value some guidance on this. The two options that I can think of are:

  • DATA is A. A is supposed to be “Total factor productivity”, representative of technological growth and efficiency.

  • DATA adjusts α and/or β. These are supposed to be the “output elasticities” of labour and capital, i.e. increasing the utility derived from the same amount of FOOD and/or GOLD. This could be rationalised as data providing the ability to get more output from the same inputs.

Using DATA in either of these ways will change the way in which utility is measured, but the two options scale in different ways. I’m inclined to use the simplest (DATA is A) as a starting point.
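A small illustrative sketch (Python, not part of the model; the parameter values are made up) shows the difference: DATA as A scales every Firm's utility by the same factor, while DATA in the exponents gives a bigger boost to Firms with bigger stocks.

```python
def utility(food, gold, A=1.0, beta=1.0, alpha=1.0):
    # General Cobb-Douglas form: Y = A * food^beta * gold^alpha
    return A * food**beta * gold**alpha

base_small = utility(10, 10)        # 100
base_large = utility(100, 100)      # 10000

# Option 1: DATA as A -- every Firm's utility scales by the same factor
assert utility(10, 10, A=2) / base_small == utility(100, 100, A=2) / base_large == 2.0

# Option 2: DATA raises beta -- the boost grows with the size of the input stock
small_boost = utility(10, 10, beta=1.2) / base_small     # 10**0.2, about 1.58
large_boost = utility(100, 100, beta=1.2) / base_large   # 100**0.2, about 2.51
assert large_boost > small_boost
```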

Trying it out

If you want to try out where I’ve got to so far, I’ve committed the source code to GitHub and there are instructions there about how to run it. Any comments/PRs will be gratefully received.