Importing CSV lines into a class

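This is an annoyingly simple question, to the point that I'm almost embarrassed to ask it. I've been banging my head against the wall for the past two days.

I'm trying to work with an object-oriented program that takes the rows of a CSV and turns each row into variables I can use later. I want to somehow (I don't know how) put each row of that CSV into a class. I know this may not even be the best approach, but for other reasons I have to solve the problem this way.

I don't know enough Python to even know how to search for a solution to this, and I need to know how to do it for the project I'm working on.

Here is the code I'm basing this on: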

import argparse
from collections import defaultdict
import csv


class Actor(object):
    """An actor with bounded rationality.

    The methods on this class such as u_success, u_failure, eu_challenge are
    meant to be calculated from the actor's perspective, which in practice
    means that the actor's risk aversion is always used, including to calculate
    utilities for other actors.

    I don't understand why an actor would assume that other actors share the
    same risk aversion, or how this implies that it is from the given actor's
    point of view, but as far as I can tell this is faithful to BDM's original
    formulation as well as Scholz's replication.
    """
    def __init__(self, name, c, s, x, model, r=1.0):
        self.name = name
        self.c = c  # capabilities, float between 0 and 1
        self.s = s  # salience, float between 0 and 1
        self.x = x  # number representing position on an issue
        self.model = model
        self.r = r  # risk aversion, float between .5 and 2

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return '%s(x=%s,c=%s,s=%s,r=%.2f)' % (
            self.name, self.x, self.c, self.s, self.r)

    def compare(self, x_j, x_k, risk=None):
        """Difference in utility to `self` between positions x_j and x_k."""
        risk = risk or self.r

        position_range = self.model.position_range
        x_k_distance = (abs(self.x - x_k) / position_range) ** risk
        x_j_distance = (abs(self.x - x_j) / position_range) ** risk
        return self.c * self.s * (x_k_distance - x_j_distance)

    def u_success(self, actor, x_j):
        """Utility to `actor` successfully challenging position x_j."""
        position_range = self.model.position_range
        val = 0.5 - 0.5 * abs(actor.x - x_j) / position_range
        return 2 - 4 * val ** self.r

    def u_failure(self, actor, x_j):
        """Utility to `actor` of failing in challenge position x_j."""
        position_range = self.model.position_range
        val = 0.5 + 0.5 * abs(actor.x - x_j) / position_range
        return 2 - 4 * val ** self.r

    def u_status_quo(self):
        """Utility to `self` of the status quo."""
        return 2 - 4 * (0.5 ** self.r)

    def eu_challenge(self, actor_i, actor_j):
        """Expected utility to `actor_i` of `actor_i` challenging `actor_j`.

        This is calculated from the perspective of actor `self`, which in
        practice means that `self.r` is used for risk aversion.
        """
        prob_success = self.model.probability(actor_i.x, actor_j.x)
        u_success = self.u_success(actor_i, actor_j.x)
        u_failure = self.u_failure(actor_i, actor_j.x)
        u_status_quo = self.u_status_quo()

        eu_resist = actor_j.s * (
            prob_success * u_success + (1 - prob_success) * u_failure)
        eu_not_resist = (1 - actor_j.s) * u_success
        eu_status_quo = self.model.q * u_status_quo

        return eu_resist + eu_not_resist - eu_status_quo

    def danger_level(self):
        """The amount of danger the actor is in from holding its policy position.

        The smaller this number is, the more secure the actor is, in that it
        expects fewer challenges to its position from other actors.
        """
        return sum(self.eu_challenge(other_actor, self) for other_actor
                   in self.model.actors if other_actor != self)

    def risk_acceptance(self):
        """Actor's risk acceptance, based on its current policy position.

        I have two comments:
        - It seems to me that BDM's intent was that in order to calculate
          risk acceptance, one would need to compare an actor's danger level
          across different policy positions that the actor could hold. Instead,
          Scholz compares the actor's danger level to the danger level of all
          other actors. This comparison doesn't seem relevant, given that other
          actors will have danger levels not possible for the given actor
          because of differences in salience and capability.
        - Even (what I assume to be) BDM's original intention is an odd way to
          calculate risk acceptance, given that the actor's policy position may
          have been coerced, rather than having been chosen by the actor based
          on its security preferences.
        """

        # Alternative calculation, which I think is more faithful to
        # BDM's original intent.

        # orig_position = self.x
        # possible_dangers = []
        # for position in self.model.positions():
        #     self.x = position
        #     possible_dangers.append(self.danger_level())
        # self.x = orig_position

        # max_danger = max(possible_dangers)
        # min_danger = min(possible_dangers)

        # return ((2 * self.danger_level() - max_danger - min_danger) /
        #         (max_danger - min_danger))

        danger_levels = [actor.danger_level() for actor in self.model.actors]
        max_danger = max(danger_levels)
        min_danger = min(danger_levels)
        return ((2 * self.danger_level() - max_danger - min_danger) /
                (max_danger - min_danger))

    def risk_aversion(self):
        risk = self.risk_acceptance()
        return (1 - risk / 3.0) / (1 + risk / 3.0)

    def best_offer(self):
        offers = defaultdict(list)

        for other_actor in self.model.actors:
            if self.x == other_actor.x:
                continue

            offer = Offer.from_actors(self, other_actor)
            if offer:
                offers[offer.offer_type].append(offer)

        best_offer = None
        best_offer_key = lambda offer: abs(self.x - offer.position)

        # This is faithful to Scholz' original code, but it appears to be a
        # mistake, since Scholz' paper and BDM clearly state that each actor
        # chooses the offer that requires him to change position the
        # least. Instead, Scholz included a special case for compromises which
        # results in some bizarre behavior, particularly in Round 4 when
        # Belgium compromises with Netherlands to an extreme position rather
        # than with France.
        def compromise_best_offer_key(offer):
            top = (abs(offer.eu) * offer.actor.x +
                   abs(offer.other_eu) * offer.other_actor.x)
            return top / (abs(offer.eu) + abs(offer.other_eu))

        if offers['confrontation']:
            best_offer = min(offers['confrontation'], key=best_offer_key)
        elif offers['compromise']:
            best_offer = min(offers['compromise'],
                             key=compromise_best_offer_key)
        elif offers['capitulation']:
            best_offer = min(offers['capitulation'], key=best_offer_key)

        return best_offer


class Offer(object):
    CONFRONTATION = 'confrontation'
    COMPROMISE = 'compromise'
    CAPITULATION = 'capitulation'
    OFFER_TYPES = (
        CONFRONTATION,
        COMPROMISE,
        CAPITULATION,
    )

    def __init__(self, actor, other_actor, offer_type, eu, other_eu, position):
        if offer_type not in self.OFFER_TYPES:
            raise ValueError('offer_type "%s" not in %s'
                             % (offer_type, self.OFFER_TYPES))

        self.actor = actor  # actor receiving the offer
        self.other_actor = other_actor  # actor proposing the offer
        self.offer_type = offer_type
        self.eu = eu
        self.other_eu = other_eu
        self.position = position

    @classmethod
    def from_actors(cls, actor, other_actor):
        eu_ij = actor.eu_challenge(actor, other_actor)
        eu_ji = actor.eu_challenge(other_actor, actor)

        if eu_ji > eu_ij > 0:
            offer_type = cls.CONFRONTATION
            position = other_actor.x
        elif eu_ji > 0 > eu_ij and eu_ji > abs(eu_ij):
            offer_type = cls.COMPROMISE
            concession = (other_actor.x - actor.x) * abs(eu_ij / eu_ji)
            position = actor.x + concession
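        # NOTE: eu_ji > 0 here, so `eu_ji < abs(eu_ji)` can never be true and
        # this branch is unreachable as written; `abs(eu_ij)` may be intended.
        elif eu_ji > 0 > eu_ij and eu_ji < abs(eu_ji):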
            offer_type = cls.CAPITULATION
            position = other_actor.x
        else:
            return None

        return cls(actor, other_actor, offer_type, eu_ij, eu_ji, position)

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        type_to_fmt = {
            self.CONFRONTATION: '%s loses confrontation to %s',
            self.COMPROMISE: '%s compromises with %s',
            self.CAPITULATION: '%s capitulates to %s',
        }
        fmt = type_to_fmt[self.offer_type] + "\n\t%s vs %s\n\tnew_pos = %s"

        return fmt % (self.actor.name, self.other_actor.name, self.eu,
                      self.other_eu, self.position)


class BDMScholzModel(object):
    """An expected utility model for political forecasting."""

    def __init__(self, data, q=1.0):
        self.actors = [
            Actor(name=item['Actor'],
                  c=float(item['Capability']),
                  s=float(item['Salience']),
                  x=float(item['Position']),
                  model=self)
            for item in data]
        self.name_to_actor = {actor.name: actor for actor in self.actors}
        self.q = q
        positions = self.positions()
        self.position_range = max(positions) - min(positions)

    @classmethod
    def from_csv_path(cls, csv_path):
        return cls(csv.DictReader(open(csv_path, 'rU')))

    def actor_by_name(self, name):
        return self.name_to_actor.get(name)

    def __getitem__(self, key):
        return self.name_to_actor.get(key)

    def positions(self):
        return list({actor.x for actor in self.actors})

    def median_position(self):
        positions = self.positions()
        median = positions[0]
        for position in positions[1:]:
            votes = sum(actor.compare(position, median, risk=1.0)
                        for actor in self.actors)
            if votes > 0:
                median = position
        return median

    def mean_position(self):
        return (sum(actor.c * actor.s * actor.x for actor in self.actors) /
                sum(actor.c * actor.s for actor in self.actors))

    def probability(self, x_i, x_j):
        if x_i == x_j:
            return 0.0

        # `sum_all_votes` below is faithful to Scholz' code, but I think it is
        # quite contrary to BDM's intent. Instead, we should have:
        # denominator = sum(actor.compare(x_i, x_j) for actor in self.actors)

        # This would make sure that prob(x_i, x_j) + prob(x_j, x_i) == 1.
        # However, because of the odd way that salience values are used as
        # the probability that an actor will resist a proposal, this results in
        # the actors almost always confronting each other.

        # My theory is that Scholz got around the confrontation problem by
        # introducing this large denominator, causing extremely small
        # probability values. This prevents actors from confronting each other
        # constantly, but the result is comical, in that the challenging actor
        # always has a vanishingly small chance of winning a conflict, yet the
        # challenged actor often gives up without a fight because of low
        # salience.
        sum_all_votes = sum(abs(actor.compare(a1.x, a2.x))
                            for actor in self.actors
                            for a1 in self.actors
                            for a2 in self.actors)
        return (sum(max(0, actor.compare(x_i, x_j)) for actor in self.actors) /
                sum_all_votes)

    def update_risk_aversions(self):
        for actor in self.actors:
            actor.r = 1.0

        actor_to_risk_aversion = [(actor, actor.risk_aversion())
                                  for actor in self.actors]
        for actor, risk_aversion in actor_to_risk_aversion:
            actor.r = risk_aversion

    def update_positions(self):
        actor_to_best_offer = [(actor, actor.best_offer())
                               for actor in self.actors]
        for actor, best_offer in actor_to_best_offer:
            if best_offer:
                print best_offer
                actor.x = best_offer.position

    def run_model(self, num_rounds=1):
        print 'Median position: %s' % self.median_position()
        print 'Mean position: %s' % self.mean_position()

        for round_ in range(1, num_rounds + 1):
            print ''
            print 'ROUND %d' % round_
            self.update_risk_aversions()
            self.update_positions()

            print ''
            print 'Median position: %s' % self.median_position()
            print 'Mean position: %s' % self.mean_position()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'csv_path',
        help='path to csv with input data')
    parser.add_argument(
        'num_rounds',
        help='number of rounds of simulation to run',
        type=int)
    args = parser.parse_args()

    model = BDMScholzModel.from_csv_path(args.csv_path)
    model.run_model(num_rounds=args.num_rounds)

Yes, that is a lot of code, but by reading it and then running it, I can see what's going on.

You are probably getting this error:

% python2 so.py sample.csv 1
Traceback (most recent call last):
  File "so.py", line 336, in <module>
    model = BDMScholzModel.from_csv_path(args.csv_path)
  File "so.py", line 241, in from_csv_path
    return cls(csv.DictReader(open(csv_path, 'rU')))
  File "so.py", line 233, in __init__
    for item in data]
KeyError: 'Actor'

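You get this error because merely creating a DictReader doesn't actually read the data; reading is still a set of steps you have to perform explicitly. Here is the minimal DictReader example from the Python 2 documentation: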

import csv
with open('names.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_name'], row['last_name'])
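
To illustrate the point about reading being an explicit step: a DictReader is a lazy iterator, so rows are only produced while you loop over it, and it can only be consumed once. The following is a small sketch, not part of the original program. It relies on the fact that csv.DictReader accepts any iterable of lines, so a plain list of strings (with made-up names) stands in for an open file.

```python
import csv

# Illustrative data only; a list of strings stands in for an open file.
lines = ['first_name,last_name', 'Eric,Idle', 'John,Cleese']
reader = csv.DictReader(lines)

# Rows are produced only while the reader is being iterated.
first_pass = [row['first_name'] for row in reader]

# The reader is now exhausted, so a second loop yields nothing.
second_pass = [row['first_name'] for row in reader]

print(first_pass)
print(second_pass)
```

This is one reason it can be convenient to copy the rows into a list before handing them to other code: a list can be iterated as many times as needed.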

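In your case, you want to pass a list of dicts to your BDMScholzModel constructor, and have its __init__() method convert those individual dicts into Actors.

So your from_csv_path() class method needs to look more like that example, with the following changes:

  • Before creating the reader, create an empty list: data = []
  • In the row-in-reader loop, simply append each row to data: data.append(row) (DictReader takes care of the field/key names for you)
  • After the whole with-open block, finally call your BDMScholzModel initializer with your data: return cls(data)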
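
The idea of turning each row dict into an object can also be sketched in isolation, away from the full model. In the example below, SimpleActor is a hypothetical stand-in for the Actor class (it only stores the values), and the CSV lines are illustrative data supplied as a list of strings instead of a file.

```python
import csv


class SimpleActor(object):
    """Hypothetical stand-in for Actor: holds the values of one CSV row."""

    def __init__(self, name, c, s, x):
        self.name = name
        self.c = c
        self.s = s
        self.x = x

    def __repr__(self):
        return 'SimpleActor(%s, c=%s, s=%s, x=%s)' % (
            self.name, self.c, self.s, self.x)


# Illustrative data; csv.DictReader accepts any iterable of lines.
lines = [
    'Actor,Capability,Salience,Position',
    'foo,1,1,1',
    'bar,2,2,2',
]

# Build the list of dicts, one dict per CSV line after the header.
data = []
for row in csv.DictReader(lines):
    data.append(row)

# Convert each dict into an object. CSV values arrive as strings,
# so the numeric fields are converted explicitly.
actors = [
    SimpleActor(name=row['Actor'],
                c=float(row['Capability']),
                s=float(row['Salience']),
                x=float(row['Position']))
    for row in data
]

print(actors)
```

The list comprehension mirrors what BDMScholzModel.__init__() already does with its data argument; the only new part is building the list of dicts first.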

I made all of these changes to the method, then sketched out this sample CSV:

sample.csv

Actor,Capability,Salience,Position
foo,1,1,1
bar,2,2,2
baz,3,3,3

I also added a debug print statement just before the cls(data) call at the end of the new from_csv_path() class method:

print 'debug data: %s\n' % data
return cls(data)

And running:

python2 so.py sample.csv 1

gets me:

debug data: [
  {'Capability': '1', 'Position': '1', 'Salience': '1', 'Actor': 'foo'}, 
  {'Capability': '2', 'Position': '2', 'Salience': '2', 'Actor': 'bar'}, 
  {'Capability': '3', 'Position': '3', 'Salience': '3', 'Actor': 'baz'}
]

Median position: 3.0
Mean position: 2.57142857143

ROUND 1

Median position: 3.0
Mean position: 2.57142857143

Here is my complete from_csv_path() method:

@classmethod
def from_csv_path(cls, csv_path):
    data = []
    with open(csv_path) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            data.append(row)
    print 'debug data: %s\n' % data
    return cls(data)
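
If a KeyError such as 'Actor' still shows up, one possible cause is that the header row of the CSV does not contain the column names the model expects. The sketch below is an optional variation under that assumption, not the method used above. The names read_rows and REQUIRED_COLUMNS are hypothetical. It checks reader.fieldnames before reading and uses list(reader) as a shorter equivalent of the append loop.

```python
import csv

# Column names the model looks up in each row (taken from the code above).
REQUIRED_COLUMNS = ('Actor', 'Capability', 'Salience', 'Position')


def read_rows(csv_path):
    """Hypothetical helper: return the CSV rows as a list of dicts."""
    with open(csv_path) as csvfile:
        reader = csv.DictReader(csvfile)
        # fieldnames comes from the first line; it is None for an empty file.
        fieldnames = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS
                   if name not in fieldnames]
        if missing:
            raise ValueError(
                'CSV is missing columns: %s' % ', '.join(missing))
        # list() consumes the reader while the file is still open.
        return list(reader)
```

With a helper like this, from_csv_path() could be reduced to return cls(read_rows(csv_path)), and a wrong header would produce a message naming the missing columns instead of a bare KeyError.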