Skip to content

Latest commit

 

History

History
316 lines (229 loc) · 26.1 KB

File metadata and controls

316 lines (229 loc) · 26.1 KB
title
SQLAlchemy / ORM (Object Relational Mapper)

SQLAlchemy / ORM (Object Relational Mapper)

  • Object Relational Tutorial — SQLAlchemy 1.2 Documentation #ril
    • SQLAlchemy ORM (Object Relational Mapper) 提出一種將 user-defined Python class 跟 database table 聯結 (associate) 起來的方法,也就是每個 instance 都對應到個別的 table row。有個 unit of work 的機制可以自動同步 object 與 row 的改變 (transparently),也可以用 class 及宣告的 relationship 來表達 database query。

    • ORM 是基於 SQLAlchemy 的 SQL Expression Language (用 Python constructs 間接執行 SQL statements),一個 application 完全只用 ORM 是沒有問題的,極少數情況下才需要直接使用 SQL Expression Language。

    • 生成 mapped class instance 只是第一步,要寫進 database 得透過 Session.add() 加進 session 才行,下次 flush 時就會被寫進資料庫。

      >>> ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
      >>> session.add(ed_user)
      
    • 加進 session 但還沒寫入資料庫前,這個 instance 處於 pending state (未加入 session 前是 transient state),因為 SQL statement 尚未執行,而且在 database 裡也還沒有對應的 row (還沒有 PK)。Session 會在需要時執行 SQL 寫入資料庫,這個動作稱做 flush,例如在 query 前 - 因為 query 發生在 database 端,異動先進 database (但還沒 commit),查詢的結果才能反應最新的異動。

      >>> our_user = session.query(User).filter_by(name='ed').first()
      >>> our_user
      <User(name='ed', fullname='Ed Jones', password='edspassword')>
      
    • 上面 query 的動作會先 flush 再執行 query。這裡傳回的 our_user 會等同於 ed_user (assert our_user is ed_user),那是因為 session 會維護一個 identity map,在同一個 session 裡,在資料庫裡具有特定 primary key 的 row,只會對應到 session 裡的一個 Python object。

Mapping, Declarative, Metadata, Instrumentation ??

  • Declare a Mapping - Object Relational Tutorial — SQLAlchemy 1.2 Documentation #ril

    • When using the ORM, the configurational process starts by describing the database tables we’ll be dealing with, and then by defining our own classes which will be mapped to those tables. In modern SQLAlchemy, these two tasks are usually PERFORMED TOGETHER, using a system known as DECLARATIVE, which allows us to create classes that INCLUDE DIRECTIVES TO DESCRIBE THE ACTUAL DATABASE TABLE they will be mapped to.

      這段話很重要,但要先走過 SQLAlchemy Core 才知道單純的宣告 metadata 是怎麼回事,所以 declarative = mapper(class + metadata),在 class 裡直接提供生成 metadata 需要的素材 (例如 __tablename__relationship() 等),這中間的魔法來自 declarative base class 的 metaclass 機制。

    • Declarative system 需要先定義 mapping (實質上就是 mapped class),裡面主要是 table name、column name/datatype 與 database table 的對應關係。這些 class 要繼承一個 declarative base class (用 sqlalchemy.ext.declarative.declarative_base() 產生),內部會維護 class/table 的清單 (catalog),通常一個 app 只會有一個 base class。

    • Mapped class 至少要宣告 __tablename__ 跟一個 Column 做為 PK,因為 SQLAlchemy 不對 table name、data type、constraints 等不做任何假設;不過自訂 custom base class 或 mixin 是被鼓勵的 #ril

    • User 對應 users table 為例,有 id (PK)、namefullnamepassword 4 個欄位;另外 __repr__() 不一定要實作,但印到 logs 時好像很方便?

      from sqlalchemy.ext.declarative import declarative_base
      from sqlalchemy import Column, Integer, String
      
      Base = declarative_base()
      
      class User(Base):
        __tablename__ = 'users'
      
        id = Column(Integer, primary_key=True)
        name = Column(String)
        fullname = Column(String)
        password = Column(String)
      
        def __repr__(self):
          return "<User(name='%s', fullname='%s', password='%s')>" % (
                 self.name, self.fullname, self.password)
      
    • 當 class 建立時,declarative system 透過 metaclass (來自 declarative base class) 將所有的 Column 置換成 descriptor (對 attribute access 動手腳)、並自動產生 table metadata - Table (可以從 __table__ attribute 取得)、透過 Mapper 建立兩邊的聯結,形成 mapped class,這個過程稱做 instrumentation,可以同步 class 與 database 兩邊的資料;這要先看過 SQL Expression Language 前面才會懂。

      >>> User.__table__
      Table('users', MetaData(bind=None),
                  Column('id', Integer(), table=<users>, primary_key=True, nullable=False),
                  Column('name', String(), table=<users>),
                  Column('fullname', String(), table=<users>),
                  Column('password', String(), table=<users>), schema=None)
      
    • Mapped class 跟其他 class 沒什麼不同,可以宣告其他的 attribute、method 等。

    • 雖然建議使用 declarative system,但要使用 SQLAlchemy ORM 也可以走 classical mapping - 任何 Python class 都可以跟透過 mapper() 產生 Mapper,以建立 class 與 Table 的聯結。

    • Table 只是 MetaData 的一員,可以從 declarative base class 的 metadata attribute 取得,可以把它想成是整個 schema,底下有許多 Table。可以想見,透過 MetaData.create_all() (搭配 Engine 做為 database connectivity) 就可以建立資料庫 - 把不存在的 table 建立起來。

      >>> Base.metadata.create_all(engine) # 這時才會用到 engine 跟資料庫溝通
      SELECT ...
      PRAGMA table_info("users")
      ()
      CREATE TABLE users (
          id INTEGER NOT NULL, name VARCHAR,
          fullname VARCHAR,
          password VARCHAR,
          PRIMARY KEY (id)
      )
      ()
      COMMIT
      
    • Minimal Table Descriptions vs. Full Descriptions 提到 "The length field on String, as well as similar precision/scale fields available on Integer, Numeric, etc. are not referenced by SQLAlchemy other than when creating tables.",也就是說透過 SQLAlchemy 來建立資料結構時這些 constraint 才會作用,跟 validation 無關。

    • 要新增一個 instance/row 時,首先要生成一個 mapped class 的 instance:

      >>> ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')
      >>> ed_user.name
      'ed'
      >>> ed_user.password
      'edspassword'
      >>> str(ed_user.id)
      'None'
      
    • 顯然 instrumentation 還提供了 __init__() constructor,可以接受以 column 為名的 keyword arguments,除非有自訂 __init__()

  • instrumentation - Glossary — SQLAlchemy 1.2 Documentation

    • Instrumentation refers to the process of augmenting the FUNCTIONALITY and ATTRIBUTE SET of a particular class. Ideally, the behavior of the class should REMAIN CLOSE TO A REGULAR CLASS, except that additional behaviors and features are made available.
    • The SQLAlchemy MAPPING PROCESS, among other things, adds DATABASE-ENABLED descriptors to a mapped class which each represent a particular database column or relationship to a related class.

Session ??

  • SQL Expression Language Tutorial — SQLAlchemy 1.2 Documentation 這裡完全沒提到 session,是 ORM 才有的概念。

  • Creating a Session - Object Relational Tutorial — SQLAlchemy 1.2 Documentation #ril

    • ORM 跟 database 對話的接口 (handle) 是 Session,一個 session 是一個 workspace,對應一個 database connection (來自 engine 的 connection pool)。首先得用 sqlalchemy.orm.sessionmaker() 產生一個 Session 的 factory (可以設定 auto flush、auto commit 等),有趣的是這個 factory 也習慣命名為 Session (callable),然後呼叫它以產生一個 Session instance ... 這聽起來有點繞口? This custom-made Session class will create new Session objects which are bound to our database.

      from sqlalchemy.orm import sessionmaker
      Session = sessionmaker(bind=engine)
      # Session.configure(bind=engine) # 事後綁定
      
      session = Session()
      
    • Engine 一樣,雖然 Session() 生成了 Session instance,但還沒有建立連線,開始使用時才會從 engine 的 connection pool 取用一個 connection,直到 commit 或關閉 session 時。

    • Session Lifecycle Patterns 提到 Session instance 的產生時機會因應用類型而異,強調 session 是個 workspace - local to a particular database connection。

    • Session Lifecycle Patterns 強調 "the Session is just a workspace for your objects, local to a particular database connection",workspace 這說法類似於 Git 的 working tree,有助於理解 transient、pending、dirty、new 這些說法 #ril

  • Using the Session — SQLAlchemy 1.2 Documentation #ril

  • Session Basics — SQLAlchemy 1.2 Documentation #ril

  • State Management — SQLAlchemy 1.2 Documentation object Mapped class 的 instance 在 session 看來有 transient、pending、persistent、deleted 四種狀態 #ril

  • Session API — SQLAlchemy 1.2 Documentation #ril

Relationship ??

日前重新看過 SQLAlchemy Core & ORM 的文件,覺得 SQLAlchemy 的 metadata 主要是跟 SQL DDL (CREATE XXX) 有關,跟 SQL DML (SELECT, UPDATE, DELETE 等) 應該沒有關係才對。

這樣的想法透過下面的實驗得到證實;注意 ForeignKey()relationship() 都是一般的用法:

class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship('Address', order_by='Address.id', back_populates='user')

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    email_address = Column(String, nullable=False)
    _user_id = Column('user_id', Integer, ForeignKey('user.id'))
    user = relationship('User', back_populates='addresses')

def test_orm_without_fk_constraints__mysql(mysql_engine):
    conn = mysql_engine.connect()
    trans = conn.begin()
    try:
        # 刻意用單純的 SQL 來建立 table (而非 Metadata.create_all()),確認 DB 裡沒有 FK constraints
        conn.execute("CREATE TABLE user (id INT NOT NULL AUTO_INCREMENT, name CHAR(20) NOT NULL, PRIMARY KEY (id));")
        conn.execute("CREATE TABLE address (id INT NOT NULL AUTO_INCREMENT, user_id INT NOT NULL, email_address VARCHAR(50), PRIMARY KEY (id));")
        conn.execute("INSERT INTO user (id, name) VALUES (1, 'Jeremy');")
        conn.execute("INSERT INTO address (id, user_id, email_address) VALUES (1, 1, 'imsardine@gmail.com'), (2, 1, 'jeremykao@kkbox.com')")
        trans.commit()
    except:
        trans.rollback()
        raise

    # Start a new transaction implicitly
    Session = sessionmaker(bind=mysql_engine)
    session = Session()

    try:
        # Test ORM querying 讀取沒問題
        jeremy = session.query(User).filter_by(id=1).one()
        assert jeremy.name == 'Jeremy'
        assert [addr.email_address for addr in jeremy.addresses] == \
                ['imsardine@gmail.com', 'jeremykao@kkbox.com']

        judy = User(name='Judy')
        judy.addresses.append(Address(email_address='imjudykao@gmail.com'))
        session.add(judy)
        session.commit()

        # Test ORM writing 寫入也沒問題
        judy = session.query(User).filter_by(name='Judy').one()
        assert [addr.email_address for addr in judy.addresses] == ['imjudykao@gmail.com']
        assert judy.addresses[0].user is judy # backref
    finally:
        session.close()

過去我們對 "MySQL Cluster 不支援 FK constraints" 這句話可能有誤解,所以一開始就把 _user_id = Column('user_id', Integer, ForeignKey('user.id')) 中的 ForeignKey construct 拿掉,但根據 API 文件的說法,那是 relationship() 推導 table 間關聯方式的根據,結果就變成要手動在 relationship() 裡明確給定 foreign_keysprimaryjoin 等。

Building a Relationship - Object Relational Tutorial — SQLAlchemy 1.2 Documentation

... uses the foreign key relationships between the two tables to determine the nature of this linkage

這裡的 foreign key relationships 指的就是 ForeignKey()ForeignKeyConstraint()

foreign_keys - Relationships API — SQLAlchemy 1.2 Documentation

In normal cases, the foreign_keys parameter is not required. relationship() will automatically determine which columns in the primaryjoin conditition are to be considered “foreign key” columns based on those Column objects that specify ForeignKey, or are otherwise listed as referencing columns in a ForeignKeyConstraint construct. foreign_keys is only needed when:

很明顯的,我們一開始誤將 ForeignKey() construct 拿掉,因此要自訂 foreign_keys -- 但這通常都不用給,SQLAlchemy 會自己推算,跟 DB backend 支不支援 FK constraints 也沒有關係。


參考資料:

Relationship > One To Many, Many To One, One To One ??

Relationship > Many To Many ??

FK 需要同時揭露 ID 與 relationship 兩個 field 嗎?

  • 若同時揭露 ID 與 relationship,要存取 ID 是很方便,但更新 relationship 時 ID 會變嗎? 或者更新 ID 時 relationship 會變嗎?

如何不用 Foreign Key 宣告 Relationship ??

NoForeignKeysError ??

Validation ??

  • Simple Validators - Changing Attribute Behavior — SQLAlchemy 1.3 Documentation #ril
    • A quick way to add a “validation” routine to an attribute is to use the validates() decorator. An attribute validator can raise an exception, halting the process of mutating the attribute’s value, or can change the given value into something different. (像是 filtering)

    • Validators, like all attribute extensions, are only called by normal USERLAND CODE; they are not issued when the ORM is populating the object: 這表示從資料庫載入資料時不會經過這一層,不會有多餘的運算。

      from sqlalchemy.orm import validates
      
      class EmailAddress(Base):
          __tablename__ = 'address'
      
          id = Column(Integer, primary_key=True)
          email = Column(String)
      
          @validates('email')
          def validate_email(self, key, address):
              assert '@' in address
              return address
      

多個 Declarative Base 間會互相干擾??

在生成第一個 mapping class 的 instance 時,觸發了下面的初始化:

# Initialize the inter-mapper relationships of all mappers that have been constructed thus far.
sqlalchemy.orm.mapper.configure_mappers()
sqlalchemy.orm.relationships._determine_joins()

問題或許就在 "inter-mapper relationships of all mappers",會引發下面的錯誤:

NoForeignKeysError: Could not determine join condition between parent/child tables on relationship Address.user - there are no foreign keys linking these tables.  Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.

InvalidRequestError: One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'Mapper|Address|address'. Original exception was: Could not determine join condition between parent/child tables on relationship Address.user - there are no foreign keys linking these tables.  Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or specify a 'primaryjoin' expression.

其中 "One or more mappers failed to initialize" 看起來就滿合理的。

參考資料:

Querying ??

參考資料 {: #reference }

文件: