How to solve: Scrapy - SQLAlchemy foreign keys not created in SQLite
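I am trying to run Scrapy with an ItemLoader to collect all the data and store it in SQLite3. I can collect all the information I want, but I cannot get the foreign keys generated in my ThreadInfo and PostInfo tables using back_populates (with a foreign key). I also tried backref, but that did not work either.
After my Scrapy run finishes, all the other information is inserted into the SQLite database.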
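My goal is to have four tables, boardInfo, threadInfo, postInfo, and authorInfo, linked to each other.
- boardInfo has a one-to-many relationship with threadInfo
- threadInfo has a one-to-many relationship with postInfo
- authorInfo has a one-to-many relationship with both threadInfo and postInfo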
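I used DB Browser for SQLite and found that the values of my foreign keys are Null.
I tried querying the value (threadInfo.boardInfos_id), and it displayed None. I have been trying to fix this for many days and have read through the documentation, but I cannot solve the problem.
How can I get the foreign keys generated in my threadInfo and postInfo tables?
Thank you for all your guidance and comments.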
Here is my models.py (the pipelines.py listing is not included here):
from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
from sqlalchemy import Integer, String, Date, DateTime, Float, Boolean, Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
from scrapy.utils.project import get_project_settings

Base = declarative_base()


def db_connect():
    """
    Performs database connection using database settings from settings.py.
    Returns sqlalchemy engine instance
    """
    return create_engine(get_project_settings().get('CONNECTION_STRING'))


def create_table(engine):
    Base.metadata.create_all(engine)


class BoardInfo(Base):
    __tablename__ = 'boardInfos'

    id = Column(Integer, primary_key=True)
    boardName = Column('boardName', String(100))
    threadInfosLink = relationship('ThreadInfo', back_populates='boardInfosLink')  # One-to-Many with threadInfo


class ThreadInfo(Base):
    __tablename__ = 'threadInfos'

    id = Column(Integer, primary_key=True)
    threadTitle = Column('threadTitle', String())
    threadLink = Column('threadLink', String())
    threadAuthor = Column('threadAuthor', String())
    threadPost = Column('threadPost', Text())
    replyCount = Column('replyCount', Integer)
    readCount = Column('readCount', Integer)
    boardInfos_id = Column(Integer, ForeignKey('boardInfos.id'))  # Many-to-One with boardInfo
    boardInfosLink = relationship('BoardInfo', back_populates='threadInfosLink')  # Many-to-One with boardInfo
    postInfosLink = relationship('PostInfo', back_populates='threadInfosLink')  # One-to-Many with postInfo
    authorInfos_id = Column(Integer, ForeignKey('authorInfos.id'))  # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo', back_populates='threadInfosLink')  # Many-to-One with authorInfo


class PostInfo(Base):
    __tablename__ = 'postInfos'

    id = Column(Integer, primary_key=True)
    postOrder = Column('postOrder', Integer, nullable=True)
    postAuthor = Column('postAuthor', Text(), nullable=True)
    postContent = Column('postContent', Text(), nullable=True)
    postTimestamp = Column('postTimestamp', Text(), nullable=True)
    threadInfos_id = Column(Integer, ForeignKey('threadInfos.id'))  # Many-to-One with threadInfo
    threadInfosLink = relationship('ThreadInfo', back_populates='postInfosLink')  # Many-to-One with threadInfo
    authorInfos_id = Column(Integer, ForeignKey('authorInfos.id'))  # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo', back_populates='postInfosLink')  # Many-to-One with authorInfo


class AuthorInfo(Base):
    __tablename__ = 'authorInfos'

    id = Column(Integer, primary_key=True)
    threadAuthor = Column('threadAuthor', String())
    postInfosLink = relationship('PostInfo', back_populates='authorInfosLink')  # One-to-Many with postInfo
    threadInfosLink = relationship('ThreadInfo', back_populates='authorInfosLink')  # One-to-Many with threadInfo
Solution
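From the code I can see, it does not look like you are setting ThreadInfo.authorInfosLink or ThreadInfo.authorInfos_id anywhere (the same goes for all of your FKs/relationships).
To attach related objects to a ThreadInfo instance, you need to create them first and then attach them, like this: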
# Input info to authorInfo
authorInfo = AuthorInfo()
authorInfo.threadAuthor = item['threadAuthor']
threadInfo.authorInfosLink = authorInfo
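As a minimal sketch of why the column stays empty, the example below saves a thread twice against an in-memory SQLite database, once without the link and once with it. The two models are trimmed-down copies of the ones in the question; save_thread, the link_author flag, and the sample item are made up for illustration, and importing declarative_base from sqlalchemy.orm assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class AuthorInfo(Base):
    __tablename__ = "authorInfos"
    id = Column(Integer, primary_key=True)
    threadAuthor = Column(String)
    threadInfosLink = relationship("ThreadInfo", back_populates="authorInfosLink")


class ThreadInfo(Base):
    __tablename__ = "threadInfos"
    id = Column(Integer, primary_key=True)
    threadTitle = Column(String)
    authorInfos_id = Column(Integer, ForeignKey("authorInfos.id"))
    authorInfosLink = relationship("AuthorInfo", back_populates="threadInfosLink")


def save_thread(session, item, link_author):
    """Hypothetical helper: insert one thread, linking the author only on request."""
    threadInfo = ThreadInfo(threadTitle=item["threadTitle"])
    authorInfo = AuthorInfo(threadAuthor=item["threadAuthor"])
    if link_author:
        # This assignment is what makes SQLAlchemy fill in authorInfos_id.
        threadInfo.authorInfosLink = authorInfo
    session.add_all([threadInfo, authorInfo])
    session.commit()
    return threadInfo.authorInfos_id


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
item = {"threadTitle": "Hello", "threadAuthor": "alice"}  # made-up sample data

with Session(engine) as session:
    unlinked_fk = save_thread(session, item, link_author=False)
    linked_fk = save_thread(session, item, link_author=True)

print(unlinked_fk)  # None: both rows were saved, but nothing set the FK
print(linked_fk)    # id of the author row that was attached to the thread
```

Both calls insert an author row and a thread row; only the second one connects them, which matches the symptom of fully populated tables with empty foreign key columns.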
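If all of these objects are related through FKs, you probably do not want to session.add() each of them individually. Instead, you will want to:
- instantiate a BoardInfo object bi
- then instantiate the related ThreadInfo object ti
- attach your related object, e.g. bi.threadInfosLink.append(ti) (threadInfosLink is the one-to-many side, so it is a collection)
- at the end of all your chained relationships, simply add bi to the session using session.add(bi); all the related objects will be added through their relationships, and the FKs will be correct.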
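The steps above might look roughly like this inside a pipeline. This is only a sketch: the original pipelines.py is not shown, so the class name SaveForumPipeline, the connection_string argument, the item's posts field, and the trimmed-down models are all assumptions, and an in-memory SQLite database stands in for the real one (SQLAlchemy 1.4 or newer is assumed).

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class BoardInfo(Base):
    __tablename__ = "boardInfos"
    id = Column(Integer, primary_key=True)
    boardName = Column(String(100))
    threadInfosLink = relationship("ThreadInfo", back_populates="boardInfosLink")


class ThreadInfo(Base):
    __tablename__ = "threadInfos"
    id = Column(Integer, primary_key=True)
    threadTitle = Column(String)
    boardInfos_id = Column(Integer, ForeignKey("boardInfos.id"))
    boardInfosLink = relationship("BoardInfo", back_populates="threadInfosLink")
    postInfosLink = relationship("PostInfo", back_populates="threadInfosLink")


class PostInfo(Base):
    __tablename__ = "postInfos"
    id = Column(Integer, primary_key=True)
    postOrder = Column(Integer)
    postContent = Column(Text)
    threadInfos_id = Column(Integer, ForeignKey("threadInfos.id"))
    threadInfosLink = relationship("ThreadInfo", back_populates="postInfosLink")


class SaveForumPipeline:
    """Hypothetical pipeline; Scrapy calls process_item for every scraped item."""

    def __init__(self, connection_string="sqlite:///:memory:"):
        engine = create_engine(connection_string)
        Base.metadata.create_all(engine)
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        session = self.Session()
        try:
            bi = BoardInfo(boardName=item["boardName"])
            ti = ThreadInfo(threadTitle=item["threadTitle"])
            bi.threadInfosLink.append(ti)  # one-to-many side: append
            for order, content in enumerate(item["posts"], start=1):
                ti.postInfosLink.append(PostInfo(postOrder=order, postContent=content))
            # Adding the top-level object is enough: the default save-update
            # cascade brings ti and its posts into the session as well.
            session.add(bi)
            session.commit()
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()
        return item


# Usage with a made-up item; a plain dict stands in for a Scrapy item.
pipeline = SaveForumPipeline()
pipeline.process_item(
    {"boardName": "General", "threadTitle": "Hello", "posts": ["first", "second"]},
    spider=None,
)

session = pipeline.Session()
rows = [
    (p.postOrder, p.threadInfos_id, p.threadInfosLink.boardInfos_id)
    for p in session.query(PostInfo).order_by(PostInfo.postOrder)
]
session.close()
print(rows)  # every post carries a thread FK, and the thread carries a board FK
```

In a real crawl the same board appears in many items, so the pipeline would probably look up an existing BoardInfo row before creating a new one.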
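Following the discussion in my comments on the other answer, below is how I would rationalise your models so that they make more sense to me.
Notes:
- I have removed the unnecessary "Info" everywhere
- I have removed the explicit column names from the model definitions and will instead rely on SQLAlchemy's ability to infer those column names from my attribute names
- In the Post object I have not named the attribute PostContent. It is already implied that the content belongs to the Post, because that is how we access it, so the attribute is simply called content
- I have removed all the "Link" terminology. Where I think you want to reference a collection of related objects, I have given the relationship a plural attribute name.
- I have left a line in the Post model for you to remove. As you can see, you do not need "author" twice, once as a related object and once as a column on the Post; that defeats the purpose of FKs.
With these changes, when you use these models from your other code, it becomes obvious where you need .append() and where you simply assign the related object. For a given Board object, you know from the attribute name alone that threads is a collection, so you would do something like b.threads.append(thread).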
from sqlalchemy import create_engine, Column, Table, ForeignKey, MetaData
from sqlalchemy import Integer, String, Date, DateTime, Float, Boolean, Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Board(Base):
    __tablename__ = 'board'

    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    threads = relationship('Thread', back_populates='board')  # collection of Thread objects


class Thread(Base):
    __tablename__ = 'thread'

    id = Column(Integer, primary_key=True)
    title = Column(String())
    link = Column(String())
    author = Column(String())  # remove this line too; the relationship below replaces it
    post = Column(Text())
    reply_count = Column(Integer)
    read_count = Column(Integer)
    # ForeignKey takes '<table name>.<column>', so it must match __tablename__
    board_id = Column(Integer, ForeignKey('board.id'))
    board = relationship('Board', back_populates='threads')
    posts = relationship('Post', back_populates='thread')  # collection of Post objects
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='threads')


class Post(Base):
    __tablename__ = 'post'

    id = Column(Integer, primary_key=True)
    order = Column(Integer, nullable=True)
    author = Column(Text(), nullable=True)  # remove this line and instead use the relationship below
    content = Column(Text(), nullable=True)
    timestamp = Column(Text(), nullable=True)
    thread_id = Column(Integer, ForeignKey('thread.id'))
    thread = relationship('Thread', back_populates='posts')
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='posts')


class Author(Base):
    __tablename__ = 'author'

    id = Column(Integer, primary_key=True)
    name = Column(String())
    posts = relationship('Post', back_populates='author')  # collection of Post objects
    threads = relationship('Thread', back_populates='author')  # collection of Thread objects
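To make the append-versus-assign distinction concrete, here is a small sketch that uses a cut-down version of the models above. Only a few columns are kept, the sample values are invented, and it assumes SQLAlchemy 1.4 or newer with an in-memory SQLite database.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Board(Base):
    __tablename__ = "board"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    threads = relationship("Thread", back_populates="board")


class Author(Base):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    threads = relationship("Thread", back_populates="author")
    posts = relationship("Post", back_populates="author")


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    board_id = Column(Integer, ForeignKey("board.id"))
    board = relationship("Board", back_populates="threads")
    author_id = Column(Integer, ForeignKey("author.id"))
    author = relationship("Author", back_populates="threads")
    posts = relationship("Post", back_populates="thread")


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    thread_id = Column(Integer, ForeignKey("thread.id"))
    thread = relationship("Thread", back_populates="posts")
    author_id = Column(Integer, ForeignKey("author.id"))
    author = relationship("Author", back_populates="posts")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    b = Board(name="General")
    a = Author(name="alice")
    thread = Thread(title="Hello")
    thread.author = a            # singular name, many-to-one side: assign
    b.threads.append(thread)     # plural name, one-to-many side: append
    post = Post(content="first reply", author=a)
    thread.posts.append(post)
    session.add(b)               # thread, post and author follow via cascade
    session.commit()
    # Read the FK columns while the session is still open.
    fks = (thread.board_id, thread.author_id, post.thread_id, post.author_id)

print(fks)  # none of the four foreign keys is None
```

Because each relationship uses back_populates on both sides, setting one side is enough: after thread.author = a, the thread also shows up in a.threads.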