Scrapy: SQLAlchemy foreign keys not created in SQLite

How to fix: Scrapy does not create SQLAlchemy foreign keys in SQLite

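I am running Scrapy with an ItemLoader to collect data and store it in SQLite3. I can gather all the information I want, but I cannot get the foreign keys in the ThreadInfo and PostInfo tables populated through the relationships defined on the foreign key columns. I also tried back_populates, but that did not work either. After my Scrapy run finishes, all the other information is inserted into the SQLite database.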

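My goal is to link four tables to each other: boardInfo, threadInfo, postInfo and authorInfo.

  • boardInfo has a one-to-many relationship with threadInfo
  • threadInfo has a one-to-many relationship with postInfo
  • authorInfo has a one-to-many relationship with both threadInfo and postInfo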

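I used DB Browser for SQLite and found that the foreign key columns are empty. When I query one of the values (threadInfo.boardInfos_id), it shows Null. I have been trying to fix this for many days and have read the documentation carefully, but I cannot solve the problem.

How can I get the foreign keys generated in my threadInfo and postInfo tables?

Thank you for any guidance and comments.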

Here is my models.py:

from sqlalchemy import create_engine,Column,Table,ForeignKey,MetaData
from sqlalchemy import Integer,String,Date,DateTime,Float,Boolean,Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
from scrapy.utils.project import get_project_settings

Base = declarative_base()

def db_connect():
    '''
    Performs database connection using database settings from settings.py.
    Returns sqlalchemy engine instance
    '''
    return create_engine(get_project_settings().get('CONNECTION_STRING'))

def create_table(engine):
    Base.metadata.create_all(engine)

class BoardInfo(Base): 
    __tablename__ = 'boardInfos'
    id = Column(Integer,primary_key=True)
    boardName = Column('boardName',String(100)) 
    threadInfosLink = relationship('ThreadInfo',back_populates='boardInfosLink') # One-to-Many with threadInfo

class ThreadInfo(Base):
    __tablename__ = 'threadInfos'
    id = Column(Integer,primary_key=True)
    threadTitle = Column('threadTitle',String())
    threadLink = Column('threadLink',String())
    threadAuthor = Column('threadAuthor',String())
    threadPost = Column('threadPost',Text())
    replyCount = Column('replyCount',Integer)
    readCount = Column('readCount',Integer)

    boardInfos_id = Column(Integer,ForeignKey('boardInfos.id')) # Many-to-One with boardInfo
    boardInfosLink = relationship('BoardInfo',back_populates='threadInfosLink') # Many-to-One with boardInfo

    postInfosLink = relationship('PostInfo',back_populates='threadInfosLink') # One-to-Many with postInfo
    
    authorInfos_id = Column(Integer,ForeignKey('authorInfos.id')) # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo',back_populates='threadInfosLink') # Many-to-One with authorInfo

class PostInfo(Base):
    __tablename__ = 'postInfos'
    id = Column(Integer,primary_key=True)
    postOrder = Column('postOrder',Integer,nullable=True)
    postAuthor = Column('postAuthor',Text(),nullable=True)
    postContent = Column('postContent',Text(),nullable=True)
    postTimestamp = Column('postTimestamp',Text(),nullable=True)

    threadInfos_id = Column(Integer,ForeignKey('threadInfos.id')) # Many-to-One with threadInfo
    threadInfosLink = relationship('ThreadInfo',back_populates='postInfosLink') # Many-to-One with threadInfo

    authorInfos_id = Column(Integer,ForeignKey('authorInfos.id')) # Many-to-One with authorInfo
    authorInfosLink = relationship('AuthorInfo',back_populates='postInfosLink') # Many-to-One with authorInfo

class AuthorInfo(Base):
    __tablename__ = 'authorInfos'
    id = Column(Integer,primary_key=True)
    threadAuthor = Column('threadAuthor',String())

    postInfosLink = relationship('PostInfo',back_populates='authorInfosLink') # One-to-Many with postInfo
    threadInfosLink = relationship('ThreadInfo',back_populates='authorInfosLink') # One-to-Many with threadInfo

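Solution

From the code I can see, it does not look like you set ThreadInfo.authorInfosLink or ThreadInfo.authorInfos_id anywhere (the same goes for all of your FKs/relationships).

To attach related objects to a ThreadInfo instance, you need to create them first and then attach them, like this: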

        # Input info to authorInfo
        authorInfo = AuthorInfo()
        authorInfo.threadAuthor = item['threadAuthor'] 
        
        threadInfo.authorInfosLink = authorInfo

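If all of the objects are related to each other through FKs, you probably do not want to call session.add() on each of them. Instead you will want to:

  1. Instantiate a BoardInfo object bi
  2. Then instantiate the related ThreadInfo object ti
  3. Attach your related objects, e.g. bi.threadInfosLink.append(ti) (the one-to-many side is a collection, so you append to it rather than assign)
  4. Once all the relationships are linked, simply add bi to the session with session.add(bi). All related objects are added through their relationships, and the FKs are set correctly.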
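
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the models are trimmed-down copies of the ones in the question, save_item and the item dictionary are hypothetical stand-ins for whatever your pipeline's process_item receives, an in-memory SQLite database replaces CONNECTION_STRING, and declarative_base is imported from sqlalchemy.orm, which assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class BoardInfo(Base):
    __tablename__ = 'boardInfos'
    id = Column(Integer, primary_key=True)
    boardName = Column(String(100))
    threadInfosLink = relationship('ThreadInfo', back_populates='boardInfosLink')


class AuthorInfo(Base):
    __tablename__ = 'authorInfos'
    id = Column(Integer, primary_key=True)
    threadAuthor = Column(String())
    threadInfosLink = relationship('ThreadInfo', back_populates='authorInfosLink')


class ThreadInfo(Base):
    __tablename__ = 'threadInfos'
    id = Column(Integer, primary_key=True)
    threadTitle = Column(String())
    boardInfos_id = Column(Integer, ForeignKey('boardInfos.id'))
    boardInfosLink = relationship('BoardInfo', back_populates='threadInfosLink')
    authorInfos_id = Column(Integer, ForeignKey('authorInfos.id'))
    authorInfosLink = relationship('AuthorInfo', back_populates='threadInfosLink')


def save_item(session, item):
    """Hypothetical helper: build the object graph for one item and persist it."""
    bi = BoardInfo(boardName=item['boardName'])
    ti = ThreadInfo(threadTitle=item['threadTitle'])
    # Many-to-one side: assign the related object directly.
    ti.authorInfosLink = AuthorInfo(threadAuthor=item['threadAuthor'])
    # One-to-many side: append to the collection.
    bi.threadInfosLink.append(ti)
    # Adding bi is enough; ti and the author are cascaded into the session.
    session.add(bi)
    session.commit()
    return ti


engine = create_engine('sqlite://')  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Hypothetical item; the real field names come from your ItemLoader.
item = {'boardName': 'General', 'threadTitle': 'Hello', 'threadAuthor': 'alice'}
thread = save_item(session, item)
print(thread.boardInfos_id, thread.authorInfos_id)
```

Neither FK column is ever assigned by hand here. SQLAlchemy fills them in at flush time because the objects were linked through their relationships before the commit.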

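Following the discussion in the comments on the other answer, here is how I would rationalise your models so that they make more sense to me.

Notes:

  1. I have removed the unnecessary "Info" everywhere
  2. I have removed the explicit column names from the model definitions and will rely on SQLAlchemy's ability to infer those column names from my attribute names
  3. In the "Post" object I have not named the attribute PostContent. It is already implied that the content belongs to the Post, because that is how we access it, so the attribute is simply called "content"
  4. I have removed all of the "Link" terminology. Where I think you want to refer to a collection of related objects, I have given the relationship a plural attribute name.
  5. I have left one line in the "Post" model for you to remove. As you can see, you do not need "author" twice, once as a related object and once as a column on the Post; that defeats the purpose of FKs.

With these changes, when you use these models from your other code, it becomes obvious where you need .append() and where you simply assign the related object. For a given Board object you know from the attribute name alone that 'threads' is a collection, so you would do something like b.threads.append(thread).
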
from sqlalchemy import create_engine,Column,Table,ForeignKey,MetaData
from sqlalchemy import Integer,String,Date,DateTime,Float,Boolean,Text
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Board(Base):
    __tablename__ = 'board'
    id = Column(Integer,primary_key=True)
    name = Column(String(100))
    threads = relationship('Thread',back_populates='board')    # one-to-many: a collection

class Thread(Base):
    __tablename__ = 'thread'
    id = Column(Integer,primary_key=True)
    title = Column(String())
    link = Column(String())
    post = Column(Text())
    reply_count = Column(Integer)
    read_count = Column(Integer)

    board_id = Column(Integer,ForeignKey('board.id'))    # ForeignKey takes 'tablename.column'
    board = relationship('Board',back_populates='threads')

    posts = relationship('Post',back_populates='thread')    # one-to-many: a collection

    author_id = Column(Integer,ForeignKey('author.id'))
    author = relationship('Author',back_populates='threads')

class Post(Base):
    __tablename__ = 'post'
    id = Column(Integer,primary_key=True)
    order = Column(Integer,nullable=True)
    author = Column(Text(),nullable=True)    # remove this line and instead use the relationship below
    content = Column(Text(),nullable=True)
    timestamp = Column(Text(),nullable=True)

    thread_id = Column(Integer,ForeignKey('thread.id'))
    thread = relationship('Thread',back_populates='posts')

    author_id = Column(Integer,ForeignKey('author.id'))
    author = relationship('Author',back_populates='posts')

class Author(Base):
    __tablename__ = 'author'
    id = Column(Integer,primary_key=True)
    name = Column(String())

    posts = relationship('Post',back_populates='author')
    threads = relationship('Thread',back_populates='author')
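
As a rough usage sketch of the append-versus-assign distinction, here are trimmed-down versions of these models exercised against an in-memory SQLite database. The assumptions are mine, not part of the original code: only a few columns are kept, ForeignKey is given 'tablename.column' strings, the sample values are made up, and declarative_base is imported from sqlalchemy.orm (SQLAlchemy 1.4 or newer).

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Board(Base):
    __tablename__ = 'board'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    threads = relationship('Thread', back_populates='board')


class Author(Base):
    __tablename__ = 'author'
    id = Column(Integer, primary_key=True)
    name = Column(String())
    threads = relationship('Thread', back_populates='author')
    posts = relationship('Post', back_populates='author')


class Thread(Base):
    __tablename__ = 'thread'
    id = Column(Integer, primary_key=True)
    title = Column(String())
    board_id = Column(Integer, ForeignKey('board.id'))
    board = relationship('Board', back_populates='threads')
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='threads')
    posts = relationship('Post', back_populates='thread')


class Post(Base):
    __tablename__ = 'post'
    id = Column(Integer, primary_key=True)
    content = Column(Text())
    thread_id = Column(Integer, ForeignKey('thread.id'))
    thread = relationship('Thread', back_populates='posts')
    author_id = Column(Integer, ForeignKey('author.id'))
    author = relationship('Author', back_populates='posts')


engine = create_engine('sqlite://')  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

alice = Author(name='alice')                    # made-up sample data
b = Board(name='General')
thread = Thread(title='Hello', author=alice)    # singular attribute: assign
b.threads.append(thread)                        # plural attribute: append
thread.posts.append(Post(content='First reply', author=alice))

session.add(b)      # thread, post and author are cascaded in via the relationships
session.commit()

post = session.query(Post).one()
print(post.thread_id == thread.id, post.author_id == alice.id, thread.board_id == b.id)
```

Only the Board is passed to session.add(); the other rows and their FK values follow from the relationships.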
